
How does "machine learning" help new drug development?

 Last Update: 2021-10-01
 Source: Internet
 Author: User

New drug discovery today is inseparable from computational disciplines, and computing technologies have in turn drawn the industry's attention because of their role in drug research and development.
Machine learning, an important branch of AI, has attracted great attention from R&D organizations and investment banks because it can help identify potential compounds, predict relevant parameters, reduce testing costs, and shorten the development cycle.
This article gives an overview of the history of machine learning and its application in medicine, as a shared learning resource for colleagues.

Future: Precision medicine & drug discovery

In recent years, the concept of precision medicine has been mentioned more and more. It emphasizes preventing and treating disease on the basis of individual differences (including genes, environment, and lifestyle).
As a result, a large amount of biomedical data has been generated in recent years from very diverse sources, ranging from small laboratories to large multi-center studies. These data, mainly omics data (genomics, proteomics, metabolomics, pharmacogenomics, etc.), are a rich source of information for the scientific community and can be used to classify patients, obtain specific diagnoses, and develop new treatments.
Over the past ten years, rapid gains in computing power have allowed computational methods to compete with high-throughput screening in the traditional drug discovery process.
Machine learning (ML), a branch of artificial intelligence, offers multiple methods that are used in drug discovery to predict the molecular properties, biological activities, interactions, and adverse reactions of new chemical entities.
These algorithms are changing the traditional model of new drug discovery.
Figure 1.1: The new drug discovery process in the context of precision medicine (see references)

The development of ML in the field of drug discovery

In 1964, the Hansch equation was proposed, and linear regression models built on physicochemical descriptors (such as hydrophobic, electronic, and steric parameters) began to be used to describe two-dimensional structure-activity relationships. From there, the concept of QSAR (quantitative structure-activity relationship) gradually deepened and developed.
In 1998, when the concept of drug-like properties was put forward, researchers began to build models that could efficiently predict whether a molecule had drug potential, starting from 1D/2D descriptors and gradually going deeper.
In general, though, ML saw few applications in drug discovery before 2000, mainly because of limited data availability.
In 2004, the development of the PubChem and ZINC databases laid the foundation for ML in drug discovery, and DrugBank and ChEMBL followed in 2006 and 2008, which greatly eased the data availability problem.
In 2016, Molecular Graph Convolutions was published. In 2020, results from related researchers appeared in the journal Cell, further demonstrating the potential of machine learning in this field: they discovered halicin, a molecule with antibacterial activity, which was verified in the laboratory.
Figure 2.1: Timeline of main events in machine learning for drug discovery (see references)

ML operation process

ML methods in drug discovery cover the following steps: 1) data collection; 2) generation of mathematical descriptors; 3) search for the best subset of variables; 4) model training; 5) model validation.
Figure 3.1: Machine learning method for drug discovery (see references)
The first step, as mentioned above, is data collection. The data should cover activity, selectivity, metabolism, toxicity, and physicochemical properties, as well as attributes such as ease of manufacture. For small-molecule and peptide drugs, the SMILES and FASTA formats can be used to represent structures and sequences. Databases such as DrugBank, PubChem, ChEMBL, and ZINC hold large amounts of such data.
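As a small illustration of these text formats, the sketch below parses FASTA-formatted text into sequences using only the Python standard library. The function name `parse_fasta` and the example record are hypothetical, invented for this illustration. A SMILES string is shown only as a plain string (ethanol is written `CCO`), since interpreting SMILES properly requires a cheminformatics toolkit.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each record header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, not taken from any database
example = ">example_peptide hypothetical record\nGIGAV\nLKVLT\n"
peptides = parse_fasta(example)

# SMILES is a one-line text encoding of a structure; ethanol as an example
ethanol_smiles = "CCO"
```

Data downloaded from the databases named above would normally pass through a parsing step of this kind before any descriptors are computed.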
Once mathematical descriptors have been generated (with related techniques such as PCA, t-SNE, feature selection (FS), and autoencoders), the result is a set of numerical data that the ML model can process.
The data can be divided into two subsets: the larger share is used for model training and the smaller share for testing. This process can be used to find the best subset of variables.
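One simple way to narrow down the variables is a filter-style feature selection that drops descriptors carrying no information. The article does not say which selection method is used, so the variance filter below is an assumption chosen for brevity, and the descriptor names and values are invented toy data.

```python
from statistics import pvariance


def select_by_variance(rows, names, threshold=1e-8):
    """Keep the descriptor columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose: one tuple per descriptor
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced


# Invented toy table: one row per molecule, one column per descriptor
names = ["logP", "n_rings", "flag"]
rows = [[1.2, 1, 0], [0.4, 2, 0], [2.9, 1, 0], [3.1, 3, 0]]
kept_names, reduced = select_by_variance(rows, names)
```

Here the constant `flag` column is removed because it cannot help distinguish one molecule from another. Wrapper or embedded selection methods, which score variable subsets by model performance, follow the same keep-or-drop pattern.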
After the model is trained, it can be validated. If the validation result is statistically significant, a new drug prediction model can be said to have been created.
PS: The best model is the one that achieves the highest performance at the lowest total cost.
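The split, train, and validate steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 80/20 split, the 1-nearest-neighbour model, and the synthetic two-cluster data are all choices made for this example and do not come from the article. The sketch also stops at test accuracy and does not perform a statistical significance test.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into a larger training set and a smaller test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Predict the label of x from its nearest training sample (squared Euclidean distance)."""
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], x))
    return min(train, key=dist)[1]


def accuracy(train, test):
    """Fraction of test samples whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


# Synthetic, well-separated toy data: (descriptor vector, label), 1 = "active"
data = [((0.1 * i, 0.2 * i), 0) for i in range(10)] + \
       [((5 + 0.1 * i, 5 + 0.2 * i), 1) for i in range(10)]

train, test = train_test_split(data, test_fraction=0.2)
score = accuracy(train, test)
```

Holding the test set out of training is what makes the score a fair estimate of how the model behaves on molecules it has not seen.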

Input data

An extremely important part of model training is the input of representative molecular descriptors. The topics involved include QSAR, molecular descriptors, computed fingerprints, and graph-based machine learning algorithms.

QSAR

QSAR expresses the relationship between structure and activity in numerical form. By combining computation and statistics, biological activity is predicted theoretically, so that possible new drugs can be designed on a theoretical basis and research and development costs can, in principle, be reduced.
QSAR research requires three types of information: 1) the molecular structures of different compounds that share a common mechanism of action; 2) biological activity data for each ligand; 3) physicochemical properties.
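In its simplest form, a QSAR model is a linear regression of activity on a physicochemical descriptor, in the spirit of the Hansch approach mentioned earlier. The sketch below fits a one-variable least-squares line. The descriptor and activity values are invented for illustration (constructed to lie exactly on a line) and are not measurements; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented toy values, not measurements: a hydrophobicity descriptor vs. activity
log_p = [0.5, 1.0, 1.5, 2.0, 2.5]
activity = [3.0, 4.0, 5.0, 6.0, 7.0]  # constructed as exactly 2 * log_p + 2

slope, intercept = fit_line(log_p, activity)

# Predict the activity of a hypothetical new compound with log_p = 3.0
predicted = slope * 3.0 + intercept
```

Real QSAR models combine several descriptors and must be validated on compounds that were not used for fitting.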

Molecular descriptor

A molecular descriptor (MD) is a numerical representation of a molecule that quantitatively describes its physical and chemical properties. Researchers can therefore find molecules with similar physicochemical properties by comparing calculated descriptor values.
Molecular descriptors fall into two categories: 1) experimentally measured values, such as logP, dipole moment, and polarizability; 2) theoretical values, such as structural, topological, geometric, electronic, and physical descriptors.
Theoretical molecular descriptors can be classified by dimension as 0D/1D/2D/3D/4D/5D/6D descriptors, of which 3D and 4D are the most thoroughly studied.
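The lowest-dimensional theoretical descriptors need nothing more than the molecular formula. As a minimal sketch, the code below computes atom counts and molecular weight for simple formulas. The helper names are hypothetical, the parser assumes formulas without brackets or charges, and only four elements are included (standard atomic masses rounded to three decimals).

```python
import re

# Standard atomic masses, rounded; only a few elements for illustration
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def atom_counts(formula):
    """Count atoms in a simple molecular formula such as 'C2H6O' (no brackets or charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[s] * n for s, n in atom_counts(formula).items())


ethanol = atom_counts("C2H6O")
ethanol_mw = molecular_weight("C2H6O")
```

Higher-dimensional descriptors (2D topology, 3D geometry, and beyond) need the connectivity or conformation of the molecule and are usually computed with dedicated cheminformatics software.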

Computed fingerprint

A fingerprint (FP) is a special form of molecular descriptor that represents molecular structure quickly and efficiently as a fixed-length bit vector, where each bit indicates the presence or absence of a substructure or functional group.
However, fingerprints derived from chemical structure ignore biological characteristics, which weakens the correlation between molecular structure and biological activity: small changes in structure can produce substantial differences in activity.
Fingerprint types commonly used in computational work include MACCS, PubChem, and CDK.
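The bit-vector idea can be shown with a toy key set. In the sketch below, the five substructure keys are hypothetical and far smaller than real key-based fingerprints, and the Tanimoto coefficient is used as the similarity measure; it is a common choice for comparing fingerprints but is an assumption here, since the article does not name one.

```python
def to_bits(present, vocabulary):
    """Encode presence or absence of each vocabulary entry as a fixed-length bit list."""
    return [1 if key in present else 0 for key in vocabulary]


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by bits that are on in either vector."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


# Hypothetical substructure keys, chosen for illustration only
KEYS = ["hydroxyl", "carbonyl", "amine", "aromatic_ring", "halogen"]

mol_a = to_bits({"hydroxyl", "aromatic_ring"}, KEYS)
mol_b = to_bits({"hydroxyl", "aromatic_ring", "halogen"}, KEYS)
similarity = tanimoto(mol_a, mol_b)
```

The two toy molecules differ in a single bit yet could behave very differently in a biological assay, which is the limitation described above.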

Graph-based machine learning algorithm

In graph-based methods, the structural formula of a compound is represented as a molecular network. Each atom is a node in the network, and the algorithm used is mainly an artificial neural network.
As early as 2009, researchers proposed a graph neural network model. In 2016, researchers from Stanford University and Google developed molecular graph convolutions, and it is precisely this application of convolution to graphs that moved computational drug discovery research a step forward.
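To make the graph view concrete, the sketch below encodes the heavy atoms of ethanol as nodes and its bonds as edges, then performs one neighbour-aggregation step. This is a deliberately simplified stand-in: a real graph convolution also applies learned weights and a nonlinearity, both omitted here, and the two-element one-hot features are an assumption made for illustration.

```python
# Ethanol heavy atoms (C-C-O) as a graph: nodes are atoms, edges are bonds
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def adjacency(n_atoms, bonds):
    """Build an undirected neighbour list from a list of bonds."""
    neighbors = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors


def aggregate(features, neighbors):
    """One unweighted step: new atom vector = own vector + sum of neighbour vectors."""
    updated = []
    for i, own in enumerate(features):
        total = list(own)
        for j in neighbors[i]:
            total = [t + f for t, f in zip(total, features[j])]
        updated.append(total)
    return updated


# One-hot atom features: [is_carbon, is_oxygen]
features = [[1, 0], [1, 0], [0, 1]]
neighbors = adjacency(len(atoms), bonds)
updated = aggregate(features, neighbors)
```

After one step, each atom's vector already reflects its immediate chemical environment; stacking several such steps lets information travel further across the molecule.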

ML & biological issues

The complexity of modern biology makes computing an indispensable tool for supporting biological experiments, because it allows large amounts of information to be processed with precisely encoded theoretical models, thereby facilitating and accelerating the development of new drugs.
From hit-to-lead through to ADMET, computation can provide useful predictions.
A sample of articles from 2016 to 2020 was drawn, and the biological issues they address are summarized below.
Figure 5.1: Biological problems addressed by the sample articles from 2016 to 2020 (see references)
As shown, the largest share is "drug-target interaction".
Target research is at the forefront of disease research and drug discovery, and the importance of this starting point is self-evident.
Compound-protein interactions have become a prerequisite for the discovery of new drugs. For example, the PDB database has accumulated a large number of receptor-ligand crystal structures and so provides a large amount of interaction data, an essential data source for computational drug researchers. At the same time, many software tools have emerged, such as MPLs-Pred.

Future development trend of ML

In-depth research on Bayesian methods, support vector machines, decision trees, and artificial neural networks will undoubtedly contribute greatly to the accuracy of machine learning, and structure-based drug design will rely ever more on machine learning to meet the industry's demand for fast, efficient, and low-cost development.
Although a large number of studies have demonstrated the advantages of machine learning, no marketed drug has yet been developed with machine learning and artificial intelligence as its core technologies.
For this reason, drug discovery based on machine learning has long been questioned by the industry.
However, major technological breakthroughs are often met with strong doubts in their early stages, and once a qualitative leap is achieved, the return on investment can be substantial.
Machine learning and artificial intelligence are making steady progress, and the future is promising.

References:
1. A review on machine learning approaches and trends in drug discovery. doi.org/10.1016/j.csbj.2021.08.011
2. AI-based language models powering drug discovery and development. doi.org/10.1016/j.drudis.2021.06.009
3. Integration of AI and traditional medicine in drug discovery. doi.org/10.1016/j.drudis.2021.01.008

This article is an English version of an article which is originally in the Chinese language on echemi.com and is provided for information purposes only.