DeepMind, Google's artificial intelligence (AI) company, this year unveiled the predicted structures of 220 million proteins, covering nearly every protein in a DNA database of known organisms.
Now another tech giant is filling in the dark matter of the protein universe.
Researchers at Meta (formerly Facebook) used AI to predict the structures of about 600 million proteins from bacteria, viruses and other microbes that have not yet been characterized.
The study was posted on 1 November on the preprint server bioRxiv.
"These are very mysterious proteins that offer the possibility of gaining insight into biology," said Alexander Rives, head of research on Meta AI's protein team.
The team generated these predictions using a "large language model", a type of AI that underlies tools able to predict text from just a few letters or words.
Language models are usually trained on large volumes of text.
To apply one to proteins, Rives' team "fed" it sequences of known proteins, which can be written as chains of 20 different amino acids, each represented by a letter.
The model then learned to "autocomplete" proteins in which a proportion of the amino acids had been hidden.
Rives says this training gave the model an intuitive understanding of protein sequences, which hold information about the shapes of proteins.
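The "autocomplete" training described above can be illustrated by how a single masked training example might be built from a protein sequence. This is a hypothetical sketch, not Meta's code: the function name `mask_sequence`, the `#` mask symbol and the 15% masking rate are assumptions (15% is a common default for masked language models), and the neural network that learns to fill in the hidden letters is not shown.

```python
import random

# The 20 standard amino acids, each written as a single letter.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical placeholder symbol for a hidden residue


def mask_sequence(seq: str, rate: float = 0.15, seed: int = 0):
    """Hide a fraction of residues; return (masked_seq, {position: true_residue}).

    A model trained on many such pairs learns to fill in ("autocomplete")
    the hidden letters from the surrounding context. `rate` is an assumed value.
    """
    if not seq:
        raise ValueError("empty sequence")
    if any(ch not in AMINO_ACIDS for ch in seq):
        raise ValueError("sequence contains non-standard amino-acid letters")
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_mask = max(1, round(len(seq) * rate))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    chars = list(seq)
    targets = {}
    for i in positions:
        targets[i] = chars[i]  # the answer the model must predict
        chars[i] = MASK
    return "".join(chars), targets


if __name__ == "__main__":
    # An arbitrary example sequence, not taken from the study.
    masked, answers = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ")
    print(masked)
    print(answers)
```

During training, the model sees only the masked string and is scored on how well it recovers the hidden letters, which is what pushes it to learn patterns in protein sequences.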
A second step, inspired by DeepMind's pioneering protein-structure AI, AlphaFold, combines these insights with information about the relationships between known protein structures and sequences to generate predicted structures from protein sequences.
Earlier this summer, Rives' team reported that its model, called ESMFold, is not as accurate as AlphaFold but is about 60 times faster at predicting structures.
"This means we can scale structure prediction to much larger databases," Rives said.
As a test case, the team decided to apply the model to a large database of sequenced "metagenomic" DNA from the environment, including soil, seawater, the human gut, skin and other microbial habitats.
The vast majority of DNA entries encoding potential proteins come from organisms that have never been cultured and are unknown to science.
In total, the Meta team predicted the structures of more than 617 million proteins, a task that took just two weeks.
Rives says the predictions are free for anyone to use, as is the model's underlying code.
Of those 617 million predictions, the model rated more than one-third as high quality, meaning researchers can be confident that the overall shape of the protein is correct; in some cases, the model can also capture finer atomic-level details.
Notably, millions of these structures are entirely new, unlike those in databases of experimentally determined protein structures or in the AlphaFold database of predictions from known organisms.
A large portion of the AlphaFold database is made up of structures that are nearly identical to one another, whereas the metagenomic database should cover a large portion of the never-before-seen protein universe.
Sergey Ovchinnikov, an evolutionary biologist at Harvard University, is skeptical of ESMFold's hundreds of millions of predictions.
He believes that some of these proteins may lack a defined structure, while others may be non-coding DNA mistaken for protein-coding material.
Burkhard Rost, a computational biologist at the Technical University of Munich in Germany, was impressed by the speed and accuracy of Meta's model.
But he questioned whether it really has an advantage over AlphaFold when it comes to predicting proteins from metagenomic databases.
Prediction methods based on language models, he says, are better suited to quickly determining how mutations change protein structure, which AlphaFold cannot do.
According to a DeepMind representative, the company currently has no plans to add metagenomic structure predictions to its database, but does not rule out doing so in the future.
Martin Steinegger, a computational biologist at Seoul National University in South Korea, believes the clear next step for such tools is to study biology's dark matter.
"We'll soon see an explosion in the analysis of these metagenomic structures," he said.
Related paper information: https://doi.org/10.1101/2022.07.20.500902