About 70 percent of human proteins contain at least one sequence made up of a single amino acid repeated many times, with a few other amino acids interspersed.
These "low-complexity regions" (LCRs) are also present in most other organisms.
The proteins that contain these sequences have many different functions, but MIT biologists have now come up with a way to identify and study
them as a unified group.
Their technique allows them to analyze similarities and differences between LCRs of different species and helps them determine the function of these sequences and the proteins
in which they reside.
Using their technique, the researchers analyzed all the proteins found in eight different species, from bacteria to humans.
They found that while LCRs vary between proteins and species, they often play a similar role: helping the proteins that contain them join larger assemblies, such as the nucleolus, an organelle found in almost all human cells.
Byron Lee, a graduate student at MIT, said: "Rather than studying specific LCRs and their functions, which can appear separate because they are involved in different processes, our broader approach allows us to see similarities between their properties, suggesting that the functions of LCRs may not be so different after all."
The researchers also found some differences between LCR sequences in different species and showed that these species-specific LCR sequences correspond to species-specific functions, such as forming plant cell walls.
Lee and graduate student Nima Jaberi-Lashkari are the lead authors of the study, which was published today in eLife.
Eliezer Calo is an assistant professor of biology at the Massachusetts Institute of Technology and the senior author
of the paper.
Previous studies have shown that LCRs are involved in a variety of cellular processes, including cell adhesion and DNA binding.
These LCRs are usually rich in a single amino acid such as alanine, lysine, or glutamic acid.
Finding these sequences and then studying their functions one by one is time-consuming, so the MIT team decided to use bioinformatics, an approach that applies computational methods to large amounts of biological data, to evaluate them as a larger group.
"What we wanted to do was take a step back and, instead of looking at individual LCRs, try to look at all of them and see whether we could observe patterns on a larger scale. That might help us figure out what the LCRs with an assigned function are doing, and also help us understand what the LCRs without an assigned function are doing," Jaberi-Lashkari said.
To do this, the researchers used a technique called a dot matrix, a way of visually representing amino acid sequences, to generate an image of each protein in the study.
They then used computational image processing methods to compare thousands of these matrices.
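To make the dot-matrix idea concrete, here is a minimal sketch in Python. It is not the researchers' pipeline: the function names, the toy sequence, and the window-density measure are all assumptions made for illustration. The sketch compares a sequence against itself, so a region dominated by one amino acid shows up as a dense square block along the diagonal.

```python
import numpy as np


def self_dot_matrix(sequence: str) -> np.ndarray:
    """Compare a sequence against itself.

    Entry (i, j) is 1 when residue i and residue j are the same amino acid,
    and 0 otherwise. (Hypothetical helper, not taken from the study.)
    """
    residues = np.frombuffer(sequence.encode("ascii"), dtype=np.uint8)
    # Broadcasting compares every residue with every other residue.
    return (residues[:, None] == residues[None, :]).astype(np.uint8)


def window_density(matrix: np.ndarray, start: int, size: int) -> float:
    """Fraction of matching cells in a square window on the diagonal."""
    block = matrix[start:start + size, start:start + size]
    return float(block.mean())


# Toy sequence (an assumption, not a real protein): an ordinary stretch,
# then a lysine-rich stretch, then a short tail.
toy_sequence = "MTEYKLVV" + "KKKKAKKKK" + "GDSQ"
toy_matrix = self_dot_matrix(toy_sequence)

mixed_density = window_density(toy_matrix, start=0, size=8)
lysine_density = window_density(toy_matrix, start=8, size=9)
print(f"mixed window: {mixed_density:.2f}, lysine-rich window: {lysine_density:.2f}")
```

In this sketch the lysine-rich window is far denser than the mixed one, which is the kind of visual signature an image-processing step could pick out across thousands of matrices.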
Using this technique, the researchers were able to classify LCRs based on the amino acid that is most frequently repeated within each LCR.
They also grouped LCR-containing proteins based on the copy number of each LCR type found in the protein.
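The two groupings described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's method: it assumes the LCR positions in a protein are already known, and the function names, toy protein, and spans are hypothetical.

```python
from collections import Counter


def dominant_residue(lcr: str) -> str:
    """Return the most frequent amino acid in one LCR (hypothetical helper)."""
    return Counter(lcr).most_common(1)[0][0]


def lcr_copy_numbers(protein: str, spans: list[tuple[int, int]]) -> Counter:
    """Count how many LCRs of each type a protein contains.

    `spans` holds (start, end) positions of LCRs that are assumed to have
    been identified beforehand; end is exclusive, as in Python slicing.
    """
    return Counter(dominant_residue(protein[start:end]) for start, end in spans)


# Toy protein (an assumption, not a real sequence) with three marked LCRs.
toy_protein = "MS" + "KKKKAKKK" + "GTL" + "EEEDEEE" + "PQ" + "KKRKKKK" + "SA"
toy_spans = [(2, 10), (13, 20), (22, 29)]

copies = lcr_copy_numbers(toy_protein, toy_spans)
print(dict(copies))
```

Here the toy protein carries two lysine-rich LCRs and one glutamic acid-rich LCR, so it would be grouped with other proteins that have the same copy-number profile.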
Analyzing these traits helped the researchers learn more about the function of these LCRs.
As a demonstration, the researchers picked out a human protein called RPA43, which has three lysine-rich LCRs.
This protein is one of many subunits that make up an enzyme called RNA polymerase I, which synthesizes ribosomal RNA.
The researchers found that the copy number of lysine-rich LCRs is important for helping the protein integrate into the nucleolus, the organelle responsible for synthesizing ribosomes.
In comparing proteins found in eight different species, the researchers found that some LCR types are highly conserved between species, meaning that the sequences have changed little over evolutionary timescales.
These sequences tend to be found in proteins and cellular structures that are themselves highly conserved, such as the nucleolus.
"These sequences seem to be important for the assembly of certain parts of the nucleolus," Lee said.
"Some of the principles that are known to be important for higher-order assembly seem to be at play, because copy number may control how many interactions a protein can make, which is important for integrating the protein into that compartment."
The researchers also found differences
in LCRs between the two different types of proteins involved in nucleolar assembly.
They found that a nucleolar protein called TCOF contains many glutamic acid-rich LCRs that can help scaffold the formation of assemblies, while nucleolar proteins with only a few of these glutamic acid-rich LCRs can be recruited as clients (proteins that interact with scaffolds).
Another structure that appears to have many conserved LCRs is the nuclear speckle, which is found within the cell nucleus.
The researchers also found many similarities between LCRs that are involved in forming larger assemblies such as the extracellular matrix, a molecular network that provides structural support to animal and plant cells.
The team also found some examples of structures with LCRs that appear to have diverged.
For example, plants have unique LCR sequences in the proteins they use to support cell walls, which are not seen in other types of organisms.
The researchers now plan to extend their LCR analysis to other species.
"There's a lot to explore because we can extend this map to any species," Lee said.
"This gives us the opportunity and the framework to identify new biological assemblies."