The human genome is the set of nucleic acid sequences encoded as DNA within 23 pairs of chromosomes. It contains about 3 billion base pairs, which make up an estimated 20,000 to 25,000 protein-coding genes. Though the Human Genome Project had sequenced the human genome by 2003, we still do not know the cellular function of most of these genes.
These genes have to be transcribed and then translated accurately every time a protein is synthesised. A single glitch can cause a ‘mutation’, which can, in turn, contribute to cancer and to neurodegenerative diseases such as Alzheimer’s. The key to understanding the origin of these diseases is understanding the molecular events that produce malfunctioning protein molecules.
A team of seven researchers at St. John’s College, University of Cambridge, developed an algorithm to test whether artificial intelligence can make more advanced discoveries about protein behaviour than humans can. The research paper was published in the Proceedings of the National Academy of Sciences (PNAS).
The research aimed to establish the molecular basis of the condensation process that many proteins undergo to form protein-rich subcellular components. These processes are linked to various physiological functions. More broadly, the study aimed to build a general understanding of how a protein’s structure determines its behaviour.
The protein behaviour predictor
The algorithm, a protein behaviour predictor named DeePhase, was developed along the lines of the algorithms used by Netflix, Amazon, and Facebook. The strategy is sound because the algorithms used by these tech giants are trained to make informed predictions about a user’s choices based on previous data. For example, every time Alexa answers a question, she does so by recognising patterns in the user’s behaviour.
Along similar lines, this algorithm can predict the biological language of the proteins involved in cancer and in neurodegenerative diseases such as Alzheimer’s and Parkinson’s. In the study, a neural network-based model was trained ‘to learn’ the language of proteins, so that it could capture the molecular grammar of the cell and identify any glitches that occur in it.
“We specifically asked the programme to learn the language of shapeshifting biomolecular condensates – droplets of proteins found in cells – that scientists really need to understand to crack the language of biological function and malfunction that cause cancer and neurodegenerative diseases like Alzheimer’s,” said Dr Kadi Liis Saar, one of the authors of the study.
The team believes that natural language processing technology is critical to identifying the molecular origin of a glitch that could cause protein malfunction and, eventually, a fault in gene expression. Early prediction could help treat affected individuals by correcting the grammatical mistake that occurs when RNA is translated into protein inside the cell.
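To make the ‘language of proteins’ idea more concrete, here is a minimal Python sketch of how a protein sequence can be treated like text before a model ever sees it. It is an illustration only and not DeePhase’s actual pipeline: the function names `kmers` and `composition`, the choice of three-letter ‘words’, and the toy sequence are all assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-letter 'words'.

    Hypothetical helper for illustration; k=3 is an assumed choice.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # No standard residues present, so every fraction is zero.
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: counts[a] / total for a in AMINO_ACIDS}


# Made-up toy sequence, not a real protein.
toy = "MGSGSYGQQS"
words = kmers(toy)           # the sequence as a list of overlapping 'words'
features = composition(toy)  # a simple numeric summary of the sequence
```

Lists of ‘words’ or numeric summaries like these are the kind of input a text-style model could be trained on; which features the study’s model actually used is described in the paper itself.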
Why is this study critical?
Understanding these neurodegenerative diseases at a molecular level is critical, as treating them at the pathological level has not yet yielded much success. For instance, the mortality rate for cancer is still very high, as is the case with Alzheimer’s. This study could help scientists understand a few hundred neurodegenerative diseases.
The lead author of the study, Professor Tuomas P.J. Knowles, believes that using machine learning technology in the study of neurodegenerative diseases is an ‘absolute game changer’. He said, “Ultimately, the aim will be to use artificial intelligence to develop targeted drugs to dramatically ease symptoms or to prevent dementia from happening at all.”
Future advances in the field could produce more powerful algorithms that can handle even more complex data. The free availability of this algorithm also improves the chances of developing cures that nip these diseases in the bud.