Modeling RNA Virus and Vaccine Structure

By William Whitford (DPS Group)

In February of 2022, the efforts of Xavier Health were assumed by the AFDO/RAPS Healthcare Products Collaborative. Because of the important work done before this transition, the Collaborative has chosen to retain documents that have Xavier branding and continue to provide them to the communities through this website. If you have questions, please contact Timothy Hsu, Director of Health Technology Initiatives, at thsu@healthcareproducts.org.

Purpose: AI has applications throughout the value chain, and here we introduce an application in product development.

AI in Modern Medicine

Artificial intelligence (AI) is one of the major techniques defining our modern era of medicine. The machine learning approaches employed in drug design include supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning. Data challenges in drug design include collection, representation, normalization, characterization, heterogeneity, dimensionality, uncertainty, and bias. Algorithmic challenges include the difficulty of multi-objective optimization, reproducibility, confounders, model appropriateness, catastrophic forgetting, language, and AI adoption. Through QSAR (quantitative structure-activity relationship) modeling, AI and machine learning (ML) have proven their power in predicting drug properties such as drug interactions, binding affinities, solubility, and toxicity [ 1 ]. We are encouraged by the progress being made and hopeful about the potential of AI to further advance modern medicine.
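At its simplest, a QSAR model maps numeric molecular descriptors to a measured property. The sketch below is a deliberately minimal stand-in, assuming a purely linear relationship fitted by ordinary least squares with NumPy; the ML methods surveyed in [ 1 ] are far more elaborate. The compounds, descriptors, and coefficients are synthetic and hypothetical.

```python
import numpy as np

def design_matrix(descriptors: np.ndarray) -> np.ndarray:
    # Prepend a column of ones so the first coefficient acts as the intercept.
    return np.column_stack([np.ones(len(descriptors)), descriptors])

def fit_qsar(descriptors: np.ndarray, activity: np.ndarray) -> np.ndarray:
    """Least-squares fit of activity ~ w0 + w1*x1 + ... + wk*xk; returns [w0, ..., wk]."""
    coef, *_ = np.linalg.lstsq(design_matrix(descriptors), activity, rcond=None)
    return coef

def predict(coef: np.ndarray, descriptors: np.ndarray) -> np.ndarray:
    # Predicted activity for new (hypothetical) compounds.
    return design_matrix(descriptors) @ coef

# Synthetic, illustrative data only: 20 hypothetical compounds x 3 numeric descriptors,
# with "activity" generated from a known linear rule plus a little noise.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(20, 3))
true_coef = np.array([1.0, 0.5, -2.0, 0.25])
activity = design_matrix(descriptors) @ true_coef + rng.normal(scale=0.05, size=20)

coef = fit_qsar(descriptors, activity)
print(np.round(coef, 2))  # should land close to true_coef
```

Because the data were generated from a known rule, the fitted coefficients can be checked against it; with real compounds there is no such ground truth, which is where the data challenges listed above come in.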

RNA Virus

An RNA virus, such as SARS-CoV-2, carries its genome as ribonucleic acid (RNA). There are many types of RNA viruses, however, and they differ greatly in the nature of that RNA: genomes may be positive- or negative-sense, single- or double-stranded, and made of a single piece or several pieces; they also vary in the amount of RNA enclosed in the capsid and in any secondary structure the RNA forms [ 2 ]. For example, coronaviruses contain a very large (~30 kb), non-segmented, positive-sense RNA genome [ 3 ].
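The difference between positive- and negative-sense RNA can be illustrated in a few lines. A positive-sense genome has the same sense as mRNA, while a negative-sense genome is complementary to it and must first be copied into its complement. The sketch below assumes standard Watson-Crick pairing only (A-U, G-C) and uses a hypothetical 9-base fragment; the function name is likewise illustrative.

```python
# Watson-Crick complements for RNA bases (G-U wobble pairing is ignored here).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the complementary RNA strand, also written 5'->3'."""
    # Reverse the sequence, then complement each base, so both strands read 5'->3'.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))

positive_sense = "AUGGCUUAC"  # hypothetical fragment, 5'->3'
negative_sense = reverse_complement(positive_sense)
print(negative_sense)  # GUAAGCCAU
```

Applying the function twice returns the original strand, mirroring the way a negative-sense template is copied back into positive-sense RNA.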

RNA Vaccines

Conventional vaccines against a virus contain small amounts of inactivated virus or isolated viral proteins (the viral antigen). Instead of viral proteins, mRNA vaccines deliver mRNAs containing the instructions to make these viral antigens inside the body. mRNA vaccines are non-infectious (no chance of giving the disease), non-integrating (“non-GMO”), and cell-free [ 4 ].
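To show what "instructions to make viral antigens" means, the sketch below translates an mRNA sequence the way a ribosome reads it: three bases (one codon) at a time using the standard genetic code, from a start codon (AUG) until a stop codon. It is a simplification: it starts at the first AUG and ignores the cap, untranslated regions, and poly(A) tail of a real vaccine mRNA. The input sequence and function name are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon ('*'); one-letter amino acid codes."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein
    protein = []
    # Step through complete codons only.
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGUUUGGCUAAGC"))  # MFG
```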

There is ongoing interdisciplinary research into the optimization of mRNA vaccines, as well as into the function, structure, and processing of RNAs. These studies involve biology, chemistry, engineering, and pharmacology [ 5 ].

RNA Structure

RNA can both encode information in its sequence and fold into complex three-dimensional structures that can have catalytic and regulatory roles. Knowledge of RNA's secondary structure is essential for modeling RNA structures and understanding their functional mechanisms.
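Secondary structure is commonly written in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. As a minimal sketch, the base pairs can be recovered with a stack. The helper name `parse_dot_bracket` and the example hairpin are hypothetical, and pseudoknots (which need extra bracket types) are not handled.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) base-pair indices from dot-bracket notation."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)  # opening base waits for its partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # pair with the most recent open base
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

sequence = "GGGAAAUCCC"    # hypothetical 10-nt hairpin
structure = "(((....)))"
pairs = parse_dot_bracket(structure)
print(pairs)  # [(0, 9), (1, 8), (2, 7)]
print([sequence[i] + "-" + sequence[j] for i, j in pairs])  # ['G-C', 'G-C', 'G-C']
```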

Our understanding of the roles of these higher-order RNA structures remains rudimentary, and the structures themselves have generally been determined by X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy (cryo-EM). An accurate and cost-effective computational method for predicting RNA's secondary structure from its sequence is still needed [ 6 ].

AI in RNA Studies

The Atomic Rotationally Equivalent Scorer (ARES) is based on a deep neural network, a form of machine learning. Although the program can be applied to many problems in molecular structure, it was recently applied to RNA [ 7 ]. The interactions among the four RNA nucleotides that govern base pairing and simple helix formation are well understood. The resulting secondary structures often assemble as fairly rigid elements that interact to form more complicated tertiary structures.
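To make "base pairing and simple helix formation" concrete, the sketch below checks that each pair in a structure is canonical (A-U, G-C, or the G-U wobble pair) and groups stacked pairs into helices. The function name, the strict definition of a helix as an unbroken run of stacked pairs (so a bulge splits one helix into two), and the two-hairpin example are assumptions made for illustration.

```python
# Canonical Watson-Crick pairs (A-U, G-C) plus the common G-U "wobble" pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_helices(sequence: str, pairs: list[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Group base pairs into helices: runs of stacked pairs (i, j), (i+1, j-1), ..."""
    for i, j in pairs:
        if (sequence[i], sequence[j]) not in ALLOWED_PAIRS:
            raise ValueError(f"non-canonical pair {sequence[i]}-{sequence[j]} at ({i}, {j})")
    helices: list[list[tuple[int, int]]] = []
    for i, j in sorted(pairs):
        # Extend the current helix only if this pair stacks directly on the previous one.
        if helices and helices[-1][-1] == (i - 1, j + 1):
            helices[-1].append((i, j))
        else:
            helices.append([(i, j)])
    return helices

# Hypothetical RNA with two hairpins separated by one unpaired A.
sequence = "GGGAAACCCAGCGAAACGC"
pairs = [(0, 8), (1, 7), (2, 6), (10, 18), (11, 17), (12, 16)]
helices = find_helices(sequence, pairs)
print(len(helices))  # 2
```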

Existing RNA secondary-structure prediction methods are classified as either comparative sequence analysis or folding algorithms, and they use thermodynamic, statistical, or probabilistic scoring schemes. These methods can work to some degree for some RNAs and structures, provided that many sequences are available and have been manually aligned with expert knowledge. Rapid, accurate, and economical modeling of complex RNA structures has proven difficult.
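Folding algorithms can be illustrated with the Nussinov algorithm, a classic dynamic-programming method that simply maximizes the number of nested base pairs. It is used here only as the simplest instance of the idea; it is not the thermodynamic (free-energy) scoring that practical tools rely on, and the minimum hairpin-loop size of 3 is an assumed parameter.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of nested base pairs; return (pair count, dot-bracket)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs in seq[i..j]; spans <= min_loop stay 0 (hairpin loop too small).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in ALLOWED_PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)   # i pairs with j
            for k in range(i + 1, j):                    # split into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure (ties are broken by the order of checks).
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in ALLOWED_PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return dp[0][n - 1], "".join(structure)

print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

Counting pairs ignores stacking energies and loop penalties, which is one reason such simple scores predict real structures poorly.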

Recently, ARES was given a set of known RNA structures plus a large number of incorrect variations of those structures, and it was trained to learn how atoms are positioned relative to one another. The ARES neural network then used these features to recognize base pairs, helices, and more complex structures. ARES learned how patterns of base pairing direct higher-level structures without being given any explicit information about these structural features [ 7 ]. Although ARES was trained on very simple RNA systems, its final scoring function was able to predict the structures of more complex RNAs. While still neither complete nor perfect, it represents notable progress in the field.
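ARES itself is a trained neural network and is not reproduced here. The sketch below shows only the kind of quantity a structure scorer can be trained to estimate: how far an "incorrect variation" (a decoy) deviates from the known structure, measured as root-mean-square deviation (RMSD) after optimal superposition with the Kabsch algorithm. Treating RMSD as the training target is an assumption of this sketch, and the random coordinates and noise-perturbed decoys are hypothetical stand-ins for real atomic models.

```python
import numpy as np

def kabsch_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition."""
    # Remove translation by centering both coordinate sets on their centroids.
    P = model - model.mean(axis=0)
    Q = reference - reference.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix gives the best-fit rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis if needed to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Hypothetical "native" structure: 50 random atom positions (arbitrary units).
rng = np.random.default_rng(1)
native = rng.normal(scale=10.0, size=(50, 3))

# A rigidly rotated and shifted copy should have RMSD ~ 0 after superposition.
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = native @ rot_z.T + np.array([5.0, -3.0, 2.0])

# Stand-in decoys: the native coordinates with increasing random distortion.
decoys = [native + rng.normal(scale=s, size=native.shape) for s in (0.5, 2.0, 5.0)]
rmsds = [kabsch_rmsd(decoy, native) for decoy in decoys]
ranked = sorted(range(len(decoys)), key=lambda k: rmsds[k])  # most native-like first
print(round(kabsch_rmsd(moved, native), 6), [round(r, 2) for r in rmsds], ranked)
```

Superposing before measuring matters because a model's overall position and orientation carry no structural information; only the relative arrangement of atoms does.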

There are three challenges in modeling 3D RNA structures: 1) generating plausible candidate structures, 2) accurately identifying those that best represent the actual form, and 3) using models to discover underlying functional motifs and to understand how three-dimensional structures regulate biological processes. The ARES machine-learning approach addressed the second of these.

Hopefully, these AI-driven deep learning strategies will yield new scoring functions that both predict near-native structures and identify the regions most likely to form three-dimensional structures. Another goal is to incorporate experimental information directly into machine-learning strategies for modeling RNA tertiary structure. As it becomes possible to understand the principles and predict the details of tertiary RNA structures, powerful new biological functions will no doubt be revealed.


For Discussion: What thoughts do you have on the possibilities of AI in the role of HMW (high molecular weight) biomolecule structure elucidation (and please share any other articles you feel are relevant)?

 

REFERENCES

1. Artificial Intelligence in Drug Discovery – BioTechniques

2. RNA virus – Wikipedia

3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4369385/

4. https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/mRNA.html

5. COVID-19 vaccine: What’s RNA research got to do with it? : NewsCenter (rochester.edu)

6. http://www.bioinf.man.ac.uk/resources/phase/manual/node72.html

7. R. J. L. Townshend et al., Science 373, 1047 (2021)