The document discusses RNA 3D structure prediction using the Nucleic Acid Simulation Toolkit (NAST). NAST is a coarse-grained modeling tool that produces plausible 3D structures from the primary sequence, the secondary structure, and optional tertiary contacts. The document summarizes simulations run with NAST on two RNA molecules, 1ZIH and 4JF2, starting from both unfolded states and crystal structures. The results show that NAST can generate structures close to the crystal structures and that secondary structure constraints improve accuracy. Allowing some errors in the secondary structure has little effect, but accuracy decreases as the error rate rises above 15-25%. NAST structures are more similar to each other than to the crystal structures, suggesting that NAST favors certain topologies.
3. RNA Folding vs. Protein Folding (Background)
RNA 3D Structure Prediction Tools
• Manual vs. automatic
• Full atomic vs. coarse grained
• Physics based vs. knowledge based
4. Introduction to NAST
Nucleic Acid Simulation Toolkit (NAST)
• Funded by the Simbios National Center for Biomedical Computing
• A knowledge-based, coarse-grained tool for modeling RNA structures. It produces a diverse set of plausible 3D structures that satisfy user-provided constraints based on:
  1. Primary sequence
  2. Known or predicted secondary structure
  3. Known or predicted tertiary contacts (optional)
Requirements:
• Python 2.6.x
• PyOpenMM 2.0.0 (version 3.0.0 does not work!)
https://simtk.org/home/nast
Jonikas MA, Radmer RJ, Laederach A, Das R, Pearlman S, Herschlag D, Altman RB. Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters. RNA. 2009 Feb;15(2):189-99.
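The two version requirements can be turned into a quick pre-flight check. This is a hypothetical helper, not part of NAST; because no particular way of reading PyOpenMM's installed version is assumed here, the caller passes that version string in. It uses %-formatting so the same code runs under Python 2.6 and Python 3.

```python
import sys

# Versions stated in the NAST requirements (PyOpenMM 3.0.0 is reported not to work)
REQUIRED_PYTHON = (2, 6)
REQUIRED_PYOPENMM = (2, 0, 0)


def check_requirements(python_version, pyopenmm_version):
    """Return a list of problems; an empty list means both versions match.

    python_version: a tuple such as sys.version_info
    pyopenmm_version: a dotted string such as "2.0.0", supplied by the caller
    """
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor != REQUIRED_PYTHON:
        problems.append("Python %d.%d found; NAST expects Python 2.6.x" % major_minor)
    found = tuple(int(part) for part in pyopenmm_version.split("."))
    if found != REQUIRED_PYOPENMM:
        problems.append("PyOpenMM %s found; NAST expects PyOpenMM 2.0.0" % pyopenmm_version)
    return problems


if __name__ == "__main__":
    # "2.0.0" is a placeholder; substitute the PyOpenMM version actually installed
    for problem in check_requirements(sys.version_info, "2.0.0"):
        print(problem)
```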
5. Advantages
• Provides information about the likely topology of a molecule
• Provides a good starting point for higher-resolution atomic models
• Handles large molecules (> 76 nt)
• Much faster than full-atomic simulation tools
  • 1,000,000 steps within 138 s
• Tolerates uncertainty in the secondary structure (up to a certain level)
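As a rough calculation, the quoted 1,000,000 steps in 138 s corresponds to about 7,246 steps per second. Assuming runtime scales linearly with the number of steps on the same molecule and hardware (an assumption, since the slide does not give the molecule size or machine), a 600,000-step run would take roughly 83 s.

```python
def steps_per_second(steps, seconds):
    """Average simulation throughput."""
    return steps / seconds


def estimated_runtime(steps, rate):
    """Runtime in seconds, assuming time grows linearly with the step count."""
    return steps / rate


rate = steps_per_second(1_000_000, 138)
print("%.0f steps/s" % rate)                                             # 7246 steps/s
print("600,000 steps: about %.0f s" % estimated_runtime(600_000, rate))  # about 83 s
```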
6. How to Use NAST
• Primary sequence file
  • Go to http://www.rnasoft.ca/strand/
  • Search for your structure and download its BPSEQ file
  • Use the "parseBPseq.py" script included in the package to generate a sequence file
• Secondary structure file
  • Use a secondary structure prediction tool, e.g., McGenus
    (http://eole2.lsce.ipsl.fr/ipht/tt2ne/mcgenus.php)
• Tertiary contacts file (optional)
  • From experiments or phylogenetic analysis
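The BPSEQ step can be illustrated with a minimal parser. This is not the packaged parseBPseq.py (the slide does not describe its output format); it is a hypothetical sketch assuming the common three-column BPSEQ layout (index, base, partner index, with 0 for unpaired) and 1-based contiguous numbering, skipping any non-data header lines. It also renders the pairs in dot-bracket notation, which in this simple form cannot distinguish pseudoknotted pairs.

```python
def parse_bpseq(text):
    """Return (sequence, pairs) from BPSEQ text.

    Each data line is "index base partner", with partner 0 for an unpaired
    base. Lines that do not match this pattern (e.g. headers) are skipped.
    Indices are assumed to be 1-based and contiguous.
    """
    bases = []
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        i, base, j = int(fields[0]), fields[1], int(fields[2])
        bases.append(base)
        if j > i:  # record each pair once, from its 5' partner
            pairs.append((i, j))
    return "".join(bases), pairs


def to_dot_bracket(length, pairs):
    """Render pairs as dot-bracket; crossing (pseudoknot) pairs are not distinguished."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i - 1] = "("
        chars[j - 1] = ")"
    return "".join(chars)


# Hypothetical hairpin, not taken from the STRAND database
example = """Filename: hypothetical_example.bpseq
1 G 8
2 G 7
3 C 0
4 A 0
5 A 0
6 A 0
7 C 2
8 C 1
"""
seq, pairs = parse_bpseq(example)
print(seq, to_dot_bracket(len(seq), pairs))  # GGCAAACC ((....))
```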
7. Test Molecule Used
PDB ID 1ZIH (389 atoms, 12 residues)
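Atom and residue counts like these can be read from a PDB-format file. The slide does not say whether its counts include hydrogens or heteroatoms, so this hypothetical sketch simply counts every ATOM/HETATM record except water in the first model, using the fixed PDB column positions; its numbers may therefore differ from the slide's depending on those choices.

```python
def count_atoms_and_residues(pdb_text):
    """Count atoms and residues in the first model of a PDB-format file.

    Water (HOH) is skipped; residues are identified by chain ID, residue
    number and insertion code (fixed PDB columns 22, 23-26 and 27).
    """
    atoms = 0
    residues = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # NMR entries contain several models; count only the first
        if record not in ("ATOM", "HETATM"):
            continue
        if line[17:20].strip() == "HOH":
            continue
        atoms += 1
        residues.add((line[21:22], line[22:26].strip(), line[26:27]))
    return atoms, len(residues)
```

For a locally downloaded entry, pass the file contents, e.g. `count_atoms_and_residues(open(path).read())`.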
9. Definition of q Value
q is a normalized measure of similarity between a reference structure and a comparison structure.
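The slide's formula for q did not survive extraction. As an assumption (NAST's exact definition may differ), the sketch below uses a Wolynes-style structural overlap, a common normalized similarity measure of this kind: q = 2/((N-1)(N-2)) Σ_{j ≥ i+2} exp(-(r_ij - r_ij^ref)² / (2σ_ij²)), with σ_ij = |i - j|^0.15 and r_ij the distance between residues i and j. Because it compares internal distances, it needs no superposition, and identical structures give q = 1.

```python
import numpy as np


def q_value(model, reference, exponent=0.15):
    """Wolynes-style overlap between two (N, 3) coordinate arrays (assumed definition).

    Uses one coordinate per residue, in the same order in both structures,
    and all residue pairs with j >= i + 2.
    """
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(reference)
    i, j = np.triu_indices(n, k=2)  # index pairs with j >= i + 2
    d_model = np.linalg.norm(model[i] - model[j], axis=1)
    d_ref = np.linalg.norm(reference[i] - reference[j], axis=1)
    sigma = (j - i).astype(float) ** exponent
    # The mean over pairs supplies the 2 / ((N - 1)(N - 2)) normalization
    return float(np.mean(np.exp(-((d_model - d_ref) ** 2) / (2 * sigma ** 2))))
```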
10. 1ZIH from an Unfolded Circle State (1,000,000 steps)
• RMSD: mean 2.683, SD 0.449
• q value (ref.: crystal structure): mean 0.250, variance 0.00686
Reference values:
• q value: mean 0.246, variance 0.00686
• RMSD: mean 2.704, SD 0.454
[Figure: RMSD (2 to 4) across simulations]
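The slides do not state how RMSD was computed. A common approach, sketched here as an assumption, is the RMSD after optimal rigid-body superposition (the Kabsch algorithm), with one coordinate per residue listed in the same order in both structures.

```python
import numpy as np


def kabsch_rmsd(model, reference):
    """RMSD between two (N, 3) arrays after optimal superposition (Kabsch)."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Superposing first removes differences that are only due to the arbitrary placement and orientation of each model.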
11. Definition of GDT_TS Score
The Global Distance Test Total Score (GDT_TS) of Cα atoms is used to assess the correctness of a predicted model. GDT_TS is commonly used in modeling studies and in the CASP community. It is defined as:
$$\mathrm{GDT\_TS} = \frac{\mathrm{GDT}_1 + \mathrm{GDT}_2 + \mathrm{GDT}_4 + \mathrm{GDT}_8}{4N}$$
where N is the total number of residues in the target, and GDT_d is the number of aligned residues whose Cα-atom distance between the native structure and the predicted model is less than d Å after superposition of the two structures, for d = 1, 2, 4, and 8 Å.
• Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31:3370-3374.
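The formula translates directly into code. LGA itself searches many superpositions to maximize each GDT_d; this simplified sketch assumes a single, already-computed superposition and takes the per-residue distances as input, so it can underestimate the LGA value.

```python
def gdt_ts(distances, n_residues, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS from per-residue distances (in Å) after one superposition.

    distances: distance of each aligned residue between native and model
    n_residues: N, the total number of residues in the target
    Returns a fraction in [0, 1]; multiply by 100 for the percentage form.
    """
    counts = [sum(1 for x in distances if x < d) for d in cutoffs]  # GDT_d values
    return sum(counts) / (len(cutoffs) * n_residues)


print(gdt_ts([0.5, 1.5, 3.0, 5.0, 9.0], n_residues=5))  # 0.5
```

Here GDT_1, GDT_2, GDT_4 and GDT_8 are 1, 2, 3 and 4, so GDT_TS = 10 / (4 × 5) = 0.5.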
18. Effect of Secondary Structure: 1ZIH from Crystal Structure (1,000,000 steps)
q value (ref.: crystal structure)
• Without secondary structure constraints: mean 0.130, variance 0.00458
• With secondary structure constraints: mean 0.246, variance 0.00686
19. Effect of Secondary Structure: 4JF2 from Crystal Structure (1,000,000 steps)
RMSD
• Without secondary structure constraints: mean 22.860, SD 4.798
• With secondary structure constraints: mean 11.378, SD 1.176
[Figure: RMSD per simulation, without vs. with secondary structure constraints]
20. Effect of Secondary Structure: 4JF2 from Crystal Structure (1,000,000 steps)
q value (ref.: crystal structure)
• Without secondary structure constraints: mean 0.125, variance 0.000964
• With secondary structure constraints: mean 0.0761, variance 0.00136
21. Effect of Secondary Structure
• Simulations with different percentages of wrong pairs in the secondary structure (600,000 steps)

Wrong pairs   Mean     Std.
0%            6.0969   2.5971
15%           5.4951   1.8054
25%           4.4746   1.2748
35%           2.6558   2.0450
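The slide does not say how the wrong pairs were introduced. One hypothetical procedure: remove a given fraction of the native pairs and add the same number of random non-native pairs between bases that are still free, requiring a hairpin loop of at least 3 nt (both the procedure and that loop length are assumptions). The sketch does not prevent crossing (pseudoknotted) pairs and may return fewer replacements if too few free positions remain.

```python
import random


def perturb_pairs(pairs, n, fraction, seed=None, min_loop=3):
    """Replace round(fraction * len(pairs)) native pairs with random wrong pairs.

    pairs: list of (i, j) base pairs, 1-based; n: sequence length.
    Wrong pairs join two currently unpaired bases with j - i - 1 >= min_loop.
    """
    rng = random.Random(seed)
    native = sorted(tuple(sorted(p)) for p in pairs)
    k = int(round(fraction * len(native)))
    drop = set(rng.sample(range(len(native)), k))
    kept = [p for idx, p in enumerate(native) if idx not in drop]
    used = set(b for p in kept for b in p)
    native_set = set(native)
    candidates = [(i, j) for i in range(1, n + 1)
                  for j in range(i + min_loop + 1, n + 1)
                  if (i, j) not in native_set]
    rng.shuffle(candidates)
    wrong = []
    for i, j in candidates:
        if len(wrong) == k:
            break
        if i not in used and j not in used:  # each base pairs at most once
            wrong.append((i, j))
            used.update((i, j))
    return sorted(kept + wrong)
```

Calling it with fraction 0.15, 0.25 or 0.35 yields constraint sets analogous to the rows of the table.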
22. System Consistency: 1ZIH from an Unfolded Circle State (1,000,000 steps)
• Reference model: structure resulting from the simulation started from the crystal structure (1,000,000 steps)
  q value: mean 0.3969, variance 0.02268
• Reference model: crystal structure
  q value: mean 0.246, variance 0.00686
23. System Consistency: 4JF2 from an Unfolded Circle State
• Reference model: structure resulting from the simulation started from the crystal structure (1,000,000 steps)
  q value: mean 0.223, variance 0.00276
• Reference model: crystal structure
  q value: mean 0.128, variance 0.000788
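Each "mean / variance" pair on these slides summarizes one similarity score per simulated model against a single reference. The hypothetical helper below reproduces that bookkeeping for several references at once. It assumes the population variance, since the slides do not say which variance was used, and takes the similarity function (for example a q-value function) as a parameter.

```python
import statistics


def consistency_report(models, references, similarity):
    """Score every model against each named reference.

    Returns {reference_name: (mean, population_variance)}.
    """
    report = {}
    for name, ref in references.items():
        scores = [similarity(model, ref) for model in models]
        report[name] = (statistics.mean(scores), statistics.pvariance(scores))
    return report
```

A higher mean against the NAST-derived reference than against the crystal structure is the pattern these two slides report.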
24. Conclusion
• Folding results from NAST provide a basic idea of the structure for a given sequence.
• A small proportion of mistakes in the secondary structure does not substantially affect the folding result, but this holds only up to a certain level.
• The simulations tend to generate folds that are more similar to other NAST models (with the same number of steps) than to the crystal structure.
• More tests with GDT_TS may be needed.