GTfold : A Scalable Multicore Code for RNA
Secondary Structure Prediction
Neha Jatav
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay, Powai, Mumbai, India 400076
Email: nehajatav@cse.iitb.ac.in
Project Guide:
Dr. David Bader
College of Computing
Georgia Institute of Technology, Atlanta, GA, USA
Email: bader@cc.gatech.edu
Abstract—Accurate prediction of RNA secondary structure
from the RNA base sequence is an unsolved computational
challenge. The accuracy of predictions made by free energy
minimization is limited by the quality of the energy parameters
in the underlying free energy model. The energy model that
GTfold and the de facto standard programs have been using
is Turner99, the set of nearest neighbor parameters for RNA
folding compiled by the Turner group in 1999. However, the Turner group has since compiled a newer set of thermodynamic parameters, Turner 2004. In addition, real sequences could not be used directly with GTfold and other RNA folding programs, because real sequences contain unspecified bases.
In this project, an option that lets the user toggle between the two energy models has been added to GTfold, and GTfold can now fold real RNA sequences containing the unidentified base N.
I. INTRODUCTION
GTfold is a fast, scalable multicore code for predicting RNA secondary structure (Mathuriya, Bader, Heitsch, & Harvey, 2009). RNA molecules perform a variety of biological functions, including the roles of small RNAs (tens to a few hundred nucleotides long) in gene splicing, editing, and regulation. At the other end of the size spectrum, the genomes of numerous viruses are long single-stranded RNA sequences with many thousands of nucleotides. These single-stranded RNA sequences base pair to form molecular structures, and the secondary structure of viruses such as dengue [3], Ebola [16], and HIV [17] is known to have functional significance. Disrupting functionally significant base pairings in RNA viral genomes is therefore one potential method for treating or preventing many RNA-related diseases.
According to the thermodynamic hypothesis, the structure with the minimum free energy (MFE) is predicted as the secondary structure of the molecule. The free energy of a secondary structure is modeled as the sum of independent free energy contributions from distinct substructures called loops. The optimization is performed with the dynamic programming algorithm of Zuker and Stiegler (1981) [21], which is similar to the algorithm for sequence alignment but considerably more complex. The algorithm implicitly considers all possible nested structures when computing the MFE structure, and existing folding programs apply heuristics and approximations to keep the computational requirements manageable. The free energies of the individual loops are evaluated using a thermodynamic model of free energy.
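As an illustration of the dynamic-programming idea only (not the Zuker-Stiegler algorithm itself, and not GTfold's code), the sketch below uses the simpler Nussinov-style recursion. Every allowed base pair contributes a hypothetical constant energy `PAIR_ENERGY`, and a pair must enclose at least `MIN_HAIRPIN` unpaired bases. `w(i, j)` is the best energy of the subsequence from `i` to `j`, obtained either by leaving base `j` unpaired or by pairing `j` with some `k` and combining the two independent sub-problems on either side of that pair.

```python
from functools import lru_cache

# Canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3      # minimum number of unpaired bases enclosed by a pair
PAIR_ENERGY = -1.0   # hypothetical toy energy per base pair, not a Turner parameter


def toy_mfe(seq: str) -> float:
    """Minimum toy energy over all nested secondary structures of seq."""
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def w(i: int, j: int) -> float:
        # Best energy of the subsequence seq[i..j] (inclusive).
        if j - i <= MIN_HAIRPIN:
            return 0.0  # too short to close any pair
        best = w(i, j - 1)  # case 1: base j stays unpaired
        for k in range(i, j - MIN_HAIRPIN):  # case 2: base j pairs with base k
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = w(i, k - 1) if k > i else 0.0
                best = min(best, left + w(k + 1, j - 1) + PAIR_ENERGY)
        return best

    return w(0, len(seq) - 1) if seq else 0.0
```

For example, `toy_mfe("GGGAAAUCC")` finds the three nested pairs G-C, G-C and G-U. The Zuker-Stiegler algorithm replaces the constant pair score with loop-based energy terms and needs additional tables for loops closed by a pair and for multiloops, which is where most of its extra complexity comes from.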
II. RELATED WORK
The Vienna RNA package, developed by the Theoretical Biochemistry Group, has an option to run its programs with a different thermodynamic model, the Andronescu model (Andronescu, Condon, Hoos, Mathews, & Murphy, 2007). That work introduced constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. CG uses a novel iterative scheme: the energy values are first computed as the solution to a constrained optimization problem, and the newly computed energy parameters are then used to update the constraints on the optimization, so that the energy parameters are better optimized in the next iteration. Applied to biologically sound data, this method yields revised parameters for the Turner99 energy model that significantly improve prediction accuracy over the state-of-the-art methods of the time.
In the mfold web server developed by Michael Zuker (Zuker, 2003), all characters except A-Z and a-z are removed from a sequence entered into the sequence text box. Lowercase characters are converted to uppercase, and for RNA folding, T or t is converted to U. In addition, the letters W, X, Y and Z refer to A, C, G and U/T, respectively; these nucleotides, if they pair, should do so only at the end of a helix. Because W and Y already have different meanings in the IUPAC (International Union of Pure and Applied Chemistry) ambiguity code (W denotes A or T, and Y denotes C or T), the mfold web server does not support the IUPAC ambiguous DNA character convention (Cornish-Bowden, 1985).
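The mfold input cleanup described above can be expressed as a small normalization function. This is a sketch of the rules as summarized here, not mfold's source; the function name `normalize_mfold_style` is hypothetical, and the helix-end restriction on W, X, Y and Z is not modeled, so those letters are simply mapped to A, C, G and U.

```python
def normalize_mfold_style(raw: str) -> str:
    """Apply the mfold-style input cleanup summarized above (RNA folding)."""
    # Keep only the ASCII letters A-Z / a-z; digits, spaces and punctuation are dropped.
    seq = "".join(c for c in raw if c.isascii() and c.isalpha()).upper()
    # For RNA folding, T is treated as U.
    seq = seq.replace("T", "U")
    # W, X, Y and Z stand for A, C, G and U respectively.
    return seq.translate(str.maketrans("WXYZ", "ACGU"))
```

Note that an IUPAC-coded input such as "W" (A or T) would silently become "A" here, which is exactly the incompatibility with the IUPAC convention noted above.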
III. RESEARCH CONTRIBUTIONS
A. Toggling of Thermodynamic values
RNA molecules are made up of the nucleotides A, C, G, and U, which can pair according to the rules (A,U), (U,A), (G,C), (C,G), (G,U), and (U,G). Nested base pairings result in 2D structures called secondary structures. 3D interactions among the elements of a secondary structure result in 3D structures called tertiary structures. Pairings among bases form various kinds of loops, which can be classified by the number of branches they contain.
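The pairing rules and the nesting requirement can be checked mechanically. The sketch below assumes structures are written in dot-bracket notation (a common convention that this report does not itself use), where "(" and ")" mark the two ends of a pair and "." marks an unpaired base. A single bracket type can only describe nested pairings, so any balanced string is a candidate secondary structure. The function names are hypothetical.

```python
from typing import List, Tuple

# The six allowed pairs listed above.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) pairs; '(' and ')' must balance, '.' is unpaired."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # closes the most recent open pair
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_valid_structure(seq: str, structure: str) -> bool:
    """True if structure is balanced and every pair is one of the allowed pairs."""
    if len(seq) != len(structure):
        return False
    try:
        pairs = pairs_from_dot_bracket(structure)
    except ValueError:
        return False
    seq = seq.upper()
    return all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pairs)
```

For instance, "GGGAAAUCC" with "(((...)))" is accepted (pairs G-C, G-C, G-U), while "GAAAA" with "(...)" is rejected because G-A is not an allowed pair.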
The nearest neighbor thermodynamic model (NNTM) provides a set of functions and sequence-dependent parameters for calculating the energy of the various kinds of loops. The free energy of a secondary structure is calculated by adding up the energies of all loops and stacked pairs in the structure. The Turner group has compiled two such parameter sets, in 1999 and 2004, known as the Turner 99 and Turner 2004 models, respectively. In GTfold, the energy parameters can now be toggled between these two models, and the free energy is calculated according to the selected model.
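To illustrate what toggling means in code, the sketch below keeps one parameter table per model and looks the table up by name when a structure is evaluated. The model names, the helper functions and especially the numbers are hypothetical: the stacking energies are toy values chosen only so that the two models differ, not the published Turner parameters, and a real NNTM evaluation also needs hairpin, bulge, interior-loop, multiloop and dangling-end terms.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
StackTable = Dict[Tuple[Pair, Pair], float]

# TOY stacking energies in kcal/mol, chosen only to illustrate switching models.
# They are NOT the published Turner 99 or Turner 2004 parameters.
TOY_MODELS: Dict[str, StackTable] = {
    "turner99": {(("G", "C"), ("G", "C")): -3.0, (("G", "C"), ("C", "G")): -2.0},
    "turner04": {(("G", "C"), ("G", "C")): -2.5, (("G", "C"), ("C", "G")): -1.5},
}


def select_model(name: str) -> StackTable:
    """Look up a parameter set by (case-insensitive) name."""
    try:
        return TOY_MODELS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown model {name!r}; choose one of {sorted(TOY_MODELS)}") from None


def helix_energy(seq: str, pairs: List[Tuple[int, int]], model: str) -> float:
    """Sum stacking energies over consecutive pairs (i, j), (i+1, j-1) of one helix."""
    table = select_model(model)
    total = 0.0
    for (i, j), (p, q) in zip(pairs, pairs[1:]):
        if (p, q) != (i + 1, j - 1):
            raise ValueError("pairs must be consecutive stacked pairs of one helix")
        # Key: outer pair (5' base, 3' base) followed by the inner pair.
        total += table[((seq[i], seq[j]), (seq[p], seq[q]))]
    return total
```

The same helix, for example pairs (0, 7) and (1, 6) of "GGAAAACC", scores differently depending only on the model name passed in, which is the behavior the toggle provides; the toy tables cover just two stacks, so other sequences would raise a `KeyError`.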
B. Unidentified base N
The letter N is used for an unspecified base, and it is not allowed to pair. N is very common in real RNA sequences. Such sequences can be processed by placing constraints on these bases: each N is prohibited from pairing, so the RNA sequence is folded such that none of the unidentified bases is paired.
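One way to express this, assuming 1-based positions and the "P i 0 k" constraint format from the constraint folding section, is to turn each run of consecutive Ns into a single prohibition line. The function name `n_constraints` is hypothetical, and GTfold's internal handling of N may differ.

```python
from typing import List, Optional


def n_constraints(seq: str) -> List[str]:
    """Return 'P i 0 k' lines that keep every N unpaired (1-based positions)."""
    lines: List[str] = []
    run_start: Optional[int] = None  # start of the current run of Ns, if any
    for pos, base in enumerate(seq.upper(), start=1):
        if base == "N":
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            # The run covers positions run_start .. pos-1.
            lines.append(f"P {run_start} 0 {pos - run_start}")
            run_start = None
    if run_start is not None:  # the sequence ends inside a run of Ns
        lines.append(f"P {run_start} 0 {len(seq) - run_start + 1}")
    return lines
```

For "ACNNGUN" this yields "P 3 0 2" and "P 7 0 1". Grouping runs keeps the constraint list short for sequences with long stretches of N.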
C. Constraints folding
GTfold allows the optional incorporation of folding constraints. Each constraint consists of a single line in the constraint file that must conform to a rigid format. The types of constraints are itemized below. Multiple constraints of any form are allowed, in any order.
• Force a specific base pair or helix to form. The command
F i j k
will force the formation of the helix (a single base pair if k = 1). The triple (i, j, k) refers to k consecutive base pairs r[i]r[j], r[i+1]r[j-1], ..., r[i+k-1]r[j-k+1], where r[i]r[j] is the exterior closing base pair. If any of these base pairs cannot exist, an error is generated and the job fails; the usual result is an output page that declares "Job aborted! No Structure!".
• Prohibit a specific base pair or helix from forming. The command
P i j k
will prohibit every base pair of the form r[i+h]r[j-h], with h varying from 0 to k-1, from occurring.
• Prohibit a string of consecutive bases from pairing. The command
P i 0 k
(the second-to-last field is zero) will prevent nucleotides r[i], r[i+1], r[i+2], ..., r[i+k-1] from pairing. This is a single base when k = 1. Forcing too many bases to be single stranded can generate a fatal error.
TABLE I
MFES IN KCAL/MOL CALCULATED BY UNAFOLD, GTFOLD AND RNAFOLD
Sequence    Length (nt)  UNAfold  GTfold  RNAfold
16S/X54252  698          -138     -143    -143
16S/X54253  702          -141     -149    -149
16S/X98467  1296         -460     -487    -489
16S/X65063  1433         -572     -584    -584
16S/Z17210  1436         -744     -762    -763
16S/X52949  1453         -795     -804    -805
16S/K00421  1475         -682     -687    -687
16S/Z17224  1551         -553     -568    -569
16S/X00794  1963         -723     -742    -747
TABLE II
MFE SCORES IN KCAL/MOL FOR THE SAME STRUCTURES ON GTFOLD AND UNAFOLD
Sequence    Length (nt)  UNAfold  GTfold
16S/K00421  1475         -680.5   -682.4
16S/X00794  1963         -723.1   -726.4
16S/X52949  1453         -794.6   -794.4
16S/X54252  698          -137.5   -138.7
16S/X54253  702          -141.3   -142.7
16S/X65063  1433         -571.6   -575.5
16S/X98467  1296         -460     -461.2
16S/Z17210  1436         -744     -748.3
16S/Z17224  1551         -552.6   -556.1
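A parser for the constraint commands of Section III-C might collect forced pairs, prohibited pairs and prohibited single-stranded bases into sets, as in the following sketch. It assumes whitespace-separated fields and 1-based positions, ignores blank lines (an assumption; the text only says the format is rigid), and does not check the pairs against the sequence. The names `Constraints` and `parse_constraints` are hypothetical, not GTfold's API.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class Constraints:
    forced_pairs: Set[Tuple[int, int]] = field(default_factory=set)      # from F i j k
    prohibited_pairs: Set[Tuple[int, int]] = field(default_factory=set)  # from P i j k
    prohibited_bases: Set[int] = field(default_factory=set)              # from P i 0 k


def parse_constraints(lines: Iterable[str]) -> Constraints:
    """Parse 'F i j k', 'P i j k' and 'P i 0 k' lines (1-based positions)."""
    result = Constraints()
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 4 or fields[0] not in ("F", "P"):
            raise ValueError(f"line {lineno}: expected 'F i j k' or 'P i j k', got {raw!r}")
        cmd = fields[0]
        i, j, k = (int(x) for x in fields[1:])
        if i < 1 or j < 0 or k < 1:
            raise ValueError(f"line {lineno}: need i >= 1, j >= 0, k >= 1")
        if j == 0:
            if cmd == "F":
                raise ValueError(f"line {lineno}: F needs a partner position j > 0")
            # P i 0 k: bases i .. i+k-1 must stay single stranded.
            result.prohibited_bases.update(range(i, i + k))
            continue
        # k consecutive pairs r[i+h] r[j-h] for h = 0 .. k-1; (i, j) is the closing pair.
        helix = {(i + h, j - h) for h in range(k)}
        if any(a >= b for a, b in helix):
            raise ValueError(f"line {lineno}: helix ({i}, {j}, {k}) does not fit between i and j")
        target = result.forced_pairs if cmd == "F" else result.prohibited_pairs
        target.update(helix)
    return result
```

Checking whether each forced pair can actually exist in the given sequence (the condition that makes the job abort) would be a separate validation step after parsing.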
IV. EXPERIMENTAL RESULTS
A. Accuracy
Table I gives the minimum free energies calculated by GTfold and by two other de facto standard programs for predicting RNA secondary structures, UNAfold and RNAfold. GTfold's MFE is lower than UNAfold's for all nine sequences; it equals RNAfold's for four sequences and is 1 to 5 kcal/mol higher than RNAfold's for the other five.
Table II shows the energies that GTfold and UNAfold assign to the same structures. UNAfold uses a thermodynamic model that differs from GTfold's; the differences lie in the calculation of the multiloop energies and the external loop energies.
Table III shows the MFEs that GTfold computes with the two thermodynamic parameter sets, Turner 99 and Turner 04. Every sequence receives a less negative MFE under Turner 04 than under Turner 99.
TABLE III
MFE IN KCAL/MOL CALCULATED USING THE TURNER 99 AND TURNER 04 MODELS ON GTFOLD
Sequence    Length (nt)  Turner04  Turner99
16S/K00421  1475         -636.57   -687
16S/X00794  1963         -691.37   -747
16S/X52949  1453         -768.26   -805
16S/X54252  698          -121.46   -143
16S/X54253  702          -125.55   -149
16S/X65063  1433         -536.93   -584
16S/X98467  1296         -449.16   -489
16S/Z17210  1436         -724.21   -763
16S/Z17224  1551         -521.58   -569
TABLE IV
RUNNING TIMES IN SECONDS FOR UNAFOLD AND GTFOLD
Sequence    Length (nt)  UNAfold  GTfold
16S/X54252  698          12       1
16S/X54253  702          10       1
16S/X98467  1296         23       4
16S/X65063  1433         25       4
16S/Z17210  1436         28       5
16S/X52949  1453         29       5
16S/K00421  1475         23       5
16S/Z17224  1551         34       6
16S/X00794  1963         72       9
TABLE V
RUNNING TIMES IN SECONDS FOR GTFOLD RUNNING WITH AND WITHOUT ILSA
Sequence    Length (nt)  GTfold  GTfold with ILSA
16S/X54252  698          1       20
16S/X54253  702          1       21
16S/X98467  1296         4       127
16S/X65063  1433         4       164
16S/Z17210  1436         5       166
16S/X52949  1453         5       177
16S/K00421  1475         5       181
16S/Z17224  1551         6       213
16S/X00794  1963         9       440
B. Performance Timing
Table IV compares the running times of GTfold and UNAfold: GTfold is faster on every sequence, including the longest one (9 s versus 72 s for the 1963-nt 16S/X00794).
Table V compares the running times of GTfold with and without the Internal Loop Speed-up Algorithm (ILSA).
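The per-sequence speed-up implied by Table IV can be recomputed from its columns by dividing the UNAfold time by the GTfold time. The rows below are copied from the table; the helper name `speedups` is hypothetical.

```python
# (sequence, length in nt, UNAfold seconds, GTfold seconds), copied from Table IV.
TABLE_IV = [
    ("16S/X54252", 698, 12, 1),
    ("16S/X54253", 702, 10, 1),
    ("16S/X98467", 1296, 23, 4),
    ("16S/X65063", 1433, 25, 4),
    ("16S/Z17210", 1436, 28, 5),
    ("16S/X52949", 1453, 29, 5),
    ("16S/K00421", 1475, 23, 5),
    ("16S/Z17224", 1551, 34, 6),
    ("16S/X00794", 1963, 72, 9),
]


def speedups(rows):
    """Map each sequence name to UNAfold time / GTfold time."""
    return {name: unafold / gtfold for name, _length, unafold, gtfold in rows}


if __name__ == "__main__":
    for name, ratio in speedups(TABLE_IV).items():
        print(f"{name}: {ratio:.1f}x faster")
```

Across the table the ratio ranges from 4.6 (16S/K00421) to 12 (16S/X54252). Because the GTfold times are reported in whole seconds, the ratios for the two shortest sequences (12 and 10) are coarse.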
V. CONCLUSION
GTfold can be used with either of the two thermodynamic free energy models, Turner 99 or Turner 04, and users can choose which one to work with. GTfold also accepts the unidentified base 'N' and hence can be used directly with real sequences, without pre-processing or errors.
REFERENCES
Andronescu, M., Condon, A., Hoos, H. H., Mathews, D. H., & Murphy, K. P. (2007). Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics.
Cornish-Bowden, A. (1985). Nomenclature for incompletely specified bases in nucleic acid sequences: Recommendations 1984. Nucleic Acids Research.
Mathuriya, A., Bader, D., Heitsch, C., & Harvey, S. (2009). GTfold: A scalable multicore code for RNA secondary structure prediction. 24th Annual ACM Symposium on Applied Computing (SAC), Computational Sciences Track, Honolulu, HI.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research.

More Related Content

What's hot

minimize solar array switching
minimize solar array switching minimize solar array switching
minimize solar array switching
Srinivas Vasamsetti
 
A transmission line based technique for de-embedding noise parameters
A transmission line based technique for de-embedding noise parametersA transmission line based technique for de-embedding noise parameters
A transmission line based technique for de-embedding noise parameters
villa1451
 
Exp 7 (1)7. Load sharing between two interconnected power systems including t...
Exp 7 (1)7.	Load sharing between two interconnected power systems including t...Exp 7 (1)7.	Load sharing between two interconnected power systems including t...
Exp 7 (1)7. Load sharing between two interconnected power systems including t...
Shweta Yadav
 
EE6501 - Power System Analysis
EE6501 - Power System AnalysisEE6501 - Power System Analysis
Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...
Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...
Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...
IOSR Journals
 
A review on_load_flow_studies_final_2
A review on_load_flow_studies_final_2A review on_load_flow_studies_final_2
A review on_load_flow_studies_final_2
yareda6
 
Fundamentals of power system
Fundamentals of power systemFundamentals of power system
Fundamentals of power system
Balaram Das
 
Modularized control strategy and performance analysis of dfig system under un...
Modularized control strategy and performance analysis of dfig system under un...Modularized control strategy and performance analysis of dfig system under un...
Modularized control strategy and performance analysis of dfig system under un...
I3E Technologies
 
FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE USING BPP...
FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE  USING BPP...FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE  USING BPP...
FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE USING BPP...
Politeknik Negeri Ujung Pandang
 
Load flow studies
Load flow studiesLoad flow studies
Load flow studies
Darshil Shah
 
POWER (LOAD) FLOW STUDY
POWER (LOAD)  FLOW STUDYPOWER (LOAD)  FLOW STUDY
POWER (LOAD) FLOW STUDY
Power System Operation
 
Pst presentation ms f19_008
Pst presentation ms f19_008Pst presentation ms f19_008
Pst presentation ms f19_008
hamza zaheer
 
T04405112116
T04405112116T04405112116
T04405112116
IJERA Editor
 
Load flow study
Load flow studyLoad flow study
Load flow study
f s
 
2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...
2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...
2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...Dieter Stapelberg
 
Load flow studies 19
Load flow studies 19Load flow studies 19
Load flow studies 19
Asha Anu Kurian
 
Exp 6 . Load sharing between two interconnected power systems
Exp 6 .	Load sharing between two interconnected power systemsExp 6 .	Load sharing between two interconnected power systems
Exp 6 . Load sharing between two interconnected power systems
Shweta Yadav
 

What's hot (20)

minimize solar array switching
minimize solar array switching minimize solar array switching
minimize solar array switching
 
A transmission line based technique for de-embedding noise parameters
A transmission line based technique for de-embedding noise parametersA transmission line based technique for de-embedding noise parameters
A transmission line based technique for de-embedding noise parameters
 
Loadflowsynopsis
LoadflowsynopsisLoadflowsynopsis
Loadflowsynopsis
 
Exp 7 (1)7. Load sharing between two interconnected power systems including t...
Exp 7 (1)7.	Load sharing between two interconnected power systems including t...Exp 7 (1)7.	Load sharing between two interconnected power systems including t...
Exp 7 (1)7. Load sharing between two interconnected power systems including t...
 
EE6501 - Power System Analysis
EE6501 - Power System AnalysisEE6501 - Power System Analysis
EE6501 - Power System Analysis
 
Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...
Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...
Torque Profiles of Asymmetrically Wound Six-Phase Induction Motor (AWSP-IM) u...
 
A review on_load_flow_studies_final_2
A review on_load_flow_studies_final_2A review on_load_flow_studies_final_2
A review on_load_flow_studies_final_2
 
Fundamentals of power system
Fundamentals of power systemFundamentals of power system
Fundamentals of power system
 
load flow 1
 load flow 1 load flow 1
load flow 1
 
Modularized control strategy and performance analysis of dfig system under un...
Modularized control strategy and performance analysis of dfig system under un...Modularized control strategy and performance analysis of dfig system under un...
Modularized control strategy and performance analysis of dfig system under un...
 
FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE USING BPP...
FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE  USING BPP...FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE  USING BPP...
FAULT DETECTION AND CLASSIFICATION ON TRANSMISSION OVERHEAD LINE USING BPP...
 
Load flow studies
Load flow studiesLoad flow studies
Load flow studies
 
POWER (LOAD) FLOW STUDY
POWER (LOAD)  FLOW STUDYPOWER (LOAD)  FLOW STUDY
POWER (LOAD) FLOW STUDY
 
Pst presentation ms f19_008
Pst presentation ms f19_008Pst presentation ms f19_008
Pst presentation ms f19_008
 
T04405112116
T04405112116T04405112116
T04405112116
 
Load flow study
Load flow studyLoad flow study
Load flow study
 
2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...
2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...
2010 - Stapelberg, Krzesinski - Network Re-engineering using Successive Survi...
 
Load flow studies 19
Load flow studies 19Load flow studies 19
Load flow studies 19
 
Exp 6 . Load sharing between two interconnected power systems
Exp 6 .	Load sharing between two interconnected power systemsExp 6 .	Load sharing between two interconnected power systems
Exp 6 . Load sharing between two interconnected power systems
 
Science (1)
Science (1)Science (1)
Science (1)
 

Similar to Tech-Report

Fault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networksFault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networks
IJECEIAES
 
Optimum Network Reconfiguration using Grey Wolf Optimizer
Optimum Network Reconfiguration using Grey Wolf OptimizerOptimum Network Reconfiguration using Grey Wolf Optimizer
Optimum Network Reconfiguration using Grey Wolf Optimizer
TELKOMNIKA JOURNAL
 
Fractal representation of the power demand based on topological properties of...
Fractal representation of the power demand based on topological properties of...Fractal representation of the power demand based on topological properties of...
Fractal representation of the power demand based on topological properties of...
IJECEIAES
 
PaperLoad following in a deregulated power system with Thyristor Controlled S...
PaperLoad following in a deregulated power system with Thyristor Controlled S...PaperLoad following in a deregulated power system with Thyristor Controlled S...
PaperLoad following in a deregulated power system with Thyristor Controlled S...
rajeshja
 
L367174
L367174L367174
L367174
IJERA Editor
 
Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2IAEME Publication
 
Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2IAEME Publication
 
LOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEM
LOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEMLOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEM
LOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEM
ELELIJ
 
COMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTOR
COMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTORCOMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTOR
COMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTOR
ijscai
 
Finite Element Method for Designing and Analysis of the Transformer – A Retro...
Finite Element Method for Designing and Analysis of the Transformer – A Retro...Finite Element Method for Designing and Analysis of the Transformer – A Retro...
Finite Element Method for Designing and Analysis of the Transformer – A Retro...
idescitation
 
Power evaluation of adiabatic logic circuits in 45 nm technology
Power evaluation of adiabatic logic circuits in 45 nm technologyPower evaluation of adiabatic logic circuits in 45 nm technology
Power evaluation of adiabatic logic circuits in 45 nm technology
IAEME Publication
 
Newly fault-tolerant indirect vector control for traction inverter
Newly fault-tolerant indirect vector control for traction inverterNewly fault-tolerant indirect vector control for traction inverter
Newly fault-tolerant indirect vector control for traction inverter
International Journal of Power Electronics and Drive Systems
 
Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...
Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...
Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...
TELKOMNIKA JOURNAL
 
FINAL VERSION PPT.pptx
FINAL VERSION PPT.pptxFINAL VERSION PPT.pptx
FINAL VERSION PPT.pptx
DeanAcademicsRamacha
 
SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...
SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...
SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...
University of South Carolina
 
Convergence analysis of the triangular-based power flow method for AC distribu...
Convergence analysis of the triangular-based power flow method for AC distribu...Convergence analysis of the triangular-based power flow method for AC distribu...
Convergence analysis of the triangular-based power flow method for AC distribu...
IJECEIAES
 
Enhancement of the direct power control applied to DFIG-WECS
Enhancement of the direct power control applied to  DFIG-WECS  Enhancement of the direct power control applied to  DFIG-WECS
Enhancement of the direct power control applied to DFIG-WECS
IJECEIAES
 
Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...
Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...
Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...
TELKOMNIKA JOURNAL
 
Adv Func Mater 2014_24_1364-1371
Adv Func Mater 2014_24_1364-1371Adv Func Mater 2014_24_1364-1371
Adv Func Mater 2014_24_1364-1371Subrata Ghosh
 
Fault Ride-Through capability of DSTATCOM for Distributed Wind Generation System
Fault Ride-Through capability of DSTATCOM for Distributed Wind Generation SystemFault Ride-Through capability of DSTATCOM for Distributed Wind Generation System
Fault Ride-Through capability of DSTATCOM for Distributed Wind Generation System
IJPEDS-IAES
 

Similar to Tech-Report (20)

Fault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networksFault detection in power transformers using random neural networks
Fault detection in power transformers using random neural networks
 
Optimum Network Reconfiguration using Grey Wolf Optimizer
Optimum Network Reconfiguration using Grey Wolf OptimizerOptimum Network Reconfiguration using Grey Wolf Optimizer
Optimum Network Reconfiguration using Grey Wolf Optimizer
 
Fractal representation of the power demand based on topological properties of...
Fractal representation of the power demand based on topological properties of...Fractal representation of the power demand based on topological properties of...
Fractal representation of the power demand based on topological properties of...
 
PaperLoad following in a deregulated power system with Thyristor Controlled S...
PaperLoad following in a deregulated power system with Thyristor Controlled S...PaperLoad following in a deregulated power system with Thyristor Controlled S...
PaperLoad following in a deregulated power system with Thyristor Controlled S...
 
L367174
L367174L367174
L367174
 
Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2
 
Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2Performance evaluation of reversible logic based cntfet demultiplexer 2
Performance evaluation of reversible logic based cntfet demultiplexer 2
 
LOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEM
LOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEMLOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEM
LOAD SHEDDING DESIGN FOR AN INDUSTRIAL COGENERATION SYSTEM
 
COMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTOR
COMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTORCOMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTOR
COMPARING OF SWITCHING FREQUENCY ON VECTOR CONTROLLED ASYNCHRONOUS MOTOR
 
Finite Element Method for Designing and Analysis of the Transformer – A Retro...
Finite Element Method for Designing and Analysis of the Transformer – A Retro...Finite Element Method for Designing and Analysis of the Transformer – A Retro...
Finite Element Method for Designing and Analysis of the Transformer – A Retro...
 
Power evaluation of adiabatic logic circuits in 45 nm technology
Power evaluation of adiabatic logic circuits in 45 nm technologyPower evaluation of adiabatic logic circuits in 45 nm technology
Power evaluation of adiabatic logic circuits in 45 nm technology
 
Newly fault-tolerant indirect vector control for traction inverter
Newly fault-tolerant indirect vector control for traction inverterNewly fault-tolerant indirect vector control for traction inverter
Newly fault-tolerant indirect vector control for traction inverter
 
Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...
Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...
Modeling Under MATLAB by ANFIS of Three-Phase Tetrahedral Transformer Using i...
 
FINAL VERSION PPT.pptx
FINAL VERSION PPT.pptxFINAL VERSION PPT.pptx
FINAL VERSION PPT.pptx
 
SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...
SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...
SSR Damping Controller Design and Optimal Placement in Rotor-Side and Grid-Si...
 
Convergence analysis of the triangular-based power flow method for AC distribu...
Convergence analysis of the triangular-based power flow method for AC distribu...Convergence analysis of the triangular-based power flow method for AC distribu...
Convergence analysis of the triangular-based power flow method for AC distribu...
 
Enhancement of the direct power control applied to DFIG-WECS
Enhancement of the direct power control applied to  DFIG-WECS  Enhancement of the direct power control applied to  DFIG-WECS
Enhancement of the direct power control applied to DFIG-WECS
 
Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...
Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...
Bank of Extended Kalman Filters for Faults Diagnosis in Wind Turbine Doubly F...
 
Adv Func Mater 2014_24_1364-1371
Adv Func Mater 2014_24_1364-1371Adv Func Mater 2014_24_1364-1371
Adv Func Mater 2014_24_1364-1371
 
Fault Ride-Through capability of DSTATCOM for Distributed Wind Generation System
Fault Ride-Through capability of DSTATCOM for Distributed Wind Generation SystemFault Ride-Through capability of DSTATCOM for Distributed Wind Generation System
Fault Ride-Through capability of DSTATCOM for Distributed Wind Generation System
 

Tech-Report

  • 1. GTfold : A Scalable Multicore Code for RNA Secondary Structure Prediction Neha Jatav Department of Computer Science and Engineering Indian Institute of Technology, Bombay, Powai, Mumbai, India 400076 Email: nehajatav@cse.iitb.ac.in Project Guide: Dr. David Bader College of Computing Georgia Institute of Technology, Atlanta, GA,USA Email: bader@cc.gatech.edu Abstract—Accurate prediction of RNA secondary structure from the RNA base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The energy model that GTfold and the de facto standard programs have been using is Turner99, the set of nearest neighbor parameters for RNA folding compiled by the Turner group in 1999. However, there is a new set of thermodynamic values Turner 2004 compiled by the Turner group in 2004. Also, using real sequences directly with GTfold and other RNA folding programs posed a problem as real sequences contain unspecified bases. In this project, a user enhanced option of toggling the different energy models has been added to GTfold. GTfold can now fold real RNA sequences containing unidentified base N. I. INTRODUCTION GTfold is a fast, scalable multi-core code for predicting RNA secondary (Article (Mathuriya, Bader, Heitsch, & Har- vey, 2009)). RNA molecules perform a variety of different biological functions including the role of small RNAs (with tens or a few hundred of nucleotides) in gene splicing, editing, and regulation. At the other end of the size spectrum, the genomes of numerous viruses are lengthy single-stranded RNA sequences with many thousands of nucleotides. These single-stranded RNA sequences base pair to form molecular structures, and the secondary structure of viruses like dengue [3], ebola [16], and HIV [17] is known to have functional significance. 
Thus, disrupting functionally significant base pairings in RNA viral genomes is one potential method for treating or preventing the many RNA-related diseases. According to the thermodynamic hypothesis, the structure having the minimum free energy (MFE) is predicted as the secondary structure of the molecule. The free energy of a secondary structure is the independent sum of the free energies of distinct substructures called loops. The optimization is performed using the dynamic programming algorithm given by Zuker and Stiegler in 1981 [21] which is similar to the algorithm for sequence alignment but far more complex. The algorithm explores all the possibilities when computing the MFE structure. There are heuristics and approximations which have been applied to satisfy the computational requirements in the existing folding programs. The free energies of differ- ent loops are evaluated using thermodynamic model of free energy. II. RELATED WORK The Vienna RNA, developed by the Theoretical Biochem- istry Group and has an option of implementing their program on a different thermodynamic model, called the Andronescu model (Andronescu, Condon, Hoos, Mathews, & Murphy, 2007) which gives a constraint generation (CG), the first com- putational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. The CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using this method on biologically sound data, revised parameters can be obtained for the Turner99 energy model which provides significant improvements in prediction accuracy over current state of-the-art methods. 
In the mfold web server developed by Michael Zuker (Zuker, 2003), all characters other than A to Z and a to z are removed from a sequence entered in the sequence text box. Lowercase characters are converted to uppercase, and for RNA folding, T and t are converted to U. In addition, the letters W, X, Y and Z stand for A, C, G and U/T, respectively. These nucleotides, if they pair, may do so only at the end of a helix. This use of letters conflicts with the IUPAC (International Union of Pure and Applied Chemistry) convention for incompletely specified bases (Cornish-Bowden, 1985), in which, for example, Y denotes a pyrimidine (C or T/U). Thus, the mfold web server does not support the IUPAC ambiguity codes.
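The input rules described above for the mfold web server can be sketched as a small preprocessing function. This is an illustration based only on the paragraph's description, not mfold's code. The function name `sanitize` and the choice to return a per-position "helix end only" flag are assumptions.

```python
import re

# Per the description above: W, X, Y, Z stand for A, C, G, U/T and may pair
# only at the end of a helix.
HELIX_END_ONLY = {"W": "A", "X": "C", "Y": "G", "Z": "U"}


def sanitize(raw: str) -> tuple[str, list[bool]]:
    """Clean a pasted sequence the way the mfold web server is described to.

    Returns the cleaned RNA sequence and, for each position, whether the base
    came from W/X/Y/Z (and so may pair only at a helix end).
    """
    letters = re.sub(r"[^A-Za-z]", "", raw)  # keep only A-Z and a-z
    letters = letters.upper().replace("T", "U")  # uppercase, then DNA T -> RNA U
    bases = [HELIX_END_ONLY.get(ch, ch) for ch in letters]
    end_only = [ch in HELIX_END_ONLY for ch in letters]
    return "".join(bases), end_only
```

For example, `sanitize("acgt tx-1Y")` returns `("ACGUUCG", [False, False, False, False, False, True, True])`. The Y becomes G here, whereas under the IUPAC convention it would mean "C or U". Other letters, including N, are passed through unchanged, because the paragraph does not say how mfold treats them.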
III. RESEARCH CONTRIBUTIONS
A. Toggling of Thermodynamic Values
RNA molecules are made up of the nucleotides A, C, G and U, which can pair according to the rules (A,U), (U,A), (G,C), (C,G), (G,U) and (U,G). Nested base pairings result in 2D structures called secondary structures. 3D interactions among the elements of a secondary structure result in 3D structures called tertiary structures. Base pairings form various kinds of loops, which are classified by the number of branches they contain. The nearest neighbor thermodynamic model (NNTM) provides a set of functions and sequence-dependent parameters for calculating the energy of each kind of loop. The free energy of a secondary structure is calculated by adding up the energies of all loops and stacked pairs in the structure. The Turner group has compiled two parameter sets for this model, one in 1999 and one in 2004, known as the Turner 99 and Turner 2004 models respectively. In GTfold, the energy parameters can now be toggled between these two models, and the free energy is calculated according to the selected model.
B. Unidentified Base N
The letter N denotes an unspecified base. Such bases are very common in real RNA sequences. GTfold processes these sequences by placing a constraint on the unspecified bases: N is prohibited from pairing. The sequence is therefore folded such that none of the unidentified bases is paired.
C. Constrained Folding
GTfold allows the optional incorporation of folding constraints. Each constraint consists of a single line in the constraint file, and that line must conform to a rigid format. Multiple constraints of any type are allowed, in any order. The constraint types are:
• Force a specific base pair or helix to form. The command
      F i j k      (1)
  forces the formation of the helix of k consecutive base pairs r[i+h]r[j-h], h = 0, 1, ..., k-1, where r[i]r[j] is the exterior closing base pair (a single base pair if k = 1). If any of these base pairs cannot exist, an error is generated and the job fails. The usual result is an output page that declares "Job aborted! No Structure!".
• Prohibit a specific base pair or helix from forming. The command
      P i j k      (2)
  prohibits every base pair of the form r[i+h]r[j-h], h = 0, 1, ..., k-1, from occurring.
• Prohibit a string of consecutive bases from pairing. The command
      P i 0 k      (3)
  (the third field is zero) prevents nucleotides r[i], r[i+1], ..., r[i+k-1] from pairing; this is a single base when k = 1. Forcing too many bases to be single stranded can generate a fatal error.

IV. EXPERIMENTAL RESULTS
A. Accuracy
Table I gives the minimum free energies calculated by GTfold and by two de facto standard programs, UNAfold and RNAfold. GTfold's MFE is lower than UNAfold's for all nine sequences. RNAfold's MFE equals GTfold's for four sequences and is lower by 1 to 5 kcal/mol for the remaining five. Table II shows the free energies that GTfold and UNAfold assign to the same structures.

TABLE I
MFEs (kcal/mol) calculated by UNAfold, GTfold and RNAfold
Sequence    Length  UNAfold  GTfold  RNAfold
16S/X54252     698     -138    -143     -143
16S/X54253     702     -141    -149     -149
16S/X98467    1296     -460    -487     -489
16S/X65063    1433     -572    -584     -584
16S/Z17210    1436     -744    -762     -763
16S/X52949    1453     -795    -804     -805
16S/K00421    1475     -682    -687     -687
16S/Z17224    1551     -553    -568     -569
16S/X00794    1963     -723    -742     -747

TABLE II
MFE scores (kcal/mol) for the same structures on UNAfold and GTfold
Sequence    Length  UNAfold  GTfold
16S/K00421    1475   -680.5  -682.4
16S/X00794    1963   -723.1  -726.4
16S/X52949    1453   -794.6  -794.4
16S/X54252     698   -137.5  -138.7
16S/X54253     702   -141.3  -142.7
16S/X65063    1433   -571.6  -575.5
16S/X98467    1296     -460  -461.2
16S/Z17210    1436     -744  -748.3
16S/Z17224    1551   -552.6  -556.1
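The comparison in Table I can be tallied directly from the published values. The short script below re-enters the Table I numbers verbatim and adds no new measurements. It counts how often GTfold's MFE is below UNAfold's, equal to RNAfold's, or above RNAfold's. A lower (more negative) MFE corresponds to a more stable predicted structure. The helper name `compare` is illustrative.

```python
# MFEs in kcal/mol copied from Table I: (length, UNAfold, GTfold, RNAfold).
TABLE_I = {
    "16S/X54252": (698, -138, -143, -143),
    "16S/X54253": (702, -141, -149, -149),
    "16S/X98467": (1296, -460, -487, -489),
    "16S/X65063": (1433, -572, -584, -584),
    "16S/Z17210": (1436, -744, -762, -763),
    "16S/X52949": (1453, -795, -804, -805),
    "16S/K00421": (1475, -682, -687, -687),
    "16S/Z17224": (1551, -553, -568, -569),
    "16S/X00794": (1963, -723, -742, -747),
}


def compare(table):
    """Return (GTfold < UNAfold, GTfold == RNAfold, GTfold > RNAfold, max GTfold - RNAfold)."""
    rows = list(table.values())
    below_unafold = sum(1 for _, una, gt, _rna in rows if gt < una)
    equal_rnafold = sum(1 for _, _una, gt, rna in rows if gt == rna)
    above_rnafold = sum(1 for _, _una, gt, rna in rows if gt > rna)
    largest_gap = max(gt - rna for _, _una, gt, rna in rows)
    return below_unafold, equal_rnafold, above_rnafold, largest_gap


print(compare(TABLE_I))  # (9, 4, 5, 5)
```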
UNAfold uses a thermodynamic model that differs from the one used by GTfold. The differences lie in the calculation of multiloop energies and external-loop energies. Table III shows the MFEs computed by GTfold with each of the two thermodynamic parameter sets, Turner 99 and Turner 04.

TABLE III
MFEs (kcal/mol) calculated by GTfold with the Turner 04 and Turner 99 models
Sequence    Length  Turner 04  Turner 99
16S/K00421    1475    -636.57       -687
16S/X00794    1963    -691.37       -747
16S/X52949    1453    -768.26       -805
16S/X54252     698    -121.46       -143
16S/X54253     702    -125.55       -149
16S/X65063    1433    -536.93       -584
16S/X98467    1296    -449.16       -489
16S/Z17210    1436    -724.21       -763
16S/Z17224    1551    -521.58       -569
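Two statements in this paper rely on decomposing a structure into loops. First, the free energy of a secondary structure is the sum of its loop energies. Second, the UNAfold and GTfold results differ in their multiloop and external-loop energies. The sketch below performs that decomposition for a structure in dot-bracket notation. It is a hypothetical helper, not GTfold code. The dot-bracket input format and the loop labels are assumptions. Scoring each loop with Turner 99 or Turner 04 parameters is left out.

```python
from typing import List, Optional, Tuple

Loop = Tuple[str, Optional[Tuple[int, int]]]  # (loop kind, closing pair or None)


def pair_table(db: str) -> List[int]:
    """Map each position to its partner (-1 if unpaired) for a dot-bracket string."""
    partner = [-1] * len(db)
    stack: List[int] = []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at {stack[-1]}")
    return partner


def branches(partner: List[int], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Base pairs directly inside positions lo..hi (not nested deeper)."""
    found, k = [], lo
    while k <= hi:
        if partner[k] > k:
            found.append((k, partner[k]))
            k = partner[k] + 1  # skip everything nested inside this pair
        else:
            k += 1
    return found


def decompose(db: str) -> List[Loop]:
    """Label every loop of the structure by the number of branches it encloses."""
    partner = pair_table(db)
    loops: List[Loop] = [("external", None)]  # the exterior loop always exists
    for i, j in enumerate(partner):
        if j <= i:  # unpaired base or closing bracket
            continue
        inner = branches(partner, i + 1, j - 1)
        if not inner:
            kind = "hairpin"
        elif len(inner) == 1:
            p, q = inner[0]
            if p == i + 1 and q == j - 1:
                kind = "stack"
            elif p == i + 1 or q == j - 1:
                kind = "bulge"
            else:
                kind = "interior"
        else:
            kind = "multiloop"
        loops.append((kind, (i, j)))
    return loops
```

For example, `decompose("((...)(...))")` yields the external loop, a multiloop closed by (0, 11), and hairpins closed by (1, 5) and (6, 10). A full energy evaluation would then add one parameterized energy term per listed loop.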
B. Performance Timing
Table IV compares the running times of GTfold and UNAfold. GTfold is faster for every sequence, including the longest (1963 nucleotides), which it folds in 9 s against 72 s for UNAfold. Table V compares the running time of GTfold with and without the Internal Loop Speed-up Algorithm (ILSA). With ILSA enabled, the running time for the longest sequence grows from 9 s to 440 s.

TABLE IV
Running times (s) for UNAfold and GTfold
Sequence    Length  UNAfold  GTfold
16S/X54252     698       12       1
16S/X54253     702       10       1
16S/X98467    1296       23       4
16S/X65063    1433       25       4
16S/Z17210    1436       28       5
16S/X52949    1453       29       5
16S/K00421    1475       23       5
16S/Z17224    1551       34       6
16S/X00794    1963       72       9

TABLE V
Running times (s) for GTfold with and without ILSA
Sequence    Length  GTfold  GTfold with ILSA
16S/X54252     698       1                20
16S/X54253     702       1                21
16S/X98467    1296       4               127
16S/X65063    1433       4               164
16S/Z17210    1436       5               166
16S/X52949    1453       5               177
16S/K00421    1475       5               181
16S/Z17224    1551       6               213
16S/X00794    1963       9               440

V. CONCLUSION
GTfold can be used with either free energy thermodynamic model, Turner 99 or Turner 04, and users can choose which model to work with. GTfold also accepts the unidentified base 'N', so it can be applied directly to real sequences without any pre-processing and without errors.

REFERENCES
Andronescu, M., Condon, A., Hoos, H. H., Mathews, D. H., & Murphy, K. P. (2007). Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics.
Cornish-Bowden, A. (1985). Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Research.
Mathuriya, A., Bader, D., Heitsch, C., & Harvey, S. (2009). GTfold: A scalable multicore code for RNA secondary structure prediction. 24th Annual ACM Symposium on Applied Computing (SAC), Computational Sciences Track, Honolulu, HI.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research.