This document discusses RNA structure analysis and computational prediction of RNA secondary structure. It covers the following key points:
RNA is single-stranded but can fold into unique 3D structures guided by base pairing. There are different classes of non-coding RNAs that participate in various cellular processes. Computational prediction of RNA secondary structure is important and can be done through either ab initio or comparative approaches. Ab initio methods predict minimum free energy structures using algorithms like Nussinov and Zuker, while comparative methods analyze covariation between sequences to infer conserved structures. Common tools for RNA structure analysis include MFOLD, RNAfold and tRNAscan-SE.
2. RNA
• Ribonucleic acid
• Single-stranded nucleic acid polymer
• Carbohydrate is ribose
• fold into unique structures guided by
complementary pairing between nucleotide
bases.
• To many people:
– “RNA is the passive intermediary messenger
between DNA genes and the protein
translation machinery”
4. Non-coding RNAs
Many non-coding RNAs exist
Adopt sophisticated 3D structures
Catalyse biochemical reactions
Different classes of non-coding RNAs participate in
different cellular process:
Gene expression regulation (miRNAs, piRNAs,
lncRNAs)
RNA maturation (snRNAs, snoRNAs)
Protein synthesis (rRNAs, tRNAs)
5. Three major types of RNA
• Messenger RNA (mRNA)
– Serving as a temporary copy of genes that is used as
a template for protein synthesis.
• Transfer RNA (tRNA)
– Functioning as adaptor molecules that decode the
genetic code.
• Ribosomal RNA (rRNA)
– Catalyzing the synthesis of proteins.
6. RNA secondary structure
• Unlike DNA, RNA is typically produced as
a single stranded molecule
• Then folds intramolecularly to form a
number of short base-paired stems.
• This base-paired structure is called the
secondary structure of the RNA.
8. Pseudoknots
Nucleic acid secondary structure
containing at least two stem
loop structures in which half of
one stem is intercalated
between the two halves of
another stem
10. Elements of an
RNA secondary structure
• Loop: single stranded subsequence bounded by
base pairs
• Hairpin loop: a loop at the end of a stem
• Bulge (loop): single stranded bases occurring
within a stem
• Interior loop: single stranded bases interrupting
both sides of a stem
• Multi-branched loop/junctions: a loop from
which three or more stems radiate
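The loop types listed above can be told apart mechanically once a structure is written down. The sketch below is only an illustration: it assumes the structure is given in dot-bracket notation (a convention not used in these slides, where "(" and ")" mark paired bases and "." marks unpaired ones), and the function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return partner[i] = index paired with position i, or -1 if unpaired."""
    partner = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def classify_loops(structure):
    """Label the loop closed by each base pair (i, j) with i < j."""
    partner = parse_dot_bracket(structure)
    loops = []
    for i, j in enumerate(partner):
        if j <= i:  # unpaired position, or the closing half of a pair
            continue
        # Walk the interior of (i, j), jumping over any enclosed stems.
        branches = []  # base pairs directly enclosed by (i, j)
        k = i + 1
        while k < j:
            if partner[k] > k:
                branches.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not branches:
            kind = "hairpin"
        elif len(branches) == 1:
            p, q = branches[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                kind = "stacked pair"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior"
        else:
            kind = "multi-branched"
        loops.append(((i, j), kind))
    return loops
```

For example, classify_loops("(.(...).)") labels the outer pair as closing an interior loop and the inner pair as closing a hairpin loop.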
12. Need
• Understanding the structures of RNA provides insights
into the functions of this class of molecules.
• Detailed structural information about RNA has significant
impact on understanding the mechanisms of a vast array
of cellular processes such as gene expression, viral
infection, and immunity.
• RNA structures can be experimentally determined using
x-ray crystallography or NMR techniques.
• However, these approaches are extremely time
consuming and expensive.
• As a result, computational prediction has become an
attractive alternative.
13. RNA Sequence Analysis
• The sequence evolution of RNA is
constrained by the structure.
• It is possible to have two different RNA
sequences with the same secondary
structure.
• Drastic changes in sequence can often be
tolerated as long as compensatory
mutations maintain base-pairing
complementarity.
16. Computational Prediction
• At present, there are essentially two types of method of
RNA structure prediction.
• One is based on the calculation of the minimum free
energy of all possible combinations of potential
double-stranded regions derived from a single RNA
sequence.
• This can be considered as ab initio approach.
• The second is a comparative approach which infers
structures based on an evolutionary comparison of
multiple related RNA sequences.
• Involves identification of Base Covariation that
maintains 2° and 3° structure of an RNA molecule during
evolution
• Base covariation: the sequence changes while base-pairing
interactions are preserved
17. 1. Ab Initio
• Many secondary structures can be drawn for a given
sequence
• Again, the number of secondary structures increases
exponentially with sequence length
– An RNA of 200 bases has over 10^50 possible base-paired
structures
• Goal: Distinguish the biologically correct structure from all the
incorrect structures.
• In searching for the lowest energy form, all possible base-pair
patterns have to be examined.
• There are several methods for finding all the possible
base-paired regions from a given nucleic acid sequence.
• Two methods
– Dot matrix method
– Dynamic programming method
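The exponential growth mentioned above can be made concrete with a small counting sketch. It rests on simplifying assumptions that are not from the slides: any two bases are allowed to pair, structures are nested (no pseudoknots), and a hairpin loop must contain at least min_loop unpaired bases. The function name is hypothetical, and the count is an upper bound for a real sequence, where only complementary bases pair.

```python
from functools import lru_cache


def count_structures(n, min_loop=3):
    """Count nested structures on n bases, assuming any two bases may pair."""

    @lru_cache(maxsize=None)
    def count(length):
        if length <= min_loop + 1:
            return 1  # too short to hold a base pair: only the open chain
        # Case 1: the last base is unpaired.
        total = count(length - 1)
        # Case 2: the last base pairs with base k, enclosing >= min_loop bases.
        for k in range(0, length - 1 - min_loop):
            total += count(k) * count(length - k - 2)
        return total

    return count(n)


if __name__ == "__main__":
    for n in (10, 20, 50, 100, 200):
        print(n, count_structures(n))
```

Printing the counts for increasing n shows how quickly exhaustive enumeration becomes impractical, which is why the methods below avoid listing structures one by one.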
18. Dot Matrices
• A simple dot matrix can find all possible base-pairing
patterns of an RNA sequence when the sequence is
compared with itself.
• Here dots are placed in the matrix to represent matching
complementary bases instead of identical ones.
• The diagonals perpendicular to the main diagonal
represent regions that can self hybridize to form
double-stranded structure with traditional A–U and
G–C base pairs.
• In reality, the pattern detection in a dot matrix is often
obscured by high noise levels.
• One way to reduce the noise in the matrix is to select an
appropriate window size of a minimum number of
contiguous base matches.
• Typically, a window size of four consecutive base
matches is used.
• If the dot plot reveals more than one feasible structure,
the lowest energy one is chosen.
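The self-comparison described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular program: it assumes only A–U and G–C pairs count as matches, uses the window of four from the slide as a default, and the function name is hypothetical.

```python
def complement_dot_matrix(seq, window=4):
    """Return the set of (i, j) dots that belong to a run of `window`
    consecutive complementary pairs along an anti-diagonal."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    dots = set()
    for i in range(n):
        for j in range(i + 1, n):
            # A stem runs perpendicular to the main diagonal:
            # i moves forward while j moves backward.
            if all(
                i + k < j - k and (seq[i + k], seq[j - k]) in pairs
                for k in range(window)
            ):
                for k in range(window):
                    dots.add((i + k, j - k))
    return dots
```

For the sequence GGGGAAAACCCC, the only run that survives the window filter is the four G–C pairs between the two ends; shorter accidental matches are dropped as noise.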
20. Dynamic Programming
• The use of a dot plot can be effective in finding a single
secondary structure in a small molecule.
• However, if a large molecule contains multiple secondary
structure segments, choosing a combination that is
energetically most stable among a large number of
possibilities can be difficult.
• In the dynamic programming approach, an RNA sequence is compared with itself.
• A scoring scheme is applied to fill the matrix with match
scores based on Watson–Crick base complementarity.
• A path with the maximal score within a scoring matrix after
taking into account the entire sequence information
represents the most probable secondary structure form.
• The dynamic programming method produces one
structure with a single best score.
• However, this is potentially a drawback of this approach
because in reality an RNA may exist in multiple alternative
forms with near minimum energy but not necessarily the
one with maximum base pairs.
21. Algorithms for RNA secondary
structure prediction
• We need:
– An algorithm for evaluating the scores of all
possible structures
– A function that assigns the correct structure
the highest score
• Two methods:
– Nussinov folding algorithm
– Zuker folding algorithm
22. Nussinov folding algorithm
• Goal:
Find the structure with the most base pairs
• Nussinov introduced an efficient dynamic programming algorithm
for this problem
• A recursive algorithm that calculates
– the best structure for small subsequences and
– computes for a given RNA sequence the maximal number of base pairs
of any nested structure
– Nussinov Algorithm solves the problem of RNA non-crossing
secondary structure prediction by base pair maximization
– Simplistic approach: does not give accurate structure predictions.
– Misses nearest-neighbor interactions, stacking interactions, and
loop length preferences
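The base-pair maximization described above can be sketched as follows. This is a minimal version under stated assumptions rather than a reference implementation: only A–U and G–C pairs are allowed, a hairpin loop must contain at least min_loop unpaired bases (a common convention, not stated on the slide), and the dot-bracket output format is chosen here for convenience.

```python
def nussinov(seq, min_loop=3):
    """Return (max number of nested base pairs, one optimal structure)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0, ""
    # score[i][j] = max pairs in seq[i..j]; entries with j < i stay 0.
    score = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]  # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = score[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + score[k + 1][j - 1])
            score[i][j] = best

    # Traceback: recover one structure that achieves the best score.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = score[i][k - 1] if k > i else 0
                if score[i][j] == left + 1 + score[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return score[0][n - 1], "".join(structure)
```

Because every pair scores the same, the recursion has no way to prefer a stable stacked stem over scattered pairs, which is the weakness noted in the bullets above.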
23. Zuker folding algorithm
• Most sophisticated secondary structure prediction
method for single RNAs
– An energy minimisation algorithm which assumes
that the correct structure is the one with the
lowest equilibrium free energy
• The equilibrium free energy of an RNA secondary
structure is approximated as the sum of individual
contributions from loops, base pairs and other secondary
structure elements.
• The minimum energy structure can be calculated
recursively by a dynamic programming algorithm
very similar to the one the Nussinov algorithm uses
to calculate the maximum base-paired structure
24. Suboptimal RNA folding
• The original Zuker algorithm finds only the
optimal structure.
• The biologically correct structure is often not the
calculated optimal structure.
• Suboptimal structures are structures that a
given sequence could fold into aside from the
minimum free energy structure
• Zuker introduced a suboptimal folding algorithm.
• The algorithm samples one base pair
suboptimally.
• The rest of the structure is the optimal structure
given that base pair.
25. Zuker folding algorithm
• Difference with the Nussinov folding algorithm:
– Energies of stems are calculated by adding stacking
contributions for the interface between
neighbouring base pairs instead of individual
contributions for each pair.
• Advantage:
– Better fit to experimentally observed equilibrium free
energy values for RNA structures, but it complicates
the dynamic programming algorithm
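The difference between counting pairs and summing stacking contributions can be illustrated with a toy calculation. The energy values below are invented purely for illustration and are not experimental nearest-neighbour parameters; the table and function names are hypothetical as well. The sketch scores each interface between neighbouring base pairs (i, j) and (i+1, j-1) by how many of the two pairs are G–C.

```python
# Hypothetical stacking energies in kcal/mol, keyed by the number of G-C pairs
# in the stack. These numbers are made up; real programs use measured tables.
TOY_STACK_ENERGY = {0: -1.0, 1: -2.0, 2: -3.0}


def is_gc(a, b):
    return {a, b} == {"G", "C"}


def stem_energy(seq, pairs):
    """Sum one stacking term per interface between neighbouring base pairs."""
    pair_set = set(pairs)
    energy = 0.0
    for i, j in pairs:
        if (i + 1, j - 1) in pair_set:  # (i, j) stacks on (i+1, j-1)
            gc = is_gc(seq[i], seq[j]) + is_gc(seq[i + 1], seq[j - 1])
            energy += TOY_STACK_ENERGY[gc]
    return energy


def pair_count(pairs):
    """Nussinov-style score: every base pair counts the same."""
    return len(pairs)
```

With the same three pairs [(0, 8), (1, 7), (2, 6)], the sequences GGGAAACCC and GAGAAACUC both get a pair count of 3, but the toy stacking sums differ (-6.0 versus -4.0), and an isolated pair with no neighbour contributes nothing.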
26. RNA Analysis Tools
• MFOLD: prediction of RNA Secondary Structure by
Energy Minimization (Zuker)
• RNAfold: calculate secondary structures of RNAs
• RNAeval: calculate energy of RNA sequences on given
secondary structure
• RNAheat: calculate specific heat of RNAs
• RNAdistance: calculate distances of RNA secondary
structures
• RNApdist: calculate distances of thermodynamic RNA
secondary structure ensembles
• RNAinverse: find RNA sequences with given secondary
structure
27. RNA Analysis Tools
• RNAsubopt: calculate suboptimal secondary
structures of RNAs
• tRNAscan-SE: detection of transfer RNA genes
• FAStRNA: predicts potential tRNA genes in genomic
DNA sequences.
• FAStRNA-CM relies on a probabilistic model.
• FAStRNA-CLASS relies on a pattern-matching
approach.
• palindrome: Looks for inverted repeats in a nucleotide
sequence (EMBOSS).
• RNAGA: Prediction of common secondary structures of
RNAs by genetic algorithm (Chen, Le, Maizel)
28. MFOLD
• Predicts energetically most stable structure of an
RNA molecule.
• Also uses covariance information from
phylogenetically related sequences.
• Includes methods for graphic display of predicted
molecule.
• Used for sequences shorter than 1000 nucleotides.
• Computationally demanding
• Running time grows as O(N^3), where N is the sequence length
• Doubling sequence length increases computation time
up to 8 times
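The factor of 8 follows directly from cubic scaling. As a rough sketch, assume the running time is T(N) = cN^3 for some constant c (the constant is introduced here only for illustration):

```latex
\[
T(N) = cN^{3}
\quad\Longrightarrow\quad
\frac{T(2N)}{T(N)} = \frac{c\,(2N)^{3}}{c\,N^{3}} = 2^{3} = 8
\]
```

By the same reasoning, a sequence ten times longer would take roughly 1000 times as long under this assumption.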
29. 2. COMPARATIVE APPROACH
• The comparative approach uses multiple
evolutionarily related RNA sequences to infer
a consensus structure.
• This approach is based on the assumption that
RNA sequences that are deemed to be homologous
fold into the same secondary structure.
• By comparing related RNA sequences, an
evolutionarily conserved secondary structure
can be derived.
30. Covariation
• To distinguish the conserved secondary structure among
multiple related RNA sequences, a concept of
“covariation” is used.
• It is known that RNA functional motifs are structurally
conserved.
• To maintain the secondary structure while the
homologous sequences evolve, a mutation occurring in
one position that is responsible for base pairing should
be compensated for by a mutation in the corresponding
base-pairing position, so as to maintain base pairing and the
stability of the secondary structure.
• Any lack of covariation can be deleterious to the RNA
structure and functions.
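A simple way to look for covariation in an alignment is sketched below. It is an illustration under stated assumptions, not the scoring used by any specific program: the alignment is a list of equal-length, gap-free strings, only A–U and G–C count as pairs, and two columns are said to covary when every sequence pairs at those positions but at least two different pair types occur. The function name and the returned keys are hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covariation_support(alignment, col_i, col_j):
    """Summarize pairing between two alignment columns."""
    observed = [(s[col_i], s[col_j]) for s in alignment]
    paired = [p for p in observed if p in CANONICAL]
    pair_types = set(paired)
    return {
        "fraction_paired": len(paired) / len(observed),
        "pair_types": len(pair_types),
        # All sequences pair, yet the bases themselves differ between sequences.
        "covaries": len(paired) == len(observed) and len(pair_types) >= 2,
    }
```

In the toy alignment GGAAACC / GCAAAGC / GAAAAUC, columns 1 and 5 hold G–C, C–G and A–U: the bases change but pairing is preserved, which is the compensatory pattern described above. Columns 0 and 6 are G–C in every sequence, so they are conserved but show no covariation.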
31. Consensus drawing
• Another aspect of the comparative method is to select a
common structure through consensus drawing.
• Because predicting secondary structures for each
individual sequence may produce errors, by comparing
all predicted structures of a group of aligned RNA
sequences and drawing a consensus, the commonly
adopted structure can be selected; many other possible
structures can be eliminated in the process.
• The comparative-based algorithms can be further
divided into two categories based on the type of input
data.
• One requires predefined alignment and the other does
not.
32. Algorithms That Use Prealignment
• This type of algorithm requires the user to provide a
pairwise or multiple alignment as input.
• The sequence alignment can be obtained using standard
alignment programs such as T-Coffee, PRRN, or Clustal.
• Based on the alignment input, the prediction programs
compute structurally consistent mutational patterns such
as covariation and derive a consensus structure
common for all the sequences.
• In practice, the consensus structure prediction is often
combined with thermodynamic calculations to improve
accuracy.
• This type of program is relatively successful for
reasonably conserved sequences.
33. RNAalifold
• RNAalifold (http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi)
is a program in the Vienna
package.
• It uses a multiple sequence alignment as
input to analyze covariation patterns on the
sequences.
• A scoring matrix is created that combines
minimum free energy and covariation
information.
• Dynamic programming is used to select the
structure that has the minimum energy for the
whole set of aligned RNA sequences.
34. Algorithms That Do Not Use
Prealignment
• This type of algorithm simultaneously aligns
multiple input sequences and infers a consensus
structure.
• The alignment is produced using dynamic
programming with a scoring scheme that
incorporates sequence similarity as well as
energy terms.
• Because the full dynamic programming for
multiple alignment is computationally too
demanding, currently available programs limit
the input to two sequences.
35. Foldalign
• Foldalign (http://foldalign.kvl.dk/server/index.html) is a
web-based program for RNA alignment and structure
prediction.
• The user provides a pair of unaligned sequences.
• The program uses a combination of Clustal and dynamic
programming with a scoring scheme that includes
covariation information to construct the alignment.
• A commonly conserved structure for both sequences is
subsequently derived based on the alignment.
• To reduce computational complexity, the program
ignores multibranch loops and is only suitable for
handling short RNA sequences.