SlideShare a Scribd company logo
1 of 94
Download to read offline
RNA: Secondary
Structure Prediction and
Analysis
Outline
1. RNA Folding
2. Dynamic Programming for RNA Secondary Structure
Prediction
3. Covariance Model for RNA Structure Prediction
4. Small RNAs: Identification and Analysis
Section 1:
RNA Folding
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs
• A-U
• G-C
• G-U
• Bases can only pair with
one other base.
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs
• A-U
• G-C
• G-U
• Bases can only pair with
one other base.
2 Hydrogen Bonds
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs
• A-U
• G-C
• G-U
• Bases can only pair with
one other base.
3 Hydrogen Bonds—more stable
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs
• A-U
• G-C
• G-U
• Bases can only pair with
one other base.
―Wobble Pairing‖
RNA Basics
http://www.genetics.wustl.edu/eddy/tRNAscan-SE/
• Various types of RNA:
• transfer RNA (tRNA)
• messenger RNA (mRNA)
• ribosomal RNA (rRNA)
• small interfering RNA (siRNA)
• micro RNA (miRNA)
• small nuclear RNA (snRNA)
• small nucleolar RNA (snoRNA)
Section 2:
Dynamic Programming for
RNA Secondary Structure
Prediction
RNA Secondary Structure
Hairpin loop Junction (Multiloop)
Bulge Loop
Single-Stranded
Interior Loop
Stem
Image– Wuchty
Pseudoknot
Sequence Alignment to Determine Structure
• Bases pair in order to form backbones and determine the
secondary structure.
• Aligning bases based on their ability to pair with each other
gives an algorithmic approach to determining the optimal
structure.
Base Pair Maximization: Dynamic Programming
Images – Sean Eddy
• S(i, j) is the folding of the
RNA subsequence of the
strand from index i to index j
which results in the highest
number of base pairs.
• Recurrence:
Base Pair Maximization: Dynamic Programming
Images – Sean Eddy
S i, j max
• S(i, j) is the folding of the
RNA subsequence of the
strand from index i to index j
which results in the highest
number of base pairs.
• Recurrence:
Base Pair Maximization: Dynamic Programming
Base pairat i and j
Images – Sean Eddy
S i, j max
S i 1, j 1 1 (if i, j base pair)
• S(i, j) is the folding of the
RNA subsequence of the
strand from index i to index j
which results in the highest
number of base pairs.
• Recurrence:
Base Pair Maximization: Dynamic Programming
Base pairat i and j Unmatchedat i
Images – Sean Eddy
S i, j max
S i 1, j 1 1 (if i, j base pair)
S i 1, j
• S(i, j) is the folding of the
RNA subsequence of the
strand from index i to index j
which results in the highest
number of base pairs.
• Recurrence:
Base Pair Maximization: Dynamic Programming
Base pairat i and j Unmatchedat jUnmatchedat i
Images – Sean Eddy
S i, j max
S i 1, j 1 1 (if i, j base pair)
S i 1, j
S i, j 1
• S(i, j) is the folding of the
RNA subsequence of the
strand from index i to index j
which results in the highest
number of base pairs.
• Recurrence:
Base Pair Maximization: Dynamic Programming
Base pairat i and j Unmatchedat jUnmatchedat i
Bifurcation
Images – Sean Eddy
S i, j max
S i 1, j 1 1 (if i, j base pair)
S i 1, j
S i, j 1
max
1 k j
S i,k S k 1, j
• S(i, j) is the folding of the
RNA subsequence of the
strand from index i to index j
which results in the highest
number of base pairs.
• Recurrence:
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Images—Sean Eddy
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Initialize first two
diagonals to 0
Images—Sean Eddy
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Fill in squares sweeping
diagonally
Images—Sean Eddy
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Fill in squares sweeping
diagonally
Images—Sean Eddy
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Bases cannot pair
Images—Sean Eddy
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Bases can pair, similar to
matched alignment
Images—Sean Eddy
Images – Sean Edd
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Dynamic Programming—
possible paths
Images—Sean Eddy
Images – Sean Edd
S(i, j – 1)
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Dynamic Programming—
possible paths
Images—Sean Eddy
Dynamic Programming—
possible paths
Images – Sean Edd
S(i + 1, j)
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Images—Sean Eddy
Images—Sean Eddy
S(i + 1, j – 1) +1
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Dynamic Programming—
possible paths
Base Pair Maximization: Dynamic Programming
Bifurcation—add values
for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Base Pair Maximization: Dynamic Programming
Bifurcation—add values
for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Base Pair Maximization: Dynamic Programming
Bifurcation—add values
for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Base Pair Maximization: Dynamic Programming
Bifurcation—add values
for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible
base pairs
• Each score independent of
overall structure
• Bifurcation adds extra
dimension
Base Pair Maximization: Drawbacks
• Base pair maximization will not necessarily lead to the most
stable structure.
• It may create structure with many interior loops or hairpins
which are energetically unfavorable.
• In comparison to aligning sequences with scattered matches—
not biologically reasonable.
Energy Minimization
• Thermodynamic Stability
• Estimated using experimental
techniques.
• Theory : Most Stable = Most
likely
• No pseudoknots due to algorithm
limitations.
• Attempts to maximize the score,
taking thermodynamics into
account.
• MFOLD and ViennaRNA
Energy Minimization Results
• Linear RNA strand folded back on itself to create secondary
structure
• Circularized representation uses this requirement
• Arcs represent base pairing
Images – David Mount
Energy Minimization Results
• All loops must have exactly three bases in them.
• Equivalent to having at least three base pairs between arc
endpoints.
Images – David Mount
Energy Minimization Results
• All loops must have exactly three bases in them.
• Equivalent to having at least three base pairs between arc
endpoints.
Images – David Mount
Energy Minimization Results
• All loops must have exactly three bases in them.
• Exception: Location where beginning and end of RNA
come together in circularized representation.
Images – David Mount
Energy Minimization Results
• All loops must have exactly three bases in them.
• Exception: Location where beginning and end of RNA
come together in circularized representation.
Images – David Mount
Trouble with Pseudoknots
• Pseudoknots cause a breakdown in the dynamic programming
algorithm.
• In order to form a pseudoknot, checks must be made to ensure
base is not already paired—this breaks down the recurrence
relations.
Images – David Mount
Trouble with Pseudoknots
• Pseudoknots cause a breakdown in the dynamic programming
algorithm.
• In order to form a pseudoknot, checks must be made to ensure
base is not already paired—this breaks down the recurrence
relations.
Images – David Mount
Trouble with Pseudoknots
• Pseudoknots cause a breakdown in the dynamic programming
algorithm.
• In order to form a pseudoknot, checks must be made to ensure
base is not already paired—this breaks down the recurrence
relations.
Images – David Mount
Energy Minimization: Drawbacks
• Computes only one optimal structure.
• Optimal solution may not represent the biologically correct
solution.
Section 3:
Covariance Model for
RNA Structure Prediction
Alternative Algorithms - Covariaton
• Incorporates Similarity-based method
• Evolution maintains sequences that are important
• Change in sequence coincides to maintain structure through
base pairs (Covariance)
• Cross-species structure conservation example – tRNA
• Manual and automated approaches have been used to identify
covarying base pairs
• Models for structure based on results
• Ordered Tree Model
• Stochastic Context Free Grammar
Alternative Algorithms - Covariaton
• Expect areas of base pairing in
tRNA to be covarying between
various species.
Alternative Algorithms - Covariaton
• Expect areas of base pairing in
tRNA to be covarying between
various species.
• Base pairing creates same stable
tRNA structure in organisms.
Alternative Algorithms - Covariaton
• Expect areas of base pairing in
tRNA to be covarying between
various species.
• Base pairing creates same stable
tRNA structure in organisms.
• Mutation in one base yields pairing
impossible and breaks down
structure.
Alternative Algorithms - Covariaton
• Expect areas of base pairing in
tRNA to be covarying between
various species.
• Base pairing creates same stable
tRNA structure in organisms.
• Mutation in one base yields pairing
impossible and breaks down
structure.
• Covariation ensures ability to base
pair is maintained and RNA
structure is conserved.
Binary Tree Representation of RNA Secondary
Structure
• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and ―gap‖ (dash) are
shown
• Pseudoknots still not represented
• Tree does not permit varying
sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
Binary Tree Representation of RNA Secondary
Structure
• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and ―gap‖ (dash) are
shown
• Pseudoknots still not represented
• Tree does not permit varying
sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
Binary Tree Representation of RNA Secondary
Structure
• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and ―gap‖ (dash) are
shown
• Pseudoknots still not represented
• Tree does not permit varying
sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
Binary Tree Representation of RNA Secondary
Structure
• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and ―gap‖ (dash) are
shown
• Pseudoknots still not represented
• Tree does not permit varying
sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
Covariance Model
• Covariance Model: HMM which permits flexible alignment
to an RNA structure – emission and transition probabilities
• Model trees based on finite number of states
• Match states – sequence conforms to the model:
• MATP: State in which bases are paired in the model and
sequence.
• MATL & MATR: State in which either right or left
bulges in the sequence and the model.
• Deletion – State in which there is deletion in the sequence
when compared to the model.
• Insertion – State in which there is an insertion relative to
model.
Covariance Model
• Covariance Model: HMM which permits flexible alignment
to an RNA structure – emission and transition probabilities
• Transitions have probabilities.
• Varying probability: Enter insertion, remain in current state,
etc.
• Bifurcation: No probability, describes path.
Covariance Model (CM) Training Algorithm
S i, j max
S i 1, j 1 M i, j
S i 1, j
S i, j 1
max
i k j
S i, k S k 1, j
• S(i, j) = Score at indices i and j in RNA when aligned to the
Covariance Model.
• Frequencies obtained by aligning model to ―training data‖—
consists of sample sequences.
• Reflect values which optimize alignment of sequences to
model.
Covariance Model (CM) Training Algorithm
S i, j max
S i 1, j 1 M i, j
S i 1, j
S i, j 1
max
i k j
S i, k S k 1, j
• S(i, j) = Score at indices i and j in RNA when aligned to the
Covariance Model.
• Frequencies obtained by aligning model to ―training data‖—
consists of sample sequences.
• Reflect values which optimize alignment of sequences to
model.
Frequency of seeing the symbols
(A, C, G, T) together in locations i and j
depending on symbol.
Covariance Model (CM) Training Algorithm
S i, j max
S i 1, j 1 M i, j
S i 1, j
S i, j 1
max
i k j
S i, k S k 1, j
• S(i, j) = Score at indices i and j in RNA when aligned to the
Covariance Model.
• Frequencies obtained by aligning model to ―training data‖—
consists of sample sequences.
• Reflect values which optimize alignment of sequences to
model.
Independent frequency of seeing the
symbols (A, C, G, T) in locations i or j
depending on symbol.
Covariance Model (CM) Training Algorithm
S i, j max
S i 1, j 1 M i, j
S i 1, j
S i, j 1
max
i k j
S i, k S k 1, j
• S(i, j) = Score at indices i and j in RNA when aligned to the
Covariance Model.
• Frequencies obtained by aligning model to ―training data‖—
consists of sample sequences.
• Reflect values which optimize alignment of sequences to
model.
Independent frequency of seeing the
symbols (A, C, G, T) in locations i or j
depending on symbol.
Alignment to CM Algorithm
• Calculate the probability score of
aligning RNA to CM.
• Three dimensional matrix—O(n³)
• Align sequence to given
subtrees in CM.
• For each subsequence,
calculate all possible states.
• Subtrees evolve from bifurcations
• For simplicity, left singlet is
default.
Images—Eddy et al.
Alignment to CM Algorithm
• For each calculation, take into
account:
• Transition (T) to next state.
• Emission probability (P) in
the state as determined by
training data.
Images—Eddy et al.
Alignment to CM Algorithm
• For each calculation, take into
account:
• Transition (T) to next state.
• Emission probability (P) in
the state as determined by
training data.
Images—Eddy et al.
Alignment to CM Algorithm
• For each calculation, take into
account:
• Transition (T) to next state.
• Emission probability (P) in
the state as determined by
training data.
Images—Eddy et al.
Alignment to CM Algorithm
• For each calculation, take into
account:
• Transition (T) to next state.
• Emission probability (P) in
the state as determined by
training data.
• Deletion—does not have emission
probability associated with it.
Images—Eddy et al.
Alignment to CM Algorithm
• For each calculation, take into
account:
• Transition (T) to next state.
• Emission probability (P) in
the state as determined by
training data.
• Deletion—does not have emission
probability associated with it.
• Bifurcation—does not have state
probability associated with it.
Images—Eddy et al.
Covariance Model Drawbacks
• Needs to be well trained.
• Not suitable for searches of large RNA.
• Structural complexity of large RNA cannot be modeled
• Runtime
• Memory requirements
Section 4:
Small RNAs: Identification
and Analysis
Discovery of small RNAs
• The first small RNA:
• In 1993 Rosalind Lee was studying a non-
coding gene in C. elegans, lin-4, that was
involved in silencing of another gene,
lin-14, at the appropriate time in the
development of the worm C. elegans.
• Two small transcripts of lin-4 (22nt and 61nt) were found to
be complementary to a sequence in the 3' UTR of lin-14.
• Because lin-4 encoded no protein, she deduced that it must
be these transcripts that are causing the silencing by RNA-
RNA interactions.
• The second small RNA wasn't discovered until 2000!
Rosalind Lee
What are small ncRNAs?
• Two flavors of small non-coding RNA:
1. micro RNA (miRNA)
2. short interfering RNA (siRNA)
• Properties of small non-coding RNA:
• Involved in silencing other mRNA transcripts.
• Called ―small‖ because they are usually only about 21-24
nucleotides long.
• Synthesized by first cutting up longer precursor sequences
(like the 61nt one that Lee discovered).
• Silence an mRNA by base pairing with some sequence on
the mRNA.
miRNA Pathway Illustration
siRNA Pathway Illustration
Complementary base
pairing facilitates the
mRNA cleavage
Features of miRNAs
• Hundreds miRNA genes are already identified in human
genome.
• Most miRNAs start with a U
• The second 7-mer on the 5' end is known as the ―seed.‖
• When an miRNAs bind to their targets, the seed sequence
has perfect or near-perfect alignment to some part of the
target sequence.
• Example: UGAGCUUAGCAG...
Features of miRNAs
• Many miRNAs are conserved across species:
• For half of known human miRNAs, >18% of all occurrences
of one of these miRNA seeds are conserved among human,
dog, rat, and mouse.
• As a rule, the full sequence of miRNAs is almost never
completely complementary to the target sequence.
• Common to see a loop or bulge after the seed when binding.
• Loop/bulge is often a hairpin because of stability.
• The site at which miRNAs attack is often in their target's 3'
UTR.
Hairpin is more stable
than a simple bulge
Bulges
The MRE is known as
the ―miRNA recognition
element.‖ This is simply
the sequence in the
target that an miRNA
binds to
miRNA Binding
Locating miRNA Genes: Experimentally
• Locating miRNA experimentally is difficult.
• Procedure:
1. Find a gene that causes down-regulation of another gene.
2. Determine if no protein is encoded.
3. Analyze the sequence to determine if it is complementary
to its target.
Locating miRNA Genes: Comparative Genomics
• Idea: Find the seed binding sites.
1. Examine well-conserved 3' UTRs among species to find
well-conserved 8-mers (A + seed) that might be an miRNA
target sequence.
2. Look for a sequence complementary to this 8-mer to
identify a potential miRNA seed. Once found, check
flanking sequence to see if any stable hairpin structures can
form—these are potentially pre-miRNAs.
3. To determine the possibility of secondary RNA structure,
use RNAfold.
Locating miRNA Genes: Example
• Suppose you found a well-conserved 8-mer in 3'
UTRs (this could be where an miRNA seed
binds in its target).
• Example: AGACTAGG
• Look elsewhere in genome for complementary
sequence (this could be an miRNA seed).
• Example: TCTGATCC
• When TCTGATCC is found, check to see (with
RNAfold) if the sequences around it could form
hairpin; if so, this could be an miRNA gene.
Finding miRNA Targets: Method 1
• Now we know of some miRNAs, but where do they attack?
• Goal: Find the targets of a set of miRNAs that are shared
between human and mouse.
• Looking for the miRNA recognition element (MRE), not
whole mRNA. This is just the part that the miRNA would
bind to.
• Basic Assumption: Whole miRNA:MRE interactions
(binding) are likely to have highly energetically favorable base
pairing.
• Basic Method: Look through the conserved 3' UTRs—this is
where the MREs are most likely to be located—and try to make
an alignment that minimizes the binding energy between the
miRNA sequence and the UTRs (most favorable).
Finding miRNA Targets: Method 1
• Method:
• First look at the binding energies of all 38-mers of the
mRNA when binding to the miRNA. Subsequently apply
several filters to pick alignments that ―look‖ like miRNA
binding.
• Why 38-mers? ~22 nt for the miRNA and the rest to allow
for bulges, loops, etc.
• Algorithm: Use a modified dynamic programming sequence
alignment algorithm to calculate the binding energies for each
38-mer.
• Modifications: Scoring and speedup
Finding miRNA Targets: Method 1
• Scoring:
• Mismatches and indels allowed.
• Matrix based on RNA-RNA binding energies.
• Use known binding energies of Watson-Crick pairing
and wobble (G-U) pairing.
• Binding energy (score) calculated for every two adjacent
pairings (unlike the standard alignment algorithm which just
takes into account the ―score‖ for one pair at a time).
• Adds dimensions to scoring matrix.
• Adds complexity to recurrence relation.
Finding miRNA targets: Method 2
• Goal: Find the set of miRNA targets for miRNAs shared
across multiple species
• Trying to identify which genes have 3' UTRs are attacked
by miRNAs
• Basic Assumptions:
1. There is perfect binding to the miRNA seed.
2. Any leftover sequence wants to achieve optimal RNA
secondary structure.
• Basic Method: For each species’ set of 3' UTRs, find sites
where there is perfect binding of the miRNA seed and
―optimal folding‖ nearby. Look for agreement among all the
species.
Method 2: Example
Method 2: Steps
1. Find a perfect match to the
miRNA seed.
2. Extend the matching region if
possible.
3. Find the optimal folding for the
remaining sequences.
4. Calculate the energy of this
interaction.
Method 2: Details
• Input: A set of miRNAs conserved among species and a set of
3' UTR sequences for those species.
• Method: For each organism:
1. Find all occurrences in the UTR sequences that match the
miRNA seed exactly.
2. Extend this region with perfect or wobble pairings.
3. With the remaining sequence of the miRNA, use the
program RNAfold to find optimal folding with the next 35
bases of the UTR sequence.
4. Calculate a score for this interaction based on the free
energy of the interaction given by RNAfold.
Method 2: Details
• Method Cont.:
5. Sum up the scores of all interactions for each UTR.
6. Rank all the organism's gene's UTRs by this score (sum of
all interactions in that UTR).
7. Repeat the above steps for each organism.
8. Create a cutoff score and a cutoff rank for the UTRs.
9. Select the set of genes where the orthologous genes across
all the sampled species have UTR's that score and rank
above this cutoff.
Method 2: Details
• Verification:
• Find the number of predicted binding sites per miRNA.
• Compare it to number of binding sites for a randomly
generated miRNA.
• The result is much higher.
Analysis of the Two Methods
• Method 1:
• Good at identifying very strong, highly complementary
miRNA targets.
• Found gene targets with one miRNA binding site, failed to
identify genes with multiple weaker binding sites.
• Method 2:
• Good at identifying gene targets that have many weaker
interactions.
• Fails to identify single-site genes.
Analysis of the Two Methods
• Both Methods:
• Speed is an issue.
• Won't find targets that aren't in the 3' UTR of a gene.
• We need more species sequenced!
• Conserved sequences are used to discover small RNAs.
• Conserved small RNAs are used to discover targets.
• Confidence in prediction of small RNAs and targets.
• Allows for broader scope with different combinations of
species.
Results
• Predicted a large portion of already known targets and
provided direction for identifying undiscovered targets.
• Found that it is more common that genes are regulated by
multiple small RNAs.
• Found that many small RNAs have multiple targets.
A Novel siRNA Mechanism
• Recently, a new mechanism of siRNA activity was discovered.
• Two genes (called A and B here) that have a cis-antisense
orientation (they are overlapping on opposite strands) have
transcripts that produce an siRNA due to the dsRNA formed
by their mRNA transcripts.
• Gene A is constitutive, gene B is induced by salt stress
• Normally, just B's transcript is present.
• When both A and B are present, we get annealing to get
dsRNA and this forms an siRNA.
• Since the siRNA is complementary to gene A's transcript,
the siRNA attacks gene A, silencing it.
• These genes might provide direction to finding new siRNAs.
Pathway: Illustration
Annealing of
transcripts nicely
sets up the
dsRNA to be used
later in making
the siRNA
Both transcripts
present if salt is
present
“A”
“B”
siRNA silences A
References
• How Do RNA Folding Algorithms Work?. S.R. Eddy.
Nature Biotechnology, 22:1457-1458, 2004.
• Borsani, O., Zhu, J, Verslues, P.E., Sunkar, R., Zhu, J.-
K. (2005). Endogenous siRNAs Derived From a Pair of
Natural cis-Antisense Transcripts Regulate Solt
Tolerance in Arabidopsis. Cell 123, Jury, W.A. and Vaux
Jr., H. (2005). The role of science in solving the world's
emerging water problems. Proc. Natl Acad. Sci. USA
102: 15715-15720.
References
• Kiriakidou, M., Nelson, P.T., Kouranov, A., Fitziev, P.,
Bouyioukos, C., Hatzigeorgiou, A., and Hatzigeorgiou,
M. (2004). A combined computational-experimental
approach predicts human microRNA targets. Genes &
Dev. 18: 1165-1178.
• Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The
C. elegans Heterochronic Gene lin-4 Encodes Small
RNAs with Antisense Complementarity to lin-14. Cell 75:
843-854.
References
• Lee, Y. Kim, M, Han, J. Yeom, K-H, Lee, S., Baek, S.H.,
and Kim, V.N. (2004). MicroRNA genes are transcribed
by RNA polymerase II. The EMBO Journal 23: 4051-
4060.
• Lewis, B.P., Shih, I., Jones-Rhoades, M.W., Bartel, D.P.,
and Burge, C.B. (2003). Prediction of Mammalian
MicroRNA Targets. Cell 115: 787-798.
References
• Xie, X, Lu, J, Kulbokas, E.J., Golub, T.R., Mootha, V.,
Lindblad-Toh, K., Lander, E.S., and Kellis, M. (2005).
Systematic discovery of regulatory motifs in human
promoters and 30 UTRs by comparison of several
mammals. Nature 443: 338-345.

More Related Content

What's hot

Secondary structure of rna and its predicting elements
Secondary structure of rna and its predicting elementsSecondary structure of rna and its predicting elements
Secondary structure of rna and its predicting elementsVinaKhan1
 
HELIX-LOOP-HELIX, HELIX-TURN-HELIX
HELIX-LOOP-HELIX, HELIX-TURN-HELIXHELIX-LOOP-HELIX, HELIX-TURN-HELIX
HELIX-LOOP-HELIX, HELIX-TURN-HELIXnaren
 
BT631-5-primary_secondary_structures_proteins
BT631-5-primary_secondary_structures_proteinsBT631-5-primary_secondary_structures_proteins
BT631-5-primary_secondary_structures_proteinsRajesh G
 
Lecture 5 sols-2019-protein struc 2
Lecture 5 sols-2019-protein struc 2Lecture 5 sols-2019-protein struc 2
Lecture 5 sols-2019-protein struc 2SourabhKumar240
 
Conformational study of polynucleotide
Conformational study of polynucleotideConformational study of polynucleotide
Conformational study of polynucleotideKAUSHAL SAHU
 
Protein motif. by KK Sahu sir
Protein motif. by KK Sahu sirProtein motif. by KK Sahu sir
Protein motif. by KK Sahu sirKAUSHAL SAHU
 
Chapter 4 (part 1) protein structure introduction- 2nd structure
Chapter 4 (part 1)   protein structure introduction- 2nd structureChapter 4 (part 1)   protein structure introduction- 2nd structure
Chapter 4 (part 1) protein structure introduction- 2nd structureAmmedicine Medicine
 

What's hot (10)

Secondary structure of rna and its predicting elements
Secondary structure of rna and its predicting elementsSecondary structure of rna and its predicting elements
Secondary structure of rna and its predicting elements
 
Repication I
Repication IRepication I
Repication I
 
HELIX-LOOP-HELIX, HELIX-TURN-HELIX
HELIX-LOOP-HELIX, HELIX-TURN-HELIXHELIX-LOOP-HELIX, HELIX-TURN-HELIX
HELIX-LOOP-HELIX, HELIX-TURN-HELIX
 
BT631-5-primary_secondary_structures_proteins
BT631-5-primary_secondary_structures_proteinsBT631-5-primary_secondary_structures_proteins
BT631-5-primary_secondary_structures_proteins
 
Beta
BetaBeta
Beta
 
DNA binding Domains
DNA binding DomainsDNA binding Domains
DNA binding Domains
 
Lecture 5 sols-2019-protein struc 2
Lecture 5 sols-2019-protein struc 2Lecture 5 sols-2019-protein struc 2
Lecture 5 sols-2019-protein struc 2
 
Conformational study of polynucleotide
Conformational study of polynucleotideConformational study of polynucleotide
Conformational study of polynucleotide
 
Protein motif. by KK Sahu sir
Protein motif. by KK Sahu sirProtein motif. by KK Sahu sir
Protein motif. by KK Sahu sir
 
Chapter 4 (part 1) protein structure introduction- 2nd structure
Chapter 4 (part 1)   protein structure introduction- 2nd structureChapter 4 (part 1)   protein structure introduction- 2nd structure
Chapter 4 (part 1) protein structure introduction- 2nd structure
 

Viewers also liked

Structure of dna and rna
Structure of dna and rnaStructure of dna and rna
Structure of dna and rnaHimanshu Dev
 
Transcriptomics sequencing
Transcriptomics sequencingTranscriptomics sequencing
Transcriptomics sequencingcdgenomics525
 
Role of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies andRole of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies andSarla Rao
 
Sequence Alignment,Blast, Fasta, MSA
Sequence Alignment,Blast, Fasta, MSASequence Alignment,Blast, Fasta, MSA
Sequence Alignment,Blast, Fasta, MSASucheta Tripathy
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-naveed ul mushtaq
 
Transcriptomics and metabolomics
Transcriptomics and metabolomicsTranscriptomics and metabolomics
Transcriptomics and metabolomicsSukhjinder Singh
 
Dotplots for Bioinformatics
Dotplots for BioinformaticsDotplots for Bioinformatics
Dotplots for Bioinformaticsavrilcoghlan
 
Microarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome studyMicroarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome studyBia Khan
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
Introduction to Database Concepts
Introduction to Database ConceptsIntroduction to Database Concepts
Introduction to Database ConceptsRosalyn Lemieux
 
RNA Structures, Types and Functions
RNA Structures, Types and FunctionsRNA Structures, Types and Functions
RNA Structures, Types and FunctionsCyra Mae Soreda
 
Database system concepts
Database system conceptsDatabase system concepts
Database system conceptsKumar
 

Viewers also liked (20)

Structure of dna and rna
Structure of dna and rnaStructure of dna and rna
Structure of dna and rna
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
 
Transcriptomics sequencing
Transcriptomics sequencingTranscriptomics sequencing
Transcriptomics sequencing
 
Role of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies andRole of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies and
 
Parwati sihag
Parwati sihagParwati sihag
Parwati sihag
 
Sequence Alignment,Blast, Fasta, MSA
Sequence Alignment,Blast, Fasta, MSASequence Alignment,Blast, Fasta, MSA
Sequence Alignment,Blast, Fasta, MSA
 
Sequence alignment belgaum
Sequence alignment belgaumSequence alignment belgaum
Sequence alignment belgaum
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
Transcriptomics and metabolomics
Transcriptomics and metabolomicsTranscriptomics and metabolomics
Transcriptomics and metabolomics
 
Dotplots for Bioinformatics
Dotplots for BioinformaticsDotplots for Bioinformatics
Dotplots for Bioinformatics
 
Microarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome studyMicroarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome study
 
Fasta
FastaFasta
Fasta
 
Rna secondary structure prediction
Rna secondary structure predictionRna secondary structure prediction
Rna secondary structure prediction
 
OMICS tecnology
OMICS tecnologyOMICS tecnology
OMICS tecnology
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
Introduction to Database Concepts
Introduction to Database ConceptsIntroduction to Database Concepts
Introduction to Database Concepts
 
RNA Structures, Types and Functions
RNA Structures, Types and FunctionsRNA Structures, Types and Functions
RNA Structures, Types and Functions
 
Blast fasta 4
Blast fasta 4Blast fasta 4
Blast fasta 4
 
Transcriptomics
TranscriptomicsTranscriptomics
Transcriptomics
 
Database system concepts
Database system conceptsDatabase system concepts
Database system concepts
 

Similar to Bioalgo 2012-02-rna

Tutorial 8 (web graph models)
Tutorial 8 (web graph models)Tutorial 8 (web graph models)
Tutorial 8 (web graph models)Kira
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfsriaisvariyasundar
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
Parallel DNA Sequence Alignment
Parallel DNA Sequence AlignmentParallel DNA Sequence Alignment
Parallel DNA Sequence AlignmentGiuliana Carullo
 
NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...
NS-CUK Seminar:H.B.Kim,  Review on "Asymmetric transitivity preserving graph ...NS-CUK Seminar:H.B.Kim,  Review on "Asymmetric transitivity preserving graph ...
NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...ssuser4b1f48
 
Winning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and TensorsWinning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and TensorsKonstantinos Pelechrinis
 
On the Reliability and Intuitiveness of Aggregated Search Metrics
On the Reliability and Intuitiveness of Aggregated Search MetricsOn the Reliability and Intuitiveness of Aggregated Search Metrics
On the Reliability and Intuitiveness of Aggregated Search MetricsMounia Lalmas-Roelleke
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsGuozhang Wang
 
Tuning Your Engine
Tuning Your EngineTuning Your Engine
Tuning Your Enginejoelbradbury
 
Ruby Masters 2011
Ruby Masters 2011Ruby Masters 2011
Ruby Masters 2011Fabio Akita
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfH K Yoon
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large GraphsNishant Gandhi
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDBlehresman
 
Seminar - Similarity Joins in SQL (performance and semantic joins)
Seminar - Similarity Joins in SQL (performance and semantic joins)Seminar - Similarity Joins in SQL (performance and semantic joins)
Seminar - Similarity Joins in SQL (performance and semantic joins)Eyal Trabelsi
 
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...Emanuele Chioso
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSpark Summit
 
Deephack report Kaggle Allen AI QA
Deephack report Kaggle Allen AI QADeephack report Kaggle Allen AI QA
Deephack report Kaggle Allen AI QAPetr Ermakov
 
Global and Local Sequence Alignment
Global and Local Sequence AlignmentGlobal and Local Sequence Alignment
Global and Local Sequence AlignmentAjayPatil210
 
Sparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsSparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsMaya Hristakeva
 

Similar to Bioalgo 2012-02-rna (20)

Tutorial 8 (web graph models)
Tutorial 8 (web graph models)Tutorial 8 (web graph models)
Tutorial 8 (web graph models)
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Parallel DNA Sequence Alignment
Parallel DNA Sequence AlignmentParallel DNA Sequence Alignment
Parallel DNA Sequence Alignment
 
NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...
NS-CUK Seminar:H.B.Kim,  Review on "Asymmetric transitivity preserving graph ...NS-CUK Seminar:H.B.Kim,  Review on "Asymmetric transitivity preserving graph ...
NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...
 
Winning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and TensorsWinning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and Tensors
 
On the Reliability and Intuitiveness of Aggregated Search Metrics
On the Reliability and Intuitiveness of Aggregated Search MetricsOn the Reliability and Intuitiveness of Aggregated Search Metrics
On the Reliability and Intuitiveness of Aggregated Search Metrics
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
 
Tuning Your Engine
Tuning Your EngineTuning Your Engine
Tuning Your Engine
 
Ruby Masters 2011
Ruby Masters 2011Ruby Masters 2011
Ruby Masters 2011
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDB
 
Seminar - Similarity Joins in SQL (performance and semantic joins)
Seminar - Similarity Joins in SQL (performance and semantic joins)Seminar - Similarity Joins in SQL (performance and semantic joins)
Seminar - Similarity Joins in SQL (performance and semantic joins)
 
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya Hristakeva
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Deephack report Kaggle Allen AI QA
Deephack report Kaggle Allen AI QADeephack report Kaggle Allen AI QA
Deephack report Kaggle Allen AI QA
 
Global and Local Sequence Alignment
Global and Local Sequence AlignmentGlobal and Local Sequence Alignment
Global and Local Sequence Alignment
 
Sparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsSparking Science up with Research Recommendations
Sparking Science up with Research Recommendations
 

More from BioinformaticsInstitute

Comparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsComparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsBioinformaticsInstitute
 
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес... Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...BioinformaticsInstitute
 
Вперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкВперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкBioinformaticsInstitute
 
"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр ПредеусBioinformaticsInstitute
 
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...BioinformaticsInstitute
 
Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)BioinformaticsInstitute
 
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...BioinformaticsInstitute
 
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)BioinformaticsInstitute
 

More from BioinformaticsInstitute (20)

Graph genome
Graph genome Graph genome
Graph genome
 
Nanopores sequencing
Nanopores sequencingNanopores sequencing
Nanopores sequencing
 
A superglue for string comparison
A superglue for string comparisonA superglue for string comparison
A superglue for string comparison
 
Comparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsComparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphs
 
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес... Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 
Вперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкВперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днк
 
Knime & bioinformatics
Knime & bioinformaticsKnime & bioinformatics
Knime & bioinformatics
 
"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус
 
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
 
Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)
 
Плюрипотентность 101
Плюрипотентность 101Плюрипотентность 101
Плюрипотентность 101
 
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
 
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
 
Biodb 2011-everything
Biodb 2011-everythingBiodb 2011-everything
Biodb 2011-everything
 
Biodb 2011-05
Biodb 2011-05Biodb 2011-05
Biodb 2011-05
 
Biodb 2011-04
Biodb 2011-04Biodb 2011-04
Biodb 2011-04
 
Biodb 2011-03
Biodb 2011-03Biodb 2011-03
Biodb 2011-03
 
Biodb 2011-01
Biodb 2011-01Biodb 2011-01
Biodb 2011-01
 
Biodb 2011-02
Biodb 2011-02Biodb 2011-02
Biodb 2011-02
 
Ngs 3 1
Ngs 3 1Ngs 3 1
Ngs 3 1
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Bioalgo 2012-02-rna

  • 2. Outline 1. RNA Folding 2. Dynamic Programming for RNA Secondary Structure Prediction 3. Covariance Model for RNA Structure Prediction 4. Small RNAs: Identification and Analysis
  • 4. RNA Basics • RNA bases: A,C,G,U • Canonical Base Pairs • A-U • G-C • G-U • Bases can only pair with one other base.
  • 5. RNA Basics • RNA bases: A,C,G,U • Canonical Base Pairs • A-U • G-C • G-U • Bases can only pair with one other base. 2 Hydrogen Bonds
  • 6. RNA Basics • RNA bases: A,C,G,U • Canonical Base Pairs • A-U • G-C • G-U • Bases can only pair with one other base. 3 Hydrogen Bonds—more stable
  • 7. RNA Basics • RNA bases: A,C,G,U • Canonical Base Pairs • A-U • G-C • G-U • Bases can only pair with one other base. ―Wobble Pairing‖
  • 8. RNA Basics http://www.genetics.wustl.edu/eddy/tRNAscan-SE/ • Various types of RNA: • transfer RNA (tRNA) • messenger RNA (mRNA) • ribosomal RNA (rRNA) • small interfering RNA (siRNA) • micro RNA (miRNA) • small nuclear RNA (snRNA) • small nucleolar RNA (snoRNA)
  • 9. Section 2: Dynamic Programming for RNA Secondary Structure Prediction
  • 10. RNA Secondary Structure Hairpin loop Junction (Multiloop) Bulge Loop Single-Stranded Interior Loop Stem Image– Wuchty Pseudoknot
  • 11. Sequence Alignment to Determine Structure • Bases pair in order to form backbones and determine the secondary structure. • Aligning bases based on their ability to pair with each other gives an algorithmic approach to determining the optimal structure.
  • 12. Base Pair Maximization: Dynamic Programming Images – Sean Eddy • S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs. • Recurrence:
  • 13. Base Pair Maximization: Dynamic Programming Images – Sean Eddy S i, j max • S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs. • Recurrence:
  • 14. Base Pair Maximization: Dynamic Programming Base pairat i and j Images – Sean Eddy S i, j max S i 1, j 1 1 (if i, j base pair) • S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs. • Recurrence:
  • 15. Base Pair Maximization: Dynamic Programming Base pairat i and j Unmatchedat i Images – Sean Eddy S i, j max S i 1, j 1 1 (if i, j base pair) S i 1, j • S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs. • Recurrence:
  • 16. Base Pair Maximization: Dynamic Programming Base pairat i and j Unmatchedat jUnmatchedat i Images – Sean Eddy S i, j max S i 1, j 1 1 (if i, j base pair) S i 1, j S i, j 1 • S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs. • Recurrence:
  • 17. Base Pair Maximization: Dynamic Programming Base pairat i and j Unmatchedat jUnmatchedat i Bifurcation Images – Sean Eddy S i, j max S i 1, j 1 1 (if i, j base pair) S i 1, j S i, j 1 max 1 k j S i,k S k 1, j • S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs. • Recurrence:
  • 18. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Images—Sean Eddy
  • 19. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Initialize first two diagonals to 0 Images—Sean Eddy
  • 20. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Fill in squares sweeping diagonally Images—Sean Eddy
  • 21. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Fill in squares sweeping diagonally Images—Sean Eddy
  • 22. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Bases cannot pair Images—Sean Eddy
  • 23. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Bases can pair, similar to matched alignment Images—Sean Eddy
  • 24. Images – Sean Edd Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Dynamic Programming— possible paths Images—Sean Eddy
  • 25. Images – Sean Edd S(i, j – 1) Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Dynamic Programming— possible paths Images—Sean Eddy
  • 26. Dynamic Programming— possible paths Images – Sean Edd S(i + 1, j) Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Images—Sean Eddy
  • 27. Images—Sean Eddy S(i + 1, j – 1) +1 Base Pair Maximization: Dynamic Programming • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension Dynamic Programming— possible paths
  • 28. Base Pair Maximization: Dynamic Programming Bifurcation—add values for all k Images—Sean Eddy • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension
  • 29. Base Pair Maximization: Dynamic Programming Bifurcation—add values for all k Images—Sean Eddy • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension
  • 30. Base Pair Maximization: Dynamic Programming Bifurcation—add values for all k Images—Sean Eddy • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension
  • 31. Base Pair Maximization: Dynamic Programming Bifurcation—add values for all k Images—Sean Eddy • Alignment Method: • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension
  • 32. Base Pair Maximization: Drawbacks • Base pair maximization will not necessarily lead to the most stable structure. • It may create structure with many interior loops or hairpins which are energetically unfavorable. • In comparison to aligning sequences with scattered matches— not biologically reasonable.
  • 33. Energy Minimization • Thermodynamic Stability • Estimated using experimental techniques. • Theory : Most Stable = Most likely • No pseudoknots due to algorithm limitations. • Attempts to maximize the score, taking thermodynamics into account. • MFOLD and ViennaRNA
  • 34. Energy Minimization Results • Linear RNA strand folded back on itself to create secondary structure • Circularized representation uses this requirement • Arcs represent base pairing Images – David Mount
  • 35. Energy Minimization Results • All loops must have exactly three bases in them. • Equivalent to having at least three base pairs between arc endpoints. Images – David Mount
  • 36. Energy Minimization Results • All loops must have exactly three bases in them. • Equivalent to having at least three base pairs between arc endpoints. Images – David Mount
  • 37. Energy Minimization Results • All loops must have exactly three bases in them. • Exception: Location where beginning and end of RNA come together in circularized representation. Images – David Mount
  • 38. Energy Minimization Results • All loops must have exactly three bases in them. • Exception: Location where beginning and end of RNA come together in circularized representation. Images – David Mount
  • 39. Trouble with Pseudoknots • Pseudoknots cause a breakdown in the dynamic programming algorithm. • In order to form a pseudoknot, checks must be made to ensure base is not already paired—this breaks down the recurrence relations. Images – David Mount
  • 40. Trouble with Pseudoknots • Pseudoknots cause a breakdown in the dynamic programming algorithm. • In order to form a pseudoknot, checks must be made to ensure base is not already paired—this breaks down the recurrence relations. Images – David Mount
  • 41. Trouble with Pseudoknots • Pseudoknots cause a breakdown in the dynamic programming algorithm. • In order to form a pseudoknot, checks must be made to ensure base is not already paired—this breaks down the recurrence relations. Images – David Mount
  • 42. Energy Minimization: Drawbacks • Computes only one optimal structure. • Optimal solution may not represent the biologically correct solution.
  • 43. Section 3: Covariance Model for RNA Structure Prediction
  • 44. Alternative Algorithms - Covariaton • Incorporates Similarity-based method • Evolution maintains sequences that are important • Change in sequence coincides to maintain structure through base pairs (Covariance) • Cross-species structure conservation example – tRNA • Manual and automated approaches have been used to identify covarying base pairs • Models for structure based on results • Ordered Tree Model • Stochastic Context Free Grammar
  • 45. Alternative Algorithms - Covariaton • Expect areas of base pairing in tRNA to be covarying between various species.
  • 46. Alternative Algorithms - Covariaton • Expect areas of base pairing in tRNA to be covarying between various species. • Base pairing creates same stable tRNA structure in organisms.
  • 47. Alternative Algorithms - Covariaton • Expect areas of base pairing in tRNA to be covarying between various species. • Base pairing creates same stable tRNA structure in organisms. • Mutation in one base yields pairing impossible and breaks down structure.
  • 48. Alternative Algorithms - Covariaton • Expect areas of base pairing in tRNA to be covarying between various species. • Base pairing creates same stable tRNA structure in organisms. • Mutation in one base yields pairing impossible and breaks down structure. • Covariation ensures ability to base pair is maintained and RNA structure is conserved.
  • 49. Binary Tree Representation of RNA Secondary Structure • Representation of RNA structure using Binary tree • Nodes represent • Base pair if two bases are shown • Loop if base and ―gap‖ (dash) are shown • Pseudoknots still not represented • Tree does not permit varying sequences • Mismatches • Insertions & Deletions Images – Eddy et al.
  • 50. Binary Tree Representation of RNA Secondary Structure • Representation of RNA structure using Binary tree • Nodes represent • Base pair if two bases are shown • Loop if base and ―gap‖ (dash) are shown • Pseudoknots still not represented • Tree does not permit varying sequences • Mismatches • Insertions & Deletions Images – Eddy et al.
  • 51. Binary Tree Representation of RNA Secondary Structure • Representation of RNA structure using Binary tree • Nodes represent • Base pair if two bases are shown • Loop if base and ―gap‖ (dash) are shown • Pseudoknots still not represented • Tree does not permit varying sequences • Mismatches • Insertions & Deletions Images – Eddy et al.
  • 52. Binary Tree Representation of RNA Secondary Structure • Representation of RNA structure using Binary tree • Nodes represent • Base pair if two bases are shown • Loop if base and ―gap‖ (dash) are shown • Pseudoknots still not represented • Tree does not permit varying sequences • Mismatches • Insertions & Deletions Images – Eddy et al.
  • 53. Covariance Model • Covariance Model: HMM which permits flexible alignment to an RNA structure – emission and transition probabilities • Model trees based on finite number of states • Match states – sequence conforms to the model: • MATP: State in which bases are paired in the model and sequence. • MATL & MATR: State in which either right or left bulges in the sequence and the model. • Deletion – State in which there is deletion in the sequence when compared to the model. • Insertion – State in which there is an insertion relative to model.
  • 54. Covariance Model • Covariance Model: HMM which permits flexible alignment to an RNA structure – emission and transition probabilities • Transitions have probabilities. • Varying probability: Enter insertion, remain in current state, etc. • Bifurcation: No probability, describes path.
  • 55. Covariance Model (CM) Training Algorithm S i, j max S i 1, j 1 M i, j S i 1, j S i, j 1 max i k j S i, k S k 1, j • S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model. • Frequencies obtained by aligning model to ―training data‖— consists of sample sequences. • Reflect values which optimize alignment of sequences to model.
  • 56. Covariance Model (CM) Training Algorithm S i, j max S i 1, j 1 M i, j S i 1, j S i, j 1 max i k j S i, k S k 1, j • S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model. • Frequencies obtained by aligning model to ―training data‖— consists of sample sequences. • Reflect values which optimize alignment of sequences to model. Frequency of seeing the symbols (A, C, G, T) together in locations i and j depending on symbol.
  • 57. Covariance Model (CM) Training Algorithm S i, j max S i 1, j 1 M i, j S i 1, j S i, j 1 max i k j S i, k S k 1, j • S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model. • Frequencies obtained by aligning model to ―training data‖— consists of sample sequences. • Reflect values which optimize alignment of sequences to model. Independent frequency of seeing the symbols (A, C, G, T) in locations i or j depending on symbol.
  • 58. Covariance Model (CM) Training Algorithm S i, j max S i 1, j 1 M i, j S i 1, j S i, j 1 max i k j S i, k S k 1, j • S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model. • Frequencies obtained by aligning model to ―training data‖— consists of sample sequences. • Reflect values which optimize alignment of sequences to model. Independent frequency of seeing the symbols (A, C, G, T) in locations i or j depending on symbol.
  • 59. Alignment to CM Algorithm • Calculate the probability score of aligning RNA to CM. • Three dimensional matrix—O(n³) • Align sequence to given subtrees in CM. • For each subsequence, calculate all possible states. • Subtrees evolve from bifurcations • For simplicity, left singlet is default. Images—Eddy et al.
  • 60. Alignment to CM Algorithm • For each calculation, take into account: • Transition (T) to next state. • Emission probability (P) in the state as determined by training data. Images—Eddy et al.
  • 61. Alignment to CM Algorithm • For each calculation, take into account: • Transition (T) to next state. • Emission probability (P) in the state as determined by training data. Images—Eddy et al.
  • 62. Alignment to CM Algorithm • For each calculation, take into account: • Transition (T) to next state. • Emission probability (P) in the state as determined by training data. Images—Eddy et al.
  • 63. Alignment to CM Algorithm • For each calculation, take into account: • Transition (T) to next state. • Emission probability (P) in the state as determined by training data. • Deletion—does not have emission probability associated with it. Images—Eddy et al.
  • 64. Alignment to CM Algorithm • For each calculation, take into account: • Transition (T) to next state. • Emission probability (P) in the state as determined by training data. • Deletion—does not have emission probability associated with it. • Bifurcation—does not have state probability associated with it. Images—Eddy et al.
  • 65. Covariance Model Drawbacks • Needs to be well trained. • Not suitable for searches of large RNA. • Structural complexity of large RNA cannot be modeled • Runtime • Memory requirements
  • 66. Section 4: Small RNAs: Identification and Analysis
  • 67. Discovery of small RNAs • The first small RNA: • In 1993 Rosalind Lee was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm C. elegans. • Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14. • Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA- RNA interactions. • The second small RNA wasn't discovered until 2000! Rosalind Lee
  • 68. What are small ncRNAs? • Two flavors of small non-coding RNA: 1. micro RNA (miRNA) 2. short interfering RNA (siRNA) • Properties of small non-coding RNA: • Involved in silencing other mRNA transcripts. • Called ―small‖ because they are usually only about 21-24 nucleotides long. • Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered). • Silence an mRNA by base pairing with some sequence on the mRNA.
  • 70. siRNA Pathway Illustration Complementary base pairing facilitates the mRNA cleavage
  • 71. Features of miRNAs • Hundreds miRNA genes are already identified in human genome. • Most miRNAs start with a U • The second 7-mer on the 5' end is known as the ―seed.‖ • When an miRNAs bind to their targets, the seed sequence has perfect or near-perfect alignment to some part of the target sequence. • Example: UGAGCUUAGCAG...
  • 72. Features of miRNAs • Many miRNAs are conserved across species: • For half of known human miRNAs, >18% of all occurrences of one of these miRNA seeds are conserved among human, dog, rat, and mouse. • As a rule, the full sequence of miRNAs is almost never completely complementary to the target sequence. • Common to see a loop or bulge after the seed when binding. • Loop/bulge is often a hairpin because of stability. • The site at which miRNAs attack is often in their target's 3' UTR.
  • 73. Hairpin is more stable than a simple bulge Bulges The MRE is known as the ―miRNA recognition element.‖ This is simply the sequence in the target that an miRNA binds to miRNA Binding
  • 74. Locating miRNA Genes: Experimentally • Locating miRNA experimentally is difficult. • Procedure: 1. Find a gene that causes down-regulation of another gene. 2. Determine if no protein is encoded. 3. Analyze the sequence to determine if it is complementary to its target.
  • 75. Locating miRNA Genes: Comparative Genomics • Idea: Find the seed binding sites. 1. Examine well-conserved 3' UTRs among species to find well-conserved 8-mers (A + seed) that might be an miRNA target sequence. 2. Look for a sequence complementary to this 8-mer to identify a potential miRNA seed. Once found, check flanking sequence to see if any stable hairpin structures can form—these are potentially pre-miRNAs. 3. To determine the possibility of secondary RNA structure, use RNAfold.
  • 76. Locating miRNA Genes: Example • Suppose you found a well-conserved 8-mer in 3' UTRs (this could be where an miRNA seed binds in its target). • Example: AGACTAGG • Look elsewhere in genome for complementary sequence (this could be an miRNA seed). • Example: TCTGATCC • When TCTGATCC is found, check to see (with RNAfold) if the sequences around it could form hairpin; if so, this could be an miRNA gene.
  • 77. Finding miRNA Targets: Method 1 • Now we know of some miRNAs, but where do they attack? • Goal: Find the targets of a set of miRNAs that are shared between human and mouse. • Looking for the miRNA recognition element (MRE), not whole mRNA. This is just the part that the miRNA would bind to. • Basic Assumption: Whole miRNA:MRE interactions (binding) are likely to have highly energetically favorable base pairing. • Basic Method: Look through the conserved 3' UTRs—this is where the MREs are most likely to be located—and try to make an alignment that minimizes the binding energy between the miRNA sequence and the UTRs (most favorable).
  • 78. Finding miRNA Targets: Method 1 • Method: • First look at the binding energies of all 38-mers of the mRNA when binding to the miRNA. Subsequently apply several filters to pick alignments that ―look‖ like miRNA binding. • Why 38-mers? ~22 nt for the miRNA and the rest to allow for bulges, loops, etc. • Algorithm: Use a modified dynamic programming sequence alignment algorithm to calculate the binding energies for each 38-mer. • Modifications: Scoring and speedup
  • 79. Finding miRNA Targets: Method 1 • Scoring: • Mismatches and indels allowed. • Matrix based on RNA-RNA binding energies. • Use known binding energies of Watson-Crick pairing and wobble (G-U) pairing. • Binding energy (score) calculated for every two adjacent pairings (unlike the standard alignment algorithm which just takes into account the ―score‖ for one pair at a time). • Adds dimensions to scoring matrix. • Adds complexity to recurrence relation.
  • 80. Finding miRNA targets: Method 2 • Goal: Find the set of miRNA targets for miRNAs shared across multiple species • Trying to identify which genes have 3' UTRs are attacked by miRNAs • Basic Assumptions: 1. There is perfect binding to the miRNA seed. 2. Any leftover sequence wants to achieve optimal RNA secondary structure. • Basic Method: For each species’ set of 3' UTRs, find sites where there is perfect binding of the miRNA seed and ―optimal folding‖ nearby. Look for agreement among all the species.
  • 82. Method 2: Steps 1. Find a perfect match to the miRNA seed. 2. Extend the matching region if possible. 3. Find the optimal folding for the remaining sequences. 4. Calculate the energy of this interaction.
  • 83. Method 2: Details • Input: A set of miRNAs conserved among species and a set of 3' UTR sequences for those species. • Method: For each organism: 1. Find all occurrences in the UTR sequences that match the miRNA seed exactly. 2. Extend this region with perfect or wobble pairings. 3. With the remaining sequence of the miRNA, use the program RNAfold to find optimal folding with the next 35 bases of the UTR sequence. 4. Calculate a score for this interaction based on the free energy of the interaction given by RNAfold.
  • 84. Method 2: Details • Method Cont.: 5. Sum up the scores of all interactions for each UTR. 6. Rank all the organism's gene's UTRs by this score (sum of all interactions in that UTR). 7. Repeat the above steps for each organism. 8. Create a cutoff score and a cutoff rank for the UTRs. 9. Select the set of genes where the orthologous genes across all the sampled species have UTR's that score and rank above this cutoff.
  • 85. Method 2: Details • Verification: • Find the number of predicted binding sites per miRNA. • Compare it to number of binding sites for a randomly generated miRNA. • The result is much higher.
  • 86. Analysis of the Two Methods • Method 1: • Good at identifying very strong, highly complementary miRNA targets. • Found gene targets with one miRNA binding site, failed to identify genes with multiple weaker binding sites. • Method 2: • Good at identifying gene targets that have many weaker interactions. • Fails to identify single-site genes.
  • 87. Analysis of the Two Methods • Both Methods: • Speed is an issue. • Won't find targets that aren't in the 3' UTR of a gene. • We need more species sequenced! • Conserved sequences are used to discover small RNAs. • Conserved small RNAs are used to discover targets. • Confidence in prediction of small RNAs and targets. • Allows for broader scope with different combinations of species.
  • 88. Results • Predicted a large portion of already known targets and provided direction for identifying undiscovered targets. • Found that it is more common that genes are regulated by multiple small RNAs. • Found that many small RNAs have multiple targets.
  • 89. A Novel siRNA Mechanism • Recently, a new mechanism of siRNA activity was discovered. • Two genes (called A and B here) that have a cis-antisense orientation (they are overlapping on opposite strands) have transcripts that produce an siRNA due to the dsRNA formed by their mRNA transcripts. • Gene A is constitutive, gene B is induced by salt stress • Normally, just B's transcript is present. • When both A and B are present, we get annealing to get dsRNA and this forms an siRNA. • Since the siRNA is complementary to gene A's transcript, the siRNA attacks gene A, silencing it. • These genes might provide direction to finding new siRNAs.
  • 90. Pathway: Illustration Annealing of transcripts nicely sets up the dsRNA to be used later in making the siRNA Both transcripts present if salt is present “A” “B” siRNA silences A
  • 91. References • How Do RNA Folding Algorithms Work?. S.R. Eddy. Nature Biotechnology, 22:1457-1458, 2004. • Borsani, O., Zhu, J, Verslues, P.E., Sunkar, R., Zhu, J.- K. (2005). Endogenous siRNAs Derived From a Pair of Natural cis-Antisense Transcripts Regulate Solt Tolerance in Arabidopsis. Cell 123, Jury, W.A. and Vaux Jr., H. (2005). The role of science in solving the world's emerging water problems. Proc. Natl Acad. Sci. USA 102: 15715-15720.
  • 92. References • Kiriakidou, M., Nelson, P.T., Kouranov, A., Fitziev, P., Bouyioukos, C., Hatzigeorgiou, A., and Hatzigeorgiou, M. (2004). A combined computational-experimental approach predicts human microRNA targets. Genes & Dev. 18: 1165-1178. • Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans Heterochronic Gene lin-4 Encodes Small RNAs with Antisense Complementarity to lin-14. Cell 75: 843-854.
  • 93. References • Lee, Y. Kim, M, Han, J. Yeom, K-H, Lee, S., Baek, S.H., and Kim, V.N. (2004). MicroRNA genes are transcribed by RNA polymerase II. The EMBO Journal 23: 4051- 4060. • Lewis, B.P., Shih, I., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. (2003). Prediction of Mammalian MicroRNA Targets. Cell 115: 787-798.
  • 94. References • Xie, X, Lu, J, Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals. Nature 443: 338-345.