Critique: Presentation Transcript

  • Paper Critique: "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions". Amer Talal Wazwaz, [email_address], 18/4/2011
  • Introduction 
    • ENCODE: ENCyclopedia Of DNA Elements
    • A pilot project to identify the functional elements in genome sequences.
    • Non-coding RNAs (ncRNAs)
    • A major challenge in these projects is to annotate the large number of non-coding RNAs.
    • The steadily increasing number of the discovered ncRNAs has dramatically changed views on the roles and importance of ncRNAs.
    • ncRNAs are difficult to find by computational or experimental means.
  • Introduction 
    • Finding ncRNAs computationally is difficult because one has to consider secondary structure as well as nucleotide sequence.
    • But structure can be detected more reliably from a set of related sequences, if available.
    • The recent approach is to align the sequences first, then do RNA structure inference based on the alignment.
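To illustrate why a set of related sequences helps, here is a minimal sketch that counts how many sequences in an alignment can form a base pair between two columns, and how many different pair types occur. Several pair types at the same two columns suggest compensatory mutations that preserve a stem. The function name, the pairing rule (Watson-Crick plus G-U wobble), and the toy alignment are assumptions made for illustration; this is not the scoring used by CMfinder, RNAz, or EvoFold.

```python
# Watson-Crick pairs plus the G-U wobble pair count as compatible.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_support(alignment, i, j):
    """Return (compatible, distinct) for alignment columns i and j.

    compatible: number of sequences whose bases at i and j can pair.
    distinct:   number of different pair types among those sequences;
                more than one type hints at compensatory mutations.
    """
    seen = set()
    compatible = 0
    for seq in alignment:
        pair = (seq[i].upper(), seq[j].upper())
        if pair in PAIRS:
            compatible += 1
            seen.add(pair)
    return compatible, len(seen)


# Hypothetical toy alignment: columns 0 and 8 covary (G-C, A-U, C-G),
# columns 1 and 7 are conserved (always C-G), columns 3 and 5 cannot pair.
toy = [
    "GCAUUUUGC",
    "ACAUUUUGU",
    "CCAUUUUGG",
]

for i, j in [(0, 8), (1, 7), (3, 5)]:
    print(i, j, column_pair_support(toy, i, j))
```

A single sequence can only say whether two positions could pair; the alignment adds the evidence that the pairing is maintained even when the bases themselves change.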
  • Introduction  
    • This study describes the first large-scale search for structured ncRNAs in several vertebrate genomes using a local structural motif-finding algorithm, which identified several thousand novel candidate ncRNAs.
  • Materials and Methods
    • They used CMfinder, a structure-oriented RNA motif prediction tool, to search the ENCODE regions of vertebrate multiple alignments.
    • CMfinder was used as a complement to the RNAz / EvoFold scans of the ENCODE regions.
    • They obtained their candidates from the UCSC MULTIZ multiple alignment blocks, one block at a time (155 nt long on average).
  • Materials and Methods
    • A group of 11 high-scoring ncRNA candidates was chosen for experimental verification and tested by RT-PCR and Northern blotting.
    • Ten were confirmed to be present as RNA transcripts in certain tissues.
    • The experimental verification showed evidence of significant differential expression across tissues.
  • Results
    • They found a large number of potential ncRNAs in the ENCODE regions.
    • They reported 6587 candidate regions with an estimated false-positive rate of 50%.
    • With their new candidates they increased the number of ncRNA candidates in the ENCODE regions by 32%.
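As a quick worked reading of these figures, the sketch below computes how many of the reported candidates would be expected to be genuine. It assumes the 50% figure means the fraction of reported candidates that are false; the slides do not spell out the definition, and the function name is introduced here for illustration only.

```python
def expected_true_candidates(n_candidates, false_positive_rate):
    """Expected number of genuine candidates, assuming the rate is the
    fraction of reported candidates that are false (an assumption)."""
    return n_candidates * (1.0 - false_positive_rate)


# Figures taken from the slide: 6587 candidates, 50% estimated false positives.
print(expected_true_candidates(6587, 0.50))
```

Under that reading, roughly half of the 6587 candidate regions (about 3,300) would be expected to hold up.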
  • Discussion
    • To demonstrate the possible benefits of structure-aware alignment, they examined MULTIZ multiple alignment blocks identified by Wang et al. (2007) with good matches to the Rfam model in all species in the same region of the alignment.
    • They reported that CMfinder's alignment of the region differs from the MULTIZ alignment in only 13% of the positions.
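One way to quantify how much two alignments of the same sequences differ is sketched below. For every residue in a non-reference row, it checks whether that residue shares a column with the same reference residue (or with a reference gap) in both alignments, and reports the fraction that changed. This particular metric, the choice of row 0 as reference, and the toy data are assumptions for illustration; the slides do not say how the 13% figure was computed.

```python
def partner_map(ref_row, row):
    """Map each residue index of `row` to the residue index of `ref_row`
    in the same column, or None when the reference has a gap there."""
    mapping = {}
    ref_pos = row_pos = 0
    for ref_char, char in zip(ref_row, row):
        ref_here = ref_pos if ref_char != "-" else None
        if char != "-":
            mapping[row_pos] = ref_here
            row_pos += 1
        if ref_char != "-":
            ref_pos += 1
    return mapping


def fraction_changed(aln_a, aln_b):
    """Fraction of non-reference residues whose reference partner differs
    between two alignments of the same sequences (row 0 is the reference)."""
    changed = total = 0
    for row_a, row_b in zip(aln_a[1:], aln_b[1:]):
        map_a = partner_map(aln_a[0], row_a)
        map_b = partner_map(aln_b[0], row_b)
        for pos, partner in map_a.items():
            total += 1
            if map_b.get(pos) != partner:
                changed += 1
    return changed / total if total else 0.0


# Hypothetical toy alignments of the same two sequences.
sequence_based = ["ACGU", "ACGU"]
structure_aware = ["ACGU-", "AC-GU"]
print(fraction_changed(sequence_based, structure_aware))
```

In the toy case, the last two residues of the second sequence shift by one column, so half of its residues change partner.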
  • Discussion
    • CMfinder is also alignment-independent: it can ignore a sequence that does not contain the motif and still report a high-scoring motif for the rest of the sequences.
    • CMfinder also does not remove individual gap-rich sequences, whereas RNAz and EvoFold remove sequences with >25% and >20% gaps, respectively.
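The gap-based filtering attributed to RNAz and EvoFold can be pictured with the sketch below, which drops alignment rows whose gap fraction exceeds a threshold. The 25% and 20% thresholds come from the slide; the function names, the toy alignment, and the assumption that the gap fraction is measured over the full row length are illustrative, not taken from either tool.

```python
def gap_fraction(row):
    """Fraction of gap characters in one aligned row."""
    return row.count("-") / len(row) if row else 0.0


def drop_gappy_rows(alignment, max_gap_fraction):
    """Keep only rows whose gap fraction does not exceed the threshold."""
    return [row for row in alignment if gap_fraction(row) <= max_gap_fraction]


# Hypothetical toy alignment with gap fractions 0, 2/9 and 4/9.
toy = [
    "ACGUACGUA",
    "AC--ACGUA",
    "A----CGUA",
]

print(drop_gappy_rows(toy, 0.25))  # threshold the slide attributes to RNAz
print(drop_gappy_rows(toy, 0.20))  # threshold the slide attributes to EvoFold
```

A filter of this kind discards a whole sequence even when the remaining aligned part carries the motif, which is the case the slide says CMfinder handles differently.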
  • Discussion
    • MULTIZ alignments serve as the rationale for the cross-species conservation of each motif instance.
    • Although MULTIZ is usually quite accurate in these challenging cases, several studies occasionally revealed compelling evidence of misalignment.
    • Even small misalignments have adverse effects on drawing any biological inferences.
    • Two main misalignment categories: "partial alignments" and "chimeric alignments"
  • "partial alignments"
    • Comprise 5.1% of the MULTIZ sequences.
    • What is aligned to the ncRNA includes a large gap, within the same species or among species.
    • The aligned fragment by itself does not pass the threshold of certain tests for ncRNA family membership.
  • "chimeric alignments"
    • Comprise 5.4% of the MULTIZ sequences.
    • What is aligned to the ncRNA is not a contiguous sequence. Instead, it is composed of sequence fragments from different regions or even different chromosomes.
    • None of these fragments individually passed the threshold of certain tests of ncRNA family membership.
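The two categories can be told apart mechanically if the source coordinates of each aligned fragment are known. The sketch below is one hypothetical way to do that: a row is called chimeric when its fragments come from different chromosomes or from distant regions, and partial when a contiguous fragment covers too little of the ncRNA. The function name, the `min_coverage` and `max_jump` thresholds, and the coordinate format are all assumptions; the family-membership tests mentioned on the slides are not modeled here.

```python
def classify_row(fragments, motif_length, min_coverage=0.8, max_jump=100):
    """Label one aligned sequence as 'chimeric', 'partial', or 'ok'.

    fragments:    list of (chromosome, start, end) tuples, end exclusive
    motif_length: length of the ncRNA the row is aligned to
    min_coverage, max_jump: hypothetical thresholds chosen for illustration
    """
    chromosomes = {chrom for chrom, _, _ in fragments}
    if len(chromosomes) > 1:
        return "chimeric"  # fragments from different chromosomes

    ordered = sorted(fragments, key=lambda frag: frag[1])
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > max_jump:
            return "chimeric"  # fragments from distant regions

    covered = sum(end - start for _, start, end in fragments)
    if covered < min_coverage * motif_length:
        return "partial"  # the remainder of the ncRNA is aligned to a gap
    return "ok"


print(classify_row([("chr1", 100, 200)], 100))
print(classify_row([("chr1", 100, 140)], 100))
print(classify_row([("chr1", 100, 150), ("chr7", 900, 950)], 100))
```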
  • Structural approaches to distinguish ncRNAs
    • Using CMfinder and other structural programs to classify transcripts as ncRNAs is likely to lead to significant false-positive rates, since conserved secondary structures are also commonly found in mRNAs (especially 3' UTRs).
    • Functional ncRNAs may also contain secondary or tertiary structures with non-canonical base interactions that are not considered by structural prediction programs.
  • Machine Learning
    • CMfinder integrated motif features for scoring by machine-learning algorithms (Support Vector Machine).
    • But these methods did not perform well because of:
    • heterogeneity of the features
    • limitations of the available training data
  • Suggestions
    • Limiting the search to the most promising regions.
    • I suggest the CFTR region (syntenic, few duplications, higher quality of annotation and well conserved)
    • Using longer blocks of locally aligned sequences (>300 nt).
  • Thank You
    • Questions