Published in: Technology, Education

  1. 1. Paper Critique: Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Amer Talal Wazwaz, 18/4/2011
  2. 3. Introduction  <ul><li>ENCODE: ENCyclopedia Of DNA Elements </li></ul><ul><li>A pilot project to identify the functional elements in the human genome sequence. </li></ul><ul><li>Non-coding RNAs (ncRNAs) </li></ul><ul><li>A major challenge in these projects is annotating the large number of non-coding RNAs. </li></ul><ul><li>The steadily increasing number of discovered ncRNAs has dramatically changed views on their roles and importance. </li></ul><ul><li>ncRNAs are difficult to find by computational or experimental means. </li></ul>
  3. 4. Introduction  <ul><li>Computationally finding ncRNAs is difficult because </li></ul><ul><li>one has to consider secondary structure as well as nucleotide sequence. </li></ul><ul><li>Structure can, however, be detected more reliably from a set of related sequences, if available. </li></ul><ul><li>A common recent approach is to align the sequences first, then infer RNA structure from the alignment. </li></ul>
  4. 5. Introduction   <ul><li>This study describes the first large-scale search for structured ncRNAs in several vertebrate genomes </li></ul><ul><li>using </li></ul><ul><li>a local structural motif-finding algorithm, which identified several thousand novel candidate ncRNAs. </li></ul>
  5. 6. Materials and Methods <ul><li>They used CMfinder, a structure-oriented RNA motif prediction tool, to search the ENCODE regions in multiple alignments of certain vertebrate genomes. </li></ul><ul><li>The CMfinder scan was designed as a complement to the RNAz / EvoFold scans of the ENCODE regions. </li></ul><ul><li>They obtained their candidates from the multiple alignment </li></ul><ul><li>blocks of the UCSC MULTIZ alignments, one block at a time </li></ul><ul><li>(155 nt long on average). </li></ul>
  6. 7. Materials and Methods <ul><li>A group of 11 high-scoring ncRNA candidates was chosen for experimental verification and tested by RT-PCR and Northern blotting. </li></ul><ul><li>10 were confirmed to be present as RNA transcripts in certain tissues. </li></ul><ul><li>The experimental verification shows evidence of significant differential expression across tissues. </li></ul>
  7. 8. Results <ul><li>They found a large number of potential ncRNAs in the ENCODE regions. </li></ul><ul><li>They reported 6587 candidate regions with an estimated false-positive rate of 50%. </li></ul><ul><li>With their new candidates they increased the number of ncRNA candidates in the ENCODE regions by 32%. </li></ul>
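As a quick worked check of the numbers on this slide, the sketch below estimates how many of the reported candidates would be expected to be genuine. It assumes the 50% false-positive rate applies uniformly to all 6587 candidate regions, which is a simplification made here for illustration and not a claim from the paper.

```python
# Figures taken from the slide; the uniform-rate assumption is ours.
candidate_regions = 6587
false_positive_rate = 0.50

# Expected split of the candidates under that assumption.
expected_false = candidate_regions * false_positive_rate
expected_true = candidate_regions - expected_false

print(f"expected false positives: about {expected_false}")
print(f"expected true candidates: about {expected_true}")
```

In other words, roughly half of the candidate list (on the order of 3,300 regions) would be expected to survive further validation.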
  8. 9. Discussion <ul><li>To demonstrate the possible benefits of structure-aware alignment, they examined MULTIZ multiple alignment blocks identified by Wang et al. (2007) </li></ul><ul><li>with good matches to the Rfam model in all species in the same region of the alignment. </li></ul><ul><li>They reported that CMfinder’s alignment of the region differs from the MULTIZ alignment in only 13% of the positions. </li></ul>
  9. 10. Discussion <ul><li>It is also alignment-independent </li></ul><ul><li>CMfinder </li></ul><ul><li>ignores a sequence if it does not contain the motif, and still reports a high-scoring motif for the rest of the sequences </li></ul><ul><li>CMfinder also </li></ul><ul><li>does not remove individual sequences with >25% and >20% gaps, as RNAz and EvoFold do, respectively </li></ul>
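The gap-based filtering contrasted on this slide can be sketched as follows. This is a minimal illustration, not the actual RNAz or EvoFold code: the thresholds (25% and 20%) come from the slide, while the function names, the definition of gap fraction (gap characters divided by alignment length), and the toy alignment block are assumptions made for the example.

```python
from typing import Dict


def gap_fraction(aligned_seq: str, gap_chars: str = "-.") -> float:
    """Fraction of alignment columns in which this row has a gap (assumed definition)."""
    if not aligned_seq:
        return 0.0
    gaps = sum(1 for ch in aligned_seq if ch in gap_chars)
    return gaps / len(aligned_seq)


def drop_gappy_sequences(block: Dict[str, str], max_gap_fraction: float) -> Dict[str, str]:
    """Keep only rows whose gap fraction does not exceed the threshold."""
    return {name: seq for name, seq in block.items()
            if gap_fraction(seq) <= max_gap_fraction}


# Hypothetical toy alignment block (species name -> aligned row, 12 columns).
block = {
    "human":   "ACGUACGUACGU",  # 0 of 12 columns are gaps
    "mouse":   "ACGU---UACGU",  # 3 of 12 columns are gaps (25%)
    "chicken": "A----C--ACGU",  # 6 of 12 columns are gaps (50%)
}

rnaz_like = drop_gappy_sequences(block, 0.25)     # rows with >25% gaps removed
evofold_like = drop_gappy_sequences(block, 0.20)  # rows with >20% gaps removed
cmfinder_like = dict(block)                       # no gap-based removal

print(sorted(rnaz_like), sorted(evofold_like), sorted(cmfinder_like))
```

With this toy block, the stricter threshold discards a row that the looser one keeps, while the no-filter case retains every row and leaves it to the motif finder to decide which sequences actually contain the motif.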
  10. 11. Discussion <ul><li>Although MULTIZ alignments are usually quite accurate in these challenging cases, and serve as </li></ul><ul><li>evidence of cross-species conservation of each motif instance, </li></ul><ul><li>several studies have occasionally revealed compelling evidence of misalignment. </li></ul><ul><li>Even small misalignments can adversely affect biological inferences. </li></ul><ul><li>Two main misalignment categories: </li></ul><ul><li>&quot;partial alignments&quot; and &quot;chimeric alignments&quot; </li></ul>
  11. 12. &quot;partial alignments&quot; <ul><li>Comprise 5.1% of the MULTIZ sequences. </li></ul><ul><li>The sequence aligned to the ncRNA includes a large gap, within the same species or among species. </li></ul><ul><li>The aligned fragment by itself does not pass the threshold of certain tests for ncRNA family membership. </li></ul>
  12. 13. &quot;chimeric alignments&quot; <ul><li>Comprise 5.4% of the MULTIZ sequences. </li></ul><ul><li>The sequence aligned to the ncRNA is not contiguous. Instead, it is composed of sequence fragments from different regions or even different chromosomes. </li></ul><ul><li>None of these fragments individually passes the threshold of certain tests for ncRNA family membership. </li></ul>
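The two misalignment categories described on the preceding slides can be illustrated with a small classifier. This is a hedged sketch only: the `Fragment` type, the function names, and both thresholds (`large_gap`, `max_join_distance`) are hypothetical choices made for this example, and the paper's own criteria (including the ncRNA family membership tests) are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """Source coordinates of one piece of an aligned row (hypothetical type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def longest_gap_run(aligned_seq: str, gap_char: str = "-") -> int:
    """Length of the longest run of consecutive gap characters in a row."""
    longest = current = 0
    for ch in aligned_seq:
        current = current + 1 if ch == gap_char else 0
        longest = max(longest, current)
    return longest


def classify_row(aligned_seq: str, fragments: List[Fragment],
                 large_gap: int = 20, max_join_distance: int = 100) -> str:
    """Label a row 'chimeric', 'partial', or 'contiguous' (illustrative thresholds)."""
    # Chimeric: pieces come from different chromosomes ...
    if len({f.chrom for f in fragments}) > 1:
        return "chimeric"
    # ... or from distant regions of the same chromosome.
    ordered = sorted(fragments, key=lambda f: f.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start - prev.end > max_join_distance:
            return "chimeric"
    # Partial: a single source region, but the aligned row has a large gap.
    if longest_gap_run(aligned_seq) >= large_gap:
        return "partial"
    return "contiguous"


# Hypothetical examples.
row_ok = "ACGUACGUAC"
row_gappy = "ACGU" + "-" * 25 + "GUAC"
print(classify_row(row_ok, [Fragment("chr7", 100, 110)]))
print(classify_row(row_gappy, [Fragment("chr7", 100, 108)]))
print(classify_row(row_ok, [Fragment("chr7", 100, 105), Fragment("chr2", 900, 905)]))
```

Checking source coordinates first and gap runs second is a design choice of this sketch; it means a row stitched together from several loci is reported as chimeric even if it also contains a long gap.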
  13. 14. Structural approaches to distinguish ncRNAs <ul><li>Using CMfinder and other structural programs to classify transcripts as ncRNAs is likely to lead to significant false-positive rates, </li></ul><ul><li>since conserved secondary structures are also commonly found in mRNAs (especially 3’ UTRs). </li></ul><ul><li>Functional ncRNAs may also contain secondary or tertiary structures with non-canonical base interactions that are not considered by structure prediction programs. </li></ul>
  14. 15. Machine Learning <ul><li>CMfinder </li></ul><ul><li>Integrated motif features for scoring </li></ul><ul><li>by machine-learning algorithms </li></ul><ul><li>(Support Vector Machine) </li></ul><ul><li>but these methods did not perform well </li></ul><ul><li>because of </li></ul><ul><li>the heterogeneity of the features </li></ul><ul><li>the limitations of the available training data </li></ul>
  15. 16. Suggestions <ul><li>Limit the search to the most promising regions. </li></ul><ul><li>I suggest the CFTR region (syntenic, few duplications, higher-quality annotation, and well conserved). </li></ul><ul><li>Use longer blocks (locally aligned sequences) </li></ul><ul><li>>300 nt </li></ul>
  16. 17. Thank You <ul><li>Questions </li></ul>