Published in: Technology, Education
  1. 1. Paper Critique: "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions". Amer Talal Wazwaz, [email_address], [email_address], 18/4/2011
  2. 3. Introduction <ul><li>ENCODE: ENCyclopedia Of DNA Elements. </li></ul><ul><li>A pilot project (44 target regions covering about 1% of the human genome) to identify the functional elements in genome sequences. </li></ul><ul><li>Non-coding RNAs (ncRNAs) </li></ul><ul><li>A major challenge in these projects is to annotate the large number of ncRNAs. </li></ul><ul><li>The steadily increasing number of discovered ncRNAs has dramatically changed views on their roles and importance. </li></ul><ul><li>ncRNAs are difficult to find by computational or experimental means. </li></ul>
  3. 4. Introduction <ul><li>Computationally finding ncRNAs is difficult because one has to consider secondary structure as well as nucleotide sequence. </li></ul><ul><li>Structure can, however, be detected more reliably from a set of related sequences, when such a set is available. </li></ul><ul><li>The common recent approach is to align the sequences first and then infer RNA structure from that alignment. </li></ul>
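Why related sequences help: if two alignment columns keep pairing even though their nucleotides change across species (e.g. G-C in one, C-G in another), that is evidence of a conserved structure which sequence identity alone would miss. A minimal sketch, assuming the alignment is a plain list of equal-length strings; `column_pair_support` is a hypothetical helper, not part of any tool named in the deck, and counting distinct pair types is only a crude proxy for real covariation scoring.

```python
# Base pairs allowed by typical secondary-structure models:
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_support(alignment, i, j):
    """Return (pairing, pair_types) for alignment columns i and j.

    pairing:    number of sequences whose nucleotides at i and j can form a canonical pair
    pair_types: number of distinct canonical pairs seen; more than one means the columns
                changed across species while still pairing (a crude covariation signal)
    Gaps and non-pairing combinations are simply not counted.
    """
    observed = []
    for row in alignment:
        a = row[i].upper().replace("T", "U")  # treat DNA input as RNA
        b = row[j].upper().replace("T", "U")
        if (a, b) in CANONICAL:
            observed.append((a, b))
    return len(observed), len(set(observed))


# A stem column pair that switches G-C -> C-G -> U-G across three species.
print(column_pair_support(["GAAAC", "CAAAG", "UAAAG"], 0, 4))  # (3, 3)
```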
  4. 5. Introduction <ul><li>This study describes the first large-scale search for structured ncRNAs in several vertebrate genomes using a local structural motif-finding algorithm, which identified several thousand novel candidate ncRNAs. </li></ul>
  5. 6. Materials and Methods <ul><li>They used CMfinder, a structure-oriented RNA motif prediction tool, to search the ENCODE regions of certain vertebrate multiple alignments. </li></ul><ul><li>The CMfinder scan was designed as a complement to the RNAz and EvoFold scans of the ENCODE regions. </li></ul><ul><li>Candidates were obtained from the UCSC MULTIZ multiple alignment blocks, processing one block at a time (blocks are 155 nt long on average). </li></ul>
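UCSC distributes its MULTIZ alignments in MAF (Multiple Alignment Format), where an `a` line opens a block and each `s` line holds one species' aligned sequence (source, start, ungapped size, strand, source size, aligned text). A minimal sketch of splitting such a file into blocks, assuming well-formed seven-field `s` lines; `maf_blocks` and `mean_reference_length` are hypothetical helpers, and taking the first `s` line's ungapped size as the "block length" is an assumption about how the 155 nt average was measured.

```python
from statistics import mean


def maf_blocks(lines):
    """Yield MAF alignment blocks as lists of (src, start, size, strand, text) tuples."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":          # an 'a' line opens a new alignment block
            if block:
                yield block
            block = []
        elif fields[0] == "s":        # an 's' line holds one species' aligned sequence
            _, src, start, size, strand, _src_size, text = fields
            block.append((src, int(start), int(size), strand, text))
        # other line types ('#', 'i', 'e', 'q') are ignored in this sketch
    if block:
        yield block


def mean_reference_length(lines):
    """Mean ungapped size of the first 's' line (the reference species) per block."""
    return mean(block[0][2] for block in maf_blocks(lines))
```

Processing "one block at a time" then amounts to looping over `maf_blocks(open(path))` and running the motif search on each block's sequences separately.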
  6. 7. Materials and Methods <ul><li>A group of 11 high-scoring ncRNA candidates was chosen for experimental verification by RT-PCR and Northern blotting. </li></ul><ul><li>10 of the 11 were confirmed to be present as RNA transcripts in certain tissues. </li></ul><ul><li>The experimental verification also shows evidence of significant differential expression across tissues. </li></ul>
  7. 8. Results <ul><li>They found a large number of potential ncRNAs in the ENCODE regions. </li></ul><ul><li>They reported 6587 candidate regions with an estimated false-positive rate of 50%. </li></ul><ul><li>These new candidates increased the number of ncRNA candidates in the ENCODE regions by 32%. </li></ul>
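Reading the 50% figure as the expected fraction of reported candidates that are false (a false discovery rate; this is an interpretation, not something the slide states), roughly half of the candidate regions should be genuine:

```latex
\mathbb{E}[\text{true candidates}] \approx N\,(1 - \mathrm{FDR}) = 6587 \times (1 - 0.5) = 3293.5 \approx 3.3 \times 10^{3}
```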
  8. 9. Discussion <ul><li>To demonstrate the possible accuracy benefits of structure-aware alignment, they examined the MULTIZ alignment blocks identified by Wang et al. (2007) </li></ul><ul><li>that have good matches to the Rfam model in all species in the same region of the alignment. </li></ul><ul><li>They report that CMfinder's alignment of the region differs from the MULTIZ alignment in only 13% of the positions. </li></ul>
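One way to quantify how much two alignments "differ in X% of the positions": for each residue, compare the set of residues it is aligned with in each alignment. A minimal sketch, assuming both alignments contain exactly the same ungapped sequences in the same row order; the paper's exact comparison measure may differ, and `column_partners` and `fraction_differing` are hypothetical helpers.

```python
def column_partners(alignment):
    """Map each residue (row, ungapped index) to the set of residues in its column.

    Gap characters ('-' or '.') are skipped, so residues are numbered in ungapped coordinates.
    """
    counters = [0] * len(alignment)
    partners = {}
    for col in range(len(alignment[0])):
        in_column = []
        for r, row in enumerate(alignment):
            if row[col] not in "-.":
                in_column.append((r, counters[r]))
                counters[r] += 1
        for res in in_column:
            partners[res] = frozenset(x for x in in_column if x != res)
    return partners


def fraction_differing(aln_a, aln_b):
    """Fraction of residues whose aligned partners differ between two alignments."""
    pa, pb = column_partners(aln_a), column_partners(aln_b)
    if pa.keys() != pb.keys():
        raise ValueError("alignments must contain the same ungapped sequences")
    differing = sum(1 for res in pa if pa[res] != pb[res])
    return differing / len(pa)


# Shifting one G out of register changes the partners of 2 of the 8 residues.
print(fraction_differing(["ACGU", "ACGU"], ["AC-GU", "ACG-U"]))  # 0.25
```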
  9. 10. Discussion <ul><li>CMfinder is also alignment-independent: </li></ul><ul><li>it can ignore a sequence that does not contain the motif and still report a high-scoring motif for the remaining sequences. </li></ul><ul><li>Unlike RNAz and EvoFold, which remove individual sequences with more than 25% and 20% gaps, respectively, CMfinder does not discard gappy sequences. </li></ul>
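The contrast with RNAz and EvoFold is a pre-filter: rows that are too gappy are discarded before scoring, so a homolog that is present but badly aligned is lost. A minimal sketch of such a filter, using the 25% and 20% thresholds quoted on the slide; how each tool actually counts gaps (per window, per row, which gap symbols) is an assumption here, and `drop_gappy_sequences` is a hypothetical helper.

```python
RNAZ_MAX_GAP = 0.25     # threshold quoted on the slide for RNAz
EVOFOLD_MAX_GAP = 0.20  # threshold quoted on the slide for EvoFold


def drop_gappy_sequences(alignment, max_gap_fraction):
    """Keep only the rows whose fraction of gap characters is at most max_gap_fraction."""
    kept = []
    for row in alignment:
        gaps = sum(1 for c in row if c in "-.")
        if gaps / len(row) <= max_gap_fraction:
            kept.append(row)
    return kept


aln = ["ACGUACGU", "AC--ACGU", "A---ACGU"]  # gap fractions 0, 0.25, 0.375
print(drop_gappy_sequences(aln, RNAZ_MAX_GAP))     # ['ACGUACGU', 'AC--ACGU']
print(drop_gappy_sequences(aln, EVOFOLD_MAX_GAP))  # ['ACGUACGU']
```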
  10. 11. Discussion <ul><li>Although MULTIZ is usually quite accurate, and its alignment is often taken as evidence of cross-species conservation of each motif instance, </li></ul><ul><li>in these challenging cases several studies have revealed compelling evidence of misalignment. </li></ul><ul><li>Even small misalignments can undermine the biological inferences drawn from an alignment. </li></ul><ul><li>Two main misalignment categories: </li></ul><ul><li>&quot;partial alignments&quot; and &quot;chimeric alignments&quot; </li></ul>
  11. 12. &quot;partial alignments&quot; <ul><li>Comprise 5.1% of the MULTIZ sequences. </li></ul><ul><li>The sequence aligned to the ncRNA contains a large gap, within the same species or among species. </li></ul><ul><li>The aligned fragment by itself does not pass the threshold of the tests used for ncRNA family membership. </li></ul>
  12. 13. &quot;chimeric alignments&quot; <ul><li>Comprise 5.4% of the MULTIZ sequences. </li></ul><ul><li>What is aligned to the ncRNA is not a contiguous sequence; instead, it is composed of sequence fragments from different regions or even different chromosomes. </li></ul><ul><li>None of these fragments individually passes the threshold of the tests used for ncRNA family membership. </li></ul>
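The two misalignment categories can be told apart by where the aligned pieces come from. A minimal sketch, assuming each species' aligned sequence is given as genomic segments in the order they align along the ncRNA; the `Segment` type, the `classify` function and both thresholds are hypothetical and illustrative only, and the sketch does not reproduce the ncRNA family-membership tests.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    strand: str
    start: int  # genomic start, 0-based
    end: int    # genomic end, exclusive


def classify(segments, ncrna_length, max_jump=1000, min_coverage=0.8):
    """Label one species' aligned sequence as 'chimeric', 'partial' or 'ok'.

    max_jump and min_coverage are illustrative thresholds, not values from the paper.
    """
    for prev, cur in zip(segments, segments[1:]):
        # Genomic separation between consecutive pieces, valid for either orientation.
        separation = max(cur.start - prev.end, prev.start - cur.end)
        if cur.chrom != prev.chrom or cur.strand != prev.strand or separation > max_jump:
            return "chimeric"  # pieces come from different loci or chromosomes
    covered = sum(s.end - s.start for s in segments)
    if covered < min_coverage * ncrna_length:
        return "partial"       # a large gap leaves too little of the ncRNA aligned
    return "ok"
```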
  13. 14. Structural approaches to distinguish ncRNAs <ul><li>Classifying transcripts as ncRNAs with CMfinder and other structure-based programs is likely to produce a significant number of false positives, </li></ul><ul><li>since conserved secondary structures are also common in mRNAs (especially in 3’ UTRs). </li></ul><ul><li>Conversely, functional ncRNAs may contain secondary or tertiary structures with non-canonical base interactions that structure prediction programs do not consider, so such ncRNAs can be missed. </li></ul>
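Typical secondary-structure folding models restrict base pairs to Watson-Crick and G-U wobble pairs, so any other interaction in a real ncRNA is invisible to them. A minimal sketch that lists the pairs of a dot-bracket structure falling outside that canonical set; the helper names are hypothetical, and the dot-bracket input is assumed to be balanced and pseudoknot-free (only `(`, `)` and `.`).

```python
ALLOWED = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus G-U wobble


def paired_positions(dot_bracket):
    """Return sorted (i, j) index pairs from a balanced dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def noncanonical_pairs(seq, dot_bracket):
    """List (i, j, pair) for base pairs a canonical-only model could not represent."""
    seq = seq.upper().replace("T", "U")
    return [(i, j, seq[i] + seq[j]) for i, j in paired_positions(dot_bracket)
            if seq[i] + seq[j] not in ALLOWED]


print(noncanonical_pairs("GAGAAAAUC", "(((...)))"))  # [(2, 6, 'GA')]
```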
  14. 15. Machine Learning <ul><li>In the CMfinder scan, motif features were integrated for scoring by machine-learning algorithms, such as a Support Vector Machine (SVM). </li></ul><ul><li>BUT these methods did not perform well, because of: </li></ul><ul><li>heterogeneity of the features </li></ul><ul><li>limitations of the available training data </li></ul>
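As a rough illustration of what an SVM-based scorer computes from motif features, here is a minimal linear soft-margin SVM trained by plain full-batch subgradient descent on the hinge loss. The feature vectors, hyperparameters and training method are all assumptions for illustration; the slide does not describe the setup used in the paper.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit (w, b) minimizing lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w, b, x):
    """Decision value; positive means the model places x on the ncRNA side."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy two-feature data (hypothetical): +1 = ncRNA-like motif, -1 = background.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```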
  15. 16. Suggestions <ul><li>Limit the search to the most promising regions. </li></ul><ul><li>I suggest the CFTR region (syntenic, few duplications, higher-quality annotation, and well conserved). </li></ul><ul><li>Use longer blocks (locally aligned sequences) of more than 300 nt. </li></ul>
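One way to obtain the longer (more than 300 nt) regions suggested on this slide is to join alignment blocks that lie close together on the reference genome. A minimal sketch, assuming blocks are given as reference `(start, end)` intervals; `merge_blocks` and its `max_gap` threshold are hypothetical choices, and merging on the reference alone does not guarantee that the other species' sequences are contiguous.

```python
def merge_blocks(blocks, max_gap=50, min_length=300):
    """Merge nearby reference intervals into windows; keep windows >= min_length nt.

    blocks: (start, end) intervals on the reference genome, end exclusive.
    max_gap is an illustrative threshold; min_length follows the 300 nt suggestion.
    """
    windows = []
    for start, end in sorted(blocks):
        if windows and start - windows[-1][1] <= max_gap:
            windows[-1][1] = max(windows[-1][1], end)  # extend the current window
        else:
            windows.append([start, end])              # start a new window
    return [(s, e) for s, e in windows if e - s >= min_length]


print(merge_blocks([(0, 150), (170, 320), (1000, 1100)]))  # [(0, 320)]
```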
  16. 17. Thank You <ul><li>Questions </li></ul>