Welch Wordifier Bosc2009
Published in: Technology
Transcript

  • 1. "Junk" DNA Proves to be Highly Valuable [1]
       What was once thought of as DNA with zero value in plants--dubbed "junk" DNA--may turn out to be key in helping scientists improve the control of gene expression in transgenic crops. [2]
       Cooper and collaborators investigated "junk" DNA in the model plant Arabidopsis thaliana, using a computer program to find short segments of DNA that appeared as molecular patterns… These linked patterns are called pyknons…
       This discovery in plants illustrates that the link between coding DNA and junk DNA crosses higher orders of biology and suggests a universal genetic mechanism at play that is not yet fully understood.
       [1] Alfredo Flores, June 2, 2009; http://www.ars.usda.gov/is/pr/2009/090602.htm.
       [2] Bret Cooper, Soybean Genomics and Improvement Laboratory, Agricultural Research Service, USDA.
  • 2. “Perhaps it is time to bid farewell to the term ‘junk’ DNA – we knew not your true nature.”
       (Regulatory RNAs and the demise of ‘junk’ DNA. Genome Biology 2006, 7:328)
       [Diagram labels: The genome; genes; Functional elements?; Functional Elements: 90%?? Junk: 10%??]
       "...a certain amount of hubris was required for anyone to call any part of the genome 'junk,' given our level of ignorance." (Francis Collins, 2006)
  • 3. Fig. 1. Pyknons in the 3' UTRs of the apoptosis inhibitor birc4 (shown above the horizontal line) and nine other genes
       Rigoutsos, Isidore et al. (2006) Proc. Natl. Acad. Sci. USA 103, 6605-6610
       Copyright ©2006 by the National Academy of Sciences
  • 4.
  • 5.
  • 6. WordSeeker: A Software Suite for Discovery and Characterization of Genomic Words and Genome-Wide Patterns
  • 7. www.word-seeker.org
  • 8. Word discovery methods
       [Diagram: taxonomy of word discovery methods. Labels: sequence-driven (alignment-based); pattern-driven (enumerative); exhaustive; optimized; probabilistic optimization; deterministic optimization; preprocess; combine short patterns; heuristic; exact. Tools named: YMF; AlignAce; MEME; WINNOWER; Teiresias, WordSeeker; suffix tree, Weeder]
       GuhaThakurta D., Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res. 2006 Jul 19;34(12):3585-98.
       Sandve GK, Drabløs F., A survey of motif discovery methods in an integrated framework. Biol Direct. 2006 Apr 6;1:11.
  • 9. The WORDIFIER Pattern for Functional and Regulatory Genomics
       [Diagram labels: sequence(s); WORDIFIER; words; scientist; scientist]
  • 10. OWEF: An Open Source Word Enumeration Framework for Bioinformatics
       Kyle Kurz, Lonnie R. Welch, Frank Drews, Lee Nau, Jens Lichtenberg
       Ohio University School of EECS, Bioinformatics Laboratory
  • 11. Motivation
       - Create a robust Motif Discovery framework using abstracted core algorithms
       - Use a modular design, allowing new methods and algorithms to be implemented quickly and easily
       - Abstract C++ classes
       - Easily extensible
       - Support the Scientific Discovery process
  • 12. Approach
  • 13. Project Information
       Project: http://bio-s1.cs.ohiou.edu/~wordseek/download/
       Open Source License: GNU General Public License (GPL v3)
       Language: C++
       Applications: Currently in final testing phase
       Future Work:
       - Will provide backend for WordSeeker tool at Ohio University and Ohio Supercomputer Center
       - Will be used to fully analyze the Arabidopsis thaliana genome
  • 14. Open Source Implementation of Batch Extraction for Coding and Non-Coding Sequences
       Jens Lichtenberg, Lonnie R. Welch
       Bioinformatics Laboratory, School of EECS, Ohio University
  • 15. Motivation
       - Regulatory Genomics tools return and operate on lists of Gene Symbols (e.g. STAT5A, Cd59a, Slc35f4)
       - To our knowledge, there is no currently supported, open source “tool” that allows extraction of specific non-coding sequences for any organism
       - Ensembl API provides limited functionality
  • 16. Approach
       [Flowchart labels: Input; Output; connect to Ensembl database; Set up repository; Retrieve Gene Adaptor; create gene object; Gene Symbol; Retrieve 5’UTR; Retrieve 3’UTR; Retrieve Exons; Retrieve Upstream Adaptor; Retrieve Introns; Retrieve Promoter; Promoter length; Output Files]
  • 17. Project Information
       Project: http://opensource.msseeker.org
       GNU General Public License (GPL)
       Language: Perl
       Integrated in WordSeeker motif discovery tool of Ohio University Bioinformatics Lab
       Future Work:
       - Connection to Genbank repository information
       - Release into BioPerl or CPAN
  • 18. Acknowledgements
       Thomas Bitterman, OSC; Laura Elnitski, NHGRI; Susan Evans, OU; Matt Geisler, SIU; Erich Grotewold, OSU; Edwin Jacox, NHGRI; Stephen S. Lee, U. Idaho; Pooja M. Majmudar, OU; Paul Morris, BGSU; Chase Nelson, Oberlin; Eric Stockinger, OSU; Sarah Wyatt, OU; Alper Yilmaz, OSU; Jeffrey Parvin, OSU; Kun Huang, OSU; Thomas Mitchell, OSU; Kengo Morohashi, OSU; Rebecca Lamb, OSU; John Finer, OSU
       Lonnie Welch; Jens Lichtenberg; Rami Alouran; Frank Drews; Kyle Kurz; Xiaoyu Liang; Lee Nau; Matt Wiley; Razvan Bunescu; Joshua D. Welch; Klaus Ecker; Mohit Alam; Nathaniel George; Dazhang Gu; Eric Petri; Josiah Seaman; Kaiyu Shen
       [Group headings on the slide: Collaborators; WordSeeker Team; Former Members of the team]
  • 35. A pattern “describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use the solution a million times over, without ever doing it the same way twice” [1].
       [1] C. Alexander, S. Ishikawa, and M. Silverstein, A Pattern Language: Towns, Buildings, Construction. Oxford University Press, 1977.
  • 36. Alexander Pattern Format
       Picture: a representative example
       Introductory paragraph: sets the context
       Headline: the essence of the problem in one or two sentences
       Body:
       - empirical background of the pattern
       - evidence for its validity
       - range of different ways the pattern can be manifested
       Solution:
       - relationships which are required to solve the stated problem in the stated context
       - stated in the form of an instruction, so that you know exactly what you need to do to build the pattern
       Diagram: shows the solution, with labels to indicate its main components
       A paragraph which ties the pattern to all those smaller patterns in the language, which are needed to complete this pattern, to embellish it, to fill it out…
  • 40. The WORDIFIER Pattern for Functional and Regulatory Genomics: Picture, Introduction, Headline
       With the availability of the genomic sequences of numerous organisms, life scientists are working in conjunction with bioinformaticians to decipher the meanings of the genomes. Projects such as the Encyclopedia of DNA Elements (ENCODE) [2] and Pyknons [3] seek to identify and characterize the functional elements in genomes. The functional elements are often referred to as words.
       Given a genomic sequence (or a set of sequences), an important problem is the enumeration of all subsequences (words) contained in the sequence (or the set of sequences).
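
The enumeration problem stated in the headline can be sketched in a few lines. This is only an illustration, not the WordSeeker or OWEF implementation: it assumes that a "word" is a contiguous subsequence, that sequences are plain strings, and that counting occurrences per word is the desired output. The function name `enumerate_words` and its `min_len` / `max_len` parameters are hypothetical.

```python
from collections import Counter


def enumerate_words(sequences, min_len=1, max_len=None):
    """Count every contiguous subsequence (word) in a set of sequences.

    Hypothetical helper for illustration; it is a brute-force scan, not
    the suffix-tree style enumeration a production tool would likely use.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()  # treat 'acgt' and 'ACGT' as the same alphabet
        longest = len(seq) if max_len is None else min(max_len, len(seq))
        for k in range(min_len, longest + 1):
            # Slide a window of width k across the sequence.
            for start in range(len(seq) - k + 1):
                counts[seq[start:start + k]] += 1
    return counts


if __name__ == "__main__":
    words = enumerate_words(["ACGTAC"], min_len=2, max_len=3)
    for word, count in sorted(words.items()):
        print(word, count)
```

A brute-force scan like this touches every window of every length, so it is practical only for short sequences or a narrow range of word lengths; it is meant to make the problem statement concrete rather than to suggest how a genome-scale tool should be built.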
