Welch Wordifier Bosc2009
Presentation Transcript

  • "Junk" DNA Proves to be Highly Valuable¹
    What was once thought of as DNA with zero value in plants--dubbed "junk" DNA--may turn out to be key in helping scientists improve the control of gene expression in transgenic crops.²
    Cooper and collaborators investigated "junk" DNA in the model plant Arabidopsis thaliana, using a computer program to find short segments of DNA that appeared as molecular patterns… These linked patterns are called pyknons…
    This discovery in plants illustrates that the link between coding DNA and junk DNA crosses higher orders of biology and suggests a universal genetic mechanism at play that is not yet fully understood.
    ¹ Alfredo Flores, June 2, 2009; http://www.ars.usda.gov/is/pr/2009/090602.htm.
    ² Bret Cooper, Soybean Genomics and Improvement Laboratory, Agricultural Research Service, USDA.
  • “Perhaps it is time to bid farewell to the term ‘junk’ DNA – we knew not your true nature.”
    (Regulatory RNAs and the demise of ‘junk’ DNA. Genome Biology 2006, 7:328)
    The genome
    genes
    Functional elements?
    Functional Elements: 90%?? Junk: 10%??
    "...a certain amount of hubris was required
    for anyone to call any part of the genome 'junk,'
    given our level of ignorance."(Francis Collins, 2006)
  • Fig. 1. Pyknons in the 3' UTRs of the apoptosis inhibitor birc4 (shown above the horizontal line) and nine other genes
    Rigoutsos, Isidore et al. (2006) Proc. Natl. Acad. Sci. USA 103, 6605-6610
    Copyright ©2006 by the National Academy of Sciences
  • WordSeeker: A Software Suite for Discovery and Characterization of Genomic Words and Genome-Wide Patterns
  • www.word-seeker.org
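  • word discovery methods
    [Diagram: taxonomy of word discovery methods; the hierarchy is not recoverable from the transcript, only its labels]
    sequence-driven (alignment-based)
    pattern-driven (enumerative)
    exhaustive
    optimized
    probabilistic optimization
    deterministic optimization
    preprocess
    combine short patterns
    heuristic
    exact
    Methods and tools named: YMF, AlignAce, MEME, WINNOWER, Teiresias, WordSeeker, suffix tree, Weeder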
    GuhaThakurta D., Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res. 2006 Jul 19;34(12):3585-98. Print 2006. Review.
    Sandve GK, Drabløs F., A survey of motif discovery methods in an integrated framework. Biol Direct. 2006 Apr 6;1:11.
  • The WORDIFIER Pattern
    for Functional and Regulatory Genomics
    sequence(s)
    words
    WORDIFIER
    scientist
    scientist
  • OWEF: An Open Source Word Enumeration Framework for Bioinformatics
    Kyle Kurz, Lonnie R. Welch,
    Frank Drews, Lee Nau,
    Jens Lichtenberg
    Ohio University School of EECS
    Bioinformatics Laboratory
  • Motivation
    Create a robust Motif Discovery framework using abstracted core algorithms
    Use a modular design, allowing new methods and algorithms to be implemented quickly and easily
    Abstract C++ classes
    Easily extensible
    Support the Scientific Discovery process
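As a rough illustration of the modular design described above, the following sketch shows an abstract core interface with one interchangeable implementation. It is written in Python for brevity, whereas the framework itself is stated to use abstract C++ classes; the names `WordCounter`, `SlidingWindowCounter`, and `top_words` are hypothetical and are not taken from OWEF.

```python
from abc import ABC, abstractmethod
from collections import Counter


class WordCounter(ABC):
    """Hypothetical abstract core: counts words in a set of sequences."""

    @abstractmethod
    def count(self, sequences, length):
        """Return a mapping from each word of the given length to its count."""


class SlidingWindowCounter(WordCounter):
    """One interchangeable implementation: a simple sliding window."""

    def count(self, sequences, length):
        counts = Counter()
        for seq in sequences:
            # Visit every start position that leaves room for a full word.
            for i in range(len(seq) - length + 1):
                counts[seq[i:i + length]] += 1
        return counts


def top_words(counter, sequences, length, n):
    """Caller code depends only on the abstract interface, not on how
    the counting is done."""
    return counter.count(sequences, length).most_common(n)


result = top_words(SlidingWindowCounter(), ["ACGTACGT", "TTACGA"], 3, 2)
print(result[0])  # the most frequent 3-letter word and its count
```

Under this arrangement a different counting strategy could be swapped in by adding another subclass, without touching `top_words`.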
  • Approach
  • Project Information
    Project:
    http://bio-s1.cs.ohiou.edu/~wordseek/download/
    Open Source License:
    GNU General Public License (GPL v3)
    Language:
    C++
    Applications:
    Currently in final testing phase
    Future Work:
    Will provide backend for WordSeeker tool at Ohio University and Ohio Supercomputer Center
    Will be used to fully analyze the Arabidopsis thaliana genome
  • Open Source Implementation of Batch Extraction for Coding and Non-Coding Sequences
    Jens Lichtenberg, Lonnie R. Welch
    Bioinformatics Laboratory
    School of EECS
    Ohio University
  • Motivation
    Regulatory Genomics tools return and operate on lists of Gene Symbols (e.g. STAT5A, Cd59a, Slc35f4)
    To our knowledge, there is no currently supported, open source “tool” that allows extraction of specific non-coding sequences for any organism
    Ensembl API provides limited functionality
  • Approach
    connect to
    Ensembl database
    Input
    Output
    Set up repository
    Retrieve Gene Adaptor
    create gene object
    Gene Symbol
    Retrieve 5’UTR
    Retrieve 3’UTR
    Retrieve Exons
    Retrieve Upstream Adaptor
    Retrieve Introns
    Retrieve Promoter
    Promoter length
    Output Files
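The retrieval steps above produce regions such as introns and promoters, which can be derived from exon and gene coordinates. The sketch below illustrates only that coordinate arithmetic, in Python rather than the project's Perl, and makes no calls to the Ensembl API. The function names, the 1-based inclusive coordinates, and the assumption of non-overlapping exons from a single transcript are all illustrative choices, not details taken from the tool.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as (start, end) pairs.

    Exons are 1-based inclusive (start, end) pairs and are assumed not
    to overlap; they are sorted before the gaps are computed.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Directly adjacent exons leave no intron between them.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def promoter_region(gene_start, gene_end, strand, length):
    """Return the (start, end) window of `length` bases upstream of a gene.

    On the forward strand (+1) upstream means lower coordinates; on the
    reverse strand (-1) it means higher coordinates. The forward-strand
    start is clamped to 1.
    """
    if strand == 1:
        return (max(1, gene_start - length), gene_start - 1)
    return (gene_end + 1, gene_end + length)


# Hypothetical coordinates for a three-exon gene.
exons = [(100, 200), (300, 400), (500, 650)]
print(introns_from_exons(exons))        # [(201, 299), (401, 499)]
print(promoter_region(100, 650, 1, 50))  # (50, 99)
```

The promoter length is kept as a parameter because the slide lists it as a user input.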
  • Project Information
    Project:
    http://opensource.msseeker.org
    GNU General Public License (GPL)
    Language:
    Perl
    Integrated in WordSeeker motif discovery tool of Ohio University Bioinformatics Lab
    Future Work:
    Connection to Genbank repository information
    Release into BioPerl or CPAN
  • Acknowledgements
    Thomas Bitterman, OSC
    Laura Elnitski, NHGRI
    Susan Evans, OU
    Matt Geisler, SIU
    Erich Grotewold, OSU
    Edwin Jacox, NHGRI
    Stephen S. Lee, U. Idaho
    Pooja M. Majmudar, OU
    Paul Morris, BGSU
    Chase Nelson, Oberlin
    Eric Stockinger, OSU
    Sarah Wyatt, OU
    Alper Yilmaz, OSU
    Jeffrey Parvin, OSU
    Kun Huang, OSU
    Thomas Mitchell, OSU
    Kengo Morohashi, OSU
    Rebecca Lamb, OSU
    John Finer, OSU
    • Lonnie Welch
    • Jens Lichtenberg
    • Rami Alouran
    • Frank Drews
    • Kyle Kurz
    • Xiaoyu Liang
    • Lee Nau
    • Matt Wiley
    • Razvan Bunescu
    • Joshua D. Welch
    • Klaus Ecker
    • Mohit Alam
    • Nathaniel George
    • Dazhang Gu
    • Eric Petri
    • Josiah Seaman
    • Kaiyu Shen
    Collaborators
    WordSeeker Team
    Former Members of the team
  • a pattern “describes a problem which occurs
    over and over again in our environment, and
    then describes the core of the solution to that
    problem, in such a way that you can use the
    solution a million times over, without ever doing
    it the same way twice [1].”
    C. Alexander, S. Ishikawa, and M. Silverstein, A Pattern Language: Towns,
    Buildings, Construction. Oxford University Press, 1977.
  • Alexander Pattern Format
    Picture – a representative example
    Introductory paragraph - sets the context
    
    Headline - the essence of the problem in one or two sentences.
    Body –
    • empirical background of the pattern
    • evidence for its validity
    • range of different ways the pattern can be manifested
    Solution
    • relationships which are required to solve the stated problem in the stated context.
    • stated in the form of an instruction—so that you know exactly what you need to do, to build the pattern
    Diagram - shows the solution, with labels to indicate its main components
    
    A paragraph which ties the pattern to all those smaller patterns in the language, which are needed to complete this pattern, to embellish it, to fill it out…
  • Picture, Introduction, Headline
    With the availability of the genomic sequences of
    numerous organisms, life scientists are working in
    conjunction with bioinformaticians to decipher the
    meanings of the genomes. Projects such as the Encyclopedia of
    DNA Elements (ENCODE) [2] and Pyknons [3] seek to
    identify and characterize the functional elements in genomes.
    The functional elements are often referred to as words.
    Given a genomic sequence (or a set of sequences), an important problem
    is the enumeration of all subsequences (words) contained in the sequence
    (or the set of sequences).
    The WORDIFIER Pattern for Functional and Regulatory Genomics
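The enumeration problem stated above can be sketched naively as follows. This is a minimal Python illustration under stated assumptions, not the WordSeeker implementation: the function name `enumerate_words`, the length bounds, and the choice to report both total occurrences and the number of sequences containing each word are all hypothetical.

```python
from collections import Counter


def enumerate_words(sequences, min_len, max_len):
    """Enumerate every word (contiguous subsequence) of length
    min_len..max_len in a set of sequences.

    Returns a dict mapping each word to a pair:
    (total occurrences, number of sequences containing the word).
    """
    totals = Counter()
    seq_hits = Counter()
    for seq in sequences:
        seq = seq.upper()
        seen_here = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                word = seq[i:i + k]
                totals[word] += 1
                seen_here.add(word)
        # Each sequence contributes at most once per word.
        seq_hits.update(seen_here)
    return {word: (totals[word], seq_hits[word]) for word in totals}


words = enumerate_words(["ACGAC", "ttac"], 2, 3)
print(words["AC"])  # (3, 2): three occurrences, found in both sequences
```

This sliding-window approach does work proportional to the sequence length times the number of word lengths, which may be adequate for short inputs; genome-scale enumeration would likely call for an indexed structure such as the suffix tree listed among the word discovery methods.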