Fauteux Seeder Bosc2009

  1. Seeder: Perl Modules for Cis-regulatory Motif Discovery
     Bioinformatics Open Source Conference, June 28 2009, Stockholm
     François Fauteux, Department of Plant Science, McGill University, Macdonald campus
  2. Introduction
     • Precise control of where, when and at which level transcription occurs
     • Synthetic promoter engineering
     M. Venter, Trends Plant Sci 12, 118 (2007).
  3. Transcription Factor Binding Sites
  4. DNA Motif Discovery
     • Searching for imperfect copies of an unknown pattern
     • Sequence-driven approaches: not guaranteed to yield a global optimum
     • Enumerative approaches: computationally expensive
     • Convergence towards low-complexity motifs
     D. GuhaThakurta, Nucleic Acids Res 34, 3585 (2006).
     W. W. Wasserman, A. Sandelin, Nat Rev Genet 5, 276 (2004).
  5. Seeder Algorithm: Input
     • Set B = {B1, ..., Bm} of background sequences
     • Set P = {P1, ..., Pn} of positive sequences
     • Length k of the motif seed
     • Length l of the full motif to discover
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  6. Seeder::Background
     • Enumerate all words of length k over the alphabet {A, C, G, T}
     • SMD (substring minimal distance): smallest Hamming distance (HD) between a word w and any |w|-length substring of a sequence s
     • SMDs between word w and the background sequences → probability distribution g_w(y)
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
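The SMD and the background distribution on this slide can be illustrated with a minimal Perl sketch. It is not the Seeder::Background implementation: the function names `hamming_distance`, `smd` and `background_distribution` are hypothetical, only the forward strand is scanned, and g_w(y) is taken to be the plain empirical fraction of background sequences whose SMD to w equals y.

```perl
use strict;
use warnings;

# Hamming distance (HD): number of mismatching positions between
# two strings of equal length.
sub hamming_distance {
    my ($x, $y) = @_;
    die "length mismatch\n" unless length($x) == length($y);
    my $d = 0;
    for my $i (0 .. length($x) - 1) {
        $d++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $d;
}

# Substring minimal distance (SMD): smallest HD between word $w and
# any substring of $s with the same length as $w.
# Returns undef when $s is shorter than $w.
sub smd {
    my ($w, $s) = @_;
    my $k = length $w;
    return undef if length($s) < $k;
    my $best = $k;
    for my $i (0 .. length($s) - $k) {
        my $d = hamming_distance($w, substr($s, $i, $k));
        $best = $d if $d < $best;
    }
    return $best;
}

# Empirical distribution of SMDs between $w and a set of sequences
# (assumption: simple relative frequencies, forward strand only).
# Returns a list whose element y is the fraction of sequences with SMD y.
sub background_distribution {
    my ($w, @seqs) = @_;
    my @count = (0) x (length($w) + 1);
    my $n = 0;
    for my $s (@seqs) {
        my $d = smd($w, $s);
        next unless defined $d;    # skip sequences shorter than the word
        $count[$d]++;
        $n++;
    }
    return map { $n ? $_ / $n : 0 } @count;
}

# Toy example with three made-up background sequences.
my @dist = background_distribution('ACGT', 'TTACGTTT', 'TTACCTTT', 'GGGGGGGG');
print join(' ', map { sprintf '%.2f', $_ } @dist), "\n";
```

Each call to `smd` scans every window of the sequence, so this direct version is only meant to make the definitions concrete.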
  7. Seeder::Finder
     • Sum S(w) of SMDs between w and the positive sequences → p-value
     • Closest match to word w* (min. q-value) found in each positive sequence → seed PWM
     • The matrix is extended to the full motif width, and the sites maximizing the score against the extended weight matrix are selected
     • A PWM is built from those sites and the process is iterated
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
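One way to turn the sum S(w) into a p-value is sketched below in Perl. It rests on an assumption the slide does not spell out: the SMDs to the n positive sequences are treated as independent draws from the background distribution g_w, so the null distribution of their sum is the n-fold convolution of g_w, and the p-value is the probability of a sum at most as large as the one observed. The names `convolve` and `sum_pvalue` are hypothetical and are not part of the Seeder API.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Discrete convolution of two probability vectors (array references).
# Element i of the result is the probability that the two summed
# variables add up to i.
sub convolve {
    my ($p, $q) = @_;
    my @r = (0) x (@$p + @$q - 1);
    for my $i (0 .. $#$p) {
        for my $j (0 .. $#$q) {
            $r[$i + $j] += $p->[$i] * $q->[$j];
        }
    }
    return \@r;
}

# P(sum of $n independent draws from $g <= $s_obs).
# $g is an array reference where $g->[y] is the probability of SMD y.
sub sum_pvalue {
    my ($g, $n, $s_obs) = @_;
    my $dist = [1];    # distribution of an empty sum: P(0) = 1
    $dist = convolve($dist, $g) for 1 .. $n;
    my $p = 0;
    $p += $dist->[$_] for 0 .. min($s_obs, $#$dist);
    return $p;
}

# Toy example: SMD is 0 or 1 with equal probability, two sequences.
printf "%.2f\n", sum_pvalue([0.5, 0.5], 2, 1);
```

A small sum means the word has close matches in most positive sequences, which is why the lower tail is used here.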
  8. Seeder::Index
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  9. Seeder::Index
     • List of indices corresponding to words of increasing Hamming distance (HD)
     • Efficient lookup of minimally distant subsequence
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
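The idea of an index ordered by increasing HD can be sketched in Perl as follows. This is an illustration under assumptions, not the Seeder::Index file format or API: words are encoded as base-4 integers, the helper names `encode`, `hd_encoded`, `neighbours_by_distance` and `smd_lookup` are hypothetical, and sequences are assumed to contain only uppercase A, C, G and T.

```perl
use strict;
use warnings;

my %CODE = (A => 0, C => 1, G => 2, T => 3);

# Encode a DNA word as a base-4 integer (assumes only A, C, G, T).
sub encode {
    my ($word) = @_;
    my $i = 0;
    $i = $i * 4 + $CODE{$_} for split //, $word;
    return $i;
}

# Hamming distance between two encoded words of length $k:
# count the base-4 digits (2 bits each) that differ.
sub hd_encoded {
    my ($x, $y, $k) = @_;
    my $d = 0;
    for (1 .. $k) {
        $d++ if ($x & 3) != ($y & 3);
        $x >>= 2;
        $y >>= 2;
    }
    return $d;
}

# For the encoded word $w, list all 4**$k word indices as
# [index, distance] pairs in order of increasing HD from $w.
sub neighbours_by_distance {
    my ($w, $k) = @_;
    my @buckets;
    push @{ $buckets[ hd_encoded($w, $_, $k) ] }, $_ for 0 .. 4**$k - 1;
    return [
        map { my $d = $_; map { [ $_, $d ] } @{ $buckets[$d] || [] } } 0 .. $k
    ];
}

# SMD lookup: record which k-mers occur in $seq, then walk the ordered
# list and return the distance of the first word that is present.
sub smd_lookup {
    my ($order, $seq, $k) = @_;
    my %present;
    $present{ encode(substr($seq, $_, $k)) } = 1 for 0 .. length($seq) - $k;
    for my $pair (@$order) {
        return $pair->[1] if $present{ $pair->[0] };
    }
    return undef;    # sequence shorter than $k
}

# Toy example with a 3-letter word.
my $order = neighbours_by_distance(encode('ACG'), 3);
print smd_lookup($order, 'TTAGGTT', 3), "\n";
```

The ordered list depends only on the word, so it can be computed once and reused for every sequence, which is the kind of saving the slide describes; how Seeder stores and shares its index across words is not shown here.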
  10. Seeder Algorithm: Usage

      #!/usr/bin/perl
      use Seeder::Index;
      use Seeder::Finder;
      use Seeder::Background;

      # Build the index for seeds of width 6
      my $index = Seeder::Index->new(
          seed_width => "6",
          out_file   => "6.index",
      );
      $index->get_index;

      # Compute the background data from the background sequences
      my $background = Seeder::Background->new(
          seed_width    => "6",
          strand        => "revcom",
          hd_index_file => "6.index",
          seq_file      => "seqs.fasta",
          out_file      => "seqs.bkgd",
      );
      $background->get_background;

      # Search the positive (promoter) sequences for one motif of width 12
      my $finder = Seeder::Finder->new(
          seed_width    => "6",
          strand        => "revcom",
          motif_width   => "12",
          n_motif       => "1",
          hd_index_file => "6.index",
          seq_file      => "prom.fasta",
          bkgd_file     => "seqs.bkgd",
          out_file      => "prom.finder",
      );
      $finder->find_motifs;
  11. Benchmark Against Popular Tools
      • Binding site sequences from the Transfac database
      G. K. Sandve, O. Abul, V. Walseng, F. Drablos, BMC Bioinformatics 8, 193 (2007).
      F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  12. SSP Promoter Motifs
      F. Fauteux, M. V. Stromvik, submitted.
  13. http://seeder.agrenv.mcgill.ca
  14. Acknowledgements
      Supervisor: Dr Martina Strömvik
      Advisory committee: Dr Mathieu Blanchette, Dr Pierre Dutilleul