Fauteux Seeder Bosc2009


  1. Seeder: Perl Modules for Cis-regulatory Motif Discovery
     Bioinformatics Open Source Conference, June 28 2009, Stockholm
     François Fauteux, Department of Plant Science, McGill University, Macdonald campus
  2. Introduction
     • Precise control of where, when and at which level transcription occurs
     • Synthetic promoter engineering
     M. Venter, Trends Plant Sci 12, 118 (2007).
  3. Transcription Factor Binding Sites
  4. DNA Motif Discovery
     • Searching for imperfect copies of an unknown pattern
     • Sequence-driven approaches: not guaranteed to yield a global optimum
     • Enumerative approaches: computationally expensive
     • Convergence towards low-complexity motifs
     D. GuhaThakurta, Nucleic Acids Res 34, 3585 (2006).
     W. W. Wasserman, A. Sandelin, Nat Rev Genet 5, 276 (2004).
  5. Seeder Algorithm: Input
     • Set B={B1,...,Bm} of background sequences
     • Set P={P1,...,Pn} of positive sequences
     • Length k of the motif seed
     • Length l of the full motif to discover
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  6. Seeder::Background
     • Enumerate all words of length k over the alphabet [A C G T]
     • SMD: smallest Hamming distance (HD) between w and a |w|-length substring of s
     • SMDs between word w and the background sequences → probability distribution g_w(y)
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  7. Seeder::Finder
     • Sum S(w) of SMDs between w and the positive sequences → p-value
     • Closest match to word w* (min. q-value) found in each positive sequence → seed PWM
     • The matrix is extended to the full motif width, and the sites that maximize the score against the extended weight matrix are selected
     • A PWM is built from those sites and the process is iterated
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  8. Seeder::Index
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  9. Seeder::Index
     • List of indices corresponding to words of increasing HD
     • Efficient lookup of minimally distant subsequence
     F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  10. Seeder Algorithm: Usage

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Seeder::Index;
          use Seeder::Finder;
          use Seeder::Background;

          # Step 1: build the Hamming-distance index for seeds of width 6.
          my $index = Seeder::Index->new(
              seed_width => "6",
              out_file   => "6.index",
          );
          $index->get_index;

          # Step 2: compute the background distributions from seqs.fasta.
          my $background = Seeder::Background->new(
              seed_width    => "6",
              strand        => "revcom",
              hd_index_file => "6.index",
              seq_file      => "seqs.fasta",
              out_file      => "seqs.bkgd",
          );
          $background->get_background;

          # Step 3: search the positive sequences in prom.fasta for one
          # motif of width 12, seeded with words of width 6.
          my $finder = Seeder::Finder->new(
              seed_width    => "6",
              strand        => "revcom",
              motif_width   => "12",
              n_motif       => "1",
              hd_index_file => "6.index",
              seq_file      => "prom.fasta",
              bkgd_file     => "seqs.bkgd",
              out_file      => "prom.finder",
          );
          $finder->find_motifs;
  11. Benchmark Against Popular Tools
      • Binding site sequences from the Transfac database
      G. K. Sandve, O. Abul, V. Walseng, F. Drablos, BMC Bioinformatics 8, 193 (2007).
      F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
  12. SSP Promoter Motifs
      F. Fauteux, M. V. Stromvik, submitted.
  13. http://seeder.agrenv.mcgill.ca
  14. Acknowledgements
      Supervisor: Dr Martina Strömvik
      Advisory committee: Dr Mathieu Blanchette, Dr Pierre Dutilleul
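
The SMD and the background distribution g_w(y) from the Seeder::Background slide can be illustrated with a few lines of plain Perl. This is a minimal sketch and not the code of the Seeder modules: the function names (`hamming_distance`, `smd`, `background_distribution`) and the toy sequences are made up for illustration, only the forward strand is scanned (the usage slide passes `strand => "revcom"`), and every window is compared by brute force instead of going through the index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of mismatching positions between two equal-length strings.
sub hamming_distance {
    my ($x, $y) = @_;
    die "length mismatch\n" unless length($x) == length($y);
    my $d = 0;
    for my $i (0 .. length($x) - 1) {
        $d++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $d;
}

# SMD: smallest Hamming distance between word $w and any
# |w|-length substring of sequence $s (forward strand only).
sub smd {
    my ($w, $s) = @_;
    my $k = length $w;
    die "sequence shorter than word\n" if length($s) < $k;
    my $best = $k;
    for my $i (0 .. length($s) - $k) {
        my $d = hamming_distance($w, substr($s, $i, $k));
        $best = $d if $d < $best;
        last if $best == 0;    # an exact match cannot be improved
    }
    return $best;
}

# Empirical distribution g_w(y): fraction of background sequences
# whose SMD to $w equals y, for y = 0 .. |w|.
sub background_distribution {
    my ($w, @background) = @_;
    die "no background sequences\n" unless @background;
    my @g = (0) x (length($w) + 1);
    $g[ smd($w, $_) ]++ for @background;
    my $m = scalar @background;
    return map { $_ / $m } @g;
}

# Toy example (hypothetical sequences).
my @toy_background = ("ACGTACGT", "TTTTTTTT", "ACGAAAAA");
my @g_w = background_distribution("ACGT", @toy_background);
print join(" ", map { sprintf("%.3f", $_) } @g_w), "\n";
# prints: 0.333 0.333 0.000 0.333 0.000
```

Each of the three toy sequences lands in a different bin (SMD 0, 3 and 1), so each of those bins gets probability 1/3.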

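The Seeder::Finder slide turns the sum S(w) of SMDs over the positive sequences into a p-value. One way to do that, sketched below, assumes the n SMDs are independent draws from g_w(y): the distribution of their sum is then the n-fold convolution of g_w with itself, and the p-value is its lower tail, since a small sum means the word sits closer to the positive sequences than the background predicts. The independence assumption, the lower-tail convention and the function names are assumptions of this sketch, not statements from the slides. The q-value correction mentioned on the slide is not covered.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convolve two discrete distributions stored as array refs,
# where index = value and element = probability.
sub convolve {
    my ($p, $q) = @_;
    my @r = (0) x (@$p + @$q - 1);
    for my $i (0 .. $#$p) {
        for my $j (0 .. $#$q) {
            $r[$i + $j] += $p->[$i] * $q->[$j];
        }
    }
    return \@r;
}

# Distribution of the sum of $n independent variables,
# each distributed as $g.
sub sum_distribution {
    my ($g, $n) = @_;
    my $dist = [1];    # empty sum: value 0 with probability 1
    $dist = convolve($dist, $g) for 1 .. $n;
    return $dist;
}

# Lower-tail p-value: P(sum <= observed sum).
sub p_value {
    my ($g, $n, $observed) = @_;
    my $dist = sum_distribution($g, $n);
    my $p = 0;
    for my $y (0 .. $#$dist) {
        last if $y > $observed;
        $p += $dist->[$y];
    }
    return $p;
}

# Toy example: a hypothetical g_w over SMD values 0, 1, 2,
# two positive sequences, observed sum S(w) = 1.
my $toy_g = [0.25, 0.5, 0.25];
printf("p = %.4f\n", p_value($toy_g, 2, 1));
# prints: p = 0.3125
```

The cost of the repeated convolution grows with n and with the word length, but both are small here because an SMD can only take values from 0 to |w|.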
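
The seed step on the Seeder::Finder slide collects the closest match to the best word w* in each positive sequence and builds a matrix from those sites. The sketch below tallies raw base counts per position, which is the starting point for a PWM. It is illustrative only: the function names and sequences are hypothetical, ties are broken by taking the first window, only the forward strand is scanned, and the conversion of counts to weights (pseudocounts, background frequencies) is left out because the slides do not specify it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the |w|-length window of $s with the fewest mismatches
# to $w; on ties, the first such window is kept.
sub closest_match {
    my ($w, $s) = @_;
    my $k = length $w;
    die "sequence shorter than word\n" if length($s) < $k;
    my ($best, $best_d);
    for my $i (0 .. length($s) - $k) {
        my $site = substr($s, $i, $k);
        # grep in scalar context counts the mismatching positions.
        my $d = grep { substr($w, $_, 1) ne substr($site, $_, 1) } 0 .. $k - 1;
        ($best, $best_d) = ($site, $d) if !defined $best_d || $d < $best_d;
    }
    return $best;
}

# Per-position base counts over a list of equal-length sites.
sub count_matrix {
    my @sites = @_;
    die "no sites\n" unless @sites;
    my $k = length $sites[0];
    my %counts;
    $counts{$_} = [ (0) x $k ] for qw(A C G T);
    for my $site (@sites) {
        for my $i (0 .. $k - 1) {
            my $base = substr($site, $i, 1);
            $counts{$base}[$i]++ if exists $counts{$base};
        }
    }
    return \%counts;
}

# Toy example (hypothetical seed word and positive sequences).
my $seed_word = "ACGT";
my @positives = ("TTACGTTT", "GGACGAGG", "ACCTAAAA");
my @seed_sites = map { closest_match($seed_word, $_) } @positives;
my $matrix = count_matrix(@seed_sites);
print "$_: @{ $matrix->{$_} }\n" for qw(A C G T);
# prints:
# A: 3 0 0 1
# C: 0 3 1 0
# G: 0 0 2 0
# T: 0 0 0 2
```

The selected sites are ACGT, ACGA and ACCT, so the first two columns are fully conserved while the last two each show one substitution.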