DNA Motif Finding 2010

The DNA Motif finding talk given in March 2010 at the CRUK CRI. Cambridge, UK

It was designed to introduce wet-lab researchers to using web-based tools for doing DNA motif finding, such as on promoters of differentially expressed genes from a microarray experiment.

Published in: Education

Transcript

  • 1. DNA Motif Finding Stewart MacArthur Bioinformatics Core March 11th, 2010 Stewart MacArthur (Bioinformatics Core) DNA Motif Finding March 11th, 2010 1 / 33
  • 4. Introduction: What is a DNA Motif?
      DNA motifs are short, recurring patterns that are presumed to have a biological function.
      • sequence-specific binding sites
      • transcription factors
      • nucleases
      • ribosome binding
      • mRNA processing
      • splicing
      • editing
      • polyadenylation
      • transcription termination
  • 5. Representing a motif: How to represent a DNA motif?
      How can we represent the binding specificity of a protein, such that we can reliably predict its binding to any given sequence?
      Restriction enzyme sites can be written as a simple DNA sequence, e.g. GAATTC for EcoRI:
        5’-G A A T T C-3’
        3’-C T T A A G-5’
      These sequences can incorporate ambiguity using the IUPAC code, e.g. GTYRAC for HincII:
        Y = C or T
        R = A or G
      All matching sites will be cut by the restriction enzyme.
  • 6. Representing a motif: Transcription Factors are different...
      • Regulatory motifs are often degenerate: variable but similar.
      • Transcription factors are often pleiotropic, regulating several genes, but those genes may need to be expressed at different levels.
      • A side effect of this degeneracy is spurious binding, where the protein has affinity for positions in the genome other than its functional sites.
      • Degeneracy in restriction enzyme binding would be lethal.
      • Non-specific binding competes for protein, so more protein has to be produced than would otherwise be required.
  • 8. Representing a motif (Consensus): The Consensus Sequence
      • A consensus binding site is often used to represent transcription factor binding
      • Refers to a sequence that matches all examples of the binding site closely but not exactly
      • There is a trade-off between the ambiguity in the consensus and its sensitivity
      Example sites: TACGAT TATAAT TATAAT GATACT TATGAT TATGTT
  • 12. Representing a motif (Consensus): The Consensus Sequence: Example
      Known sites: TACGAT TATAAT TATAAT GATACT TATGAT TATGTT
      Consensus:   TATAAT
        Mismatches allowed   Sites found   Match frequency
        0                    2/6           1 site every 4 kb
        at most 1            3/6           1 site every 200 bp
        up to 2              6/6           1 site every 30 bp
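
The trade-off on the consensus slides can be checked with a few lines of Python. This is a minimal sketch, not part of the talk: it uses the site list as given on the first consensus slide (with GATACT as the fourth site) and assumes random DNA with equal base frequencies when estimating how often a chance match occurs. The function names are illustrative.

```python
from math import comb

CONSENSUS = "TATAAT"
SITES = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]


def mismatches(site, consensus):
    """Number of positions at which the site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))


def sites_found(sites, consensus, max_mismatches):
    """How many known sites lie within max_mismatches of the consensus."""
    return sum(mismatches(s, consensus) <= max_mismatches for s in sites)


def expected_spacing(length, max_mismatches):
    """Average distance (bp) between chance matches in random DNA.

    Assumes all four bases are equally likely (an assumption of this sketch).
    """
    # Number of distinct words within max_mismatches of one fixed word.
    matching_words = sum(comb(length, k) * 3 ** k
                         for k in range(max_mismatches + 1))
    return 4 ** length / matching_words


for k in range(3):
    print(k, sites_found(SITES, CONSENSUS, k),
          round(expected_spacing(len(CONSENSUS), k)))
```

Allowing more mismatches recovers more of the known sites, but the expected distance between chance matches shrinks from thousands of bases to a few tens, which is the loss of specificity the slides describe.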
  • 13. Representing a motif (IUPAC): IUPAC codes
        A  Adenine          K  G or T
        C  Cytosine         M  A or C
        G  Guanine          B  C or G or T
        T  Thymine          D  A or G or T
        R  A or G           H  A or C or T
        Y  C or T           V  A or C or G
        S  G or C           N  any base
        W  A or T           . or -  gap
  • 16. Representing a motif (IUPAC): The Consensus Sequence: Example
      Known sites:     TACGAT TATAAT TATAAT GATACT TATGAT TATGTT
      IUPAC consensus: TATRNT
        Match type           Sites found                                  Match frequency
        exact match          4/6 (TATAAT, TATAAT, TATGAT, TATGTT)         1 site every 500 bp
        up to one mismatch   6/6                                          1 site every 30 bp
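
An IUPAC consensus can be turned into a regular expression by expanding each code into a character class. The sketch below is an illustration rather than anything shown in the talk; the helper names are made up, the site list is the one from the first consensus slide, and the chance-match probability again assumes equal base frequencies.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

SITES = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern into a regex with one character class per position."""
    return re.compile("".join("[" + IUPAC[code] + "]" for code in pattern))


def chance_match_probability(pattern):
    """Probability that a random word matches, assuming equal base frequencies."""
    probability = 1.0
    for code in pattern:
        probability *= len(IUPAC[code]) / 4
    return probability


regex = iupac_to_regex("TATRNT")
matched = [site for site in SITES if regex.fullmatch(site)]
print(matched)
print(1 / chance_match_probability("TATRNT"))  # average spacing in bp
```

The same helper handles the restriction-site example: `iupac_to_regex("GTYRAC")` accepts GTCGAC, GTTAAC, GTCAAC and GTTGAC.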
  • 18. Representing a motif (Matrix): The Matrix
      • A position weight matrix (PWM)
      • also called position-specific weight matrix (PSWM)
      • also called position-frequency matrix (PFM)
      • also called position-specific scoring matrix (PSSM)
      • or just matrix
      • Alternative to the consensus.
      • There is a matrix element for all possible bases at every position.
           1   2   3   4   5   6   7   8   9  10  11
      A    4  13   5   3   0   0   0   0  17   0   6
      C    4   1   2   0   0   0   0   0   0   1   0
      G    3   3   0   0  18   0   0   0   1   4   3
      T    7   1  11  15   0  18  18  18   0  13   9
  • 21. Representing a motif (Matrix): Matrix Formats
      Counts
      A    4  13   5   3   0   0   0   0  17   0   6
      C    4   1   2   0   0   0   0   0   0   1   0
      G    3   3   0   0  18   0   0   0   1   4   3
      T    7   1  11  15   0  18  18  18   0  13   9
      Frequency
      A  0.2  0.7  0.3  0.2  0.0  0.0  0.0  0.0  0.9  0.0  0.3
      C  0.2  0.1  0.1  0.0  0.0  0.0  0.0  0.0  0.0  0.1  0.0
      G  0.2  0.2  0.0  0.0  1.0  0.0  0.0  0.0  0.1  0.2  0.2
      T  0.4  0.1  0.6  0.8  0.0  1.0  1.0  1.0  0.0  0.7  0.5
      Weight (log odds)
      A  -0.1   1.0   0.1  -0.4  -2.9  -2.9  -2.9  -2.9   1.3  -2.9   0.3
      C  -0.1  -1.3  -0.7  -2.9  -2.9  -2.9  -2.9  -2.9  -2.9  -1.3  -2.9
      G  -0.4  -0.4  -2.9  -2.9   1.3  -2.9  -2.9  -2.9  -1.3  -0.1  -0.4
      T   0.4  -1.3   0.9   1.2  -2.9   1.3   1.3   1.3  -2.9   1.0   0.7
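
The three matrix formats are related by simple arithmetic. The slide does not say how the weights were derived, so the settings below are an inference: a natural logarithm, a total pseudo-count of 1 shared equally between the four bases, and a uniform background of 0.25 appear to reproduce the slide's values to one decimal place. Treat this as a sketch under those assumptions; the function names are illustrative.

```python
from math import log

# Count matrix from the slide: one list per base, one entry per motif position.
COUNTS = {
    "A": [4, 13, 5, 3, 0, 0, 0, 0, 17, 0, 6],
    "C": [4, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0],
    "G": [3, 3, 0, 0, 18, 0, 0, 0, 1, 4, 3],
    "T": [7, 1, 11, 15, 0, 18, 18, 18, 0, 13, 9],
}


def column_totals(counts):
    """Number of sites contributing to each column."""
    width = len(counts["A"])
    return [sum(counts[base][i] for base in counts) for i in range(width)]


def frequencies(counts):
    """Counts divided by the column total."""
    totals = column_totals(counts)
    return {base: [c / t for c, t in zip(row, totals)]
            for base, row in counts.items()}


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Natural-log odds of corrected frequency versus background.

    The pseudo-count and background values are assumptions of this sketch.
    """
    totals = column_totals(counts)
    share = pseudocount / len(counts)  # pseudo-count split across the bases
    return {base: [log((c + share) / (t + pseudocount) / background)
                   for c, t in zip(row, totals)]
            for base, row in counts.items()}


FREQUENCIES = frequencies(COUNTS)
WEIGHTS = log_odds(COUNTS)
for base in "ACGT":
    print(base, [round(w, 1) for w in WEIGHTS[base]])
```

The pseudo-count is what keeps bases with a count of zero from producing a weight of minus infinity; they receive a large negative weight instead.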
  • 22. Representing a motif (Matrix): Sequence Logos
      • A visual representation of the motif
      • Each column of the matrix is represented as a stack of letters whose size is proportional to the corresponding residue frequency
      • The total height of each column is proportional to its information content.
      A    4  13   5   3   0   0   0   0  17   0   6
      C    4   1   2   0   0   0   0   0   0   1   0
      G    3   3   0   0  18   0   0   0   1   4   3
      T    7   1  11  15   0  18  18  18   0  13   9
  • 23. Information theory: Information Theory
      • Information theory is a branch of applied mathematics concerned with the quantification of information
      • It has been applied to DNA motifs in order to determine the amount of uncertainty at each position in a site
      • Uncertainty is measured in bits of information, which is on a log2 scale.
      • Information is a decrease in uncertainty
  • 25. Information theory: Information theory
      A    4  13   5   3   0   0   0   0  17   0   6
      C    4   1   2   0   0   0   0   0   0   1   0
      G    3   3   0   0  18   0   0   0   1   4   3
      T    7   1  11  15   0  18  18  18   0  13   9
      • 1 base occurs every time: 2 bits
      • 2 bases each occur 50% of the time: 1 bit
      • 4 bases occur equally: 0 bits
      Example
        $I_i = 2 + \sum_{b} f_{b,i} \log_2 f_{b,i}$
        $1 = 2 + 0.5 \times \log_2(0.5) + 0.5 \times \log_2(0.5)$
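
The information content formula can be applied to a column of counts directly. This is a minimal sketch: the function name is illustrative, terms with zero frequency are taken to contribute nothing (the usual convention that 0 × log2 0 = 0), and no small-sample correction is applied.

```python
from math import log2


def information_content(column_counts):
    """Information content in bits of one motif column of A, C, G, T counts.

    Implements I_i = 2 + sum_b f_{b,i} * log2(f_{b,i}); bases that never
    occur are skipped, since their term is taken to be zero.
    """
    total = sum(column_counts)
    info = 2.0
    for count in column_counts:
        if count > 0:
            frequency = count / total
            info += frequency * log2(frequency)
    return info


print(information_content([18, 0, 0, 0]))  # one base every time
print(information_content([9, 9, 0, 0]))   # two bases, 50% each
print(information_content([5, 5, 5, 5]))   # four bases equally
```

These three calls correspond to the three bullet cases on the slide; the per-column values are what set the stack heights in a sequence logo.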
  • 27. Information theory: Why do we want to find them?
      Expression Microarrays
        • Find co-regulated genes
        • Suggest pathways
      ChIP seq/chip
        • Determine binding preferences
        • Find co-factors
  • 30. Information theory: Two Methods
      Pattern Matching (finding known motifs)
        • Does protein X bind upstream of my genes?
        • Does it bind more than expected by chance?
        e.g. Patser, Pscan, Mast ...
      Pattern Discovery (finding unknown motifs)
        • What motifs are upstream of my genes?
        • What are these motifs?
        e.g. MEME, Weeder, MDScan ...
  • 33. Databases of Motifs: Where can we find known motifs?
      Online databases
      • Multicellular Eukaryotes: Jaspar, Transfac, Pazar
      • Yeast: Yeastract, SCPD
      • Prokaryotes: RegulonDB, Prodoric
      • Other: UniProbe
  • 34. Finding known motifs: How do we find them?
      TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA
      CACATGTCTCATGTACTGGACCATGTCTAAGGGGTGTAAGGGTACTA
      ACGAATCGTAGCATGTCCAGAGGTGCGGAGTACGTAAGGAGGGTGCC
      CATACATGTCCGTTTCATATGAGCCTGCATTAATGTACCAACCTTCA
      ACCATGTCTCAACATGTCGCGGGTGTGCCTCCACGTACGAGCCGGAA
      GTCGACTCGCATGTCTGTCAGTATTATCCAAAGCATGTCGACCTCTT
      CATGTCAGCGAACGCAAGATCTTCATATGAGCCTGCATTAATGTACC
  • 41. Finding known motifs: Pattern Matching
      Weight (log odds)
      A  -0.1   1.0   0.1  -0.4  -2.9  -2.9  -2.9  -2.9   1.3  -2.9   0.3
      C  -0.1  -1.3  -0.7  -2.9  -2.9  -2.9  -2.9  -2.9  -2.9  -1.3  -2.9
      G  -0.4  -0.4  -2.9  -2.9   1.3  -2.9  -2.9  -2.9  -1.3  -0.1  -0.4
      T   0.4  -1.3   0.9   1.2  -2.9   1.3   1.3   1.3  -2.9   1.0   0.7
      The matrix is slid along the sequence one base at a time, and each 11-base window is scored against it:
      TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA
      TATATTGTTTA                                        (window 1)
       ATATTGTTTAT                                       (window 2)
        TATTGTTTATT                                      (window 3)
  • 42. Finding known motifs: Pattern Matching
      Matching windows (in brackets):
      TA[TATTGTTTATT]TTCATGACTTCATGTCGCATG[TATTGTTAATT]AA
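
The sliding-window scan shown on the pattern matching slides can be sketched as follows. This is an illustration only, not how Patser, Pscan or MAST are implemented. It scores the forward strand only, rebuilds the weights from the count matrix with the inferred settings (natural log, pseudo-count of 1, uniform background), and uses a score threshold of 5.0 chosen for this example; real tools typically derive a threshold or P-value from a background model.

```python
from math import log

COUNTS = {
    "A": [4, 13, 5, 3, 0, 0, 0, 0, 17, 0, 6],
    "C": [4, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0],
    "G": [3, 3, 0, 0, 18, 0, 0, 0, 1, 4, 3],
    "T": [7, 1, 11, 15, 0, 18, 18, 18, 0, 13, 9],
}
SEQUENCE = "TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA"


def build_weights(counts, pseudocount=1.0, background=0.25):
    """Log-odds weights; pseudo-count and background are assumed values."""
    width = len(counts["A"])
    totals = [sum(counts[b][i] for b in counts) for i in range(width)]
    share = pseudocount / len(counts)
    return {b: [log((counts[b][i] + share) / (totals[i] + pseudocount) / background)
                for i in range(width)]
            for b in counts}


def scan(sequence, weights, threshold):
    """Score every window of motif width; return (start, window, score) hits."""
    width = len(weights["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # The window score is the sum of the weights of its bases by position.
        score = sum(weights[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


WEIGHTS = build_weights(COUNTS)
HITS = scan(SEQUENCE, WEIGHTS, threshold=5.0)
for start, window, score in HITS:
    print(start, window, round(score, 2))
```

Start positions are 0-based. With this threshold the scan reports the same two windows that are highlighted on the slide; the second scores lower because its eighth base never occurs at that position in the count matrix.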
  • 43. Pattern Discovery: Introduction to de-novo motif finding
      De-novo or ab-initio motif finding refers to finding motifs “from the beginning”, i.e. without previous knowledge.
      Various Methods
      • Word-based algorithms, e.g. Oligo-Analysis, Weeder
      • Expectation-Maximization methods, e.g. MEME
      • Gibbs sampling methods, e.g. Gibbs sampler, MotifSampler
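
To give a feel for the word-based idea, the sketch below counts, for every 6-letter word, how many of the seven example sequences from the "How do we find them?" slide contain it. This is a deliberately naive illustration, not the Oligo-Analysis or Weeder algorithm: those tools assess over-representation statistically against a background model and, in Weeder's case, allow mismatches. The word length of 6 and the function name are choices made for this example.

```python
from collections import Counter

SEQUENCES = [
    "TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA",
    "CACATGTCTCATGTACTGGACCATGTCTAAGGGGTGTAAGGGTACTA",
    "ACGAATCGTAGCATGTCCAGAGGTGCGGAGTACGTAAGGAGGGTGCC",
    "CATACATGTCCGTTTCATATGAGCCTGCATTAATGTACCAACCTTCA",
    "ACCATGTCTCAACATGTCGCGGGTGTGCCTCCACGTACGAGCCGGAA",
    "GTCGACTCGCATGTCTGTCAGTATTATCCAAAGCATGTCGACCTCTT",
    "CATGTCAGCGAACGCAAGATCTTCATATGAGCCTGCATTAATGTACC",
]


def kmer_coverage(sequences, k):
    """For each k-mer, count the sequences that contain it at least once."""
    coverage = Counter()
    for sequence in sequences:
        # A set so that repeats within one sequence are counted only once.
        words = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        coverage.update(words)
    return coverage


COVERAGE = kmer_coverage(SEQUENCES, 6)
SHARED = sorted(word for word, n in COVERAGE.items() if n == len(SEQUENCES))
print(SHARED)
```

Words present in every sequence are candidates for a shared motif, but raw counts like these say nothing about significance, which is why the talk recommends dedicated tools and cautions about vertebrate genomes.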
  • 45. Pattern Discovery: Guidelines
      • If possible, remove repeat patterns from the target sequences
      • Use multiple motif prediction algorithms.
      • Run probabilistic algorithms multiple times
      • Return multiple motifs
      • Try a range of motif widths and expected number of sites
      “... we do not recommend to trust pattern discovery results with vertebrate genomes.” (Jacques van Helden)
  • 55. Recommended Tools: Recommended Tools
      Pattern Matching
        • RSAT
        • Pscan
        • Galaxy
        • MotifMogul
      Pattern Discovery
        • RSAT
        • MEME
        • Weeder
        • WebMOTIFS
  • 56. Recommended Tools (RSA Tools): Regulatory Sequence Analysis Tools
      http://rsat.ulb.ac.be/rsat/
      Modular computer programs specifically designed for the detection of regulatory signals in non-coding sequences.
  • 57. Recommended Tools: RSA Tools
  • 58. Recommended Tools (RSA Tools): Regulatory Sequence Analysis Tools
      Nature Protocols Series: Volume 3 No 10 2008
      • Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules
      • Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences
      • Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services - an example with ChIP-chip data
      • Network Analysis Tools: from biological networks to clusters and pathways
  • 59. Recommended Tools (RSA Tools): Example Workflow
      Problem
        I have some differentially expressed genes from a microarray experiment. I would like to know if P53 binds in their promoter regions, and if so where.
      Workflow
        • BioMart: Convert Gene IDs, if necessary
        • RSAT: retrieve sequence
        • JASPAR: Get PWM (MA0106.1)
        • RSAT: matrix-scan
        • RSAT: feature map
  • 60. Recommended Tools (Pscan): Pscan
      “Finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes”
  • 61. Recommended Tools (Pscan): Example Workflow
      Problem
        I have some differentially expressed genes from a microarray experiment. I would like to know which transcription factors bind to their promoters.
      Workflow
        • BioMart: Convert Gene IDs, if necessary
        • Pscan: retrieve sequence
  • 70. Recommended Tools (Galaxy): Galaxy
      http://main.g2.bx.psu.edu
      “Galaxy allows you to do analyses you cannot do anywhere else without the need to install or download anything. You can analyze multiple alignments, compare genomic annotations, profile metagenomic samples and much much more...”
      • Collection of online tools
      • Modular
      • Can create workflows
      • Saved histories
      • Reproducible analysis
      • Shared histories
      • In house version
      • Easily extendable
      http://kinchie/galaxy
  • 73. Recommended Tools (MEME Suite): MEME Suite
      Suite of web based tools for motif discovery
      • MEME - de-novo motif finding
      • MAST - find matches to known motifs (MEME output)
      • TOMTOM - Compare motifs to TRANSFAC and Jaspar
  • 74. Further Reading
      • Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000 Jan;16(1):16-23. Review. PubMed PMID: 10812473.
      • D’haeseleer P. How does DNA sequence motif discovery work? Nat Biotechnol. 2006 Aug;24(8):959-61. Review. PubMed PMID: 16900144.
      • Das MK, Dai HK. A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007 Nov 1;8 Suppl 7:S21. Review. PubMed PMID: 18047721; PubMed Central PMCID: PMC2099490.
      • Tompa M, Li N, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 2005 Jan;23(1):137-44. PubMed PMID: 15637633.
  • 75. Practical: Practical Session
