A DNA motif finding talk given in March 2010 at the CRUK CRI, Cambridge, UK.
It was designed to introduce wet-lab researchers to web-based tools for DNA motif finding, for example on the promoters of differentially expressed genes from a microarray experiment.
1. DNA Motif Finding
Stewart MacArthur
Bioinformatics Core
March 11th, 2010
Stewart MacArthur (Bioinformatics Core) DNA Motif Finding March 11th, 2010 1 / 33
3. Introduction
What is a DNA Motif?
DNA motifs are short, recurring patterns that are presumed to have a
biological function.
• sequence-specific binding sites
  • transcription factors
  • nucleases
• ribosome binding
• mRNA processing
  • splicing
  • editing
  • polyadenylation
• transcription termination
5. Representing a motif
How to represent a DNA motif?
How can we represent the binding specificity of a protein, such that we
can reliably predict its binding to any given sequence?
Restriction enzyme sites can be written as a simple DNA sequence,
e.g. GAATTC for EcoRI
5’-G A A T T C-3’
3’-C T T A A G-5’
These sequences can incorporate ambiguity, e.g. GTYRAC for HincII,
using the IUPAC code.
GTYRAC
Y = C or T
R = A or G
All matching sites will be cut by the restriction enzyme.
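As a rough illustration of how an IUPAC pattern such as GTYRAC can be matched against a sequence, the sketch below translates each ambiguity code into a regular-expression character class. The helper names and the test sequence are hypothetical and are not part of the talk.

```python
import re

# IUPAC ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as GTYRAC into a regular expression."""
    return "".join(IUPAC[code] for code in pattern.upper())


def find_sites(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # The lookahead allows overlapping sites to be reported.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [match.start() for match in regex.finditer(sequence.upper())]


# Hypothetical sequence with one EcoRI site and two HincII sites.
example = "AAGAATTCGGGTCGACTTGTTAACCC"
ecori_sites = find_sites("GAATTC", example)
hincii_sites = find_sites("GTYRAC", example)
print(ecori_sites, hincii_sites)
```

Only the given strand is searched here. Both example sites are palindromic, so that is sufficient for them, but a general scan would also need the reverse complement.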
6. Representing a motif
Transcription Factors are different...
• Regulatory motifs are often degenerate: variable but similar.
• Transcription factors are often pleiotropic, regulating several
genes that may need to be expressed at different levels.
• A side effect of this degeneracy is spurious binding, where the
protein has affinity for positions in the genome other than its
functional sites.
• Degeneracy in restriction enzyme binding would be lethal.
• Non-specific binding competes for protein, so more protein must be
produced than would otherwise be required.
8. Representing a motif Consensus
The Consensus Sequence
• A consensus binding site is often used to represent transcription
factor binding
• Refers to a sequence that matches all examples of the binding
site closely but not exactly
• There is a trade-off between the ambiguity in the consensus and
its sensitivity
TACGAT
TATAAT
TATAAT
GATACT
TATGAT
TATGTT
10. Representing a motif Consensus
The Consensus Sequence : Example
TACGAT
TATAAT*
TATAAT*
GATACT
TATGAT
TATGTT
Consensus: TATAAT
Allowing 0 mismatches finds 2/6 sites
(1 match expected every 4 kb in random sequence)
11. Representing a motif Consensus
The Consensus Sequence : Example
TACGAT
TATAAT*
TATAAT*
GATACT
TATGAT*
TATGTT
Consensus: TATAAT
Allowing at most 1 mismatch finds 3/6 sites
(1 match expected every 200 bp in random sequence)
12. Representing a motif Consensus
The Consensus Sequence : Example
TACGAT*
TATAAT*
TATAAT*
GATACT*
TATGAT*
TATGTT*
Consensus: TATAAT
Allowing up to 2 mismatches finds 6/6 sites
(1 match expected every 30 bp in random sequence)
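The trade-off between sensitivity and specificity in this example can be checked with a short sketch. It uses the six sites as first listed with the consensus bullets (fourth site GATACT) and assumes equal base frequencies when estimating how often a match is expected by chance. The function names are illustrative only.

```python
from math import comb

SITES = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]
CONSENSUS = "TATAAT"


def mismatches(site, consensus):
    """Count positions where the site differs from the consensus."""
    return sum(1 for s, c in zip(site, consensus) if s != c)


def sites_found(sites, consensus, max_mismatches):
    """Return the sites within max_mismatches of the consensus."""
    return [s for s in sites if mismatches(s, consensus) <= max_mismatches]


def neighbourhood_size(length, max_mismatches):
    """Number of distinct sequences within max_mismatches of one consensus."""
    return sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))


found_counts = []
neighbourhoods = []
for k in range(3):
    found = len(sites_found(SITES, CONSENSUS, k))
    size = neighbourhood_size(len(CONSENSUS), k)
    # Assuming equal base frequencies, one match is expected every 4**6 / size bp.
    spacing = 4 ** len(CONSENSUS) / size
    found_counts.append(found)
    neighbourhoods.append(size)
    print(f"<= {k} mismatches: {found}/6 sites, about 1 match every {spacing:.0f} bp")
```

Under these assumptions the expected spacings come out close to the rounded figures on the slides (4 kb, 200 bp and 30 bp).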
13. Representing a motif IUPAC
IUPAC codes
A Adenine
C Cytosine
G Guanine
T Thymine
R A or G
Y C or T
S G or C
W A or T
K G or T
M A or C
B C or G or T
D A or G or T
H A or C or T
V A or C or G
N any base
. or - gap
15. Representing a motif IUPAC
The Consensus Sequence : Example
TACGAT
TATAAT*
TATAAT*
GATACT
TATGAT*
TATGTT*
IUPAC consensus: TATRNT
Exact match finds 4/6 sites - 1 site every 500bp
16. Representing a motif IUPAC
The Consensus Sequence : Example
TACGAT*
TATAAT*
TATAAT*
GATACT*
TATGAT*
TATGTT*
IUPAC consensus: TATRNT
Up to one mismatch finds 6/6 sites - 1 site every 30bp
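The figures for the IUPAC consensus can be reproduced in the same spirit. This sketch enumerates all 4096 hexamers to count how many match TATRNT, assuming equal base frequencies, and uses the site list with GATACT as the fourth site. All names are illustrative.

```python
from itertools import product

# Bases allowed by each IUPAC code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

SITES = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]
PATTERN = "TATRNT"


def iupac_mismatches(seq, pattern):
    """Count positions where the base is not allowed by the IUPAC code."""
    return sum(1 for base, code in zip(seq, pattern) if base not in IUPAC_SETS[code])


def matching_kmers(pattern, max_mismatches):
    """Count all k-mers within max_mismatches of the IUPAC pattern."""
    return sum(
        1
        for kmer in product("ACGT", repeat=len(pattern))
        if iupac_mismatches(kmer, pattern) <= max_mismatches
    )


for k in (0, 1):
    found = sum(1 for s in SITES if iupac_mismatches(s, PATTERN) <= k)
    n_match = matching_kmers(PATTERN, k)
    spacing = 4 ** len(PATTERN) / n_match
    print(f"<= {k} mismatches: {found}/6 sites, about 1 match every {spacing:.0f} bp")
```

Brute-force enumeration is only practical for short patterns; it is used here because it makes the counting explicit.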
18. Representing a motif Matrix
The Matrix
• A position weight matrix (PWM)
• also called position-specific weight matrix (PSWM)
• also called position-frequency matrix (PFM)
• also called position-specific scoring matrix (PSSM)
• or just matrix
• Alternative to the consensus.
• There is a matrix element for all possible bases at every position.
1 2 3 4 5 6 7 8 9 10 11
A 4 13 5 3 0 0 0 0 17 0 6
C 4 1 2 0 0 0 0 0 0 1 0
G 3 3 0 0 18 0 0 0 1 4 3
T 7 1 11 15 0 18 18 18 0 13 9
21. Representing a motif Matrix
Matrix Formats
Counts
A 4 13 5 3 0 0 0 0 17 0 6
C 4 1 2 0 0 0 0 0 0 1 0
G 3 3 0 0 18 0 0 0 1 4 3
T 7 1 11 15 0 18 18 18 0 13 9
Frequency
A 0.2 0.7 0.3 0.2 0.0 0.0 0.0 0.0 0.9 0.0 0.3
C 0.2 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0
G 0.2 0.2 0.0 0.0 1.0 0.0 0.0 0.0 0.1 0.2 0.2
T 0.4 0.1 0.6 0.8 0.0 1.0 1.0 1.0 0.0 0.7 0.5
Weight (log odds)
A -0.1 1.0 0.1 -0.4 -2.9 -2.9 -2.9 -2.9 1.3 -2.9 0.3
C -0.1 -1.3 -0.7 -2.9 -2.9 -2.9 -2.9 -2.9 -2.9 -1.3 -2.9
G -0.4 -0.4 -2.9 -2.9 1.3 -2.9 -2.9 -2.9 -1.3 -0.1 -0.4
T 0.4 -1.3 0.9 1.2 -2.9 1.3 1.3 1.3 -2.9 1.0 0.7
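The three formats are related by simple arithmetic. The slides do not state how the weights were derived; the sketch below assumes a natural-log odds score against a uniform background of 0.25 with a total pseudocount of 1 split equally between the bases, which happens to reproduce the rounded values shown. Treat these parameters as an inference, not as the author's stated method.

```python
import math

COUNTS = {
    "A": [4, 13, 5, 3, 0, 0, 0, 0, 17, 0, 6],
    "C": [4, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0],
    "G": [3, 3, 0, 0, 18, 0, 0, 0, 1, 4, 3],
    "T": [7, 1, 11, 15, 0, 18, 18, 18, 0, 13, 9],
}
BACKGROUND = 0.25   # assumed uniform base composition
PSEUDOCOUNT = 1.0   # assumed total pseudocount per column


def column_totals(counts):
    """Number of aligned sites contributing to each column."""
    width = len(next(iter(counts.values())))
    return [sum(counts[base][i] for base in counts) for i in range(width)]


def frequency_matrix(counts):
    """Convert raw counts to per-column frequencies."""
    totals = column_totals(counts)
    return {b: [c / t for c, t in zip(row, totals)] for b, row in counts.items()}


def weight_matrix(counts, background=BACKGROUND, pseudocount=PSEUDOCOUNT):
    """Convert counts to log-odds weights using a pseudocount-corrected frequency."""
    totals = column_totals(counts)
    return {
        b: [
            math.log(((c + pseudocount * background) / (t + pseudocount)) / background)
            for c, t in zip(row, totals)
        ]
        for b, row in counts.items()
    }


freqs = frequency_matrix(COUNTS)
weights = weight_matrix(COUNTS)
for base in "ACGT":
    print(base, " ".join(f"{w:5.1f}" for w in weights[base]))
```

The pseudocount keeps the weight finite for bases that were never observed at a position, which is why the zero-count cells have a weight of about -2.9 instead of minus infinity.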
22. Representing a motif Matrix
Sequence Logos
• A visual representation of the motif
• Each column of the matrix is represented as a stack of letters
whose size is proportional to the corresponding residue frequency
• The total height of each column is proportional to its
information content.
A 4 13 5 3 0 0 0 0 17 0 6
C 4 1 2 0 0 0 0 0 0 1 0
G 3 3 0 0 18 0 0 0 1 4 3
T 7 1 11 15 0 18 18 18 0 13 9
23. Information theory
Information Theory
• Information theory is a branch of applied mathematics concerned
with the quantification of information
• It has been applied to DNA motifs to determine the
amount of uncertainty at each position in a site
• Uncertainty is measured in bits, i.e. on a log2
scale.
• Information is a decrease in uncertainty
25. Information theory
Information theory
A 4 13 5 3 0 0 0 0 17 0 6
C 4 1 2 0 0 0 0 0 0 1 0
G 3 3 0 0 18 0 0 0 1 4 3
T 7 1 11 15 0 18 18 18 0 13 9
• 1 base occurs every time: 2 bits
• 2 bases each occur 50% of the time: 1 bit
• 4 bases occur equally often: 0 bits
Example
$I_i = 2 + \sum_{b} f_{b,i} \log_2 f_{b,i}$
$1 = 2 + 0.5 \times \log_2(0.5) + 0.5 \times \log_2(0.5)$
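A minimal sketch of the per-column information content, computed as 2 bits minus the Shannon uncertainty of the column, with absent bases contributing nothing to the sum. No small-sample correction is applied, which is a simplification; the function name is illustrative.

```python
import math


def information_content(column_counts):
    """Information (in bits) of one motif column given counts for A, C, G, T."""
    total = sum(column_counts)
    info = 2.0  # maximum for a four-letter alphabet: log2(4)
    for count in column_counts:
        if count > 0:  # a base that never occurs contributes 0 to the sum
            freq = count / total
            info += freq * math.log2(freq)
    return info


# The three reference cases from the slide.
one_base = information_content([18, 0, 0, 0])     # one base every time
two_bases = information_content([9, 9, 0, 0])     # two bases, 50% each
four_bases = information_content([5, 5, 5, 5])    # all four equally often

# Columns 1 and 5 of the example count matrix (order A, C, G, T).
column_1 = information_content([4, 4, 3, 7])
column_5 = information_content([0, 0, 18, 0])
print(one_base, two_bases, four_bases, round(column_1, 2), column_5)
```

In a sequence logo, the stack height at each position would correspond to this value, with each letter scaled by its frequency within the column.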
27. Information theory
Why do we want to find them?
Expression Microarrays
• Find co-regulated genes
• Suggest pathways
ChIP-seq / ChIP-chip
• Determine binding preferences
• Find co-factors
30. Information theory
Two Methods
Pattern Matching
Finding known motifs
• Does protein X bind upstream of my genes?
• Does it bind more than expected by chance?
e.g. Patser, Pscan, MAST ...
Pattern Discovery
Finding unknown motifs
• What motifs are upstream of my genes?
• What are these motifs?
e.g. MEME, Weeder, MDScan ...
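One common way to approach the question of whether a motif occurs more often than expected by chance is a hypergeometric test on promoter counts. The sketch below is a generic illustration and is not necessarily how Patser, Pscan or MAST assess significance; all the numbers in the example call are hypothetical.

```python
from math import comb


def hypergeom_tail(population, with_motif, sample, sample_with_motif):
    """P(X >= sample_with_motif) when drawing `sample` promoters at random.

    population        -- number of promoters in the background set
    with_motif        -- background promoters containing the motif
    sample            -- number of promoters in the gene list
    sample_with_motif -- promoters in the gene list containing the motif
    """
    total = comb(population, sample)
    upper = min(with_motif, sample)
    favourable = sum(
        comb(with_motif, k) * comb(population - with_motif, sample - k)
        for k in range(sample_with_motif, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 20000 promoters, 2000 with the motif,
# and 25 of 100 differentially expressed genes carrying it.
p_value = hypergeom_tail(20000, 2000, 100, 25)
print(f"p = {p_value:.3g}")
```

A small p-value suggests the motif is over-represented in the gene list relative to the chosen background, so the choice of background set matters a great deal.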
33. Databases of Motifs
Where can we find known motifs?
Online databases
• Multicellular Eukaryotes
  • JASPAR
  • TRANSFAC
  • PAZAR
• Yeast
  • YEASTRACT
  • SCPD
• Prokaryotes
  • RegulonDB
  • PRODORIC
• Other
  • UniPROBE
34. Finding known motifs
How do we find them?
TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA
CACATGTCTCATGTACTGGACCATGTCTAAGGGGTGTAAGGGTACTA
ACGAATCGTAGCATGTCCAGAGGTGCGGAGTACGTAAGGAGGGTGCC
CATACATGTCCGTTTCATATGAGCCTGCATTAATGTACCAACCTTCA
ACCATGTCTCAACATGTCGCGGGTGTGCCTCCACGTACGAGCCGGAA
GTCGACTCGCATGTCTGTCAGTATTATCCAAAGCATGTCGACCTCTT
CATGTCAGCGAACGCAAGATCTTCATATGAGCCTGCATTAATGTACC
38. Finding known motifs
Pattern Matching
A -0.1 1.0 0.1 -0.4 -2.9 -2.9 -2.9 -2.9 1.3 -2.9 0.3
C -0.1 -1.3 -0.7 -2.9 -2.9 -2.9 -2.9 -2.9 -2.9 -1.3 -2.9
G -0.4 -0.4 -2.9 -2.9 1.3 -2.9 -2.9 -2.9 -1.3 -0.1 -0.4
T 0.4 -1.3 0.9 1.2 -2.9 1.3 1.3 1.3 -2.9 1.0 0.7
TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA
The matrix is slid along the sequence one base at a time, and each
11-base window is scored against it:
Window 1: TATATTGTTTA
Window 2: ATATTGTTTAT
Window 3: TATTGTTTATT
42. Finding known motifs
Pattern Matching
Matching windows are shown in brackets:
TA[TATTGTTTATT]TTCATGACTTCATGTCGCATG[TATTGTTAATT]AA
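The sliding-window scoring can be sketched as follows, using the rounded log-odds matrix and the example sequence from the slides. The score threshold of 5.0 is an arbitrary assumption chosen for illustration; real tools such as RSAT matrix-scan or Patser derive thresholds or p-values differently, and usually scan both strands.

```python
WEIGHTS = {
    "A": [-0.1, 1.0, 0.1, -0.4, -2.9, -2.9, -2.9, -2.9, 1.3, -2.9, 0.3],
    "C": [-0.1, -1.3, -0.7, -2.9, -2.9, -2.9, -2.9, -2.9, -2.9, -1.3, -2.9],
    "G": [-0.4, -0.4, -2.9, -2.9, 1.3, -2.9, -2.9, -2.9, -1.3, -0.1, -0.4],
    "T": [0.4, -1.3, 0.9, 1.2, -2.9, 1.3, 1.3, 1.3, -2.9, 1.0, 0.7],
}
SEQUENCE = "TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA"
THRESHOLD = 5.0  # assumed cut-off, not taken from the talk


def scan(sequence, weights):
    """Score every window of the sequence against the weight matrix."""
    width = len(weights["A"])
    results = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # The window score is the sum of the weights of its bases, column by column.
        score = sum(weights[base][i] for i, base in enumerate(window))
        results.append((start, window, score))
    return results


hits = [
    (start, window, round(score, 1))
    for start, window, score in scan(SEQUENCE, WEIGHTS)
    if score >= THRESHOLD
]
for start, window, score in hits:
    print(f"position {start + 1}: {window} score {score}")
```

With this threshold, only the two windows highlighted on the slide are reported: TATTGTTTATT at position 3 and TATTGTTAATT at position 35 (1-based).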
43. Pattern Discovery
Introduction to de-novo motif finding
de-novo or ab-initio motif finding refers to finding motifs “from the
beginning”, i.e. without previous knowledge
Various Methods
• Word-based algorithms e.g. Oligo-Analysis, Weeder
• Expectation-Maximization methods e.g. MEME
• Gibbs sampling methods e.g. Gibbs sampler, MotifSampler
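To give a feel for the word-based approach, the sketch below simply counts every 6-mer in the seven example sequences from the "How do we find them?" slide. This is a deliberately simplified illustration: tools such as Oligo-Analysis and Weeder additionally compare observed counts with a background model and allow for mismatches, which is not done here.

```python
from collections import Counter

# Example sequences from the "Finding known motifs" slide.
SEQUENCES = [
    "TATATTGTTTATTTTCATGACTTCATGTCGCATGTATTGTTAATTAA",
    "CACATGTCTCATGTACTGGACCATGTCTAAGGGGTGTAAGGGTACTA",
    "ACGAATCGTAGCATGTCCAGAGGTGCGGAGTACGTAAGGAGGGTGCC",
    "CATACATGTCCGTTTCATATGAGCCTGCATTAATGTACCAACCTTCA",
    "ACCATGTCTCAACATGTCGCGGGTGTGCCTCCACGTACGAGCCGGAA",
    "GTCGACTCGCATGTCTGTCAGTATTATCCAAAGCATGTCGACCTCTT",
    "CATGTCAGCGAACGCAAGATCTTCATATGAGCCTGCATTAATGTACC",
]


def count_kmers(sequences, k):
    """Count every overlapping word of length k on the given strand."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


kmer_counts = count_kmers(SEQUENCES, 6)
top_kmer, top_count = kmer_counts.most_common(1)[0]
for kmer, count in kmer_counts.most_common(5):
    print(kmer, count)
```

In these sequences the most frequent 6-mer is CATGTC, which occurs 10 times and is present in all seven sequences. Raw counts alone do not show significance; that requires comparison against expected frequencies.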
45. Pattern Discovery
Guidelines
• If possible, remove repeat patterns from the target sequences
• Use multiple motif prediction algorithms.
• Run probabilistic algorithms multiple times
• Return multiple motifs
• Try a range of motif widths and expected number of sites
“... we do not recommend to trust pattern discovery
results with vertebrate genomes. ”
Jacques van Helden
56. Recommended Tools RSA Tools
Regulatory Sequence Analysis Tools
http://rsat.ulb.ac.be/rsat/
Modular computer programs specifically designed for the detection of
regulatory signals in non-coding sequences.
58. Recommended Tools RSA Tools
Regulatory Sequence Analysis Tools
Nature Protocols Series: Volume 3 No 10 2008
• Using RSAT to scan genome sequences for transcription factor binding
sites and cis-regulatory modules
• Using RSAT oligo-analysis and dyad-analysis tools to discover
regulatory signals in nucleic sequences
• Analyzing multiple data sets by interconnecting RSAT programs via
SOAP Web services - an example with ChIP-chip data
• Network Analysis Tools: from biological networks to clusters and
pathways
59. Recommended Tools RSA Tools
Example Workflow
Problem
I have some differentially expressed genes from a microarray
experiment. I would like to know whether p53 binds in their promoter
regions and, if so, where.
Workflow
• BioMart: Convert Gene IDs, if necessary
• RSAT: retrieve sequence
• JASPAR: Get PWM (MA0106.1)
• RSAT: matrix-scan
• RSAT: feature map
60. Recommended Tools Pscan
Pscan
“Finding over-represented transcription
factor binding site motifs in sequences from
co-regulated or co-expressed genes”
61. Recommended Tools Pscan
Example Workflow
Problem
I have some differentially expressed genes from a microarray
experiment. I would like to know which transcription factors bind to
their promoters.
Workflow
• BioMart: Convert Gene IDs, if necessary
• Pscan: retrieve sequence
70. Recommended Tools Galaxy
Galaxy
http://main.g2.bx.psu.edu
“Galaxy allows you to do analyses you cannot do anywhere
else without the need to install or download anything. You can
analyze multiple alignments, compare genomic annotations, profile
metagenomic samples and much much more...”
• Collection of online tools
• Modular
• Can create workflows
• Saved Histories
• Reproducible analysis
• Shared histories
• In house version
• Easily extendable
http://kinchie/galaxy
73. Recommended Tools MEME Suite
MEME Suite
Suite of web based tools for motif discovery
• MEME - de-novo motif finding
• MAST - find matches to known
motifs (MEME output)
• TOMTOM - compare motifs to
TRANSFAC and JASPAR
74. Further Reading
Further Reading
• Stormo GD. DNA binding sites: representation and discovery.
Bioinformatics. 2000 Jan;16(1):16-23. Review. PubMed PMID:
10812473.
• D’haeseleer P. How does DNA sequence motif discovery work?
Nat Biotechnol. 2006 Aug;24(8):959-61. Review. PubMed PMID:
16900144.
• Das MK, Dai HK. A survey of DNA motif finding algorithms. BMC
Bioinformatics. 2007 Nov 1;8 Suppl 7:S21. Review. PubMed
PMID: 18047721; PubMed Central PMCID: PMC2099490.
• Tompa M, Li N, et al. Assessing computational tools for the
discovery of transcription factor binding sites. Nat Biotechnol.
2005 Jan;23(1):137-44. PubMed PMID: 15637633.