
Multiple alignment


  1. Multiple Alignment
     Dr Avril Coghlan, alc@sanger.ac.uk
     Note: this talk contains animations which can only be seen by downloading it and using ‘View Slide Show’ in PowerPoint.
  2. Pairwise versus Multiple Alignment
     • So far we have considered the alignment of two sequences (‘pairwise alignment’):
         - Q K E S G P S S S Y C
           |   | | |           |
         V Q Q E S G L V R T T C
     • Alignment can also be performed between three or more sequences (‘multiple alignment’):
         - Q K E S G P S S S Y C
         V Q Q E S G L V R T T C
         V Q K E S L L V R S T C
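The vertical bars in these alignments mark identical residues. A minimal Python sketch that recomputes them is below. It assumes the first sequence carries one leading gap ('-') so that all three rows are 12 columns long; that gap placement is our reading of the flattened slide, and the function names are illustrative, not from any library.

```python
def identity_line(row_a, row_b):
    """Return a string with '|' under columns where two aligned rows share a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(row_a, row_b))


def conserved_columns(rows):
    """Return 1-based indices of columns where every row has the same residue (no gaps)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have the same length")
    return [i + 1 for i, column in enumerate(zip(*rows))
            if "-" not in column and len(set(column)) == 1]


# Rows from the slide; the leading '-' in s1 is an assumed gap placement.
s1 = "-QKESGPSSSYC"
s2 = "VQQESGLVRTTC"
s3 = "VQKESLLVRSTC"

print(s1)
print(identity_line(s1, s2))
print(s2)
print("identical pairs:", identity_line(s1, s2).count("|"))
print("columns conserved in all three:", conserved_columns([s1, s2, s3]))
```

On these rows the sketch finds 5 identical pairs between the first two sequences and 4 columns (2, 4, 5 and 12) that are identical in all three.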
  3. Multiple alignment
     • Multiple alignments are useful for comparing many homologous sequences at once
       [Figure: multiple alignment of part of Eyeless from different animals]
     • Multiple alignments can be global or local
       The majority of widely used programs for making multiple alignments (eg. CLUSTAL, T-COFFEE) create global multiple alignments, not local multiple alignments
       If the sequences share one stretch of high sequence similarity, it might make sense to make a multiple alignment of just that region of similarity, eg. for Eyeless
       You can “cut out” the region of similarity from each sequence and make a multiple alignment of that region, eg. using CLUSTAL
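Cutting out the shared region before aligning can be sketched in a few lines of Python. The sequences and coordinates below are made-up toy values, not real Eyeless data, and `cut_region` and `to_fasta` are hypothetical helper names. The trimmed sequences are written as FASTA because that is a common input format for multiple alignment programs such as CLUSTAL.

```python
def cut_region(sequences, regions):
    """Slice each sequence down to its region of similarity.

    sequences: dict name -> full sequence
    regions:   dict name -> (start, end), 1-based inclusive coordinates
    """
    trimmed = {}
    for name, seq in sequences.items():
        start, end = regions[name]
        trimmed[name] = seq[start - 1:end]
    return trimmed


def to_fasta(sequences, width=60):
    """Format a dict of sequences as FASTA text, wrapping lines at `width` residues."""
    lines = []
    for name, seq in sequences.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy sequences and coordinates (hypothetical, for illustration only).
toy = {"seqA": "MMMPAIRSKLLQ", "seqB": "GGPAIRSKLGG"}
regions = {"seqA": (4, 9), "seqB": (3, 8)}
print(to_fasta(cut_region(toy, regions)), end="")
```

The resulting text could then be saved to a file and given to an alignment program; the coordinates of the conserved region would in practice come from inspecting the sequences or from a local alignment.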
  4. Real data: Eyeless proteins
     Do you think it’s sensible to make a global multiple alignment of these sequences?
  5. The alignment is not very reliable in regions of low similarity: for example, look at the alignment of fly Eyeless to the other proteins here.
  6. • Algorithms for aligning 2 sequences (eg. N-W, S-W) can be extended to multiple sequences
       For aligning 3 sequences S1, S2 and S3 using N-W, we fill in a table T that is a 3D cube, using the recurrence relation:

$$
T(i,j,k) = \max
\begin{cases}
T(i-1,j-1,k-1) + \sigma(S_1(i),S_2(j)) + \sigma(S_1(i),S_3(k)) + \sigma(S_2(j),S_3(k)) \\
T(i-1,j,k) + 2g \\
T(i,j-1,k) + 2g \\
T(i,j,k-1) + 2g \\
T(i-1,j,k-1) + \sigma(S_1(i),S_3(k)) + 2g \\
T(i,j-1,k-1) + \sigma(S_2(j),S_3(k)) + 2g \\
T(i-1,j-1,k) + \sigma(S_1(i),S_2(j)) + 2g
\end{cases}
$$

       where σ(a, b) is the score for aligning residues a and b, and g is the gap penalty (added as a negative score). Each residue paired with a gap contributes g and a gap paired with a gap contributes nothing, so every move that leaves one or two of the sequences gapped adds 2g.
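A direct, unoptimised Python sketch of this recurrence follows. The match, mismatch and gap scores are hypothetical values chosen for illustration. The code fills the whole 3D table using the seven moves of the recurrence and then traces back one optimal alignment.

```python
import itertools

MATCH, MISMATCH, GAP = 1, -1, -2   # hypothetical scores: sigma(a, b) and gap penalty g

# Each move says which sequences contribute a residue (1) or a gap (0) to the new column.
MOVES = [m for m in itertools.product((0, 1), repeat=3) if any(m)]   # 7 moves


def sigma(a, b):
    return MATCH if a == b else MISMATCH


def column_score(chars):
    """Sum-of-pairs score of one column; None marks a gap, and gap-gap pairs score 0."""
    score = 0
    for a, b in itertools.combinations(chars, 2):
        if a is None and b is None:
            continue
        score += GAP if a is None or b is None else sigma(a, b)
    return score


def align3(s1, s2, s3):
    """Exact global alignment of three sequences with the 3D recurrence T(i, j, k)."""
    seqs = (s1, s2, s3)
    n1, n2, n3 = len(s1), len(s2), len(s3)
    T = [[[float("-inf")] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    back = {}
    T[0][0][0] = 0
    # Lexicographic order guarantees every predecessor cell is filled first.
    for i, j, k in itertools.product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if (i, j, k) == (0, 0, 0):
            continue
        for move in MOVES:
            prev = (i - move[0], j - move[1], k - move[2])
            if min(prev) < 0:
                continue
            # Residue of sequence s in this column is seqs[s][index - 1], else a gap.
            chars = [seqs[s][idx - 1] if move[s] else None
                     for s, idx in enumerate((i, j, k))]
            cand = T[prev[0]][prev[1]][prev[2]] + column_score(chars)
            if cand > T[i][j][k]:
                T[i][j][k] = cand
                back[(i, j, k)] = move
    # Trace back from the far corner to recover the aligned rows.
    rows = [[], [], []]
    cell = (n1, n2, n3)
    while cell != (0, 0, 0):
        move = back[cell]
        for s in range(3):
            rows[s].append(seqs[s][cell[s] - 1] if move[s] else "-")
        cell = tuple(c - m for c, m in zip(cell, move))
    return T[n1][n2][n3], ["".join(reversed(r)) for r in rows]


print(align3("QKESG", "VQQESG", "VQKESL"))
```

The triple loop visits (len1 + 1)(len2 + 1)(len3 + 1) cells and tries 7 moves at each, which is why this exact approach becomes impractical as sequences are added.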
  7. • The run-time increases exponentially with the number of sequences you want to align
       Aligning 4 sequences of 100 amino acids takes ~3 days!
     • Heuristic algorithms for multiple alignment are generally used, as they are fast, eg. CLUSTAL, T-COFFEE
       ‘Heuristic’ means they’re not guaranteed to find the best solution (here, the best alignment), whereas N-W and S-W are proven to find the best alignment
     • A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp at Trinity College Dublin (1988)
       It uses a ‘progressive alignment’ approach, ie. it aligns the 2 most similar sequences first, adds the next most similar sequence to that alignment, then adds the next most similar sequence, and so on.
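To see why exact alignment scales so badly, count the work for N sequences of equal length L: the table has (L+1)^N cells, each cell considers 2^N - 1 predecessor moves, and each move sums N(N-1)/2 pairwise scores. A small Python sketch of that arithmetic follows; it counts operations only, not seconds, and assumes all sequences have the same length.

```python
from math import comb


def dp_cost(length, n_seqs):
    """Rough size of exact N-sequence dynamic programming (sequences of equal length).

    cells: (length + 1) ** n_seqs entries in the N-dimensional table
    moves: 2 ** n_seqs - 1 predecessor cells examined per entry
    ops:   cells * moves * (n_seqs choose 2) pairwise scores summed (sum-of-pairs)
    """
    cells = (length + 1) ** n_seqs
    moves = 2 ** n_seqs - 1
    pairs = comb(n_seqs, 2)
    return cells, moves, cells * moves * pairs


for n in range(2, 6):
    cells, moves, ops = dp_cost(100, n)
    print(f"{n} sequences: {cells:,} cells, {moves} moves per cell, ~{ops:,} pair scores")
```

Each extra sequence multiplies the number of cells by about L and roughly doubles the moves per cell, which is the exponential growth the slide describes.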
  8. CLUSTAL
     • A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp at Trinity College Dublin (TCD) (1988)
       Cited >37,000 times; D. Higgins is Ireland’s most cited scientist
     • CLUSTAL makes a global multiple alignment using a ‘progressive alignment’ approach
     • It first computes all pairwise alignments and calculates the sequence similarity between each pair
     • These similarities are used to build a rough ‘guide tree’
       [Figure: guide tree with leaves S1, S2, S3, S4]
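The guide-tree step can be sketched with average-linkage (UPGMA-style) clustering: repeatedly join the two closest clusters, then set the distance from the new cluster to every other cluster to the size-weighted average of the old distances. This is a stand-in for illustration only; CLUSTAL's own tree-building method differs between versions (CLUSTAL W, for example, uses neighbour-joining). The distance values below are invented so that the tree has the shape ((S1, S2), (S3, S4)).

```python
def upgma_merge_order(dist):
    """Build a rough guide tree by average-linkage (UPGMA-style) clustering.

    dist maps frozenset({x, y}) -> distance between sequences x and y
    (for example 1 - fractional identity from the pairwise alignments).
    Returns the merges in order; each merge is a nested tuple (left, right).
    """
    size = {x: 1 for pair in dist for x in pair}   # cluster -> number of sequences
    d = dict(dist)
    merges = []
    while len(size) > 1:
        a, b = sorted(min(d, key=d.get), key=str)   # closest pair of clusters
        new = (a, b)
        merges.append(new)
        for c in size:
            if c not in (a, b):
                # Distance to the new cluster = size-weighted average of old distances.
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        # Forget distances that involve the two clusters just merged.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
    return merges


# Hypothetical pairwise distances between four sequences.
dist = {
    frozenset(("S1", "S2")): 0.3,
    frozenset(("S3", "S4")): 0.4,
    frozenset(("S1", "S3")): 0.7,
    frozenset(("S1", "S4")): 0.8,
    frozenset(("S2", "S3")): 0.6,
    frozenset(("S2", "S4")): 0.7,
}
for step, merge in enumerate(upgma_merge_order(dist), start=1):
    print(step, merge)
```

The merge order doubles as the order of the progressive alignment steps: first S1 with S2, then S3 with S4, then the two resulting profiles.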
  9. • Step 1: CLUSTAL then aligns the most similar pair of sequences
       This gives us an alignment of 2 sequences (called a ‘profile’), eg. alignment of sequences S1 and S2
     • Step 2: it aligns the next closest pair of sequences (or pair of profiles, or sequence and profile), eg. alignment of sequences S3 and S4
     • Step 3: it aligns the next closest pair of sequences/profiles, eg. alignment of profiles S1-S2 and S3-S4
       [Figure: guide tree ((S1, S2), (S3, S4)) with S1 = MQTIF, S2 = LHIW, S3 = LQSW, S4 = LSF. Step 1 gives the profile MQTIF / LH-IW; step 2 gives LQSW / L-SF; step 3 gives MQTIF / LH-IW / LQS-W / L-S-F.]
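Progressive alignment of profiles can be sketched as Needleman-Wunsch over columns: each profile is treated as a sequence of columns, two columns are scored by averaging the pairwise residue scores between them, and a gap is always inserted as a whole column. Because existing rows are only ever padded with new gap columns, gaps already present in a profile are never removed. The scores below are hypothetical and much simpler than CLUSTAL's real profile scoring (which, for example, uses amino-acid substitution matrices); the example reuses the slide's four sequences.

```python
MATCH, MISMATCH, GAP = 1, -1, -2   # hypothetical scores, not CLUSTAL's


def pair_score(a, b):
    """Score two characters; gap-gap pairs score 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(col1, col2):
    """Average pair score between every character of col1 and every character of col2."""
    total = sum(pair_score(a, b) for a in col1 for b in col2)
    return total / (len(col1) * len(col2))


def align_profiles(p1, p2):
    """Needleman-Wunsch over profile columns; gaps are only inserted as whole columns."""
    cols1, cols2 = list(zip(*p1)), list(zip(*p2))
    gap1, gap2 = ("-",) * len(p1), ("-",) * len(p2)
    n, m = len(cols1), len(cols2)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + column_score(cols1[i - 1], gap2)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + column_score(gap1, cols2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(cols1[i - 1], cols2[j - 1]),
                          F[i - 1][j] + column_score(cols1[i - 1], gap2),
                          F[i][j - 1] + column_score(gap1, cols2[j - 1]))
    # Traceback; ties prefer a gap column in p2, then in p1, then aligning two columns.
    out = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and F[i][j] == F[i - 1][j] + column_score(cols1[i - 1], gap2):
            out.append(cols1[i - 1] + gap2)
            i -= 1
        elif j > 0 and F[i][j] == F[i][j - 1] + column_score(gap1, cols2[j - 1]):
            out.append(gap1 + cols2[j - 1])
            j -= 1
        else:
            out.append(cols1[i - 1] + cols2[j - 1])
            i, j = i - 1, j - 1
    out.reverse()
    return ["".join(row) for row in zip(*out)]


step1 = align_profiles(["MQTIF"], ["LHIW"])   # S1 with S2
step2 = align_profiles(["LQSW"], ["LSF"])     # S3 with S4
step3 = align_profiles(step1, step2)          # profile S1-S2 with profile S3-S4
for row in step3:
    print(row)
```

With these toy scores, steps 1 and 2 reproduce the slide's profiles (LH-IW and L-SF), while step 3 places the new gap column after LQ rather than after LQS, which shows how the result of a heuristic depends on the scoring scheme. The first two rows come through step 3 unchanged.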
  10. • A property of this method is that gap creation is irreversible: ‘once a gap, always a gap’
       [Figure: the gaps in the profiles LH-IW and L-SF are kept when the two profiles are aligned, giving MQTIF / LH-IW / LQS-W / L-S-F]
     • This is a ‘heuristic algorithm’, ie. it is not guaranteed to give the best alignment
       However, it is very fast and works well in most cases
  11. Software for making alignments
     • For multiple alignment (heuristic programs):
       CLUSTAL http://www.ebi.ac.uk/Tools/msa/clustalw2/
       T-COFFEE http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
       MUSCLE http://www.ebi.ac.uk/Tools/msa/muscle/
       MAFFT http://mafft.cbrc.jp/alignment/software/
  12. Further Reading
     • Chapter 3 in Introduction to Computational Genomics, Cristianini & Hahn
     • Chapter 6 in Deonier et al., Computational Genome Analysis
     • Practical on multiple alignment in R in the Little Book of R for Bioinformatics: https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter5.html
