Multiple Alignment
Dr Avril Coghlan
email@example.com
Note: this talk contains animations which can only be seen by downloading and using ‘View Slide show’ in Powerpoint
Pairwise versus Multiple Alignment
• So far we have considered the alignment of two sequences (‘pairwise alignment’)
  [Figure: pairwise alignment of Q K E S G P S S S Y C with V Q Q E S G L V R T T C; identical residues are joined by vertical bars]
• Alignment can be performed between three or more sequences (‘multiple alignment’)
  [Figure: multiple alignment of Q K E S G P S S S Y C, V Q Q E S G L V R T T C and V Q K E S L L V R S T C; identical residues in neighbouring rows are joined by vertical bars]
Multiple alignment
• Multiple alignments are useful for comparing many homologous sequences at once
  [Figure: multiple alignment of part of Eyeless from different animals]
• Multiple alignments can be global or local
  The majority of widely used programs for making multiple alignments (e.g. CLUSTAL, T-COFFEE) create global multiple alignments, not local multiple alignments
  If the sequences share one stretch of high sequence similarity, it might make sense to make a multiple alignment of just that region of similarity (e.g. for Eyeless)
  You can “cut out” the region of similarity from each sequence, and make a multiple alignment of that region, e.g. using CLUSTAL
Real data: Eyeless proteins
Do you think it’s sensible to make a global multiple alignment of these sequences?
The alignment is not very reliable in regions of low similarity. For example, look at the alignment of fly Eyeless to the other proteins here.
• Algorithms for aligning 2 sequences (e.g. N-W, S-W) can be extended to multiple sequences
  For aligning 3 sequences using N-W, we fill in a table T that is a 3D cube, using the recurrence relation below, where g is the gap penalty:

$$
T(i,j,k) = \max \begin{cases}
T(i-1,\,j-1,\,k-1) + \sigma(S_1(i),S_2(j)) + \sigma(S_1(i),S_3(k)) + \sigma(S_2(j),S_3(k)) \\
T(i-1,\,j,\,k) + 2g \\
T(i,\,j-1,\,k) + 2g \\
T(i,\,j,\,k-1) + 2g \\
T(i-1,\,j,\,k-1) + \sigma(S_1(i),S_3(k)) + 2g \\
T(i,\,j-1,\,k-1) + \sigma(S_2(j),S_3(k)) + 2g \\
T(i-1,\,j-1,\,k) + \sigma(S_1(i),S_2(j)) + 2g
\end{cases}
$$
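The 3D recurrence on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of any alignment program: the function names `sigma` and `align3_score` and the scoring values (match +1, mismatch -1, gap penalty -2) are hypothetical choices for the example, and the function returns only the best score, with no traceback.

```python
from itertools import product


def sigma(a, b, match=1, mismatch=-1):
    """Toy substitution score (assumed values, not a real scoring matrix)."""
    return match if a == b else mismatch


def align3_score(s1, s2, s3, gap=-2):
    """Best sum-of-pairs score for a global alignment of three sequences."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg_inf = float("-inf")
    # T[i][j][k] = best score for aligning s1[:i], s2[:j], s3[:k]
    T = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    T[0][0][0] = 0
    # The 7 possible moves: each sequence either contributes a residue (1) or a gap (0)
    moves = [m for m in product((0, 1), repeat=3) if any(m)]

    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if i == 0 and j == 0 and k == 0:
            continue
        best = neg_inf
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            # Residues in this alignment column (None stands for a gap)
            column = (
                s1[i - 1] if di else None,
                s2[j - 1] if dj else None,
                s3[k - 1] if dk else None,
            )
            score = 0
            for a, b in ((0, 1), (0, 2), (1, 2)):
                x, y = column[a], column[b]
                if x is not None and y is not None:
                    score += sigma(x, y)   # residue against residue
                elif x is not None or y is not None:
                    score += gap           # residue against gap
                # gap against gap contributes nothing
            best = max(best, T[pi][pj][pk] + score)
        T[i][j][k] = best
    return T[n1][n2][n3]


if __name__ == "__main__":
    print(align3_score("QKESG", "QQESG", "QKESL"))
```

Enumerating the seven moves with `itertools.product` gives the same seven cases as the recurrence: a column with one residue costs two gap penalties, and a column with two residues costs one substitution score plus two gap penalties.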
• The run-time increases exponentially with the number of sequences you want to align
  Aligning 4 sequences of 100 amino acids takes ~3 days!
• Heuristic algorithms for multiple alignment (e.g. CLUSTAL, T-COFFEE) are generally used, as they are fast
  ‘Heuristic’ means they’re not guaranteed to find the best solution (here, the best alignment)
  (whereas N-W and S-W are proven to find the best alignment)
• A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp at Trinity College Dublin (1988)
  Uses a ‘progressive alignment’ approach, i.e. aligns the most similar 2 sequences first; adds the next most similar sequence to that alignment; adds the next most similar sequence; and so on
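The exponential growth in run-time can be made concrete by counting table lookups. The sketch below assumes the straightforward extension of N-W to N sequences of equal length L: the table has (L+1)^N cells and each cell looks at up to 2^N - 1 neighbouring cells. The function name `dp_work` is hypothetical, and the count says nothing about wall-clock time, which depends on the machine and the implementation.

```python
def dp_work(seq_len, n_seqs):
    """Rough number of cell lookups needed to fill an N-dimensional N-W table."""
    cells = (seq_len + 1) ** n_seqs   # one cell per combination of prefixes
    predecessors = 2 ** n_seqs - 1    # every non-empty subset of sequences can advance
    return cells * predecessors


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "sequences:", dp_work(100, n), "lookups")
```

Each extra sequence of 100 residues multiplies the work by a factor of about 200, which is why exact alignment stops being practical after a handful of sequences.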
CLUSTAL
• A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp at TCD (1988)
  Cited >37,000 times; D. Higgins is Ireland’s most cited scientist
• CLUSTAL makes a global multiple alignment using a ‘progressive alignment’ approach
• First computes all pairwise alignments and calculates sequence similarity between pairs
• These similarities are used to build a rough ‘guide tree’
  [Figure: guide tree with leaves S1, S2, S3, S4]
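The step from pairwise similarities to a guide tree can be sketched as follows. This is only an illustration: it uses a simple average-linkage clustering (UPGMA-style) as a stand-in for whatever tree-building method a given CLUSTAL version uses, the function names are hypothetical, and the similarity values are made up for the example rather than computed from real alignments.

```python
from itertools import combinations


def average_similarity(members_a, members_b, sim):
    """Mean pairwise similarity between two groups of sequences."""
    total = sum(sim[frozenset((a, b))] for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def build_guide_tree(names, sim):
    """Repeatedly join the two most similar clusters; return (tree, join order)."""
    clusters = [(name, [name]) for name in names]   # (subtree, leaf names)
    join_order = []
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_similarity(clusters[p[0]][1], clusters[p[1]][1], sim),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        join_order.append((tree_i, tree_j))
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0], join_order


# Hypothetical fractional identities between four sequences (illustrative only)
similarities = {
    frozenset(("S1", "S2")): 0.60,
    frozenset(("S3", "S4")): 0.50,
    frozenset(("S1", "S3")): 0.20,
    frozenset(("S1", "S4")): 0.10,
    frozenset(("S2", "S3")): 0.30,
    frozenset(("S2", "S4")): 0.20,
}

tree, order = build_guide_tree(["S1", "S2", "S3", "S4"], similarities)

if __name__ == "__main__":
    print(tree)
    print(order)
```

With these made-up values, S1 and S2 are joined first, then S3 and S4, and finally the two pairs, which is the order in which a progressive aligner would then build the alignment.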
• Step 1: aligns the most similar pair of sequences
  This gives us an alignment of 2 sequences (called a ‘profile’), e.g. alignment of sequences S1 and S2
• Step 2: aligns the next closest pair of sequences (or pair of profiles, or sequence and profile), e.g. alignment of sequences S3 and S4
• Step 3: aligns the next closest pair of sequences/profiles, e.g. alignment of profiles S1-S2 and S3-S4
  [Figure: progressive alignment of S1 = MQTIF, S2 = LHIW, S3 = LQSW, S4 = LSF]
  Step 1 (S1 with S2):
    MQTIF
    LH-IW
  Step 2 (S3 with S4):
    LQSW
    L-SF
  Step 3 (profile S1-S2 with profile S3-S4):
    MQTIF
    LH-IW
    LQS-W
    L-S-F
• A property of this method is that gap creation is irreversible: ‘once a gap, always a gap’
  [Figure: progressive alignment of S1 = MQTIF, S2 = LHIW, S3 = LQSW, S4 = LSF]
  Step 1 (S1 with S2):
    MQTIF
    LH-IW
  Step 2 (S3 with S4):
    LQSW
    L-SF
  Step 3 (profile S1-S2 with profile S3-S4):
    MQTIF
    LH-IW
    LQS-W
    L-S-F
• This is a ‘heuristic algorithm’, i.e. it is not guaranteed to give the best alignment
  However, it is very fast and works well in most cases
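The ‘once a gap, always a gap’ property follows from how profiles are combined: an existing profile is only ever changed by inserting whole gap columns, never by removing or moving gaps it already contains. The sketch below replays step 3 of the example on this slide. The function names are hypothetical, and the position of the new gap column is read off the slide rather than computed by a profile-profile alignment.

```python
def insert_gap_column(profile, pos):
    """Insert a gap at column `pos` in every row of a profile (list of equal-length rows)."""
    return [row[:pos] + "-" + row[pos:] for row in profile]


def merge_profiles(profile_a, profile_b):
    """Stack two profiles that already have the same number of columns."""
    width = len(profile_a[0])
    if any(len(row) != width for row in profile_a + profile_b):
        raise ValueError("profiles must have the same number of columns")
    return profile_a + profile_b


profile_12 = ["MQTIF", "LH-IW"]   # result of step 1 (S1 with S2)
profile_34 = ["LQSW", "L-SF"]     # result of step 2 (S3 with S4)

# Step 3: the S3-S4 profile needs one extra gap column to match the S1-S2 profile
final = merge_profiles(profile_12, insert_gap_column(profile_34, 3))

if __name__ == "__main__":
    print("\n".join(final))
```

The gaps created in steps 1 and 2 (in `LH-IW` and `L-SF`) are still present in the final alignment, even if a different placement would have scored better once all four sequences are considered together.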
Software for making alignments
• For multiple alignment (heuristic programs)
  CLUSTAL http://www.ebi.ac.uk/Tools/msa/clustalw2/
  T-COFFEE http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
  MUSCLE http://www.ebi.ac.uk/Tools/msa/muscle/
  MAFFT http://mafft.cbrc.jp/alignment/software/
Further Reading
• Chapter 3 in Introduction to Computational Genomics, Cristianini & Hahn
• Chapter 6 in Computational Genome Analysis, Deonier et al.
• Practical on multiple alignment in R in the Little Book of R for Bioinformatics: https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter5.html