Multiple alignment

Usage Rights

CC Attribution License

  • Sequence sources: Mouse: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENSMUST00000111083.1; Chicken: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENSGALT00000019805.3; Sea squirt: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENSCINT00000013350.2; Human Eyeless (PAX6): http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1; D. melanogaster Eyeless: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5. Aligned using clustalw, viewed in Jalview, saved as humanflyothers_clustal.png.
  • Image from www.cs.iastate.edu/~cs544/.../Multiple_Sequence_Alignment.ppt, slide 12. For the recurrence relation, see page 189 in Jones & Pevzner, ‘An introduction to bioinformatics algorithms’.
  • Image credit (Des Higgins): http://www.idaireland.com/_internal/cimg!0/52302eob2zw6kiy4ed60bl5ugmuau17 Image credit (Paul Sharp): http://www.biology.ed.ac.uk/people/homepages/images/pmsharp.jpg
  • Image credit: http://www.biomedcentral.com/content/figures/1471-2105-5-113-1-l.jpg

Multiple alignment: Presentation Transcript

  • Multiple Alignment. Dr Avril Coghlan, alc@sanger.ac.uk. Note: this talk contains animations which can only be seen by downloading the slides and using ‘View Slide show’ in Powerpoint.
  • Pairwise versus Multiple Alignment
    • So far we have considered the alignment of two sequences (‘pairwise alignment’):
        Q K E S G P S S S Y C
        V Q Q E S G L V R T T C
    • Alignment can also be performed between three or more sequences (‘multiple alignment’):
        Q K E S G P S S S Y C
        V Q Q E S G L V R T T C
        V Q K E S L L V R S T C
  • Multiple alignment
    • Multiple alignments are useful for comparing many homologous sequences at once.
      (Figure: multiple alignment of part of Eyeless from different animals)
    • Multiple alignments can be global or local.
      • Most widely used programs for making multiple alignments (e.g. CLUSTAL, T-COFFEE) create global multiple alignments, not local ones.
      • If the sequences share just one stretch of high sequence similarity, it might make more sense to make a multiple alignment of only that region, e.g. for Eyeless.
      • You can “cut out” the region of similarity from each sequence and make a multiple alignment of that region, e.g. using CLUSTAL.
  • Real data: Eyeless proteins. Do you think it’s sensible to make a global multiple alignment of these sequences?
  • The alignment is not very reliable in regions of low similarity; for example, look at the alignment of fly Eyeless to the other proteins here.
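One way to make "regions of low similarity" concrete is to score each alignment column by how many rows agree. The sketch below is only an illustration: the function name `column_conservation`, the scoring rule (fraction of rows carrying the most common residue, gaps not counted as residues) and the toy four-row alignment are assumptions, not something taken from the talk or from Jalview.

```python
from collections import Counter

def column_conservation(alignment):
    """Per-column fraction of rows that carry the most common residue.

    `alignment` is a list of equal-length gapped strings; '-' is a gap
    and is never counted as a residue.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [ch for ch in column if ch != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# Hypothetical toy alignment (four short gapped sequences).
toy = ["MQTIF", "LH-IW", "LQS-W", "L-S-F"]
scores = column_conservation(toy)
print(scores)
```

Columns with a low score are the ones where the placement of residues and gaps deserves the least trust.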
  • Algorithms for aligning 2 sequences (e.g. N-W, S-W) can be extended to multiple sequences. For aligning 3 sequences using N-W, we fill in a table T that is a 3D cube, using the recurrence relation:

    \[
    T(i,j,k) = \max
    \begin{cases}
    T(i-1,j-1,k-1) + \sigma(S_1(i),S_2(j)) + \sigma(S_1(i),S_3(k)) + \sigma(S_2(j),S_3(k)) \\
    T(i-1,j,k) + 2g \\
    T(i,j-1,k) + 2g \\
    T(i,j,k-1) + 2g \\
    T(i-1,j,k-1) + \sigma(S_1(i),S_3(k)) + 2g \\
    T(i,j-1,k-1) + \sigma(S_2(j),S_3(k)) + 2g \\
    T(i-1,j-1,k) + \sigma(S_1(i),S_2(j)) + 2g
    \end{cases}
    \]

    where σ scores a pair of aligned residues and g is the gap penalty.
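The recurrence above can be sketched directly as a triple loop over the cube. This is a minimal illustration under stated assumptions: the match/mismatch values in `sigma`, the gap penalty of -2, and the names `column_score` and `align3_score` are all invented for the example. Each column is scored as a sum over its three pairs, so a column with one residue costs two gap penalties and a column with two residues costs one σ term plus two gap penalties, as in the seven cases of the recurrence.

```python
from itertools import product

def sigma(a, b, match=1, mismatch=-1):
    """Toy substitution score (assumed values, not from the slides)."""
    return match if a == b else mismatch

def column_score(col, gap):
    """Sum-of-pairs score of one 3-row column; None marks a gap."""
    score = 0
    for x in range(3):
        for y in range(x + 1, 3):
            a, b = col[x], col[y]
            if a is None and b is None:
                continue          # gap against gap costs nothing
            if a is None or b is None:
                score += gap      # residue against gap
            else:
                score += sigma(a, b)
    return score

def align3_score(s1, s2, s3, gap=-2):
    """Best global alignment score of three sequences (score only)."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg_inf = float("-inf")
    T = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    T[0][0][0] = 0
    # The 7 moves: every non-empty choice of which sequences advance.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if i == j == k == 0:
                    continue
                best = neg_inf
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # move would step outside the cube
                    col = (s1[pi] if di else None,
                           s2[pj] if dj else None,
                           s3[pk] if dk else None)
                    best = max(best, T[pi][pj][pk] + column_score(col, gap))
                T[i][j][k] = best
    return T[n1][n2][n3]

print(align3_score("MQTIF", "LHIW", "LQSW"))
```

The sketch returns only the optimal score; recovering the alignment itself would need a traceback through the cube, which is omitted here to keep the example short.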
  • The run-time increases exponentially with the number of sequences you want to align.
    • Aligning 4 sequences of 100 amino acids takes ~3 days!
  • Heuristic algorithms for multiple alignment are generally used, as they are fast, e.g. CLUSTAL, T-COFFEE.
    • ‘Heuristic’ means they’re not guaranteed to find the best solution (here, the best alignment), whereas N-W & S-W are proven to find the best alignment.
  • A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp at Trinity College Dublin (1988).
    • It uses a ‘progressive alignment’ approach, i.e. it aligns the most similar 2 sequences first, adds the next most similar sequence to that alignment, then adds the next most similar sequence, and so on.
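The exponential growth can be seen by counting the work in the exact dynamic-programming approach. The sketch below only counts table cells and predecessor cells per cell; it makes no claim about wall-clock time, and the function names are invented for the illustration.

```python
def dp_table_cells(seq_len, n_seqs):
    """Cells in the table: one axis of length seq_len + 1 per sequence."""
    return (seq_len + 1) ** n_seqs

def predecessors_per_cell(n_seqs):
    """Each cell looks at every non-empty subset of sequences advancing."""
    return 2 ** n_seqs - 1

for n_seqs in (2, 3, 4):
    cells = dp_table_cells(100, n_seqs)
    preds = predecessors_per_cell(n_seqs)
    print(f"{n_seqs} sequences of length 100: {cells} cells, "
          f"{preds} predecessors per cell")
```

Each extra sequence of length 100 multiplies the number of cells by 101 and roughly doubles the number of predecessors examined per cell.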
  • CLUSTAL
    • A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp at TCD (1988).
      • Cited >37,000 times; D. Higgins is Ireland’s most cited scientist.
    • CLUSTAL makes a global multiple alignment using a ‘progressive alignment’ approach.
    • It first computes all pairwise alignments and calculates the sequence similarity between each pair.
    • These similarities are used to build a rough ‘guide tree’.
      (Figure: guide tree with leaves S1, S2, S3, S4)
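The two preparatory steps (all pairwise similarities, then a rough guide tree) can be sketched as follows. This is not CLUSTAL's actual procedure: the similarity used here is simply a Needleman-Wunsch score with assumed match/mismatch/gap values, the tree is built by a simple greedy average-linkage join, and the sequences A to D and all function names are made up for the illustration.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, two rows)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def average_similarity(x, y, clusters, sim):
    """Mean pairwise similarity between the leaves of two clusters."""
    pairs = [(p, q) for p in clusters[x] for q in clusters[y]]
    return sum(sim[frozenset(pq)] for pq in pairs) / len(pairs)

def guide_tree(seqs):
    """Greedy join order as nested tuples, e.g. (('A', 'B'), ('C', 'D'))."""
    sim = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
           for p in combinations(seqs, 2)}
    clusters = {name: [name] for name in seqs}  # tree node -> leaf names
    while len(clusters) > 1:
        x, y = max(combinations(clusters, 2),
                   key=lambda xy: average_similarity(xy[0], xy[1], clusters, sim))
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))

# Hypothetical sequences chosen so that A/B and C/D are obvious pairs.
seqs = {"A": "MQTIFLK", "B": "MQTIFLR", "C": "LSWPGAD", "D": "LSWPGEE"}
tree = guide_tree(seqs)
print(tree)
```

The nested tuple gives the order in which a progressive aligner would merge sequences and profiles: innermost pairs first, the outermost pair last.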
  • Step 1: CLUSTAL then aligns the most similar pair of sequences, e.g. sequences S1 and S2. This gives us an alignment of 2 sequences (called a ‘profile’).
  • Step 2: It aligns the next closest pair of sequences (or pair of profiles, or sequence and profile), e.g. sequences S3 and S4.
  • Step 3: It aligns the next closest pair of sequences/profiles, e.g. profiles S1-S2 and S3-S4.
    (Figure: progressive alignment of S1 = MQTIF, S2 = LHIW, S3 = LQSW, S4 = LSF)
      Step 1 (S1 with S2):
        MQTIF
        LH-IW
      Step 2 (S3 with S4):
        LQSW
        L-SF
      Step 3 (profile S1-S2 with profile S3-S4):
        MQTIF
        LH-IW
        LQS-W
        L-S-F
  • A property of this method is that gap creation is irreversible: ‘once a gap, always a gap’.
    (Figure: the same progressive alignment of S1 = MQTIF, S2 = LHIW, S3 = LQSW, S4 = LSF. Step 1 gives MQTIF / LH-IW, step 2 gives LQSW / L-SF, and step 3 gives MQTIF / LH-IW / LQS-W / L-S-F.)
  • This is a ‘heuristic algorithm’, i.e. it is not guaranteed to give the best alignment. However, it is very fast and works well in most cases.
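The ‘once a gap, always a gap’ property follows from how two profiles are merged: whole columns are aligned, so existing columns are never reopened and only new all-gap columns can be inserted. The sketch below illustrates this with a profile-to-profile Needleman-Wunsch that scores columns by sum of pairs. The scoring values and the names `col_score`, `align_profiles` and `strip_shared_gaps` are assumptions made for the example; CLUSTAL's real scoring is more elaborate, so the resulting alignment need not match the one on the slide.

```python
def col_score(col_a, col_b, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score between two profile columns (tuples of chars)."""
    total = 0
    for x in col_a:
        for y in col_b:
            if x == "-" and y == "-":
                continue                      # gap against gap is free
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total

def align_profiles(p, q):
    """Globally align two profiles (lists of equal-length gapped strings)."""
    cols_p, cols_q = list(zip(*p)), list(zip(*q))
    gap_p, gap_q = ("-",) * len(p), ("-",) * len(q)
    n, m = len(cols_p), len(cols_q)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = T[i - 1][0] + col_score(cols_p[i - 1], gap_q)
        back[i][0] = "up"
    for j in range(1, m + 1):
        T[0][j] = T[0][j - 1] + col_score(gap_p, cols_q[j - 1])
        back[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (T[i - 1][j - 1] + col_score(cols_p[i - 1], cols_q[j - 1]), "diag"),
                (T[i - 1][j] + col_score(cols_p[i - 1], gap_q), "up"),
                (T[i][j - 1] + col_score(gap_p, cols_q[j - 1]), "left"),
            ]
            T[i][j], back[i][j] = max(options, key=lambda o: o[0])
    merged, i, j = [], n, m
    while i or j:                             # trace back, emitting whole columns
        move = back[i][j]
        if move == "diag":
            merged.append(cols_p[i - 1] + cols_q[j - 1]); i -= 1; j -= 1
        elif move == "up":
            merged.append(cols_p[i - 1] + gap_q); i -= 1
        else:
            merged.append(gap_p + cols_q[j - 1]); j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]

def strip_shared_gaps(rows):
    """Drop columns that are gaps in every row."""
    cols = [c for c in zip(*rows) if any(ch != "-" for ch in c)]
    return ["".join(r) for r in zip(*cols)]

s1, s2, s3, s4 = "MQTIF", "LHIW", "LQSW", "LSF"
p12 = align_profiles([s1], [s2])              # step 1
p34 = align_profiles([s3], [s4])              # step 2
final = align_profiles(p12, p34)              # step 3
print("\n".join(final))
# The earlier profiles survive unchanged inside the final alignment.
print(strip_shared_gaps(final[:2]) == p12, strip_shared_gaps(final[2:]) == p34)
```

Because the traceback only ever emits a complete column of one profile, optionally padded with gaps for the other, a gap placed in step 1 or 2 cannot be removed in step 3.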
  • Software for making alignments
    • For multiple alignment (heuristic programs):
      • CLUSTAL: http://www.ebi.ac.uk/Tools/msa/clustalw2/
      • T-COFFEE: http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
      • MUSCLE: http://www.ebi.ac.uk/Tools/msa/muscle/
      • MAFFT: http://mafft.cbrc.jp/alignment/software/
  • Further Reading
    • Chapter 3 in Introduction to Computational Genomics, by Cristianini & Hahn
    • Chapter 6 in Computational Genome Analysis, by Deonier et al
    • Practical on multiple alignment in R in the Little Book of R for Bioinformatics: https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter5.html