SlideShare a Scribd company logo
1 of 66
Pairwise Sequence Alignment Misha Kapushesky Slides:  Stuart M. Brown, Fourie Joubert, NYU St. Petersburg Russia 2010
Protein Evolution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Definition ,[object Object],[object Object],ATTGCGC    ATTGCGC    ATCCGC C ATTGCGC AT-CCGC  ATTGCGC
Orthologous and paralogous ,[object Object],[object Object],[object Object]
Pairwise Alignment ,[object Object],[object Object],[object Object],[object Object],[object Object]
Methods of Alignment ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Align by Hand ,[object Object],[object Object],[object Object]
Percent Sequence Identity ,[object Object],A C  C  T G  A  G  –  A G  A C  G  T G  –  G  C  A G 70% identical mismatch indel
Dotplot: A              T         T         C            A             C        A             T         A             T A C A T T A C G T A C Sequence 1 Sequence 2 A dotplot gives an overview of all possible alignments
Dotplot: A              T         T         C            A             C        A             T         A             T A C A T T A C G T A C T A C A T T A C G T A C A T A C A C T T A Sequence 1 Sequence 2 One possible alignment: In a dotplot each diagonal corresponds to a possible (ungapped) alignment
Insertions / Deletions in a Dotplot T A C T G T C A T T  A  C  T  G  T  T  C  A  T Sequence 1 Sequence 2 T A C T G   -   T C A T | | | | |  | | | | T A C T G T T C A T
Hemoglobin   -chain Hemoglobin  -chain Dotplot   (Window = 130 /  Stringency = 9)
Word Size Algorithm T  A C G  G T A T G A C A  G T A T C T A  C G G  T A T G A  C A G  T A T C T A C  G G T  A T G A C  A G T  A T C T A C G  G T A  T G A C A  G T A  T C C         T         A       T          G       A               C   A              T A C G G T A T G Word Size = 3 
PTHPL ASKTQILPEDLA SEDLTI PTHPL AGERAIGLARLA EEDFGM Score = 7  PTH PLASKTQILPED LASEDLTI PTH PLAGERAIGLAR LAEEDFGM Score = 11  Matrix: PAM250 Window = 12  Stringency = 9 Scoring Matrix Filtering PTHP LASKTQILPEDL ASEDLTI PTHP LAGERAIGLARL AEEDFGM Score = 11  Window / Stringency
Dotplot   (Window = 18  /  Stringency = 10) Hemoglobin  -chain Hemoglobin   -chain
Considerations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Alignment methods ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Rocks game ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Dynamic Programming
Basic principles of dynamic programming ,[object Object],[object Object],[object Object]
Dynamic Programming ,[object Object],[object Object],[object Object],[object Object],[object Object]
Creation of an alignment path matrix Idea: Build up an optimal alignment using previous solutions for optimal alignments of smaller subsequences ,[object Object],[object Object],[object Object]
 
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Creation of an alignment path matrix
H  E  A  G  A  W  G  H  E  E  0  -8  -16  -24  -32  -40  -48  -56  -64  -72  -80  P  -8  -2  -9  -17  -25  -33  -42  -49  -57  -65  -73 A -16  -10  -3  -4  -12  -20  -28  -36  -44  -52  -60 W -24  -18  -11  -6  -7  -15  -5  -13  -21  -29  -37 H -32  -14  -18  -13  -8  -9  -13  -7  -3  -11  -19 E -40  -22  -8  -16  -16  -9  -12  -15  -7  3  -5 A -48  -30  -16  -3  -11  -11  -12  -12  -15  -5  2 E -56  -38  -24  -11  -6  -12  -14  -15  -12  -9  1 Backtracking -5 1 - A E E H H G - W W A A G - A P E - H - 0 -25 -5 -20 -13 -3 3 -8 -16 -17 Optimal global alignment: E E
Global vs.  Local  Alignments ,[object Object],[object Object]
 
needle  (Needleman & Wunsch)   creates an  end-to-end  alignment. Global Alignment Two closely related sequences:
Two sequences sharing several regions of local similarity: 1 AGGATTGGAATGCTCAGAAGCAGCTAAAGCGTGTATGCAGGATTGGAATTAAAGAGGAGGTAGACCG.... 67 1 AGGATTGGAATGCTAGGCTTGATTGCCTACCTGTAGCCACATCAGAAGCACTAAAGCGTCAGCGAGACCG  70 ||||||||||||||  |  |  | ||||  ||  | |  | ||   Global Alignment
Global Alignment (Needleman-Wunsch) ,[object Object],[object Object],[object Object],[object Object]
Local Alignment (Smith-Waterman) ,[object Object],[object Object],[object Object],[object Object]
Parameters of  Sequence  Alignment ,[object Object],[object Object],[object Object],[object Object],[object Object]
DNA Scoring Systems -very simple Sequence 1 Sequence 2 A G C T A 1 0 0 0 G 0 1 0 0 C 0 0 1 0 T 0 0 0 1 Match: 1 Mismatch: 0 Score = 5 actaccagttcatttgatacttctcaaa taccattaccgtgttaactgaaaggacttaaagact
Protein Scoring Systems PTHPLASKTQILPEDLASEDLTI PTHPLAGERAIGLARLAEEDFGM Sequence 1 Sequence 2 Scoring matrix T:G =  -2  T:T  =  5 Score =  48   C S T P A G N D . . C  9 S -1  4 T -1  1  5 P -3 -1 -1  7 A  0  1  0 -1  4 G -3  0 -2 -2  0  6 N -3  1  0 -2 -2  0  5 D -3  0 -1 -1 -2 -1  1  6  . .   C S T P A G N D . . C  9 S -1  4 T -1  1  5 P -3 -1 -1  7 A  0  1  0 -1  4 G -3  0 -2 -2  0  6 N -3  1  0 -2 -2  0  5 D -3  0 -1 -1 -2 -1  1  6  . .
[object Object],[object Object],C P G G A V I L M F Y W H K R E Q D N S T C SH S+S positive charged polar aliphatic aromatic small tiny hydrophobic Protein Scoring Systems
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Protein Scoring Systems
PAM matrices ,[object Object],[object Object],[object Object]
PAM  ( P ercent  A ccepted  M utations)  matrices ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
PAM  ( P ercent  A ccepted  M utations)  matrices ,[object Object],[object Object],[object Object],[object Object],[object Object]
A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z A  2 -2  0  0 -2  0  0  1 -1 -1 -2 -1 -1 -3  1  1  1 -6 -3  0  2  1  R  -2  6  0 -1 -4  1 -1 -3  2 -2 -3  3  0 -4  0  0 -1  2 -4 -2  1  2  N  0  0  2  2 -4  1  1  0  2 -2 -3  1 -2 -3  0  1  0 -4 -2 -2  4  3  D  0 -1  2  4 -5  2  3  1  1 -2 -4  0 -3 -6 -1  0  0 -7 -4 -2  5  4  C  -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3  0 -2 -8  0 -2 -3 -4  Q  0  1  1  2 -5  4  2 -1  3 -2 -2  1 -1 -5  0 -1 -1 -5 -4 -2  3  5  E  0 -1  1  3 -5  2  4  0  1 -2 -3  0 -2 -5 -1  0  0 -7 -4 -2  4  5  G  1 -3  0  1 -3 -1  0  5 -2 -3 -4 -2 -3 -5  0  1  0 -7 -5 -1  2  1  H  -1  2  2  1 -3  3  1 -2  6 -2 -2  0 -2 -2  0 -1 -1 -3  0 -2  3  3  I  -1 -2 -2 -2 -2 -2 -2 -3 -2  5  2 -2  2  1 -2 -1  0 -5 -1  4 -1 -1  L  -2 -3 -3 -4 -6 -2 -3 -4 -2  2  6 -3  4  2 -3 -3 -2 -2 -1  2 -2 -1  K  -1  3  1  0 -5  1  0 -2  0 -2 -3  5  0 -5 -1  0  0 -3 -4 -2  2  2  M  -1  0 -2 -3 -5 -1 -2 -3 -2  2  4  0  6  0 -2 -2 -1 -4 -2  2 -1  0  F  -3 -4 -3 -6 -4 -5 -5 -5 -2  1  2 -5  0  9 -5 -3 -3  0  7 -1 -3 -4  P  1  0  0 -1 -3  0 -1  0  0 -2 -3 -1 -2 -5  6  1  0 -6 -5 -1  1  1  S  1  0  1  0  0 -1  0  1 -1 -1 -3  0 -2 -3  1  2  1 -2 -3 -1  2  1  T  1 -1  0  0 -2 -1  0  0 -1  0 -2  0 -1 -3  0  1  3 -5 -3  0  2  1  W  -6  2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4  0 -6 -2 -5 17  0 -6 -4 -4  Y  -3 -4 -2 -4  0 -4 -4 -5  0 -1 -1 -4 -2  7 -5 -3 -3  0 10 -2 -2 -3  V  0 -2 -2 -2 -2 -2 -2 -1 -2  4  2 -2  2 -1 -1 -1  0 -6 -2  4  0  0  B  2  1  4  5 -3  3  4  2  3 -1 -2  2 -1 -3  1  2  2 -4 -2  0  6  5  Z  1  2  3  4 -4  5  5  1  3 -1 -1  2  0 -4  1  1  1 -4 -3  0  5  6   PAM 250 C -8 17 W W
PAM - limitations  ,[object Object],[object Object],[object Object]
BLOSUM matrices ,[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],A A C E C A - C  = 4 A - E  = 2 C - E  = 2 A - A  = 1 C - C  = 1 BLOSUM  ( Blo cks  Su bstitution  M atrix) A A C E C
The Blosum50 Scoring Matrix
BLOSUM  ( Blo cks  Su bstitution  M atrix) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
PAM Vs. BLOSUM ,[object Object],[object Object],[object Object],[object Object],[object Object],More distant sequences ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
TIPS on choosing a scoring matrix ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
T A T G T  G G  A A T G A  Scoring Insertions and Deletions A T G T  - -  A A T G C A A T G T A A T G C A T A T G T G G A A T G A  The creation of a gap is  penalized  with a negative score value. insertion / deletion
1 GTGATAGACACAGACCGGTGGCATTGTGG 29 |||  |  | |||  |  || || | 1 GTGTCGGGAAGAGATAACTCCGATGGTTG 29 Why Gap Penalties?   Gaps allowed but not penalized   Score:  88  Gaps not permitted  Score:  0 1 GTG.ATAG.ACACAGA..CCGGT..GGCATTGTGG 29 ||| || | | | |||  ||  |  |  || || | 1 GTGTAT.GGA.AGAGATACC..TCCG..ATGGTTG 29 Match = 5 Mismatch = -4
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Why Gap Penalties?
Gap Penalties   ,[object Object],[object Object],[object Object],[object Object],[object Object]
Scoring Insertions and Deletions A T G T T A T A C T A T G T G C G T A T A   Total Score:  4 Gap parameters: d  = 3 (gap opening) e  = 0.1 (gap extension) g  = 3 (gap lenght)  (g)  = -3 - (3 -1) 0.1 = -3.2   T A T G T G C G T A T A  A T G T - - - T A T A C insertion / deletion match = 1 mismatch = 0 Total Score: 8 - 3.2 = 4.8
Modification of Gap Penalties 1 V...LSPADKFLTNV 12 |  |||| |  | | 1 VFTELSPA.K..T.V 11 1 ...VLSPADKFLTNV 12 ||||  1 VFTELSPAKTV.... 11 gap opening penalty =  0 gap extension penalty =  0.1 score =  11.3 Score Matrix: BLOSUM62 gap opening penalty =  3 gap extension penalty =  0.1 score =  6.3
BLAST Algorithm Basic Local Alignment Search Tool ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],CCNDHRKMTCSPNDNNRK TTNDHRMTACSPDNNNKH CCNDHRKMTCSPNDNNRK YTNHHMMTTYSLDNNNKK more likely than
BLAST Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BLAST Algorithm Part 1  Removing Low-complexity Segments ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Removing Low-complexity Segments ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BLAST Algorithm Part 2 Harvesting k-tuples ,[object Object],[object Object],[object Object],S T S L S T S D K L M R STS TSL SLS LST
BLAST Algorithm Part 3 Finding High Scoring Triples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Finding High Scoring Triples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BLAST Algorithm Part 4 Seeding Possible Alignments ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Seeding Possible Alignments Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BLAST Algorithm Part 5 Generating High Scoring Pairs (HSPs) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Extending Alignment Regions Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BLAST Algorithm Part 6 Checking Statistical Significance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BLAST Algorithm Part 7 Reporting the Alignments ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

Viewers also liked

20071007 efficientalgorithms kulikov_lecture03
20071007 efficientalgorithms kulikov_lecture0320071007 efficientalgorithms kulikov_lecture03
20071007 efficientalgorithms kulikov_lecture03Computer Science Club
 
20081019 auctions nikolenko_lecture04
20081019 auctions nikolenko_lecture0420081019 auctions nikolenko_lecture04
20081019 auctions nikolenko_lecture04Computer Science Club
 
20091004 cryptoprotocols nikolenko_lecture04
20091004 cryptoprotocols nikolenko_lecture0420091004 cryptoprotocols nikolenko_lecture04
20091004 cryptoprotocols nikolenko_lecture04Computer Science Club
 
20090920 cryptoprotocols nikolenko_lecture01
20090920 cryptoprotocols nikolenko_lecture0120090920 cryptoprotocols nikolenko_lecture01
20090920 cryptoprotocols nikolenko_lecture01Computer Science Club
 
20071104 verification konev_lecture14
20071104 verification konev_lecture1420071104 verification konev_lecture14
20071104 verification konev_lecture14Computer Science Club
 
20100314 virtualization igotti_lecture06
20100314 virtualization igotti_lecture0620100314 virtualization igotti_lecture06
20100314 virtualization igotti_lecture06Computer Science Club
 
20090215 hardnessvsrandomness itsykson_lecture01
20090215 hardnessvsrandomness itsykson_lecture0120090215 hardnessvsrandomness itsykson_lecture01
20090215 hardnessvsrandomness itsykson_lecture01Computer Science Club
 
20100930 proof complexity_hirsch_lecture03
20100930 proof complexity_hirsch_lecture0320100930 proof complexity_hirsch_lecture03
20100930 proof complexity_hirsch_lecture03Computer Science Club
 

Viewers also liked (9)

20071007 efficientalgorithms kulikov_lecture03
20071007 efficientalgorithms kulikov_lecture0320071007 efficientalgorithms kulikov_lecture03
20071007 efficientalgorithms kulikov_lecture03
 
20081019 auctions nikolenko_lecture04
20081019 auctions nikolenko_lecture0420081019 auctions nikolenko_lecture04
20081019 auctions nikolenko_lecture04
 
20091004 cryptoprotocols nikolenko_lecture04
20091004 cryptoprotocols nikolenko_lecture0420091004 cryptoprotocols nikolenko_lecture04
20091004 cryptoprotocols nikolenko_lecture04
 
20090920 cryptoprotocols nikolenko_lecture01
20090920 cryptoprotocols nikolenko_lecture0120090920 cryptoprotocols nikolenko_lecture01
20090920 cryptoprotocols nikolenko_lecture01
 
20071104 verification konev_lecture14
20071104 verification konev_lecture1420071104 verification konev_lecture14
20071104 verification konev_lecture14
 
20100314 virtualization igotti_lecture06
20100314 virtualization igotti_lecture0620100314 virtualization igotti_lecture06
20100314 virtualization igotti_lecture06
 
20090215 hardnessvsrandomness itsykson_lecture01
20090215 hardnessvsrandomness itsykson_lecture0120090215 hardnessvsrandomness itsykson_lecture01
20090215 hardnessvsrandomness itsykson_lecture01
 
20100925 ontology konev_lecture02
20100925 ontology konev_lecture0220100925 ontology konev_lecture02
20100925 ontology konev_lecture02
 
20100930 proof complexity_hirsch_lecture03
20100930 proof complexity_hirsch_lecture0320100930 proof complexity_hirsch_lecture03
20100930 proof complexity_hirsch_lecture03
 

Similar to 20100515 bioinformatics kapushesky_lecture07

PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfssusera1eccd
 
LPEI_ZCNI_Poster
LPEI_ZCNI_PosterLPEI_ZCNI_Poster
LPEI_ZCNI_PosterLong Pei
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshitaHarshita Bhawsar
 
NeedlemanWunsch.pdf
NeedlemanWunsch.pdfNeedlemanWunsch.pdf
NeedlemanWunsch.pdfYogeshwari54
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignmentKubuldinho
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological dataKrish_ver2
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
Sequence Alignment - Data Bioinformatics Introduction
Sequence Alignment - Data Bioinformatics IntroductionSequence Alignment - Data Bioinformatics Introduction
Sequence Alignment - Data Bioinformatics IntroductionTenaAvdic
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence AlignmentRavi Gandham
 
Bioinformatica 20-10-2011-t3-scoring matrices
Bioinformatica 20-10-2011-t3-scoring matricesBioinformatica 20-10-2011-t3-scoring matrices
Bioinformatica 20-10-2011-t3-scoring matricesProf. Wim Van Criekinge
 
Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)Pritom Chaki
 

Similar to 20100515 bioinformatics kapushesky_lecture07 (20)

PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdf
 
Bioinformatics life sciences_v2015
Bioinformatics life sciences_v2015Bioinformatics life sciences_v2015
Bioinformatics life sciences_v2015
 
LPEI_ZCNI_Poster
LPEI_ZCNI_PosterLPEI_ZCNI_Poster
LPEI_ZCNI_Poster
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
 
Seq alignment
Seq alignment Seq alignment
Seq alignment
 
Dynamic programming
Dynamic programming Dynamic programming
Dynamic programming
 
Ch06 alignment
Ch06 alignmentCh06 alignment
Ch06 alignment
 
Alignments
AlignmentsAlignments
Alignments
 
NeedlemanWunsch.pdf
NeedlemanWunsch.pdfNeedlemanWunsch.pdf
NeedlemanWunsch.pdf
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignment
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Sequence Alignment - Data Bioinformatics Introduction
Sequence Alignment - Data Bioinformatics IntroductionSequence Alignment - Data Bioinformatics Introduction
Sequence Alignment - Data Bioinformatics Introduction
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Bioinformatica 20-10-2011-t3-scoring matrices
Bioinformatica 20-10-2011-t3-scoring matricesBioinformatica 20-10-2011-t3-scoring matrices
Bioinformatica 20-10-2011-t3-scoring matrices
 
Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)
 
Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014
 
Sequence alignment belgaum
Sequence alignment belgaumSequence alignment belgaum
Sequence alignment belgaum
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 

More from Computer Science Club

20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugsComputer Science Club
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugsComputer Science Club
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugsComputer Science Club
 
20140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture1220140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture12Computer Science Club
 
20140427 parallel programming_zlobin_lecture11
20140427 parallel programming_zlobin_lecture1120140427 parallel programming_zlobin_lecture11
20140427 parallel programming_zlobin_lecture11Computer Science Club
 
20140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture1020140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture10Computer Science Club
 
20140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture0920140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture09Computer Science Club
 
20140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture0220140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture02Computer Science Club
 
20140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture0120140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture01Computer Science Club
 
20140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-0420140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-04Computer Science Club
 
20140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture0120140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture01Computer Science Club
 

More from Computer Science Club (20)

20141223 kuznetsov distributed
20141223 kuznetsov distributed20141223 kuznetsov distributed
20141223 kuznetsov distributed
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs
 
20140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture1220140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture12
 
20140427 parallel programming_zlobin_lecture11
20140427 parallel programming_zlobin_lecture1120140427 parallel programming_zlobin_lecture11
20140427 parallel programming_zlobin_lecture11
 
20140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture1020140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture10
 
20140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture0920140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture09
 
20140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture0220140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture02
 
20140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture0120140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture01
 
20140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-0420140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-04
 
20140223-SuffixTrees-lecture01-03
20140223-SuffixTrees-lecture01-0320140223-SuffixTrees-lecture01-03
20140223-SuffixTrees-lecture01-03
 
20140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture0120140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture01
 
20131106 h10 lecture6_matiyasevich
20131106 h10 lecture6_matiyasevich20131106 h10 lecture6_matiyasevich
20131106 h10 lecture6_matiyasevich
 
20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich
 
20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich
 
20131013 h10 lecture4_matiyasevich
20131013 h10 lecture4_matiyasevich20131013 h10 lecture4_matiyasevich
20131013 h10 lecture4_matiyasevich
 
20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich
 
20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich
 

20100515 bioinformatics kapushesky_lecture07

  • 1. Pairwise Sequence Alignment Misha Kapushesky Slides: Stuart M. Brown, Fourie Joubert, NYU St. Petersburg Russia 2010
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. Dotplot: A     T     T     C    A     C    A     T     A     T A C A T T A C G T A C Sequence 1 Sequence 2 A dotplot gives an overview of all possible alignments
  • 10. Dotplot: A     T     T     C    A     C    A     T     A     T A C A T T A C G T A C T A C A T T A C G T A C A T A C A C T T A Sequence 1 Sequence 2 One possible alignment: In a dotplot each diagonal corresponds to a possible (ungapped) alignment
  • 11. Insertions / Deletions in a Dotplot T A C T G T C A T T A C T G T T C A T Sequence 1 Sequence 2 T A C T G - T C A T | | | | | | | | | T A C T G T T C A T
  • 12. Hemoglobin  -chain Hemoglobin  -chain Dotplot (Window = 130 / Stringency = 9)
  • 13. Word Size Algorithm T A C G G T A T G A C A G T A T C T A C G G T A T G A C A G T A T C T A C G G T A T G A C A G T A T C T A C G G T A T G A C A G T A T C C T A T  G A C A T A C G G T A T G Word Size = 3 
  • 14. PTHPL ASKTQILPEDLA SEDLTI PTHPL AGERAIGLARLA EEDFGM Score = 7 PTH PLASKTQILPED LASEDLTI PTH PLAGERAIGLAR LAEEDFGM Score = 11  Matrix: PAM250 Window = 12 Stringency = 9 Scoring Matrix Filtering PTHP LASKTQILPEDL ASEDLTI PTHP LAGERAIGLARL AEEDFGM Score = 11  Window / Stringency
  • 15. Dotplot (Window = 18 / Stringency = 10) Hemoglobin  -chain Hemoglobin  -chain
  • 16.
  • 17.
  • 18.
  • 20.
  • 21.
  • 22.
  • 23.  
  • 24.
  • 25. H E A G A W G H E E 0 -8 -16 -24 -32 -40 -48 -56 -64 -72 -80 P -8 -2 -9 -17 -25 -33 -42 -49 -57 -65 -73 A -16 -10 -3 -4 -12 -20 -28 -36 -44 -52 -60 W -24 -18 -11 -6 -7 -15 -5 -13 -21 -29 -37 H -32 -14 -18 -13 -8 -9 -13 -7 -3 -11 -19 E -40 -22 -8 -16 -16 -9 -12 -15 -7 3 -5 A -48 -30 -16 -3 -11 -11 -12 -12 -15 -5 2 E -56 -38 -24 -11 -6 -12 -14 -15 -12 -9 1 Backtracking -5 1 - A E E H H G - W W A A G - A P E - H - 0 -25 -5 -20 -13 -3 3 -8 -16 -17 Optimal global alignment: E E
  • 26.
  • 27.  
  • 28. needle (Needleman & Wunsch) creates an end-to-end alignment. Global Alignment Two closely related sequences:
  • 29. Two sequences sharing several regions of local similarity: 1 AGGATTGGAATGCTCAGAAGCAGCTAAAGCGTGTATGCAGGATTGGAATTAAAGAGGAGGTAGACCG.... 67 1 AGGATTGGAATGCTAGGCTTGATTGCCTACCTGTAGCCACATCAGAAGCACTAAAGCGTCAGCGAGACCG 70 |||||||||||||| | | | |||| || | | | || Global Alignment
  • 30.
  • 31.
  • 32.
  • 33. DNA Scoring Systems -very simple Sequence 1 Sequence 2 A G C T A 1 0 0 0 G 0 1 0 0 C 0 0 1 0 T 0 0 0 1 Match: 1 Mismatch: 0 Score = 5 actaccagttcatttgatacttctcaaa taccattaccgtgttaactgaaaggacttaaagact
  • 34. Protein Scoring Systems PTHPLASKTQILPEDLASEDLTI PTHPLAGERAIGLARLAEEDFGM Sequence 1 Sequence 2 Scoring matrix T:G = -2 T:T = 5 Score = 48 C S T P A G N D . . C 9 S -1 4 T -1 1 5 P -3 -1 -1 7 A 0 1 0 -1 4 G -3 0 -2 -2 0 6 N -3 1 0 -2 -2 0 5 D -3 0 -1 -1 -2 -1 1 6 . . C S T P A G N D . . C 9 S -1 4 T -1 1 5 P -3 -1 -1 7 A 0 1 0 -1 4 G -3 0 -2 -2 0 6 N -3 1 0 -2 -2 0 5 D -3 0 -1 -1 -2 -1 1 6 . .
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. A R N D C Q E G H I L K M F P S T W Y V B Z A 2 -2 0 0 -2 0 0 1 -1 -1 -2 -1 -1 -3 1 1 1 -6 -3 0 2 1 R -2 6 0 -1 -4 1 -1 -3 2 -2 -3 3 0 -4 0 0 -1 2 -4 -2 1 2 N 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -3 0 1 0 -4 -2 -2 4 3 D 0 -1 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 -1 0 0 -7 -4 -2 5 4 C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3 0 -2 -8 0 -2 -3 -4 Q 0 1 1 2 -5 4 2 -1 3 -2 -2 1 -1 -5 0 -1 -1 -5 -4 -2 3 5 E 0 -1 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 -1 0 0 -7 -4 -2 4 5 G 1 -3 0 1 -3 -1 0 5 -2 -3 -4 -2 -3 -5 0 1 0 -7 -5 -1 2 1 H -1 2 2 1 -3 3 1 -2 6 -2 -2 0 -2 -2 0 -1 -1 -3 0 -2 3 3 I -1 -2 -2 -2 -2 -2 -2 -3 -2 5 2 -2 2 1 -2 -1 0 -5 -1 4 -1 -1 L -2 -3 -3 -4 -6 -2 -3 -4 -2 2 6 -3 4 2 -3 -3 -2 -2 -1 2 -2 -1 K -1 3 1 0 -5 1 0 -2 0 -2 -3 5 0 -5 -1 0 0 -3 -4 -2 2 2 M -1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0 6 0 -2 -2 -1 -4 -2 2 -1 0 F -3 -4 -3 -6 -4 -5 -5 -5 -2 1 2 -5 0 9 -5 -3 -3 0 7 -1 -3 -4 P 1 0 0 -1 -3 0 -1 0 0 -2 -3 -1 -2 -5 6 1 0 -6 -5 -1 1 1 S 1 0 1 0 0 -1 0 1 -1 -1 -3 0 -2 -3 1 2 1 -2 -3 -1 2 1 T 1 -1 0 0 -2 -1 0 0 -1 0 -2 0 -1 -3 0 1 3 -5 -3 0 2 1 W -6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -4 -4 Y -3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4 -2 7 -5 -3 -3 0 10 -2 -2 -3 V 0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2 2 -1 -1 -1 0 -6 -2 4 0 0 B 2 1 4 5 -3 3 4 2 3 -1 -2 2 -1 -3 1 2 2 -4 -2 0 6 5 Z 1 2 3 4 -4 5 5 1 3 -1 -1 2 0 -4 1 1 1 -4 -3 0 5 6 PAM 250 C -8 17 W W
  • 41.
  • 42.
  • 43.
  • 45.
  • 46.
  • 47.
  • 48. T A T G T G G A A T G A Scoring Insertions and Deletions A T G T - - A A T G C A A T G T A A T G C A T A T G T G G A A T G A The creation of a gap is penalized with a negative score value. insertion / deletion
  • 49. 1 GTGATAGACACAGACCGGTGGCATTGTGG 29 ||| | | ||| | || || | 1 GTGTCGGGAAGAGATAACTCCGATGGTTG 29 Why Gap Penalties? Gaps allowed but not penalized Score: 88 Gaps not permitted Score: 0 1 GTG.ATAG.ACACAGA..CCGGT..GGCATTGTGG 29 ||| || | | | ||| || | | || || | 1 GTGTAT.GGA.AGAGATACC..TCCG..ATGGTTG 29 Match = 5 Mismatch = -4
  • 50.
  • 51.
  • 52. Scoring Insertions and Deletions A T G T T A T A C T A T G T G C G T A T A Total Score: 4 Gap parameters: d = 3 (gap opening) e = 0.1 (gap extension) g = 3 (gap lenght)  (g) = -3 - (3 -1) 0.1 = -3.2 T A T G T G C G T A T A A T G T - - - T A T A C insertion / deletion match = 1 mismatch = 0 Total Score: 8 - 3.2 = 4.8
  • 53. Modification of Gap Penalties 1 V...LSPADKFLTNV 12 | |||| | | | 1 VFTELSPA.K..T.V 11 1 ...VLSPADKFLTNV 12 |||| 1 VFTELSPAKTV.... 11 gap opening penalty = 0 gap extension penalty = 0.1 score = 11.3 Score Matrix: BLOSUM62 gap opening penalty = 3 gap extension penalty = 0.1 score = 6.3
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.

Editor's Notes

  1. Protein sequencce comparison and protein evolution ISMB 2000