Introduction to Bioinformatics
Multiple Sequence Alignment
Why Multiple Sequence Alignment?
• Up until now we have only
tried to align two sequences.
• What about more than two?
And what for?
• A faint similarity between two
sequences becomes significant
if present in many
• Multiple alignments can
reveal subtle similarities that
pairwise alignments do not
reveal
V T I S C T G S S S N I G
V T L T C T G S S S N I G
V T L S C S S S G F I F S
V T L T C T V S G T S F D
V T I T C V V S D V S H E
V T L V C L I S D F Y P G
V T L V C L I S D F Y P G
V T L V C L V S D Y F P E
Multiple Sequence Alignment (msa)
VTISCTGSSSNIGAGNHVKWYQQLPG
VTISCTGTSSNIGSITVNWYQQLPG
LRLSCSSSGFIFSSYAMYWVRQAPG
LSLTCTVSGTSFDDYYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG
ATLVCLISDFYPGAVTVAWKADS
ATLVCLISDFYPGAVTVAWKADS
AALGCLVKDYFPEPVTVSWNSG-
VSLTCLVKGFYPSDIAVEWESNG-
• Goal: Bring the greatest number of similar
characters into the same column of the alignment
• Similar to alignment of two sequences.
Multiple Sequence Alignment: Motivation
• Correspondence. Find out which parts “do the same thing”
– Similar genes are conserved across widely divergent species,
often performing similar functions
• Structure prediction
– Use knowledge of structure of one or more members of a
protein MSA to predict structure of other members
– Structure is more conserved than sequence
• Create “profiles” for protein families
– Allow us to search for other members of the family
• Genome assembly: Automated reconstruction of “contig”
maps of genomic fragments such as ESTs
• msa is the starting point for phylogenetic analysis
• msa can often detect weakly conserved regions that
pairwise alignment cannot
Multiple Sequence Alignment: Approaches
• Optimal Global Alignments
– Generalization of dynamic programming
– Find the alignment that maximizes a score function
– Computationally expensive: time grows as the product
of the sequence lengths
• Global Progressive Alignments: match closely related
sequences first, using a guide tree
• Global Iterative Alignments: repeated re-building
attempts to find the best alignment
• Local alignments
– Profile analysis
– Block analysis
– Pattern searching and/or statistical methods
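To get a feel for the cost of the optimal approach, the rough sketch below counts the cells of a full dynamic programming table, which is the product of (length + 1) over all sequences. The function name `dp_table_cells` and the length of 100 are illustrative assumptions, not part of the slides.

```python
from math import prod


def dp_table_cells(lengths):
    """Cells in a full k-dimensional DP table.

    Each axis has length + 1 entries because index 0 stands for the empty prefix.
    """
    return prod(n + 1 for n in lengths)


# Illustrative only: k sequences of 100 residues each.
for k in (2, 3, 4, 5):
    print(k, "sequences:", dp_table_cells([100] * k), "cells")
```

Each extra sequence multiplies the table size by roughly its length, which is why the optimal method is usually limited to a handful of sequences.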
Global msa: Challenges
• Computationally Expensive
– If the msa includes matches, mismatches and gaps, and also
accounts for the degree of variation, then global msa can be
applied to only a few sequences
• Difficult to score
– Multiple comparison necessary in each column of the msa for
a cumulative score
– Placement of gaps and scoring of substitution is more difficult
• Difficulty increases with diversity
– Relatively easy for a set of closely related sequences
– Identifying the correct ancestry relationships for a set of
distantly related sequences is more challenging
– Even more difficult if some members are more alike than
others
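The slides do not name a scoring scheme. One common choice is the sum-of-pairs score, where every pair of characters in a column is compared and the results are added up. The sketch below assumes that scheme; the match, mismatch and gap values are arbitrary illustrative numbers, and the function names are hypothetical.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters; the values are illustrative assumptions."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other are not penalised here
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def column_score(column):
    """Sum-of-pairs score of a single alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def alignment_score(rows):
    """Cumulative score of an alignment given as equal-length gapped strings."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(col) for col in zip(*rows))


print(column_score("CCC"))                       # fully conserved column
print(alignment_score(["VSN-S", "-SNA-", "---AS"]))
```

A column with n sequences needs n(n-1)/2 comparisons, which is the "multiple comparison in each column" cost mentioned above.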
Global msa: Dynamic Programming
• The two-sequence alignment algorithm (Needleman-Wunsch)
can be generalized to any number of sequences.
• E.g., for three sequences X, Y, W
define C[i,j,k] = score of the optimum alignment
among X[1..i], Y[1..j], W[1..k]
• As for two sequences, divide possible alignments into
different classes, depending on how they end.
– Devise recurrence relations for C[i,j,k]
– C[i,j,k] is the maximum out of all possibilities
msa for 3 sequences: alignment can end in 7 ways
• An alignment of X[1..i], Y[1..j], W[1..k] can end with any of
these last columns (rows: X, Y, W):
X:  Xi  -   Xi  Xi  -   -   Xi
Y:  Yj  Yj  -   Yj  -   Yj  -
W:  Wk  Wk  Wk  -   Wk  -   -
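The seven endings can be enumerated mechanically: each sequence either contributes its last residue or a gap, and the all-gap column is excluded. The sketch below is a minimal illustration; `last_column_moves` is a hypothetical helper name.

```python
from itertools import product


def last_column_moves(k):
    """All ways to fill the last column for k sequences.

    1 means the sequence contributes a residue, 0 means a gap.
    The all-gap column is excluded, leaving 2**k - 1 possibilities.
    """
    return [m for m in product((1, 0), repeat=k) if any(m)]


labels = ("Xi", "Yj", "Wk")
for move in last_column_moves(3):
    print(tuple(name if used else "-" for name, used in zip(labels, move)))
```

The same function gives 3 endings for two sequences, matching the pairwise case.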
Aligning Three Sequences
• Same strategy as
aligning two sequences
• Use a 3-D “Manhattan
Cube”, with each axis
representing a sequence
to align
[Figure: 2-D edit graph with axes V and W; 3-D edit graph with axes V, W and X]
Dynamic programming for 3 sequences
V S N - S
- S N A -
- - - A S
• Each alignment is a path through the
dynamic programming matrix
[Figure: 3-D dynamic programming matrix with axes V S N S, S N A and A S, and a path beginning at Start]
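To make the "alignment is a path" idea concrete, the sketch below turns the three-row alignment of VSNS, SNA and AS into the sequence of lattice points it visits. Each column advances the coordinate of every sequence that contributes a residue. The function name `alignment_to_path` is an assumption for illustration.

```python
def alignment_to_path(rows, gap="-"):
    """Convert gapped rows of equal length into the lattice points they visit."""
    position = [0] * len(rows)
    path = [tuple(position)]          # the path begins at the Start corner
    for column in zip(*rows):
        # Advance along an axis only where that sequence has a residue.
        position = [p + (c != gap) for p, c in zip(position, column)]
        path.append(tuple(position))
    return path


path = alignment_to_path(["VSN-S", "-SNA-", "---AS"])
for point in path:
    print(point)
```

The final point is (4, 3, 2), the lengths of the three sequences, so the path runs from one corner of the matrix to the opposite one.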
2-D cell versus 3-D Alignment Cell
• In 2-D, 3 edges in each unit square: into C(i,j) from
C(i-1,j-1), C(i-1,j), C(i,j-1)
• In 3-D, 7 edges in each unit cube: into C(i,j,k) from
C(i-1,j-1,k-1), C(i-1,j-1,k), C(i-1,j,k-1), C(i,j-1,k-1),
C(i-1,j,k), C(i,j-1,k), C(i,j,k-1)
• Enumerate all possibilities and choose the best one
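Multiple Alignment: Dynamic Programming
$$
s_{i,j,k} = \max
\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no in/dels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one in/del} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one in/del} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one in/del} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two in/dels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two in/dels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two in/dels}
\end{cases}
$$
• δ(x, y, z) is an entry in the 3-D scoring matrix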
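The recurrence can be sketched directly as a triple loop over the 3-D table. This is a minimal illustration, not a reference implementation: the column score `delta` below is assumed to be a sum of pairwise scores with arbitrary match, mismatch and gap values, since the slides leave the scoring matrix unspecified, and only the optimal score is returned (no traceback).

```python
from itertools import combinations, product


def delta(a, b, c, match=1, mismatch=-1, gap=-2):
    """Illustrative column score: sum of pairwise scores (an assumption)."""
    total = 0
    for x, y in combinations((a, b, c), 2):
        if x == "-" and y == "-":
            continue  # gap against gap contributes nothing
        total += gap if "-" in (x, y) else (match if x == y else mismatch)
    return total


# The 7 predecessor moves: 1 = consume a residue, 0 = insert a gap.
MOVES = [m for m in product((1, 0), repeat=3) if any(m)]


def align3_score(v, w, u):
    """Optimal global alignment score of three sequences."""
    n, m, p = len(v), len(w), len(u)
    neg_inf = float("-inf")
    s = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    s[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in MOVES:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor lies outside the table
                    column = (
                        v[pi] if di else "-",
                        w[pj] if dj else "-",
                        u[pk] if dk else "-",
                    )
                    candidate = s[pi][pj][pk] + delta(*column)
                    if candidate > s[i][j][k]:
                        s[i][j][k] = candidate
    return s[n][m][p]


print(align3_score("VSNS", "SNA", "AS"))
```

The table has (n+1)(m+1)(p+1) cells and each cell checks 7 predecessors, so the running time is proportional to the product of the three lengths.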
• Reading Materials
– David W. Mount, "Bioinformatics: Sequence and Genome
Analysis", Chapter 5
• 2nd Edition: Pages 170-194
• 1st Edition: Pages 140-165
– Cédric Notredame, Desmond G. Higgins and Jaap Heringa,
"T-Coffee: a novel method for fast and accurate multiple
sequence alignment", Journal of Molecular Biology, Volume
302, Issue 1, 8 September 2000, Pages 205-217
– Christopher Lee, Catherine Grasso and Mark F. Sharlow,
"Multiple sequence alignment using partial order graphs",
Bioinformatics, Vol. 18, no. 3, 2002, Pages 452-464
– Cédric Notredame and Desmond G. Higgins, "SAGA: sequence
alignment by genetic algorithm", Nucleic Acids Res., 1996 Apr
15;24(8):1515-24
