Introduction to Bioinformatics
Multiple Sequence Alignment
Why Multiple Sequence Alignment?
• Up until now we have only aligned two sequences.
• What about more than two? And what for?
• A faint similarity between two sequences becomes significant if it is present in many.
• Multiple alignments can reveal subtle similarities that pairwise alignments do not.
V T I S C T G S S S N I G
V T L T C T G S S S N I G
V T L S C S S S G F I F S
V T L T C T V S G T S F D
V T I T C V V S D V S H E
V T L V C L I S D F Y P G
V T L V C L I S D F Y P G
V T L V C L V S D Y F P E
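A minimal sketch of why the block above is convincing as a group: it scores each column by the fraction of rows that share that column's most common residue. The rows are transcribed from the block with spaces removed; `column_conservation` and `fully_conserved_columns` are hypothetical helper names, and plain identity is only one of many possible conservation measures.

```python
from collections import Counter

# The eight rows from the example above (13 columns each, no gaps).
ROWS = [
    "VTISCTGSSSNIG",
    "VTLTCTGSSSNIG",
    "VTLSCSSSGFIFS",
    "VTLTCTVSGTSFD",
    "VTITCVVSDVSHE",
    "VTLVCLISDFYPG",
    "VTLVCLISDFYPG",
    "VTLVCLVSDYFPE",
]

def column_conservation(rows):
    """For each column, the fraction of rows sharing the most common residue."""
    n_rows = len(rows)
    fractions = []
    for column in zip(*rows):  # zip(*rows) walks the alignment column by column
        _, top_count = Counter(column).most_common(1)[0]
        fractions.append(top_count / n_rows)
    return fractions

def fully_conserved_columns(rows):
    """0-based indices of columns in which every row has the same residue."""
    return [i for i, f in enumerate(column_conservation(rows)) if f == 1.0]

if __name__ == "__main__":
    for i, f in enumerate(column_conservation(ROWS)):
        print(f"column {i:2d}: {f:.2f}")
    print("fully conserved:", fully_conserved_columns(ROWS))
```

Any two rows alone share many residues by chance; it is the agreement of all eight rows at the same positions (the V, T, C and S columns) that makes the similarity stand out.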
Multiple Sequence Alignment (msa)
VTISCTGSSSNIGAGNHVKWYQQLPG
VTISCTGTSSNIGSITVNWYQQLPG
LRLSCSSSGFIFSSYAMYWVRQAPG
LSLTCTVSGTSFDDYYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG
ATLVCLISDFYPGAVTVAWKADS
ATLVCLISDFYPGAVTVAWKADS
AALGCLVKDYFPEPVTVSWNSG-
VSLTCLVKGFYPSDIAVEWESNG-
• Goal: bring the greatest number of similar characters into the same column of the alignment
• The same idea as aligning two sequences, extended to many
Multiple Sequence Alignment: Motivation
• Correspondence: find out which parts “do the same thing”
– Similar genes are conserved across widely divergent species, often performing similar functions
• Structure prediction
– Use knowledge of the structure of one or more members of a protein msa to predict the structure of the other members
– Structure is more conserved than sequence
• Create “profiles” for protein families
– Profiles allow us to search for other members of the family
• Genome assembly: automated reconstruction of “contig” maps of genomic fragments such as ESTs
• An msa is the starting point for phylogenetic analysis
• An msa often makes it possible to detect weakly conserved regions that pairwise alignment cannot
Multiple Sequence Alignment: Approaches
• Optimal Global Alignments
– Generalization of dynamic programming
– Find the alignment that maximizes a score function
– Computationally expensive: time grows as the product of the sequence lengths
• Global Progressive Alignments: match closely related sequences first, using a guide tree
• Global Iterative Alignments: rebuild the alignment repeatedly in search of the best one
• Local alignments
– Profile analysis
– Block analysis
– Pattern searching and/or statistical methods
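A sketch of the first stage of a progressive method: build a guide tree so that the closest sequences are merged (and would be aligned) first. Assumptions, not taken from the slides: distances come from shared 2-mers rather than from pairwise alignments, and the tree is built by UPGMA (average linkage); real tools may use alignment-based distances and other tree methods such as neighbor-joining. The sequences are those from the msa slide; all function names are hypothetical, and the profile-alignment step that follows each merge is not shown.

```python
from itertools import combinations

# Sequences from the msa slide (trailing gaps are stripped by kmer_set).
SEQS = [
    "VTISCTGSSSNIGAGNHVKWYQQLPG",
    "VTISCTGTSSNIGSITVNWYQQLPG",
    "LRLSCSSSGFIFSSYAMYWVRQAPG",
    "LSLTCTVSGTSFDDYYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG",
    "ATLVCLISDFYPGAVTVAWKADS",
    "ATLVCLISDFYPGAVTVAWKADS",
    "AALGCLVKDYFPEPVTVSWNSG-",
    "VSLTCLVKGFYPSDIAVEWESNG-",
]

def kmer_set(seq, k=2):
    """Distinct length-k substrings of seq, after removing gap characters."""
    seq = seq.replace("-", "")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=2):
    """1 - shared k-mers / size of the smaller k-mer set (0.0 for identical sequences).

    Assumes both sequences have at least k residues.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))

def upgma_guide_tree(seqs, k=2):
    """Return (tree, merge_order); leaves are sequence indices, internal nodes are pairs."""
    n = len(seqs)
    d = [[kmer_distance(seqs[i], seqs[j], k) for j in range(n)] for i in range(n)]
    clusters = [(i, [i]) for i in range(n)]  # (subtree, member indices)

    def linkage(a, b):
        # Average linkage: mean leaf-to-leaf distance between two clusters
        return sum(d[x][y] for x in a[1] for y in b[1]) / (len(a[1]) * len(b[1]))

    merge_order = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters; that pair is aligned first
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = ((clusters[p][0], clusters[q][0]), clusters[p][1] + clusters[q][1])
        merge_order.append(merged[0])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
    return clusters[0][0], merge_order

if __name__ == "__main__":
    tree, order = upgma_guide_tree(SEQS)
    print("first merge:", order[0])
    print("guide tree:", tree)
```

The two identical sequences (indices 5 and 6) are at distance 0, so they form the first merge; every later merge would align a new sequence or group against the groups already built.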
Global msa: Challenges
• Computationally Expensive
– If the msa scores matches, mismatches and gaps and also accounts for the degree of variation, exact global msa can be applied to only a few sequences
• Difficult to score
– Each column of the msa requires multiple comparisons to obtain a cumulative score
– Placement of gaps and scoring of substitutions are more difficult
• Difficulty increases with diversity
– Relatively easy for a set of closely related sequences
– Identifying the correct ancestry relationships for a set of distantly related sequences is more challenging
– Even more difficult if some members are much more alike than others
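The multiple comparisons per column are often made with a sum-of-pairs (SP) score: every pair of rows in a column is scored and the results are added, so k sequences need k(k-1)/2 comparisons per column. A minimal sketch; the match, mismatch and gap values are hypothetical, and scoring a gap/gap pair as 0 is an assumption.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring parameters

def pair_score(a, b):
    """Score one pair of characters from the same column ('-' is a gap)."""
    if a == "-" and b == "-":
        return 0  # gap/gap pairs are ignored (assumption)
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def sp_column(column):
    """Sum-of-pairs score of one column: every pair of rows is compared."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def sp_alignment(rows):
    """Cumulative SP score of an alignment given as equal-length strings."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(sp_column(col) for col in zip(*rows))

if __name__ == "__main__":
    print(sp_alignment(["VSN-S", "-SNA-", "---AS"]))
```

With these values, the three-row alignment VSN-S / -SNA- / ---AS scores -16 (columns -4, -3, -3, -3, -3).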
Global msa: Dynamic Programming
• The two-sequence alignment algorithm (Needleman-Wunsch) can be generalized to any number of sequences.
• E.g., for three sequences X, Y, W, define C[i,j,k] = score of the optimal alignment among X[1..i], Y[1..j], W[1..k]
• As for two sequences, divide the possible alignments into different classes, depending on how they end.
– Devise recurrence relations for C[i,j,k]
– C[i,j,k] is the maximum over all possibilities
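msa for 3 sequences: alignment can end in 7 ways
• The last column holds either the last residue or a gap for each of X, Y, W; every combination except a column of only gaps is allowed, so there are 2^3 - 1 = 7 cases:
– (Xi, Yj, Wk)
– (-, Yj, Wk), (Xi, -, Wk), (Xi, Yj, -)
– (-, -, Wk), (-, Yj, -), (Xi, -, -)
• Removing the last column leaves an alignment of shorter prefixes; e.g., in the first case, of X1...Xi-1, Y1...Yj-1, W1...Wk-1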
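The 7 endings can be generated rather than listed by hand: encode each sequence's entry in the last column as 1 (residue) or 0 (gap) and drop the all-gap pattern. A small sketch; `column_types`, `describe` and `predecessor` are hypothetical names, and the labels follow the slide's Xi, Yj, Wk.

```python
from itertools import product

LABELS = ("Xi", "Yj", "Wk")  # the slide's names for the last residues

def column_types(k):
    """Every possible last column for k sequences, as 0/1 tuples.

    1 = that sequence contributes its last residue, 0 = a gap.
    The all-zero tuple (a column of only gaps) is excluded, leaving 2**k - 1 cases.
    """
    return [bits for bits in product((1, 0), repeat=k) if any(bits)]

def describe(bits):
    """Render a 3-sequence column type in the slide's notation, e.g. (0, 1, 1) -> '- Yj Wk'."""
    return " ".join(label if b else "-" for label, b in zip(LABELS, bits))

def predecessor(cell, bits):
    """DP cell this column type extends: subtract 1 on each axis that has a residue."""
    return tuple(c - b for c, b in zip(cell, bits))

if __name__ == "__main__":
    for bits in column_types(3):
        print(describe(bits), "extends cell", predecessor((5, 4, 3), bits))
```

The same function gives 3 cases for two sequences (the familiar diagonal, up and left moves) and 2^k - 1 in general.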
Aligning Three Sequences
• Same strategy as aligning two sequences
• Use a 3-D “Manhattan Cube”, with each axis representing a sequence to align
[Figure: 2-D edit graph (axes V, W) beside 3-D edit graph (axes V, W, X)]
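Dynamic programming for 3 sequences
V S N - S
- S N A -
- - - A S
• Each alignment is a path through the dynamic programming matrix
[Figure: this alignment drawn as a path from Start through the 3-D matrix whose axes are labeled with the sequences VSNS, SNA and AS]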
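The correspondence between an alignment and a path can be made explicit: each column advances by one along every axis whose row holds a residue and stays put on axes with a gap. A minimal sketch using the example above, with its dashes written as '-'; `alignment_to_path` is a hypothetical name.

```python
def alignment_to_path(rows):
    """Convert a gapped alignment into the DP cells it visits, starting at the origin.

    Each column moves +1 along every axis whose row has a residue (not '-').
    """
    cell = [0] * len(rows)
    path = [tuple(cell)]
    for column in zip(*rows):
        if all(ch == "-" for ch in column):
            raise ValueError("a column of only gaps is not a valid step")
        for axis, ch in enumerate(column):
            if ch != "-":
                cell[axis] += 1
        path.append(tuple(cell))
    return path

if __name__ == "__main__":
    for cell in alignment_to_path(["VSN-S", "-SNA-", "---AS"]):
        print(cell)
```

For the example the path runs (0,0,0), (1,0,0), (2,1,0), (3,2,0), (3,3,1), (4,3,2); it ends at (4,3,2) because the sequences VSNS, SNA and AS have lengths 4, 3 and 2.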
2-D Cell versus 3-D Alignment Cell
• In 2-D, 3 edges enter each unit square, from C(i-1,j-1), C(i-1,j) and C(i,j-1)
• In 3-D, 7 edges enter each unit cube, from C(i-1,j-1,k-1), C(i-1,j-1,k), C(i-1,j,k-1), C(i,j-1,k-1), C(i-1,j,k), C(i,j-1,k) and C(i,j,k-1)
• Enumerate all possibilities and choose the best one
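The edge counts above explain why exact msa is expensive: for k sequences the matrix has (n1+1)(n2+1)...(nk+1) cells, and each cell compares up to 2^k - 1 incoming edges. A back-of-the-envelope sketch; `dp_cost` is a hypothetical helper and counts every cell as having all 2^k - 1 edges, which slightly overcounts at the boundary.

```python
from math import prod

def dp_cost(lengths):
    """Return (cells, edges) for exact DP over len(lengths) sequences.

    cells: size of the k-dimensional matrix, prod(n_i + 1)
    edges: cells * (2**k - 1), an upper bound since boundary cells have fewer edges
    """
    k = len(lengths)
    cells = prod(n + 1 for n in lengths)
    return cells, cells * (2 ** k - 1)

if __name__ == "__main__":
    for k in range(2, 7):
        cells, edges = dp_cost([100] * k)
        print(f"{k} sequences of length 100: {cells:.2e} cells, {edges:.2e} edges")
```

For three sequences of length 100 this gives about 1.03 million cells and about 7.2 million edge evaluations; each extra sequence multiplies the cell count by roughly n and the per-cell work by roughly 2.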
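Multiple Alignment: Dynamic Programming
• For three sequences v, w, u:

$$
s_{i,j,k} = \max
\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no indels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one indel} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one indel} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one indel} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two indels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two indels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two indels}
\end{cases}
$$

• δ(x, y, z) is an entry in the 3-D scoring matrix; _ denotes a gap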
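The recurrence translates directly into code. A minimal sketch, assuming δ is a sum-of-pairs column score with hypothetical values (match +1, mismatch -1, residue/gap -2, gap/gap 0); the slides leave the 3-D scoring matrix unspecified, and `pair`, `column` and `align3` are hypothetical names. It fills s[i, j, k] from the 7 predecessors and then traces the best path back to the origin.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring parameters

def pair(a, b):
    """Score two characters from one column; '-' is a gap and gap/gap scores 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def delta(x, y, z):
    """delta(x, y, z): column score, assumed here to be sum-of-pairs."""
    return pair(x, y) + pair(x, z) + pair(y, z)

# The 7 moves into a cell: 1 = consume a residue on that axis, 0 = gap.
MOVES = [m for m in product((1, 0), repeat=3) if any(m)]

def column(v, w, u, i, j, k, move):
    """Characters of the column that ends at cell (i, j, k) via the given move."""
    di, dj, dk = move
    return (v[i - 1] if di else "-", w[j - 1] if dj else "-", u[k - 1] if dk else "-")

def align3(v, w, u):
    """Exact global alignment of three sequences; returns (score, [row_v, row_w, row_u])."""
    n1, n2, n3 = len(v), len(w), len(u)
    s = {(0, 0, 0): 0}  # s[i, j, k]: best score for prefixes v[:i], w[:j], u[:k]
    back = {}           # back[i, j, k]: the move that achieved s[i, j, k]
    # Lexicographic order guarantees every predecessor is computed first.
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if (i, j, k) == (0, 0, 0):
            continue
        best = None
        for move in MOVES:
            prev = (i - move[0], j - move[1], k - move[2])
            if min(prev) < 0:
                continue  # this move would step outside the cube
            cand = s[prev] + delta(*column(v, w, u, i, j, k, move))
            if best is None or cand > best:
                best, back[i, j, k] = cand, move
        s[i, j, k] = best
    rows = ["", "", ""]
    i, j, k = n1, n2, n3
    while (i, j, k) != (0, 0, 0):  # trace the optimal path back to the origin
        move = back[i, j, k]
        for r, ch in enumerate(column(v, w, u, i, j, k, move)):
            rows[r] = ch + rows[r]
        i, j, k = i - move[0], j - move[1], k - move[2]
    return s[n1, n2, n3], rows

if __name__ == "__main__":
    score, rows = align3("VSNS", "SNA", "AS")
    print(score)
    print("\n".join(rows))
```

Time grows as 7 · n1 · n2 · n3 and memory as n1 · n2 · n3, which is why this exact approach is practical only for a few short sequences.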
• Reading Materials
– David W. Mount, Bioinformatics: Sequence and Genome Analysis, Chapter 5 (2nd edition: pp. 170-194; 1st edition: pp. 140-165)
– Cédric Notredame, Desmond G. Higgins and Jaap Heringa, "T-Coffee: a novel method for fast and accurate multiple sequence alignment", Journal of Molecular Biology, 302(1):205-217, 8 September 2000
– Christopher Lee, Catherine Grasso and Mark F. Sharlow, "Multiple sequence alignment using partial order graphs", Bioinformatics, 18(3):452-464, 2002
– Cédric Notredame and Desmond G. Higgins, "SAGA: sequence alignment by genetic algorithm", Nucleic Acids Research, 24(8):1515-1524, 15 April 1996
