SlideShare a Scribd company logo
1 of 22
Multiple Sequence Alignment
By- Gyanika Shukla
Previous Knowledge
• Genome?
• Phylogenetic analysis?
• DNA sequencing?
• Protein sequencing?
• Total number of amino acids?
• Letter ‘M’ denotes __________ amino acid?
• Bioinformatics
• DNA/ Protein sequence similarity search?
Multiple Sequence Alignment
• In bioinformatics, a sequence alignment is a way of arranging
the sequences of DNA, RNA, or protein to identify regions of
similarity that may be a consequence of functional, structural,
or evolutionary relationships between the sequences.
• Very short or very similar sequences can be aligned by hand.
However, most interesting problems require the alignment of
lengthy, highly variable or extremely numerous sequences
that cannot be aligned solely by human effort.
• Sequence similarity searches can only run by computational
approaches when the amino acid or DNA/ RNA sequences of
different organisms are parallel aligned.
• Aligned sequences of nucleotide or amino acid
residues are typically represented as rows within
a matrix
• Computational approaches to sequence
alignment generally fall into two categories:
global alignments and local alignments.
• Calculating a global alignment is a form of global
optimization that "forces" the alignment to span
the entire length of all query sequences (DNA or
Protein). By contrast, local alignments identify
regions of similarity within short sequences that
are often widely divergent overall.
Dynamic programming
• Dynamic programming is both a mathematical
optimization method and a computer
programming method. The technique of dynamic
programming can be applied to produce global
alignments via the Needleman-Wunsch
algorithm, and local alignments via the Smith-
Waterman algorithm. In typical usage,
a substitution matrix is used to assign scores to
amino-acid or DNA matches or mismatches, and
a gap penalty. In practice often simply assign a
positive match score, a negative mismatch score,
and a negative gap penalty.
GLOBAL ALIGNMENT
• Needleman-Wunsch algorithm: The algorithm
was developed by Saul B. Needleman and
Christian D. Wunsch and published in 1970.
• The alignment of sequences by this algorithm is
completed into three steps:
1) Scoring matrix (by taking a positive match
score, a negative mismatch score, and a
negative gap penalty)
2) Trace back
3) Alignment
Lets take an example that how the global alignment
software align the sequences by algorithm..
• Suppose the two sequences to be aligned are
• GCATG
• GATTA
• Lets take match value = +1 and miss-match value
is –1 (if two diagonally represented sequences
are same, then match value is assigned and if the
sequences are different miss-match value is
assigned)
• Gap penalty is – 1
Constructing the scoring matrix
• First construct a grid such as one shown in Figure. Start the first string in
the top of the third column and start the other string at the start of the
third row. Fill out the rest of the column and row headers as in Figure.
G C A T G
G
A
T
T
A
Filling in the matrix
• Start with a zero in the second row, second
column. Move through the cells of row and
column adjacent to the written sequences
calculating the score for each cell by adding
gap penalty to the next cell in the row and
column.
• The first case with existing scores in all 3 directions is the
intersection of our first letters. The surrounding cells are
below.
• This cell has three possible candidate sums:
• The diagonal top-left neighbor has score 0. The pairing of
G and G is a match, so add the score for match: 0+1 = 1
• The top neighbor has score −1 and moving from there
represents an box, so add the score for box: (−1) + (−1) =
(−2)
• The left neighbor also has score −1, represents a box and
also produces (−2).
• The highest candidate is 1 and is entered into the box.
• An arrow is marked from the box to the new box from
where the final value is assigned.
G C A T G
0 -1 -2 -3 -4 -5
G -1 1 0 -1 -2 -3
A -2 0 0 1 0 -1
T -3 -1 -1 0 2 1
T -4 -2 -2 -1 1 1
A -5 -3 -3 -1 0 0
G C A – T G
G – A T T A 50% similarity
Trace back and Alignment
The scoring matrix is traced back from the last box of the
matrix to the next boxes from where the highest values are
obtained. If the score of the box is obtained from the
diagonal box, both the sequences corresponding to that
row and column are written opposite to each other for
alignment. If score is from upper or beside box the
sequence corresponding to that column or row
respectively is taken as “gap” and only one sequence is
written.
If the highest score is from both diagonal and upper/beside
box, then diagonal value is preferred.
Once the sequences are aligned the similarity percentage can
be calculated.
LOCAL ALIGNMENT
• Smith–Waterman algorithm or zero-one algorithm
performs local sequence alignment; that is, for
determining similar regions between two strings of
nucleic acid sequences or protein sequences.
• Instead of looking at the entire sequence, the Smith–
Waterman algorithm compares segments of all
possible lengths and optimizes the similarity measure.
• The algorithm was first proposed by Temple F. Smith
and Michael S. Waterman in 1981
• The main difference to the Needleman–Wunsch
algorithm is that negative scoring matrix cells are set to
zero.
Lets take an example that how the global alignment
software align the sequences by algorithm..
• Suppose the two sequences to be aligned are
• GCATG
• GATTA
• Lets take match value = +1 and miss-match value
is –1 (if two diagonally represented sequences
are same, then match value is assigned and if the
sequences are different miss-match value is
assigned)
• Gap penalty is – 1
• The first case with existing scores in all 3 directions is the
intersection of our first letters. The surrounding cells are
below.
• This cell has three possible candidate sums:
• The diagonal top-left neighbor has score 0. The pairing of G
and G is a match, so add the score for match: 0+1 = 1
• The top neighbor has score −1 and moving from there
represents an box, so add the score for box: (−1) + (−1) = (−2)
• The left neighbor also has score −1, represents a box and
also produces (−2).
• The highest candidate is 1 and is entered into the box.
• An arrow is marked from the box to the new box from where
the final value is assigned.
• All the negative scores must be written as zero.
G C A T G
0 0 0 0 0 0
G 0 1 0 0 0 0
A 0 0 0 1 0 0
T 0 0 0 0 2 1
T 0 0 0 0 1 1
A 0 0 0 1 0 0
G C A T
G – A T
75% Similarity
Trace back and Alignment
The scoring matrix is traced back from the highest value
obtained in the matrix. If there are more than one highest
value, scoring can be done from any box.
If the score of the box is obtained from the diagonal box, both
the sequences corresponding to that row and column are
written opposite to each other for alignment. If score is
from upper or beside box the sequence corresponding to
that column or row respectively is taken as “gap” and only
one sequence is written.
If the highest score is from both diagonal and upper/beside
box, then diagonal value is preferred.
Once the sequences are aligned the similarity percentage can
be calculated.
Dot Plot Matrix
• The dot-matrix approach, which implicitly produces a family of
alignments for individual sequence regions, is qualitative and
conceptually simple, though time-consuming to analyze on a large
scale. In the absence of noise, it can be easy to visually identify
certain sequence features—such as insertions, deletions, repeats,
or inverted repeats—from a dot-matrix plot. To construct a dot-
matrix plot, the two sequences are written along the top row and
leftmost column of a two-dimensional matrix and a dot is placed at
any point where the characters in the appropriate columns match—
this is a typical recurrence plot. Some implementations vary the size
or intensity of the dot depending on the degree of similarity of the
two characters, to accommodate conservative substitutions. The
dot plots of very closely related sequences will appear as a single
line along the matrix's main diagonal.
Disadvantages of Dot Plot
• Problems with dot plots as an information
display technique include: noise, lack of
clarity, non-intuitiveness, difficulty extracting
match summary statistics and match positions
on the two sequences. There is also much
wasted space where the match data is
inherently duplicated across the diagonal and
most of the actual area of the plot is taken up
by either empty space or noise, and, finally,
dot-plots are limited to two sequences.
G C A T G
G
A
T
T
A
40% similarity
GCATG
GATTA

More Related Content

What's hot

QUARTILES, DECILES AND PERCENTILES(2018)
QUARTILES, DECILES AND PERCENTILES(2018)QUARTILES, DECILES AND PERCENTILES(2018)
QUARTILES, DECILES AND PERCENTILES(2018)sumanmathews
 
Matrix and Matrices
Matrix and MatricesMatrix and Matrices
Matrix and Matricesesfriendsg
 
Graphing sytems inequalities
Graphing sytems inequalitiesGraphing sytems inequalities
Graphing sytems inequalitiesRamón Zurita
 
frequency distribution
 frequency distribution frequency distribution
frequency distributionUnsa Shakir
 
Calculating the Median
Calculating the MedianCalculating the Median
Calculating the MedianKen Plummer
 
February 17, 2015
February 17, 2015February 17, 2015
February 17, 2015khyps13
 
Education 309 – Statistics for Educational Research
Education 309 – Statistics for Educational ResearchEducation 309 – Statistics for Educational Research
Education 309 – Statistics for Educational ResearchSzarah Blaisze
 
Statistics For Class X
Statistics For Class XStatistics For Class X
Statistics For Class XSomil Jain
 
Excel tutorial for frequency distribution
Excel tutorial for frequency distributionExcel tutorial for frequency distribution
Excel tutorial for frequency distributionS.c. Chopra
 
Lesson 2 percentiles
Lesson 2   percentilesLesson 2   percentiles
Lesson 2 percentileskarisashley
 
Organizing data using frequency distribution
Organizing data using frequency distributionOrganizing data using frequency distribution
Organizing data using frequency distributionKennyAnnGraceBatianc
 
Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11
Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11
Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11Reymart Bargamento
 

What's hot (17)

QUARTILES, DECILES AND PERCENTILES(2018)
QUARTILES, DECILES AND PERCENTILES(2018)QUARTILES, DECILES AND PERCENTILES(2018)
QUARTILES, DECILES AND PERCENTILES(2018)
 
Matrix and Matrices
Matrix and MatricesMatrix and Matrices
Matrix and Matrices
 
Graphing sytems inequalities
Graphing sytems inequalitiesGraphing sytems inequalities
Graphing sytems inequalities
 
frequency distribution
 frequency distribution frequency distribution
frequency distribution
 
2 measure of central tendency
2 measure of central  tendency2 measure of central  tendency
2 measure of central tendency
 
Psychological Statistics Chapter 2
Psychological Statistics Chapter 2Psychological Statistics Chapter 2
Psychological Statistics Chapter 2
 
Calculating the Median
Calculating the MedianCalculating the Median
Calculating the Median
 
Median
MedianMedian
Median
 
February 17, 2015
February 17, 2015February 17, 2015
February 17, 2015
 
Median
MedianMedian
Median
 
Education 309 – Statistics for Educational Research
Education 309 – Statistics for Educational ResearchEducation 309 – Statistics for Educational Research
Education 309 – Statistics for Educational Research
 
Statistics For Class X
Statistics For Class XStatistics For Class X
Statistics For Class X
 
Excel tutorial for frequency distribution
Excel tutorial for frequency distributionExcel tutorial for frequency distribution
Excel tutorial for frequency distribution
 
Lesson 2 percentiles
Lesson 2   percentilesLesson 2   percentiles
Lesson 2 percentiles
 
Organizing data using frequency distribution
Organizing data using frequency distributionOrganizing data using frequency distribution
Organizing data using frequency distribution
 
Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11
Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11
Scope and sequence , Budget of work ,and Weekly Lesson Plan in Educ.11
 
Median
MedianMedian
Median
 

Similar to Sequence alignment unit 3

Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. localbenazeer fathima
 
5. Global and Local Alignment Algorithms.pptx
5. Global and Local Alignment Algorithms.pptx5. Global and Local Alignment Algorithms.pptx
5. Global and Local Alignment Algorithms.pptxArupKhakhlari1
 
Needleman wunsch computional ppt
Needleman wunsch computional pptNeedleman wunsch computional ppt
Needleman wunsch computional ppttarun shekhawat
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence AlignmentRavi Gandham
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfsriaisvariyasundar
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment Parinda Rajapaksha
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadfalizain9604
 
Statisticsforbiologists colstons
Statisticsforbiologists colstonsStatisticsforbiologists colstons
Statisticsforbiologists colstonsandymartin
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshitaHarshita Bhawsar
 
Displaying quantitative data
Displaying quantitative dataDisplaying quantitative data
Displaying quantitative dataUlster BOCES
 

Similar to Sequence alignment unit 3 (20)

Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. local
 
5. Global and Local Alignment Algorithms.pptx
5. Global and Local Alignment Algorithms.pptx5. Global and Local Alignment Algorithms.pptx
5. Global and Local Alignment Algorithms.pptx
 
Needleman wunsch computional ppt
Needleman wunsch computional pptNeedleman wunsch computional ppt
Needleman wunsch computional ppt
 
Dynamic programming
Dynamic programming Dynamic programming
Dynamic programming
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Dot matrix seminar
Dot matrix seminarDot matrix seminar
Dot matrix seminar
 
02-alignment.pdf
02-alignment.pdf02-alignment.pdf
02-alignment.pdf
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment
 
Brute force method
Brute force methodBrute force method
Brute force method
 
seq alignment.ppt
seq alignment.pptseq alignment.ppt
seq alignment.ppt
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Ch06 alignment
Ch06 alignmentCh06 alignment
Ch06 alignment
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
Dot plots-1.ppt
Dot plots-1.pptDot plots-1.ppt
Dot plots-1.ppt
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
Statisticsforbiologists colstons
Statisticsforbiologists colstonsStatisticsforbiologists colstons
Statisticsforbiologists colstons
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
 
Displaying quantitative data
Displaying quantitative dataDisplaying quantitative data
Displaying quantitative data
 

Recently uploaded

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 

Recently uploaded (20)

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 

Sequence alignment unit 3

  • 2. Previous Knowledge • Genome? • Phylogenetic analysis? • DNA sequencing? • Protein sequencing? • Total number of amino acids? • Letter ‘M’ denotes __________ amino acid? • Bioinformatics • DNA/ Protein sequence similarity search?
  • 3. Multiple Sequence Alignment • In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. • Very short or very similar sequences can be aligned by hand. However, most interesting problems require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be aligned solely by human effort. • Sequence similarity searches can only run by computational approaches when the amino acid or DNA/ RNA sequences of different organisms are parallel aligned.
  • 4.
  • 5. • Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix • Computational approaches to sequence alignment generally fall into two categories: global alignments and local alignments. • Calculating a global alignment is a form of global optimization that "forces" the alignment to span the entire length of all query sequences (DNA or Protein). By contrast, local alignments identify regions of similarity within short sequences that are often widely divergent overall.
  • 6. Dynamic programming • Dynamic programming is both a mathematical optimization method and a computer programming method. The technique of dynamic programming can be applied to produce global alignments via the Needleman-Wunsch algorithm, and local alignments via the Smith- Waterman algorithm. In typical usage, a substitution matrix is used to assign scores to amino-acid or DNA matches or mismatches, and a gap penalty. In practice often simply assign a positive match score, a negative mismatch score, and a negative gap penalty.
  • 7. GLOBAL ALIGNMENT • Needleman-Wunsch algorithm: The algorithm was developed by Saul B. Needleman and Christian D. Wunsch and published in 1970. • The alignment of sequences by this algorithm is completed into three steps: 1) Scoring matrix (by taking a positive match score, a negative mismatch score, and a negative gap penalty) 2) Trace back 3) Alignment
  • 8. Lets take an example that how the global alignment software align the sequences by algorithm.. • Suppose the two sequences to be aligned are • GCATG • GATTA • Lets take match value = +1 and miss-match value is –1 (if two diagonally represented sequences are same, then match value is assigned and if the sequences are different miss-match value is assigned) • Gap penalty is – 1
  • 9. Constructing the scoring matrix • First construct a grid such as one shown in Figure. Start the first string in the top of the third column and start the other string at the start of the third row. Fill out the rest of the column and row headers as in Figure. G C A T G G A T T A
  • 10. Filling in the matrix • Start with a zero in the second row, second column. Move through the cells of row and column adjacent to the written sequences calculating the score for each cell by adding gap penalty to the next cell in the row and column.
  • 11. • The first case with existing scores in all 3 directions is the intersection of our first letters. The surrounding cells are below. • This cell has three possible candidate sums: • The diagonal top-left neighbor has score 0. The pairing of G and G is a match, so add the score for match: 0+1 = 1 • The top neighbor has score −1 and moving from there represents an box, so add the score for box: (−1) + (−1) = (−2) • The left neighbor also has score −1, represents a box and also produces (−2). • The highest candidate is 1 and is entered into the box. • An arrow is marked from the box to the new box from where the final value is assigned.
  • 12.
  • 13. G C A T G 0 -1 -2 -3 -4 -5 G -1 1 0 -1 -2 -3 A -2 0 0 1 0 -1 T -3 -1 -1 0 2 1 T -4 -2 -2 -1 1 1 A -5 -3 -3 -1 0 0 G C A – T G G – A T T A 50% similarity
  • 14. Trace back and Alignment The scoring matrix is traced back from the last box of the matrix to the next boxes from where the highest values are obtained. If the score of the box is obtained from the diagonal box, both the sequences corresponding to that row and column are written opposite to each other for alignment. If score is from upper or beside box the sequence corresponding to that column or row respectively is taken as “gap” and only one sequence is written. If the highest score is from both diagonal and upper/beside box, then diagonal value is preferred. Once the sequences are aligned the similarity percentage can be calculated.
  • 15. LOCAL ALIGNMENT • Smith–Waterman algorithm or zero-one algorithm performs local sequence alignment; that is, for determining similar regions between two strings of nucleic acid sequences or protein sequences. • Instead of looking at the entire sequence, the Smith– Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. • The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981 • The main difference to the Needleman–Wunsch algorithm is that negative scoring matrix cells are set to zero.
  • 16. Lets take an example that how the global alignment software align the sequences by algorithm.. • Suppose the two sequences to be aligned are • GCATG • GATTA • Lets take match value = +1 and miss-match value is –1 (if two diagonally represented sequences are same, then match value is assigned and if the sequences are different miss-match value is assigned) • Gap penalty is – 1
  • 17. • The first case with existing scores in all 3 directions is the intersection of our first letters. The surrounding cells are below. • This cell has three possible candidate sums: • The diagonal top-left neighbor has score 0. The pairing of G and G is a match, so add the score for match: 0+1 = 1 • The top neighbor has score −1 and moving from there represents an box, so add the score for box: (−1) + (−1) = (−2) • The left neighbor also has score −1, represents a box and also produces (−2). • The highest candidate is 1 and is entered into the box. • An arrow is marked from the box to the new box from where the final value is assigned. • All the negative scores must be written as zero.
  • 18. G C A T G 0 0 0 0 0 0 G 0 1 0 0 0 0 A 0 0 0 1 0 0 T 0 0 0 0 2 1 T 0 0 0 0 1 1 A 0 0 0 1 0 0 G C A T G – A T 75% Similarity
  • 19. Trace back and Alignment The scoring matrix is traced back from the highest value obtained in the matrix. If there are more than one highest value, scoring can be done from any box. If the score of the box is obtained from the diagonal box, both the sequences corresponding to that row and column are written opposite to each other for alignment. If score is from upper or beside box the sequence corresponding to that column or row respectively is taken as “gap” and only one sequence is written. If the highest score is from both diagonal and upper/beside box, then diagonal value is preferred. Once the sequences are aligned the similarity percentage can be calculated.
  • 20. Dot Plot Matrix • The dot-matrix approach, which implicitly produces a family of alignments for individual sequence regions, is qualitative and conceptually simple, though time-consuming to analyze on a large scale. In the absence of noise, it can be easy to visually identify certain sequence features—such as insertions, deletions, repeats, or inverted repeats—from a dot-matrix plot. To construct a dot- matrix plot, the two sequences are written along the top row and leftmost column of a two-dimensional matrix and a dot is placed at any point where the characters in the appropriate columns match— this is a typical recurrence plot. Some implementations vary the size or intensity of the dot depending on the degree of similarity of the two characters, to accommodate conservative substitutions. The dot plots of very closely related sequences will appear as a single line along the matrix's main diagonal.
  • 21. Disadvantages of Dot Plot • Problems with dot plots as an information display technique include: noise, lack of clarity, non-intuitiveness, difficulty extracting match summary statistics and match positions on the two sequences. There is also much wasted space where the match data is inherently duplicated across the diagonal and most of the actual area of the plot is taken up by either empty space or noise, and, finally, dot-plots are limited to two sequences.
  • 22. G C A T G G A T T A 40% similarity GCATG GATTA