Sequence alignment: global vs. local alignment
Presented by Fathima Hameed
Outline
• Introduction
• Principle
• Types of alignment
- global alignment
- local alignment
- semi global alignment
• Difference between global and local
• Dynamic programming method
• Advantages
• Disadvantages
• References
Introduction
• A sequence alignment is a way of arranging the primary sequences of
DNA, RNA, or protein to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships between
the sequences.
• A sequence alignment is made between a known sequence and an unknown
sequence, or between two unknown sequences.
• The known sequence is called the reference sequence; the unknown sequence
is called the query sequence.
Principle
• Alignment can reveal homology between sequences.
• Similarity is a descriptive term for the degree of match between two
sequences.
• Sequence similarity does not always imply a common function.
• Conserved function does not always imply similarity at the sequence level.
• Convergent evolution: sequences can be highly similar without being
homologous.
Types of alignment
• Based on completeness, alignments are classified into three types:
1. Global alignment
2. Local alignment
3. Semi-global alignment
Global alignment
• Matches the residues of two sequences across their entire length.
• It is intended for sequences that are nearly identical.
• Global alignments attempt to align every residue in every sequence and are
most useful when the sequences in the query set are similar and of roughly
equal size.
• A general global alignment technique is the Needleman-Wunsch algorithm,
which is based on dynamic programming.
Local alignment
• Matches only the regions of two sequences that are most similar to each
other.
• Local alignments are more useful for dissimilar sequences that are suspected
to contain regions of similarity or similar sequence motifs within their larger
sequence context.
• The Smith-Waterman algorithm is a general local alignment method, also
based on dynamic programming.
Semi-global alignment
• A hybrid method, also known as a semi-global or "glocal" method.
• It searches for the best possible alignment that includes the start and end
of one or the other sequence.
• This can be especially useful when the downstream part of one sequence
overlaps with the upstream part of the other sequence.
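The overlap case described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `semiglobal_score` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example. Gaps at the start and end of either sequence are left unpenalized, so the best overlap between the end of one sequence and the start of the other can be scored.

```python
def semiglobal_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best overlap score between a and b with free end gaps (illustrative)."""
    n, m = len(a), len(b)
    # First row and column stay at zero: leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    # Trailing gaps are free too: take the best cell in the last row or column.
    return max(max(F[n]), max(row[m] for row in F))


# The last three residues of the first sequence overlap the first three of the second.
print(semiglobal_score("ACGTTT", "TTTGCA"))  # 3
```

Only the initialization and the choice of end cell differ from a global alignment; the cell-filling rule is unchanged.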
Global sequence alignment | Local sequence alignment
Aligns the entire sequence | Finds local regions of similarity
Contains all letters from both the query and target sequence | Aligns a substring of the query sequence to a substring of the target sequence
The sequences have roughly the same length and are quite similar | Finds stretches of sequence with a high level of matches
Suitable for aligning two closely related sequences | Suitable for aligning more distantly related sequences
Usually done for comparing homologous genes | Used for finding conserved patterns of DNA
The technique is the Needleman-Wunsch algorithm | The technique is the Smith-Waterman algorithm
Examples: EMBOSS Needle; Needleman-Wunsch Global Align Nucleotide Sequences (specialized BLAST) | Examples: BLAST; EMBOSS Water; LALIGN
Dynamic programming in bioinformatics
• Dynamic programming is widely used in bioinformatics for tasks such as
sequence alignment, protein folding, RNA structure prediction, and
protein-DNA binding.
• Needleman and Wunsch described a general algorithm for sequence
alignment.
• The algorithm maximizes a similarity score to give the maximum match.
• Maximum match = the largest number of nucleotides of one sequence that can
be matched with those of the other.
• The goal is to quantify the similarity between two sequences.
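To make "a score of similarity" concrete, here is a small Python sketch that scores an alignment that has already been written out. The function name `alignment_score` is hypothetical, and the default values (match +1, mismatch -1, gap -2) are assumptions for the illustration.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two aligned strings of equal length; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap        # one residue is aligned against a gap
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # different residues
    return score


# One gap, two matches and one mismatch: -2 + 1 - 1 + 1 = -1
print(alignment_score("AAAC", "-AGC"))  # -1
```

Dynamic programming searches over all possible alignments for the one that maximizes this kind of score, without listing them one by one.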
Dynamic programming method
• It was introduced by Richard Bellman in the 1950s.
• The word "programming" here means finding an acceptable plan of action, not
computer programming.
• It is useful for aligning nucleotide sequences of DNA and the amino acid
sequences of the proteins encoded by that DNA.
• It solves a complex problem by breaking it into simpler subproblems.
• The problem can be divided into many smaller parts.
• Dynamic programming for alignment is a three-step process:
1. Initialization
2. Matrix filling (scoring)
3. Traceback and alignment
Dynamic programming in sequence alignment
1. Initialization:
The first step in the global alignment dynamic programming approach is
to create a matrix with M+1 columns and N+1 rows, where M and N
correspond to the lengths of the sequences to be aligned.
2. Matrix filling:
Fill the matrix with the highest possible score for each cell.
A diagonal move aligns the next residue of each sequence.
An off-diagonal move requires inserting a gap in the corresponding sequence.
3. Traceback and alignment:
Start from the last corner (bottom-right cell) and follow the arrows back.
Global alignment via dynamic programming
• The first row and first column correspond to an empty prefix (no residue).
• Fill the first cell with zero.
• Then fill the rest of the first row and first column with multiples of the
gap penalty.
• Each remaining cell has three candidate values:
horizontal: score of the cell to the left + gap penalty
vertical: score of the cell above + gap penalty
diagonal: score of the upper-left cell + (match or mismatch score)
• Write the maximum of these three values in the cell.
• Let
match = +1
mismatch = -1
gap penalty = -2
Let
sequence 1 = AAAC
sequence 2 = AGC
        A   A   A   C
    0  -2  -4  -6  -8
A  -2   1  -1  -3  -5
G  -4  -1   0  -2  -4
C  -6  -3  -2  -1  -1
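The filling rules can be sketched in Python as follows. This is a minimal sketch using the scores given above (match +1, mismatch -1, gap -2); the function name `nw_matrix` and the parameter names `cols` and `rows` are chosen here for illustration. It reproduces the matrix for AAAC and AGC.

```python
def nw_matrix(cols, rows, match=1, mismatch=-1, gap=-2):
    """Fill a Needleman-Wunsch score matrix.

    `cols` is written along the top of the matrix, `rows` down the side.
    """
    n, m = len(rows), len(cols)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialization: first row and column hold multiples of the gap penalty.
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        F[i][0] = i * gap
    # Matrix filling: keep the best of the diagonal, vertical and horizontal moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # diagonal
                          F[i - 1][j] + gap,    # vertical
                          F[i][j - 1] + gap)    # horizontal
    return F


for row in nw_matrix("AAAC", "AGC"):
    print(row)
```

The bottom-right cell (-1) is the score of the best global alignment of the two sequences.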
Backward tracking
• In backward tracking we start from the last cell (bottom-right corner) and
follow the arrows: each arrow points to the cell from which the current
cell's value was derived.
• Following these arrows gives the alignment of the two sequences.
• There are 2 rules for writing the alignment:
1. If the value came from the diagonal, write the residues of both sequences.
2. If the value came from the horizontal or vertical direction, write the
residue of the sequence along that direction and add a gap in the other
sequence.
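The two rules can be sketched in Python on the AAAC / AGC matrix from the global example. The function name `traceback` is hypothetical, and the tie-breaking order (diagonal first, then horizontal, then vertical) is an assumption: when several moves explain a cell's value, other orders give different alignments with the same score.

```python
def traceback(F, cols, rows, match=1, mismatch=-1, gap=-2):
    """Recover one optimal global alignment from a filled score matrix F."""
    i, j = len(rows), len(cols)
    top, bottom = [], []
    while i > 0 or j > 0:
        from_diagonal = (
            i > 0 and j > 0
            and F[i][j] == F[i - 1][j - 1]
            + (match if rows[i - 1] == cols[j - 1] else mismatch)
        )
        if from_diagonal:
            # Rule 1: write the residues of both sequences.
            top.append(cols[j - 1])
            bottom.append(rows[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] + gap:
            # Rule 2, horizontal move: residue from the top sequence, gap below.
            top.append(cols[j - 1])
            bottom.append("-")
            j -= 1
        else:
            # Rule 2, vertical move: residue from the side sequence, gap above.
            top.append("-")
            bottom.append(rows[i - 1])
            i -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


# Score matrix for AAAC (top) and AGC (side): match +1, mismatch -1, gap -2.
F = [
    [0, -2, -4, -6, -8],
    [-2, 1, -1, -3, -5],
    [-4, -1, 0, -2, -4],
    [-6, -3, -2, -1, -1],
]
print(traceback(F, "AAAC", "AGC"))  # ('AAAC', '-AGC')
```

The resulting alignment has one gap, two matches and one mismatch, which adds up to the score of -1 in the bottom-right cell.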
Local alignment via dynamic programming
• The algorithm is the same as for global alignment, with a few changes.
• The first column and first row are filled with zeros.
• Any value that would be negative is replaced by zero.
• Backtracking starts from the maximum value in the matrix.
• Let
match = 1
mismatch = 0
gap penalty = 0
Let
sequence 1 = GAATTCAGTTA
sequence 2 = GGATCGA
      G  A  A  T  T  C  A  G  T  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
G  0  1  1  1  1  1  1  1  1  1  1  1
G  0  1  1  1  1  1  1  1  2  2  2  2
A  0  1  2  2  2  2  2  2  2  2  2  3
T  0  1  2  2  3  3  3  3  3  3  3  3
C  0  1  2  2  3  3  4  4  4  4  4  4
G  0  1  2  2  3  3  4  4  5  5  5  5
A  0  1  2  3  3  3  4  5  5  5  5  6
Backtracking
• After the matrix fill step, the maximum alignment score for the two test
sequences is 6. The traceback step determines the actual alignment that
results in the maximum score.
• The rules are the same as for global alignment.
• Seq #1  GAATTCAGTTA
• Seq #2  GGA-TC-G--A
In this way we align the sequences using dynamic programming.
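The local procedure (zero floor, start at the maximum cell, stop at zero) can be sketched in Python with the example scores (match 1, mismatch 0, gap 0). The function name `local_align` and the tie-breaking order (diagonal, then horizontal, then vertical) are assumptions for this illustration. Practical Smith-Waterman runs normally use negative mismatch and gap scores, so that the alignment stays confined to a local region.

```python
def local_align(cols, rows, match=1, mismatch=0, gap=0):
    """Fill a local alignment matrix and trace back from its maximum cell."""
    n, m = len(rows), len(cols)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay zero
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            # Negative values are replaced by zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Traceback starts at the maximum value and stops at a zero cell.
    i, j = best_cell
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if rows[i - 1] == cols[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:    # diagonal: write both residues
            top.append(cols[j - 1])
            bottom.append(rows[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i][j - 1] + gap:    # horizontal: gap in side sequence
            top.append(cols[j - 1])
            bottom.append("-")
            j -= 1
        else:                                 # vertical: gap in top sequence
            top.append("-")
            bottom.append(rows[i - 1])
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, seq1, seq2 = local_align("GAATTCAGTTA", "GGATCGA")
print(score)  # 6
print(seq1)   # GAATTCAGTTA
print(seq2)   # GGA-TC-G--A
```

With a mismatch and gap score of zero, nothing is ever penalized, so the score simply counts matched residues and the traceback runs back to the start of both sequences.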
Uses of sequencing
• It can be used to find genes, that is, segments of DNA that code for a
specific protein or phenotype.
• If a region of DNA has been sequenced, it can be screened for
characteristic features of genes.
Advantages of global alignment:
• Easy to understand; the output contains the complete sequences.
• Useful for checking minor differences between 2 sequences.
• Useful for finding polymorphisms between 2 sequences.
Advantages of local alignment:
• Aligning mRNA against genomic DNA (introns/exons).
• Genes and proteins are modular.
• Finding repeat elements within 1 sequence.
• Possible to determine E-values.
References
• www.google.com
• www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/s/sequence-alignment.htm
• https://www.slideshare.net/mobile/ammarkareem3/sequence-alignment-58496054
• https://www.slideshare.net/mobile/zohaibkhan404/dynamic-programming-42984154
Thank you
