Sequence Alignment
Made by: Nagendra Sahu
Outline
• Alignment of pairs of sequences
• Local and global alignments
• Methods of alignment
  - Dot matrix analysis
  - Dynamic programming approach
• Use of scoring matrices and gap penalties
• Scoring matrices: PAM and BLOSUM
What is sequence alignment?
Sequence alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences.
Comparing two sequences (pairwise alignment) or more (multiple sequence alignment) means searching for a series of individual characters or patterns that occur in the same order in each sequence.
• There are two types of alignment: local and global.
Global alignment vs Local alignment
• Global alignment attempts to match as much of the sequences as possible, over their entire length. Tools for global alignment are based on the Needleman-Wunsch algorithm.
• Local alignment tries to find the regions with the highest density of matches. Tools for local alignment are based on the Smith-Waterman algorithm.
• Both algorithms are derived from the basic dynamic programming algorithm.
Global alignment:
L G P S S K Q T G K G S - S R I W D N
L N - I T K S A G K G A I M R L G D A

Local alignment:
- - - - - - - T G K G - - - - - - - -
- - - - - - - A G K G - - - - - - - -
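The following minimal Python sketch shows how the two modes differ. It is not the code of any real tool. It fills the same dynamic programming table for both a global (Needleman-Wunsch-style) and a local (Smith-Waterman-style) alignment. The scoring scheme is an assumption chosen for illustration: +1 for a match, -1 for a mismatch and -1 per gap position. The function name `alignment_score` is hypothetical. The local mode changes only two things: every cell is floored at 0, and the best cell anywhere in the table is reported instead of the bottom-right cell.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (local=False) or local (local=True) alignment score by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: aligning a prefix against nothing costs one gap per character.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = table[i - 1][j] + gap      # gap in b
            left = table[i][j - 1] + gap    # gap in a
            score = max(diag, up, left)
            if local:
                score = max(0, score)       # local: a new alignment may start anywhere
                best = max(best, score)
            table[i][j] = score
    return best if local else table[-1][-1]


if __name__ == "__main__":
    print(alignment_score("TCATG", "CATTG"), alignment_score("TCATG", "CATTG", local=True))
```

For TCATG and CATTG this prints `2 3`. The global score covers both full sequences. The local score comes from a shorter, well-matching segment such as the shared CAT.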
Why do sequence alignment?
• Sequence alignment is useful for discovering structural, functional and evolutionary information in biological sequences.
• Sequences that are very similar are likely to have similar secondary and 3D structure, similar function and a common ancestral sequence. It is extremely unlikely that such sequences became similar by chance.
  - For DNA sequences of n nucleotides, the probability that two random sequences are identical by chance is very low: $P = 4^{-n}$ (assuming all four bases are equally frequent).
  - For proteins of n amino acids, the probability is even lower: $P = 20^{-n}$.
Sequence alignment makes the following tasks easier: 1. annotation of new sequences; 2. modelling of protein structures; 3. design and analysis of gene expression experiments.
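As a worked illustration, assume equal base and amino acid frequencies and a length of n = 10:

```latex
P_{\text{DNA}} = 4^{-10} = \frac{1}{1\,048\,576} \approx 9.5 \times 10^{-7},
\qquad
P_{\text{protein}} = 20^{-10} = \frac{1}{1.024 \times 10^{13}} \approx 9.8 \times 10^{-14}
```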
An example of aligning text strings
Raw data (how should these be aligned?):
T C A T G
C A T T G

2 matches, 0 gaps
T C A T G
      | |
C A T T G

3 matches (2 end gaps)
T C A T G .
  | | |
. C A T T G

4 matches, 1 insertion
T C A - T G
  | |   | |
. C A T T G

4 matches, 1 insertion
T C A T - G
  | | |   |
. C A T T G
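The match and gap counts on this slide can be checked mechanically. This Python sketch follows the slide's notation: `.` is an end gap and `-` is an internal gap (insertion). The function name `summarize` is a hypothetical helper.

```python
from typing import Tuple


def summarize(top: str, bottom: str) -> Tuple[int, int, int]:
    """Return (matches, internal gaps, end gaps) for two aligned rows.

    '.' marks an end gap and '-' an internal gap, as in the slide.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = internal = end = 0
    for x, y in zip(top, bottom):
        if "." in (x, y):
            end += 1
        elif "-" in (x, y):
            internal += 1
        elif x == y:
            matches += 1
    return matches, internal, end


if __name__ == "__main__":
    for top, bottom in [("TCATG", "CATTG"), ("TCATG.", ".CATTG"),
                        ("TCA-TG", ".CATTG"), ("TCAT-G", ".CATTG")]:
        print(top, bottom, summarize(top, bottom))
```

Both gapped alignments reach 4 matches with a single insertion, so plain match counting cannot choose between them. A scoring scheme has to break the tie.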
Terminology of sequence comparison
• Sequence identity: exactly the same amino acid or nucleotide in the same position.
• Sequence similarity: substitutions with similar chemical properties.
• Sequence homology: general term that indicates evolutionary relatedness among sequences; it is usually assessed by measuring percentage identity.
• Pairwise alignment: used to find the best-matching piecewise (local) or global alignment of two query sequences. Pairwise alignment handles only two sequences at a time.
• Multiple sequence alignment: tries to align all of the sequences in a given query set.
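Percent identity can be computed directly from a pairwise alignment. This minimal Python sketch assumes one common convention: count the identical non-gap columns and divide by the total number of alignment columns, gaps included. Other tools divide by the length of the shorter sequence or by the number of ungapped columns, so reported values can differ. The function name is hypothetical. The input is the alignment VDS-CY / VESLCY used later in this deck.

```python
def percent_identity(top: str, bottom: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    if not top:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != gap)
    return 100.0 * identical / len(top)


if __name__ == "__main__":
    print(round(percent_identity("VDS-CY", "VESLCY"), 1))  # 4 identical of 6 columns
```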
Methods of pairwise alignment
• Dot matrix analysis
• The dynamic programming (DP) algorithm
• Word methods
What is dot matrix analysis?
• A dot matrix analysis is a method for comparing two sequences to look for possible alignments (Gibbs and McIntyre 1970).
• The algorithm for a dot matrix:
1. One sequence (A) is listed across the top of the matrix and the other (B) is listed down the left side.
2. Starting from the first character in B, one moves across the page, keeping to the first row and placing a dot in every column where the character in A is the same.
3. The process is repeated row by row until all possible comparisons between A and B have been made.
4. Any region of similarity is revealed by a diagonal row of dots.
5. Isolated dots not on a diagonal represent random matches.
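The steps above translate almost directly into code. This Python sketch is an illustration only, not the program used for the examples that follow. Sequence A is listed across the top and B down the side; `*` marks a dot (matching characters) and `.` marks a non-match. The helper names are hypothetical.

```python
def dot_matrix(a, b):
    """Row i compares b[i] (down the side) with every character of a (across the top)."""
    return [[ca == cb for ca in a] for cb in b]


def render(a, b):
    """Text picture of the dot matrix: '*' where characters match, '.' elsewhere."""
    lines = ["  " + " ".join(a)]
    for cb, row in zip(b, dot_matrix(a, b)):
        lines.append(cb + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("GATTACA", "GATTACA"))
```

For two identical sequences the main diagonal is fully dotted. The off-diagonal dots (for example, each A matching the other A's) are the isolated random matches described in step 5.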
What can dot matrix analysis do?
• Detection of matching regions can be improved by filtering out random matches, which can be achieved by using a sliding window.
• It can be used to assess repetitiveness within a single sequence (by comparing the sequence with itself), revealing features such as direct and inverted repeats.
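A sliding-window filter can be sketched in Python. The rule used here is an assumption, one of several variants in use: a dot is kept at (i, j) only when at least `threshold` of the `window` diagonal comparisons starting there are matches. The function name and exact rule are illustrative. Comparing a sequence with itself also shows how a direct repeat appears as a line parallel to the main diagonal.

```python
def filtered_dots(a, b, window, threshold):
    """Return (i, j) start positions where b[i:i+window] and a[j:j+window]
    agree in at least `threshold` positions (i indexes b, j indexes a)."""
    dots = set()
    for i in range(len(b) - window + 1):
        for j in range(len(a) - window + 1):
            matches = sum(1 for k in range(window) if b[i + k] == a[j + k])
            if matches >= threshold:
                dots.add((i, j))
    return dots


if __name__ == "__main__":
    seq = "ACGTACGT"  # contains the direct repeat ACGT twice
    print(sorted(filtered_dots(seq, seq, window=4, threshold=4)))
```

This prints `[(0, 0), (0, 4), (1, 1), (2, 2), (3, 3), (4, 0), (4, 4)]`. Besides the main diagonal, the points (0, 4) and (4, 0) reveal the repeated ACGT. Raising `window` and `threshold` removes more of the isolated random matches.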
1st example of dot matrix analysis: two identical sequences
2nd example of dot matrix analysis: two very different sequences
3rd example of dot matrix analysis: two similar sequences
(Source: http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html)
Dynamic programming algorithm
• The approach compares every pair of characters in the two sequences and generates the best (optimal) alignment under the chosen scoring system.
• The method can be useful in aligning nucleotide to protein sequences.
• The method is computationally demanding: it fills a table with one cell for every pair of positions, so the running time (and, in its basic form, the memory) grows with the product of the two sequence lengths.
• Algorithmic improvements, together with increasing computer capacity, make it possible to align a query sequence against a large database in a few minutes.
• There are two approaches to dynamic programming: top-down and bottom-up.
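The two approaches can be compared with a small Python sketch that computes the same global alignment score both ways. The toy scoring scheme is an assumption: +1 per match, -1 per mismatch and -1 per gap position. The function names are hypothetical. The top-down version recurses from the full sequences and caches each subproblem. The bottom-up version fills the table row by row from the empty prefixes.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 1, -1, -1  # toy scoring scheme (assumption)


def top_down_score(a, b):
    """Top-down: recurse from the full sequences, caching each subproblem (memoization)."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best score for aligning the prefixes a[:i] and b[:j].
        if i == 0:
            return j * GAP  # b[:j] aligned entirely against gaps
        if j == 0:
            return i * GAP
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        return max(best(i - 1, j - 1) + pair, best(i - 1, j) + GAP, best(i, j - 1) + GAP)

    return best(len(a), len(b))


def bottom_up_score(a, b):
    """Bottom-up: fill the table row by row, keeping only the previous row."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr[j] = max(prev[j - 1] + pair, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(top_down_score("TCATG", "CATTG"), bottom_up_score("TCATG", "CATTG"))
```

Both functions print 2 here. The top-down form is closer to the recursive definition, but its recursion depth grows with the sequence lengths, so it runs into Python's recursion limit on long sequences. The bottom-up form avoids this and stores only one row at a time.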
The procedure of the dynamic programming algorithm
• The alignment procedure depends on a scoring system based on the probability that:
1) a particular amino acid pair is found in alignments of related proteins ($p_{xy}$);
2) the same amino acid pair is aligned by chance ($p_x p_y$);
3) introducing a gap would be a better choice because it increases the score.
• A substitution matrix is built from the ratio of the first two probabilities. There are many such matrices; two of them, PAM and BLOSUM, are discussed in the next few slides.
• The scores for introducing a gap and for extending it are set separately from the substitution scores (though usually chosen to suit a particular matrix); they represent prior knowledge and some assumptions. One of these assumptions is quite simple: if the gap penalty is too high, gaps are almost never opened and a reasonable alignment between slightly different sequences will never be achieved; if it is too low, gaps are inserted so freely that a meaningful optimal alignment is hardly possible. Other assumptions are based on sophisticated statistical procedures.
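Separate penalties for opening and extending a gap are commonly called affine gap penalties. This minimal Python sketch assumes one convention: the first position of a gap costs `open_penalty` and each further position costs `extend_penalty`. The defaults of 11 and 1 are illustrative; 11 matches the single-gap penalty in the next example. Programs differ in their exact conventions.

```python
def gap_cost(length: int, open_penalty: int = 11, extend_penalty: int = 1) -> int:
    """Affine gap cost: open_penalty for the first gap position, extend_penalty for each extra one."""
    if length <= 0:
        return 0  # no gap, no cost
    return open_penalty + extend_penalty * (length - 1)


if __name__ == "__main__":
    one_long_gap = gap_cost(3)          # one gap spanning 3 positions
    three_short_gaps = 3 * gap_cost(1)  # three separate 1-position gaps
    print(one_long_gap, three_short_gaps)
```

This prints `13 33`. Under this scheme one three-residue gap is far cheaper than three separate one-residue gaps, which steers the alignment toward fewer, longer gaps.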
An example: scoring a sequence alignment with a gap penalty

Sequence 1:   V   D   S   -   C   Y
Sequence 2:   V   E   S   L   C   Y
Score:        4   2   4 -11   9   7

Score = sum of amino acid pair scores (26) minus a single gap penalty (11) = 15

Notes:
1. Non-identical amino acids may be placed in corresponding positions (substitutions).
2. Not every match earns the same score; for instance, a match between two rare amino acids scores more than a match between two common ones.
3. Gaps may be introduced to optimise the score, but each gap incurs a penalty.
Steps for the dynamic programming algorithm
1. Score of new alignment (A) = score of previous alignment (B) + score of new aligned pair

   V D S - C Y   =   V D S - C   +   Y
   V E S L C Y       V E S L C       Y
       15        =       8       +   7

2. Score of alignment (B) = score of previous alignment (C) + score of new aligned pair

   V D S - C   =   V D S -   +   C
   V E S L C       V E S L       C
       8       =     -1      +   9

3. Repeat, removing one aligned pair at a time, until no aligned pairs remain.
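The recurrence in these steps can be sketched in Python. The five substitution scores are taken from the example above; they coincide with the corresponding BLOSUM62 entries. The gap penalty of 11 is the example's single-gap penalty. As a simplifying assumption, every gap column is charged the full penalty, which only matters for gaps longer than one position. The names `column_score` and `score_recursive` are hypothetical.

```python
# Substitution scores for the pairs used in the example (same values as the slide).
# frozenset keys make the lookup symmetric: score(D, E) == score(E, D).
PAIR_SCORES = {
    frozenset("VV"): 4,
    frozenset("DE"): 2,
    frozenset("SS"): 4,
    frozenset("CC"): 9,
    frozenset("YY"): 7,
}
GAP_PENALTY = 11  # charged for every gap column in this simplified sketch


def column_score(x: str, y: str) -> int:
    """Score of one aligned pair: a gap penalty, or a substitution score."""
    if x == "-" or y == "-":
        return -GAP_PENALTY
    return PAIR_SCORES[frozenset((x, y))]  # KeyError if the pair is not in the table


def score_recursive(top: str, bottom: str) -> int:
    """Score(alignment) = Score(alignment without last column) + score(last column)."""
    if not top:  # an empty alignment scores 0
        return 0
    return score_recursive(top[:-1], bottom[:-1]) + column_score(top[-1], bottom[-1])


if __name__ == "__main__":
    top, bottom = "VDS-CY", "VESLCY"
    # Scores of successively longer prefixes of the alignment.
    print([score_recursive(top[:k], bottom[:k]) for k in range(1, len(top) + 1)])
```

Running it prints `[4, 6, 10, -1, 8, 15]`. The last three values are the -1, 8 and 15 from steps 1 and 2, and the final 15 is the total from the scoring example.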
Why use a substitution matrix?
• To determine the likelihood that two sequences are homologous.
• Substitutions that are more likely should get a higher score.
• Substitutions that are less likely should get a lower score.
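How to calculate scoring matrices
• A scoring matrix is a log-odds matrix: each cell gives the logarithm of the ratio between the probability of seeing those two residues aligned in related sequences and the probability of seeing them aligned by chance.
• Score of an alignment = sum of the log-odds scores of its aligned residue pairs.
• The score for each residue pair (a, b) is given by:

$$s(a,b) = \frac{1}{\lambda} \log\left(\frac{p_{ab}}{f_a f_b}\right)$$

where $p_{ab}$ is the frequency with which a and b are aligned in related sequences, $f_a$ and $f_b$ are the background frequencies of a and b, and $\lambda$ is a scaling factor.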
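The formula can be evaluated directly. In this Python sketch the frequencies are hypothetical numbers chosen for illustration. The scale λ = ln 2 / 2 is an assumed choice that expresses scores in half-bits, the unit used for BLOSUM62. Published matrices round such values to integers.

```python
import math

HALF_BIT = math.log(2) / 2  # lambda giving scores in half-bit units


def log_odds_score(p_ab: float, f_a: float, f_b: float, lam: float = HALF_BIT) -> float:
    """s(a, b) = (1 / lambda) * ln(p_ab / (f_a * f_b))."""
    return math.log(p_ab / (f_a * f_b)) / lam


if __name__ == "__main__":
    # Hypothetical frequencies: both residues have background frequency 0.05.
    print(round(log_odds_score(0.01, 0.05, 0.05)))   # pair seen 4x more often than chance
    print(round(log_odds_score(0.001, 0.05, 0.05)))  # pair seen less often than chance
```

This prints 4 and then -3. A pair observed exactly as often as chance predicts (for example, `p_ab = 0.0025` here) scores 0.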
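Types of matrices
• Percent identity
  - Simple identity (match/mismatch) scoring; the standard scheme for aligning DNA sequences
• PAM
  - Estimates the rate at which each possible residue in a sequence changes to each other residue over evolutionary time
• BLOSUM-X
  - Built from conserved blocks in which sequences sharing at least X% identity were clustered together, so BLOSUM-X reflects substitutions between sequences that are less than X% identical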
Scoring matrices: PAM (Percent Accepted Mutation) and BLOSUM62 (BLOcks amino acid SUbstitution Matrices)
Amino acids are grouped according to the chemistry of the side group: (C) sulfhydryl, (STPAG) small hydrophilic, (NDEQ) acid, acid amide and hydrophilic, (HRK) basic, (MILV) small hydrophobic, and (FYW) aromatic. Log-odds values: a positive value such as +10 means the pair is found in related (homologous) sequences more often than expected by chance, 0 means the two probabilities are equal, and a negative value such as -4 means the substitution is seen less often than expected by chance. Thus the score of aligning YY with YY is 10 + 10 = 20, whereas YY with TP scores -3 - 5 = -8, a rare and unexpected pairing between homologous sequences.
BLOSUM is based on local alignments. BLOSUM was first introduced in a paper by Henikoff and Henikoff. They scanned the BLOCKS database for very conserved regions of protein families (regions that have no gaps in the sequence alignment) and then counted the relative frequencies of amino acids and their substitution probabilities. They then calculated a log-odds score for each of the 210 possible substitution pairs of the 20 standard amino acids (20 × 21 / 2 = 210, counting identical pairs).
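The counting step described above can be sketched in Python on a toy block. This is a deliberate simplification. Real BLOSUM construction also clusters sequences above the X% identity threshold and weights them; that step is omitted here. The block, the helper names and the half-bit rounding are illustrative assumptions. The sketch also confirms the count of 210 unordered pairs.

```python
import math
from collections import Counter
from itertools import combinations, combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Unordered residue pairs, identical pairs included: 20 * 21 / 2 = 210.
N_PAIRS = len(list(combinations_with_replacement(AMINO_ACIDS, 2)))


def count_pairs(block):
    """Count unordered residue pairs within each column of a gap-free block."""
    counts = Counter()
    for column in zip(*block):  # block is a list of equal-length, ungapped rows
        for x, y in combinations(column, 2):
            counts[tuple(sorted((x, y)))] += 1
    return counts


def block_log_odds(block):
    """Half-bit log-odds scores, 2 * log2(observed / expected), rounded to integers."""
    counts = count_pairs(block)
    total = sum(counts.values())
    observed = {pair: n / total for pair, n in counts.items()}
    # Background frequency of a residue: q_aa plus half of each q_ab with b != a.
    background = Counter()
    for (x, y), q in observed.items():
        background[x] += q / 2
        background[y] += q / 2
    scores = {}
    for (x, y), q in observed.items():
        expected = background[x] ** 2 if x == y else 2 * background[x] * background[y]
        scores[(x, y)] = round(2 * math.log2(q / expected))
    return scores


if __name__ == "__main__":
    toy_block = ["AA", "AS", "AA"]  # hypothetical 3-sequence, 2-column block
    print(N_PAIRS)
    print(count_pairs(toy_block))
    print(block_log_odds(toy_block))
```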
Word methods
• Word methods, also known as k-tuple methods, are heuristic methods that are not guaranteed to find an optimal alignment, but are significantly more efficient than dynamic programming.
• Typical tools that use this method are BLAST and FASTA.
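The core idea of a word (k-tuple) method is to find short exact word matches before any slower alignment step. This Python sketch simplifies that idea and is not the actual BLAST or FASTA algorithm. It indexes every k-letter word of the query and looks up each word of the target. It then groups the hits by diagonal (i - j), since several hits on the same diagonal suggest a region worth extending. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def word_hits(query, target, k):
    """Return (i, j) pairs where query[i:i+k] == target[j:j+k]."""
    index = defaultdict(list)  # k-letter word -> start positions in query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits):
    """Count hits per diagonal; hits on one diagonal fit a single ungapped alignment."""
    return Counter(i - j for i, j in hits)


if __name__ == "__main__":
    hits = word_hits("TCATG", "CATTG", k=2)
    print(hits, hits_by_diagonal(hits))
```

For the text-string example from earlier, the hits are (1, 0), (2, 1) and (3, 3). The two hits on diagonal 1 (CA, AT) mark the shared CAT segment as the most promising region to extend.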
The list of sequence alignment software
• http://en.wikipedia.org/wiki/List_of_sequence_alignment_software

More Related Content

What's hot

Global alignment
Global alignmentGlobal alignment
Global alignment
Pinky Vincent
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
Harshita Bhawsar
 
Scoring schemes in bioinformatics
Scoring schemes in bioinformaticsScoring schemes in bioinformatics
Scoring schemes in bioinformatics
SumatiHajela
 
BLAST
BLASTBLAST
Clustal
ClustalClustal
Clustal
Benittabenny
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02
PILLAI ASWATHY VISWANATH
 
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
Bioinformatics and Computational Biosciences Branch
 
Scoring matrices
Scoring matricesScoring matrices
Scoring matrices
Ashwini
 
Dot matrix
Dot matrixDot matrix
Dot matrix
Tania Khan
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
Ramya S
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
Mariya Raju
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure prediction
Siva Dharshini R
 
Sequence alignment 1
Sequence alignment 1Sequence alignment 1
Sequence alignment 1
SumatiHajela
 
Distance based method
Distance based method Distance based method
Distance based method
Adhena Lulli
 
FASTA
FASTAFASTA
Pathway and network analysis
Pathway and network analysisPathway and network analysis
Pathway and network analysis
Manar Al-Eslam Mattar
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
naveed ul mushtaq
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
Meghaj Mallick
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
SHEETHUMOLKS
 
Phylogenetic tree construction
Phylogenetic tree constructionPhylogenetic tree construction
Phylogenetic tree construction
Uddalok Jana
 

What's hot (20)

Global alignment
Global alignmentGlobal alignment
Global alignment
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
 
Scoring schemes in bioinformatics
Scoring schemes in bioinformaticsScoring schemes in bioinformatics
Scoring schemes in bioinformatics
 
BLAST
BLASTBLAST
BLAST
 
Clustal
ClustalClustal
Clustal
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02
 
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
 
Scoring matrices
Scoring matricesScoring matrices
Scoring matrices
 
Dot matrix
Dot matrixDot matrix
Dot matrix
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure prediction
 
Sequence alignment 1
Sequence alignment 1Sequence alignment 1
Sequence alignment 1
 
Distance based method
Distance based method Distance based method
Distance based method
 
FASTA
FASTAFASTA
FASTA
 
Pathway and network analysis
Pathway and network analysisPathway and network analysis
Pathway and network analysis
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 
Phylogenetic tree construction
Phylogenetic tree constructionPhylogenetic tree construction
Phylogenetic tree construction
 

Similar to Seq alignment

sequence alignment
sequence alignmentsequence alignment
sequence alignment
ammar kareem
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
Vidya Kalaivani Rajkumar
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
alizain9604
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
H K Yoon
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
Rai University
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
Rai University
 
Laboratory 1 sequence_alignments
Laboratory 1 sequence_alignmentsLaboratory 1 sequence_alignments
Laboratory 1 sequence_alignments
seham15
 
Needleman wunsch computional ppt
Needleman wunsch computional pptNeedleman wunsch computional ppt
Needleman wunsch computional ppt
tarun shekhawat
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
sriaisvariyasundar
 
Bioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisBioinformatics_Sequence Analysis
Bioinformatics_Sequence Analysis
Sangeeta Das
 
Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)
Safa Khalid
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
Abhishek Vatsa
 
How the blast work
How the blast workHow the blast work
How the blast work
Atai Rabby
 
20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07
Computer Science Club
 
multiple sequence and pairwise alignment.pdf
multiple sequence and pairwise alignment.pdfmultiple sequence and pairwise alignment.pdf
multiple sequence and pairwise alignment.pdf
sriaisvariyasundar
 
4. sequence alignment.pptx
4. sequence alignment.pptx4. sequence alignment.pptx
4. sequence alignment.pptx
ArupKhakhlari1
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
Krish_ver2
 
Sequence alignment belgaum
Sequence alignment belgaumSequence alignment belgaum
Sequence alignment belgaum
National Institute of Biologics
 
Bioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searchingBioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searching
Prof. Wim Van Criekinge
 

Similar to Seq alignment (20)

sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
Laboratory 1 sequence_alignments
Laboratory 1 sequence_alignmentsLaboratory 1 sequence_alignments
Laboratory 1 sequence_alignments
 
Needleman wunsch computional ppt
Needleman wunsch computional pptNeedleman wunsch computional ppt
Needleman wunsch computional ppt
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
Bioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisBioinformatics_Sequence Analysis
Bioinformatics_Sequence Analysis
 
Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
How the blast work
How the blast workHow the blast work
How the blast work
 
20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07
 
multiple sequence and pairwise alignment.pdf
multiple sequence and pairwise alignment.pdfmultiple sequence and pairwise alignment.pdf
multiple sequence and pairwise alignment.pdf
 
4. sequence alignment.pptx
4. sequence alignment.pptx4. sequence alignment.pptx
4. sequence alignment.pptx
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
 
Sequence alignment belgaum
Sequence alignment belgaumSequence alignment belgaum
Sequence alignment belgaum
 
Bioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searchingBioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searching
 

More from Nagendrasahu6

Blood group (population genetic and evolution) by nagendra sahu
Blood group (population genetic and evolution) by nagendra sahuBlood group (population genetic and evolution) by nagendra sahu
Blood group (population genetic and evolution) by nagendra sahu
Nagendrasahu6
 
Exponential functions
Exponential functionsExponential functions
Exponential functions
Nagendrasahu6
 
the t test
the t testthe t test
the t test
Nagendrasahu6
 
blast and fasta
 blast and fasta blast and fasta
blast and fasta
Nagendrasahu6
 
Bacteriophage life cycle
Bacteriophage life cycleBacteriophage life cycle
Bacteriophage life cycle
Nagendrasahu6
 
Bilirubin
BilirubinBilirubin
Bilirubin
Nagendrasahu6
 
Bililrubin
BililrubinBililrubin
Bililrubin
Nagendrasahu6
 
BIOINFORMATICS - NCBI
BIOINFORMATICS - NCBIBIOINFORMATICS - NCBI
BIOINFORMATICS - NCBI
Nagendrasahu6
 
Thyroid gland structure and function
Thyroid gland structure and functionThyroid gland structure and function
Thyroid gland structure and function
Nagendrasahu6
 
communication and social media
communication and social mediacommunication and social media
communication and social media
Nagendrasahu6
 
Endocrinology
EndocrinologyEndocrinology
Endocrinology
Nagendrasahu6
 
Gene therapy
Gene therapyGene therapy
Gene therapy
Nagendrasahu6
 
Histological equipment
Histological equipmentHistological equipment
Histological equipment
Nagendrasahu6
 

More from Nagendrasahu6 (13)

Blood group (population genetic and evolution) by nagendra sahu
Blood group (population genetic and evolution) by nagendra sahuBlood group (population genetic and evolution) by nagendra sahu
Blood group (population genetic and evolution) by nagendra sahu
 
Exponential functions
Exponential functionsExponential functions
Exponential functions
 
the t test
the t testthe t test
the t test
 
blast and fasta
 blast and fasta blast and fasta
blast and fasta
 
Bacteriophage life cycle
Bacteriophage life cycleBacteriophage life cycle
Bacteriophage life cycle
 
Bilirubin
BilirubinBilirubin
Bilirubin
 
Bililrubin
BililrubinBililrubin
Bililrubin
 
BIOINFORMATICS - NCBI
BIOINFORMATICS - NCBIBIOINFORMATICS - NCBI
BIOINFORMATICS - NCBI
 
Thyroid gland structure and function
Thyroid gland structure and functionThyroid gland structure and function
Thyroid gland structure and function
 
communication and social media
communication and social mediacommunication and social media
communication and social media
 
Endocrinology
EndocrinologyEndocrinology
Endocrinology
 
Gene therapy
Gene therapyGene therapy
Gene therapy
 
Histological equipment
Histological equipmentHistological equipment
Histological equipment
 

Recently uploaded

Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
SSR02
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 

Recently uploaded (20)

Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 

Seq alignment

  • 2.  Alignment of pairs of sequence  Local and global alignments  Methods of alignments Dot matrix analysis Dynamic programming approach  Use of scoring matrices and gap penalties  Scoring matrices -- PAM and BLOSUM Outline
  • 3. What is sequence alignment? Sequence alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences. Comparing two sequences (pairwise alignment) or more (multiple sequence alignment) means searching for a series of individual characters or patterns that occur in the same order in the sequences. There are two types of alignment: local and global.
  • 4. Global alignment vs local alignment. Global alignment attempts to match as much of the sequences as possible; tools for global alignment are based on the Needleman-Wunsch algorithm. Local alignment tries to find the regions with the highest density of matches; tools for local alignment are based on the Smith-Waterman algorithm. Both algorithms are derived from the basic dynamic programming algorithm.
    Global alignment:
    L G P S S K Q T G K G S - S R I W D N
    L N - I T K S A G K G A I M R L G D A
    Local alignment:
    - - - - - - - T G K G - - - - - - - -
    - - - - - - - A G K G - - - - - - - -
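To make the global/local distinction concrete, here is a minimal Python sketch that computes only the best score (no traceback) under a simple linear gap penalty. The function name `alignment_score` and the match/mismatch/gap values (+1, -1, -2) are hypothetical choices for illustration, not a scoring scheme taken from the slides. In this sketch, the only difference between the Needleman-Wunsch-style global mode and the Smith-Waterman-style local mode is that local cells are floored at zero and the best cell anywhere in the table is reported.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a and b with a linear gap penalty.

    local=False: Needleman-Wunsch style global score (whole sequences).
    local=True:  Smith-Waterman style local score (best-scoring substrings).
    Scoring values are hypothetical defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j]: best score for a[:i] vs b[:j] (global),
    # or for the best alignment ending at a[i-1], b[j-1] (local).
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)  # local: restart instead of going negative
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("TCATG", "CATTG"), alignment_score("TCATG", "CATTG", local=True))
```

With these scores the local mode rewards the shared "CAT" block on its own, while the global mode must also pay for the flanking residues.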
  • 5. Why do sequence alignment? Sequence alignment is useful for discovering structural, functional and evolutionary information in biological sequences. Sequences that are very much alike may have similar secondary and 3D structure, similar function and likely a common ancestral sequence. It is extremely unlikely that such sequences became similar by chance. For DNA molecules with n nucleotides, the probability that two random sequences are identical is very low, $P = 4^{-n}$; for proteins with n amino acids, it is much lower still, $P = 20^{-n}$ (both figures assume every residue is equally likely at every position). Sequence alignment supports tasks such as: 1. annotation of new sequences; 2. modelling of protein structures; 3. design and analysis of gene expression experiments.
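The chance-match figures above follow from treating every position as an independent draw from an alphabet of 4 nucleotides or 20 amino acids with equal frequencies, which real sequences only approximate. A short Python sketch of that arithmetic (the helper name `chance_identity` is hypothetical):

```python
def chance_identity(n: int, alphabet_size: int) -> float:
    """Probability that two random sequences of length n are identical,
    assuming independent positions and equally frequent residues."""
    # Each position matches with probability 1/alphabet_size; n positions multiply.
    return alphabet_size ** -n


for n in (5, 10, 20):
    print(f"n={n:2d}  DNA: {chance_identity(n, 4):.3e}  "
          f"protein: {chance_identity(n, 20):.3e}")
```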
  • 6. An example of aligning text strings. Raw data: T C A T G and C A T T G.
    2 matches, 0 gaps:
    T C A T G
          | |
    C A T T G
    3 matches (2 end gaps):
    T C A T G -
      | | |
    - C A T T G
    4 matches, 1 insertion:
    T C A - T G
      | |   | |
    - C A T T G
    4 matches, 1 insertion:
    T C A T - G
      | | |   |
    - C A T T G
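The match and gap counts in this example can be checked mechanically. In this sketch each candidate alignment is written as two equal-length rows with `-` for a gap; the helper `summarize` (a hypothetical name) counts identical columns and separates gaps that hang off either end (end gaps) from gaps inside the aligned region (insertions).

```python
def summarize(top: str, bottom: str) -> tuple[int, int, int]:
    """Return (matches, internal_gaps, end_gaps) for two gapped rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    columns = list(zip(top, bottom))
    # Columns where both rows carry a residue delimit the aligned region.
    paired = [i for i, (x, y) in enumerate(columns) if x != "-" and y != "-"]
    if not paired:
        raise ValueError("rows share no aligned residue pair")
    first, last = paired[0], paired[-1]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    gap_columns = [i for i, (x, y) in enumerate(columns) if x == "-" or y == "-"]
    end_gaps = sum(1 for i in gap_columns if i < first or i > last)
    return matches, len(gap_columns) - end_gaps, end_gaps


candidates = [("TCATG", "CATTG"), ("TCATG-", "-CATTG"),
              ("TCA-TG", "-CATTG"), ("TCAT-G", "-CATTG")]
for top, bottom in candidates:
    print(top, bottom, summarize(top, bottom))
```

The last two candidates come out identical on these counts, matching the slide's description of both as 4 matches with 1 insertion.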
  • 7. Terminology of sequence comparison. Sequence identity: exactly the same amino acid or nucleotide in the same position. Sequence similarity: substitutions with similar chemical properties. Sequence homology: a general term indicating evolutionary relatedness among sequences; it is usually assessed by measuring percentage identity. Pairwise alignment: used to find the best-matching piecewise (local) or global alignment of two query sequences; pairwise alignment works on only two sequences at a time. Multiple sequence alignment: tries to align all of the sequences in a given query set.
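Percentage identity can be computed directly from a pairwise alignment. Tools differ on the denominator (all alignment columns, only gap-free columns, or the length of the shorter sequence); this sketch assumes all columns, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(top: str, bottom: str, gap: str = "-") -> float:
    """Identical residue columns as a percentage of all alignment columns."""
    if len(top) != len(bottom) or not top:
        raise ValueError("rows must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != gap)
    return 100.0 * identical / len(top)


print(percent_identity("VDS-CY", "VESLCY"))  # 4 identical columns out of 6
```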
  • 8. Methods of pairwise alignment: dot matrix analysis; the dynamic programming (DP) algorithm; word methods.
  • 9. What is dot matrix analysis? A dot matrix analysis is a method for comparing two sequences to look for possible alignments (Gibbs and McIntyre 1970). The algorithm for a dot matrix:
    1. One sequence (A) is listed across the top of the matrix and the other (B) is listed down the left side.
    2. Starting from the first character in B, one moves across the page, keeping to the first row and placing a dot in any column where the character in A is the same.
    3. The process is continued row by row until all possible comparisons between A and B are made.
    4. Any region of similarity is revealed by a diagonal row of dots.
    5. Isolated dots not on a diagonal represent random matches.
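The steps above translate almost line for line into code. This Python sketch builds the matrix with sequence A across the top and B down the left, then prints `*` for a dot and `.` for an empty cell; the function names `dot_matrix` and `render` are hypothetical.

```python
def dot_matrix(a: str, b: str) -> list[list[bool]]:
    """Cell [i][j] holds a dot when B[i] (row) equals A[j] (column)."""
    return [[char_a == char_b for char_a in a] for char_b in b]


def render(a: str, b: str) -> str:
    """Text picture: A along the top, B down the left side."""
    lines = ["  " + " ".join(a)]
    for char_b, row in zip(b, dot_matrix(a, b)):
        lines.append(char_b + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


print(render("GATTACA", "GATTACA"))
```

Comparing a sequence with itself always gives a full main diagonal; any other dots come from repeated characters.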
  • 10. What can dot matrix analysis do? Detection of matching regions can be improved by filtering out random matches, which can be achieved by using a sliding window. Dot matrix analysis can also be used to assess repetitiveness within a single sequence, such as direct and inverted repeats.
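One simple way to apply a sliding-window filter is sketched below: a window of `window` positions slides along every diagonal, and its matching cells are kept only when the window contains at least `threshold` matches. The function name, parameter names and defaults are hypothetical, and published dot-plot tools differ in details such as whether they mark the window's centre or every matching cell in it.

```python
def filtered_dot_matrix(a: str, b: str, window: int = 3,
                        threshold: int = 3) -> list[list[bool]]:
    """Keep a dot only if it lies in a diagonal window with >= threshold matches."""
    rows, cols = len(b), len(a)
    kept = [[False] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            # Compare the diagonal run B[i:i+window] against A[j:j+window].
            hits = [k for k in range(window) if b[i + k] == a[j + k]]
            if len(hits) >= threshold:
                for k in hits:
                    kept[i + k][j + k] = True  # retain only genuine matches
    return kept
```

An isolated match, such as the single `A`/`A` dot when comparing `AGGG` with `ACCC`, never reaches the threshold and disappears, while a run of consecutive matches survives as a diagonal.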
  • 12. 2nd example of dot matrix analysis: two very different sequences (http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html)
  • 13. 3rd example of dot matrix analysis: two similar sequences (http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html)
  • 14. Dynamic programming algorithm. The approach compares every pair of characters in the two sequences and generates an alignment that is the best, or optimal, one. The method can be useful for aligning nucleotide to protein sequences. It requires large amounts of computing power and is highly computationally demanding, because the recursion it is built on has to be evaluated for every pair of positions in the two sequences. New algorithmic improvements and increasing computer capacity make it possible to align a query sequence against a large database in a few minutes. There are two approaches to dynamic programming: top-down and bottom-up.
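The slide mentions top-down and bottom-up dynamic programming. The sketch below computes the same global alignment score both ways: top-down as a memoised recursion over prefixes, bottom-up by filling the table row by row. The scoring values and function names are hypothetical, and the recursive version is only practical for short sequences because of Python's recursion limit.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_score_top_down(a: str, b: str) -> int:
    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Best score for aligning a[:i] with b[:j]; solved on demand and cached.
        if i == 0:
            return j * GAP
        if j == 0:
            return i * GAP
        return max(best(i - 1, j - 1) + pair_score(a[i - 1], b[j - 1]),
                   best(i - 1, j) + GAP,
                   best(i, j - 1) + GAP)
    return best(len(a), len(b))


def global_score_bottom_up(a: str, b: str) -> int:
    # Fill the table row by row, keeping only the previous row in memory.
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + pair_score(a[i - 1], b[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


print(global_score_top_down("TCATG", "CATTG"), global_score_bottom_up("TCATG", "CATTG"))
```

Both versions touch every (i, j) cell once, so their cost grows with the product of the two sequence lengths.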
  • 15. The procedure of the dynamic programming algorithm. The alignment procedure depends on a scoring system based on the probability that: 1) a particular amino acid pair is found in alignments of related proteins ($p_{xy}$); 2) the same amino acid pair is aligned by chance ($p_x p_y$); 3) introducing a gap would be a better choice because it increases the score. A substitution matrix is built from the logarithm of the ratio of the first two probabilities. There are many such matrices; two of them, PAM and BLOSUM, are discussed in the next few slides. The scores for introducing a gap and for extending it are chosen together with the matrix and represent prior knowledge and some assumptions. One of these is quite simple: if the gap penalty is too high, a reasonable alignment between slightly different sequences will never be achieved, but if it is too low, gaps are inserted too freely and an optimal alignment is hardly possible. Other assumptions are based on sophisticated statistical procedures.
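Separate scores for opening and extending a gap are commonly combined into an affine gap penalty. This sketch assumes the convention that a gap of length L costs `gap_open + gap_extend * (L - 1)`; other programs count the extension differently. The function names and the defaults (11 and 1) are hypothetical, chosen so that a one-residue gap costs 11, as in the worked example on the next slide.

```python
def gap_penalty(length: int, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Affine gap cost: the first gapped position costs gap_open,
    each further position in the same gap costs gap_extend."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


def total_gap_cost(row: str) -> int:
    """Sum the affine costs of every run of '-' in one aligned row."""
    cost, run = 0, 0
    for ch in row + "X":  # sentinel character closes a trailing run
        if ch == "-":
            run += 1
        else:
            cost += gap_penalty(run)
            run = 0
    return cost


print(total_gap_cost("AC--GT"), total_gap_cost("A-C-GT"))
```

Under this scheme one gap of length 2 (cost 12) is cheaper than two separate one-residue gaps (cost 22), which is how affine penalties favour fewer, longer gaps.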
  • 16. An example: scoring a sequence alignment with a gap penalty.
    Sequence 1:    V    D    S    -    C    Y
    Sequence 2:    V    E    S    L    C    Y
    Score:         4    2    4  -11    9    7
    Score = sum of amino acid pair scores (26) minus single gap penalty (11) = 15.
    Notes: 1. Non-identical amino acids may be placed in corresponding positions. 2. The score gained by each match is not always the same; for instance, a match between two rare amino acids scores more than a match between two common ones. 3. Gaps may be introduced to optimise the score, but each gap incurs a penalty.
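The arithmetic on this slide can be reproduced with a small lookup table. The pair scores below are copied from the slide (they agree with the corresponding BLOSUM62 entries); only the pairs needed for this example are included, and each gap column is charged the slide's single gap penalty of 11. The names are hypothetical.

```python
# Pair scores from the slide; a real substitution matrix covers all 210 pairs.
PAIR_SCORES = {("V", "V"): 4, ("D", "E"): 2, ("S", "S"): 4,
               ("C", "C"): 9, ("Y", "Y"): 7}
GAP_PENALTY = 11


def pair_score(x: str, y: str) -> int:
    """Look up a symmetric pair score; raises KeyError for pairs not listed."""
    return PAIR_SCORES[(x, y)] if (x, y) in PAIR_SCORES else PAIR_SCORES[(y, x)]


def score_alignment(top: str, bottom: str) -> int:
    """Sum pair scores over the columns, subtracting GAP_PENALTY per gap column."""
    total = 0
    for x, y in zip(top, bottom):
        total += -GAP_PENALTY if "-" in (x, y) else pair_score(x, y)
    return total


print(score_alignment("VDS-CY", "VESLCY"))
```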
  • 17. Steps for the dynamic programming algorithm.
    1. Score of new alignment = score of previous alignment + score of new aligned pair:
       V D S - C Y
       V E S L C Y     15 = 8 + 7
    2. The previous alignment is scored the same way:
       V D S - C
       V E S L C       8 = -1 + 9
    3. Repeat, removing aligned pairs until no pairs remain.
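The recurrence on this slide (score of an alignment = score of the alignment without its last column + score of that column) can be written directly as a recursive function. This sketch reuses the slide's pair scores and gap penalty of 11; the table holds only the pairs in this example, and the function names are hypothetical.

```python
PAIR_SCORES = {("V", "V"): 4, ("D", "E"): 2, ("S", "S"): 4,
               ("C", "C"): 9, ("Y", "Y"): 7}
GAP_PENALTY = 11


def column_score(x: str, y: str) -> int:
    """Score of one aligned column: a gap penalty or a symmetric pair score."""
    if "-" in (x, y):
        return -GAP_PENALTY
    key = (x, y) if (x, y) in PAIR_SCORES else (y, x)
    return PAIR_SCORES[key]  # KeyError for pairs outside this small table


def score_recursive(top: str, bottom: str) -> int:
    """Score(alignment) = Score(alignment minus last column) + score(last column)."""
    if not top:  # an empty alignment scores zero
        return 0
    return score_recursive(top[:-1], bottom[:-1]) + column_score(top[-1], bottom[-1])


for k in (6, 5, 4):  # the full alignment and the prefixes used on the slide
    print("VDS-CY"[:k], "VESLCY"[:k], score_recursive("VDS-CY"[:k], "VESLCY"[:k]))
```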
  • 18. Why use a substitution matrix? To determine the likelihood of homology between two sequences. Substitutions that are more likely should get a higher score; substitutions that are less likely should get a lower score.
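  • 19. How to calculate scoring matrices. A scoring matrix is a log-odds matrix: each cell compares the probability of aligning those two residues in related sequences with the probability of aligning them by chance. The score of an alignment is the sum of the log-odds scores of its aligned residue pairs. The score for each residue pair is given by $s(a,b) = \frac{1}{\lambda}\log\frac{p_{ab}}{f_a f_b}$, where $p_{ab}$ is the frequency with which residues a and b are aligned in related sequences, $f_a$ and $f_b$ are the background frequencies of a and b, and $\lambda$ is a scaling factor.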
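The log-odds formula can be evaluated directly. The frequencies in this sketch are invented for illustration, not taken from any published matrix; choosing $\lambda = \ln 2 / 2$ expresses the score in half-bit units, and real matrices round such values to integers. The function and constant names are hypothetical.

```python
import math


def log_odds_score(p_ab: float, f_a: float, f_b: float, lam: float = 1.0) -> float:
    """s(a, b) = (1 / lambda) * log(p_ab / (f_a * f_b))."""
    return math.log(p_ab / (f_a * f_b)) / lam


HALF_BITS = math.log(2) / 2  # lambda giving scores in units of 1/2 bit

# Invented example: two residues, each with background frequency 0.05.
print(round(log_odds_score(0.01, 0.05, 0.05, HALF_BITS)))    # aligned 4x more often than chance
print(round(log_odds_score(0.0025, 0.05, 0.05, HALF_BITS)))  # aligned exactly as often as chance
print(round(log_odds_score(0.001, 0.05, 0.05, HALF_BITS)))   # aligned less often than chance
```

Positive scores mark pairs seen more often in related sequences than by chance, zero marks chance level, and negative scores mark pairs seen less often than chance.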
  • 20. Types of matrices. Percent identity: the standard scoring matrix for aligning DNA sequences. PAM: estimates the rate at which each possible residue in a sequence changes to each other residue over time. BLOSUM-X: built from conserved blocks in which sequences sharing more than X% identity are clustered together, so a higher X suits more closely related sequences.
  • 21. Scoring matrices: PAM (Percent Accepted Mutation) and BLOSUM62 (BLOcks amino acid SUbstitution Matrices). Amino acids are grouped according to the chemistry of the side group: (C) sulfhydryl; (STPAG) small hydrophilic; (NDEQ) acid, acid amide and hydrophilic; (HRK) basic; (MILV) small hydrophobic; and (FYW) aromatic. In the log-odds values, a positive score such as +10 means the pair is found in related sequences more often than expected by chance, 0 means the two probabilities are equal, and a negative score such as -4 means the pair is found less often than expected by chance. Thus the score of the alignment YY/YY is 10 + 10 = 20, whereas YY/TP scores -3 - 5 = -8, a rare and unexpected substitution between homologous sequences. BLOSUM is based on local alignments. BLOSUM was first introduced in a paper by Henikoff and Henikoff. They scanned the BLOCKS database for very conserved regions of protein families (regions without gaps in the sequence alignment), then counted the relative frequencies of amino acids and their substitution probabilities. Finally, they calculated a log-odds score for each of the 210 possible substitution pairs of the 20 standard amino acids.
  • 22. Word methods. Word methods, also known as k-tuple methods, are heuristic methods that are not guaranteed to find an optimal alignment but are significantly more efficient than dynamic programming. Typical tools that use this method are BLAST and FASTA.
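The core idea of a word (k-tuple) method is to index the short words of one sequence and look up the words of the other, instead of filling a full dynamic programming table. This sketch only finds the shared words ("hits"); tools such as BLAST and FASTA go on to extend and score such hits, which is not shown. The function name `shared_words` and the default k = 3 are hypothetical.

```python
from collections import defaultdict


def shared_words(query: str, target: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) for every length-k word found in both."""
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)  # word -> positions in target
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


print(shared_words("TCATG", "CATTG"))
```

Hits that share the same offset (query position minus target position) lie on the same diagonal of a dot matrix, which is why such words make good seeds for an alignment.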
  • 23. The list of sequence alignment software: http://en.wikipedia.org/wiki/List_of_sequence_alignment_software