Course: B.Sc Biochemistry
Subject: Basic of Bioinformatics
Unit: III
OUTLINE
▪ Sequence Alignment
▪ Scoring Alignments and Substitution Matrices
▪ Inserting Gaps
▪ Dynamic Programming
▪ Database Searches
Sequence Alignment
▪ Comparing sequences for
– Similarity
– Homology
▪ Prediction of function of genes and proteins
▪ Construction of phylogeny
▪ Finding motifs
Sequence Alignment - HOMOLOGY
▪ Orthologues: a pair of homologous genes whose most recent common ancestor node is a speciation event. They often have similar functions.
▪ Paralogues: a pair of homologous genes whose most recent common ancestor node is a duplication event. They tend to have different functions.
Sequence Alignment - HOMOLOGY
[Image 1]
Sequence Alignment - HOMOLOGY
[Image 2]
Sequence Alignment - PHYLOGENY
[Image 3]
Sequence Alignment – PROTEIN FUNCTIONS
[Image 4]
Scoring Alignments and Substitution Matrices
▪ The quality of an alignment is measured by giving it a quantitative score.
▪ The simplest way of quantifying similarity between two sequences is percentage identity.
– It is measured by counting the number of identical bases or amino acids matched between the aligned sequences.
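As a minimal sketch of percentage identity, the Python function below counts identical, non-gap columns in two already-aligned strings. The function name, the `-` gap symbol, and the choice to divide by the alignment length are assumptions made for illustration; other denominators (for example the length of the shorter sequence) are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same non-gap residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    # Assumption: the denominator is the number of alignment columns.
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(round(percent_identity("THISISASEQUENCE", "THAT---SEQUENCE"), 1))
```

With this convention the two toy strings share 10 identical columns out of 15.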
Scoring Alignments and Substitution Matrices
▪ The dot-plot gives a visual assessment of similarity based on identity.
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 5]
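A dot-plot can be sketched in a few lines of Python. The version below marks every cell where the residue of one sequence equals the residue of the other; the function names and the `*` / `.` rendering are hypothetical choices for illustration, and real tools usually add a sliding window and a threshold to reduce noise.

```python
def dot_plot(seq_a: str, seq_b: str) -> list[list[bool]]:
    """Identity dot-plot: cell [i][j] is True when seq_a[i] == seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text, one row per residue of the first sequence."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # Similar regions show up as diagonal runs of '*'.
    print(render(dot_plot("THISSEQUENCE", "THATSEQUENCE")))
```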
Scoring Alignments and Substitution Matrices
▪ Percentage identity is a relatively crude measure and does not give a complete picture of the degree of similarity of two sequences.
▪ Scoring identical matches as 1 and mismatches as 0 ignores the fact that the type of amino acids involved is highly significant.
Scoring Alignments and Substitution Matrices
▪ Genuine matches may not be identical:
Seq1: T H I S I S A S E Q U E N C E
Seq2: T H A T _ _ _ S E Q U E N C E
Isoleucine – Alanine: both hydrophobic
Serine – Threonine: both polar
Scoring Alignments and Substitution Matrices
▪ Scoring pairs of amino acids:
– With similar properties → higher scores
– With different properties → lower scores
Scoring Alignments and Substitution Matrices
▪ To assign scores for alignments, use SUBSTITUTION MATRICES
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 5]
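To illustrate how a substitution matrix turns an alignment into a score, here is a small Python sketch. The `TOY_SCORES` table, the default mismatch score, and the gap penalty of 8 are hypothetical values chosen for illustration only; a real analysis would load a full PAM or BLOSUM matrix instead.

```python
# Hypothetical toy scores for a handful of residue pairs (illustration only).
TOY_SCORES = {
    ("I", "I"): 4, ("A", "A"): 4, ("S", "S"): 4, ("T", "T"): 5,
    ("I", "A"): -1,  # both hydrophobic: mild penalty
    ("S", "T"): 1,   # both polar: small reward
}


def pair_score(a: str, b: str, scores=TOY_SCORES, default: int = -4) -> int:
    """Look up a symmetric pair score; unknown pairs get the default."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores.get((b, a), default)


def alignment_score(aligned_a: str, aligned_b: str, scores=TOY_SCORES,
                    gap_penalty: int = 8, gap: str = "-") -> int:
    """Sum column scores; any column containing a gap costs gap_penalty."""
    total = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            total -= gap_penalty
        else:
            total += pair_score(a, b, scores)
    return total


if __name__ == "__main__":
    print(alignment_score("IS", "AT"))
```

Under percentage identity the pair `IS` / `AT` would score zero matches; with the toy table the conservative substitutions are penalised only mildly.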
Scoring Alignments and Substitution Matrices
▪ Different types of substitution matrices are used, based on:
– The number of mutations required to convert one amino acid into another
– Similarities in physicochemical properties.
Scoring Alignments and Substitution Matrices
▪ PAM substitution matrices:
– Use closely related protein sequences to derive substitution frequencies
– PAM = Point Accepted Mutation, counted per 100 residues
➢ PAM 250 → 250 accepted point mutations per 100 residues (a site can mutate more than once)
Scoring Alignments and Substitution Matrices
▪ BLOSUM substitution matrices:
– BLOcks SUbstitution Matrix
– Use mutation data from highly conserved local regions (blocks)
– BLOSUM 62 → derived from sequences clustered at 62% identity
Scoring Alignments and Substitution Matrices
▪ Which matrix to use?
– Depends on the properties of the problem
– Distantly related sequences: PAM 250, BLOSUM 50
– Closely related sequences: PAM 120, BLOSUM 80
Scoring Alignments and Substitution Matrices
▪ Which matrix to use?
– Some special-purpose matrices exist (SLIM and PHAT are designed for membrane proteins)
– The length of the sequence is important
➢ Short sequences → PAM 40 or BLOSUM 80
➢ Long sequences → PAM 250 or BLOSUM 50
Scoring Alignments and Substitution Matrices
▪ BLOSUM 62 and PAM 120
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 6]
Inserting Gaps
▪ Gap insertion requires a scoring penalty (gap penalty).
▪ Gaps are required to achieve correct matches.
▪ Alignment programs use gap penalties to limit the introduction of gaps in the alignments.
Inserting Gaps
▪ Insertions tend to be several residues long rather than just a single residue long
– Fewer insertions and deletions occur in sequences of structural importance
– Lengthening an existing gap carries a smaller penalty (gap extension penalty) than introducing a new gap
– Gap penalty is high → the number of gaps will decrease
– Gap penalty is low → more and larger gaps will be inserted.
Inserting Gaps
▪ Choosing gap penalties:
– Linear
– Affine
➢ Gap open penalty
➢ Gap extension penalty
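The difference between the two schemes can be sketched as two small cost functions. The parameter values are hypothetical, and the affine form used here (the opening penalty covers the first gap position, each further position pays the extension penalty) is only one of the conventions in use; some programs charge open plus extension for the first position as well.

```python
def linear_gap_cost(length: int, per_residue: int = 8) -> int:
    """Linear scheme: every gap position costs the same amount."""
    return per_residue * length


def affine_gap_cost(length: int, gap_open: int = 10, gap_extend: int = 1) -> int:
    """Affine scheme: opening a gap is expensive, lengthening it is cheap.

    Assumption: gap_open pays for the first position of the gap.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, linear_gap_cost(n), affine_gap_cost(n))
```

With these values one gap of length 10 costs 19 under the affine scheme, while ten separate single-residue gaps cost 100, so the affine scheme favours fewer, longer gaps.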
Dynamic Programming
▪ Global and Local alignments
▪ Pairwise and Multiple alignments
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 7]
▪ For a pair of sequences there is a large number of possible alignments.
▪ Two sequences of length 1000 have approximately 10^600 different alignments.
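The order of magnitude can be checked with a rough count. The sketch below assumes that the number of alignments of two sequences of length n is approximated by the binomial coefficient C(2n, n); this counting model is an assumption made for illustration and is not necessarily the one behind the figure quoted on the slide.

```python
import math


def alignment_count_digits(n: int) -> int:
    """Number of decimal digits of C(2n, n), a rough count of alignments."""
    return len(str(math.comb(2 * n, n)))


if __name__ == "__main__":
    # For n = 1000 the count has about 600 digits, i.e. it is of order 10^600.
    print(alignment_count_digits(1000))
```

Enumerating that many candidates is out of the question, which is what motivates dynamic programming.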
Dynamic Programming
▪ Dynamic Programming:
– The problem can be divided into many smaller parts.
– An optimal alignment will not contain parts that are not themselves optimal.
– Start from sufficiently short sub-sequences.
– The alignment score is additive: the score of an alignment is the sum of the scores of its parts.
Dynamic Programming
▪ Needleman and Wunsch were the first to propose this method.
▪ It finds optimal global alignments.
▪ Align sequences:
– Seq1: x (x1 x2 x3 … xm)
– Seq2: y (y1 y2 y3 … yn)
Dynamic Programming
▪ s(a,b) = score of aligning a and b
▪ F(i,j) = optimal similarity of X(1:i) and Y(1:j)
▪ Recurrence relation:
– F(i,0) = Σ s(X(k), gap), 1 <= k <= i
– F(0,j) = Σ s(gap, Y(k)), 1 <= k <= j
– F(i,j) = max [ F(i,j-1) + s(gap, Y(j)),
                 F(i-1,j) + s(X(i), gap),
                 F(i-1,j-1) + s(X(i), Y(j)) ]
– Assume linear gap penalty
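The global recurrence can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the +1 / -1 `identity_score`, and the tie-breaking order in the traceback are hypothetical choices, and the default gap score of -8 mirrors the s(I, gap) = -8 example used in the slides.

```python
def identity_score(a: str, b: str) -> int:
    """Hypothetical scoring function: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def needleman_wunsch(x: str, y: str, score=identity_score, gap: int = -8):
    """Global alignment with a linear gap score; returns (score, aligned_x, aligned_y)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary conditions: a prefix aligned entirely against gaps.
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] + gap
    # Fill the matrix with the three-way maximum of the recurrence.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i][j - 1] + gap,
                          F[i - 1][j] + gap,
                          F[i - 1][j - 1] + score(x[i - 1], y[j - 1]))
    # Traceback from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(needleman_wunsch("AC", "A", gap=-2))
```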
Dynamic Programming
▪ Matrix S of optimal scores of sub-sequence alignments.
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 9]
Dynamic Programming
s(I, T) = -1
[Image 10]
Dynamic Programming
s(I, H) = -3,
s(I, gap) = -8,
s(gap, H) = -8
Recurrence relation:
F(i,j) = max [ F(i,j-1) + s(gap, Y(j)),
               F(i-1,j) + s(X(i), gap),
               F(i-1,j-1) + s(X(i), Y(j)) ]
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 11]
Dynamic Programming
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 12]
Dynamic Programming
– Linear gap penalty (E = 4)
[“Understanding Bioinformatics”, M. Zvelebil, J. O. Baum] [Image 13]
Dynamic Programming
▪ Semi-global alignment:
– Terminal gaps are treated differently from internal gaps
– How can dynamic programming be modified to produce a semi-global alignment?
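One common answer to this question, sketched below, is to make terminal gaps free: the first row and column are initialised to zero instead of accumulating gap scores, and the result is read from the best cell in the last row or last column rather than from the corner. The function names and the +1 / -1 scoring are hypothetical choices for illustration, and other variants exist (for example, freeing end gaps in only one of the sequences).

```python
def identity_score(a: str, b: str) -> int:
    """Hypothetical scoring function: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def semi_global_score(x: str, y: str, score=identity_score, gap: int = -8) -> int:
    """Alignment score where leading and trailing gaps are not penalised."""
    m, n = len(x), len(y)
    # Zero-initialised borders: leading gaps in either sequence are free.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i][j - 1] + gap,
                          F[i - 1][j] + gap,
                          F[i - 1][j - 1] + score(x[i - 1], y[j - 1]))
    # Trailing gaps are free: take the best value on the last row or column.
    return max(max(F[m]), max(row[n] for row in F))


if __name__ == "__main__":
    # The short sequence sits inside the long one; overhangs cost nothing.
    print(semi_global_score("ACGT", "TTACGTTT"))
```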
Dynamic Programming
▪ Local alignment:
– Useful when comparing a sequence to a whole genome
– Find the pair of sub-strings whose optimal global alignment score is maximal
Dynamic Programming
▪ What is the difference between global and local alignment?
▪ Can we define the recurrence relation of local alignment similarly to that of global alignment?
Recurrence relation of GLOBAL ALIGNMENT:
(Needleman & Wunsch)
– F(i,0) = Σ s(X(k), gap), 1 <= k <= i
– F(0,j) = Σ s(gap, Y(k)), 1 <= k <= j
– F(i,j) = max [ F(i,j-1) + s(gap, Y(j)),
                 F(i-1,j) + s(X(i), gap),
                 F(i-1,j-1) + s(X(i), Y(j)) ]
Dynamic Programming
Recurrence relation of LOCAL ALIGNMENT:
(Smith-Waterman)
– F(i,0) = 0
– F(0,j) = 0
– F(i,j) = max [ 0,
                 F(i,j-1) + s(gap, Y(j)),
                 F(i-1,j) + s(X(i), gap),
                 F(i-1,j-1) + s(X(i), Y(j)) ]
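The local recurrence differs from the global one only in the zero boundaries and the extra 0 inside the maximum. A minimal Python sketch follows; the function names, the +2 / -1 `match_score`, and the gap score of -2 in the example are hypothetical values for illustration. The traceback starts at the highest-scoring cell and stops at the first cell holding 0.

```python
def match_score(a: str, b: str) -> int:
    """Hypothetical scoring function: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def smith_waterman(x: str, y: str, score=match_score, gap: int = -8):
    """Local alignment with a linear gap score; returns (score, aligned_x, aligned_y)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The 0 lets an alignment restart wherever the score would go negative.
            F[i][j] = max(0,
                          F[i][j - 1] + gap,
                          F[i - 1][j] + gap,
                          F[i - 1][j - 1] + score(x[i - 1], y[j - 1]))
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Traceback from the best cell until a zero cell is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGACGTCC", gap=-2))
```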
Database Searches
▪ FASTA and BLAST
▪ Both use heuristics to speed up the search
▪ Dynamic Programming Complexity
– Time O(n*m)
– Space O(n*m)
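The O(n*m) space is needed when the full matrix is kept for the traceback. If only the optimal score is wanted, each row depends only on the previous one, so two rows are enough. The sketch below illustrates this for the global recurrence; the function names and the +1 / -1 scoring are hypothetical, and the running time is still O(n*m).

```python
def identity_score(a: str, b: str) -> int:
    """Hypothetical scoring function: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def global_score_two_rows(x: str, y: str, score=identity_score, gap: int = -8) -> int:
    """Global alignment score using O(len(y)) memory (no traceback possible)."""
    prev = [j * gap for j in range(len(y) + 1)]  # row 0: all-gap prefixes
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * len(y)          # column 0: all-gap prefix
        for j in range(1, len(y) + 1):
            curr[j] = max(curr[j - 1] + gap,
                          prev[j] + gap,
                          prev[j - 1] + score(x[i - 1], y[j - 1]))
        prev = curr
    return prev[len(y)]


if __name__ == "__main__":
    print(global_score_two_rows("GATTACA", "GATTACA"))
```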
Database Searches FASTA
▪ A good local alignment should contain some exactly matching subsequence.
▪ Find all k-tuples (k = 1-2 for proteins, 3-6 for DNA sequences).
▪ Protein k-tuples → nc, sp, … (k = 2)
▪ Nucleotide k-tuples → TAAA, CTCC, … (k = 4)
Database Searches FASTA
▪ If k = 3 for nucleotide sequences:
– There will be 4^3 = 64 possible k-tuples
– Assign a number e( ) to each base:
➢ e(A) = 0, e(C) = 1, e(G) = 2, e(T) = 3
➢ Each 3-tuple is represented as x_i x_(i+1) x_(i+2)
➢ Assign a number to each 3-tuple
– c_i = e(x_i)·4^2 + e(x_(i+1))·4^1 + e(x_(i+2))·4^0
– For example:
➢ AAA → 0·4^2 + 0·4^1 + 0·4^0 = 0
➢ CAA → 1·4^2 + 0·4^1 + 0·4^0 = 16
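The numbering scheme above is simply a base-4 encoding of the k-tuple. A minimal Python sketch, with hypothetical names `BASE_CODE` and `encode_ktuple`:

```python
# e(A) = 0, e(C) = 1, e(G) = 2, e(T) = 3, as defined on the slide.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_ktuple(ktuple: str) -> int:
    """Read the k-tuple as a base-4 number, most significant base first."""
    value = 0
    for base in ktuple:
        value = value * 4 + BASE_CODE[base]
    return value


if __name__ == "__main__":
    print(encode_ktuple("AAA"), encode_ktuple("CAA"))
```

Because every 3-tuple maps to a distinct integer between 0 and 63, the code can be used directly as an index into a table.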
Database Searches FASTA
▪ Find each occurrence of the k-tuples in the sequences.
▪ Chaining → Look-Up Tables
▪ Consider TAAAACTCTAAC (k = 3):
3-tuple (code)    Positions
AAA (0)           2, 3
AAC (1)           4, 10
AAG (2)           0 (no occurrence)
AAT (3)           0 (no occurrence)
…                 …
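The look-up table for the example sequence can be reproduced with a short Python sketch. The function name is hypothetical, a dictionary of lists stands in for the hashing-and-chaining structure, and positions are 1-based to match the table above; k-tuples that never occur are simply absent from the dictionary.

```python
from collections import defaultdict


def build_lookup_table(sequence: str, k: int = 3) -> dict[str, list[int]]:
    """Map each k-tuple to the list of 1-based positions where it starts."""
    table = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        table[sequence[start:start + k]].append(start + 1)
    return dict(table)


if __name__ == "__main__":
    table = build_lookup_table("TAAAACTCTAAC")
    print(table["AAA"], table["AAC"])
```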
Database Searches BLAST
▪ Uses short words to search the database sequences.
▪ Searches for k-mers that score above a threshold value (T) when aligned with a query k-mer (remember that FASTA looks for identical k-tuples).
▪ Uses a scheme based on finite state automata (remember that FASTA uses hashing and chaining for rapid identification of k-tuples).
Database Searches BLAST
▪ From the query sequence, create query words (for protein sequences the word size is 3)
Database Searches BLAST
▪ BLAST uses a list of high-scoring words created from words similar to the query words. It considers only the words whose score exceeds a threshold value.
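The word-list step can be sketched by brute force. The code below is an illustration under stated assumptions, not how BLAST itself is implemented: the function names, the +2 / -1 `toy_score`, the three-letter alphabet in the example, and the strict "greater than T" comparison (following the wording of the slide) are all hypothetical choices. A real search would score words with a substitution matrix such as BLOSUM 62.

```python
from itertools import product


def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring function: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def query_words(query: str, w: int = 3) -> list[str]:
    """All overlapping words of length w in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word: str, alphabet: str, threshold: int, score=toy_score) -> list[str]:
    """All words over the alphabet that score above the threshold against word."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total > threshold:  # assumption: strictly above T, as worded on the slide
            hits.append("".join(candidate))
    return hits


if __name__ == "__main__":
    for word in query_words("ACDC"):
        print(word, neighborhood(word, alphabet="ACD", threshold=2))
```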
Database Searches BLAST
▪ Scan each database sequence for an exact match to any word in the list.
▪ Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold S.
Database Searches BLAST
▪ Keep only the extended matches that have a score of at least S.
▪ Determine the statistical significance of each remaining match.
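The extension step can be sketched as an ungapped walk to the right and to the left of a word hit, keeping track of the best score seen. Everything named here is hypothetical (`extend_hit`, `toy_score`, the parameter values); the `x_drop` cut-off, which stops the walk once the score falls too far below the best so far, is an addition not described on the slides. The statistical significance step is not covered.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring function: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def extend_hit(query: str, subject: str, q_start: int, s_start: int, w: int,
               score=toy_score, x_drop: int = 5):
    """Ungapped extension of a word hit; returns (best_score, q_begin, q_end).

    q_begin and q_end are 0-based, end-exclusive coordinates in the query.
    """
    best = current = sum(score(query[q_start + k], subject[s_start + k]) for k in range(w))
    q_begin, q_end = q_start, q_start + w
    # Extend to the right of the seed.
    i, j = q_start + w, s_start + w
    while i < len(query) and j < len(subject):
        current += score(query[i], subject[j])
        if current > best:
            best, q_end = current, i + 1
        elif best - current > x_drop:
            break
        i += 1; j += 1
    # Extend to the left, starting again from the best score found so far.
    current = best
    i, j = q_start - 1, s_start - 1
    while i >= 0 and j >= 0:
        current += score(query[i], subject[j])
        if current > best:
            best, q_begin = current, i
        elif best - current > x_drop:
            break
        i -= 1; j -= 1
    return best, q_begin, q_end


if __name__ == "__main__":
    S = 6  # hypothetical cut-off score
    best, begin, end = extend_hit("GGACGTCC", "TTACGTAA", 3, 3, 2, x_drop=2)
    if best >= S:
        print(best, "GGACGTCC"[begin:end])
```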
Database Searches BLAST
http://blast.ncbi.nlm.nih.gov/Blast.cgi
[Image 14]
Database Searches BLAST
[Image 15]
Database Searches BLAST
[Image 16]
Database Searches BLAST
[Image 17]
Database Searches BLAST
[Image 18]
Database Searches HISTORY
▪ 1970: Needleman-Wunsch (NW)
▪ 1981: Smith-Waterman (SW)
▪ 1985: FASTA (first published as FASTP)
▪ 1990: BLAST
Books and Web References
▪ Books:
1. Introduction to Bioinformatics by T. K. Attwood
2. BioInformatics by Sangita
3. Basic Bioinformatics by S. Ignacimuthu, s.j.
▪ http://en.wikipedia.org/wiki/Sequence_alignment
▪ http://pages.cs.wisc.edu/~bsettles/ibs08/lectures/02-alignment.pdf
▪ http://www.ks.uiuc.edu/Training/Tutorials/science/bioinformatics-tutorial/bioinformatics.pdf
▪ M. Zvelebil, J. O. Baum, “Understanding Bioinformatics”, 2008, Garland Science
▪ Andreas D. Baxevanis, B.F. Francis Ouellette, “Bioinformatics: A practical guide to the analysis of genes and proteins”, 2001, Wiley.
Images References
▪ 1. http://gorbi.irb.hr/files/5712/7497/9729/Slide09.jpg
▪ 2. http://www.ensembl.org/info/genome/compara/tree_example1.png
▪ 3. http://www.nature.com/nature/journal/v496/n7445/images/nature12027-f1.2.jpg
▪ 4. http://upload.wikimedia.org/wikipedia/commons/e/e6/Spombe_Pop2p_protein_structure_rainbow.png
▪ 5. & 6. Book: Basic Bioinformatics by S. Ignacimuthu, s.j.
▪ 7. to 13. Book: Basic Bioinformatics by S. Ignacimuthu, s.j.
▪ 14. to 18. http://blast.ncbi.nlm.nih.gov/Blast.cgi
