SEQUENCE ANALYSIS
Dr. Gobind Ram
Assistant Professor
P.G. Department of
Biotechnology
Lyallpur Khalsa College,
Jalandhar
Why is Bioinformatics Important?
• Application areas include
– Medicine
– Pharmaceutical drug design
– Toxicology
– Molecular evolution
– Biological computing models
[Figure: Bioinformatics draws on genomics, molecular evolution, biophysics, molecular biology, computer science and mathematics]
Where do the data come from?
[Figure: sequence data (e.g. DNA ctgccgatagc, protein MKLVDDYTR), the literature and other information]
Sequence alignment
Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences.
- Similarity: residues with the same physicochemical properties.
- Identity: identical residues.
MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE
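The identity and similarity counts for the pair above can be checked with a short sketch. The physicochemical groups below are an illustrative assumption (the slides do not define them), and the function names are hypothetical; the code assumes two ungapped sequences of equal length.

```python
# Hypothetical residue groups, chosen only for illustration.
GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P"]


def same_group(a, b):
    """True if both residues fall in the same assumed physicochemical group."""
    return any(a in group and b in group for group in GROUPS)


def identity_and_similarity(s1, s2):
    """Count identical columns and similar-but-not-identical columns."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    identical = sum(a == b for a, b in zip(s1, s2))
    similar = sum(a != b and same_group(a, b) for a, b in zip(s1, s2))
    return identical, similar


def match_line(s1, s2):
    """Build the row of '|' marks placed under identical residues."""
    return "".join("|" if a == b else " " for a, b in zip(s1, s2))


seq1 = "MVNLTSDEKTAVLALWNKVDVEDCGGE"
seq2 = "MVHLTPEEKTAVNALWGKVNVDAVGGE"
identical, similar = identity_and_similarity(seq1, seq2)
print(seq1)
print(match_line(seq1, seq2))
print(seq2)
print(f"identity: {identical}/{len(seq1)}, similar: {similar}")
```

With these assumed groups, only the D/E and E/D columns count as similar without being identical.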
Sequence alignment: why?
• The basis for comparing proteins and genes by the similarity of their sequences is that the proteins or genes are related by evolution; they have a common ancestor.
• Random mutations in the sequences accumulate over time, so proteins or genes whose common ancestor lies far back in time are less similar than proteins or genes that diverged from each other more recently.
Alignment
• A way of arranging objects or characters to find the similarities and differences between them.
• In bioinformatics, it is the arrangement of sequences (DNA, RNA or protein) to find the regions of similarity and difference, from which homology can be inferred.
ALIGNMENT
• Local alignment
• Global alignment
• Pairwise sequence alignment
• Multiple sequence alignment
Why perform pairwise sequence alignment?
Finding homology between two sequences
Example: protein prediction (sequence or structure).
A similar sequence (or structure) suggests a similar function.
Local Vs. Global
• Global alignment compares the sequences along their entire length and gives the best overall alignment, but it may fail to find the local regions of similarity that contain the domain and motif information.
• Local alignment finds regions with a high level of similarity. It is best for finding a motif even when the two sequences are otherwise different.
Local alignment – finds regions of high similarity in parts of the sequences
Global alignment – finds the best alignment across the entire length of the two sequences
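The contrast between the two modes can be sketched with one dynamic-programming routine. This is a minimal illustration, not the slides' own method: it assumes a linear gap penalty and hypothetical scores (match +1, mismatch -1, gap -2), and returns only the optimal score. The global mode follows the Needleman-Wunsch recurrence; the local mode follows Smith-Waterman, where a cell never drops below zero.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (default) or local alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(table[i - 1][j - 1] + pair,   # align two residues
                       table[i - 1][j] + gap,        # gap in b
                       table[i][j - 1] + gap)        # gap in a
            if local:
                cell = max(cell, 0)                  # restart the alignment
            table[i][j] = cell
            best = max(best, cell)
    # Local: best cell anywhere; global: bottom-right corner.
    return best if local else table[-1][-1]


a = "CCCGATTACACCC"
b = "GGGGATTACAGGG"
print("global:", align_score(a, b))
print("local: ", align_score(a, b, local=True))
```

The two toy sequences share a central motif with unrelated flanks, so the local score reflects the motif alone while the global score is pulled down by the mismatching ends.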
Local vs. Global
Evolutionary changes in sequences
Three types of nucleotide changes:
1. Substitution – a replacement of one (or more) sequence characters by another: AAGA → AACA
2. Insertion – an insertion of one (or more) sequence characters
3. Deletion – a deletion of one (or more) sequence characters
Insertion + Deletion → Indel
Choosing an alignment:
• Many different alignments between two
sequences are possible:
AAGCTGAATTCGAA
AGGCTCATTTCTGA
A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
How can one determine which is the best alignment?
AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-
. . .
Exercise
Compute the scores of each of the following alignments:
AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
Scoring scheme:
• Match: +1
• Mismatch: -2
• Indel: -1
Substitution matrix
     A   C   G   T
A    1  -2  -2  -2
C   -2   1  -2  -2
G   -2  -2   1  -2
T   -2  -2  -2   1
Gap penalty (opening = extending)
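The exercise can be checked with a small column-by-column scorer. This is a sketch using the scoring scheme stated above (match +1, mismatch -2, indel -1, with gap opening equal to gap extension); the function name is hypothetical.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, indel=-1):
    """Sum the column scores of a pairwise alignment given as two gapped rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += indel        # a gap in either row is an indel
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


first = score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-")
second = score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-")
print(first, second)
```

Under this scheme the first alignment (10 matches, 2 mismatches, 4 indels) scores higher than the second (9 matches, 2 mismatches, 6 indels).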
Open Reading Frames (ORFs)
• 6 possible reading frames
– frames 1, 2 and 3 in the 5’ to 3’ direction
– frames 1, 2 and 3 in the 5’ to 3’ direction of the complementary strand
The different reading frames give entirely different proteins.
Each gene uses a single reading frame, so once the ribosome gets started, it just has to count off groups of 3 bases to produce the proper protein.
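A short sketch of how the six reading frames arise: three offsets on the given strand and three on its reverse complement. The example sequence and the frame labels (+1..+3, -1..-3) are illustrative assumptions; incomplete trailing codons are dropped.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complementary strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(dna, offset):
    """Split dna into complete triplets starting at the given offset."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]


def six_frames(dna):
    """Map frame labels to the list of codons read in that frame."""
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = codons(dna, k)
        frames[f"-{k + 1}"] = codons(rc, k)
    return frames


for label, triplets in six_frames("ATGAAATAG").items():
    print(label, triplets)
```

Each frame groups the same bases into different triplets, which is why the frames translate into entirely different proteins.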
PAM matrices
• Family of matrices: PAM 80, PAM 120, PAM 250, …
• The number attached to a PAM matrix (the n in PAMn) represents the evolutionary distance between the sequences on which the matrix is based
• The (i, j) cell in a PAMn matrix denotes the probability that amino acid i will be replaced by amino acid j in time n: $P_{i \to j,\, n}$
• Greater values of n denote greater distances
BLOSUM matrices
• Different BLOSUMn matrices are calculated independently from BLOCKS (ungapped, manually created local alignments)
• BLOSUMn is based on a cluster of BLOCKS of sequences that share at least n percent identity
• The (i, j) cell in a BLOSUM matrix denotes the log of the odds of the observed frequency to the expected frequency of amino acids i and j in the same position in the data: $\log\left(P_{ij} / (q_i q_j)\right)$
• Higher values of n denote higher identity between the sequences on which the matrix is based
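The log-odds entry can be illustrated numerically. The sketch below applies the formula from the slide, log(Pij / (qi·qj)); the frequencies are made-up values, and the base-2 logarithm with a scale factor of 2 followed by rounding is an assumption about how integer matrix entries are commonly obtained, not something stated in the slides.

```python
import math


def log_odds(p_ij, q_i, q_j, scale=2.0):
    """Rounded, scaled log2 of observed over expected pair frequency."""
    return round(scale * math.log2(p_ij / (q_i * q_j)))


# Hypothetical frequencies: the pair is seen twice as often as chance predicts.
print(log_odds(0.125, 0.25, 0.25))
# Seen exactly as often as chance predicts.
print(log_odds(0.0625, 0.25, 0.25))
# Seen half as often as chance predicts.
print(log_odds(0.03125, 0.25, 0.25))
```

Pairs observed more often than expected get positive scores, pairs observed less often get negative scores, and pairs at the chance level score zero.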
BLAST
(Basic Local Alignment Search Tool)
• The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller at the NIH and was published in J. Mol. Biol. in 1990.
• OBJECTIVE: find high-scoring ungapped segments among related sequences.
• One of the most widely used bioinformatics programs, as the algorithm emphasizes speed over sensitivity.
• An algorithm for comparing primary biological sequence information to find the similarity between a query and database sequences.
• Emphasizes regions of local alignment, to detect relationships among sequences that share only isolated regions of similarity.
• Not only a tool for visualizing alignments, but also a basis for comparing structure and function.
Steps for BLAST
• Searches for exact matches of a small fixed length between the query sequence and the database sequences; each such match is called a seed.
• BLAST tries to extend the match in both directions, starting at the seed; the resulting ungapped alignment is a High-scoring Segment Pair (HSP).
• The highest-scoring HSPs are presented in the final report. They are called maximal-scoring segment pairs (MSPs).
• BLAST then performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm. Statistically significant alignments are then displayed to the user.
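The seed-and-extend idea in the steps above can be sketched as follows. This is a simplified illustration, not BLAST itself: it assumes exact word seeds, nucleotide-style match/mismatch scores, and a hypothetical drop-off threshold `x_drop` that stops an extension once the score falls too far below the best seen. All names and parameter values are assumptions.

```python
def find_seeds(query, subject, w):
    """Return (i, j) pairs where query[i:i+w] equals subject[j:j+w]."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, i, j, w, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension; returns (score, q_start, q_end, s_start)."""
    def walk(qi, sj, step, start_score):
        score = best = start_score
        steps = gained = 0          # gained = residues kept at the best score
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            steps += 1
            if score > best:
                best, gained = score, steps
            elif best - score > x_drop:
                break               # score dropped too far: stop extending
            qi += step
            sj += step
        return best, gained

    best, right = walk(i + w, j + w, 1, w * match)
    best, left = walk(i - 1, j - 1, -1, best)
    return best, i - left, i + w + right, j - left


def best_hsp(query, subject, w=4):
    """Highest-scoring segment pair over all seeds, or None without seeds."""
    hsps = [extend_seed(query, subject, i, j, w)
            for i, j in find_seeds(query, subject, w)]
    return max(hsps, default=None)


query, subject = "TTGACCGTAA", "CCGACCGTGG"
score, q_start, q_end, s_start = best_hsp(query, subject)
print(score, query[q_start:q_end], subject[s_start:s_start + q_end - q_start])
```

Several seeds can extend to the same HSP, as happens here; a fuller implementation would merge them and then run the gapped alignment step.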
BLAST PROGRAMS
• BLASTP: protein query sequence against a protein
database, allowing for gaps.
• BLASTN: DNA query sequence against a DNA database,
allowing for gaps.
• BLASTX: DNA query sequence, translated into all six
reading frames, against a protein database, allowing for
gaps.
• TBLASTN: protein query sequence against a DNA
database, translated into all six reading frames, allowing
for gaps.
• TBLASTX: DNA query sequence, translated into all six
reading frames, against a DNA database, translated into
all six reading frames (No gaps allowed)
PSI-BLAST
(Position-Specific Iterated BLAST)
• Used to find distant relatives of a protein.
• First, a list of all closely related proteins is created. These proteins are combined into a general "profile" (a position-specific scoring matrix).
• This profile is then used as the query, and the search is performed again to find more distantly related sequences.
• PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
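The profile step can be sketched as a per-column log-odds table built from aligned hits. This is a toy illustration under stated assumptions: a uniform background frequency, a simple pseudocount, and made-up four-residue sequences; PSI-BLAST's actual weighting scheme is more elaborate.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """One dict per column mapping each residue to a log2 odds score."""
    background = 1 / len(ALPHABET)              # assumed uniform background
    total = len(aligned) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return profile


def profile_score(profile, seq):
    """Score a sequence of profile length by summing its column scores."""
    return sum(column[aa] for column, aa in zip(profile, seq))


hits = ["MKLV", "MKIV", "MRLV"]                 # hypothetical first-round hits
profile = build_profile(hits)
print(profile_score(profile, "MRIV"), profile_score(profile, "WWWW"))
```

A sequence such as MRIV, which never appears among the hits but combines residues seen at each position, still scores well; this is how a profile can reach relatives that a single query sequence misses.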
Statistical Significance
Matrix
• A key element in evaluating the quality of a
pairwise sequence alignment is the
"substitution matrix", which assigns a score for
aligning any possible pair of residues.
• BLAST offers BLOSUM and PAM matrices.
BLOSUM62 Scoring Matrix
One-Letter Code for Amino Acid Alphabet (L = 20)
ACDEFGHIKLMNPQRSTVWY
S Henikoff & JG Henikoff (1993) Proteins 17:49
A C D E F G H I K L M N P Q R S T V W Y
A 4 0 -2 -1 0 -2 -1 -1 -1 -1 -2 -2 -1 -1 -1 1 0 0 -3 -2
C 0 9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
D -2 -3 6 2 -3 -1 -1 -3 -1 -4 -3 1 -1 0 -2 0 -1 -3 -4 -3
E -1 -4 2 5 -3 -2 0 -3 1 -3 -2 0 -1 2 0 0 -1 -2 -3 -2
F 0 -2 -3 -3 6 -3 -1 0 -3 0 0 -3 -4 -3 -3 -2 -2 -1 1 3
G -2 -3 -1 -2 -3 6 -2 -4 -2 -4 -3 0 -2 -2 -2 0 -2 -3 -2 -3
H -1 -3 -1 0 -1 -2 8 -3 -1 -3 -2 1 -2 0 0 -1 -2 -3 -2 2
I -1 -1 -3 -3 0 -4 -3 4 -3 2 1 -3 -3 -3 -3 -2 -1 3 -3 -1
K -1 -3 -1 1 -3 -2 -1 -3 5 -2 -1 0 -1 1 2 0 -1 -2 -3 -2
L -1 -1 -4 -3 0 -4 -3 2 -2 4 2 -3 -3 -2 -2 -2 -1 1 -2 -1
M -2 -1 -3 -2 0 -3 -2 1 -1 2 5 -2 -2 0 -1 -1 -1 1 -1 -1
N -2 -3 1 0 -3 0 1 -3 0 -3 -2 6 -2 0 0 1 0 -3 -4 -2
P -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2 7 -1 -2 -1 -1 -2 -4 -3
Q -1 -3 0 2 -3 -2 0 -3 1 -2 0 0 -1 5 1 0 -1 -2 -2 -1
R -1 -3 -2 0 -3 -2 0 -3 2 -2 -1 0 -2 1 5 -1 -1 -3 -3 -2
S 1 -1 0 0 -2 0 -1 -2 0 -2 -1 1 -1 0 -1 4 1 -2 -3 -2
T 0 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1 0 -1 -1 -1 1 5 0 -2 -2
V 0 -1 -3 -2 -1 -3 -3 3 -2 1 1 -3 -2 -2 -3 -2 0 4 -3 -1
W -3 -2 -4 -3 1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11 2
Y -2 -2 -3 -2 3 -3 2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1 2 7
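Log-odds Score
$X(a, b) = \log \dfrac{q_{ab}}{p_a \, p_b}$
where $q_{ab}$ is the observed frequency of the aligned pair (a, b) and $p_a$, $p_b$ are the background frequencies of a and b.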
The Score Matrix
ACDEFGH
HICDYGH
A C D E F G H
H -2 -3 -1 0 -3 -2 8
I -1 -1 -3 -3 0 -4 -3
C 0 9 -3 -4 -2 -3 -3
D -2 -3 6 2 -3 -1 -1
Y -2 -2 -3 -2 3 -3 2
G 0 -3 -1 -2 -3 6 -2
H -2 -3 -1 0 -3 -2 8
-ACDEFGH
HICD-YGH
Alignment columns: gaps, similarity, identity
Score matrix entry for residue i of sequence A and residue j of sequence B: $X(A_i, B_j)$
A = ACDEFGH
B = HICDYGH
Paths in the Score Matrix
-ACDEFGH
HICD-YGH
Path moves: deletion, insertion, matches
Alignments are in a one-to-one correspondence with score matrix paths.
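The correspondence between an alignment and a path can be traced in code. The sketch below uses the 7 × 7 score matrix exactly as printed on the slide (columns ACDEFGH, rows HICDYGH); the gap penalty of -4 is a hypothetical value, since the slides give none for this example.

```python
# Score matrix as shown on the slide: rows follow HICDYGH, columns ACDEFGH.
SCORES = [
    [-2, -3, -1,  0, -3, -2,  8],   # H
    [-1, -1, -3, -3,  0, -4, -3],   # I
    [ 0,  9, -3, -4, -2, -3, -3],   # C
    [-2, -3,  6,  2, -3, -1, -1],   # D
    [-2, -2, -3, -2,  3, -3,  2],   # Y
    [ 0, -3, -1, -2, -3,  6, -2],   # G
    [-2, -3, -1,  0, -3, -2,  8],   # H
]


def path_and_score(top, bottom, gap=-4):
    """Return the matrix cells visited by an alignment and its total score."""
    i = j = 0                       # i: row (bottom row residue), j: column
    path, total = [], 0
    for a, b in zip(top, bottom):
        if a == "-":                # residue only in the bottom row
            total += gap
            i += 1
        elif b == "-":              # residue only in the top row
            total += gap
            j += 1
        else:                       # aligned pair: a diagonal step
            path.append((i, j))
            total += SCORES[i][j]
            i += 1
            j += 1
    return path, total


print(path_and_score("-ACDEFGH", "HICD-YGH"))
print(path_and_score("ACDEFGH", "HICDYGH"))
```

With the assumed gap penalty, the gapped alignment shown on the slides outscores the ungapped diagonal because it places C, D, G and H on their identical partners.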
Low Complexity Regions
• Amino acid or DNA sequence regions that offer very
low information due to their highly biased content
– histidine-rich domains in proteins
– poly-A tails in DNA sequences
– poly-G tails in nucleotides
– runs of purines
– runs of pyrimidines
– runs of a single amino acid, etc.
E-value
• Depends on database size
• Indicates the number of database matches expected as a result of random chance
• The lower the E-value, the more significant the match and the less likely it is a result of random chance
E = m × n × p
E = E-value
m = total number of residues in the database
n = number of residues in the query sequence
p = probability that a high-scoring pair is a result of random chance
• E-value between 10^-50 and 0.01: homology
• E-value between 0.01 and 10: not significant, but may hint at remote homology
• E-value > 10: unrelated, or too distantly related to detect
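A small sketch of the formula and the interpretation bands above. The database size, query length and probability are made-up numbers, and the wording of the three categories is a paraphrase of the slide's thresholds.

```python
def e_value(m, n, p):
    """E = m x n x p: database residues, query residues, chance probability."""
    return m * n * p


def interpret(e):
    """Map an E-value onto the rough interpretation bands from the slide."""
    if e < 0.01:
        return "homology"
    if e <= 10:
        return "not significant; possible remote homology"
    return "unrelated or too distant to detect"


# Hypothetical search: 1,000,000 database residues, 100-residue query.
e = e_value(1_000_000, 100, 1e-20)
print(e, interpret(e))
```

Because m appears as a factor, the same alignment receives a larger (less significant) E-value when searched against a larger database.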
Bit Score
• Measures sequence similarity independently of query sequence length and database size; it is normalized from the raw pairwise alignment score
• The higher the bit score, the more significant the match
• $S' = (\lambda S - \ln K) / \ln 2$
S' = bit score
S = raw alignment score
λ = Gumbel distribution constant
K = constant associated with the scoring matrix
(λ and K are two statistical parameters)
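The bit-score conversion is a one-line calculation. In the sketch below the values λ = 0.318 and K = 0.134 are illustrative assumptions, as is the raw score of 100; in practice both parameters depend on the scoring matrix and gap penalties in use.

```python
import math


def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Hypothetical parameters, chosen only to show the arithmetic.
print(round(bit_score(100, 0.318, 0.134), 2))
```

Since the parameters of the scoring system are folded into S', bit scores from searches that used different matrices can be compared directly, which raw scores do not allow.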
Low Complexity Regions (LCR)
Masking:
(I) Hard masking: masked residues are replaced with N (nucleotides) or X (amino acids)
(II) Soft masking: masked residues are converted to lowercase
Programs for masking
(i) SEG: regions with a high frequency of a few residues are declared LCRs
(ii) RepeatMasker: a sequence region scoring above a certain threshold is declared an LCR; its residues are masked with N's or X's
Mask repetitive sequences
MNPQQQQQQRST → MNPXXXXXXRST
X will not match anything in the database.
It does preserve position, however.
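The masking example can be reproduced with a simple run detector. This is only a sketch: it masks runs of a single repeated residue at or above a hypothetical minimum length, which is far cruder than what SEG or RepeatMasker do. Hard masking here substitutes a mask character; soft masking lowercases the run.

```python
import re


def mask_runs(seq, min_run=4, soft=False, mask_char="X"):
    """Mask runs of one repeated residue that are at least min_run long."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))

    def replace(found):
        run = found.group(0)
        # The masked run keeps its length, so positions are preserved.
        return run.lower() if soft else mask_char * len(run)

    return pattern.sub(replace, seq)


print(mask_runs("MNPQQQQQQRST"))
print(mask_runs("MNPQQQQQQRST", soft=True))
print(mask_runs("ACGTAAAAAAAAGT", mask_char="N"))
```

Passing `mask_char="N"` gives the nucleotide convention, as in the third call on a poly-A stretch.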
BLAST result page
• The BLAST result page is divided into 3 parts.
• Part-1 contains information on the version, the database used, the reference and the length of the query sequence.
• Part-2 shows the conserved regions and a graphical representation of the alignments, where each line represents the alignment of the query sequence with one database sequence.
• Results are shown in five different colors depending on the bit score.
• Part-3 contains the list of database sequences found to be similar in the search, and a detailed view of each alignment along with its bit score, E-value, identities, positives and gaps.
[Screenshots of the BLAST result page: Part-1, Part-2, Part-3]
BLAST Preferred
• BLAST uses a substitution matrix to find matching words, while FASTA identifies identical matching words using a hashing procedure. By default FASTA scans smaller window sizes, so it gives more sensitive results than BLAST, with better coverage of homologs, but it is usually slower than BLAST.
• BLAST uses low-complexity masking, so it may have higher specificity than FASTA, which reduces false positives.
• BLAST sometimes gives multiple best-scoring alignments from the same sequence; FASTA returns only one final alignment.
REFERENCES
• Jin Xiong (2006). Essential Bioinformatics. Cambridge University Press.
• Mount D. W. (2004). Bioinformatics & Genome Analysis. Cold Spring Harbor Laboratory Press.
URL: WWW.ncbi.nlm.nih.gov
THANKS

Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 

Recently uploaded (20)

Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 

Sequence Analysis.ppt

  • 1. SEQUENCE ANALYSIS. Dr. Gobind Ram, Assistant Professor, P.G. Department of Biotechnology, Lyallpur Khalsa College, Jalandhar
  • 2. Why is Bioinformatics Important? • Application areas include – Medicine – Pharmaceutical drug design – Toxicology – Molecular evolution – Biological computing models. [Diagram: bioinformatics draws on genomics, molecular evolution, biophysics, molecular biology, computer science, and mathematics.]
  • 3. Where do the data come from? [Figure: nucleotide sequences (e.g. ctgccgatagc), protein sequences (e.g. MKLVDDYTR), and information from the literature.]
  • 4. Sequence alignment. Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences. – Similarity: residues with the same physicochemical properties. – Identity: identical residues. Example (vertical bars mark identical positions):
        MVNLTSDEKTAVLALWNKVDVEDCGGE
        || || ||||| ||| || |   |||
        MVHLTPEEKTAVNALWGKVNVDAVGGE
  • 5. Sequence alignment: why? • The basis for comparing proteins and genes by the similarity of their sequences is that the proteins or genes are related by evolution; they have a common ancestor. • Random mutations in the sequences accumulate over time, so proteins or genes that have a common ancestor far back in time are not as similar as proteins or genes that diverged from each other more recently.
  • 6. Alignment • A way of arranging objects or characters to find the similarities and differences between them. • In bioinformatics, it is the arrangement of sequences (DNA, RNA or protein) to find the regions of similarity and difference from which homology can be inferred.
  • 7.
  • 8. ALIGNMENT – By extent: local alignment or global alignment – By number of sequences: pairwise sequence alignment or multiple sequence alignment
  • 9. Why perform pairwise sequence alignment? To find homology between two sequences. Example: protein prediction (sequence or structure). Similar sequence (or structure) suggests similar function.
  • 10. Local vs. Global • Global alignment compares the sequences along their entire length and gives the best overall alignment, but may fail to find a local region of similarity that contains the domain and motif information. • Local alignment finds the regions with the highest level of similarity, without requiring the rest of the sequences to align. It is best for finding a motif even when the two sequences are otherwise different.
  • 11. Local vs. Global. Local alignment – finds regions of high similarity in parts of the sequences. Global alignment – finds the best alignment across the entire two sequences.
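The difference between the two modes can be made concrete with the classic dynamic-programming recurrences: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The sketch below returns only the optimal scores, not the alignments. The scoring values (match +1, mismatch -2, gap -1) and the two test sequences are assumptions chosen for illustration.

```python
def global_score(a: str, b: str, match=1, mismatch=-2, gap=-1) -> int:
    """Needleman-Wunsch: best score aligning all of a with all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match=1, mismatch=-2, gap=-1) -> int:
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart anywhere, which makes it local.
            v = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(v)
            best = max(best, v)
        prev = cur
    return best


# Two sequences that share only a short motif (ACGT) in the middle.
x = "TTTTACGTGGGG"
y = "CCCCACGTAAAA"
print("global:", global_score(x, y))
print("local: ", local_score(x, y))
```

With these inputs the local score reflects only the shared motif, while the global score is dragged down by the unrelated flanks.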
  • 12. Evolutionary changes in sequences. Three types of nucleotide changes: 1. Substitution – a replacement of one (or more) sequence characters by another (e.g. AAGA → AACA). 2. Insertion – an insertion of one (or more) sequence characters. 3. Deletion – a deletion of one (or more) sequence characters. Insertion + Deletion → Indel.
  • 13. Choosing an alignment: • Many different alignments between two sequences are possible. For the sequences AAGCTGAATTCGAA and AGGCTCATTTCTGA, two of them are:
        A-AGCTGAATTC--GAA
        AG-GCTCA-TTTCTGA-

        AAGCTGAATT-C-GAA
        AGGCT-CATTTCTGA-
    How can one determine which is the best alignment?
  • 14. Exercise. Scoring scheme: • Match: +1 • Mismatch: -2 • Indel: -1. As a substitution matrix over A, C, G, T this is +1 on the diagonal and -2 everywhere else; the gap penalty is the same for opening and extending. Compute the score of each of the following alignments:
        AAGCTGAATT-C-GAA
        AGGCT-CATTTCTGA-

        A-AGCTGAATTC--GAA
        AG-GCTCA-TTTCTGA-
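The exercise on slide 14 can be checked mechanically. The sketch below applies the slide's scoring scheme column by column; the function name `score_alignment` is illustrative.

```python
def score_alignment(top: str, bottom: str, match=1, mismatch=-2, indel=-1) -> int:
    """Sum column scores: match, mismatch, or indel (a gap in either row)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += indel
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


alignments = [
    ("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-"),
    ("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-"),
]
for top, bottom in alignments:
    print(top, bottom, score_alignment(top, bottom), sep="\n", end="\n\n")
```

The alignment with the higher total is the better one under this particular scheme; a different scheme can reverse the ranking.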
  • 15. Open Reading Frames (ORFs) • 6 possible reading frames – frames 1, 2, and 3 in the 5' to 3' direction – frames 1, 2, and 3 in the 5' to 3' direction of the complementary strand. The different reading frames give entirely different proteins. Each gene uses a single reading frame, so once the ribosome gets started, it just has to count off groups of 3 bases to produce the proper protein.
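The six reading frames can be enumerated by splitting each strand into codons at offsets 0, 1 and 2. The sketch below stops at codon lists and does not translate them or search for start and stop codons. The function names and the input sequence `ATGAAATAG` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete triplets, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(dna: str) -> dict[str, list[str]]:
    """Frames +1..+3 on the given strand, -1..-3 on its reverse complement."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = codons(strand, offset)
    return frames


for name, frame in six_frames("ATGAAATAG").items():
    print(name, " ".join(frame))
```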
  • 16. PAM matrices • A family of matrices: PAM 80, PAM 120, PAM 250, … • The number attached to a PAM matrix (the n in PAMn) represents the evolutionary distance between the sequences on which the matrix is based, in accepted point mutations per 100 residues. • Cell (i, j) of a PAMn matrix is derived from the probability that amino acid i will be replaced by amino acid j over evolutionary distance n: P_{i→j,n}. • Greater values of n denote greater distances.
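The relationship between the members of the PAM family can be written compactly. The following is a sketch of the standard Dayhoff construction, not taken from the slides: $M_1$ is the mutation probability matrix for 1 PAM, $f_j$ is the background frequency of amino acid $j$, and the factor $10 \log_{10}$ is one common scaling, assumed here.

```latex
M_n = (M_1)^n, \qquad
P_{i \to j,\, n} = \left[ M_n \right]_{ij}, \qquad
S_n(i, j) = 10 \log_{10} \frac{P_{i \to j,\, n}}{f_j}
```

Larger $n$ means more rounds of multiplication, so the probabilities drift toward the background frequencies and the scores flatten, which is why high-numbered PAM matrices suit distant relationships.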
  • 17. BLOSUM matrices • The different BLOSUMn matrices are calculated independently from BLOCKS (ungapped, manually created local alignments). • BLOSUMn is based on clusters of BLOCKS sequences that share at least n percent identity. • Cell (i, j) of a BLOSUM matrix is the log-odds ratio of the observed frequency to the expected frequency of amino acids i and j at the same position in the data: log(P_ij / (q_i q_j)). • Higher values of n denote higher identity between the sequences on which the matrix is based.
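The log-odds formula on slide 17 translates directly into code. This sketch assumes scores in half-bit units (2 × log base 2, rounded to an integer), which is the scaling used for BLOSUM62. The frequencies are made-up numbers for illustration, and the function name is hypothetical.

```python
import math


def log_odds_score(p_ij: float, q_i: float, q_j: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(observed / expected)."""
    return round(scale * math.log2(p_ij / (q_i * q_j)))


# A pair seen four times more often than chance predicts scores positive;
# a pair seen four times less often scores negative.
print(log_odds_score(p_ij=0.04, q_i=0.1, q_j=0.1))
print(log_odds_score(p_ij=0.0025, q_i=0.1, q_j=0.1))
```

Positive entries therefore mark substitutions that are conserved in the BLOCKS data, and negative entries mark substitutions that are avoided.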
  • 18. BLAST (Basic Local Alignment Search Tool) • The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller at the NIH and was published in J. Mol. Biol. in 1990. • OBJECTIVE: find high-scoring ungapped segments among related sequences. • One of the most widely used bioinformatics programs, because the algorithm emphasizes speed over sensitivity.
  • 19. • An algorithm for comparing primary biological sequence information to find the similarity between sequences. • Emphasizes regions of local alignment, to detect relationships among sequences that share only isolated regions of similarity. • Not only a tool for visualizing alignments, but also a way to compare structure and function.
  • 20. Steps for BLAST • Search for exact matches of a small fixed length between the query sequence and the database sequences; each such match is called a seed. • BLAST tries to extend the match in both directions, starting at the seed, as an ungapped alignment; the result is a High-scoring Segment Pair (HSP). • The highest-scoring HSPs are presented in the final report. They are called maximal segment pairs (MSPs).
  • 21. BLAST then performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm. Statistically significant alignments are then displayed to the user.
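The seed-and-extend idea from slide 20 can be sketched in a few dozen lines. This is a toy illustration under stated assumptions, not the NCBI implementation: it uses exact word matches only (no neighbourhood words), a match/mismatch score of +1/-2, and a simple drop-off threshold `x_drop` to stop extension. All names and parameter values are hypothetical.

```python
def find_seeds(query: str, subject: str, w: int = 4) -> list[tuple[int, int]]:
    """All (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [
        (i, j)
        for i in range(len(query) - w + 1)
        for j in index.get(query[i:i + w], [])
    ]


def extend(query, subject, i, j, w, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of a seed in both directions; returns (score, q_seg, s_seg)."""
    best = score = w * match
    right = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score > x_drop:
            break  # score fell too far below the best seen so far
    score = best
    left = k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score > x_drop:
            break
    return best, query[i - left:i + w + right], subject[j - left:j + w + right]


def best_hsp(query: str, subject: str, w: int = 4):
    """Highest-scoring segment pair over all seeds, or None if there is no seed."""
    hsps = [extend(query, subject, i, j, w) for i, j in find_seeds(query, subject, w)]
    return max(hsps) if hsps else None


print(best_hsp("TTACGTACGAA", "CCACGTACGCC"))
```

Indexing the words of one sequence in a dictionary is what makes the seed search fast; the expensive alignment work is spent only around the seeds.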
  • 22. BLAST PROGRAMS • BLASTP: protein query sequence against a protein database, allowing for gaps. • BLASTN: DNA query sequence against a DNA database, allowing for gaps. • BLASTX: DNA query sequence, translated into all six reading frames, against a protein database, allowing for gaps. • TBLASTN: protein query sequence against a DNA database, translated into all six reading frames, allowing for gaps. • TBLASTX: DNA query sequence, translated into all six reading frames, against a DNA database, translated into all six reading frames (No gaps allowed)
  • 23. PSI-BLAST (Position-Specific Iterated BLAST) • Used to find distant relatives of a protein. • First, a list of all closely related proteins is created. These proteins are combined into a general "profile", stored as a position-specific scoring matrix. • This profile is then used as the query, and the search is performed again to find more distantly related sequences. • PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
  • 25. Matrix • A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues. • BLAST offers BLOSUM and PAM matrices.
  • 26. BLOSUM62 Scoring Matrix. One-letter code for the amino acid alphabet (L = 20): ACDEFGHIKLMNPQRSTVWY. S Henikoff & JG Henikoff (1993) Proteins 17:49. Log-odds score: $X(a, b) = \log \frac{q_{ab}}{p_a p_b}$, where $q_{ab}$ is the observed frequency of the pair and $p_a$, $p_b$ are the background frequencies.
        A   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   Y
    A   4   0  -2  -1   0  -2  -1  -1  -1  -1  -2  -2  -1  -1  -1   1   0   0  -3  -2
    C   0   9  -3  -4  -2  -3  -3  -1  -3  -1  -1  -3  -3  -3  -3  -1  -1  -1  -2  -2
    D  -2  -3   6   2  -3  -1  -1  -3  -1  -4  -3   1  -1   0  -2   0  -1  -3  -4  -3
    E  -1  -4   2   5  -3  -2   0  -3   1  -3  -2   0  -1   2   0   0  -1  -2  -3  -2
    F   0  -2  -3  -3   6  -3  -1   0  -3   0   0  -3  -4  -3  -3  -2  -2  -1   1   3
    G  -2  -3  -1  -2  -3   6  -2  -4  -2  -4  -3   0  -2  -2  -2   0  -2  -3  -2  -3
    H  -1  -3  -1   0  -1  -2   8  -3  -1  -3  -2   1  -2   0   0  -1  -2  -3  -2   2
    I  -1  -1  -3  -3   0  -4  -3   4  -3   2   1  -3  -3  -3  -3  -2  -1   3  -3  -1
    K  -1  -3  -1   1  -3  -2  -1  -3   5  -2  -1   0  -1   1   2   0  -1  -2  -3  -2
    L  -1  -1  -4  -3   0  -4  -3   2  -2   4   2  -3  -3  -2  -2  -2  -1   1  -2  -1
    M  -2  -1  -3  -2   0  -3  -2   1  -1   2   5  -2  -2   0  -1  -1  -1   1  -1  -1
    N  -2  -3   1   0  -3   0   1  -3   0  -3  -2   6  -2   0   0   1   0  -3  -4  -2
    P  -1  -3  -1  -1  -4  -2  -2  -3  -1  -3  -2  -2   7  -1  -2  -1  -1  -2  -4  -3
    Q  -1  -3   0   2  -3  -2   0  -3   1  -2   0   0  -1   5   1   0  -1  -2  -2  -1
    R  -1  -3  -2   0  -3  -2   0  -3   2  -2  -1   0  -2   1   5  -1  -1  -3  -3  -2
    S   1  -1   0   0  -2   0  -1  -2   0  -2  -1   1  -1   0  -1   4   1  -2  -3  -2
    T   0  -1  -1  -1  -2  -2  -2  -1  -1  -1  -1   0  -1  -1  -1   1   5   0  -2  -2
    V   0  -1  -3  -2  -1  -3  -3   3  -2   1   1  -3  -2  -2  -3  -2   0   4  -3  -1
    W  -3  -2  -4  -3   1  -2  -2  -3  -3  -2  -1  -4  -4  -2  -3  -3  -2  -3  11   2
    Y  -2  -2  -3  -2   3  -3   2  -1  -2  -1  -1  -2  -3  -1  -2  -2  -2  -1   2   7
  • 27. BLOSUM62 Scoring Matrix. One-letter code for the amino acid alphabet (L = 20): ACDEFGHIKLMNPQRSTVWY. The matrix shown is the same as on slide 26.
  • 28. The Score Matrix. Sequences A = ACDEFGH (columns) and B = HICDYGH (rows); each cell holds the substitution score $X(A_i, B_j)$. Cells along an alignment are labelled as identity, similarity, or gaps. Example alignment with gaps: -ACDEFGH over HICD-YGH.
        A   C   D   E   F   G   H
    H  -2  -3  -1   0  -3  -2   8
    I  -1  -1  -3  -3   0  -4  -3
    C   0   9  -3  -4  -2  -3  -3
    D  -2  -3   6   2  -3  -1  -1
    Y  -2  -2  -3  -2   3  -3   2
    G   0  -3  -1  -2  -3   6  -2
    H  -2  -3  -1   0  -3  -2   8
  • 29. Paths in the Score Matrix. The alignment -ACDEFGH over HICD-YGH is traced step by step as a path through the score matrix of slide 28: diagonal steps are matches, and horizontal or vertical steps are deletions or insertions. Alignments are in a one-to-one correspondence with score matrix paths.
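The correspondence between alignments and paths can be shown by converting the alignment from slide 29 into matrix coordinates. In this sketch a coordinate (i, j) means that i residues of the first sequence and j residues of the second have been consumed; that convention and the function name are assumptions for illustration.

```python
def alignment_path(top: str, bottom: str) -> list[tuple[int, int]]:
    """Path of (i, j) cells visited by an alignment, starting at the origin."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    i = j = 0
    path = [(0, 0)]
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x != "-":
            i += 1  # consume a residue of the first sequence
        if y != "-":
            j += 1  # consume a residue of the second sequence
        path.append((i, j))
    return path


print(alignment_path("-ACDEFGH", "HICD-YGH"))
```

A step that changes both coordinates is a diagonal (match or mismatch) move, and a step that changes only one coordinate is a gap. The path starts at the origin and ends at (7, 7), the far corner of the 7 × 7 matrix.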
  • 30. Low Complexity Regions • Amino acid or DNA sequence regions that carry very little information because of their highly biased composition – histidine-rich domains in proteins – poly-A tails in DNA sequences – poly-G tails in nucleotide sequences – runs of purines – runs of pyrimidines – runs of a single amino acid, etc.
  • 31. E-value • Depends on database size. • Indicates the number of database matches with at least this score that are expected as a result of random chance. • The lower the E-value, the more significant the match and the less likely it is that the database hit is a result of random chance.
  • 32. E = m × n × p, where E = E-value, m = total number of residues in the database, n = number of residues in the query sequence, p = probability that a high-scoring pair is the result of random chance.
  • 33. • E-value between 10^-50 and 0.01: the match can be considered a result of homology. • E-value between 0.01 and 10: not significant, but may hint at remote homology. • E-value > 10: unrelated, or too distantly related to detect.
  • 34. Bit Score • Measures sequence similarity independently of query sequence length and database size; it is normalized from the raw pairwise alignment score. • The higher the bit score, the more significant the match. • S' = (λS − ln K) / ln 2, where S' = bit score, S = raw alignment score, λ = Gumbel distribution constant, K = constant associated with the scoring matrix (λ and K are two statistical parameters).
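The bit-score formula on slide 34 and the E-value formula on slide 32 fit together: under the Karlin-Altschul statistics that BLAST relies on, the chance term p can be taken as 2^(-S'), so that E = m × n × 2^(-S'). The sketch below assumes that identification. The λ and K values are illustrative figures of the kind reported for BLOSUM62, not values taken from the slides, and the raw score, database size and query length are made up.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(db_residues: int, query_residues: int, bits: float) -> float:
    """E = m * n * 2^(-S'): expected number of chance hits at this score."""
    return db_residues * query_residues * 2.0 ** (-bits)


# Illustrative parameters (assumed, not from the slides).
lam, k = 0.267, 0.041
bits = bit_score(raw_score=150, lam=lam, k=k)
print(f"bit score: {bits:.1f}")
print(f"E-value:   {e_value(db_residues=10**9, query_residues=300, bits=bits):.2e}")
```

Because m appears as a factor, the same alignment gets a larger (less significant) E-value when searched against a larger database, while its bit score stays the same.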
  • 35. Low Complexity Regions (LCR) Masking: (I) Hard masking (II) Soft masking. Programs for masking: (i) SEG: a region with highly biased residue frequencies is declared an LCR. (ii) RepeatMasker: a sequence region that scores above a certain threshold is declared an LCR. Masked residues are replaced with N's (nucleotides) or X's (amino acids).
  • 36. Mask repetitive sequences. MNPQQQQQQRST → MNPXXXXXXRST. X will not match anything in the database. It does preserve position, however.
  • 37. BLAST result page • The BLAST result page is divided into 3 parts. • Part 1 contains information about the program version, the database used, the reference, and the length of the query sequence. • Part 2 shows the conserved regions and a graphical representation of the alignments, where each line represents the alignment of the query sequence with one database sequence. • It shows the results in 5 different colors depending on the bit score. • Part 3 contains the list of database sequences found to be similar in the database search, and a detailed view of each alignment with its bit score, E-value, identities, positives and gaps.
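The masking on slide 36 can be imitated with a regular expression that replaces runs of a single residue. This is a deliberately simple stand-in for programs such as SEG, which use compositional complexity rather than plain runs. The function name, the `min_run` threshold, and the use of lowercase for soft masking are assumptions for illustration.

```python
import re


def mask_runs(seq: str, min_run: int = 4, soft: bool = False, mask_char: str = "X") -> str:
    """Mask every run of at least min_run identical residues.

    Hard masking replaces the run with mask_char; soft masking lowercases it,
    so the original residues remain recoverable.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))

    def replace(m: re.Match) -> str:
        run = m.group(0)
        return run.lower() if soft else mask_char * len(run)

    return pattern.sub(replace, seq)


print(mask_runs("MNPQQQQQQRST"))
print(mask_runs("MNPQQQQQQRST", soft=True))
```

Both forms keep the sequence length unchanged, which is what preserves the positions of the unmasked residues.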
  • 41.
  • 42. BLAST Preferred • BLAST uses a substitution matrix to find matching words, while FASTA identifies identical matching words using a hashing procedure. By default FASTA scans smaller window sizes. It therefore gives more sensitive results than BLAST, with better coverage of homologs, but is usually slower than BLAST.
  • 43. • BLAST uses low-complexity masking, which means it may have higher specificity than FASTA, so false positives are reduced. • BLAST sometimes gives multiple best-scoring alignments from the same sequence, whereas FASTA returns only one final alignment.
  • 44. REFERENCES • Xiong, J. (2006). Essential Bioinformatics. Cambridge University Press. • Mount, D. W. (2004). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press. • URL: www.ncbi.nlm.nih.gov