Defining Sequence Analysis
• Sequence analysis is the process of
subjecting a DNA, RNA, or peptide sequence to
any of a wide range of analytical methods to
understand its features, function, structure, or
evolution.
• It includes:
Sequence Assembly
Searching (in Databases)
• The comparison of sequences in order to find
similarity, often to infer if they are related
• Identification of intrinsic features of the sequence, such
as active sites, post-translational
modification sites, gene structures, and the distribution
of introns and exons.
• Identification of sequence differences and variations,
such as point mutations and single nucleotide
polymorphisms (SNPs), in order to obtain genetic markers.
• Revealing the evolution and genetic diversity of
sequences and organisms
• Identification of molecular structure from sequence
• Genetic diseases
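The identification of point mutations and SNPs listed above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned and of equal length; the function name `find_point_differences` and the example sequences are hypothetical, chosen only for illustration.

```python
def find_point_differences(reference, sample):
    """Return (position, ref_base, sample_base) for every differing position.

    Assumption: both sequences are pre-aligned and have the same length.
    Positions are reported 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length (pre-aligned)")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Made-up example sequences
reference = "ATGCCGTA"
sample = "ATGTCGTT"
print(find_point_differences(reference, sample))
# [(4, 'C', 'T'), (8, 'A', 'T')]
```

Real variant calling works on many aligned reads and weighs base quality; this only shows the basic position-by-position comparison.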
• DNA sequencing is the process of determining
the precise order of nucleotides, that is, the order of the
four bases (adenine, guanine, cytosine,
and thymine), in a strand of DNA.
• Expressed Sequence Tag (EST) is a short sub-sequence
of a cDNA sequence. ESTs may be used to identify
gene transcripts, and are instrumental in gene
discovery and in gene-sequence determination.
Sanger Sequencing or Chain Termination Method
Shotgun Sequencing method
• Protein sequencing is a technique to
determine the amino acid sequence of
a protein, as well as which conformation the
protein adopts and the extent to which it is
complexed with any non-peptide molecules.
• Mass spectrometry (MS) is an analytical technique that
ionizes chemical species and sorts the ions based on their
mass-to-charge ratio.
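As a rough illustration of the mass-to-charge ratio, the sketch below computes m/z for a molecule that has gained one or more protons. The protonation model, the function name `mass_to_charge`, and the example mass are assumptions made for illustration, not details from the slides.

```python
PROTON_MASS = 1.007276  # approximate mass of a proton, in daltons


def mass_to_charge(neutral_mass, charge):
    """m/z of a positive ion formed by adding `charge` protons.

    Assumption: ionization happens by protonation only.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same hypothetical 1500 Da peptide appears at different m/z values
# depending on how many charges it carries.
for z in (1, 2, 3):
    print(z, round(mass_to_charge(1500.0, z), 4))
```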
• Sequence assembly refers to the
reconstruction of a DNA sequence
by aligning and merging small DNA fragments.
It is an integral part of modern DNA
sequencing, which involves:
(1) cutting the DNA into small pieces,
(2) reading the small fragments,
(3) reconstituting the original DNA by merging the
information from the various fragments.
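The merging step can be sketched with a simple greedy overlap approach. This is a toy illustration under stated assumptions (error-free fragments, all from the same strand, no repeats), not how production assemblers work; the function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads from a made-up 14-base sequence
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))
# ['ATGCGTACGTTAGC']
```

Greedy merging can misassemble repetitive sequences, which is one reason real assemblers use overlap or de Bruijn graphs instead.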
• Sequence Alignment is a way of arranging the
sequences of DNA, RNA, or protein to identify
regions of similarity that may be a
consequence of functional, structural,
or evolutionary relationships between the
sequences.
• It involves the identification of the correct
location of deletions and insertions that
have occurred in either of the two lineages
since the divergence from a common ancestor.
• Based on the number of sequences being
compared, it is of two types:
Multiple Sequence Alignment
Pair-wise Sequence Alignment
• Pair-wise sequence alignment only compares
two sequences at a time.
• Optimality is judged by the alignment score.
A pairwise alignment consists of a series of paired
bases, one base from each sequence. There are
three types of pairs:
(1) matches = the same nucleotide appears in
both sequences.
(2) mismatches = different nucleotides are found
in the two sequences.
(3) gaps = a base in one sequence and a null base
in the other.
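The three pair types, and the way they combine into a score, can be sketched as follows. The gap character `-` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions; the slides do not specify a scoring scheme.

```python
def classify_columns(aligned_a, aligned_b, gap="-"):
    """Count matches, mismatches, and gaps in two aligned, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def alignment_score(counts, match=1, mismatch=-1, gap=-2):
    """Sum of per-column scores; the default values are illustrative only."""
    return (counts["match"] * match
            + counts["mismatch"] * mismatch
            + counts["gap"] * gap)


counts = classify_columns("GATTACA", "GA-TGCA")
print(counts)                   # {'match': 5, 'mismatch': 1, 'gap': 1}
print(alignment_score(counts))  # 2
```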
• The algorithms used are the Needleman-Wunsch
algorithm (global alignment) and the
Smith-Waterman algorithm (local alignment).
BLAST (Basic Local Alignment Search Tool)
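A minimal sketch of the two dynamic-programming algorithms named above is given here. It assumes a simple linear gap penalty and illustrative scores (match +1, mismatch -1, gap -2); the function names are hypothetical, and real tools use substitution matrices and affine gap penalties.

```python
def substitution(x, y, match=1, mismatch=-1):
    """Score for pairing residue x with residue y."""
    return match if x == y else mismatch


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment: return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a prefix aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(a[i - 1], b[j - 1], match, mismatch),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + substitution(a[i - 1], b[j - 1], match, mismatch)):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment: return the best score of any pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                0,  # a local alignment may restart anywhere
                prev[j - 1] + substitution(a[i - 1], b[j - 1], match, mismatch),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(needleman_wunsch("GATTACA", "GATACA"))
print(smith_waterman_score("TTTGATTACATTT", "CCCGATTACACCC"))
```

The only structural difference between the two is the floor at zero in the local version, which lets an alignment start and end anywhere instead of spanning both sequences end to end.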
Multiple Sequence Alignment
• Multiple sequence alignment (MSA) is
a sequence alignment of three or more biological
sequences, generally protein, DNA, or RNA.
ClustalW, PROBCONS, MUSCLE
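Building a multiple alignment is what the tools above do. As a small illustration of what can be read from a finished alignment, the sketch below derives a consensus sequence column by column. It assumes the input is already aligned (equal lengths, `-` for gaps); the function name `consensus` and the example sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Most common non-gap residue in each column of an existing alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        if residues:
            result.append(Counter(residues).most_common(1)[0][0])
        else:
            result.append(gap)  # column contains only gaps
    return "".join(result)


# Made-up alignment of three sequences
alignment = [
    "GATTACA",
    "GAT-ACA",
    "GCTTACA",
]
print(consensus(alignment))
# GATTACA
```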
• Biological databases are libraries of life
sciences information, collected from scientific
experiments, published literature,
high-throughput experiment technology, and
computational analysis.
UniGene is an NCBI database of
DNA Data Bank of Japan (DDBJ)
European Bioinformatics Institute (EMBL-EBI)