MULTIPLE SEQUENCE ALIGNMENT

Presented by
MARIYA RAJU

MULTIPLE SEQUENCE ALIGNMENT
TREE ALIGNMENT
STAR ALIGNMENT
GENETIC ALGORITHM
PATTERN IN PAIRWISE ALIGNMENT

Terminology
Homology - Two (or more) sequences share a common
ancestor.
Similarity - Two sequences are similar by some
criterion. The term does not refer to any evolutionary
process, only to a comparison of the sequences by some method.
Conservation - Changes at a specific position of an
amino acid sequence that preserve the physico-chemical
properties of the original residue.

Gaps
Positions at which a letter is paired with a
null are called gaps.
Gap scores are typically negative.
A single mutational event may insert or
delete more than one residue, which leads
to the formation of gaps.
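Because one mutational event can insert or delete several residues at once, many scoring schemes charge more for opening a gap than for extending it. The sketch below contrasts a linear gap score with an affine one. The function names and the penalty values (-2, -10, -1) are illustrative assumptions, not values taken from this presentation.

```python
def linear_gap_score(length, per_residue=-2):
    """Linear model: every gapped position costs the same (assumed value)."""
    return per_residue * length


def affine_gap_score(length, gap_open=-10, gap_extend=-1):
    """Affine model: the first gapped position pays gap_open,
    each further position pays gap_extend (assumed values)."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, linear_gap_score(n), affine_gap_score(n))
```

Under the affine model, one gap of three residues scores better than three separate gaps of one residue, which matches the idea that a single event produced the whole gap.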

Identity - The extent to which two (nucleotide
or amino acid) sequences are invariant.
Motif - A short conserved sequence pattern
associated with a biological feature. It may be a
functional or structural domain, active site,
phosphorylation site, etc.
Profile - A quantitative motif description that
assigns a degree of similarity to a potential
match.
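Identity is often reported as a percentage over an existing alignment. The following is a minimal sketch, assuming the two sequences are already aligned to equal length and that columns containing a gap are excluded from the count. Tools differ on that denominator, so treat the choice here as an assumption; `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of ungapped columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap (assumed convention).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
```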

Pairwise alignment
The process of lining up two sequences
in order to achieve the maximal level of
identity,
for the purpose of assessing the degree of
similarity and the possibility of homology.

It is used to find whether two proteins or
nucleic acids are related structurally or
functionally.
It is used to identify domains or motifs that
are shared between proteins.
It is the basis of BLAST searching.
It is used in the analysis of genomes.

In pairwise alignment, protein sequences
can be more informative than DNA,
because many amino acids share related
biophysical properties.
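One standard way to compute a pairwise alignment is dynamic programming. The sketch below uses a Needleman-Wunsch style global alignment, which this presentation does not name; it is brought in here as an illustration. The match, mismatch, and gap values are assumed, and a real protein alignment would normally use a substitution matrix instead of a flat match/mismatch score.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment with a linear gap score."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: best score for aligning a[:i] with b[:j].
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))
```

With the assumed scores, aligning ACGT with AGT places a single gap opposite the C.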

UniProt
UniProt is a comprehensive, high-quality and
freely accessible database of protein sequence
and functional information.
Many entries are derived from genome
sequencing projects.
It contains a large amount of information about
the biological function of proteins derived
from the research literature.

Dot plot
In bioinformatics, a dot plot is a graphical
method that allows the comparison of
two biological sequences
and identifies regions of close similarity
between them. It is a kind of recurrence plot.

These were introduced by Gibbs and McIntyre in
1970.
They are two-dimensional matrices that have the
sequences of the proteins being compared along the
vertical and horizontal axes.
Individual cells in the matrix can be shaded black if
residues are identical so that matching sequence
segments appear as runs of diagonal lines across the
matrix.

A dot plot of a human zinc finger transcription factor, showing
regional self-similarity. The main diagonal represents the sequence's
alignment with itself; lines off the main diagonal represent similar or
repetitive patterns within the sequence.
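The matrix described above can be built directly by comparing every residue of one sequence with every residue of the other. This is a minimal sketch with no window or threshold filtering (practical dot plot tools usually add both); `dot_plot` and `render` are hypothetical helper names.

```python
def dot_plot(seq_x, seq_y):
    """Rows follow seq_y (vertical axis), columns follow seq_x (horizontal axis).
    A cell is True when the two residues are identical."""
    return [[a == b for a in seq_x] for b in seq_y]


def render(matrix, seq_x, seq_y):
    """Text rendering: '*' marks an identical pair, '.' marks a differing pair."""
    lines = ["  " + seq_x]
    for residue, row in zip(seq_y, matrix):
        lines.append(residue + " " + "".join("*" if cell else "." for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "ABCAB"
    print(render(dot_plot(seq, seq), seq, seq))
```

Plotting a sequence against itself fills the main diagonal, and the repeated "AB" shows up as a shorter diagonal run away from it.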

A Multiple Sequence Alignment (MSA) is a
basic tool for the sequence alignment of three or
more biological sequences,
generally protein, DNA, or RNA.
In many cases, the input set of query sequences
is assumed to have an evolutionary
relationship
in which the sequences share a lineage and are
descended from a common ancestor.

Compare all sequences pairwise.
Perform cluster analysis on the pairwise data to
generate a hierarchy for alignment.
This may be in the form of a binary tree or a
simple ordering.
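The clustering step above can be sketched as repeated merging of the two closest clusters. The example assumes the pairwise distances are already known and uses average linkage, which is one common choice and not necessarily the one intended here. The function name `guide_order` and the distance values are illustrative.

```python
from itertools import combinations


def guide_order(names, dist):
    """Return the list of merges, most similar clusters first.
    dist maps frozenset({a, b}) to the pairwise distance between a and b."""
    clusters = [(name,) for name in names]
    merges = []

    def average_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


if __name__ == "__main__":
    # Made-up distances for four sequences A, B, C, D.
    d = {frozenset(p): v for p, v in [
        (("A", "C"), 1), (("B", "D"), 2), (("A", "B"), 5),
        (("A", "D"), 6), (("B", "C"), 5), (("C", "D"), 6)]}
    for step in guide_order(["A", "B", "C", "D"], d):
        print(step)
```

The list of merges is the hierarchy: each entry says which two groups are aligned to each other at that step.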

Build the Multiple Alignment by first aligning the
most similar pair of sequences.
Then the next most similar pair and so on.
Once an alignment of two sequences has been
made, then this is fixed.
Thus, for a set of sequences A, B, C, D, having
aligned A with C and B with D, the alignment of
A, B, C, D is obtained by comparing the alignment
of A and C with that of B and D, using averaged
scores at each aligned position.
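One way to read "averaged scores at each aligned position" is that a column from the first alignment is scored against a column from the second by averaging over all residue pairs. The sketch below assumes a flat match/mismatch/gap scheme and scores a gap paired with a gap as 0. These values and the name `column_score` are assumptions for illustration.

```python
def column_score(col_x, col_y, match=1, mismatch=-1, gap=-2):
    """Average pairwise score between two alignment columns.
    col_x and col_y are strings holding one residue (or '-') per sequence."""
    total = 0
    for x in col_x:
        for y in col_y:
            if x == "-" and y == "-":
                total += 0          # assumed: gap against gap is neutral
            elif x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total / (len(col_x) * len(col_y))


if __name__ == "__main__":
    # Column from the A/C alignment against a column from the B/D alignment.
    print(column_score("AC", "AA"))
```

These column scores can then take the place of single-residue scores when two fixed alignments are aligned to each other.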

An example of Multiple Alignment
• VTISCTGSSSNIGAG-NHVKWYQQLPG
• VTISCTGTSSNIGS--ITVNWYQQLPG
• LRLSCSSSGFIFSS--YAMYWVRQAPG
• LSLTCTVSGTSFDD--YYSTWVRQPPG
• PEVTCVVVDVSHEDPQVKFNWYVDG--
• ATLVCLISDFYPGA--VTVAWKADS--
• AALGCLVKDYFPEP--VTVSWNSG---
• VSLTCLVKGFYPSD--IAVEWWSNG--

Applications of MSA
Detecting similarities between sequences (closely or
distantly related).
Detecting conserved regions or motifs in sequences.
Detection of structural homologies,
thus assisting the improved prediction of secondary
and tertiary structures of proteins.

Making patterns or profiles that can be further
used to predict new sequences falling in a given
family.
Inferring evolutionary trees or linkages.

Progressive Alignment Method
Iterative Refinement Method

Progressive Alignment Method
The most widely used approach to multiple sequence
alignments
Also known as the Hierarchical or Tree method
Developed by Paulien Hogeweg and Ben Hesper in
1984.
Progressive alignment builds up a final MSA by
combining pairwise alignments beginning with the
most similar pair and progressing to the most
distantly related.

All progressive alignment methods require two
stages.
In the first stage, the relationships between the
sequences are represented as a tree, called a guide
tree.
In the second stage, the MSA is built by adding
the sequences sequentially to the growing MSA
according to the guide tree.

Progressive alignment
algorithms
Clustal W
T-Coffee

Clustal W
The Clustal series of programs is widely used in
molecular biology
for the multiple alignment of both nucleic acid and
protein sequences and for preparing phylogenetic trees.
It works by progressive alignment: it aligns a pair of
sequences, then aligns the next one onto the first pair.
The most closely related sequences are aligned first, and then
additional sequences and groups of sequences are added,
guided by the initial alignments.

Uses alignment scores to produce a phylogenetic
tree.
Aligns the sequences sequentially, guided by the
phylogenetic relationships indicated by the tree.

T-Coffee
T-Coffee stands for Tree-based Consistency Objective
Function for Alignment Evaluation.
It has advanced features to evaluate the quality of the
alignments.
It produces alignments in the aln format (Clustal)
but can also produce PIR, MSF, and FASTA formats.
The most common input formats are supported
(FASTA, PIR).

A set of methods that produce MSAs while reducing the errors
inherent in progressive methods are classified as "iterative".
They work similarly to progressive methods but repeatedly
realign the initial sequences as well as adding new sequences
to the growing MSA.
Barton and Sternberg formulated this method for MSA.
Iterative methods used in bioinformatics include
DIALIGN, MUSCLE (multiple sequence alignment by
log-expectation), etc.

TREE ALIGNMENT
In computational phylogenetics, tree alignment is
used to analyse a set of sequences with an evolutionary
relationship using a fixed tree.
Essentially, tree alignment is an algorithm for
optimizing a phylogenetic tree.
To be specific, a phylogenetic tree shows the
evolutionary relationships between different species,
and taxa joined together are assumed to share a
common ancestor.

In MSA, DNA, RNA, and protein sequences are
usually the input, and they are assumed to have an
evolutionary relationship.
Generally, heuristic algorithms and the tree alignment
graph are also adopted to solve multiple sequence
alignment problems.

Tree Alignment Graph
Roughly, a tree alignment graph (TAG) aims to align
trees into a graph
and finally synthesize them to develop statistics.

A TAG is a combination of a set of aligned trees.
It can store conflicting hypotheses of evolutionary
relationship and synthesize the source trees to develop
evolutionary hypotheses.
Therefore, it is a basic method for solving other
alignment problems.

STAR ALIGNMENT
Star phylogeny is another form of tree alignment.
Star alignment is also used to analyse a set of
sequences with an evolutionary relationship, using a fixed
star.
Instead of a score, the star algorithm uses a cost notation.
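A common concrete form of star alignment is the center-star heuristic: pick the sequence whose total cost to all the others is smallest and align every other sequence to it. The presentation does not spell out this procedure, so the sketch below is an assumption about what is meant. It uses unit-cost edit distance as the cost, and `star_center` is a hypothetical name.

```python
def edit_distance(a, b):
    """Unit-cost edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def star_center(seqs):
    """Sequence with the smallest summed cost to all sequences in the set."""
    return min(seqs, key=lambda s: sum(edit_distance(s, t) for t in seqs))


if __name__ == "__main__":
    print(star_center(["ACGT", "ACGA", "TCGT"]))
```

Here lower cost is better, which is the reverse of the score convention used for the alignments earlier in the slides.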

GENETIC ALGORITHM
An algorithm is a set of instructions that is
repeated to solve a problem.
A genetic algorithm conceptually follows steps
inspired by the biological processes of
evolution.
Genetic algorithms follow the idea of
SURVIVAL OF THE FITTEST.
Originally developed by John Holland (1975).

Genetic algorithms are a class of evolutionary algorithms.
They demonstrate self-organization and
adaptation similar to the way that the fittest
biological organisms survive and reproduce.
They are generally applied to search spaces which are
too large to be searched exhaustively.
The method learns by producing offspring that are
better and better as measured by a fitness function.

Steps in Simple GA
Initialize population.
Evaluate population.
Select parents for reproduction.
Perform crossover and mutation.
Evaluate offspring.

Outline of the Basic Genetic Algorithm
Selection - Select two parent chromosomes
from a population according to their fitness.
Crossover - With a crossover probability, cross
over the parents to form a new offspring.
Mutation - With a mutation probability, mutate
the new offspring at each locus (position in
the chromosome).

Accepting - Place the new offspring in a new
population.
Replace - Use the newly generated population
for a further run of the algorithm.
Test - If the end condition is satisfied, stop
and return the best solution in the current
population.
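The steps above can be sketched on a toy problem. In this example a chromosome is a list of bits and the fitness is the number of ones; in an MSA setting the chromosome would encode a candidate alignment and the fitness would be an alignment score. Tournament selection, the probabilities, the fixed generation count used as the end condition, and the name `run_ga` are all assumptions made for illustration.

```python
import random


def run_ga(fitness, length, pop_size=20, generations=50,
           p_cross=0.7, p_mut=0.01, seed=0):
    """Minimal genetic algorithm over bit lists; returns the best chromosome seen."""
    rng = random.Random(seed)
    # Initialize and evaluate the population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select():
        # Selection (assumed: tournament of two, fitter one wins).
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):  # Test: stop after a fixed number of generations
        new_population = []
        while len(new_population) < pop_size:
            p1, p2 = select(), select()
            # Crossover: single cut point with probability p_cross.
            if rng.random() < p_cross:
                cut = rng.randrange(1, length)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each locus with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            # Accepting: place the offspring in the new population.
            new_population.append(child)
        # Replace: the new population is used for the next run.
        population = new_population
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    solution = run_ga(sum, length=16)
    print(solution, sum(solution))
```

Keeping the best chromosome seen so far means the returned solution never gets worse from one generation to the next.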
