Presented by
MARIYA RAJU
MULTIPLE SEQUENCE
ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
TREE ALIGNMENT
STAR ALIGNMENT
GENETIC ALGORITHM
PATTERN IN PAIRWISE ALIGNMENT
Terminology
Homology - Two (or more) sequences have a common
ancestor
Similarity - Two sequences are similar by some
criterion. It does not refer to any evolutionary process,
just to a comparison of the sequences by some method
Conservation - Changes at a specific position of an
amino acid sequence that preserve the physicochemical
properties of the original residue
Gaps
Positions at which a letter is paired with a
null are called gaps
Gap scores are typically negative
A single mutational event may cause
the insertion or deletion of more than one
residue,
which leads to the formation of gaps
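Because one mutational event can insert or delete several residues at once, gap scoring is often made affine: opening a gap costs more than extending it. The sketch below illustrates that idea only; the function name and the penalty values (-10 to open, -1 to extend) are assumptions chosen for illustration, not values from these slides.

```python
def affine_gap_score(gap_length, gap_open=-10, gap_extend=-1):
    """Score one contiguous gap of `gap_length` positions.

    The first gapped position pays `gap_open`; every further position
    pays `gap_extend`. Both default values are assumed, not standard.
    """
    if gap_length <= 0:
        return 0
    return gap_open + (gap_length - 1) * gap_extend


# One 3-residue gap (a single event) versus three separate 1-residue gaps.
single_event = affine_gap_score(3)
three_events = 3 * affine_gap_score(1)
print(single_event, three_events)  # -12 -30
```

With these assumed values, a single long gap is penalised less than several short ones, which matches the single-event view of insertions and deletions.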
Identity - The extent to which two (nucleotide
or amino acid) sequences are invariant.
Motif - A conserved sequence pattern used as a
model in studies. It may be a functional or
structural domain, active site, phosphorylation
site, etc.
Profile - A quantitative motif description that
assigns a degree of similarity to a potential
match
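The identity and profile definitions above can be made concrete with a small sketch. It assumes the sequences are already aligned and of equal length; the function names and the toy motif instances are hypothetical, and the profile here is a plain per-column frequency table rather than any specific published scoring scheme.

```python
from collections import Counter


def percent_identity(a, b):
    """Percentage of aligned columns in which both sequences carry the
    same residue. Columns that are gaps in both sequences are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def build_profile(aligned):
    """Per-column residue frequencies for a set of aligned motif instances."""
    n = len(aligned)
    return [{res: count / n for res, count in Counter(col).items()}
            for col in zip(*aligned)]


def profile_score(profile, candidate):
    """Degree of similarity: sum of the frequencies of the candidate's residues."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, candidate))


# Toy motif instances (made up for illustration).
motif_instances = ["ACGT", "ACGA", "ACCT"]
profile = build_profile(motif_instances)
print(percent_identity("ACGT", "ACGA"))
print(profile_score(profile, "ACGT"), profile_score(profile, "TTTT"))
```

A candidate that agrees with the common residue in every column scores close to the motif length; an unrelated candidate scores close to zero.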
Pairwise alignment
The process of lining up two sequences
in order to achieve the maximal level of
identity,
for the purpose of assessing the degree of
similarity and the possibility of homology.
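One standard way to line up two sequences is dynamic programming (Needleman-Wunsch global alignment). The sketch below computes only the optimal score, not the alignment itself. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions for illustration; real protein alignments would normally use a substitution matrix instead.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch optimal global alignment score with a linear gap
    penalty. Only two rows of the table are kept in memory."""
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of `a` against the empty prefix of `b`
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # residue of `a` paired with a gap
            left = curr[j - 1] + gap  # residue of `b` paired with a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("AC", "AGC"))
```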
It is used to find whether two proteins or
nucleic acids are related structurally or
functionally.
It is used to identify domains or motifs that
are shared between proteins.
It is the basis of BLAST searching.
It is used in the analysis of genomes.
In pairwise alignment, protein sequences
can be more informative than DNA
because many amino acids share related
biophysical properties.
UniProt
UniProt is a comprehensive, high-quality and
freely accessible database of protein sequence
and functional information.
Many entries are derived from genome
sequencing projects.
It contains a large amount of information about
the biological function of proteins derived
from the research literature.
Dot plot
In bioinformatics, a dot plot is a graphical
method that allows the comparison of
two biological sequences
and identifies regions of close similarity
between them. It is a kind of recurrence plot.
These were introduced by Gibbs and McIntyre in
1970.
They are two-dimensional matrices that have the
sequences of the proteins being compared along the
vertical and horizontal axes.
Individual cells in the matrix can be shaded black if
residues are identical so that matching sequence
segments appear as runs of diagonal lines across the
matrix.
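The matrix described above can be sketched in a few lines. This is a minimal version that marks single identical residues only, with no window or threshold filtering; the function names and the text rendering ("#" for a shaded cell) are illustrative choices.

```python
def dot_plot(seq_x, seq_y):
    """Return a matrix with seq_y down the vertical axis and seq_x along
    the horizontal axis. A cell is True where the two residues are identical."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(matrix):
    """Draw shaded cells as '#' and empty cells as '.'."""
    return "\n".join("".join("#" if cell else "." for cell in row)
                     for row in matrix)


# Comparing a sequence with itself: the main diagonal is fully shaded,
# and the repeat "ABC" shows up as shorter off-diagonal runs.
print(render(dot_plot("ABCABC", "ABCABC")))
```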
A dot plot of a human zinc finger transcription factor, showing
regional self-similarity. The main diagonal represents the sequence's
alignment with itself; lines off the main diagonal represent similar or
repetitive patterns within the sequence.
A Multiple Sequence Alignment (MSA) is a
basic tool for the sequence alignment of three or
more biological sequences,
generally protein, DNA, or RNA.
In many cases, the input set of query sequences
is assumed to have an evolutionary
relationship
by which they share a lineage and are
descended from a common ancestor.
Compare all sequences pairwise.
Perform cluster analysis on the pairwise data to
generate a hierarchy for alignment.
This may be in the form of a binary tree or a
simple ordering.
Build the Multiple Alignment by first aligning the
most similar pair of sequences.
Then align the next most similar pair, and so on.
Once an alignment of two sequences has been
made, it is fixed.
Thus, for a set of sequences A, B, C, D, having
aligned
A with C and B with D, the alignment of A, B, C,
D is obtained by comparing the alignment of A
and C with that of B and D, using averaged scores
at each aligned position.
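The "averaged scores at each aligned position" step can be illustrated as follows: when the alignment of A and C is compared with the alignment of B and D, each column of one is scored against each column of the other by averaging all residue-pair scores. The scoring values and function names below are assumptions for illustration.

```python
def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score a single residue pair (assumed values, not a real matrix)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def column_score(col_1, col_2):
    """Average of all residue-pair scores between two alignment columns."""
    total = sum(pair_score(x, y) for x in col_1 for y in col_2)
    return total / (len(col_1) * len(col_2))


# A column from the A/C alignment against a column from the B/D alignment.
print(column_score(("A", "A"), ("A", "G")))  # two matches, two mismatches
print(column_score(("C", "C"), ("C", "C")))  # all four pairs match
```

These column scores would then feed the same kind of dynamic programming used for two single sequences, with columns taking the place of residues.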
An example of Multiple Alignment
• VTISCTGSSSNIGAG-NHVKWYQQLPG
• VTISCTGTSSNIGS--ITVNWYQQLPG
• LRLSCSSSGFIFSS--YAMYWVRQAPG
• LSLTCTVSGTSFDD--YYSTWVRQPPG
• PEVTCVVVDVSHEDPQVKFNWYVDG--
• ATLVCLISDFYPGA--VTVAWKADS--
• AALGCLVKDYFPEP--VTVSWNSG---
• VSLTCLVKGFYPSD--IAVEWWSNG--
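A quick way to read an alignment like this one is to look for columns in which every sequence carries the same residue. The sketch below does that for the eight rows shown above; the function name is illustrative.

```python
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]


def conserved_columns(rows):
    """0-based indices of columns where all rows share one residue (not a gap)."""
    return [i for i, col in enumerate(zip(*rows))
            if len(set(col)) == 1 and col[0] != "-"]


print(conserved_columns(alignment))  # [4, 20]
print([alignment[0][i] for i in conserved_columns(alignment)])  # ['C', 'W']
```

Only the cysteine at position 4 and the tryptophan at position 20 (counting from 0) are invariant across all eight sequences.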
Applications of MSA
Detecting similarities between sequences (closely or
distantly related).
Detecting conserved regions or motifs in sequences.
Detection of structural homologies,
thus assisting the improved prediction of secondary
and tertiary structures of proteins.
Making patterns or profiles that can be further
used to predict new sequences falling in a given
family.
Inferring evolutionary trees or linkages.
Progressive Alignment Method
Iterative Refinement Method
Progressive Alignment Method
The most widely used approach to multiple sequence
alignments
Also known as the Hierarchical or Tree method
Developed by Paulien Hogeweg and Ben Hesper in
1984.
Progressive alignment builds up a final MSA by
combining pairwise alignments beginning with the
most similar pair and progressing to the most
distantly related.
All progressive alignment methods require two
stages.
First stage: the relationships between the
sequences are represented as a tree, called a guide
tree.
Second stage: the MSA is built by adding
the sequences sequentially to the growing MSA
according to the guide tree.
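The first stage can be sketched as average-linkage clustering over pairwise distances. This is a simplified illustration, not the procedure of any particular program: `difflib.SequenceMatcher.ratio()` stands in for a real pairwise alignment score, and the sequence names, toy sequences, and function names are hypothetical.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Stand-in distance: 1 minus a generic string-similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def guide_tree(seqs):
    """Build a guide tree as nested tuples of sequence names by repeatedly
    merging the two clusters with the smallest average distance."""
    clusters = [(name, [name]) for name in sorted(seqs)]  # (subtree, members)

    def avg_dist(m1, m2):
        total = sum(distance(seqs[x], seqs[y]) for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        _, i, j = min((avg_dist(clusters[i][1], clusters[j][1]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def join_order(tree):
    """Post-order list of merges: the order in which groups get aligned."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return join_order(left) + join_order(right) + [tree]


toy = {"A": "ACGTACGT", "B": "TTGGCCAA", "C": "ACGTACGA", "D": "TTGGCCAT"}
tree = guide_tree(toy)
print(tree)
print(join_order(tree))
```

In the second stage, each entry of `join_order(tree)` would correspond to one pairwise alignment step: first the closest pairs, then the groups they form.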
Progressive alignment
algorithms
Clustal W
T-Coffee
Clustal W
The Clustal series of programs is widely used in
molecular biology
for the multiple alignment of both nucleic acid and
protein sequences and for preparing phylogenetic trees.
Works by progressive alignment: it aligns a pair of
sequences then aligns the next one onto the first pair.
Most closely related sequences are aligned first, and then
additional sequences and groups of sequences are added,
guided by the initial alignments
Uses alignment scores to produce a phylogenetic
tree.
Aligns the sequences sequentially, guided by the
phylogenetic relationships indicated by the tree.
T-Coffee
(Tree-based Consistency Objective Function for
alignment Evaluation)
It has advanced features to evaluate the quality of the
alignments.
It produces alignments in the aln format (Clustal)
but can also produce PIR, MSF, and FASTA formats.
The most common input formats are supported
(FASTA, PIR).
Iterative Refinement Method
Methods that produce MSAs while reducing the errors
inherent in progressive methods are classified as "iterative".
They work similarly to progressive methods but repeatedly
realign the initial sequences as well as adding new sequences
to the growing MSA.
Barton and Sternberg formulated this method for MSA.
Iterative methods used in bioinformatics include
DIALIGN, MUSCLE (MUltiple Sequence Comparison by
Log-Expectation), etc.
TREE ALIGNMENT
In computational phylogenetics, tree alignment is
used to analyse a set of sequences with an evolutionary
relationship using a fixed tree.
Essentially, tree alignment is an algorithm for
optimizing a phylogenetic tree.
To be specific, a phylogenetic tree shows the
evolutionary relationships between different species,
and taxa joined together are assumed to have the
same ancestor.
In MSA, the sequences aligned are usually DNA, RNA,
or protein sequences, and they are assumed to have an
evolutionary relationship.
Generally, heuristic algorithms and tree alignment
graphs are also adopted to solve multiple sequence
alignment problems.
Tree Alignment Graph
Roughly, a tree alignment graph (TAG) aims to align
trees into a graph
and finally synthesize them to develop statistics.
A TAG is a combination of a set of aligned trees.
It can store conflicting hypotheses about evolutionary
relationships and synthesize the source trees to develop
evolutionary hypotheses.
Therefore, it is a basic method for solving other
alignment problems.
TAG
STAR ALIGNMENT
Star alignment is another form of tree alignment.
Star alignment is also used to analyse a set of
sequences with an evolutionary relationship, using a fixed
star instead of a fixed tree.
Instead of a score, the star algorithm uses cost notation.
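One common reading of star alignment is the center-star heuristic: pick the sequence whose total cost to all the others is lowest and place it at the center of the star. The sketch below assumes edit distance (one unit per substitution, insertion, or deletion) as the cost; the slides do not specify a cost function, and the function names and toy sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of substitutions,
    insertions, and deletions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or keep
        prev = curr
    return prev[-1]


def star_center(seqs):
    """Return (center, total_cost) for the sequence with the lowest
    summed cost to every other sequence. Lower cost is better."""
    costs = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    best = min(range(len(seqs)), key=costs.__getitem__)
    return seqs[best], costs[best]


print(star_center(["ACGT", "ACGA", "TCGT", "ACCT"]))
```

Note the direction of the optimization: a score is maximized, whereas a cost is minimized.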
GENETIC ALGORITHM
An algorithm is a set of instructions that is
repeated to solve a problem.
A genetic algorithm conceptually follows steps
inspired by the biological processes of
evolution.
Genetic Algorithms follow the idea of
SURVIVAL OF THE FITTEST.
Originally developed by John Holland (1975)
A class of evolutionary algorithms, genetic
algorithms demonstrate self-organization and
adaptation similar to the way that the fittest
biological organisms survive and reproduce.
Generally applied to search spaces that are too large
to explore exhaustively.
The method learns by producing offspring that are
better and better as measured by a fitness function.
Steps in Simple GA
Initialize population.
Evaluate population.
Select parents for reproduction.
Perform crossover and mutation.
Evaluate offspring.
Outline of the Basic Genetic
Algorithm
Selection - Select two parent chromosomes
from a population according to their fitness.
Crossover - With a crossover probability, cross
over the parents to form new offspring.
Mutation - With a mutation probability, mutate
new offspring at each locus (position in
chromosome).
Accepting - Place new offspring in the new
population.
Replace - Use the newly generated population
for a further run of the algorithm.
Test - If the end condition is satisfied, stop
and return the best solution in the current
population.
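The outline above can be sketched as a small program. This is a generic bit-string genetic algorithm, not an alignment-specific one. The tournament selection, single-point crossover, parameter values, and the toy fitness function (counting 1-bits) are all assumptions made for illustration.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=30, generations=50,
           crossover_prob=0.7, mutation_prob=0.01, target=None, seed=0):
    """Simple genetic algorithm over bit-string chromosomes.
    Returns the best chromosome seen across all generations."""
    rng = random.Random(seed)
    # Initialize and evaluate the population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select(pop):
        # Selection: size-2 tournament, the fitter chromosome wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_population = []
        while len(new_population) < pop_size:
            p1, p2 = select(population), select(population)
            # Crossover: single cut point, applied with crossover_prob.
            if rng.random() < crossover_prob:
                point = rng.randint(1, n_bits - 1)
                child = p1[:point] + p2[point:]
            else:
                child = p1[:]
            # Mutation: flip each locus with mutation_prob.
            child = [bit ^ 1 if rng.random() < mutation_prob else bit
                     for bit in child]
            # Accepting: place the offspring in the new population.
            new_population.append(child)
        # Replace: the new population is used for the next run.
        population = new_population
        best = max(population + [best], key=fitness)
        # Test: stop early once the end condition is satisfied.
        if target is not None and fitness(best) >= target:
            break
    return best


# Toy problem: maximize the number of 1-bits in the chromosome.
solution = run_ga(sum, target=20)
print(solution, sum(solution))
```

For sequence alignment, a chromosome would instead encode a candidate alignment and the fitness function would be an alignment score.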