MSAT
MULTIPLE SEQUENCE ALIGNMENT TOOL
BY
GROUP 2
2/22/2018
OVERVIEW
 Sequence alignment
 Types of sequence alignments
 Multiple sequence alignment
 Purpose of MSA
 Types of MSA
 Progressive alignment
 Pros & Cons
SEQUENCE ALIGNMENT
 In bioinformatics, a sequence alignment is a way of arranging sequences of DNA, RNA,
or protein to identify regions of similarity that may be a consequence of functional,
structural, or evolutionary relationships between the sequences.
SEQUENCE ALIGNMENT
 Sequences often contain highly conserved regions
 These regions can be used for initial alignments
TYPES OF SEQUENCE ALIGNMENTS
Pairwise alignment
 Dot matrix method
 Dynamic programming
 Word methods
Multiple sequence alignment
 Dynamic programming
 Progressive methods
 Iterative methods
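As an illustration of the dynamic programming method for pairwise alignment, the sketch below implements a basic Needleman-Wunsch global alignment. The scoring values (match=1, mismatch=-1, gap=-2) and the function name are assumptions chosen for the example; real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Illustrative scoring only: a flat match/mismatch score and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(x, y):
        return match if x == y else mismatch

    # Fill the table: each cell is the best of diagonal (align), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

The table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the two sequence lengths.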
MULTIPLE SEQUENCE ALIGNMENT
 A multiple sequence alignment tool that simultaneously aligns multiple protein
sequences, automatically uses information about protein domains, and offers a good
compromise between speed and accuracy will have practical advantages over
current tools
 The principle is that multiple alignments are achieved by successive application of
pairwise methods.
PURPOSE OF MSA
 To characterize protein families by identifying shared regions of homology in a
multiple sequence alignment.
 To determine the consensus sequence of several aligned sequences.
 Consensus sequences can help to develop a sequence “fingerprint” which allows
the identification of members of a distantly related protein family (motifs).
 MSA can help us to reveal biological facts
about proteins, such as their secondary/tertiary structure.
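To make the idea of a consensus sequence concrete, here is a minimal sketch that takes the most common residue in each column of an alignment. The function name and the toy sequences are assumptions for illustration; the rule used (simple majority, ignoring gaps) is only one of several ways a consensus can be defined.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    Gaps are ignored when counting; a column made only of gaps yields a gap.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")

    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


# Toy alignment (made-up sequences, not real data).
print(consensus(["ACGT-A", "ACCTTA", "A-GTTA"]))
```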
TYPES OF MSA
 Dynamic programming approach
Computes an optimal alignment for a given score function. Because its running time
grows exponentially with the number of sequences, it is not typically used in practice.
 Progressive method
This approach repeatedly aligns two sequences, two alignments, or a sequence
with an alignment.
 Iterative method
Works similarly to progressive methods but repeatedly realigns the initial sequences
as well as adding new sequences to the growing MSA.
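The "score function" that a multiple alignment is optimized against can take many forms. One common choice is the sum-of-pairs score, sketched below. The scoring values and the function name are assumptions for illustration, not the scoring used by any particular tool.

```python
from itertools import combinations


def sum_of_pairs(aligned, match=1, mismatch=-1, gap=-2):
    """Score an alignment by summing pairwise scores over every column.

    A residue paired with a gap costs `gap`; two gaps paired together score 0.
    """
    total = 0
    for column in zip(*aligned):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


# Toy alignment (made-up sequences).
print(sum_of_pairs(["AC-T", "ACGT", "AG-T"]))
```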
PROGRESSIVE ALIGNMENT
 The most widely used approach
 Builds up a final MSA by combining pairwise
alignments, beginning with the most similar pair and progressing to the most distantly
related
 Progressive alignment methods require two stages:
‐ First stage, in which the relationships between the sequences are represented as a tree,
called a guide tree
‐ Second stage, in which the MSA is built by adding
the sequences sequentially to the growing MSA according to the guide tree
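The first stage can be sketched as a small clustering routine. The example below builds a guide tree by repeatedly merging the two closest clusters using average linkage (a UPGMA-style simplification). The function names, the distance dictionary format, and the distances themselves are assumptions for illustration; real tools derive distances from pairwise alignments or k-mer counts and may use other tree-building methods.

```python
from itertools import combinations


def average_distance(members_a, members_b, dist):
    """Mean distance between all cross-cluster pairs of sequences."""
    total = sum(dist[frozenset((x, y))] for x in members_a for y in members_b)
    return total / (len(members_a) * len(members_b))


def guide_tree(names, dist):
    """Return a nested tuple describing the merge order (most similar pair first).

    names: list of unique sequence labels
    dist:  dict mapping frozenset({a, b}) -> distance (hypothetical input format)
    """
    if not names:
        raise ValueError("need at least one sequence")

    # Each cluster is (subtree, list of member labels).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0][1], pair[1][1], dist),
        )
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]


# Made-up distances between four sequences.
d = {
    frozenset(("A", "B")): 0.10,
    frozenset(("C", "D")): 0.30,
    frozenset(("A", "C")): 0.60,
    frozenset(("A", "D")): 0.70,
    frozenset(("B", "C")): 0.65,
    frozenset(("B", "D")): 0.75,
}
print(guide_tree(["A", "B", "C", "D"], d))
```

In the second stage, an aligner would walk this tree from the leaves upward, aligning A with B, C with D, and then the two resulting alignments with each other.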
USING COBALT NCBI
 COBALT is a constraint-based alignment tool that implements a general framework for
multiple alignment of protein sequences
 COBALT finds a collection of pairwise constraints derived from database
searches, sequence similarity and user input, combines these pairwise
constraints, and then incorporates them into a progressive multiple
alignment
 COBALT has reasonable runtime performance and alignment accuracy
comparable to or exceeding that of other tools for a broad range of
problems
USING COBALT NCBI
 COBALT has a general framework that uses progressive multiple alignment
to combine pairwise constraints from different sources into a multiple
alignment
 When the same domain matches multiple sequences, several potential pairwise
constraints can be inferred from these domain matches
 CDD (Conserved Domain Database) also contains auxiliary information
that allows COBALT to create partial profiles for input sequences before
progressive alignment begins, which avoids computationally expensive
procedures for building profiles
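The idea of inferring pairwise constraints from shared domain matches can be sketched as follows. This is a simplified illustration, not COBALT's actual data model: the tuple layout, the function name, and the sequence IDs, domain labels and coordinates are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def domain_constraints(hits):
    """Pair up regions of different sequences that matched the same domain.

    hits: iterable of (seq_id, domain_id, start, end) tuples (hypothetical format).
    Returns a list of ((seq_a, start_a, end_a), (seq_b, start_b, end_b), domain_id).
    """
    by_domain = defaultdict(list)
    for seq_id, domain_id, start, end in hits:
        by_domain[domain_id].append((seq_id, start, end))

    constraints = []
    for domain_id, regions in by_domain.items():
        for a, b in combinations(regions, 2):
            if a[0] != b[0]:  # only constrain regions on different sequences
                constraints.append((a, b, domain_id))
    return constraints


# Made-up domain hits: one domain found in three sequences, another in only one.
hits = [
    ("seq1", "domX", 10, 120),
    ("seq2", "domX", 35, 150),
    ("seq3", "domX", 5, 118),
    ("seq1", "domY", 1, 8),
]
for constraint in domain_constraints(hits):
    print(constraint)
```

A domain found in n sequences yields up to n(n-1)/2 candidate constraints, which is why a tool needs a step that combines and filters them before alignment.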
RUNTIME OF COBALT
 The runtime performance of COBALT is highly data-driven
 COBALT is about five times faster than ProbCons
 COBALT is included in the NCBI C++ Toolkit
 Numerous auxiliary programs were written in C, C++ and Perl to automate
testing and summarize results
AVAILABILITY
 COBALT is included in the NCBI C++ Toolkit. A Linux executable for COBALT,
together with the CDD and PROSITE data it uses, is available at:
https://www.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi
 Contact: richa@helix.nih.gov
STEP 1
 Go to https://www.ncbi.nlm.nih.gov/
STEP 2
 The Swiss-Prot accession for the Schizosaccharomyces pombe Clr4 protein sequence is
O60016.2.
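For readers who prefer a script to the web interface, the sketch below builds a URL for retrieving this sequence in FASTA format through NCBI's E-utilities `efetch` endpoint. This route is an assumption added here, not a step from the slides, and it assumes the accession resolves in the NCBI protein database. The function name is illustrative, and the block only constructs the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an alternative to the web steps in these slides).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession):
    """Build an efetch URL requesting a protein record as plain-text FASTA."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


print(fasta_url("O60016.2"))
```

The printed URL can be opened in a browser or passed to `urllib.request.urlopen` to download the record.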
STEP 3
STEP 4
STEP 5
STEP 6
Your patience is greatly appreciated….
RESULT
STEP 6
Select First 11
STEP 7
Your patience is greatly appreciated….
RESULTS
RESULTS
STEP 8
 Notice that the above multiple alignment can't be edited in place. Use the “Edit and
Resubmit” link at the top of the COBALT results to remove the undesired
protein, then search again.
STEP 8 (a)
STEP 8 (b)
PROS AND CONS OF PROGRESSIVE METHOD OF ALIGNMENT
PROS:
 Efficient enough to run on a large scale for
many (100s to 1000s) sequences.
 Progressive alignment services are commonly available
on publicly accessible web servers, so users need not
install the applications of interest locally.
 Most widely used method of multiple sequence
alignment because of its speed and accuracy.
CONS:
 Progressive alignments are not guaranteed to be
globally optimal.
 The primary problem is that when errors are made at
any stage in growing the MSA, these errors are then
propagated through to the final result.
 Accuracy is also particularly poor when all of the
sequences in the set are rather distantly related.
REFERENCES
 https://insidescienceresources.wordpress.com/2017/05/15/ncbi-bioinformatics-tools-protein-blast-cobalt-and-cn3d-structure-viewer/
 https://academic.oup.com/bioinformatics/article/23/9/1073/272774
 https://www.ncbi.nlm.nih.gov/

Multiple Sequence Alignment Tool Using NCBI COBALT
