Multiple Sequence Alignment Tool Using NCBI COBALT

MSAT
MULTIPLE SEQUENCE ALIGNMENT TOOL
BY
GROUP 2
2/22/2018

OVERVIEW
 Sequence alignment
 Types of sequence alignments
 Multiple sequence alignment
 Purpose of MSA
 Types of MSA
 Progressive alignment
 Pros & Cons

SEQUENCE ALIGNMENT
 In bioinformatics, a sequence alignment is a way of arranging sequences of DNA, RNA,
or protein to identify regions of similarity that may be a consequence of functional,
structural, or evolutionary relationships between the sequences.

SEQUENCE ALIGNMENT
 Sequences often contain highly conserved regions
 These regions can be used for initial alignments

TYPES OF SEQUENCE ALIGNMENTS
Pairwise alignment
 Dot matrix method
 Dynamic programming
 Word methods
Multiple sequence alignment
 Dynamic programming
 Progressive methods
 Iterative methods
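Dynamic programming for pairwise alignment can be illustrated with a global alignment in the style of Needleman-Wunsch. The slides do not name a specific algorithm or scoring scheme, so the choice of algorithm and the match, mismatch, and gap values below are assumptions made for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells, which is why this approach is practical for two sequences but becomes expensive when extended to many sequences at once.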

MULTIPLE SEQUENCE ALIGNMENT
 A multiple sequence alignment tool that simultaneously aligns multiple protein
sequences, automatically uses information about protein domains, and offers a good
compromise between speed and accuracy has practical advantages over
current tools
 The principle is that multiple alignments are achieved by successive application of
pairwise methods.

PURPOSE OF MSA
 Characterize protein families by identifying shared regions of homology in a
multiple sequence alignment
 Determine the consensus sequence of several aligned sequences
 Consensus sequences can help develop a sequence “fingerprint” (motif) that allows
the identification of members of a distantly related protein family
 MSA can help reveal biological facts about proteins, for example through analysis
of secondary and tertiary structure
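Determining a consensus sequence can be sketched as a per-column majority vote over the aligned sequences. This is a minimal illustration under stated assumptions: gaps are written as "-", gaps are ignored when voting, and ties are broken alphabetically. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Return the most common non-gap residue in each alignment column.

    Assumptions (not from the slides): gaps are ignored when voting, ties are
    broken alphabetically, and a column made only of gaps yields a gap.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    result = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            result.append(gap)
            continue
        # Highest count wins; alphabetical order makes ties deterministic.
        residue, _ = min(counts.items(), key=lambda item: (-item[1], item[0]))
        result.append(residue)
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ACG-T", "ACGAT", "TCG-T"]))
```

Real tools usually report more than a single residue per column (for example conservation scores or ambiguity codes), so this simple vote is only a starting point.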

TYPES OF MSA
 Dynamic programming approach
Computes an optimal alignment for a given score function. Because its running
time grows exponentially with the number of sequences, it is not typically used in practice.
 Progressive method
This approach repeatedly aligns two sequences, two alignments, or a sequence
with an alignment.
 Iterative method
Works similarly to progressive methods but repeatedly realigns the initial
sequences as well as adding new sequences to the growing MSA.
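A "score function" for a multiple alignment can be illustrated with the sum-of-pairs score, which adds up the pairwise scores of every pair of residues in every column. The slides do not say which score function is meant, so sum-of-pairs and the scoring values below are assumptions chosen as a common textbook example.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    Scoring values are illustrative assumptions. A gap paired with a gap
    contributes nothing.
    """
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

An exact method would search over all possible alignments for the one that maximizes such a score, which is what makes it too slow for more than a handful of sequences.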

PROGRESSIVE ALIGNMENT
 The most widely used approach
 Builds up a final MSA by combining pairwise
alignments, beginning with the most similar pair and progressing to the most distantly
related
 Progressive alignment methods require two stages:
‐ First stage, in which the relationships between the sequences are represented as a tree,
called a guide tree
‐ Second stage, in which the MSA is built by adding
the sequences sequentially to the growing MSA according to the guide tree
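The first stage can be sketched as average-linkage (UPGMA-style) clustering over a table of pairwise distances. This is an illustrative assumption: the slides do not say which tree-building method any particular tool uses, and the function name, sequence labels, and distance values below are hypothetical.

```python
from itertools import combinations


def build_guide_tree(names, distance):
    """Build a guide tree by repeatedly merging the two closest clusters.

    `distance` maps frozenset({a, b}) to the distance between sequences a and b.
    Returns (tree, merge_order): the tree is a nested tuple, and merge_order
    lists subtrees in the order a progressive aligner would combine them.
    """
    # Each cluster is (subtree, list of member sequence names).
    clusters = [(name, [name]) for name in names]
    merge_order = []

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(distance[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        merged = ((left[0], right[0]), left[1] + right[1])
        merge_order.append(merged[0])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return clusters[0][0], merge_order


if __name__ == "__main__":
    # Made-up distances between four hypothetical sequences.
    d = {
        frozenset({"A", "B"}): 0.10,
        frozenset({"C", "D"}): 0.30,
        frozenset({"A", "C"}): 0.80,
        frozenset({"A", "D"}): 0.90,
        frozenset({"B", "C"}): 0.70,
        frozenset({"B", "D"}): 0.85,
    }
    tree, order = build_guide_tree(["A", "B", "C", "D"], d)
    print(tree)
    print(order)
```

In the second stage, a progressive aligner would walk `merge_order` from first to last, aligning the two sides of each merge (sequence to sequence, sequence to alignment, or alignment to alignment).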

USING COBALT NCBI
 COBALT is a constraint-based alignment tool that implements a general framework for
multiple alignment of protein sequences
 COBALT finds a collection of pairwise constraints derived from database
searches, sequence similarity and user input, combines these pairwise
constraints, and then incorporates them into a progressive multiple
alignment
 COBALT has reasonable runtime performance and alignment accuracy
comparable to or exceeding that of other tools for a broad range of
problems

USING COBALT NCBI
 COBALT has a general framework that uses progressive multiple alignment
to combine pairwise constraints from different sources into a multiple
alignment
 When the same domain matches to multiple sequences, we can infer
several potential pairwise constraints based on these domain matches
 CDD (Conserved Domain Database) also contains auxiliary information
that allows COBALT to create partial profiles for input sequences before
progressive alignment begins, and this avoids computationally expensive
procedures for building profiles
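The idea of inferring pairwise constraints from shared domain matches can be sketched as follows. This is not COBALT's implementation: the `DomainHit` record, the function name, and the sequence and domain labels are all hypothetical, and the sketch only pairs up the matched regions without scoring or filtering them.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class DomainHit(NamedTuple):
    """Hypothetical record: a domain matched a region of a sequence."""
    sequence: str
    domain: str
    start: int
    end: int


def infer_pairwise_constraints(hits):
    """Pair up regions of different sequences that match the same domain."""
    by_domain = defaultdict(list)
    for hit in hits:
        by_domain[hit.domain].append(hit)

    constraints = []
    for domain, group in by_domain.items():
        for a, b in combinations(group, 2):
            if a.sequence != b.sequence:
                constraints.append(
                    (domain, (a.sequence, a.start, a.end), (b.sequence, b.start, b.end))
                )
    return constraints


if __name__ == "__main__":
    # Made-up hits: three sequences share "domA", only one has "domB".
    hits = [
        DomainHit("seq1", "domA", 10, 60),
        DomainHit("seq2", "domA", 12, 64),
        DomainHit("seq3", "domA", 5, 55),
        DomainHit("seq1", "domB", 100, 150),
    ]
    for constraint in infer_pairwise_constraints(hits):
        print(constraint)
```

With three sequences sharing one domain, the sketch yields three candidate constraints, one per pair of sequences, which mirrors the "several potential pairwise constraints" described above.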

RUNTIME OF COBALT
 The runtime performance of COBALT is highly data driven
 COBALT is about five times faster than ProbCons
 COBALT is included in the NCBI C++ Toolkit
 Numerous auxiliary programs were written in C, C++ and Perl to automate
testing and summarize results

AVAILABILITY
 COBALT is included in the NCBI C++ Toolkit. A Linux executable for COBALT
and the CDD and PROSITE data used are available at:
https://www.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi
 Contact: richa@helix.nih.gov

STEP 1
 Go to https://www.ncbi.nlm.nih.gov/

STEP 2
 The Swiss-Prot protein sequence for Schizosaccharomyces pombe Clr4 is
O60016.2.
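As an alternative to looking the sequence up through the website, the same accession could be retrieved programmatically with the NCBI E-utilities EFetch endpoint. This is a sketch, not part of the original walkthrough: it assumes the accession resolves in the NCBI protein database the same way it does in the web interface, and the helper function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an EFetch URL requesting a protein record in FASTA text format."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession, timeout=30):
    """Download the FASTA record (requires network access)."""
    with urlopen(efetch_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Accession taken from the slide; the request itself is not run here.
    print(efetch_fasta_url("O60016.2"))
```

Calling `fetch_fasta("O60016.2")` would perform the actual download; NCBI asks users to keep request rates low, so a script should avoid calling it in a tight loop.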

Your patience is greatly appreciated….

STEP 6
Select First 11

Your patience is greatly appreciated….

STEP 8
 Notice that the multiple alignment above cannot be edited directly. Use the “Edit and
Resubmit” link at the top of the COBALT results to remove the undesired
protein, then search again.

PROS AND CONS OF PROGRESSIVE
METHOD OF ALIGNMENT
PROS:
 Efficient enough to implement on a large scale for
many (100s to 1000s) sequences.
 Progressive alignment services are commonly available
on publicly accessible web servers, so users need not
locally install the applications of interest.
 Most widely used method of multiple sequence
alignment because of speed and accuracy.

CONS:
 Progressive alignments are not guaranteed to be
globally optimal.
 The primary problem is that when errors are made at
any stage in growing the MSA, these errors are then
propagated through to the final result.
 Accuracy is also particularly poor when all of the
sequences in the set are only distantly related.

REFERENCES
 https://insidescienceresources.wordpress.com/2017/05/15/ncbi-bioinformatics-tools-protein-blast-cobalt-and-cn3d-structure-viewer/
 https://academic.oup.com/bioinformatics/article/23/9/1073/272774
 https://www.ncbi.nlm.nih.gov/
