CLUSTAL
BY,
BENITTA BENNY
S2BIOINFORMATICS
CLUSTAL
• Clustal is a series of computer programs used 
in bioinformatics for multiple sequence alignment. 
• There have been many versions of Clustal over the 
development of the algorithm. Operating system 
availability differs between them, and a given platform 
may not be supported by every current version of the 
Clustal tools.
• Clustal Omega supports the widest variety of 
operating systems of all the Clustal tools.
ClustalW
• ClustalW, like the other Clustal tools, is used for 
aligning multiple nucleotide or protein sequences in 
an efficient manner. It uses progressive alignment 
methods, which align the most similar sequences first 
and work their way down to the least similar sequences 
until a global alignment is created. ClustalW is a 
matrix-based algorithm, whereas tools like T-Coffee 
and Dialign are consistency-based. ClustalW is a fairly 
efficient algorithm that competes well against other 
software. The program requires three or more 
sequences in order to calculate a global alignment; 
for pairwise sequence alignment (2 sequences), use 
a dedicated pairwise alignment tool.
Algorithm
• ClustalW uses progressive alignment methods. The 
sequences with the best alignment score are aligned 
first, then progressively more distant groups of 
sequences are aligned. 
• This heuristic approach is necessary due to the time 
and memory demand of finding the global optimal 
solution. 
• The first step of the algorithm is computing a rough 
distance matrix between each pair of sequences, 
obtained by pairwise sequence alignment. 
• The next step is a neighbor-joining method that uses 
midpoint rooting to create an overall guide tree.  
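The distance-matrix step can be illustrated with a minimal Python sketch. It assumes a plain Needleman-Wunsch global alignment with illustrative scores (match 1, mismatch -1, gap -2) and takes the distance to be one minus the fraction of identical aligned columns. The function names and scoring values are hypothetical choices for illustration, not ClustalW's actual parameters.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b (Needleman-Wunsch) and return the
    fraction of alignment columns holding identical residues."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting columns and identities.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return identical / columns if columns else 1.0


def distance_matrix(seqs):
    """Symmetric matrix of (1 - identity) for every pair of sequences."""
    size = len(seqs)
    d = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d[i][j] = d[j][i] = 1.0 - global_identity(seqs[i], seqs[j])
    return d


matrix = distance_matrix(["GATTACA", "GATTACA", "GCTTACA"])
```

Identical sequences get distance 0, and the matrix is symmetric, so only the upper triangle needs to be computed.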
Multiple Alignment Method
• The steps are summarized as follows:
• Compare all sequences pairwise. 
• Perform cluster analysis on the pairwise data to generate a 
hierarchy for alignment. This may be in the form of a binary tree 
or a simple ordering.
• Build the multiple alignment by first aligning the most similar 
pair of sequences, then the next most similar pair, and so on. Once 
an alignment of two sequences has been made, it is fixed. 
Thus, for a set of sequences A, B, C, D, having aligned A with C and 
B with D, the alignment of A, B, C, D is obtained by comparing the 
alignment of A and C with that of B and D, using averaged scores 
at each aligned position.
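The cluster-analysis step that produces the alignment order can be sketched as follows. This is a minimal illustration that assumes average-linkage clustering over a table of pairwise distances; the `join_order` helper and the distance values are hypothetical, and ClustalW itself builds its guide tree with neighbor-joining.

```python
def join_order(names, dist):
    """Return the order in which clusters are merged.

    names: sequence labels
    dist:  dict mapping frozenset({x, y}) -> pairwise distance
    Each merge is a (left, right) pair of tuples of labels.
    """
    clusters = [(name,) for name in names]

    def average(c1, c2):
        # Mean of all pairwise distances between members of two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: average(clusters[p[0]], clusters[p[1]]))
        left, right = clusters[i], clusters[j]
        merges.append((left, right))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(left + right)
    return merges


# Made-up distances in which A is closest to C and B is closest to D.
example = {
    frozenset("AC"): 0.10, frozenset("BD"): 0.20,
    frozenset("AB"): 0.60, frozenset("AD"): 0.70,
    frozenset("BC"): 0.65, frozenset("CD"): 0.70,
}
order = join_order(["A", "B", "C", "D"], example)
```

With these distances the merges come out as A with C, then B with D, then the two pairs together, which is the A, B, C, D scenario described in the steps above.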
ClustalW- for multiple alignment
• ClustalW is a general-purpose multiple alignment
program for DNA or proteins.
• ClustalW was produced by Julie D. Thompson and Toby
Gibson of the European Molecular Biology Laboratory,
Germany, and Desmond Higgins of the European
Bioinformatics Institute, Cambridge, UK.
• ClustalW is cited as: Thompson, J. D., Higgins, D. G. and
Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix
choice. Nucleic Acids Research, 22:4673-4680.
ClustalW can create multiple alignments,
manipulate existing alignments, do profile
analysis and create phylogenetic trees.
Alignment can be done by 2 methods:
- slow/accurate
- fast/approximate
ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
Input form fields: input sequences, gap scoring, scoring matrix,
email address, output format.
ClustalW - Output
Match strength in decreasing order: * : .
Output of ClustalW
CLUSTAL W (1.7) multiple sequence alignment

HSTNFR    GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------G
SYNTNFTRP GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------G
CFTNFA    -------------------------------------------TGTCCAG------A
CATTNFAA  GGGAAGAG---CTCCCACATGGCCTGCAACTAATCAACCCTCTGCCCCAG------A
RABTNFM   AGGAGGAAGAGTCCCCAAACAACCTCCATCTAGTCAACCCTGTGGCCCAGATGGTCA
RNTNFAA   AGGAGGAGAAGTTCCCAAATGGGCTCCCTCTCATCAGTTCCATGGCCCAGACCCTCA
OATNFA1   GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------A
OATNFAR   GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------A
BSPTNFA   GGGAAGAGCAGTCCCCAGGTGGCCCCTCCATCAACAGCCCTCTGGTTCAA------A
CEU14683  GGGAAGAGCAATCCCCAACTGGCCTCTCCATCAACAGCCCTCTGGTTCAG------A
                                                         **
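The marker line under an alignment block can be reproduced for the simplest case with a short Python sketch. It only emits "*" for columns where every sequence carries the same residue. The ":" and "." markers depend on residue similarity groups that are not given here, so they are left out. The `conserved_line` name and the toy rows are hypothetical.

```python
def conserved_line(rows):
    """Return a marker string with '*' under every column in which all
    aligned sequences share the same residue (gap-only columns excluded)."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("aligned sequences must have equal length")
    markers = []
    for column in zip(*rows):
        fully_conserved = len(set(column)) == 1 and column[0] != "-"
        markers.append("*" if fully_conserved else " ")
    return "".join(markers)


rows = ["GGCCCAG-G",
        "TGTCCAG-A",
        "GGCCCAGAT"]
for row in rows:
    print(row)
print(conserved_line(rows))
```

A column containing a gap in any sequence is never marked, because the gap character then differs from the residues in the other rows.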
Clustal X - Multiple Sequence
Alignment Program
• Clustal X provides a new window-based user interface to the
ClustalW program.
• It uses the Vibrant multi-platform user interface development
library, developed by the National Center for Biotechnology
Information (Bldg 38A, NIH, 8600 Rockville Pike, Bethesda, MD
20894) as part of their NCBI Software Development Toolkit.
ClustalX
CLUSTAL OMEGA
• Clustal Omega is a fast and scalable program written in C
and C++ used for multiple sequence alignment.
• It uses seeded guide trees and a new HMM engine that
focuses on two profiles to generate these alignments.
• The program requires three or more sequences in order to
calculate the multiple sequence alignment; for two sequences,
use pairwise sequence alignment tools (EMBOSS, LALIGN).
• Clustal Omega is consistency-based and is widely viewed as
one of the fastest online implementations of all multiple
sequence alignment tools.
Clustal Omega has five main steps.
The first is producing a pairwise alignment using
the k-tuple method, also known as the word
method. This, in summary, is a heuristic method
that is not guaranteed to find an optimal alignment
solution, but is significantly more efficient than the
dynamic programming method of alignment. After that,
the sequences are clustered using the modified
mBed method. The mBed method calculates
pairwise distance using sequence embedding.
This step is followed by the k-means clustering
method.
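The idea behind the k-tuple (word) comparison can be sketched in a few lines of Python. This is a simplified stand-in, not Clustal Omega's actual scoring: it assumes the distance is one minus the share of distinct words of length k that two sequences have in common. The function names and the default k = 3 are hypothetical.

```python
def ktuples(seq, k):
    """Set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ktuple_distance(a, b, k=3):
    """Rough distance in [0, 1]: 0 when the shorter word set is fully
    shared, 1 when no word of length k is shared."""
    words_a, words_b = ktuples(a, k), ktuples(b, k)
    if not words_a or not words_b:
        return 1.0  # a sequence shorter than k has no words to compare
    shared = len(words_a & words_b)
    return 1.0 - shared / min(len(words_a), len(words_b))


d = ktuple_distance("GATTACA", "GATTTCA")
```

No dynamic programming table is filled in, which is why word-based comparison is much cheaper than a full pairwise alignment, at the cost of being only an approximation.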
ALGORITHM
Next, the guide tree is constructed using the UPGMA
method. This is shown as multiple guide tree steps
leading into one final guide tree construction because
of the way the UPGMA algorithm works.
At each step (each diamond in the flowchart), the
nearest two clusters are combined, and this is repeated
until the final tree can be assessed. In the final step,
the multiple sequence alignment is produced using the
HHAlign package from the HH-Suite, which uses two
profile HMMs.
A profile HMM is a linear state machine consisting of a
series of nodes, each of which corresponds roughly to a
position (column) in the alignment from which it was
built.