2. Jens
Martensson
Content
• KEGG GenomeNet
- Introduction
• ClustalW
- Introduction
- Algorithm
- Flowchart
• Multiple Alignment Method
- Introduction
• ClustalW: Work process
• Introduction to other similar
tools: Clustal Omega / Jalview
• Live demonstration
KEGG - GenomeNet
• KEGG (Kyoto Encyclopedia of Genes and
Genomes) is a collection of databases
dealing with genomes, biological
pathways, diseases, drugs, and chemical
substances.
• KEGG is used for bioinformatics
research and education, including data
analysis in genomics, metagenomics,
metabolomics and other omics studies,
modeling and simulation in systems
biology, and translational research in
drug development.
• GenomeNet is one of the bioinformatics
database services that also provides
analysis tools.
ClustalW
• ClustalW, like the other Clustal tools,
is used for aligning multiple nucleotide
or protein sequences efficiently.
• It uses progressive alignment methods,
which align the most similar sequences
first and work their way down to the
least similar sequences until a global
alignment is created.
• ClustalW is a matrix-based algorithm,
whereas tools like T-Coffee and Dialign
are consistency-based. It is a fairly
efficient algorithm that competes well
against other software.
• The program requires three or more
sequences to calculate a global multiple
alignment. For pairwise alignment of two
sequences, a dedicated pairwise tool
should be used instead.
Algorithm
• ClustalW uses progressive alignment
methods: the sequences with the best
alignment score are aligned first, then
progressively more distant groups of
sequences are aligned.
• This heuristic approach is necessary
due to the time and memory demand
of finding the global optimal solution.
• The first step of the algorithm is
computing a rough distance matrix
from the pairwise alignment of every
pair of sequences.
• The next step is a neighbor-joining
method that uses midpoint rooting to
create an overall guide tree.
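The first two steps can be sketched in Python. This is a minimal illustration under stated assumptions, not ClustalW's implementation: a unit-cost edit distance stands in for ClustalW's pairwise alignment score, and simple average-linkage clustering stands in for neighbor joining with midpoint rooting. The function names and the four example sequences are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost global alignment cost (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    """Rough pairwise distances in [0, 1]: edits divided by the longer length."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(seqs[i], seqs[j]) / max(len(seqs[i]), len(seqs[j]))
            dist[i][j] = dist[j][i] = d
    return dist


def guide_tree(names, dist):
    """Average-linkage clustering (a stand-in for neighbor joining).

    Returns the tree as nested tuples; the closest clusters are joined first.
    """
    # cluster id -> (subtree, indices of the sequences it contains)
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                ma, mb = clusters[a][1], clusters[b][1]
                avg = sum(dist[p][q] for p in ma for q in mb) / (len(ma) * len(mb))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["A", "B", "C", "D"]
seqs = ["ACGTACGT", "TTGTACCA", "ACGTACGA", "TTGTACCT"]
dist = distance_matrix(seqs)
print(guide_tree(names, dist))
```

In this toy input A and C differ by one base, as do B and D, so those pairs are joined first and the two groups are joined last.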
Multiple Alignment Method
The steps are summarized as follows:
• Compare all sequences pairwise.
• Perform cluster analysis on the pairwise
data to generate a hierarchy for
alignment. This may be in the form of a
binary tree or a simple ordering.
• Build the multiple alignment by first
aligning the most similar pair of
sequences, then the next most similar
pair, and so on. Once an alignment of two
sequences has been made, it is fixed.
Thus, for a set of sequences A, B, C, D,
having aligned A with C and B with D,
the alignment of A, B, C, D is obtained by
comparing the alignment of A and C with
that of B and D, using averaged scores at
each aligned position.
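The last step, aligning two fixed alignments with averaged scores per position, can be sketched in Python. This is only an illustration of the idea: the match/mismatch values, the zero score for a residue paired with a gap, the linear gap penalty and all function names are assumptions, not ClustalW's actual scoring scheme.

```python
from itertools import product


def residue_score(a, b, match=1, mismatch=-1):
    """Score one residue pair; a gap paired with anything scores 0 here."""
    if a == "-" or b == "-":
        return 0
    return match if a == b else mismatch


def column_score(col_x, col_y):
    """Average residue score over every cross pair of the two columns."""
    pairs = list(product(col_x, col_y))
    return sum(residue_score(a, b) for a, b in pairs) / len(pairs)


def align_alignments(aln_x, aln_y, gap=-2):
    """Global dynamic programming over columns.

    The rows inside each input alignment stay fixed relative to each other;
    only whole columns of gaps are inserted.
    """
    cols_x = list(zip(*aln_x))
    cols_y = list(zip(*aln_y))
    n, m = len(cols_x), len(cols_y)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_x[i - 1], cols_y[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover the merged columns.
    gap_x = ("-",) * len(aln_x)
    gap_y = ("-",) * len(aln_y)
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + column_score(cols_x[i - 1], cols_y[j - 1])):
            merged.append(cols_x[i - 1] + cols_y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            merged.append(cols_x[i - 1] + gap_y)
            i -= 1
        else:
            merged.append(gap_x + cols_y[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


aln_ac = ["ACGT", "ACGT"]  # A and C, already aligned and fixed
aln_bd = ["AGT", "AGT"]    # B and D, already aligned and fixed
for row in align_alignments(aln_ac, aln_bd):
    print(row)
```

For this toy input the B/D group receives one gap column opposite the C of the A/C group, and the rows come back in the order A, C, B, D.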
ClustalW
For multiple alignment
• ClustalW is a general-purpose multiple
alignment program for DNA or proteins.
• ClustalW was developed by Julie D.
Thompson and Toby Gibson of the European
Molecular Biology Laboratory, Germany,
and Desmond Higgins of the European
Bioinformatics Institute, Cambridge, UK.
• ClustalW can create multiple alignments,
manipulate existing alignments, do
profile analysis and create phylogenetic
trees.
• Alignment can be done by 2 methods:
- slow/accurate
- fast/approximate
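The trade-off between the two methods can be illustrated in Python. In ClustalW the fast mode is based on k-tuple (word) matching and the slow mode on full dynamic programming. The sketch below is a simplified stand-in for both: the fast score is the fraction of shared distinct k-tuples, and the slow score is a longest-common-subsequence identity. The function names and the default k = 2 are assumptions made for the example.

```python
def ktuple_similarity(a, b, k=2):
    """Fast/approximate: shared distinct k-tuples over the smaller k-tuple set.

    No alignment is computed; only short words are compared.
    """
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def dp_identity(a, b):
    """Slow/accurate stand-in: full dynamic programming table.

    Returns the longest common subsequence length divided by the longer
    sequence length, at a cost proportional to len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


x, y = "ACGTACGT", "ACGTTCGT"  # one substitution apart
print(ktuple_similarity(x, y), dp_identity(x, y))
```

The word-based score is cheaper to compute but coarser: a single substitution removes every k-tuple that overlaps it, so the fast score drops further than the dynamic programming score for the same pair.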