ClustalW
Rohith BH 1OX18BT031(Student)
The Department of Biotechnology
The Oxford College of Engineering
Bangalore
Jens
Martensson
Content
• KEGG GenomeNet - Introduction
• ClustalW - Introduction
- Algorithm
- Flowchart
• Multiple Alignment Method
- Introduction
• ClustalW – Work process
• Introduction to other similar tools – Clustal Ω / Jalview
• Live demonstration
KEGG - GenomeNet
• KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances.
• KEGG is used for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.
• GenomeNet is a network of bioinformatics databases and analysis tools, including a ClustalW web service.
ClustalW
• ClustalW, like the other Clustal tools, is used for aligning multiple nucleotide or protein sequences efficiently.
• It uses a progressive alignment method: the most similar sequences are aligned first, and the program works its way down to the least similar sequences until a global alignment is created.
• ClustalW is a matrix-based algorithm, whereas tools like T-Coffee and Dialign are consistency-based. ClustalW is a fairly efficient algorithm that competes well against other software.
• The program requires three or more sequences to calculate a global multiple alignment. For only two sequences, a pairwise sequence alignment tool is used instead.
Algorithm
• ClustalW uses a progressive alignment method. Sequences with the best alignment score are aligned first, then progressively more distant groups of sequences are aligned.
• This heuristic approach is necessary because finding the globally optimal solution demands too much time and memory.
• The first step of the algorithm is computing a rough distance matrix between each pair of sequences, based on pairwise sequence alignment.
• The next step is a neighbor-joining method that uses midpoint rooting to create an overall guide tree.
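
The first step above (a rough pairwise distance matrix) can be sketched as follows. This is a minimal illustration, not ClustalW's actual implementation: it assumes a plain Needleman-Wunsch global alignment with hypothetical match, mismatch and linear gap scores, and defines distance as 1 minus the fraction of identical residues over the aligned, non-gap columns. The function names are made up for this example.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (hypothetical scores, linear gaps).

    Returns the two gapped strings of one optimal alignment.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical residues over non-gap aligned columns."""
    al_a, al_b = global_align(a, b)
    pairs = [(x, y) for x, y in zip(al_a, al_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    identical = sum(x == y for x, y in pairs)
    return 1.0 - identical / len(pairs)


def distance_matrix(seqs):
    """seqs: dict name -> sequence. Returns a symmetric dict-of-dicts."""
    names = list(seqs)
    dist = {name: {name: 0.0} for name in names}
    for p, q in combinations(names, 2):
        dist[p][q] = dist[q][p] = distance(seqs[p], seqs[q])
    return dist
```

Calling distance_matrix on a dictionary of named sequences returns the kind of table from which a guide tree could then be built.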
Program flowchart
Multiple Alignment Method
The steps are summarized as follows:
• Compare all sequences pairwise.
• Perform cluster analysis on the pairwise data to generate a hierarchy for alignment. This may be in the form of a binary tree or a simple ordering.
• Build the multiple alignment by first aligning the most similar pair of sequences, then the next most similar pair, and so on. Once an alignment of two sequences has been made, it is fixed. Thus, for a set of sequences A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D is obtained by comparing the alignment of A and C with that of B and D, using averaged scores at each aligned position.
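
The clustering step and the averaged column score described above can be sketched as follows. This is only an illustration under stated assumptions: it uses simple average-linkage clustering (not the neighbor-joining method ClustalW uses for its guide tree), the distances for A, B, C, D are hypothetical, and the function names are invented for this example.

```python
def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): distance}."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def merge_order(dist):
    """Average-linkage clustering; returns the merges in the order performed.

    Each merge is a pair of clusters, and each cluster is a tuple of names.
    The closest pair of clusters is always merged first, which gives the
    order in which sequences or groups would be aligned.
    """
    clusters = [(name,) for name in sorted(dist)]
    merges = []

    def avg(c1, c2):
        # Mean distance over all cross-cluster sequence pairs.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        left, right = clusters[i], clusters[j]
        merges.append((left, right))
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(left + right)
    return merges


def column_score(col1, col2, match=1.0, mismatch=0.0):
    """Average pairwise score between two alignment columns.

    col1 and col2 are strings holding the residues of one column from
    each of the two alignments being compared (hypothetical 1/0 scoring).
    """
    total = sum(match if x == y else mismatch for x in col1 for y in col2)
    return total / (len(col1) * len(col2))


# Hypothetical distances: A is closest to C, and B is closest to D.
example = symmetric({
    ("A", "C"): 0.1, ("B", "D"): 0.2,
    ("A", "B"): 0.6, ("A", "D"): 0.6,
    ("C", "B"): 0.6, ("C", "D"): 0.6,
})
order = merge_order(example)
```

With these distances, A is joined with C first, then B with D, and finally the two pairs are joined, mirroring the A, B, C, D example in the slide.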
ClustalW
For multiple alignment
• ClustalW is a general-purpose multiple alignment program for DNA or proteins.
• ClustalW was developed by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory, Germany, and Desmond Higgins of the European Bioinformatics Institute, Cambridge, UK.
• ClustalW can create multiple alignments, manipulate existing alignments, do profile analysis and create phylogenetic trees.
• Alignment can be done by 2 methods:
- slow/accurate
- fast/approximate
ClustalW - Input
Output format
Input sequences
Scoring matrix
Gap scoring
ClustalW - Input
Download
• Downloading Protein
sequence in FASTA format
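
Once a FASTA file has been downloaded, its records can be read with a few lines of code before pasting or submitting them for alignment. The sketch below is a minimal parser written for illustration. It assumes the usual layout of a header line starting with ">" followed by one or more sequence lines, and it keeps only the first word of each header as the record name. The function name is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence.

    Assumes each record starts with a '>' header line and that the
    sequence may be wrapped over several lines.
    """
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

For a file on disk, the text can be read with open(path).read() and passed to parse_fasta. The resulting dictionary maps each record name to its full, unwrapped sequence.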
ClustalW - Input
Sequences are
Entered
ClustalW - Input
Sequences for
alignment
https://textsaver.flap.tv/lists/4a27
ClustalW - Output
Match strength in
decreasing order
ClustalW - Output
Guide Tree
ClustalW - Output
Phylogram
Similar tools
Clustal Ω / Jalview
Resources
• https://www.genome.jp/tools-bin/clustalw
• https://www.uniprot.org/uniprot/P02769
• Bioinformatics Tools for Multiple Sequence
Alignment < EMBL-EBI
• https://en.wikipedia.org/wiki/KEGG
• https://www.google.com
Thank You
Rohith BH
Rohithbadimalaharinath@gmail.com
