Report on Phylogenetic tree
Introduction to bioinformatics
Page 1
Government Postgraduate College Mandian Abbottabad.
Report on Phylogenetic tree
Subject: Introduction to Bioinformatics
SUBMITTED BY:
Name: Zarlish Attique
Registration no: 187104
BS Bioinformatics Semester 04
SUBMITTED TO:
Name: Sir Muhammad Rizwam
Department: Bioinformatics
Date: June 19, 2020
Contents
1. Phylogenetics:-
  1. Description:-
  2. Phylogenetic inference methods:-
  3. About taxonomy:-
  4. Brief History:-
    4.2. 1866
  5. Evolution:-
2. Evolution of Bioinformatics tools:-
  1.1. Bioinformatics experts
  1.2. Development and use of computational and an array of bioinformatics tools
3. Phylogenetic tree
  1. Computational phylogenetics:-
  2. Traditional phylogenetics and recent phylogenetics:-
  3. Molecular data such as DNA sequence for genes and amino acid sequence for proteins:-
  4. Evolutionary history and relationship:-
  5. Phylogenetic tree is a graphical representation
4. Types of Phylogenetic trees:-
5. Method for constructing Phylogenetic tree:-
List of Methods for constructing trees:-
  Character State method
  Method for validation of phylogenetic tree
  Table 1:- Representing methods
Online software available for Phylogenetic analysis
  Desktop Software
  Libraries:-
Unweighted Pair Group Method with Arithmetic Mean
  1.1. Description:-
  Tree consisting of 6 OTUs
  Another Example of UPGMA
The Neighbor-Joining Method
  1. Note:-
  Advantages and disadvantages of the neighbor-joining method
Maximum parsimony (MP):
Character based Method
  Maximum-likelihood (ML):
Bootstrapping:-
Multiple sequence alignment (MSA)
  Description:-
Practical Section:-
  ClustalW
  Access:-
ClustalW for Phylogenetic tree construction:-
  6. ClustalW | Result Interpretation
Applications of Phylogenetic tree construction:-
References
Phylogenetic tree
1, Phylogenetics:-
1. Description:-
In biology, phylogenetics (Greek:– phylé, phylon = tribe, clan, race + genetikós = origin,
source, birth) is a part of systematics that addresses the inference of
the evolutionary history and relationships among or within groups
of organisms (e.g. species, or more inclusive taxa).
Figure 1 represents the derivation of phylogenetics.
2. Phylogenetic inference methods:-
These relationships are hypothesized by phylogenetic inference methods that evaluate
observed heritable traits, such as DNA sequences or morphology, often under a specified
model of evolution of these traits.
3. About taxonomy:-
Taxonomy is the identification, naming and classification of organisms. Classifications
are now usually based on phylogenetic data, and many systematists contend that
only monophyletic taxa should be recognized as named groups.
3.1. School of taxonomy:-
The degree to which classification depends on inferred evolutionary history differs
depending on the school of taxonomy: phenetics ignores phylogenetic speculation
altogether, trying to represent the similarity between organisms
instead; cladistics (phylogenetic systematics) tries to reflect phylogeny in its
classifications by only recognizing groups based on shared, derived characters
(synapomorphies); evolutionary taxonomy tries to take into account both the branching
pattern and "degree of difference" to find a compromise between them.
Figure 1 (diagram text): phylon = tribe, clan, race; genetikós = origin, source, birth; phylogenetics = the inference of the evolutionary history and relationships.
Figure represents the taxonomic classification of one example species, Homo sapiens.
4. Brief History:-
The term "phylogeny" derives from the German Phylogenie, introduced by Haeckel in
1866, and the Darwinian approach to classification became known as the "phyletic"
approach.
4.1. 1858 Heinrich Georg Bronn
In 1858, paleontologist Heinrich Georg Bronn (1800–1862) published a hypothetical tree
illustrating the paleontological "arrival" of new, similar species following the extinction
of an older species. Bronn did not propose a mechanism responsible for such phenomena;
his tree is regarded as a precursor concept.
Branching tree diagram from Heinrich Georg Bronn's work (1858)
4.2. 1866 Ernst Haeckel
In 1866, Ernst Haeckel first published his phylogeny-based evolutionary tree, likewise a
precursor concept.
Figure represents Phylogenetic tree suggested by Haeckel (1866).
5. Evolution:-
Evolution, also known as descent with modification, is the change in heritable traits of
biological organisms over generations due to natural selection, mutation, gene flow, and
genetic drift. Over time these evolutionary processes lead to the formation of new species
(speciation), changes within lineages (anagenesis), and loss of species (extinction).
Figure A diagram showing the relationships between various groups of organisms and
concept of evolution.
"Evolution" is also another name for evolutionary biology, the subfield
of biology concerned with studying evolutionary processes that produced the diversity of
life on Earth.
2. Evolution of Bioinformatics tools:-
1.1. Bioinformatics experts
Bioinformatics experts have developed a large collection of tools to make sense of the
rapidly growing body of molecular biology data. Biological systems are complex, and
understanding them often requires combining data sets and using more than one tool.
Bioinformatics experts have therefore experimented with a number of strategies for
integrating data sets and tools.
Studying a complex biological system usually requires gathering a variety of data from a
variety of sources, so multiple tools are needed. There is therefore a clear need for
technology that combines both data and tools into a workflow that biologists can use easily.
1.2. Development and use of computational and an array of bioinformatics tools
The development and use of computational methods and an array of bioinformatics tools
makes it possible to analyze large data sets in practical computing times and to obtain
optimal or near-optimal solutions with high probability. In response to this trend,
much of the current research in phyloinformatics (i.e., computational phylogenetics)
concentrates on the development of more efficient heuristic approaches.
Figure represents the data storage to computer with the evolution of Bioinformatics tools.
***“----The phylogenetic tree----”****
Figure The root of the tree of life.
3. Phylogenetic tree
----- General Description ----
1. Computational phylogenetics:-
Computational phylogenetics is the application of computational algorithms, methods,
and programs to phylogenetic analyses. The goal is to assemble a phylogenetic
tree representing a hypothesis about the evolutionary ancestry of a set of genes, species,
or other taxa.
1.1. Example:-
For example, these techniques have been used to explore the family tree of gene α-
hemoglobin and the relationships between specific genes.
Figure The gene tree for the gene α-hemoglobin compared to the species tree. Both
match because the gene evolved from common ancestors.
2. Traditional phylogenetics and recent phylogenetics:-
Traditional phylogenetics relies on morphological data obtained by measuring and
quantifying the phenotypic properties of representative organisms, while the more recent
field of molecular phylogenetics uses nucleotide sequences encoding genes or amino
acid sequences of proteins as the basis for classification.
Many forms of molecular phylogenetics are closely related to and make extensive use
of sequence alignment in constructing and refining phylogenetic trees, which are used to
classify the evolutionary relationships between homologous genes represented in
the genomes of divergent species. The phylogenetic trees constructed by computational
methods are unlikely to perfectly reproduce the evolutionary tree that represents the
historical relationships between the species being analyzed. The historical species tree
may also differ from the historical tree of an individual homologous gene shared by those
species.
Figure Tree of life focused on the relation between human and apes.
3. Molecular data such as DNA sequence for genes and amino acid sequence for proteins:-
Phylogenetic analysis using molecular data, such as DNA sequences of genes and amino
acid sequences of proteins, is very common not only in evolutionary biology
but also across molecular biology. The reason is that DNA sequencing
has become widespread and a huge amount of gene and protein sequence data is
available in public online databases. Since many molecules (genes or proteins) with
different evolutionary rates are available, it is important to choose a suitable
molecule for the phylogenetic analysis of a given lineage.
3.1.Example:-
For example, when the evolutionary rate of the gene (or protein) is too high for a
given lineage, nucleotide (or amino acid) substitutions become saturated: the same site
has changed more than once, so the observed differences underestimate the true number
of changes. In this case, the accuracy of the phylogenetic analysis decreases. The
methods for phylogenetic analysis are improving along with advances in computer science.
Thus, there are many methods to infer phylogenetic trees, and many programs are
available for each method.
4. Evolutionary history and relationship:-
Phylogenetic analysis is a method to elucidate the evolutionary history and relationships
among a group of organisms. In the past, phylogenetic analysis was based on morphological
comparison among fossils, but the information from fossils was limited. Now,
molecular phylogenetic analysis using molecular data such as DNA or protein sequences has
become popular.
4.1. Reasons for popularity:-
There are several reasons for this. These include:
(1) The popularity of DNA sequencing methods
(2) The establishment of methods for phylogenetic tree construction using gene or protein
sequences
(3) The fact that the results of a phylogenetic analysis can be treated quantitatively
(4) The availability of many programs for constructing phylogenetic trees.
The knowledge from phylogenetic analysis contributes to basic biology (e.g. evolutionary
history of species, the evolution of genes, and identification of sampled species) as well
as applied biology (e.g. investigation of the route of infection of pathogenic
microorganisms). Phylogenetic trees are commonly constructed to figure out the
evolutionary relationships among species.
Selection of the molecules (genes or proteins)
DNA sequences of genes, RNA sequences of functional RNAs, or amino acid sequences
of proteins are used for phylogenetic analysis. There are two focal points in choosing the
molecule for phylogenetic analysis. First, the gene must be shared by all of the given
species. Second, the gene must have a suitable evolutionary rate, because proteins have
varied evolutionary rates (Miyata et al., 1980). If the species are distantly related, a
molecule with a low evolutionary rate should be chosen. This is because nucleotide or
amino acid substitutions reach saturation between distant species
when the evolutionary rate is high. Note that the nucleotide sequence of a gene reaches
saturation more easily than the amino acid sequence of the encoded protein. For distantly
related species, housekeeping genes, which have low evolutionary rates, are suitable.
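The saturation effect described above can be illustrated numerically. The sketch below is not part of the original report: it assumes the Jukes-Cantor (JC69) substitution model for nucleotides, and the function name jc69_distance is hypothetical. Under that model the corrected distance is d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of differing sites. As p approaches 0.75 the corrected distance grows without bound, which is one way to see why fast-evolving molecules give unreliable distances between distant species.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed mismatch fraction p.

    Returns math.inf when p >= 0.75, the point at which the sequences are
    saturated and the model can no longer estimate a distance.
    (Hypothetical helper written for this illustration.)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction is small for similar sequences and explodes near p = 0.75.
for p in (0.05, 0.30, 0.60, 0.74):
    print(f"observed p = {p:.2f} -> corrected d = {jc69_distance(p):.3f}")
```

For small p the corrected and observed distances nearly coincide; the gap between them widens as multiple substitutions at the same site become more likely.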
5. Phylogenetic tree is a graphical representation
A phylogenetic tree is a graphical representation of the evolutionary relationships among
entities that share a common ancestor. Those entities can be species, genes, genomes, or
any other operational taxonomic unit (OTU).
More specifically, a phylogenetic tree, with its pattern of branching, represents the
descent from a common ancestor into distinct lineages. It is critical to understand that the
branching patterns and branch lengths that make up a phylogenetic tree can rarely be
observed directly, but rather they must be inferred from other information. The principle
underlying phylogenetic inference is quite simple: Analysis of the similarities and
differences among biological entities can be used to infer the evolutionary history of
those entities.
Figure The gene tree for the gene Glycosyl Hydrolase compared to the species tree. The
trees do not match because of the horizontal gene transfer (HGT).
4. Types of Phylogenetic trees:-
The branches of a phylogenetic tree may be represented in two different ways:
1.1.Scaled and Unscaled Trees
Scaled branches
Branches will be different lengths based on the number of evolutionary changes or distance.
Unscaled branches
All branches in the tree are the same length.
Figure represents the scaled and unscaled branches trees.
Species and Gene Trees
Species Trees
“Species” Trees recover the genealogy of taxa, individuals of a population, etc.
Internal nodes represent speciation or other taxonomic events.
Species trees should contain sequences from only orthologous genes.
Gene Trees
Gene trees represent the evolutionary history of the genes included in the study.
Gene trees can provide evidence for gene duplication events, as well as speciation events. Sequences
from different homologs can be included in a gene tree; the subsequent analyses should cluster
orthologs, thus demonstrating the evolutionary history of the orthologs.
Rooted versus Unrooted Trees
Rooted phylogenetic tree
In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common
ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates. In
rooted trees the ancestral state of organisms or genes is shown at the bottom of the tree, and the tree
branches, or bifurcates, until it reaches the terminal branches, tips or leaves at the top of the tree.
Rooted trees show the most basal ancestor of the tree in question.
There are competing techniques for rooting a tree; one of the most common methods is through the use of
an "outgroup" (The Parsimony Methods).
Unrooted phylogenetic tree
An unrooted phylogenetic tree does not show an ancestral root. An unrooted binary tree is an unrooted tree in which
each vertex has either one or three neighbors.
Unrooted trees represent the branching order but do not indicate the root or location of the last common
ancestor.
Unrooted trees show the relatedness of organisms without indicating ancestry.
Figure represents the unrooted tree with unscaled branches.
2. Terms used to describe rooted and unrooted tree:-
1.1.Clade
An ancestor (an organism, population, or species) and all of its descendants.
1.1.1. Sister clade
One member of a pair of clades originating when a single lineage splits into two. Sister
clades thus share an exclusive common ancestry and are mutually most closely related to
one another in terms of common ancestry.
1.2.Ancestor
An entity from which another entity is descended
1.3.Node
A point or vertex on a tree (in the sense of graph theory). On a phylogenetic tree, a node
is commonly used to represent (1) the split of one lineage to form two or more lineages
(internal node) or the extinction of a lineage (terminal node) or the lineage at a specified
time, often the present (terminal node), or (2) a taxon, whether ancestral (internal node)
or descendant (internal node or terminal node).
1.4.Root
The root of the tree represents the ancestral lineage, and the tips of the branches
represent the descendants of that ancestor
1.5.Leaf
Each leaf on a phylogenetic tree represents a taxon.
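The terms above map naturally onto a small tree data structure. The following is a minimal sketch, not taken from the report: the class name Node, the function clade_leaves and the three-taxon example tree are all hypothetical and chosen only for illustration. A node without children plays the role of a leaf (a taxon), a node with children is an internal node (an ancestor), and the clade of a node is that node together with all of its descendants.

```python
class Node:
    """A vertex of a rooted tree; a node without children is a leaf (taxon)."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []

    def is_leaf(self):
        return not self.children


def clade_leaves(node):
    """Return the names of all taxa in the clade rooted at `node`."""
    if node.is_leaf():
        return [node.name]
    leaves = []
    for child in node.children:
        leaves.extend(clade_leaves(child))
    return leaves


# Hypothetical example tree ((A,B),C): A and B are sister taxa.
a, b, c = Node("A"), Node("B"), Node("C")
ab = Node(children=[a, b])      # internal node: common ancestor of A and B
root = Node(children=[ab, c])   # root: ancestor of every taxon in the tree

print(clade_leaves(ab))    # ['A', 'B']
print(clade_leaves(root))  # ['A', 'B', 'C']
```

In this toy tree the clade (A,B) and the leaf C are sister clades, because they arise from the single split at the root.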
Figure represents terms used to describe rooted and unrooted tree.
5. Method for constructing Phylogenetic tree:-
Summary:-
(1) The first step in doing phylogenetics is to choose the sequences from which the
tree should be constructed. Very popular sequences for constructing phylogenetic trees
are the sequences of rRNA (the RNA the ribosome is built of) and mitochondrial
genes. This genetic material is present in almost all organisms and has enough
mutations to reliably construct a tree.
(2) The second step is to construct pairwise and
multiple sequence alignments from these sequences.
(3) The third step is to choose a
method for constructing a phylogenetic tree. There are 3 categories: distance-based,
maximum parsimony, and maximum likelihood. Maximum parsimony should be
chosen for strong sequence similarities because too much variation results in many
possible trees. For the same reason only a few sequences (fewer than 15) should be used.
Distance-based methods (e.g. ClustalW) require less similarity among the sequences
than maximum parsimony methods, but sequence similarities should be present: some
sequences should be similar to one another and others less similar. Distance-based
methods can be applied to a set of many sequences. Maximum likelihood methods
may be used for very variable sequences, but the computational costs increase with the
number of sequences as every possible tree must be considered.
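To give a feeling for why considering every possible tree becomes expensive, the sketch below counts the distinct fully bifurcating tree topologies for a given number of taxa. It is an illustration added here, not part of the report: the function names are hypothetical, and it relies on the standard combinatorial results that n labelled taxa admit (2n - 5)!! unrooted and (2n - 3)!! rooted binary trees, where !! denotes the product of odd numbers.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def count_rooted_trees(n_taxa):
    """Number of distinct rooted, fully bifurcating trees: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("a rooted binary tree needs at least 2 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):   # 3 * 5 * ... * (2n - 3)
        count *= k
    return count


for n in (4, 10, 15):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Four taxa give only 3 unrooted trees, ten taxa already give 2,027,025, and fifteen taxa give 7,905,853,580,625, which is consistent with the advice above to keep exhaustive methods to small sets of sequences.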
Figure represents the method for constructing phylogenetic tree.
This method is used in the practical section.
List of Methods for constructing trees:-
Distance matrix method
1. UPGMA
2. Transformed distance method
3. Neighbor’s Relation method
4. Neighbor joining method
5. Fitch and Margoliash method
Character State method
1. Maximum likelihood approach
Method for validation of phylogenetic tree
1. Bootstrapping
2. Felsenstein’s bootstrap test
Figure represents the methods for constructing phylogenetic trees: distance matrix methods, character state methods, and validation of the phylogenetic tree.
Table 1:- Representing methods
Method | Advantage | Disadvantage | Other information
Maximum parsimony | Appropriate for very similar sequences and a small number of sequences | Very time-consuming as it tests all possible trees; may fail for diverged sequences; suffers from long-branch attraction | Predicts the evolutionary tree that minimizes the number of steps required to generate the observed variation in the sequences; the tree is built with the fewest changes required to explain the differences observed in the data
Maximum likelihood | Suitable for very dissimilar sequences; hypotheses about evolutionary relationships can be formulated; more accurate phylogenetic trees can be constructed for a small number of taxa in a reasonable time frame | A slow search algorithm will lead to slow response; takes a long time for large datasets | Tries to find the tree that has the highest probability of generating the input sequences under a given evolutionary model
Neighbour joining | Faster than the character-based methods; can be used with a variety of models | Conversion from sequence data to distance data leads to loss of information | Provides an unrooted tree and a single resultant tree
UPGMA | Reliable for related sequences | Assumes that the evolution rate is constant in all branches | Provides a rooted tree
Fitch-Margoliash | Less sensitive to variations in evolutionary rate | Dependent on the model used to obtain the distance matrix |
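The maximum parsimony entry in Table 1 speaks of the tree that needs the fewest changes to explain the data. One common way to count those changes for a given tree is Fitch's small-parsimony algorithm; it is named here as an illustration and is not described in the report. The sketch below scores a single alignment column on a rooted binary tree written as nested tuples. The function name fitch_score, the four taxa and the site pattern are hypothetical.

```python
def fitch_score(tree, states):
    """Return (state_set, changes) for one alignment column.

    tree   -- a leaf name (str) or a 2-tuple of subtrees,
              e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):                 # leaf: observed state, no change
        return {states[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], states)
    right_set, right_changes = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:                                # children agree: no extra change
        return common, left_changes + right_changes
    # children disagree: keep both options and count one substitution
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical site pattern: A and B carry G, C and D carry T.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}

print(fitch_score((("A", "B"), ("C", "D")), site)[1])  # 1
print(fitch_score((("A", "C"), ("B", "D")), site)[1])  # 2
```

For this site the grouping ((A,B),(C,D)) needs one change and ((A,C),(B,D)) needs two, so the first tree is the more parsimonious explanation. A full analysis would sum such scores over every column and compare many candidate trees.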
Online software available for Phylogenetic analysis
This list of phylogenetic tree viewing software is a compilation of software tools and web
portals used in visualising phylogenetic trees.
Software:-
Name | Description
Aquapony | Javascript tree viewer for Beast
ETE toolkit Tree Viewer | an online tool for phylogenetic tree view (newick format) that allows multiple sequence alignments to be shown together with the trees (fasta format)
EvolView | an online tool for visualizing, annotating and managing phylogenetic trees
IcyTree | Client-side Javascript SVG viewer for annotated rooted trees. Also supports phylogenetic networks
Iroki | Automatic customization and visualization of phylogenetic trees
iTOL - interactive Tree Of Life | annotate trees with various types of data and export to various graphical formats; scriptable through a batch interface
Microreact | Link, visualise and explore sequence and meta-data using phylogenetic trees, maps and timelines
OneZoom | uses IFIG (Interactive Fractal Inspired Graphs) to display phylogenetic trees which can be zoomed in on to increase detail
Phylo.io | View and compare up to 2 trees side by side with interactive HTML5 visualisations
PhyloExplorer | a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database.
PHYLOViZ Online | Web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees
PhyloWidget | view, edit, and publish phylogenetic trees online; interfaces with databases
T-REX (Webserver) | Tree inference and visualization (hierarchical, radial and axial tree views), Horizontal gene transfer detection and HGT network visualization
TidyTree | A client-side HTML5/SVG Phylogenetic Tree Renderer, based on D3.js
TreeVector | scalable, interactive, phylogenetic trees for the web, produces dynamic SVG or PNG output, implemented in Java
Desktop Software
Name | Description | OS
ARB | An integrated software environment for tree visualisation and annotation | LM
Archaeopteryx | Java tree viewer and editor (used to be ATV) |
BioNumerics | Universal platform for the management, storage and analysis of all types of biological data, including tree and network inference of sequence data | W
Bio::Phylo | A collection of Perl modules for manipulating and visualizing phylogenetic data. Bio::Phylo is one part of a comprehensive suite of Perl biology tools | All
Dendroscope | An interactive viewer for large phylogenetic trees and networks | All
DensiTree | A viewer capable of viewing multiple overlaid trees. | All
JEvTrace | A multivalent browser for sequence alignment, phylogeny, and structure. Performs an interactive Evolutionary Trace[21] and other phylogeny-inspired analysis. | All
MEGA | Software for statistical analysis of molecular evolution. It includes different tree visualization features | All
MultiDendrograms | Interactive open-source application to calculate and plot phylogenetic trees | All
PHYLOViZ | Phylogenetic inference and data visualization for allelic/SNP sequences profiles using Minimum Spanning Trees | All
TreeDyn | Open-source software for tree manipulation and annotation allowing incorporation of meta information | All
Treevolution | Open-source tool for circular visualization with section and ring distortion and several other features such as branch clustering and pruning | All
TreeGraph 2 | Open-source tree editor with numerous editing and formatting operations including combining different phylogenetic analyses | All
TreeView | Treeviewing software | All
UGENE | An opensource visual interface for Phylip 3.6 package | All
OS: "All" refers to Microsoft Windows, Apple OSX and Linux; L=Linux, M=Apple Mac, W=Microsoft Windows
Libraries:-
Name | Language | Description
ggtree | R | An R package for tree visualization and annotation with grammar of graphics supported
jsPhyloSVG | Javascript | open-source javascript library for rendering highly-extensible, customizable phylogenetic trees; used for Elsevier's interactive trees
PhyD3 | Javascript | interactive phylogenetic tree visualization with numerical annotation graphs, with SVG or PNG output, implemented in D3.js
phylotree.js | Javascript | phylotree.js is a library that extends the popular data visualization framework D3.js, and is suitable for building JavaScript applications where users can view and interact with phylogenetic trees
Phytools | R | Phylogenetic Tools for Comparative Biology (and Other Things) based in R
toytree | Python | Toytree: A minimalist tree visualization and manipulation library for Python
Methods for Phylogenetic tree:-
Distance matrix methods: advantages and disadvantages
Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic
distance" between the sequences being classified, and therefore they require an MSA (multiple
sequence alignment) as an input. Distance is often defined as the fraction of mismatches at
aligned positions, with gaps either ignored or counted as mismatches. Distance methods attempt
to construct an all-to-all matrix from the sequence query set describing the distance between each
sequence pair. From this is constructed a phylogenetic tree that places closely related sequences
under the same interior node and whose branch lengths closely reproduce the observed distances
between sequences. Distance-matrix methods may produce either rooted or unrooted trees,
depending on the algorithm used to calculate them. They are frequently used as the basis for
progressive and iterative types of multiple sequence alignment. The main disadvantage of
distance-matrix methods is their inability to efficiently use information about local high-variation
regions that appear across multiple subtrees.
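The distance definition given above (the fraction of mismatches at aligned positions) can be sketched in a few lines. This is an illustrative example only: the function names and the toy alignment are invented, and the sketch adopts the convention of ignoring gapped positions, which is one of the two options mentioned in the text.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of mismatches at aligned positions, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue                      # gaps are skipped, not counted
        compared += 1
        mismatches += x != y
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return mismatches / compared


def distance_matrix(alignment):
    """All-to-all distances, keyed by (name_a, name_b) for each pair."""
    names = list(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy alignment, invented for illustration only.
alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "AC-TTCGA",
}

for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 3))
```

The resulting dictionary is the all-to-all matrix that a clustering method such as UPGMA or neighbor joining would take as input.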
Unweighted Pair Group Method with Arithmetic Mean
1.1.Description:-
UPGMA: Unweighted Pair Group Method with Arithmetic Mean: A simple
clustering method that assumes a constant rate of evolution (molecular clock
hypothesis). It needs a distance matrix of the analysed taxa that can be calculated
from a multiple alignment.
UPGMA stands for :
Unweighted Pair-Group Method with Arithmetic mean
Unweighted – all pairwise distances contribute equally.
Pair-Group – groups are combined in pairs (dichotomies only).
Arithmetic mean – pairwise distances to each group (clade) are mean distances to
all members of that group.
1.2.Construction of a distance tree using clustering with the Unweighted Pair Group
Method with Arithmatic Mean (UPGMA).
The UPGMA is the simplest method of tree construction. It was originally developed
for constructing taxonomic phenograms, i.e. trees that reflect the phenotypic
similarities between operational taxonomic units (OTUs), but it can also be used to
construct phylogenetic trees if the rates of evolution are approximately constant
among the different lineages.
For this purpose the number of observed nucleotide or amino-acid substitutions can
be used. UPGMA employs a sequential clustering algorithm, in which local
topological relationships are identified in order of similarity, and the phylogenetic
tree is built in a stepwise manner.
We first identify from among all the OTUs the two OTUs that are most similar to
each other and then treat these as a new single OTU. Such an OTU is referred to as a
composite OTU. Subsequently, from among the new group of OTUs we identify the
pair with the highest similarity, and so on, until we are left with only two
OTUs.
Suppose we have the following tree consisting of 6 OTUs:
Tree consisting of 6 OTUs
The pairwise evolutionary distances are given by the following distance
matrix:
     A    B    C    D    E
B    2
C    4    4
D    6    6    6
E    6    6    6    4
F    8    8    8    8    8
1.1.1. Step 1:-
We now cluster the pair of OTUs with the smallest distance, being A and B,
that are separated by a distance of 2. The branching point is positioned at a
distance of 2 / 2 = 1 substitution. We thus construct a subtree as follows:
Following the first clustering A and B are considered as a single composite
OTU(A,B) and we now calculate the new distance matrix as follows:
dist(A,B),C = (distAC + distBC) / 2 = 4
dist(A,B),D = (distAD + distBD) / 2 = 6
dist(A,B),E = (distAE + distBE) / 2 = 6
dist(A,B),F = (distAF + distBF) / 2 = 8
In other words the distance between a simple OTU and a composite OTU is
the average of the distances between the simple OTU and the constituent
simple OTUs of the composite OTU. Then a new distance matrix is
built from the newly calculated distances and the whole cycle is
repeated:
1.1.2. Step 2:-
        A,B   C     D     E
C       4
D       6     6
E       6     6     4
F       8     8     8     8
1.1.3. Step 3:-
        A,B   C     D,E
C       4
D,E     6     6
F       8     8     8
1.1.4. Step 4:-
        AB,C  D,E
D,E     6
F       8     8
1.1.5. Step 5:-
The final step consists of clustering the last OTU, F, with the composite OTU.
ABC,DE
F 8
Although this method leads essentially to an unrooted tree, UPGMA assumes
equal rates of mutation along all the branches, as the model of evolution used.
The theoretical root, therefore, must be equidistant from all OTUs. We can
here thus apply the method of mid-point rooting. The root of the entire tree is
then positioned at dist (ABCDE),F / 2 = 4.
1.1.6. Final tree:-
The final tree as inferred by using the UPGMA method is shown below.
So now we have reconstructed the phylogenetic tree using the UPGMA method. As you can see
we have obtained the original phylogenetic tree we started with.
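The clustering steps worked through above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `upgma`, the nested-dict input and the Newick-style output are choices made for this example, and ties between equally close pairs are simply resolved in favour of the first pair encountered.

```python
def upgma(dist):
    """Cluster OTUs by UPGMA.

    dist: symmetric nested dict, dist[a][b] = distance between OTUs a and b.
    Returns (newick_string, merges), where merges lists (a, b, node_height).
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    size = {a: 1 for a in d}       # number of simple OTUs in each cluster
    height = {a: 0.0 for a in d}   # height of each cluster's node
    merges = []
    while len(d) > 1:
        # Closest pair of clusters (first one found if several tie).
        pairs = ((x, y) for x in d for y in d if x < y)
        a, b = min(pairs, key=lambda p: d[p[0]][p[1]])
        h = d[a][b] / 2  # the branching point lies halfway between a and b
        new = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        merges.append((a, b, h))
        # Distance to the composite OTU = mean over its simple OTUs.
        row = {
            c: (size[a] * d[a][c] + size[b] * d[b][c]) / (size[a] + size[b])
            for c in d if c not in (a, b)
        }
        del d[a], d[b]
        for c in d:
            del d[c][a], d[c][b]
            d[c][new] = row[c]
        d[new] = row
        size[new] = size[a] + size[b]
        height[new] = h
    return next(iter(d)) + ";", merges


# Distance matrix of the six OTUs from the example (lower triangle).
labels = "ABCDEF"
lower = {"B": [2], "C": [4, 4], "D": [6, 6, 6],
         "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
dist = {x: {} for x in labels}
for row_name, values in lower.items():
    for col_name, value in zip(labels, values):
        dist[row_name][col_name] = dist[col_name][row_name] = float(value)

newick, merges = upgma(dist)
print(newick)
```

On the example matrix the last merge joins F to the composite OTU at height 4, which agrees with the root position dist(ABCDE),F / 2 = 4 given above.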
In bioinformatics, UPGMA is used for the creation of phenetic trees (phenograms). UPGMA was
initially designed for use in protein electrophoresis studies, but is currently most often used to
produce guide trees for more sophisticated algorithms. This algorithm is for example used
in sequence alignment procedures, as it proposes one order in which the sequences will be
aligned. Indeed, the guide tree aims at grouping the most similar sequences, regardless of their
evolutionary rate or phylogenetic affinities, and that is exactly the goal of UPGMA.
Another Example of UPGMA
The Neighbor-Joining Method
1.1. Description:-
Neighbour-joining (NJ): Bottom-up clustering method that also needs a distance matrix.
NJ is a heuristic approach that is not guaranteed to find the optimal tree, but under
normal conditions it has a very high probability of doing so. It is computationally
very efficient, making it well suited for large datasets.
1.2. The Neighbor-Joining Method
Neighbor-joining (Saitou and Nei, 1987) is a method that is related to the cluster method
but does not require the data to be ultrametric. In other words it does not require that all
lineages have diverged by equal amounts. The method is especially suited for datasets
comprising lineages with largely varying rates of evolution. It can be used in combination
with methods that allow correction for superimposed substitutions.
1.3. History
The method was created by Naruya Saitou and Masatoshi Nei in 1987. Usually used for trees based
on DNA or protein sequence data, the algorithm requires knowledge of the distance
between each pair of taxa (e.g., species or sequences) to form the tree.
1.4. Programs
The following programs are available:
• Neighbor of the PHYLIP package (Joe Felsenstein, Univ. Washington),
• ClustalW (D. Higgins, EMBL),
• Distnj in the Protml package (Adachi and Hasegawa, Univ. Tokyo)
1.5. Star decomposition method
The neighbor-joining method is a special case of the star decomposition method. In
contrast to cluster analysis, neighbor-joining keeps track of nodes on a tree rather than
taxa or clusters of taxa. The raw data are provided as a distance matrix and the initial tree
is a star tree. Then a modified distance matrix is constructed in which the separation
between each pair of nodes is adjusted on the basis of their average divergence from all
other nodes. The tree is constructed by linking the least-distant pair of nodes in this
modified matrix. When two nodes are linked, their common ancestral node is added to
the tree and the terminal nodes with their respective branches are removed from the tree.
This pruning process converts the newly added common ancestor into a terminal node on
a tree of reduced size. At each stage in the process two terminal nodes are replaced by
one new node. The process is complete when two nodes remain, separated by a single
branch.
1.6. Note:-
NB: The method is widely used by molecular evolutionists, mainly because it is
well suited to handling large datasets. With the rapid growth of
sequence databases it is still one of the few methods that allow the rapid inclusion
of all homologous sequences present in the database in a single tree. A good
example can be found in the Ribosomal Database Project, which maintains a tree of life
based on all available ribosomal RNA sequences.
Example of the method
Suppose we have the following tree:
Since B and D have accumulated mutations at a higher rate than A, the three-point
criterion is violated and the UPGMA method cannot be used, since this would group
together A and C rather than A and B. In such a case the neighbor-joining method is
one of the recommended methods.
The raw data of the tree are represented by the following distance matrix:
     A    B    C    D    E
B    5
C    4    7
D    7   10    7
E    6    9    6    5
F    8   11    8    9    8
We have in total 6 OTUs (N=6).
Step 1:-
We calculate the net divergence r (i) for each OTU from all other OTUs
r(A) = 5+4+7+6+8=30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
Step 2:-
Now we calculate a new distance matrix using for each pair of OTUs the formula:
M(ij) = d(ij) - [r(i) + r(j)] / (N-2)
or, in the case of the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)] / (N-2) = 5 - (30 + 42) / 4 = -13
       A       B       C       D       E
B    -13
C    -11.5   -11.5
D    -10     -10     -10.5
E    -10     -10     -10.5   -13
F    -10.5   -10.5   -11     -11.5   -11.5
Now we start with a star tree:
        A
   F    |    B
    \   |   /
     \  |  /
      \ | /
       \|/
       /|\
      / | \
     /  |  \
    E   |   C
        D
Step 3:-
Now we choose as neighbors those two OTUs for which M(ij) is the smallest. These are the
pairs A,B and D,E (both -13). Let's take A and B as neighbors and form a new node called U.
Now we calculate the branch lengths from the internal node U to the external OTUs A and
B.
S(AU) = d(AB) / 2 + [r(A) - r(B)] / [2(N-2)] = 1
S(BU) = d(AB) - S(AU) = 4
Step 4:-
Now we define new distances from U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)] / 2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)] / 2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)] / 2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)] / 2 = 7
and we create a new matrix:
     U    C    D    E
C    3
D    6    7
E    5    6    5
F    7    8    9    8
The resulting tree will be the following:
            C
        D   |
         \  |     A
          \ |___/ 1
           /|   \
          / |    \ 4
         E  |     \
            F      B
N = N - 1 = 5
The entire procedure is repeated starting at step 1
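One pass of the procedure (net divergence, modified matrix, choice of neighbors, branch lengths and reduced matrix) can be sketched in Python as below. The function name `nj_iteration`, its return values and the nested-dict input are assumptions made for this illustration. Ties in the modified matrix are resolved by taking the first minimal pair, which for the example data selects A and B as in the text.

```python
def nj_iteration(names, d, new_name="U"):
    """Perform one neighbor-joining step on a symmetric nested dict d[i][j]."""
    n = len(names)
    # Net divergence r(i) of each OTU from all other OTUs.
    r = {i: sum(d[i][j] for j in names if j != i) for i in names}
    # Modified matrix M(ij) = d(ij) - [r(i) + r(j)] / (N - 2).
    m = {
        (i, j): d[i][j] - (r[i] + r[j]) / (n - 2)
        for pos, i in enumerate(names)
        for j in names[pos + 1:]
    }
    i, j = min(m, key=m.get)  # neighbors: smallest M(ij), first pair on ties
    # Branch lengths from the new node to the two joined OTUs.
    s_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_j = d[i][j] - s_i
    # Distances from the new node to every remaining OTU.
    rest = [k for k in names if k not in (i, j)]
    new_d = {new_name: {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in rest}}
    for k in rest:
        new_d[k] = {new_name: new_d[new_name][k]}
        new_d[k].update({other: d[k][other] for other in rest if other != k})
    return r, m, (i, j), (s_i, s_j), [new_name] + rest, new_d


# Raw distance matrix of the example (lower triangle).
names = list("ABCDEF")
lower = {"B": [5], "C": [4, 7], "D": [7, 10, 7],
         "E": [6, 9, 6, 5], "F": [8, 11, 8, 9, 8]}
d = {x: {} for x in names}
for row_name, values in lower.items():
    for col_name, value in zip(names, values):
        d[row_name][col_name] = d[col_name][row_name] = value

r, m, pair, lengths, next_names, next_d = nj_iteration(names, d)
print(pair, lengths, next_d["U"])
```

Calling `nj_iteration(next_names, next_d, new_name="V")` would run the next round on the reduced matrix with N = 5; the name "V" is again only an illustrative label.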
Advantages and disadvantages of the neighbor-joining method
• Advantages
o is fast and thus suited for large datasets and for bootstrap analysis
o permits lineages with largely different branch lengths
o permits correction for multiple substitutions
• Disadvantages
o sequence information is reduced
o gives only one possible tree
o strongly dependent on the model of evolution used
Character based Method
Maximum parsimony (MP):
This method tries to create a phylogeny that requires the least evolutionary change. It may suffer
from long branch attraction, a problem that leads to incorrect trees in rapidly evolving lineages
(Felsenstein, 1978).
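To make the maximum parsimony criterion ("least evolutionary change") concrete, the sketch below counts the minimum number of changes a fixed tree requires, using Fitch's small-parsimony algorithm. The report does not name a specific algorithm, so the choice of Fitch's method, the nested-tuple tree encoding and the toy data are assumptions made for this example.

```python
def fitch_score(tree, states):
    """Return (possible root states, minimum changes) for one character.

    tree: nested tuple of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch score over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Toy alignment and two candidate topologies (made up for illustration).
toy = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
length_1 = parsimony_length(tree_1, toy)
length_2 = parsimony_length(tree_2, toy)
```

Under the parsimony criterion the topology with the smaller length is preferred; here that is `tree_1`, which groups the identical sequences together.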
Maximum-likelihood (ML):
ML uses a statistical approach to infer a phylogenetic tree. ML is well suited for the analysis of
distantly related sequences, but is computationally expensive and thus not that well suited for
larger input data.
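As a very small illustration of the statistical idea behind ML, the sketch below estimates a single evolutionary distance between two aligned sequences by maximizing a likelihood. It assumes the Jukes-Cantor (JC69) substitution model, which is not mentioned in the report and is used here only because it has a simple closed form; a full ML tree search optimizes topology and all branch lengths and is far more involved.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a pairwise alignment under JC69 at distance d > 0."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # one specific identical pair of bases
    p_diff = 0.25 * (0.25 - 0.25 * e)  # one specific differing pair of bases
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximum-likelihood distance under JC69."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 estimate")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical counts: 10 differing sites out of 100 aligned sites.
d_hat = jc69_ml_distance(100, 10)
best = jc69_log_likelihood(d_hat, 100, 10)
```

The estimate is slightly larger than the raw mismatch fraction of 0.1, because the model accounts for multiple substitutions at the same site.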
Bootstrapping:-
Bootstrapping is any test or metric that uses random sampling with replacement, and falls under
the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias,
variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows
estimation of the sampling distribution of almost any statistic using random sampling methods.
Bootstrapping and jackknifing are statistical methods to evaluate and distinguish the confidence
of partial hypotheses (“branch support”) that are contained in a phylogenetic tree and have
become a standard in molecular phylogenetic analyses.
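The resampling idea can be sketched as follows: alignment columns are drawn with replacement, a tree (or here, a much simpler grouping) is recomputed for every replicate, and the support of a group is the fraction of replicates in which it reappears. The function names are hypothetical, and `closest_pair` is only a toy stand-in for a real tree-building method.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for a tree builder: the most similar pair of sequences."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]

    def mismatches(pair):
        return sum(x != y for x, y in zip(alignment[pair[0]], alignment[pair[1]]))

    return frozenset(min(pairs, key=mismatches))


def bootstrap_support(alignment, group, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `group` is recovered."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == group
        for _ in range(replicates)
    )
    return hits / replicates


# Toy alignment (made up for illustration only).
toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAC", "s3": "TTGTACCTAA"}
support = bootstrap_support(toy, frozenset({"s1", "s2"}), replicates=200)
```

In a real analysis the stand-in would be replaced by the chosen tree method, and the support would be reported for every internal branch of the tree.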
Multiple sequence alignment (MSA)
Description:-
A multiple sequence alignment (MSA) is a sequence alignment of three or more biological
sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are
assumed to have an evolutionary relationship by which they share a linkage and are descended
from a common ancestor.
Analysis And uses:-
From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be
conducted to assess the sequences' shared evolutionary origins. Visual depictions of the
alignment illustrate mutation events such as point mutations
(single amino acid or nucleotide changes) that appear as differing characters in a single
alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in
one or more of the sequences in the alignment.
Multiple sequence alignment is often used to assess sequence conservation of protein
domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
Sequence set:-
Multiple sequence alignment also refers to the process of aligning such a sequence set. Because
three or more sequences of biologically relevant length can be difficult and are almost always
time-consuming to align by hand, computational algorithms are used to produce and analyze the
alignments. MSAs require more sophisticated methodologies than pairwise alignment because
they are more computationally complex. Most multiple sequence alignment programs
use heuristic methods rather than global optimization because identifying the optimal alignment
between more than a few sequences of moderate length is prohibitively computationally
expensive.
Figure represents first 90 positions of a protein multiple sequence alignment of instances of the
acidic ribosomal protein P0 (L10E) from several organisms. Generated with ClustalX.
Practical Section:-
ClustalW: Clustal is a series of widely used computer programs used
in Bioinformatics for multiple sequence alignment. The third generation, released in 1994,
greatly improved upon the previous versions. It improved upon the progressive alignment
algorithm in various ways, including allowing individual sequences to be weighted down or up
according to similarity or divergence respectively in a partial alignment. It also included the
ability to run the program in batch mode from the command line.
Access:-
ClustalW can be accessed from both NCBI (National Center for Biotechnology Information) and
EMBL (European Molecular Biology Laboratory).
Figure represents how ClustalW can be accessed from both NCBI and EMBL.
Website link (URL of the ClustalW homepage):-
https://www.genome.jp/tools-bin/clustalw
ClustalW for Phylogenetic tree construction:-
1. Access ClustalW
Open ClustalW through the website. The homepage shows two different sections
(distributions), as shown in the following figures.
Figure represents the homepage Distribution 1 of ClustalW
Figure represents the homepage Distribution 2 of ClustalW
2. Important information of Homepage:-
The 2nd part of the distribution should be left at its defaults or set according to need.
3. Retrieval of sequence:-
In the third step, retrieve the sequence of the pqqC gene in FASTA format from the Nucleotide
database. It is needed for the multiple sequence alignment used to construct the phylogenetic tree.
Figure annotations:
• Choose the form in which you need the output.
• Choose according to need; the slow and accurate option is recommended.
• State whether the sequence of interest is DNA or protein.
• Choose the file or paste the sequences to execute.
• Click directly on Execute.
• Search for the pqqC gene.
Figure represents the NCBI homepage used for sequence retrieval; here the pqqC gene is used.
Pyrroloquinoline Quinone Biosynthesis Gene pqqC.
Figure represents the pqqC gene in FASTA format.
4. Use of BLASTn
Run BLASTn on the retrieved sequence; this takes us to the BLAST output page.
Figure represents the results of BLASTn and the selection of sequences according to need, but
they should be 5-3.
In this section, select 18 sequences and download them in FASTA format.
Figure represents all the sequences present in notepad.
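Before submitting the downloaded file, it can be useful to check that it really contains the expected number of sequences. The sketch below is a minimal FASTA reader written for this purpose; the function name `parse_fasta` and the example records are hypothetical, and the record names shown are placeholders rather than real accession numbers.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict: first header token -> sequence string."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Keep only the first token of the header as the record name.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Placeholder records (not real accession numbers or real sequences).
example = """>example_1 hypothetical record one
ACGTACGTAC
GTACGT
>example_2 hypothetical record two
ACGTTCGTAC
"""
sequences = parse_fasta(example)
print(len(sequences), "sequences read")
```

For a file on disk, the text could be read with `open(path).read()` and passed to the same function; for the download described above the count should be 18.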
5. Multiple Sequence Alignment from ClustalW
All the retrieved sequences are now submitted to this software, which executes and provides the MSA.
Figure annotation: choose the sequence file from the computer.
6. ClustalW | Result Interpretation
Alignment results.
Figure annotations: 1. Sequence number. 2. Accession id.
Figure annotations: 1. Sequence alignment number. 2. Next to it is the score.
Figure annotations: 1. After alignment, the sequences are grouped according to their
similarity. 2. Next to it is the score.
• * (asterisk) represents homology or similarity, i.e. a conserved column.
• ( ) A blank below a column represents a mismatch.
• ---- Hyphens within a sequence represent gaps.
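The asterisk line shown under a Clustal alignment can be reproduced with a short sketch: a column is marked with `*` only when every sequence carries the same residue there. The function name is an assumption, and the sketch deliberately ignores the additional `:` and `.` symbols that Clustal uses for similar amino acids in protein alignments.

```python
def conservation_line(rows):
    """Return '*' for fully conserved columns and ' ' for all others."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        # Conserved: one distinct character in the column, and it is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Toy aligned rows (made up for illustration only).
rows = ["ACGT-A", "ACCT-A", "ACGTTA"]
line = conservation_line(rows)
for row in rows:
    print(row)
print(line)
```

The third column (a mismatch) and the fifth column (containing gaps) are left blank, while the remaining columns are marked as conserved.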
Finally, we have the Clustal dendrograms.
7. Clustal dendrograms/Tree Construction:-
8. Bootstrapping:-
Here the bootstrap value is 500 up to 1000. A value of 1000 means the tool runs 1000 times and
provides the results.
Figure represents the tree options that we can choose through ClustalW.
Here we have 6 tree options:
1. FastTree
2. FastTree full
3. PhyML
4. PhyML bootstrap
5. RAxML
6. RAxML bootstrap
1. Choose PhyML bootstrap.
Figure represents the waiting page.
https://www.genome.jp/tools-bin/ete?id=20061915421388fff0d53d9d4b01b1b05494af7d14ba283099c8
Figure represents the tree that we get. It shows the relation of the sequences to their ancestors.
As we took 18 sequences, the tool constructs a tree on the basis of their resemblance. The tree
shows how the sequences are related to each other as well as their phylogenetic history, i.e.
their relation to their ancestors.
Figure annotations: the subtool used in CLUSTALW; the method we use.
Figure represents the output page of CLUSTALW.
Figure represents the phylogram (phylogenetic tree) that we get from the sequences with their
accession numbers. These are 17 sequences that we aligned in CLUSTALW.
3. How to read this tree, apply filters and save it in PNG form on the computer.
Figure annotation: these are the accession numbers.
Bootstrapping re-verifies the tree as well as the results of the alignment.
The base of the tree (the root) represents the ancestor, from which the branches start.
A branch starts at a node and ends at a leaf or clade.
Figure annotations:
• These are the bootstrap values.
• The circle at the end of a branch is called a leaf.
• Branch / branch length.
• Base of tree / root.
• Represents a clade, with the bootstrap value as a percentage next to the accession number.
PNG form of phylogenetic tree.
https://www.genome.jp/tools-bin/ete?id=20061915421388fff0d53d9d4b01b1b05494af7d14ba283099c8
Applications of Phylogenetic tree construction:-
1. The inference of phylogenies with computational methods has many important
applications in medical and biological research, such as drug discovery and conservation
biology.
Figure represents important applications in medical and biological research.
2. A result published by Korber et al. that times the evolution of the HIV-1 virus,
demonstrates that ML techniques can be effective in solving biological problems.
Figure represents phylogenetics tree in evolution.
3. Phylogenetic trees have already witnessed applications in numerous practical domains,
such as in conservation biology (illegal whale hunting), epidemiology (predictive
evolution), forensics (dental practice HIV transmission), gene function prediction and
drug development.
Figure represents gene function prediction.
4. Other applications of phylogenies include multiple sequence alignment, protein structure
prediction, gene and protein function prediction, and drug design.
Figure represents applications of phylogenies, including multiple sequence alignment.
5. A paper by Bader et al. addresses important industrial applications of phylogenetic trees,
e.g. in the area of commercial drug discovery.
Figure represents important industrial applications of phylogenetic trees
6. Due to the rapid growth of available sequence data over recent years and the constant
improvement of multiple alignment methods, it has now become feasible to compute very
large trees which comprise more than 1,000 organisms.
Figure represents multiple sequence alignment of a large number of organisms.
7. The computation of the tree of life, containing representatives of all living beings on earth,
is considered to be one of the grand challenges in Bioinformatics.
Figure represents the tree of life.
8. Some large multi-institutional/multidisciplinary projects are underway which aim at
building the tree of life: CIPRES (Cyber Infrastructure for Phylogenetic Research
www.phylo.org) and ATOL (Assembling the Tree of Life project, tolweb.org).
9. Cancer research is considered one of the most significant areas in the medical
community. Mutations in genomic sequences are responsible for cancer development and
increased aggressiveness in patients. The combination of all such gene mutations, or
progression pathways, across a population can be summarized in a phylogeny describing
the different evolutionary pathways.
Figure represents a cancer evolutionary tree.
10. Application of the phylogenetic tree can be explored for finding similarities among breast
cancer subtypes based on gene data.
11. Discovery of genes associated with cancer subtypes helps researchers to map different
pathways and to classify cancer subtypes according to their mutations.
12. Methods of phylogenetic tree inference have proliferated in cancer genome studies such
as breast cancer.
13. Phylogenetics can capture important mutational events among different cancer types; a
network approach can also capture tumour similarities.
Figure represents a phylogenetic tree of mutations.
14. It has been observed from the literature that in cancer the driver genes change the
cancer progression and even affect the participation of other genes, thus generating a
gene interaction network.
Figure represents a phylogenetic tree and a gene interaction network.
15. Phylogenetic methods can solve the problem of class prediction by using a classification
tree. Phylogenetic methods give us a deeper understanding of biological heterogeneity
among cancer subtypes.
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxVarshiniMK
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 

Recently uploaded (20)

Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptx
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Mastication
 

Phylogenetic tree construction using bioinformatics tools Zarlish attique 187104

  • 1. Report on Phylogenetic tree
    Introduction to bioinformatics
    Government Postgraduate College Mandian Abbottabad.
    Subject: Introduction to Bioinformatics
    SUBMITTED BY:
    Name: Zarlish Attique
    Registration no: 187104
    BS Bioinformatics Semester 04
    SUBMITTED TO:
    Name: Sir Muhammad Rizwam
    Department: Bioinformatics
    Date: June 19, 2020
  • 2. Contents
    1. Phylogenetics:-
        1. Description:-
        2. Phylogenetic inference methods:-
        3. About taxonomy:-
        4. Brief History:-
            4.2. 1866
        5. Evolution:-
    2. Evolution of Bioinformatics tools:-
        1.1. Bioinformatics experts
        1.2. Development and use of computational and an array of bioinformatics tools
    3. Phylogenetic tree
        1. Computational phylogenetics:-
        2. Traditional phylogenetics and recent phylogenetics:-
        3. Molecular data such as DNA sequence for genes and amino acid sequence for proteins:-
        4. Evolutionary history and relationship:-
        5. Phylogenetic tree is a graphical representation
    4. Types of Phylogenetic trees:-
    5. Method for constructing Phylogenetic tree:-
    List of Methods for constructing trees:-
        Character State method
        Method for validation of phylogenetic tree
    Table 1:- Representing methods
    Online Softwares available for Phylogenetic analysis
        Desktop Software
        Libraries:-
    Unweighted Pair Group Method with Arithmetic Mean
        1.1. Description:-
        Tree consisting of 6 OTUs
        Another Example of UPGMA
    The Neighbor-Joining Method
        1. Note:-
        Advantages and disadvantages of the neighbor-joining method
  • 3. Maximum parsimony (MP):
    Character based Method
        Maximum-likelihood (ML):
    Bootstrapping:-
    Multiple sequence alignment (MSA)
        Description:-
    Practical Section:-
        ClustalW
        Access:-
    ClustalW for Phylogenetic tree construction:-
    6. ClustalW | Result Interpretation
    Applications of Phylogenetic tree construction:-
    References
  • 4. Phylogenetic tree
    1. Phylogenetics:-
    1. Description:-
    In biology, phylogenetics (Greek: phylé, phylon = tribe, clan, race + genetikós = origin, source, birth) is the part of systematics that addresses the inference of the evolutionary history and relationships among or within groups of organisms (e.g. species, or more inclusive taxa). Figure 1 represents the derivation of phylogenetics.
    2. Phylogenetic inference methods:-
    These relationships are hypothesized by phylogenetic inference methods that evaluate observed heritable traits, such as DNA sequences or morphology, often under a specified model of evolution of these traits.
    3. About taxonomy:-
    Taxonomy is the identification, naming and classification of organisms. Classifications are now usually based on phylogenetic data, and many systematists contend that only monophyletic taxa should be recognized as named groups.
    3.1. School of taxonomy:-
    The degree to which classification depends on inferred evolutionary history differs depending on the school of taxonomy:
    phenetics ignores phylogenetic speculation altogether, trying to represent the similarity between organisms instead;
    cladistics (phylogenetic systematics) tries to reflect phylogeny in its classifications by only recognizing groups based on shared, derived characters (synapomorphies);
    evolutionary taxonomy tries to take into account both the branching pattern and the "degree of difference" to find a compromise between them.
    [Figure 1: phylon = tribe, clan, race; genetikós = origin, source, birth; phylogenetics = the inference of the evolutionary history and relationships.]
  • 5. Figure: The taxonomy of one example, Homo sapiens.
    4. Brief History:-
    The term "phylogeny" derives from the German Phylogenie, introduced by Haeckel in 1866, and the Darwinian approach to classification became known as the "phyletic" approach.
    4.1. 1858, Heinrich Georg Bronn
    The paleontologist Heinrich Georg Bronn (1800–1862) published a hypothetical tree illustrating the paleontological "arrival" of new, similar species following the extinction of an older species. Bronn did not propose a mechanism responsible for such phenomena, so his tree is regarded as a precursor concept.
  • 6. Figure: Branching tree diagram from Heinrich Georg Bronn's work (1858).
    4.2. 1866, Ernst Haeckel
    In 1866 Ernst Haeckel first published his phylogeny-based evolutionary tree, another precursor concept.
    Figure: Phylogenetic tree suggested by Haeckel (1866).
    5. Evolution:-
    Evolution is the change in heritable traits of biological organisms over generations due to natural selection, mutation, gene flow, and genetic drift. It is also known as descent with
  • 7. modification. Over time these evolutionary processes lead to the formation of new species (speciation), changes within lineages (anagenesis), and the loss of species (extinction).
    Figure: A diagram showing the relationships between various groups of organisms and the concept of evolution.
    "Evolution" is also used as another name for evolutionary biology, the subfield of biology concerned with studying the evolutionary processes that produced the diversity of life on Earth.
    2. Evolution of Bioinformatics tools:-
    1.1. Bioinformatics experts
    Bioinformatics experts have developed a large collection of tools to make sense of the rapidly growing data related to molecular biology. Biological systems are complex, and understanding them often requires combining data sets and using more than one tool. Bioinformatics experts have therefore experimented with a number of strategies for integrating data sets and tools. Studying a complex biological system usually requires gathering a variety of data from a variety of sources, so multiple tools are needed. There is thus a clear need for technology that combines both data and tools into a workflow that biologists can use easily.
    1.2. Development and use of computational and an array of bioinformatics tools
    The development and use of computational methods and an array of bioinformatics tools provides the
  • 8. ability to analyze large data sets in practical computing times and to obtain optimal or near-optimal solutions with high probability. In response to this trend, much of the current research in phyloinformatics (i.e., computational phylogenetics) concentrates on the development of more efficient heuristic approaches.
    Figure: Data storage on computers alongside the evolution of bioinformatics tools.
    The phylogenetic tree
    Figure: The root of the tree of life.
  • 9. 3. Phylogenetic tree
    General Description
    1. Computational phylogenetics:-
    Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa.
    1.1. Example:-
    For example, these techniques have been used to explore the family tree of the α-hemoglobin gene and the relationships between specific genes.
    Figure: The gene tree for the α-hemoglobin gene compared to the species tree. Both match because the gene evolved from common ancestors.
    2. Traditional phylogenetics and recent phylogenetics:-
    Traditional phylogenetics relies on morphological data obtained by measuring and quantifying the phenotypic properties of representative organisms, while the more recent field of molecular phylogenetics uses the nucleotide sequences of genes or the amino acid sequences of proteins as the basis for classification. Many forms of molecular phylogenetics are closely related to sequence alignment and make extensive use of it in constructing and refining phylogenetic trees, which are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species. The phylogenetic trees constructed by computational methods are unlikely to reproduce perfectly the evolutionary tree that represents the historical relationships between the species being analyzed. The historical species tree may also differ from the historical tree of an individual homologous gene shared by those species.
  • 10. Figure: Tree of life focused on the relation between humans and apes.
    3. Molecular data such as DNA sequence for genes and amino acid sequence for proteins:-
    Phylogenetic analysis using molecular data, such as DNA sequences of genes and amino acid sequences of proteins, is very common not only in evolutionary biology but also across the wider fields of molecular biology. The reason is that DNA sequencing has become very popular and a huge amount of gene and protein sequence data is available in public online databases. Since the available molecules (genes or proteins) have various evolutionary rates, it is important to choose a suitable molecule for the phylogenetic analysis of a given lineage.
    3.1. Example:-
    For example, when the evolutionary rate of the gene (or protein) is too high for a given lineage, the nucleotide (or amino acid) substitutions become saturated. In this case, the accuracy of the phylogenetic analysis decreases. The methods for phylogenetic analysis are improving along with the evolution of computer science. Thus, there are many methods to infer a phylogenetic tree, and many programs are available for each method.
    4. Evolutionary history and relationship:-
    Phylogenetic analysis is a method to elucidate the evolutionary history and relationships among a group of organisms. In the past, phylogenetic analysis was based on morphological comparison among fossils, but the information from fossils was limited. Now, molecular phylogenetic analysis using molecular data such as DNA or proteins has become popular.
    4.1. Reasons for popularity:-
    There are several reasons. These include:
    (1) the popularity of DNA sequencing methods;
    (2) the establishment of methods for phylogenetic tree construction using gene or protein sequences;
    (3) the results of a phylogenetic analysis can be treated quantitatively;
    (4) the availability of many programs for constructing phylogenetic trees.
  • 11. The knowledge gained from phylogenetic analysis contributes to basic biology (e.g. the evolutionary history of species, the evolution of genes, and the identification of sampled species) as well as applied biology (e.g. investigating the route of infection of pathogenic microorganisms). Phylogenetic trees are commonly constructed to figure out the evolutionary relationships among species.
    Selection of the molecules (genes or proteins)
    DNA sequences of genes, RNA sequences of functional RNAs, or amino acid sequences of proteins are used for phylogenetic analysis. There are two focal points when choosing the molecule. First, the gene must be shared by all of the given species. Second, the gene must have a suitable evolutionary rate, because proteins have varied evolutionary rates (Miyata et al., 1980). If the species are distantly related, a molecule with a low evolutionary rate should be chosen. This is because the nucleotide or amino acid substitutions of genes or proteins reach saturation between distant species when the evolutionary rate is high. Note that the nucleotide sequence of a gene reaches saturation more easily than the amino acid sequence of the encoded protein. In this case, housekeeping genes, which have low evolutionary rates, are suitable.
    5. Phylogenetic tree is a graphical representation
    A phylogenetic tree is a graphical representation of the evolutionary relationships among entities that share a common ancestor. Those entities can be species, genes, genomes, or any other operational taxonomic unit (OTU). More specifically, a phylogenetic tree, with its pattern of branching, represents the descent from a common ancestor into distinct lineages. It is critical to understand that the branching patterns and branch lengths that make up a phylogenetic tree can rarely be observed directly; they must be inferred from other information.
    The principle underlying phylogenetic inference is quite simple: analysis of the similarities and differences among biological entities can be used to infer the evolutionary history of those entities.
    Figure: The gene tree for the Glycosyl Hydrolase gene compared to the species tree. The trees do not match because of horizontal gene transfer (HGT).
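The saturation effect described under "Selection of the molecules" can be illustrated with a small sketch. It measures the proportion of differing sites between two aligned sequences (the p-distance) and applies the Jukes-Cantor correction. The Jukes-Cantor model is not named in this report and is used here only as an assumed, simple substitution model; the function names are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance (assumed model, nucleotides only).

    The correction is undefined once p reaches 0.75, the level expected
    for two random nucleotide sequences, so infinity is returned there.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The corrected distance grows much faster than p as p nears 0.75.
for p in (0.05, 0.25, 0.50, 0.70, 0.74):
    print(f"p = {p:.2f} -> corrected distance = {jukes_cantor(p):.3f}")
```

Near p = 0.75 a tiny change in the observed differences produces a very large change in the estimated distance, which is one way to see why saturated sequences give unreliable trees.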
  • 12. 4. Types of Phylogenetic trees:-
    The branches of a phylogenetic tree may be represented in two different ways:
    1.1. Scaled and Unscaled Trees
    Scaled branches: Branches have different lengths based on the number of evolutionary changes or the evolutionary distance.
    Unscaled branches: All branches in the tree are the same length.
    Figure: Trees with scaled and unscaled branches.
    Species and Gene Trees
    Species trees: "Species" trees recover the genealogy of taxa, individuals of a population, etc. Internal nodes represent speciation or other taxonomic events. Species trees should contain sequences from orthologous genes only.
    Gene trees: Gene trees represent the evolutionary history of the genes included in the study. Gene trees can provide evidence for gene duplication events as well as speciation events. Sequences from different homologs can be included in a gene tree; the subsequent analyses should cluster orthologs, thus demonstrating the evolutionary history of the orthologs.
  • 13. Rooted versus Unrooted Trees
    Rooted phylogenetic tree
    In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates. In rooted trees the ancestral state of the organisms or genes is shown at the bottom of the tree, and the tree branches, or bifurcates, until it reaches the terminal branches, tips or leaves at the top of the tree. Rooted trees show the most basal ancestor of the tree in question. There are competing techniques for rooting a tree; one of the most common is the use of an "outgroup" (The Parsimony Methods).
    Unrooted phylogenetic tree
    An unrooted phylogenetic tree does not show an ancestral root. An unrooted binary tree is an unrooted tree in which each vertex has either one or three neighbors. Unrooted trees represent the branching order but do not indicate the root or the location of the last common ancestor. They show the relatedness of organisms without indicating ancestry.
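The "one or three neighbors" property of an unrooted binary tree, and the effect of rooting with an outgroup, can be checked on a toy example. This is a minimal sketch: the four taxa A to D, the internal vertices u and v, and the choice of D as outgroup are all hypothetical, and the check only counts neighbors (it does not verify that the edges form a connected tree).

```python
from collections import defaultdict

# Hypothetical unrooted tree: leaves A, B attach to u; leaves C, D attach to v.
edges = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]

def neighbor_counts(edge_list):
    """Count the neighbors of every vertex in an undirected edge list."""
    degree = defaultdict(int)
    for x, y in edge_list:
        degree[x] += 1
        degree[y] += 1
    return dict(degree)

def is_unrooted_binary(edge_list):
    """True if every vertex has exactly one neighbor (leaf) or three (internal)."""
    return all(d in (1, 3) for d in neighbor_counts(edge_list).values())

degrees = neighbor_counts(edges)
leaves = sorted(v for v, d in degrees.items() if d == 1)
internal = sorted(v for v, d in degrees.items() if d == 3)
print("leaves:", leaves, "internal:", internal)
print("unrooted binary:", is_unrooted_binary(edges))

# Rooting with outgroup D: place a new root vertex on the edge leading to D.
rooted_edges = [e for e in edges if e != ("D", "v")] + [("root", "D"), ("root", "v")]
print("root neighbors:", neighbor_counts(rooted_edges)["root"])
```

The root is the only vertex with two neighbors, which is what distinguishes the rooted tree from the unrooted one built on the same taxa.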
  • 14. Figure: An unrooted tree with unscaled branches.
    2. Terms used to describe rooted and unrooted tree:-
    1.1. Clade
    An ancestor (an organism, population, or species) and all of its descendants.
    1.1.1. Sister clade
    One member of a pair of clades originating when a single lineage splits into two. Sister clades thus share an exclusive common ancestry and are each other's closest relatives.
    1.2. Ancestor
    An entity from which another entity is descended.
    1.3. Node
    A point or vertex on a tree (in the sense of graph theory). On a phylogenetic tree, a node is commonly used to represent (1) the split of one lineage to form two or more lineages (internal node), the extinction of a lineage (terminal node), or the lineage at a specified time, often the present (terminal node), or (2) a taxon, whether ancestral (internal node) or descendant (internal node or terminal node).
    1.4. Root
    The root of the tree represents the ancestral lineage, and the tips of the branches represent the descendants of that ancestor.
    1.5. Leaf
    Each leaf on a phylogenetic tree represents a taxon.
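The terms clade, sister clade and leaf can be made concrete with a small rooted tree written as nested tuples. This is only an illustrative sketch: the taxa A to E, the tuple representation, and the function names are assumptions, not part of the report.

```python
# Hypothetical rooted tree ((A, B), (C, (D, E))): strings are leaves,
# tuples are internal nodes, and the outermost tuple is the root.
tree = (("A", "B"), ("C", ("D", "E")))

def leaves(node):
    """Return the set of leaf (taxon) names descending from a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clades(node):
    """Return every clade in the tree, each as a set of leaf names."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found

def is_clade(root, taxa):
    """True if `taxa` is exactly an ancestor's complete set of descendants."""
    return frozenset(taxa) in clades(root)

def sister(node, taxa):
    """Return the sister clade of `taxa`, or None if there is none."""
    if isinstance(node, str):
        return None
    target = frozenset(taxa)
    child_sets = [leaves(child) for child in node]
    if len(node) == 2 and target in child_sets:
        return child_sets[1 - child_sets.index(target)]
    for child in node:
        result = sister(child, taxa)
        if result is not None:
            return result
    return None

print(is_clade(tree, {"D", "E"}))   # D and E share an exclusive ancestor
print(is_clade(tree, {"B", "C"}))   # B and C do not
print(sorted(sister(tree, {"C"})))  # the clade that split from C's lineage
```

Here the group {D, E} is a clade because one internal node has exactly those two leaves as descendants, while {B, C} is not, since their most recent common ancestor is the root, which also has other descendants.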
  • 15. Figure: Terms used to describe rooted and unrooted trees.
    5. Method for constructing Phylogenetic tree:-
    Summary:-
    (1) The first step in doing phylogenetics is to choose the sequences from which the tree should be constructed. Very popular choices are the sequences of rRNA (the RNA the ribosome is built of) and mitochondrial genes. This genetic material is present in almost all organisms and has enough mutations to construct a tree reliably.
    (2) The second step is to construct pairwise and multiple sequence alignments from these sequences.
    (3) The third step is to choose a method for constructing a phylogenetic tree. There are three categories: distance-based, maximum parsimony, and maximum likelihood.
    Maximum parsimony should be chosen for strongly similar sequences, because too much variation results in many possible trees. For the same reason only a few sequences (fewer than 15) should be used.
    Distance-based methods (e.g. ClustalW) require less similarity among the sequences than maximum parsimony methods, but sequence similarities should still be present: some sequences should be similar to one another and others less similar. Distance-based methods can be applied to a set of many sequences.
    Maximum likelihood methods may be used for very variable sequences, but the computational costs increase with the number of sequences, as every possible tree must be considered.
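The remark that "every possible tree must be considered" explains why exhaustive methods are limited to few sequences. As a rough illustration, the number of distinct binary tree topologies for n labelled taxa is (2n - 5)!! for unrooted trees and (2n - 3)!! for rooted trees. These standard counting formulas are not given in the report and are brought in here as an addition; the function names are hypothetical.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def count_unrooted_trees(n_taxa):
    """Number of unrooted binary topologies for n labelled taxa (n >= 3)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)

def count_rooted_trees(n_taxa):
    """Number of rooted binary topologies for n labelled taxa (n >= 2)."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)

for n in (4, 5, 10, 15):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

The counts grow so quickly with the number of taxa that practical programs rely on heuristic searches instead of scoring every topology.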
  • 16. Figure represents the method for constructing a phylogenetic tree. This method is used in the Practical Section.
List of methods for constructing trees:-
Distance matrix method
1. UPGMA
2. Transformed distance method
3. Neighbor's relation method
4. Neighbor joining method
5. Fitch and Margoliash method
Character state method
1. Maximum likelihood approach
Method for validation of phylogenetic tree
1. Bootstrapping
2. Felsenstein's bootstrap test
  • 17. Figure represents the methods for constructing a phylogenetic tree: distance matrix method, character state method, and validation of the phylogenetic tree.
Table 1:- Methods for constructing phylogenetic trees
Method | Advantage | Disadvantage | Other information
Maximum parsimony | Appropriate for very similar sequences and a small number of sequences | Very time-consuming, as it tests all possible trees. Parsimony may fail for diverged sequences. Suffers from long-branch attraction | Predicts the evolutionary tree that minimizes the number of steps required to generate the observed variation in the sequences. The tree is built with the fewest changes required to explain the differences observed in the data
Maximum likelihood | Suitable for very dissimilar sequences. Allows hypotheses about evolutionary relationships to be formulated | A slow search algorithm leads to slow response. Takes a long time for large datasets | Tries to find the tree that has the highest probability of generating the input sequences under a given evolutionary model. More accurate phylogenetic trees can be constructed for a small number of taxa in a reasonable time frame
  • 18. Table 1 (continued)
Method | Advantage | Disadvantage | Other information
Neighbour joining | Faster than the character-based methods. Fast, and can be used with a variety of models | Conversion from sequence data to distance data leads to loss of information | Provides an unrooted tree and a single resultant tree
UPGMA | Reliable for related sequences | Assumes that the evolution rate is constant in all branches | Provides a rooted tree
Fitch-Margoliash | Less sensitive to variations in evolutionary rate | Dependent on the model used to obtain the distance matrix |
Online software available for phylogenetic analysis
This list of phylogenetic tree viewing software is a compilation of software tools and web portals used in visualising phylogenetic trees.
Software:-
Name | Description
Aquapony | JavaScript tree viewer for BEAST
ETE toolkit Tree Viewer | An online tool for viewing phylogenetic trees (Newick format) that allows multiple sequence alignments (FASTA format) to be shown together with the trees
EvolView | An online tool for visualizing, annotating and managing phylogenetic trees
  • 19. Software (continued)
Name | Description
IcyTree | Client-side JavaScript SVG viewer for annotated rooted trees. Also supports phylogenetic networks
Iroki | Automatic customization and visualization of phylogenetic trees
iTOL (interactive Tree Of Life) | Annotate trees with various types of data and export to various graphical formats; scriptable through a batch interface
Microreact | Link, visualise and explore sequence and meta-data using phylogenetic trees, maps and timelines
OneZoom | Uses IFIG (Interactive Fractal Inspired Graphs) to display phylogenetic trees which can be zoomed in on to increase detail
Phylo.io | View and compare up to 2 trees side by side with interactive HTML5 visualisations
PhyloExplorer | A tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database
PHYLOViZ Online | Web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees
PhyloWidget | View, edit, and publish phylogenetic trees online; interfaces with databases
T-REX (Webserver) | Tree inference and visualization (hierarchical, radial and axial tree views), horizontal gene transfer (HGT) detection and HGT network visualization
TidyTree | A client-side HTML5/SVG phylogenetic tree renderer, based on D3.js
TreeVector | Scalable, interactive phylogenetic trees for the web; produces dynamic SVG or PNG output, implemented in Java
  • 20. Desktop Software
Name | Description | OS
ARB | An integrated software environment for tree visualisation and annotation | LM
Archaeopteryx | Java tree viewer and editor (used to be ATV) |
BioNumerics | Universal platform for the management, storage and analysis of all types of biological data, including tree and network inference of sequence data | W
Bio::Phylo | A collection of Perl modules for manipulating and visualizing phylogenetic data. Bio::Phylo is one part of a comprehensive suite of Perl biology tools | All
Dendroscope | An interactive viewer for large phylogenetic trees and networks | All
DensiTree | A viewer capable of viewing multiple overlaid trees | All
JEvTrace | A multivalent browser for sequence alignment, phylogeny, and structure. Performs an interactive Evolutionary Trace and other phylogeny-inspired analysis | All
MEGA | Software for statistical analysis of molecular evolution. It includes different tree visualization features | All
MultiDendrograms | Interactive open-source application to calculate and plot phylogenetic trees | All
PHYLOViZ | Phylogenetic inference and data visualization for allelic/SNP sequence profiles using minimum spanning trees | All
  • 21. Desktop Software (continued)
Name | Description | OS
TreeDyn | Open-source software for tree manipulation and annotation allowing incorporation of meta information | All
Treevolution | Open-source tool for circular visualization with section and ring distortion and several other features such as branch clustering and pruning | All
TreeGraph 2 | Open-source tree editor with numerous editing and formatting operations including combining different phylogenetic analyses | All
TreeView | Tree-viewing software | All
UGENE | An open-source visual interface for the Phylip 3.6 package | All
"All" refers to Microsoft Windows, Apple OSX and Linux; L = Linux, M = Apple Mac, W = Microsoft Windows.
Libraries:-
Name | Language | Description
ggtree | R | An R package for tree visualization and annotation with grammar of graphics supported
jsPhyloSVG | JavaScript | Open-source JavaScript library for rendering highly extensible, customizable phylogenetic trees; used for Elsevier's interactive trees
PhyD3 | JavaScript | Interactive phylogenetic tree visualization with numerical annotation graphs, with SVG or PNG output, implemented in D3.js
phylotree.js | JavaScript | A library that extends the popular data visualization framework D3.js, suitable for building JavaScript applications where users can view and interact with phylogenetic trees
Phytools | R | Phylogenetic Tools for Comparative Biology (and Other Things), based in R
  • 22. Libraries (continued)
toytree | Python | A minimalist tree visualization and manipulation library for Python
Methods for Phylogenetic tree:-
Distance matrix: its advantages and disadvantages
Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic distance" between the sequences being classified, and therefore they require an MSA (multiple sequence alignment) as input. Distance is often defined as the fraction of mismatches at aligned positions, with gaps either ignored or counted as mismatches. Distance methods construct an all-to-all matrix from the sequence query set that describes the distance between each sequence pair. From this matrix a phylogenetic tree is constructed that places closely related sequences under the same interior node and whose branch lengths closely reproduce the observed distances between sequences. Distance-matrix methods may produce either rooted or unrooted trees, depending on the algorithm used to calculate them. They are frequently used as the basis for progressive and iterative types of multiple sequence alignment. The main disadvantage of distance-matrix methods is their inability to use information about local high-variation regions that appear across multiple subtrees efficiently.
Unweighted Pair Group Method with Arithmetic Mean
1.1. Description:-
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a simple clustering method that assumes a constant rate of evolution (the molecular clock hypothesis). It needs a distance matrix of the analysed taxa, which can be calculated from a multiple alignment. The name breaks down as follows:
Unweighted: all pairwise distances contribute equally.
Pair-Group: groups are combined in pairs (dichotomies only).
Arithmetic mean: pairwise distances to each group (clade) are mean distances to all members of that group.
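The distance measure described above (the fraction of mismatches at aligned positions) can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the three aligned sequences are hypothetical, and the names p_distance, distance_matrix and the gaps_as_mismatch flag are introduced here only for the example.

```python
from itertools import combinations

def p_distance(seq_a, seq_b, gaps_as_mismatch=False):
    """Fraction of mismatches between two rows of the same alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatches = 0
    for a, b in zip(seq_a, seq_b):
        has_gap = a == "-" or b == "-"
        if has_gap and not gaps_as_mismatch:
            continue                 # gapped columns are ignored
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else 0.0

def distance_matrix(alignment, gaps_as_mismatch=False):
    """All-to-all distances, keyed by the unordered pair of sequence names."""
    return {
        frozenset((x, y)): p_distance(alignment[x], alignment[y], gaps_as_mismatch)
        for x, y in combinations(alignment, 2)
    }

# Hypothetical aligned sequences, for illustration only.
alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
for pair, d in distance_matrix(alignment).items():
    print(sorted(pair), round(d, 3))
```

Ignoring gaps and counting them as mismatches give different values for the same pair, which is why the choice should be stated alongside any distance matrix.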
1.2. Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
UPGMA is the simplest method of tree construction. It was originally developed for constructing taxonomic phenograms, i.e. trees that reflect the phenotypic similarities between operational taxonomic units (OTUs), but it can also be used to construct phylogenetic trees if the rates of evolution are approximately constant among the different lineages. For this purpose the number of observed nucleotide or amino-acid substitutions can be used.
  • 23. UPGMA employs a sequential clustering algorithm, in which local topological relationships are identified in order of similarity and the phylogenetic tree is built in a stepwise manner. We first identify, from among all the OTUs, the two that are most similar to each other and treat them as a new single OTU. Such an OTU is referred to as a composite OTU. Then, from among the new group of OTUs, we identify the pair with the highest similarity, and so on, until we are left with only two OTUs.
Suppose we have the following tree consisting of 6 OTUs:
(Figure: tree consisting of 6 OTUs)
The pairwise evolutionary distances are given by the following distance matrix:
     A   B   C   D   E
B    2
C    4   4
D    6   6   6
E    6   6   6   4
F    8   8   8   8   8
  • 24. 1.1.1. Step 1:-
We now cluster the pair of OTUs with the smallest distance, A and B, which are separated by a distance of 2. The branching point is positioned at a distance of 2 / 2 = 1 substitution. We thus construct a subtree joining A and B.
Following the first clustering, A and B are considered as a single composite OTU (A,B), and we calculate the new distance matrix as follows:
dist(A,B),C = (distAC + distBC) / 2 = 4
dist(A,B),D = (distAD + distBD) / 2 = 6
dist(A,B),E = (distAE + distBE) / 2 = 6
dist(A,B),F = (distAF + distBF) / 2 = 8
In other words, the distance between a simple OTU and a composite OTU is the average of the distances between the simple OTU and the constituent simple OTUs of the composite OTU. A new distance matrix is then built from the newly calculated distances and the whole cycle is repeated:
1.1.2. Step 2:-
      A,B  C    D    E
C     4
D     6    6
E     6    6    4
F     8    8    8    8
1.1.3. Step 3:-
      A,B  C    D,E
C     4
D,E   6    6
F     8    8    8
1.1.4. Step 4:-
      AB,C  D,E
D,E   6
F     8     8
  • 25. 1.1.5. Step 5:-
The final step consists of clustering the last OTU, F, with the composite OTU.
      ABC,DE
F     8
Although this method leads essentially to an unrooted tree, UPGMA assumes equal rates of mutation along all the branches as its model of evolution. The theoretical root must therefore be equidistant from all OTUs, so we can apply the method of mid-point rooting. The root of the entire tree is then positioned at dist(ABCDE),F / 2 = 4.
1.1.6. Final tree:-
The final tree as inferred by the UPGMA method is shown below.
(Figure: final UPGMA tree)
We have now reconstructed the phylogenetic tree using the UPGMA method, and we have obtained the original phylogenetic tree we started with.
In bioinformatics, UPGMA is used for the creation of phenetic trees (phenograms). UPGMA was initially designed for use in protein electrophoresis studies, but is currently most often used to produce guide trees for more sophisticated algorithms. For example, it is used in sequence alignment procedures, where it proposes an order in which the sequences will be aligned. The guide tree aims at grouping the most similar sequences, regardless of their evolutionary rate or phylogenetic affinities, and that is exactly the goal of UPGMA.
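The clustering steps worked through above can be reproduced with a short sketch. It is one possible implementation under stated assumptions rather than the code of any named package: the function name upgma and the data layout (distances keyed by unordered pairs, the tree as nested tuples) are choices made here for illustration. The distance from a new cluster to every other cluster is the size-weighted mean of its two parts, which coincides with the plain averages used in the steps above for this matrix.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (tree, merges): the tree as nested tuples and a list of
    (cluster_a, cluster_b, node_height) in the order the pairs were joined.
    """
    dist = dict(dist)                        # work on a copy
    size = {name: 1 for name in names}       # number of OTUs in each cluster
    active = list(names)
    merges = []
    while len(active) > 1:
        # closest pair; ties go to the first pair encountered
        a, b = min(combinations(active, 2), key=lambda pair: dist[frozenset(pair)])
        d_ab = dist[frozenset((a, b))]
        new = (a, b)
        merges.append((a, b, d_ab / 2))      # branching point at half the distance
        for c in active:
            if c == a or c == b:
                continue
            # mean distance from c to all members of the new cluster
            dist[frozenset((new, c))] = (
                size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[new] = size[a] + size[b]
        active = [c for c in active if c != a and c != b] + [new]
    return active[0], merges

# Distance matrix from the worked example (lower triangle, rows B to F).
names = ["A", "B", "C", "D", "E", "F"]
lower = [[2], [4, 4], [6, 6, 6], [6, 6, 6, 4], [8, 8, 8, 8, 8]]
dist = {
    frozenset((names[i + 1], names[j])): d
    for i, row in enumerate(lower)
    for j, d in enumerate(row)
}
tree, merges = upgma(names, dist)
print(tree)
for a, b, height in merges:
    print(a, b, height)
```

Because several distances are tied, the order in which equally close pairs are joined may differ from the steps above; the topology and the node heights (1, 2, 2, 3 and 4 for the root) are the same.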
  • 26. (Figure: another example of UPGMA)
The Neighbor-Joining Method
1.1. Description:-
Neighbour-joining (NJ) is a bottom-up clustering method that also needs a distance matrix. NJ is a heuristic approach that is not guaranteed to find the optimal tree, but under normal conditions it has a very high probability of doing so. It is computationally very efficient, which makes it well suited to large datasets.
  • 27. 1.2. The Neighbor-Joining Method
Neighbor-joining (Saitou and Nei, 1987) is a method that is related to the cluster method but does not require the data to be ultrametric. In other words, it does not require that all lineages have diverged by equal amounts. The method is especially suited to datasets comprising lineages with largely varying rates of evolution. It can be used in combination with methods that allow correction for superimposed substitutions.
1.3. History
The method was created by Naruya Saitou and Masatoshi Nei in 1987. It is usually used for trees based on DNA or protein sequence data, and the algorithm requires knowledge of the distance between each pair of taxa (e.g., species or sequences) to form the tree.
1.4. Programs
The following programs are available:
o Neighbor of the Phylip package (Joe Felsenstein, Univ. Washington)
o ClustalW (D. Higgins, EMBL)
o Distnj in the Protml package (Adachi and Hasegawa, Univ. Tokyo)
1.5. Star decomposition method
The neighbor-joining method is a special case of the star decomposition method. In contrast to cluster analysis, neighbor-joining keeps track of nodes on a tree rather than taxa or clusters of taxa. The raw data are provided as a distance matrix and the initial tree is a star tree. A modified distance matrix is then constructed in which the separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes. The tree is constructed by linking the least-distant pair of nodes in this modified matrix. When two nodes are linked, their common ancestral node is added to the tree and the terminal nodes with their respective branches are removed from the tree. This pruning process converts the newly added common ancestor into a terminal node on a tree of reduced size. At each stage in the process two terminal nodes are replaced by one new node. The process is complete when two nodes remain, separated by a single branch.
1.6. Note:-
Its suitability for large datasets in particular has led to the method being widely used by molecular evolutionists. With the rapid growth of sequence databases, it is still one of the few methods that allow the rapid inclusion of all homologous sequences present in the database in a single tree. A good example is the Ribosomal Database Project, which maintains a tree of life based on all available ribosomal RNA sequences.
Example of the method
Suppose we have the following tree:
  • 28. (Figure: example tree with 6 OTUs)
B and D have accumulated mutations at a higher rate than A. The three-point criterion is therefore violated, and the UPGMA method cannot be used, since it would group together A and C rather than A and B. In such a case the neighbor-joining method is one of the recommended methods.
The raw data of the tree are represented by the following distance matrix:
     A   B   C   D   E
B    5
C    4   7
D    7   10  7
E    6   9   6   5
F    8   11  8   9   8
We have in total 6 OTUs (N = 6).
Step 1:- We calculate the net divergence r(i) of each OTU from all other OTUs:
r(A) = 5 + 4 + 7 + 6 + 8 = 30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
Step 2:- Now we calculate a new distance matrix, using for each pair of OTUs the formula
M(ij) = d(ij) - [r(i) + r(j)] / (N-2)
or, in the case of the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)] / (N-2) = -13
     A      B      C      D      E
B    -13
C    -11.5  -11.5
D    -10    -10    -10.5
E    -10    -10    -10.5  -13
F    -10.5  -10.5  -11    -11.5  -11.5
  • 29. Now we start with a star tree:
(Figure: star tree in which the six OTUs A, B, C, D, E and F all join at a single central node)
Step 3:- Now we choose as neighbors the two OTUs for which M(ij) is the smallest. Two pairs qualify: A and B, and D and E. Let us take A and B as neighbors and form a new node called U. We calculate the branch lengths from the internal node U to the external OTUs A and B:
S(AU) = d(AB) / 2 + [r(A) - r(B)] / [2(N-2)] = 1
S(BU) = d(AB) - S(AU) = 4
Step 4:- Now we define the new distances from U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)] / 2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)] / 2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)] / 2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)] / 2 = 7
and we create a new matrix:
     U   C   D   E
C    3
D    6   7
E    5   6   5
F    7   8   9   8
  • 30. The resulting tree will be the following:
(Figure: tree after the first step, in which A and B join at node U with branch lengths 1 and 4, while C, D, E and F remain attached to the central node)
N = N-1 = 5
The entire procedure is repeated starting at step 1.
Advantages and disadvantages of the neighbor-joining method
Advantages:
o is fast and thus suited to large datasets and to bootstrap analysis
o permits lineages with largely different branch lengths
o permits correction for multiple substitutions
Disadvantages:
o sequence information is reduced
o gives only one possible tree
o strongly dependent on the model of evolution used
Character based Method
Maximum parsimony (MP): This method tries to create a phylogeny that requires the least evolutionary change. It may suffer from long-branch attraction, a problem that leads to incorrect trees in rapidly evolving lineages (Felsenstein, 1978).
Maximum likelihood (ML): ML uses a statistical approach to infer a phylogenetic tree. ML is well suited to the analysis of distantly related sequences, but it is computationally expensive and thus not well suited to larger input data.
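The first round of the neighbor-joining example above (net divergence, the rate-corrected matrix M, the branch lengths to U and the distances from U) can be checked with a short sketch. The function names net_divergence, rate_corrected_matrix and join_pair are hypothetical helpers written for this illustration; only the formulas and the distance matrix come from the worked example.

```python
from itertools import combinations

def net_divergence(names, d):
    """r(i): sum of the distances from OTU i to every other OTU."""
    return {i: sum(d[frozenset((i, k))] for k in names if k != i) for i in names}

def rate_corrected_matrix(names, d, r):
    """M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for every pair of OTUs."""
    n = len(names)
    return {
        frozenset((i, j)): d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(names, 2)
    }

def join_pair(names, d, r, i, j):
    """Branch lengths from i and j to their new node, and distances from it."""
    n = len(names)
    d_ij = d[frozenset((i, j))]
    s_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_j = d_ij - s_i
    to_new = {
        k: (d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij) / 2
        for k in names
        if k not in (i, j)
    }
    return s_i, s_j, to_new

# Distance matrix from the worked example (lower triangle, rows B to F).
names = ["A", "B", "C", "D", "E", "F"]
lower = [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]
d = {
    frozenset((names[i + 1], names[j])): value
    for i, row in enumerate(lower)
    for j, value in enumerate(row)
}

r = net_divergence(names, d)
m = rate_corrected_matrix(names, d, r)
s_au, s_bu, d_u = join_pair(names, d, r, "A", "B")
print(r)
print(m[frozenset("AB")], m[frozenset("DE")])
print(s_au, s_bu, d_u)
```

A full implementation would replace A and B by U in the matrix and repeat these three functions until only two nodes remain.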
  • 31. Bootstrapping:-
Bootstrapping is any test or metric that uses random sampling with replacement, and it falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods. Bootstrapping and jackknifing are statistical methods for evaluating the confidence of the partial hypotheses ("branch support") contained in a phylogenetic tree, and they have become standard in molecular phylogenetic analyses.
Multiple sequence alignment (MSA)
Description:-
A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a lineage and are descended from a common ancestor.
Analysis and uses:-
From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes), which appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps), which appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
Sequence set:-
Multiple sequence alignment also refers to the process of aligning such a sequence set. Because three or more sequences of biologically relevant length can be difficult, and are almost always time-consuming, to align by hand, computational algorithms are used to produce and analyze the alignments. MSAs require more sophisticated methodologies than pairwise alignment because they are more computationally complex. Most multiple sequence alignment programs use heuristic methods rather than global optimization, because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively expensive computationally.
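The bootstrapping described above is applied to an MSA by resampling its columns with replacement, building a tree from each replicate, and reporting how often a clade reappears. The sketch below shows only the resampling and the final percentage, under stated assumptions: the alignment is hypothetical, the function names bootstrap_alignment and clade_support are introduced for illustration, and the per-replicate clade sets are made-up inputs standing in for trees that a real analysis would infer.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one replicate: columns drawn with replacement, same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def clade_support(clade, replicate_clades):
    """Percentage of replicate trees that contain the given clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100 * hits / len(replicate_clades)

# Hypothetical alignment, for illustration only.
alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
replicate = bootstrap_alignment(alignment, random.Random(1))
print(replicate)

# Made-up clade sets from four replicate trees; a real run would infer them.
ab = frozenset({"s1", "s2"})
bc = frozenset({"s2", "s3"})
print(clade_support(ab, [{ab}, {ab}, {bc}, {ab}]))
```

Passing a seeded random.Random instance keeps the replicates reproducible between runs.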
  • 32. Figure represents the first 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with ClustalX.
Practical Section:-
ClustalW:
Clustal is a series of widely used computer programs for multiple sequence alignment in bioinformatics. ClustalW, the third generation, was released in 1994 and greatly improved upon the previous versions. It improved the progressive alignment algorithm in various ways, including allowing individual sequences to be weighted down or up according to their similarity or divergence, respectively, in a partial alignment. It also added the ability to run the program in batch mode from the command line.
Access:-
ClustalW can be accessed from both NCBI (National Center for Biotechnology Information) and EMBL (European Molecular Biology Laboratory).
  • 33. Figure represents that ClustalW can be accessed from both NCBI (National Center for Biotechnology Information) and EMBL (European Molecular Biology Laboratory).
Website link:- URL of the ClustalW homepage: https://www.genome.jp/tools-bin/clustalw
ClustalW for phylogenetic tree construction:-
1. Access ClustalW
Open ClustalW through the website. The page that opens has two parts, shown in the figures for Distribution 1 and Distribution 2.
Figure represents the homepage Distribution 1 of ClustalW.
  • 34. Figure represents the homepage Distribution 2 of ClustalW.
2. Important information on the homepage:-
The second part (Distribution 2) should be left at its defaults or set according to need. The callouts on the form are:
o Output format: the form in which you need the output.
o Alignment speed: choose according to need, but slow and accurate is recommended.
o Sequence type: whether the sequence of interest is DNA or protein.
o Input: choose the file or paste the sequences to execute.
o Click directly on Execute.
3. Retrieval of sequence:-
In the third step, retrieve the sequence of the pqqC gene in FASTA format from the Nucleotide database. It is needed for the multiple sequence alignment from which the phylogenetic tree is constructed. Search for the pqqC gene.
  • 35. Figure represents the NCBI homepage used for sequence retrieval; here the pqqC gene (pyrroloquinoline quinone biosynthesis gene pqqC) is used.
Figure represents the pqqC gene in FASTA format.
  • 36. 4. Use of BLASTn
Run BLAST on the sequence; this takes us to the BLAST output page.
  • 37. Figure represents the results of BLASTn. Select sequences according to need, but they should be 5-3. In this section, select 18 sequences and download them in FASTA format.
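Before submitting the downloaded file, it can help to confirm how many sequences it holds. The sketch below parses FASTA text in plain Python; the function name read_fasta and the two records are hypothetical placeholders, not sequences from the report.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {identifier: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]   # first token of the header line
            records[current] = []
        elif current is not None:
            records[current].append(line)   # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical FASTA content, for illustration only.
example = """>example_1 first placeholder record
ACGTACGT
ACGT
>example_2 second placeholder record
ACGTTCGA
"""
records = read_fasta(example)
print(len(records), "sequences")
for name, seq in records.items():
    print(name, len(seq))
```

For the download described above, the count printed for the saved file should be 18.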
  • 38. Figure represents all the sequences in Notepad.
5. Multiple sequence alignment with ClustalW
All the retrieved sequences are now submitted to ClustalW, which executes and returns the MSA. Choose the sequence file from the computer.
  • 39. 6. ClustalW | Result interpretation
Alignment results.
  • 40. 1. Sequence number 2. Accession ID
  • 41. 1. Sequence alignment number 2. Next to it is the score
  • 42. 1. After alignment, the sequences are grouped according to their similarity 2. Next to it is the score
  • 43. * (asterisk) represents homology or similarity: the column is conserved. A blank space in the consensus line represents a mismatch. ---- represents a gap within the stretch of sequence. The accession ID is shown at the start of each row.
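The asterisk line described above can be recomputed from the aligned rows. This is a simplified sketch: it marks only fully conserved columns with "*" and leaves every other column blank, the three aligned rows are hypothetical, and the function name conservation_line is introduced here for illustration.

```python
def conservation_line(aligned_rows):
    """Return '*' under each fully conserved, gap-free column, ' ' elsewhere."""
    marks = []
    for column in zip(*aligned_rows):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)

# Hypothetical aligned rows, for illustration only.
rows = ["ACGT-A", "ACCT-A", "ACGTTA"]
for row in rows:
    print(row)
print(conservation_line(rows))
```

The third column is left blank because of a mismatch and the fifth because two rows have a gap there.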
  • 44. Finally we have the Clustal dendrograms.
7. Clustal dendrograms/Tree construction:-
8. Bootstrapping:-
Here the bootstrap value is set between 500 and 1000. A value of 1000 means that the tool resamples the data and rebuilds the tree 1000 times before reporting the results.
  • 45. Figure represents the trees that we can construct through ClustalW. The options are:
1. FastTree
2. FastTree full
3. PhyML
4. PhyML bootstrap
5. RAxML
6. RAxML bootstrap
1. Choose PhyML bootstrap.
Figure represents waiting.
https://www.genome.jp/tools-bin/ete?id=20061915421388fff0d53d9d4b01b1b05494af7d14ba283099c8
  • 46. Figure represents the tree that we get. Using the 18 sequences, the tool constructs a tree on the basis of their resemblance; the tree shows how the sequences are related to one another and to their common ancestors. Callouts: the sub-tool used in ClustalW; the method we use.
  • 47. Figure represents the output page of ClustalW.
Figure represents the phylogram (phylogenetic tree) that we get from the sequences, labelled with accession numbers. These are the 17 sequences that we aligned in ClustalW.
3. How to read this tree, apply filters, and save it in PNG form on the computer.
Callout: the accession numbers are written here.
  • 48. Bootstrap re-verifies the tree, and with it the results of the alignment. Callouts:
o Base of tree/root: where the ancestor starts and the lineages move outward.
o Branch / branch length: a branch starts at a node and ends at a leaf or clade.
o Bootstrap values.
o Leaf: the circle at the end of a branch.
  • 49. Represents a clade. The bootstrap value is shown as a percentage next to the accession number.
  • 53. PNG form of the phylogenetic tree.
https://www.genome.jp/tools-bin/ete?id=20061915421388fff0d53d9d4b01b1b05494af7d14ba283099c8
  • 54. Applications of Phylogenetic tree construction:-
1. The inference of phylogenies with computational methods has many important applications in medical and biological research, such as drug discovery and conservation biology.
Figure represents important applications in medical and biological research.
2. A result published by Korber et al., which dates the evolution of the HIV-1 virus, demonstrates that ML techniques can be effective in solving biological problems.
Figure represents phylogenetic tree in evolution.
3. Phylogenetic trees have already found applications in numerous practical domains, such as conservation biology (illegal whale hunting), epidemiology (predictive evolution), forensics (dental practice HIV transmission), gene function prediction and drug development.
  • 55. Figure represents gene function prediction.
4. Other applications of phylogenies include multiple sequence alignment, protein structure prediction, gene and protein function prediction, and drug design.
Figure represents that applications of phylogenies include multiple sequence alignment.
5. A paper by Bader et al. addresses important industrial applications of phylogenetic trees, e.g. in the area of commercial drug discovery.
Figure represents important industrial applications of phylogenetic trees.
  • 56. 6. Due to the rapid growth of available sequence data over recent years and the constant improvement of multiple alignment methods, it has now become feasible to compute very large trees comprising more than 1,000 organisms.
Figure represents multiple sequence alignment of a large number of organisms.
7. The computation of the tree of life, containing representatives of all living beings on earth, is considered to be one of the grand challenges in bioinformatics.
  • 57. Figure represents the tree of life.
8. Some large multi-institutional, multidisciplinary projects are underway that aim at building the tree of life: CIPRES (Cyber Infrastructure for Phylogenetic Research, www.phylo.org) and ATOL (Assembling the Tree of Life project, tolweb.org).
9. Cancer research is considered one of the most significant areas in the medical community. Mutations in genomic sequences are responsible for cancer development and increased aggressiveness in patients. The combination of all such gene mutations, or progression pathways, across a population can be summarized in a phylogeny describing the different evolutionary pathways.
  • 58. Figure represents a cancer evolutionary tree.
10. The phylogenetic tree can be applied to find similarities among breast cancer subtypes based on gene data.
11. The discovery of genes associated with cancer subtypes helps researchers to map different pathways and to classify cancer subtypes according to their mutations.
12. Methods of phylogenetic tree inference have proliferated in cancer genome studies, such as those of breast cancer.
13. Phylogenetic trees can capture important mutational events among different cancer types; a network approach can also capture tumour similarities.
Figure represents a phylogenetic tree of mutations.
14. It has been observed in the literature that in cancer the driver genes change the cancer progression and even affect the participation of other genes, thus generating a gene interaction network.
  • 59. Figure represents a phylogenetic tree and a gene interaction network.
15. Phylogenetic methods can solve the problem of class prediction by using a classification tree. Phylogenetic methods give us a deeper understanding of biological heterogeneity among cancer subtypes.