The document outlines the basic steps in constructing a phylogenetic tree:
1) Assembling and aligning a dataset of DNA or protein sequences of interest.
2) Using computational methods and evolutionary models to build phylogenetic trees from the sequence alignments.
3) Statistically testing and assessing the estimated trees to evaluate which tree topologies best describe the phylogenetic relationships between the sequences.
The process aims to provide a visual representation of how organisms have evolved from a common ancestor over time based on analyses of genetic similarities and differences in their molecular sequences.
4. How To Construct A Phylogenetic Tree
A phylogenetic tree is a visual representation of the relationship between
different organisms, showing the path through evolutionary time from a
common ancestor to different descendants.
Similarities and divergence among related biological sequences revealed by
sequence alignment often have to be rationalized and visualized in the
context of phylogenetic trees. Thus, molecular phylogenetics is a
fundamental aspect of bioinformatics.
Molecular phylogenetics is the branch of phylogeny that analyzes genetic,
hereditary molecular differences, predominantly in DNA sequences, to gain
information on an organism’s evolutionary relationships.
5. How To Construct A Phylogenetic Tree
The similarity of biological functions and molecular mechanisms in living
organisms strongly suggests that species descended from a common ancestor.
Molecular phylogenetics uses the structure and function of molecules and how
they change over time to infer these evolutionary relationships.
From these analyses, it is possible to determine the processes by which
diversity among species has been achieved. The result of a molecular
phylogenetic analysis is expressed in a phylogenetic tree.
6. Assemble and align a dataset
• The first step is to identify a protein or DNA sequence of interest and assemble
a dataset consisting of other related sequences.
• DNA sequences of interest can be retrieved using NCBI BLAST or similar
search tools.
• Once sequences are selected and retrieved, a multiple sequence alignment is
created.
• This involves arranging a set of sequences in a matrix to identify regions of
homology.
• There are many websites and software programs, such as ClustalW, MSA,
MAFFT, and T-Coffee, designed to perform multiple sequence alignment on a
given set of molecular data.
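As a minimal sketch of what "arranging a set of sequences in a matrix" means, the Python snippet below treats an already aligned set of sequences as the rows of a matrix and reports the columns in which every sequence carries the same residue. The toy sequences and the function name conserved_columns are made up for illustration; a real alignment would come from a tool such as ClustalW or MAFFT.

```python
def conserved_columns(alignment):
    """Return indices of columns where all rows share one non-gap residue."""
    length = len(alignment[0])
    # Every row of an alignment has the same length (gaps are written as '-').
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = {row[i] for row in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment (not real data).
toy_alignment = [
    "ACGT-ACGA",
    "ACGTTACGA",
    "ACCT-ACTA",
]
print(conserved_columns(toy_alignment))
```

Columns that vary, or that contain gaps, are the ones that carry information about how the sequences have diverged.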
7. Build (estimate) phylogenetic trees
Build (estimate) phylogenetic trees from sequences using computational
methods and stochastic models
• To build phylogenetic trees, statistical methods are applied to determine the
tree topology and calculate the branch lengths that best describe the
phylogenetic relationships of the aligned sequences in a dataset.
• The most common computational methods applied include distance-matrix
methods, and discrete data methods, such as maximum parsimony and
maximum likelihood.
• Several software packages, such as PAUP*, PAML, and PHYLIP, implement
these popular methods.
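To illustrate how maximum parsimony compares candidate topologies, the sketch below counts the minimum number of character changes that one alignment column requires on a given rooted binary tree. It uses Fitch's algorithm, which the slides do not name and which is chosen here only as a common textbook approach. The tree encoding (nested tuples), the function name fitch, and the site pattern are all assumptions for illustration.

```python
def fitch(tree, states):
    """Return (state_set, changes) for one alignment column on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left, right) of subtrees
    states: dict mapping each leaf name to the character observed at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change is needed.
        return common, left_cost + right_cost
    # Children disagree: keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical site pattern for four taxa.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(fitch(tree_1, site)[1], fitch(tree_2, site)[1])
```

Under parsimony, the topology that needs fewer changes summed over all columns is preferred; here tree_1 explains this site with fewer changes than tree_2.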
8. Statistically test and assess the estimated trees
Tree estimating algorithms generate one or more optimal trees.
This set of possible trees is subjected to a series of statistical tests to
evaluate whether one tree is better than another – and if the proposed
phylogeny is reasonable.
Common methods for assessing trees include the Bootstrap and Jackknife
Resampling methods, and analytical methods, such as parsimony, distance,
and likelihood.
9. Human Phylogenetic Tree
As you can see in the diagram, every individual (in this case)
descends from a common ancestor: your grandparent. The tree
then splits into two branches: your parent and your aunt
(your parent's sibling). You, your sibling, and your cousins
each have a unique history because you were born to different
parents, yet you all share your grandparent as a common
ancestor.
10. Animal Phylogenetic Tree
As you can see in the diagram, all animals have the same common
ancestor, but they are divided because of their different
characteristics. These characteristics are jaws, lungs,
gizzards, and feathers. Thus, these characteristics
differentiate between the animals mentioned in the diagram that
nevertheless have the same ancestor.
11. Programs
PHYLIP, MEGA, and PAUP* (pronounced ‘pop star’) are among the most
comprehensive and widely used phylogeny packages.
12. Process of constructing Tree
The whole process of constructing a phylogenetic tree is divided into five
steps:
Step 1: Choosing appropriate markers for the phylogenetic analysis
Step 2: Multiple sequence alignments
Step 3: Selection of an evolutionary model
Step 4: Phylogenetic reconstruction
Step 5: Evaluation of the phylogenetic tree
13. Step 1: Choosing appropriate markers for the phylogenetic analysis
Any biological information that can be used to infer the evolutionary
relationship among the taxa is known as a phylogenetic information marker.
Markers include DNA, RNA, and protein sequences, RFLP, AFLP, ISSR,
allozymes, and conserved intronic positions. Identifying conserved genetic
loci (coding or non-coding) is the first step in analyzing the phylogenetic
relationship.
Both coding (genes) and non-coding genetic regions can be used for the
analysis of phylogenetic relationships.
14. Step 2: Multiple sequence alignments
Aligning two sequences is known as pair-wise sequence alignment, while the
alignment that includes more than two sequences is known as multiple
sequence alignments.
The main aim of multiple sequence alignment is to compare three or more
nucleotide or protein sequences and to provide the basis for calculating
the sequence divergences used to infer the evolutionary relationship
among the taxa.
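As a rough illustration of how an alignment becomes "the basis for calculation" of divergence, the sketch below computes the proportion of differing sites (often called the p-distance) for every pair of aligned sequences. The taxon names, the toy sequences, and the choice to skip columns containing a gap are assumptions made for this example.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """alignment: dict of name -> aligned sequence. Returns a dict of dicts."""
    names = list(alignment)
    return {x: {y: p_distance(alignment[x], alignment[y]) for y in names} for x in names}


# Hypothetical aligned sequences (not real data).
toy = {"taxon1": "ACGTACGT", "taxon2": "ACGTACGA", "taxon3": "AC-TTCGA"}
matrix = distance_matrix(toy)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

A matrix like this is the input expected by distance-based tree-building methods.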
15. Global vs. Local Alignment
Global alignment algorithms optimize the overall alignment between two sequences
(Needleman-Wunsch).
Local alignment algorithms seek only relatively conserved pieces of sequence
(Smith-Waterman):
• Alignment stops at the ends of regions of strong similarity
• Favors finding conserved patterns in otherwise different pairs of sequences
Sequence 1 - LGPSSKQTGKGSSRIWDN
Sequence 2 - LNITKSAGKGAIMRLGDA
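The difference between the two approaches can be sketched with one dynamic programming routine that scores the two sequences above either globally (Needleman-Wunsch style) or locally (Smith-Waterman style). The scoring values (match +1, mismatch -1, gap -1) and the function name alignment_score are assumptions chosen for simplicity; real protein alignments typically use a substitution matrix and separate gap-opening and gap-extension penalties.

```python
def alignment_score(seq_a, seq_b, local, match=1, mismatch=-1, gap=-1):
    """Best alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style)
    local=True:  local alignment (Smith-Waterman style)
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


seq_1 = "LGPSSKQTGKGSSRIWDN"
seq_2 = "LNITKSAGKGAIMRLGDA"
global_score = alignment_score(seq_1, seq_2, local=False)
local_score = alignment_score(seq_1, seq_2, local=True)
print("global:", global_score, "local:", local_score)
```

With the same scoring scheme the local score can never be lower than the global one, because the local method is free to keep only the best-matching region, such as the shared GKG motif in these two sequences.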
17. Global alignment methods (both are progressive pairwise)
• ClustalW (most popular)
• PileUp (used in GCG package)
Local alignment methods
• Dialign
MSA methods used
PILEUP
• Sequences are aligned pairwise using a dynamic programming algorithm
• The pairwise scores are used to produce a phylogenetic tree, which is
then used to guide the alignment
• Result: a global alignment produced by the Needleman-Wunsch algorithm
• No gap modifications or sequence weighting; does not guarantee an
optimal alignment
• The final MSA depends on the initial pairwise alignments
• For closely related sequences
CLUSTAL
• Progressive Pairwise Alignment (PPA)
• Globally aligns the most similar sequences first
• Constructs a tree using neighbor-joining
• Aligns the sequences sequentially, guided by the phylogenetic
relationships
• Gap penalties can be adjusted (using other characteristics)
• It can re-align just selected sequences or selected regions in an
existing alignment
• It can compute phylogenetic trees from a set of aligned sequences
18. Step 3: Selection of an evolutionary model
Selection of an evolutionary model follows the multiple sequence alignment.
According to the neutral theory of evolution, most mutations are
neutral and can occur at a rate of 10^-6 to 10^-8.
Given this, any site in a DNA sequence may have undergone multiple
substitutions, in number proportional to the evolutionary time elapsed.
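One simple example of such a model is Jukes-Cantor (JC69). The slides do not name it; it is used here only as an illustration. It assumes equal base frequencies and equal substitution rates, and converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), which accounts for sites that changed more than once. The function name below is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site under JC69, given the observed
    proportion of differing sites p between two DNA sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


for p in (0.05, 0.20, 0.50):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as sequences become more divergent, which is why the choice of model matters most for distantly related taxa.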
19. Step 4: Phylogenetic reconstruction
Presently available programs use two different methodologies to generate
dendrograms:
(a) Clustering methods, in which the two most closely related taxa are placed
under a single internode and a third taxon is then added, treating the taxa
already within the internode as a single group. In this way, the program
progressively adds the remaining taxa to yield the final phylogenetic tree.
(b) Methods that generate a number of candidate trees, which grows with
the number of taxa in the analysis, and then select the best-fit tree
topology (highest likelihood or probability) for a given evolutionary
model.
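The clustering approach in (a) can be sketched with UPGMA, one well-known clustering method. The slides do not say which method they have in mind, so UPGMA is an assumption here, as are the function name, the nested-tuple tree format, and the toy distances. The sketch returns the topology only and does not compute branch lengths.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as nested tuples.

    names: list of taxon names
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
    """
    # Each cluster stores (subtree, number_of_leaves).
    clusters = {name: (name, 1) for name in names}
    d = {frozenset((a, b)): dist[frozenset((a, b))]
         for i, a in enumerate(names) for b in names[i + 1:]}
    next_id = 0
    while len(clusters) > 1:
        # Join the two closest clusters under a single internal node.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"node{next_id}"  # hypothetical label for the merged cluster
        next_id += 1
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del d[pair]
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical pairwise distances for four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 4, frozenset(("A", "D")): 6,
    frozenset(("B", "C")): 4, frozenset(("B", "D")): 6, frozenset(("C", "D")): 6,
}
tree = upgma(names, dist)
print(tree)
```

A and B are joined first because they are closest, then C is added to that group, and D is added last, mirroring the progressive addition of taxa described in (a).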
20. Step 5: Evaluating the phylogenetic tree
After successful construction of the phylogenetic tree, the next step involves
evaluation of the tree topology. This can be performed using two
evaluation methods, namely the bootstrap method and the interior-branch test.
The basic concept of the bootstrap method is to evaluate the tree topology by
resampling the alignment into a user-defined number of pseudo-replicate
datasets and constructing a phylogenetic tree from each one.
Groupings that recur across the replicate trees are considered well
supported.
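The resampling step of the bootstrap can be sketched as follows: each pseudo-replicate is built by drawing alignment columns at random, with replacement, until it has as many columns as the original alignment. The function name, the fixed random seed, and the toy alignment are assumptions for illustration, and the tree-building step that would follow each replicate is left out.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Yield pseudo-replicate alignments by sampling columns with replacement.

    alignment: dict of name -> aligned sequence (all the same length)
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        # Draw as many column indices as the alignment has columns.
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][i] for i in columns) for name in names}


# Hypothetical aligned sequences (not real data).
toy = {"taxon1": "ACGTACGT", "taxon2": "ACGTACGA", "taxon3": "ACCTTCGA"}
replicates = list(bootstrap_replicates(toy, n_replicates=100))
print(replicates[0])
```

In a full analysis, a tree would be estimated from every replicate, and the support for a grouping would be reported as the fraction of replicate trees that contain it.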
21. Bioinformatics Tools for Phylogenetic Analysis
• There are several bioinformatics tools and databases that can be used for
phylogenetic analysis.
• These include PANTHER, P-Pod, Pfam, TreeFam, and the PhyloFacts
structural phylogenomic encyclopedia.
• Each of these databases uses different algorithms and draws on different
sources for sequence information, and therefore the trees estimated by
PANTHER, for example, may differ significantly from those generated by P-Pod
or Pfam.
• As with all bioinformatics tools of this type, it is important to test different
methods, compare the results, then determine which database works best
(according to consensus results) for studies involving different types of
datasets.
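One simple way to compare trees obtained from different sources is to list the groupings (clades) each rooted tree contains and measure how many they share. The sketch below does this for trees written as nested tuples. The function names, the tree format, and the two example trees are assumptions for illustration, and the measure is a simplified stand-in for the tree distances implemented in dedicated software.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a rooted tree
    written as nested tuples, e.g. (("A", "B"), "C")."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def shared_fraction(tree_a, tree_b):
    """Fraction of all clades seen in either tree that appear in both."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b) / len(a | b)


# Hypothetical trees for the same four taxa from two different tools.
tree_x = ((("A", "B"), "C"), "D")
tree_y = ((("A", "C"), "B"), "D")
print(shared_fraction(tree_x, tree_y))
```

A value of 1.0 means the two trees contain exactly the same groupings; lower values flag disagreements that deserve a closer look before settling on a consensus.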