phy prAC.pptx

CONSTRUCTION OF
PHYLOGENETIC TREE
SWORNA KUMARI.C
PhD., Biotechnology
Hexacara Lifesciences pvt ltd

The basic steps in any phylogenetic
analysis

How To Construct A Phylogenetic Tree
 A phylogenetic tree is a visual representation of the relationship between
different organisms, showing the path through evolutionary time from a
common ancestor to different descendants.
 Similarities and divergence among related biological sequences revealed by
sequence alignment often have to be rationalized and visualized in the
context of phylogenetic trees. Thus, molecular phylogenetics is a
fundamental aspect of bioinformatics.
 Molecular phylogenetics is the branch of phylogeny that analyzes genetic,
hereditary molecular differences, predominantly in DNA sequences, to gain
information on an organism’s evolutionary relationships.

How To Construct A Phylogenetic Tree
 The similarity of biological functions and molecular mechanisms in living
organisms strongly suggests that species descended from a common ancestor.
Molecular phylogenetics uses the structure and function of molecules and how
they change over time to infer these evolutionary relationships.
 From these analyses, it is possible to determine the processes by which
diversity among species has been achieved. The result of a molecular
phylogenetic analysis is expressed in a phylogenetic tree.

Assemble and align a dataset
• The first step is to identify a protein or DNA sequence of interest and assemble
a dataset consisting of other related sequences.
• DNA sequences of interest can be retrieved using NCBI BLAST or similar
search tools.
• Once sequences are selected and retrieved, multiple sequence alignment is
created.
• This involves arranging a set of sequences in a matrix to identify regions of
homology.
• There are many websites and software programs, such as ClustalW, MSA,
MAFFT, and T-Coffee, designed to perform multiple sequence alignment on a
given set of molecular data.

Build (estimate) phylogenetic trees
 Build (estimate) phylogenetic trees from sequences using computational
methods and stochastic models
• To build phylogenetic trees, statistical methods are applied to determine the
tree topology and calculate the branch lengths that best describe the
phylogenetic relationships of the aligned sequences in a dataset.
• The most common computational methods applied include distance-matrix
methods, and discrete data methods, such as maximum parsimony and
maximum likelihood.
• Several software packages, such as PAUP*, PAML, and PHYLIP, implement
these methods.
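
As an illustration of a discrete data method, the sketch below scores trees by maximum parsimony using Fitch's counting rule: for each alignment column it counts the minimum number of state changes a given tree requires, and the tree with the lower total is preferred. The four-taxon alignment and both candidate trees are hypothetical examples, not data from these slides.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one column.

    tree is a leaf name or a pair (left_subtree, right_subtree);
    states maps each leaf name to its character in this column.
    """
    if isinstance(tree, str):                      # leaf: state is observed
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                                     # children agree: no extra change
        return shared, left_changes + right_changes
    # children disagree: one change is needed on this part of the tree
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical toy alignment and two candidate topologies.
toy_alignment = {"A": "AAGT", "B": "AAGC", "C": "ACGC", "D": "ACTC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, toy_alignment))  # 3
print(parsimony_score(tree_2, toy_alignment))  # 4
```

Here tree_1 needs fewer changes, so parsimony would prefer it. Real packages also search over many topologies rather than comparing two by hand.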

Statistically test and assess the
estimated trees.
 Tree estimating algorithms generate one or more optimal trees.
 This set of possible trees is subjected to a series of statistical tests to
evaluate whether one tree is better than another – and if the proposed
phylogeny is reasonable.
 Common methods for assessing trees include the Bootstrap and Jackknife
Resampling methods, and analytical methods, such as parsimony, distance,
and likelihood.

Human Phylogenetic Tree
 As you can see in the diagram, every individual (in this
case) has a common ancestor, and that is your grandparent.
The tree then splits into two branches: your parent and
your aunt (your parent's sibling). You and your sibling
descend from one branch and your cousins from the other,
so the two groups have distinct histories, yet all of you
share your grandparent as a common ancestor.
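
The family example can be written down as a small data structure. The sketch below stores each individual's parent and walks upward to find the most recent common ancestor of two individuals. The names and the single "cousin" entry (standing in for "your cousins") are illustrative assumptions.

```python
# Child -> parent links for the family example on this slide.
PARENT = {
    "parent": "grandparent",
    "aunt": "grandparent",
    "you": "parent",
    "sibling": "parent",
    "cousin": "aunt",
}


def lineage(node):
    """Return the path from a node up to the root, starting with the node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_recent_common_ancestor(a, b):
    """Return the first node on a's lineage that is also on b's lineage."""
    lineage_of_b = set(lineage(b))
    for node in lineage(a):
        if node in lineage_of_b:
            return node
    return None


print(most_recent_common_ancestor("you", "sibling"))  # parent
print(most_recent_common_ancestor("you", "cousin"))   # grandparent
```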

Animal Phylogenetic Tree
As you can see in the diagram, all animals have the same common
ancestor, but they are divided into groups by their different
characteristics. These characteristics are jaws, lungs,
gizzards, and feathers. Thus, these characteristics
distinguish the animals shown in the diagram, which
nevertheless have the same ancestor.

Programs
 PHYLIP, MEGA, and PAUP* (pronounced ‘pop star’) are the most comprehensive
and widely used phylogeny packages

Process of constructing Tree
 The whole process of construction of the phylogenetic tree is divided into five
different steps,
 Step 1: Choosing an appropriate marker for the phylogenetic analysis
 Step 2: Multiple sequence alignments
 Step 3: Selection of an evolutionary model
 Step 4: Phylogenetic reconstruction
 Step 5: Evaluation of the phylogenetic tree

Step 1: Choosing an appropriate marker for the
phylogenetic analysis
 Any biological information that can be used to infer the evolutionary
relationship among the taxa is known as a phylogenetic information marker.
 Markers include DNA, RNA, protein, RFLP, AFLP, ISSR, allozymes, and
conserved intronic positions. Identification of conserved genetic loci
(coding or non-coding) is the first step in analyzing the phylogenetic
relationship.
 Both coding (genes) and non-coding genetic regions can be used for the
analysis of phylogenetic relationships.

Step 2: Multiple sequence alignments
 Aligning two sequences is known as pair-wise sequence alignment, while the
alignment that includes more than two sequences is known as multiple
sequence alignments.
 The main aim of multiple sequence alignment is to compare three or more
nucleotide or protein sequences and to provide the basis for calculating
sequence diversity/divergence to infer the evolutionary relationship
among the taxa.
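
To show how an alignment becomes the basis for divergence calculations, the sketch below computes the proportion of differing sites (p-distance) for every pair of aligned sequences. The three aligned sequences and the choice to skip gapped columns are assumptions made for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue                    # gapped column: not comparable
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return a nested dict: matrix[name_a][name_b] = p-distance."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names}
            for a in names}


# Hypothetical aligned sequences ("-" marks a gap).
example_alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACG-TCGA",
}

matrix = distance_matrix(example_alignment)
print(matrix["taxon1"]["taxon2"])  # 0.125 (1 difference in 8 sites)
```

A matrix like this is the input to the distance-matrix tree-building methods.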

Global vs. Local Alignment
 Global alignment algorithms optimize the overall alignment between two sequences
(Needleman-Wunsch)
 Local alignment algorithms seek only relatively conserved pieces of sequence
(Smith-Waterman)
 Alignment stops at the ends of regions of strong similarity
 Favors finding conserved patterns in otherwise different pairs of sequences
Sequence 1 - LGPSSKQTGKGSSRIWDN
Sequence 2 - LNITKSAGKGAIMRLGDA
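
The difference between the two approaches can be sketched with score-only versions of both algorithms. The scoring values (match +1, mismatch -1, gap -2) are simple assumptions for illustration; real protein alignments normally use a substitution matrix and affine gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]      # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                             # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings (never below 0)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # 0 restarts the alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


seq1 = "LGPSSKQTGKGSSRIWDN"
seq2 = "LNITKSAGKGAIMRLGDA"
print("global:", global_score(seq1, seq2))
print("local: ", local_score(seq1, seq2))
```

The only structural difference is the zero floor in the local version, which lets an alignment start and stop wherever similarity is strong.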

Global alignment methods (both are progressive pairwise)
• ClustalW (most popular)
• PileUp (used in GCG package)
Local alignment methods
• Dialign
MSA – methods used....
PILEUP
 Sequences are aligned pairwise using a
dynamic programming algorithm
 The pairwise scores are used to produce a
phylogenetic tree
 The tree is then used to guide the alignment
 Result: a global alignment produced by
the Needleman-Wunsch algorithm
 No gap modifications or sequence
weighting
 Does not guarantee an optimal alignment
 The final MSA depends on the
initial pairwise alignments
 For closely related sequences -
CLUSTAL
CLUSTAL
 Progressive Pairwise Alignment
(PPA)
 globally align most similar
sequences first
 construct a tree using neighbor-
joining
 align the sequences sequentially,
guided by the phylogenetic
relationships
 Gap penalties can be adjusted
(using other characteristics)
 It can re-align just selected
sequences or selected regions in an
existing alignment
 It can compute phylogenetic trees
from a set of aligned sequences.

Step 3: Selection of an evolutionary
model
 Selection of an evolutionary model follows the multiple sequence alignment.
 According to the neutral theory of evolution, most mutations are
neutral and occur at a rate of 10^-6 to 10^-8.
 Given this, any site in a DNA sequence may have undergone multiple
substitutions, in proportion to the evolutionary time elapsed. An evolutionary
model accounts for these substitutions when divergence is estimated from the
aligned sequences.
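
As one concrete example of an evolutionary model, the sketch below applies the Jukes-Cantor correction, which converts the observed proportion of differing sites into an estimated number of substitutions per site. Jukes-Cantor is not named in these slides; it is used here only because it is the simplest nucleotide model (equal base frequencies and equal substitution rates).

```python
import math


def jukes_cantor_distance(p):
    """Estimate substitutions per site from the observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# The corrected distance exceeds the observed one because some sites
# have changed more than once and the extra changes are hidden.
for observed in (0.05, 0.10, 0.30, 0.60):
    print(observed, round(jukes_cantor_distance(observed), 4))
```

The gap between observed and corrected distance widens as sequences diverge, which is why the choice of model matters most for distantly related taxa.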

Step 4: Phylogenetic reconstruction
 The currently available programs use two different approaches to
generate dendrograms:
 (a) Clustering methods, in which the two most closely related taxa are placed
under a single internal node; a third taxon is then added, treating the taxa
already joined as a single group. The program progressively adds the
remaining taxa in this way to yield the final phylogenetic tree.
 (b) The second type of method generates a number of candidate trees that grows
with the number of taxa in the analysis and then selects the best-fitting tree
topology (the one with the highest likelihood or probability) for a
given evolutionary model.
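
The clustering approach in (a) can be sketched with UPGMA (average-linkage clustering), one common clustering method. The distance values below are hypothetical, and the sketch returns only the topology as nested tuples, without branch lengths.

```python
from itertools import combinations


def upgma(names, distances):
    """Build a tree by repeatedly joining the two closest clusters.

    distances maps frozenset({name_a, name_b}) to a pairwise distance.
    Returns the tree as nested tuples of taxon names.
    """
    sizes = {name: 1 for name in names}      # cluster -> number of taxa inside it
    dist = dict(distances)
    while len(sizes) > 1:
        # Join the closest pair of clusters under a single internal node.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        # Treat the joined taxa as one group: its distance to every other
        # cluster is the size-weighted average of the two old distances.
        for other in sizes:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    sizes[a] * dist[frozenset((a, other))]
                    + sizes[b] * dist[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Hypothetical pairwise distances for four taxa.
example_distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

print(upgma(["A", "B", "C", "D"], example_distances))
# ('D', ('C', ('A', 'B')))
```

A and B are joined first, then C is added to that group, then D, matching the progressive addition described in (a).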

Step 5: Evaluating the phylogenetic tree
 After successful construction of the phylogenetic tree, the next step is to
evaluate the tree topology. Two evaluation methods are commonly used:
the bootstrap method and the interior-branch test.
 The bootstrap method evaluates the tree topology by resampling the columns
of the alignment with replacement to create pseudo-replicate datasets and
constructing a phylogenetic tree from each one.
 The user defines the number of pseudo-replicates, and one tree is built per
replicate. The proportion of replicate trees that recover a given grouping
indicates how well the data support it.
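
A minimal sketch of bootstrap resampling follows. To stay short, it does not build a full tree for each pseudo-replicate; as a stand-in it asks which two taxa are closest and reports how often a chosen pair comes out on top. The alignment, the closest_pair stand-in, and the function names are all assumptions for illustration.

```python
import random
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping each column intact."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the fewest differences."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(alignment, 2), key=differences))


def bootstrap_support(alignment, grouping, replicates=100, seed=1):
    """Fraction of pseudo-replicates in which `grouping` is the closest pair."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == grouping
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical aligned sequences.
boot_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "TCGAACTTGC",
    "D": "TCGTACATGG",
}

print(bootstrap_support(boot_alignment, frozenset({"A", "B"})))
```

In a real analysis, closest_pair would be replaced by the chosen tree-building method, and support would be reported for every internal branch.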

Bioinformatics Tools for Phylogenetic
Analysis
• There are several bioinformatics tools and databases that can be used for
phylogenetic analysis.
• These include PANTHER, P-POD, Pfam, TreeFam, and the PhyloFacts
structural phylogenomic encyclopedia.
• Each of these databases uses different algorithms and draws on different
sources for sequence information, and therefore the trees estimated by
PANTHER, for example, may differ significantly from those generated by P-POD
or Pfam.
• As with all bioinformatics tools of this type, it is important to test different
methods, compare the results, then determine which database works best
(according to consensus results) for studies involving different types of
datasets.
