UPGMA
Presented By
Shreya Gopinath
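Phylogenetic tree construction
Two classes of methods:
• Distance-based methods: first reduce the aligned sequences to a matrix of pairwise distances, then build the tree from that matrix.
Examples: UPGMA, Neighbor joining, Fitch-Margoliash method, Minimum evolution
• Character-based methods: work directly on the aligned characters (sites).
Input: Aligned sequences
Output: Phylogenetic tree
Examples: Parsimony, Maximum likelihood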
UPGMA
UPGMA: Unweighted Pair Group Method with Arithmetic Mean
Developed by Sokal and Michener in 1958.
It is a sequential (agglomerative) clustering method.
It is a distance-based method for phylogenetic tree construction.
UPGMA is one of the simplest methods for constructing trees.
Generates rooted trees.
Generates ultrametric trees from a distance matrix.
Uses a simple algorithm.
Input: Distance matrix containing pairwise distance estimates between aligned sequences
Output: Phylogenetic tree
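To make the input concrete, here is a minimal Python sketch that turns aligned sequences into a pairwise distance matrix. It assumes the simple p-distance (the fraction of compared sites that differ) and skips any site where either sequence has a gap; both choices, and the function names `p_distance` and `distance_matrix`, are illustrative assumptions rather than part of UPGMA itself. Real analyses often apply an evolutionary correction (for example Jukes-Cantor) before clustering.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Fraction of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap ('-') are skipped; this gap
    treatment is an assumption made for the sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def distance_matrix(seqs):
    """Symmetric pairwise p-distances keyed by (name1, name2)."""
    names = list(seqs)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = p_distance(seqs[x], seqs[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


if __name__ == "__main__":
    aligned = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACG-TCGA"}
    d = distance_matrix(aligned)
    print(d[("s1", "s2")], d[("s1", "s3")], d[("s2", "s3")])
```

The resulting matrix (here keyed by name pairs) is exactly the kind of input UPGMA starts from.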
• UPGMA starts with a matrix of pairwise distances.
• Each sample is initially treated as its own 'cluster'.
• All clusters start out attached to a star-like tree.
• The algorithm constructs a rooted tree that reflects the structure present in the pairwise distance matrix.
• At each step, the two nearest clusters are combined into a higher-level cluster.
• It assumes an ultrametric tree, in which the distances from the root to every branch tip are equal.
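Whether a distance matrix is itself ultrametric can be checked with the three-point condition: in every triple of taxa, the two largest of the three pairwise distances must be equal. The minimal sketch below (hypothetical helpers `is_ultrametric` and `from_rows`, with a floating-point tolerance chosen for illustration) applies it to the matrix from the worked example later in this presentation. That matrix fails the test (for A, C and D the distances are 3, 5 and 6), so any UPGMA tree can only approximate it; the tip-to-tip distances measured along the example's final tree, by contrast, pass.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances tie."""
    for x, y, z in combinations(names, 3):
        _, middle, largest = sorted((dist[x][y], dist[x][z], dist[y][z]))
        if largest - middle > tol:
            return False
    return True


def from_rows(names, rows):
    """Nested dict dist[x][y] from a square matrix given row by row."""
    return {x: dict(zip(names, row)) for x, row in zip(names, rows)}


NAMES = ["A", "B", "C", "D", "E", "F"]

# Input matrix of the worked example.
EXAMPLE = from_rows(NAMES, [
    [0, 1, 3, 6, 7, 10],
    [1, 0, 3, 6, 7, 10],
    [3, 3, 0, 5, 6, 9],
    [6, 6, 5, 0, 1, 7],
    [7, 7, 6, 1, 0, 8],
    [10, 10, 9, 7, 8, 0],
])

# Tip-to-tip distances along the example's final tree
# (twice the depth of each pair's last common node).
TREE_DISTANCES = from_rows(NAMES, [
    [0, 1, 3, 6, 6, 8.5],
    [1, 0, 3, 6, 6, 8.5],
    [3, 3, 0, 6, 6, 8.5],
    [6, 6, 6, 0, 1, 8.5],
    [6, 6, 6, 1, 0, 8.5],
    [8.5, 8.5, 8.5, 8.5, 8.5, 0],
])

if __name__ == "__main__":
    print(is_ultrametric(NAMES, EXAMPLE))         # False
    print(is_ultrametric(NAMES, TREE_DISTANCES))  # True
```

UPGMA still runs on a non-ultrametric matrix; it simply returns the ultrametric tree that its averaging rule produces.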
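UPGMA Algorithm
Steps
1. Find the pair of groups i and j with the smallest distance Dij.
2. Create a new group (ij) with n(ij) = ni + nj members.
3. Connect i and j on the tree to a new node (ij), placed at depth Dij/2.
4. Give the edge from i to (ij) the length Dij/2 - hi, where hi is the depth of node i (0 for a single sequence), and likewise for j. When i and j are both single sequences, the two edges have the same length, Dij/2.
5. Compute the distance between the new group and every other group k (k ≠ i, j) as the size-weighted average

$$D_{(ij),k} = \frac{n_i D_{ik} + n_j D_{jk}}{n_i + n_j}$$

This equals the mean distance between all sequences in (ij) and all sequences in k. Using the plain average (Dik + Djk)/2 at every merge instead gives the related WPGMA method; the two agree whenever ni = nj.
6. Delete the rows and columns for i and j and add one for (ij). If two or more groups remain, go back to step 1.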
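The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by any of the tools listed below; the function name `upgma`, the nested-dictionary matrix format, the `size_weighted` switch (which swaps in the plain-average WPGMA update for comparison), and the convention of naming a merged group by its sorted member labels joined with "/" are all assumptions made for this sketch. Node depths are tracked so that each edge length is the new node's depth minus the child's depth.

```python
def upgma(names, dist, size_weighted=True):
    """Build a rooted ultrametric tree from a distance matrix.

    names: taxon labels (assumed not to contain '/').
    dist:  nested dict, dist[x][y] = distance between taxa x and y.
    size_weighted=True  -> UPGMA update (n_i*D_ik + n_j*D_jk) / (n_i + n_j)
    size_weighted=False -> WPGMA update (D_ik + D_jk) / 2
    Returns (newick, merges) with merges = [(group_i, group_j, depth), ...].
    """
    # group label -> (Newick subtree, number of taxa, depth of its node)
    groups = {n: (n, 1, 0.0) for n in names}
    # Working copy of the matrix, so the caller's dict is left untouched.
    d = {x: {y: dist[x][y] for y in names if y != x} for x in names}
    merges = []
    while len(groups) > 1:
        # Step 1: closest pair; on ties the first pair in label order wins.
        labels = list(groups)
        i, j = min(
            ((labels[a], labels[b])
             for a in range(len(labels))
             for b in range(a + 1, len(labels))),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        # Steps 2-4: new node at depth D_ij / 2; edge = depth - child depth.
        depth = d[i][j] / 2
        tree_i, n_i, h_i = groups.pop(i)
        tree_j, n_j, h_j = groups.pop(j)
        subtree = f"({tree_i}:{depth - h_i:g},{tree_j}:{depth - h_j:g})"
        new = "/".join(sorted(i.split("/") + j.split("/")))
        # Step 5: distance from the new group to every remaining group k.
        d[new] = {}
        for k in groups:
            if size_weighted:
                dk = (n_i * d[i][k] + n_j * d[j][k]) / (n_i + n_j)
            else:
                dk = (d[i][k] + d[j][k]) / 2
            d[new][k] = d[k][new] = dk
            del d[k][i], d[k][j]  # Step 6: drop old columns i and j
        del d[i], d[j]            # Step 6: drop old rows i and j
        groups[new] = (subtree, n_i + n_j, depth)
        merges.append((i, j, depth))
    tree = next(iter(groups.values()))[0]
    return tree + ";", merges


if __name__ == "__main__":
    names = ["A", "B", "C", "D", "E", "F"]
    rows = [
        [0, 1, 3, 6, 7, 10],
        [1, 0, 3, 6, 7, 10],
        [3, 3, 0, 5, 6, 9],
        [6, 6, 5, 0, 1, 7],
        [7, 7, 6, 1, 0, 8],
        [10, 10, 9, 7, 8, 0],
    ]
    dist = {x: dict(zip(names, row)) for x, row in zip(names, rows)}
    for weighted in (True, False):
        tree, merges = upgma(names, dist, size_weighted=weighted)
        print("UPGMA" if weighted else "WPGMA", tree)
        for i, j, h in merges:
            print(f"  join {i} + {j} at depth {h:.4g}")
```

Run on the example matrix with `size_weighted=False`, the merge depths come out as 0.5, 0.5, 1.5, 3 and 4.25, the values in the worked example that follows. With the default size-weighted update the topology is the same, but the last two merges move to 37/12 ≈ 3.08 and 4.4.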
Computational tools
• MEGA
• PHYLIP
• MVSP
• MVSP87
• SAS
• SYN-TAX
• NTSYS
• DendroUPGMA
Advantages
Simple algorithm
Fast: a straightforward implementation runs in O(n³) time for n sequences
Easy to compute by hand or with a variety of software
Trees reflect overall (phenetic) similarity as measured by the pairwise distances
The order in which samples are entered does not affect the result (apart from how ties are broken)
Produces rooted trees that are easy to interpret
Disadvantages
It assumes a strict molecular clock: the same rate of evolution on all lineages
It can produce the wrong tree topology when rates differ among lineages
Re-rooting is not allowed: the root is placed by the final merge
The algorithm groups by overall similarity and does not aim to reflect evolutionary descent
Applications
• In ecology, it is one of the most popular methods for classifying sampling units (such as vegetation plots) on the basis of their pairwise similarities in relevant descriptor variables (such as species composition).[3]
• In bioinformatics, UPGMA is used to create phenetic trees (phenograms). UPGMA was initially designed for use in protein electrophoresis studies, but it is now most often used to produce guide trees for more sophisticated algorithms. For example, it is used in sequence alignment procedures, where it proposes one order in which the sequences will be aligned. The guide tree aims to group the most similar sequences, regardless of their evolutionary rate or phylogenetic affinities, and that is exactly the goal of UPGMA.[4]
• In phylogenetics, UPGMA assumes a constant rate of evolution (the molecular clock hypothesis), and it is not a well-regarded method for inferring relationships unless this assumption has been tested and justified for the data set being used.
Example
1. Calculate the pairwise distance matrix

      A    B    C    D    E    F
A     0    1    3    6    7   10
B     1    0    3    6    7   10
C     3    3    0    5    6    9
D     6    6    5    0    1    7
E     7    7    6    1    0    8
F    10   10    9    7    8    0
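Finding the pair to join is a scan for the smallest off-diagonal entry. The small sketch below (hypothetical helper `closest_pairs`, which also checks that the matrix is symmetric) returns every pair tied for the minimum, and shows that this matrix has two: A-B and D-E, both at distance 1. Either can be joined first; the other is joined in the next round, so the final tree is the same.

```python
def closest_pairs(names, rows):
    """Return (pairs, smallest): every pair (x, y), x listed before y, whose
    distance equals the smallest off-diagonal entry of a symmetric matrix."""
    n = len(names)
    for a in range(n):
        for b in range(n):
            if rows[a][b] != rows[b][a]:
                raise ValueError(f"matrix is not symmetric at {names[a]}, {names[b]}")
    candidates = [
        (names[a], names[b], rows[a][b])
        for a in range(n)
        for b in range(a + 1, n)
    ]
    smallest = min(d for _, _, d in candidates)
    pairs = [(x, y) for x, y, d in candidates if d == smallest]
    return pairs, smallest


if __name__ == "__main__":
    names = ["A", "B", "C", "D", "E", "F"]
    rows = [
        [0, 1, 3, 6, 7, 10],
        [1, 0, 3, 6, 7, 10],
        [3, 3, 0, 5, 6, 9],
        [6, 6, 5, 0, 1, 7],
        [7, 7, 6, 1, 0, 8],
        [10, 10, 9, 7, 8, 0],
    ]
    print(closest_pairs(names, rows))  # ([('A', 'B'), ('D', 'E')], 1)
```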
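2. Group the two most closely related sequences
The smallest entry in the matrix above is D(A,B) = 1, tied with D(D,E) = 1; either pair can go first, and this example takes A and B. They are joined at a new node at depth 1/2 = 0.5, so each branch has length 0.5.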
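3. Recalculate the distance matrix and take the next smallest distance

      A/B    C    D    E    F
A/B     0    3    6    7   10
C       3    0    5    6    9
D       6    5    0    1    7
E       7    6    1    0    8
F      10    9    7    8    0

Each A/B entry is the average of the A and B entries, e.g. D(A/B, C) = (3 + 3)/2 = 3. The next smallest distance is D(D,E) = 1, so D and E are joined with branches of 0.5 each.
Tree so far: A and B joined (0.5 each); D and E joined (0.5 each).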
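3. Recalculate the distance matrix and take the next smallest distance

       A/B     C   D/E     F
A/B      0     3   6.5    10
C        3     0   5.5     9
D/E    6.5   5.5     0   7.5
F       10     9   7.5     0

Tree so far: A and B joined (0.5 each); D and E joined (0.5 each). The next smallest distance is D(A/B, C) = 3, so C joins the A/B group at depth 3/2 = 1.5: the branch to C is 1.5 and the branch to the A/B node is 1.5 - 0.5 = 1.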
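The branch lengths so far all follow from one rule: the new node sits at half the joining distance, and each edge is that depth minus the depth of the child it connects (0 for a single sequence). A tiny hypothetical helper, `join`, replays the joins made up to this point using the distances from this example:

```python
def join(d_ij, depth_i, depth_j):
    """Return (depth of the new node, edge length to i, edge length to j).

    The new node sits at half the joining distance; each edge spans from the
    child's own depth (0 for a single sequence) up to that node.
    """
    depth = d_ij / 2
    return depth, depth - depth_i, depth - depth_j


if __name__ == "__main__":
    print(join(1, 0.0, 0.0))  # A + B    -> (0.5, 0.5, 0.5)
    print(join(1, 0.0, 0.0))  # D + E    -> (0.5, 0.5, 0.5)
    print(join(3, 0.5, 0.0))  # A/B + C  -> (1.5, 1.0, 1.5)
```

Note that the A/B edge (1.0) and the C edge (1.5) differ: equal edge lengths only occur when both children are single sequences.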
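3. Recalculate the distance matrix and take the next smallest distance

        A/B/C    D/E      F
A/B/C       0      6    9.5
D/E         6      0    7.5
F         9.5    7.5      0

Tree so far: C joined to the A/B node (branch to C 1.5, branch to the A/B node 1); D and E joined (0.5 each). The next smallest distance is D(A/B/C, D/E) = 6, so these two groups join at depth 6/2 = 3: the branch to the A/B/C node is 3 - 1.5 = 1.5 and the branch to the D/E node is 3 - 0.5 = 2.5.
Note: the A/B/C entries above are plain averages of the two merged groups, (6.5 + 5.5)/2 = 6 and (10 + 9)/2 = 9.5, which is the WPGMA rule. Size-weighted UPGMA counts A/B as two sequences and C as one, giving (2 × 6.5 + 5.5)/3 ≈ 6.17 and (2 × 10 + 9)/3 ≈ 9.67. The tree topology is the same either way; only the node depths change.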
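The difference between the two update rules can be checked against the definition of UPGMA, which takes the distance between two groups to be the mean of all original distances between their members. In the minimal sketch below (hypothetical helpers `mean_leaf_distance`, `upgma_update` and `wpgma_update`), the size-weighted update matches that mean, while the plain average gives the 6 and 9.5 shown in this matrix.

```python
from itertools import product

NAMES = ["A", "B", "C", "D", "E", "F"]
ROWS = [
    [0, 1, 3, 6, 7, 10],
    [1, 0, 3, 6, 7, 10],
    [3, 3, 0, 5, 6, 9],
    [6, 6, 5, 0, 1, 7],
    [7, 7, 6, 1, 0, 8],
    [10, 10, 9, 7, 8, 0],
]
DIST = {x: dict(zip(NAMES, row)) for x, row in zip(NAMES, ROWS)}


def mean_leaf_distance(group1, group2):
    """Average of the original distances over all cross-group sequence pairs."""
    pairs = list(product(group1, group2))
    return sum(DIST[x][y] for x, y in pairs) / len(pairs)


def upgma_update(d_ik, d_jk, n_i, n_j):
    """Size-weighted UPGMA distance from merged group (ij) to group k."""
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)


def wpgma_update(d_ik, d_jk):
    """Plain average used by WPGMA."""
    return (d_ik + d_jk) / 2


if __name__ == "__main__":
    for k in (["D", "E"], ["F"]):
        d_ab_k = mean_leaf_distance(["A", "B"], k)  # 6.5 for D/E, 10 for F
        d_c_k = mean_leaf_distance(["C"], k)        # 5.5 for D/E, 9 for F
        print("/".join(k),
              "UPGMA:", round(upgma_update(d_ab_k, d_c_k, 2, 1), 4),
              "WPGMA:", wpgma_update(d_ab_k, d_c_k),
              "mean over all pairs:", round(mean_leaf_distance(["A", "B", "C"], k), 4))
```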
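3. Recalculate the distance matrix and take the next smallest distance

            A/B/C/D/E     F
A/B/C/D/E           0   8.5
F                 8.5     0

Tree so far: the A/B/C and D/E groups joined at depth 3 (branches 1.5 and 2.5). The entry 8.5 is the plain average (9.5 + 7.5)/2. Only F remains, and it joins at depth 8.5/2 = 4.25: the branch to F is 4.25 and the branch to the A/B/C/D/E node is 4.25 - 3 = 1.25. Every tip of the finished tree is 4.25 from the root, as an ultrametric tree requires.
With size-weighted UPGMA averaging, this last distance would be the mean of the five original distances to F, (10 + 10 + 9 + 7 + 8)/5 = 8.8, placing the root at depth 4.4.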
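The finished tree can be written in Newick format and its ultrametric property checked directly. In this minimal sketch the tree from the slides is hand-encoded as nested lists of (child, branch length) pairs, a hypothetical representation chosen for illustration; `to_newick` prints it and `root_to_tip` confirms that every tip lies 4.25 from the root.

```python
# Final tree from the example: an internal node is a list of
# (child, branch_length) pairs; a leaf is just its label.
TREE = [
    ([
        ([
            ([("A", 0.5), ("B", 0.5)], 1.0),
            ("C", 1.5),
        ], 1.5),
        ([("D", 0.5), ("E", 0.5)], 2.5),
    ], 1.25),
    ("F", 4.25),
]


def to_newick(node):
    """Newick string for a subtree (without the trailing ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(f"{to_newick(child)}:{length:g}" for child, length in node) + ")"


def root_to_tip(node, depth=0.0):
    """Map every leaf label to its summed branch length from the root."""
    if isinstance(node, str):
        return {node: depth}
    tips = {}
    for child, length in node:
        tips.update(root_to_tip(child, depth + length))
    return tips


if __name__ == "__main__":
    print(to_newick(TREE) + ";")
    # ((((A:0.5,B:0.5):1,C:1.5):1.5,(D:0.5,E:0.5):2.5):1.25,F:4.25);
    print(root_to_tip(TREE))  # every tip at depth 4.25
```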