FBW 
4-11-2014 
Wim Van Criekinge
Class WILL be held on 4 November; NO class on 18 November
Phylogenetics 
Introduction 
Definitions 
Species concept 
Examples 
The Tree-of-life 
Phylogenetics Methodologies 
Algorithms 
Distance Methods 
Maximum Likelihood 
Maximum Parsimony 
Rooting 
Statistical Validation 
Conclusions 
Orthologous genes 
Horizontal Gene Transfer 
Phylogenomics 
Practical Approach: PHYLIP 
Weblems
What is phylogenetics ? 
Phylogeny (phylo = tribe + genesis) 
Phylogenetic trees are about visualising evolutionary 
relationships. They reconstruct the pattern of events 
that have led to the distribution and diversity of life. 
The purpose of a phylogenetic tree is to illustrate how a 
group of objects (usually genes or organisms) are 
related to one another 
Nothing in Biology Makes Sense Except in the Light of 
Evolution. Theodosius Dobzhansky (1900-1975)
Trees 
• Diagram consisting of branches and nodes 
• Species tree (how are my species related?) 
– contains only one representative from each 
species. 
– all nodes indicate speciation events 
• Gene tree (how are my genes related?) 
– normally contains a number of genes from a 
single species 
– nodes relate either to speciation or gene 
duplication events
Clade: A set of species which includes all of the species 
derived from a single common ancestor
Species Concepts from Various Authors 
D.A. Baum and K.L. Shaw - Exclusive groups of organisms, where an exclusive group is one whose members are all more closely related to 
each other than to any organisms outside the group. 
J. Cracraft - An irreducible cluster of organisms, diagnosably distinct from other such clusters, and within which there is a parental pattern of 
ancestry and descent. 
Charles Darwin - "From these remarks it will be seen that I look at the term species, as one arbitrarily given for the sake of convenience to a set 
of individuals closely resembling each other, and that it does not essentially differ from the term variety, which is given to less distinct and 
more fluctuating forms. The term variety, again, in comparison with mere individual differences, is also applied arbitrarily, and for mere 
convenience sake" (Origin of Species, 1st ed., p. 108). 
T. Dobzhansky - The largest and most inclusive reproductive community of sexual and cross-fertilizing individuals which share a common gene 
pool. And later...Systems of populations, the gene exchange between which is limited or prevented by reproductive isolating mechanisms. 
M. Ghiselin - The most extensive units in the natural economy, such that reproductive competition occurs among their parts. 
D.M. Lambert - Groups of individuals that define themselves by a specific mate recognition system. 
J. Mallet - Identifiable genotypic clusters recognized by a deficit of intermediates, both at single loci and at multiple loci. 
E. Mayr - Groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups. 
C.D. Michener - A group of organisms not itself divisible by phenetic gaps resulting from concordant differences in character states (except for 
morphs - such as sex, age, or caste), but separated by such phenetic gaps from other such units. 
H.E.H. Paterson - That most inclusive population of individual biparental organisms which share a common fertilization system. 
G.G. Simpson - A lineage of populations evolving with time, separately from others, with its own unique evolutionary role and tendencies. 
P.H.A. Sneath and R.R. Sokal - The smallest (most homogeneous) cluster that can be recognized upon some given criterion as being distinct 
from other clusters. 
A.R. Templeton - The most inclusive population of individuals having the potential for phenotypic cohesion through intrinsic cohesion 
mechanisms (genetic and/or demographic - i.e. ecological -exchangeability). 
E.O. Wiley - A single lineage of ancestor-descendant populations which maintains its identity from other such lineages and which has its own 
evolutionary tendencies and historical fate. 
S. Wright - A species in time and space is composed of numerous local populations, each one intercommunicating and intergrading with others.
Species 
I. Definitions: 
Species = the basic unit of classification 
> Three different ways to recognize species:
Plant Species 
Definitions: 
> Three different ways to recognize species: 
1) Morphological species = the smallest group that is 
consistently and persistently distinct (Clusters in 
morphospace) 
species are recognized initially on the basis of 
appearance; the individuals of one species look 
different from the individuals of another
Species 
Definitions: 
> Three different ways to recognize species: 
2) Biological species = a set of interbreeding or 
potentially interbreeding individuals that are 
separated from other species by reproductive 
barriers 
species are unable to interbreed
Species 
Definitions: 
> Three different ways to recognize species: 
3) Phylogenetic species = the boundary between 
reticulate (among interbreeding individuals) and 
divergent relationships (between lineages with no 
gene exchange)
Phylogenetic species 
[Figure: lineage diagram labelled "reticulate", "boundary" and "divergent"] 
recognized by the pattern of ancestor-descendant relationships
Species 
Definitions: 
> Three different ways to recognize species: 
4) Phylogenomics species = ability to transmit (and 
maintain) a (stable) gene pool 
Addresses the Anopheles genome topology 
variations
Branching Order in a Phylogenetic Tree 
• In the tree to the left, A and B share the most recent 
common ancestry. Thus, of the species in the tree, 
A and B are the most closely related. 
• The next most recent common ancestry is C with 
the group composed of A and B. Notice that the 
relationship of C is with the group containing A 
and B. In particular, C is not more closely related to 
B than to A. This can be emphasized by the 
following two trees, which are equivalent to each 
other:
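The equivalence of two such drawings can also be checked mechanically by comparing the groups of tips found below each internal node. The sketch below is only an illustration: writing rooted trees as nested Python tuples is an assumed encoding, and the helper names `clades` and `same_topology` are hypothetical, not taken from the lecture.

```python
def clades(tree):
    """Return (tips, groups): the tips below this node and the tip sets of all internal nodes."""
    if isinstance(tree, str):              # a tip (leaf) is written as a plain string
        return frozenset([tree]), set()
    tips = frozenset()
    groups = set()
    for child in tree:                     # an internal node is a tuple of children
        child_tips, child_groups = clades(child)
        tips |= child_tips
        groups |= child_groups
    groups.add(tips)                       # the group defined by this internal node
    return tips, groups


def same_topology(t1, t2):
    """Two rooted trees are equivalent if they define exactly the same groups."""
    return clades(t1)[1] == clades(t2)[1]


tree1 = (("A", "B"), "C")   # C joins the group (A, B)
tree2 = ("C", ("B", "A"))   # same tree, branches rotated
tree3 = (("A", "C"), "B")   # a genuinely different branching order

print(same_topology(tree1, tree2))  # True
print(same_topology(tree1, tree3))  # False
```

Rotating branches around a node changes the drawing but not the set of groups, which is why the first comparison succeeds.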
More definitions … 
Edge, Branch 
Branch node, internal node 
Leaves 
Tips 
external node 
• A common simplifying assumption is that the tree is bifurcating, 
meaning that each branch node has exactly two descendants. 
• The edges, taken together, are sometimes said to define the topology 
of the tree
Outgroups, rooted versus unrooted 
An unrooted reptilian phylogeny with an avian outgroup and 
the corresponding rooted phylogeny. The Ri represent modern 
reptiles; the Ai, inferred ancestors and the B a bird.
Some definitions …
Examples 
Phylogenetic methods may be used to 
solve crimes, test purity of products, and 
determine whether endangered species 
have been smuggled or mislabeled: 
– Vogel, G. 1998. HIV strain analysis debuts in 
murder trial. Science 282(5390): 851-853. 
– Lau, D. T.-W., et al. 2001. Authentication of 
medicinal Dendrobium species by the internal 
transcribed spacer of ribosomal DNA. Planta 
Med 67:456-460.
Examples 
– Epidemiologists use phylogenetic methods to 
understand the development of pandemics, 
patterns of disease transmission, and 
development of antimicrobial resistance or 
pathogenicity: 
• Basler, C.F., et al. 2001. Sequence of the 1918 
pandemic influenza virus nonstructural gene (NS) 
segment and characterization of recombinant viruses 
bearing the 1918 NS genes. PNAS, 98(5):2746-2751. 
• Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV 
transmission in a dental practice. Science 
256(5060):1165-1171. 
• Bacillus anthracis:
Examples 
• Conservation biologists may use these techniques to 
determine which populations are in greatest need of 
protection, and other questions of population structure: 
– Trepanier, T.L., and R.W. Murphy. 2001. The Coachella Valley 
fringe-toed lizard (Uma inornata): genetic diversity and 
phylogenetic relationships of an endangered species. Mol 
Phylogenet Evol 18(3):327-334. 
– Alves, M.J., et al. 2001. Mitochondrial DNA variation in the 
highly endangered cyprinid fish Anaecypris hispanica: 
importance for conservation. Heredity 87(Pt 4):463-473. 
• Pharmaceutical researchers may use phylogenetic 
methods to determine which species are most closely 
related to other medicinal species, thus perhaps sharing 
their medicinal qualities: 
– Komatsu, K., et al. 2001. Phylogenetic analysis based on 18S 
rRNA gene and matK gene sequences of Panax vietnamensis 
and five related species. Planta Med 67:461-465.
Tree-of-life
Some Important Dates in History 
Event                                 Time (billion yrs ago) 
Origin of the Universe                15 
Formation of the Solar System         4.6 
First Self-replicating System         3.5 
Prokaryotic-Eukaryotic Divergence     2.0 
Plant-Animal Divergence               1.0 
Invertebrate-Vertebrate Divergence    0.5 
Mammalian Radiation Beginning         0.1
Tree Of Life
What Sequence to Use ? 
• To infer relationships that span the 
diversity of known life, it is 
necessary to look at genes 
conserved through the billions of 
years of evolutionary divergence. 
• The gene must display an 
appropriate level of sequence 
conservation for the divergences of 
interest. 
• If there is too much change, then 
the sequences become 
randomized, and there is a limit to 
the depth of the divergences that 
can be accurately inferred. 
• If there is too little change (if the 
gene is too conserved), then there 
may be little or no change between 
the evolutionary branchings of 
interest, and it will not be possible to 
infer close (genus or species level) 
relationships. 
What Sequence to Use ?
Ribosomal RNA Genes and Their Sequences 
Carl Woese 
recognized the full potential of rRNA 
sequences as a measure of phylogenetic 
relatedness. He initially used an RNA 
sequencing method that determined about 
1/4 of the nucleotides in the 16S rRNA (the 
best technology available at the time). This 
amount of data greatly exceeded anything 
else then available. Using newer methods, 
it is now routine to determine the 
sequence of the entire 16S rRNA 
molecule. Today, the accumulated 16S 
rRNA sequences (about 10,000) constitute 
the largest body of data available for 
inferring relationships among organisms.
Examples of genes in this category are 
those that encode the ribosomal RNAs 
(rRNAs). Most prokaryotes have three 
rRNAs, called the 5S, 16S and 23S 
rRNA. 
What Sequence to Use ? 
Name(a)   Size (nucleotides)   Location 
5S        120                  Large subunit of ribosome 
16S       1500                 Small subunit of ribosome 
23S       2900                 Large subunit of ribosome 
(a) The name is based on the rate at which the 
molecule sediments (sinks) in water. 
Bigger molecules sediment faster than small 
ones.
Ribosomal RNA Genes and Their Sequences 
The extraordinary conservation of rRNA genes can 
be seen in these fragments of the small subunit 
rRNA gene sequences from organisms spanning 
the known diversity of life: 
human ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG... 
yeast ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG... 
Corn ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAG... 
Escherichia coli ...GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG... 
Anacystis nidulans ...GTGCCAGCAGCCGCGGTAATACGGGAGAGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCG... 
Thermotoga maritima ...GTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTACCCGGATTTACTGGGCGTAAAGGG... 
Methanococcus vannielii ...GTGCCAGCAGCCGCGGTAATACCGACGGCCCGAGTGGTAGCCACTCTTATTGGGCCTAAAGCG... 
Thermococcus celer ...GTGGCAGCCGCCGCGGTAATACCGGCGGCCCGAGTGGTGGCCGCTATTATTGGGCCTAAAGCG... 
Sulfolobus solfataricus ...GTGTCAGCCGCCGCGGTAATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCG...
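The degree of conservation can be put into numbers by counting identical positions. The sketch below uses three of the fragments exactly as printed above and assumes they can be compared position by position without gaps (they have the same length, but that is an assumption about the alignment, not something the slide states). The function name `identity` is illustrative.

```python
# Fragments copied from the slide (human, yeast, Escherichia coli)
human = "GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG"
yeast = "GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG"
ecoli = "GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG"


def identity(a, b):
    """Fraction of positions at which two equal-length sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


for name, seq in (("yeast", yeast), ("E. coli", ecoli)):
    print(f"human vs {name}: {identity(human, seq):.1%} identical")
```

The first 20 positions are shared by all three fragments, which illustrates why such regions are useful anchors when comparing very distant organisms.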
Other genes …
• Rate of evolution = rate of mutation 
• rate of evolution for any macromolecule is 
approximately constant over time (Neutral 
Theory of evolution) 
• For a given protein the rate of sequence 
evolution is approximately constant across 
lineages. Zuckerkandl and Pauling (1965) 
• This would allow speciation and duplication 
events to be dated accurately based on 
molecular data 
Molecular Clock (MC)
Novel trees using Hox genes
• (a) A traditional phylogenetic tree and 
• (b) the new phylogenetic tree, each showing the 
positions of selected phyla. B, bilateria; AC, 
acoelomates; PC, pseudocoelomates; C, 
coelomates; P, protostomes; L, lophotrochozoa; E, 
ecdysozoa; D, deuterostomes.
• Local and approximate molecular 
clocks more reasonable 
– one amino acid substitution per 14.5 My 
– 1.3 × 10^-9 substitutions/nucleotide site/year 
– Relative rate test (see further) 
• ((A,B),C) then measure distance between 
(A,C) & (B,C) 
Molecular Clock (MC)
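As a rough illustration of how a clock is used, the sketch below converts a pairwise distance into a divergence time and computes the quantity compared in a relative rate test. The rate is the value quoted on the slide; the distances in the example are made up, and the function names are hypothetical. A strict clock is assumed, which the slide itself cautions is only locally and approximately reasonable.

```python
RATE = 1.3e-9  # substitutions per nucleotide site per year (value quoted on the slide)


def divergence_time(distance, rate=RATE):
    """Years since two lineages split, assuming a strict molecular clock.

    distance is the number of substitutions per site separating the two sequences.
    Both lineages accumulate changes after the split, hence the factor 2.
    """
    return distance / (2.0 * rate)


def relative_rate_difference(d_ac, d_bc):
    """Relative rate test for ((A,B),C): under a clock d(A,C) and d(B,C) should be equal.

    A value far from zero suggests that A and B evolved at different rates.
    """
    return d_ac - d_bc


# Made-up example: two sequences differing by 0.026 substitutions per site
print(f"{divergence_time(0.026):.3g} years")      # roughly ten million years
print(relative_rate_difference(0.30, 0.25))       # positive: A looks faster than B
```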
Proteins evolve at highly different rates 
Protein                          Rate of Change       Theoretical Lookback Time 
                                 (PAMs / 100 myrs)    (myrs) 
Pseudogenes                      400                  45 
Fibrinopeptides                  90                   200 
Lactalbumins                     27                   670 
Lysozymes                        24                   850 
Ribonucleases                    21                   850 
Haemoglobins                     12                   1500 
Acid proteases                   8                    2300 
Cytochrome c                     4                    5000 
Glyceraldehyde-P dehydrogenase   2                    9000 
Glutamate dehydrogenase          1                    18000 
PAM = number of Accepted Point Mutations per 100 amino acids.
Multiple Alignment Method
• align 
• select method (evolutionary 
model) 
– Distance 
– ML 
– MP 
• generate tree 
• validate tree 
4 steps
Some definitions …
Distance matrix methods (upgma, nj, Fitch,...) 
• Convert sequence data into a 
set of discrete pairwise distance 
values (n*(n-1)/2), arranged into 
a matrix. Distance methods fit a 
tree to this matrix. 
• The tree topology 
is constructed by using a cluster 
analysis method (like the UPGMA or 
NJ methods).
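The first step, turning n aligned sequences into n*(n-1)/2 pairwise values, can be sketched as follows. The toy sequences are invented for illustration, and the uncorrected proportion of differing sites (often called the p-distance) is only one possible choice of distance; the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of sequences."""
    return {
        (n1, n2): p_distance(seqs[n1], seqs[n2])
        for n1, n2 in combinations(sorted(seqs), 2)
    }


# Invented toy alignment with four sequences of ten sites each
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGTTT",
    "D": "TCGAACGATT",
}

matrix = distance_matrix(toy)
for pair, d in matrix.items():
    print(pair, d)
```

Four sequences give 4*3/2 = 6 entries; a clustering method such as UPGMA or NJ then works on these values alone, without looking at the sequences again.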
Distance matrix methods (upgma, nj, Fitch,...) 
Since we start with A, p(A) = 1 
D = evolutionary distance ~ time 
F = dissimilarity ~ (1 – PX(t)) 
F ~ 1 – d
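Observed dissimilarity underestimates the true evolutionary distance, because repeated substitutions at the same site are not seen. The slides' own derivation is not recoverable from this transcript, so the sketch below assumes the Jukes-Cantor one-parameter model, a common textbook correction, purely as an illustration; the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Expected substitutions per site, given the observed proportion p of differing sites.

    Assumes the Jukes-Cantor model: all substitutions equally likely,
    equal base frequencies. The correction is undefined for p >= 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.01, 0.10, 0.30, 0.60):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")
```

For small p the corrected value is close to p itself; the gap widens as the sequences diverge, which is one reason distance methods become unreliable for very divergent data.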
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
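The worked figures for these slides are not in the transcript, so here is a minimal sketch of the usual UPGMA procedure: repeatedly join the two closest clusters, place the new node at half their distance, and average the distances to all other clusters weighted by cluster size. The data layout (a dictionary keyed by frozensets), the function name and the example distances are all assumptions made for illustration.

```python
from itertools import combinations


def upgma(dist):
    """dist maps frozenset({x, y}) to the distance between tips x and y.

    Returns (tree_as_nested_string, height_of_root).
    """
    tips = sorted({t for pair in dist for t in pair})
    size = {t: 1 for t in tips}        # number of tips in each cluster
    height = {t: 0.0 for t in tips}    # tips sit at height 0
    d = dict(dist)
    active = list(tips)
    while len(active) > 1:
        # join the closest pair of current clusters
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = f"({a},{b})"
        height[new] = d[frozenset((a, b))] / 2.0
        size[new] = size[a] + size[b]
        # distance to every other cluster: size-weighted arithmetic mean
        for c in active:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / size[new]
        active = [c for c in active if c not in (a, b)] + [new]
    root = active[0]
    return root, height[root]


# Invented, clock-like example distances
example = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "D")): 6.0,
    frozenset(("C", "D")): 6.0,
}
print(upgma(example))  # ('(D,(C,(A,B)))', 3.0)
```

UPGMA implicitly assumes a molecular clock; with distances that are not clock-like it can recover the wrong branching order.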
Distance matrix methods: Summary 
http://www.bioportal.bic.nus.edu.sg/phylip/neighbor.html
Distance matrix methods (upgma, nj, Fitch,...) 
• The phylogeny makes an estimation of 
the distance for each pair as the sum 
of branch lengths in the path from one 
sequence to another through the tree. 
• advantages: 
– easy to perform; 
– quick calculation; 
– fit for sequences having high similarity scores; 
• drawbacks: 
– the sequences are not considered as such (loss 
of information); 
– all sites are generally treated equally (differences 
in substitution rates are not taken 
into account); 
– not applicable to highly divergent sequences.
• In this method, the bases 
(nucleotides or amino acids) of all 
sequences at each site are 
considered separately (as 
independent), and the log-likelihood 
of observing these bases is computed 
for a given topology by using a 
particular probability model. 
• These log-likelihoods are summed over all 
sites, and the sum is 
maximized to estimate 
the branch lengths of the tree. 
Maximum likelihood
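The idea of summing per-site log-likelihoods and maximizing over a branch length can be shown on the smallest possible case: two aligned sequences joined by a single branch. The sketch assumes the Jukes-Cantor model and a simple grid search; both are illustrative choices rather than what any particular program does, and the sequences and function names are invented.

```python
import math


def log_likelihood(seq1, seq2, d):
    """Log-likelihood of branch length d (substitutions per site) under Jukes-Cantor.

    Sites are treated as independent, so the per-site log-likelihoods are added.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a base is unchanged over distance d
    p_diff = 0.25 - 0.25 * e   # probability of ending in one particular other base
    total = 0.0
    for x, y in zip(seq1, seq2):
        # 0.25 is the assumed equilibrium frequency of the starting base
        total += math.log(0.25 * (p_same if x == y else p_diff))
    return total


def ml_branch_length(seq1, seq2, step=0.001, upper=2.0):
    """Grid search for the branch length with the highest log-likelihood."""
    grid = [i * step for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(seq1, seq2, d))


s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACGAACGTACGC"   # differs from s1 at 2 of 20 sites
print(ml_branch_length(s1, s2))
```

In this two-sequence case the maximum lies at the Jukes-Cantor corrected distance; with more sequences the same sum is computed over a whole tree and all branch lengths are optimized together.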
• This procedure is repeated for all 
possible topologies, and the topology 
that shows the highest likelihood is 
chosen as the final tree. 
• Notes: 
– ML estimates the branch lengths of the 
final tree; 
– ML methods are usually consistent; 
– ML can be extended to allow different 
rates of transition and 
transversion. 
• Drawbacks 
– needs long computation time to construct a 
tree. 
Maximum likelihood
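Why examining all possible topologies takes so long becomes clear from their number. For n labelled tips there are (2n-5)!! = 3 × 5 × ... × (2n-5) distinct unrooted bifurcating trees; the short sketch below evaluates this standard formula (the function name is illustrative).

```python
def unrooted_topologies(n):
    """Number of distinct unrooted bifurcating trees for n >= 3 labelled tips."""
    if n < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # multiplies 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_topologies(n))
```

Ten sequences already allow more than two million topologies, so practical programs rely on heuristic searches rather than exhaustive enumeration.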
Parsimony criterion 
• It consists of determining the minimum 
number of changes (substitutions) required to 
transform a sequence to its nearest neighbor. 
Maximum Parsimony 
• The maximum parsimony algorithm searches 
for the minimum number of genetic events 
(nucleotide substitutions or amino-acid 
changes) to infer the most parsimonious tree 
from a set of sequences. 
Maximum Parsimony
Maximum Parsimony 
Occam’s Razor 
Entia non sunt multiplicanda praeter necessitatem. 
William of Occam (1300-1349) 
The best tree is the one which requires the least number of 
substitutions
• The best tree is the one which needs the 
fewest changes. 
• Problems: 
– If the evolutionary clock is not constant, the 
procedure can generate misleading 
results; 
– within practical computational limits, this 
often leads to the generation of tens or more 
"equally most parsimonious trees", which 
makes it difficult to justify the choice of a 
particular tree; 
– long computation time to construct a tree. 
Maximum Parsimony
Maximum Parsimony: Branch Node A or B ?
Maximum Parsimony: A requires 5 mutations
Maximum Parsimony: B (and propagating A->B) requires only 4 mutations
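Counting the minimum number of changes on a given tree can be automated. The figures behind the example above are not in this transcript, so the sketch below uses Fitch's algorithm, a standard way of scoring a fixed topology, on an invented alignment; the tuple encoding of trees and the function names are assumptions for illustration.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree.

    tree is a tip name (string) or a pair (left, right); states maps tip name -> base.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                   # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one change needed


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {tip: seq[i] for tip, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Invented alignment: A and B share states, as do C and D
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment))  # 2
print(parsimony_score(tree_2, alignment))  # 4
```

The tree that groups A with B needs fewer changes and would be preferred under the parsimony criterion.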
Comparative evaluation of different methods 
• There are at present no statistical 
methods that allow 
comparison of trees obtained 
from different phylogenetic 
methods; nevertheless, many 
studies have compared 
the relative consistency 
of the existing methods. 
• The consistency depends on many 
factors, among them the topology 
and branch lengths of the real tree, 
the transition/transversion rate and 
the variability of the substitution 
rates. 
• One expects that if the sequences carry a 
strong phylogenetic signal, 
different methods will produce the 
same phylogenetic tree
Comparison of methods 
• Inconsistency 
• Neighbour Joining (NJ) is very fast but depends on 
accurate estimates of distance. This is more 
difficult with very divergent data 
• Parsimony suffers from Long Branch Attraction. 
This may be a particular problem for very divergent 
data 
• NJ can suffer from Long Branch Attraction 
• Parsimony is also computationally intensive 
• Codon usage bias can be a problem for MP and NJ 
• Maximum Likelihood is the most reliable but 
depends on the choice of model and is very slow 
• Methods may be combined
Rooting the Tree 
• In an unrooted tree the direction of 
evolution is unknown 
• The root is the hypothesized ancestor 
of the sequences in the tree 
• The root can either be placed on a 
branch or at a node 
• You should start by viewing an 
unrooted tree
Automatic rooting 
• Many software packages will root 
trees automatically (e.g. mid-point 
rooting in NJPlot) 
• Sometimes two trees may look very 
different but, in fact, differ only in the 
position of the root 
• This normally involves assumptions… 
BEWARE!
Rooting Using an Outgroup 
1. The outgroup should be a sequence (or set 
of sequences) known to be less closely 
related to the rest of the sequences than they 
are to each other 
2. It should ideally be as closely related as 
possible to the rest of the sequences while 
still satisfying condition 1 
The root must be somewhere between the 
outgroup and the rest (either on the node or 
in a branch)
How confident am I that my tree is correct? 
Bootstrap values 
Bootstrapping is a statistical 
technique that can use random 
resampling of data to determine 
sampling error for tree topologies
Bootstrapping phylogenies 
• Characters are resampled with replacement 
to create many bootstrap replicate data sets 
• Each bootstrap replicate data set is analysed 
(e.g. with parsimony, distance, ML etc.) 
• Agreement among the resulting trees is 
summarized with a majority-rule consensus 
tree 
• Frequencies of occurrence of groups, 
bootstrap proportions (BPs), are a measure 
of support for those groups
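The resampling step described above can be sketched in a few lines. Tree building is deliberately left out: `supports_group` stands for any user-supplied function that analyses a replicate and reports whether a group of interest is recovered, and the toy criterion used here is only a stand-in for a real tree-building method. All names and data are invented for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement; the replicate keeps the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_proportion(alignment, supports_group, n_replicates=100, seed=0):
    """Fraction of replicates in which supports_group(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(
        bool(supports_group(bootstrap_replicate(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def a_groups_with_b(aln):
    """Toy stand-in for tree building: is A closer to B than either is to C?"""
    def diff(x, y):
        return sum(p != q for p, q in zip(aln[x], aln[y]))
    return diff("A", "B") < min(diff("A", "C"), diff("B", "C"))


toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TCGAACGATT"}
print(bootstrap_proportion(toy, a_groups_with_b))
```

In a real analysis each replicate would be run through the chosen method (parsimony, distance, ML) and the resulting trees summarized in a majority-rule consensus.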
Bootstrapping - an example 
Ciliate SSU rDNA - parsimony bootstrap 
[Figure: majority-rule consensus tree of Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6) and Gruberia (7); bootstrap proportions on the internal branches: 100, 96, 84, 100, 100, 100]
Bootstrap - interpretation 
• Bootstrapping is a very valuable and widely used 
technique (it is demanded by some journals) 
• BPs give an idea of how likely a given branch 
would be to be unaffected if additional data, with 
the same distribution, became available 
• BPs are not the same as confidence intervals. 
There is no simple mapping between bootstrap 
values and confidence intervals. There is no 
agreement about what constitutes a ‘good’ 
bootstrap value (> 70%, > 80%, > 85% ????) 
• Some theoretical work indicates that BPs can be a 
conservative estimate of confidence intervals 
• If the estimated tree is inconsistent all the 
bootstraps in the world won’t help you…..
Jack-knifing 
• Jack-knifing is very similar to 
bootstrapping and differs only in the 
character resampling strategy 
• Jack-knifing is not as widely 
available or widely used as 
bootstrapping 
• Tends to produce broadly similar 
results
Statistical evaluation of the obtained phylogenetic trees 
At present only sampling techniques allow testing the 
topology of a phylogenetic tree 
• Bootstrapping 
– It consists of drawing columns from a sample of 
aligned sequences, with replacement, until one gets 
a data set of the same size as the original one 
(usually some columns are sampled several times and 
others are left out). 
• Half-jackknife 
– This technique resamples half of the sequence sites 
considered and eliminates the rest. The final sample 
has half the initial number of sites, 
without duplication.
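The half-jackknife differs from the bootstrap only in how the columns are drawn. A minimal sketch, with an invented alignment and a hypothetical function name:

```python
import random


def half_jackknife(alignment, rng):
    """Keep a random half of the alignment columns, each at most once, in original order."""
    n_sites = len(next(iter(alignment.values())))
    kept = sorted(rng.sample(range(n_sites), n_sites // 2))   # sampling without replacement
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}


toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TCGAACGATT"}
replicate = half_jackknife(toy, random.Random(0))
for name, seq in replicate.items():
    print(name, seq)
```

Each replicate is then analysed exactly as a bootstrap replicate would be.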
Weblems 
W6.1: The growth hormones in most mammals have very similar amino acid 
sequences. (The growth hormones of the Alpaca, Dog, Cat, Horse, Rabbit, and 
Elephant each differ from that of the Pig at no more than 3 positions out of 191.) 
Human growth hormone is very different, differing at 62 positions. The evolution of 
growth hormone accelerated sharply in the line leading to humans. By retrieving 
and aligning growth hormone sequences from species closely related to humans 
and our ancestors, determine where in the evolutionary tree leading to humans the 
accelerated evolution of growth hormone took place. 
W6.2: Humans are primates, an order that we, apes and monkeys share with lemurs 
and tarsiers. On the basis of the Beta-globin gene cluster of a human, a 
chimpanzee, an old-world monkey, a new-world monkey, a lemur, and a tarsier, 
derive a phylogenetic tree of these groups. 
W6.3: Primates are mammals, a class we share with marsupials and monotremes. 
Extant marsupials live primarily in Australia, except for the opossum, found also in 
North and South America. Extant monotremes are limited to two animals from 
Australia: the platypus and echidna. Using the complete mitochondrial genome 
from human, horse (Equus caballus), wallaroo (Macropus robustus), American 
opossum (Didelphis virginiana), and platypus (Ornithorhynchus anatinus), draw 
an evolutionary tree, indicating branch lengths. Are monotremes more closely 
related to placental mammals or to marsupials? 
W6.4: Mammals are vertebrates, a subphylum that we share with fishes, sharks, birds 
and reptiles, amphibia, and primitive jawless fishes (example: lampreys). For the 
coelacanth (Latimeria chalumnae), the great white shark (Carcharodon 
carcharias), skipjack tuna (Katsuwonus pelamis), sea lamprey (Petromyzon 
marinus), frog (Rana pipiens), and Nile crocodile (Crocodylus niloticus), using 
sequences of cytochromes c and pancreatic ribonucleases, derive evolutionary 
trees of these species.

Principle of classification of living things
 
W5-6-REVOLUTIONARY RELATIONSHIP.pptx
W5-6-REVOLUTIONARY RELATIONSHIP.pptxW5-6-REVOLUTIONARY RELATIONSHIP.pptx
W5-6-REVOLUTIONARY RELATIONSHIP.pptx
 
Biodiversity electurespecies and DNA Barcoding handout.pptx
Biodiversity electurespecies and DNA Barcoding handout.pptxBiodiversity electurespecies and DNA Barcoding handout.pptx
Biodiversity electurespecies and DNA Barcoding handout.pptx
 

More from Prof. Wim Van Criekinge

2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_uploadProf. Wim Van Criekinge
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_uploadProf. Wim Van Criekinge
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Prof. Wim Van Criekinge
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 

More from Prof. Wim Van Criekinge (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload
 
P7 2018 biopython3
P7 2018 biopython3P7 2018 biopython3
P7 2018 biopython3
 
P6 2018 biopython2b
P6 2018 biopython2bP6 2018 biopython2b
P6 2018 biopython2b
 
P4 2018 io_functions
P4 2018 io_functionsP4 2018 io_functions
P4 2018 io_functions
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
 
P1 2018 python
P1 2018 pythonP1 2018 python
P1 2018 python
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]
 
2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload
 
2018 03 20_biological_databases_part3
2018 03 20_biological_databases_part32018 03 20_biological_databases_part3
2018 03 20_biological_databases_part3
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
P7 2017 biopython3
P7 2017 biopython3P7 2017 biopython3
P7 2017 biopython3
 
P6 2017 biopython2
P6 2017 biopython2P6 2017 biopython2
P6 2017 biopython2
 

Bioinformatics t6-phylogenetics v2014

  • 2. FBW 4-11-2014 Wim Van Criekinge
  • 3. Class on 4 November; NO class on 18 November
  • 4. Phylogenetics Introduction Definitions Species concept Examples The Tree-of-life Phylogenetics Methodologies Algorithms Distance Methods Maximum Likelihood Maximum Parsimony Rooting Statistical Validation Conclusions Orthologous genes Horizontal Gene Transfer Phylogenomics Practical Approach: PHYLIP Weblems
  • 5. What is phylogenetics ? Phylogeny (phylo =tribe + genesis) Phylogenetic trees are about visualising evolutionary relationships. They reconstruct the pattern of events that have led to the distribution and diversity of life. The purpose of a phylogenetic tree is to illustrate how a group of objects (usually genes or organisms) are related to one another Nothing in Biology Makes Sense Except in the Light of Evolution. Theodosius Dobzhansky (1900-1975)
  • 6. Trees • Diagram consisting of branches and nodes • Species tree (how are my species related?) – contains only one representative from each species. – all nodes indicate speciation events • Gene tree (how are my genes related?) – normally contains a number of genes from a single species – nodes relate either to speciation or gene duplication events
  • 7. Clade: A set of species which includes all of the species derived from a single common ancestor
  • 8. Species Concepts from Various Authors D.A. Baum and K.L. Shaw - Exclusive groups of organisms, where an exclusive group is one whose members are all more closely related to each other than to any organisms outside the group. J. Cracraft - An irreducible cluster of organisms, diagnosably distinct from other such clusters, and within which there is a parental pattern of ancestry and descent. Charles Darwin - "From these remarks it will be seen that I look at the term species, as one arbitrarily given for the sake of convenience to a set of individuals closely resembling each other, and that it does not essentially differ from the term variety, which is given to less distinct and more fluctuating forms. The term variety, again, in comparison with mere individual differences, is also applied arbitrarily, and for mere convenience sake" (Origin of Species, 1st ed., p. 108). T. Dobzhansky - The largest and most inclusive reproductive community of sexual and cross-fertilizing individuals which share a common gene pool. And later...Systems of populations, the gene exchange between which is limited or prevented by reproductive isolating mechanisms. M. Ghiselin - The most extensive units in the natural economy, such that reproductive competition occurs among their parts. D.M. Lambert - Groups of individuals that define themselves by a specific mate recognition system. J. Mallet - Identifiable genotypic clusters recognized by a deficit of intermediates, both at single loci and at multiple loci. E. Mayr - Groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups. C.D. Michener - A group of organisms not itself divisible by phenetic gaps resulting from concordant differences in character states (except for morphs - such as sex, age, or caste), but separated by such phenetic gaps from other such units. H.E.H. Paterson - That most inclusive population of individual biparental organisms which share a common fertilization system. G.G. Simpson - A lineage of populations evolving with time, separately from others, with its own unique evolutionary role and tendencies. P.H.A. Sneath and R.R. Sokal - The smallest (most homogeneous) cluster that can be recognized upon some given criterion as being distinct from other clusters. A.R. Templeton - The most inclusive population of individuals having the potential for phenotypic cohesion through intrinsic cohesion mechanisms (genetic and/or demographic - i.e. ecological - exchangeability). E.O. Wiley - A single lineage of ancestor-descendant populations which maintains its identity from other such lineages and which has its own evolutionary tendencies and historical fate. S. Wright - A species in time and space is composed of numerous local populations, each one intercommunicating and intergrading with others.
  • 9. Species I. Definitions: Species = the basic unit of classification > Three different ways to recognize species:
  • 10. Plant Species Definitions: > Three different ways to recognize species: 1) Morphological species = the smallest group that is consistently and persistently distinct (Clusters in morphospace) species are recognized initially on the basis of appearance; the individuals of one species look different from the individuals of another
  • 11. Species Definitions: > Three different ways to recognize species: 2) Biological species = a set of interbreeding or potentially interbreeding individuals that are separated from other species by reproductive barriers species are unable to interbreed
  • 12. Species Definitions: > Three different ways to recognize species: 3) Phylogenetic species = the boundary between reticulate (among interbreeding individuals) and divergent relationships (between lineages with no gene exchange)
  • 13. Phylogenetic species divergent reticulate boundary recognized by the pattern of ancestor - descendent relationships
  • 14. Species Definitions: > Three different ways to recognize species: 4) Phylogenomics species = ability to transmit (and maintain) a (stable) gene pool. Addresses the Anopheles genome topology variations
  • 15. Branching Order in a Phylogenetic Tree • In the tree to the left, A and B share the most recent common ancestry. Thus, of the species in the tree, A and B are the most closely related. • The next most recent common ancestry is C with the group composed of A and B. Notice that the relationship of C is with the group containing A and B. In particular, C is not more closely related to B than to A. This can be emphasized by the following two trees, which are equivalent to each other:
  • 16. More definitions … Edge, Branch Branch node, internal node Leaves, Tips, external node • A common simplifying assumption is that the tree is bifurcating, meaning that each branch node has exactly two descendants. • The edges, taken together, are sometimes said to define the topology of the tree
  • 17. Outgroups, rooted versus unrooted An unrooted reptilian phylogeny with an avian outgroup and the corresponding rooted phylogeny. The Ri represent modern reptiles; the Ai, inferred ancestors and the B a bird.
  • 19. Examples Phylogenetic methods may be used to solve crimes, test purity of products, and determine whether endangered species have been smuggled or mislabeled: – Vogel, G. 1998. HIV strain analysis debuts in murder trial. Science 282(5390): 851-853. – Lau, D. T.-W., et al. 2001. Authentication of medicinal Dendrobium species by the internal transcribed spacer of ribosomal DNA. Planta Med 67:456-460.
  • 21. Examples – Epidemiologists use phylogenetic methods to understand the development of pandemics, patterns of disease transmission, and development of antimicrobial resistance or pathogenicity: • Basler, C.F., et al. 2001. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. PNAS, 98(5):2746-2751. • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256(5060):1165-1171. • Bacillus anthracis:
  • 23. Examples • Conservation biologists may use these techniques to determine which populations are in greatest need of protection, and other questions of population structure: – Trepanier, T.L., and R.W. Murphy. 2001. The Coachella Valley fringe-toed lizard (Uma inornata): genetic diversity and phylogenetic relationships of an endangered species. Mol Phylogenet Evol 18(3):327-334. – Alves, M.J., et al. 2001. Mitochondrial DNA variation in the highly endangered cyprinid fish Anaecypris hispanica: importance for conservation. Heredity 87(Pt 4):463-473. • Pharmaceutical researchers may use phylogenetic methods to determine which species are most closely related to other medicinal species, thus perhaps sharing their medicinal qualities: – Komatsu, K., et al. 2001. Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species. Planta Med 67:461-465.
  • 25. Some Important Dates in History (billion yrs): Origin of the Universe 15; Formation of the Solar System 4.6; First Self-replicating System 3.5; Prokaryotic-Eukaryotic Divergence 2.0; Plant-Animal Divergence 1.0; Invertebrate-Vertebrate Divergence 0.5; Mammalian Radiation Beginning 0.1
  • 30. What Sequence to Use? • To infer relationships that span the diversity of known life, it is necessary to look at genes conserved through the billions of years of evolutionary divergence. • The gene must display an appropriate level of sequence conservation for the divergences of interest.
  • 31. What Sequence to Use? • If there is too much change, then the sequences become randomized, and there is a limit to the depth of the divergences that can be accurately inferred. • If there is too little change (if the gene is too conserved), then there may be little or no change between the evolutionary branchings of interest, and it will not be possible to infer close (genus or species level) relationships.
  • 32. Ribosomal RNA Genes and Their Sequences Carl Woese recognized the full potential of rRNA sequences as a measure of phylogenetic relatedness. He initially used an RNA sequencing method that determined about 1/4 of the nucleotides in the 16S rRNA (the best technology available at the time). This amount of data greatly exceeded anything else then available. Using newer methods, it is now routine to determine the sequence of the entire 16S rRNA molecule. Today, the accumulated 16S rRNA sequences (about 10,000) constitute the largest body of data available for inferring relationships among organisms.
  • 33. What Sequence to Use? Examples of genes in this category are those that encode the ribosomal RNAs (rRNAs). Most prokaryotes have three rRNAs, called the 5S, 16S and 23S rRNA. Name (a); Size (nucleotides); Location. 5S; 120; Large subunit of ribosome. 16S; 1500; Small subunit of ribosome. 23S; 2900; Large subunit of ribosome. (a) The name is based on the rate that the molecule sediments (sinks) in water. Bigger molecules sediment faster than small ones.
  • 34. Ribosomal RNA Genes and Their Sequences The extraordinary conservation of rRNA genes can be seen in these fragments of the small subunit rRNA gene sequences from organisms spanning the known diversity of life: human ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG... yeast ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG... Corn ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAG... Escherichia coli ...GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG... Anacystis nidulans ...GTGCCAGCAGCCGCGGTAATACGGGAGAGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCG... Thermotoga maritima ...GTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTACCCGGATTTACTGGGCGTAAAGGG... Methanococcus vannielii ...GTGCCAGCAGCCGCGGTAATACCGACGGCCCGAGTGGTAGCCACTCTTATTGGGCCTAAAGCG... Thermococcus celer ...GTGGCAGCCGCCGCGGTAATACCGGCGGCCCGAGTGGTGGCCGCTATTATTGGGCCTAAAGCG... Sulfolobus solfataricus ...GTGTCAGCCGCCGCGGTAATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCG...
  • 36. Molecular Clock (MC) • Rate of evolution = rate of mutation • The rate of evolution for any macromolecule is approximately constant over time (Neutral Theory of evolution) • For a given protein the rate of sequence evolution is approximately constant across lineages. Zuckerkandl and Pauling (1965) • This would allow speciation and duplication events to be dated accurately based on molecular data
  • 37. Novel trees using Hox genes
  • 39. • (a) A traditional phylogenetic tree and • (b) the new phylogenetic tree, each showing the positions of selected phyla. B, bilateria; AC, acoelomates; PC, pseudocoelomates; C, coelomates; P, protostomes; L, lophotrochozoa; E, ecdysozoa; D, deuterostomes.
  • 40. Molecular Clock (MC) • Local and approximate molecular clocks are more reasonable – one amino acid substitution / 14.5 My – 1.3 × 10^-9 substitutions/nucleotide site/year – Relative rate test (see further) • For the tree ((A,B),C), measure the distances (A,C) and (B,C); under a clock they should be equal
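The two quantities on this slide can be sketched in a few lines. This is only an illustration under a strict-clock assumption: the function names are hypothetical, the default rate is the 1.3 × 10^-9 figure quoted above, and the factor 2 reflects the usual convention that both lineages accumulate substitutions after the split.

```python
def relative_rate_difference(d_ac, d_bc):
    """Relative rate test for the tree ((A,B),C).

    Under a molecular clock d(A,C) and d(B,C) are expected to be equal,
    so a value far from zero suggests that A and B evolve at different rates.
    """
    return d_ac - d_bc


def divergence_time_years(distance, rate_per_site_per_year=1.3e-9):
    """Time since two lineages split, assuming a strict clock.

    distance is in substitutions per site; it is divided by 2 * rate
    because substitutions accumulate along both lineages.
    """
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical distances, not taken from the slides.
print(relative_rate_difference(0.20, 0.18))
print(divergence_time_years(0.026))
```

A full relative rate test would also need a standard error for the difference, which is not shown here.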
  • 41. Proteins evolve at highly different rates. Columns: protein family; rate of change (PAMs / 100 myrs); theoretical lookback time (myrs). Pseudogenes: 400; 45. Fibrinopeptides: 90; 200. Lactalbumins: 27; 670. Lysozymes: 24; 850. Ribonucleases: 21; 850. Haemoglobins: 12; 1500. Acid proteases: 8; 2300. Cytochrome c: 4; 5000. Glyceraldehyde-P dehydrogenase: 2; 9000. Glutamate dehydrogenase: 1; 18000. PAM = number of Accepted Point Mutations per 100 amino acids.
  • 42. Phylogenetics Introduction Definitions Species concept Examples The Tree-of-life Phylogenetics Methodologies Algorithms Distance Methods Maximum Likelihood Maximum Parsimony Rooting Statistical Validation Conclusions Orthologous genes Horizontal Gene Transfer Phylogenomics Practical Approach: PHYLIP Weblems
  • 44. 4 steps • align • select method (evolutionary model) – Distance – ML – MP • generate tree • validate tree
  • 47. Distance matrix methods (upgma, nj, Fitch,...) • Convert sequence data into a set of discrete pairwise distance values, n*(n-1)/2 of them for n sequences, arranged into a matrix. Distance methods fit a tree to this matrix. • The tree topology is constructed by using a cluster analysis method (like the upgma or nj methods).
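As a minimal sketch of the first step, the block below computes the n*(n-1)/2 pairwise distances for a toy alignment. The uncorrected p-distance (fraction of differing sites) is an assumption chosen for simplicity, and the sequences and function names are hypothetical rather than taken from the slides.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Return {(name_i, name_j): distance} for every unordered pair of names."""
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


# Hypothetical toy alignment.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTA"}
dm = distance_matrix(toy)
for pair, d in dm.items():
    print(pair, d)
```

With four sequences the dictionary holds 4*3/2 = 6 entries, one per pair.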
  • 51. Distance matrix methods (upgma, nj, Fitch,...)
  • 52. Distance matrix methods (upgma, nj, Fitch,...) CGT
  • 53. Distance matrix methods (upgma, nj, Fitch,...) Since we start with A,p(A)=1
  • 54. Distance matrix methods (upgma, nj, Fitch,...) D = evolutionary distance ~ time; F = dissimilarity ~ (1 – PX(t)); F ~ 1 – d
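The relation between observed dissimilarity and evolutionary distance can be made concrete with a correction formula. The slides do not name a substitution model, so the Jukes-Cantor (JC69) model used below is an assumption: with all substitutions equally likely, an observed fraction p of differing sites corresponds to a distance d = -3/4 ln(1 - 4p/3) substitutions per site.

```python
import math


def jukes_cantor_distance(p):
    """Convert an observed dissimilarity p into a JC69 evolutionary distance.

    p must be below 0.75, the value at which sequences look random
    under this model and the distance is no longer defined.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as p grows because multiple hits at the same site become more frequent.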
  • 55. Distance matrix methods (upgma, nj, Fitch,...)
  • 57. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • 58. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • 59. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • 60. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
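The UPGMA slides are image-only in this transcript, so the following is a compact sketch of the standard algorithm rather than a reproduction of the lecture's worked example. It repeatedly joins the closest pair of clusters, places the new node at half their distance, and averages distances weighted by cluster size. The function name, the Newick output and the toy matrix are illustrative choices.

```python
def upgma(names, dist):
    """Build a UPGMA tree; dist is a symmetric matrix indexed like names.

    Returns a Newick string with branch lengths.
    """
    # Active clusters: id -> (newick label, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    n = len(names)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        label_i, size_i, height_i = clusters.pop(i)
        label_j, size_j, height_j = clusters.pop(j)
        height = dij / 2.0
        label = (f"({label_i}:{height - height_i:.3f},"
                 f"{label_j}:{height - height_j:.3f})")
        # Distance to each remaining cluster is the size-weighted mean.
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical ultrametric distances for four taxa.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(names, dist))
```

Because UPGMA assumes a constant rate along all lineages, it recovers the tree exactly only when the input distances are ultrametric, as they are in this toy matrix.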
  • 61. Distance matrix methods: Summary http://www.bioportal.bic.nus.edu.sg/phylip/neighbor.html
  • 62. Distance matrix methods (upgma, nj, Fitch,...) • The phylogeny makes an estimation of the distance for each pair as the sum of branch lengths in the path from one sequence to another through the tree. • Advantages: – easy to perform; – quick calculation; – fit for sequences having high similarity scores. • Drawbacks: – the sequences are not considered as such (loss of information); – all sites are generally equally treated (differences in substitution rates are not taken into account); – not applicable to distantly divergent sequences.
  • 65. Maximum likelihood • In this method, the bases (nucleotides or amino acids) of all sequences at each site are considered separately (as independent), and the log-likelihood of having these bases is computed for a given topology by using a particular probability model. • This log-likelihood is added for all sites, and the sum of the log-likelihoods is maximized to estimate the branch lengths of the tree.
  • 67. Maximum likelihood • This procedure is repeated for all possible topologies, and the topology that shows the highest likelihood is chosen as the final tree. • Notes: – ML estimates the branch lengths of the final tree; – ML methods are usually consistent; – ML is extended to allow differences between the rate of transition and transversion. • Drawbacks: – needs a long computation time to construct a tree.
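The core idea, summing per-site log-likelihoods and maximizing over branch length, can be sketched for the simplest possible case of two sequences joined by a single branch. The Jukes-Cantor model, the grid search and the toy sequences are all assumptions made for illustration; a real ML program also searches over topologies and uses a numerical optimizer.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by distance d.

    Sites are treated as independent, so per-site log-likelihoods are added.
    Under JC69 each base has frequency 1/4.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    return sum(
        math.log(0.25 * (p_same if x == y else p_diff))
        for x, y in zip(seq1, seq2)
    )


def ml_branch_length(seq1, seq2, step=0.001, upper=2.0):
    """Grid search for the distance that maximizes the log-likelihood."""
    grid = [k * step for k in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))


# Hypothetical pair differing at 2 of 20 sites.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACGTACGTACCA"
print(ml_branch_length(s1, s2))
```

For this two-sequence case the maximum coincides, up to the grid resolution, with the analytic Jukes-Cantor distance for p = 0.1.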
  • 70. Maximum Parsimony • Parsimony criterion: it consists of determining the minimum number of changes (substitutions) required to transform a sequence to its nearest neighbor. • The maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino-acid changes) to infer the most parsimonious tree from a set of sequences.
  • 71. Maximum Parsimony Occam’s Razor Entia non sunt multiplicanda praeter necessitatem. William of Occam (1300-1349) The best tree is the one which requires the least number of substitutions
  • 72. Maximum Parsimony • The best tree is the one which needs the fewest changes. – If the evolutionary clock is not constant, the procedure generates results which can be misleading; – within practical computational limits, this often leads to the generation of tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree; – long computation time to construct a tree.
  • 80. Maximum Parsimony: Branch Node A or B ?
  • 81. Maximum Parsimony: A requires 5 mutations
  • 82. Maximum Parsimony: B (and propagating A->B) requires only 4 mutations
  • 83. Maximum Parsimony • The best tree is the one which needs the fewest changes. • Problems: – If the evolutionary clock is not constant, the procedure generates results which can be misleading; – within practical computational limits, this often leads to the generation of tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree; – long computation time to construct a tree.
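Counting the changes a given tree requires can be sketched with Fitch's small-parsimony algorithm, which is one standard way to do it and is not named in the slides. The tree encoding (nested 2-tuples of leaf names), the function names and the toy alignment are assumptions for illustration only.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one site.

    tree is either a leaf name or a 2-tuple of subtrees;
    states maps each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed here.
        return common, left_cost + right_cost
    # Otherwise one substitution is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical alignment and two candidate topologies.
aln = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, aln), parsimony_score(tree_2, aln))
```

The topology with the lower score is preferred; a full maximum parsimony search would evaluate this score over many or all topologies, which is where the computation time goes.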
  • 84. Phylogenetics Introduction Definitions Species concept Examples The Tree-of-life Phylogenetics Methodologies Algorithms Distance Methods Maximum Likelihood Maximum Parsimony Rooting Statistical Validation Conclusions Orthologous genes Horizontal Gene Transfer Phylogenomics Practical Approach: PHYLIP Weblems
  • 85. Comparative evaluation of different methods • There are at present no statistical methods which allow comparison of trees obtained from different phylogenetic methods; nevertheless, many studies have been made to compare the relative consistency of the existing methods.
  • 86. Comparative evaluation of different methods • The consistency depends on many factors, among these the topology and branch lengths of the real tree, the transition/transversion rate and the variability of the substitution rates. • One expects that if sequences have a strong phylogenetic relationship, different methods will show the same phylogenetic tree
  • 87. Comparison of methods • Inconsistency • Neighbour Joining (NJ) is very fast but depends on accurate estimates of distance. This is more difficult with very divergent data • Parsimony suffers from Long Branch Attraction. This may be a particular problem for very divergent data • NJ can suffer from Long Branch Attraction • Parsimony is also computationally intensive • Codon usage bias can be a problem for MP and NJ • Maximum Likelihood is the most reliable but depends on the choice of model and is very slow • Methods may be combined
  • 88. Rooting the Tree • In an unrooted tree the direction of evolution is unknown • The root is the hypothesized ancestor of the sequences in the tree • The root can either be placed on a branch or at a node • You should start by viewing an unrooted tree
  • 89. Automatic rooting • Many software packages will root trees automatically (e.g. mid-point rooting in NJPlot) • Sometimes two trees may look very different but, in fact, differ only in the position of the root • This normally involves assumptions… BEWARE!
  • 90. Rooting Using an Outgroup 1. The outgroup should be a sequence (or set of sequences) known to be less closely related to the rest of the sequences than they are to each other 2. It should ideally be as closely related as possible to the rest of the sequences while still satisfying condition 1 The root must be somewhere between the outgroup and the rest (either on the node or in a branch)
  • 91. How confident am I that my tree is correct? Bootstrap values Bootstrapping is a statistical technique that can use random resampling of data to determine sampling error for tree topologies
  • 92. Bootstrapping phylogenies • Characters are resampled with replacement to create many bootstrap replicate data sets • Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML etc.) • Agreement among the resulting trees is summarized with a majority-rule consensus tree • Frequencies of occurrence of groups, bootstrap proportions (BPs), are a measure of support for those groups
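The resampling step can be sketched as follows. Columns of the alignment are drawn with replacement to build each replicate, and a bootstrap proportion is the fraction of replicates in which a group of interest is recovered. The `supports_group` callback is a hypothetical stand-in for "build a tree from this replicate and check whether the group is present"; here it is replaced by a deliberately crude distance comparison so that the example stays self-contained.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_proportion(alignment, supports_group, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates for which supports_group returns True."""
    rng = random.Random(seed)
    hits = sum(
        bool(supports_group(bootstrap_replicate(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def a_groups_with_b(aln):
    """Toy criterion (an assumption): A is closer to B than to C."""
    diff = lambda x, y: sum(p != q for p, q in zip(aln[x], aln[y]))
    return diff("A", "B") < diff("A", "C")


# Hypothetical alignment in which every column separates C from A and B.
aln = {"A": "AAAA", "B": "AAAA", "C": "CCCC"}
print(bootstrap_proportion(aln, a_groups_with_b))
```

In practice each replicate would be analysed with the same tree-building method as the original data, and the proportions would be read off a majority-rule consensus tree.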
  • 93. Bootstrapping - an example Ciliate SSUrDNA - parsimony bootstrap Majority-rule consensus Ochromonas (1) Symbiodinium (2) Prorocentrum (3) Euplotes (8) Tetrahymena (9) Loxodes (4) Tracheloraphis (5) Spirostomum (6) Gruberia (7) 100 96 84 100 100 100
  • 94. Bootstrap - interpretation • Bootstrapping is a very valuable and widely used technique (it is demanded by some journals) • BPs give an idea of how likely a given branch would be to be unaffected if additional data, with the same distribution, became available • BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals. There is no agreement about what constitutes a ‘good’ bootstrap value (> 70%, > 80%, > 85% ????) • Some theoretical work indicates that BPs can be a conservative estimate of confidence intervals • If the estimated tree is inconsistent all the bootstraps in the world won’t help you…..
  • 95. Jack-knifing • Jack-knifing is very similar to bootstrapping and differs only in the character resampling strategy • Jack-knifing is not as widely available or widely used as bootstrapping • Tends to produce broadly similar results
  • 96. Statistical evaluation of the obtained phylogenetic trees At present only sampling techniques allow testing the topology of a phylogenetic tree • Bootstrapping – It consists of drawing columns from a sample of aligned sequences, with replacement, until one gets a data set of the same size as the original one (usually some columns are sampled several times, others are left out). • Half-jackknife – This technique resamples half of the sequence sites considered and eliminates the rest. The final sample has half the initial number of sites, without duplication.
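For contrast with bootstrapping, a half-jackknife replicate can be sketched as a draw without replacement of half the columns. The function name and the toy alignment are hypothetical, and keeping the sampled columns in their original order is a choice made here for readability rather than a requirement of the method.

```python
import random


def half_jackknife_replicate(alignment, rng):
    """Keep a random half of the alignment columns, each at most once."""
    n_sites = len(next(iter(alignment.values())))
    keep = sorted(rng.sample(range(n_sites), n_sites // 2))
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


# Hypothetical alignment; distinct letters make the sampled columns easy to see.
toy = {"X": "ABCDEFGH", "Y": "abcdefgh"}
replicate = half_jackknife_replicate(toy, random.Random(0))
print(replicate)
```

Each replicate is then analysed like a bootstrap replicate, and group frequencies across replicates are used as support values.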
  • 97. Weblems W6.1: The growth hormones in most mammals have very similar amino acid sequences. (The growth hormones of the Alpaca, Dog, Cat, Horse, Rabbit, and Elephant each differ from that of the Pig at no more than 3 positions out of 191.) Human growth hormone is very different, differing at 62 positions. The evolution of growth hormone accelerated sharply in the line leading to humans. By retrieving and aligning growth hormone sequences from species closely related to humans and our ancestors, determine where in the evolutionary tree leading to humans the accelerated evolution of growth hormone took place. W6.2: Humans are primates, an order that we, apes and monkeys share with lemurs and tarsiers. On the basis of the Beta-globin gene cluster of human, a chimpanzee, an old-world monkey, a new-world monkey, a lemur, and a tarsier, derive a phylogenetic tree of these groups. W6.3: Primates are mammals, a class we share with marsupials and monotremes. Extant marsupials live primarily in Australia, except for the opossum, found also in North and South America. Extant monotremes are limited to two animals from Australia: the platypus and echidna. Using the complete mitochondrial genome from human, horse (Equus caballus), wallaroo (Macropus robustus), American opossum (Didelphis virginiana), and platypus (Ornithorhynchus anatinus), draw an evolutionary tree, indicating branch lengths. Are monotremes more closely related to placental mammals or to marsupials? W6.4: Mammals are vertebrates, a subphylum that we share with fishes, sharks, birds and reptiles, amphibia, and primitive jawless fishes (example: lampreys). For the coelacanth (Latimeria chalumnae), the great white shark (Carcharodon carcharias), skipjack tuna (Katsuwonus pelamis), sea lamprey (Petromyzon marinus), frog (Rana pipiens), and Nile crocodile (Crocodylus niloticus), using sequences of cytochromes c and pancreatic ribonucleases, derive evolutionary trees of these species.