Paul Lewis SSB Past-President's address at Evol2016

© Copyright 2016 by Paul O. Lewis
Entropy and information
in phylogenetics
Past-President
Society of Systematic Biologists
Paul O. Lewis
Evolution2016
Joint Annual Meeting of SSE, ASN, and SSB
Austin, Texas ~ 19 June 2016

What is information?
details, particulars, facts, figures, statistics, data;
knowledge, intelligence; instruction, advice,
guidance, direction, counsel, enlightenment; news,
word; hot tip; informal: info, lowdown, dope, dirt,
inside story, scoop, poop.
— Synonyms of information in the Oxford American
Writer’s Thesaurus

Does information=data?
[Figure: alignment of Taxon 1 through Taxon 12; sequence shown: AAAAAAAAAAAAAAAAAAAAAAAAA]

Information=Data?
Taxon 1   GGGTTGAATGGGGTGCGACTTATTC
Taxon 2   GCGGCGATAGACTGCTACTACGTGC
Taxon 3   CCCGTGGATAGCGACGTCTACAAGA
Taxon 4   GGCTGTCGTAGCTTCCGTGTAATAC
Taxon 5   CCGGAGGCAAACACCCTGTTCCCCC
Taxon 6   GGGCAATATATATCCGCACCGCTCG
Taxon 7   AAGAGCCGACAAGTAGAATCGGGAT
Taxon 8   AGTAGCACAAGCGACACGGCAATAA
Taxon 9   GTCGTGTTTTACCAGAGGTTGCATA
Taxon 10  GCGTTGTAACACCCTTACCCTCTTT
Taxon 11  AGTACATGTATGTTTCCTTCGTTCG
Taxon 12  TGGGTTCCGCCCCGAGACGAGGCTC

The correct exposure for
phylogenetic inference
0.02 subst./site
Data simulated on the tree
above are nearly optimal for
phylogeny estimation
ACGGTCGAGGCGTAGACTCGATCAA
ACGGTCGATGCGTAGACTCGATCAA
ACGGTCGACGCGTATACTCGATCAA
ACGGTCGACGCGGATACTCGATCAA
ACGGTTGACGCATATACTCGATCAA
ACGGTTGACGCATATACTCGATCAA
ACCGTTGACGCATATACTCGATCAA
ACCGTTGACGCATATACTCGATCAA
[Figure: tree with tips Taxon 1 through Taxon 12]

Negatively skewed parsimony tree length
distributions indicate information content
[Figure: parsimony tree-length distributions for Noisy and Informative data, with the most parsimonious tree marked]
Fitch 1984

The g1 statistic quantifies skewness, and hence information content
[Figure: two tree-length distributions, g1 = 0.05 (slightly positive) and g1 = -0.96 (quite negative)]
Hillis 1991; Huelsenbeck 1991
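A minimal sketch of the skewness calculation, assuming g1 is the usual moment-based sample skewness (third central moment divided by the 1.5 power of the second). The tree lengths below are hypothetical values chosen for illustration, not data from the talk.

```python
def g1(tree_lengths):
    """Moment-based sample skewness: g1 = m3 / m2**1.5."""
    n = len(tree_lengths)
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical tree lengths (assumptions for illustration only)
symmetric = [10, 11, 12, 13, 14]             # no tail in either direction
left_tailed = [5, 12, 13, 13, 14, 14, 14]    # a few very short trees: long left tail

print(g1(symmetric), g1(left_tailed))
```

A long left tail (a few trees much shorter than the rest) gives a negative g1, which is the pattern the slide associates with informative data.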

Taxon 1   A A C T G T
Taxon 2   A A C T G T
Taxon 3   C A G A T T
Taxon 4   C G G A C T
Taxon 5   C G G A C T
Shuffling taxon assignments within characters (sites)
removes hierarchical structure due to history
Archie 1989; Faith & Cranston 1991
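The shuffling step can be sketched as follows. This is an illustration of permuting states among taxa independently within each character, not the exact procedure of the cited tests; the function name is hypothetical and the matrix is the 5-taxon, 6-character example as read from the slide.

```python
import random

def shuffle_within_characters(matrix, rng):
    """Permute states among taxa independently within each column (character)."""
    n_taxa = len(matrix)
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)  # each character keeps its states, but loses the taxon assignment
    return ["".join(col[i] for col in columns) for i in range(n_taxa)]

# 5 taxa x 6 characters, as read from the slide
original = ["AACTGT", "AACTGT", "CAGATT", "CGGACT", "CGGACT"]
shuffled = shuffle_within_characters(original, random.Random(1))
print(shuffled)
```

Each column keeps the same state frequencies, so any drop in tree-length fit after shuffling reflects lost hierarchical structure rather than changed composition.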

[Figure: the same 5-taxon matrix with taxon assignments shuffled within each character]

Shuffling tests easily
differentiate random versus
properly exposed data
[Figure labels: unshuffled original; "now that's significant!"]

TGCGTGGCGTTGGGGTAGCCCTCAC
TGCGTGGCGTTTGGGTAGCCCTCAC
TGCGTGGCGTTGGGGTAGCCCTTAC
TGCGTGGCGTGGGGGTAGCCCTCAC
TGCGTGGCGTTTGGGTAGCCCTCAC
TGCGTGGCGTTGGGGTAGCCCTTAC
TGCGTGGCGTGGGGGTAGCCCTCAC
Xia et al. 2003
Properly exposed data has lower nucleotide
compositional entropy than saturated data
[Figure: nucleotide composition panels labeled SLOWLY EVOLVING and SATURATED]
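A rough sketch of compositional entropy, assuming it is measured per site as the Shannon entropy of the nucleotide frequencies in that column and then averaged over sites. This is a simplification for illustration, not the full saturation index of Xia et al. 2003; both toy alignments are hypothetical.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (bits) of the nucleotide frequencies in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_site_entropy(alignment):
    """Average per-site entropy over all columns of an alignment."""
    columns = list(zip(*alignment))
    return sum(site_entropy(col) for col in columns) / len(columns)

# Hypothetical toy alignments (assumptions for illustration only)
slowly_evolving = ["TGCGT", "TGCGT", "TGCGT", "TGCTT"]  # one variable site
saturated = ["ACGTA", "CGTAC", "GTACG", "TACGT"]        # every site has all 4 bases

print(mean_site_entropy(slowly_evolving), mean_site_entropy(saturated))
```

With four equally frequent nucleotides at every site the entropy reaches its maximum of 2 bits per site, which is the saturated extreme.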

Plotting pairwise p-distance against model-corrected
distance reveals overexposure graphically
[Figure: proportion different (y-axis, 0 to 0.5) against estimated distance (x-axis, 0 to 5) for 2nd codon positions and 3rd codon positions; annotated "??"]
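To see why such a plot flattens out, here is a small sketch under the Jukes-Cantor (JC69) model. The choice of JC69 is an assumption made for illustration; the talk does not say which model produced the corrected distances.

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites at JC69 distance d (substitutions/site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """JC69-corrected distance from an observed proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("p-distance at or beyond saturation; correction undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(d, round(expected_p_distance(d), 3))
```

The p-distance rises almost linearly at first and then levels off toward 0.75 under this model, so large corrected distances map onto nearly identical observed differences.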

The Bayesian framework provides a
natural way to quantify information
[Figure: uniform probability density over θ = Pr(coin lands heads on any given flip), from 0 to 1; θ = 0 is a 2-tailed coin, θ = 0.5 a fair coin, θ = 1 a 2-headed coin]

The information in just 3 flips is
enough to make trick coins impossible
[Figure: posterior density over θ from 0 to 1, with the 2-tailed coin (θ = 0) and 2-headed coin (θ = 1) marked]
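A minimal sketch of the update, assuming a uniform prior on θ and assuming the three flips were 2 heads and 1 tail (the slide text does not list the outcomes). Under those assumptions the posterior is a Beta(3, 2) density.

```python
import math

def posterior_density(theta, heads, tails):
    """Posterior density of theta under a uniform prior: Beta(heads + 1, tails + 1)."""
    a, b = heads + 1, tails + 1
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)

# Assumed outcome: 2 heads, 1 tail
for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(theta, posterior_density(theta, heads=2, tails=1))
```

The density is exactly zero at θ = 0 and θ = 1 once at least one head and one tail have been seen. If all three flips had landed the same way, one of the trick coins would remain possible.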

The difference between prior and
posterior measures information content
[Figure: prior and posterior densities over θ from 0 to 1]
Brown 2014

Information Theory
Dissonance
Additivity
Scaling Storm
Polytomy Rainbow
Why?

“Information is the resolution of uncertainty”
— Claude Shannon, 1948

The uncertainty Claude Shannon was
interested in resolving was “Which symbol was
last transmitted over a telegraph system?”
Sender chooses 1 of 8
possible symbols
Receiver must resolve which
symbol was sent
Information = number of
questions receiver needs to ask to
determine which symbol was sent
[Figure: the 8 possible symbols, with the transmitted symbol unknown to the receiver]

Any 1 of the 8 symbols can be identified
by answering 3 yes/no questions
[Figure: binary decision tree of yes/no questions such as "circle?" and "blue?"; yes = 1, no = 0, giving the 8 symbols the codes 111, 110, 101, 100, 011, 010, 001, 000]

Dichotomous keys embody
Shannon's basic units of information
vascular? no: bryophyte; yes: go to seeds?
seeds? no: fern; yes: go to flowers?
flowers? no: gymnosperm; yes: angiosperm

If each symbol has an equal chance of being
chosen by the sender, then 3 bits are needed to
identify each symbol on average
Probabilities: 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
Codes: 111 110 101 100 011 010 001 000 (3 bits each)
entropy = 3
entropy equals average number of
questions needed

If 1 symbol is sent half the time, and the 4
remaining symbols are equally probable, then
only 2 bits are needed on average
Probabilities: 1/2 1/8 1/8 1/8 1/8
Codes: 1 011 010 001 000
Only 1 bit needed half the time
3 bits needed the other half of the time
entropy = 2

If only 1 symbol is ever sent, then no
questions need be asked by the receiver,
and thus no information is required
Probability: 1
0 questions need be asked
entropy = 0
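The three entropy values on the preceding slides (3, 2, and 0 bits) follow from Shannon's formula H = -Σ p log2 p. A short check, with a hypothetical helper name:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 8] * 8           # 8 equally probable symbols
skewed = [1 / 2] + [1 / 8] * 4  # 1 symbol half the time, 4 others equally probable
certain = [1.0]                 # only 1 symbol is ever sent

for name, dist in (("uniform", uniform), ("skewed", skewed), ("certain", certain)):
    print(name, entropy_bits(dist))
```

For the skewed case the average is 0.5 × 1 bit + 0.5 × 3 bits = 2 bits, matching the code lengths shown on the slide.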

In the previous examples, there is
no uncertainty at the receiving end
[Figure: the received symbol is identified as 100% correct]

Noise means that the data received do not
contain enough information to unambiguously
identify the symbol transmitted
[Figure: 3 bits are sent; the receiver assigns 73% to 101 and 8.1% each to 100, 001, and 111; 1.6 bits received]
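The percentages on this slide are consistent with each of the 3 bits being flipped independently with probability 0.1 (0.9³ ≈ 73%, 0.9² × 0.1 = 8.1%). That noise model is an inference from the numbers, not something stated in the talk. Under it, a sketch of the calculation:

```python
import math
from itertools import product

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def word_distribution(sent, flip_prob):
    """Probability of each word when every bit flips independently with flip_prob."""
    dist = {}
    for bits in product("01", repeat=len(sent)):
        word = "".join(bits)
        flips = sum(a != b for a, b in zip(sent, word))
        dist[word] = flip_prob ** flips * (1 - flip_prob) ** (len(sent) - flips)
    return dist

dist = word_distribution("101", flip_prob=0.1)  # assumed per-bit error rate
remaining_uncertainty = entropy_bits(dist.values())
information_received = 3 - remaining_uncertainty

print(round(dist["101"], 3), round(dist["100"], 3), round(information_received, 2))
```

About 1.4 bits of uncertainty remain, so roughly 1.6 of the 3 bits sent arrive as usable information.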

If not all bits are transmitted, there will
also be uncertainty at the destination
[Figure: only the bits 10 are received, leaving 100 and 101 each with probability 50%]

Estimating the phylogeny for 4 taxa involves identifying
1 symbol (a tree) from a total of 15 symbols
[Figure: the 15 possible tree topologies for taxa A, B, C, and D, labeled with the 4-bit codes 0000 through 1110; tree 0111 is sent]
3.9 bits sent
3.9 bits received

Simulating sequence data on a tree captures
information about that tree's topology
[Figure: distribution over the 15 tree topologies, with the model tree marked, shown for 0 sites, 1 site, 10 sites, 100 sites, and 1000 sites]
1000 sites capture
enough information to
identify the tree topology
chosen as the model tree

In 4-taxon simulations, information
estimation works as you might expect
Relative rate   %I
0.01            18
0.1             99
1               100
10              64
100             1.5
info highest at optimal rate
Percent missing   %I
0                 100
50                98
100               0
info decreases with missing data
Rate variance   %I
1               100
10              97
100             13
1000            0
info decreases with rate heterogeneity

Information can be
false information!

Dissonance

Horizontal transfer results in conflicting
information about the placement of bloodroot
(Sanguinaria)
[Figure: two trees for Bocconia, Eschscholzia, Oryza, Disporum, and Sanguinaria from rps11 mtDNA, one for the 5' end and one for the 3' end (horizontally transferred from monocots); clades labeled monocots and Papaveraceae]
Bergthorsson et al. 2003

The 5' data contains 2.9 of 3.9 bits of
information
[Figure: posterior distribution over the 15 topologies for B, D, E, O, and S from the 5' data]
74.5% information

Likewise, the 3' data captures 2.6 of 3.9
bits of information
[Figure: posterior distribution over the 15 topologies for B, D, E, O, and S from the 3' data]
66.8% information

What do you expect will happen if we
concatenate the two data sets?
Data 1
A ACGTACGTA
B ACGTACGTA
C CCATGCGCA
D GTACGCACA
E GTACGCACA
Data 2
A ATATGTGTG
B GCGCACACA
C GCGCACACA
D ATATGTGTG
E ATATGGTTG
Concatenated
A ACGTACGTA ATATGTGTG
B ACGTACGTA GCGCACACA
C CCATGCGCA GCGCACACA
D GTACGCACA ATATGTGTG
E GTACGCACA ATATGGTTG
[Figure: tree file estimated from the concatenated data, with tips A, B, C, D, and E]

Concatenating the 3' and 5' data, we might
expect the conflict to be expressed as noise
[Figure: hypothetical posterior distribution spread over several topologies, with the 5' tree and 3' tree marked]

Instead, we get all 3.9 bits of information needed, but
identify a tree that is neither the 3' nor the 5' tree!
[Figure: posterior distribution from the concatenated data, with the 5' tree and 3' tree marked]
concatenated data contains 100% of info!

Each data set strongly rejects the other's
favorite tree, so a mediocre tree wins everything
S = Sanguinaria, B = Bocconia, E = Eschscholzia, O = Oryza, D = Disporum
Topology          5'     3'     Concatenated
(((S,D),O),E,B)   ---    0.64   ---
(((S,O),D),E,B)   ---    0.18   ---
(((O,D),S),E,B)   0.11   0.18   1
(O,D,(B,(S,E)))   0.12   ---    ---
(O,D,(E,(S,B)))   0.77   ---    ---
Info              74.5%  66.8%  100%
5' data rejects the first 2 trees
3' data rejects the last 2 trees
This loser, (((O,D),S),E,B), wins everything!

Merging tree files provides a means of
measuring information dissonance
[Figure: Data 1 yields the tree file Trees 1 and Data 2 yields Trees 2; the two tree files are combined into a Merged tree file, which is compared with the tree file from the Concatenated data]

Merged tree file says the same thing as
individual tree files if there is no dissonance
Topology          5'     5'     Merged
(((O,D),S),E,B)   0.11   0.11   0.11
(O,D,(B,(S,E)))   0.12   0.12   0.12
(O,D,(E,(S,B)))   0.77   0.77   0.77
Info              74.5%  74.5%  74.5%
same, no dissonance

Dissonance is the difference between
merged info and average info
Topology          5'     3'     Merged
(((S,D),O),E,B)   ---    0.64   0.32
(((S,O),D),E,B)   ---    0.18   0.09
(((O,D),S),E,B)   0.11   0.18   0.14
(O,D,(B,(S,E)))   0.12   ---    0.06
(O,D,(E,(S,B)))   0.77   ---    0.39
Info              74.5%  66.8%  48.6%
average info = 70.7
dissonance = 70.7 - 48.6 = 22.1
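The table can be approximately reproduced with a short sketch. It assumes information is the percentage reduction from the maximum entropy log2(15) ≈ 3.9 bits, and that the merged posterior is the equal-weight average of the two posteriors (two tree files of equal size). Both are readings of the slides rather than quoted definitions. Because the slide probabilities are rounded to two decimals, the results agree with the slide only to within about half a percentage point.

```python
import math

N_TOPOLOGIES = 15  # possible topologies for the 5 taxa, from the slides

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_percent(posterior):
    """Information as % of the maximum entropy log2(15) that has been removed."""
    return 100 * (1 - entropy_bits(posterior) / math.log2(N_TOPOLOGIES))

# Posterior probabilities in table order; "---" entries are treated as 0
five_prime = [0.00, 0.00, 0.11, 0.12, 0.77]
three_prime = [0.64, 0.18, 0.18, 0.00, 0.00]
merged = [(a + b) / 2 for a, b in zip(five_prime, three_prime)]

average_info = (info_percent(five_prime) + info_percent(three_prime)) / 2
dissonance = average_info - info_percent(merged)

print(round(info_percent(five_prime), 1), round(info_percent(three_prime), 1))
print(round(info_percent(merged), 1), round(dissonance, 1))
```

When the two posteriors are identical, the merged posterior equals each of them and the dissonance is zero, which is the "no dissonance" case on the previous slide.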

Additivity

Entropy, information, and dissonance
can all be partitioned by clade
[Figure: four trees on taxa A to F, each with posterior probability 0.25. Clade ABCDEF splits as ABC|DEF with probability 1; clade ABC splits as AB|C or A|BC with probability 0.5 each; clade DEF splits as D|EF or DE|F with probability 0.5 each]
Information = 7.9 bits
= 6.7 + 0.6 + 0.6
The clade contributing 6.7 bits provides the largest
contribution because here the 945
possible trees are cut down to just 9
Two data sets simulated on trees differing only in
the swapping of two tips illustrate that
dissonance can pinpoint disagreement
All dissonance attributed to clade containing swapped taxa

Scaling Storm

There are 5.6×10^26 distinct labeled
unrooted binary tree topologies for 24 taxa
A computer examining 1 billion trees/second would
have to start before the Big Bang in order to finish
looking through all these trees!
An MCMC sample of 1 trillion trees is still 564 trillion
times too small to sample each tree once
Bottom line: it is impossible to accurately estimate the
entropy of a posterior representing zero information
for any reasonable number of taxa
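These figures follow from the standard count of unrooted binary topologies, (2n-5)!! = 3 × 5 × ... × (2n-5). A quick check, where the 13.8-billion-year age of the universe is a value supplied here and not taken from the slides:

```python
def unrooted_binary_trees(n_taxa):
    """Number of distinct labeled unrooted binary topologies: (2n - 5)!!"""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= k
    return count

n_trees = unrooted_binary_trees(24)

SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_needed = n_trees / 1e9 / SECONDS_PER_YEAR  # at 1 billion trees per second
AGE_OF_UNIVERSE_YEARS = 13.8e9                   # assumption supplied for this sketch

shortfall = n_trees / 1e12                       # versus a sample of 1 trillion trees

print(f"{n_trees:.2e} trees, {years_needed:.2e} years, shortfall {shortfall:.2e}")
```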

If data contains zero information, inadequate
sampling results in high estimated information
content
10,000 trees sampled
Taxa   Unrooted trees   Estimated information (%)
4      3                0
5      15               0
6      105              0
7      945              1
8      10,395           6
9      135,135          22
10     2,027,025        37
11     34,459,425       47
12     654,729,075      55
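The pattern in this table can be imitated with a small simulation. It assumes that a zero-information posterior is uniform over all topologies, that entropy is estimated from the observed frequencies in the sample, and that information is the percentage shortfall from log2 of the number of trees. The function name and the seed are arbitrary.

```python
import math
import random
from collections import Counter

def estimated_info_percent(n_trees, sample_size, rng):
    """Estimated information (%) when trees are sampled uniformly (zero true information)."""
    sample = Counter(rng.randrange(n_trees) for _ in range(sample_size))
    h_est = -sum((c / sample_size) * math.log2(c / sample_size)
                 for c in sample.values())
    return 100 * (1 - h_est / math.log2(n_trees))

rng = random.Random(2016)
tree_counts = {4: 3, 6: 105, 8: 10395, 10: 2027025, 12: 654729075}
for taxa, n_trees in tree_counts.items():
    print(taxa, round(estimated_info_percent(n_trees, 10_000, rng)))
```

A sample of 10,000 trees can never show more than log2(10,000) ≈ 13.3 bits of entropy, so once tree space is much larger than the sample, the estimate reports information that is not there.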

[Figure: tree space for 12 taxa compared with the sample: "This little dot is how much tree space we've covered"; tree space is 65,473 times larger than sample size]

Polytomy Rainbow

Polytomy priors make it possible to
estimate low information content accurately
Lewis, Holder & Holsinger 2005
Resolution class 1: 1 tree (the star tree)
Resolution class 2: 25 trees
Resolution class 3: 105 trees
Resolution class 4: 105 trees (fully resolved)
Including polytomies more than doubles size of tree space
Make each of the 4 resolution classes equally probable
under the prior (0.25 each)
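The class sizes 1, 25, 105, 105 match the number of unrooted topologies on 6 taxa with 1 to 4 internal nodes, which can be generated with the standard recurrence T(n, m) = (n + m - 3) T(n-1, m-1) + m T(n-1, m). The sketch below uses that recurrence and then spreads a flat class prior evenly within each class; the function names are hypothetical, and the even spread within a class is an assumption about how the prior is applied.

```python
def resolution_class_counts(n_taxa):
    """counts[m - 1] = number of unrooted topologies with m internal nodes."""
    counts = [1]  # 3 taxa: only the star tree
    for n in range(4, n_taxa + 1):
        prev = counts + [0]  # no tree on n - 1 taxa has n - 2 internal nodes
        counts = [1] + [
            (n + m - 3) * prev[m - 2] + m * prev[m - 1]
            for m in range(2, n - 1)
        ]
    return counts

def flat_class_prior(counts):
    """Prior probability of a single tree in each class when classes are equiprobable."""
    return [1 / (len(counts) * c) for c in counts]

counts = resolution_class_counts(6)
print(counts, sum(counts))
print(flat_class_prior(counts))
print(len(resolution_class_counts(24)))  # number of resolution classes for 24 taxa
```

Under this prior the star tree alone carries a quarter of the prior mass in the 4-class case, so a sampler visits the low-resolution classes often enough to register a near-zero information content.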

Flat resolution class prior easy to
sample even for a 24-taxon problem
[Figure: histogram of sampled trees by resolution class (1 to 22); x-axis 0 to 500]
Info = 0.026%
10,000 trees sampled

Highly informative data sets place all
posterior in the fully-resolved resolution class
[Figure: histogram of sampled trees by resolution class (1 to 22); x-axis 0 to 10000]
Info = 100%
10,000 trees sampled
All posterior on just 1
of the 5.6×10^26
possible trees!

The Bayesian approach is better at assessing the
information content of 2nd vs. 3rd position sites
[Figure: proportion different (y-axis, 0 to 0.5) against estimated distance (x-axis, 0 to 5) for 2nd codon positions and 3rd codon positions; annotated "Saturated?"]

The Bayesian approach is better at assessing the
information content of 2nd vs. 3rd position sites
[Figure: proportion different (y-axis, 0 to 0.5) against estimated distance (x-axis, 0 to 5); values 0.005 and 0.700 annotated]
3rd position sites:
info = 86.4%
2nd position sites:
info = 75.6%
3rd positions have
more information than
2nd positions!

Using the resolution class prior does not change the
conclusion that 3rd position sites have more
information than 2nd position sites
[Figure: histograms of sampled trees by resolution class (1 to 31) for each codon position]
2nd position sites:
info = 30.2%
3rd position sites:
info = 54.9%

Why?

Why measure information
content?
• Morphology vs. molecules
• Informed site-stripping
• Impact of missing data (missing taxa, missing genes, random)
• Partition gene tree conflict (dissonance)
• Profiling information content
• Divergence time analyses

Thanks!
~ UConn Collaborators ~
Ming-Hui Chen, Lynn Kuo, Louise Lewis, Karolina Fučíková,
Suman Neupane, Yu-Bo Wang, Daoyuan Shi
Supported by the National
Science Foundation
Department of Ecology and
Evolutionary Biology
http://dx.doi.org/10.1093/sysbio/syw042
Systematic Biology Advance Access

Literature Cited
Archie, J. W. 1989. A randomization test for phylogenetic information in systematic data. Systematic Zoology 38(3):239–252.
Bergthorsson, U., Adams, K. L., Thomason, B., & Palmer, J. D. 2003. Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424:197–201.
Brown, J. M. 2014. Detection of implausible phylogenetic inferences using posterior predictive assessment of model fit. Systematic Biology 63(3):334–348.
Faith, D. P., & Cranston, P. S. 1991. Could a cladogram this short have arisen by chance alone?: on permutation tests for cladistic structure. Cladistics 7(1):1–28.
Fitch, W. M. 1984. Cladistic and other methods: problems, pitfalls, and potentials. Chapter 12 in: Duncan, T., & Stuessy, T. F. (eds.), Cladistics: perspectives on the reconstruction of evolutionary history. Papers presented at a workshop on the theory and application of cladistic methodology, March 22-28, 1981, University of California, Berkeley. Columbia University Press, New York.
Hillis, D. M. 1991. Discriminating between phylogenetic signal and random noise in DNA sequences. Pp. 278–284 in: Miyamoto, M. M., & Cracraft, J. (eds.), Phylogenetic analysis of DNA sequences. Oxford University Press, New York.
Huelsenbeck, J. P. 1991. Tree-length distribution skewness: an indicator of phylogenetic information. Systematic Zoology 40(3):257–270.
Larget, B. 2013. The estimation of tree posterior probabilities using conditional clade probability distributions. Systematic Biology 62(4):501–511.
Lewis, P. O., Holder, M. T., & Holsinger, K. E. 2005. Polytomies and Bayesian phylogenetic inference. Systematic Biology 54(2):241–253.
Xia, X., Xie, Z., Salemi, M., Chen, L., & Wang, Y. 2003. An index of substitution saturation and its application. Molecular Phylogenetics and Evolution 26(1):1–7.
Claude Shannon photograph: http://www.itsoc.org/about/shannon
All other photographs by Paul O. Lewis
