8. WTF??
Aquifex aeolicus genes have high similarity with genes from…
Adding Sulfurihydrogenibium is enough to swing it from “early branching” to “exotic Epsilonproteobacterium”
Eveleigh et al. Genome Biol Evol (2013)
9. What I’m trying to say here
Lateral gene transfer and other processes, coupled with old, old divergence times, lead to:
1. Phylogenetic instability
2. Artefactual “early branching”
3. Invalid representations of evolutionary relationships
One possible solution: try to release the "phylogenetic pressure" with a network representation
10.
Emma Allen-Vercoe
“I am not one and simple, but complex and many.”
- Virginia Woolf, The Waves
Part 2: On to networks!
11. So you want to build a network.
• Things to think about:
• What types of relationship are you trying to show?
• Do evolutionary distances matter?
• Rooted or unrooted?
• ALL relationships or just the most interesting / important ones?
• How long will it take to build?
12. Implicit networks
• Good at showing uncertainty, ambiguity and conflict in the data, BUT which type of confusion are we looking at??
• Showing alternative bipartitions:
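Conflict between bipartitions has a simple set-theoretic test: two splits A|B and C|D of the same taxon set are compatible (can sit together in one tree) exactly when at least one of A∩C, A∩D, B∩C, B∩D is empty. A minimal Python sketch, assuming each split is stored as a pair of frozensets; the function and variable names are illustrative only:

```python
from itertools import combinations


def compatible(s1, s2):
    """True if splits A|B and C|D (same taxon set) can co-occur in one tree."""
    (a, b), (c, d) = s1, s2
    return not (a & c) or not (a & d) or not (b & c) or not (b & d)


def conflicting_pairs(splits):
    """All pairs of splits that no single tree can display together."""
    return [(s1, s2) for s1, s2 in combinations(splits, 2) if not compatible(s1, s2)]


ab_cde = (frozenset("ab"), frozenset("cde"))
abc_de = (frozenset("abc"), frozenset("de"))
ac_bde = (frozenset("ac"), frozenset("bde"))

print(compatible(ab_cde, abc_de))  # True: {a,b} nests inside {a,b,c}
print(compatible(ab_cde, ac_bde))  # False: all four intersections are non-empty
```

An implicit network draws every one of these splits, so each incompatible pair shows up as a box rather than a single edge.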
15. Z-closure: supernetworks from trees (Huson et al., 2004)
• We have a set of trees T that define a set of splits
• These splits are not all necessarily compatible, nor do they necessarily all cover the same set of taxa
• Reconcile them by merging splits from overlapping trees (the Z-rule)
• The result is a supernetwork constructed directly from many trees
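The Z-rule is short enough to sketch. As I understand it from Huson et al. (2004): given partial splits A1|B1 and A2|B2 where A1∩A2, A2∩B1 and B1∩B2 are non-empty and A1∩B2 is empty, the first split is extended to A1 | B1∪B2 and the second to A1∪A2 | B2. The Python below is a minimal sketch under that reading, not the published implementation: splits are stored orientation-free as frozensets of two frozensets, all helper names are hypothetical, and pairs are processed in one fixed order even though the outcome of repeated replacement may depend on that order.

```python
from itertools import product


def split(a, b):
    """A (partial) split A|B, stored orientation-free as a frozenset of two frozensets."""
    return frozenset({frozenset(a), frozenset(b)})


def orientations(s):
    """Both ways of reading a split: (A, B) and (B, A)."""
    a, b = tuple(s)
    return [(a, b), (b, a)]


def z_rule(s1, s2):
    """Apply the Z-rule to s1 and s2 if it extends either split, else return None."""
    for (a1, b1), (a2, b2) in product(orientations(s1), orientations(s2)):
        if (a1 & a2) and (a2 & b1) and (b1 & b2) and not (a1 & b2):
            new1, new2 = split(a1, b1 | b2), split(a1 | a2, b2)
            if (new1, new2) != (s1, s2):
                return new1, new2
    return None


def z_closure(partial_splits):
    """Replace pairs of splits by their Z-extensions until nothing changes."""
    splits = list(partial_splits)
    changed = True
    while changed:
        changed = False
        for i, j in product(range(len(splits)), repeat=2):
            if i == j:
                continue
            result = z_rule(splits[i], splits[j])
            if result is not None:
                splits[i], splits[j] = result
                changed = True
    return set(splits)


def full_splits(splits, taxa):
    """Keep only splits covering every taxon; these feed the split network."""
    taxa = frozenset(taxa)
    return {s for s in splits if frozenset().union(*s) == taxa}


# Partial splits from two hypothetical trees on overlapping taxon sets
partial = [split("ab", "cd"), split("abc", "de")]
print(len(full_splits(partial, "abcde")))             # 1: only abc|de is full
print(len(full_splits(z_closure(partial), "abcde")))  # 2: ab|cd was extended to ab|cde
```

Every accepted application only ever grows splits, so the loop must stop once no pair can be extended further.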
18.
Core genes (n = 2317) of 30 E. coli genomes (red = pathogen, blue = non-pathogen)
Beauregard-Racine, middle authors, and Bapteste, Biol Direct (2011)
Chaos and confusion
19. So…
Explicit networks
• Show specific merging relationships (LGT, hybridization) between lineages
• Need to balance complexity of calculation with complexity of representation
20. Nodes in an explicit network
Huson and Scornavacca, Genome Biol Evol (2010)
Tree node (1 parent, 2 children)
Reticulation node (2 parents, 1 child)
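Both node types can be read straight off in- and out-degrees. A minimal Python sketch, assuming the network is given as a list of (parent, child) edges on a hypothetical toy network; it also labels the root (no parents) and leaves (no children), and anything else as "other":

```python
from collections import Counter


def classify_nodes(edges):
    """Label each node of a rooted explicit network by its in- and out-degree."""
    indeg = Counter(child for _, child in edges)
    outdeg = Counter(parent for parent, _ in edges)
    kinds = {}
    for node in set(indeg) | set(outdeg):
        n_in, n_out = indeg[node], outdeg[node]
        if n_in == 0:
            kinds[node] = "root"
        elif n_out == 0:
            kinds[node] = "leaf"
        elif n_in == 1 and n_out == 2:
            kinds[node] = "tree"            # 1 parent, 2 children
        elif n_in == 2 and n_out == 1:
            kinds[node] = "reticulation"    # 2 parents, 1 child (e.g. LGT, hybridization)
        else:
            kinds[node] = "other"
    return kinds


# Hypothetical network: the lineage h leading to B has two parents, u and v
edges = [("r", "u"), ("r", "v"), ("u", "A"), ("u", "h"),
         ("v", "h"), ("v", "C"), ("h", "B")]
kinds = classify_nodes(edges)
print(kinds["u"], kinds["h"])  # tree reticulation
```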
21. Cluster networks (Huson and Rupp, 2008)
• "Hardwired": each edge defines a bipartition, therefore every displayed bipartition needs a supporting edge
Galled networks (Kanj et al., 2008)
• "Softwired": switching reticulation edges on or off gives different relationships
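In the hardwired view, a cluster can be computed edge by edge: the cluster of edge (u, v) is the set of leaves reachable from v. A minimal Python sketch on a hypothetical toy network, where leaf B sits below reticulation node h with parents u and v; the function name is illustrative, and memoisation means each node is visited once, so the cost stays polynomial:

```python
from collections import defaultdict


def hardwired_clusters(edges):
    """Cluster (set of leaves below) of every edge (parent, child) in a rooted network."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    memo = {}

    def below(v):
        # Leaves reachable from v, memoised so each node is computed once.
        if v not in memo:
            if not children[v]:
                memo[v] = frozenset({v})
            else:
                memo[v] = frozenset().union(*(below(c) for c in children[v]))
        return memo[v]

    return {below(child) for _, child in edges}


edges = [("r", "u"), ("r", "v"), ("u", "A"), ("u", "h"),
         ("v", "h"), ("v", "C"), ("h", "B")]
clusters = hardwired_clusters(edges)
print(sorted(map(sorted, clusters)))  # [['A'], ['A', 'B'], ['B'], ['B', 'C'], ['C']]
```

Both {A, B} and {B, C} appear because every edge into h contributes its cluster, whichever parent B "really" came from.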
23. So...
• "Softwired" clusters are easier to interpret
• But their complexity is much worse! (exponential vs polynomial)
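One way to see where the exponential cost comes from: a naive way to collect softwired clusters is to enumerate every "switching" (keep exactly one incoming edge per reticulation node) and gather the clusters of each resulting tree. That means the product of the reticulations' in-degrees, i.e. 2^r switchings for r reticulations with two parents each. The Python below is that brute-force sketch on a hypothetical toy network, not a published algorithm; helper names are illustrative:

```python
from collections import defaultdict
from itertools import product


def _tree_clusters(edges, leaves):
    """Leaf sets below each kept edge of a switched tree."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    memo = {}

    def below(v):
        if v not in memo:
            if v in leaves:
                memo[v] = frozenset({v})
            else:
                memo[v] = frozenset().union(*(below(c) for c in children[v]))
        return memo[v]

    # Dead-end internal nodes (all children switched off) give empty sets; drop them.
    return {below(c) for _, c in edges if below(c)}


def softwired_clusters(edges):
    """Union of clusters over every switching; also returns how many switchings were tried."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    leaves = {c for _, c in edges} - {p for p, _ in edges}
    reticulations = sorted(v for v, ps in parents.items() if len(ps) > 1)
    clusters, n_switchings = set(), 0
    # product(...) yields one choice of parent per reticulation: 2**r choices when binary
    for choice in product(*(parents[v] for v in reticulations)):
        keep = dict(zip(reticulations, choice))
        switched = [(p, c) for p, c in edges if keep.get(c, p) == p]
        clusters |= _tree_clusters(switched, leaves)
        n_switchings += 1
    return clusters, n_switchings


edges = [("r", "u"), ("r", "v"), ("u", "A"), ("u", "h"),
         ("v", "h"), ("v", "C"), ("h", "B")]
clusters, n = softwired_clusters(edges)
print(n)  # 2 switchings for one binary reticulation
```

On this tiny network the answer is easy; the point is that the loop doubles with every extra reticulation.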
24. Affinity
• Network / matrix that shows the existence (and possibly magnitude) of direct relationships between entities
• Unconstrained in the relationships they can show, but not phylogenetic per se
Pseudomonas affinities (Holloway and Beiko, 2010)
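As a rough illustration of the data structure (not the method of Holloway and Beiko), an affinity network can be stored as weighted, undirected edges. The sketch assumes a hypothetical input of pairwise events between genomes, such as shared genes, and uses their count as the magnitude; genome names P1 to P4 are made up:

```python
from collections import Counter


def affinity_network(events, min_weight=1):
    """Weighted, undirected affinity edges from (genome_x, genome_y) event pairs.

    The weight of an edge is the number of events linking the pair; self-pairs
    are ignored and edges lighter than `min_weight` are dropped.
    """
    counts = Counter(frozenset(pair) for pair in events if pair[0] != pair[1])
    return {tuple(sorted(pair)): n for pair, n in counts.items() if n >= min_weight}


events = [("P1", "P2"), ("P2", "P1"), ("P1", "P3"), ("P3", "P4")]
print(affinity_network(events))                # {('P1', 'P2'): 2, ('P1', 'P3'): 1, ('P3', 'P4'): 1}
print(affinity_network(events, min_weight=2))  # {('P1', 'P2'): 2}
```

Nothing forces these edges to be tree-like, which is exactly why affinities can show anything but are not phylogenies.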
26. What should I do?????
• The best option may depend on just *how* incompatible your sequences / trees are, and how much information you're willing to set aside
• Tree = "la la la..."
• Cluster network = "it's complicated..."
• Affinity = "it's complicated, let me lay it out for you"
27.
"Divide each difficulty into as many parts as is feasible and necessary to resolve it."
- René Descartes
Part 3: Unwinding Gene Transfer
30. SPR and LGT
(Figure: “Species” tree, LGT event, gene tree)
Identifying LGT events: permute the species tree via SPR (subtree prune-and-regraft) moves until it agrees with the gene tree
Exponential in the size of the tree (n taxa)
PAINFULLY SLOW
(Beiko and Hamilton, 2006)
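To make the cost concrete, here is a brute-force sketch (not the method of Beiko and Hamilton) that stores each rooted binary tree as nested Python tuples with unique leaf names and searches outward from the species tree one SPR move at a time until it reaches the gene tree. All helper names and the example trees are hypothetical. The number of distinct trees the search may visit grows explosively with n, which is why this only works for toy inputs:

```python
from collections import deque


def subtrees(tree):
    """Yield every subtree, including the whole tree and the leaves."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


def prune(tree, target):
    """Remove `target`; its former sibling takes the place of their parent."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == target:
        return right
    if right == target:
        return left
    return (prune(left, target), prune(right, target))


def regraft(tree, target, moved):
    """Attach `moved` on the edge above `target`, as target's new sibling."""
    if tree == target:
        return (target, moved)
    if isinstance(tree, str):
        return tree
    left, right = tree
    return (regraft(left, target, moved), regraft(right, target, moved))


def canonical(tree):
    """Order children deterministically so equal rooted trees compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(c) for c in tree), key=repr))


def spr_neighbours(tree):
    """All trees one rooted SPR move away from `tree`."""
    for moved in subtrees(tree):
        if moved == tree:
            continue
        pruned = prune(tree, moved)
        for target in subtrees(pruned):
            yield regraft(pruned, target, moved)


def spr_distance(species, gene):
    """Breadth-first search over SPR moves from the species tree to the gene tree."""
    start, goal = canonical(species), canonical(gene)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for neighbour in spr_neighbours(tree):
            key = canonical(neighbour)
            if key not in seen:
                seen.add(key)
                queue.append((key, dist + 1))
    return None


species = ((("A", "B"), "C"), "D")
gene = (("A", "C"), ("D", "B"))  # B now sits next to D, as if transferred
print(spr_distance(species, gene))  # 1
```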
31. SPR and LGT and MAF
AGREEMENT FOREST (AF) – a set of subtrees that both S and G can be cut into (i.e., compatible with both trees)
MAXIMUM AGREEMENT FOREST (MAF) – the AF with the smallest number of subtrees
$d_{\mathrm{SPR}}(S, G) = |\mathrm{MAF}(S, G)| - 1$ (Bordewich and Semple, Ann Combinatorics 2005)
32. Building a MAF by cutting
Example case: a & c are sisters in the species tree, but not in the gene tree (where b1, b2, …, bn are intervening). What can we do to the gene tree?
Three options: cut a, cut c, or cut all of the intervening b1, …, bn
Do this recursively until there are no discrepancies between the two trees
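A single branching step can be sketched directly. The Python below is a simplified illustration under stated assumptions: both trees are full rooted binary trees stored as nested tuples with unique leaf names, and it ignores the forest bookkeeping a real MAF algorithm needs (for example, a and c already lying in different components). It finds a sister pair {a, c} in the species tree that is not a sister pair in the gene tree and returns the three cut options, where the b's are the subtrees hanging off the gene-tree path between a and c. All names are hypothetical:

```python
def cherries(tree):
    """Sister pairs of leaves in a rooted binary tree of nested tuples."""
    if isinstance(tree, str):
        return []
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return [(left, right)]
    return cherries(left) + cherries(right)


def path_to(tree, leaf):
    """Subtrees on the path from the root down to `leaf` (None if absent)."""
    if tree == leaf:
        return [tree]
    if isinstance(tree, str):
        return None
    for child in tree:
        path = path_to(child, leaf)
        if path is not None:
            return [tree] + path
    return None


def pendant_subtrees(tree, a, c):
    """The b's: subtrees hanging off the a-to-c path below their common ancestor."""
    pa, pc = path_to(tree, a), path_to(tree, c)
    i = 0
    while i < min(len(pa), len(pc)) and pa[i] == pc[i]:
        i += 1  # afterwards pa[i - 1] is the lowest common ancestor of a and c
    pendants = []
    for path in (pa[i:], pc[i:]):  # from just below the ancestor down to the leaf
        for parent, child in zip(path, path[1:]):
            pendants.append(parent[0] if parent[1] == child else parent[1])
    return pendants


def branch_options(species, gene):
    """Three ways to fix the first mismatched sister pair: cut a, cut c, or cut all b's."""
    gene_cherries = {frozenset(pair) for pair in cherries(gene)}
    for a, c in cherries(species):
        if frozenset((a, c)) not in gene_cherries:
            return [[a], [c], pendant_subtrees(gene, a, c)]
    return None  # every species-tree sister pair is also one in the gene tree


species = (("a", "c"), ("b1", "b2"))
gene = ((("a", "b1"), "b2"), "c")
print(branch_options(species, gene))  # [['a'], ['c'], ['b2', 'b1']]
```

A full search would apply each option, recurse on the result, and keep the branch needing the fewest cuts.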
33. FIXED PARAMETER TRACTABLE – Exponential in the distance between trees (k), not the number of leaves (n)
k is almost always << n
AND IT’S EVEN BETTER – Separate a larger tree (distance k) into independent subtrees (distances j and k – j)
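A back-of-the-envelope version of both claims, assuming the three-way branching from the "Building a MAF by cutting" slide and that every branch cuts at least one edge, so no path in the search goes deeper than k. Here T(k) is a stand-in for the work done at distance k; this naive bound is only illustrative, and refined branching rules can lower the base of the exponent.

```latex
T(k) \le 3\,T(k-1) + \mathrm{poly}(n), \qquad T(0) = \mathrm{poly}(n)
\quad\Longrightarrow\quad T(k) = O\!\left(3^{k}\,\mathrm{poly}(n)\right)

% Two independent subproblems with distances j and k - j:
O\!\left((3^{j} + 3^{k-j})\,\mathrm{poly}(n)\right),
\qquad \text{e.g. } k = 10,\ j = 5:\quad 3^{5} + 3^{5} = 486 \ll 3^{10} = 59049
```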
37.
Remember that the happiest people are not those getting more, but those giving more.
- H. Jackson Brown
Part 4: A Couple of New Directions
38. Thinking about metagenomes
• Sure, we can infer LGT between genomes, but what do these genomes have to do with each other?
• We can look for evidence of within-habitat transfer by examining metagenomic samples
39. Hsu et al. (submitted)
WAAFLE
Workflow for Annotating Assemblies and Finding Lateral gene transfer Events
Genes on a contig
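A toy illustration of the general idea, as I understand it, of looking at genes on a contig: if one taxon explains every gene, there is no evidence of transfer; if no single taxon does but a pair does, the contig is a candidate LGT. This is not WAAFLE's actual procedure, which works from homology search scores and taxonomic levels; here each gene is simply a hypothetical set of taxa with a good hit, and the taxon names are made up:

```python
from itertools import combinations


def classify_contig(gene_hits):
    """gene_hits: one set per gene on the contig, of taxa with a good hit to that gene."""
    taxa = sorted(set().union(*gene_hits))
    # One taxon explains every gene: no evidence of transfer.
    for taxon in taxa:
        if all(taxon in hits for hits in gene_hits):
            return ("single", taxon)
    # No single taxon works, but two together explain every gene: candidate LGT.
    for t1, t2 in combinations(taxa, 2):
        if all(t1 in hits or t2 in hits for hits in gene_hits):
            return ("candidate LGT", t1, t2)
    return ("unexplained",)


contig = [{"taxon_A"}, {"taxon_A", "taxon_C"}, {"taxon_B"}]
print(classify_contig(contig))  # ('candidate LGT', 'taxon_A', 'taxon_B')
```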
40. Genomic Epidemiology
• Inferring movement of pathogens between habitats, and evolutionary events (such as LGT of antimicrobial resistance elements) during the spread of a pathogen
• Thousands of genomes create challenges!!
42. And on that note…
We’re hiring!!!
http://arete-amr.ca/
43. Putting it all together...
• Networks: order out of chaos?
• Slightly less chaos out of chaos
• More importantly: less misleading about the extent of agreement in your data
• Strong treelike signal can still come through
• Capturing LGT information from environmental samples?
• Oy.
• Still lots of assumptions.