Reframing Phylogenetics
Unbiased comparative methods for environmental metagenomics sampling
Joe Parker
Outline
• My background & track record
• Environmental metagenomics – existing problems
• Phylogenomics adds maximum value to datasets
• Illustrative study
Joe Parker: Novel methods for cutting-edge science
High-throughput phylogenomics
Parallelised analyses
Bayesian statistics
Information-theoretic measures
NGS datasets
Integrating clinical, genetic & molecular data
Machine-learning and antigen modeling
BaTS software
>100 citations
'000s downloads
HADPACK framework
in silico HIV vaccine design
Clinical trial
ABCDet API
First genomic convergent evolution demonstration
Nature Oct 2013
Public alpha
Throughput & Access
[Figure: Users and Developers]
Metagenomics of an environmental disaster
Mason et al. (2014) Metagenomics reveals sediment microbial community response to Deepwater Horizon oil spill. The ISME J (epub ahead of print; 23rd Jan 2014; retrieved 1st Mar 2014): doi:10.1038/ismej.2013.254
Comparative approaches
Homology
Deepwater Horizon, revisited
Continuous analyses with immediate results
Iterative sample collection / analysis; rapid cline detection
Exploit phylogenetic methods
Detect: population dynamics, adaptive evolution, migration
Facilitate NGS
Gene functions and ecosystem services
Explicitly model errors
Account for paralogy & horizontal transfer
Reduce ascertainment bias
Unbiased taxon / gene discovery
Dr. Joe Parker
Dr. Elizabeth Clare: Environmental metagenomics
Dr. Steve Rossiter: Phylogenomics
Prof. Richard Nichols: Population genetics
Prof. Steve Lloyd: Parallel computing
Prof. Mark Trimmer: Biogeochemistry
Dr. Jon Grey: Aquatic ecology
Prof. Alfried Vogler (NHM): Metagenomics & turbotaxonomy
Mr. Tim Booth (NEBC): Bio-Linux & virtual machines
Prof. Jonathan Eisen (US): Microbial phylogenomics
Prof. Alexei Drummond (NZ): Bayesian phylogenetics, Geneious CSO
Dr. Matthew Hahn (US): Genomics
Dr. Aris Katzourakis (Oxford): Phylodynamics modelling
GridPP HTC: 3,000+ cores
MidPlus HPC: 2,000+ cores
Genome Centre: Sequencing expertise
Year | Activity | Goal | Publication and/or software release
Y1 | Port existing tools | Build framework | Phylogenomics tools; runtime visualisation; taxonomic assignment; sitewise diversity
Y2 | I/O & GUI, review | Develop framework | Visualisation tools; review literature, develop theory; non-standard genetic codes; raft-aligned reads (RAR) demonstration
Y3 | RARs, phylo stress | | Anhomologous phylogenetics; asynchronous phylogenomics; RARs MPI implementation
Y4 | Core method | Integrate research | Core method demonstration, including agent-based computation; core method released
Y5 | Alpha releases | Deploy | Stable releases ported to Geneious, CLC, Galaxy; final report

Editor's Notes

  • #2 RB: more explanation of basic ideas RK: not here – arctic microbes slide RB: ok
  • #3 Me, problem, solution: my track record and why I can take this field forward; current analyses in env. metagenomics falling short; why phylogenetics adds value; demonstration
  • #4 Throughout my career: track record of novel models, implemented in apps for others, doing cutting-edge science. BaTS: >100 cites, thousands of d/ls, weekly/daily user contact. HADPACK initiated entirely novel HIV analysis / vaccine design w/ machine learning, phylogenetics, GUI. Current work package for HT phylogenomics, detected convergent evol. (Nature)
  • #5 Throughput: usually in terms of sequencing. Analysis: not limited by CPU. Intersection of able developers who are also users is v. small. Access drives impact; fundamental to my goals. Distributed / cloud infrastructures: no bar to entry. MinION etc. exacerbate
  • #6 00s I could pick, this is one. Typical example of an environmental metagenomics question: oil spill effects on marine micro? Sediment cores, 50 sites; single gene, handful of genomes. MDS could distinguish some signal w/ geochemical variables; found some taxa, some new. How many more new? Similarity-based. Slow. Sequences embody information, including important information on adaptation etc.: wasted.
    Deepwater Horizon (DWH) oil spill, spring 2010: ~4.1 million barrels of oil to the Gulf of Mexico; >22% of this oil is unaccounted for. 64 sites by targeted sequencing of 16S rRNA genes; shotgun metagenomic sequencing of 14 samples. 16S rRNA: most heavily oil-impacted sediments enriched in an uncultured Gammaproteobacterium and a Colwellia species, both of which were highly similar to sequences in the DWH deep-sea hydrocarbon plume. The primary drivers in structuring the microbial community were nitrogen and hydrocarbons. Annotation of unassembled metagenomic data revealed the most abundant hydrocarbon degradation pathway encoded genes involved in degrading aliphatic and simple aromatics via butane monooxygenase. Further, analysis of metagenomic sequence data revealed an increase in abundance of genes involved in denitrification pathways in samples that exceeded the Environmental Protection Agency (EPA)’s benchmarks for polycyclic aromatic hydrocarbons (PAHs) compared with those that did not. Importantly, these data demonstrate that the indigenous sediment microbiota contributed an important ecosystem service for remediation of oil in the Gulf. However, PAHs were more recalcitrant to degradation, and their persistence could have deleterious impacts on the sediment ecosystem.
  • #7 Given observed microbial diversity, phylogeny reveals evolutionary history: trait acquired once? Or multiple times? Biologically significant…
  • #8 Why isn't there more phylogenetics in environmental micro? Orthology assumptions from classical phylogenetics. Simple case: defined as orthologous when gene and species histories are identical. Genes = taxa, and vice versa. Gene duplications give rise to paralogous copies, which may confuse, esp. similarity matching. Secondary copies or deletions screw up more. Microbial communities! Horizontal transfer
  • #9 This is a COMPLETELY NOVEL approach. Continuous analysis, agent-based; outputs instantly with increasing resolution. How I envisage it working:
    [1] Collection of short-read environmental metagenomic sequences, low complexity.
    [2] Tiled into pseudo-assemblies by similarity clustering; may be chimeric, may be orthologous or paralogous. I CALL THESE RAFT-ALIGNED READS, and this step CRYSTALLISATION. Each raft is handled by an agent; increased local order, still globally disordered.
    [3] We can compute phylogenetic measures along sliding windows within a raft. These measure the coherence of the evolutionary signal along the raft.
    [4] Areas of great incoherence I CALL PHYLOGENETIC STRESS; these might correspond to chimeric reads, e.g. other taxa, paralogues, horizontal transfer.
    [5] Agents can compare stress values and attempt to exchange reads, proportional to stress. I CALL THIS DISLOCATION.
    [6] Iteration towards a maximally globally ordered state.
  • #10 More taxa / genes; full evol. information extracted; explicit modelling; NGS-ready; fast / instant
  • #11 Compute / sequencing resources; QM experts, collaborators & mentors; international collaborators
  • #12 Leave it there for questions. More taxa / genes; full evol. information extracted; explicit modelling; NGS-ready; fast / instant
  • #13 Research programme, not an engineering project
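
The sliding-window idea in the notes for #9 (steps [3] and [4]) can be illustrated with a toy sketch. The notes do not say which phylogenetic measure is intended, so this example assumes a simple stand-in: the mean pairwise p-distance between reads within each window. The function names, the example raft, the window size and the threshold are all hypothetical, and the read exchange between agents (step [5]) is not modelled.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping positions where either read has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def window_stress(raft, window, step):
    """Return (start, score) for each sliding window along a raft.

    The score is the mean pairwise p-distance between reads in the window.
    This is an assumed proxy for incoherence of the evolutionary signal,
    not the measure used in the talk.
    """
    if len(raft) < 2:
        raise ValueError("need at least two aligned reads")
    length = len(raft[0])
    scores = []
    for start in range(0, length - window + 1, step):
        segments = [read[start:start + window] for read in raft]
        dists = [p_distance(a, b) for a, b in combinations(segments, 2)]
        scores.append((start, sum(dists) / len(dists)))
    return scores


def stressed_windows(scores, threshold):
    """Window start positions whose score exceeds a (hypothetical) threshold."""
    return [start for start, score in scores if score > threshold]


# Hypothetical raft of three aligned reads of equal length.
raft = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGTACGT",
    "ACGTACGTTTGATCGA",  # diverges in its second half, as a chimeric read might
]

scores = window_stress(raft, window=8, step=4)
for start, score in scores:
    print(start, round(score, 3))
print("stressed windows:", stressed_windows(scores, threshold=0.3))
```

In this toy raft the score rises from the first window to the last, so only the final window is flagged. In the scheme described in the notes, an agent could use such a signal to decide which reads to offer for exchange.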