Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe Parker - 24th October 2014

High-throughput comparative genomics
24th October 2013
Joe Parker,
Queen Mary University London

Topics
1. Introduction
2. Background: why phylogenomics?
3. Examples
4. Practice
5. Case study
6. On the horizon
7. Over the horizon

Aims
• Context of phylogenomics: next-generation sequencing (NGS)
• Why phylogenomics?
• Practical analyses
• Future developments

Lab Interests
• Ecology and evolution of traits
• Echolocation, sociality
• NGS data for population genetics and phylogenomics

Activities
• Phylogeny estimation/comparison
• Molecular correlates of evolution;
– site substitutions, dN/dS, composition
• Simulation
• Dataset limitations
(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey

Why phylogenomics, not -genetics?
• Causes of discordant signal
– Incomplete lineage sorting
– Lateral transfer
– Recombination
– Introgression

Quantitative biology
• Multiple configurations
• Hyperparameters empirically investigated
• Determine sensitivity of results
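
One way to read this slide: instead of trusting a single run, sweep each analysis setting over a grid and record how far the result moves. A minimal Python sketch follows. The function `run_analysis`, its two parameters and the toy score it returns are hypothetical stand-ins and do not come from the lecture.

```python
from itertools import product

def run_analysis(min_taxa, gap_threshold):
    """Hypothetical stand-in for one pipeline run; returns a toy summary statistic."""
    # Toy formula so the sketch is runnable; it is not a real analysis.
    return round(0.1 * min_taxa - gap_threshold, 3)

def sensitivity_sweep(grid):
    """Run every combination of settings in `grid`; return {settings: result}."""
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        settings = dict(zip(names, values))
        results[tuple(sorted(settings.items()))] = run_analysis(**settings)
    return results

# Assumed settings and ranges, chosen only for illustration.
grid = {"min_taxa": [4, 8, 12], "gap_threshold": [0.1, 0.5]}
results = sensitivity_sweep(grid)
spread = max(results.values()) - min(results.values())
print(f"{len(results)} configurations, result range = {spread:.3f}")
```

A wide range across configurations suggests the conclusion depends on the settings chosen; a narrow one suggests it is robust to them.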

Distributions
• Genome-scale data provides context
• Identify outliers
– Genes / taxa / trees
• Compare values across biological systems
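
The idea of using the genome-wide distribution as context can be sketched by ranking each locus against all the others and flagging those in the extreme tails. This is an illustrative Python sketch under stated assumptions: the locus names, scores and the 5% two-tailed cutoff are made up, and the lecture does not prescribe this particular rule.

```python
def tail_fraction(values, x):
    """Smaller of the two empirical tail fractions at x: P(V <= x) and P(V >= x)."""
    n = len(values)
    lower = sum(v <= x for v in values) / n
    upper = sum(v >= x for v in values) / n
    return min(lower, upper)

def flag_outliers(scores, alpha=0.05):
    """Return names whose score falls in either alpha/2 tail of the empirical distribution."""
    values = list(scores.values())
    return [name for name, score in scores.items()
            if tail_fraction(values, score) <= alpha / 2]

# Made-up data: 48 background loci plus two extreme ones.
scores = {f"locus{i:02d}": (i - 24) * 0.01 for i in range(48)}
scores["locusA"] = -1.0
scores["locusB"] = 1.0

outliers = flag_outliers(scores)
print(outliers)
```

The same function applies unchanged whether the items being ranked are genes, taxa or trees, since it only needs one number per item.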

Integration with ‘Omics
• Multiple databases
• Functional data
• Bibliographic information

Tsagkogeorga et al. (in press)

Source material
• Samples
• Storage
• Purification
• Library prep

Sequencing
• Genome
– Sanger
– Illumina
– Pyro / 454
– SOLiD
– PacBio
• Transcriptome / RNA-seq
– MyBAITS
• HiSeq / MiSeq
• IonTorrent

Infrastructure
• Desktop machines
• Computing clusters
• Grid systems
• Cloud-based computation

Assembly, Annotation
• Assembly
– To reference (mapping)
– De novo
• Annotation
– By homology
– De novo
• SOAPdenovo
• MAKER
• Velvet
• Bowtie / Cufflinks / Tophat
• Trinity

Alignment
• PRANK
• MUSCLE
• MAFFT
• Clustal

Phylogeny inference
• MrBayes
• RAxML
• BEAST
• MP-EST
• STAR

Phylogenetic analysis
• BEAST
• HYPHY
• PAML
• Pipelines
• LRT
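
As a reminder of what the likelihood-ratio test (LRT) computes for nested models: the statistic is twice the difference in log-likelihood, compared against a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The Python sketch below uses only the standard library and closed forms that hold for 1 or 2 degrees of freedom. The log-likelihood values are invented for illustration and are not results from the lecture.

```python
import math

def lrt_p_value(lnl_null, lnl_alt, df):
    """Likelihood-ratio test for nested models; closed forms for df = 1 or 2 only."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return stat, 1.0  # the alternative model fits no better than the null
    if df == 1:
        # Chi-square (df = 1) upper tail: P(Z^2 > stat) = erfc(sqrt(stat / 2))
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square (df = 2) upper tail: exp(-stat / 2)
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or 2")
    return stat, p

# Invented log-likelihoods for a null and an alternative model.
stat, p = lrt_p_value(lnl_null=-2345.67, lnl_alt=-2342.10, df=1)
print(f"2*delta(lnL) = {stat:.2f}, p = {p:.4f}")
```

For other degrees of freedom a statistics library would normally supply the chi-square tail probability; that is left out here to keep the sketch dependency-free.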

Parker et al. (2013)
• De novo genomes:
– four taxa
– 2,321 protein-coding loci
– 801,301 codons
• Published:
– 18 genomes
• ~69,000 simulated datasets
• ~3,500 cluster cores

Our pipeline for detecting genome-wide convergence

[Figure: three panels, with means of 0.05, -0.01 and -0.08]


Development cycle
• Design
• Wireframe & specify tests
• Implement
• Review, refine & refactor
Wireframe classes and methods:
• Alignment
– loadSequences()
– getSubstitutions()
• Phylogeny
– trimTaxa()
– getMRCA()
• DataSeries
– calculateECDF()
– randomise()
• Regression
– getResiduals()
– predictInterval()
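
To make the "wireframe and specify tests, then implement" step concrete, here is a hypothetical Python rendering of the Alignment class from the slide. The method names mirror `loadSequences()` and `getSubstitutions()`, but the bodies, the FASTA input format, the gap handling and the taxon names are all assumptions for illustration; this is not the pipeline's actual code.

```python
class Alignment:
    """Hypothetical sketch of the slide's Alignment class (not the original code)."""

    def __init__(self):
        self.sequences = {}

    def load_sequences(self, fasta_text):
        """Parse FASTA-formatted text into {name: sequence}; rows must be equal length."""
        name = None
        for line in fasta_text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                self.sequences[name] = ""
            elif name is not None:
                self.sequences[name] += line.upper()
        if len({len(seq) for seq in self.sequences.values()}) > 1:
            raise ValueError("sequences are not aligned (unequal lengths)")

    def get_substitutions(self, taxon_a, taxon_b):
        """Return (site, state_a, state_b) for sites that differ, skipping gapped sites."""
        a, b = self.sequences[taxon_a], self.sequences[taxon_b]
        return [(i, x, y) for i, (x, y) in enumerate(zip(a, b))
                if x != y and "-" not in (x, y)]

# The test is specified up front and run against the implementation above.
def test_get_substitutions():
    aln = Alignment()
    aln.load_sequences(">bat\nACGT-A\n>dolphin\nACCTGA\n")
    assert aln.get_substitutions("bat", "dolphin") == [(2, "G", "C")]

test_get_substitutions()
```

Writing the test against the wireframe first pins down decisions such as 0-based site numbering and whether gaps count as substitutions before any analysis depends on them.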

Models of computation
• Cloud resources: unlimited flexibility, finite time
• Development trade-off
– Off-the-shelf
– Bespoke
• Exploratory work
– Real-time genomic transects?
• Essential fundamental data missing from nearly every system:
– Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer

Serialisation
• Process data remotely
• Freeze-dry objects, download to desktop
• Implement new methods directly on previously-analysed data
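
The "freeze-dry" idea can be sketched with Python's standard `pickle` module: results computed remotely are written to a file, transferred, and reloaded later so that a new summary can be computed without rerunning the analysis. The result object, its field names and the file name below are hypothetical, and the lecture does not say which serialisation mechanism the lab's pipeline uses.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical result object, standing in for output computed on a remote cluster.
results = {"locus": "gene0001", "site_lnl": [-3.2, -1.7, -2.9], "n_taxa": 22}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.pkl"

    # "Freeze-dry": serialise the object to a file on the remote side.
    with path.open("wb") as handle:
        pickle.dump(results, handle)

    # ...the file would be downloaded to the desktop at this point...

    # Rehydrate the object locally.
    with path.open("rb") as handle:
        restored = pickle.load(handle)

# A new method applied to the previously analysed data, with no rerun needed.
mean_lnl = sum(restored["site_lnl"]) / len(restored["site_lnl"])
print(f"{restored['locus']}: mean site lnL = {mean_lnl:.2f}")
```

Pickle files should only be loaded when they come from a trusted source, since unpickling can execute arbitrary code.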

7. Over the horizon
• Real-time phylogenetics
• Field phylogenetics
• Alignment-free analyses

Conclusions
• Why phylogenomics?
• Practice
• Comparative approach
• Statistical context

Thanks
Steve Rossiter1, James Cotton2, Elia Stupka3 & Georgia Tsagkogeorga1
1School of Biological and Chemical Sciences, Queen Mary, University of London
2Wellcome Trust Sanger Institute
3Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan
Chris Walker & Dan Traynor
Queen Mary GridPP High-throughput Cluster
Chaz Mein & Anna Terry
Barts and The London Genome Centre
Mahesh Pancholi
School of Biological and Chemical Sciences
BBSRC (UK); Queen Mary, University of London

Resources
• My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk
• Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231 doi:10.1038/nature12511
• Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press.
• Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331 doi:10.1038/nature12130
• Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) spleen transcriptome for adaptive evolution and biased gene conversion in passerine birds. MBE 30(5):1046-50 doi:10.1093/molbev/mst033
• Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482 doi:10.1038/nature10530
• Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340 doi:10.1016/j.tree.2009.01.009
• The Tree Of Life: http://phylogenomics.blogspot.co.uk/
• RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html
• Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/
• OpenHelix: http://blog.openhelix.eu/
• Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)
