Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe Parker - 24th October 2013

Invited research seminar given to MSc students at University College Dublin on 24th October 2013.

I introduce the discipline of phylogenomics - comparative phylogenetic analyses of DNA sequences across genomes - and some of the applications and recent breakthroughs in the field.

As an in-depth case study I explain the methods and significance of our 2013 Nature paper on adaptive genotypic molecular convergence in echolocating mammals.

I then highlight some of the avenues of study on the frontiers of current research.

Published in: Science


  1. High-throughput comparative genomics. 24th October 2013. Joe Parker, Queen Mary University London
  2. Topics: 1. Introduction 2. Background: why phylogenomics? 3. Examples 4. Practice 5. Case study 6. On the horizon 7. Over the horizon
  3. Aims: • Context of phylogenomics: Next-generation sequencing (NGS) • Why phylogenomics? • Practical analyses • Future developments
  4. 1. Our Research
  5. Lab Interests: • Ecology and evolution of traits • Echolocation, sociality • NGS data for population genetics and phylogenomics
  6. Activities: • Phylogeny estimation/comparison • Molecular correlates of evolution: site substitutions, dN/dS, composition • Simulation • Dataset limitations. (R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
  7. 2. Background
  8. Next-generation sequencing
  9. Why phylogenomics, not -genetics? • Causes of discordant signal: – Incomplete lineage sorting – Lateral transfer – Recombination – Introgression
  10. Quantitative biology: • Multiple configurations • Hyperparameters empirically investigated • Determine sensitivity of results
  11. Distributions: • Genome-scale data provides context • Identify outliers: genes / taxa / trees • Compare values across biological systems
  12. Integration with ‘Omics: • Multiple databases • Functional data • Bibliographic information
  13. 3. Example studies
  14. Tsagkogeorga et al. (in press)
  15. Salichos & Rokas (2013)
  16. Backström et al. (2013)
  17. Lindblad-Toh et al. (2011)
  18. 4. Practice
  19. Source material: • Samples • Storage • Purification • Library prep
  20. Sequencing: • Genome – Sanger – Illumina – Pyro / 454 – SOLiD – PacBio • Transcriptome / RNA-seq – MyBAITS • HiSeq / MiSeq • IonTorrent
  21. Infrastructure: • Desktop machines • Computing clusters • Grid systems • Cloud-based computation
  22. Assembly, Annotation: • Assembly – To reference (mapping) – De novo • Annotation – By homology – De novo • SOAPdenovo • MAKER • Velvet • Bowtie / Cufflinks / Tophat • Trinity
  23. Alignment: • PRANK • MUSCLE • MAFFT • Clustal
  24. Phylogeny inference: • MrBayes • RAxML • BEAST • MP-EST • STAR
  25. Phylogenetic analysis: • BEAST • HYPHY • PAML • Pipelines • LRT
  26. 5. Case study
  27. Parker et al. (2013): • De novo genomes: – four taxa – 2,321 protein-coding loci – 801,301 codons • Published: – 18 genomes • ~69,000 simulated datasets • ~3,500 cluster cores
  28. Our pipeline for detecting genome-wide convergence
  29. mean = 0.05
  30. mean = 0.05; mean = -0.01; mean = -0.08
  31. Development cycle: • Design • Wireframe & specify tests • Implement – Alignment: loadSequences(), getSubstitutions() – Phylogeny: trimTaxa(), getMRCA() – DataSeries: calculateECDF(), randomise() – Regression: getResiduals(), predictInterval() • Review, refine & refactor
  32. Parker et al. (2013)
  33. Parker et al. (2013)
  34. 6. On the horizon
  35. Environmental metagenomics
  36. Models of computation: • Cloud resources: unlimited flexibility, finite time • Development trade-off – Off-the-shelf – Bespoke • Exploratory work – Real-time genomic transects? • Essential fundamental data missing from nearly every system: diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer
  37. Serialisation: • Process data remotely • Freeze-dry objects, download to desktop • Implement new methods directly on previously-analysed data
  38. 7. Over the horizon: • Real-time phylogenetics • Field phylogenetics • Alignment-free analyses
  39. Conclusions: • Why phylogenomics? • Practice • Comparative approach • Statistical context
  40. Thanks: Steve Rossiter (1), James Cotton (2), Elia Stupka (3) & Georgia Tsagkogeorga (1). (1) School of Biological and Chemical Sciences, Queen Mary, University of London; (2) Wellcome Trust Sanger Institute; (3) Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan. Chris Walker & Dan Traynor (Queen Mary GridPP High-throughput Cluster); Chaz Mein & Anna Terry (Barts and The London Genome Centre); Mahesh Pancholi (School of Biological and Chemical Sciences). BBSRC (UK); Queen Mary, University of London
  41. Resources: • My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk • Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511 • Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press. • Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130 • Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) spleen transcriptome for adaptive evolution and biased gene conversion in passerine birds. MBE 30(5):1046-1050. doi:10.1093/molbev/mst033 • Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482. doi:10.1038/nature10530 • Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009 • The Tree Of Life: http://phylogenomics.blogspot.co.uk/ • RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html • Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/ • OpenHelix: http://blog.openhelix.eu/ • Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)
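
The "Distributions" and "Development cycle" slides name two DataSeries operations, calculateECDF() and randomise(), without showing them. The Python below is a minimal sketch of what such operations might compute, not the pipeline's actual code: the function names are adapted from the slide, while the signatures, the `upper_tail_fraction` helper and the example scores are assumptions made purely for illustration.

```python
import random
from bisect import bisect_right


def calculate_ecdf(values):
    """Return F, where F(x) is the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return ecdf


def randomise(values, rng):
    """Return a shuffled copy of values (one permutation replicate)."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def upper_tail_fraction(values, observed):
    """Hypothetical helper: fraction of values strictly greater than observed."""
    return 1.0 - calculate_ecdf(values)(observed)


if __name__ == "__main__":
    # Made-up per-locus scores, for illustration only.
    scores = [-0.08, -0.01, 0.02, 0.05, 0.05, 0.31]
    ecdf = calculate_ecdf(scores)
    print(ecdf(0.05))                      # fraction of loci scoring <= 0.05
    print(upper_tail_fraction(scores, 0.05))
    print(randomise(scores, random.Random(1)))
```

Placing a single locus against the genome-wide empirical distribution is one way to "identify outliers" in the sense of the Distributions slide; shuffling supplies a null replicate to compare against.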
