Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe Parker - 24th October 2013

Invited research seminar given to MSc students at University College Dublin on 24th October 2013.

I introduce the discipline of phylogenomics - comparative phylogenetic analyses of DNA sequences across genomes - and some of the applications and recent breakthroughs in the field.

As an in-depth case study I explain the methods and significance of our 2013 Nature paper on adaptive genotypic molecular convergence in echolocating mammals.

I then highlight some of the avenues of study on the frontiers of current research.

Published in: Science

  1. High-throughput comparative genomics. 24th October 2013. Joe Parker, Queen Mary University of London
  2. Topics 1. Introduction 2. Background: why phylogenomics? 3. Examples 4. Practice 5. Case study 6. On the horizon 7. Over the horizon
  3. Aims • Context of phylogenomics: Next-generation sequencing (NGS) • Why phylogenomics? • Practical analyses • Future developments
  4. 1. Our Research
  5. Lab Interests • Ecology and evolution of traits • Echolocation, sociality • NGS data for population genetics and phylogenomics
  6. Activities • Phylogeny estimation/comparison • Molecular correlates of evolution: – site substitutions, dN/dS, composition • Simulation • Dataset limitations. Photo (R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
  7. 2. Background
  8. Next-generation sequencing
  9. Why phylogenomics, not -genetics? • Causes of discordant signal – Incomplete lineage sorting – Lateral transfer – Recombination – Introgression
  10. Quantitative biology • Multiple configurations • Hyperparameters empirically investigated • Determine sensitivity of results
  11. Distributions • Genome-scale data provides context • Identify outliers (genes / taxa / trees) • Compare values across biological systems
  12. Integration with ‘Omics • Multiple databases • Functional data • Bibliographic information
  13. 3. Example studies
  14. Tsagkogeorga et al. (in press)
  15. Salichos & Rokas (2013)
  16. Backström et al. (2013)
  17. Lindblad-Toh et al. (2011)
  18. 4. Practice
  19. Source material • Samples • Storage • Purification • Library prep
  20. Sequencing • Genome – Sanger – Illumina – Pyro / 454 – SOLiD – PacBio • Transcriptome / RNA-seq – MyBAITS • HiSeq / MiSeq • IonTorrent
  21. Infrastructure • Desktop machines • Computing clusters • Grid systems • Cloud-based computation
  22. Assembly, Annotation • Assembly – To reference (mapping) – De novo • Annotation – By homology – De novo • SOAPdenovo • MAKER • Velvet • Bowtie / Cufflinks / Tophat • Trinity
  23. Alignment • PRANK • MUSCLE • MAFFT • Clustal
  24. Phylogeny inference • MrBayes • RAxML • BEAST • MP-EST • STAR
  25. Phylogenetic analysis • BEAST • HYPHY • PAML • Pipelines • LRT
  26. 5. Case study
  27. Parker et al. (2013) • De novo genomes: – four taxa – 2,321 protein-coding loci – 801,301 codons • Published: – 18 genomes • ~69,000 simulated datasets • ~3,500 cluster cores
  28. Our pipeline for detecting genome-wide convergence
  29. [Figure] mean = 0.05
  30. [Figure] mean = 0.05; mean = -0.01; mean = -0.08
  31. Development cycle • Design • Wireframe & specify tests • Implement – Alignment: loadSequences(), getSubstitutions() – Phylogeny: trimTaxa(), getMRCA() – DataSeries: calculateECDF(), randomise() – Regression: getResiduals(), predictInterval() • Review, refine & refactor
  32. Parker et al. (2013)
  33. Parker et al. (2013)
  34. 6. On the horizon
  35. Environmental metagenomics
  36. Models of computation • Cloud resources: Unlimited flexibility, finite time • Development trade-off – Off-the-shelf – Bespoke • Exploratory work – Real time genomic transects? • Essential fundamental data missing from nearly every system: – Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer
  37. Serialisation • Process data remotely • Freeze-dry objects, download to desktop • Implement new methods directly on previously-analysed data
  38. 7. Over the horizon • Real-time phylogenetics • Field phylogenetics • Alignment-free analyses
  39. Conclusions • Why phylogenomics? • Practice • Comparative approach • Statistical context
  40. Thanks • Steve Rossiter¹, James Cotton², Elia Stupka³ & Georgia Tsagkogeorga¹ (¹School of Biological and Chemical Sciences, Queen Mary, University of London; ²Wellcome Trust Sanger Institute; ³Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan) • Chris Walker & Dan Traynor (Queen Mary GridPP High-throughput Cluster) • Chaz Mein & Anna Terry (Barts and The London Genome Centre) • Mahesh Pancholi (School of Biological and Chemical Sciences) • BBSRC (UK); Queen Mary, University of London
  41. Resources • My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk • Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511 • Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press. • Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130 • Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033 • Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482. doi:10.1038/nature10530 • Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009 • The Tree Of Life: http://phylogenomics.blogspot.co.uk/ • RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html • Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/ • OpenHelix: http://blog.openhelix.eu/ • Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)
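
The "Distributions" slide (identifying outliers against a genome-wide distribution) and the DataSeries component on the "Development cycle" slide (calculateECDF(), randomise()) can be illustrated with a minimal Python sketch. This is not the pipeline from the talk or the paper: the class body, the choice of a bootstrap resample for randomise(), the 0.9 cut-off and the per-locus values are all assumptions made up for illustration. Only the method names echo the slide.

```python
import bisect
import random


class DataSeries:
    """Hypothetical stand-in for the DataSeries component named on the slide."""

    def __init__(self, values):
        # Keep observations sorted so the ECDF can be read off by binary search.
        self.values = sorted(values)

    def calculate_ecdf(self, x):
        """Return the fraction of observations less than or equal to x."""
        return bisect.bisect_right(self.values, x) / len(self.values)

    def randomise(self, rng):
        """Return a bootstrap resample (with replacement) as a new DataSeries.

        Assumption: the slide does not say what randomise() does; a bootstrap
        is used here purely as an example of a randomisation step.
        """
        n = len(self.values)
        return DataSeries([rng.choice(self.values) for _ in range(n)])


# Made-up per-locus statistics; not data from the talk or the paper.
locus_scores = {
    "locusA": -0.08,
    "locusB": -0.01,
    "locusC": 0.00,
    "locusD": 0.02,
    "locusE": 0.05,
    "locusF": 0.40,
}

series = DataSeries(locus_scores.values())

# Flag loci that sit in the upper tail of the empirical distribution.
# The 0.9 threshold is an arbitrary illustrative choice.
outliers = [name for name, score in locus_scores.items()
            if series.calculate_ecdf(score) > 0.9]
print(outliers)  # ['locusF']

# A seeded resample, e.g. as one replicate when assessing how stable a
# summary of the distribution is.
resample = series.randomise(random.Random(1))
```

The point of the sketch is the comparative framing from the slides: a single locus value means little alone, but its position in the empirical distribution across all loci gives it statistical context.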
