High-throughput comparative genomics
24th October 2013
Joe Parker,
Queen Mary University of London
Topics 
1. Introduction 
2. Background: why phylogenomics?
3. Examples 
4. Practice 
5. Case study 
6. On the horizon 
7. Over the horizon
Aims 
• Context of phylogenomics: Next-generation 
sequencing (NGS) 
• Why phylogenomics?
• Practical analyses 
• Future developments
1. Our Research
Lab Interests 
• Ecology and evolution of traits 
• Echolocation, sociality 
• NGS data for population genetics and phylogenomics
Activities 
• Phylogeny estimation/comparison 
• Molecular correlates of evolution:
– site substitutions, dN/dS, composition 
• Simulation 
• Dataset limitations 
(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
2. Background
Next-generation sequencing
Why phylogenomics, not phylogenetics?
• Causes of discordant signal 
– Incomplete lineage sorting 
– Lateral transfer 
– Recombination 
– Introgression
Quantitative biology 
• Multiple configurations 
• Hyperparameters 
empirically investigated 
• Determine sensitivity of 
results
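One way to picture this kind of sensitivity check is to rerun the same analysis under every combination of settings and look at how much the result moves. The sketch below is only an illustration: `run_analysis`, its `window` and `threshold` parameters, and the input values are all hypothetical stand-ins, not part of any pipeline described in these slides.

```python
from itertools import product
from statistics import mean, pstdev


def run_analysis(data, window, threshold):
    """Hypothetical stand-in for a real analysis step: the fraction of
    sliding-window means that exceed `threshold`."""
    windows = [data[i:i + window] for i in range(len(data) - window + 1)]
    hits = [mean(w) > threshold for w in windows]
    return sum(hits) / len(hits)


def sensitivity_grid(data, windows, thresholds):
    """Run the analysis once for every (window, threshold) combination."""
    return {
        (w, t): run_analysis(data, w, t)
        for w, t in product(windows, thresholds)
    }


# Made-up per-site values, purely for illustration.
data = [0.1, 0.4, 0.35, 0.8, 0.05, 0.6, 0.55, 0.2]
results = sensitivity_grid(data, windows=[2, 3], thresholds=[0.3, 0.5])

# Spread of the result across configurations: a small spread suggests the
# conclusion does not hinge on one particular choice of settings.
spread = pstdev(list(results.values()))

for config, value in sorted(results.items()):
    print(config, round(value, 3))
print("spread across configurations:", round(spread, 3))
```

In practice each configuration would be an independent job, which is what makes cluster or grid resources a natural fit for this kind of exploration.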
Distributions 
• Genome-scale data 
provides context 
• Identify outliers 
Genes / taxa / trees
• Compare values across 
biological systems
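As a rough illustration of using a genome-wide distribution as context, the sketch below flags loci whose score falls in either extreme tail of the empirical distribution of all scores. The locus names, the scores and the `alpha` cut-off are invented for the example; a real analysis would use thousands of loci and a cut-off chosen to suit the question.

```python
def tail_fractions(values, x):
    """Return (fraction of values <= x, fraction of values >= x)."""
    n = len(values)
    lower = sum(v <= x for v in values) / n
    upper = sum(v >= x for v in values) / n
    return lower, upper


def flag_outliers(scores, alpha=0.15):
    """Names of items whose score lies in the lower or upper `alpha` tail
    of the empirical distribution formed by all the scores."""
    values = list(scores.values())
    flagged = []
    for name, score in scores.items():
        lower, upper = tail_fractions(values, score)
        if min(lower, upper) <= alpha:
            flagged.append(name)
    return flagged


# Invented per-locus statistics (ten loci is far too few for real work).
scores = {
    "locus_01": -1.00, "locus_02": 0.02, "locus_03": 0.05, "locus_04": 0.07,
    "locus_05": 0.10, "locus_06": 0.11, "locus_07": 0.13, "locus_08": 0.16,
    "locus_09": 0.20, "locus_10": 2.50,
}
outliers = flag_outliers(scores)
print(outliers)
```

The same function applies unchanged whether the items are genes, taxa or trees, since it only needs one number per item.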
Integration with ‘Omics 
• Multiple databases 
• Functional data 
• Bibliographic information
3. Example studies
Tsagkogeorga et al. (in press)
Salichos & Rokas (2013)
Backström et al. (2013)
Lindblad-Toh et al. (2011)
4. Practice
Source material 
• Samples 
• Storage 
• Purification 
• Library prep
Sequencing 
• Genome 
– Sanger 
– Illumina 
– Pyro /454 
– SOLiD 
– PacBio 
• Transcriptome / RNA-seq 
– MyBAITS 
• HiSeq / MiSeq 
• IonTorrent
Infrastructure 
• Desktop machines 
• Computing clusters 
• Grid systems 
• Cloud-based computation
Assembly, Annotation 
• Assembly 
– To reference 
(mapping) 
– De novo 
• Annotation 
– By homology 
– De novo 
• SOAPdenovo
• MAKER
• Velvet
• Bowtie / Cufflinks / TopHat
• Trinity
Alignment 
• PRANK 
• MUSCLE 
• MAFFT 
• Clustal
Phylogeny inference 
• MrBayes 
• RAxML 
• BEAST 
• MP-EST 
• STAR
Phylogenetic analysis 
• BEAST 
• HyPhy
• PAML 
• Pipelines 
• LRT
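The likelihood ratio test (LRT) listed above compares two nested models through their maximised log-likelihoods. A minimal sketch follows, assuming the log-likelihoods have already been estimated by some phylogenetic program; the numbers are made up, and the chi-square tail is implemented only for 1 or 2 degrees of freedom so that the example needs nothing beyond the standard library.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Closed forms are used, so only df = 1 and df = 2 are supported here.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Made-up log-likelihoods for a null model and a model with one extra parameter.
lnl_null, lnl_alt = -1234.5, -1230.0
stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf(stat, df=1)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4f}")
```

The chi-square approximation assumes nested models and regular conditions; when a parameter sits on a boundary the reference distribution can differ, which is one reason simulated datasets are useful as a check.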
5. Case study
Parker et al. (2013)
• De novo genomes: 
– four taxa 
– 2,321 protein-coding loci 
– 801,301 codons 
• Published: 
– 18 genomes 
• ~69,000 simulated datasets 
• ~3,500 cluster cores
Our pipeline for detecting genome-wide convergence
Figure panel: mean = 0.05
Figure panels: mean = 0.05; mean = -0.01; mean = -0.08

Development cycle
• Design
• Wireframe & specify tests
• Implement
– Alignment: loadSequences(), getSubstitutions()
– Phylogeny: trimTaxa(), getMRCA()
– DataSeries: calculateECDF(), randomise()
– Regression: getResiduals(), predictInterval()
• Review, refine & refactor
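To make the "specify tests, then implement" step concrete, here is a small sketch of what a DataSeries component with an ECDF and a randomisation method might look like. Only the method names are borrowed from the slide (adapted to Python naming); the behaviour shown is an assumption for illustration and not the authors' implementation.

```python
import bisect
import random


class DataSeries:
    """Illustrative container for a series of numeric observations."""

    def __init__(self, values):
        self.values = list(values)
        self._sorted = sorted(self.values)

    def calculate_ecdf(self, x):
        """Empirical CDF: fraction of observations less than or equal to x."""
        return bisect.bisect_right(self._sorted, x) / len(self._sorted)

    def randomise(self, rng):
        """Return a shuffled copy of the observations (a permutation)."""
        shuffled = self.values[:]
        rng.shuffle(shuffled)
        return shuffled


# Tests written against the specification, before trusting the code on real data.
series = DataSeries([3, 1, 2, 2])
assert series.calculate_ecdf(0) == 0.0
assert series.calculate_ecdf(2) == 0.75
assert series.calculate_ecdf(3) == 1.0
assert sorted(series.randomise(random.Random(1))) == [1, 2, 2, 3]
print("all DataSeries checks passed")
```

Passing a seeded `random.Random` into `randomise` keeps the permutation reproducible, which makes randomisation-based results easier to re-check later.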
Parker et al. (2013)
Parker et al. (2013)
6. On the horizon
Environmental metagenomics
Models of computation 
• Cloud resources: Unlimited 
flexibility, finite time 
• Development trade-off 
– Off-the-shelf 
– Bespoke 
• Exploratory work 
– Real-time genomic transects?
• Essential fundamental data missing
from nearly every system:
– Diversity; structure; substitution rates;
dN/dS; recombination; dispersal; lateral
transfer
Serialisation 
• Process data remotely 
• Freeze-dry objects, download to 
desktop 
• Implement new methods directly 
on previously-analysed data
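The "freeze-dry" idea can be sketched with Python's built-in `pickle` module: results computed remotely are written to a compressed file, copied to a desktop, and reloaded later so that new methods can run on them without repeating the original analysis. The result structure, the file name and the `freeze` / `thaw` helper names are all hypothetical.

```python
import gzip
import os
import pickle
import tempfile


def freeze(obj, path):
    """Serialise an object to a gzip-compressed pickle file."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Load an object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Made-up results of an expensive remote analysis.
results = {"locus_01": {"site_scores": [0.1, -0.3, 0.7], "lnL": -1234.5}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.pkl.gz")
    freeze(results, path)      # on the cluster
    restored = thaw(path)      # later, on the desktop

# A new method applied directly to the previously analysed data.
mean_score = sum(restored["locus_01"]["site_scores"]) / 3
print(restored == results, round(mean_score, 3))
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code; for sharing results more widely, a plain format such as JSON or CSV may be a safer choice.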
7. Over the horizon 
• Real-time phylogenetics 
• Field phylogenetics 
• Alignment-free analyses
Conclusions 
• Why phylogenomics? 
• Practice 
• Comparative approach 
• Statistical context
Thanks 
Steve Rossiter¹, James Cotton², Elia Stupka³ & Georgia Tsagkogeorga¹
¹ School of Biological and Chemical Sciences, Queen Mary, University of London
² Wellcome Trust Sanger Institute
³ Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan
Chris Walker & Dan Traynor
Queen Mary GridPP High-throughput Cluster
Chaz Mein & Anna Terry
Barts and The London Genome Centre
Mahesh Pancholi
School of Biological and Chemical Sciences
BBSRC (UK); Queen Mary, University of London
Resources
• My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk
• Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511
• Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press.
• Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130
• Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-1050. doi:10.1093/molbev/mst033
• Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482. doi:10.1038/nature10530
• Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009
• The Tree Of Life: http://phylogenomics.blogspot.co.uk/ 
• RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html 
• Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/ 
• OpenHelix: http://blog.openhelix.eu/ 
• Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)

Phylogenomic methods for comparative evolutionary biology - University College Dublin MSc - Joe Parker - 24th October 2014

  • 1. High-throughput comparative genomics 24th October 2013 Joe Parker, Queen Mary University London
  • 2. Topics 1. Introduction 2. Background: why phylog e nomics? 3. Examples 4. Practice 5. Case study 6. On the horizon 7. Over the horizon
  • 3. Aims • Context of phylogenomics: Next-generation sequencing (NGS) • Why phylog e nomics? • Practical analyses • Future developments
  • 5. Lab Interests • Ecology and evolution of traits • Echolocation, sociality • NGS data for population genetics and phylogenomics
  • 6. Activities • Phylogeny estimation/comparison • Molecular correlates of evolution; – site substitutions, dN/dS, composition • Simulation • Dataset limitations (R-L): Joe Parker; GeorgiaTsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
  • 9. Why phylog e nomics, not -genetics? • Causes of discordant signal – Incomplete lineage sorting – Lateral transfer – Recombination – Introgression
  • 10. Quantitative biology • Multiple configurations • Hyperparameters empirically investigated • Determine sensitivity of results
  • 11. Distributions • Genome-scale data provides context • Identify outliers Ge ne s / taxa / tre e s • Compare values across biological systems
  • 12. Integration with ‘Omics • Multiple databases • Functional data • Bibliographic information
  • 14. Tsagkogeorga et al. (in press)
  • 16. Backström et al. (2013)
  • 17. Lindblad-Toh et al. (2011)
  • 19. Source material • Samples • Storage • Purification • Library prep
  • 20. Sequencing • Genome – Sanger – Illumina – Pyro /454 – SOLiD – PacBio • Transcriptome / RNA-seq – MyBAITS • HiSeq / MiSeq • IonTorrent
  • 21. Infrastructure • Desktop machines • Computing clusters • Grid systems • Cloud-based computation
  • 22. Assembly, Annotation • Assembly – To reference (mapping) – De novo • Annotation – By homology – De novo •SOAPdenovo •MAKER •Velvet •Bowtie / Cufflinks / Tophat •Trinity
  • 23. Alignment • PRANK • MUSCLE • MAFFT • Clustal
  • 24. Phylogeny inference • MrBayes • RAxML • BEAST • MP-EST • STAR
  • 25. Phylogenetic analysis • BEAST • HYPHY • PAML • Pipelines • LRT
  • 27. Parker et al. (2013) • De novo genomes: – four taxa – 2,321 protein-coding loci – 801,301 codons • Published: – 18 genomes • ~69,000 simulated datasets • ~3,500 cluster cores
  • 28. Our pipeline for detecting genome-wide convergence
  • 37. mean = 0.05 mean = -0.01 mean = -0.08 
  • 38. Development cycle Design Wireframe & specify tests Implement Alignment loadSequences() getSubstitutions() Phylogeny trimTaxa() getMRCA() DataSeries calculateECDF() randomise() Regression getResiduals() predictInterval() Review, refine & refactor
  • 39. Parker et al. (2013)
  • 40. Parker et al. (2013)
  • 41. 6. On the horizon
  • 43. Models of computation • Cloud resources: Unlimited flexibility, finite time • Development trade-off – Off-the-shelf – Bespoke • Exploratory work – Real time genomic transects? • Essential fundamental data missing from nearly every system; – Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer
  • 44. Serialisation • Process data remotely • Freeze-dry objects, download to desktop • Implement new methods directly on previously-analysed data
  • 45. 7. Over the horizon • Real-time phylogenetics • Field phylogenetics • Alignment-free analyses
  • 46. Conclusions • Why phylogenomics? • Practice • Comparative approach • Statistical context
  • 47. Thanks Steve Rossiter (1), James Cotton (2), Elia Stupka (3) & Georgia Tsagkogeorga (1). (1) School of Biological and Chemical Sciences, Queen Mary, University of London; (2) Wellcome Trust Sanger Institute; (3) Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan. Chris Walker & Dan Traynor (Queen Mary GridPP High-throughput Cluster); Chaz Mein & Anna Terry (Barts and The London Genome Centre); Mahesh Pancholi (School of Biological and Chemical Sciences). BBSRC (UK); Queen Mary, University of London
  • 48. Resources • My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk • Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511 • Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press. • Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130 • Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033 • Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482. doi:10.1038/nature10530 • Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009 • The Tree Of Life: http://phylogenomics.blogspot.co.uk/ • RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html • Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/ • OpenHelix: http://blog.openhelix.eu/ • Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)
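
Slide 38 names a DataSeries class with calculateECDF() and randomise() methods but shows no code. As a rough illustration only, the sketch below shows what such a class might look like in Python. The class body, the snake_case method names, the constructor and the seed parameter are all assumptions made for this example; this is not the pipeline's actual implementation.

```python
import bisect
import random


class DataSeries:
    """Hypothetical stand-in for the DataSeries class named on slide 38."""

    def __init__(self, values):
        # Keep the observations sorted so ECDF lookups are a binary search.
        self.values = sorted(float(v) for v in values)

    def calculate_ecdf(self, x):
        """Return the fraction of observations less than or equal to x."""
        if not self.values:
            raise ValueError("cannot compute an ECDF for an empty series")
        return bisect.bisect_right(self.values, x) / len(self.values)

    def randomise(self, seed=None):
        """Return a shuffled copy of the observations (the series is unchanged)."""
        rng = random.Random(seed)
        shuffled = list(self.values)
        rng.shuffle(shuffled)
        return shuffled


# Made-up values, purely for illustration.
series = DataSeries([0.9, 0.4, 0.1, 0.4])
print(series.calculate_ecdf(0.4))  # 0.75: three of the four values are <= 0.4
```

An empirical CDF of this kind is one way to place a single locus within the genome-wide distribution, in the spirit of the "identify outliers" point on slide 11.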

Editor's Notes

  1. Quick through this
  2. Moore’s law, sequencing data, etc. Order-of-magnitude improvements in sequencing throughput and accuracy, and in computational power.
  3. A) Concatenated, RAxML; B) per-locus support counts. RAxML concatenated and coalescent analyses gave H1 overall.
  4. Almost as many discrete gene trees as genes
  5. Backström: approach as a measuring exercise
  6. Surveying
  7. Technologies and tools, mature
  8. Technologies and tools
  9. SOAPdenovo-Trans: a de novo transcriptome assembler inherited from the SOAPdenovo2 framework, designed for assembling transcriptomes with alternative splicing and differing expression levels. It provides a more comprehensive way to construct full-length transcript sets compared to SOAPdenovo2.
     Velvet/Oases: the Velvet algorithm uses de Bruijn graphs to assemble transcripts. In simulations, Velvet can produce contigs up to 50-kb N50 length using prokaryotic data and 3-kb N50 in mammalian bacterial artificial chromosomes (BACs).[15] These preliminary transcripts are transferred to Oases, which uses paired-end read and long-read information to build transcript isoforms.[16]
     Trans-ABySS: ABySS (Assembly By Short Sequences) is a parallel, paired-end sequence assembler. Trans-ABySS is a software pipeline written in Python and Perl for analyzing ABySS-assembled transcriptome contigs. It can be applied to assemblies generated across a wide range of k values. It first reduces the dataset into smaller sets of non-redundant contigs, then identifies splicing events including exon-skipping, novel exons, retained introns, novel introns, and alternative splicing. The Trans-ABySS algorithms can also estimate gene expression levels and identify potential polyadenylation sites and candidate gene-fusion events.[17]
     Trinity: Trinity[18] first divides the sequence data into a number of de Bruijn graphs, each representing transcriptional variation at a single gene or locus. It then processes each graph separately, extracting full-length splicing isoforms and distinguishing transcripts derived from paralogous genes. Trinity consists of three independent software modules, used sequentially to produce transcripts. Inchworm assembles the RNA-Seq data into transcript sequences, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts. Chrysalis clusters the Inchworm contigs and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or a family or set of genes that share a conserved sequence). Chrysalis then partitions the full read set among these separate graphs. Butterfly then processes the individual graphs in parallel, tracing the paths of reads within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms and teasing apart transcripts that correspond to paralogous genes.[19]
     Cufflinks: Cufflinks[20] is a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide. Cufflinks constructs a parsimonious set of transcripts that "explain" the reads observed in an RNA-Seq experiment. It does so by reducing the comparative assembly problem to a problem in maximum matching in bipartite graphs. In essence, Cufflinks implements a constructive proof of Dilworth's theorem by constructing a covering relation on the read alignments and finding a minimum path cover on the directed acyclic graph for the relation.
  10. Technologies and tools
  11. Technologies and tools
  12. Pervasive phylogenetic incongruence. The test targets phylogenetic discordance attributable to genetic convergence, but applied in other contexts it could equally be used to measure discordance that has arisen by other processes, some of which will be more applicable to tropical systems: - Horizontal gene transfer among bacteria - Introgression across species barriers - Incomplete lineage sorting
  13. Runtime: reduced from ~weeks to hours. Object-oriented design: separation of code into modular objects; re-use of methods through inheritance; abstraction of behaviour allows modifications to parts of the API without affecting other tested code; incorporation of other libraries.
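
Note 9 describes several assemblers (Velvet, Trinity) as building de Bruijn graphs from reads. The sketch below is a minimal Python illustration of that one idea: each k-mer in a read becomes an edge between its (k-1)-mer prefix and suffix. The function name, the toy reads and the choice of k are assumptions made for this example. It omits everything a real assembler does (error correction, reverse complements, graph simplification, path extraction) and is not the implementation used by any of the tools named.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read.

    An edge is recorded once per k-mer occurrence, so repeated entries
    reflect how many times that k-mer was observed across the reads.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two made-up overlapping reads, purely for illustration.
graph = build_de_bruijn(["ATGGC", "TGGCA"], k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

With these toy reads the shared k-mers TGG and GGC each appear twice, so their edges are recorded twice; real assemblers use such multiplicities as coverage information.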