ASHG 2015 Genome in a bottle


Published in: Health & Medicine


  1. Genome in a Bottle: You’ve sequenced. How well did you do?
     October 9, 2015
     Justin Zook, Marc Salit, and the Genome in a Bottle Consortium
     *Nothing to Disclose
  2. Sequencing technologies and bioinformatics pipelines disagree
     O’Rawe et al. Genome Medicine 2013, 5:28
  3. Sequencing technologies and bioinformatics pipelines disagree
     O’Rawe et al. Genome Medicine 2013, 5:28
     Who is right? Is anyone right?
  4. Genome in a Bottle Consortium (GIAB)
     Hosted by the US National Institute of Standards and Technology (NIST)
     Goal: Provide infrastructure to assess confidence in human variant calls
     • Appropriately consented, widely available DNA samples, distributed by the Coriell Institute
       – Also, QCed Reference Material (RM) versions from controlled lots will be available from NIST
       – Also, PGP samples are commercially available
     • High-accuracy reference data for these samples
     • Tools to facilitate their use
       – With the Global Alliance Data Working Group Benchmarking Team
     Global Alliance for Genomics and Health: ga4gh.org
     Genome in a Bottle: genomeinabottle.org
  5. GIAB Selected Samples
     [Figure: pedigree diagrams; ✔ marks the selected samples]
     • CEPH/Utah Pedigree 1463: NA12889, NA12879, NA12890, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893, NA12877, NA12878, NA12891, NA12892
       – Available as NIST RM8398
     • Ashkenazi Jewish Trio (new; Personal Genome Project): NA24149, NA24143, NA24385
     • Asian (Han Chinese) Trio (new; Personal Genome Project): NA24694, NA24695, NA24631
     Note: Illumina and RTG have used data from the pedigree to improve variant calls in the specific GIAB samples.
  6. NGS Validation Process using Genomes in Bottles
     [Figure: workflow] Sample → gDNA isolation → Library Prep → Sequencing → Alignment/Mapping → Variant Calling → Confidence Estimates → Downstream Analysis
     Figure labels: Pre-Analytical Process; Analytical Process; Clinical Interpretation; Genome in a Bottle Scope; GIAB Data
  7. Pilot Genome: NA12878
  8. Integrated 14 datasets from 5 platforms to establish Reference SNP/indel Calls for NA12878
     Zook et al., Nature Biotechnology, 2014.
     • ~77% high-confidence
     • ~23% uncertain
  9. Uses of GIAB NA12878
     • Oncology – Molecular and Cellular Tumor Markers: “Next Generation” Sequencing (NGS) guidelines for somatic genetic variant detection
     • www.bioplanet.com/gcat
  10. GeT-RM Browser from NCBI and CDC
     • http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
  11. Global Alliance for Genomics and Health Benchmarking Task Team
     • Developed standardized definitions for performance metrics like TP, FP, and FN
     • Developing sophisticated benchmarking tools
       – vcfeval – Len Trigg
       – hap.py – Peter Krusche
       – vgraph – Kevin Jacobs
     • Standardized bed files with difficult genome contexts for stratification
     [Figure: Stratification of FP Rates; higher FP rates at tandem repeats]
     Credit: GA4GH, Abby Beeler, Ellie Wood
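The slide above names TP, FP, and FN but does not give formulas. As a minimal sketch, assuming the commonly used definitions precision = TP/(TP+FP) and recall = TP/(TP+FN), the counts can be turned into per-stratum metrics as follows. The `Counts` class, the stratum names, and all numbers are hypothetical illustrations, not GA4GH definitions or GIAB results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """True-positive, false-positive and false-negative call counts."""
    tp: int
    fp: int
    fn: int


def precision(c: Counts) -> float:
    """Fraction of query calls that match the benchmark: TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else float("nan")


def recall(c: Counts) -> float:
    """Fraction of benchmark calls that were found: TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else float("nan")


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else float("nan")


# Hypothetical counts per stratum (made-up numbers, for illustration only).
strata = {
    "all_regions": Counts(tp=90, fp=10, fn=30),
    "tandem_repeats": Counts(tp=40, fp=20, fn=40),
}

for name, counts in strata.items():
    print(f"{name}: precision={precision(counts):.3f} "
          f"recall={recall(counts):.3f} f1={f1(counts):.3f}")
```

Reporting the metrics per stratum rather than as one genome-wide number is what makes effects such as a higher FP rate in tandem repeats visible.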
  12. New GIAB Trios from Personal Genome Project
  13. Public, unembargoed data from GIAB AJ PGP Trio
     Long reads/“Linked” reads
     • ~70/30/30x PacBio
       – ~11kb N50
     • ~100x BioNano
     • ~30x 10X Genomics
     • ~20x Moleculo
     • Complete Genomics LFR
     • ~0.005x Oxford Nanopore
     Short reads
     • 300x Illumina paired-end
     • 15x Illumina 6kb mate-pair
     • 100x Complete Genomics
     • 60x SOLiD 5500W
     • 1000x Ion Proton Exome
     http://biorxiv.org/content/early/2015/09/15/026468
  14. GIAB Analysis Group – New Data Sets
     Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and sizes, as well as homozygous reference regions, on GIAB PGP trios
     Leaders
     • Francisco de la Vega
     • Chris Mason
     • Tina Graves
     • Valerie Schneider
     • Justin Zook
     • Marc Salit
     Status
     • Analysis Group Responsibilities:
       – https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
     • Analysis Milestones:
       – https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
     • Analysis Methods:
       – https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
     • Analysis Plan:
       – https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
     • Collecting data and analyses on GIAB FTP site
     • Recruiting people to help with the work
  15. Analysis Progress: AJ Trio
     • SNPs/indels
       – NIST working on integration
       – 10X/Moleculo/PacBio for difficult-to-map regions
     • Assembly
       – 2 de novo assemblies
       – Useful for SV calling
     • Structural variants
       – Candidate calls being generated by 15+ groups with >20 different algorithms and 6 datasets
       – 3+ integration methods
     • Long-range phasing
       – 2 phased calls so far (CG LFR and 10X)
       – Integration methods needed
     • Other analyses
       – CpG methylation with PacBio and Illumina
  16. GIAB AJ Trio PacBio-only Assemblies
     | Input      | Algorithm                    | # of Contigs | N50     | Max    | Total |
     |------------|------------------------------|--------------|---------|--------|-------|
     | Child      | MHAP/Celera (Phillippy Lab)  | 13,048       | 4.5Mb   | 35.1Mb | 3.0Gb |
     | Child      | Daligner/Falcon (Chin/Bashir)| 9,973        | 7.1Mb   | 39.2Mb | 3.0Gb |
     | Mother     | MHAP/Celera (Phillippy Lab)  | 23,493       | 1.03Mb  | 8.9Mb  | 3.0Gb |
     | Father     | MHAP/Celera (Phillippy Lab)  | 16,326       | 0.91Mb  | 9.8Mb  | 3.0Gb |
     | Merged Trio| Daligner/Falcon (Chin/Bashir)| 5,680        | 9.25Mb  | 50.3Mb | 2.9Gb |
     Credits: Ali Bashir, Jason Chin, Adam Phillippy, and Serge Koren
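The N50 column in the assembly table is a length-weighted summary: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation follows, assuming this standard definition; the function name `n50` and the example lengths are illustrative and are not taken from the GIAB assemblies.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one contig length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy example: 20 bases in total, and the two longest contigs (6 + 5 = 11)
# already cover half of them, so the N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined assembly can have a high N50.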
  17. GIAB AJ Trio Hybrid PacBio/BioNano Assembly
     Hybrid (PacBio with BioNano)
     | Input         | Assembly Notes                 | # of Scaffolds | N50    | Max    | Total  |
     |---------------|--------------------------------|----------------|--------|--------|--------|
     | HG002         | Falcon                         | 248            | 22.7Mb | 92.8Mb | 2.38Gb |
     | Trio          | Falcon                         | 210            | 29.3Mb | 87.6Mb | 2.32Gb |
     | Two Step Trio | celera (child) + falcon (trio) | 187            | 34.3Mb | 98.0Mb | 2.6Gb  |
     Credits: Ali Bashir, Jason Chin, Alex Hastie
     Pendleton et al., Nature Methods, 2015
  18. Proposed approach to form high-confidence SV (and non-SV) calls
     • August 30, 2015: Generate candidate calls
     • Nov 1, 2015: Compare/evaluate calls using Parliament/MetaSV/svclassify/others?; manual inspection
     • Jan 1, 2016: Integrate new and revised calls; manual inspection
     • Jan 26, 2016 and beyond: Combine integrated calls; manual inspection; targeted experimental validation?
  19. Very Preliminary Confirmation of SVs
     Integration results from AJ son
     • Parliament: BMC Genomics, 2015, 16:286 (performed by Andrew Carroll, DNAnexus)
       – Candidates from Illumina
       – Confirmed by PacBio and/or Illumina
       – ~50% in both technologies
       – ~4.5k deletions, 1k insertions
       – 85% of genotypes consistent within trio
     • MetaSV: Bioinformatics, 2015, 31:2741 (performed by Marghoob Mohiyuddin, Bina/Roche)
       – Multiple types of evidence from Illumina
     [Figure: comparison of MetaSV and Parliament calls at 50% reciprocal overlap]
     • MetaSV total: 2809, split 569 (20%) / 2240 (80%)
     • Parliament total: 5467, split 977 (18%) / 4490 (82%)
     • Some overlap within Parliament calls
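The comparison above uses a 50% reciprocal overlap criterion. A minimal sketch of that test follows, assuming half-open coordinates (start inclusive, end exclusive) and the usual meaning that the shared span must cover at least the threshold fraction of each call. The `Interval` class and the coordinates are hypothetical; this is not the code used by Parliament or MetaSV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A structural-variant span with half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("interval must have positive length")

    @property
    def length(self) -> int:
        return self.end - self.start


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of (overlap / len(a)) and (overlap / len(b))."""
    if a.chrom != b.chrom:
        return 0.0
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    return min(shared / a.length, shared / b.length)


def same_event(a: Interval, b: Interval, threshold: float = 0.5) -> bool:
    """Treat two calls as the same event if both are covered >= threshold."""
    return reciprocal_overlap(a, b) >= threshold


# Hypothetical deletions: the first pair shares 50 bp of two 100 bp calls.
call_a = Interval("chr1", 100, 200)
call_b = Interval("chr1", 150, 250)
call_c = Interval("chr1", 150, 1000)
print(same_event(call_a, call_b))  # each call is 50% covered by the other
print(same_event(call_a, call_c))  # call_c is mostly outside call_a
```

Requiring the overlap in both directions keeps a very large call from "matching" every small call it happens to contain.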
  20. New GIAB GitHub Site
     github.com/genome-in-a-bottle
     Credit: Chunlin Xiao, NCBI
  21. WARNINGS
     • Easiest to benchmark only within high-confidence bed file
     • Benchmark calls/regions tend to be biased towards easier variants and regions
       – Some clinical tests are enriched for difficult sites
     • Always manually inspect a subset of FPs/FNs
     • Stratification by variant type and region is important
     • Always calculate confidence intervals
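The last warning does not say which interval to use. As one possible sketch, assuming a binomial model for a metric such as recall (TP out of TP + FN), the Wilson score interval can be computed with the standard library alone. The choice of the Wilson method, the function name, and the example counts are assumptions for illustration, not a GIAB recommendation.

```python
import math


def wilson_interval(successes: int, trials: int,
                    z: float = 1.959963984540054) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    The default z corresponds to a two-sided 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical stratum with few benchmark variants: 90 found out of 100.
low, high = wilson_interval(90, 100)
print(f"recall = 0.900, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval widens quickly in small strata, which is why a point estimate from a few dozen difficult sites should not be read as a precise performance figure.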
  22. Acknowledgments
     • FDA – Elizabeth Mansfield, Computing staff
     • Many members of Genome in a Bottle
       – New members welcome!
       – Sign up on website for email newsletters
     Steering Committee
       – Marc Salit
       – Justin Zook
       – David Mittelman
       – Andrew Grupe
       – Michael Eberle
       – Steve Sherry
       – Deanna Church
       – Francisco De La Vega
       – Christian Olsen
       – Monica Basehore
       – Lisa Kalman
       – Christopher Mason
       – Elizabeth Mansfield
       – Liz Kerrigan
       – Leming Shi
       – Melvin Limson
       – Alexander Wait Zaranek
       – Nils Homer
       – Fiona Hyland
       – Steve Lincoln
       – Don Baldwin
       – Robyn Temple-Smolkin
       – Chunlin Xiao
       – Kara Norman
       – Luke Hickey
  23. For More Information
     • www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails
     • github.com/genome-in-a-bottle – Guide to GIAB data & ftp
     • www.slideshare.net/genomeinabottle
     • www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser
     • Data: http://biorxiv.org/content/early/2015/09/15/026468
     • Global Alliance Benchmarking Team – ga4gh.org/#/benchmarking-team
     • Twice yearly workshop (public meetings!)
       – Winter: January 28-29, 2016 at Stanford University, California, USA
       – Summer at NIST, Maryland, USA
     • Justin Zook: jzook@nist.gov
     • Marc Salit: salit@nist.gov
     • Contribute calls or critically evaluate GIAB calls!
     • NIST/NRC Postdoc Opportunities available!
