GIAB-GRC workshop oct2015 giab introduction 151005


Published in: Health & Medicine

  1. Genome in a Bottle Consortium GIAB/GRC Pre-ASHG Workshop, October 5, 2015. Reference Materials for Clinical Applications of Human Genome Sequencing. Justin Zook and Marc Salit, National Institute of Standards and Technology
  2. Sequencing technologies and bioinformatics pipelines disagree. O’Rawe et al., Genome Medicine 2013, 5:28
  3. Sequencing technologies and bioinformatics pipelines disagree. O’Rawe et al., Genome Medicine 2013, 5:28. Who is right? Is anyone right?
  4. GIAB Scope • The Genome in a Bottle Consortium is developing the reference materials, reference methods, and reference data needed to assess confidence in human genome variant calls. • A principal motivation for this consortium is to enable performance assessment of sequencing and science-based regulatory oversight of clinical sequencing.
  5. Well-characterized, stable RMs • Obtain metrics for validation, QC, QA, PT • Determine sources and types of bias/error • Learn to resolve difficult structural variants • Improve reference genome assembly • Optimization – integration of data from multiple platforms – sequencing and analysis • Enable regulated applications [Figure: Comparison of SNP calls for NA12878 on 2 platforms, 3 analysis methods]
  6. NGS Validation Process using Genomes in Bottles [Figure: workflow diagram] Steps: Sample • gDNA isolation • Library Prep • Sequencing • Alignment/Mapping • Variant Calling • Confidence Estimates • Downstream Analysis. Diagram labels: Pre-Analytical Process, Analytical Process, Genome in a Bottle Scope, Clinical Interpretation, GIAB Data
  7. Genome in a Bottle Consortium (GIAB), hosted by the US National Institute of Standards and Technology. Goal: Provide infrastructure to assess confidence in human variant calls • Appropriately consented, widely available DNA samples, distributed by the Coriell Institute – Also, QCed Reference Material (RM) versions from controlled lots will be available from NIST – Also, PGP samples are commercially available • High-accuracy reference data for these samples • Tools to facilitate their use – With the Global Alliance Data Working Group Benchmarking Team
  8. GIAB Selected Samples [Figure: pedigree diagrams, with checkmarks marking selected samples] • CEPH/Utah Pedigree 1463: NA12889, NA12879, NA12890, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893, NA12877, NA12878, NA12891, NA12892 • Ashkenazi Jewish Trio: NA24149, NA24143, NA24385 • Asian (Han Chinese) Trio: NA24694, NA24695, NA24631 • Figure labels: New (shown twice), Personal Genome Project, Available as NIST RM8398 • Note: Illumina and RTG have used data from the pedigree to improve variant calls in the specific GIAB samples.
  9. NIST Human Genome Reference Materials (RMs) • NIST RM 8398 is available! – DNA isolated from large growth cell cultures – Stable, homogeneous – Best for regulated uses – DNA from same cell line at Coriell (NA12878) • New AJ and Asian Samples – Available from Coriell now – NIST RM available in 2016
  10. Integrated 14 datasets from 5 platforms to establish Reference SNP/indel Calls for NA12878. Zook et al., Nature Biotechnology, 2014.
  11. Integration Methods to Establish Reference Variant Calls for NA12878 • Candidate Variants from Each Platform • Identify Concordant Variants • Identify Characteristics of Systematic Error • Arbitrate Using Evidence of Systematic Error • Exclude regions potentially biased for all short reads (e.g., repeats, SVs). Zook et al., Nature Biotechnology, 2014.
  12. Assigning confidence to genomic regions for NA12878. High-confidence (77%) • Platforms agree or we understand the systematic biases causing disagreement • At least some methods have no evidence of systematic errors • Mendelian inheritance consistent. Lower confidence (23%) • In a region known to be difficult for current technologies – Segmental Dups – Repeats, Low Complexity – High/Low GC – Etc. • Evidence of systematic error across many platforms • Inconsistent inheritance. Zook et al., Nature Biotechnology, 2014.
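The integration workflow on slides 10 to 12 begins by finding sites where platforms agree. The sketch below illustrates only that first concordance step in Python. It is not the method of Zook et al., which also arbitrates discordant sites using evidence of systematic error and excludes difficult regions. The data layout, the platform names, the example coordinates, and the function name `split_concordant` are all hypothetical.

```python
from collections import defaultdict


def split_concordant(calls_by_platform):
    """Split sites into concordant and discordant sets.

    calls_by_platform maps a platform name to
    {(chrom, pos, ref, alt): genotype}. A site counts as concordant
    only if every platform called it with the same genotype.
    """
    genotypes_at_site = defaultdict(dict)
    for platform, calls in calls_by_platform.items():
        for site, genotype in calls.items():
            genotypes_at_site[site][platform] = genotype

    n_platforms = len(calls_by_platform)
    concordant, discordant = {}, {}
    for site, by_platform in genotypes_at_site.items():
        called_everywhere = len(by_platform) == n_platforms
        if called_everywhere and len(set(by_platform.values())) == 1:
            concordant[site] = next(iter(by_platform.values()))
        else:
            # Disagreement or a missing call: a candidate for arbitration.
            discordant[site] = by_platform
    return concordant, discordant


# Made-up example calls, not real NA12878 data.
example_calls = {
    "platform_a": {("chr1", 1000, "A", "G"): "0/1", ("chr1", 2000, "C", "T"): "1/1"},
    "platform_b": {("chr1", 1000, "A", "G"): "0/1", ("chr1", 2000, "C", "T"): "0/1"},
}
concordant, discordant = split_concordant(example_calls)
```

In this toy input the first site agrees on both platforms, while the second is returned as discordant together with the per-platform genotypes that an arbitration step would need.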
  13. Using high-confidence NIST-GIAB genotypes for NA12878 • NIST has released several versions of high-confidence genotypes for its pilot RM • These data are presently being used for benchmarking – prior to release of RMs – SNPs & indels • ~77% of the genome • Data on FTP now well-organized
  14. GeT-RM Browser from NCBI and CDC • Allows visualization of the data underlying each call
  15. Uses of GIAB NA12878: Oncology – Molecular and Cellular Tumor Markers, “Next Generation” Sequencing (NGS) guidelines for somatic genetic variant detection
  16. Global Alliance for Genomics and Health Benchmarking Task Team • Formed June 2014 to develop methods and tools for comparing variant calls to a benchmark • Developed standardized definitions for performance metrics like TP, FP, and FN • Initial focus on germline SNPs/indels • Developing benchmarking tools – Comparison engine – Pluggable web interface with modules for reporting/calculation of metrics and for visualization/user interface • Working with Genome in a Bottle Consortium to host data and calls from their well-characterized genomes [Figure: Example User Interface]
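Slide 16 mentions standardized definitions for TP, FP, and FN counts. As a minimal sketch, the Python below derives the usual summary metrics (precision, recall/sensitivity, and F1) from such counts. These are the standard textbook formulas; the slide does not say which metrics the Task Team's tools report, and the function name `benchmark_metrics` and the example counts are made up for illustration.

```python
def benchmark_metrics(tp, fp, fn):
    """Return precision, recall, and F1 from true/false positive/negative counts.

    A metric is NaN when its denominator is zero.
    """
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan
    recall = tp / (tp + fn) if (tp + fn) else nan
    if (2 * tp + fp + fn) == 0:
        f1 = nan
    else:
        # Equivalent to the harmonic mean of precision and recall.
        f1 = 2 * tp / (2 * tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts from comparing a callset to a benchmark.
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
```

True negatives are left out on purpose: in variant calling the number of correctly uncalled reference positions is very large and depends on how the comparison regions are defined, so these summary metrics are built only from TP, FP, and FN.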
  17. Global Alliance for Genomics and Health Benchmarking Task Team. How should we interpret this complex variant on chr21? Credit: Rebecca Truty, Complete Genomics
  18. Global Alliance for Genomics and Health Benchmarking Task Team. Beyond simple T/F classification: Genotype errors. Credit: Rebecca Truty, Complete Genomics
      Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
      0/1 | 1/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
      1/1 | 0/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
      1/2 | 0/1, 1/1, 0/2, 2/2 | common allele, FN allele | GE_FN | TP | 1TP, 1GE, 1FN | FN
      0/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
      1/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
      1/2 | 1/3 | common allele, FP allele, FN allele | GE_FP_FN | TP | 1TP, 1GE, 1FP, 1FN | FP, FN
  19. Global Alliance for Genomics and Health Benchmarking Task Team. Beyond simple T/F classification: no-calls and half-calls. Credit: Rebecca Truty, Complete Genomics
      Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
      0/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | TP
      1/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | FN
      0/1, 1/1 | ./0 | half-call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 1NCV, 1FN | FN
      1/2 | ./0 | half-call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 2NCV, 2FN | FN
      1/2 | ./1, ./2 | half-call, TP allele, FN allele | HC_TP_FN | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE, 1FN | FN
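The proposed names on slides 18 and 19 (GE, GE_FN, GE_FP, GE_FP_FN, HC_TP, HC_FN, HC_TP_FN) appear to follow from which non-reference alleles are shared between truth and callset, which are missing from the callset, and which are extra. The Python sketch below encodes that reading for VCF-style genotype strings. It is an interpretation of the slide tables, not the Task Team's implementation. The function names are hypothetical, and the handling of cases the slides do not list (an FP allele in a half-call, full no-calls, genotypes with no shared allele) is an assumption.

```python
def _alleles(genotype):
    """Split a VCF-style genotype such as '0/1', '1|2', or './1' into allele tokens."""
    return genotype.replace("|", "/").split("/")


def propose_name(truth, call):
    """Return a proposed class name for a truth/callset genotype pair.

    Returns 'TP' for an exact genotype match, and None for cases the
    slide tables do not cover.
    """
    truth_tokens, call_tokens = _alleles(truth), _alleles(call)
    if all(token == "." for token in call_tokens):
        return None  # full no-call: not covered by the tables (assumption)
    half_call = "." in call_tokens
    if not half_call and sorted(truth_tokens) == sorted(call_tokens):
        return "TP"

    truth_alt = {t for t in truth_tokens if t not in ("0", ".")}
    call_alt = {t for t in call_tokens if t not in ("0", ".")}
    common = truth_alt & call_alt   # alleles found in both
    missed = truth_alt - call_alt   # FN alleles
    extra = call_alt - truth_alt    # FP alleles

    if half_call:
        name = "HC_TP" if common else "HC"
    elif common:
        name = "GE"
    else:
        return None  # no shared allele: outside the slide tables
    if extra:
        name += "_FP"  # for half-calls this suffix is an assumption
    if missed:
        name += "_FN"
    return name


# Pairs taken from the rows of the two slide tables.
names = {pair: propose_name(*pair) for pair in [("0/1", "1/1"), ("1/2", "1/3"), ("1/2", "./2")]}
```

This reproduces the Proposed Name(s) column only. The CM#1 to CM#3 columns count matches at the region, allele, and genotype level and would need separate logic.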
  20. Stratifying False Positives [Figure: panels for GC Content, TR Unit 1, TR Unit 2, TR Unit 3, TR Unit 4, TR Unit <7, TR Unit >=7] GA4GH - Stratification. Credit: Abby Beeler, Ellie Wood
  21. Public data from GIAB AJ PGP Trio. Long reads/“Linked” reads • ~70/30/30x PacBio – ~11kb N50 • BioNano • 10X Genomics • Moleculo • Complete Genomics LFR • Oxford Nanopore. Short reads • 300x Illumina paired-end • 15x Illumina 6kb mate-pair • Complete Genomics • SOLiD 5500W • Ion Proton Exome
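Slide 21 quotes an N50 of about 11 kb for the PacBio data. For readers unfamiliar with the statistic, the Python sketch below computes N50 under its usual definition: the length L such that reads of length L or longer contain at least half of all sequenced bases. The slide does not say exactly which lengths were measured (for example reads versus subreads), and the function name `n50` and the example lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of positive sequence lengths."""
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths, not real PacBio data.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Unlike the mean or median, N50 weights each sequence by its length, so a few very long reads raise it more than many short reads lower it.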
  22. GIAB Analysis Group – New Data Sets. Leaders • Francisco de la Vega – Annai Systems • Chris Mason – Weill Cornell Medical Center • Tina Graves – Washington University • Valerie Schneider – NCBI • and Justin and Marc. Status • Analysis Group Responsibilities: – A0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing • Analysis Milestones: – H742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing • Analysis Methods: – s/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing • Analysis Plan: – JJDHQdnVEaVdqbWdEdkE/view?usp=sharing • Collecting data and analyses on GIAB FTP site • Recruiting people to help with the work. Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and sizes, as well as homozygous reference regions, on GIAB PGP trios
  23. Data Release Policy: Real-time, Open, Public Release. Individual Datasets • Uploaded to GIAB FTP site as they are collected • Include raw reads, aligned reads, and variant/reference calls. Integrated High-confidence Calls • First develop SNP, indel, and homozygous reference calls • Then develop SV and non-SV calls • Released calls are versioned • Preliminary callsets will be made available to be critiqued
  24. Analysis Progress: AJ Trio • SNPs/indels – Several candidate callsets – NIST working on integration – Plan to use 10X/Moleculo/PacBio for difficult-to-map regions • Assembly – 2 de novo assemblies of AJ trio (MHAP/PBcR and Falcon/Bionano) – Will be used by at least 2 groups for SV calling • Structural variants – Candidate calls being generated by 15+ groups with >20 different algorithms and 6 datasets – 3 integration methods: Bina-MetaSV, DNAnexus/Baylor-Parliament, NIST-svclassify – Parliament: ~7k SVs with evidence in PacBio and Illumina • Long-range Phasing – 2 phased calls so far (CG LFR and 10X) – Integration methods needed
  25. Proposed approach to form high-confidence SV (and non-SV) calls • Generate candidate calls from multiple methods • Compare/evaluate calls using Parliament/MetaSV/svclassify/others?; manually inspect discordant calls • Integrate new and revised calls • Combine integrated calls (with heuristics and/or machine learning) to generate high-confidence calls. Timeline dates shown: August 30, 2015; Nov 1, 2015; Jan 1, 2016; Jan 26, 2016
  26. Acknowledgments • FDA – Elizabeth Mansfield, Computing staff • Many members of Genome in a Bottle – New members welcome! – Sign up on website for email newsletters • Steering Committee – Marc Salit – Justin Zook – David Mittelman – Andrew Grupe – Michael Eberle – Steve Sherry – Deanna Church – Francisco De La Vega – Christian Olsen – Monica Basehore – Lisa Kalman – Christopher Mason – Elizabeth Mansfield – Liz Kerrigan – Leming Shi – Melvin Limson – Alexander Wait Zaranek – Nils Homer – Fiona Hyland – Steve Lincoln – Don Baldwin – Robyn Temple-Smolkin – Chunlin Xiao – Kara Norman – Luke Hickey
  27. For More Information - sign up for general GIAB and Analysis Team google group emails - exome comparison tool - GeT-RM Browser Data: Global Alliance Benchmarking work group. Public Meetings – Twice yearly workshop – Winter: January 28-29, 2016 at Stanford University, California, USA – Summer at NIST, Maryland, USA. Justin Zook: Marc Salit: