150219 agbt giab_poster_marc


General GIAB poster

Published in: Health & Medicine

Genome in a Bottle: So you’ve sequenced a genome, how well did you do?
Marc Salit, Justin Zook, Genome in a Bottle Consortium
Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD 20899

Overview

In 2012, NIST convened the Genome in a Bottle Consortium to develop the metrology infrastructure needed to enable confidence in human whole genome variant calls. Consortium products will include:
  • Well-characterized whole genome and synthetic DNA Reference Materials (RMs)
  • Reference data associated with the RMs
  • Reference methods (comparison tools, documentary standards)

These Genome in a Bottle products will help enable translation of whole genome sequencing to clinical applications. Expected use cases of these products include:
  • Enable regulated applications
  • Validation, QC, proficiency testing
  • Identify and quantify sources of bias & variability
  • Optimize measurement technologies
  • Resolve structural variants
  • Improve reference assembly
  • Integrate data from multiple platforms

[Figure: measurement process, with stages labeled Reference Materials, Sample Preparation, Sequencing, Bioinformatics, and Variant List / Performance metrics]

How you can get involved:
  • Join Analysis Group for Personal Genome Project trios
    • Help with Structural Variant calls and difficult regions of the genome
    • Help with analyzing data from long-read technologies
  • Attend our biannual workshops (January in CA, August in MD)
  • Help develop definitions and methods to measure performance using our well-characterized genomes with the Global Alliance for Genomics & Health Benchmarking Working Group
  • Use our integrated SNP/indel/homozygous reference genotypes for NA12878 and give us feedback

Genome in a Bottle Consortium: New members welcome! Sign up for our newsletters.

Reference Material Selection and Design Group

  • Personal Genome Project samples, with consent for commercialization
  • Ashkenazi Jewish trio
  • East Asian trio
  • Additional diversity and a large family?
  • Supporting inter-laboratory analysis of potential commercial reference materials (recruiting labs now)
  • Are synthetic spike-ins a good surrogate for real somatic mutations?
  • Spike-ins vs. FFPE engineered cell lines vs. FFPE tissue

[Figure: Point Mutation Control Plasmids, with labels Mutation of Interest and Alien Barcode; from M. Williams et al., Frederick National Laboratory for Cancer Research]

Measurements for Reference Material Characterization Group

Dataset | Characteristics | Coverage | Availability | Good for…
------- | --------------- | -------- | ------------ | ---------
Illumina | Paired-end 150x150 bp | ~300x/individual | Fastq on FTP | SNPs/indels/some SVs
Illumina | Long mate pair, ~6000 bp insert | ~40x/individual | Feb-Mar 2015 | SVs
Illumina “moleculo” | Custom library | ~30x by long fragments | Feb-Mar 2015 | SVs/phasing/assembly
Complete Genomics | Paired end | ~100x/individual | On FTP | SNPs/indels/some SVs
Complete Genomics | LFR | | Mar 2015 | SNPs/indels/phasing
Ion Proton | Exome | 1000x/individual | On FTP/SRA | SNPs/indels in exome
BioNano Genomics | Optical mapping | | Feb 2015 | SVs/assembly
PacBio | ~10 kb reads | ~120-150x on AJ trio | 50% on FTP; finished ~Mar 2015 | SVs/phasing/assembly/STRs

Forming an analysis group:
  • Using long reads
  • SV analysis
  • De novo assembly
  • Complex variants
  • All data is public
  • Now recruiting members

Performance Metrics Group

  • Performance Metrics Specification
    • Available on GIAB blog
  • Global Alliance for Genomics & Health
    • Formed Benchmarking Task Team to develop methods and tools for comparing variant calls to a benchmark
    • Developed standardized definitions for performance metrics like TP, FP, and FN
    • Developing benchmarking tool in 3 parts: Comparison, Reporting, and Visualization
  • NCBI/CDC GeT-RM Genome Browser
    • Visualization of data

Bioinformatics, Data Integration, and Data Representation Group

Developing Benchmark Genotypes
  • Developed data integration methods and benchmark genotype calls for NA12878
    • Multi-platform method
    • Published by Zook et al. (2014) in Nature Biotechnology
  • Newest calls integrate pedigree methods
    • Real Time Genomics (RTG)
    • Illumina Platinum Genomes
  • NCBI hosts FTP with raw data and calls
    • Mirrored to AWS S3

Structural Variants
  • We are developing similar methods for SVs (see Zook et al. poster)
  • Methodology development to annotate each SV using coverage, insert size, discordant paired ends, mapping quality, soft-clipping …
  • How to use long-read technologies?

[Figure: Overlap of SNP calls between three variant call files and proposed methods to arbitrate between multiple datasets and produce high-confidence integrated SNP, indel, and homozygous reference genotypes.] A similar integration process has been applied to our pilot genome based on NA12878 (see Zook et al., Nat. Biotech., 2014), and we plan to use these methods to produce high-confidence calls for the Ashkenazim and Asian trios from the Personal Genome Project.

Integration process shown in the figure:
  • Normalize and take union of calls
  • Simple SNPs/indels
    • Illumina/SOLiD: GATK HC force calls
    • Ion: TVC force calls
    • If all biased or low qual, uncertain
    • Elseif all concordant, high-conf
    • Elseif all unbiased are concordant, high-conf
    • Else uncertain
    • CG: use Ref file
  • Complex variants
    • Use GA4GH methods for sequential pairwise comparison
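
The arbitration rules the poster lists for simple SNPs/indels (uncertain if all datasets are biased or low quality; high-confidence if all are concordant; high-confidence if all unbiased datasets are concordant; otherwise uncertain) could be sketched roughly as follows. This is only an illustrative reading of those four rules, not the consortium's implementation: the `DatasetCall` type, the `arbitrate` function, the genotype strings, and the choice to treat "unbiased" as "neither biased nor low quality" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DatasetCall:
    """One dataset's genotype call at a single site (hypothetical structure)."""
    dataset: str        # e.g. "Illumina", "Ion", "CG"
    genotype: str       # e.g. "0/1"; the encoding is an assumption
    biased: bool        # evidence of systematic bias at this site
    low_quality: bool   # call falls below a quality threshold


def arbitrate(calls: List[DatasetCall]) -> Tuple[str, Optional[str]]:
    """Apply the poster's if/elseif chain to the calls at one site.

    Returns (status, genotype), where status is "high-conf" or "uncertain".
    """
    if not calls:
        raise ValueError("at least one dataset call is required")

    # Assumption: "unbiased" means neither biased nor low quality.
    usable = [c for c in calls if not (c.biased or c.low_quality)]

    # Rule 1: if all datasets are biased or low quality, uncertain.
    if not usable:
        return ("uncertain", None)

    # Rule 2: else if all datasets are concordant, high-confidence.
    if len({c.genotype for c in calls}) == 1:
        return ("high-conf", calls[0].genotype)

    # Rule 3: else if all unbiased datasets are concordant, high-confidence.
    if len({c.genotype for c in usable}) == 1:
        return ("high-conf", usable[0].genotype)

    # Rule 4: otherwise uncertain.
    return ("uncertain", None)


if __name__ == "__main__":
    site = [
        DatasetCall("Illumina", "0/1", biased=False, low_quality=False),
        DatasetCall("CG", "0/1", biased=False, low_quality=False),
        DatasetCall("Ion", "1/1", biased=True, low_quality=False),
    ]
    print(arbitrate(site))
```

In this sketch a discordant call from a dataset flagged as biased does not block a high-confidence result, which is one plausible interpretation of the third rule; how bias and low quality are actually detected is outside what the poster text specifies.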