Mar2013 Performance Metrics Working Group


Published on

1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Mar2013 Performance Metrics Working Group

  1. Performance Metrics & Figures of Merit
     David Jenkins, on behalf of Justin H. Johnson, Director of Bioinformatics
  2. CLIA #21D2039005, MD State License 1853
  3. Who are we?
     • Justin Johnson
       – Managing Director of Services
       – Director of Bioinformatics
       – 10 years at JCVI before EdgeBio
       – Project Manager, Archon Genomics XPrize
     • EdgeBio
       – CLIA lab
       – Illumina HiSeq & MiSeq, Ion Proton & PGM
  4. Overview: GIAB (Genome in a Bottle) as I See It
     • Which genomes?
     • How do we sequence them?
     • How do we analyze them?
     • How do we enable their usage?
  5. Bioinformatics Overview
     • Experimental Data
     • Data Integration
       – Sequence data & variation
       – Metadata/representation
     • Database
       – RM vs. reference
       – Every base
     • Visualize and Filter
       – Single genome browser
       – Browser over DB
       – Query by experiment data
     • Compare and Report
     • Refine and Feedback
     Experimental Data = combination of Prep / Sequencing / Analysis
  6. Experimental Data
     • GetRM model for collection
     • Preparation
       – Link to published prep protocol
       – ROI (region of interest) in BED/GFF/GBK format
     • Sequencing
       – Platform information (minimally: name)
       – Chemistry (minimally: version)
     • Analysis
       – Link to published analysis protocol or best practices
       – Read data (FASTQ, SRA, HDF5, others)
       – Alignment/assembly data (BAM)
         • Minimal tag set TBD
       – Variation (VCF or gVCF)
         • Minimal tag set TBD in the INFO field of the VCF, or define an external XSD
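One way to picture the collection model above is as a small record type per experiment. The sketch below assumes Python dataclasses; the class and field names are hypothetical and not part of GetRM, and only the two "minimally" required sequencing fields from the slide are checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Preparation:
    protocol_url: str               # link to the published prep protocol
    roi_file: Optional[str] = None  # ROI in BED, GFF, or GBK format


@dataclass
class Sequencing:
    platform_name: str              # minimally required: platform name
    chemistry_version: str          # minimally required: chemistry version


@dataclass
class Analysis:
    protocol_url: str               # published analysis protocol or best practices
    read_files: List[str] = field(default_factory=list)       # FASTQ, SRA, HDF5, ...
    alignment_files: List[str] = field(default_factory=list)  # BAM
    variant_file: Optional[str] = None                        # VCF or gVCF


@dataclass
class Experiment:
    """One experiment: a combination of preparation, sequencing, and analysis."""
    preparation: Preparation
    sequencing: Sequencing
    analysis: Analysis

    def missing_minimal_fields(self) -> List[str]:
        """Names of the minimally required sequencing fields that are empty."""
        required = {
            "sequencing.platform_name": self.sequencing.platform_name,
            "sequencing.chemistry_version": self.sequencing.chemistry_version,
        }
        return [name for name, value in required.items() if not value]


# Hypothetical example record; the URLs and file names are placeholders.
exp = Experiment(
    Preparation("https://example.org/prep-protocol", roi_file="targets.bed"),
    Sequencing(platform_name="Illumina HiSeq", chemistry_version=""),
    Analysis("https://example.org/analysis-protocol", variant_file="sample.g.vcf"),
)
print(exp.missing_minimal_fields())  # ['sequencing.chemistry_version']
```

Keeping prep, sequencing, and analysis as separate records mirrors "Experimental Data = combination of Prep / Sequencing / Analysis", so the same sequencing run can be paired with several analyses.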
  7. gVCF
  8. Metadata
     • All required fields in VCF 4.1
     • Others (examples)
       – AA: ancestral allele
       – AC: allele count in genotypes, for each ALT allele, in the same order as listed
       – AF: allele frequency for each ALT allele, in the same order as listed (use this when estimated from primary data, not called genotypes)
       – AN: total number of alleles in called genotypes
       – BQ: RMS base quality at this position
       – CIGAR: CIGAR string describing how to align an alternate allele to the reference allele
       – DB: dbSNP membership
       – DP: combined depth across samples, e.g. DP=154
       – END: end position of the variant described in this record (for use with symbolic alleles)
       – H2: membership in HapMap2
       – VALIDATED: validated by follow-up experiment
     • Reference block implementations
     • Handle indel conflicts and resolution
     • Genotype quality for non-variant sites (GQX)
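The INFO conventions listed above (key=value pairs, flags with no value, per-ALT comma-separated lists, END for blocks) can be read with a few lines of code. This is a minimal sketch assuming tab-separated VCF data lines and a simplified reference-block record (ALT ".", END in INFO); `parse_info`, `record_span`, and `PER_ALT_KEYS` are hypothetical names, and real code would take per-ALT keys from the header's `Number=A` declarations.

```python
PER_ALT_KEYS = {"AC", "AF"}  # assumed subset of INFO keys with one value per ALT allele


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column such as 'AC=1;AF=0.5;DP=154;DB' into a dict."""
    result = {}
    if info in ("", "."):
        return result
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            # Per-ALT fields are comma-separated, in the same order as the ALT column
            result[key] = value.split(",") if key in PER_ALT_KEYS else value
        else:
            result[entry] = True  # flag fields such as DB, H2, VALIDATED
    return result


def record_span(line: str) -> tuple:
    """Return (chrom, start, end), 1-based inclusive, covered by a VCF/gVCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    info = parse_info(fields[7])
    # A reference block declares its extent with END; otherwise the span is REF's length
    end = int(info["END"]) if "END" in info else pos + len(ref) - 1
    return chrom, pos, end


block = "chr1\t1000\t.\tA\t.\t.\tPASS\tEND=1050\tGT\t0/0"
snv = "chr1\t1100\t.\tA\tG,T\t50\tPASS\tAC=1,1;AF=0.5,0.5;DP=154;DB\tGT\t1/2"
print(record_span(block))              # ('chr1', 1000, 1050)
print(record_span(snv))                # ('chr1', 1100, 1100)
print(parse_info(snv.split("\t")[7]))  # {'AC': ['1', '1'], 'AF': ['0.5', '0.5'], 'DP': '154', 'DB': True}
```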
  9. Database
     • Store each base + metadata of RM versus reference for each experiment, from gVCF
       – Distinguish missing versus homozygous reference
       – Include copy number and phasing when available (not required)
     • Engine that drives the front-end visualization (genome browser)
     • Build on GetRM/NCBI database work
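The key requirement above is that "no call" and "homozygous reference" remain different answers. A minimal sketch, assuming an in-memory SQLite table with one row per called base per experiment (the schema and function names are hypothetical, not the GetRM/NCBI design): a base with no row is reported as missing, and phased genotypes such as `1|0` are stored unchanged.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """In-memory store with one row per called base per experiment (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE base_calls (
               experiment TEXT NOT NULL,
               chrom      TEXT NOT NULL,
               pos        INTEGER NOT NULL,
               genotype   TEXT NOT NULL,  -- e.g. '0/0', '0/1', '1|0' (phase kept)
               PRIMARY KEY (experiment, chrom, pos))"""
    )
    return conn


def load_spans(conn, experiment, spans):
    """spans: iterable of (chrom, start, end, genotype), 1-based inclusive."""
    rows = ((experiment, chrom, pos, gt)
            for chrom, start, end, gt in spans
            for pos in range(start, end + 1))
    conn.executemany("INSERT INTO base_calls VALUES (?, ?, ?, ?)", rows)


def base_state(conn, experiment, chrom, pos) -> str:
    row = conn.execute(
        "SELECT genotype FROM base_calls WHERE experiment = ? AND chrom = ? AND pos = ?",
        (experiment, chrom, pos),
    ).fetchone()
    if row is None:
        return "missing"  # no call at all, which is not the same as homozygous reference
    if row[0] in ("0/0", "0|0"):
        return "hom_ref"
    return "variant"


conn = create_store()
load_spans(conn, "exp1", [("chr1", 1000, 1050, "0/0"), ("chr1", 1100, 1100, "0/1")])
print(base_state(conn, "exp1", "chr1", 1020))  # hom_ref
print(base_state(conn, "exp1", "chr1", 1075))  # missing
print(base_state(conn, "exp1", "chr1", 1100))  # variant
```

Expanding every gVCF reference block into per-base rows keeps queries trivial but grows quickly; a production store would more likely keep the blocks as intervals.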
  10. Visualize and Filter
      • Build on GetRM/NCBI browser work
      • Single RM -> many experiments
      • Not all metadata will be visual, but most/all will be filterable
      • Filter data to generate ROI or VOI (variants of interest)
        – Canned: e.g. intersect of all platforms + analysis, all OMIM SNPs, Clinical Cert SNV list, etc.
        – Dynamic: allowing people to explore prep, sequencing, or analysis bias
      • Slice, dice, and export VOI to compare-and-report software
      • Allow user-defined tracks
      • By-product is a community educational resource
        – "I have an ROI for a test and want to know which platform, prep, exome kit version, etc. covers it best. What do I do?"
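A canned filter such as "intersect of all platforms" reduces to interval intersection over each platform's covered regions. The sketch below assumes BED-style 0-based half-open intervals and toy data for three unnamed platforms; all function names are hypothetical. In practice a tool such as bedtools `intersect` would do this job.

```python
from functools import reduce
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping intervals within one platform's region list."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Two-pointer intersection of two sorted, non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Both lists are sorted by chromosome name; advance the one that lags
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            out.append((ca, start, end))
        # Advance whichever interval ends first
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


def intersect_all(platforms: List[List[Interval]]) -> List[Interval]:
    """Regions covered by every platform (a 'canned' ROI)."""
    return reduce(intersect_two, [merge(p) for p in platforms])


# Toy coverage for three hypothetical platforms
platform_a = [("chr1", 100, 500), ("chr2", 0, 300)]
platform_b = [("chr1", 200, 600), ("chr1", 700, 800), ("chr2", 250, 400)]
platform_c = [("chr1", 0, 450), ("chr2", 0, 1000)]
print(intersect_all([platform_a, platform_b, platform_c]))  # [('chr1', 200, 450), ('chr2', 250, 300)]
```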
  11. Parallel Database and Filter Effort (Gemini), Quinlan Lab at UVA
      • Gemini: a simple, flexible, and powerful framework for exploring genetic variation
      • Basic browser capabilities being developed
      • Flexible custom annotation and metadata addition to the DB
      • Leverages the expressive power of SQL while overcoming fundamental challenges associated with using databases for very large datasets
  12. Gemini
  13. Gemini
  14. Gemini
  15. Compare and Report
      • Take in ROI or VOI from the visualize-and-filter stage
      • Take in user-defined VOI, or VOI + ROI
      • Leverage software under development to generate reports and files, including but not limited to:
        – Summary of completeness, accuracy, and phasing
        – Discordant variants in VCF
        – Concordant variants in VCF
        – Phasing errors in VCF
      • Provide an intuitive way to feed these results into downstream analysis software (VariantViz, IO8) or back into the browser (user-defined track)
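The concordant/discordant split can be sketched by matching calls on (chrom, pos, ref, alt) and comparing genotypes with phase ignored. This is an illustration under assumptions, not the project's comparison software: the function names are hypothetical, sensitivity and precision are assumed stand-ins for "completeness" and "accuracy", and exact key matching ignores the indel representation differences that a real comparison must resolve.

```python
from typing import Dict, List, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def unphased(gt: str) -> Tuple[str, ...]:
    """Drop phase so that '1|0' and '0/1' compare equal."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def compare_calls(truth: Dict[VariantKey, str], query: Dict[VariantKey, str]) -> dict:
    concordant: List[VariantKey] = []
    genotype_mismatch: List[VariantKey] = []
    for key, gt in query.items():
        if key in truth:
            if unphased(gt) == unphased(truth[key]):
                concordant.append(key)
            else:
                genotype_mismatch.append(key)
    false_positives = [k for k in query if k not in truth]
    false_negatives = [k for k in truth if k not in query]
    return {
        "concordant": concordant,
        "genotype_mismatch": genotype_mismatch,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # Assumed summary metrics: share of truth recovered, share of calls correct
        "sensitivity": len(concordant) / len(truth) if truth else 0.0,
        "precision": len(concordant) / len(query) if query else 0.0,
    }


# Toy reference calls and toy submitted calls
truth = {("chr1", 100, "A", "G"): "0/1", ("chr1", 200, "C", "T"): "1/1",
         ("chr2", 50, "G", "A"): "0/1", ("chr2", 80, "T", "C"): "0/1"}
query = {("chr1", 100, "A", "G"): "1|0", ("chr1", 200, "C", "T"): "0/1",
         ("chr2", 50, "G", "A"): "0/1", ("chr3", 10, "A", "C"): "0/1"}
report = compare_calls(truth, query)
print(report["sensitivity"], report["precision"])  # 0.5 0.5
```

Each list in the report maps onto one output file from the slide (concordant VCF, discordant VCF), so writing them out is a matter of serializing the keys back to VCF lines.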
  16. Archon Genomics XPrize
      • $10 million prize competition to showcase whole-genome sequencing technology
      • Award to the team(s) who can most completely, accurately, and affordably sequence 100 human genomes in 30 days or less
      • Competing teams will sequence the genomes of 100 centenarians who have evaded the usual diseases of aging, such as heart disease, diabetes, cancer, and Alzheimer's
  17. AGXP Validation Study Overview
  18. AGXP Validation Study Analysis
      • Two major phases using NA19239 and NA12878
        – Develop reference standards
          • Fosmid reconstruction, variation discovery
          • Technology comparison and bias removal
        – Develop performance metrics
          • Software development
          • Help labs use the data
  19. Compare and Report
      • The website provides a simple way for anyone to compare their variant calls against the public reference genomes.
      • Encourages submission and analysis in public tools like Galaxy through transparent interoperability with GenomeSpace.
  20. Compare and Report
  21. Compare and Report
  22. Compare and Report
  23. Follow On
      • Export the different categories of variants (concordant/discordant/phasing error) to VariantViz IO8
      • Visualize quality, allele frequency, depth, and other INFO fields to detect patterns within and between variant categories
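Before any visualization, a per-category table of INFO values already shows whether, say, discordant calls cluster at low quality or depth. A minimal sketch, assuming each variant has been reduced to a hypothetical (category, QUAL, DP) tuple; it does not use VariantViz and the numbers are toy values.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_category(records):
    """records: iterable of (category, qual, depth) tuples; returns per-category stats."""
    groups = defaultdict(lambda: {"qual": [], "depth": []})
    for category, qual, depth in records:
        groups[category]["qual"].append(qual)
        groups[category]["depth"].append(depth)
    return {
        cat: {
            "n": len(v["qual"]),
            "median_qual": median(v["qual"]),
            "mean_depth": mean(v["depth"]),
        }
        for cat, v in groups.items()
    }


# Toy values for two categories
records = [("concordant", 60.0, 40), ("concordant", 50.0, 30),
           ("discordant", 12.0, 6), ("discordant", 20.0, 10), ("discordant", 8.0, 5)]
for category, stats in summarize_by_category(records).items():
    print(category, stats)
```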
  24. Concordant SNPs / Potential false positives
  25. XPrize Team
      • Justin H. Johnson and team, EdgeBio
      • Brad Chapman, Harvard: automated high-throughput analysis pipelines with custom visualization and processing tools
      • Gabor Marth, Boston College: read mapping, single-nucleotide and insertion-deletion polymorphism detection, and discovery of structural variants
      • Aaron Quinlan, University of Virginia: structural variation (SV)
      • Granger Sutton, JCVI: Oversight Committee
      • Victor Jongeneel, University of Illinois and NCSA: Oversight Committee
      • Larry Kedes, UCLA: Oversight Committee
  26. EdgeBio Team
      • Lab / IFX
        – Joy Adigun
        – David Jenkins
        – Ryan Mease
        – Anju Varadarajan
        – Jennifer Sheffield
        – Vani Rajan
        – Aaron Johnson
        – Karthik Kota
        – Jackie Jackson
        – Phil Dagasto
        – Adam Bennett
        – Isabel Llorente
  27. Thank You! More info available at