Performance Metrics and Figures of Merit Working Group Summary, Aug 2012
Genome in a Bottle Performance Metric & Figures of Merit
Overview
[Flow diagram: Bioinformatics Data Integration / Representation]
• Experimental Data
  – Sequence Data & Variation
  – Metadata
• Database
  – RM vs. Reference
  – Every Base
• Visualize and Filter
  – Single Genome Browser
  – Browser over DB
  – Query by Experiment Data
• Compare and Report
  – ValidationProtocol.org
• Refine and Feedback
Experimental Data = Combination of Prep / Sequencing / Analysis
Experimental Data
• Preparation
  – Link to published prep protocol
  – ROI in BED/GFF/GBK Format
• Sequencing
  – Platform Information (Minimally Name)
  – Chemistry (Minimally Version)
• Analysis
  – Link to published analysis protocol or best practices
  – Read Data (fastq, sra, hdf5, others)
  – Alignment/Assembly Data (bam)
    • Minimal Tag Set TBD
  – Variation (vcf)
    • Minimal Tag Set TBD in INFO field of VCF or define external XSD
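As a rough illustration of the Prep / Sequencing / Analysis record described on this slide, the following Python sketch models one experiment as nested dataclasses. All class names, field names, and the `missing_fields` helper are hypothetical; the working group had not defined a minimal tag set at this point, so this is only one possible shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Preparation:
    protocol_url: str               # link to the published prep protocol
    roi_file: Optional[str] = None  # ROI file in BED/GFF/GBK format, if any


@dataclass
class Sequencing:
    platform_name: str      # minimally the platform name
    chemistry_version: str  # minimally the chemistry version


@dataclass
class Analysis:
    protocol_url: str                                         # published analysis protocol or best practices
    read_files: List[str] = field(default_factory=list)       # fastq, sra, hdf5, ...
    alignment_files: List[str] = field(default_factory=list)  # bam
    variant_files: List[str] = field(default_factory=list)    # vcf


@dataclass
class Experiment:
    experiment_id: str
    preparation: Preparation
    sequencing: Sequencing
    analysis: Analysis

    def missing_fields(self) -> List[str]:
        """Return dotted names of required string fields that are empty."""
        required = {
            "preparation.protocol_url": self.preparation.protocol_url,
            "sequencing.platform_name": self.sequencing.platform_name,
            "sequencing.chemistry_version": self.sequencing.chemistry_version,
            "analysis.protocol_url": self.analysis.protocol_url,
        }
        return [name for name, value in required.items() if not value.strip()]
```

A submission tool could call `missing_fields()` before accepting an experiment, so that records lacking the minimal platform or chemistry information are flagged early.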
Metadata
• All Required fields in VCF 4.1
• Others (Examples)
  – AA : ancestral allele
  – AC : allele count in genotypes, for each ALT allele, in the same order as listed
  – AF : allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
  – AN : total number of alleles in called genotypes
  – BQ : RMS base quality at this position
  – CIGAR : cigar string describing how to align an alternate allele to the reference allele
  – DB : dbSNP membership
  – DP : combined depth across samples, e.g. DP=154
  – END : end position of the variant described in this record (for use with symbolic alleles)
  – H2 : membership in hapmap2
  – H3 : membership in hapmap3
  – MQ : RMS mapping quality, e.g. MQ=52
  – MQ0 : Number of MAPQ == 0 reads covering this record
  – NS : Number of samples with data
  – SB : strand bias at this position
  – SOMATIC : indicates that the record is a somatic mutation, for cancer genomics
  – VALIDATED : validated by follow-up experiment
  – 1000G : membership in 1000 Genomes
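To show how these tags appear in practice, here is a minimal Python sketch that splits the INFO column of one VCF record into a dictionary. The function name `parse_info` is an assumption for illustration. Values are kept as strings because their types are declared in the VCF header, which this sketch does not read.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column such as 'NS=3;DP=154;DB' into a dict.

    Flags without a value (e.g. DB, H2) map to True.
    Comma-separated values (e.g. AF for several ALT alleles) become lists.
    All values stay as strings; no type conversion is attempted.
    """
    result = {}
    if info in ("", "."):  # '.' marks an empty INFO column
        return result
    for entry in info.split(";"):
        if not entry:
            continue
        if "=" in entry:
            key, value = entry.split("=", 1)
            result[key] = value.split(",") if "," in value else value
        else:
            result[entry] = True
    return result


info = parse_info("NS=3;DP=154;MQ=52;AF=0.5,0.25;DB;H2")
print(info["DP"], info["AF"], info["DB"])
```

Production pipelines would normally rely on an established VCF library rather than hand-parsing; the sketch only illustrates the key=value and flag conventions.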
Database
• Store Each Base + Meta of RM versus Reference for each Experiment
  – Distinguish missing versus homozygous reference
  – Include copy number and phasing when available, not required
• Engine that drives front end visualization (Genome Browser)
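One way to picture the per-base store is a table with an explicit call status, so that a missing call is never confused with a homozygous reference call. The sketch below uses an in-memory SQLite database; the table name, column names, and status values are all hypothetical and not a schema proposed by the working group.

```python
import sqlite3

# Hypothetical schema: one row per base per experiment.
SCHEMA = """
CREATE TABLE base_call (
    experiment_id TEXT NOT NULL,
    chrom         TEXT NOT NULL,
    pos           INTEGER NOT NULL,  -- 1-based position on the reference
    status        TEXT NOT NULL
        CHECK (status IN ('no_call', 'hom_ref', 'variant')),
    genotype      TEXT,              -- NULL when there is no call
    copy_number   INTEGER,           -- optional
    phase_set     TEXT,              -- optional
    PRIMARY KEY (experiment_id, chrom, pos)
)
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("exp1", "chr1", 100, "hom_ref", "0/0", 2, None),
        ("exp1", "chr1", 101, "variant", "0/1", 2, "PS1"),
        ("exp1", "chr1", 102, "no_call", None, None, None),
    ]
    conn.executemany("INSERT INTO base_call VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def count_by_status(conn: sqlite3.Connection, experiment_id: str) -> dict:
    """Count bases per call status for one experiment."""
    cur = conn.execute(
        "SELECT status, COUNT(*) FROM base_call "
        "WHERE experiment_id = ? GROUP BY status",
        (experiment_id,),
    )
    return dict(cur.fetchall())
```

Keeping `copy_number` and `phase_set` nullable reflects the slide's point that they are included when available but not required.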
Visualize and Filter
• Build on GetRM/NCBI Browser Work
• Single RM -> Many Experiments
• Not all metadata will be visual, but most/all will be filterable
• Filter data to generate ROI or VOI
  – Canned: i.e. Intersect of All Platforms + Analysis, All OMIM SNPs, Clinical Cert SNV List, etc.
  – Dynamic: allowing people to explore prep, sequence, or analysis bias
• Slice, Dice, Export VOI to compare and reporting SW
• Allow user defined tracks
• By-product is a community educational resource
  – I have a ROI for a test and want to know what platform, prep, exome kit version, etc. covers it best. What do I do?
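The "canned" intersect filter and an ROI restriction can be sketched in a few lines of Python. The function names and the representation of a variant as a `(chrom, pos, ref, alt)` tuple are assumptions for illustration. ROI intervals are taken to follow the BED convention (0-based start, exclusive end), while variant positions are 1-based as in VCF.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, 1-based pos, ref, alt)
Interval = Tuple[str, int, int]      # (chrom, 0-based start, exclusive end), as in BED


def intersect_experiments(calls: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return variants called in every experiment (the 'canned' intersect)."""
    if not calls:
        return set()
    return set.intersection(*calls.values())


def filter_to_roi(variants: Set[Variant], roi: List[Interval]) -> Set[Variant]:
    """Keep variants whose position falls inside any ROI interval."""
    kept = set()
    for chrom, pos, ref, alt in variants:
        for roi_chrom, start, end in roi:
            # BED start is 0-based, so 1-based pos is inside when start < pos <= end.
            if chrom == roi_chrom and start < pos <= end:
                kept.add((chrom, pos, ref, alt))
                break
    return kept


calls = {
    "platformA": {("chr1", 101, "A", "G"), ("chr1", 250, "C", "T")},
    "platformB": {("chr1", 101, "A", "G"), ("chr2", 40, "G", "A")},
}
shared = intersect_experiments(calls)
print(filter_to_roi(shared, [("chr1", 100, 200)]))
```

A dynamic filter would swap the fixed intersection for a user-chosen subset of experiments, for example all experiments sharing one prep kit.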
Compare and Reporting
• Take in ROI or VOI from the visualize and filter stage
• Take in user defined VOI or VOI + ROI
• Potentially leverage SW under ValidationProtocol.org to generate reports and files including, but not limited to:
  – Summary of completeness, accuracy, phasing
  – Discordant variants in VCF
  – Concordant variants in VCF
  – Phasing errors in VCF
• Provide an intuitive way to feed these results into downstream analysis SW or back into the browser (User Defined Track)
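The concordant/discordant split can be illustrated with a small Python sketch. It does not reproduce any software from ValidationProtocol.org; the function name, the dict-of-genotypes input, and the concordance fraction are assumptions chosen for illustration. Genotypes are compared as exact strings, so a real tool would first normalize allele order and phasing separators.

```python
from typing import Dict, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, 1-based pos, ref, alt)
Calls = Dict[Variant, str]           # variant -> genotype string, e.g. "0/1"


def compare_calls(reference: Calls, test: Calls) -> dict:
    """Split variants into concordant and discordant sets and summarize.

    Concordant: present in both call sets with the same genotype string.
    Discordant: present in only one set, or present in both with
    different genotypes.
    """
    all_sites = set(reference) | set(test)
    concordant = {
        v for v in all_sites
        if v in reference and v in test and reference[v] == test[v]
    }
    discordant = all_sites - concordant
    summary = {
        "sites": len(all_sites),
        "concordant": len(concordant),
        "discordant": len(discordant),
        # Illustrative measure only, not a working-group definition.
        "concordance": len(concordant) / len(all_sites) if all_sites else None,
    }
    return {"concordant": concordant, "discordant": discordant, "summary": summary}


reference = {("chr1", 101, "A", "G"): "0/1", ("chr1", 250, "C", "T"): "1/1"}
test = {("chr1", 101, "A", "G"): "0/1", ("chr1", 250, "C", "T"): "0/1"}
print(compare_calls(reference, test)["summary"])
```

The two returned sets map directly onto the concordant and discordant VCF outputs listed on the slide, and could be written back out as user defined tracks.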
Realistic Approach
• Tell Group 3 what is needed; they provide feedback on the priority and feasibility of each request.
• Should extend no matter the RM, or whether WGS, WES, Gene Panel, etc.