Performance Metrics and Figures of Merit Working Group Summary Aug2012


  1. Genome in a Bottle Performance Metric & Figures of Merit
  2. Overview
       •  Bioinformatics / Experimental Data: Sequence Data & Variation; Metadata Representation
       •  Integration / Database: RM vs. Reference; Every Base
       •  Visualize and Filter: Single Genome Browser; Browser over DB; Query by Experiment Data
       •  Compare and Report: ValidationProtocol.org
       •  Refine and Feedback
       •  Experimental Data = Combination of Prep / Sequencing / Analysis
  3. Experimental Data
       •  Preparation
            –  Link to published prep protocol
            –  ROI in BED/GFF/GBK format
       •  Sequencing
            –  Platform information (minimally name)
            –  Chemistry (minimally version)
       •  Analysis
            –  Link to published analysis protocol or best practices
            –  Read data (fastq, sra, hdf5, others)
            –  Alignment/assembly data (bam)
                 •  Minimal tag set TBD
            –  Variation (vcf)
                 •  Minimal tag set TBD in INFO field of VCF, or define external XSD
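The experiment description above could be captured as a small record type. The following is only a sketch: the class name `ExperimentRecord`, its field names, and the choice of which fields count as "minimal" are assumptions made for illustration (the slide itself only says platform name and chemistry version are minimally required, and leaves the tag sets TBD).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """Hypothetical record for one experiment (prep + sequencing + analysis)."""

    # Preparation
    prep_protocol_url: Optional[str] = None      # link to published prep protocol
    roi_file: Optional[str] = None               # ROI in BED/GFF/GBK format
    # Sequencing
    platform_name: Optional[str] = None          # minimally the platform name
    chemistry_version: Optional[str] = None      # minimally the chemistry version
    # Analysis
    analysis_protocol_url: Optional[str] = None  # link to protocol or best practices
    read_files: List[str] = field(default_factory=list)       # fastq, sra, hdf5, ...
    alignment_files: List[str] = field(default_factory=list)  # bam
    variant_files: List[str] = field(default_factory=list)    # vcf

    def missing_minimal_fields(self) -> List[str]:
        """Return the names of minimally required fields that are unset."""
        required = ("platform_name", "chemistry_version")  # assumed minimal set
        return [name for name in required if not getattr(self, name)]
```

A submission front end could call `missing_minimal_fields()` before accepting a record and report whatever comes back to the submitter.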
  4. Metadata
       •  All required fields in VCF 4.1
       •  Others (examples)
            –  AA: ancestral allele
            –  AC: allele count in genotypes, for each ALT allele, in the same order as listed
            –  AF: allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
            –  AN: total number of alleles in called genotypes
            –  BQ: RMS base quality at this position
            –  CIGAR: cigar string describing how to align an alternate allele to the reference allele
            –  DB: dbSNP membership
            –  DP: combined depth across samples, e.g. DP=154
            –  END: end position of the variant described in this record (for use with symbolic alleles)
            –  H2: membership in hapmap2
            –  H3: membership in hapmap3
            –  MQ: RMS mapping quality, e.g. MQ=52
            –  MQ0: number of MAPQ == 0 reads covering this record
            –  NS: number of samples with data
            –  SB: strand bias at this position
            –  SOMATIC: indicates that the record is a somatic mutation, for cancer genomics
            –  VALIDATED: validated by follow-up experiment
            –  1000G: membership in 1000 Genomes
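To make the INFO keys above concrete, here is a minimal sketch of reading a VCF INFO column into a dictionary. The function name `parse_info` is hypothetical, values are left as strings, and the sketch ignores percent-encoding and per-key type declarations from the VCF header; a real pipeline would more likely rely on an established VCF library.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column such as 'DP=154;MQ=52;DB' into a dict.

    Flag keys (no '=') map to True; comma-separated values become lists
    of strings; all other values stay as single strings.
    """
    result = {}
    if info in ("", "."):  # '.' marks an empty INFO column
        return result
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        if not sep:
            result[key] = True  # flag such as DB, SOMATIC, VALIDATED
        else:
            parts = value.split(",")
            result[key] = parts if len(parts) > 1 else value
    return result


example = parse_info("DP=154;MQ=52;AF=0.5,0.25;DB;VALIDATED")
```

Here `example["DP"]` is the string `"154"`, `example["AF"]` is a two-element list (one entry per ALT allele), and the flags `DB` and `VALIDATED` are `True`.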
  5. Database
       •  Store each base + meta of RM versus reference for each experiment
            –  Distinguish missing versus homozygous reference
            –  Include copy number and phasing when available, not required
       •  Engine that drives front-end visualization (Genome Browser)
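The distinction between "missing" and "homozygous reference" can be sketched with an explicit state per base. Everything named here (`BaseCall`, `BaseStore`, the in-memory dictionary) is an assumption for illustration, not a proposed schema; the point is only that a position with no record must read back as "no data" and never as a reference call.

```python
from enum import Enum


class BaseCall(Enum):
    """Hypothetical per-base states for one experiment versus the reference."""
    NO_DATA = "no_data"   # position not covered or not callable
    HOM_REF = "hom_ref"   # confidently homozygous reference
    HET = "het"
    HOM_ALT = "hom_alt"


class BaseStore:
    """Toy in-memory store keyed by (experiment, chromosome, position)."""

    def __init__(self):
        self._calls = {}

    def record(self, experiment: str, chrom: str, pos: int, call: BaseCall) -> None:
        self._calls[(experiment, chrom, pos)] = call

    def lookup(self, experiment: str, chrom: str, pos: int) -> BaseCall:
        # An absent key means the experiment said nothing about this base,
        # which is reported as NO_DATA rather than assumed to be reference.
        return self._calls.get((experiment, chrom, pos), BaseCall.NO_DATA)
```

Copy number and phasing, which the slide lists as optional, could be added as further optional values alongside the call.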
  6. Visualize and Filter
       •  Build on GetRM/NCBI Browser work
       •  Single RM -> many experiments
       •  Not all metadata will be visual, but most/all will be filterable
       •  Filter data to generate ROI or VOI
            –  Canned: e.g. intersect of all platforms + analysis, all OMIM SNPs, clinical cert SNV list, etc.
            –  Dynamic: allowing people to explore prep, sequence, or analysis bias
       •  Slice, dice, export VOI to compare and reporting SW
       •  Allow user-defined tracks
       •  Byproduct is a community educational resource
            –  I have an ROI for a test and want to know what platform, prep, exome kit version, etc. covers it best. What do I do?
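A canned filter such as "intersect of all platforms" amounts to intersecting interval lists. The sketch below assumes each platform's callable regions are given as sorted, non-overlapping, half-open `(chrom, start, end)` tuples in BED-style coordinates, with the same chromosome ordering in every list; the function names are hypothetical.

```python
from functools import reduce


def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of (chrom, start, end)."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Drop the interval that ends first; the other may still overlap more.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out


def intersect_all(region_lists):
    """ROI shared by every platform: fold `intersect` over all lists."""
    return reduce(intersect, region_lists)
```

For example, `intersect_all([platform_a, platform_b, platform_c])` returns only the intervals that all three platforms cover, which could then be exported as an ROI track.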
  7. Compare and Reporting
       •  Take in ROI or VOI from the visualize and filter stage
       •  Take in user-defined VOI or VOI + ROI
       •  Potentially leverage SW under ValidationProtocol.org to generate reports and files including, but not limited to:
            –  Summary of completeness, accuracy, phasing
            –  Discordant variants in VCF
            –  Concordant variants in VCF
            –  Phasing errors in VCF
       •  Provide an intuitive way to feed these results into downstream analysis SW or back into the browser (user-defined track)
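The concordant/discordant split can be illustrated with plain set operations. This is a simplified sketch, not the ValidationProtocol.org software: variants are assumed to be already normalized and represented as `(chrom, pos, ref, alt)` tuples, genotype and phasing are ignored, and the function and key names are hypothetical.

```python
def compare_calls(rm_variants, experiment_variants):
    """Split variants into concordant and discordant sets.

    Both inputs are iterables of (chrom, pos, ref, alt) tuples. 'missed'
    holds RM variants the experiment did not report; 'extra' holds
    experiment variants absent from the RM. Together they are the
    discordant calls.
    """
    rm = set(rm_variants)
    experiment = set(experiment_variants)
    return {
        "concordant": sorted(rm & experiment),
        "missed": sorted(rm - experiment),
        "extra": sorted(experiment - rm),
    }


report = compare_calls(
    [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")],
    [("chr1", 100, "A", "G"), ("chr1", 300, "G", "A")],
)
```

Each of the three lists could be written out as its own VCF and loaded back into the browser as a user-defined track.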
  8. Compare and Reporting
  9. Realistic Approach
       •  Tell Group 3 what is needed; they provide feedback on the priority and feasibility of each request.
       •  Should extend regardless of the RM and of whether the data are WGS, WES, gene panel, etc.
