GA4GH work towards a standardized
variant comparison tool
Kevin Jacobs
1/30/2015
Simple Question
Is a variant present within a genome?
Simple Question?
Is a variant present within a genome?
• sequence at a location:
– allele
– haplotype
– genotype
• something like a
– VCF genotype
– HGVS string
– dbSNP/ClinVar/HGMD entry
Simple Question?
Is a variant present within a genome?
• Collection of variants (and reference) in
– VCF/gVCF/BCF file
– var/MasterVar file
– dbSNP/ClinVar/HGMD/etc.
– your fancy new file format
– your fancy new database
Problem Statement
Is a variant present within a genome?
Surely this must be a solved problem?
Dr. Seuss
• Sometimes the questions are complicated and the answers are simple
Is this a simple question?
• It also depends on how we define…
– variant, genome, location, genotype, present
• Can we answer this question?
– Is the location well defined?
– Did we observe reads at that location?
– Could we infer a single most-likely genotype at that location?
– Are we asking about “simple” variation in a “nice” region of the genome?
• If yes to all of these, then we can almost always answer our question correctly.
Don’t Panic!
Why is this so hard?
• Consider c.2_4delCTAinsGC
– REF: ACTAC
– H1: =G-C=
• It can also be spelled
– c.[2C>G; 3del; 4A>C]
– c.[2C>G; 3T>C; 4del]
– c.[2del; 3T>G; 4A>C]
– …
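The equivalence of these spellings can be checked mechanically: apply each edit list to REF and compare the resulting haplotypes. This is a minimal Python sketch. `apply_edits` and `SPELLINGS` are hypothetical names, and it assumes, as the slide does, that the first base of REF is c.1.

```python
REF = "ACTAC"

# Each edit is (start, end, expected_ref, alt): 1-based, inclusive coordinates,
# treating the first base of REF as c.1 as the slide does. alt == "" is a deletion.
SPELLINGS = {
    "c.2_4delCTAinsGC": [(2, 4, "CTA", "GC")],
    "c.[2C>G; 3del; 4A>C]": [(2, 2, "C", "G"), (3, 3, "T", ""), (4, 4, "A", "C")],
    "c.[2C>G; 3T>C; 4del]": [(2, 2, "C", "G"), (3, 3, "T", "C"), (4, 4, "A", "")],
    "c.[2del; 3T>G; 4A>C]": [(2, 2, "C", ""), (3, 3, "T", "G"), (4, 4, "A", "C")],
}


def apply_edits(ref: str, edits) -> str:
    """Apply non-overlapping edits to ref and return the resulting haplotype."""
    seq = ref
    # Work right-to-left so the coordinates of edits further left stay valid.
    for start, end, expected, alt in sorted(edits, reverse=True):
        if ref[start - 1:end] != expected:
            raise ValueError(f"reference mismatch at {start}-{end}")
        seq = seq[:start - 1] + alt + seq[end:]
    return seq


haplotypes = {name: apply_edits(REF, edits) for name, edits in SPELLINGS.items()}
for name, hap in haplotypes.items():
    print(f"{name:22} {hap}")  # every line ends in AGCC
```

All four spellings describe the same sequence, so comparing their strings or tuples directly would report four different variants.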
Assumptions and notation
• We have an accurate reference genome sequence
• Queries are relative to well-defined, non-ambiguous regions of the reference sequence
• Simple sequence query / assertion:
– VCF: (chrom, pos, ref, alts, geno)
• E.g. (chrZ, 55, A, G, 1/1)
– Generic: (chrom, start, stop, alleles)
• E.g. (chrZ, 54, 55, G, G)
– These representations are equivalent modulo the coordinate systems (VCF pos is 1-based; generic start/stop are 0-based, half-open) and some strange VCF encoding rules for null (empty) alleles, which VCF cannot write directly and instead pads with the preceding reference base
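Converting between the two forms comes down to shifting the coordinate and stripping the VCF padding base. The following minimal sketch assumes simple records only: no symbolic alleles and no missing (".") genotypes. `VcfCall`, `Assertion` and `vcf_to_generic` are hypothetical names, not an existing library API.

```python
from typing import NamedTuple, Tuple


class VcfCall(NamedTuple):
    chrom: str
    pos: int                # 1-based VCF POS
    ref: str
    alts: Tuple[str, ...]
    gt: str                 # e.g. "1/1" or "0|1"


class Assertion(NamedTuple):
    chrom: str
    start: int              # 0-based, inclusive
    stop: int               # 0-based, exclusive
    alleles: Tuple[str, ...]


def vcf_to_generic(call: VcfCall) -> Assertion:
    alleles = (call.ref,) + tuple(call.alts)
    start = call.pos - 1    # 1-based POS -> 0-based start
    # VCF cannot write an empty allele, so indels carry a shared leading
    # padding base. If allele lengths differ and all alleles start with the
    # same base, strip that base to recover the null allele.
    if len({len(a) for a in alleles}) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = tuple(a[1:] for a in alleles)
        start += 1
    stop = start + len(alleles[0])
    # Only numeric GT values are handled; "." (missing) is out of scope here.
    indices = call.gt.replace("|", "/").split("/")
    return Assertion(call.chrom, start, stop, tuple(alleles[int(i)] for i in indices))


print(vcf_to_generic(VcfCall("chrZ", 55, "A", ("G",), "1/1")))
# Assertion(chrom='chrZ', start=54, stop=55, alleles=('G', 'G'))
```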
Most basic model
• A “genome” G is a set of sequence assertions
• A “query” is a proposition q∈G where q is a
sequence assertion
• E.g.
– G = { (chrZ, 54, 55, G, G) }
– Q1 : (chrZ, 54, 55, A, G) ∈ G = False
– Q2 : (chrZ, 54, 55, G, G) ∈ G = True
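In code, the basic model is just a set and a membership test. This sketch uses the generic 0-based (chrom, start, stop, alleles) form, so position 55 becomes the interval 54 to 55. Sorting the alleles so that an unphased A/G equals G/A is an assumption of this sketch, not something the slide specifies.

```python
from typing import FrozenSet, Tuple

# (chrom, start, stop, alleles) with 0-based, half-open coordinates.
Assertion = Tuple[str, int, int, Tuple[str, ...]]


def make_genome(*assertions: Assertion) -> FrozenSet[Assertion]:
    # Sort alleles so an unphased genotype is order-independent (A/G == G/A).
    return frozenset((c, s, e, tuple(sorted(a))) for c, s, e, a in assertions)


def query(genome: FrozenSet[Assertion], q: Assertion) -> bool:
    c, s, e, a = q
    return (c, s, e, tuple(sorted(a))) in genome


G = make_genome(("chrZ", 54, 55, ("G", "G")))
print(query(G, ("chrZ", 54, 55, ("A", "G"))))  # False
print(query(G, ("chrZ", 54, 55, ("G", "G"))))  # True
```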
Basic model extensions
• Simple extensions
– Indels / MNVs
– Reference calls (like gVCF)
– No calls, partial calls
– Arbitrary ploidy
– Phase, quality, filters, etc. (not shown)
G = {(chrZ, 0, 24, =, =), (chrZ, 24, 25, G, G),
(chrZ, 25, 53, =, =), (chrZ, 53, 55, NN, NN),
(chrZ, 55, 88, =, =), (chrZ, 88, 92, ATAT, NNNN),
(chrZ, 92, 96, =, =), (chrZ, 96, 98, A, ☐),
(chrZ, 98, 100, =, =)}
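With reference calls included, the assertions should account for every base of the region. The following sketch checks that property for the example G. It assumes the contig is 100 bp long, inferred from the last block, and `tiling_problems` is a hypothetical helper.

```python
# The extended example genome; "=" means "same as reference", "" is a
# deleted (empty) allele, and N marks a no-call base.
G = [
    ("chrZ", 0, 24, ("=", "=")), ("chrZ", 24, 25, ("G", "G")),
    ("chrZ", 25, 53, ("=", "=")), ("chrZ", 53, 55, ("NN", "NN")),
    ("chrZ", 55, 88, ("=", "=")), ("chrZ", 88, 92, ("ATAT", "NNNN")),
    ("chrZ", 92, 96, ("=", "=")), ("chrZ", 96, 98, ("A", "")),
    ("chrZ", 98, 100, ("=", "=")),
]


def tiling_problems(genome, chrom, length):
    """List gaps and overlaps; an empty list means every base is accounted for."""
    problems = []
    covered_to = 0
    for _, start, stop, _ in sorted(b for b in genome if b[0] == chrom):
        if start > covered_to:
            problems.append(f"gap {covered_to}-{start}")
        elif start < covered_to:
            problems.append(f"overlap {start}-{min(stop, covered_to)}")
        covered_to = max(covered_to, stop)
    if covered_to < length:
        problems.append(f"gap {covered_to}-{length}")
    return problems


print(tiling_problems(G, "chrZ", 100))  # []
```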
Limitations of the basic model
• Sequence assertions do not have unique representations
– Alignments are not unique
– Alignment models differ
– Nearby variants / phase information
– Missing data and uncertainty
• Sometimes we aren’t asking the right question
Alignments are not unique
• Precedence of insertions, deletions and mismatches:
– REF: ACAC
– H1: =-G= (AGC)
– H2: =G-= (AGC)
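Both alignment rows can be spelled out column by column to confirm that they describe the same haplotype. This is a minimal sketch with a hypothetical `apply_alignment` helper. It uses the slide's row notation: '=' copies the reference base, '-' deletes it, and any other symbol substitutes it. A gap in the reference row marks an insertion column.

```python
def apply_alignment(ref_row: str, hap_row: str) -> str:
    """Spell the haplotype encoded by an alignment row against a reference row."""
    if len(ref_row) != len(hap_row):
        raise ValueError("rows must have equal length")
    out = []
    for r, h in zip(ref_row, hap_row):
        if h == "=":
            if r == "-":
                raise ValueError("'=' cannot match a gap column")
            out.append(r)      # match: copy the reference base
        elif h != "-":
            out.append(h)      # substitution, or insertion when r == "-"
        # h == "-": deletion, emit nothing
    return "".join(out)


print(apply_alignment("ACAC", "=-G="))  # AGC
print(apply_alignment("ACAC", "=G-="))  # AGC
```

The same helper also accepts padded reference rows such as A--CAC, which appear on the alignment-model slide.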
Limitations of the basic model
• Sequence assertions do not have unique representations
– REF: TCACACACAG
– H1: T--CACACAG (REF, 1, 3, ☐)
– H2: TC--ACACAG (REF, 2, 4, ☐)
– H3: TCA--CACAG (REF, 3, 5, ☐)
– H4: TCAC--ACAG (REF, 4, 6, ☐)
– H5: TCACA--CAG (REF, 5, 7, ☐)
– H6: TCACAC--AG (REF, 6, 8, ☐)
– H7: TCACACA--G (REF, 7, 9, ☐)
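The seven placements can be enumerated directly. The sketch tries every 2 bp deletion (0-based, half-open) and keeps the ones that produce H1's haplotype. The leftmost survivor is the one VCF-style left-normalization picks. The rightmost is the one the HGVS 3' rule picks.

```python
REF = "TCACACACAG"


def delete(ref: str, start: int, stop: int) -> str:
    """Remove ref[start:stop] (0-based, half-open)."""
    return ref[:start] + ref[stop:]


target = delete(REF, 1, 3)  # H1
equivalent = [(s, s + 2) for s in range(len(REF) - 1) if delete(REF, s, s + 2) == target]

print(target)      # TCACACAG
print(equivalent)  # [(1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8), (7, 9)]
print(min(equivalent), max(equivalent))  # (1, 3) (7, 9)
```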
Alignment models differ
• Different alignment scoring:
– REF: A--CAC
– H1: =GG--= (REF, 1, 1, ☐, GG)
(REF, 1, 3, CA, ☐)
– H2: =--GG= (REF, 1, 3, CA, GG)
• Base-quality-aware alignment algorithms are even more susceptible to non-unique alignments
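A naive set comparison treats H1 and H2 as different calls, even though both spell the same sequence. This sketch applies the slide's (start, stop, ref, alt) tuples, 0-based and half-open on REF = ACAC (the padded row A--CAC without its gap columns), to show the mismatch. `apply_variants` is a hypothetical helper.

```python
REF = "ACAC"  # the padded row A--CAC with gap columns removed


def apply_variants(ref, variants):
    """Apply non-overlapping (start, stop, ref_allele, alt) edits, 0-based half-open."""
    seq = ref
    # Right-to-left; at equal starts the deletion is applied before the
    # zero-length insertion, which then lands at the left edge of the deleted span.
    for start, stop, ref_allele, alt in sorted(variants, reverse=True):
        if ref[start:stop] != ref_allele:
            raise ValueError(f"reference mismatch at {start}-{stop}")
        seq = seq[:start] + alt + seq[stop:]
    return seq


h1_calls = {(1, 1, "", "GG"), (1, 3, "CA", "")}  # insertion + deletion
h2_calls = {(1, 3, "CA", "GG")}                  # one multi-nucleotide variant

print(h1_calls == h2_calls)  # False: different assertions
print(apply_variants(REF, h1_calls), apply_variants(REF, h2_calls))  # AGGC AGGC
```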
Ignoring phase or phase uncertainty introduces ambiguity
– REF: ACGT
– H1: =A== (REF, 1, 2, C, A)
– H2: ==C= (REF, 2, 3, G, C)
• Vs
– REF: ACGT
– H1: =AC= (REF, 1, 2, C, A)
(REF, 2, 3, G, C)
– H2: ====
• Vs
– REF: ACGT
– H1: =AC= (REF, 1, 3, CG, AC)
– H2: ====
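The three cases can be compared in two ways: as unphased variant sets and as the pair of haplotype sequences the sample actually carries. In this sketch the genotype is modeled as a sorted tuple of haplotype strings and the unphased calls as a frozenset of tuples. Both modeling choices are assumptions for illustration, and the helper names are hypothetical.

```python
REF = "ACGT"


def apply_variants(ref, variants):
    """Apply non-overlapping (start, stop, ref_allele, alt) edits, 0-based half-open."""
    seq = ref
    for start, stop, ref_allele, alt in sorted(variants, reverse=True):
        if ref[start:stop] != ref_allele:
            raise ValueError(f"reference mismatch at {start}-{stop}")
        seq = seq[:start] + alt + seq[stop:]
    return seq


def genotype(ref, *haplotypes):
    """Unordered pair of haplotype sequences (what the sample actually carries)."""
    return tuple(sorted(apply_variants(ref, h) for h in haplotypes))


def unphased_calls(*haplotypes):
    """The variant set you get when phase is thrown away."""
    return frozenset(v for h in haplotypes for v in h)


snv1 = (1, 2, "C", "A")
snv2 = (2, 3, "G", "C")
mnv = (1, 3, "CG", "AC")

case1 = ([snv1], [snv2])    # SNVs on different haplotypes
case2 = ([snv1, snv2], [])  # SNVs on the same haplotype
case3 = ([mnv], [])         # same haplotype, written as one MNV

# case1 and case2 share a variant set but differ as genotypes;
# case2 and case3 differ as variant sets but share a genotype.
for case in (case1, case2, case3):
    print(genotype(REF, *case), sorted(unphased_calls(*case)))
```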
Missing data
G = {(chrZ, 0, 24, =, =), (chrZ, 24, 25, G, G),
(chrZ, 25, 53, =, =), (chrZ, 53, 55, NN, NN),
(chrZ, 55, 88, =, =), (chrZ, 88, 92, ATAT, NNNN),
(chrZ, 92, 96, =, =), (chrZ, 96, 98, A, ☐),
(chrZ, 98, 100, =, =)}
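• Q: (chrZ, 54, 55, A, T) ∈ G → False, even though the overlapping interval (chrZ, 53, 55) is a no-call (NN, NN), so the honest answer is “unknown”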
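One way to avoid answering False for a no-call is to make the query three-valued: True, False, or None for unknown. This is a minimal sketch under two assumptions. Any N in an overlapping block makes the answer unknown, and only exact-interval matches are tested, so queries that fall inside a larger reference block are not resolved. The `query` function is hypothetical.

```python
from typing import Optional

G = [
    ("chrZ", 0, 24, ("=", "=")), ("chrZ", 24, 25, ("G", "G")),
    ("chrZ", 25, 53, ("=", "=")), ("chrZ", 53, 55, ("NN", "NN")),
    ("chrZ", 55, 88, ("=", "=")), ("chrZ", 88, 92, ("ATAT", "NNNN")),
    ("chrZ", 92, 96, ("=", "=")), ("chrZ", 96, 98, ("A", "")),
    ("chrZ", 98, 100, ("=", "=")),
]


def query(genome, chrom, start, stop, alleles) -> Optional[bool]:
    """True/False when the genome can answer, None when a no-call overlaps."""
    overlapping = [b for b in genome if b[0] == chrom and b[1] < stop and start < b[2]]
    if any("N" in allele for b in overlapping for allele in b[3]):
        return None  # missing data: neither True nor False is justified
    wanted = tuple(sorted(alleles))
    return any(
        c == chrom and (s, e, tuple(sorted(a))) == (start, stop, wanted)
        for c, s, e, a in genome
    )


print(query(G, "chrZ", 54, 55, ("A", "T")))  # None
print(query(G, "chrZ", 24, 25, ("G", "G")))  # True
```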
Multiple alleles/samples
• Remember our friend:
– REF: TCACACACAG
– H1: T--CACACAG (REF, 1, 3, ☐)
– H2: TCACACA--G (REF, 7, 9, ☐)
Multiple alleles/samples
• What will left-normalizing H2 look like in VCF?
– REF: TCACACACAG
– H2: TCACACA--G (REF, 7, 9, ☐)
– H3: TCACACACTG (REF, 8, 9, T)
– H4: TCACTCACAG (REF, 4, 5, T)
– H5: TTACACACAG (REF, 1, 2, T)
– H1: T--CACACAG (REF, 1, 3, ☐)
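To put these alleles in one multi-allelic VCF record, every allele must be re-spelled over a shared span that includes the padding base. The following sketch does that. It assumes no indel sits at the start of the contig, and `merge_record` is a hypothetical helper. Left-normalizing H2 stretches the shared span across nearly the whole repeat. The slide's original right-hand placement merges with H3 into a much shorter record.

```python
REF = "TCACACACAG"


def merge_record(ref, alleles):
    """Spell every (start, stop, alt) allele over one shared span, VCF-style.

    Coordinates are 0-based, half-open; returns (1-based POS, REF, [ALT, ...]).
    Assumes no indel sits at the very start of the contig."""
    start = min(s for s, _, _ in alleles)
    stop = max(e for _, e, _ in alleles)
    if any(e - s != len(alt) for s, e, alt in alleles):
        start -= 1  # indels need a leading padding base in VCF
    ref_allele = ref[start:stop]
    alts = [ref[start:s] + alt + ref[e:stop] for s, e, alt in alleles]
    return start + 1, ref_allele, alts


H2_LEFT = (1, 3, "")   # H2 after left-normalization (identical to H1)
H2_RIGHT = (7, 9, "")  # H2 as written on the slide
H3 = (8, 9, "T")
H4 = (4, 5, "T")
H5 = (1, 2, "T")

print(merge_record(REF, [H2_LEFT, H3, H4, H5]))
# (1, 'TCACACACA', ['TCACACA', 'TCACACACT', 'TCACTCACA', 'TTACACACA'])
print(merge_record(REF, [H2_RIGHT, H3]))
# (7, 'ACA', ['A', 'ACT'])
```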
Bottom Line
• Is there a canonical form for sequence assertions?
– If so, then we can normalize our data into that form and rely on simple set-existential queries
– If not, then we need a better model
– In the meantime, we rely on heuristics to perform comparisons and understand that they are imperfect
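A common heuristic of this kind is per-allele normalization followed by set comparison. The sketch below implements the truncate / extend-left loop popularized by `vt normalize`, written here from scratch. It is not the GA4GH engine. It collapses all seven placements of the CA deletion to a single record. It does not reconcile the insertion-plus-deletion and MNV spellings of AGGC. That failure case is why a better model is needed.

```python
REF = "TCACACACAG"


def normalize(ref, start, stop, alt):
    """Left-align and trim one allele given as 0-based, half-open (start, stop, alt).

    Returns a VCF-style padded (start, stop, ref_allele, alt). At the contig start
    no padding base is available, so an empty allele may remain there."""
    ref_allele = ref[start:stop]
    if ref_allele == alt:
        raise ValueError("not a variant")
    while True:
        changed = False
        # Both alleles end in the same base: drop it.
        if ref_allele and alt and ref_allele[-1] == alt[-1]:
            ref_allele, alt, stop = ref_allele[:-1], alt[:-1], stop - 1
            changed = True
        # An allele became empty: extend both one base to the left.
        if (not ref_allele or not alt) and start > 0:
            start -= 1
            ref_allele, alt = ref[start] + ref_allele, ref[start] + alt
            changed = True
        if not changed:
            break
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref_allele) >= 2 and len(alt) >= 2 and ref_allele[0] == alt[0]:
        ref_allele, alt, start = ref_allele[1:], alt[1:], start + 1
    return start, stop, ref_allele, alt


print({normalize(REF, s, s + 2, "") for s in range(1, 8)})  # {(0, 3, 'TCA', 'T')}

ins_del = {normalize("ACAC", 1, 1, "GG"), normalize("ACAC", 1, 3, "")}
mnv = {normalize("ACAC", 1, 3, "GG")}
print(ins_del == mnv)  # False: same haplotype, still different assertions
```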
Better models
• Two basic approaches
1. Standardize alignment and representations so that we can always derive a unique canonical representation
2. Make the comparison model “spelling agnostic”
Reference graph model
• Convert (g)VCF and other file formats into a graph representation
• Compute whether the graph can “generate” the query haplotype or genotype
– Supporting multiple forms of ambiguity that are inherent in the biological questions we ask.
[Figure: Phase constraint]
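The “can the graph generate this haplotype?” test can be sketched as a search over a small variation graph. Nodes are reference positions, reference edges spell one base, and each variant adds an edge from start to stop spelled by its alt. This is a hypothetical illustration, not the planned GA4GH engine. It ignores phase, so any combination of variant edges is allowed, and restricting that is exactly the job of a phase constraint. Either spelling of c.2_4delCTAinsGC generates AGCC.

```python
from collections import deque


def build_graph(ref, variants):
    """Map each position to outgoing (target position, label) edges.

    Reference edges spell one base; a (start, stop, alt) variant adds an edge
    from start to stop spelling alt (0-based, half-open)."""
    edges = {i: [(i + 1, ref[i])] for i in range(len(ref))}
    edges[len(ref)] = []
    for start, stop, alt in variants:
        edges[start].append((stop, alt))
    return edges


def can_generate(ref, variants, query):
    """True if some path from position 0 to len(ref) spells exactly query."""
    edges = build_graph(ref, variants)
    end = len(ref)
    seen = {(0, 0)}
    todo = deque([(0, 0)])  # (graph position, characters of query consumed)
    while todo:
        node, k = todo.popleft()
        if node == end and k == len(query):
            return True
        for target, label in edges[node]:
            if query.startswith(label, k):
                state = (target, k + len(label))
                if state not in seen:
                    seen.add(state)
                    todo.append(state)
    return False


spelled_a = [(1, 2, "G"), (2, 3, ""), (3, 4, "C")]  # c.[2C>G; 3del; 4A>C]
spelled_b = [(1, 4, "GC")]                          # c.2_4delCTAinsGC
print(can_generate("ACTAC", spelled_a, "AGCC"), can_generate("ACTAC", spelled_b, "AGCC"))
```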
Related Problems
• What are all of the differences between two genomes?
• Collect all alleles observed across multiple genomes
• Merge genomes into a single coherent representation
• Efficiently store and query a large number of genomes
Implementation plan
• Build a reference implementation
– Open source, free, and hosted by GA4GH
– Built in Python + Cython
– Include an extensive test suite
• Not inventing any new file formats
• Implementation underway
– VCF processor built on htslib
– Rest of the engine in progress
– Accounting and testing coming soon after
Thanks to:
• Justin Zook and the other GIAB organizers
• Geneticists, who have been doing this right all along
• Complete Genomics for their calldiff algorithm
• Great discussions and debates with friends and colleagues at NCI, NCBI, Invitae, 23andMe, 1000 Genomes, GA4GH, etc.

Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Gwalior Just Call 9907093804 Top Class Call Girl Service Available
 
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
 
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟ 9332606886 ⟟ Call Me For G...
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟  9332606886 ⟟ Call Me For G...Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟  9332606886 ⟟ Call Me For G...
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟ 9332606886 ⟟ Call Me For G...
 
Top Rated Bangalore Call Girls Richmond Circle ⟟ 9332606886 ⟟ Call Me For Ge...
Top Rated Bangalore Call Girls Richmond Circle ⟟  9332606886 ⟟ Call Me For Ge...Top Rated Bangalore Call Girls Richmond Circle ⟟  9332606886 ⟟ Call Me For Ge...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 9332606886 ⟟ Call Me For Ge...
 
Call Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service Available
 

Jan 2015 GA4GH variant comparison

  • 1. GA4GH work towards a standardized variant comparison tool Kevin Jacobs 1/30/2015
  • 2. Simple Question Is a variant present within a genome?
  • 3. Simple Question? Is a variant present within a genome? • sequence at a location: – allele – haplotype – genotype • something like a – VCF genotype – HGVS string – dbSNP/ClinVar/HGMD entry
  • 4. Simple Question? Is a variant present within a genome? • Collection of variants (and reference) in – VCF/gVCF/BCF file – var/MasterVar file – dbSNP/ClinVar/HGMD/etc. – your fancy new file format – your fancy new database
  • 5. Problem Statement Is a variant present within a genome? Surely this must be a solved problem?
  • 6. Dr. Seuss • Sometimes the questions are complicated and the answers are simple
  • 7. Is this a simple question?
    • It also depends on how we define…
      – variant, genome, location, genotype, present
    • Can we answer this question?
      – Is the location well defined?
      – Did we observe reads at that location?
      – Could we infer a single most-likely genotype at that location?
      – Are we asking about “simple” variation in a “nice” region of the genome?
    • If yes to all of these, then we can almost always answer our question correctly.
  • 9. Why is this so hard?
    • Consider c.2_4delCTAinsGC
      – REF: ACTAC
      – H1:  =G-C=
    • It can also be spelled
      – c.[2C>G; 3del; 4A>C]
      – c.[2C>G; 3T>C; 4del]
      – c.[2del; 3T>G; 4A>C]
      – …
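One quick check that these spellings agree is to apply each one to REF and compare the resulting sequences. The sketch below uses hypothetical helpers (`parse_edit`, `apply_edits`) that are not part of any GA4GH tool. They understand only the few HGVS-like tokens on this slide: a substitution, a single-base deletion, and a range delins. Positions are treated as 1-based offsets into the REF string, not real transcript coordinates, and each bracketed c.[…] spelling is written as a list of tokens. The code requires Python 3.8+ because it uses `:=`.

```python
import re

def parse_edit(token):
    """Parse one illustrative HGVS-like token into (start, stop, alt, expected_ref).

    Coordinates are 0-based, half-open offsets into the REF string;
    expected_ref is None when the token does not name the deleted bases.
    """
    if m := re.fullmatch(r"(\d+)_(\d+)del([ACGT]*)ins([ACGT]+)", token):
        return int(m[1]) - 1, int(m[2]), m[4], m[3] or None
    if m := re.fullmatch(r"(\d+)([ACGT])>([ACGT])", token):
        return int(m[1]) - 1, int(m[1]), m[3], m[2]
    if m := re.fullmatch(r"(\d+)del([ACGT]*)", token):
        return int(m[1]) - 1, int(m[1]), "", m[2] or None
    raise ValueError(f"unsupported token: {token!r}")

def apply_edits(ref, tokens):
    """Apply non-overlapping edits right to left so earlier offsets stay valid."""
    seq = ref
    for start, stop, alt, expected in sorted(map(parse_edit, tokens),
                                             key=lambda e: e[0], reverse=True):
        # Check the stated reference bases before replacing them.
        if expected is not None and seq[start:stop] != expected:
            raise ValueError(f"REF mismatch at position {start + 1}")
        seq = seq[:start] + alt + seq[stop:]
    return seq

REF = "ACTAC"
spellings = [
    ["2_4delCTAinsGC"],
    ["2C>G", "3del", "4A>C"],
    ["2C>G", "3T>C", "4del"],
    ["2del", "3T>G", "4A>C"],
]
print({apply_edits(REF, s) for s in spellings})  # {'AGCC'}
```

All four spellings collapse to the same haplotype, AGCC. Comparing the records themselves would report four different variants.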
  • 10. Assumptions and notation
    • We have an accurate reference genome sequence
    • Queries are relative to well-defined non-ambiguous regions of the reference sequence
    • Simple sequence query / assertion:
      – VCF: (chrom, pos, ref, alts, geno)
        • E.g. (chrZ, 55, A, G, 1/1)
      – Generic: (chrom, start, stop, alleles)
        • E.g. (chrZ, 54, 55, G, G)
      – These representations are equivalent modulo some strange encoding rules for VCF relating to null alleles
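Here is a minimal sketch of the VCF-to-generic conversion. It assumes a 1-based VCF POS, plain A/C/G/T alleles (no symbolic `<DEL>` or `*` alleles), and the usual shared leading padding base on indel records. VCF needs that padding base because it cannot write an empty allele, and the sketch drops it. The name `vcf_to_generic` is illustrative. Missing calls (`.`) become `None`, and the empty allele ☐ becomes `""`.

```python
import re

def vcf_to_generic(chrom, pos, ref, alts, geno):
    """Convert a VCF-style call to a generic (chrom, start, stop, *alleles) tuple.

    Output coordinates are 0-based, half-open; a missing call ('.') becomes None.
    """
    start = pos - 1
    stop = start + len(ref)
    alleles = [ref, *alts]
    # Drop the VCF padding base when allele lengths differ and all share it.
    if len({len(a) for a in alleles}) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        start += 1
    calls = re.split(r"[/|]", geno)  # works for unphased '/' and phased '|'
    return (chrom, start, stop, *(None if c == "." else alleles[int(c)] for c in calls))

print(vcf_to_generic("chrZ", 55, "A", ["G"], "1/1"))   # ('chrZ', 54, 55, 'G', 'G')
print(vcf_to_generic("chrZ", 2, "TCA", ["T"], "0/1"))  # ('chrZ', 2, 4, 'CA', '')
```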
  • 11. Most basic model • A “genome” G is a set of sequence assertions • A “query” is a proposition q∈G where q is a sequence assertion • E.g. – G = { (chrZ, 55, G, G) } – Q1 : (chrZ, 55, A, G) ∈ G = False – Q2 : (chrZ, 55, G, G) ∈ G = True
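In code, the basic model is just a set of hashable tuples, and a query is Python's `in`. The helper `assertion` is hypothetical. It sorts the alleles so that the unphased calls A/G and G/A compare equal, which is an added assumption the slide does not state.

```python
def assertion(chrom, pos, *alleles):
    """Hypothetical helper: build a hashable sequence assertion.

    Alleles are sorted so unphased A/G and G/A compare equal (an assumption).
    """
    return (chrom, pos, *sorted(alleles))

G = {assertion("chrZ", 55, "G", "G")}
Q1 = assertion("chrZ", 55, "A", "G")
Q2 = assertion("chrZ", 55, "G", "G")
print(Q1 in G, Q2 in G)  # False True
```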
  • 12. Basic model extensions
    • Simple extensions
      – Indels / MNVs
      – Reference calls (like gVCF)
      – No calls, partial calls
      – Arbitrary ploidy
      – Phase, quality, filters, etc. (not shown)
    G = {(chrZ, 0, 24, =, =),
         (chrZ, 24, 25, G, G),
         (chrZ, 25, 53, =, =),
         (chrZ, 53, 55, NN, NN),
         (chrZ, 55, 88, =, =),
         (chrZ, 88, 92, ATAT, NNNN),
         (chrZ, 92, 96, =, =),
         (chrZ, 96, 98, A, ☐),
         (chrZ, 98, 100, =, =)}
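One possible in-memory form of this G is a list of interval records. The sketch assumes that `=` means "same as reference", that `N` marks an uncalled base, and that `""` stands in for the empty allele ☐. The classification labels and the helper names `kind` and `tiles` are illustrative. Because alleles are read as `record[3:]`, the same code works for any ploidy.

```python
G = [
    ("chrZ", 0, 24, "=", "="),
    ("chrZ", 24, 25, "G", "G"),
    ("chrZ", 25, 53, "=", "="),
    ("chrZ", 53, 55, "NN", "NN"),
    ("chrZ", 55, 88, "=", "="),
    ("chrZ", 88, 92, "ATAT", "NNNN"),
    ("chrZ", 92, 96, "=", "="),
    ("chrZ", 96, 98, "A", ""),    # "" stands in for the empty allele ☐
    ("chrZ", 98, 100, "=", "="),
]

def kind(record):
    """Classify one record as reference, variant, no-call, or partial call."""
    alleles = record[3:]
    if all(a == "=" for a in alleles):
        return "reference"
    no_call = [bool(a) and set(a) == {"N"} for a in alleles]
    if all(no_call):
        return "no-call"
    if any(no_call):
        return "partial"
    return "variant"

def tiles(records):
    """True if consecutive records abut exactly, with no gaps or overlaps."""
    return all(a[2] == b[1] for a, b in zip(records, records[1:]))

for r in G:
    print(r[1], r[2], kind(r))
print(tiles(G))  # True
```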
  • 13. Limitations of the basic model
    • Sequence assertions do not have unique representations
      – Alignments are not unique
      – Alignment models differ
      – Nearby variants / phase information
      – Missing data and uncertainty
    • Sometimes we aren’t asking the right question
  • 14. Alignments are not unique
    • Precedence of insertions, deletions and mismatches:
      – REF: ACAC
      – H1:  =-G= (AGC)
      – H2:  =G-= (AGC)
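A standard dynamic program can count how many equally good alignments exist. This sketch assumes plain unit costs: 0 for a match and 1 for a mismatch, insertion, or deletion. Real aligners use other scores and then pick one of the tied alignments with some tie-breaking rule, which is one way different tools end up with different spellings. The function name is illustrative.

```python
def count_optimal_alignments(ref, hap):
    """Return (unit-cost edit distance, number of distinct minimum-cost alignments).

    Levenshtein-style DP; ways[i][j] counts optimal paths reaching cell (i, j).
    """
    n, m = len(ref), len(hap)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            moves = []
            if i > 0:                  # deletion of ref[i-1]
                moves.append((cost[i - 1][j] + 1, ways[i - 1][j]))
            if j > 0:                  # insertion of hap[j-1]
                moves.append((cost[i][j - 1] + 1, ways[i][j - 1]))
            if i > 0 and j > 0:        # match (0) or mismatch (1)
                step = int(ref[i - 1] != hap[j - 1])
                moves.append((cost[i - 1][j - 1] + step, ways[i - 1][j - 1]))
            best = min(c for c, _ in moves)
            cost[i][j] = best
            ways[i][j] = sum(w for c, w in moves if c == best)
    return cost[n][m], ways[n][m]

print(count_optimal_alignments("ACAC", "AGC"))              # (2, 2)
print(count_optimal_alignments("TCACACACAG", "TCACACAG"))  # (2, 7)
```

The first result matches H1 and H2 above. The second matches the seven equivalent deletions on the next slide.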
  • 15. Limitations of the basic model
    • Sequence assertions do not have unique representations
      – REF: TCACACACAG
      – H1:  T--CACACAG (REF, 1, 3, ☐)
      – H2:  TC--ACACAG (REF, 2, 4, ☐)
      – H3:  TCA--CACAG (REF, 3, 5, ☐)
      – H4:  TCAC--ACAG (REF, 4, 6, ☐)
      – H5:  TCACA--CAG (REF, 5, 7, ☐)
      – H6:  TCACAC--AG (REF, 6, 8, ☐)
      – H7:  TCACACA--G (REF, 7, 9, ☐)
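Left-normalization is the usual heuristic for picking one of these equivalent positions. The sketch below implements the common left-shift idea. It first trims bases shared by REF and ALT, then rotates a pure indel leftward while the preceding reference base equals the indel's last base. It is not taken from any specific normalizer, and it leaves substitutions and complex MNVs untouched. The name `left_normalize` is illustrative.

```python
def left_normalize(ref, start, stop, alt):
    """Trim shared bases, then shift a pure insertion/deletion as far left as it goes.

    Coordinates are 0-based, half-open on `ref`; `alt` replaces ref[start:stop]
    ("" = deletion, start == stop = insertion).
    """
    ref_allele = ref[start:stop]
    # Trim a shared suffix, then a shared prefix (e.g. a VCF padding base).
    while ref_allele and alt and ref_allele[-1] == alt[-1]:
        ref_allele, alt, stop = ref_allele[:-1], alt[:-1], stop - 1
    while ref_allele and alt and ref_allele[0] == alt[0]:
        ref_allele, alt, start = ref_allele[1:], alt[1:], start + 1
    if ref_allele and alt:
        return start, stop, alt        # substitution or MNV: nothing to shift
    seq = ref_allele or alt            # the deleted or inserted bases
    # Rotate left while the base before the indel equals its last base.
    while seq and start > 0 and ref[start - 1] == seq[-1]:
        seq = ref[start - 1] + seq[:-1]
        start, stop = start - 1, stop - 1
    return start, stop, "" if ref_allele else seq

REF = "TCACACACAG"
print({left_normalize(REF, s, s + 2, "") for s in range(1, 8)})  # {(1, 3, '')}
```

All seven deletions collapse to (1, 3), which is H1. Shifting right instead would pick (7, 9).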
  • 16. Alignment models differ
    • Different alignment scoring:
      – REF: A--CAC
      – H1:  =GG--= (REF, 1, 1, ☐, GG) (REF, 1, 3, CA, ☐)
      – H2:  =--GG= (REF, 1, 3, CA, GG)
    • Base-quality-aware alignment algorithms are even more susceptible to non-unique alignments
  • 17. Ignoring phase or phase uncertainty introduces ambiguity
      – REF: ACGT
      – H1:  =A== (REF, 1, 2, C, A)
      – H2:  ==C= (REF, 2, 3, G, C)
    • Vs
      – REF: ACGT
      – H1:  =AC= (REF, 1, 2, C, A) (REF, 2, 3, G, C)
      – H2:  ====
    • Vs
      – REF: ACGT
      – H1:  =AC= (REF, 1, 3, CG, AC)
      – H2:  ====
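The ambiguity becomes concrete if we enumerate every haplotype pair that a set of unphased heterozygous calls allows. This sketch assumes the calls are non-overlapping `(start, stop, alt)` edits in 0-based, half-open coordinates. The names `apply` and `phasings` are illustrative.

```python
from itertools import product

def apply(ref, variants):
    """Apply non-overlapping (start, stop, alt) edits, right to left."""
    seq = ref
    for start, stop, alt in sorted(variants, key=lambda v: (v[0], v[1]), reverse=True):
        seq = seq[:start] + alt + seq[stop:]
    return seq

def phasings(ref, het_variants):
    """All distinct unordered haplotype pairs consistent with unphased het calls."""
    pairs = set()
    for mask in product((False, True), repeat=len(het_variants)):
        h1 = apply(ref, [v for v, on in zip(het_variants, mask) if on])
        h2 = apply(ref, [v for v, on in zip(het_variants, mask) if not on])
        pairs.add(tuple(sorted((h1, h2))))
    return pairs

print(sorted(phasings("ACGT", [(1, 2, "A"), (2, 3, "C")])))
# [('AACT', 'ACGT'), ('AAGT', 'ACCT')]
print(apply("ACGT", [(1, 3, "AC")]))  # AACT
```

The two unphased calls admit both the trans pair from the first panel and the cis pair. The cis pair is the one the single MNV (REF, 1, 3, CG, AC) spells. An unphased comparison therefore cannot tell these panels apart.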
  • 18. Missing data
    G = {(chrZ, 0, 24, =, =),
         (chrZ, 24, 25, G, G),
         (chrZ, 25, 53, =, =),
         (chrZ, 53, 55, NN, NN),
         (chrZ, 55, 88, =, =),
         (chrZ, 88, 92, ATAT, NNNN),
         (chrZ, 92, 96, =, =),
         (chrZ, 96, 98, A, ☐),
         (chrZ, 98, 100, =, =)}
    • Q: (chrZ, 54, 55, A, T) ∈ G = False
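The set model answers False here even though positions 53-55 were never called. The sketch below is a three-valued query that returns `None` (unknown) whenever the query overlaps an uncalled base. Both the rule and the name `query` are assumptions for illustration. Reference blocks (`=`) are not expanded against the reference sequence, so a query for a specific reference base inside one still comes back False.

```python
G = [
    ("chrZ", 0, 24, "=", "="), ("chrZ", 24, 25, "G", "G"),
    ("chrZ", 25, 53, "=", "="), ("chrZ", 53, 55, "NN", "NN"),
    ("chrZ", 55, 88, "=", "="), ("chrZ", 88, 92, "ATAT", "NNNN"),
    ("chrZ", 92, 96, "=", "="), ("chrZ", 96, 98, "A", ""),
    ("chrZ", 98, 100, "=", "="),
]

def query(genome, q):
    """True, False, or None (unknown) for q = (chrom, start, stop, *alleles).

    Coordinates are 0-based, half-open; any overlapping 'N' makes the answer unknown.
    """
    chrom, start, stop, *alleles = q
    hits = [r for r in genome if r[0] == chrom and r[1] < stop and start < r[2]]
    if not hits or any("N" in a for r in hits for a in r[3:]):
        return None
    return any(r[1:3] == (start, stop) and sorted(r[3:]) == sorted(alleles)
               for r in hits)

print(query(G, ("chrZ", 54, 55, "A", "T")))  # None
print(query(G, ("chrZ", 24, 25, "G", "G")))  # True
print(query(G, ("chrZ", 24, 25, "A", "G")))  # False
```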
  • 19. Multiple alleles/samples
    • Remember our friend:
      – REF: TCACACACAG
      – H1:  T--CACACAG (REF, 1, 3, ☐)
      – H2:  TCACACA--G (REF, 7, 9, ☐)
  • 20. Multiple alleles/samples
    • What will left-normalizing H2 look like in VCF?
      – REF: TCACACACAG
      – H2:  TCACACA--G (REF, 7, 9, ☐)
      – H3:  TCACACACTG (REF, 8, 9, T)
      – H4:  TCACTCACAG (REF, 4, 5, T)
      – H5:  TTACACACAG (REF, 1, 2, T)
      – H1:  T--CACACAG (REF, 1, 3, ☐)
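To see what happens in VCF, we can merge overlapping alleles into one multi-allelic record. Each ALT is widened to a common span, and the VCF padding base is prepended when the allele lengths differ. The function `merge_to_vcf` is a hypothetical sketch of that bookkeeping, not a description of any particular normalizer. It does not deduplicate identical ALTs, and it assumes the merged span does not start at position 0.

```python
def merge_to_vcf(ref, alleles):
    """Express overlapping (start, stop, alt) alleles as one VCF site (POS, REF, ALTs).

    Coordinates are 0-based, half-open on `ref`; POS in the result is 1-based.
    """
    lo = min(start for start, _, _ in alleles)
    hi = max(stop for _, stop, _ in alleles)
    ref_allele = ref[lo:hi]
    # Widen every ALT to the shared span by copying flanking reference bases.
    alts = [ref[lo:start] + alt + ref[stop:hi] for start, stop, alt in alleles]
    if any(len(a) != len(ref_allele) for a in alts):
        lo -= 1                          # prepend the VCF padding base
        ref_allele = ref[lo] + ref_allele
        alts = [ref[lo] + a for a in alts]
    return lo + 1, ref_allele, alts

REF = "TCACACACAG"
print(merge_to_vcf(REF, [(7, 9, ""), (8, 9, "T")]))  # (7, 'ACA', ['A', 'ACT'])
print(merge_to_vcf(REF, [(1, 3, ""), (1, 2, "T")]))  # (1, 'TCA', ['T', 'TTA'])
```

As written, H2 shares a record with H3 at POS 7. Once left-normalized to (1, 3), it becomes identical to H1 and overlaps H5 instead. The same deletion therefore lands in a different multi-allelic record, next to different neighbours.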
  • 21. Bottom Line
    • Is there a canonical form for sequence assertions?
      – If so, then we can normalize our data into that form and rely on simple set-existential queries
      – If not, then we need a better model
      – In the meantime, we rely on heuristics to perform comparisons and understand that they are imperfect
  • 22. Better models • Two basic approaches 1. Standardize alignment and representations so that we can always derive a unique canonical representation 2. Make the comparison model “spelling agnostic”
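Approach 2 can be sketched by spelling each haplotype out against the reference and comparing sequences instead of variant records. The sketch assumes each call set is a list of haplotypes, each a list of non-overlapping `(start, stop, alt)` edits in 0-based, half-open coordinates within the given reference window. It also ignores the order in which the haplotypes are listed. The names `apply` and `same_genotype` are illustrative.

```python
def apply(ref, variants):
    """Apply non-overlapping (start, stop, alt) edits to `ref`.

    Sorting by (start, stop) descending applies right-most edits first, and a
    deletion before an insertion that shares its start, so offsets stay valid.
    """
    seq = ref
    for start, stop, alt in sorted(variants, key=lambda v: (v[0], v[1]), reverse=True):
        seq = seq[:start] + alt + seq[stop:]
    return seq

def same_genotype(ref, calls_a, calls_b):
    """Spelling-agnostic comparison of two call sets over the same window."""
    def spell(calls):
        return sorted(apply(ref, hap) for hap in calls)
    return spell(calls_a) == spell(calls_b)

REF = "ACAC"
# Slide 16: two spellings of haplotype AGGC, each paired with a reference haplotype.
caller_1 = [[(1, 1, "GG"), (1, 3, "")], []]
caller_2 = [[(1, 3, "GG")], []]
print(same_genotype(REF, caller_1, caller_2))  # True
```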
  • 23. Reference graph model
    • Convert (g)VCF and other file formats into a graph representation
    • Compute whether the graph can “generate” the query haplotype or genotype
      – Supporting multiple forms of ambiguity that are inherent in the biological questions we ask
    [Figure label: “Phase constraint”]
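Here is a toy version of the graph idea. Interbase positions are nodes, each reference base is an edge, and each variant adds an edge from `start` to `stop` labelled with its ALT. A query haplotype is "generated" if some path from the first node to the last spells it. This is an illustration under stated assumptions, not the GA4GH engine. A path may combine any subset of variants (no phase constraints), it takes at most one insertion edge per node, and only haplotypes, not genotypes, are handled. The name `can_generate` is hypothetical.

```python
from collections import deque

def can_generate(ref, variants, query):
    """True if some path through a toy reference graph spells `query` exactly.

    Nodes are interbase positions 0..len(ref). Edges: ref[i] from i to i + 1,
    plus one edge per (start, stop, alt) variant labelled alt. No phasing.
    """
    edges = {i: [(i + 1, base)] for i, base in enumerate(ref)}
    edges[len(ref)] = []
    for start, stop, alt in variants:
        edges[start].append((stop, alt))
    # State: (graph node, query characters consumed, just took an insertion here?)
    initial = (0, 0, False)
    seen = {initial}
    queue = deque([initial])
    while queue:
        node, consumed, inserted = queue.popleft()
        if node == len(ref) and consumed == len(query):
            return True
        for target, label in edges[node]:
            if target == node and inserted:
                continue  # at most one insertion edge per node on a path
            if query.startswith(label, consumed):
                state = (target, consumed + len(label), target == node)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False

REF = "ACAC"
print(can_generate(REF, [(1, 1, "GG"), (1, 3, "")], "AGGC"))  # True
print(can_generate(REF, [(1, 3, "GG")], "AGGC"))              # True
print(can_generate(REF, [(1, 3, "GG")], "AGGAC"))             # False
```

Both spellings from slide 16 generate AGGC, which is the spelling-agnostic behaviour the slide asks for. Adding the phase constraint would mean restricting which variant edges a single path may combine.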
  • 24. Related Problems • What are all of the differences between two genomes? • Collect all alleles observed across multiple genomes • Merge genomes into a single coherent representation • Efficiently store and query a large number of genomes
  • 25. Implementation plan • Build a reference implementation – Open source, free, and hosted by GA4GH – Built in Python + Cython – Include an extensive test suite • Not inventing any new file formats • Implementation underway – VCF processor built on htslib – Rest of the engine in progress – Accounting and testing coming soon after
  • 26. Thanks to: • Justin Zook and the other GIAB organizers • Geneticists, who have been doing this right all along • Complete Genomics for their calldiff algorithm • Great discussions and debates with friends and colleagues at NCI, NCBI, Invitae, 23andMe, 1000 Genomes, GA4GH, etc.