Real world challenges to using GRCh38 
A view from the trenches 
Deanna M. Church 
Senior Director of Genomics and Content 
Pioneering Genome-Guided Medicine 
© 2014 Personalis, Inc. All rights reserved.
Acknowledgements 
Personalis 
Jason Harris 
Sarah Garcia 
Jeanie Tirch 
Gabor Bartha 
Mark Pratt 
Scott Kirk 
Michael Clark 
Rich Chen 
John West 
Genome Reference Consortium 
Personalis, Inc. | Confidential 2 and Proprietary 
NCBI 
Valerie Schneider 
Nathan Bouk 
Terence Murphy 
Alex Astashyn 
Donna Maglott 
Melissa Landrum 
Wendy Rubinstein 
Jennifer Lee
Who we are 
Inherited Disease Diagnostics 
Cancer Services 
ACE Platform 
Research Services
Accuracy is key to what we do 
Novel 2bp deletion in GATAD2B 
Called by GATK as paternally inherited 
• Both affected children 
– Macrocephaly 
– Low muscle tone, hypotonia 
– Delay in early milestones 
– Dysphagia 
– Esotropia 
• Affected Male (3 yr) 
– Intellectual disability 
– Mild hearing loss 
– High arched palate 
– Small cyst near eye 
• Affected Female (15 mo) 
– Sleep apnea 
– Failure to thrive 
– Laryngomalacia 
– Anisocoria 
– Small optic nerves 
Case courtesy of Geisinger Health System
Accuracy is key to what we do 
Sample    GATK-determined Genotype   Ref   Alt   Depth   Allele Freq.
Father    0/1                        108   11    121     0.09
Mother    0/0                        111   0     112     0.00
Brother   0/1                        63    44    109     0.40
Sister    0/1                        64    52    119     0.44
Case courtesy of Geisinger Health System
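The allele frequencies in the table above appear to be the alt read count divided by the total depth (for example 11/121 for the father). A minimal sketch of that arithmetic follows, with a check that flags heterozygous calls whose alt fraction is well below the roughly 0.5 expected for a germline heterozygote. The cutoff `MOSAIC_MAX_FRACTION` and the function names are illustrative assumptions, not part of the Personalis or GATK pipeline.

```python
# Read counts copied from the table above: (sample, genotype, ref, alt, depth).
CALLS = [
    ("Father", "0/1", 108, 11, 121),
    ("Mother", "0/0", 111, 0, 112),
    ("Brother", "0/1", 63, 44, 109),
    ("Sister", "0/1", 64, 52, 119),
]

# Hypothetical cutoff: a het call with an alt fraction below this is treated
# as a candidate for mosaicism. The value is an assumption for illustration.
MOSAIC_MAX_FRACTION = 0.25


def alt_fraction(alt, depth):
    """Fraction of reads supporting the alt allele (0.0 if there is no coverage)."""
    return alt / depth if depth else 0.0


def flag_possible_mosaics(calls, max_fraction=MOSAIC_MAX_FRACTION):
    """Return (sample, rounded alt fraction) for het calls with a low alt fraction."""
    flagged = []
    for sample, genotype, _ref, alt, depth in calls:
        fraction = alt_fraction(alt, depth)
        if genotype == "0/1" and 0 < fraction < max_fraction:
            flagged.append((sample, round(fraction, 2)))
    return flagged


if __name__ == "__main__":
    for sample, fraction in flag_possible_mosaics(CALLS):
        print(f"{sample}: alt fraction {fraction}, possible mosaic")
```

With the counts from the slide, only the father's call falls under the assumed cutoff, which matches the point of the slide: a genotype label of 0/1 alone hides the difference between the father and the children.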
Excitement about GRCh38 
DPYD 
GGAACGCAG 
GGAACACAG 
R->C 
Alt loci 
Model Centromere Sequences 
Miga et al., 2014
Medical content not on chromosome sequences 
Medical content not on chromosome sequences 
GRCh37 
NT_113939: chr19 unlocalized contig 
GRCh38 
Medical content not on chromosome sequences 
NT_167246.2: MHC alternate locus 
Sparse SNP annotation 
No SNP annotation
By any other name 
chr19 vs 19 
GenBank: CM000681.2 
RefSeq: NC_000019.10 
By any other name 
chr19_KI270938v1_alt 
CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1 
GenBank: KI270886.1 
RefSeq: NT_187640.1 
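One common way to cope with the same sequence appearing under several names is to normalize every name to a single canonical form before comparing records. The sketch below uses only the chromosome 19 names shown on the slides; the alias table, the choice of `chr19` as the canonical name, and the function names are assumptions for illustration. A real table would be built from the assembly report rather than typed by hand.

```python
# Hypothetical alias table: each known name maps to one canonical name.
ALIASES = {
    "19": "chr19",
    "chr19": "chr19",
    "NC_000019.10": "chr19",
}


def normalize(name, aliases=ALIASES):
    """Return the canonical name, failing loudly on names we do not know."""
    try:
        return aliases[name]
    except KeyError:
        raise KeyError(f"unknown sequence name: {name!r}") from None


def same_sequence(name_a, name_b, aliases=ALIASES):
    """True when both names resolve to the same canonical sequence."""
    return normalize(name_a, aliases) == normalize(name_b, aliases)


if __name__ == "__main__":
    print(same_sequence("19", "NC_000019.10"))
```

Raising on unknown names, rather than passing them through, is a deliberate choice here: a silently unmatched alternate locus name is exactly the kind of problem these slides describe.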
Unflattening the data MICB 
Reporting formats (GFF, VCF, etc) don’t manage multiple locations easily
NW_003871068.1 
NC_000006.12 BestRefSeq gene 31494881 31511124 . + . ID=gene13336;Name=MICB;Dbxref=GeneID:4277 
NT_167244.2 BestRefSeq gene 2827449 2843674 . + . ID=gene42005;Name=MICB;Dbxref=GeneID:4277 
NT_113891.3 BestRefSeq gene 2972222 2988464 . + . ID=gene43669;Name=MICB;Dbxref=GeneID:4277 
NT_167245.2 BestRefSeq gene 2742492 2758910 . + . ID=gene44377;Name=MICB;Dbxref=GeneID:4277 
NT_167246.2 BestRefSeq gene 2810648 2816200 . + . ID=gene44827;Name=MICB;Dbxref=GeneID:4277 
NT_167247.2 BestRefSeq gene 2836836 2853071 . + . ID=gene45127;Name=MICB;Dbxref=GeneID:4277 
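The GFF rows above show one gene (MICB, GeneID 4277) with six records, one per sequence it is placed on. A minimal sketch of "unflattening" such a file is to group records by gene name instead of assuming one location per gene. The rows are copied from the slide; the tab separators are an assumption (they were lost in the slide text), and the function names are illustrative.

```python
from collections import defaultdict

# Rows copied from the slide, re-joined with tabs as GFF3 requires.
GFF_LINES = [
    "NC_000006.12\tBestRefSeq\tgene\t31494881\t31511124\t.\t+\t.\tID=gene13336;Name=MICB;Dbxref=GeneID:4277",
    "NT_167244.2\tBestRefSeq\tgene\t2827449\t2843674\t.\t+\t.\tID=gene42005;Name=MICB;Dbxref=GeneID:4277",
    "NT_113891.3\tBestRefSeq\tgene\t2972222\t2988464\t.\t+\t.\tID=gene43669;Name=MICB;Dbxref=GeneID:4277",
    "NT_167245.2\tBestRefSeq\tgene\t2742492\t2758910\t.\t+\t.\tID=gene44377;Name=MICB;Dbxref=GeneID:4277",
    "NT_167246.2\tBestRefSeq\tgene\t2810648\t2816200\t.\t+\t.\tID=gene44827;Name=MICB;Dbxref=GeneID:4277",
    "NT_167247.2\tBestRefSeq\tgene\t2836836\t2853071\t.\t+\t.\tID=gene45127;Name=MICB;Dbxref=GeneID:4277",
]


def parse_attributes(field):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    return dict(item.split("=", 1) for item in field.split(";") if item)


def locations_by_gene(lines):
    """Map gene name -> list of (seqid, start, end, strand), one per placement."""
    locations = defaultdict(list)
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comments/pragmas
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = line.split("\t")
        if ftype != "gene":
            continue
        name = parse_attributes(attrs).get("Name")
        locations[name].append((seqid, int(start), int(end), strand))
    return dict(locations)


if __name__ == "__main__":
    for seqid, start, end, strand in locations_by_gene(GFF_LINES)["MICB"]:
        print(seqid, start, end, strand)
```

Keying on the gene name (or GeneID) rather than on the record ID keeps the chromosome placement and the alternate-locus placements together, which is what a flat one-location-per-gene structure cannot represent.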
Building SnpEff 
Incremental steps: using fix patches 
SHANK2
Using Fix patches to improve alignments 
Incremental steps: using fix patches
Migrating to GRCh38: using Fix patches 
Fix patch 
hs37d5
Migrating to GRCh38: using Fix patches 
Fix patch 
hs37d5
Migrating to GRCh38: using Fix patches 
GRCh37 vs. Fix Patch 
GRCh38 
GRCh37.p13 Improved alignments outside of fix patch regions 
Jason Harris 
Regions outside of fix patches 
hs37d5 vs. GRCh37.p13 
378 ten-kb windows that don’t overlap fix patches have >10 SNV call differences
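A window count like the one above can be produced by comparing two SNV call sets and tallying disagreements per 10 kb window. The sketch below is one way to do that under stated assumptions: calls are represented as `(chrom, pos, ref, alt)` tuples with 1-based positions, windows overlapping fix patches are supplied as a precomputed set, and all names are hypothetical. It is not the analysis used for the slide.

```python
from collections import Counter

WINDOW = 10_000  # window size in bases


def window_index(pos, window=WINDOW):
    """Index of the window containing a 1-based position."""
    return (pos - 1) // window


def differing_windows(calls_a, calls_b, min_diff=10, window=WINDOW, patch_windows=frozenset()):
    """Return sorted (chrom, window index) keys with more than min_diff call differences.

    calls_a and calls_b are sets of (chrom, pos, ref, alt). A difference is any
    call present in exactly one set. Keys listed in patch_windows are skipped.
    """
    diffs = Counter()
    for chrom, pos, _ref, _alt in calls_a ^ calls_b:
        diffs[(chrom, window_index(pos, window))] += 1
    return sorted(
        key for key, count in diffs.items()
        if count > min_diff and key not in patch_windows
    )


if __name__ == "__main__":
    # Toy data: 11 calls found only in the first set, all in the first window.
    only_in_a = {("chr1", 100 + i, "A", "G") for i in range(11)}
    shared = {("chr1", 20_001, "C", "T")}
    print(differing_windows(only_in_a | shared, shared))
```

The symmetric difference treats a call that changes allele as two differences (one lost, one gained); whether that is the right way to count is a design choice the slide does not specify.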
GRCh37.p13 Improved alignments outside of fix patch regions 
Jason Harris 
hs37d5 vs. GRCh37.p13
Using Fix patches 
Aligning GRCh37 and GRCh38 
A A B’ B B 
Seq in assembly 1 
Seq in assembly 2 
Unique well aligned region in both assemblies. 
Second Pass (SP) alignments 
First Pass (FP) alignments 
SP only 
Expansion 
Assembly 1 
SP + FP 
Collapse 
Assembly 2
Aligning GRCh37 and GRCh38 
Mapping to GRCh38 
Mapping to GRCh38 
Dataset        Starting loci   Failure   Unique to Primary   Unique to Alts   Collapse in GRCh37   Collapse in GRCh38
GWAS catalog   7,991           0         7,827               0                14                   0
ClinVar*       88,343          3         86,549              5                278                  4
GO-ESP 6500    1,982,177       180       1,920,864           339              5,792                324
GIAB           2,915,713       274       2,874,786           47               1,662                4
NCBI assembly-assembly alignments from: Sept 20, 2014, software version 1.7 
*clinvar_20140902.vcf
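To compare datasets of very different sizes, the counts in the table above can be expressed as a percentage of the starting loci. The sketch below does only that arithmetic on the numbers from the slide; the field names are labels chosen here for the table columns, and no interpretation of how the columns relate to each other is assumed.

```python
# Column labels chosen for this sketch, in the order the table lists them.
FIELDS = (
    "starting", "failure", "unique_primary",
    "unique_alts", "collapse_grch37", "collapse_grch38",
)

# Counts copied from the table above.
ROWS = {
    "GWAS catalog": (7991, 0, 7827, 0, 14, 0),
    "ClinVar": (88343, 3, 86549, 5, 278, 4),
    "GO-ESP 6500": (1982177, 180, 1920864, 339, 5792, 324),
    "GIAB": (2915713, 274, 2874786, 47, 1662, 4),
}


def percent_of_start(dataset, field, rows=ROWS):
    """Count in the given column as a percentage of the dataset's starting loci."""
    values = dict(zip(FIELDS, rows[dataset]))
    return 100.0 * values[field] / values["starting"]


if __name__ == "__main__":
    for name in ROWS:
        pct37 = percent_of_start(name, "collapse_grch37")
        pct38 = percent_of_start(name, "collapse_grch38")
        print(f"{name}: collapse in GRCh37 {pct37:.3f}%, in GRCh38 {pct38:.3f}%")
```

In every row the GRCh37 collapse count exceeds the GRCh38 one, so the same ordering holds for the percentages.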
Remap vs. liftOver 
liftOver-dbSNP remap 
rs141109950 
chr7
Remap vs. liftOver 
rs267602252 
remap liftOver
Migrating to GRCh38 
First Pass remap Second Pass remap 
Migrating to GRCh38 
New PRODH paralog 
Sequence is unlocalized on chr22. 
Using GRCh38 to improve GRCh37 annotation 
KCNE1 
Alignment to new paralog added in GRCh38
Getting the most out of the reference 
Still challenging because tools and data structures expect a flat assembly 
Remap/liftOver not the final answer for moving variation 
Even modest changes (via fix patches) are promising

Editor's Notes

  1. Sarah mosaicism/UDP slide
  2. Mutations in DPYD result in dihydropyrimidine dehydrogenase deficiency, an error in pyrimidine metabolism associated with thymine-uraciluria and an increased risk of toxicity in cancer patients receiving 5-fluorouracil. Replace this with protein coding info and stats? And Valerie’s poster
  3. We can also see improvements outside of fix patch regions. Here we see another normalized read plot, with blue representing GRCh37 and green showing alignments to our fix patch version. Not only do we see alignment improvements, but this carries through to variant calling. We have identified 378 10 kb windows that don’t overlap fix patches but have more than 10 SNV call differences. Here is one such example, where a seemingly SNP-dense region with lots of heterozygous SNPs now looks much cleaner and has no heterozygous SNPs.
  4. The NCBI assembly-assembly alignment process uses a two-step approach. In the first pass, a set of heuristics, including assembly structure, is used to generate a set of essentially reciprocal best hits. Then the process does a second pass, looking for regions greater than 5 kb in each assembly, and tries to recover alignments in these regions. A report is produced that marks whether a region is in a first pass or second pass alignment. Analyzing this report can identify regions that are likely expanded or collapsed in an assembly.
  5. We can use this data to plot the amount of collapse in each assembly, expressed as a percentage of each chromosome. It is worth noting that there is some collapse in GRCh38, which is expected because several misassembled regions contained more than one haplotype and these haplotype expansions were removed in GRCh38. However, we can see the landscape is dominated by collapse in GRCh37. Variants called within these regions in GRCh37 are candidates for false positive variant calls.
  6. Susceptibility to thyrotoxic periodic paralysis
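
Note 5 describes expressing collapse as a percentage of each chromosome. A minimal sketch of that calculation follows, assuming the collapsed regions have already been extracted from the alignment report as half-open `(start, end)` intervals on one chromosome. The interval representation, the function names, and the toy numbers are assumptions for illustration; the format of the NCBI report itself is not modeled here.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def percent_covered(intervals, chrom_length):
    """Percentage of a chromosome covered by the intervals, counting overlaps once."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return 100.0 * covered / chrom_length


if __name__ == "__main__":
    # Toy data: three regions on a hypothetical 1,000-base chromosome.
    regions = [(0, 100), (50, 150), (300, 400)]
    print(percent_covered(regions, 1000))
```

Merging before summing matters because regions reported separately can overlap, and counting them twice would overstate the collapsed fraction.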