Transitioning to GRCh38



Personalis is transitioning to the GRCh38 human reference genome. The new assembly includes alternate loci that carry 3.6 Mb of novel sequence and 153 genes not present on the primary assembly. Variant analysis is more challenging with the new assembly because of additional paralogous and allelic duplications, as well as the alternate loci themselves. New computational tools are needed to align sequences and call variants correctly in these complex genomic regions.


© 2014 Personalis, Inc. All rights reserved.
Pioneering Genome-Guided Medicine
Deanna M. Church
Senior Director of Genomics and Content
Transitioning to GRCh38

Personalis, Inc.2
Who we are
Inherited Disease Diagnostics
Cancer Services
ACE Platform
Research Services

Reference assembly influence
[Figure: the sample carries Gene1 and Gene2; the reference assembly carries only Gene1]

Excitement about GRCh38
DPYD (R->C): GGAACGCAG vs. GGAACACAG
Alt loci
Model Centromere Sequences (Miga et al., 2014)
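
The two 9-mers on the DPYD slide differ at a single base. A minimal Python sketch for locating such a difference, assuming the two sequences are already aligned and of equal length (the function name `base_differences` is hypothetical):

```python
def base_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the sequences are already aligned, so position i in one
    corresponds to position i in the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(base_differences("GGAACGCAG", "GGAACACAG"))  # [(6, 'G', 'A')]
```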

CCL3 region: GRCh37
NC_000017.10 (chr17): 34,442,621-35,005,379

CCL5-TBC1D3 region: GRCh38
NC_000017.11 (chr17): 36,032,574-36,269,924
NT_187661.1
100 kb deletion on chromosome
Steinberg et al., 2014 http://dx.doi.org/10.1101/006841
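
The two CCL3-region slides give their windows as "name: start-end" strings with thousands separators. A small Python sketch for parsing such strings and computing interval length, assuming 1-based, fully closed coordinates as NCBI displays them (`parse_region` and `region_length` are hypothetical names). The two windows are display ranges chosen for each slide, so their lengths are not by themselves a measure of the assembly change.

```python
import re

# Matches strings such as "chr17: 34,442,621-35,005,379"; spaces and commas are optional.
REGION_RE = re.compile(r"^\s*([^:\s]+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_region(text: str) -> tuple[str, int, int]:
    """Parse 'name:start-end' into (name, start, end), dropping thousands separators."""
    match = REGION_RE.match(text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    name, start, end = match.groups()
    return name, int(start.replace(",", "")), int(end.replace(",", ""))


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, fully closed interval [start, end]."""
    return end - start + 1


_, s37, e37 = parse_region("chr17: 34,442,621-35,005,379")
_, s38, e38 = parse_region("chr17: 36,032,574-36,269,924")
print(region_length(s37, e37))  # 562759
print(region_length(s38, e38))  # 237351
```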

Alternate Loci and Genes
3.6 Mb of novel sequence
153 genes not on primary assembly
[Chart: unique sequence in alternate loci; total 3.6 Mb, 153 genes only on alts]

Alt Loci and Genes
25% Medically Interpretable Genes (MIG)
[Figure: Primary Assembly and Alt Locus; values shown: 6.4%, 6.2%, 0.18%]

Alt Loci and Genes
NT_167246.2: MHC alternate locus
[Figure panels: No SNP annotation; Sparse SNP annotation]

Analysis challenges
[Figure: Primary Assembly and Alt Locus; paralogous duplication vs. allelic duplication; MapQ]
https://github.com/GenomeRef/SoftwareDevTracking
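
When a read aligns equally well to two places, a conventional aligner reports a low mapping quality (MapQ). That happens both for paralogous duplications and for the allelic duplications introduced by alt loci, and the two cases call for different handling. A simplified Python heuristic that separates them among the tied best hits, assuming UCSC-style contig names in which alt loci end in `_alt` (`Hit` and `classify_best_hits` are hypothetical names, and the positions and scores in the example are made up). It does not check that an alt hit actually covers the same region as the primary hit, which a real alt-aware aligner would determine from alt-to-primary alignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    contig: str    # sequence name, e.g. "chr19" or "chr19_KI270938v1_alt"
    position: int  # 1-based leftmost aligned position
    score: int     # alignment score; higher is better


def is_alt(contig: str) -> bool:
    # Assumes UCSC-style naming, in which alternate-locus contigs end in "_alt".
    return contig.endswith("_alt")


def classify_best_hits(hits: list[Hit]) -> str:
    """Classify a read by the placements that tie for the best alignment score.

    "unique"     - a single best placement
    "allelic"    - exactly one primary-assembly hit, tied only with alt-locus hits
    "paralogous" - two or more equally good primary-assembly hits
    "alt-only"   - all best placements are on alt loci
    """
    if not hits:
        raise ValueError("no hits")
    best_score = max(h.score for h in hits)
    best = [h for h in hits if h.score == best_score]
    if len(best) == 1:
        return "unique"
    primary = [h for h in best if not is_alt(h.contig)]
    if len(primary) >= 2:
        return "paralogous"
    if len(primary) == 1:
        return "allelic"
    return "alt-only"


print(classify_best_hits([Hit("chr19", 1000, 150), Hit("chr19_KI270938v1_alt", 500, 150)]))  # allelic
print(classify_best_hits([Hit("chr19", 1000, 150), Hit("chr5", 2000, 150)]))  # paralogous
```

In the allelic case the read's location on the primary assembly is effectively known, so a pipeline could avoid the MapQ penalty there while keeping it for the paralogous case.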

Analysis challenges: variant representation
[Figure: Primary Assembly and Alt Locus, G>C]
1/1: only valid if homozygous for Alt
1/.: correct if heterozygous for Alt

Waiting for graph representations?
Credit: UC Santa Cruz Genomics Institute

Analysis challenges
chr19 vs 19
GenBank: CM000681.2
RefSeq: NC_000019.10
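
The same chromosome goes by several names depending on the source, and tools that compare files by exact string will treat those names as different sequences. A minimal Python sketch, assuming a hand-built alias table for GRCh38 chromosome 19 (`CHR19_ALIASES` and `canonical_name` are hypothetical names); in practice the table would be generated from the GRC assembly report, which lists each sequence's names. Choosing the RefSeq accession as the canonical form is an arbitrary design choice here.

```python
# Alias table for GRCh38 chromosome 19, built from the identifiers on this slide.
CHR19_ALIASES: dict[str, str] = {
    "chr19": "NC_000019.10",         # UCSC-style name
    "19": "NC_000019.10",            # bare chromosome name
    "CM000681.2": "NC_000019.10",    # GenBank accession
    "NC_000019.10": "NC_000019.10",  # RefSeq accession (maps to itself)
}


def canonical_name(name: str, aliases: dict[str, str] = CHR19_ALIASES) -> str:
    """Translate any known alias of a sequence to a single canonical name."""
    try:
        return aliases[name]
    except KeyError:
        raise KeyError(f"unknown sequence name: {name!r}") from None


# A BAM that says "chr19" and a VCF that says "19" refer to the same sequence.
print(canonical_name("chr19") == canonical_name("19"))  # True
```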

Analysis challenges
chr19_KI270938v1_alt
CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1
GenBank: KI270886.1
RefSeq: NT_187640.1
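
UCSC-style alt names embed a GenBank accession, with the version separator written as "v" (so `KI270938v1` stands for `KI270938.1`). A Python sketch that recovers the chromosome and accession from such a name, assuming exactly that `chr<N>_<accession>v<version>_alt` pattern (`ucsc_alt_to_genbank` is a hypothetical name). Accessions derived this way should still be checked against the GRC assembly report, which lists every name for each sequence.

```python
import re

# UCSC-style alt names: chr<N>_<two letters + six digits>v<version>_alt
UCSC_ALT_RE = re.compile(r"chr([0-9]+|X|Y)_([A-Z]{2}[0-9]{6})v([0-9]+)_alt")


def ucsc_alt_to_genbank(name: str) -> tuple[str, str]:
    """Return (chromosome, GenBank accession.version) encoded in a UCSC alt name."""
    match = UCSC_ALT_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a UCSC-style alt name: {name!r}")
    chrom, accession, version = match.groups()
    return chrom, f"{accession}.{version}"


print(ucsc_alt_to_genbank("chr19_KI270938v1_alt"))  # ('19', 'KI270938.1')
```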

Analysis challenges: MICB
Reporting formats (GFF, VCF, etc.) don’t manage multiple locations easily

NW_003871068.1
NC_000006.12 BestRefSeq gene 31494881 31511124 . + . ID=gene13336;Name=MICB;Dbxref=GeneID:4277
NT_167244.2 BestRefSeq gene 2827449 2843674 . + . ID=gene42005;Name=MICB;Dbxref=GeneID:4277
NT_113891.3 BestRefSeq gene 2972222 2988464 . + . ID=gene43669;Name=MICB;Dbxref=GeneID:4277
NT_167245.2 BestRefSeq gene 2742492 2758910 . + . ID=gene44377;Name=MICB;Dbxref=GeneID:4277
NT_167246.2 BestRefSeq gene 2810648 2816200 . + . ID=gene44827;Name=MICB;Dbxref=GeneID:4277
NT_167247.2 BestRefSeq gene 2836836 2853071 . + . ID=gene45127;Name=MICB;Dbxref=GeneID:4277
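
In these records each placement of MICB gets its own `ID`; only the shared `Dbxref=GeneID:4277` ties the six rows together, so a consumer has to do the grouping itself. A Python sketch that groups GFF3 gene records by their GeneID cross-reference, using the six rows from this slide (`locations_by_gene` and `parse_attributes` are hypothetical names; keying on GeneID and ignoring percent-encoding in attributes are simplifying assumptions).

```python
from collections import defaultdict

# The six MICB gene records from this slide, in GFF3's nine tab-separated columns.
MICB_GFF = """\
NC_000006.12\tBestRefSeq\tgene\t31494881\t31511124\t.\t+\t.\tID=gene13336;Name=MICB;Dbxref=GeneID:4277
NT_167244.2\tBestRefSeq\tgene\t2827449\t2843674\t.\t+\t.\tID=gene42005;Name=MICB;Dbxref=GeneID:4277
NT_113891.3\tBestRefSeq\tgene\t2972222\t2988464\t.\t+\t.\tID=gene43669;Name=MICB;Dbxref=GeneID:4277
NT_167245.2\tBestRefSeq\tgene\t2742492\t2758910\t.\t+\t.\tID=gene44377;Name=MICB;Dbxref=GeneID:4277
NT_167246.2\tBestRefSeq\tgene\t2810648\t2816200\t.\t+\t.\tID=gene44827;Name=MICB;Dbxref=GeneID:4277
NT_167247.2\tBestRefSeq\tgene\t2836836\t2853071\t.\t+\t.\tID=gene45127;Name=MICB;Dbxref=GeneID:4277
"""


def parse_attributes(column9: str) -> dict[str, str]:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    return dict(item.split("=", 1) for item in column9.split(";") if item)


def locations_by_gene(gff_text: str) -> dict[str, list[tuple[str, int, int, str]]]:
    """Group gene records by their NCBI GeneID cross-reference."""
    groups: dict[str, list[tuple[str, int, int, str]]] = defaultdict(list)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = cols
        attributes = parse_attributes(attrs)
        # Dbxref may hold several comma-separated entries; key on the GeneID one.
        xrefs = attributes.get("Dbxref", "").split(",")
        gene_ids = [x for x in xrefs if x.startswith("GeneID:")]
        key = gene_ids[0] if gene_ids else attributes.get("ID", seqid)
        groups[key].append((seqid, int(start), int(end), strand))
    return dict(groups)


for gene, places in locations_by_gene(MICB_GFF).items():
    print(gene, len(places), "locations")  # GeneID:4277 6 locations
```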

Analysis challenges
• Need aligners that can distinguish allelic and paralogous duplication
• Need variant callers/modules that can correctly assign genotypes in complex regions
• Need to extend file formats to accommodate the new assembly model

Editor's Notes

Missing and misassembled sequence in the reference assembly can have dire consequences for genome interpretation. In this example, Gene2 is missing from the reference but present in the sample we are analyzing. Whether Gene2 is missing because of an assembly error or because it is polymorphic in the population, the outcome can be the same. In the best-case scenario, reads from Gene2 don't align to the reference and we simply can't analyze Gene2. However, if Gene2 is related to Gene1, we can get off-target alignments that confound the analysis of Gene1 as well, leading either to under-calling in the region or to inappropriately calling paralogous sequence variants as allelic sequence variants. If we take sequences we know to be missing from GRCh37, simulate reads, and align those reads to GRCh37, we see that 75% of them find an off-target alignment, regardless of the alignment method used. This is why Heng Li created decoy sequences for the 1000 Genomes Project: to reduce off-target alignments. However, we still lack the ability to analyze Gene2 in this scenario. This underscores the importance of representing all common human sequences in the reference assembly.
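
The simulation described above (reads from sequence absent in the reference, aligned back to it) can be illustrated with a toy Python sketch. It tiles error-free reads across the missing sequence and uses a brute-force, ungapped mismatch search in place of a real aligner; any read that places within `max_mismatches` counts as off target, since its true source is not in the reference. All function names, parameters, and sequences here are hypothetical, and the sketch does not reproduce the 75% figure.

```python
def tile_reads(seq: str, read_len: int, step: int) -> list[str]:
    """Error-free reads tiled across seq; a stand-in for a real read simulator."""
    return [seq[i:i + read_len] for i in range(0, len(seq) - read_len + 1, step)]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_placement(read: str, reference: str) -> tuple[int, int]:
    """Brute-force ungapped search: (0-based offset, mismatches) of the best placement."""
    best = (-1, len(read) + 1)  # sentinel: worse than any real placement
    for offset in range(len(reference) - len(read) + 1):
        mm = mismatches(read, reference[offset:offset + len(read)])
        if mm < best[1]:
            best = (offset, mm)
    return best


def off_target_fraction(missing: str, reference: str, read_len: int,
                        step: int, max_mismatches: int) -> float:
    """Fraction of reads from a sequence absent from the reference that still place.

    Because the source sequence is not in the reference, every placement is off target.
    """
    reads = tile_reads(missing, read_len, step)
    if not reads:
        raise ValueError("sequence shorter than read length")
    placed = sum(1 for r in reads if best_placement(r, reference)[1] <= max_mismatches)
    return placed / len(reads)


# Toy data: gene2 differs from its paralog gene1 at one base; only gene1 is in the reference.
gene1 = "ACGTACGGTTCAGCTAGCTA"
gene2 = gene1[:9] + "A" + gene1[10:]
reference = "T" * 10 + gene1 + "G" * 10
print(off_target_fraction(gene2, reference, read_len=8, step=4, max_mismatches=1))  # 1.0
print(off_target_fraction(gene2, reference, read_len=8, step=4, max_mismatches=0))  # 0.5
```

Even in this toy, every read from the missing paralog finds a home on Gene1 once a single mismatch is tolerated, which is the mechanism behind the confounded Gene1 calls described above.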
Mutations in DPYD result in dihydropyrimidine dehydrogenase deficiency, an error in pyrimidine metabolism associated with thymine-uraciluria and an increased risk of toxicity in cancer patients receiving 5-fluorouracil. Replace this with protein-coding info and stats? And Valerie's poster.
The CCL3 region on chromosome 17 lets us explore two major updates in GRCh38 and, hopefully, underscores the importance of representing missing paralogs in the reference. This region is known to be copy-number variable, with individuals carrying 0-4 copies of a 90 kb repeat unit. In GRCh37, the region was assembled from several sources that contain different structural variants. This created a false gap and a genomic representation that likely does not exist in anyone on the planet. Correctly representing the genomic architecture of this region matters because there is some evidence, albeit conflicting, of a correlation between the number of copies of CCL3L1 and HIV infection and progression to AIDS.
To better represent this region, the GRC made a new clone tiling path from a single-haplotype resource derived from a hydatidiform mole. An additional allele, representing a 100 kb insertion, was also generated and placed in the assembly as an alternate locus. The reference assembly now has two correct representations of this region, though we may need more.
Because alternate loci complicate analysis, many people have simply ignored these sequences, but doing so in GRCh38 means losing 3.6 Mb of sequence unique to the alternate loci, sequence containing 153 genes. This graph shows the distribution of the amount of unique sequence per alternate locus: while the loci clearly do not all contribute equal amounts of novel sequence, in aggregate the amount is significant. The GRC recently held a workshop to encourage development of new tools that can handle the full assembly, and Heng Li has already distributed a version of BWA-MEM that is alt-locus aware. Even so, we need to do considerable testing and additional development to make sure we are using these sequences correctly, and we need to assess the ramifications of this new structure for other parts of the tool chain.
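
Before deciding to drop alt loci, it can help to see how many contigs and bases a pipeline would discard. A Python sketch that reads a FASTA index (.fai, whose tab-separated columns are name, length, offset, line bases, line bytes) and totals the contigs whose names end in `_alt`, assuming UCSC-style naming (`alt_contig_bases` is a hypothetical name, and the example index lines are made up). This counts every alt base, most of which duplicate primary-assembly sequence, so it is an upper bound on what is lost rather than the 3.6 Mb of unique sequence; measuring unique sequence requires the alignments of each alt locus to the primary assembly.

```python
def alt_contig_bases(fai_text: str) -> tuple[int, int]:
    """Count alt contigs and their total length in a FASTA index (.fai).

    .fai columns (tab-separated): name, length, offset, line bases, line bytes.
    Alt contigs are recognized by a UCSC-style "_alt" suffix, which is an
    assumption about how the FASTA was named.
    """
    count = total = 0
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if name.endswith("_alt"):
            count += 1
            total += int(length)
    return count, total


# Hypothetical index lines; lengths and offsets are made up for illustration.
example_fai = "chr19\t1000\t7\t60\t61\nchr19_KI270938v1_alt\t250\t1031\t60\t61\n"
print(alt_contig_bases(example_fai))  # (1, 250)
```

For a real index, pass the file contents, e.g. `alt_contig_bases(open(path).read())` with `path` pointing at the reference's `.fai` file.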
