Graph and assembly strategies for the
MHC and ribosomal DNA regions
Alexander Dilthey
The MHC is the zebrafish of the genome!
(model region)
PRGs – Population Reference Graphs
• Simple: acyclic, directed (sub-class of general variation graphs)
• Usually built from MSA, preserve gap positions
(i.e. global homology between input sequences).
• Generative model: Recombination
• Ploidy well-defined (0, 1, 2)
(Figure: example PRG drawn as a small directed graph whose branches carry bases or gap characters "_".)
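A minimal Python sketch of the PRG idea described above, not the MHC-PRG implementation. It assumes the MSA is given as equal-length strings with "_" for gaps, treats every (column, character) pair as a node, and links two nodes when at least one input sequence uses them in succession. The names `build_prg` and `haplotypes` are hypothetical. Enumerating all paths illustrates the recombination model: the graph also spells sequences that no single input sequence contains.

```python
from collections import defaultdict


def build_prg(msa):
    """Build a level-structured, acyclic graph from equal-length MSA rows.

    Level i holds the distinct characters of MSA column i (gaps included),
    so gap positions, and with them global homology, are preserved.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")
    levels = [sorted({row[i] for row in msa}) for i in range(length)]
    edges = defaultdict(set)
    for row in msa:
        for i in range(length - 1):
            edges[(i, row[i])].add((i + 1, row[i + 1]))
    return levels, dict(edges)


def haplotypes(levels, edges):
    """Spell every left-to-right path through the graph, dropping gaps."""
    paths = [[(0, c)] for c in levels[0]]
    for _ in range(len(levels) - 1):
        paths = [p + [nxt] for p in paths for nxt in sorted(edges.get(p[-1], ()))]
    return sorted({"".join(c for _, c in p if c != "_") for p in paths})


# Toy input (illustrative, not taken from the slides).
levels, edges = build_prg(["AC_T", "ACGT", "TC_T"])
print(haplotypes(levels, edges))  # ['ACGT', 'ACT', 'TCGT', 'TCT']
```

Here "TCGT" is a recombinant: it combines the first column of the third input with the insertion carried only by the second.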
Outline
• Quick recap:
What we know about the utility of graph genome approaches
• New results:
Haplotyping in hypervariable regions (HLA)
Pseudo graph alignment
• De novo assembly of ribosomal DNA
In most of the MHC, single-reference
approaches work just fine…
(Figure: number of kmers (millions), recovered vs. not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped.)
+ long-read validation with consistent results (not shown)
Dilthey et al., Nature Genetics 2015
… graph genomes outperform in the most
complex sub-region of the MHC …
Dilthey et al., Nature Genetics 2015
… remaining problems driven by incomplete
input haplotypes + algorithmics.
(Figure: aligned kmers; axes: chromotype position (kb) and read position (kb).)
Incomplete input haplotypes:
Large uncharacterized inversion
Algorithmics:
Incorrect HLA haplotyping.
Dilthey et al., Nature Genetics 2015
HLA haplotyping
• Hypothesis: Whole-genome sequencing data contains the information
necessary for accurate HLA typing
• “HLA typing” → HLA gene exon sequences
• HLA class I: exons 2 and 3
• HLA class II: exon 2
• Challenge: align reads to the right gene – homology hell.
• Proper read-to-graph alignment instead of k-Mers.
Class I exon homology
Exon 2 Exon 3
HLA-A 3284 alleles
HLA-B 4077 alleles
HLA-C 2799 alleles
Approach: deep PRG + mapping
Exonic MSA
T*01:01 _ _ A C G T A C T _ _
T*01:02 C A A C A T A C T _ _
T*01:03 _ _ A C G C G C T _ _
T*01:04 _ _ A T C C G C T A C
T*01:05 _ _ A T C C C C T _ _
T*01:06 _ _ _ C C T A C T _ _
Genomic MSA
T*01:01 A G C A _ _ A C G T A C T _ _ C C T A
T*01:02 A C C A C A A C A T A C T _ _ C C T A
T*01:04 _ T T A _ _ A T C C G C T A C C C T A
8 xMHC reference haplotypes
PGF (with T*01:01) A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
MANN (with T*01:04) T T T _ T T A _ _ A T C C G C T A C C C T A T G A
1) Gene-only PRG – 46 (pseudo) genes, mostly HLA
(Figure: Gene 1 |--NNN--| Gene 2 |--NNN--| Gene 3; within each gene: Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding.)
(Figure: number of reference sequences along the PRG; the region covered by 'genomic' sequences is marked.)
2) Varying numbers of input sequences across PRG
3) Use hierarchical MSA approach to combine in
Approach: deep PRG + mapping
(Figure: a read aligned to the PRG, shown level by level.)
4) Seed-and-extend paired-end mapping to PRG
5) Likelihood-based inference: maximize L( aligned reads | HLA types )
(independently per locus)
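The inference step can be sketched as follows. This is an illustration under stated assumptions, not the HLA*PRG code: each read is assumed to come from either of the two haplotypes with probability 1/2, the per-read probabilities P(read | allele) are assumed to be given by the alignment step, and `log_likelihood` and `call_locus` are hypothetical names. The allele labels reuse the toy names from the MSA example; the probabilities are made up.

```python
import math
from itertools import combinations_with_replacement


def log_likelihood(read_probs, a1, a2):
    """log L(aligned reads | {a1, a2}) under an equal-mixture diploid model."""
    return sum(math.log(0.5 * p[a1] + 0.5 * p[a2]) for p in read_probs)


def call_locus(read_probs, alleles):
    """Return the allele pair that maximizes the likelihood at one locus."""
    pairs = combinations_with_replacement(sorted(alleles), 2)
    return max(pairs, key=lambda pair: log_likelihood(read_probs, *pair))


# Hypothetical P(read | allele) values for four reads at one locus.
reads = [
    {"T*01:01": 0.90, "T*01:02": 0.01, "T*01:04": 0.01},
    {"T*01:01": 0.90, "T*01:02": 0.01, "T*01:04": 0.01},
    {"T*01:01": 0.01, "T*01:02": 0.01, "T*01:04": 0.90},
    {"T*01:01": 0.01, "T*01:02": 0.01, "T*01:04": 0.90},
]
print(call_locus(reads, reads[0].keys()))  # ('T*01:01', 'T*01:04')
```

Because loci are treated independently, the same function would simply be called once per locus.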
High-quality WGS data enables gold-standard
accuracy
(of note: 2/3 original discrepancies with validation data were errors in the validation data!)
… but not from exome, MiSeq data
Sequencing error?
Effective fragment length? [2 x read length + IS]
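The bracketed formula is simple enough to write down directly. The sketch below assumes that "IS" stands for insert size in the sense of the unsequenced stretch between the two mates; the function name and the example numbers are hypothetical.

```python
def effective_fragment_length(read_length, insert_size):
    """2 x read length + IS, as on the slide (IS assumed to be the inner gap)."""
    return 2 * read_length + insert_size


# Illustrative numbers only, not taken from the datasets in the talk.
print(effective_fragment_length(100, 150))  # 350
```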
Conclusion (intermediate)
• If the input sequencing data is “good enough”, we manage near-perfect haplotyping in the genome’s most polymorphic region
• Effective fragment length likely the most important factor
• Not-so-good sequencing data: joint haplotyping + alignment
(i.e. alignment location is not independent of inferred haplotype)
• Read mapping implementation SLOW
Pseudo graph mapping
Input sequences
Graph
Align short reads to input sequences...
... transpose onto graph
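The "transpose" step can be sketched as a coordinate lookup, assuming the graph levels correspond to MSA columns and that each input sequence is stored as its gapped MSA row. The function names are hypothetical and the sketch covers only gapless hits.

```python
def sequence_to_msa_columns(msa_row, gap="_"):
    """For each base of the ungapped input sequence, its 0-based MSA column."""
    return [col for col, char in enumerate(msa_row) if char != gap]


def transpose_alignment(msa_row, start, length, gap="_"):
    """Map a gapless hit at [start, start + length) on the ungapped
    input sequence to the MSA columns (graph levels) it covers."""
    return sequence_to_msa_columns(msa_row, gap)[start:start + length]


# Seq1 from the scrubbing slide: ungapped "AACACTTT", gap at MSA column 5.
print(transpose_alignment("AACAC_TTT", start=3, length=4))  # [3, 4, 6, 7]
```

The jump from column 4 to column 6 is where the read skips a graph level that the input sequence does not occupy.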
Scrubbing, cutting, cleaning
      Input MSA    Lin. alignment    MSA coor.     Scrubbed
      123456789                      123456X789    123456789
Seq1  AACAC_TTT    Seq1 AACAC_TTT    AACAC__TTT    AACAC_TTT
Seq2  TTCACGTTT    Read AACACGTTT    AACAC_GTTT    AACACGTTT
Graph TTCAC (-|G) TTT
Scrubbing: get rid of INDEL-induced changes in the alignment coordinate system
Cutting: Examine alignment gap structure; cut in “bad” areas; use longest stretch
Cleaning: Find the best gap-less sequence-to-graph alignment + extension with gaps
Graph alignment
123456789
Graph AACACGTTT
Seq1 AACACGTTT
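A possible reading of the scrubbing step, as a sketch under simplifying assumptions: the pairwise alignment covers the whole input sequence, and an insertion in the read is absorbed only if the input sequence's MSA row already has gap columns at that position. The function `scrub` is hypothetical and is not the MHC-PRG-2 routine.

```python
def scrub(msa_row, ref_aln, read_aln, gap="_"):
    """Project a pairwise read alignment onto MSA columns.

    ref_aln/read_aln are the two rows of the linear alignment (gap in
    ref_aln = insertion in the read). Inserted bases are moved into gap
    columns that msa_row already has, so no new column ("X") is created.
    Returns None if an insertion does not fit into existing gap columns.
    """
    if ref_aln.replace(gap, "") != msa_row.replace(gap, ""):
        raise ValueError("alignment must cover the full input sequence")
    out, pending, col = [], [], 0
    for ref_char, read_char in zip(ref_aln, read_aln):
        if ref_char == gap:          # insertion in the read: hold it back
            pending.append(read_char)
            continue
        while msa_row[col] == gap:   # existing gap columns absorb insertions
            out.append(pending.pop(0) if pending else gap)
            col += 1
        if pending:                  # nowhere to put the inserted bases
            return None
        out.append(read_char)
        col += 1
    return None if pending else "".join(out)


# The example from the slide: the read's G lands in MSA column 6.
print(scrub("AACAC_TTT", "AACAC_TTT", "AACACGTTT"))  # AACACGTTT
```

Without an existing gap column, as in `scrub("AACACTTT", "AACAC_TTT", "AACACGTTT")`, the sketch returns `None`; a real implementation would need a policy for such cases, which the slides do not spell out.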
Accuracy slightly worse; fast!
Conclusion: perhaps there is a middle ground between graph and linear sequence
alignment. Work in progress. Further tuning?
Highest resolution
                          MHC-PRG-2                        HLA*PRG
Cohort         Locus  N   Inferred  Accuracy  Call Rate    Inferred  Accuracy  Call Rate
Platinum Trio  A      6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  B      6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  C      6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  DQA1   6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  DQB1   6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  DRB1   6   6         1.00      1.00         6         1.00      1.00
1000 Genomes   A      22  22        0.86      1.00         22        1.00      1.00
1000 Genomes   B      22  22        1.00      1.00         22        1.00      1.00
1000 Genomes   C      22  22        1.00      1.00         22        1.00      1.00
1000 Genomes   DQA1   12  12        1.00      1.00         12        1.00      1.00
1000 Genomes   DQB1   22  22        1.00      1.00         22        1.00      1.00
1000 Genomes   DRB1   22  22        0.91      1.00         22        0.95      1.00
Towards additional high-quality reference
haplotypes…
Remaining challenges: extreme repeats, haplotypes.
Sergey Koren
Ribosomal DNA
• Encodes ribosomal RNA
• Hundreds of copies
(tandem repeat arrays)
• Variation poorly characterized
• Step 1: Targeted approach
• Step 2: WGS-based
• Step 3: Variation graph
Read error vs variation
… from whole-genome data?
Long reads → de Bruijn graph Technology!
6% > 50k
Summary
• Variation graphs are worth the effort – at least in highly complex regions.
• Evidence: MHC “model system”
+ overall improvement of Genome inference accuracy
+ complex-locus haplotyping
• Incorporate LD?
• Middle ground between full graph alignment and linear sequence
alignment?
• Ribosomal DNA – let me know if you’re also interested!
Acknowledgements
NIH
Adam Phillippy
Sergey Koren
Brian Walenz
Jung-Hyun Kim
Vladimir Larionov
Oxford
Gil McVean
Zam Iqbal
Alexander Mentzer
Histogenetics
Nezih Cereb
UCSF/Nantes
Pierre-Antoine Gourraud
GSK
Matt Nelson
Charles Cox

 
Article - Design and evaluation of novel inhibitors for the treatment of clea...
Article - Design and evaluation of novel inhibitors for the treatment of clea...Article - Design and evaluation of novel inhibitors for the treatment of clea...
Article - Design and evaluation of novel inhibitors for the treatment of clea...
 
Hemodialysis: Chapter 11, Venous Catheter - Basics, Insertion, Use and Care -...
Hemodialysis: Chapter 11, Venous Catheter - Basics, Insertion, Use and Care -...Hemodialysis: Chapter 11, Venous Catheter - Basics, Insertion, Use and Care -...
Hemodialysis: Chapter 11, Venous Catheter - Basics, Insertion, Use and Care -...
 
BCBR MCQs with Answers.pdf for exam for NMC promotions
BCBR MCQs with Answers.pdf for exam for NMC promotionsBCBR MCQs with Answers.pdf for exam for NMC promotions
BCBR MCQs with Answers.pdf for exam for NMC promotions
 
Overcoming Erectile Dysfunction Lifestyle Changes and the Role of Sildigra 25...
Overcoming Erectile Dysfunction Lifestyle Changes and the Role of Sildigra 25...Overcoming Erectile Dysfunction Lifestyle Changes and the Role of Sildigra 25...
Overcoming Erectile Dysfunction Lifestyle Changes and the Role of Sildigra 25...
 
Rice Bran Oil Manufacturing Process
Rice Bran Oil Manufacturing ProcessRice Bran Oil Manufacturing Process
Rice Bran Oil Manufacturing Process
 
Verified Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service...
Verified Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service...Verified Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service...
Verified Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service...
 

Graph and assembly strategies for the MHC and ribosomal DNA regions

  • 1. Graph and assembly strategies for the MHC and ribosomal DNA regions Alexander Dilthey
  • 2. The MHC is the zebrafish of the genome! (model region)
  • 3. PRGs – Population Reference Graphs • Simple: acyclic, directed (sub-class of general variation graphs) • Usually built from MSA, preserve gap positions (i.e. global homology between input sequences). • Generative model: Recombination • Ploidy well-defined (0, 1, 2) [Figure: example PRG]
  • 4. Outline • Quick recap: What we know about the utility of graph genome approaches • New results: Haplotyping in hypervariable regions (HLA) Pseudo graph alignment • De novo assembly of ribosomal DNA
  • 5. In most of the MHC, single-reference approaches work just fine… [Figure: bar chart of the number of k-mers (millions), recovered vs. not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped] + long-read validation with consistent results (not shown) Dilthey et al., Nature Genetics 2015
  • 6. … graph genomes outperform in the most complex sub-region of the MHC … Dilthey et al., Nature Genetics 2015
  • 7. … remaining problems driven by incomplete input haplotypes + algorithmics. [Figure: aligned k-mers, chromotype position (kb) vs. read position (kb)] Incomplete input haplotypes: Large uncharacterized inversion Algorithmics: Incorrect HLA haplotyping. Dilthey et al., Nature Genetics 2015
  • 8. HLA haplotyping • Hypothesis: Whole-genome sequencing data contains the information necessary for accurate HLA typing • “HLA typing”: HLA gene exon sequences • HLA class I: exons 2 and 3 • HLA class II: exon 2 • Challenge: align reads to the right gene – homology hell. • Proper read-to-graph alignment instead of k-mers.
  • 9. Class I exon homology Exon 2 Exon 3 HLA-A 3284 alleles HLA-B 4077 alleles HLA-C 2799 alleles
  • 10. Approach: deep PRG + mapping
      Exonic MSA
        T*01:01  _ _ A C G T A C T _ _
        T*01:02  C A A C A T A C T _ _
        T*01:03  _ _ A C G C G C T _ _
        T*01:04  _ _ A T C C G C T A C
        T*01:05  _ _ A T C C C C T _ _
        T*01:06  _ _ _ C C T A C T _ _
      Genomic MSA
        T*01:01  A G C A _ _ A C G T A C T _ _ C C T A
        T*01:02  A C C A C A A C A T A C T _ _ C C T A
        T*01:04  _ T T A _ _ A T C C G C T A C C C T A
      8 xMHC reference haplotypes
        PGF (with T*01:01)   A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
        MANN (with T*01:04)  T T T _ T T A _ _ A T C C G C T A C C C T A T G A
      1) Gene-only PRG – 46 (pseudo) genes, mostly HLA
         [Figure: Gene 1, Gene 2 and Gene 3 separated by |--NNN--| spacers; each gene laid out as Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding; number of reference sequences along the gene, with the region covered by 'genomic' sequences marked]
      2) Varying numbers of input sequences across PRG
      3) Use hierarchical MSA approach to combine in
  • 11. Approach: deep PRG + mapping [Figure: a read aligned to PRG levels 1–26] 4) Seed-and-extend paired-end mapping to PRG 5) Likelihood-based inference: maximize L( aligned reads | HLA types ) (independently per locus)
  • 12. High-quality WGS data enables gold-standard accuracy (of note: 2/3 original discrepancies with validation data were errors in the validation data!)
  • 13. … but not from exome, MiSeq data
  • 15. Effective fragment length? [2 x read length + IS]
  • 16. Conclusion (intermediate) • If the input sequencing data is "good enough", we manage near-perfect haplotyping in the genome's most polymorphic region • Effective fragment length likely the most important factor • Not-so-good sequencing data: joint haplotyping + alignment (i.e. alignment location is not independent of inferred haplotype) • Read mapping implementation SLOW
  • 18. Pseudo graph mapping Input sequences Graph
  • 19. Pseudo graph mapping Input sequences Graph Align short reads to input sequences...
  • 20. Pseudo graph mapping Input sequences Graph Align short reads to input sequences... ... transpose onto graph
  • 21. Scrubbing, cutting, cleaning
      Input MSA         Lin. alignment    MSA coor.     Scrubbed
           123456789                      123456X789    123456789
      Seq1 AACAC_TTT    Seq1 AACAC_TTT    AACAC__TTT    AACAC_TTT
      Seq2 TTCACGTTT    Read AACACGTTT    AACAC_GTTT    AACACGTTT
      [Figure: graph built from the input MSA]
      Scrubbing: get rid of INDEL-induced changes in the alignment coordinate system
      Cutting: Examine alignment gap structure; cut in "bad" areas; use longest stretch
      Cleaning: Find the best gap-less sequence-to-graph alignment + extension with gaps
      Graph alignment
            123456789
      Graph AACACGTTT
      Seq1  AACACGTTT
  • 22. Accuracy slightly worse; fast! Conclusion: perhaps there is a middle ground between graph and linear sequence alignment. Work in progress. Further tuning?
      Highest Resolution
                                    MHC-PRG-2                        HLA*PRG
      Cohort         Locus  N    Inferred  Accuracy  Call Rate    Inferred  Accuracy  Call Rate
      Platinum Trio  A       6      6       1.00      1.00           6       1.00      1.00
      Platinum Trio  B       6      6       1.00      1.00           6       1.00      1.00
      Platinum Trio  C       6      6       1.00      1.00           6       1.00      1.00
      Platinum Trio  DQA1    6      6       1.00      1.00           6       1.00      1.00
      Platinum Trio  DQB1    6      6       1.00      1.00           6       1.00      1.00
      Platinum Trio  DRB1    6      6       1.00      1.00           6       1.00      1.00
      1000 Genomes   A      22     22       0.86      1.00          22       1.00      1.00
      1000 Genomes   B      22     22       1.00      1.00          22       1.00      1.00
      1000 Genomes   C      22     22       1.00      1.00          22       1.00      1.00
      1000 Genomes   DQA1   12     12       1.00      1.00          12       1.00      1.00
      1000 Genomes   DQB1   22     22       1.00      1.00          22       1.00      1.00
      1000 Genomes   DRB1   22     22       0.91      1.00          22       0.95      1.00
  • 23. Towards additional high-quality reference haplotypes… Remaining challenges: extreme repeats, haplotypes. Sergey Koren
  • 24. Ribosomal DNA • Encodes ribosomal RNA • Hundreds of copies (tandem repeat arrays) • Variation poorly characterized • Step 1: Targeted approach • Step 2: WGS-based • Step 3: Variation graph
  • 25. Read error vs variation … from whole-genome data? Long reads  de Bruijn graph Technology! 6% > 50k
  • 26. Summary • Variation graphs are worth the effort – at least in highly complex regions. • Evidence: MHC "model system" + overall improvement of genome inference accuracy + complex-locus haplotyping • Incorporate LD? • Middle ground between full graph alignment and linear sequence alignment? • Ribosomal DNA – let me know if you're also interested!
  • 27. Acknowledgements • NIH: Adam Phillippy, Sergey Koren, Brian Walenz, Jung-Hyun Kim, Vladimir Larionov • Oxford: Gil McVean, Zam Iqbal, Alexander Mentzer • Histogenetics: Nezih Cereb • UCSF/Nantes: Pierre-Antoine Gourraud • GSK: Matt Nelson, Charles Cox
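
Slides 19 to 21 describe pseudo graph mapping as aligning short reads to the linear input sequences and then transposing those alignments onto the graph. The transposition step might look roughly like the sketch below. It assumes that graph coordinates are simply MSA column indices (plausible for a PRG built from an MSA that preserves gap positions) and that the read aligns to the input sequence without gaps. The function names are hypothetical and are not taken from MHC-PRG-2. Reads with insertions relative to the input sequence, which the slides handle by scrubbing, are not covered.

```python
def msa_column_map(gapped_row, gap="_"):
    """For each ungapped position of an MSA row, return its 0-based MSA column."""
    return [col for col, base in enumerate(gapped_row) if base != gap]


def transpose_to_msa(gapped_row, read_start, read_length, gap="_"):
    """Map a gapless read alignment on the ungapped sequence to MSA columns.

    read_start is the 0-based offset of the read on the ungapped sequence.
    Assumption: graph levels correspond one-to-one to MSA columns.
    """
    columns = msa_column_map(gapped_row, gap)
    if read_start < 0 or read_start + read_length > len(columns):
        raise ValueError("read extends beyond the input sequence")
    return columns[read_start:read_start + read_length]


# Seq1 from the slide 21 example; its gap sits in MSA column 6 (1-based).
seq1 = "AACAC_TTT"
read_columns = transpose_to_msa(seq1, read_start=3, read_length=4)
print([c + 1 for c in read_columns])  # 1-based MSA columns covered by the read
```

The read's bases land on the columns either side of the gap column, so the alignment stays consistent with the graph's coordinate system even though it was computed against a linear sequence.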