Arang Rhie
Adam Phillippy’s Group
Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI
De Novo Assembly of Haplotype-Resolved Genomes and Building a Human Pan-Genome Reference
@ArangRhie
The genome assembly problem
The diploid genome assembly problem
Diploid genome
Smashed Assembly
Phased (haploid) assembly
phasing
?
De novo: From scratch, without looking at the original picture (reference)
Sequenced reads
sequencing assembling
Pseudo-haplotype + alts
Why assemble genomes again, de novo?
Asian-specific insertions and their frequencies, found in AK1
Under-Represented Variations in GRCh38
Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
Identify haplotype differences
A
B
• CYP2D6 is involved in metabolizing >50% of available drugs
• Genetic variation and copy number affect drug efficacy
CYP2D6*10: Intermediate ~ poor metabolizer
CYP2D6*2: Extensive metabolizer
Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
Chr. 22
Can we phase across whole chromosomes?
Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
Complete haplotype-resolved assemblies with trio binning
The diploid genome assembly problem
Diploid genome
Smashed Assembly
Phased (haploid) assembly
phasing
?
De novo: From scratch, without looking at the original picture (reference)
Sequenced reads
sequencing assembling
Complete haplotypes
The diploid genome assembly problem
Diploid genome
Paternal assembly
?
De novo: From scratch, without looking at the original picture (reference)
Phased reads
sequencing assembling
Phased reads
Maternal assembly
assembling
Trio binning with parental k-mers
Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
Paternal haplotigs
Maternal haplotigs
• K-mer profiling of each parent (Illumina, 60x): paternal k-mers, maternal k-mers
• K-mer profiling of the child (PacBio, 120x)
  Paternal reads: 49.6% (67.3x), 10.9 kb
  Maternal reads: 49.3% (66.9x), 11.7 kb
  Remaining reads: 1.1% (1.4x), avg 1.3 kb
• Child's read binning and assembling (canu)
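The binning step above can be illustrated with a toy sketch. This is not the Canu implementation: the function names (`kmers`, `parent_specific_kmers`, `bin_read`) are hypothetical, only the forward strand is considered, and no k-mer count filtering is applied. It assumes a read is assigned to whichever parent contributes more parent-specific k-mers, with hit counts scaled by the size of each parent's specific k-mer set, and is left unassigned on a tie.

```python
def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for brevity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def parent_specific_kmers(pat_reads, mat_reads, k):
    """Return (paternal-only, maternal-only) k-mer sets from parental reads."""
    pat = {km for read in pat_reads for km in kmers(read, k)}
    mat = {km for read in mat_reads for km in kmers(read, k)}
    return pat - mat, mat - pat


def bin_read(read, pat_only, mat_only, k):
    """Assign a child read to 'paternal', 'maternal', or 'unassigned'."""
    pat_hits = sum(1 for km in kmers(read, k) if km in pat_only)
    mat_hits = sum(1 for km in kmers(read, k) if km in mat_only)
    # Assumption: scale hits by set size so a parent with more
    # specific k-mers does not win by default.
    pat_score = pat_hits / len(pat_only) if pat_only else 0.0
    mat_score = mat_hits / len(mat_only) if mat_only else 0.0
    if pat_score > mat_score:
        return "paternal"
    if mat_score > pat_score:
        return "maternal"
    return "unassigned"


# Toy data: the parents differ only at the 3' end of the sequence.
pat_only, mat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTACGTCC"], k=4)
for child_read in ["TACGTAA", "TACGTCC", "ACGTACG"]:
    print(child_read, bin_read(child_read, pat_only, mat_only, k=4))
```

Each bin of reads would then be assembled independently, which is why the resulting haplotigs follow the parental haplotypes.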
Robust for a wide range of heterozygosity
Heterozygosity labels from the figure: 0.8%, 1.2%, 1.6%, 0.9%, 1.5%, 0.12%, 0.20%, 0.29%, 0.17%
*Heterozygosity level estimated with GenomeScope

Sample               Platform            Haplotype  Coverage  NG50 (Mb)
NA12878 (CEU) F      PacBio (WashU)      Maternal   32+9x     1.2
                                         Paternal   31+9x     1.2
HG00733 (PUR) F      PacBio 60kb (20kb)  Maternal   44.6x     19.1
                                         Paternal   43.6x     23.9
NA19240 (YRI) F      PacBio (WashU)      Maternal   37x       9.0
                                         Paternal   31x       3.0
HG002 (Ashkenazi) M  PacBio 15kb CCS     Maternal   11+8x     20.1
                                         Paternal   11+8x     16.8
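For readers unfamiliar with the NG50 metric reported above: it is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the estimated genome size. A minimal sketch follows. The function name `ng50` and the toy contig lengths are illustrative assumptions, not values from the slides.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if the contigs cover < 50% of the genome.

    contig_lengths: iterable of contig lengths (any consistent unit)
    genome_size:    estimated genome size in the same unit
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy example: five contigs against an assumed genome size of 200.
print(ng50([50, 40, 30, 20, 10], genome_size=200))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the genome size, so assemblies of different total length remain comparable.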
A nearly perfect diploid genome
125x PacBio coverage (~60x per haplotype), TrioCanu haplotig NG50 ~70 Mbp, BUSCOs 94%
Maternal (yak), Paternal (highland), Esperanza
GRCh38
Human Pan-Genome Project
Population: http://www.internationalgenome.org/
Initiative to collect diverse, high-quality haplotypes with trio binning
• Illumina WGS for the parents, PacBio and Nanopore for the child
• Pilot 10 trios selected to maximize non-ref haplotype AF
2 PUR
1 KHV
3 ACB
1 MSL
1 PJL
1 GWD
1 CLM
5 African
3 American
1 East Asian
1 South Asian
What can you see from a phased assembly?
Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
Phasing the MHC region
Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
Maternal
Paternal
Summary
• Diploid assembly is solved by trios
Trio binning is the current best practice
All levels of assembly quality improved
Complete haplotypes will become the new norm
• A human pan-genome reference
A collection of diverse, high-quality haplotypes
Including complex heterozygous SVs
VGP GenomeArk: 1st data release
https://vgp.github.io/genomeark
Jennifer Vashon of Maine Department of Inland Fisheries and
Wildlife, left, and UMass lynx team coordinator, Tanya Lama,
with an adult male lynx from northern Maine whose DNA was
used to create first-ever whole genome for the species. The
lynx has since been released to the wild. (MassWildlife photo
/ Bill Byrne)
Acknowledgements
genomeinformatics.github.io
• Adam Phillippy
• Sergey Koren
• Brian Walenz
• Alexander Dilthey
• Brian Ondov
• Jay Ghurye
Korean (AK1)
Jeong-Sun Seo
Changhoon Kim
Junsoo Kim
Sangjin Lee
Tim Smith
John Williams
Cattle/pigs
Pan-Genome
Karen Miga
Benedict Paten
NIH NHGRI NISC
VGP Assembly
Working Group
Erich Jarvis
Richard Durbin
Gene Myers
Kerstin Howe
Harris Lewin
Olivier Fedrigo
Shane McCarthy
Martin Pippel
Will Chow
Joana Damas
PacBio CCS
Michael Hunkapiller
Paul Peluso
David Rank
We are hiring!
Trio binning is available in https://github.com/marbl/canu
Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
Pseudo-haplotype + alts
Complete haplotypes
Assembly Graph
Smashed haplotypes
Trio-binning outperforms FALCON-Unzip
Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
Primary = Longest path in the graph (pseudo-hap)
Alternate haplotigs = Alternate path in the bubble
Haplotigs = Contigs in each assembly
agree with parental haplotypes (Phased)
[Figure panels: TrioCanu, FALCON-Unzip. Axes: Angus-specific k-mer counts, Brahman-specific k-mer counts.]
Phasing NA12878
Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
TrioCanu, FALCON-Unzip, Supernova
Phasing the F1 Cattle
Kronenberg and Kingan et al.,
FALCON-Phase: Integrating PacBio and Hi-C data for phased diploid genomes, bioRxiv (2018)
[Figure panels: TrioCanu, FALCON-Unzip, FALCON-Phase. Axes: Brahman, Angus (0 to 3,000,000). Legends: Contig Size; Contig (Hap1, Hap2); Assembly (Angus, Brahman).]

2018 1016 trio_binning_ashg_arhie_final

  • 1. Arang Rhie Adam Phillippy’s Group Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI De Novo Assembly of Haplotype-Resolved Genomes and Building a Human Pan-Genome Reference @ArangRhie
  • 3. The diploid genome assembly problem Diploid genome Smashed Assembly Phased (haploid) assembly phasing ? De novo: From scratch, without looking at the original picture (reference) Sequenced reads sequencing assembling Pseudo-haplotype + alts
  • 4. Why assemble genomes again, de novo?
  • 5. Asian specific insertions and the frequency, found from AK1 Under-Represented Variations in GRCh38 Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  • 6. Identify haplotype differences A B • CYP2D6 is involved in metabolizing >50% of available drugs • Genetic variation and copy number affects drug efficacy CYP2D6*10: Intermediate ~ poor metabolizer CYP2D6*2: Extensive metabolizer Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016) Chr. 22
  • 7. Can we phase across the whole chromosomes? Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  • 9. The diploid genome assembly problem Diploid genome Smashed Assembly Phased (haploid) assembly phasing ? De novo: From scratch, without looking at the original picture (reference) Sequenced reads sequencing assembling Complete haplotypes
  • 10. The diploid genome assembly problem Diploid genome Paternal assembly ? De novo: From scratch, without looking at the original picture (reference) Phased reads sequencing assembling Phased reads Maternal assembly assembling
  • 11. Trio binning with parental k-mers Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018) Paternal haplotigs Maternal haplotigs • K-mer profiling of each parent (Illumina, 60x) Paternal k-mers Maternal k-mers • K-mer profiling of the child (PacBio, 120x) Child Paternal Maternal 49.6% (67.3x) 10.9 kb 49.3% (66.9x) 11.7 kb 1.1% (1.4x), avg 1.3 kb Paternal reads Maternal reads • Child’s read binning and assembling (Canu)
  • 12. Robust for a wide range of heterozygosity (*heterozygosity level estimated with GenomeScope). Heterozygosity values shown in the figure: 0.8%, 1.2%, 1.6%, 0.9%, 1.5%, 0.12%, 0.20%, 0.29%, 0.17%. Samples (platform; haplotype coverage; NG50 in Mb, maternal / paternal): NA12878 (CEU), F: PacBio (WashU); maternal 32+9x, paternal 31+9x; NG50 1.2 / 1.2. HG00733 (PUR), F: PacBio 60kb (20kb); maternal 44.6x, paternal 43.6x; NG50 19.1 / 23.9. NA19240 (YRI), F: PacBio (WashU); maternal 37x, paternal 31x; NG50 9.0 / 3.0. HG002 (Ashkenazi), M: PacBio 15kb CCS; maternal 11+8x, paternal 11+8x; NG50 20.1 / 16.8.
  • 13. A nearly perfect diploid genome 125x PacBio coverage (~60x per haplotype), TrioCanu haplotig NG50 ~70 Mbp, BUSCOs 94% Maternal (yak) Paternal (highland) Esperanza GRCh38
  • 14. Human Pan-Genome Project Population: http://www.internationalgenome.org/ Initiative to collect diverse, high-quality haplotypes with trio binning • Illumina WGS for the parents, PacBio and Nanopore for the child • Pilot 10 trios selected to maximize non-ref haplotype AF 2 PUR 1 KHV 3 ACB 1 MSL 1 PJL 1 GWD 1 CLM 5 African 3 American 1 East Asian 1 South Asian
  • 15. What can you see from a phased assembly? Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  • 16. Phasing the MHC region Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018) Maternal Paternal
  • 17. Summary • Diploid assembly is solved by trios: trio binning is current best practice; all levels of assembly quality improved; complete haplotypes will become the new norm • A human pan-genome reference: a collection of diverse, high-quality haplotypes, including complex heterozygous SVs
  • 18. VGP GenomeArk: 1st data release https://vgp.github.io/genomeark Jennifer Vashon of Maine Department of Inland Fisheries and Wildlife, left, and UMass lynx team coordinator, Tanya Lama, with an adult male lynx from northern Maine whose DNA was used to create first-ever whole genome for the species. The lynx has since been released to the wild. (MassWildlife photo / Bill Byrne)
  • 19. Acknowledgements genomeinformatics.github.io: Adam Phillippy, Sergey Koren, Brian Walenz, Alexander Dilthey, Brian Ondov, Jay Ghurye. Korean (AK1): Jeong-Sun Seo, Changhoon Kim, Junsoo Kim, Sangjin Lee. Cattle/pigs: Tim Smith, John Williams. Pan-Genome: Karen Miga, Benedict Paten. NIH NHGRI NISC. VGP Assembly Working Group: Erich Jarvis, Richard Durbin, Gene Myers, Kerstin Howe, Harris Lewin, Olivier Fedrigo, Shane McCarthy, Martin Pippel, Will Chow, Joana Damas. PacBio CCS: Michael Hunkapiller, Paul Peluso, David Rank. We are hiring! Trio binning is available in https://github.com/marbl/canu
  • 21. Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018) Pseudo-haplotype + alts Complete haplotypes Assembly Graph Smashed haplotypes
  • 22. Trio-binning outperforms FALCON-Unzip Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018) Primary = Longest path in the graph (pseudo-hap) Alternate haplotigs = Alternate path in the bubble Haplotigs = Contigs in each assembly agree with parental haplotypes (Phased) [Figure panels: TrioCanu, FALCON-Unzip; axes: Angus specific k-mer counts, Brahman specific k-mer counts]
  • 23. Phasing NA12878 Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018) TrioCanu FALCON-Unzip Supernova
  • 24. Phasing the F1 Cattle Kronenberg and Kingan et al., FALCON-Phase: Integrating PacBio and Hi-C data for phased diploid genomes, bioRxiv (2018) [Figure panels: TrioCanu, FALCON-Unzip, FALCON-Phase; axes: Brahman, Angus; legends: Contig Size, Contig (Hap1, Hap2), Assembly (Angus, Brahman)]
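
The binning step described on slide 11 (parental k-mer profiling, then assigning each child read to a parent) can be illustrated with a toy sketch. This is not the Canu implementation: the function names (`canonical_kmers`, `parent_specific_kmers`, `bin_read`) are hypothetical, the reads are made-up strings, and the rule shown here simply compares raw counts of parent-specific k-mers, whereas the published method may apply additional filtering and normalization.

```python
# Toy illustration of trio binning with parental k-mers.
# All names and data here are hypothetical; this is a sketch, not Canu's code.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers in seq.

    A canonical k-mer is the lexicographically smaller of the k-mer and its
    reverse complement, so both strands map to the same key.
    """
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        out.add(min(kmer, reverse_complement(kmer)))
    return out


def parent_specific_kmers(paternal_reads, maternal_reads, k):
    """Return (paternal-only, maternal-only) k-mer sets."""
    pat = set()
    for read in paternal_reads:
        pat |= canonical_kmers(read, k)
    mat = set()
    for read in maternal_reads:
        mat |= canonical_kmers(read, k)
    return pat - mat, mat - pat


def bin_read(read, pat_only, mat_only, k):
    """Assign a child read to 'paternal', 'maternal', or 'unclassified'.

    Assumption: a plain comparison of raw parent-specific k-mer counts.
    """
    kmers = canonical_kmers(read, k)
    n_pat = len(kmers & pat_only)
    n_mat = len(kmers & mat_only)
    if n_pat > n_mat:
        return "paternal"
    if n_mat > n_pat:
        return "maternal"
    return "unclassified"


if __name__ == "__main__":
    k = 4
    pat_only, mat_only = parent_specific_kmers(["AAAACCCC"], ["AAAAGGAA"], k)
    for child_read in ["AAACCC", "AAGGA", "AAAA", "GGGTTT"]:
        print(child_read, bin_read(child_read, pat_only, mat_only, k))
```

Reads that carry no parent-specific k-mers, or equal numbers from each parent, stay unclassified in this sketch; each bin would then be assembled separately to produce the paternal and maternal haplotigs.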

Editor's Notes

  1. Before phasing, short reads indicated a copy gain in CYP2D6. After phasing, we identified that the duplicated copy of CYP2D6 was fused with the last exon of CYP2D7 on haplotype B.
  2. ref allele = #1 weight by non-ref allele’s global AF
  3. Black are typed genes: correct call for both haplotypes, all in phase. 1 indel in DQB1. Confirms the expected missing DRB3 in the mother and its presence in the father, but also shows that other sequence is present there, so it is not a simple deletion.