2018 1016 trio_binning_ashg_arhie_final


  1. De Novo Assembly of Haplotype-Resolved Genomes and Building a Human Pan-Genome Reference
     Arang Rhie (@ArangRhie), Adam Phillippy’s Group, Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI
  2. The genome assembly problem
  3. The diploid genome assembly problem
     De novo: from scratch, without looking at the original picture (reference)
     Diagram labels: Diploid genome; sequencing; Sequenced reads; assembling; Smashed assembly; phasing?; Phased (haploid) assembly; Pseudo-haplotype + alts
  4. Why assemble genomes again, de novo?
  5. Under-represented variations in GRCh38
     Asian-specific insertions and their frequencies, found from AK1
     Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  6. Identify haplotype differences (haplotypes A and B, Chr. 22)
     • CYP2D6 is involved in metabolizing >50% of available drugs
     • Genetic variation and copy number affect drug efficacy
     • CYP2D6*10: intermediate to poor metabolizer
     • CYP2D6*2: extensive metabolizer
     Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  7. Can we phase across whole chromosomes?
     Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  8. Complete haplotype-resolved assemblies with trio binning
  9. The diploid genome assembly problem
     De novo: from scratch, without looking at the original picture (reference)
     Diagram labels: Diploid genome; sequencing; Sequenced reads; assembling; Smashed assembly; phasing?; Phased (haploid) assembly; Complete haplotypes
  10. The diploid genome assembly problem
      De novo: from scratch, without looking at the original picture (reference)
      Diagram labels: Diploid genome; sequencing; Phased reads (two sets); assembling (once per set); Paternal assembly; Maternal assembly
  11. Trio binning with parental k-mers
      • K-mer profiling of each parent (Illumina, 60x): paternal k-mers, maternal k-mers
      • K-mer profiling of the child (PacBio, 120x)
      • Child's read binning and assembling (Canu): paternal reads and maternal reads, assembled into paternal haplotigs and maternal haplotigs
      Read bins shown (figure labels: Child, Paternal, Maternal): 49.6% (67.3x), 10.9 kb; 49.3% (66.9x), 11.7 kb; 1.1% (1.4x), avg 1.3 kb
      Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  12. Robust for a wide range of heterozygosity
      Heterozygosity levels shown (estimated with GenomeScope): 0.8%, 1.2%, 1.6%, 0.9%, 1.5%, 0.12%, 0.20%, 0.29%, 0.17%
      Table rows, in the order extracted from the slide:
      Sample: NA12878 (CEU) F | HG00733 (PUR) F | NA19240 (YRI) F | HG002 (Ashkenazi) M
      Platform: PacBio (WashU) | PacBio 60kb (20kb) | PacBio (WashU) | PacBio 15kb CCS
      Haplotype (Cov.): Maternal (32+9x), Paternal (31+9x) | Maternal (44.6x), Paternal (43.6x) | Maternal (37x), Paternal (31x) | Maternal (11+8x), Paternal (11+8x)
      NG50 (Mb): 1.2, 1.2 | 19.1, 23.9 | 9.0, 3.0 | 20.1, 16.8
  13. A nearly perfect diploid genome
      125x PacBio coverage (~60x per haplotype), TrioCanu haplotig NG50 ~70 Mbp, BUSCOs 94%
      Figure labels: Esperanza; Maternal (yak); Paternal (highland); GRCh38
  14. Human Pan-Genome Project
      Initiative to collect diverse, high-quality haplotypes with trio binning
      • Illumina WGS for the parents, PacBio and Nanopore for the child
      • Pilot 10 trios selected to maximize non-ref haplotype AF
      Trios by population: 2 PUR, 1 KHV, 3 ACB, 1 MSL, 1 PJL, 1 GWD, 1 CLM
      Trios by super-population: 5 African, 3 American, 1 East Asian, 1 South Asian
      Population: http://www.internationalgenome.org/
  15. What can you see from a phased assembly?
      Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  16. Phasing the MHC region
      Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
      Figure panels: Maternal, Paternal
  17. Summary
      • Diploid assembly is solved by trios
        Trio binning is current best practice
        All levels of assembly quality improved
        Complete haplotypes will become the new norm
      • A human pan-genome reference
        A collection of diverse, high-quality haplotypes
        Including complex heterozygous SVs
  18. VGP GenomeArk: 1st data release
      https://vgp.github.io/genomeark
      Photo caption: Jennifer Vashon of Maine Department of Inland Fisheries and Wildlife, left, and UMass lynx team coordinator, Tanya Lama, with an adult male lynx from northern Maine whose DNA was used to create the first-ever whole genome for the species. The lynx has since been released to the wild. (MassWildlife photo / Bill Byrne)
  19. Acknowledgements
      genomeinformatics.github.io: Adam Phillippy, Sergey Koren, Brian Walenz, Alexander Dilthey, Brian Ondov, Jay Ghurye
      Korean (AK1): Jeong-Sun Seo, Changhoon Kim, Junsoo Kim, Sangjin Lee
      Cattle/pigs: Tim Smith, John Williams
      Pan-Genome: Karen Miga, Benedict Paten
      NIH NHGRI NISC
      VGP Assembly Working Group: Erich Jarvis, Richard Durbin, Gene Myers, Kerstin Howe, Harris Lewin, Olivier Fedrigo, Shane McCarthy, Martin Pippel, Will Chow, Joana Damas
      PacBio CCS: Michael Hunkapiller, Paul Peluso, David Rank
      We are hiring!
      Trio binning is available in https://github.com/marbl/canu
  20. Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
      Figure labels: Assembly Graph; Smashed haplotypes; Pseudo-haplotype + alts; Complete haplotypes
  21. Trio-binning outperforms FALCON-Unzip
      Primary = Longest path in the graph (pseudo-hap)
      Alternate haplotigs = Alternate path in the bubble
      Haplotigs = Contigs in each assembly agree with parental haplotypes (Phased)
      Figure panels: TrioCanu, FALCON-Unzip; axes: Angus-specific k-mer counts, Brahman-specific k-mer counts
      Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  22. Phasing NA12878
      Koren and Rhie et al, De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
      Figure panels: TrioCanu, FALCON-Unzip, Supernova
  23. Phasing the F1 Cattle
      Kronenberg and Kingan et al., FALCON-Phase: Integrating PacBio and Hi-C data for phased diploid genomes, bioRxiv (2018)
      [Figure: panels TrioCanu, FALCON-Unzip, FALCON-Phase; axes Brahman and Angus (0 to 3,000,000); legends Contig Size (20,000,000 to 80,000,000), Contig / Hap1 / Hap2, Assembly / Angus / Brahman]
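
The read-binning idea on slide 11 (profile k-mers in each parent, then sort the child's long reads by which parent's k-mers they carry) can be sketched in Python. This is a toy illustration under stated assumptions, not the TrioCanu implementation: the function names, the tiny k, and the plain "more paternal-only than maternal-only k-mers" rule are all hypothetical, and details a production tool would need (canonical k-mers, filtering of sequencing-error k-mers, score normalization) are left out.

```python
from typing import Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hap_specific_kmers(
    paternal_reads: Iterable[str], maternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Return (paternal-only, maternal-only) k-mer sets.

    Assumption: a k-mer is haplotype-specific if it is seen in one
    parent's reads and never in the other's.
    """
    pat: Set[str] = set()
    mat: Set[str] = set()
    for read in paternal_reads:
        pat |= kmers(read, k)
    for read in maternal_reads:
        mat |= kmers(read, k)
    return pat - mat, mat - pat


def bin_read(read: str, pat_only: Set[str], mat_only: Set[str], k: int) -> str:
    """Assign a child read to 'paternal', 'maternal', or 'unassigned'."""
    read_kmers = kmers(read, k)
    pat_hits = len(read_kmers & pat_only)
    mat_hits = len(read_kmers & mat_only)
    if pat_hits > mat_hits:
        return "paternal"
    if mat_hits > pat_hits:
        return "maternal"
    return "unassigned"  # no informative k-mers, or a tie


if __name__ == "__main__":
    # Hypothetical toy sequences that differ at a single base.
    pat_only, mat_only = hap_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
    for child_read in ["CGTACGT", "CGTTCGT", "ACGT"]:
        print(child_read, bin_read(child_read, pat_only, mat_only, k=4))
```

Each bin would then be assembled separately, which is what yields one set of haplotigs per parent; reads without informative k-mers land in the small unassigned bin.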

Editor's Notes

  • Before phasing, short reads indicated a copy gain in CYP2D6
    After phasing, we identified that the duplicated copy of CYP2D6 was fused with the last exon of CYP2D7 on haplotype B
  • ref allele = #1
    weight by non-ref allele’s global AF
  • Genes shown in black are typed genes, with correct calls for both haplotypes, all in phase. There is 1 indel in DQB1. The assembly confirms the expected absence of DRB3 in the mother and its presence in the father, but also shows that other sequence is present there, so it is not a simple deletion.
