Ashg grc workshop2014_tg

  1. ASHG - GRC Workshop. Tina Lindsay, ASHG, Oct 18, 2014
  2. The Human Reference is Not Complete
     • Reference has been found to not be optimal in some regions
     • Structural variation makes it difficult to assemble a truly representative genome when using a diploid sample
     • Some regions were recalcitrant to closure with technology and resources available at the time
     • Additional sequences are needed to capture the full range of diversity in humans
  3. UGT2B17 – Conflicting Alleles (Xue Y et al., 2008)
     • NCBI36 NC_000004.10 (chr4) tiling path: AC074378.4, AC079749.5, AC147055.2, AC134921.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7 (genes: TMPRSS11E, TMPRSS11E2)
     • GRCh37 NC_000004.11 (chr4) tiling path: AC074378.4, AC079749.5, AC147055.2, AC134921.1, AC093720.2, AC021146.7 (gene: TMPRSS11E)
     • GRCh37 NT_167250.1 (UGT2B17 alternate locus): AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7 (gene: TMPRSS11E2)
     • Figure label: GAP
  4. Allelic Diversity vs. Segmental Duplication
     • Diploid genome (figure: repeat copies, noted by color difference, and allelic copies): with a diploid genome, there is significant ambiguity sorting allelic copies from repeat copies
     • Haploid genome (figure: repeat copies only, noted by color difference): with a haploid genome, allelic differences are eliminated, and base differences are likely indicative of repeat copies
  5. Hydatidiform mole
     1. Fertilization of an oocyte without a nucleus
     2. Post-zygotic diploidization of triploid zygotes
     (Figure labels: 23X, Oocyte, Androgenetic HM)
  6. Initial Use Of CHM1 Source
     • CHORI-17 BAC Library
     • CHORI-17 BAC end sequences (n=325,659)
     • CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)
     • CHORI-17 BACs
       • >750 have been sequenced
       • 590 of them in GenBank as phase 3
  7. SRGAP2: Homology between genes (Dennis et al., 2012)
     • Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs
     • Shows homology between SRGAP2B and SRGAP2C
     (Figure labels: SRGAP2A, SRGAP2B, SRGAP2C)
  8. 1q21 patch alignment to chromosome 1 (figure labels: 1q21, 1q32, 1q21, 1p21)
  9. IGH Region Highlights Allelic Differences (Watson et al., 2013)
  10. Williams-Beuren Syndrome region (slide courtesy of Megan Dennis)
  11. Current status of CHM1 resources
     • CHORI-17 BAC Library (created from CHM1 cell line)
     • CHORI-17 BAC end sequences (n=325,659)
     • CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)
     • CHORI-17 BACs (>750 have been sequenced, with 592 of them in GenBank as phase 3)
     • Active cell line
     • >100X coverage Illumina 100 bp reads
       • 300, 500 bp, and 3 kb inserts
     • Reference-assisted assembly CHM1_1.1
     • BioNano genome map
     • >50X coverage of PacBio long read data
  12. CHM1_1.1 Assembly
     • Reference-guided assembly (SRPRISM v2.3, R. Agarwala)
     • Alignment of Illumina reads to GRCh37 primary assembly
     • CHORI-17 BAC clone tilepaths were then incorporated
       • 428 total clones
       • 324 clones in 45 tilepaths
       • 104 clones as singletons
     • Comparison back to GRCh37 reference to provide appropriate gap sizes
     • Assembly submitted to GenBank
       • http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2
     • Paper to be published soon
       • Genome Research (in press)
       • bioRxiv doi: http://dx.doi.org/10.1101/006841
  13. CHM1_1.1 Assembly
     • Total Sequence Length: 3,037,866,619 bp
     • Total Assembly Gap Length: 210,229,812 bp
     • Number of Scaffolds: 163
     • Scaffold N50: 50,362,920 bp
     • Number of Contigs: 40,828
     • Contig N50: 143,936 bp
     (Figure labels: CHM1_1.1, GRCh37)
  14. Incorporation of CHM1_1.1 Assembly Data in GRCh38
  15. PacBio CHM1 Assembly Potentially Fills GRCh38 Gaps (figure tracks: GRCh38, PacBio CHM1)
  16. PacBio CHM1 Assembly Shows Data Not in GRCh38 (figure tracks: GRCh38, PacBio CHM1, Second Pass Alignment)
  17. CHM1 BioNano Genome Map Aligned to GRCh38 (figure labels: GRCh38; CHM1 BioNano Map; ~15 kb additional data)
  18. BioNano SV Calls Identified Assembly Problems (figure labels: Collapse; Expansion in Assembly; Gap in Sequence; CHM1_1.1 Assembly; CHM1 BioNano Map)
  19. Collapse in Sequence Data: thought to be missing ~100 kb in sequenced clones (figure label: GRCh38)
  20. Gap Sizing: Chr8 stalled gap estimated at ~150 kb in GRCh38; sized using CHM1 genome map at >500 kb
  21. Future of CHM1 Assembly
     • Plan to make as contiguous and accurate as possible
     • Incorporate PacBio assembly where possible
     • Additional CH17 clones being sequenced through segmentally duplicated and structurally variant regions to provide local assembly benefits (isolates the repeats)
  22. CYP2D6 – Providing Alternate Alleles: ABC7 (NA18517), ABC8 (NA18507), ABC9 (NA18956), ABC11 (NA18555)
  23. Future Directions
     • Continued improvement on CHM1 genome
       • Integration of Pacific Biosciences whole genome assembly
       • BioNano genome map data
     • Continue to add diversity to the reference by sequencing new samples that provide diversity beyond what is currently represented in GRCh38
     • Continued sequencing of CH17 single haplotype BAC tilepaths to better represent segmentally duplicated regions
     • Additional collaborations with the community to develop tools to more fully utilize the full reference assembly (alternate haplotypes)
  24. Acknowledgements
     • The Genome Institute at Washington University in St. Louis: Rick Wilson, Bob Fulton, Wes Warren, Karyn Meltz Steinberg, Vince Magrini, Derek Albracht, Milinn Kremitzki, Susan Rock, Debbie Scheer, Aye Wollam, The Finishing and Bioinformatics Teams at The Genome Institute
     • University of Washington: Evan Eichler, Megan Dennis, Xander Nuttle
     • NCBI: Richa Agarwala, Valerie Schneider
     • University of Pittsburgh School of Medicine (CHM1 cell line): Urvashi Surti
     • Personalis: Deanna Church
     • BioNano Genomics
     • Pacific Biosciences
     • UCSF: Pui-Yan Kwok, Yvonne Lai, Chin Lin
     • CHORI: Catherine Chu, Pieter de Jong
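
Slide 13 reports scaffold and contig N50 values and a total gap length for CHM1_1.1. As a minimal sketch of how such figures are usually derived, the Python below computes an N50 from a list of sequence lengths and the ungapped length from the two totals on the slide. The function name `n50` and the toy length list are illustrative assumptions; the slides do not say which tool produced the reported statistics.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)

# Totals as reported on slide 13 for CHM1_1.1.
total_sequence_length = 3_037_866_619
total_gap_length = 210_229_812
ungapped_length = total_sequence_length - total_gap_length

print(toy_n50)
print(ungapped_length)
```

For the toy list the total is 290, so the running sum first reaches half (145) at the second-longest sequence. Applied to real data, the input would be the per-scaffold or per-contig lengths of the assembly.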
