Ashg2017 workshop schneider

  1. 1. GRC/GIAB Workshop: Getting the Most from the Reference Assembly and Reference Materials. Oct 17, 1-4 pm. Agenda: Valerie Schneider (NCBI): GRCh38 assembly basics and updates; Tina Lindsay (MGI): Reference-grade human assemblies; Karen Miga (UCSC): Centromere assemblies; BREAK (15 min); Benedict Paten (UCSC): Building human variation graphs; Fritz Sedlazeck (BCM): Structural Variation Characterization Across the Human Genome and Populations; Justin Zook (NIST): GIAB benchmarks for difficult variants
  2. 2. GRCh38 assembly basics and updates Valerie Schneider, Ph.D. NCBI 17 October 2017 https://genomereference.org
  3. 3. https://genomereference.org Twitter: @GenomeRef Announcements: grc-announce@ncbi.nlm.nih.gov
  4. 4. Outline • Assembly basics • GRCh38 updates • Taking advantage of the data
  5. 5. Assembly Basics
  6. 6. Reference Assembly Basics (For updated assemblies, only date of initial submission is counted) Other assemblies GRCh38 (reference)
  7. 7. Reference Assembly Basics: Sanger-sequenced, clone-based assembly. Clone workflow (figure labels): BAC insert, BAC vector; shotgun sequence clone; assemble clone; gaps; finish (via PCR). Minimal clone tiling path: define consensus from switch points of adjacent clones. Ordering the path: fingerprint maps, genetic linkage maps, radiation hybrid maps. Consequences: • Highly contiguous • High sequence accuracy (error rate <10^-5) • Haploid mosaic
  8. 8. Reference Assembly Basics. Coverage: the likelihood a base is sequenced (Lander and Waterman (1988) Genomics). 1X coverage: 63% sequenced, 37% not sequenced; 5X coverage: 99.4% sequenced, 0.6% not sequenced; 10X coverage: 99.995% sequenced, 0.005% not sequenced. Contig N50: a measure of contiguity; half of the assembly is in contigs of this length or greater. [Figure: contig N50 of HuRef, SOAPdenovo NA12878, ALLPATHS NA12878, MHAP CHM1, AK1, HX1, and NA12878_prelim; Chaisson and Eichler (2015), with modification]
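The two quantities on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the method used to build the slide: it assumes the usual Poisson approximation associated with Lander and Waterman, in which the chance that a base is missed at mean coverage c is e^-c, and it uses a made-up list of contig lengths for N50. The function names `fraction_sequenced` and `contig_n50` are hypothetical.

```python
import math


def fraction_sequenced(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Assumes reads land uniformly at random, so the chance a base is
    missed at mean coverage c is exp(-c) (Poisson approximation).
    """
    return 1.0 - math.exp(-coverage)


def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths.

    Contigs are walked from longest to shortest; N50 is the length of
    the contig at which the running total first reaches half of the
    total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    for c in (1, 5, 10):
        print(f"{c}X coverage: {fraction_sequenced(c):.3%} of bases sequenced")

    # Illustrative contig lengths only, not taken from any real assembly.
    print("N50:", contig_n50([80, 70, 50, 40, 30, 20]))
```

The computed fractions agree with the slide's percentages up to rounding. For the toy contig list, the total is 290 and the two longest contigs (80 + 70) already hold more than half of it, so the N50 is 70.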
  9. 9. Why all this matters: Longer haplotype blocks Fewer collapsed repeats & segmental duplications Better annotation More robust mapping target Reference Assembly Basics
  10. 10. Today’s reference assembly does not represent: 1. The most common allele/haplotype 2. The longest allele/haplotype 3. The ancestral allele/haplotype. It represents the sequence available from the HGP. Reference Assembly Basics
  11. 11. Reference Assembly Basics: Reference assembly influence. [Figure labels: Sample, Ref Assembly, Gene1, Gene2, chromosome, variant] 75% off-target alignments, 25% no alignment. PLoS Biology (Jul 5, 2011). Slide credit: Deanna Church
  12. 12. Reference Assembly Basics. Sequences from haplotype 1 and sequences from haplotype 2. Original assembly model: compress into a consensus (chromosome, false gap). Current assembly model: represent both haplotypes (chromosome, alt loci scaffold). [Figure labels: Sample, Reference, Gene1, Gene2, chromosome, alt scaffold, many]
  13. 13. GRCh38 (Dec. 2013) • 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb) • 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes • Average alt length = 400 kb, max = ~5 Mb • >150 genes only represented on alt loci Reference Assembly Basics
  14. 14. Reference Assembly Basics: GRCh38 • Closed gaps • Targeted base fixes • Corrected path errors • Addition of missing paralogs • Better representation of variation • Better annotation • Modeled centromeres. Reference: Genome Research 27(5):849-864 (2017), PubMed: 28396521. • Changed coordinates • Remapping challenges • Alt Loci Usability • Allelic duplication/Aligners • Reporting multiple locations • Variant analysis • Clinical validation. [Figure: 2016 growth in SRA submission over prior year, GRCh38 and GRCh37]
  15. 15. Outline • Assembly basics • GRCh38 updates • Taking advantage of the data
  16. 16. GRCh38 Updates GRCh38: Dec. 2013 (n=1797) (n=1396) (n=401)
  17. 17. GRCh38 Updates (rare allele analysis)
  18. 18. GRCh38 Updates. Patch release: no change to chromosome coordinates. Assembly nomenclature: GRCh38.p$. [Figure labels: chromosome, fix patch scaffold, novel patch scaffold, alt loci scaffold] GRCh38.p11: • 64 FIX, 59 NOVEL • Added >1.5 Mb novel sequence • >20 genes affected
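Because a patch release leaves chromosome coordinates unchanged, it can be useful to split an assembly name into its major release and patch number before comparing coordinate systems. The following is a rough sketch only: it assumes names follow the form shown on the slide (for example `GRCh38.p11`, or `GRCh38` with no patch suffix), and `AssemblyName`, `parse_assembly_name`, and `same_coordinates` are hypothetical helpers, not part of any GRC or NCBI tool.

```python
import re
from typing import NamedTuple, Optional


class AssemblyName(NamedTuple):
    major: int            # e.g. 38 for GRCh38
    patch: Optional[int]  # e.g. 11 for GRCh38.p11, None if no patch suffix


# Assumed name format: "GRCh<major>" optionally followed by ".p<patch>".
_NAME_PATTERN = re.compile(r"^GRCh(\d+)(?:\.p(\d+))?$")


def parse_assembly_name(name: str) -> AssemblyName:
    """Split a name such as 'GRCh38.p11' into major release and patch."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    major = int(match.group(1))
    patch = int(match.group(2)) if match.group(2) is not None else None
    return AssemblyName(major, patch)


def same_coordinates(a: str, b: str) -> bool:
    """True when two names share a major release.

    Relies on the slide's statement that patch releases do not change
    chromosome coordinates.
    """
    return parse_assembly_name(a).major == parse_assembly_name(b).major


if __name__ == "__main__":
    print(parse_assembly_name("GRCh38.p11"))
    print(same_coordinates("GRCh38", "GRCh38.p11"))
    print(same_coordinates("GRCh37", "GRCh38"))
```

Comparing only the major release is a deliberate simplification: it covers chromosome coordinates, while patch scaffolds themselves differ between patch releases.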
  19. 19. GRCh38 Updates. GRCh38: 5S rRNA cluster under-represented (19 copies). GRCh38 patch: 5S rRNA cluster valid representation (35 copies). Poster 423F (11:30-12:30): Updates to the human reference genome assembly, Tayebeh Rezaie
  20. 20. GRCh38 Updates • Ideals: • Chromosome context for any common human sequence >500 bp • Unambiguous data interpretation at all clinically relevant loci • No systematic error/bias in genome-wide analyses • Real-World: • Community interest • Resources for curation • GRCh39 • Substantial added value • User must-haves
  21. 21. Outline • Assembly basics • GRCh38 & updates • Taking advantage of the data
  22. 22. Accessing the Data Assembly Stats https://genomereference.org
  23. 23. Accessing the Data
  24. 24. Accessing the Data
  25. 25. Accessing the Data https://www.ncbi.nlm.nih.gov/genome/gdv/ Learn more about GDV: Data CoLab #159 Weds 10:30-11:00 Poster 1531W Weds 2:00-3:00
  26. 26. Accessing the Data Assembly Support Track Set
  27. 27. Accessing the Data http://www.ensembl.org/GRC Tracks
  28. 28. Accessing the Data ftp://ngs.sanger.ac.uk/production/grit/track_hub/hub.txt
  29. 29. Outline • Assembly basics • GRCh38 updates • Taking advantage of the data
  30. 30. Credits. GRCh38 Collaborators: • NCBI RefSeq and gpipe annotation team • Havana annotators • Karen Miga • Karyn Meltz Steinberg • David Schwartz • Steve Goldstein • Mario Caceres • Giulio Genovese • Jeff Kidd • Peter Lansdorp • Mark Hills • David Page • Jim Knight • Stephan Schuster • 1000 Genomes. GRC SAB: • Rick Myers • Granger Sutton • Evan Eichler • Jim Kent • Roderic Guigo • Carol Bult • Derek Stemple • Jan Korbel • Liz Worthey • Matthew Hurles • Richard Gibbs. GRC: Tina Graves-Lindsay, Tayebeh Rezaie, Kerstin Howe, Richard Durbin, Paul Flicek, Laura Clarke, Deanna Church. Curators! Developers!
