The human reference genome is incomplete and does not fully represent structural variation. Additional sequences are needed to represent diversity. A hydatidiform mole genome (CHM1) provides an alternate haploid reference with differences from the diploid human reference. The current CHM1 assembly incorporates BAC sequences and Illumina reads. Future work includes improving the assembly using long read technologies and integrating it into the human reference to better represent human variation.
Presentation by Tina Graves-Lindsay at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on production of reference grade assemblies for various human populations.
Presentation at IMGC 2019 workshop describing the latest improvements to the mouse reference genome assembly and analyses performed in preparation for the next release of the mouse genome assembly (GRCm39).
Presentation by Benedict Paten at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on updates to the human reference assembly, GRCh38.
Presentation by Valerie Schneider discussing Genome Reference Consortium (GRC) plans for the mouse and zebrafish reference genome assemblies, presented at the 2016 meeting of The Allied Genetics Conference (TAGC). Includes a description of resources at the National Center for Biotechnology Information (NCBI) for working with reference genome assemblies.
Presentation by Valerie Schneider at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on updates to the human reference assembly, GRCh38.
Platform presentation at ASHG 2019 describing recent updates to the human reference genome assembly (GRCh38) and future plans with relevance to pan-genomic representations.
Open pacbiomodelorgpaper j_landolin_20150121, Jane Landolin
Jane Landolin's slides on the Open Data Paper (http://www.nature.com/articles/sdata201445), presented at the Balti and Bioinformatics virtual meeting on Jan. 21st 2015. (http://bit.ly/1KYGxr4)
How to sequence a large eukaryotic genome - and how we sequenced the cod genome. A seminar I gave for the Computational Life Science (Univ. of Oslo) seminar series, September 28, 2011
My talk for the International Genomics session at ABRF 2017, describing the issues caused by the uncontrolled naming of NGS methods, with some examples and some suggestions about how to fix this.
Speaker: Benedict C. S. Cross, PhD, Team leader (Discovery Screening), Horizon Discovery
CRISPR–Cas9 mediated genome editing provides a highly efficient way to probe gene function. Using this technology, thousands of genes can be knocked out and their function assessed in a single experiment. We have conducted over 150 of these complex and powerful screens and will use our experience to guide you through the process of screen design, performance and analysis.
We'll be discussing:
• How to use CRISPR screening for target ID and validation, understanding drug MOA and patient stratification
• The screen design, quality control and how to evaluate success of your screening program
• Horizon’s latest developments to the platform
• Horizon’s novel approaches to target validation screening
Making genome edits in mammalian cells, Chris Thorne
A look at the kinds of modifications that can be made in mammalian cells, and at how moving to a haploid model system at Horizon has significantly improved the efficiency of both editing and validation.
Presentation at 2019 ASHG GRC/GIAB workshop describing goals and progress of the telomere-to-telomere consortium to generate a genome assembly that provides representation of all sequences, including repetitive regions.
DNA-Protein interaction by 3C based method.pptx, Kashvi Jadia
This presentation covers different 3C-based techniques for studying DNA-protein interactions. Data from the given research papers are used for educational purposes only.
Presentation at 2019 ASHG GRC/GIAB workshop describing history of the human reference genome, current curation efforts and future plans, and the relationship of all 3 to efforts to produce a human pan-genome.
Presentation at 2019 ASHG GRC/GIAB workshop describing features and recent updates to the vg toolkit, including examples of comparisons to other methods used for alignment and variant detection.
Presentation at 2019 ASHG GRC/GIAB workshop describing recent updates to the MANE project, which aims to provide matched annotation from RefSeq and GENCODE.
Presentation at PanGenomics in the Cloud Hackathon, run by NCBI at UCSC (https://ncbiinsights.ncbi.nlm.nih.gov/2019/02/06/pangenomics-cloud-hackathon-march-2019/). Presents points to consider about the adoption of a pangenome reference, emphasizing aspects for long-term data management and wide-spread adoption.
Presentation by Fritz Sedlazeck at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on characterizing human structural variation.
Presentation by Justin Zook at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on benchmarks for indels and structural variants.
Presentation by Karen Miga at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on centromere assemblies.
Graph and assembly strategies for the MHC and ribosomal DNA regions
Ashg grc workshop2014_tg
1. ASHG - GRC Workshop
Tina Lindsay
ASHG Oct 18, 2014
2. The Human Reference is Not Complete
• The reference has been found to be suboptimal in some regions
• Structural variation makes it difficult to assemble a truly representative genome when using a diploid sample
• Some regions were recalcitrant to closure with the technology and resources available at the time
• Additional sequences are needed to capture the full range of diversity in humans
4. Allelic Diversity vs. Segmental Duplication
[Figure: a diploid genome showing allelic copies and repeat copies (noted by color difference), and a haploid genome showing repeat copies only (noted by color difference)]
With a diploid genome, there is significant ambiguity sorting allelic copies from repeat copies.
With a haploid genome, allelic differences are eliminated, and base differences are likely indicative of repeat copies.
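The ambiguity described on this slide can be illustrated with a toy enumeration. The sketch below is not from the talk: it assumes a single site inside a two-copy duplication where two different bases are observed, and the function names `explanations` and `involves_allelic_variation` are hypothetical. It lists every assignment of genotypes to the repeat copies that would produce the observed bases.

```python
from itertools import combinations_with_replacement, product


def explanations(observed, n_copies=2, ploidy=2):
    """List genotype assignments to repeat copies that yield exactly `observed`.

    Each repeat copy carries `ploidy` bases (1 = haploid, 2 = diploid).
    This is a toy model for illustration, not a variant caller.
    """
    bases = sorted(observed)
    genotypes = list(combinations_with_replacement(bases, ploidy))
    consistent = []
    for assignment in product(genotypes, repeat=n_copies):
        seen = {base for genotype in assignment for base in genotype}
        if seen == set(observed):
            consistent.append(assignment)
    return consistent


def involves_allelic_variation(assignment):
    """True if any repeat copy is heterozygous in this assignment."""
    return any(len(set(genotype)) > 1 for genotype in assignment)


haploid = explanations({"A", "C"}, ploidy=1)
diploid = explanations({"A", "C"}, ploidy=2)

print("haploid explanations:", haploid)
print("diploid explanations:", len(diploid))
print("diploid explanations with allelic variation:",
      sum(involves_allelic_variation(a) for a in diploid))
```

In this toy model every haploid explanation places the two bases in different repeat copies, whereas most diploid explanations involve a heterozygous copy, which is the ambiguity a haploid source such as CHM1 avoids.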
5. Hydatidiform mole
1. Fertilization of an oocyte without a nucleus
2. Post-zygotic diploidization of triploid zygotes
[Figure: oocyte and androgenetic hydatidiform mole (HM), with haploid 23,X chromosome sets labeled]
6. Initial Use Of CHM1 Source
• CHORI-17 BAC Library
• CHORI-17 BAC end sequences (n=325,659)
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)
• CHORI-17 BACs
• >750 have been sequenced
• 590 of them are in GenBank as phase 3
7. SRGAP2 Homology between genes
Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs
Shows homology between SRGAP2B and SRGAP2C
[Figure labels: SRGAP2A, SRGAP2B, SRGAP2C]
Dennis et al. 2012
11. Current status of CHM1 resources
• CHORI-17 BAC Library (created from CHM1 cell line)
• CHORI-17 BAC end sequences (n=325,659)
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)
• CHORI-17 BACs (>750 have been sequenced, with 592 of them in GenBank as phase 3)
• Active cell line
• >100X coverage Illumina 100 bp reads
• 300 bp, 500 bp, and 3 kb inserts
• Reference assisted assembly CHM1_1.1
• BioNano genome map
• >50X coverage of PacBio long read data
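As a rough guide to what these coverage figures imply, the arithmetic can be sketched as follows. This is an illustration only: it uses the standard relation coverage = reads × read length / genome length, takes the CHM1_1.1 total sequence length reported later in this deck as the genome length, and treats the ">100X" and ">50X" figures as exactly 100 and 50. The function names are hypothetical.

```python
import math

# Total sequence length of CHM1_1.1 as reported on the assembly statistics slide.
GENOME_LENGTH_BP = 3_037_866_619


def reads_for_coverage(coverage, genome_length, read_length):
    """Number of reads of a fixed length needed to reach a target mean coverage."""
    return math.ceil(coverage * genome_length / read_length)


def coverage_from_reads(n_reads, read_length, genome_length):
    """Mean coverage obtained from n_reads reads of a fixed length."""
    return n_reads * read_length / genome_length


# Illumina: 100 bp reads at 100X (lower bound of the ">100X" on the slide).
illumina_reads = reads_for_coverage(100, GENOME_LENGTH_BP, 100)
# PacBio: total bases at 50X (lower bound of the ">50X" on the slide).
pacbio_bases = 50 * GENOME_LENGTH_BP

print(f"Illumina reads needed: {illumina_reads:,}")
print(f"PacBio bases needed:   {pacbio_bases:,}")
```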
12. CHM1_1.1 Assembly
• Reference-guided assembly – SRPRISM v2.3, R. Agarwala
• Alignment of Illumina reads to GRCh37 primary assembly
• CHORI-17 BAC clone tilepaths were then incorporated
• 428 total clones
• 324 clones in 45 tilepaths
• 104 clones as singletons
• Comparison back to the GRCh37 reference to provide appropriate gap sizes
• Assembly submitted to GenBank
• http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2
• Paper to be published soon
• Genome Research (in press)
• bioRxiv preprint (doi: http://dx.doi.org/10.1101/006841)
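The slide notes that gap sizes came from comparing the assembly back to GRCh37 but does not describe the procedure. One plausible reading, sketched here purely as an assumption, is that the gap between two adjacent contigs is set to the distance between their placements on the reference. The `PlacedContig` type, the `gap_sizes` function, the `min_gap` placeholder, and the coordinates are all hypothetical and are not taken from SRPRISM or the CHM1_1.1 pipeline.

```python
from dataclasses import dataclass


@dataclass
class PlacedContig:
    name: str
    ref_start: int  # 0-based, inclusive position on the reference
    ref_end: int    # 0-based, exclusive position on the reference


def gap_sizes(contigs, min_gap=1):
    """Estimate the gap between each pair of reference-adjacent contigs.

    The gap is the reference distance between the end of one contig and the
    start of the next. Overlapping placements fall back to `min_gap`, which
    is an arbitrary placeholder chosen for this sketch.
    """
    ordered = sorted(contigs, key=lambda c: c.ref_start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        size = max(right.ref_start - left.ref_end, min_gap)
        gaps.append((left.name, right.name, size))
    return gaps


# Hypothetical placements on one reference chromosome.
example = [
    PlacedContig("ctgB", 1500, 2500),
    PlacedContig("ctgA", 0, 1000),
    PlacedContig("ctgC", 2400, 3000),
]
for left, right, size in gap_sizes(example):
    print(f"{left} -> {right}: gap of {size} bp")
```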
13. CHM1_1.1 Assembly
Total Sequence Length 3,037,866,619 bp
Total Assembly Gap Length 210,229,812 bp
Number of Scaffolds 163
Scaffold N50 50,362,920 bp
Number of Contigs 40,828
Contig N50 143,936 bp
CHM1_1.1
GRCh37
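For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The function name `n50` and the toy contig lengths are illustrative. The last lines also derive the ungapped length from the two totals on this slide.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one length")


# Toy example with made-up contig lengths (not CHM1 data).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("toy N50:", n50(toy_lengths))

# Ungapped length implied by the totals reported on the slide.
total_sequence_length = 3_037_866_619
total_gap_length = 210_229_812
ungapped_length = total_sequence_length - total_gap_length
print(f"ungapped length: {ungapped_length:,} bp")
```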
16. PacBio CHM1 Assembly Shows Data Not in GRCh38
GRCh38
PacBio CHM1
Second Pass Alignment
17. CHM1 BioNano Genome Map Aligned to GRCh38
GRCh38
CHM1 BioNano Map
~15kb additional data
18. BioNano SV Calls Identified Assembly Problems
Collapse
Expansion
in Assembly
CHM1_1.1 Assembly Gap in Sequence
CHM1 BioNano Map
19. Collapse in Sequence Data
Thought to be missing ~100kb in sequenced clones
GRCh38
20. Gap Sizing
Chr8 – Stalled Gap
Estimated at ~150kb
GRCh38
Sized using the CHM1 genome map: >500 kb
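The slide does not spell out how the genome map sizes the gap. A common way to reason about it, sketched here as an assumption, is to take the map distance between two label sites that flank the gap and subtract the sequence that the assembly already places between each label and the gap edge. The function name and all numbers below are hypothetical and chosen only to show the arithmetic.

```python
def gap_size_from_map(map_distance_bp, left_flank_bp, right_flank_bp):
    """Estimate a gap size from a genome-map distance between flanking labels.

    map_distance_bp: distance between the two labels on the genome map
    left_flank_bp:   assembled sequence from the left label to the gap edge
    right_flank_bp:  assembled sequence from the gap edge to the right label
    """
    size = map_distance_bp - left_flank_bp - right_flank_bp
    if size < 0:
        raise ValueError("flanking sequence exceeds the map distance")
    return size


# Hypothetical values: labels 620 kb apart, with 70 kb and 40 kb of assembled flank.
estimate = gap_size_from_map(620_000, 70_000, 40_000)
print(f"estimated gap: {estimate:,} bp")
```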
21. Future of CHM1 Assembly
• Plan to make as contiguous and accurate as possible
• Incorporate PacBio assembly where possible
• Additional CH17 clones being sequenced through segmentally duplicated and structurally variant regions to provide local assembly benefits (isolates the repeats)
23. Future Directions
• Continued improvement of the CHM1 genome
• Integration of the Pacific Biosciences whole genome assembly
• BioNano genome map data
• Continue to add diversity to the reference by sequencing new samples that provide diversity beyond what is currently represented in GRCh38
• Continued sequencing of CH17 single haplotype BAC tilepaths to better represent segmentally duplicated regions
• Additional collaborations with the community to develop tools to more fully utilize the full reference assembly (alternate haplotypes)
24. Acknowledgements
The Genome Institute at Washington University in St. Louis
Rick Wilson
Bob Fulton
Wes Warren
Karyn Meltz Steinberg
Vince Magrini
Derek Albracht
Milinn Kremitzki
Susan Rock
Debbie Scheer
Aye Wollam
The Finishing and Bioinformatics Teams at The Genome Institute
University of Washington
Evan Eichler
Megan Dennis
Xander Nuttle
NCBI
Richa Agarwala
Valerie Schneider
University of Pittsburgh School of Medicine (CHM1 cell line)
Urvashi Surti
Personalis
Deanna Church
BioNano Genomics
Pacific Biosciences
UCSF
Pui-Yan Kwok
Yvonne Lai
Chin Lin
CHORI
Catherine Chu
Pieter de Jong