Previewing GRCm39: assembly updates from the GRC
Tayebeh Rezaie, Ph.D.
NCBI
25 September 2019
Contributors:
• Valerie Schneider
• Kerstin Howe
• Tina Graves
• Paul Flicek
• Tayebeh Rezaie
• Nathan Bouk
• Hsiu-Chuan Chen
• Jo Wood
• Joanna Collins
• Sarah Pelan
• Will Chow
• James Torrance
• Derek Albracht
• Milinn Kremitzki
• Laura Clarke
• Jane Loveland
• NCBI RefSeq and GenColl
This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
GRC Assembly model
• Primary assembly unit:
  • C57BL/6J chromosomes
  • Unlocalized and unplaced scaffolds (scaffold: an ordered and oriented (O/O) set of contigs)
• Strain-specific assembly units:
  • Sequence from clones representing other strains, and regions needing additional representation
• Patches assembly unit
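The assembly model above is essentially a nested data structure: an assembly is made of assembly units, units contain scaffolds, and each scaffold is an O/O set of contigs separated by gaps. The sketch below models this in Python. The class names, field names and unit labels are hypothetical illustrations, not an NCBI or GRC API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    name: str    # hypothetical identifier; real components use accession.version
    length: int  # bases


@dataclass
class Scaffold:
    """An ordered and oriented (O/O) set of contigs separated by gaps."""
    name: str
    contigs: List[Contig]
    orientations: List[str]  # "+" or "-", one per contig, in path order
    gap_lengths: List[int]   # one gap between each pair of consecutive contigs

    def __post_init__(self) -> None:
        if len(self.orientations) != len(self.contigs):
            raise ValueError("expected one orientation per contig")
        if len(self.gap_lengths) != max(len(self.contigs) - 1, 0):
            raise ValueError("expected one gap between consecutive contigs")

    def length(self) -> int:
        # Scaffold length = sequenced bases + gap bases.
        return sum(c.length for c in self.contigs) + sum(self.gap_lengths)


@dataclass
class AssemblyUnit:
    name: str
    kind: str  # assumed labels: "primary", "strain-specific" or "patches"
    scaffolds: List[Scaffold] = field(default_factory=list)


@dataclass
class Assembly:
    name: str
    units: List[AssemblyUnit] = field(default_factory=list)

    def primary_unit(self) -> AssemblyUnit:
        for unit in self.units:
            if unit.kind == "primary":
                return unit
        raise LookupError("assembly has no primary unit")


# Toy example: one scaffold of two contigs joined by a 100-bp gap.
toy = Assembly("toy_assembly", [
    AssemblyUnit("Primary Assembly", "primary", [
        Scaffold("scaffold_1", [Contig("ctg_a", 1000), Contig("ctg_b", 500)],
                 ["+", "-"], [100]),
    ]),
    AssemblyUnit("PATCHES", "patches"),
])
print(toy.primary_unit().scaffolds[0].length())  # 1600
```

Keeping gaps as explicit lengths between contigs mirrors how a scaffold's length includes its gap bases, which is also why scaffold N50 and contig N50 are reported separately.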
What is the mouse genome assembly?
[Figure: Mouse Chr. 1, GRCm38]
The reference is derived from strain C57BL/6J.
http://genomereference.org
How do we curate the assembly?
• Technology: sequencing, FISH, optical mapping, alignment of clone-end sequences, assembly of Illumina reads
  • Sequencing: clones
  • FISH: localization of unlocalized sequences
  • Optical mapping: gap sizing, path problems
• Resources: clones, WGS, PCR products
• Gap closure
• Correction of clone assembly problems
• Path problem correction
• Representation of strain variation
Examples of assembly curation in GRCm38C
Release of GRCm39 is planned for early 2020. This overview of GRCm39 is based on analyses of GRCm38C, the second intermediate build.
Minor or patch release: non-coordinate-changing assembly version
Major release: coordinate-changing assembly version
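The distinction between major and patch releases determines whether coordinates from two assembly versions can be compared directly. A minimal sketch, assuming the naming pattern "GRCm<major>" with an optional ".p<patch>" suffix (the regular expression and function names are hypothetical, and intermediate builds such as GRCm38C are deliberately rejected):

```python
import re
from typing import NamedTuple

# Assumed naming pattern: "GRCm<major>" optionally followed by ".p<patch>",
# e.g. "GRCm38", "GRCm38.p6", "GRCm39". Intermediate builds such as
# "GRCm38C" are intentionally rejected by this sketch.
_VERSION_RE = re.compile(r"GRCm(\d+)(?:\.p(\d+))?")


class AssemblyVersion(NamedTuple):
    major: int  # coordinate-changing release number
    patch: int  # non-coordinate-changing patch level (0 = initial release)


def parse_version(name: str) -> AssemblyVersion:
    m = _VERSION_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    return AssemblyVersion(int(m.group(1)), int(m.group(2) or 0))


def same_chromosome_coordinates(a: str, b: str) -> bool:
    """True if both names are patch levels of the same major release,
    so chromosome coordinates are shared."""
    return parse_version(a).major == parse_version(b).major


print(same_chromosome_coordinates("GRCm38", "GRCm38.p6"))  # True
print(same_chromosome_coordinates("GRCm38.p6", "GRCm39"))  # False
```

Data placed on GRCm38.p6 chromosomes can therefore be used with GRCm38 coordinates unchanged, whereas moving to GRCm39 requires remapping.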
Genome issues resolved post-GRCm38
Updates as of GRCm38.p6
• 65 FIX patches
• 9 NOVEL patches
GRCm38 released updates
[Figure: resolved issues by category; scale 0 to 200]
• Gap: 39%
• Clone: 21%
• Variation: 7%
• Missing: 12%
• GRC: 15%
• Path: 2.7%
• Unknown: 2%
• Localization: 1.3%
Total = 473
Gap + Clone = 60% of all issues resolved post-GRCm38
Improving the reference assembly
Six minor/patch releases since the 2012 GRCm38 release
• Patch releases: non-coordinate-changing assembly versions
  • Fix patches (chromosome path changes)
  • Novel patches (alternate representations of chromosome sequences, derived from other strains)
GRCm38/GRCm38C assembly statistics
Metric                            GRCm38 (GCF_000001635.20)   GRCm38C (GCF_008087425.1)
Total length (primary)            2,730,855,475               2,733,095,204
Total length (all)                2,793,712,140               2,798,405,461
Total assembly gap length (all)   79,291,755                  78,606,933
# gaps between scaffolds          191                         151
# gaps within scaffolds           443                         213
Scaffold N50                      54,517,951                  100,923,795 (85% increase)
Contig N50                        32,273,079                  57,461,838 (78% increase)
• GRCm38C has fewer gaps and is more contiguous than GRCm38.
• In GRCm38C, 5 chromosomes (11, 12, 15, 16, 18) are each a single scaffold, 11 are built from 2 scaffolds, and 5 are built from more than 2 scaffolds.
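N50 summarizes contiguity: it is the length L such that sequences of length L or longer contain at least half of all bases. The sketch below uses that common convention (other tools differ slightly at the halfway boundary) and checks the percentage increases reported in the table; the function names are illustrative, not from a specific tool.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L together contain at least
    half of all bases (one common N50 convention)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer arithmetic avoids rounding issues
            return length
    raise AssertionError("unreachable for non-negative lengths")


def percent_increase(old: float, new: float) -> float:
    return (new - old) / old * 100


print(n50([10, 8, 6, 4, 2]))                            # 8
print(round(percent_increase(54_517_951, 100_923_795)))  # 85 (scaffold N50)
print(round(percent_increase(32_273_079, 57_461_838)))   # 78 (contig N50)
```

Scaffold lengths include gap bases while contig lengths do not, so closing a gap raises contig N50 directly and can raise scaffold N50 when it joins two scaffolds.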
Assembly component updates between GRCm38 and GRCm38C
Number of components with changes = 666 (~3.2%)
o Added: 315 (6,640,992 bp)
   - Clones + PCR products: 77
   - Assembled Illumina reads: 95
   - WGS from 'MmusSOAP1' and 'MmusALLPATHS2' assemblies: 81
   - WGS from MGSCv3 (original mouse genome project): 17
   - WGS from Eve assembly: 45
o Dropped: 330 (3,555,784 bp)
   - WGS from MGSCv3 replaced with more accurate sequence: 310 (94%)
o Version bumped: 15
o Strand flipped: 4
o Version bumped + strand flipped: 2
• Our evaluation of scaffold/component changes in GRCm38C found no unexpected changes.
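The change categories above (added, dropped, version bumped, strand flipped) can be derived by comparing the component lists of the two assemblies. A minimal sketch, assuming each assembly is summarized as a hypothetical mapping from component accession (without version) to (version, strand), for example parsed from AGP files, and that each accession is placed at most once; this is not necessarily how the GRC computed its counts.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical input: accession (without version) -> (version, strand) for each
# component placed in an assembly. Assumes one placement per accession.
Components = Dict[str, Tuple[int, str]]


def classify_changes(old: Components, new: Components) -> Counter:
    counts: Counter = Counter()
    counts["dropped"] = len(old.keys() - new.keys())
    counts["added"] = len(new.keys() - old.keys())
    for acc in old.keys() & new.keys():
        (old_ver, old_strand), (new_ver, new_strand) = old[acc], new[acc]
        bumped, flipped = new_ver != old_ver, new_strand != old_strand
        if bumped and flipped:
            counts["version bumped + strand flipped"] += 1
        elif bumped:
            counts["version bumped"] += 1
        elif flipped:
            counts["strand flipped"] += 1
    return +counts  # unary plus drops categories with a zero count


# Consistency check of the tallies reported on the slide.
reported = {"added": 315, "dropped": 330, "version bumped": 15,
            "strand flipped": 4, "version bumped + strand flipped": 2}
assert sum(reported.values()) == 666
assert 77 + 95 + 81 + 17 + 45 == reported["added"]

old = {"A": (1, "+"), "B": (2, "+"), "C": (1, "-"), "D": (3, "+")}
new = {"A": (1, "+"), "B": (3, "+"), "C": (1, "+"), "E": (1, "+")}
print(classify_changes(old, new))
```

The reported categories are mutually exclusive, which is why the five counts sum exactly to the 666 changed components.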
RefSeq Transcript Analysis
                                                GRCm38 primary unit   GRCm38C primary unit
Sequences retrieved from Entrez                 42,721                42,721
Sequences not aligning*                         6                     2
Sequences with multiple best alignments
  (split transcripts)†                          1                     2
Sequences with CDS coverage <95%                41                    19
*The 2 transcripts aligning to neither the GRCm38C nor the GRCm38 primary unit:
• Olfr100 (annotated on an alternate locus from 129X1/SvJ)
• Rs5-8s1
†In GRCm38C, split alignments are separated by a gap:
• Sts (PAR); does not align to GRCm38
• Rn45s
*The other 4 transcripts not aligning to the GRCm38 primary unit:
• Ahsp (clone problem) and Copg2os2 (gap), both corrected
• Sts (PAR)
• Rn45s
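The "CDS coverage <95%" criterion counts how much of a transcript's coding sequence is covered by its genome alignment. A minimal sketch, assuming alignment blocks and the CDS are given as 0-based, half-open intervals in transcript coordinates (the function names and input format are hypothetical; the RefSeq pipeline's exact metric may differ):

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the transcript


def cds_coverage(aligned_blocks: List[Interval], cds: Interval) -> float:
    """Fraction of CDS bases covered by at least one aligned block."""
    cds_start, cds_end = cds
    if cds_end <= cds_start:
        raise ValueError("empty CDS")
    # Clip each block to the CDS and keep only blocks that overlap it.
    clipped = sorted(
        (max(s, cds_start), min(e, cds_end))
        for s, e in aligned_blocks
        if min(e, cds_end) > max(s, cds_start)
    )
    # Merge overlapping or adjacent blocks so no base is counted twice.
    merged: List[List[int]] = []
    for s, e in clipped:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    covered = sum(e - s for s, e in merged)
    return covered / (cds_end - cds_start)


def low_cds_coverage(aligned_blocks: List[Interval], cds: Interval,
                     threshold: float = 0.95) -> bool:
    return cds_coverage(aligned_blocks, cds) < threshold


# A 1,000-base CDS with a 100-base unaligned stretch: 90% covered, so flagged.
blocks = [(0, 500), (450, 600), (700, 1200)]
print(cds_coverage(blocks, (100, 1100)), low_cds_coverage(blocks, (100, 1100)))
```

A gap in the assembly inside a gene typically shows up this way, as an unaligned stretch of CDS, which is why gap closure reduces the count in the last row.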
Genes with improved representation in GRCm38C
4933416I08Rik Dnah12 Mia3 Pik3c2g Sgms2
Ahnak2 Efcab7 Muc2 Ppp2r3d Slc26a6
Anxa13 Ide Muc3 Pstpip2 Spata5l1
Atg4a Ifi30 Muc4 Ptpmt1 Spry3
Auts2 Intu Muc6 Rab3a Taf1a
Baalc Jakmip3 Nadk2 Ranbp3l Tmem134
Cct6a Kazn Nhej1 Rasgrf2 Traf5
Cylc1 Kndc1 Nkain1 Rhox5 Trerf1
Dgkk Krt85 Nlrp4g Rims1 Vezf1
Assembly gap closure and complete representation of Efcab7
Correction of a false GRCm38 assembly gap caused by haplotype incompatibility
View the curation status of Mouse Genome Issues
http://genomereference.org
Unresolved genome issues: current curation status
Resolution likelihoods were determined by GRC review; optical mapping was used to size the remaining gaps and FISH to localize unlocalized sequences.
A major obstacle: the repetitive nature of these genomic regions, including segmental duplications.
Evaluation of erroneous or very rare GRCm38 bases
Base report sources:
• Sanger Mouse Genomes Project (n=4,148)
• Eve assembly publication (n=267)
• An additional 236 bases reported in Eve are included in the Sanger set
Analysis: evaluate support for these bases
• Align Illumina reads derived from another C57BL/6J sample to GRCm38 (Gnerre et al.; PMID: 21187386)
• Generate pile-up results from the alignments
• Categorize results as homozygous REF, homozygous ALT, or heterozygous
Goal: update erroneous or very rare GRCm38 bases
*All bases in common with the Eve set were homozygous ALT
*Bases reported only in Eve (n=184): 25% homozygous REF, 75% homozygous ALT
Evaluation of consequences with VEP: Mouse Genomes Project bases
Sites in CDS/genes: 45
• 34 homozygous REF
• 11 homozygous ALT
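The pile-up categorization step described above can be sketched as follows. This is a minimal illustration, not the GRC's actual procedure: it assumes per-site counts of reads supporting the reference and reported alternate bases, and the depth and allele-fraction thresholds (`min_depth`, `hom_fraction`) are hypothetical values chosen for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def categorize_site(ref_count: int, alt_count: int,
                    min_depth: int = 10, hom_fraction: float = 0.9) -> str:
    """Classify one site from pile-up allele counts.

    min_depth and hom_fraction are illustrative thresholds, not the values
    used in the GRC analysis. Reads supporting a third allele are ignored."""
    if not 0.5 < hom_fraction <= 1.0:
        raise ValueError("hom_fraction must be in (0.5, 1.0]")
    depth = ref_count + alt_count
    if depth < min_depth:
        return "insufficient depth"
    if alt_count / depth >= hom_fraction:
        return "homozygous ALT"
    if ref_count / depth >= hom_fraction:
        return "homozygous REF"
    return "heterozygous"


def summarize(sites: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of sites falling into each category."""
    counts = Counter(categorize_site(ref, alt) for ref, alt in sites)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Toy (ref reads, alt reads) counts for four reported bases.
print(summarize([(0, 30), (1, 25), (28, 0), (0, 40)]))
```

A homozygous ALT call in an independent C57BL/6J sample supports correcting the GRCm38 base, while a homozygous REF call suggests the reported difference is not present in this strain.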
Conclusions and future plans:
• The GRC is currently preparing for the release of GRCm39
• After the release of GRCm39, GRC curation of the mouse reference assembly will be limited to resolving community-reported problems
o Contact us to ask a question, report an assembly issue, or request information about a genomic region of interest: https://www.ncbi.nlm.nih.gov/grc/contact-us
o See GRC blog posts: http://genomeref.blogspot.com/
o For FAQs and other assembly help: https://www.ncbi.nlm.nih.gov/grc/help/
o For more information, see my poster P43 on Thursday
Release of GRCm39 is planned for early 2020
http://genomereference.org

More Related Content

What's hot

Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...ExternalEvents
 
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion ModelsDeep Learning JP
 
Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)Genome Reference Consortium
 
Part 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalPart 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalJoachim Jacob
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Sebastian Schmeier
 
TransPose: Towards Explainable Human Pose Estimation by Transformer
TransPose: Towards Explainable Human Pose Estimation by TransformerTransPose: Towards Explainable Human Pose Estimation by Transformer
TransPose: Towards Explainable Human Pose Estimation by TransformerYasutomo Kawanishi
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Luba Elliott
 
Rdna technology
Rdna technologyRdna technology
Rdna technologyStiti Dash
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)LOGESWARAN KA
 
Tissue Engineering: Scaffold Materials
Tissue Engineering: Scaffold MaterialsTissue Engineering: Scaffold Materials
Tissue Engineering: Scaffold MaterialsElahehEntezarmahdi
 
SSII2022 [OS3-03] スケーラブルなロボット学習システムに向けて
SSII2022 [OS3-03] スケーラブルなロボット学習システムに向けてSSII2022 [OS3-03] スケーラブルなロボット学習システムに向けて
SSII2022 [OS3-03] スケーラブルなロボット学習システムに向けてSSII
 
Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014PyData
 
[DL輪読会]SlowFast Networks for Video Recognition
[DL輪読会]SlowFast Networks for Video Recognition[DL輪読会]SlowFast Networks for Video Recognition
[DL輪読会]SlowFast Networks for Video RecognitionDeep Learning JP
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingRayhan Shahrear
 
STAIR Lab Seminar 202105
STAIR Lab Seminar 202105STAIR Lab Seminar 202105
STAIR Lab Seminar 202105Sho Takase
 

What's hot (20)

Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
Whole Genome Sequencing (WGS) for surveillance of foodborne infections in Den...
 
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
 
Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)
 
Part 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalPart 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goal
 
Genome evolution
Genome evolutionGenome evolution
Genome evolution
 
NGS - QC & Dataformat
NGS - QC & Dataformat NGS - QC & Dataformat
NGS - QC & Dataformat
 
The Cancer Genome Atlas Update
The Cancer Genome Atlas UpdateThe Cancer Genome Atlas Update
The Cancer Genome Atlas Update
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)
 
TransPose: Towards Explainable Human Pose Estimation by Transformer
TransPose: Towards Explainable Human Pose Estimation by TransformerTransPose: Towards Explainable Human Pose Estimation by Transformer
TransPose: Towards Explainable Human Pose Estimation by Transformer
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
 
Rdna technology
Rdna technologyRdna technology
Rdna technology
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 
Tissue Engineering: Scaffold Materials
Tissue Engineering: Scaffold MaterialsTissue Engineering: Scaffold Materials
Tissue Engineering: Scaffold Materials
 
Pcr primer design
Pcr primer designPcr primer design
Pcr primer design
 
Human Genome Project
Human Genome ProjectHuman Genome Project
Human Genome Project
 
SSII2022 [OS3-03] スケーラブルなロボット学習システムに向けて
SSII2022 [OS3-03] スケーラブルなロボット学習システムに向けてSSII2022 [OS3-03] スケーラブルなロボット学習システムに向けて
SSII2022 [OS3-03] スケーラブルなロボット学習システムに向けて
 
Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014
 
[DL輪読会]SlowFast Networks for Video Recognition
[DL輪読会]SlowFast Networks for Video Recognition[DL輪読会]SlowFast Networks for Video Recognition
[DL輪読会]SlowFast Networks for Video Recognition
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
STAIR Lab Seminar 202105
STAIR Lab Seminar 202105STAIR Lab Seminar 202105
STAIR Lab Seminar 202105
 

Similar to Previewing GRCm39: Assembly Updates from the GRC

Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishingNikolay Vyahhi
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesGenome Reference Consortium
 
Understanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL HackathonUnderstanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL HackathonGenome Reference Consortium
 
Integration of single molecule, genome mapping data in a web-based genome bro...
Integration of single molecule, genome mapping data in a web-based genome bro...Integration of single molecule, genome mapping data in a web-based genome bro...
Integration of single molecule, genome mapping data in a web-based genome bro...William Chow
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowHorizonDiscovery
 
20110524zurichngs 1st pub
20110524zurichngs 1st pub20110524zurichngs 1st pub
20110524zurichngs 1st pubsesejun
 
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyTowards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyShaojun Xie
 
Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013Deanna Church
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907GenomeInABottle
 

Similar to Previewing GRCm39: Assembly Updates from the GRC (20)

Ashg2015 schneider final
Ashg2015 schneider finalAshg2015 schneider final
Ashg2015 schneider final
 
Ashg2014 grc workshop_schneider
Ashg2014 grc workshop_schneiderAshg2014 grc workshop_schneider
Ashg2014 grc workshop_schneider
 
Ashg grc workshop2014_tg
Ashg grc workshop2014_tgAshg grc workshop2014_tg
Ashg grc workshop2014_tg
 
Agbt2015 workshop schneider
Agbt2015 workshop schneiderAgbt2015 workshop schneider
Agbt2015 workshop schneider
 
Getting the most from the reference assembly
Getting the most from the reference assemblyGetting the most from the reference assembly
Getting the most from the reference assembly
 
Alignment Approaches II: Long Reads
Alignment Approaches II: Long ReadsAlignment Approaches II: Long Reads
Alignment Approaches II: Long Reads
 
Ashg2017 workshop schneider
Ashg2017 workshop schneiderAshg2017 workshop schneider
Ashg2017 workshop schneider
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishing
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome Assemblies
 
Grc workshop agbt2015_tg
Grc workshop agbt2015_tgGrc workshop agbt2015_tg
Grc workshop agbt2015_tg
 
Understanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL HackathonUnderstanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL Hackathon
 
Integration of single molecule, genome mapping data in a web-based genome bro...
Integration of single molecule, genome mapping data in a web-based genome bro...Integration of single molecule, genome mapping data in a web-based genome bro...
Integration of single molecule, genome mapping data in a web-based genome bro...
 
Ashg grc workshop2015_tg
Ashg grc workshop2015_tgAshg grc workshop2015_tg
Ashg grc workshop2015_tg
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and How
 
20110524zurichngs 1st pub
20110524zurichngs 1st pub20110524zurichngs 1st pub
20110524zurichngs 1st pub
 
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyTowards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
 
Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013
 
AGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: LindsayAGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: Lindsay
 
Church gmod2012 pt2
Church gmod2012 pt2Church gmod2012 pt2
Church gmod2012 pt2
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907
 

More from Genome Reference Consortium

Telomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesTelomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesGenome Reference Consortium
 
Why graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 amWhy graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 amGenome Reference Consortium
 
Variation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copyVariation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copyGenome Reference Consortium
 
Haplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long readsHaplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long readsGenome Reference Consortium
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesGenome Reference Consortium
 
ClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materialsClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materialsGenome Reference Consortium
 

More from Genome Reference Consortium (20)

Telomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesTelomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomes
 
Genome variation graphs with the vg toolkit
Genome variation graphs with the vg toolkitGenome variation graphs with the vg toolkit
Genome variation graphs with the vg toolkit
 
Why graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 amWhy graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 am
 
Schneider grc workshop_final
Schneider grc workshop_finalSchneider grc workshop_final
Schneider grc workshop_final
 
Mane v2 final
Mane v2 finalMane v2 final
Mane v2 final
 
Lrg and mane 16 oct 2018
Lrg and mane   16 oct 2018Lrg and mane   16 oct 2018
Lrg and mane 16 oct 2018
 
20181016 grc presentation-pa
20181016 grc presentation-pa20181016 grc presentation-pa
20181016 grc presentation-pa
 
2018 1016 trio_binning_ashg_arhie_final
2018 1016 trio_binning_ashg_arhie_final2018 1016 trio_binning_ashg_arhie_final
2018 1016 trio_binning_ashg_arhie_final
 
Variation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copyVariation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copy
 
Ashg2017 workshop tg
Ashg2017 workshop tgAshg2017 workshop tg
Ashg2017 workshop tg
 
Ashg sedlazeck grc_share
Ashg sedlazeck grc_shareAshg sedlazeck grc_share
Ashg sedlazeck grc_share
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
101717.kh miga ashg_grc
101717.kh miga ashg_grc101717.kh miga ashg_grc
101717.kh miga ashg_grc
 
AGBT2017 Reference Workshop: Fulton
AGBT2017 Reference Workshop: FultonAGBT2017 Reference Workshop: Fulton
AGBT2017 Reference Workshop: Fulton
 
AGBT2017 Reference Workshop: Schneider
AGBT2017 Reference Workshop: SchneiderAGBT2017 Reference Workshop: Schneider
AGBT2017 Reference Workshop: Schneider
 
Haplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long readsHaplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long reads
 
Everyday de novo diploid assembly
Everyday de novo diploid assemblyEveryday de novo diploid assembly
Everyday de novo diploid assembly
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome Assemblies
 
Genome in a Bottle
Genome in a BottleGenome in a Bottle
Genome in a Bottle
 
ClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materialsClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materials
 

Recently uploaded

Electricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentsElectricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentslevieagacer
 
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdfFORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdfSuchita Rawat
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfRevenJadePalma
 
Mining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxMining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxKyawThanTint
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Ansari Aashif Raza Mohd Imtiyaz
 
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneyX-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneySérgio Sacani
 
In-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxIn-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxMAGOTI ERNEST
 
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptxPOST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptxArpitaMishra69
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxGOWTHAMIM22
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandRcvets
 
Manganese‐RichSandstonesasanIndicatorofAncientOxic LakeWaterConditionsinGale...
Manganese‐RichSandstonesasanIndicatorofAncientOxic  LakeWaterConditionsinGale...Manganese‐RichSandstonesasanIndicatorofAncientOxic  LakeWaterConditionsinGale...
Manganese‐RichSandstonesasanIndicatorofAncientOxic LakeWaterConditionsinGale...Sérgio Sacani
 
Costs to heap leach gold ore tailings in Karamoja region of Uganda
Costs to heap leach gold ore tailings in Karamoja region of UgandaCosts to heap leach gold ore tailings in Karamoja region of Uganda
Costs to heap leach gold ore tailings in Karamoja region of UgandaTimothyOkuna
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfmarcuskenyatta275
 
NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.syedmuneemqadri
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!University of Hertfordshire
 
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxSaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxPat (JS) Heslop-Harrison
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationSérgio Sacani
 
Fun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdfFun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdfhoangquan21999
 
PARENTAL CARE IN FISHES.pptx for 5th sem
PARENTAL CARE IN FISHES.pptx for 5th semPARENTAL CARE IN FISHES.pptx for 5th sem
PARENTAL CARE IN FISHES.pptx for 5th semborkhotudu123
 

Recently uploaded (20)

Electricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentsElectricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 students
 
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdfFORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
 
Mining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxMining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptx
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
 
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneyX-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
 
In-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxIn-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptx
 
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptxPOST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
 
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...

Previewing GRCm39: Assembly Updates from the GRC

  • 1. Previewing GRCm39: assembly updates from the GRC. Tayebeh Rezaie, Ph.D., NCBI, 25 September 2019
  • 2. Contributors: • Valerie Schneider • Kerstin Howe • Tina Graves • Paul Flicek • Tayebeh Rezaie • Nathan Bouk • Hsiu-Chuan Chen • Jo Wood • Joanna Collins • Sarah Pelan • Will Chow • James Torrance • Derek Albracht • Milinn Kremitzki • Laura Clarke • Jane Loveland • NCBI RefSeq and GenColl. This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
  • 3. GRC assembly model: • Primary assembly unit: C57BL/6J chromosomes, plus unlocalized and unplaced scaffolds (a scaffold is an ordered and oriented (O/O) set of contigs) • Strain-specific assembly units: sequence from clones representing other strains, for regions needing additional representation • Patches assembly unit
  • 4. What is the mouse genome assembly? The reference is derived from strain C57BL/6J. (Figure: mouse Chr. 1, GRCm38.) http://genomereference.org
  • 5. How do we curate the assembly? • Technologies: sequencing, FISH, optical mapping, alignment of clone-end sequences, assembly of Illumina reads • Sequencing: clones • FISH: localization of unlocalized sequences • Optical mapping: gap sizing, path problems • Resources: clones, WGS, PCR products • Curation tasks: gap closure, correction of clone assembly problems, correction of path problems, representation of strain variation. Examples of assembly curation in GRCm38C.
  • 6. Release of GRCm39 is planned for early 2020. An overview of GRCm39 based on analyses of GRCm38C, the second intermediate build. • Minor (patch) releases: assembly versions that do not change coordinates • Major releases: assembly versions that change coordinates. http://genomereference.org
  • 7. Improving the reference assembly: genome issues resolved post-GRCm38. Six minor/patch releases have followed the 2012 GRCm38 release. • Patch releases: assembly versions that do not change coordinates • FIX patches: chromosome path changes • NOVEL patches: alternate representations of chromosome sequences, derived from other strains • Updates as of GRCm38.p6: 65 FIX patches, 9 NOVEL patches. (Figure: GRCm38 released updates, number of resolved issues by category: Gap, Clone, Variation, Missing, GRC, Path, Unknown, Localization; total = 473. Segment percentages shown: 39%, 21%, 7%, 12%, 15%, 2.7%, 2%, 1.3%.) Gap + Clone issues account for 60% of all issues resolved post-GRCm38.
  • 8. GRCm38/GRCm38C assembly statistics
      Metric | GRCm38 (GCF_000001635.20) | GRCm38C (GCF_008087425.1)
      Total length, primary (bp) | 2,730,855,475 | 2,733,095,204
      Total length, all (bp) | 2,793,712,140 | 2,798,405,461
      Total assembly gap length, all (bp) | 79,291,755 | 78,606,933
      Gaps between scaffolds | 191 | 151
      Gaps within scaffolds | 443 | 213
      Scaffold N50 (bp) | 54,517,951 | 100,923,795 (85% increase)
      Contig N50 (bp) | 32,273,079 | 57,461,838 (78% increase)
    • GRCm38C has fewer gaps and is more contiguous than GRCm38.
    • In GRCm38C, 5 chromosomes are single scaffolds (chr 11, 12, 15, 16 and 18), 11 are built from 2 scaffolds, and 5 are built from more than 2 scaffolds.
  • 9. Assembly component updates between GRCm38 and GRCm38C: 666 components changed (~3.2%)
    o Added: 315 (6,640,992 bp)
        • Clones + PCR products: 77
        • Assembled Illumina reads: 95
        • WGS from the 'MmusSOAP1' and 'MmusALLPATHS2' assemblies: 81
        • WGS from MGSCv3 (original mouse genome project): 17
        • WGS from the Eve assembly: 45
    o Dropped: 330 (3,555,784 bp)
        • WGS from MGSCv3 replaced with more accurate sequence: 310 (94%)
    o Version bumped: 15
    o Strand flipped: 4
    o Version bumped + strand flipped: 2
    Our evaluation of scaffold/component changes in GRCm38C found no unexpected changes.
  • 10. RefSeq transcript analysis
      Metric | GRCm38 primary unit | GRCm38C primary unit
      Sequences retrieved from Entrez | 42,721 | 42,721
      Sequences not aligning* | 6 | 2
      Sequences with multiple best alignments (split transcripts)† | 1 | 2
      Sequences with CDS coverage < 95% | 41 | 19
    * The 2 transcripts aligning to neither the GRCm38C nor the GRCm38 primary unit: Olfr100 (annotated on an alternate locus from 129X1/SvJ) and Rs5-8s1.
    * The other 4 transcripts not aligning to the GRCm38 primary unit: Ahsp (clone problem) and Copg2os2 (gap), both corrected; Sts (PAR); Rn45s.
    † Transcripts split by a gap in GRCm38C: Sts (PAR; no alignment to GRCm38) and Rn45s.
  • 11. Genes with improved representation in GRCm38C (45): 4933416I08Rik, Ahnak2, Anxa13, Atg4a, Auts2, Baalc, Cct6a, Cylc1, Dgkk, Dnah12, Efcab7, Ide, Ifi30, Intu, Jakmip3, Kazn, Kndc1, Krt85, Mia3, Muc2, Muc3, Muc4, Muc6, Nadk2, Nhej1, Nkain1, Nlrp4g, Pik3c2g, Ppp2r3d, Pstpip2, Ptpmt1, Rab3a, Ranbp3l, Rasgrf2, Rhox5, Rims1, Sgms2, Slc26a6, Spata5l1, Spry3, Taf1a, Tmem134, Traf5, Trerf1, Vezf1
  • 12. Assembly gap closure and complete representation of Efcab7
  • 13. Correction of a false assembly gap in GRCm38 caused by haplotype incompatibility
  • 14. View the curation status of mouse genome issues: http://genomereference.org
  • 15. Unresolved genome issues: current curation status and resolution likelihoods, as determined by GRC review. Optical mapping was used to size the remaining gaps and FISH to localize unlocalized sequences. A major obstacle is the repetitive nature of the genomic regions involved, including segmental duplications.
  • 16. Base report. Sources: • Sanger Mouse Genomes Project (n = 4,148) • Eve assembly publication (n = 267) • An additional 236 bases reported in Eve are included in the Sanger set. Analysis, evaluating support for these bases: • Align Illumina reads derived from another C57BL/6J sample to GRCm38 (Gnerre et al.; PMID: 21187386) • Generate pile-up results from the alignments • Categorize results as homozygous REF, homozygous ALT or heterozygous. Goal: update erroneous or very rare GRCm38 bases. *All bases common with the Eve set were homozygous ALT. *Bases reported only from Eve (n = 184): 25% homozygous REF, 75% homozygous ALT.
  • 17. Evaluation of erroneous or very rare GRCm38 bases: consequences assessed with VEP. Mouse Genomes Project bases at sites in CDS/genes: 45 • 34 homozygous REF • 11 homozygous ALT
  • 18. Conclusion and future: • The GRC is currently preparing for the release of GRCm39, planned for early 2020. • Upon the release of GRCm39, the GRC's curation of the mouse genome reference assembly will be limited to resolving community-reported problems. o Contact us with a question, to report an assembly issue, or to request information about a genomic region of interest: https://www.ncbi.nlm.nih.gov/grc/contact-us o GRC blog posts: http://genomeref.blogspot.com/ o FAQs and other assembly help: https://www.ncbi.nlm.nih.gov/grc/help/ o For more information, see my poster P43 on Thursday. http://genomereference.org
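Slide 3 defines a scaffold as an ordered and oriented (O/O) set of contigs. The Python sketch below illustrates what that means for the sequence itself: contigs are concatenated in order, those on the minus strand are reverse-complemented, and gaps are filled with Ns. The function names and the single fixed gap length are assumptions for illustration; real assemblies record each gap's size separately, and this is not GRC software.

```python
# Minimal sketch: build a scaffold sequence from an ordered and oriented
# (O/O) list of contigs. The function names, the (sequence, orientation)
# tuple layout and the fixed gap length are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs: list[tuple[str, str]], gap_length: int = 100) -> str:
    """Concatenate contigs in order, reverse-complementing those on the '-' strand.

    contigs: (sequence, orientation) pairs, orientation '+' or '-'.
    gap_length: number of Ns placed between consecutive contigs (hypothetical
    default; real assemblies size each gap individually).
    """
    pieces = []
    for seq, orientation in contigs:
        if orientation == "+":
            pieces.append(seq)
        elif orientation == "-":
            pieces.append(reverse_complement(seq))
        else:
            raise ValueError(f"unknown orientation: {orientation!r}")
    return ("N" * gap_length).join(pieces)


if __name__ == "__main__":
    print(build_scaffold([("ACGT", "+"), ("AAC", "-")], gap_length=3))  # ACGTNNNGTT
```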
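Slides 6 and 7 distinguish patch releases, which leave coordinates unchanged, from major releases, which change them. A minimal sketch, assuming assembly names of the form GRCm38 or GRCm38.p6, shows how one might check whether moving between two versions changes coordinates. The parser and its regular expression are hypothetical helpers; intermediate builds such as GRCm38C are deliberately rejected.

```python
import re
from typing import NamedTuple

# Hypothetical parser for GRC-style names such as "GRCm38" or "GRCm38.p6".
# Intermediate builds like "GRCm38C" are not handled and raise ValueError.
_NAME = re.compile(r"^(GRC[a-z]+)(\d+)(?:\.p(\d+))?$")


class AssemblyVersion(NamedTuple):
    organism: str  # e.g. "GRCm" for mouse
    major: int     # major release number, e.g. 38
    patch: int     # patch level, 0 when there is no ".pN" suffix


def parse_version(name: str) -> AssemblyVersion:
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    organism, major, patch = m.groups()
    return AssemblyVersion(organism, int(major), int(patch or 0))


def coordinates_change(old: str, new: str) -> bool:
    """True if old -> new is a major (coordinate-changing) release.

    Patch releases share the major number and keep coordinates unchanged.
    """
    a, b = parse_version(old), parse_version(new)
    if a.organism != b.organism:
        raise ValueError("assemblies are for different organisms")
    return a.major != b.major


if __name__ == "__main__":
    print(coordinates_change("GRCm38", "GRCm38.p6"))  # False
    print(coordinates_change("GRCm38.p6", "GRCm39"))  # True
```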
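The contiguity figures on slide 8 rest on N50 and simple percentage arithmetic. The sketch below uses the common N50 definition (the length L such that sequences of length L or longer cover at least half of the total length) and reproduces the reported 85% and 78% increases from the slide's N50 values. The toy length list is made up; computing the real N50s would require the full scaffold and contig length lists.

```python
def n50(lengths: list[int]) -> int:
    """Return the length L such that sequences >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids dividing total by 2
            return length
    raise ValueError("lengths must be non-negative")  # only reached with bad input


def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))                 # 8
    print(round(percent_increase(54_517_951, 100_923_795)))  # 85 (scaffold N50)
    print(round(percent_increase(32_273_079, 57_461_838)))   # 78 (contig N50)
```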
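Slide 9 sorts changed components into added, dropped, version bumped, strand flipped, and both; 315 + 330 + 15 + 4 + 2 = 666, the total reported. The hypothetical sketch below shows one way such a comparison could be made from two AGP v2.0 files (9 tab-separated columns; component types N and U mark gaps). The function names, the assumption that each accession appears on only one row, and the toy rows with made-up accessions are illustrative; this is not the GRC's evaluation pipeline.

```python
from dataclasses import dataclass

# Hypothetical sketch: classify component changes between two assemblies
# from their AGP v2.0 rows. Assumes each accession appears on at most one row.

GAP_TYPES = {"N", "U"}  # AGP component types that describe gaps, not sequence


@dataclass(frozen=True)
class Component:
    accession: str    # accession without its version suffix
    version: int      # sequence version, e.g. 2 for "AC000001.2"
    orientation: str  # AGP orientation: '+', '-', '?', '0' or 'na'


def parse_agp(lines: list[str]) -> dict[str, Component]:
    """Map accession -> Component for the sequence (non-gap) rows of an AGP file."""
    components: dict[str, Component] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if fields[4] in GAP_TYPES:
            continue  # gap rows carry no component accession
        accession, _, version = fields[5].partition(".")
        components[accession] = Component(accession, int(version or 0), fields[8])
    return components


def diff_components(old_lines: list[str], new_lines: list[str]) -> dict[str, list[str]]:
    """Sort accessions into added, dropped, version-bumped and strand-flipped groups."""
    old, new = parse_agp(old_lines), parse_agp(new_lines)
    changes: dict[str, list[str]] = {
        "added": sorted(new.keys() - old.keys()),
        "dropped": sorted(old.keys() - new.keys()),
        "version_bumped": [],
        "strand_flipped": [],
        "version_bumped_and_strand_flipped": [],
    }
    for acc in sorted(old.keys() & new.keys()):
        bumped = new[acc].version != old[acc].version
        flipped = {old[acc].orientation, new[acc].orientation} == {"+", "-"}
        if bumped and flipped:
            changes["version_bumped_and_strand_flipped"].append(acc)
        elif bumped:
            changes["version_bumped"].append(acc)
        elif flipped:
            changes["strand_flipped"].append(acc)
    return changes


# Toy AGP rows with made-up accessions, for illustration only.
EXAMPLE_OLD = [
    "chr1\t1\t100\t1\tF\tAC000001.1\t1\t100\t+",
    "chr1\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends",
    "chr1\t151\t250\t3\tF\tAC000002.1\t1\t100\t+",
    "chr1\t251\t350\t4\tW\tAAA00003.1\t1\t100\t-",
]
EXAMPLE_NEW = [
    "chr1\t1\t100\t1\tF\tAC000001.2\t1\t100\t+",   # new version of AC000001
    "chr1\t101\t200\t2\tF\tAC000002.1\t1\t100\t-",  # AC000002 now on the '-' strand
    "chr1\t201\t300\t3\tF\tAC000004.1\t1\t100\t+",  # newly added component
]

if __name__ == "__main__":
    for category, accessions in diff_components(EXAMPLE_OLD, EXAMPLE_NEW).items():
        print(category, accessions)
```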
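Slide 10 counts RefSeq transcripts that fail to align, have multiple best alignments (split transcripts), or cover less than 95% of their CDS. Assuming each transcript's alignment has already been reduced to a small summary record (the record layout and the transcript names in the usage line are hypothetical), the categorization step might look like this:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical summary of how one transcript aligned to an assembly unit;
# the fields are assumptions made for this sketch.


@dataclass
class TranscriptAlignment:
    name: str
    best_alignments: int           # equally good best alignments (0 = none)
    cds_coverage: Optional[float]  # fraction of the CDS covered; None if non-coding


def summarize(records: list[TranscriptAlignment],
              cds_threshold: float = 0.95) -> dict[str, list[str]]:
    """Group transcripts into not-aligning, split and low-CDS-coverage lists."""
    summary: dict[str, list[str]] = {"not_aligning": [], "split": [], "low_cds_coverage": []}
    for rec in records:
        if rec.best_alignments == 0:
            summary["not_aligning"].append(rec.name)
            continue  # nothing aligned, so coverage is not meaningful
        if rec.best_alignments > 1:
            summary["split"].append(rec.name)
        if rec.cds_coverage is not None and rec.cds_coverage < cds_threshold:
            summary["low_cds_coverage"].append(rec.name)
    return summary


if __name__ == "__main__":
    print(summarize([
        TranscriptAlignment("txA", 0, None),
        TranscriptAlignment("txB", 2, 0.99),
        TranscriptAlignment("txC", 1, 0.80),
        TranscriptAlignment("txD", 1, 1.0),
    ]))
```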
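Slide 16 categorizes pile-up results as homozygous REF, homozygous ALT or heterozygous. A simple allele-fraction rule is sketched below. The depth and fraction thresholds are assumptions chosen for illustration, since the slide does not state the criteria used, and production variant callers typically rely on probabilistic genotype likelihoods instead.

```python
from collections import Counter

# Hypothetical thresholds: illustrative assumptions, not the criteria used
# in the analysis on slide 16.
MIN_DEPTH = 10         # fewer reads than this -> no call
MAX_ALT_FOR_REF = 0.1  # ALT fraction at or below this -> homozygous REF
MIN_ALT_FOR_ALT = 0.9  # ALT fraction at or above this -> homozygous ALT


def classify_site(ref_count: int, alt_count: int) -> str:
    """Classify one pile-up position from its REF and ALT read counts."""
    depth = ref_count + alt_count
    if depth < MIN_DEPTH:
        return "insufficient_depth"
    alt_fraction = alt_count / depth
    if alt_fraction <= MAX_ALT_FOR_REF:
        return "hom_REF"
    if alt_fraction >= MIN_ALT_FOR_ALT:
        return "hom_ALT"
    return "het"


def tally(sites: list[tuple[int, int]]) -> Counter:
    """Count categories over (ref_count, alt_count) pairs."""
    return Counter(classify_site(ref, alt) for ref, alt in sites)


if __name__ == "__main__":
    print(tally([(30, 0), (0, 25), (12, 14), (3, 2)]))
```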
