Previewing GRCm39: assembly updates from the GRC
Tayebeh Rezaie, Ph.D.
NCBI
25 September 2019
Contributed:
• Valerie Schneider
• Kerstin Howe
• Tina Graves
• Paul Flicek
• Tayebeh Rezaie
• Nathan Bouk
• Hsiu-Chuan Chen
• Jo Wood
• Joanna Collins
• Sarah Pelan
• Will Chow
• James Torrance
• Derek Albracht
• Milinn Kremitzki
• Laura Clarke
• Jane Loveland
• NCBI RefSeq and GenColl
This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
GRC Assembly model
• Primary assembly unit:
  • C57BL/6J chromosomes
  • Unlocalized and unplaced scaffolds (scaffold: an ordered and oriented set of contigs)
• Strain-specific assembly units:
  • Sequence from clones representing other strains, for regions needing additional representation
• Patches assembly unit
Mouse Chr. 1, GRCm38
What is the mouse genome assembly?
The reference is derived from the C57BL/6J strain
http://genomereference.org
How do we do assembly curation?
• Technology: sequencing, FISH, optical mapping, alignments of clone ends, assembly of Illumina reads
• Sequencing: clones
• FISH: localization of unlocalized sequences
• Optical Mapping: gap sizing, path problem
• Resources: clones, WGS, PCR products
Examples of assembly curation in GRCm38C
• Gap closure
• Correction of clone assembly problems
• Path problem correction
• Representation of strain variation
The release of GRCm39 is planned for early 2020. This talk gives an overview of GRCm39 based on analyses of GRCm38C, the second intermediate build.
Minor or patch release: non-coordinate-changing assembly version
Major release: coordinate-changing assembly version
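The patch versus major distinction can be sketched in code. The snippet below is only an illustration, assuming assembly names follow the pattern used in this talk (for example GRCm38.p6, where the ".p" suffix is the patch number). The names parse_assembly_name and same_coordinates are hypothetical and are not part of any GRC or NCBI tool.

```python
import re
from typing import NamedTuple


class AssemblyName(NamedTuple):
    prefix: str  # e.g. "GRCm"
    major: int   # e.g. 38
    patch: int   # 0 when the name carries no ".pN" suffix


# Assumed naming pattern: GRC + species letter(s) + major number + optional ".pN".
_NAME_RE = re.compile(r"^(GRC[a-z]+)(\d+)(?:\.p(\d+))?$")


def parse_assembly_name(name: str) -> AssemblyName:
    """Split a name such as 'GRCm38.p6' into prefix, major and patch."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    prefix, major, patch = match.groups()
    return AssemblyName(prefix, int(major), int(patch or 0))


def same_coordinates(a: str, b: str) -> bool:
    """True when two names differ only by patch level.

    Patch releases do not change chromosome coordinates, so two names
    with the same prefix and major number share a coordinate system.
    """
    pa, pb = parse_assembly_name(a), parse_assembly_name(b)
    return (pa.prefix, pa.major) == (pb.prefix, pb.major)
```

Under these assumptions, same_coordinates("GRCm38", "GRCm38.p6") is true and same_coordinates("GRCm38.p6", "GRCm39") is false. Intermediate build names such as GRCm38C do not fit the assumed pattern and raise ValueError.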
Genome issues resolved post-GRCm38
Updates as of GRCm38.p6
• 65 FIX patches
• 9 NOVEL patches
GRCm38 Released updates
[Bar chart: genome issues resolved post-GRCm38, by issue type; total = 473. Categories: Gap, Clone, Variation, Missing, GRC, Path, Unknown, Localization. Bar labels, in extraction order: 39%, 21%, 7%, 12%, 15%, 2.7%, 2%, 1.3%]
Gap + Clone = 60% of all resolved post-GRCm38
Improving the reference assembly
Six minor/patch releases since the GRCm38 release in 2012
• Patch releases: non-coordinate-changing assembly versions
  • Fix patches (chromosome path changes)
  • Novel patches (alternate representations of chromosome sequences, derived from other strains)
GRCm38/GRCm38C assembly stats
                                  GRCm38 (GCF_000001635.20)   GRCm38C (GCF_008087425.1)
Total length (Primary)            2,730,855,475               2,733,095,204
Total length (all)                2,793,712,140               2,798,405,461
Total assembly gap length (all)   79,291,755                  78,606,933
# gaps between scaffolds          191                         151
# gaps within scaffolds           443                         213
Scaffold N50                      54,517,951                  100,923,795 (85% increase)
Contig N50                        32,273,079                  57,461,838 (78% increase)
• GRCm38C has fewer gaps and is more contiguous than GRCm38
• In GRCm38C: 5 single-scaffold chromosomes (11, 12, 15, 16, 18), 11 built from 2 scaffolds, 5 built from >2 scaffolds
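As a reminder of what the N50 figures mean, here is a minimal sketch using the standard N50 definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total length. The function names are illustrative only. The percent increases are recomputed from the N50 values reported on this slide.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that takes us to half the total.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


def percent_increase(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# N50 values as reported for GRCm38 and GRCm38C.
scaffold_gain = percent_increase(54_517_951, 100_923_795)
contig_gain = percent_increase(32_273_079, 57_461_838)
print(f"Scaffold N50 increase: {scaffold_gain:.0f}%")
print(f"Contig N50 increase: {contig_gain:.0f}%")
```

For a toy input, n50([2, 3, 4, 5, 6]) returns 5: the two longest sequences (6 + 5 = 11) already cover more than half of the total of 20.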
Assembly component updates between GRCm38/GRCm38C
Number of components changed = 666 (~3.2%)
o Added: 315 (6,640,992 bp)
  • Clones + PCR: 77
  • Assembled Illumina reads: 95
  • WGS from 'MmusSOAP1' & 'MmusALLPATHS2' assemblies: 81
  • WGS from MGSCv3 (original mouse genome project): 17
  • WGS from Eve assembly: 45
o Dropped: 330 (3,555,784 bp)
  • WGS from MGSCv3 replaced with higher-accuracy sequence: 310 (94%)
o Version bumped: 15
o Strand flipped: 4
o Version bumped + Strand flipped: 2
• Our evaluation of scaffold/component changes in GRCm38C found no unexpected changes.
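The component counts on this slide can be cross-checked with a few lines of arithmetic. This is only a consistency sketch over the numbers shown above; the dictionary keys are shorthand labels chosen for illustration.

```python
# Sources of added components, as listed on the slide.
added_by_source = {
    "clones_and_pcr": 77,
    "assembled_illumina_reads": 95,
    "mmussoap1_and_mmusallpaths2_wgs": 81,
    "mgscv3_wgs": 17,
    "eve_wgs": 45,
}

# Component changes by category, as listed on the slide.
changes = {
    "added": 315,
    "dropped": 330,
    "version_bumped": 15,
    "strand_flipped": 4,
    "version_bumped_and_strand_flipped": 2,
}

added_total = sum(added_by_source.values())
changed_total = sum(changes.values())

# Share of dropped components that were MGSCv3 WGS sequence.
mgscv3_dropped_share = 100.0 * 310 / changes["dropped"]

print(added_total)                      # sources of added components
print(changed_total)                    # all changed components
print(f"{mgscv3_dropped_share:.0f}%")   # MGSCv3 share of dropped components
```

The sources sum to the 315 added components, the five categories sum to the 666 changed components, and 310 of 330 dropped components rounds to the 94% quoted.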
RefSeq Transcript Analysis
                                                          GRCm38 Primary Unit   GRCm38C Primary Unit
Number of sequences retrieved from Entrez                 42721                 42721
Number of sequences not aligning*                         6                     2
Number of sequences with multiple best alignments
(split transcripts)†                                      1                     2
Number of sequences with CDS coverage <95%                41                    19
*The 2 transcripts that align to neither the GRCm38C nor the GRCm38 primary unit:
• Olfr100 (annotated on an alt from 129X1/SvJ)
• Rs5-8s1
*The other 4 transcripts not aligning to the GRCm38 primary unit:
• Ahsp (clone problem) and Copg2os2 (gap), both corrected
• Sts (PAR)
• Rn45s
†Transcripts whose GRCm38C alignments are split by a gap:
• Sts (PAR); no alignment to GRCm38
• Rn45s
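The "CDS coverage <95%" row can be illustrated with a small sketch. The slides do not say how coverage was computed, so this assumes one plausible definition: the fraction of CDS bases covered by at least one aligned block, with blocks given as half-open (start, end) intervals in CDS coordinates. The function names are hypothetical; only the 95% threshold comes from the slide.

```python
def cds_coverage(cds_length, aligned_blocks):
    """Fraction of CDS bases covered by at least one aligned block.

    aligned_blocks: iterable of (start, end) half-open intervals in CDS
    coordinates. Overlapping blocks are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_blocks):
        # Clip to the CDS and skip anything already counted.
        start = max(start, last_end, 0)
        end = min(end, cds_length)
        if end > start:
            covered += end - start
            last_end = end
    return covered / cds_length


def has_low_cds_coverage(cds_length, aligned_blocks, threshold=0.95):
    """Flag a transcript whose CDS coverage falls below the threshold."""
    return cds_coverage(cds_length, aligned_blocks) < threshold
```

For example, a 1,000 bp CDS with aligned blocks (0, 400) and (300, 900) has 900 covered bases, so its coverage is 0.9 and it would be counted in the <95% row.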
Genes with improved representation in GRCm38C
4933416I08Rik Dnah12 Mia3 Pik3c2g Sgms2
Ahnak2 Efcab7 Muc2 Ppp2r3d Slc26a6
Anxa13 Ide Muc3 Pstpip2 Spata5l1
Atg4a Ifi30 Muc4 Ptpmt1 Spry3
Auts2 Intu Muc6 Rab3a Taf1a
Baalc Jakmip3 Nadk2 Ranbp3l Tmem134
Cct6a Kazn Nhej1 Rasgrf2 Traf5
Cylc1 Kndc1 Nkain1 Rhox5 Trerf1
Dgkk Krt85 Nlrp4g Rims1 Vezf1
Assembly gap closure and complete representation of Efcab7
Correction of a false GRCm38 assembly gap caused by haplotype incompatibility
View curation status of Mouse Genome Issues
http://genomereference.org
Unresolved genome issues
Current curation status
Resolution likelihoods were determined by GRC review; optical mapping was used to size remaining gaps and FISH to localize unlocalized sequences.
A major obstacle: the repetitive nature of genomic regions, including segmental duplications
Base Report Sources:
• Sanger mouse genomes project (n=4,148)
• Eve assembly publication (n=267)
• An additional 236 bases reported in Eve are included in the Sanger set
Analysis: Evaluate support for these bases
• Align Illumina reads derived from another C57BL/6J sample to GRCm38 (Gnerre et al.; PMID: 21187386)
• Generate pile-up results from alignments
• Categorize results as: homozygous REF, homozygous ALT and heterozygous
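The categorization step above can be sketched as follows. This is a simplified illustration, not the pipeline used for the analysis: the minimum depth and the 90% allele-fraction cutoff are assumptions chosen for the example, and classify_site is a hypothetical name. Here REF is the base in GRCm38 and ALT is the reported alternative base.

```python
from collections import Counter


def classify_site(pileup_bases, ref_base, alt_base,
                  min_depth=10, hom_fraction=0.9):
    """Classify one site from the read bases piled up over it.

    pileup_bases: iterable of single-character base calls at the site.
    min_depth and hom_fraction are assumed thresholds for illustration.
    """
    counts = Counter(base.upper() for base in pileup_bases)
    ref_n = counts[ref_base.upper()]
    alt_n = counts[alt_base.upper()]
    depth = ref_n + alt_n
    if depth < min_depth:
        return "low coverage"
    if alt_n / depth >= hom_fraction:
        return "homozygous ALT"
    if ref_n / depth >= hom_fraction:
        return "homozygous REF"
    return "heterozygous"
```

With these thresholds, twelve reads that all show the alternative base give "homozygous ALT", which is the pattern that would support updating the GRCm38 base, while an even split of reference and alternative bases gives "heterozygous".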
Goal: Update erroneous or very rare GRCm38 bases
*All bases common with the Eve set were homozygous ALT
*Bases reported only from Eve (n=184): 25% hom REF, 75% hom ALT
Evaluation of consequences with VEP
Mouse Genomes Project Bases
Sites in CDS/genes: 45
• 34 homozygous REF
• 11 homozygous ALT
Evaluation of erroneous or very rare GRCm38 bases
Conclusion and future:
• The GRC is currently preparing for the release of GRCm39
• Upon the release of GRCm39, the GRC's curation of the mouse genome reference assembly will be limited to the resolution of community-reported problems
o To ask a question, report an assembly issue, or request information about a genomic region of interest, contact us: https://www.ncbi.nlm.nih.gov/grc/contact-us
o See GRC blog posts: http://genomeref.blogspot.com/
o For FAQs and other assembly help: https://www.ncbi.nlm.nih.gov/grc/help/
o For more information, see my poster P43 on Thursday
The release of GRCm39 is planned for early 2020

Previewing GRCm39: Assembly Updates from the GRC

  • 1. Previewing GRCm39: assembly updates from the GRC Tayebeh Rezaie, Ph.D. NCBI 25 September 2019
  • 2. Contributed: Valerie Schneider, Kerstin Howe, Tina Graves, Paul Flicek, Tayebeh Rezaie, Nathan Bouk, Hsiu-Chuan Chen, Jo Wood, Joanna Collins, Sarah Pelan, Will Chow, James Torrance, Derek Albracht, Milinn Kremitzki, Laura Clarke, Jane Loveland, NCBI RefSeq and GenColl
      This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
  • 3. GRC Assembly model
      • Primary assembly unit:
          o C57BL/6J chromosomes
          o Unlocalized and unplaced scaffolds (scaffold: an ordered and oriented (O/O) set of contigs)
      • Strain-specific assembly units:
          o Sequence from clones representing other strains, regions needing additional representation
      • Patches assembly unit
  • 4. What is the mouse genome assembly?
      The reference is from C57BL/6J
      [Figure: mouse chr. 1, GRCm38]
      http://genomereference.org
  • 5. How do we do assembly curation?
      • Technology: sequencing, FISH, optical mapping, alignments of end clones, assembling Illumina reads
          o Sequencing: clones
          o FISH: localization of unlocalized sequences
          o Optical mapping: gap sizing, path problems
      • Resources: clones, WGS, PCR products
      Examples of assembly curation in GRCm38C:
      • Gap closure
      • Correction of clone assembly problems
      • Path problem correction
      • Representation of strain variation
  • 6. Release of GRCm39 is planned for early 2020.
      An overview of GRCm39 from analyses of GRCm38C, the 2nd intermediate build.
      • Minor or patch release: non-coordinate changing assembly versions
      • Major release: coordinate changing assembly versions
      http://genomereference.org
  • 7. Improving the reference assembly
      Six minor/patch releases since the GRCm38 release in 2012
      • Patch releases: non-coordinate changing assembly versions
          o Fix patches (chromosome path changes)
          o Novel patches (alternate representations of chromosome sequences, derived from other strains)
      Updates as of GRCm38.p6:
      • 65 FIX patches
      • 9 NOVEL patches
      Genome issues resolved post-GRCm38 (GRCm38 released updates)
      [Bar chart: resolved issues by category (Gap, Clone, Variation, Missing, GRC, Path, Unknown, Localization); x-axis 0 to 200 issues; bar labels 39%, 21%, 7%, 12%, 15%, 2.7%, 2%, 1.3%; total = 473]
      Gap + Clone = 60% of all issues resolved post-GRCm38
  • 8. GRCm38/GRCm38C assemblies stats

                                    GRCm38 (GCF_000001635.20)   GRCm38C (GCF_008087425.1)
      Total length (primary)        2,730,855,475               2,733,095,204
      Total length (all)            2,793,712,140               2,798,405,461
      Total assembly gap length     79,291,755 (all)            78,606,933 (all)
      # gaps between scaffolds      191                         151
      # gaps within scaffolds       443                         213
      Scaffold N50                  54,517,951                  100,923,795 (85% increase)
      Contig N50                    32,273,079                  57,461,838 (78% increase)

      • GRCm38C has fewer gaps and is more contiguous than GRCm38
      • In GRCm38C: 5 single-scaffold chromosomes (11, 12, 15, 16, 18), 11 built from 2 scaffolds, 5 built from >2 scaffolds
  • 9. Assembly component updates between GRCm38/GRCm38C
      Number of components with change = 666 (~3.2%)
      o Added: 315 (6,640,992 bp)
          • Clones + PCR: 77
          • Assembled Illumina reads: 95
          • WGS from 'MmusSOAP1' & 'MmusALLPATHS2' assemblies: 81
          • WGS from MGSCv3 (original mouse genome project): 17
          • WGS from Eve assembly: 45
      o Dropped: 330 (3,555,784 bp)
          • WGS from MGSCv3 replaced with higher-accuracy sequence: 310 (94%)
      o Version bumped: 15
      o Strand flipped: 4
      o Version bumped + strand flipped: 2
      • Our evaluation of scaffold/component changes in GRCm38C found no unexpected changes.
  • 10. RefSeq Transcript Analysis

                                                             GRCm38 Primary Unit   GRCm38C Primary Unit
      Number of sequences retrieved from Entrez              42721                 42721
      Number of sequences not aligning*                      6                     2
      Number of sequences with multiple best alignments
        (split transcripts)†                                 1                     2
      Number of sequences with CDS coverage <95%             41                    19

      *The 2 transcripts not aligning to either the GRCm38C or the GRCm38 primary unit:
          • Olfr100 (annotated on alt from 129X1/SvJ)
          • Rs5-8s1
      *The other 4 not aligning to the GRCm38 primary unit:
          • Ahsp (Clone problem) and Copg2os2 (Gap), corrected
          • Sts (PAR)
          • Rn45s
      †Alignments to GRCm38C split by a gap:
          • Sts (PAR), no alignment to GRCm38
          • Rn45s
  • 11. Genes with improved representation in GRCm38C

      4933416I08Rik   Dnah12     Mia3     Pik3c2g    Sgms2
      Ahnak2          Efcab7     Muc2     Ppp2r3d    Slc26a6
      Anxa13          Ide        Muc3     Pstpip2    Spata5l1
      Atg4a           Ifi30      Muc4     Ptpmt1     Spry3
      Auts2           Intu       Muc6     Rab3a      Taf1a
      Baalc           Jakmip3    Nadk2    Ranbp3l    Tmem134
      Cct6a           Kazn       Nhej1    Rasgrf2    Traf5
      Cylc1           Kndc1      Nkain1   Rhox5      Trerf1
      Dgkk            Krt85      Nlrp4g   Rims1      Vezf1
  • 12. Assembly gap closure and complete representation of Efcab7
  • 13. Correction of a false GRCm38 assembly gap caused by haplotype incompatibility
  • 14. View curation status of Mouse Genome Issues http://genomereference.org
  • 15. Unresolved genome issues: current curation status
      • Resolution likelihoods are as determined by GRC review; optical mapping was used to size remaining gaps and FISH to localize unlocalized sequences.
      • A major obstacle: the repetitive nature of the genomic regions involved, including segmental duplications
  • 16. Base Report Sources:
      • Sanger mouse genomes project (n=4,148)
      • Eve assembly publication (n=267)
      • An additional 236 bases reported in Eve are included in the Sanger set
      Analysis: Evaluate support for these bases
      • Align Illumina reads derived from another C57BL/6J sample to GRCm38 (Gnerre et al.; PMID: 21187386)
      • Generate pile-up results from alignments
      • Categorize results as: homozygous REF, homozygous ALT and heterozygous
      Goal: Update erroneous or very rare GRCm38 bases
      *All bases common with the Eve set were homozygous ALT
      *Bases reported only from Eve (n=184): 25% homozygous REF, 75% homozygous ALT
  • 17. Evaluation of erroneous or very rare GRCm38 bases
      Base Report Sources:
      • Sanger mouse genomes project (n=4,148)
      • Eve assembly publication (n=267)
      Evaluation of consequences with VEP
      Mouse Genomes Project bases, sites in CDS/genes: 45
      • 34 homozygous REF
      • 11 homozygous ALT
      (PMID: 21187386)
  • 18. Conclusion and future:
      • The GRC is currently preparing for the release of GRCm39
      • Upon the release of GRCm39, the GRC's curation of the mouse genome reference assembly will be limited to the resolution of community-reported problems
          o Contact us with a question, to report an assembly issue, or to request information about a genomic region of interest: https://www.ncbi.nlm.nih.gov/grc/contact-us
          o See GRC blog posts: http://genomeref.blogspot.com/
          o For FAQs and other assembly help: https://www.ncbi.nlm.nih.gov/grc/help/
          o For more information see my poster P43 on Thursday
      Release of GRCm39 is planned for early 2020
      http://genomereference.org
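
The contiguity figures quoted on slide 8 can be checked with a few lines of Python. This is a minimal sketch rather than GRC code: `n50` and `percent_increase` are hypothetical helper names, `n50` follows the standard definition (the length L such that sequences of length L or longer cover at least half of the total), and the toy lengths are invented for illustration. Only the four N50 values come from the slide.

```python
def n50(lengths):
    """Return N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def percent_increase(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Toy scaffold lengths (made up): total is 290, and 80 + 70 = 150 covers half.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)

# N50 values taken from slide 8 (GRCm38 vs. GRCm38C).
scaffold_gain = percent_increase(54_517_951, 100_923_795)
contig_gain = percent_increase(32_273_079, 57_461_838)

print(toy_n50, round(scaffold_gain), round(contig_gain))
```

Rounded to whole percentages, the two gains agree with the 85% and 78% increases stated on the slide.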

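Slide 16 sorts each reported base into homozygous REF, homozygous ALT, or heterozygous based on read pile-ups. The slides do not state the rule that was used, so the following is only an illustrative sketch: the function name `classify_site`, the minimum depth of 10 reads, and the ALT-fraction cutoffs of 0.2 and 0.8 are all assumptions, not the GRC's actual criteria.

```python
def classify_site(ref_count, alt_count, min_depth=10, low=0.2, high=0.8):
    """Classify one pile-up position from REF/ALT read counts.

    All thresholds are assumed values chosen for illustration only.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return "no call"  # too few reads to support any category
    alt_fraction = alt_count / depth
    if alt_fraction <= low:
        return "homozygous REF"
    if alt_fraction >= high:
        return "homozygous ALT"
    return "heterozygous"


# Hypothetical read counts (REF, ALT) for four positions.
examples = [(30, 0), (1, 29), (15, 15), (2, 1)]
calls = [classify_site(ref, alt) for ref, alt in examples]
print(calls)
```

Under these assumed cutoffs, a site where nearly all reads carry the ALT allele would be a candidate for updating the reference base, while a homozygous REF site supports leaving GRCm38 as it is.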