GRC Workshop at AGBT 2015
Tina Graves-Lindsay
CHM1 PacBio Data and Initial Assembly Stats
• 54X Whole Genome Coverage in long reads
• 8.8kb Avg read length
• P5-C3 Chemistry
• PacBio Assembly done by Jason Chin
• Initial assembly had a contig N50 of 4.5 Mb
• Have alignments of PacBio CHM1 assembly to CHM1_1.1 and
GRCh38
PacBio CHM1 Assembly potentially fills GRCh38 Gaps
Figure tracks: GRCh38; PacBio CHM1
Figure annotation: Data exists in PacBio unitig, not present in GRCh38
CHM1_1.1 WGS Assembly Contigs
PacBio Assembly Contig
Alignment of CHM1 PacBio assembly to CHM1_1.1
BioNano Genome Map confirms assembly of PacBio Contig
Figure tracks: PacBio Assembly Contig; BioNano Genome Map Contigs
1q21
1q21 patch alignment to chromosome 1
1q32 1q21 1p21
SRGAP2 Region in PacBio Assembly
1q21
CHM1 Falcon vs MHAP Assembly Stats
• MHAP assembly available for download – GCA_000772585.3
                      Falcon Assembly   MHAP
Number of Contigs     5528              3434
N50 Contig Length     5,460,023         4,320,471
Total Assembly Size   2,818,296,359     2,828,300,545
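For readers less familiar with the N50 statistic reported above, here is a minimal sketch of how it is conventionally computed from a list of contig lengths. The function name `nx` and the toy lengths are illustrative assumptions; this is not code from the Falcon or MHAP pipelines.

```python
def nx(contig_lengths, fraction=0.5):
    """Return the Nx contig length: the length L such that contigs of
    length >= L together hold at least `fraction` of all assembled bases.
    fraction=0.5 gives N50, 0.9 gives N90, and so on."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return min(contig_lengths)  # guard against float rounding at fraction=1


# Toy example (hypothetical lengths, not real CHM1 contigs).
toy = [100, 80, 60, 40, 20]
n50 = nx(toy)        # total 300, half is 150; 100 + 80 = 180 >= 150
n90 = nx(toy, 0.9)   # 90% is 270; 100 + 80 + 60 + 40 = 280 >= 270
```

Because the statistic depends only on the sorted lengths, two assemblies of similar total size can be compared directly, as in the table above.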
CHM1 Assemblies – More on the Way
• MHAP Assembly
• Done by Adam Phillippy
• 1-2 more assemblies will be generated
• Dazzler Assembly
• Gene Myers version
• Longer contig N50 length
• We expect to evaluate it, but have not seen it yet
• Falcon Assemblies
• Jason Chin generating 1-2 additional Falcon assemblies using
improved software
CHM1 Assembly Assessment Methods
• Assemblies will run through NCBI QA pipeline
• Assessed for contiguity, annotation, and concordance with the
finished BAC paths
• Assembly-to-assembly alignments will be generated between each PacBio
assembly and the Illumina-based CHM1 assembly, as well as GRCh38
• BioNano Genome Map
• SV calls generated from comparing the map data to each of the
CHM1 assemblies
• Alignment of the Illumina reads back to the CHM1
assemblies
• Heterozygous calls are likely indicative of a collapse in the
assembly
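Because a hydatidiform mole genome is effectively haploid, clusters of heterozygous calls from reads aligned back to its own assembly are unexpected and can point at collapsed sequence. The sketch below illustrates one simple way to flag such clusters. The function name, the fixed-window approach, and the `window` and `min_calls` defaults are all assumptions for illustration, not the method used in the QA described here.

```python
from collections import Counter


def flag_candidate_collapses(het_calls, window=10_000, min_calls=5):
    """Bin heterozygous calls into fixed windows per contig and return the
    windows whose call count reaches `min_calls`.

    het_calls: iterable of (contig_name, 0-based position) tuples.
    Returns a sorted list of (contig_name, window_start, window_end, count).
    """
    counts = Counter((contig, pos // window) for contig, pos in het_calls)
    flagged = [
        (contig, idx * window, (idx + 1) * window, n)
        for (contig, idx), n in counts.items()
        if n >= min_calls
    ]
    return sorted(flagged)


# Hypothetical calls: a dense cluster on contig_A, isolated calls elsewhere.
calls = [("contig_A", p) for p in (12_010, 12_400, 13_900, 15_250, 18_700, 19_990)]
calls += [("contig_A", 250_000), ("contig_B", 40_000)]
regions = flag_candidate_collapses(calls)
```

Isolated calls are ignored on purpose, since single heterozygous calls are more plausibly sequencing or alignment noise than a collapse.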
The Platinum Genome
• What is it?
• Contiguous
• Haplotype-resolved representation of entire genome
• Best assembly from the mini-Assemblathon will be picked and
improved
• BAC clone paths will be incorporated into PacBio whole genome
assembly
• Comparison back to CHM1_1.1 to see if portions of the Illumina
assembly will fill in any gaps
• Pick additional BACs to cover regions of the assembly that are
still very fragmented
CHM13 – 2nd Platinum Genome
• CHM13 – another hydatidiform mole sample
• PacBio data generated
• 60X data was generated using P5 and P6 Chemistry
• Avg read length ~11kb, longer than CHM1 data
• Data available in SRA
• Generating Illumina coverage to use for assembly QA, SV
detection, and consensus base error correction
• Plan to use BACs to improve the assembly where needed
• Alignment of Assembly to BioNano Genome map
• Currently ~91% of CHM13 assembly aligns to BioNano map
contigs
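A figure such as "~91% of the assembly aligns" can be derived by merging alignment intervals per contig and dividing the covered bases by the total assembly size. The following is a minimal sketch under that assumption. The function name, the input layout, and the toy numbers are hypothetical and do not come from the BioNano tooling.

```python
def aligned_fraction(contig_lengths, aligned_intervals):
    """Fraction of assembled bases covered by at least one alignment.

    contig_lengths: dict of contig name -> length in bases.
    aligned_intervals: iterable of (contig, start, end), half-open, 0-based.
    Intervals are assumed to lie within their contig; no clipping is done.
    """
    total = sum(contig_lengths.values())
    if total == 0:
        raise ValueError("assembly has no bases")
    by_contig = {}
    for contig, start, end in aligned_intervals:
        by_contig.setdefault(contig, []).append((start, end))

    covered = 0
    for spans in by_contig.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:      # overlapping or adjacent: extend block
                cur_end = max(cur_end, end)
            else:                     # gap: close the current block
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / total


# Toy assembly: two contigs, with overlapping alignments on ctg1.
lengths = {"ctg1": 1_000, "ctg2": 500}
hits = [("ctg1", 0, 400), ("ctg1", 300, 700), ("ctg2", 100, 300)]
frac = aligned_fraction(lengths, hits)  # (700 + 200) / 1500
```

Merging before summing matters: overlapping alignments would otherwise be counted twice and inflate the percentage.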
CHM13 Assembled by DNAnexus
• DNAnexus is a cloud-based genome informatics & data
management platform that enables:
• Large scale genomic analysis
• Easy and secure collaboration of data
• Governance and compliance
• Simple deployment of your own code or use of pre-packaged tools
• DNAnexus packaged FALCON so that it can be run without
complicated installation and at scale.
• DNAnexus gives access to massive computational resources
on-demand.
• During assembly of CHM13, FALCON made use of 350
concurrent workers and 1400 concurrent cores.
DNAnexus FALCON Pipeline
CHM13 – 2nd Platinum Genome
Stats                 PacBio          DNAnexus
Number of Contigs     2873            2203
N50                   12,981,785      11,909,487
N90                   2,100,287       1,745,715
N95                   743,427         808,675
Max Contig Length     63,148,543      53,079,926
Total Sequence        2,851,367,788   2,809,672,639
Total Assembly Time   5 days          41 hours
Refseq Analysis
                                   GRCh38   CHM1_1.1   MHAP CHM1   PacBio CHM1   CHM13
Number of sequences not aligning   21       88         67          67            125
Split Transcripts                  8        35         1245        1131          285
CDS coverage <95%                  17       266        1339        1212          265
Total Sequences Retrieved from Entrez: 49680
Future Directions
• Improve assemblies of both CHM1 and CHM13 to result in a
completely resolved final assembly for each genome
• From both assemblies, add significant structural variants
to the reference as alternate loci
• Sequence additional genomes from underrepresented populations
to add more diversity to the reference

 
Medical oncologic management of Colorectal cancer-1-1.pptx
Medical oncologic management of Colorectal cancer-1-1.pptxMedical oncologic management of Colorectal cancer-1-1.pptx
Medical oncologic management of Colorectal cancer-1-1.pptx
 
Recognizing and Managing Bacterial Vaginosis.pptx
Recognizing and Managing Bacterial Vaginosis.pptxRecognizing and Managing Bacterial Vaginosis.pptx
Recognizing and Managing Bacterial Vaginosis.pptx
 
2024 07 12 Do you share my autistic traits_ - Google Sheets.pdf
2024 07 12 Do you share my autistic traits_ - Google Sheets.pdf2024 07 12 Do you share my autistic traits_ - Google Sheets.pdf
2024 07 12 Do you share my autistic traits_ - Google Sheets.pdf
 
Approach to Head Injuiry, Intracranial Pressure Measurement and Management.pptx
Approach to Head Injuiry, Intracranial Pressure Measurement and Management.pptxApproach to Head Injuiry, Intracranial Pressure Measurement and Management.pptx
Approach to Head Injuiry, Intracranial Pressure Measurement and Management.pptx
 

Grc workshop agbt2015_tg

  • 1. GRC Workshop at AGBT 2015 Tina Graves-Lindsay
  • 2. CHM1 PacBio Data and Initial Assembly Stats • 54X Whole Genome Coverage in long reads • 8.8kb Avg read length • P5-C3 Chemistry • PacBio Assembly done by Jason Chin • Initial assembly had 4.5 MB N50 contig length • Have alignments of PacBio CHM1 assembly to CHM1_1.1 and GRCh38
  • 3. PacBio CHM1 Assembly potentially fills GRCh38 Gaps GRCh38 PacBio CHM1 Data exists in PacBio unitig, not present in GRCh38
  • 4. CHM1_1.1 WGS Assembly Contigs PacBio Assembly Contig Alignment of CHM1 PacBio assembly to CHM1_1.1
  • 5. BioNano Genome Map confirms assembly of PacBio Contig PacBio Assembly Contig BioNano Genome Map Contigs
  • 6. 1q21 1q21 patch alignment to chromosome 1 1q32 1q21 1p21
  • 7. SRGAP2 Region in PacBio Assembly 1q21
  • 8. CHM1 Falcon vs MHAP Assembly Stats • MHAP assembly available for download – GCA_000772585.3
                              Falcon Assembly    MHAP
       Number of Contigs      5528               3434
       N50 Contig Length      5,460,023          4,320,471
       Total Assembly Size    2,818,296,359      2,828,300,545
  • 9. CHM1 Assemblies – More on the Way • MHAP Assembly • Done by Adam Phillippy • 1-2 more assemblies will be generated • Dazzler Assembly • Gene Myers version • Longer contig N50 length • Believe we will be evaluating it, but haven’t seen it yet • Falcon Assemblies • Jason Chin generating 1-2 additional Falcon assemblies using improved software
  • 10. CHM1 Assembly Assessment Methods • Assemblies will run through NCBI QA pipeline • Assessed for contiguity, annotation, and concordance with the finished BAC paths • Assembly-assembly alignments will be generated between each PB assembly and Illumina-based CHM1 assembly as well as GRCh38 • BioNano Genome Map • SV calls generated from comparing the map data to each of the CHM1 assemblies • Alignment of the Illumina reads back to the CHM1 assemblies • Heterozygous calls are likely indicative of a collapse in the assembly
  • 11. The Platinum Genome • What is it? • Contiguous • Haplotype-resolved representation of entire genome • Best assembly from mini-assemblethon will be picked and improved • BAC clone paths will be incorporated into PacBio whole genome assembly • Comparison back to CHM1_1.1 to see if portions of the Illumina assembly will fill in any gaps • Pick additional BACs to cover regions of the assembly that are still very fragmented
  • 12. CHM13 – 2nd Platinum Genome • CHM13 – another hydatidiform mole sample • PacBio data generated • 60X data was generated using P5 and P6 Chemistry • Avg read length ~11kb, longer than CHM1 data • Data available in SRA • Generating Illumina coverage to use for assembly QA, SV detection, and consensus base error correction • Plan to use BACs to improve the assembly where needed • Alignment of Assembly to BioNano Genome map • Currently ~91% of CHM13 assembly aligns to BioNano map contigs
  • 13. CHM13 Assembled by DNAnexus • DNAnexus is a cloud-based genome informatics & data management platform that enables: • Large scale genomic analysis • Easy and secure collaboration of data • Governance and compliance • Simple deployment of your own code or use of pre-packaged tools • DNAnexus packaged FALCON so that it can be run without complicated installation and at scale. • DNAnexus gives access to massive computational resources on-demand. • During assembly of CHM13 FALCON made use of 350 concurrent workers and 1400 concurrent cores.
  • 15. CHM13 – 2nd Platinum Genome Stats
                              PacBio           DNAnexus
       Number of Contigs      2873             2203
       N50                    12,981,785       11,909,487
       N90                    2,100,287        1,745,715
       N95                    743,427          808,675
       Max Contig Length      63,148,543       53,079,926
       Total Sequence         2,851,367,788    2,809,672,639
       Total Assembly Time    5 days           41 hours
  • 16. RefSeq Analysis
                                          GRCh38   CHM1_1.1   MHAP CHM1   PacBio CHM1   CHM13
       Number of sequences not aligning   21       88         67          67            125
       Split Transcripts                  8        35         1245        1131          285
       CDS coverage <95%                  17       266        1339        1212          265
       Total Sequences Retrieved from Entrez: 49680
  • 17. Future Directions • Improve assemblies of both CHM1 and CHM13 to result in a completely resolved final assembly for each genome • From both assemblies, add significant structural variants to the reference as alternate loci • Sequence additional genomes to add even more diversity to the reference from more underrepresented populations
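
The contiguity figures on the stats slides (N50, N90, N95) can be recomputed from a list of contig lengths. The following is a minimal sketch, not the NCBI QA pipeline or any assembler's own reporting code: the function name `nx` and the `min_len` small-contig filter are illustrative assumptions, and the input is assumed to be a plain list of contig lengths in bases.

```python
def nx(lengths, x=50, min_len=0):
    """Return the Nx contig length.

    Nx is the length L such that contigs of length >= L together
    contain at least x percent of the total assembled bases.
    Contigs shorter than min_len (a hypothetical filter) are dropped
    before the statistic is computed.
    """
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        raise ValueError("no contigs left after filtering")
    total = sum(kept)
    running = 0
    for n in kept:
        running += n
        # Integer comparison avoids floating-point rounding on large genomes.
        if running * 100 >= total * x:
            return n


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20]  # toy lengths, not real data
    print("N50:", nx(toy_contigs, 50))
    print("N90:", nx(toy_contigs, 90))
    print("N50 after dropping contigs < 35:", nx(toy_contigs, 50, min_len=35))
```

Because a small-contig filter changes the total assembly size, it can also change N50 and N90, so assemblies are only comparable when the same filter is applied to each.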

Editor's Notes

  1. As I mentioned earlier, PacBio generated 54X coverage of long sequence reads with an average read length of almost 9kb. It was sequenced with P5-C3 chemistry and the assembly was performed by Jason Chin from PacBio. The initial assembly N50 contig length was 4.5Mb, which is a much longer N50 than our Illumina-based assembly. NCBI generated assembly-assembly alignments for us, both to our Illumina-based CHM1 assembly and also to GRCh38. In the next few slides I will show you some examples of how this long read data is very useful in providing a more complete whole genome assembly.
  2. Here is an example where the PacBio assembly looks to close a GRCh38 gap. This could be a region that we were unable to close in the reference because of a cloning bias in the BAC libraries, but now with the longer PacBio reads, more of these regions can be filled.
  3. Here is an example of this contiguity. This is an alignment of the CHM1_1.1 whole genome assembly to the PacBio CHM1 assembly. The top line in green represents the Illumina-based WGS assembly. This second track is the WGS contigs. The bottom track in blue represents one PacBio contig. This contig is 1.5 Mb long and overlaps 37 contigs in the WGS assembly and one larger gap. This shows the big difference in contiguity when comparing the PacBio assembly to the Illumina-based CHM1 reference-guided assembly. ----- Meeting Notes (2/20/15 09:12) ----- take out second track
  4. Another resource we have is a BioNano Genome Map of CHM1. BioNano is a nanochannel-based optical mapping technology in which very long DNA molecules are nicked, labeled, and run through a nanochannel. Here is an example of the CHM1 BioNano map aligned to this same 1.5 Mb PacBio contig. On top in green is the PacBio contig. The lines indicate the in silico nick sites. The blue bars indicate the BioNano contigs. You can see how well they align. The BioNano data can be used as an independent source to assess the CHM1 assembly.
  5. If we look back to the SRGAP2 example, remember, the sequence duplications are very large and are in three separate regions along the chromosome
  6. Even with the extremely long contiguity, there are still some regions of the genome that the PacBio assembly does not represent very well. This is the 1q21 region. The top panel in blue represents a portion of the clone path that was sequenced to resolve this region in the reference. The next panel represents the PacBio contigs. The take-home message here is how fragmented the PacBio assembly is through this region. Even with the huge N50 contig length, regions with large duplications will not assemble well. So through regions like this, BAC paths are still needed to fully resolve the sequence.
  7. Look for filtered stats in mhap
  8. Since the 54X coverage was made available by PacBio, there have been a few other groups that have assembled this data. We recently had a phone call with a couple of the groups to discuss which CHM1 assembly might be best. On the call, it was decided that we would have a mini-assemblathon. Both Adam Phillippy and Jason Chin wanted to generate a few additional assemblies with improvements to their software. Adam’s initial assembly had a slightly shorter N50 contig length. This assembly is available in GenBank.
  9. All assemblies from the mini-assemblathon will be run through the NCBI QA pipeline. They will be assessed for contiguity, annotation, and concordance with the finished BAC sequences. Assembly-assembly alignments will also be performed between each of the PacBio assemblies and the Illumina CHM1 assembly as well as GRCh38. Another assessment tool we have is a BioNano Genome Map. The map contigs will be aligned to the assemblies and then SV calls will be generated. These SV calls should point to potential assembly errors. Another tool we can use is the alignment of the Illumina data back to the assembly using bwa.
  10. Our ultimate goal for CHM1 is to have a Platinum assembly, one that is as contiguous as possible and is a haplotype-resolved representation of the entire genome. This could be used as another option when mapping reads, especially for projects where haplotypes are important. Topic for discussion – what to do with centromeres and telomeres.
  11. We have a second hydatidiform mole sample known as CHM13. We plan for this to be a second platinum genome. So far, we have generated 60X coverage of long-read PacBio data. Both P5 and P6 chemistries were used, and the average read length was much longer than it was for CHM1. These reads are available in the SRA. We have also begun generating Illumina data to use for assembly QA, SV detection, and possibly for consensus accuracy correction.
  12. Are there any small contig filters?
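
The assessment slide notes that heterozygous calls from Illumina reads aligned back to a CHM1 assembly likely indicate a collapse, since a hydatidiform mole sample is expected to be essentially homozygous. A minimal sketch of how such calls might be summarized follows. It assumes the variant calls have already been produced by some caller and parsed into `(contig, position, genotype)` tuples with VCF-style genotype strings; the function name `het_windows`, the window size, and the threshold are all hypothetical choices, not part of the workflow described in the talk.

```python
from collections import Counter


def het_windows(calls, window=10_000, min_het=5):
    """Flag fixed-size windows that contain many heterozygous calls.

    calls   -- iterable of (contig, 1-based position, genotype) tuples,
               with genotypes written like "0/1", "1/1" or "0|1"
    window  -- window size in bases (illustrative default)
    min_het -- minimum heterozygous calls for a window to be reported
    Returns a sorted list of (contig, start, end, het_count).
    """
    counts = Counter()
    for contig, pos, genotype in calls:
        alleles = genotype.replace("|", "/").split("/")
        # Heterozygous: more than one distinct allele, no missing allele.
        if "." not in alleles and len(set(alleles)) > 1:
            counts[(contig, (pos - 1) // window)] += 1
    return sorted(
        (contig, idx * window + 1, (idx + 1) * window, n)
        for (contig, idx), n in counts.items()
        if n >= min_het
    )


if __name__ == "__main__":
    toy_calls = [  # made-up calls for illustration only
        ("ctg1", 100, "0/1"),
        ("ctg1", 200, "0/1"),
        ("ctg1", 300, "0|1"),
        ("ctg1", 15000, "1/1"),
        ("ctg2", 50, "0/1"),
    ]
    for hit in het_windows(toy_calls, window=1000, min_het=3):
        print(hit)
```

Windows reported this way are only candidates; they would still need to be checked against the BioNano map or the BAC paths before being treated as collapsed repeats.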