Advancing the Human Reference Assembly
Valerie Schneider
NCBI
25 February 2015
The Human Reference Genome: Today, Tomorrow and Next ?
http://genomereference.org
Dilthey et al.; Paten et al.
Scientific Models
Outline
• The assembly model
• Basics
• Value added
• Challenges
• Future relevance of the reference
• Multiple genomes
• Haploid genomes
• Assembly updates
• Mechanisms
• Requirements/Challenges
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
Current Assembly model: represent both haplotypes
GRC Assembly Model
many
Assembly (e.g. GRCh38)
• Primary assembly unit
• Non-nuclear assembly unit (e.g. MT)
• PAR
• Genomic region (MHC)
• Genomic region (UGT2B17)
• Genomic region (MAPT)
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
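The hierarchy on this slide can be sketched as a small data structure. This is only an illustration of the containment relationships named above (assembly, primary and non-nuclear assembly units, genomic regions); the class names, field names, and the placeholder scaffold name are assumptions, not GRC or NCBI identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class GenomicRegion:
    """A named stretch of the primary assembly that has alternate representations."""
    name: str
    # Names of alt scaffolds placed in this region (hypothetical placeholders).
    alt_loci: list[str] = field(default_factory=list)


@dataclass
class Assembly:
    """Top-level container, e.g. GRCh38 (field names are illustrative)."""
    name: str
    primary_unit: list[str] = field(default_factory=list)      # chromosome names
    non_nuclear_unit: list[str] = field(default_factory=list)  # e.g. ["MT"]
    regions: dict[str, GenomicRegion] = field(default_factory=dict)

    def add_region(self, name: str) -> GenomicRegion:
        # Create the region on first use, then return the same object afterwards.
        return self.regions.setdefault(name, GenomicRegion(name))


grch38 = Assembly(
    "GRCh38",
    primary_unit=["chr4", "chr6", "chr17"],
    non_nuclear_unit=["MT"],
)
for region_name in ("PAR", "MHC", "UGT2B17", "MAPT"):
    grch38.add_region(region_name)

# Attach one placeholder alt scaffold to the MHC region.
grch38.add_region("MHC").alt_loci.append("example_mhc_alt_scaffold")
```

Keeping regions keyed by name makes it easy to ask which alternate representations exist for a given locus without scanning every scaffold.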
GRC Assembly Model
ALT 1, ALT 2, ALT 3, ALT 4, ALT 5, ALT 6, ALT 7
GRC Assembly Model
Alt loci alignments are an integral part of the assembly model
alignment to chr + scaffold sequence = Alt
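A minimal sketch of why the alignment matters: an alt locus is only useful in chromosome context once its scaffold is paired with a placement on the chromosome. The `AltLocus` class, the `alts_covering` helper, and all scaffold names and coordinates below are hypothetical and chosen for illustration; a real alignment also records mismatches and gaps, which this sketch reduces to a single interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AltLocus:
    scaffold: str     # name of the alt scaffold sequence (made up here)
    chrom: str        # chromosome the scaffold is aligned to
    chrom_start: int  # 0-based, inclusive start of the placement
    chrom_end: int    # 0-based, exclusive end of the placement


def alts_covering(chrom: str, pos: int, alt_loci: list[AltLocus]) -> list[str]:
    """Return the scaffolds whose chromosome placement contains the given position."""
    return [
        alt.scaffold
        for alt in alt_loci
        if alt.chrom == chrom and alt.chrom_start <= pos < alt.chrom_end
    ]


# Toy placements; the coordinates are not real GRCh38 values.
alts = [
    AltLocus("alt_A", "chr6", 1_000, 5_000),
    AltLocus("alt_B", "chr6", 4_000, 9_000),
    AltLocus("alt_C", "chr17", 2_000, 3_000),
]

hits = alts_covering("chr6", 4_500, alts)  # alt_A and alt_B both span this position
```

Without the placement, the scaffold is just extra sequence; with it, a tool can tell that a chromosome position has one or more alternate representations.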
GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
• Average alt length = 400 kb, max = ~5 Mb
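As a rough consistency check, the figures on this slide can be combined with simple arithmetic. The inputs are the rounded values quoted above, so the results are approximate; the variable names are illustrative.

```python
# Figures quoted on the slide (rounded).
region_span_mb = 61.9    # chromosome sequence covered by regions with alt loci
region_fraction = 0.02   # stated as 2% of chromosome sequence
n_alt_loci = 261
avg_alt_length_kb = 400
novel_mb = 3.6           # sequence in alt loci that is novel relative to chromosomes

# Total chromosome sequence implied by "61.9 Mb is 2%".
implied_chromosome_mb = region_span_mb / region_fraction

# Approximate total alt scaffold sequence: count times average length.
total_alt_mb = n_alt_loci * avg_alt_length_kb / 1000

# Share of alt scaffold sequence that is novel relative to the chromosomes.
novel_share = novel_mb / total_alt_mb

print(f"implied chromosome sequence: {implied_chromosome_mb:.0f} Mb")
print(f"approximate total alt sequence: {total_alt_mb:.1f} Mb")
print(f"novel share of alt sequence: {novel_share:.1%}")
```

Under these rounded inputs the implied chromosome total is about 3.1 Gb, and only a few percent of the alt scaffold sequence is novel. Most of each alt locus therefore closely resembles sequence already on the chromosome, which is the background to the allelic duplication challenge discussed later in the deck.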
GRCh38
GRC Assembly Model
The human reference assembly represents population
genomic diversity in the context of linear sequences
GRCh38: Alt Loci
Alignment legend: no alignment, mismatch, deletion
GRCh38: Alt Loci
GRCh38 alt loci alignment
GRCh37 chr. 7
chromosome
alt/patch
reads
On-target alignment
Off-target alignments (n=122,922)
GRCh38: Alt Loci
GRC Assembly Model
http://notvine.co/
http://designtaxi.com/
Challenges: Allelic Duplication
Challenges: Reporting Multiple Locations
SRPRISM
Challenges: Solutions
https://github.com/samtools/hts-specs/issues/51
Multiple Genome Era
ADMIXTURE?
http://medcitynews.com/
Multiple Genome Era
Assembly (e.g. GRCh38.p1)
• Primary assembly unit
• Non-nuclear assembly unit (e.g. MT)
• ALT 1, ALT 2, ALT 3, ALT 4, ALT 5, ALT 6, ALT 7
• PAR
• Genomic region (MHC)
• Genomic region (UGT2B17)
• Genomic region (MAPT)
• Patches
• Genomic region (ABO)
• Genomic region (FOXO6)
• Genomic region (FCGBP)
Assembly Updates
Patches
• FIX patch: scaffold status at next major assembly release: -- (integrated); treat as: preferred
• NOVEL patch: scaffold status at next major assembly release: alt loci; treat as: allelic
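The two patch types and how each is handled can be written as a small lookup. This is a sketch of the rules summarized on this slide, not code from any GRC tool; the enum and function names are assumptions.

```python
from enum import Enum


class PatchType(Enum):
    FIX = "FIX"
    NOVEL = "NOVEL"


def status_at_next_major_release(patch_type: PatchType) -> str:
    """What happens to the patch scaffold when the next major assembly is released."""
    if patch_type is PatchType.FIX:
        # The correction is folded into the chromosome sequence itself.
        return "integrated"
    # A novel patch carries an additional representation and persists as an alt locus.
    return "alt locus"


def treat_as(patch_type: PatchType) -> str:
    """How to regard the patch relative to the chromosome sequence until then."""
    if patch_type is PatchType.FIX:
        return "preferred"
    return "allelic"


for patch in PatchType:
    print(patch.value, status_at_next_major_release(patch), treat_as(patch))
```

A pipeline that consumes patch releases could branch on these two values: prefer the fix patch sequence over the chromosome where they overlap, and treat a novel patch as one more allele alongside the chromosome.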
Assembly Updates
Assembly Updates
GRC
Criteria for Reference Assembly Component Sequences
• Finished Quality
• INSDC Accessioned
• Representative of an actual DNA molecule
Summary
• Reference Assembly: Today
• Multi-allelic
• Need compatible toolsuites
• Reference Assembly: Tomorrow
• Defining sequence context
• Providing coordinates
• Reference Assembly: Next ?
• Patches
• Challenges
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
GRC Credits
Workshop sponsor:
http://genomereference.org

Agbt2015 workshop schneider

Editor's Notes

  1. I’d like to open this workshop by reminding everyone of the difference between a genome and an assembly. A human genome is a physical object. An assembly is our representation of that object. It is a model. And as shown here, genome models can take many forms. And as these atomic models illustrate, scientific models evolve over time to reflect our growing knowledge base. And so it is with the human assembly model, the reference genome.
  2. Today’s workshop addresses the advancement of the human reference genome assembly in the context of new data and technologies. In my talk, I’ll discuss the current reference assembly, highlighting the following topics (read outline). I’ll be followed by Karyn, Tina and Deanna, who will each be talking in more detail about some of these items that I introduce.
  3. When assembling the genome of a single diploid individual, there may be divergent haplotypes that confound genome assembly. In the original reference assembly model, which was essentially a stick model of linear chromosomes, there really wasn’t a good way to represent highly variant or complex genomic regions. Different haplotypes were simply compressed into a consensus. The insertion of different haplotypes, however, often led to non-existent allele combinations and artificial gaps, as illustrated here. This issue led the GRC to develop a new assembly model several years ago that has a mechanism to cleanly represent multiple haplotypes: alternate loci. They allow the reference assembly to contain alternate representations for regions where haplotype compression isn’t appropriate or a single sequence path is considered insufficient. At the same time, the model retains the linear chromosomes with which most users are comfortable. As a result of the adoption of this model, it’s important to understand that the reference assembly isn’t a haploid or even a diploid genome representation. For any locus, it can represent many haplotypes.
  4. This slide explains how the assembly model accomplishes this. The first thing to know is that the “assembly” is composed of multiple assembly units. The primary assembly unit is the collection of chromosomes and unlocalized and unplaced scaffolds. This is essentially the original haploid assembly model. Non-nuclear genomes are assigned to their own assembly unit. Regions are defined for those areas of the genome for which alternate sequence representation is desired. Alternate sequence representations for those regions go into alternate loci assembly units. The first alternate sequence representation for each region goes into one assembly unit. Each additional sequence representation for a region goes into its own assembly unit. We also define the PAR regions, to account for sequence shared by the sex chromosomes.
  5. The alternate loci are stand-alone accessioned scaffold sequences that are given chromosome context via their alignment to the primary assembly unit. This image shows a portion of GRCh38 chr. 17, with its regions and alt loci alignments. As you can see, the relationships of the alts to the primary assembly can be complex, with indels and inversions. For this reason, the GRC curates these alignments. One point I want to make is that the alignments of the alt loci to the chromosomes are an integral part of the assembly model. The alignment, in conjunction with the sequence, is what defines the alt. The alignments are available for download with the assembly from GenBank.
  6. The ideogram image in this slide shows the genome-wide locations of alternate loci in GRCh38, along with some basic alt loci stats.
  7. What all this means is that you don’t have to wait for the development of a graph-based genome representation and corresponding tool suites to do genomic analyses that benefit from variant sequence representations. The current assembly model allows the reference to represent population genomic diversity in the context of linear sequences, which are the currency for most existing analysis pipelines. The next couple of slides show you some of the value added to analyses by use of the full assembly model.
  8. Gene content is one way in which alt loci add value to the assembly. In this slide, you can see several genes annotated in the regions of this alternate representation of the chr. 19 KIR region that have no alignment to the chromosome. Deanna will tell you more about genes unique to alt loci in her talk. [Thus, if you’re not using the entire assembly in your analyses, you may be missing genes. This can affect the development of exome capture reagents. In addition, many of these alts contain paralogous gene copies that will affect alignments and your understanding of the protein content of the genome.]
  9. Alternate loci also have implications for read mapping and data interpretation. This image from the NCBI 1000G browser shows a region of GRCh37 chr. 7 encompassing UPK3B, a gene expressed in primary mesothelial cells. The chromosomal representation of UPK3B has not changed in GRCh38, and is believed to represent a relatively rare insertion allele. An alternate locus for this region is included in GRCh38, and represents the deletion allele, as illustrated by its alignment to GRCh37. As illustrated by the 3 samples shown here, alignment profiles in this region vary depending on the alignment method used, in this case bwa or mosaik. As a result, it’s difficult to ascertain the genotypes of these samples or the distribution of these alleles in the human population. However, with the inclusion of the alternate scaffold, we can better interpret the data. This slide shows the alignment of previously unmapped reads from one of these 1000G samples to the GRCh38 alt across the indel boundary, indicating that the sample contains the deletion allele. From analyses such as these, we can see how the inclusion of alternate loci in alignment target sets may improve alignments and data interpretation.
  10. Alternate loci also have a broader impact on read alignments. Since we first developed this model, we’ve been interested in the effect of alt loci on read mapping. This slide describes a study we did a few years ago with the GRCh37.p9 assembly. We looked at the alignment behavior of simulated reads sourced from sequence unique to alt loci. We asked what happened to them when aligned to the primary assembly unit without the alt loci, where their true target is missing. We aligned the reads either as singletons or pairs, using two different aligners (BWA and SRPRISM). As shown in this graph, regardless of read pairing or the aligner, 25% of these reads failed to align (red). What’s particularly concerning is that nearly three-quarters had an off-target alignment on the primary assembly unit (in blue). These off-target alignments are likely to result in errors in variation analyses. This analysis demonstrates the broader value of including alternate loci in alignment target sets.
  11. While it’s clear that alternate loci add value to the reference assembly, you need the right set of tools to take advantage of them. Unfortunately, using many common analysis suites and file formats with the current assembly model is kind of like eating yogurt with chopsticks. They give you a taste of the richness of the data, but leave a lot behind. This is a point that Deanna will address in greater detail later today, so I’ll only outline the challenges researchers face in using the full assembly. But because the assembly model is still based on linear sequences, it should be possible to modify our current tools and file formats to take full advantage of the reference, rather than starting from scratch.
  12. The first issue is allelic duplication. Most current aligners cannot distinguish the allelic duplication introduced by alternate loci from segmental duplication. As a result, reads aligning to sequence common to the chromosomes and alternate loci tend to be down-weighted and excluded from further analysis. This slide shows a graphical view of an alt locus scaffold, with the alignments of the chromosome and reads from a 1000G sample. The top set of alignments represents reads that aligned to both the alt and the chr. The bottom set are the reads that aligned only to the alt. Zooming in, we see these are reads aligning to an insertion in the alt sequence. Unless the aligner can distinguish chromosomal regions associated with alt loci and not down-weight alignments in those regions, the gains in picking up new read alignments are likely to be offset by the discarding of other alignments.
  13. Another challenge to using alternate loci comes in reporting features associated with >1 location. As shown in this image that illustrates the TNXB locus on the reference chromosome and 3 alts, genes may have different structures in different locations. Modifications to file formats such as GFF will make it easier to recognize sequence relationships across the assembly when reporting gene and exon locations. Variant analysis and reporting is another area where changes are needed. As illustrated here, GRCh38 includes representations for the two major haplotypes at the MAPT locus. Depending on sample genotype, it may be desirable to report on more than one representation. However, the VCF format requires modification to support this.
  14. A GRC workshop held last fall led to a publication that helped raise awareness of these issues, and some proposals, such as this one by Aaron Quinlan to make VCF alt-friendly, were discussed. There’s a GitHub issue available for those who are interested. Additionally, bwa-mem recently became alt-aware, joining SRPRISM as an alt-aware short read aligner. These changes show that use of the full assembly model is possible and the necessary tools are starting to become available.
  15. I now want to shift gears and discuss the place of the reference as we enter a new era in which this assembly may no longer stand apart in terms of its quality or completeness. It’s important to remember that the human reference assembly is a special kind of genome model. In today’s era of personal genome sequencing, most assemblies only model a haploid or diploid genome. But the reference assembly is a model of many diploid genomes, meant to represent the “human” genome. This slide shows the assembly composition of the GRCh38 primary assembly. While 70% of the genome comes from one donor, sequence from >70 individuals is represented.
  16. Because the reference is derived from many individuals and includes alternate sequence representations, it is likely to remain our best resource for putting sequences identified in any individual into a genomic context. Likewise, because a common coordinate system remains critical for communication and reporting purposes, we’re likely to see the reference retain this role as well. The table shows the latest versions of human genome assemblies in GenBank. Those in red are population-specific, and more population-specific genomes are under construction today. When analyzing samples from known populations, population-specific references or collections of population-specific genomes may be particularly valuable for variant or haplotype analysis. Even with a reference that is a graph of population variation, certain analyses may benefit from using only sub-paths in the graph. However, it’s important to realize that the utility of population-specific references may be limited for admixed samples. Given that much of the US population is admixed, this is an important consideration for resource development. Today you’ll hear from Tina about gold genomes, a set of genomes from diverse populations that are being sequenced to provide new representations for some of the genome’s most variable regions. These data will be incorporated into the reference.
  17. Karyn and Tina will also be talking today about platinum assemblies that are derived from hydatidiform moles, which have haploid genomes. Without allelic duplication complicating their assembly, these resources facilitate the resolution of some of the most complex segmentally duplicated genome regions. These platinum genomes will be assembled to reference quality. However, it’s important to realize that there are no plans to replace the reference with either of these platinum mole assemblies. Like other individual genomes, they are limited in their representation of diversity. As you’ll hear, the GRC does intend to use these genomes to improve or augment the reference. As we enter a new era of multiple high quality genomes, we still envision the reference playing important roles.
  18. In the last few minutes of this talk, I’ll discuss ongoing efforts to improve the reference. The “patches” feature of the model allows the GRC to make assembly updates available in a timely fashion without disrupting the chromosome coordinates upon which other users rely. Regions are defined for the genomic locations to be updated, and the sequences representing those updates are put into the “Patches” assembly unit. Like the alt loci, the patches are stand-alone scaffold sequences with alignments. It’s important to distinguish the two types of patches and the ways in which they should be used for analysis: (1) FIX patches correct problems in the assembly and are deprecated in the next assembly release. (2) NOVEL patches add new alternate sequence representations to the assembly and become alternate loci in the next assembly release.
  19. An example of a GRCh38 fix patch is shown on top in this issue summary from the GRC website, where sequence from a fosmid was used to patch a deleted BAC disrupting representation of the FOXO6 gene. An example of a GRCh38 novel patch is shown on the bottom, where the GRC (in collaboration with the Pharmacogenomics Research Network) added representation for another structural variant of the CYP2D6 locus. The GRC releases patches on a quarterly cycle with the next release planned for the end of March.
  20. With all of the NGS and new genome data, you might think that the GRC is awash in sequences with which to update the assembly. But the reality sometimes feels more like this. Although there is a lot of sequence data available, sequence meeting all 3 of these reference criteria is still limited. Quality is less of an issue today than a couple of years ago, but more groups doing sequencing and assembly are putting their data on “public” FTP sites without submitting it to an INSDC database. We encourage groups to submit their data so that it can contribute to this valuable public resource. Lastly, the reference assembly is clone based, and all component sequences are representative of a DNA molecule found in an actual individual. As long as the community feels it is important for the reference to represent actual sequences, the ability to phase or resolve haplotypes in newly sequenced genomes or generate finished quality sequence from single molecules will be critical to incorporating new sequence into the assembly.
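
As a rough illustration of notes 4 and 18, the bookkeeping for assembly units and patches can be sketched in Python. This is only a toy model, not GRC or NCBI code. The class names, the `add_alt` and `roll_to_next_release` helpers, and the scaffold labels are all hypothetical, and the rule for assigning alts to units (the nth representation of a region goes into the nth alt unit) is one reading of the description in note 4.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PatchType(Enum):
    FIX = "fix"      # corrects a problem in the assembly
    NOVEL = "novel"  # adds a new alternate sequence representation


@dataclass
class Patch:
    scaffold: str   # hypothetical scaffold label, not a real accession
    region: str     # genomic region the patch is placed in
    kind: PatchType


@dataclass
class Assembly:
    name: str
    primary: list[str]  # chromosomes plus unlocalized/unplaced scaffolds
    non_nuclear: list[str] = field(default_factory=list)
    # One dict per alternate loci assembly unit, mapping region -> scaffold.
    alt_units: list[dict[str, str]] = field(default_factory=list)
    patches: list[Patch] = field(default_factory=list)

    def add_alt(self, region: str, scaffold: str) -> int:
        """Place a scaffold in the first alt unit with no entry for the region.

        Returns the 1-based number of the unit used.
        """
        for number, unit in enumerate(self.alt_units, start=1):
            if region not in unit:
                unit[region] = scaffold
                return number
        self.alt_units.append({region: scaffold})
        return len(self.alt_units)

    def roll_to_next_release(self) -> list[str]:
        """Promote NOVEL patches to alt loci; return the deprecated FIX patches."""
        deprecated = []
        for patch in self.patches:
            if patch.kind is PatchType.NOVEL:
                self.add_alt(patch.region, patch.scaffold)
            else:
                deprecated.append(patch.scaffold)
        self.patches = []
        return deprecated


asm = Assembly("toy_assembly", primary=["chr6", "chr17"], non_nuclear=["MT"])
asm.add_alt("MHC", "mhc_alt_a")    # unit 1
asm.add_alt("MAPT", "mapt_alt_a")  # unit 1: first alt for a different region
asm.add_alt("MHC", "mhc_alt_b")    # unit 2: second alt for MHC
asm.patches.append(Patch("cyp2d6_novel", "CYP2D6", PatchType.NOVEL))
asm.patches.append(Patch("foxo6_fix", "FOXO6", PatchType.FIX))
deprecated_fixes = asm.roll_to_next_release()
print(deprecated_fixes, len(asm.alt_units))
```

The sketch leaves out the alignments that give each alt or patch scaffold its chromosome context, which note 5 describes as an integral part of the model. A fuller version would store an alignment to the primary assembly unit alongside every scaffold.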