3. Outline
• Distinguishing features of the human reference assembly
• Implications for genomic analyses and tools
• Where do you get assembly-relevant data?
6. Assembly Basics
Today’s reference assembly does not represent:
1. The most common allele
2. The longest allele
3. The ancestral allele
It represents the sequence available from the HGP.
7. GRC Assembly Model
[Figure: sequences from haplotype 1 and sequences from haplotype 2; figure label: "many"]
Old assembly model: compress into a consensus
Current assembly model: represent both haplotypes
8. GRC Assembly Model
[Figure: Assembly (e.g. GRCh38) comprising a Primary Assembly Unit, a non-nuclear assembly unit (e.g. MT), ALT 1 through ALT 7, PAR, and genomic regions (MHC, UGT2B17, MAPT)]
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
9. The alignments of the alternate loci scaffolds to the chromosomes are an integral part of the assembly and can be downloaded from GenBank with the assembly sequences.
10. GRC Assembly Model
[Figure: Assembly (e.g. GRCh38.p1) comprising a Primary Assembly Unit, a non-nuclear assembly unit (e.g. MT), ALT 1 through ALT 7, PAR, genomic regions (MHC, UGT2B17, MAPT), and Patches with genomic regions (ABO, FOXO6, FCGBP)]
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
Patches
• FIX: scaffold status at next major assembly release: none (integrated). Treat as: preferred.
• NOVEL: scaffold status at next major assembly release: ALT LOCI. Treat as: allelic.
12. GRC: Assembly Model
GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 alt loci: 3.6 Mb novel sequence relative to chromosomes
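As a rough sanity check on these figures, the arithmetic can be sketched in Python. The variable names are illustrative, and the implied total is derived from the rounded 2% quoted above, so it is only an order-of-magnitude estimate.

```python
# Figures quoted on the slide for GRCh38.
region_span_mb = 61.9    # chromosome sequence covered by regions with alt loci
region_fraction = 0.02   # stated as 2% of chromosome sequence (rounded)
n_alt_loci = 261
novel_mb = 3.6           # alt-locus sequence that is novel relative to chromosomes

# Total chromosome sequence implied by "61.9 Mb is 2%" (rounded input, rough output).
implied_total_mb = region_span_mb / region_fraction

# Average novel sequence contributed per alt locus, in kb.
mean_novel_kb = novel_mb * 1000 / n_alt_loci

print(f"implied chromosome sequence: ~{implied_total_mb:.0f} Mb")
print(f"mean novel sequence per alt locus: ~{mean_novel_kb:.1f} kb")
```

The per-locus value is only a mean; the slide does not say how the novel sequence is distributed across individual alt loci.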
17. Anatomy of an alt
[Figure: an ALT LOCI scaffold with components AC012314.8 and CU151838.1, shown against CHR. 19 components AC012314.8 and AC245052.3]
Due to anchor components, alternate loci contain some sequence that is redundant to the primary assembly unit.
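The redundancy introduced by anchor components can be illustrated with a small sketch. The accessions are the ones shown on the slide, but grouping them into an alt-locus list and a chromosome 19 list is an assumption read off the flattened figure, and `split_components` is a hypothetical helper, not part of any GRC tool.

```python
# Component accessions as they appear on the slide. The grouping into two
# lists is an assumption based on the figure, not taken from an assembly file.
alt_locus_components = ["AC012314.8", "CU151838.1"]
chr19_components = ["AC012314.8", "AC245052.3"]

def split_components(alt_components, primary_components):
    """Separate an alt scaffold's components into anchors and alt-only ones.

    A component that also occurs in the primary assembly unit is treated as
    an anchor, i.e. sequence that is redundant with the chromosome.
    """
    primary = set(primary_components)
    anchors = [c for c in alt_components if c in primary]
    alt_only = [c for c in alt_components if c not in primary]
    return anchors, alt_only

anchors, alt_only = split_components(alt_locus_components, chr19_components)
print("anchor (redundant with primary):", anchors)
print("unique to the alt locus:", alt_only)
```

Under this reading, AC012314.8 is the anchor shared with chromosome 19 and CU151838.1 carries the alternate sequence.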
34. Accessing the Data
Mapped to latest GRCh38 and GRCh37.p13
https://genomereference.org
35. GRC Credits
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
https://genomereference.org
37. GRCh38: Alt Loci
Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly.
Mask1: mask the chromosome for fix patches and the scaffold for novel patches and alts. Mask2: mask only on scaffolds.
[Figure: simulated reads]
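The two masking schemes can be sketched as a small decision function plus a hard-masking helper. This is a minimal illustration under stated assumptions: `mask_target` and `hard_mask` are hypothetical names, the scheme labels follow the slide, and the slide does not specify which sub-intervals are masked, so the intervals here are supplied by the caller.

```python
def mask_target(scaffold_type: str, scheme: str) -> str:
    """Return which sequence is masked for a scaffold under a masking scheme.

    scaffold_type: "fix", "novel", or "alt"
    scheme: "mask1" or "mask2" (labels from the slide; the function is hypothetical)
    """
    if scheme not in ("mask1", "mask2"):
        raise ValueError(f"unknown scheme: {scheme}")
    if scheme == "mask1" and scaffold_type == "fix":
        return "chromosome"  # Mask1 masks the chromosome where a fix patch applies
    return "scaffold"        # every other case masks on the scaffold

def hard_mask(seq: str, intervals):
    """Replace 0-based, half-open [start, end) intervals of seq with N."""
    bases = list(seq)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, len(bases))
        if start < end:
            bases[start:end] = "N" * (end - start)
    return "".join(bases)

for scaffold_type in ("fix", "novel", "alt"):
    print(scaffold_type, mask_target(scaffold_type, "mask1"), mask_target(scaffold_type, "mask2"))
print(hard_mask("ACGTACGT", [(2, 5)]))
```

Hard masking with N is one common way to hide redundant sequence from an aligner; whether the masks on the slide were applied this way is not stated.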