Church iowa2013

Analyzing Individual Genomes

Deanna M. Church
Staff Scientist, NCBI

@deannachurch

Valerie Schneider, NCBI
http://genomereference.org

ISCA

ClinVar

Christa Lese Martin (Geisinger)
Erin Riggs (Geisinger)
Jose Mena
Mike Feolo
Tim Hefferon
John Garner
John Lopez

Alex Astashyn
Shanmuga Chitipiralla
Douglas Hoffman
Wonhee Jang
Brandi Kattman
Melissa Landrum
Jennifer Lee
Adriana Malheiro
Wendy Rubinstein
George Riley
Amanjeev Sethi
Ricardo Villamarin
Donna Maglott

GRC
Valerie Schneider (NCBI)
The Genome Institute at Washington University
The Wellcome Trust Sanger Institute
The European Bioinformatics Institute

Acknowledgements
GeT-RM
Lisa Kalman (CDC)
Birgit Funke (Harvard)
Madhuri Hegde (Emory)
Maryam Halavi
Chao Chen
Jon Trow
Douglas Slotta
Peter Meric
Daniel Frishberg
Victor Ananiev

Why should you care about the Reference Assembly?

Genes, NCBI Homo sapiens Annotation Release 105

Transcript
CDS

dbSNP Build 138 using annotation release 104

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

What is the Reference Assembly?

An assembly is a MODEL of the genome

BAC insert
BAC vector

Shotgun sequence

Assemble

GAPS

“Finishers” go in to manually fill the gaps, often by PCR

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321

RP11-34P13

64E8

Gaps

RP4-669L17

RP5-857K21 RP11-206L10

RP11-54O7

AL139246.20

NCBI35 (hg17)

AL139246.21

GRCh37 (hg19)

Build sequence contigs based on contigs defined in the TPF (Tiling Path File).
Check for orientation consistency.
Select switch points.
Instantiate sequence for further analysis.
Switch point
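
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the GRC pipeline: the `TilingPathEntry` class, its field names, and the clone names and sequences are all made up for the example. Each entry says which slice of a clone contributes to the contig (the slice boundaries stand in for the switch points) and in which orientation.

```python
from dataclasses import dataclass

# Complement table for reverse-complementing minus-strand components.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class TilingPathEntry:
    """One component of a hypothetical tiling path (names are illustrative)."""
    accession: str    # clone identifier (made up in the example below)
    sequence: str     # full sequence of the clone
    orientation: str  # "+" or "-" relative to the contig
    start: int        # 1-based switch point where the contig enters this clone
    end: int          # 1-based, inclusive switch point where it leaves


def build_contig(tiling_path):
    """Instantiate contig sequence by joining the selected slice of each clone."""
    parts = []
    for entry in tiling_path:
        if entry.orientation not in ("+", "-"):
            raise ValueError(f"bad orientation for {entry.accession}")
        piece = entry.sequence[entry.start - 1:entry.end]
        if entry.orientation == "-":
            piece = reverse_complement(piece)
        parts.append(piece)
    return "".join(parts)


# Toy input: two made-up clones, the second placed on the minus strand.
example_path = [
    TilingPathEntry("CLONE_A", "AAAACCCC", "+", 1, 6),
    TilingPathEntry("CLONE_B", "TTTTGG", "-", 1, 4),
]
print(build_contig(example_path))
```

Moving a switch point changes which clone supplies the bases in an overlap, which is one way the same tiling path can yield different contig sequence between assembly versions.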

Consensus sequence

nsv832911 (nstd68)

Submitted on NCBI35 (hg17)

NCBI35 (hg17) Tiling Path

Moved approximately 2 Mb distal on chr15

NC_000015.8 (chr15)

Gap Inserted

GRCh37 (hg19) Tiling Path
NC_000015.9 (chr15)

HG-24

Removed from assembly

Added to assembly

Sequences from haplotype 1
Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

nsv532126 (nstd37)

NCBI36 NC_000004.10 (chr4) Tiling Path
AC079749.5
AC074378.4

AC147055.2
AC134921.2

AC019173.4
AC140484.1

AC021146.7
AC093720.2

TMPRSS11E2

TMPRSS11E

GRCh37 NC_000004.11 (chr4) Tiling Path
AC079749.5
AC074378.4

AC147055.2
AC134921.1

AC021146.7
AC093720.2

TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)
AC021146.7

AC019173.4
AC074378.4

AC226496.2
AC140484.1

TMPRSS11E2

Xue Y et al., 2008

UGT2B17

MHC

MAPT

GRCh37 (hg19)

7 alternate haplotypes at the MHC
Alternate loci released as:
FASTA
AGP
Alignment to chromosome
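
As a rough sketch of how one might read an AGP file for an alternate locus, the parser below handles the nine tab-separated columns of the AGP format, treating component types `N` and `U` as gap lines. The function name `parse_agp`, the dictionary keys, and the sample records (object and clone names, coordinates) are invented for illustration and do not come from a real GRC release.

```python
def parse_agp(lines):
    """Parse AGP lines into a list of dicts, one per component or gap line."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        record = {
            "object": fields[0],
            "object_beg": int(fields[1]),
            "object_end": int(fields[2]),
            "part_number": int(fields[3]),
            "component_type": fields[4],
        }
        if fields[4] in ("N", "U"):
            # Gap line: columns 6-8 describe the gap, not a component.
            record["is_gap"] = True
            record["gap_length"] = int(fields[5])
            record["gap_type"] = fields[6]
            record["linkage"] = fields[7]
        else:
            # Component line: which part of which sequence, and its strand.
            record["is_gap"] = False
            record["component_id"] = fields[5]
            record["component_beg"] = int(fields[6])
            record["component_end"] = int(fields[7])
            record["orientation"] = fields[8]
        records.append(record)
    return records


# Made-up AGP content for a hypothetical alternate locus.
example_agp = [
    "# hypothetical example, not real data",
    "alt_locus_1\t1\t100\t1\tF\tCLONE_A.1\t1\t100\t+",
    "alt_locus_1\t101\t150\t2\tN\t50\tcontig\tno\tna",
    "alt_locus_1\t151\t250\t3\tF\tCLONE_B.2\t201\t300\t-",
]
for rec in parse_agp(example_agp):
    print(rec["part_number"], "gap" if rec["is_gap"] else rec["component_id"])
```

The AGP describes how the object is built from components; the FASTA supplies the resulting sequence, and the alignment places the alternate locus relative to the chromosome.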


MHC (chr6)
Chr 6 representation (PGF)

Alt_Ref_Locus_2 (COX)

Variant Calling and the Reference Assembly

Part of chr22 assembly
Alternate locus for chr22

White: Insertion
Black: Deletion

Kidd et al., 2007: APOBEC cluster

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
NM_031192.3: transcript from C57BL/6J
NM_031193.2: transcript from FVB/N

129S6/SvEvTac Alt Locus Alignment Ren1 (allelic)

FVB/N Transcript Alignment Ren2 (paralog)

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
NM_031192.3: transcript from C57BL/6J
NM_031193.2: transcript from FVB/N

129S6/SvEvTac Ren1
FVB Ren2 Tx

Paralogous diff

SNP + paralogous diff

Doggett et al., 2006

Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35/NCBI36; unlocalized in GRCh37; finished in GRCh38

Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID
(Paralogous)
(Allelic)

Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID
Alignment to Hydin1 CHM1_1.0, >99.9% ID
Alignment to Hydin1 CHM1_1.0, >99.9% ID

CDC27
1KG Phase 1 Strict accessibility mask
SNP (all)
SNP (not 1KG)

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

GRCh38 is coming (September, 2013)

Adding Novel Sequence

Karen Miga and Jim Kent

arXiv:1307.0035

Dennis et al., 2012

1q32

1q21

1p21

1p21 patch alignment to chromosome 1

GRCh37 (current reference assembly)
NC_000023.10 (chrX)

Preview of GRCh38 (scheduled Fall 2013)
NW_003871103.3

TEX28

LOC101060233 (opsin related)

TKTL1

LOC101060234 (TEX28 related)

FAM23_MRC1 Region, chr10

Segmental Duplications
1KG accessibility Mask

Novel Patch

250 kb of artificial duplication

Human Resolved for GRCh38

GRCh37.p13
120 Fix Patches
60 Novel

From Assembly 1 <-> Assembly 2
Assembly <-> RefSeqGene/LRG
Primary Assembly <-> Alternate loci

http://www.ncbi.nlm.nih.gov/genome/tools/remap
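
To make the idea of moving a coordinate between two sequences concrete, here is a minimal sketch that projects a position through ungapped alignment blocks. It is an assumption-laden illustration, not the algorithm the NCBI Remap service uses: the `AlignedBlock` class, the `remap_position` function, and the block coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedBlock:
    """An ungapped aligned segment between a source and a target sequence."""
    src_start: int  # 1-based start on the source sequence
    dst_start: int  # 1-based start on the target sequence
    length: int     # number of aligned bases
    strand: str     # "+" if same orientation, "-" if reversed on the target


def remap_position(pos: int, blocks: List[AlignedBlock]) -> Optional[int]:
    """Project a 1-based source position onto the target, or return None.

    None means the position is not covered by any block, for example
    because the sequence was removed or falls in an alignment gap.
    """
    for block in blocks:
        offset = pos - block.src_start
        if 0 <= offset < block.length:
            if block.strand == "+":
                return block.dst_start + offset
            # Minus strand: the first source base maps to the last target base.
            return block.dst_start + block.length - 1 - offset
    return None


# Made-up blocks: one collinear segment and one inverted segment.
example_blocks = [
    AlignedBlock(src_start=100, dst_start=1100, length=50, strand="+"),
    AlignedBlock(src_start=200, dst_start=5000, length=10, strand="-"),
]
print(remap_position(120, example_blocks))
print(remap_position(160, example_blocks))
```

The same projection applies to each pairing listed above; only the source of the alignment blocks changes.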
