Centromere Sequence Assembly
Karen H. Miga
University of California, Santa Cruz
10/17/17
GRC GIAB Workshop
ASHG
HUMAN CENTROMERES: MULTI-MEGABASE-SIZED GAPS IN ALL CHROMOSOME ASSEMBLIES
[Figure: chromosome schematic (p-arm, CEN, q-arm) with megabase-sized gaps at the centromere]
CEN PROGRESS UPDATE: CENTROMERE SEQUENCE ASSEMBLIES
1. GRCh38 Reference Models for Human Centromere Arrays
2. Efforts to Generate True, Linear Assemblies of Centromeric Regions: Chromosome Y
3. Future Perspective
CHALLENGE OF ASSEMBLING LONG TRACTS OF (NEAR-IDENTICAL) TANDEM REPEATS
[Figure: multi-megabase-sized arrays of satellite DNA between the p-arm and q-arm, e.g. ...ATCCGATTACGATCCGATTACGATCCGATTACG...]
HUMAN CENTROMERES: ALPHA SATELLITE
Alpha satellite: ~171 bp tandem repeat
Wide range of percent identity: ~60-100%
[Figure: alpha satellite monomers 1 2 3 4 between the p-arm and q-arm]
HIGHER ORDER REPEATS
"Higher Order Repeat" (HOR): a multi-monomeric repeat unit (e.g., 1 2 3 4 | 1 2 3 4 | 1 2 3 4)
Narrow range of percent identity among HOR copies: 94-100%
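To make the HOR definition concrete: if every monomer in an array has already been labeled by monomer type, the HOR length is the smallest period of that label sequence. The Python sketch below assumes pre-labeled monomers and perfect periodicity; real arrays carry HOR variants and would need a tolerance for mismatched positions. The function name `hor_period` is hypothetical.

```python
from typing import Sequence


def hor_period(labels: Sequence[str]) -> int:
    """Return the smallest period p with labels[i] == labels[i - p] for all i >= p.

    Falls back to len(labels) when no shorter period exists.
    """
    n = len(labels)
    for p in range(1, n):
        if all(labels[i] == labels[i - p] for i in range(p, n)):
            return p
    return n


# Monomer types along an array, as in the 1 2 3 4 | 1 2 3 4 | 1 2 3 4 illustration
array = ["1", "2", "3", "4"] * 3
print(hor_period(array))  # 4
```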
  
CHROMOSOME-SPECIFIC SATELLITE SEQUENCE ORGANIZATION
[Figure: distinct HOR arrays (Array "A", Array "B", Array "C") between the p-arm and q-arm of chrX and chr3]
GENOME MODEL OF SEQUENCE ORGANIZATION IN CENTROMERE-ASSIGNED GAPS
[Figure: model of a centromere-assigned gap between the p-arm and q-arm, with alpha satellite arrays interspersed with LINE, SINE, other, and non-alpha satellite sequences; unmapped (yet assembled) scaffolds]
1. GRCh38 Alpha Satellite Reference Models
Step 1: Characterize HORs in the Human Genome
[Figure: HOR arrays A-F; >200 ENCODE datasets]
Step-by-Step Example for a Single Read: α-CENTAURI (centromeric automated repeat identification)
[Figure: a single read (5'...3') with monomer types A-F repeated as tandem HOR copies (10x)]
http://github.com/volkansevim/alpha-CENTAURI
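α-CENTAURI's internals are not described on these slides, so the following is only a simplified stand-in for the idea of labeling monomers along a read: it cuts the read into fixed-length windows and gives each window the name of its closest template by Hamming distance. The 8 bp templates, the fixed monomer boundaries, and all function names are assumptions for illustration; real monomers are ~171 bp and would be located by alignment rather than at fixed offsets.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_monomers(read: str, templates: Dict[str, str], monomer_len: int) -> List[str]:
    """Split a read into consecutive fixed-length windows and label each window
    with the name of the closest template (fewest mismatches; ties go to the
    template listed first). A trailing partial window is ignored."""
    labels = []
    for start in range(0, len(read) - monomer_len + 1, monomer_len):
        window = read[start:start + monomer_len]
        labels.append(min(templates, key=lambda name: hamming(window, templates[name])))
    return labels


# Hypothetical 8 bp templates standing in for ~171 bp monomer types A, B, C
templates = {"A": "ACGTACGT", "B": "TTGCAAGC", "C": "GGATCCAT"}
read = "ACGTACGA" + "TTGCAAGC" + "GGATCCTT" + "ACGTACGT"  # 5'...3', two sequencing errors
print(classify_monomers(read, templates, 8))  # ['A', 'B', 'C', 'A']
```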
CHROMOSOME-SPECIFIC ASSIGNMENT
Experimental evidence:
• FISH hybridization/mapping and screening of a somatic cell hybrid panel (e.g., D7Z1, a 6-mer HOR; Waye et al. (1987); 98% match to GenBank: M16101)
• Flow-sorted chromosome alignment/enrichment: sequence enrichment analysis of isolated human chromosomes
• Long-range paired-read support: "anchor" mates mapped to the assembled p-arm and/or q-arm
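The long-range paired-read line of evidence can be sketched as a vote: each mate that maps uniquely to an assembled arm, while its partner falls in the HOR array, casts a vote for that arm's chromosome. The thresholds (`min_support`, `min_fraction`) and the `(chromosome, arm)` input format are hypothetical choices, not values from the talk.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def assign_chromosome(anchors: Iterable[Tuple[str, str]], min_support: int = 3,
                      min_fraction: float = 0.8) -> Optional[str]:
    """Assign an HOR array to a chromosome from mate-pair anchors.

    Each anchor is (chromosome, arm) for a mate mapped uniquely to an assembled
    p-arm or q-arm. Returns the majority chromosome if it has enough absolute
    and relative support, otherwise None (array stays unassigned).
    """
    counts = Counter(chrom for chrom, _arm in anchors)
    if not counts:
        return None
    chrom, votes = counts.most_common(1)[0]
    if votes >= min_support and votes / sum(counts.values()) >= min_fraction:
        return chrom
    return None


anchors = [("chrX", "p")] * 5 + [("chrX", "q")] * 4 + [("chr7", "q")]
print(assign_chromosome(anchors))  # chrX (9 of 10 anchors)
```

Requiring both a minimum count and a dominant fraction leaves an array unassigned when anchors disagree, rather than forcing a call.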
CHROMOSOME ASSIGNMENT OF HIGHER ORDER REPEATS
[Figure: e.g., DXZ1 (12-mer; monomers 1-12) assigned to CENX]
Step 2: Construct WGS Read Libraries for Each HOR Array
[Figure: HuRef WGS Sanger read database partitioned into one read library per HOR array]
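One plausible way to build a per-array read library is to recruit reads by shared k-mers with each HOR reference model. This sketch assumes exact k-mer matching and a hypothetical cutoff `min_shared`; the slides do not state how reads were actually partitioned, and the toy arrays below are invented.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read: str, array_kmers: Dict[str, Set[str]], k: int,
                min_shared: float = 0.5) -> Optional[str]:
    """Assign a read to the HOR array sharing the largest fraction of its k-mers.

    Returns None if no array shares at least `min_shared` of the read's k-mers.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best, best_frac = None, 0.0
    for name, ref in array_kmers.items():
        frac = len(read_kmers & ref) / len(read_kmers)
        if frac > best_frac:
            best, best_frac = name, frac
    return best if best_frac >= min_shared else None


K = 5
array_kmers = {
    "A": kmers("ACGTTGCA" * 3, K),  # hypothetical HOR array A
    "B": kmers("GGGATTTC" * 3, K),  # hypothetical HOR array B
}
print(assign_read("TTGCAACGTTGC", array_kmers, K))  # A
```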
Step 3: Model Array Variants in a Sequence Graph (linearSat)
• 2nd-order Markov chain
• Length determined by normalized array length estimates
[Figure: graph of HOR variant nodes (m1v1, m2v1, m2v2, m3v1 through m12v1, plus LINE) with edge transition probabilities between 0.3 and 1.0]
Not the "true" long-range organization, yet adequately represents the alpha satellite array sequence.
https://github.com/JimKent/linearSat
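The slide names the model (a 2nd-order Markov chain whose length comes from normalized array length estimates); the sketch below shows what such a chain looks like, drawing each HOR variant conditionally on the previous two. The read-share formula in `estimate_units`, the variant names, and the probabilities are assumptions for illustration, not linearSat's code or parameters.

```python
import random
from typing import Dict, List, Tuple

# 2nd-order transition table: (state[i-2], state[i-1]) -> {next_state: probability}
Transitions = Dict[Tuple[str, str], Dict[str, float]]


def estimate_units(reads_in_array: int, total_reads: int, genome_size_bp: int,
                   unit_len_bp: int) -> int:
    """Assumed length estimate: the array's share of WGS reads times genome size,
    converted to a number of HOR units (at least the two-unit seed)."""
    array_len_bp = reads_in_array / total_reads * genome_size_bp
    return max(2, round(array_len_bp / unit_len_bp))


def generate(transitions: Transitions, start: Tuple[str, str], n_units: int,
             seed: int = 0) -> List[str]:
    """Walk the chain from a two-state seed until n_units states are emitted."""
    rng = random.Random(seed)
    path = list(start)
    while len(path) < n_units:
        options = transitions[(path[-2], path[-1])]
        states = list(options)
        path.append(rng.choices(states, weights=[options[s] for s in states], k=1)[0])
    return path[:n_units]


# Hypothetical two-variant chain
TRANSITIONS: Transitions = {
    ("v1", "v1"): {"v1": 0.7, "v2": 0.3},
    ("v1", "v2"): {"v1": 1.0},
    ("v2", "v1"): {"v1": 0.5, "v2": 0.5},
    ("v2", "v2"): {"v1": 1.0},
}
n = estimate_units(reads_in_array=500, total_reads=1_000_000,
                   genome_size_bp=3_000_000_000, unit_len_bp=2_000)
array_model = generate(TRANSITIONS, ("v1", "v1"), n, seed=1)
print(n, len(array_model))  # 750 750
```

Conditioning on two previous units rather than one lets the model capture short runs (e.g., a variant that never occurs twice in a row) while still, as the slide notes, not reproducing the true long-range order.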
LINEAR ORDERING OF REFERENCE MODELS AND ASSEMBLED CONTIGS USING MATE PAIRS
[Figure: chrX (Xp-Xq): CENX (3.8 Mb) with a 2.25 Mb array (~860 HOR units), a 0.73 Mb array (~43 HOR units), and a 0.3 Mb low-copy repeat. chr3 (3p-3q): CEN3.1 and CEN3.2 ordered with unmapped HuRef assembled contig(s) (e.g., ABBA01185959)]
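Ordering reference models and contigs with mate pairs can be reduced to a small graph problem: each well-supported link says one piece lies upstream of another, and a topological sort of those links yields a linear order. This sketch assumes orientations are already resolved and uses a hypothetical `min_links` noise filter; the piece names are placeholders.

```python
from collections import Counter
from graphlib import TopologicalSorter
from typing import Iterable, List, Tuple


def order_contigs(links: Iterable[Tuple[str, str]], min_links: int = 2) -> List[str]:
    """Order pieces from mate-pair links (upstream, downstream).

    Links seen fewer than min_links times are dropped as noise; the surviving
    links are sorted topologically.
    """
    support = Counter(links)
    sorter = TopologicalSorter()
    for (upstream, downstream), n in support.items():
        if n >= min_links:
            sorter.add(downstream, upstream)  # downstream must come after upstream
    return list(sorter.static_order())


links = (
    [("p_arm", "model_1")] * 3
    + [("model_1", "contig_1")] * 2
    + [("contig_1", "model_2")] * 4
    + [("model_2", "q_arm")] * 2
    + [("q_arm", "model_1")]  # single discordant pair, dropped as noise
)
print(order_contigs(links))  # ['p_arm', 'model_1', 'contig_1', 'model_2', 'q_arm']
```

`graphlib` requires Python 3.9+. `static_order()` raises `graphlib.CycleError` if the surviving links contradict each other, which would flag a region for closer inspection.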
An Initial Draft of Human Centromere Sequence Composition
[Figure: centromere reference models for chromosomes 1-12, 15-20, X, and Y (p-arm to q-arm; scale bar 100 kb), with acrocentric chromosomes (13, 14, 21, 22) shown separately]
Alpha Satellite Reference Models: ~60 Mb (59,571,670 bp)
CENTROMERE SEQUENCE ASSEMBLY
1. GRCh38 Alpha Satellite Reference Models
2. Linear Assembly of a Human Centromere
Miga KH, et al. Genome Research 24(4) (2014): 697-707.
LINEAR ASSEMBLY OF A HUMAN CENTROMERE ON THE Y CHROMOSOME
Small, haploid satellite array with a well-characterized 5.8 kb repeat
[Figure: Y centromere between the p-arm and q-arm]
BACs: OVERLAP-LAYOUT-ASSEMBLY
Collection of 9 BACs known to span the Y centromere
Overlap determined by single-copy sequence variants
Tilford et al. 2001, Nature
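The BAC step can be illustrated by reducing each BAC to the ordered list of single-copy variants it carries: two BACs overlap when the tail of one list matches the head of the other. This greedy layout is a simplified sketch under that assumption (exact marker matches, no orientation changes), not the procedure used by Tilford et al.; BAC and variant names are hypothetical.

```python
from typing import Dict, List


def marker_overlap(left: List[str], right: List[str]) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def layout(bacs: Dict[str, List[str]], min_overlap: int = 1) -> List[str]:
    """Greedy layout: start at a BAC that nothing overlaps into, then keep appending
    the remaining BAC with the longest marker overlap onto the current end."""
    has_incoming = {b for b in bacs for a in bacs
                    if a != b and marker_overlap(bacs[a], bacs[b]) >= min_overlap}
    starts = sorted(set(bacs) - has_incoming)
    if not starts:
        raise ValueError("no starting BAC: overlaps form a cycle")
    order = [starts[0]]
    remaining = set(bacs) - {order[0]}
    while remaining:
        end = bacs[order[-1]]
        best = max(sorted(remaining), key=lambda b: marker_overlap(end, bacs[b]))
        if marker_overlap(end, bacs[best]) < min_overlap:
            break  # no remaining BAC extends the layout: leave a gap
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical BACs, each listed as the ordered single-copy variants it contains
BACS = {
    "BAC2": ["v3", "v4", "v5"],
    "BAC1": ["v1", "v2", "v3"],
    "BAC3": ["v5", "v6"],
}
print(layout(BACS))  # ['BAC1', 'BAC2', 'BAC3']
```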
HIGH QUALITY + LONG (100 KB+) READS
Challenge of assembling identical tandem repeats with short reads: collapsed representation of a ~100 kb array
High-quality long reads: high-quality consensus sequence (~100 kb)
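The collapse problem is easy to reproduce: when k-mers (or short reads) are shorter than the array, a 3-copy and a 10-copy array produce identical k-mer sets, so nothing in the short-read data records the copy number. A read that spans the whole array plus unique flanks does record it. The flanks and function names below are hypothetical; the repeat unit is the one drawn on the earlier slide.

```python
from typing import Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct k-mers in seq (roughly what a short-read assembler 'sees')."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def copies_in_spanning_read(read: str, unit: str, left: str, right: str) -> int:
    """Count repeat units between unique flanks in a read spanning the whole array."""
    start = read.index(left) + len(left)
    end = read.index(right, start)
    return (end - start) // len(unit)


unit = "ATCCGATTACG"                           # 11 bp repeat unit from the illustration
flank_p, flank_q = "GGCTTAGCAT", "TTAGCCAATG"  # hypothetical unique flanks
three_copies = flank_p + unit * 3 + flank_q
ten_copies = flank_p + unit * 10 + flank_q

print(kmer_set(three_copies, 8) == kmer_set(ten_copies, 8))                # True
print(copies_in_spanning_read(ten_copies, unit, flank_p, flank_q))        # 10
```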
NANOPORE SEQUENCING: LONGBOARD (1D)
UCSC Longboard 1D protocol
In total, we have generated 3,500+ reads greater than 150 kb.
MULTIPLE ALIGNMENT STRATEGY TO IMPROVE QUALITY BY CONSENSUS
High-quality consensus requires modest coverage.
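Consensus calling can be sketched as a column-wise majority vote over reads that have already been placed in a shared multiple alignment. This assumes the alignment step is done elsewhere and ignores base qualities; it illustrates the voting idea and is not the UCSC pipeline.

```python
from collections import Counter
from typing import List


def consensus(aligned_reads: List[str], gap: str = "-") -> str:
    """Majority-vote consensus over reads in a shared alignment.

    All reads must have the same aligned length; '-' marks a gap. Columns whose
    winning symbol is a gap are dropped from the output.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("aligned reads must share one length")
    out = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            out.append(symbol)
    return "".join(out)


reads = [
    "ACGTTAGC",
    "ACG-TAGC",  # deletion in a homopolymer
    "ACGTTAGC",
    "ACCTTAGC",  # substitution
    "ACGTTA-C",  # deletion
]
print(consensus(reads))  # ACGTTAGC
```

A majority vote only removes errors that are random across reads; systematic errors shared by many reads (such as homopolymer-length errors) can survive it.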
RP11-718M18 (221.4 kb: vector + insert)
• 634 predicted nucleotide variants
• 2 tandem structural rearrangements
• 38 CENY repeats (>99% identity to published consensus)
• Homopolymers: [A]n and [T]n
Identify informative, single-copy sites in the array useful for overlap-based BAC assembly.
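Finding informative single-copy sites among aligned repeat copies (such as the 38 CENY repeats in RP11-718M18) can be sketched as a column scan for bases carried by exactly one copy. The sketch assumes the copies are already aligned to equal length; the sequences shown are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def single_copy_sites(copies: List[str]) -> List[Tuple[int, str, int]]:
    """Return (column, base, copy_index) for each base carried by exactly one
    aligned repeat copy at a column where the copies disagree."""
    sites = []
    for col, column in enumerate(zip(*copies)):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # all copies agree: not informative
        for base, n in counts.items():
            if n == 1 and base != "-":
                sites.append((col, base, column.index(base)))
    return sites


copies = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",  # copy 2 carries a private A at column 3
    "ACGTACCT",  # copy 3 carries a private C at column 6
]
print(single_copy_sites(copies))  # [(3, 'A', 2), (6, 'C', 3)]
```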
Y SINGLE-COPY VARIANTS USING ILLUMINA DATA
RP11-718M18 (221.4 kb)
VALIDATE HIGH-CONFIDENCE SINGLE-COPY VARIANTS WITH ILLUMINA
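Illumina validation can be sketched with k-mer counts: because chrY is haploid in a male sample, the k-mer carrying a true single-copy variant should appear at roughly single-copy depth, while one seen at many times that depth comes from multiple repeat copies and one seen only a few times is likely a sequencing error. The depth window (`low`, `high`) and the counts in the example are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across a set of short reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def validate_variant(variant_kmer: str, counts: Counter, haploid_depth: float,
                     low: float = 0.5, high: float = 1.5) -> bool:
    """Accept a variant if its k-mer is seen at roughly single-copy (haploid) depth."""
    observed = counts[variant_kmer]
    return low * haploid_depth <= observed <= high * haploid_depth


# Hypothetical k-mer counts: single-copy, multi-copy repeat, likely error
counts = Counter({"ACGTA": 30, "CCGTA": 600, "GGGTA": 2})
print([validate_variant(k, counts, haploid_depth=30) for k in counts])  # [True, False, False]
```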
LINEAR ASSEMBLY OF HUMAN Y CENTROMERE
FUTURE PERSPECTIVE
1. Linear assemblies of human centromeric regions improve in step with sequencing technology (i.e., read length and quality)
2. One genome is not enough: centromeric regions are highly variable
3. Linear CEN assemblies present a mapping challenge to most genomic applications
True Linear Maps of Human CEN Regions
[Figure: Y CEN true linear arrangement; informatics/analysis; data structure]
Key Advantages of Satellite DNA Graphs
1. Eliminates sequence redundancy, which improves unambiguous short-read mapping: centromere graphs demonstrate unambiguous mapping of the majority (>98%) of 1000 Genomes alpha satellite reads (Benedict Paten, Adam Novak)
2. Information describing long-range haplotypes is retained as defined "paths" in the graph
3. Graph data structure and sequence analysis tools will be consistent with the rest of the human genome
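The first two advantages can be illustrated with a toy graph: every distinct repeat unit is stored once as a node, and each haplotype is kept as a path of node IDs, so the full linear sequence can always be spelled back out. This is a hypothetical minimal class, not the vg toolkit or the centromere graphs referenced on the slide; real sequence graphs also store edges and must handle k-mers that cross node boundaries.

```python
from typing import Dict, List


class SequenceGraph:
    """Toy sequence graph: each distinct repeat unit is one node; haplotypes are
    paths (ordered lists of node IDs)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, str] = {}
        self._ids: Dict[str, int] = {}
        self.paths: Dict[str, List[int]] = {}

    def add_haplotype(self, name: str, units: List[str]) -> None:
        path = []
        for unit in units:
            if unit not in self._ids:  # unseen sequence -> create a new node
                node_id = len(self.nodes)
                self._ids[unit] = node_id
                self.nodes[node_id] = unit
            path.append(self._ids[unit])
        self.paths[name] = path

    def spell(self, name: str) -> str:
        """Recover the full linear haplotype sequence from its path."""
        return "".join(self.nodes[i] for i in self.paths[name])

    def stored_bases(self) -> int:
        return sum(len(s) for s in self.nodes.values())


hor1 = "ATCCGATTACG"  # repeat unit from the earlier illustration
hor2 = "ATCCGTTTACG"  # hypothetical variant of that unit
g = SequenceGraph()
g.add_haplotype("hap1", [hor1] * 50 + [hor2] + [hor1] * 49)
g.add_haplotype("hap2", [hor1] * 30 + [hor2] * 3 + [hor1] * 40)
print(g.stored_bases())                  # 22: each distinct unit stored once
print(g.spell("hap1").count("CCGATT"))   # 99 copies in the linear sequence
```

In the linear sequence a short read containing CCGATT has 99 equally good placements; in the graph that k-mer occurs in a single node, and the paths still record which haplotype carries which arrangement.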
The major histocompatibility complex (Kiran Garimella & Gil McVean)
Creating (and mapping to) a
Universal Reference Genome
Benedict Paten, Adam Novak, David Haussler, UC Santa Cruz
Acknowledgements
Mark Akeson, Miten Jain, Hugh Olsen, Benedict Paten, Dave Deamer, Robin AbuShumays, Andrew Smith, Ian Fiddes, Art Rand, Logan Mulroney, Jordan Eizenga, Rojin Safavi, Rachel Lawton, Andrew Bailey, Ariah Mackie, David Haussler, Jim Kent, Sofie Salama
UCSC Nanopore Analysis Group: Miten Jain, Hugh Olsen, Mark Akeson
Oxford Nanopore Technologies: Dan Turner, David Stoddart
Huntington F. Willard
David Page
Device: MinION MK1
Flow cell: FLO-MIN106
Kits: Rapid Sequencing Kit
Data analysis: Albacore 1.0.1; Metrichor 1D

 
hypo and hyper thyroidism final lecture.pptx
hypo and hyper thyroidism  final lecture.pptxhypo and hyper thyroidism  final lecture.pptx
hypo and hyper thyroidism final lecture.pptx
 
THORACOTOMY . SURGICAL PERSPECTIVES VOL 1
THORACOTOMY . SURGICAL PERSPECTIVES VOL 1THORACOTOMY . SURGICAL PERSPECTIVES VOL 1
THORACOTOMY . SURGICAL PERSPECTIVES VOL 1
 
NCLEX RN REVIEW EXAM CONTENT BLUE BOOK PDF
NCLEX RN REVIEW EXAM CONTENT BLUE BOOK PDFNCLEX RN REVIEW EXAM CONTENT BLUE BOOK PDF
NCLEX RN REVIEW EXAM CONTENT BLUE BOOK PDF
 
Integrated Neuromuscular Inhibition Technique (INIT)
Integrated Neuromuscular Inhibition Technique (INIT)Integrated Neuromuscular Inhibition Technique (INIT)
Integrated Neuromuscular Inhibition Technique (INIT)
 
Effects of vaping e-cigarettes on arterial health
Effects of vaping e-cigarettes on arterial healthEffects of vaping e-cigarettes on arterial health
Effects of vaping e-cigarettes on arterial health
 
SURGICAL ANATOMY OF ORAL IMPLANTOLOGY.pptx
SURGICAL ANATOMY OF ORAL IMPLANTOLOGY.pptxSURGICAL ANATOMY OF ORAL IMPLANTOLOGY.pptx
SURGICAL ANATOMY OF ORAL IMPLANTOLOGY.pptx
 
A thorough review of supernormal conduction.pptx
A thorough review of supernormal conduction.pptxA thorough review of supernormal conduction.pptx
A thorough review of supernormal conduction.pptx
 
5CL-ADB powder supplier 5cl adb 5cladba 5cl raw materials vendor on sale now
5CL-ADB powder supplier 5cl adb 5cladba 5cl raw materials vendor on sale now5CL-ADB powder supplier 5cl adb 5cladba 5cl raw materials vendor on sale now
5CL-ADB powder supplier 5cl adb 5cladba 5cl raw materials vendor on sale now
 
180-hour Power Capsules For Men In Ghana
180-hour Power Capsules For Men In Ghana180-hour Power Capsules For Men In Ghana
180-hour Power Capsules For Men In Ghana
 

101717.kh miga ashg_grc

  • 1. Centromere Sequence Assembly. Karen H. Miga, University of California, Santa Cruz. 10/17/17, GRC GIAB Workshop, ASHG
  • 2. HUMAN CENTROMERES: MULTI-MEGABASE-SIZED GAPS IN ALL CHROMOSOME ASSEMBLIES. Megabase-sized gaps at the centromere (CEN), between the p-arm and the q-arm.
  • 3. CEN
  • 4. PROGRESS UPDATE: CENTROMERE SEQUENCE ASSEMBLIES. 1. GRCh38 Reference Models for Human Centromere Arrays. 2. Efforts to Generate True, Linear Assemblies of Centromeric Regions: Chromosome Y. 3. Future Perspective.
  • 5. CHALLENGE OF ASSEMBLING LONG TRACTS OF (NEAR-IDENTICAL) TANDEM REPEATS. Multi-megabase-sized arrays of satellite DNA lie between the p-arm and the q-arm: ...ATCCGATTACG ATCCGATTACGATCCGATTACG... ...ATCCGATTACG ATCCGATTACGATCCGATTACG...
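To make the challenge on slide 5 concrete, here is a minimal Python sketch using the slide's toy 11 bp unit, error-free reads, and hypothetical single-copy flanks (`LEFT`, `RIGHT`) standing in for the p-arm and q-arm. It illustrates that short reads from arrays of different lengths are indistinguishable, while a read that spans the array from flank to flank measures it directly.

```python
from __future__ import annotations

UNIT = "ATCCGATTACG"                  # toy repeat unit taken from the slide
LEFT, RIGHT = "GGGGGGGG", "CCCCCCCC"  # hypothetical unique p-arm / q-arm flanks


def chromosome(copies: int) -> str:
    """p-arm flank + tandem array of `copies` units + q-arm flank."""
    return LEFT + UNIT * copies + RIGHT


def all_reads(seq: str, read_len: int) -> set[str]:
    """Every distinct error-free read of length read_len (one per start position)."""
    return {seq[i:i + read_len] for i in range(len(seq) - read_len + 1)}


def spanning_copy_estimates(reads: set[str]) -> set[int]:
    """Copy numbers implied by reads that contain both flanks, i.e. span the whole array."""
    estimates = set()
    for read in reads:
        start, end = read.find(LEFT), read.find(RIGHT)
        if start != -1 and end > start:
            estimates.add((end - start - len(LEFT)) // len(UNIT))
    return estimates


# 30 bp reads: a 10-copy and a 20-copy array give exactly the same read set,
# so read content alone cannot recover the array length (it collapses).
print(all_reads(chromosome(10), 30) == all_reads(chromosome(20), 30))  # True

# 126 bp reads: one read spans the 10-copy array and measures it directly;
# no read spans the 20-copy array, which would need still longer reads.
print(spanning_copy_estimates(all_reads(chromosome(10), 126)))  # {10}
print(spanning_copy_estimates(all_reads(chromosome(20), 126)))  # set()
```

Real arrays are megabases long and reads carry errors, but the same logic is why the talk later turns to 100 kb+ reads.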
  • 6. HUMAN CENTROMERES: ALPHA SATELLITE. Alpha satellite is a ~171 bp tandem repeat (monomers 1 2 3 4 ...) between the p-arm and the q-arm, with a wide range of percent identity between monomers: ~60-100%.
  • 7. HIGHER ORDER REPEATS. A “Higher Order Repeat” (HOR) is a multi-monomeric repeat unit (e.g. monomers 1 2 3 4, repeated as 1 2 3 4 1 2 3 4 1 2 3 4 between the p-arm and the q-arm); HOR copies show a narrow range of percent identity: 94%-100%.
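The contrast between monomer-level divergence (slide 6) and HOR-level identity (slide 7) suggests a simple way to find an HOR period: compare each monomer with the monomer p positions downstream and take the smallest p whose mean identity reaches the 94% floor. This sketch assumes ungapped, equal-length monomers and a synthetic 4-mer HOR with made-up mutation rates; it is an illustration, not the method used to build the GRCh38 models.

```python
from __future__ import annotations

import random


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences (matching positions / length)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with probability `rate` (substitutions only, no indels)."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < rate else base
        for base in seq
    )


def hor_period(monomers: list[str], max_period: int, threshold: float = 94.0) -> int | None:
    """Smallest shift p whose mean identity between monomer i and monomer i + p reaches threshold."""
    for p in range(1, max_period + 1):
        scores = [percent_identity(monomers[i], monomers[i + p]) for i in range(len(monomers) - p)]
        if sum(scores) / len(scores) >= threshold:
            return p
    return None


rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(171))      # toy ~171 bp monomer
hor = [mutate(ancestor, 0.15, rng) for _ in range(4)]           # 4 diverged monomers: one 4-mer HOR
array = [mutate(m, 0.02, rng) for _ in range(10) for m in hor]  # 10 near-identical HOR copies

print(percent_identity(array[0], array[1]))  # neighbouring monomers: well below 94%
print(hor_period(array, max_period=8))       # 4
```

Taking the smallest qualifying shift matters: multiples of the true period (8, 12, ...) score just as high.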
  • 8. CHROMOSOME-SPECIFIC SATELLITE SEQUENCE ORGANIZATION. Distinct HOR arrays (Array “A”, Array “B”, Array “C”) are organized chromosome-specifically, e.g. on chrX and chr3.
  • 9. GENOME MODEL OF SEQUENCE ORGANIZATION IN CENTROMERE-ASSIGNED GAPS (figure labels: p-arm, q-arm, -A-, -T-)
  • 10. GENOME MODEL OF SEQUENCE ORGANIZATION IN CENTROMERE-ASSIGNED GAPS: LINE, SINE, other, and non-alpha satellite sequences (figure labels: p-arm, q-arm, -A-, -T-)
  • 11. GENOME MODEL OF SEQUENCE ORGANIZATION IN CENTROMERE-ASSIGNED GAPS: LINE, SINE, other, non-alpha satellite, and unmapped (yet assembled) scaffolds (figure labels: p-arm, q-arm, -A-, -T-)
  • 12. Characterize HORs in Human Genome. 1. GRCh38 Alpha Satellite Reference Models
  • 13. Characterize HORs in Human Genome. 1. GRCh38 Alpha Satellite Reference Models (figure: HOR monomers A B C D E F)
  • 14. Characterize HORs in Human Genome. 1. GRCh38 Alpha Satellite Reference Models. >200 ENCODE datasets. α-CENTAURI (centromeric automated repeat identification): step example for a single P-read (figure: HOR monomers A B C D E F, 10x, 5’ to 3’). http://github.com/volkansevim/alpha-CENTAURI
  • 16. Experimental Evidence (figure: HOR monomers A B C D E F). FISH hybridization/mapping and screening of a somatic cell hybrid panel (e.g. D7Z1 6-mer; Waye et al. (1987); 98%; GenBank: M16101). Flow-sorted chromosome alignment/enrichment: sequence enrichment analysis of isolated human chromosomes. Long-range paired read support: “anchor” reads mapped to the assembled p-arm and/or q-arm. Together these give a chromosome-specific assignment.
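Here is a minimal sketch of how the long-range paired read support on slide 16 could be reduced to a chromosome call. Each read pair with one end in the array and a mate that maps uniquely to an assembled p-arm or q-arm votes for that chromosome. The input format and the `min_support` and `min_fraction` thresholds are hypothetical, and FISH and flow-sorted evidence would be weighed separately.

```python
from __future__ import annotations

from collections import Counter


def assign_chromosome(anchor_hits: list[str], min_support: int = 3,
                      min_fraction: float = 0.8) -> str | None:
    """Majority vote over mate-pair anchors (hypothetical thresholds).

    anchor_hits holds one chromosome name per read pair whose mate maps uniquely
    to an assembled p-arm or q-arm. Returns None when support is weak or mixed.
    """
    counts = Counter(anchor_hits)
    if not counts:
        return None
    chrom, n = counts.most_common(1)[0]
    if n >= min_support and n / len(anchor_hits) >= min_fraction:
        return chrom
    return None


print(assign_chromosome(["chr7"] * 9 + ["chr1"]))    # chr7 (9 of 10 anchors agree)
print(assign_chromosome(["chr7", "chr1", "chr16"]))  # None (too few, too mixed)
```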
  • 18. Characterize HORs in Human Genome. 1. GRCh38 Alpha Satellite Reference Models. 2. Constructing WGS read libraries for each HOR array, e.g. DXZ1 (12-mer; monomers 1-12) at CENX, from the HuRef WGS Sanger read database (figure labels: LINE, LINEA/T).
  • 19. Characterize HORs in Human Genome. 1. GRCh38 Alpha Satellite Reference Models. 2. Constructing WGS read libraries for each HOR array. 3. Model array variants in a sequence graph: monomer-variant nodes (m1v1, m2v1, m2v2, m3v1, ... m12v1) and a LINE node, joined by edges weighted 1.0, 0.5, 0.3, or 0.7 (figure label: LINEA/T).
  • 20. linearSat: a 2nd-order Markov chain over the array-variant graph (nodes m1v1 ... m12v1, LINE); output length is determined by normalized array length estimates. Not the “true” long-range organization, yet it adequately represents the alpha satellite array sequence. https://github.com/JimKent/linearSat
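Slide 20 describes linearSat as a 2nd-order Markov chain whose output length comes from normalized array length estimates. The sketch below is a hedged illustration of that idea, not linearSat's code. It counts which monomer variant follows each pair of consecutive variants in hypothetical read-derived label sequences, then samples a linear array whose monomer count is a hypothetical length estimate divided by ~171 bp.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict


def train_second_order(reads: list[list[str]]) -> dict[tuple[str, str], Counter]:
    """Count next-label frequencies for every pair of consecutive labels (2nd-order context)."""
    model: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for labels in reads:
        for a, b, c in zip(labels, labels[1:], labels[2:]):
            model[(a, b)][c] += 1
    return dict(model)


def generate(model: dict[tuple[str, str], Counter], start: tuple[str, str],
             n_monomers: int, rng: random.Random) -> list[str]:
    """Sample a linear array of n_monomers labels, each drawn given the previous two."""
    out = list(start)
    while len(out) < n_monomers:
        counts = model.get((out[-2], out[-1]))
        if not counts:  # context never observed: stop instead of inventing a transition
            break
        labels, weights = zip(*counts.items())
        out.append(rng.choices(labels, weights=weights)[0])
    return out[:n_monomers]


reads = [  # hypothetical monomer-variant labels from two reads of a 4-monomer HOR
    "m1v1 m2v1 m3v1 m4v1 m1v1 m2v2 m3v1 m4v1 m1v1 m2v1 m3v1".split(),
    "m1v1 m2v2 m3v1 m4v1 m1v1 m2v2 m3v1 m4v1 m1v1 m2v1 m3v1".split(),
]
model = train_second_order(reads)

array_length_bp = 6840                     # hypothetical normalized array length estimate
n_monomers = round(array_length_bp / 171)  # ~171 bp per alpha satellite monomer
linear = generate(model, ("m1v1", "m2v1"), n_monomers, random.Random(1))
print(len(linear), linear[:8])
```

As the slide notes, such a sample reproduces local variant statistics, not the true long-range order of the array.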
  • 21. LINEAR ORDERING OF REFERENCE MODELS AND ASSEMBLED CONTIGS USING MATE PAIRS. chrX: CENX spans a 3.8 Mb region between Xp and Xq, with segments labelled 2.25 Mb (~860 HOR units), 0.73 Mb (~43 HOR units), and 0.3 Mb (low-copy repeat). chr3: CEN3.1 and CEN3.2 between 3p and 3q, with unmapped HuRef assembled contig(s) (e.g. ABBA01185959).
  • 22. An Initial Draft of Human Centromere Sequence Composition. Alpha Satellite Reference Models: ~60 Mb (59,571,670 bp) (figure: centromeres of chromosomes 1-12, 15-20, X, and Y by arm, plus acrocentric chromosomes 13, 14, 21, 22; scale bar 100 kb).
  • 23. CENTROMERE SEQUENCE ASSEMBLY. 1. GRCh38 Alpha Satellite Reference Models (Miga, K.H., et al. Genome Research 24.4 (2014): 697-707). 2. Linear Assembly of a Human Centromere.
  • 24. LINEAR ASSEMBLY OF A HUMAN CENTROMERE ON THE Y CHROMOSOME. A small, haploid satellite array with a well-characterized 5.8 kb repeat (between the p-arm and the q-arm).
  • 25. BACS: OVERLAP-LAYOUT-ASSEMBLY. A collection of 9 BACs known to span the Y centromere; overlap determined by single-copy sequence variants (Tilford et al. 2001, Nature).
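The overlap-layout step on slide 25 can be sketched by reducing each BAC to the ordered single-copy variants it contains, so that two BACs overlap when the end of one shares variants with the start of the next. BAC names, marker IDs, and the greedy chaining rule here are hypothetical simplifications (no contained BACs, no conflicting overlaps), not the procedure of Tilford et al. 2001.

```python
from __future__ import annotations


def overlap(a: list[str], b: list[str]) -> int:
    """Number of markers in the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def layout(bacs: dict[str, list[str]]) -> list[str]:
    """Greedy tiling path: begin at the BAC with no incoming overlap, then repeatedly
    append the unused BAC that overlaps the current one by the most markers."""
    names = list(bacs)
    starts = [n for n in names
              if all(overlap(bacs[m], bacs[n]) == 0 for m in names if m != n)]
    if len(starts) != 1:
        raise ValueError("expected exactly one BAC without an incoming overlap")
    path, used = [starts[0]], {starts[0]}
    while len(path) < len(names):
        current = bacs[path[-1]]
        best = max((n for n in names if n not in used), key=lambda n: overlap(current, bacs[n]))
        if overlap(current, bacs[best]) == 0:
            break  # no overlapping BAC left: the tiling path has a gap
        path.append(best)
        used.add(best)
    return path


bacs = {  # hypothetical BACs, each listed as the single-copy variants it contains, in order
    "BAC_B": ["v3", "v4", "v5"],
    "BAC_A": ["v1", "v2", "v3"],
    "BAC_C": ["v5", "v6"],
}
print(layout(bacs))  # ['BAC_A', 'BAC_B', 'BAC_C']
```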
  • 26. HIGH QUALITY + LONG (100 kb+) READS. The challenge of assembling identical tandem repeats with short reads: a ~100 kb array yields a collapsed representation.
  • 27. HIGH QUALITY + LONG (100 kb+) READS. High-quality consensus sequence (~100 kb).
  • 28. NANOPORE SEQUENCING: LONGBOARD (1D). UCSC LONGBOARD 1D PROTOCOL
  • 29. NANOPORE SEQUENCING: LONGBOARD (1D). LONGBOARD 1D PROTOCOL
  • 30. NANOPORE SEQUENCING: LONGBOARD (1D). UCSC LONGBOARD 1D PROTOCOL: in total, we have generated 3500+ reads greater than 150 kb.
  • 31. MULTIPLE ALIGNMENT STRATEGY TO IMPROVE QUALITY BY CONSENSUS. UCSC LONGBOARD 1D PROTOCOL. High-quality consensus requires modest coverage.
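Here is a minimal sketch of consensus by column-wise majority vote. It assumes the reads are already placed in a gapped multiple alignment; the alignment step is not shown, and the read strings are made up. Errors that most reads do not share are voted out, which is why a modest number of overlapping reads can yield a high-quality consensus. The consensus method actually used for the Y centromere may differ.

```python
from __future__ import annotations

from collections import Counter


def majority_consensus(aligned: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over reads already placed in a gapped multiple alignment."""
    if len({len(r) for r in aligned}) != 1:
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:  # a winning gap means most reads have no base here
            consensus.append(base)
    return "".join(consensus)


reads = [
    "ACGTTAC-GA",
    "ACGATACAGA",  # substitution (T->A) and insertion (A) errors
    "ACGTTAC-GA",
    "AC-TTAC-GA",  # deletion error
    "ACGTTAC-GA",
]
print(majority_consensus(reads))  # ACGTTACGA
```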
  • 32. RP11-718M18 (221.4 kb; vector and insert): 634 predicted nucleotide variants; 2 tandem structural rearrangements; 38 CENY repeats (>99% identity to published consensus); homopolymers [A]n and [T]n.
  • 33. VALIDATE HIGH-CONFIDENCE SINGLE-COPY VARIANTS WITH ILLUMINA. Identify informative, single-copy sites in the array that are useful for overlap in BAC-based assembly: Y single-copy variants using Illumina data (RP11-718M18, 221.4 kb).
  • 34. VALIDATE HIGH-CONFIDENCE SINGLE-COPY VARIANTS
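Single-copy variants are useful for BAC overlaps because they occur once in an otherwise repetitive array. The sketch below finds them as k-mers that occur exactly once in a toy array built from slide 5's 11 bp unit with one hypothetical substitution. With Illumina data, the analogous signal would be a k-mer whose read count divided by sequencing depth is about 1; that normalization is an assumption and is not shown.

```python
from __future__ import annotations

from collections import Counter


def single_copy_kmers(array: str, k: int) -> dict[str, int]:
    """Map each k-mer that occurs exactly once in `array` to its start position."""
    kmers = [array[i:i + k] for i in range(len(array) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


UNIT = "ATCCGATTACG"       # toy 11 bp repeat unit from slide 5
copies = [UNIT] * 6
copies[3] = "ATCCGTTTACG"  # hypothetical single-copy A->T substitution in the 4th copy
array = "".join(copies)    # the variant lies at position 3 * 11 + 5 = 38

unique = single_copy_kmers(array, k=8)
print(sorted(unique.values()))  # [31, 32, 33, 34, 35, 36, 37, 38]: every 8-mer covering position 38
```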
  • 35. LINEAR ASSEMBLY OF HUMAN Y CENTROMERE
  • 36. Future Perspective. 1. Linear assemblies of human centromeric regions improve in step with sequencing technology (i.e. read length and quality). 2. One genome is not enough: centromeric regions are highly variable. 3. Linear CEN assemblies present a mapping challenge to most genomic applications.
  • 37. True Linear Maps of Human CEN Regions (figure: Y CEN true linear arrangement; informatics/analysis data structure)
  • 38. Key Advantages of Satellite DNA Graphs: 1. Eliminates sequence redundancy.
  • 39. Key Advantages of Satellite DNA Graphs: 1. Eliminates sequence redundancy. Improves unambiguous short-read mapping: a read drawn from a repeat (REPEAT REPEAT REPEAT ?) has no unique 5’-3’ position on a linear reference. Centromere graphs (Benedict Paten, Adam Novak) demonstrate unambiguous mapping of the majority (>98%) of 1000 Genomes alpha satellite reads.
  • 40. Key Advantages of Satellite DNA Graphs: 1. Eliminates sequence redundancy. 2. Information describing long-range haplotypes is retained as defined “paths” in the graph.
  • 41. Key Advantages of Satellite DNA Graphs: 1. Eliminates sequence redundancy. 2. Information describing long-range haplotypes is retained as defined “paths” in the graph. 3. The graph data structure and sequence analysis tools will be consistent with the rest of the human genome (example: the major histocompatibility complex; Kiran Garimella & Gil McVean).
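Here is a minimal sketch of the first two graph advantages. It uses a k-mer graph, a de Bruijn-style structure chosen here for brevity rather than the graph model of Paten and Novak. Identical repeat copies collapse onto shared nodes, and a short read spells one path through those nodes even though it matches many positions on the linear reference. The flanks and `K = 8` are hypothetical.

```python
from __future__ import annotations

UNIT = "ATCCGATTACG"                  # toy repeat unit from slide 5
LEFT, RIGHT = "GGGGGGGG", "CCCCCCCC"  # hypothetical unique flanks
K = 8                                 # hypothetical k-mer size for graph nodes


def linear_hits(reference: str, read: str) -> list[int]:
    """Every start position where the read matches a linear reference exactly."""
    return [i for i in range(len(reference) - len(read) + 1) if reference.startswith(read, i)]


def kmer_graph(reference: str, k: int) -> dict[str, set[str]]:
    """Nodes are distinct k-mers; edges link k-mers adjacent in the reference.
    Identical repeat copies collapse onto the same nodes, removing redundancy."""
    graph: dict[str, set[str]] = {}
    for i in range(len(reference) - k + 1):
        successors = graph.setdefault(reference[i:i + k], set())
        if i + k < len(reference):
            successors.add(reference[i + 1:i + k + 1])
    return graph


def graph_path(graph: dict[str, set[str]], read: str, k: int) -> list[str] | None:
    """The node path a read spells in the graph, or None if it leaves the graph."""
    path = [read[i:i + k] for i in range(len(read) - k + 1)]
    if path and path[0] in graph and all(b in graph.get(a, set()) for a, b in zip(path, path[1:])):
        return path
    return None


reference = LEFT + UNIT * 50 + RIGHT  # 566 bp toy centromere
graph = kmer_graph(reference, K)
read = (UNIT * 3)[3:23]               # 20 bp read from inside the array

print(len(linear_hits(reference, read)))  # many equally good linear positions
print(len(reference), len(graph))         # 566 bases vs. far fewer distinct nodes
print(graph_path(graph, read, K) is not None)  # True: one path through the collapsed repeat
```

Long-range haplotype information (advantage 2) would be kept by recording which node paths each real array actually follows; this toy omits that.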
  • 42. Creating (and mapping to) a Universal Reference Genome: Benedict Paten, Adam Novak, David Haussler, UC Santa Cruz. Acknowledgements: Mark Akeson, Miten Jain, Hugh Olsen, Benedict Paten, Dave Deamer, Robin AbuShumays, Andrew Smith, Ian Fiddes, Art Rand, Logan Mulroney, Jordan Eizenga, Rojin Safavi, Rachel Lawton, Andrew Bailey, Ariah Mackie, David Haussler, Benedict Paten, Jim Kent, Sofie Salama; UCSC Nanopore Analysis Group; Miten Jain, Hugh Olsen, Mark Akeson; Dan Turner, David Stoddart; Oxford Nanopore Technologies; Huntington F. Willard; David Page. Product versions: device MinION MK1; flow cell FLO-MIN106; kits: Rapid Sequencing Kit; data analysis: Albacore 1.0.1, Metrichor 1D.