An assessment of computational
genotyping of Structural Variations for
clinical diagnosis
Fritz Sedlazeck
Oct 16, 2018
Scientific interests
Detection of Variants
Clairvoyante: Luo et al. (in review)
Sniffles: Sedlazeck et al. (2018)
SURVIVOR: Jeffares et al. (2017)
Mapping/Assembly of reads
NextGenMap-LR: Sedlazeck et al. (2018)
Falcon Unzip: Chin et al. (2016)
NextGenMap: Sedlazeck et al. (2013)
Benchmarking
Teaser: Smolka et al. (2015)
Sequencing: Jünemann et al. (2013)
Applications
Model organisms:
- Cancer (SKBR3) (in preparation)
- miRNA editing (Vesely et al. 2012)
Non-model organisms:
- Cottus transposons (Dennenmoser et al. 2017)
- Clunio (Kaiser et al. 2016)
- Seabass (Vij et al. 2016)
- Pineapple (Ming et al. 2015)
“moonlight”
How to detect Structural Variations
Structural Variations
Genomic Disorders
Evolution
Impact on regulation
Impact on phenotypes
[Figure: heatmap of regulatory state (CTCF binding site, enhancer, open chromatin region, promoter, promoter flanking region, TF binding site; each ACTIVE, INACTIVE, POISED, REPRESSED or NA) by cell line; color scale: # affected, 0 to 2000]
Remaining Challenges for SV calling
1. Accuracy of the calls
   1. False positives
   2. False negatives
2. Functional interpretation?
   1. Population frequencies / Curation
Illumina data
PacBio data
ONT data
How to call SVs in routine scans?
SV genotyping
• Advantages
  • Low/no false positives
  • False negatives ??
  • Focus on variants that are known to have an impact.
• Disadvantages
  • We cannot find novel SVs
Varuna Chander
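To make the genotyping idea concrete, here is a minimal sketch of how a genotype could be assigned to one known SV from read counts. It is an illustration under stated assumptions, not the method of any tool named in this talk: the function name `genotype_from_counts`, the `error_rate` parameter and the simple binomial model are all hypothetical.

```python
from math import log


def genotype_from_counts(ref_reads: int, alt_reads: int,
                         error_rate: float = 0.05) -> str:
    """Pick the most likely genotype for one catalogued SV.

    ref_reads: reads supporting the reference allele at the SV site.
    alt_reads: reads supporting the SV (alternative) allele.
    error_rate: assumed fraction of reads supporting the wrong allele
                (hypothetical value, chosen for illustration only).
    """
    if ref_reads + alt_reads == 0:
        return "./."  # no informative reads, so no call

    # Expected fraction of alt-supporting reads under each genotype.
    expected_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}

    def log_likelihood(p: float) -> float:
        # Binomial log-likelihood; the binomial coefficient is omitted
        # because it is identical for all three genotypes.
        return alt_reads * log(p) + ref_reads * log(1.0 - p)

    return max(expected_alt, key=lambda gt: log_likelihood(expected_alt[gt]))
```

Running this over every SV in a curated catalog would give one genotype per SV per sample, which is the "scan for known SVs" workflow the slide describes. Novel SVs are never reported because only catalogued sites are examined.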
Approaches
• DELLY: SV caller that also supports genotyping
• STIX: SV genotyper
• SVTyper: SV genotyper
• SV2: SV genotyper
Simulated data
1. We simulated SVs of different types and sizes
2. Called SVs with Delly, Manta and Lumpy
3. Merged calls with SURVIVOR
4. Used the merged calls as input to the SV genotypers
5. Evaluated each genotyper's results for the SV types it supports
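The merging step can be sketched as follows. This is a simplified illustration, not SURVIVOR's actual implementation: the `SVCall` class, the `merge_calls` function and the default values for `max_dist` and `min_callers` are assumptions made for this example. Calls are grouped when they share chromosome and SV type and both breakpoints lie within a maximum distance.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP", "INV"
    caller: str   # name of the tool that produced the call


def merge_calls(calls, max_dist=1000, min_callers=2):
    """Group calls that likely describe the same SV.

    Two calls are grouped when chromosome and type match and both
    breakpoints differ by at most max_dist bases from the first call
    in the group. Only groups supported by at least min_callers
    distinct callers are returned.
    """
    groups = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for group in groups:
            rep = group[0]  # first call acts as the group representative
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and abs(rep.start - call.start) <= max_dist
                    and abs(rep.end - call.end) <= max_dist):
                group.append(call)
                break
        else:
            groups.append([call])
    return [g for g in groups if len({c.caller for c in g}) >= min_callers]
```

Requiring support from more than one caller is one common way to reduce false positives in the merged set before it is handed to a genotyper.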
GIAB v0.5.0 deletions
• Most of the genotypers only handle deletions (DEL)
• Constraints on the input format/fields
• Lack of sensitivity
Paragraph
• Graph-based SV genotyper
• GIAB all types:
  • Sensitivity: 82%
  • Precision: 99%
  • GT concordance: 80%
• Available: github.com/Illumina/paragraph
P Krusche et al (in preparation)
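For readers unfamiliar with the three metrics, the sketch below shows one way they could be computed from a truth set and a call set. The definitions are assumptions for illustration (the talk does not state how GT concordance was defined); here it is taken as the fraction of true-positive SVs whose genotype string matches the truth exactly. The function name `evaluate` and the input layout are hypothetical.

```python
def evaluate(truth: dict, calls: dict):
    """Return (sensitivity, precision, gt_concordance).

    truth and calls map an SV identifier to a genotype string such as
    "0/1". Genotypes "0/0" and "./." are treated as "SV not present".
    """
    def present(gt: str) -> bool:
        return gt not in ("0/0", "./.")

    true_ids = {sv for sv, gt in truth.items() if present(gt)}
    called_ids = {sv for sv, gt in calls.items() if present(gt)}
    true_positives = true_ids & called_ids

    sensitivity = len(true_positives) / len(true_ids) if true_ids else 0.0
    precision = len(true_positives) / len(called_ids) if called_ids else 0.0

    # Assumed definition: exact genotype match among true positives.
    concordant = sum(1 for sv in true_positives if truth[sv] == calls[sv])
    gt_concordance = concordant / len(true_positives) if true_positives else 0.0
    return sensitivity, precision, gt_concordance
```

Under these definitions, high precision with lower sensitivity means that reported SVs are almost always real, while some true SVs in the catalog are missed.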
Remaining Challenges for SV calling
1. Accuracy of the calls
   1. False positives
   2. False negatives
2. Functional interpretation?
   1. Population frequencies / Curation
STIX: Population frequency
• Online framework to annotate your SVs with allele frequencies
• ~0.1 sec / SV
• Stores only informative reads (0.18% of BAM)
• Currently ~9000 samples
• Multiple ethnicities
Layer et al. (in preparation)
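As a rough illustration of a population-frequency annotation, the sketch below computes the fraction of samples that carry supporting evidence for one SV. It is not STIX's interface or algorithm: the function name `carrier_frequency`, the `min_reads` threshold and the input layout are assumptions. It also reports a carrier frequency (samples with evidence) and not a strict allele frequency, which would additionally require genotypes.

```python
def carrier_frequency(evidence_per_sample: dict, min_reads: int = 2) -> float:
    """Fraction of samples with at least min_reads supporting reads.

    evidence_per_sample maps a sample name to the number of informative
    reads (for example discordant or split reads) that support the SV.
    min_reads is a hypothetical threshold chosen for illustration.
    """
    if not evidence_per_sample:
        return 0.0
    carriers = sum(1 for n in evidence_per_sample.values() if n >= min_reads)
    return carriers / len(evidence_per_sample)
```

Because only the informative reads need to be kept per sample, an index of this kind can stay small relative to the original alignments, which is consistent with the 0.18% figure on the slide.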
Acknowledgments
Varuna Chander
William Salerno
Richard Gibbs
Peter Krusche
Sai Chen
Mike Eberle
Ryan Layer

Editor's Notes

  • #2 Welcome everyone. My name is Fritz Sedlazeck and I am currently working at the Human Genome Sequencing Center at Baylor in Houston. Today I am going to give you an update on our efforts to improve long-read mapping as well as SV calling. Before I dive into that, let me briefly introduce myself and my scientific interests.
  • #3 I am a computational biologist mainly focusing on method development for mapping short and long reads and detecting SVs. To get better insight into what is artifact and what is true signal, I initiated and contributed to benchmarking studies for sequencers and mappers. Overall, I am also happy to collaborate with many people on multiple organisms around the world.
  • #5 Evolution: main driver, e.g. gene gain or loss; hybrid genome architecture (Cottus). Genomic disorders: cancer (in prep.) and other diseases. Impact on regulation: currently being studied with ENTEX. Impact on phenotypes: work I just published, where we could show the contribution of CNVs and rearrangements to traits.
  • #8 Establish a catalog of SVs that we already understand. Use genotyping to scan for these SVs. Report the SVs found per sample.
  • #15 Display slide during questions. Check with Will!!