Dr. Sascha Ott (University of Warwick) - Data-driven systems medicine

Sascha Ott
Single-cell RNA sequencing in reproductive medicine

How does Drop-seq work?
Figure (mRNA capture): uniquely barcoded mRNA capture beads, cells and oil are fed into the chip; a sample loop is used to minimise bead damage.
The oil, cells and mRNA capture beads are
pumped through the microfluidic chip.
The two aqueous streams of barcoded mRNA
capture beads and cells mix together less than a
millisecond before the microfluidic junction and
are then encapsulated in droplets.
The lysis buffer in which beads are resuspended
breaks the cells open and the mRNA is captured
on the bead. Following mRNA capture in
droplets, the emulsion that comes off the chip is

Data used and key questions
Endometrial data - recurrent miscarriages:
Prof. Jan Brosens
Prof. Siobhan Quenby

- Fresh patient samples processed using Drop-seq
- Aim to identify differences between recurrent pregnancy loss (RPL) patients and controls

Pancreatic data - pancreas in pregnancy:
Dr Mike Khan
Prof. Jan Brosens

- Pancreatic islet samples from pregnant mice processed using Drop-seq
- Aim to identify mechanisms of pancreatic adaptation during pregnancy

Computational Problem
Figure: t-SNE plot (tSNE_1 vs tSNE_2, both axes −20 to 20) of cells coloured by cluster: Endothelial; Epithelial (states 1 and 2); NK; Stroma (states 1 to 4); Unknown.

Raw Data
Gene      Cell1  Cell2  Cell3  Cell4  Cell5  Cell6  Cell7  Cell8  Cell9 Cell10 Cell11 Cell12 Cell13 Cell14 Cell15
TAOK1         0      1      0      1      0      8      0     13      0      1      2      0     11     18      2
FOS           0      0      0    203    312      0      0      0     75      1    176    129      0      0      0
GZMA          0    371      0      0      0    209    172      0      0     18      0      0      2      0    155
CXCL14        0    388      1      0      0    272    157      1      0    169      0      2      0      5    273
PAEP         34     33     31      5      1     46     48    115      1     11     13      2    116    131     82
RIMKLB        0    105      0      0      0     28     60      5      0     73      1      0      0      0     29
GNLY          0     43      0      3      0     22     15      7      2     36      1      0      4      0     23
CD55          0      0      0     12     21      0      0      0     21      0      3      1      0      0      0
JUN           0      0      0     13      9      0      0      0     17      0     26     19      0      0      0
GZMB         13     18     14      1      2     44     30     91      0     15      8      1     61     93     51
…
( total of 17,362 rows )
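A count table of this kind is usually obtained by counting distinct UMIs per cell barcode and gene. The following is a minimal sketch, assuming reads have already been aligned and reduced to (cell barcode, UMI, gene) tuples; the function name `count_umis` and the toy barcodes are hypothetical and not taken from the talk.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per gene and cell barcode.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. Identical
    tuples (copies of the same captured molecule) are counted once.
    """
    seen = set(reads)  # collapse exact duplicates
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in seen:
        counts[gene][barcode] += 1
    return {gene: dict(per_cell) for gene, per_cell in counts.items()}

# Toy reads (invented for illustration).
reads = [
    ("AAACCTGAGCAT", "ACGTACGT", "PAEP"),
    ("AAACCTGAGCAT", "ACGTACGT", "PAEP"),  # duplicate read, same UMI
    ("AAACCTGAGCAT", "TTGCAAGC", "PAEP"),
    ("GGTTAACCGTAC", "CATGCATG", "FOS"),
]
matrix = count_umis(reads)
print(matrix)
```

Rows of the resulting mapping correspond to genes and the inner keys to cells, matching a genes-by-cells layout.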

Highest expression levels inform cell type
Rank       Cell 1  Cell 2   Cell 3   Cell 4  Cell 5
1          Ins2    Gcg      Ghrl     Sst     Ppy
2          Ins1    Ttr      Rbp4     Ppy     Pyy
3          Iapp    Malat1   Ins2     Iapp    Malat1
4          Malat1  Chga     Malat1   Malat1  Chgb
5          Chga    Hsp90b1  Hsp90b1  Pyy     Gcg
Cell Type  Beta    Alpha    Epsilon  Delta   Gamma
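The idea that a cell's most highly expressed genes indicate its type can be sketched in a few lines. This is an illustrative sketch only: the marker-to-type mapping is read off the top-ranked gene of each cell on this slide, while the function names and the toy counts are hypothetical.

```python
# Top-ranked gene per cell type, as listed on the slide.
MARKER_TO_TYPE = {
    "Ins2": "Beta",
    "Gcg": "Alpha",
    "Ghrl": "Epsilon",
    "Sst": "Delta",
    "Ppy": "Gamma",
}

def top_genes(counts, n=5):
    """Return the n genes with the highest counts, highest first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]

def call_cell_type(counts):
    """Label a cell by its single most highly expressed gene, or 'Unknown'."""
    top = top_genes(counts, n=1)
    return MARKER_TO_TYPE.get(top[0], "Unknown") if top else "Unknown"

# Toy counts (invented for illustration, not data from the talk).
cell = {"Ins2": 950, "Ins1": 400, "Iapp": 120, "Malat1": 90, "Chga": 40, "Gcg": 2}
print(top_genes(cell))
print(call_cell_type(cell))
```

Ranking by the single top gene is the simplest possible rule; it ignores the lower-ranked genes that the slide also lists.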

Expression profiles are intrinsically bimodal
Figure: histogram of Ins2 (x-axis: library normalised expression, 0.0 to 0.5; y-axis: frequency, log scale).
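A histogram like the one for Ins2 can be approximated as follows. This is a minimal sketch, assuming "library normalised expression" means the fraction of a cell's UMIs that come from the gene, and assuming the log-scale frequency is log10(count + 1); both are assumptions, as are the function names and the toy data.

```python
import math

def library_normalise(gene_count, total_umis):
    """Fraction of a cell's UMIs that come from one gene."""
    return gene_count / total_umis if total_umis else 0.0

def log_histogram(values, bin_width=0.1, n_bins=5):
    """Bin values over [0, n_bins * bin_width] and return log10(count + 1) per bin."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v / bin_width), n_bins - 1)  # clamp large values into the last bin
        counts[idx] += 1
    return [math.log10(c + 1) for c in counts]

# Toy cells as (Ins2 UMIs, total UMIs); values are invented for illustration.
cells = [(0, 1000), (5, 1000), (10, 2000), (450, 1000), (940, 2000), (420, 1000)]
fractions = [library_normalise(g, t) for g, t in cells]
hist = log_histogram(fractions)
print(fractions)
print(hist)
```

In this toy example the cells fall into the lowest and highest bins only, which is the two-mode shape the slide refers to.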

Julia’s Algorithm for Cell
Classification (JACC the Ripper)

JACC the Ripper
- tailored specifically for scRNA-seq data
- incorporates a “top-down” hierarchical classification system
- compiles a report providing evidence to the user
Figure: expression histogram (x-axis 0.0 to 0.5; y-axis: frequency, log scale).

Three types of bimodality profiles
within scRNA-seq data

1. strong marker genes with many positive cells
Pancreas (mouse, unpublished data)
Testis (human, public data)

Figure: expression histograms (frequency, log scale) for Ins2 (pancreas, x-axis 0.0 to 0.5) and TNP1 (testis, x-axis 0.00 to 0.10).
- most abundant genes
- Hartigan’s dip test of unimodality
- positive and negative cells
Figure: Hartigan’s dip test applied to Ins1 and Malat1 (x-axis 0.00 to 0.20; y-axis: frequency); reported dip p-values: 0.9834 and < 2.2e-16.
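Once a gene shows two modes, cells can be separated into positive and negative groups by placing a threshold between the modes. The sketch below uses Otsu's between-class variance criterion as a stand-in; it is not the dip test and is not claimed to be how JACC chooses its cut. The function name and the toy values are hypothetical.

```python
def otsu_threshold(values):
    """Return the cut between sorted values that maximises between-class variance.

    Returns None when all values are equal (no cut is possible).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_score, best_cut = -1.0, None
    left_sum = 0.0
    for i in range(1, n):  # left class = xs[:i], right class = xs[i:]
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between equal values
        mean_left = left_sum / i
        mean_right = (total - left_sum) / (n - i)
        score = i * (n - i) * (mean_left - mean_right) ** 2
        if score > best_score:
            best_score, best_cut = score, (xs[i - 1] + xs[i]) / 2
    return best_cut

# Toy library-normalised expression values (invented for illustration).
values = [0.0, 0.0, 0.01, 0.02, 0.38, 0.41, 0.45]
cut = otsu_threshold(values)
positive = [v for v in values if v > cut]
negative = [v for v in values if v <= cut]
print(cut, len(positive), len(negative))
```

A threshold of this kind is only meaningful after a test for multimodality has flagged the gene; applied to a unimodal gene it would still return a cut.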

2. strong marker genes with few positive cells

Figure: expression histograms (frequency, log scale) for Ghrl (x-axis 0.00 to 0.20) and MT1G (x-axis 0.000 to 0.025).
- most abundant genes not labelled as multimodal by dip test
- rare cell type assessment

3. weak marker genes with many positive cells

Figure: expression histograms (frequency, log scale) for Hhex (x-axis 0.0000 to 0.0020) and VWF (x-axis 0.000 to 0.004).
- all other genes
- weak marker gene assessment
- grouping of “gene friends”
Figure: observed vs expected UMI count distributions (x-axis: UMI counts, 0 to 70; y-axis: frequency, log scale).
Gpx3: heavy tail, deviates from binomial
Canx: no heavy tail, agrees with binomial
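The observed-versus-expected comparison shown for Gpx3 and Canx can be sketched as follows. This is a minimal sketch, assuming the null model treats each cell's UMI count for a gene as Binomial(N, p), with N the cell's total UMIs and p the pooled fraction of UMIs from that gene. The slides do not spell out the exact model, so this is an assumption, and all function names and data are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space; requires 0 < p < 1."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)

def expected_cells_at_least(k, totals, p):
    """Expected number of cells with at least k UMIs of the gene under the null."""
    return sum(max(0.0, 1.0 - sum(binom_pmf(j, n, p) for j in range(k))) for n in totals)

def observed_cells_at_least(k, counts):
    """Number of cells whose UMI count for the gene is at least k."""
    return sum(1 for c in counts if c >= k)

# Toy data (invented): 100 cells with 1,000 UMIs each; 5 cells carry many UMIs of the gene.
totals = [1000] * 100
counts = [2] * 95 + [40] * 5
p = sum(counts) / sum(totals)  # pooled fraction of UMIs from this gene
k = 20
observed = observed_cells_at_least(k, counts)
expected = expected_cells_at_least(k, totals, p)
print(observed, expected)
```

When the observed tail count is far above the expected one, the gene has a heavy tail in the sense of the slide; when the two agree, the gene behaves as the binomial model predicts.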

Recursive approach of JACC
Figure: the dataset with its per-gene expression histograms (frequency, log scale), followed by “?”.
Figure: tree with root “dataset” and nodes “cell type 1”, “cell type 2” and “cell type 4”.
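A top-down recursive classification can be sketched as below. This is an illustrative sketch only: the splitting rule (the widest gap in a gene's sorted values) is a simple stand-in chosen for brevity, not JACC's actual criterion, and the function names, thresholds and toy data are all hypothetical.

```python
def best_split(cells, expr, min_gap=0.05, min_size=2):
    """Return (gene, threshold) for the widest gap in any gene's sorted values, or None."""
    best = None
    for gene in expr[cells[0]]:
        values = sorted(expr[c][gene] for c in cells)
        # Keep at least min_size cells on each side of the cut.
        for i in range(min_size, len(values) - min_size + 1):
            gap = values[i] - values[i - 1]
            if gap >= min_gap and (best is None or gap > best[0]):
                best = (gap, gene, (values[i] + values[i - 1]) / 2)
    return None if best is None else (best[1], best[2])

def classify(cells, expr, path="dataset"):
    """Recursively split cells; return a mapping from leaf path to sorted cell names."""
    split = best_split(cells, expr) if cells else None
    if split is None:
        return {path: sorted(cells)}
    gene, threshold = split
    positive = [c for c in cells if expr[c][gene] > threshold]
    negative = [c for c in cells if expr[c][gene] <= threshold]
    leaves = {}
    leaves.update(classify(positive, expr, f"{path}/{gene}+"))
    leaves.update(classify(negative, expr, f"{path}/{gene}-"))
    return leaves

# Toy library-normalised expression (invented for illustration).
expr = {
    "b1": {"Ins2": 0.40, "Gcg": 0.00},
    "b2": {"Ins2": 0.35, "Gcg": 0.01},
    "b3": {"Ins2": 0.45, "Gcg": 0.00},
    "a1": {"Ins2": 0.01, "Gcg": 0.30},
    "a2": {"Ins2": 0.00, "Gcg": 0.25},
    "a3": {"Ins2": 0.02, "Gcg": 0.28},
}
leaves = classify(list(expr), expr)
print(leaves)
```

Each leaf path records the sequence of splits that led to it, which is one simple way to keep the evidence for a classification visible to the user.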

Analysis of pancreas data
Figure: scatter plot of cells, Component 1 (x-axis, −4 to 8) vs Component 2 (y-axis, −2 to 6), coloured by orig.ident (D15, D9, NonP, Post), with annotations 1 and 2.

Summary
➢ JACC: a novel method tailored specifically for scRNA-seq data
➢ advantages over standard workflows:
o identifies rare cell populations
o simple to use, with no lengthy parameter optimisation steps
o produces a detailed report, giving transparency to the user
o keeps the user close to the true nature of the data
➢ impact: could be used in any scRNA-seq research environment
➢ To appear here: http://wsbc.warwick.ac.uk/wsbcToolsWebpage/
➢ Developing an interactive version of JACC the Ripper
