A machine learning “APPROACH” to recruitment in OA
1. Paweł Widera, PhD
School of Computing Science
Newcastle University, Newcastle, UK
Disclosure Information
I have no financial relationships with commercial interests to disclose.
My presentation does not include discussion of off-label or investigational use.
2. A machine learning “APPROACH”
to recruitment in OA
Paweł Widera
pawel.widera@ncl.ac.uk
OARSI World Congress on Osteoarthritis
Toronto, Canada
2019-05-02
3. WP1 Leaders: Christoph Ladel, John Loughlin
WP3 Leaders: Floris Lafeber, Agnès Lalande
Data analysis / Machine learning: Jaume Bacardit, Paweł Widera, Paco Welsing
Data harmonisation: Samuel Danso, Paweł Widera, Paco Welsing, Sjaak Peelen
CHECK: Anne Marijnissen, Eefje van Helvoort
MUST: Ida Haugen, Janicke Magnus, Mari Skinnes
DIGICOD: Francis Berenbaum, Jeremie Sellam
PROCOAC: Francisco Blanco, Joana Silva, Carlos Tilve
HOSTAS: Margreet Kloppenburg, Marieke Loef, Jose Krol
Principal Investigators: Jonathan Larkin, Harrie Weinans
5. APPROACH main objectives
Recruit: patients for the APPROACH cohort who are likely to progress sufficiently within 2 years
Collect data: as many measurements as possible, at baseline and at the 6-, 12- and 24-month follow-up visits
Discover: attributes and phenotypes related to progression
6. APPROACH cohort
new longitudinal cohort focused on fast knee OA progressors
recruited from 5 existing European OA cohorts using machine
learning models trained to estimate progression probability
KL grade JSN grade JSW follow-up active years
CHECK
MUST
HOSTAS
✓
✓
✓
✓
✓ 1–10 years
—
2–4 years
2006–2016
2010–2013
2009–2017
DIGICOD
PROCOAC ✓
1 year
2–9 years
2013–2017
1991–2016
8. Machine learning from historical data
Machine learning builds computational models from input examples and uses them to predict output categories.
[Diagram: INPUT: patient data (one row of attributes a1 | a2 | a3 | ... | aN per patient) plus the known category of each row (pain, structure, ..., no progression) → LEARNING: machine learning algorithm → OUTPUT: ML model (predictor) producing predicted categories (pain, structure, ..., no progression)]
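As a minimal illustration of learning from labelled examples, the sketch below implements a k-nearest-neighbour classifier (k-NN is one of the algorithms listed later in the deck) in plain Python. The attribute values, labels and the function name `knn_predict` are invented for illustration and are not APPROACH data or code.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a category for `query` by majority vote among its k nearest
    training rows (Euclidean distance on numeric attributes a1..aN)."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    # most_common(1) breaks ties by first occurrence among the k nearest
    return votes.most_common(1)[0][0]


# Toy historical data: each row is one patient (a1, a2), each label a category.
rows = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
labels = ["no progression", "no progression", "structure", "structure", "structure"]

print(knn_predict(rows, labels, (0.15, 0.15)))  # no progression
print(knn_predict(rows, labels, (0.9, 0.9)))    # structure
```

The "model" here is simply the stored examples; trained models such as LR, SVC or RF replace this lookup with learned parameters, but the input/output contract is the same.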
9. Input data
[Figure: timeline of one patient's visits (years 0 to 8) showing the inclusion visit, patient periods and the ΔJSW measured over each period]
Progression periods
duration: 2+ years
ACR clinical criteria satisfied at inclusion
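One possible reading of the period definition: every visit at which the ACR clinical criteria are satisfied can start a period, any visit at least 2 years later can end it, and the JSW change per year is computed over the period. The sketch below enumerates such periods under those assumptions; the data layout and the name `progression_periods` are hypothetical.

```python
def progression_periods(visits, min_years=2):
    """Enumerate candidate progression periods for one patient.

    visits: dict mapping visit year -> (acr_satisfied: bool, jsw_mm: float)
    Returns (start, end, jsw_change_per_year) tuples, where a period starts at
    a visit satisfying the ACR criteria and lasts at least `min_years`.
    """
    years = sorted(visits)
    periods = []
    for start in years:
        acr_ok, jsw_start = visits[start]
        if not acr_ok:
            continue  # inclusion requires the ACR clinical criteria
        for end in years:
            if end - start >= min_years:
                _, jsw_end = visits[end]
                rate = (jsw_end - jsw_start) / (end - start)  # mm per year
                periods.append((start, end, round(rate, 3)))
    return periods


# Hypothetical patient: visits at years 0, 2, 5 and 8; ACR criteria met from year 2
visits = {0: (False, 4.0), 2: (True, 3.8), 5: (True, 3.2), 8: (True, 2.9)}
for period in progression_periods(visits):
    print(period)
```

A single patient can contribute several overlapping periods, which multiplies the number of training examples available from a cohort.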
10. Output categories
Structure periods
minimum total JSW must decrease by at least 0.3 mm per year
Pain periods
must experience progressive or intense sustained pain
pain increase must be at least 5 WOMAC points per year
pain at the follow-up must be significant (≥ 40 WOMAC points)
special exception: rapid pain progression (≥ 10 points per year)
Predicted categories
N: non-progressors
P: increased or stable pain
S: structural progression
P+S: both
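The labelling rules above can be sketched as a function that returns one of the four predicted categories for a period. This is one reading of the rules, assuming that the rapid-progression exception waives the ≥ 40 points follow-up requirement and that WOMAC pain is expressed on a 0 to 100 scale; the function and argument names are hypothetical.

```python
def label_period(jsw_start_mm, jsw_end_mm, pain_start, pain_end, years):
    """Assign N, P, S or P+S to one progression period.

    jsw_*: minimum total JSW (mm) at the start/end of the period
    pain_*: WOMAC pain (assumed 0-100 scale) at the start/end of the period
    years: period duration in years (2+ for a valid period)
    """
    jsw_loss_per_year = (jsw_start_mm - jsw_end_mm) / years
    pain_gain_per_year = (pain_end - pain_start) / years

    structure = jsw_loss_per_year >= 0.3
    # progressive pain with significant follow-up pain, or rapid progression
    pain = (pain_gain_per_year >= 5 and pain_end >= 40) or pain_gain_per_year >= 10

    if pain and structure:
        return "P+S"
    if pain:
        return "P"
    if structure:
        return "S"
    return "N"


print(label_period(3.0, 2.0, 30, 45, 2))  # P+S
print(label_period(3.0, 2.9, 5, 30, 2))   # P (rapid pain progression)
```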
13. Learning strategies
balanced random subsets (20% of the examples)
full imbalanced set (63% N, 12% P, 20% S, 5% P+S)
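With the class proportions above, a balanced subset holding 20% of the examples can take 5% of the examples from each of the four classes, which is exactly the size of the smallest class (P+S). The sketch below draws such a subset; the sampling scheme and the name `balanced_subset` are assumptions, not necessarily the exact procedure used.

```python
import random
from collections import defaultdict


def balanced_subset(labels, fraction=0.2, seed=None):
    """Return indices of a class-balanced random subset.

    The subset size is `fraction` of all examples, split equally across the
    classes; each class contributes at most as many examples as it has.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    per_class = int(fraction * len(labels) / len(by_class))
    subset = []
    for indices in by_class.values():
        subset.extend(rng.sample(indices, min(per_class, len(indices))))
    return sorted(subset)


# Class proportions from the slide, scaled to 100 examples
labels = ["N"] * 63 + ["P"] * 12 + ["S"] * 20 + ["P+S"] * 5
subset = balanced_subset(labels, seed=1)
print(len(subset))  # 20
```

Drawing several such subsets with different seeds gives the multiple models per fold used in validation, each trained on a different balanced sample.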
Data preprocessing
precise attribute types
nominal (left, right)
ordinal (low, medium, high)
continuous (1.2, 2.4, 3.7)
missing-value imputation
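A minimal sketch of these preprocessing steps, assuming one-hot encoding for nominal attributes, an integer mapping for ordinal ones and median imputation for missing continuous values. The slides do not say which imputation method was used, so median imputation is a stand-in; the attribute names are hypothetical.

```python
from statistics import median

ORDINAL_LEVELS = {"low": 0, "medium": 1, "high": 2}
NOMINAL_VALUES = ["left", "right"]


def preprocess(records):
    """Encode dicts with keys 'side' (nominal), 'pain_level' (ordinal) and
    'jsw' (continuous, may be None) into numeric rows."""
    observed = [r["jsw"] for r in records if r["jsw"] is not None]
    jsw_fill = median(observed)  # median imputation for missing values
    rows = []
    for r in records:
        one_hot = [1 if r["side"] == v else 0 for v in NOMINAL_VALUES]
        ordinal = ORDINAL_LEVELS[r["pain_level"]]
        jsw = r["jsw"] if r["jsw"] is not None else jsw_fill
        rows.append(one_hot + [ordinal, jsw])
    return rows


records = [
    {"side": "left", "pain_level": "low", "jsw": 1.2},
    {"side": "right", "pain_level": "high", "jsw": None},
    {"side": "right", "pain_level": "medium", "jsw": 3.7},
]
for row in preprocess(records):
    print(row)
```

Keeping ordinal attributes as ordered integers (rather than one-hot) preserves the low < medium < high order that tree-based and linear models can exploit.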
Classification algorithms
single model
(LR, k-NN, SVC, RF)
4 sub-models (one vs. rest)
2 sub-models (per label: P | S)
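The three model designs differ in how the four categories become training targets. The sketch below shows the targets each design would use for the same labels, following the standard one-vs-rest and per-label decompositions; the helper names are hypothetical.

```python
CATEGORIES = ["N", "P", "S", "P+S"]


def single_model_targets(labels):
    """Single multi-class model: one target with four possible values."""
    return [CATEGORIES.index(label) for label in labels]


def one_vs_rest_targets(labels):
    """Four binary sub-models: category c versus all other categories."""
    return {c: [int(label == c) for label in labels] for c in CATEGORIES}


def per_label_targets(labels):
    """Two binary sub-models: pain (P) and structure (S) progression,
    where P+S counts as positive for both."""
    return {
        "P": [int(label in ("P", "P+S")) for label in labels],
        "S": [int(label in ("S", "P+S")) for label in labels],
    }


labels = ["N", "P", "S", "P+S"]
print(single_model_targets(labels))      # [0, 1, 2, 3]
print(one_vs_rest_targets(labels)["S"])  # [0, 0, 1, 0]
print(per_label_targets(labels))         # {'P': [0, 1, 0, 1], 'S': [0, 0, 1, 1]}
```

In the per-label design the P+S class is not a separate target: a patient who progresses in both respects simply gets a high probability from both sub-models.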
16. Validation
10 repeats of 10-fold cross-validation (CV)
25 models trained per fold
median model score from the median CV repeat
[Diagram: 10-fold CV, training vs. test: each of the 10 rows holds out a different fold (fold 1 to fold 10) for testing; CV repeats (CV 1, CV 2, ..., CV 10) use randomised partitions, each with models 1 to 25]
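The validation scheme can be sketched as follows. The fold assignment is standard repeated k-fold CV; how the 25 per-fold models are aggregated is an assumption here: each model's score is averaged over the 10 folds, the median of the 25 model scores summarises a repeat, and the lower median (`statistics.median_low`) selects one actual repeat out of the 10. `train_and_score` is a hypothetical stand-in for training and evaluating one model.

```python
import random
from statistics import mean, median, median_low


def kfold_partitions(n, k, rng):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_score(n, train_and_score, k=10, repeats=10, models=25, seed=0):
    """train_and_score(train_idx, test_idx, model_no) -> score of one model."""
    rng = random.Random(seed)
    repeat_scores = []
    for _ in range(repeats):
        folds = kfold_partitions(n, k, rng)  # new randomised partition per repeat
        model_scores = []
        for m in range(models):
            fold_scores = []
            for test_fold in folds:
                test = set(test_fold)
                train = [i for i in range(n) if i not in test]
                fold_scores.append(train_and_score(train, test_fold, m))
            model_scores.append(mean(fold_scores))
        repeat_scores.append(median(model_scores))
    # the median CV repeat (lower median, so it is one of the observed repeats)
    return median_low(repeat_scores)
```

Using medians at both levels makes the reported score robust to an occasional unlucky partition or an unusually good or bad model.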
19. Model performance
[Figure: ROC curves, TPR (probability of detection) vs. FPR (probability of false alarm), both axes from 0.0 to 1.0; P model: AUC = 0.81; S model: AUC = 0.69]
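The AUC values can be computed without drawing the ROC curve: the AUC equals the probability that a randomly chosen progressor receives a higher predicted probability than a randomly chosen non-progressor, with ties counted as one half (the Mann–Whitney formulation). A minimal sketch with made-up scores; the quadratic pair loop is fine for illustration but a rank-based computation scales better.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of progressors from non-progressors.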
20. A look inside the models
[Figure: explanations of the P and S models computed with SHAP TreeExplainer [Lundberg et al., 2018][Lundberg and Lee, 2017]]
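SHAP attributes a prediction to the input attributes using Shapley values. TreeExplainer computes them efficiently for tree ensembles; the brute-force sketch below only illustrates the underlying definition on a toy model with three attributes. It replaces "absent" attributes with baseline values, which is a simplifying assumption (TreeExplainer's handling of absent features differs). The model, values and names are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline),
    enumerating every subset of the other features (exponential cost)."""
    n = len(x)

    def value(subset):
        # features in `subset` take their value from x, the rest from baseline
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Toy progression score with an interaction between features 0 and 1."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[1] + 0.1 * z[2]


x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])  # [0.65, 0.35, 0.1]
```

The interaction term is split equally between features 0 and 1, and the values add up to model(x) minus model(baseline), the "local accuracy" property that SHAP explanations satisfy.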
22. Challenges arising from the data
1 differences in collected data: cohorts other than CHECK cannot be used in model training (not enough data to define progression)
2 differences in visit timing: the date of the last visit differs across cohorts and patients
3 decisions on outdated information: model prediction confidence gets worse for older data
23. Harmonisation to CHECK
[Diagram: CHECK training periods; HOSTAS patients harmonised to CHECK]
Harmonisation
semantic processing
common data concepts
mapping between the concepts
syntactic processing
data transformation
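The semantic step (mapping each cohort's variables to common concepts) and the syntactic step (transforming values into the CHECK representation) might be sketched as a mapping table plus per-concept transformation functions. All variable names, codings and units below are hypothetical examples, not the actual HOSTAS or CHECK data dictionaries.

```python
# Semantic mapping (hypothetical): source variable -> (common concept, transform)
HOSTAS_TO_CHECK = {
    "gender": ("sex", lambda v: {"M": "male", "F": "female"}[v]),
    "height_cm": ("height_m", lambda v: v / 100),  # unit conversion
    "weight_kg": ("weight_kg", lambda v: v),
}


def harmonise(record, mapping):
    """Syntactic step: rename variables and transform their values.
    Unmapped variables are dropped; missing values (None) stay None."""
    out = {}
    for source, (concept, transform) in mapping.items():
        value = record.get(source)
        out[concept] = None if value is None else transform(value)
    return out


def add_bmi(record):
    """Derived concept computed after harmonisation (kg / m^2)."""
    if record["weight_kg"] is None or record["height_m"] is None:
        record["bmi"] = None
    else:
        record["bmi"] = round(record["weight_kg"] / record["height_m"] ** 2, 1)
    return record


patient = {"gender": "F", "height_cm": 165, "weight_kg": 70, "site": "LUMC"}
print(add_bmi(harmonise(patient, HOSTAS_TO_CHECK)))
```

Keeping the mapping as data rather than code means each cohort needs only its own table, while the transformation logic is shared.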
24. Baseline shift
[Timeline (2014, 2017, 2019): last visit, baseline shift, progression, APPROACH lifespan]
Model training
using harmonised CHECK periods
baseline shift of 0, 2, 3 and 5 years
Consequences
a separate model must be trained for each cohort and each shift
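One way to build training examples for a given baseline shift: the attributes come from a visit `shift` years before the start of a CHECK progression period, and the label comes from the period itself, so the model learns to predict progression from data of that age. This pairing rule, the data layout and the name `shifted_examples` are assumptions for illustration.

```python
def shifted_examples(visits, periods, shift):
    """Pair attributes measured `shift` years before each period start with
    that period's label.

    visits: dict year -> attribute vector (list of floats)
    periods: list of (start_year, end_year, label)
    Periods without a visit exactly `shift` years earlier are skipped.
    """
    examples = []
    for start, _end, label in periods:
        feature_year = start - shift
        if feature_year in visits:
            examples.append((visits[feature_year], label))
    return examples


# Hypothetical patient: attributes (JSW in mm, pain score) at each visit year
visits = {0: [3.9, 20], 2: [3.7, 25], 5: [3.3, 40], 8: [2.9, 55]}
periods = [(5, 8, "S"), (8, 10, "P+S")]
for shift in (0, 2, 3, 5):
    print(shift, shifted_examples(visits, periods, shift))
```

Because each shift yields a different set of (attributes, label) pairs, this is also why a separate model per shift is needed. Requiring an exact visit year is a simplification; a nearest-earlier-visit rule would keep more examples.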
26. Multi-model prediction
[Diagram: PATIENTS DATA → PREDICTION → RANKING. MUST patient data (2011, 2012, 2013) is scored by MUST models with 5- and 3-year shifts; HOSTAS patient data (2011 to 2015) is scored by HOSTAS models with 5-, 3- and 2-year shifts. Each model outputs rows id | p(S) | p(P) | score, which are combined into one ranking.]
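The multi-model step can be sketched as: for each patient, pick the model trained with the shift that matches the age of their last visit, score them, and merge all rows into one ranking. The selection rule (largest available shift not exceeding the data age, falling back to the smallest shift), the recruitment year 2016 and scoring by the simple sum P + S are assumptions; the models are stand-in functions.

```python
def pick_shift(last_visit_year, recruitment_year, shifts):
    """Largest available shift that does not exceed the data age;
    fall back to the smallest shift for very recent data."""
    age = recruitment_year - last_visit_year
    eligible = [s for s in shifts if s <= age]
    return max(eligible) if eligible else min(shifts)


def multi_model_ranking(patients, models, recruitment_year):
    """patients: list of (id, cohort, last_visit_year, attributes)
    models: dict (cohort, shift) -> function(attributes) -> (p_S, p_P)
    Returns rows (id, p_S, p_P, score) sorted by descending score."""
    rows = []
    for pid, cohort, last_visit, attrs in patients:
        shifts = [s for c, s in models if c == cohort]
        shift = pick_shift(last_visit, recruitment_year, shifts)
        p_s, p_p = models[(cohort, shift)](attrs)
        rows.append((pid, p_s, p_p, p_s + p_p))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Stand-in models returning fixed probabilities (hypothetical values)
models = {
    ("MUST", 5): lambda a: (0.30, 0.40),
    ("MUST", 3): lambda a: (0.50, 0.10),
    ("HOSTAS", 5): lambda a: (0.10, 0.10),
    ("HOSTAS", 3): lambda a: (0.60, 0.35),
    ("HOSTAS", 2): lambda a: (0.45, 0.45),
}
patients = [("m1", "MUST", 2011, []), ("m2", "MUST", 2013, []),
            ("h1", "HOSTAS", 2015, []), ("h2", "HOSTAS", 2012, [])]
for row in multi_model_ranking(patients, models, recruitment_year=2016):
    print(row)
```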
27. Ranking based on model confidence
[Diagram: the MACHINE LEARNING MODEL outputs P and S probabilities for patient 1, patient 2, ..., patient N (example rows P | S: 0.823 | 0.466, 0.839 | 0.447, 0.679 | 0.586, 0.728 | 0.528, 0.801 | 0.454), which are turned into a ranking of patients]
the ML model is composed of two sub-models, separately predicting pain (P) and structure (S) related progression
the model outputs, for each patient, the probability of becoming an OA progressor
the probabilities are used to rank the patients
top patients in the ranking are more likely to progress
30. Two-stage recruitment process
[Diagram: SELECTION stage: new patients, CHECK and cohorts 1 to 4, selection criteria, ML models 0 to 4; SCREENING stage: new measurements scored by the screening ML model; output: the APPROACH cohort]
31. What do we expect?
enrol 75% of screened patients (the most likely to progress)
Screening model attributes
basic: age, sex, BMI
pain intensity: KOOS, NRS
KIDA: bone density, eminence height, joint space width, varus angle, osteophyte area
prediction model | N | only P | only S | P + S
uninformed selection | 60% | 13% | 21% | 6%
screening model (top 150 patients) | 29% | 30% | 18% | 23%
selection best (CHECK 2y shift, top 150) | 35% | 31% | 21% | 14%
selection worst (MUST 5y shift, top 60) | 55% | 32% | 7% | 7%
*(optimistic estimate on CHECK data)
34. Why do we use ranking?
[Diagram: patients SERGAS 02, SERGAS 03 and SERGAS 01 compared with a fixed threshold; decisions: YES, NO, NO]
Case-by-case enrolment
requires an acceptance threshold defined up-front
Ranking-based enrolment
given enough patients in the initial ranking, decisions for the next patients are easier
35. Why do we use ranking?
[Diagram: SERGAS 01 to 03 placed alongside the APPROACH ranking of patients LUMC 01 to 04, UMCU 01 to 04 and APHP 01 to 04, with YES/NO enrolment decisions per ranked position and one decision still open (?)]
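Ranking-based enrolment can be sketched as: keep the scores of all patients screened so far, and enrol a new patient if their score places them within the top 75% of the updated ranking (the enrolment rate stated for screened patients). With only a few patients ranked, such decisions can be unstable, which matches the need for enough patients in the initial ranking. The rule and the name `enrol` are hypothetical illustrations, not the exact APPROACH procedure.

```python
import math


def enrol(new_score, ranked_scores, enrol_fraction=0.75):
    """Decide whether to enrol a new patient given the scores of patients
    already ranked: enrol if the patient's position in the updated ranking
    falls within the top `enrol_fraction` of all ranked patients."""
    scores = sorted(ranked_scores + [new_score], reverse=True)
    position = scores.index(new_score) + 1  # 1-based, best position on ties
    quota = math.floor(enrol_fraction * len(scores))
    return position <= quota


ranked = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
print(enrol(0.65, ranked))  # True
print(enrol(0.35, ranked))  # False
```

Unlike a fixed threshold, the acceptance bar here moves with the observed score distribution, so no threshold has to be set up-front.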
36. Recruitment data flow
[Diagram: SCREENING sites (Utrecht, Oslo, A Coruña, Leiden, Paris) provide clinical data and knee X-rays; components: eCRF (Servier), XNAT (Lygature), KIDA analysis (UMCU), TranSMART (TraiT), processing server (Lygature), Predictor (NU) and ranking (NU), which return scores & decisions]
39. Summary
Highlights
CHECK data used as a proxy for training ML models for all cohorts
two-phase recruitment based on ML model predictions
multi-model selection from data of variable age
screening visit to make enrolment decisions on up-to-date data
patient ranking instead of individual decisions (choice in context)
web application for decision support and progress monitoring
40. Acknowledgement and disclaimer
Acknowledgement
The research leading to these results has received support from the Innovative
Medicines Initiative Joint Undertaking under grant agreement no. 115770,
resources of which are composed of financial contribution from the European Union’s
Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind
contribution. See: https://www.imi.europa.eu.
Disclaimer
This communication reflects the views of the APPROACH consortium and neither IMI
nor the European Union and EFPIA are liable for any use that may be made of the
information contained herein.
41. References
Lundberg, S. M., Erion, G. G., and Lee, S.-I. (2018).
Consistent individualized feature attribution for tree ensembles.
Computing Research Repository, arXiv:1802.03888v2.
Lundberg, S. M. and Lee, S.-I. (2017).
A unified approach to interpreting model predictions.
In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information
Processing Systems (NIPS 2017), pages 4765–4774, Long Beach, CA, USA.
43. Ranking functions
sum: $\text{score} = P + S$
scaled sum: $\text{score} = \frac{P}{\max(P)} + \frac{S}{\max(S)}$
z-score sum: $\text{score} = \frac{P - \mu_P}{\sigma_P} + \frac{S - \mu_S}{\sigma_S}$
Top and bottom of the ranking under each function:
sum: P | S | score
0.782 | 0.544 | 1.326
0.724 | 0.584 | 1.308
... | ... | ...
0.112 | 0.175 | 0.287
0.106 | 0.179 | 0.285
scaled sum: P | S | score
0.782 | 0.544 | 1.798
0.707 | 0.597 | 1.791
... | ... | ...
0.112 | 0.175 | 0.41
0.106 | 0.179 | 0.409
z-score sum: P | S | score
0.707 | 0.597 | 4.6
0.724 | 0.584 | 4.555
... | ... | ...
0.106 | 0.179 | -3.819
0.112 | 0.175 | -3.828
Key findings
sum of z-scores: best for selection without shift
simple sum: best for selection with large shift (≥ 3 years)
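The three ranking functions can be sketched directly from their definitions, with µ and σ taken as the mean and standard deviation of P and S over all ranked patients (population standard deviation assumed here). The example probabilities are made up, not the values in the tables, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def ranking_scores(ps, ss, method="sum"):
    """Combine per-patient P and S probabilities into a single score."""
    if method == "sum":
        return [p + s for p, s in zip(ps, ss)]
    if method == "scaled":
        max_p, max_s = max(ps), max(ss)
        return [p / max_p + s / max_s for p, s in zip(ps, ss)]
    if method == "zscore":
        mu_p, sd_p = mean(ps), pstdev(ps)
        mu_s, sd_s = mean(ss), pstdev(ss)
        return [(p - mu_p) / sd_p + (s - mu_s) / sd_s for p, s in zip(ps, ss)]
    raise ValueError(f"unknown method: {method}")


def rank(patients, method="sum"):
    """patients: list of (id, P, S); returns (id, score) by descending score."""
    ids = [pid for pid, _, _ in patients]
    ps = [p for _, p, _ in patients]
    ss = [s for _, _, s in patients]
    scored = zip(ids, ranking_scores(ps, ss, method))
    return sorted(scored, key=lambda item: item[1], reverse=True)


patients = [("a", 0.8, 0.2), ("b", 0.5, 0.6), ("c", 0.3, 0.3), ("d", 0.2, 0.5)]
for method in ("sum", "scaled", "zscore"):
    print(method, [pid for pid, _ in rank(patients, method)])
```

The z-score sum puts P and S on a common scale, so a sub-model with a narrow probability range is not drowned out by the other; the simple sum keeps the raw probabilities, which may be preferable when the model outputs are less reliable.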