Measuring mental health with machine learning and brain imaging

Gaël Varoquaux

Reading health from brain images
[Diagram: brain images → machine learning → mental health]
How old is this person?
Is she at risk of developing Alzheimer’s Disease?
Is she depressed?
Needs sophisticated analysis
G Varoquaux 2

1 Hope: non-trivial imaging biomarkers
2 Achilles’ heel: Evidence for prediction
3 Vision: Broader validity

1 Hope: non-trivial imaging biomarkers
Machine learning can capture
non-trivial mental phenotypes in brain images
Example: Autism Spectrum Disorder
Challenging: spectrum disorder; diagnosis based on symptoms

1 A competition to assess the state of the art
Rich data
Multimodal: cortical thickness & resting-state fMRI
Cohort of 2000 individuals
An open competition
3000 € for the best prediction of autism status
Competition open for 3 months
Trustworthy assessment of state of the art: Final prediction score
Best performers: ∼ 0.8 AUC ROC
[Traut... 2021]

1 A competition: More data makes a big difference
[Figure: prediction performance (ROC-AUC, ± 1 std. dev.) against the number of subjects in the training set (500 to 2000), for different sample sizes. Fit: $0.5 + 0.39\,(1 - e^{-0.047\sqrt{n}})$, asymptote AUC = 0.89]
Amount of data is currently the limiting factor
[Traut... 2021]
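
The fitted learning curve on this slide can be evaluated for other sample sizes. A minimal sketch, assuming the fit printed on the slide reads 0.5 + 0.39 (1 − exp(−0.047 √n)): the minus signs and the square root are an assumption, chosen because they reproduce the plotted range, and `predicted_auc` is a hypothetical helper name.

```python
import math

def predicted_auc(n_subjects: int) -> float:
    """Learning-curve fit, ASSUMED form: 0.5 + 0.39 * (1 - exp(-0.047 * sqrt(n)))."""
    return 0.5 + 0.39 * (1.0 - math.exp(-0.047 * math.sqrt(n_subjects)))

# Value approached with unlimited data: chance level (0.5) plus 0.39
ASYMPTOTIC_AUC = 0.5 + 0.39

for n in (500, 1000, 1500, 2000, 10_000):
    print(f"n = {n:>6}: predicted ROC-AUC = {predicted_auc(n):.3f}")
```

Under this assumed fit, going from 500 to 2000 training subjects moves the predicted score from about 0.75 to about 0.84, still short of the 0.89 asymptote.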

1 A competition: rest fMRI trumps cortical thickness
[Figure: prediction score (ROC-AUC, axis 0.65 to 0.80) for models using anatomy, functional, anatomy + functional, and anatomy + functional + age + sex]
Obtained by removing anatomy or function from models
[Traut... 2021]

Given labels and enough data, machine learning
will extract non-trivial biomarkers of mental health

2 Achilles’ heel: Evidence for prediction
Can we trust published biomarkers of psychiatric conditions?

2 Assessing prediction requires unseen data
[Poldrack... 2020]
[Figure: polynomial fits of order 1, 2, and 15 to the same data (x from −2 to 2), and the mean squared error of each fit]
Quality of fit on the data used for fitting is not meaningful
Only new (test) data can measure prediction
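
The point can be checked on simulated data. A minimal sketch, assuming a synthetic quadratic signal with Gaussian noise (the data, noise level, and sample sizes are illustrative choices, not those of the figure): it fits polynomials of order 1, 2, and 15 and compares the error on the fitting data with the error on unseen data.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def simulate(n_samples):
    """Noisy samples of a quadratic curve on [-2, 2] (synthetic, for illustration)."""
    x = rng.uniform(-2, 2, size=n_samples)
    y = 5 + 3 * x**2 + rng.normal(scale=2.0, size=n_samples)
    return x, y

x_train, y_train = simulate(25)
x_test, y_test = simulate(200)  # unseen data drawn from the same process

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

train_error, test_error = {}, {}
for order in (1, 2, 15):
    # Polynomial.fit rescales x internally, which keeps order 15 numerically stable
    model = Polynomial.fit(x_train, y_train, deg=order)
    train_error[order] = mse(model, x_train, y_train)
    test_error[order] = mse(model, x_test, y_test)
    print(f"order = {order:>2}: train MSE = {train_error[order]:10.2f}, "
          f"test MSE = {test_error[order]:10.2f}")
```

The training error can only go down as the order grows, because each polynomial family contains the previous one. The error on unseen data is the quantity that reflects prediction.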

2 Evidence for prediction
[Poldrack... 2020]
Established on unseen data
[Diagram: full data split into a train set and a test set, e.g. by cross-validation]
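
A minimal sketch of the train/test logic behind cross-validation, assuming a plain least-squares linear model and synthetic data. `k_fold_indices` and `cross_val_mse` are hypothetical helper names written for this illustration, not functions from a library.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

def cross_val_mse(X, y, n_folds=5):
    """Mean squared error of a least-squares linear model on each held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), n_folds):
        # Fit on the train fold only (intercept added as a column of ones)
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score on samples the model has not seen
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        scores.append(np.mean((A_test @ coef - y[test_idx]) ** 2))
    return np.array(scores)

# Synthetic data, for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=100)
fold_scores = cross_val_mse(X, y)
print(f"test MSE per fold: {np.round(fold_scores, 2)}, mean = {fold_scores.mean():.2f}")
```

The spread of the per-fold scores gives a first impression of how noisy the estimate is.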

2 Competition: assessment via a private set
[Traut... 2021]
Our competition was evaluated on a hidden private set
[Figure: ROC-AUC of each submission (axis 0.5 to 1.0) on the public set and on the private set]

2 Competition: analysts overfit the public set
[Traut... 2021]
[Figure: ROC-AUC scores (axis 0.5 to 1.0) on the public set and on the private set at the start, middle, and finish of the competition]
Humans overfit by optimizing the public-set score
Cross-validation is noisy, not trustworthy
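
The mechanism can be reproduced with submissions that carry no information at all. A minimal simulation, assuming random labels, purely random guesses, and accuracy in place of ROC-AUC for simplicity (set sizes and the number of submissions are illustrative, not those of the competition): keeping the submission with the best public score produces an optimistic public score that the private set does not confirm.

```python
import random

random.seed(0)

N_PUBLIC, N_PRIVATE, N_SUBMISSIONS = 100, 1000, 200

# Random labels: no submission below can beat chance (50%) on average
public_labels = [random.randint(0, 1) for _ in range(N_PUBLIC)]
private_labels = [random.randint(0, 1) for _ in range(N_PRIVATE)]

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# Each "submission" is pure guessing, a stand-in for an uninformative model
submissions = []
for _ in range(N_SUBMISSIONS):
    public_guess = [random.randint(0, 1) for _ in range(N_PUBLIC)]
    private_guess = [random.randint(0, 1) for _ in range(N_PRIVATE)]
    submissions.append((accuracy(public_guess, public_labels),
                        accuracy(private_guess, private_labels)))

# The analyst keeps whichever submission looks best on the public leaderboard
best_public, private_of_best = max(submissions, key=lambda scores: scores[0])
print(f"best public accuracy:       {best_public:.2f}")
print(f"same submission on private: {private_of_best:.2f}")
```

Selecting on the public score is enough to inflate it; only a set that played no role in the selection gives an honest measure.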

2 Cross-validation is noisy: confidence intervals
[Varoquaux 2017]
[Figure: confidence intervals of cross-validated prediction accuracy for 100, 200, 300, and 1000 available samples, comparing leave-one-out (LOO) with 50 splits at 20% test. Interval bounds shown range from −19% / +15% to −3% / +3%]
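
A back-of-the-envelope view of why these intervals are wide: an accuracy measured on n independent test samples has binomial sampling noise. A minimal sketch using the normal approximation to the binomial, assuming a true accuracy of 50% and independent samples. It is a simplification, not the procedure used to produce the figure, and `binomial_half_width` is a hypothetical helper name.

```python
import math

def binomial_half_width(n_test, accuracy=0.5, z=1.96):
    """Half-width of an approximate 95% interval for an accuracy measured on n_test samples."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)

for n in (100, 200, 300, 1000):
    print(f"n = {n:>4}: accuracy known to about ±{100 * binomial_half_width(n):.0f}%")
```

Under these assumptions the half-widths come out near 10%, 7%, 6%, and 3%: the interval shrinks only with the square root of the sample size.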

2 Analytic variability explores cross-validation noise
[Varoquaux 2017]
Trivial analytic variations on permuted data:
smoothing, SVM vs log-reg, feature selection
[Figure: cross-validation scores for different decoders (axis 30% to 70%), by sessions used: 4 first (n~72), 4 last (n~72), 6 first (n~108), 6 last (n~108), all 12 (n~216). Score ranges shown: 25% to 39%, 40% to 71%, 38% to 57%, 47% to 57%, 44% to 52%]
With small n, by chance, some analytic
choices give seemingly good predictions

2 Less noise in cross-validation, less optimism?
[Varoquaux 2017]
In the literature, effect sizes decrease with sample sizes
[Figure: reported accuracy (50% to 100%) against study sample size (30 to 1000), with the p=.05 threshold, in eight panels: Wolfer2015 (psychiatric diagnostic), Arbabshirani2017 (Alzheimer's), Woo2017 (Alzheimer's), Woo2017 (depression), Brown2017 (connectome learning), Arbabshirani2017 (schizophrenia), Woo2017 (psychosis), Woo2017 (autism)]

Small sample sizes give wiggle room that kills my
trust in publications

3 Vision: Broader validity
Bigger datasets and clinically-useful settings
require rethinking studies

3 Scarcity of outcomes to predict
Supervised learning needs large datasets with labels
Most individuals have no diagnosed condition
UK Biobank: normal aging cohort, n = 440 000
- 500 Alzheimer’s disease
- 500 Schizophrenia
Turn to new outcomes

3 Brain age, a proxy clinical outcome
[Liem... 2017]
Train with chronological age, predict brain aging
Expected age given an image of the brain
Discrepancy with chronological age (brain-age delta)
correlates with cognitive impairment
[Figure: brain aging discrepancy (years) by objective cognitive impairment group: Normal −0.38, Mild 0.74, Major 1.72]
Biomarker from a surrogate outcome:
not directly clinically relevant, but useful
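
A minimal sketch of how a brain-age delta is computed, assuming synthetic features in place of real images and an ordinary least-squares model in place of the learners used in [Liem... 2017]. All variable names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for imaging features: 10 noisy features that drift with age
n_subjects, n_features = 400, 10
age = rng.uniform(20, 80, size=n_subjects)
loadings = rng.normal(size=n_features)
features = np.outer(age, loadings) / 10 + rng.normal(size=(n_subjects, n_features))

# Hold out subjects: the delta must be computed on data the model never saw
train, test = np.arange(0, 300), np.arange(300, n_subjects)

def add_intercept(X):
    """Prepend a column of ones so the linear model has an intercept."""
    return np.column_stack([np.ones(len(X)), X])

# Train with chronological age as the target
coef, *_ = np.linalg.lstsq(add_intercept(features[train]), age[train], rcond=None)
predicted_age = add_intercept(features[test]) @ coef

# Brain-age delta: age expected from the image minus chronological age
brain_age_delta = predicted_age - age[test]
print(f"mean absolute delta: {np.mean(np.abs(brain_age_delta)):.1f} years")
```

In a real study the delta of each held-out subject would then be related to an external measure such as cognitive impairment. Here it only reflects simulated noise.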

3 Proxy mental-health measures [Dadi... 2021]
Pushing beyond brain age to a broader agenda
Machine-learning on imperfect correlates to build
Biomarkers (objectively measured characteristics) of mental health
Extracted despite lack of reliable diagnosis information
That integrate complementary patient information
Applicable to a wide population (beyond a specific disorder),
For:
treatment development
public-health policy
personalized medicine

3 Proxy measures for aging, neuroticism, intelligence [Dadi... 2021]
Machine learning on brain imaging & socio-demographics
to reconstruct canonical assessments of
Aging (measured via age)
Neuroticism, fluid intelligence (measured via questionnaires)

Ecological validity: Strong association to real-life health behavior
[Figure, panel A (proxy measures): specific associations of proxy and target measures with health-related habits. Columns: brain age delta, fluid intelligence, neuroticism (axis −0.2 to 0.2). Rows: # cigarettes smoked (pack-years), sleep duration (hours), metabolic equivalent task (minutes/week), # alcoholic beverages]

Ecological validity: Association stronger than with original measures
[Figure: specific associations of proxy and target measures with health-related habits (# cigarettes smoked in pack-years, sleep duration in hours, metabolic equivalent task in minutes/week, # alcoholic beverages). Panel A, proxy measures: brain age delta, fluid intelligence, neuroticism. Panel B, target measures: age, fluid intelligence, neuroticism. Axes: β from −0.2 to 0.2, ± bootstrap-based uncertainty estimates]
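
A minimal sketch of a bootstrap-based uncertainty estimate for an association coefficient, assuming synthetic data and a single-predictor least-squares slope. The models behind the figure are richer, and the variable names and the effect size of 0.5 are illustrative.

```python
import random
import statistics

random.seed(0)

# Synthetic, standardized data: a health habit (x) and a proxy measure (y)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

beta = slope(x, y)

# Bootstrap: refit on subjects resampled with replacement
boot = []
for _ in range(1000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(slope([x[i] for i in idx], [y[i] for i in idx]))
boot.sort()
low, high = boot[24], boot[974]  # about the 2.5th and 97.5th percentiles of 1000 draws

print(f"beta = {beta:.2f}, 95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```

Resampling subjects rather than residuals keeps each habit paired with its own measure, which is the usual choice when the rows are individuals.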

3 Importance of brain imaging [Dadi... 2021]
[Figure: R² (axis 0.0 to 0.6) for predicting age from brain imaging only, from sociodemographic variables (mood and sentiment, life style, education, age, sex, early life), and from all variables, with and without brain imaging. Value labels shown: 0.67 and 0.62]
Brain imaging helps to measure aging

3 Importance of brain imaging [Dadi... 2021]
[Figure: approximation quality (R², ± CV-based uncertainty estimates) of proxy measures derived from the complete set of sociodemographics with and without brain imaging, for age, fluid intelligence, and neuroticism. Predictors compared: brain imaging only, sociodemographic variables, all variables. Value labels shown: 0.67 and 0.62 (age), 0.16 and 0.17 (fluid intelligence), 0.29 and 0.32 (neuroticism)]
Brain imaging helps to measure aging
But not fluid intelligence & neuroticism
⇒ We must give more importance to socio-demographics
in image analysis

@GaelVaroquaux
Brain imaging, mental health & machine learning:
a bittersweet tale
Machine learning can extract non-trivial biomarkers of mental health,
given sufficient labelled images.
Be creative about those labels: [Dadi... 2021]
proxy measures built on common and ecological assessments
Small sample sizes often undermine my trust in publications
Socio-demographics trump imaging to explain psychological assessments
Field wants mechanisms and intervention targets
Using the wrong methods ⇒ story-telling takes precedence over valid evidence
Poster: How I failed – WTh553

References I
A. Abraham, M. Milham, A. Di Martino, R. C. Craddock, D. Samaras,
B. Thirion, and G. Varoquaux. Deriving reproducible biomarkers from
multi-site resting-state data: An autism-based example. NeuroImage, 2017.
Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints:
preferred definitions and conceptual framework. Clinical Pharmacology and
Therapeutics, 69:89-95, 2001.
K. Dadi, M. Rahim, A. Abraham, D. Chyzhyk, M. Milham, B. Thirion,
G. Varoquaux, and A. D. N. Initiative. Benchmarking functional
connectome-based predictive models for resting-state fMRI. NeuroImage, 2019.
K. Dadi, G. Varoquaux, J. Houenou, D. Bzdok, B. Thirion, and D. Engemann.
Beyond brain age: Empirically-derived proxy measures of mental health. 2020a.
K. Dadi, G. Varoquaux, A. Machlouzarides-Shalit, K. J. Gorgolewski,
D. Wassermann, B. Thirion, and A. Mensch. Fine-grain atlases of functional
modes for fMRI analysis. NeuroImage, in press, 2020b.

References II
K. Dadi, G. Varoquaux, J. Houenou, D. Bzdok, B. Thirion, and D. Engemann.
Population modeling with machine learning can enhance measures of mental
health. GigaScience, 10(10):giab071, 2021.
D. A. Engemann, O. Kozynets, D. Sabbagh, G. Lemaître, G. Varoquaux, F. Liem,
and A. Gramfort. Combining magnetoencephalography with magnetic
resonance imaging enhances learning of surrogate-biomarkers. eLife, 9:e54055,
2020.
F. Liem, G. Varoquaux, J. Kynast, F. Beyer, S. K. Masouleh, J. M. Huntenburg,
L. Lampe, M. Rahim, A. Abraham, R. C. Craddock, ... Predicting brain-age
from multimodal imaging data captures cognitive impairment. NeuroImage,
2017.
R. A. Poldrack, G. Huckins, and G. Varoquaux. Establishment of best practices
for evidence for prediction: a review. JAMA Psychiatry, 77:534, 2020.
B. Thirion, G. Varoquaux, E. Dohmatob, and J. Poline. Which fMRI clustering
gives good brain parcellations? Frontiers in Neuroscience, 8:167, 2014.

References III
N. Traut, K. Heuer, G. Lemaître, A. Beggiato, D. Germanaud, M. Elmaleh,
A. Bethegies, L. Bonasse-Gahot, W. Cai, S. Chambon, ... Insights from an
autism imaging biomarker challenge: promises and threats to biomarker
discovery. medRxiv, 2021.
G. Varoquaux. Cross-validation failure: small sample sizes lead to large error
bars. NeuroImage, 2017.
G. Varoquaux, F. Baronnet, A. Kleinschmidt, P. Fillard, and B. Thirion.
Detection of brain functional-connectivity difference in post-stroke patients
using group-level covariance modeling. In MICCAI. 2010.
