1. Machine learning can extract biomarkers of mental health from brain images, but evidence is often undermined by small sample sizes in studies.
2. Larger datasets and use of proxy measures not based on formal diagnoses may help address this issue and provide more broadly applicable insights.
3. Incorporating additional data such as socioeconomic factors may explain psychological constructs better than brain images alone.
2. Reading health from brain images
[Diagram: brain image → machine learning → mental health]
How old is this person?
Is she at risk of developing Alzheimer’s Disease?
Is she depressed?
Needs sophisticated analysis
G Varoquaux 2
3. 1 Hope: non-trivial imaging biomarkers
2 Achilles’ heel: Evidence for prediction
3 Vision: Broader validity
4. 1 Hope: non-trivial imaging biomarkers
Machine learning can capture
non-trivial mental phenotypes in brain images
Example: Autism Spectrum Disorder
Challenging: spectrum disorder; diagnostics based on symptoms
5. 1 A competition to assess the state of the art
Rich data
Multimodal: cortical thickness & resting-state fMRI
Cohort of 2000 individuals
An open competition
€3000 for the best prediction of autism status
Competition open for 3 months
Trustworthy assessment of state of the art: final prediction score
Best performers: ROC-AUC ≈ 0.8
[Traut... 2021]
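For readers less familiar with the metric: ROC-AUC can be read as the probability that a randomly chosen individual with the condition gets a higher score than a randomly chosen individual without it, so 0.5 is chance and 1.0 is a perfect ranking. A minimal sketch of that definition follows. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not competition data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = diagnosed, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of individuals, which is fine for illustration; a rank-based formula would be the usual choice on a cohort of thousands.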
6. 1 A competition: More data makes a big difference
[Figure: prediction performance (ROC-AUC, 0.75 to 0.85) for different sample sizes, against number of subjects in the training set (500 to 2000), ± 1 std. dev.]
Fit: $\mathrm{AUC}(n) = 0.5 + 0.39\,(1 - e^{-0.047\sqrt{n}})$, with asymptote AUC = 0.89
Amount of data is currently the limiting factor
[Traut... 2021]
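To make the "amount of data is the limiting factor" reading concrete, the sketch below evaluates a saturating learning curve, taking the fitted curve as AUC(n) = 0.5 + 0.39 (1 − exp(−0.047 √n)). The constants 0.5, 0.39 and 0.047 come from the slide. The square root on n is an assumption, chosen because it reproduces the plotted 0.75 to 0.85 range for 500 to 2000 subjects. The function names are hypothetical.

```python
import math

# Constants from the slide; the square root on n is an ASSUMPTION chosen
# to match the plotted range, not something the slide text confirms.
CHANCE = 0.5
GAIN = 0.39
RATE = 0.047


def predicted_auc(n_subjects):
    """Expected ROC-AUC for a training set of n_subjects under the assumed fit."""
    return CHANCE + GAIN * (1.0 - math.exp(-RATE * math.sqrt(n_subjects)))


def subjects_needed(target_auc):
    """Invert the assumed fit: training-set size needed to reach target_auc."""
    asymptote = CHANCE + GAIN
    if not CHANCE <= target_auc < asymptote:
        raise ValueError(f"target must be in [{CHANCE}, {asymptote})")
    remaining = 1.0 - (target_auc - CHANCE) / GAIN
    return (math.log(remaining) / -RATE) ** 2


for n in (500, 1000, 1500, 2000):
    print(f"n = {n:4d}  ->  AUC ~ {predicted_auc(n):.3f}")
print(f"subjects for AUC 0.85: ~{subjects_needed(0.85):.0f}")
```

Under this assumed form the curve gives roughly 0.75 at n = 500 and 0.84 at n = 2000, and it flattens towards 0.89, so each further gain in AUC costs disproportionately more subjects.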
7. 1 A competition: rest fMRI trumps cortical thickness
[Figure: prediction score (ROC-AUC, 0.65 to 0.80) for models using anatomy, functional, anatomy + functional, and anatomy + functional + age + sex]
Obtained by removing anatomy or function from models
[Traut... 2021]
8. Given labels and enough data, machine learning
will extract non-trivial biomarkers of mental health
9. 2 Achilles’ heel: Evidence for prediction
Can we trust published biomarkers of psychiatric conditions?
10. 2 Assessing prediction requires unseen data
[Poldrack... 2020]
[Figure: polynomial fits of order 1, 2 and 15 to the same data, with the mean squared error of each fit]
Quality of fit on the data used to fit is not meaningful
Only new (test) data can measure prediction
11. 2 Evidence for prediction
[Poldrack... 2020]
Established on unseen data
[Diagram: the full data split into a train set and a test set, e.g. by cross-validation]
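A minimal sketch of why the score must come from unseen data. It swaps the polynomial fits of the previous slide for a 1-nearest-neighbour classifier, which memorizes its training set, and uses synthetic labels that are unrelated to the feature, so any apparent skill is spurious. All names, the sample size and the 5-fold split are illustrative assumptions.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of the closest training point (1-nearest neighbour)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points whose predicted label is correct."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(test_x, test_y)
    )
    return hits / len(test_x)


def k_fold_accuracy(xs, ys, n_folds=5):
    """Average accuracy over folds; each sample is tested exactly once."""
    indices = list(range(len(xs)))
    scores = []
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        scores.append(
            accuracy(
                [xs[i] for i in train_idx], [ys[i] for i in train_idx],
                [xs[i] for i in test_idx], [ys[i] for i in test_idx],
            )
        )
    return sum(scores) / n_folds


rng = random.Random(0)
n = 200
xs = [rng.random() for _ in range(n)]        # one synthetic feature
ys = [rng.randint(0, 1) for _ in range(n)]   # labels unrelated to the feature

in_sample = accuracy(xs, ys, xs, ys)         # scored on the data used to fit
cross_validated = k_fold_accuracy(xs, ys)    # scored on unseen data only
print(f"in-sample accuracy:       {in_sample:.2f}")
print(f"cross-validated accuracy: {cross_validated:.2f}")
```

The in-sample score is perfect because every point is its own nearest neighbour, while the cross-validated score stays near chance, which is the honest answer for labels that carry no signal.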
12. 2 Competition: assessment via a private set
[Traut... 2021]
Our competition was evaluated on a hidden private set
[Figure: ROC-AUC of each submission (0.5 to 1.0) on the public set and on the private set]
13. 2 Competition: analysts overfit the public set
[Traut... 2021]
[Figure: scores during the competition: ROC-AUC (0.5 to 1.0) on the public set and the private set at the start, middle and finish]
Humans overfit by optimizing the public-set score
Cross-validation is noisy, not trustworthy
14. 2 Cross-validation is noisy: confidence intervals
[Varoquaux 2017]
[Figure: confidence intervals on cross-validated accuracy against number of available samples (100, 200, 300, 1000), for leave-one-out (LOO) and for 50 splits with 20% test; the listed bounds range from 19% down to 2%]
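A rough sense of where such error bars come from can be had from binomial sampling noise alone. The sketch below computes the central 95% interval of the accuracy of a chance-level binary predictor scored on n independent test samples. It is a simplified model under those stated assumptions, not the procedure of [Varoquaux 2017], and the function name is hypothetical.

```python
from math import comb


def chance_interval(n_test, coverage=0.95):
    """Central interval of accuracy for a chance-level binary predictor.

    The number of correct answers is modelled as Binomial(n_test, 0.5).
    Returns (low, high) accuracies between which at least `coverage` of the
    probability mass falls.
    """
    tail = (1.0 - coverage) / 2.0
    total = 2 ** n_test
    pmf = [comb(n_test, k) / total for k in range(n_test + 1)]

    # Smallest count whose lower cumulative probability exceeds the tail.
    cumulative, low = 0.0, 0
    for k, p in enumerate(pmf):
        cumulative += p
        if cumulative > tail:
            low = k
            break

    # Largest count whose upper cumulative probability exceeds the tail.
    cumulative, high = 0.0, n_test
    for k in range(n_test, -1, -1):
        cumulative += pmf[k]
        if cumulative > tail:
            high = k
            break

    return low / n_test, high / n_test


for n in (100, 200, 300, 1000):
    low, high = chance_interval(n)
    print(f"n = {n:4d}: {low:.1%} to {high:.1%}")
```

The interval width shrinks only as the square root of the sample size, so going from 100 to 1000 samples narrows it by about a factor of three rather than ten.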
15. 2 Analytic variability explores cross-validation noise
[Varoquaux 2017]
Trivial analytic variations on permuted data:
smoothing, SVM vs log-reg, feature selection
[Figure: cross-validation scores (30% to 70%) for different decoders, by sessions used: 4 first (n~72), 4 last (n~72), 6 first (n~108), 6 last (n~108), all 12 (n~216); listed score ranges: 25% to 39%, 40% to 71%, 38% to 57%, 47% to 57%, 44% to 52%]
With small n, by chance, some analytic
choices give seemingly good predictions
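The selection effect can be simulated in a few lines. In the sketch below each analytic variant is a coin-flip predictor, the best of 20 variants is kept, and the same pipeline is then scored on fresh data. Treating the variants as independent is a simplifying assumption (real variants share the same data and are correlated), and the number of variants and repeats are hypothetical.

```python
import random


def chance_score(n_samples, rng):
    """Accuracy of a coin-flip predictor evaluated on n_samples test cases."""
    hits = sum(rng.random() < 0.5 for _ in range(n_samples))
    return hits / n_samples


def best_of_variants(n_samples, n_variants, rng):
    """Pick the best-scoring of n_variants chance-level pipelines.

    Returns the winning score and the score the same pipeline gets on fresh
    data of the same size (independent, since every pipeline is at chance).
    """
    selected = max(chance_score(n_samples, rng) for _ in range(n_variants))
    fresh = chance_score(n_samples, rng)
    return selected, fresh


rng = random.Random(0)
n_variants = 20      # hypothetical number of analytic choices tried
n_repeats = 500      # simulated studies per sample size

mean_selected, mean_fresh = {}, {}
for n in (72, 108, 216):
    runs = [best_of_variants(n, n_variants, rng) for _ in range(n_repeats)]
    mean_selected[n] = sum(s for s, _ in runs) / n_repeats
    mean_fresh[n] = sum(f for _, f in runs) / n_repeats
    print(f"n = {n:3d}: selected {mean_selected[n]:.1%}, fresh data {mean_fresh[n]:.1%}")
```

The selected score sits above chance at every sample size and the gap is largest at the smallest n, while the fresh-data score returns to about 50%. That is the same pattern as the public-set versus private-set gap in the competition.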
16. 2 Less noise in cross-validation, less optimism?
[Varoquaux 2017]
In the literature, effect sizes decrease with sample sizes
[Figure: reported accuracy (50% to 100%) against study sample size (30 to 1000), with a p=.05 line, in eight panels: Wolfer2015, psychiatric diagnostic; Arbabshirani2017, Alzheimer's; Woo2017, Alzheimer's; Woo2017, depression; Brown2017, connectome learning; Arbabshirani2017, schizophrenia; Woo2017, psychosis; Woo2017, autism]
17. Small sample sizes give wiggle room that kills my
trust in publications
18. 3 Vision: Broader validity
Bigger datasets and clinically-useful settings
require rethinking studies
19. 3 Scarcity of outcomes to predict
Supervised learning needs large datasets with labels
Most individuals have no diagnosed condition
UK Biobank: normal aging cohort, n = 440 000
- 500 Alzheimer’s disease
- 500 Schizophrenia
Turn to new outcomes
20. 3 Brain age, a proxy clinical outcome
[Liem... 2017]
Train with chronological age, predict brain aging
Expected age given an image of the brain
Discrepancy with chronological age (brain-age delta)
correlates with cognitive impairment
[Figure: brain aging discrepancy (years) by objective cognitive impairment group: Normal −0.38, Mild 0.74, Major 1.72]
Biomarker from a surrogate outcome: not directly clinically relevant, but useful
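The brain-age recipe can be sketched as: fit a model of chronological age on some individuals, predict age for unseen individuals, and take predicted minus chronological age as the brain-age delta. The example below uses a one-feature least-squares line on synthetic data. The feature, its relation to age and the cohort size are all made-up assumptions, far simpler than the multimodal models of [Liem... 2017].

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


rng = random.Random(0)

# Synthetic cohort: one made-up imaging feature that drifts with age.
ages = [rng.uniform(20, 80) for _ in range(400)]
feature = [0.05 * age + rng.gauss(0, 0.5) for age in ages]

# Fit on one half, compute the delta on the other half (unseen data).
train_x, test_x = feature[:200], feature[200:]
train_age, test_age = ages[:200], ages[200:]
slope, intercept = fit_line(train_x, train_age)

predicted_age = [slope * x + intercept for x in test_x]
brain_age_delta = [p - a for p, a in zip(predicted_age, test_age)]

mean_delta = sum(brain_age_delta) / len(brain_age_delta)
print(f"mean brain-age delta on held-out individuals: {mean_delta:+.2f} years")
```

In a real study the per-individual deltas, not their mean, are the quantity of interest: they are then related to an outcome such as the cognitive impairment groups above.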
21. 3 Proxy mental-health measures [Dadi... 2021]
Pushing beyond brain age to a broader agenda
Machine-learning on imperfect correlates to build
Biomarkers (objectively measured characteristics) of mental health
Extracted despite lack of reliable diagnosis information
That integrate complementary patient information
Applicable to a wide population (beyond a specific disorder),
For:
treatment development
public-health policy
personalized medicine
22. 3 Proxy measures for aging, neuroticism, intelligence [Dadi... 2021]
Machine learning on brain imaging & socio-demographics
to reconstruct canonical assessments of
Aging (measured via age)
Neuroticism, fluid intelligence (measured via questionnaires)
23. 3 Proxy measures for aging, neuroticism, intelligence [Dadi... 2021]
Ecological validity: Strong association to real-life health behavior
[Figure, panel A (proxy measure): associations (−0.2 to 0.2) of Brain Age Delta, Fluid Intelligence and Neuroticism with health-related habits: # cigarettes smoked (pack-years), sleep duration (hours), metabolic equivalent task (minutes/week), # alcoholic beverages. Caption: Specific associations of proxy and target measures with health-related habits]
24. 3 Proxy measures for aging, neuroticism, intelligence [Dadi... 2021]
Ecological validity: Association stronger than with original measures
[Figure, panel A (proxy measure): associations (βproxy ± bootstrap-based uncertainty estimates, −0.2 to 0.2) of Brain Age Delta, Fluid Intelligence and Neuroticism with health-related habits: # cigarettes smoked (pack-years), sleep duration (hours), metabolic equivalent task (minutes/week), # alcoholic beverages. Panel B (target measure): the same associations for Age, Fluid Intelligence and Neuroticism. Caption: Specific associations of proxy and target measures with health-related habits]
25. 3 Importance of brain imaging [Dadi... 2021]
[Figure: R2 (0.0 to 0.6) for predicting Age from all variables, with and without brain imaging (listed values 0.67 and 0.62), and from brain imaging only. Predictor blocks: mood and sentiment; life style, education; age, sex, early life]
Brain imaging helps to measure aging
26. 3 Importance of brain imaging [Dadi... 2021]
[Figure: R2 (± CV-based uncertainty estimates) from all variables, with and without brain imaging, and from brain imaging only. Listed values: Age 0.67 and 0.62; Fluid intelligence 0.16 and 0.17; Neuroticism 0.29 and 0.32. Predictor blocks: mood and sentiment; life style, education; age, sex, early life. Caption: Approximation quality of proxy measures derived from complete set of sociodemographics with and without brain imaging]
Brain imaging helps to measure aging
But not fluid intelligence & neuroticism
⇒ We must give more importance to socio-demographics
in image analysis
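The comparison behind this slide, held-out R2 with and without the imaging block, can be sketched on synthetic data. Below, one made-up target depends on both a socio-demographic score and an imaging score, and another depends on the socio-demographic score only. All variables, effect sizes and function names are illustrative assumptions, not values from [Dadi... 2021].

```python
import random


def fit_one_feature(x, y):
    """Least squares for y = a*x + c."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((u - mx) * (t - my) for u, t in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return a, my - a * mx


def fit_two_features(x1, x2, y):
    """Least squares for y = a*x1 + b*x2 + c via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (t - my) for u, t in zip(x1, y))
    s2y = sum((v - m2) * (t - my) for v, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b, my - a * m1 - b * m2


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - residual / total


def held_out_r2(socio, imaging, target, n_train):
    """R2 on unseen individuals, without and with the imaging feature."""
    tr, te = slice(0, n_train), slice(n_train, None)
    a, c = fit_one_feature(socio[tr], target[tr])
    without = r2(target[te], [a * s + c for s in socio[te]])
    a, b, c = fit_two_features(socio[tr], imaging[tr], target[tr])
    with_imaging = r2(
        target[te], [a * s + b * i + c for s, i in zip(socio[te], imaging[te])]
    )
    return without, with_imaging


rng = random.Random(0)
n = 1000
socio = [rng.gauss(0, 1) for _ in range(n)]     # synthetic socio-demographic score
imaging = [rng.gauss(0, 1) for _ in range(n)]   # synthetic imaging score

# Two made-up targets: one that imaging informs, one that it does not.
targets = {
    "imaging-informed": [s + i + rng.gauss(0, 0.5) for s, i in zip(socio, imaging)],
    "imaging-uninformed": [s + rng.gauss(0, 1.0) for s in socio],
}

results = {}
for name, target in targets.items():
    results[name] = held_out_r2(socio, imaging, target, n_train=500)
    without, with_imaging = results[name]
    print(f"{name}: R2 without imaging {without:.2f}, with imaging {with_imaging:.2f}")
```

Scoring both models on the same held-out individuals is what makes the two R2 values comparable; an in-sample R2 can only go up when a feature is added, whether or not it is informative.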
27. @GaelVaroquaux
Brain imaging, mental health & machine learning:
a bittersweet tale
Machine learning can extract non-trivial biomarkers of mental health,
given sufficient labelled images.
Be creative about those labels: [Dadi... 2021]
proxy measures built on common and ecological assessments
Small sample sizes often undermine my trust in publications
Socio-demographics trump imaging to explain psychological assessments
The field wants mechanisms and intervention targets
Using the wrong methods ⇒ story-telling takes precedence over valid evidence
Poster: How I failed – WTh553
28. References I
A. Abraham, M. Milham, A. Di Martino, R. C. Craddock, D. Samaras,
B. Thirion, and G. Varoquaux. Deriving reproducible biomarkers from
multi-site resting-state data: An autism-based example. NeuroImage, 2017.
Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints:
preferred definitions and conceptual framework. Clinical Pharmacology and
Therapeutics, 69:89-95, 2001.
K. Dadi, M. Rahim, A. Abraham, D. Chyzhyk, M. Milham, B. Thirion,
G. Varoquaux, and Alzheimer's Disease Neuroimaging Initiative. Benchmarking
functional connectome-based predictive models for resting-state fMRI. NeuroImage, 2019.
K. Dadi, G. Varoquaux, J. Houenou, D. Bzdok, B. Thirion, and D. Engemann.
Beyond brain age: Empirically-derived proxy measures of mental health. 2020a.
K. Dadi, G. Varoquaux, A. Machlouzarides-Shalit, K. J. Gorgolewski,
D. Wassermann, B. Thirion, and A. Mensch. Fine-grain atlases of functional
modes for fMRI analysis. NeuroImage, in press, 2020b.
29. References II
K. Dadi, G. Varoquaux, J. Houenou, D. Bzdok, B. Thirion, and D. Engemann.
Population modeling with machine learning can enhance measures of mental
health. GigaScience, 10(10):giab071, 2021.
D. A. Engemann, O. Kozynets, D. Sabbagh, G. Lemaître, G. Varoquaux, F. Liem,
and A. Gramfort. Combining magnetoencephalography with magnetic
resonance imaging enhances learning of surrogate-biomarkers. eLife, 9:e54055,
2020.
F. Liem, G. Varoquaux, J. Kynast, F. Beyer, S. K. Masouleh, J. M. Huntenburg,
L. Lampe, M. Rahim, A. Abraham, R. C. Craddock, ... Predicting brain-age
from multimodal imaging data captures cognitive impairment. NeuroImage,
2017.
R. A. Poldrack, G. Huckins, and G. Varoquaux. Establishment of best practices
for evidence for prediction: a review. JAMA Psychiatry, 77:534, 2020.
B. Thirion, G. Varoquaux, E. Dohmatob, and J. Poline. Which fMRI clustering
gives good brain parcellations? Frontiers in Neuroscience, 8:167, 2014.
30. References III
N. Traut, K. Heuer, G. Lemaître, A. Beggiato, D. Germanaud, M. Elmaleh,
A. Bethegies, L. Bonasse-Gahot, W. Cai, S. Chambon, ... Insights from an
autism imaging biomarker challenge: promises and threats to biomarker
discovery. medRxiv, 2021.
G. Varoquaux. Cross-validation failure: small sample sizes lead to large error
bars. NeuroImage, 2017.
G. Varoquaux, F. Baronnet, A. Kleinschmidt, P. Fillard, and B. Thirion.
Detection of brain functional-connectivity difference in post-stroke patients
using group-level covariance modeling. In MICCAI. 2010.