Patient subtypes: real or not?

(Clustering, you’re doing it wrong)
Paul Agapow
Data Science Institute
<p.agapow@ic.ac.uk>
@agapow
March 2018

Patient assignment is a clustering problem
Precision medicine
(assign patients to clusters)
Translational medicine
(infer clusters from patients)

Clustering requires many decisions
• Similarity
• Group boundaries
• Spectra
• Ground truth
• Noise / non-clustered
• Resolution / cut-offs
• Uninteresting clusters
• Stable & robust

So are clusters real?
Every dataset contains clusters, with a different set of clusters being revealed by different methods, but not all of these clusters are real or interesting or meaningful.
(Paraphrased from Christian Hennig)

Example: Diabetes
• 6 clinical vars produce 5 clusters
• Validated in other datasets

But!
After van Smeden, Harrell, Dahly:
• Dimension reduction from 6 vars to only 5 clusters?
• 2 related random vars can generate 6 “clusters”
• Not validated in other data types
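
The second point can be illustrated with a small simulation. This is a minimal sketch, not the analysis the critics ran: it draws two correlated random variables with no subgroup structure at all and runs a plain k-means on them. The `kmeans` helper, the sample size, the noise level and the seeds are all assumptions made for the illustration.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Move each centroid to the mean of its members; keep it if it has none.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels

# Hypothetical data: y is just x plus noise, so there are no true subgroups.
rng = random.Random(42)
points = []
for _ in range(500):
    x = rng.gauss(0, 1)
    points.append((x, x + rng.gauss(0, 0.5)))

labels = kmeans(points, k=6)
print(sorted(set(labels)))  # the cluster labels that k-means actually used
```

k-means partitions the data into however many groups it is asked for, so asking for six yields a six-way split of what is a single continuous cloud. The partition alone says nothing about whether subtypes exist.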

Example: Asthma
• Complex & heterogeneous
• Many attempts to stratify: 3, 4, 6+ clusters

Are asthma clusters real?
• Use multiple methods & multiple datasets to validate
• Compare nested clusters with homogeneity & completeness
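
Homogeneity and completeness can be sketched as follows, assuming the usual entropy-based definitions (homogeneity = 1 - H(C|K)/H(C), completeness = 1 - H(K|C)/H(K), for reference labels C and cluster labels K). The two toy partitions are hypothetical and are not the asthma data.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """Conditional entropy H(target | given), in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())

def homogeneity(reference, clusters):
    """1.0 when every cluster contains members of a single reference group."""
    h_ref = entropy(reference)
    return 1.0 if h_ref == 0 else 1.0 - conditional_entropy(reference, clusters) / h_ref

def completeness(reference, clusters):
    """1.0 when every reference group falls entirely inside one cluster."""
    return homogeneity(clusters, reference)

# Hypothetical nested partitions: each coarse group is split into two fine groups.
coarse = [0, 0, 0, 0, 1, 1, 1, 1]
fine = [0, 0, 1, 1, 2, 2, 3, 3]

h = homogeneity(coarse, fine)
c = completeness(coarse, fine)
print(h, c)
```

For this perfectly nested pair, homogeneity is 1 (no fine cluster mixes coarse groups) while completeness is about 0.5 (each coarse group is split in two). A fine partition that scores well below 1 on homogeneity against a coarser one is not a refinement of it.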

Are all asthma clusters equally real?
• Identifying TAC3a & b requires many more genes than TAC1 or TAC2
• Because different clusters cluster differently

Biclustering
• Better approach?
• Simultaneously group samples & features
• Relieves assumption of clustering everything
Take home
• There are many ways to cluster & thus many clusters
• Lots of different ways to be a cluster, even in same dataset
• Too easy to fish
• Validate with other data types

Or ...
Clustering methods are hypothesis generators, cluster partitions are hypotheses and need to be validated or proven to be useful.

Thanks
• Nazanin Kermani
• Mansoor Saqi
• Axel Oehmichen
