Patient assignment is a clustering problem
Precision medicine
(assign patients to clusters)
Translational medicine
(infer clusters from patients)
Clustering requires many decisions
• Similarity
• Group boundaries
• Spectra
• Ground truth
• Noise / non-clustered
• Resolution / cut-offs
• Uninteresting clusters
• Stable & robust
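To illustrate the first decision (similarity), here is a minimal sketch showing that the "most similar" patient can change with nothing more than a rescaling of the features. The five patients, their values, and the helper names `rescale` and `nearest` are all invented for illustration; min-max scaling is just one of many possible choices.

```python
import math

# Hypothetical patients: (age in years, biomarker in mmol/L). Invented values.
patients = {
    "ref": (50.0, 5.0),
    "a": (52.0, 9.0),
    "b": (58.0, 5.2),
    "c": (80.0, 7.0),
    "d": (30.0, 6.0),
}

def rescale(data):
    """Min-max scale every feature to [0, 1] across all patients."""
    columns = list(zip(*data.values()))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return {
        name: tuple((v - lo) / span for v, lo, span in zip(values, lows, spans))
        for name, values in data.items()
    }

def nearest(data, target):
    """Name of the patient closest to `target` by Euclidean distance."""
    others = [name for name in data if name != target]
    return min(others, key=lambda name: math.dist(data[name], data[target]))

nearest_raw = nearest(patients, "ref")              # age dominates the distance
nearest_scaled = nearest(rescale(patients), "ref")  # both features weigh equally
print(nearest_raw, nearest_scaled)  # prints: a b
```

On the raw scale patient "a" is closest to "ref"; after rescaling it is patient "b". Any clustering built on these distances inherits that choice.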
So are clusters real?
Every dataset contains clusters, with
different sets of clusters being revealed by
different methods, but not all of these
clusters are real or interesting or
meaningful.
(Paraphrased from Christian Hennig)
But!
After van Smeden,
Harrell, Dahly:
• Dimension reduction
from 6 to 5 only?
• 2 related random
vars can generate 6
“clusters”
• Not validated in
other data types
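The second point can be sketched in a few lines. This is not the analysis from the critique itself, only an assumed toy setup: 500 draws of two correlated Gaussian variables (one continuous blob, no built-in groups), partitioned by a plain Lloyd's k-means with k = 6. The correlation strength, sample size, seeds and the `kmeans` helper are all illustrative choices.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres are random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centers[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

# Two related random variables: y depends linearly on x plus noise.
rng = random.Random(42)
points = []
for _ in range(500):
    x = rng.gauss(0, 1)
    y = 0.8 * x + 0.6 * rng.gauss(0, 1)
    points.append((x, y))

labels = kmeans(points, k=6)
sizes = [labels.count(j) for j in range(6)]
print(sizes)  # cluster sizes; the algorithm partitions the blob regardless
```

The algorithm has no way to answer "there are no clusters here": asked for six groups, it carves the single blob into pieces and reports them.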
Take home
• There are many ways to cluster &
thus many clusters
• Lots of different ways to be a cluster,
even in the same dataset
• Too easy to fish
• Validate with other data types
Or ...
Clustering methods are hypothesis
generators, cluster partitions are
hypotheses and need to be validated or
proven to be useful.
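One possible way to treat a partition as a hypothesis is a permutation test against a variable that was not used for clustering. The sketch below is only an assumed example: the labels, the external measurements, and the function names `gap` and `permutation_p_value` are invented, and it handles just two clusters.

```python
import random

def mean(values):
    return sum(values) / len(values)

def gap(labels, values):
    """Absolute difference in mean external value between clusters 0 and 1."""
    group0 = [v for lab, v in zip(labels, values) if lab == 0]
    group1 = [v for lab, v in zip(labels, values) if lab == 1]
    return abs(mean(group0) - mean(group1))

def permutation_p_value(labels, values, n_perm=2000, seed=0):
    """Share of label shufflings whose gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = gap(labels, values)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between labels and values
        if gap(shuffled, values) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Invented data: cluster labels from one data type, and a measurement
# from a different data type that was NOT used to build the clusters.
labels = [0] * 10 + [1] * 10
external = [float(i) for i in range(10)] + [float(100 + i) for i in range(10)]

p = permutation_p_value(labels, external)
print(p)
```

A small p-value here says only that the partition carries information about the external variable; it does not show that the clusters are the right ones or the only ones.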