Clustering Microarray Data
Heather Turner
Department of Statistics
University of Warwick, UK
Heather Turner (University of Warwick) 1/9
Overview of Microarray Experiment
Array of p genes (×n) → Scanned image (×n) → n × p matrix
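A minimal sketch of the data layout, assuming each scanned array has already been reduced to one expression value per gene. The gene names and values below are hypothetical. Rows of the n × p matrix are samples and columns are genes, so clustering genes groups columns and clustering samples groups rows.

```python
from typing import Dict, List

Matrix = List[List[float]]

def build_matrix(arrays: List[Dict[str, float]], genes: List[str]) -> Matrix:
    """Stack n arrays into an n x p matrix: row i is array (sample) i, column j is gene j."""
    return [[array[g] for g in genes] for array in arrays]

def sample_profile(X: Matrix, i: int) -> List[float]:
    """Row i: expression of all p genes in sample i."""
    return list(X[i])

def gene_profile(X: Matrix, j: int) -> List[float]:
    """Column j: expression of gene j across all n samples."""
    return [row[j] for row in X]

# Hypothetical data: p = 3 genes measured on n = 2 arrays.
genes = ["g1", "g2", "g3"]
arrays = [
    {"g1": 0.5, "g2": -1.0, "g3": 2.0},
    {"g1": 0.7, "g2": -0.8, "g3": 1.5},
]
X = build_matrix(arrays, genes)
print(len(X), len(X[0]))   # 2 3  (n, p)
print(gene_profile(X, 2))  # [2.0, 1.5]
```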
Example: Serum Stimulation of Human Fibroblasts
(Eisen, Spellman, Brown & Botstein, PNAS, 1998)
9,800 spots representing 8,600 genes
12 samples taken over a 24-hour period
Highlighted clusters can be roughly categorised as genes involved in:
A cholesterol biosynthesis
B the cell cycle
C the immediate–early response
D signaling and angiogenesis
E wound healing and tissue remodelling
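Eisen et al. grouped genes with hierarchical clustering. The sketch below is a generic, naive average-linkage version (not their software), assuming a distance of 1 minus the Pearson correlation between gene profiles. The four time-course profiles are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles, k):
    """Merge gene profiles bottom-up until k clusters remain.

    Gene-to-gene distance is 1 - Pearson correlation; cluster-to-cluster
    distance is the mean of all gene-to-gene distances between the two clusters.
    Naive O(p^3) loop, suitable only for small examples.
    """
    d = {}
    for i, j in combinations(range(len(profiles)), 2):
        d[i, j] = d[j, i] = 1 - pearson(profiles[i], profiles[j])
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            avg = sum(d[i, j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters

# Hypothetical time courses for four genes over four samples.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.5]]
print(average_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```

Genes 0 and 1 rise over time while genes 2 and 3 fall, so the correlation distance separates them even though their absolute levels differ.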
Why the need for specialised techniques?
Application
Dimensions of the data are nonstandard (small n, large p: far more genes than samples)
Structure
Both genes and sample clusters may be of interest
Co-expression may be restricted to a subset of the attributes
Genes/samples may belong to more than one group
Many “uninteresting” genes
Nature
Clusters of interest may not be characterised by similar expression profiles
Samples may be taken over time
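A small numerical illustration of why co-expression restricted to a subset of the attributes is easy to miss with whole-profile similarity. The two hypothetical genes below move together over the first six samples and in opposite directions over the last six, so their correlation over all twelve samples is zero even though it is perfect on the subset.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical profiles over 12 samples: the genes rise together in
# samples 1-6, then move in opposite directions in samples 7-12.
x = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]

print(round(pearson(x, y), 3))          # 0.0  (all 12 samples)
print(round(pearson(x[:6], y[:6]), 3))  # 1.0  (first 6 samples only)
```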
One-way Clustering Techniques
Increased structural flexibility
Overlapping non-exhaustive clusters
  Gene shaving: Hastie et al, Genome Biol., 2000
Context-specific clusters
  Clustering On Subsets of Attributes (COSA): Friedman and Meulman, JRSS B, 2004
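A rough sketch of the shaving loop behind gene shaving, as we understand Hastie et al.'s description: compute the leading principal component of the row-centred gene profiles, discard the fraction alpha of genes least aligned with it, and repeat, giving a nested sequence of candidate clusters. Choosing the cluster size (Hastie et al. use a gap statistic) and orthogonalising before searching for further clusters are omitted. In this sketch genes are rows, the transpose of the n × p layout, and the data are hypothetical.

```python
import math
import random

def center_rows(X):
    """Centre each gene (row) to mean zero across samples."""
    out = []
    for row in X:
        m = sum(row) / len(row)
        out.append([v - m for v in row])
    return out

def leading_pc(X, iters=200, seed=0):
    """Unit vector over samples: leading eigenvector of X^T X, by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() for _ in X[0]]
    for _ in range(iters):
        Xv = [sum(a * b for a, b in zip(row, v)) for row in X]  # X v
        w = [sum(X[g][s] * Xv[g] for g in range(len(X))) for s in range(len(v))]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def shave(X, alpha=0.1):
    """Nested gene clusters from repeated shaving (rows of X are genes).

    Each step removes the fraction alpha (at least one gene) whose centred
    profiles have the smallest |inner product| with the leading PC.
    """
    Xc = center_rows(X)
    genes = list(range(len(X)))
    nested = [list(genes)]
    while len(genes) > 1:
        v = leading_pc([Xc[g] for g in genes])
        score = {g: abs(sum(a * b for a, b in zip(Xc[g], v))) for g in genes}
        keep = len(genes) - max(1, int(alpha * len(genes)))
        genes = sorted(sorted(genes, key=score.get, reverse=True)[:keep])
        nested.append(genes)
    return nested

# Hypothetical data: genes 0-2 share an alternating pattern, genes 3-5 are small noise.
pattern = [1, -1, 1, -1, 1, -1]
X = [[2 * v for v in pattern], [3 * v for v in pattern], [4 * v for v in pattern],
     [0.1, 0.0, -0.1, 0.0, 0.1, -0.1],
     [0.0, 0.2, 0.0, -0.1, -0.1, 0.0],
     [-0.1, 0.1, 0.1, -0.1, 0.0, 0.0]]
nested = shave(X)
print(nested[3])   # [0, 1, 2]
print(nested[-1])  # [2]
```

Because shaving only removes genes, each cluster in the sequence is a subset of the previous one; genes outside every selected cluster are simply left unclustered, which is what makes the clusters non-exhaustive.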
Two-way Clustering Techniques
Use conventional one-way methods iteratively
Sample clusters within gene clusters
  Inter-related two-way clustering: Tang et al, BIBE 01
  EMMIX-GENE: McLachlan et al, Bioinformatics, 2002
Clusters within two-way clusters
  Coupled Two-Way Clustering (CTWC): Getz et al, PNAS, 2003
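The idea of applying a conventional one-way method iteratively can be sketched as: cluster the genes, then cluster the samples within each gene cluster using only that cluster's genes. This is a generic illustration using plain k-means with a simplistic initialisation (the first k points); it is not the algorithm of any of the methods cited above, and the data are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain k-means; centres start at the first k points (a simplistic choice)."""
    centres = [list(pt) for pt in points[:k]]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(pt, centres[c])) for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_samples_within_gene_clusters(X, k_genes, k_samples):
    """X is n x p (rows = samples, columns = genes).

    Step 1: cluster the p gene profiles (columns of X).
    Step 2: within each gene cluster, cluster the n samples using only those genes.
    """
    gene_profiles = [[row[j] for row in X] for j in range(len(X[0]))]
    gene_labels = kmeans(gene_profiles, k_genes)
    result = {}
    for g in range(k_genes):
        cols = [j for j, lab in enumerate(gene_labels) if lab == g]
        if cols:
            sub = [[row[j] for j in cols] for row in X]
            result[tuple(cols)] = kmeans(sub, k_samples)
    return result

# Hypothetical 4 samples x 4 genes.
X = [[5, 0, 5, 0],
     [5, 5, 4, 4],
     [0, 0, 0, 0],
     [0, 5, 0, 5]]
print(cluster_samples_within_gene_clusters(X, 2, 2))
# {(0, 2): [0, 0, 1, 1], (1, 3): [0, 1, 0, 1]}
```

The two gene clusters induce different sample partitions, which a single clustering of the samples on all genes could not express.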
Co-clustering Techniques
Simultaneously cluster both genes and samples
Two-way partition
  Spectral bi-clustering: Kluger, Genome Res., 2003
  Co-clustering: Cho, SIAM ICDM 04
Conjugate clusters
  Double Conjugated Clustering (DCC): Busygin et al, SIAM ICDM 02
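A two-way partition assigns every gene and every sample to exactly one cluster, so the matrix splits into a grid of blocks. The sketch below is a simplified block-mean scheme, loosely in the spirit of sum-squared-residue co-clustering but not Cho's published algorithm: starting from given labels, it alternately moves each row, then each column, to the cluster whose block means fit it best. The data and starting labels are hypothetical.

```python
def block_means(X, r, c, kr, kc):
    """Mean of each (row cluster, column cluster) block; 0.0 for empty blocks."""
    sums = [[0.0] * kc for _ in range(kr)]
    counts = [[0] * kc for _ in range(kr)]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            sums[r[i]][c[j]] += v
            counts[r[i]][c[j]] += 1
    return [[s / m if m else 0.0 for s, m in zip(srow, mrow)]
            for srow, mrow in zip(sums, counts)]

def co_cluster(X, r, c, kr, kc, iters=10):
    """Alternately reassign rows and columns to reduce the squared residue
    sum_ij (X[i][j] - M[r[i]][c[j]])^2, where M holds the block means."""
    for _ in range(iters):
        M = block_means(X, r, c, kr, kc)
        r = [min(range(kr), key=lambda a: sum((v - M[a][c[j]]) ** 2
                                              for j, v in enumerate(row)))
             for row in X]
        M = block_means(X, r, c, kr, kc)
        c = [min(range(kc), key=lambda b: sum((X[i][j] - M[r[i]][b]) ** 2
                                              for i in range(len(X))))
             for j in range(len(X[0]))]
    return r, c

# Hypothetical checkerboard with rows and columns interleaved.
X = [[9, 1, 9, 1],
     [1, 9, 1, 9],
     [9, 1, 9, 1],
     [1, 9, 1, 9]]
print(co_cluster(X, [0, 1, 1, 1], [0, 1, 1, 1], 2, 2))
# ([0, 1, 0, 1], [0, 1, 0, 1])
```

Like k-means, the result depends on the starting labels; a uniform start such as all-equal block means can leave the scheme stuck, so several starts are usually worth trying.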
Biclustering Techniques
Retrieve isolated two-way clusters: biclusters
Clusters based on latent model
  Rich probabilistic models: Segal et al, Bioinformatics, 2001
  Plaid models: Lazzeroni and Owen, Statist. Sinica, 2002
Biclusters
  SAMBA: Tanay et al, Bioinformatics, 2002
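For the plaid model, the latent layered structure can be written roughly as below. This is stated from our reading of Lazzeroni and Owen, so treat the exact parametrisation as indicative. Here i indexes genes and j samples (the transpose of the n × p layout), layer 0 is the background, and ρ_ik and κ_jk are 0/1 indicators that gene i and sample j belong to layer k. Since a gene or sample may belong to several layers, biclusters can overlap.

```latex
Y_{ij} = \theta_{ij0} + \sum_{k=1}^{K} \theta_{ijk}\,\rho_{ik}\,\kappa_{jk},
\qquad \theta_{ijk} = \mu_k + \alpha_{ik} + \beta_{jk},
\qquad \rho_{ik},\ \kappa_{jk} \in \{0, 1\}
```

Here μ_k is the layer mean and α_ik, β_jk are gene and sample effects within layer k. Layers are typically fitted by least squares one at a time, each on the residuals left by the previous layers.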
Current Situation
Many novel methods, few used in practice
Molecular biologists often have limited (access to) statistical expertise
Limited number of methods in publicly available software
Little work on performance evaluation
Development of methods continues
Improved algorithms
Time series
Three-way data
Integration of other sources of data
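On performance evaluation: when reference labels are available (for example, known functional categories or simulated truth), one common yardstick is the adjusted Rand index of Hubert and Arabie. It equals 1 for identical partitions up to relabelling and is close to 0 for chance-level agreement. The minimal stdlib sketch below is offered as one possible measure, not a method from these slides.

```python
from collections import Counter
from math import comb

def pairs(counts):
    """Number of unordered pairs within groups of the given sizes."""
    return sum(comb(c, 2) for c in counts)

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same n >= 2 items."""
    n = len(labels_a)
    index = pairs(Counter(zip(labels_a, labels_b)).values())  # pairs together in both
    sum_a = pairs(Counter(labels_a).values())
    sum_b = pairs(Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))                   # 1.0
print(round(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]), 4))  # 0.2424
```

The index compares partitions only, so it suits exhaustive one-way or two-way partitions. Overlapping or non-exhaustive clusters and biclusters need other measures.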