
Clustering Microarray Data



  1. Clustering Microarray Data
     Heather Turner, Department of Statistics, University of Warwick, UK
  2. Overview of Microarray Experiment
     Array of p genes (×n) → Scanned image (×n) → n × p matrix
  3. Example: Serum Stimulation of Human Fibroblasts (Eisen, Spellman, Brown & Botstein, PNAS, 1998)
     • 9,800 spots representing 8,600 genes
     • 12 samples taken over 24 hour period
     • Highlighted clusters can be roughly categorised as genes involved in
       A cholesterol biosynthesis
       B the cell cycle
       C the immediate–early response
       D signaling and angiogenesis
       E wound healing and tissue remodelling
  4. Why the need for specialised techniques?
     • Application
       • Dimensions of the data are nonstandard (small n, large p: few samples, many genes)
     • Structure
       • Both gene and sample clusters may be of interest
       • Co-expression may be restricted to a subset of the attributes
       • Genes/samples may belong to more than one group
       • Many “uninteresting” genes
     • Nature
       • Clusters of interest may not be characterised by a similar expression profile
       • Samples may be taken over time
  5. One-way Clustering Techniques
     Increased structural flexibility
     • Overlapping non-exhaustive clusters
       • Gene shaving: Hastie et al, Genome Biol., 2000
     • Context-specific clusters
       • Clustering On Subsets of Attributes (COSA): Friedman and Meulman, JRSS B, 2004
  6. Two-way Clustering Techniques
     Use conventional one-way methods iteratively
     • Sample clusters within gene clusters
       • Inter-related two-way clustering: Tang et al, BIBE 01
       • EMMIX-GENE: McLachlan et al, Bioinformatics, 2002
     • Clusters within two-way clusters
       • Coupled Two-Way Clustering (CTWC): Getz et al, PNAS, 2003
  7. Co-clustering Techniques
     Simultaneously cluster both genes and samples
     • Two-way partition
       • Spectral bi-clustering: Kluger, Genome Res., 2003
       • Co-clustering: Cho, SIAM ICDM 04
     • Conjugate clusters
       • Double Conjugated Clustering (DCC): Busygin et al, SIAM ICDM 02
  8. Biclustering Techniques
     Retrieve isolated two-way clusters: biclusters
     • Clusters based on latent model
       • Rich probabilistic models: Segal et al, Bioinformatics, 2001
       • Plaid models: Lazzeroni and Owen, Statist. Sinica, 2002
     • Biclusters
       • SAMBA: Tanay et al, Bioinformatics, 2002
  9. Current Situation
     • Many novel methods, few used in practice
       • Molecular biologists often have limited (access to) statistical expertise
       • Limited number of methods in publicly available software
       • Little work on performance evaluation
     • Development of methods continues
       • Improved algorithms
       • Time series
       • Three-way data
       • Integration of other sources of data
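
Slide 6 describes two-way techniques that apply conventional one-way methods to genes and to samples. As a minimal sketch of such a one-way building block, the code below runs average-linkage agglomerative clustering with 1 − Pearson correlation as the distance, first on gene profiles and then on the transposed matrix. The function names (`pearson`, `average_linkage`), the choice of linkage and distance, and the toy expression values are all illustrative assumptions; none of them is taken from the slides or from the cited methods.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Merge profiles bottom-up into k clusters; distance = 1 - correlation."""
    m = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(m)]
            for i in range(m)]
    clusters = [[i] for i in range(m)]  # start with one cluster per profile
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between the two clusters
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: 6 genes (rows) measured at 5 time points (columns).
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # rising
    [0.5, 1.1, 1.4, 2.2, 2.4],  # rising
    [2.0, 4.1, 5.9, 8.2, 9.8],  # rising
    [5.0, 4.0, 3.0, 2.0, 1.0],  # falling
    [2.6, 2.1, 1.4, 1.2, 0.4],  # falling
    [9.0, 8.2, 6.1, 3.9, 2.0],  # falling
]

# One-way clustering of genes.
gene_clusters = average_linkage(genes, 2)

# The same routine applied to the transposed matrix clusters the samples.
samples = [list(col) for col in zip(*genes)]
sample_clusters = average_linkage(samples, 2)

print(gene_clusters)
print(sample_clusters)
```

Because correlation ignores the scale of each profile, genes with the same shape but different magnitude are grouped together here. The sketch also shows the limitations listed on slide 4: every gene is forced into exactly one cluster, and similarity is computed over all samples rather than a subset.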
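
For the plaid models cited on slide 8, it may help to see the general form of the model. The expression below follows the usual statement of the Lazzeroni and Owen model; the index convention (i for genes, j for samples, k for layers) is chosen here for illustration and is not taken from the slides.

```latex
Y_{ij} \approx \mu_0 + \sum_{k=1}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik} \, \kappa_{jk},
\qquad \rho_{ik}, \kappa_{jk} \in \{0, 1\}
```

Here \rho_{ik} = 1 when gene i belongs to layer k and \kappa_{jk} = 1 when sample j belongs to layer k, so each layer is a bicluster with its own mean \mu_k, gene effects \alpha_{ik} and sample effects \beta_{jk}. Since the indicators are not constrained to sum to one, a gene or sample may belong to several layers or to none.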