Clustering Microarray Data

                                           Heather Turner

                                          Department of Statistics
                                         University of Warwick, UK




Heather Turner (University of Warwick)                               1/9
Overview of Microarray Experiment




          [Figure: array of p genes (×n) −→ scanned image (×n) −→ n × p matrix]
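
The n × p layout in the figure can be made concrete with a minimal sketch. The gene names and expression values below are invented for illustration; the only thing taken from the slide is the convention that each of the n arrays contributes one row and each of the p genes one column.

```python
# Hypothetical toy readings: n = 3 scanned arrays, each giving one
# expression value for the same p = 4 genes (names and values are invented).
arrays = [
    {"geneA": 1.2, "geneB": -0.3, "geneC": 0.8, "geneD": 0.0},
    {"geneA": 1.0, "geneB": -0.1, "geneC": 0.9, "geneD": 0.2},
    {"geneA": -0.4, "geneB": 1.1, "geneC": -0.7, "geneD": 0.1},
]

genes = sorted(arrays[0])                    # fix one column order for all arrays
X = [[a[g] for g in genes] for a in arrays]  # n rows (samples), p columns (genes)
n, p = len(X), len(X[0])

# Clustering samples compares rows of X; clustering genes compares columns.
sample_profiles = X
gene_profiles = [list(col) for col in zip(*X)]

print(n, p)              # 3 4
print(gene_profiles[0])  # [1.2, 1.0, -0.4]
```

In a real experiment p is in the thousands while n is often a dozen or so, which is why the orientation matters for the methods that follow.
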


Example: Serum Stimulation of Human Fibroblasts
(Eisen, Spellman, Brown & Botstein, PNAS, 1998)

          9,800 spots representing 8,600 genes
          12 samples taken over a 24-hour period
          Highlighted clusters can be roughly categorised as genes involved in
                  A cholesterol biosynthesis
                  B the cell cycle
                  C the immediate–early response
                  D signaling and angiogenesis
                  E wound healing and tissue remodelling

Why the need for specialised techniques?

          Application
                  Dimensions of the data are nonstandard (small n, large p)
          Structure
                  Both genes and sample clusters may be of interest
                  Co-expression may be restricted to a subset of the attributes
                  Genes/samples may belong to more than one group
                  Many “uninteresting” genes
          Nature
                  Clusters of interest may not be characterised by similar
                  expression profiles
                  Samples may be taken over time


One-way Clustering Techniques

          Increased structural flexibility
                  Overlapping non-exhaustive clusters
                          Gene shaving: Hastie et al, Genome Biol., 2000
                  Context-specific clusters
                          Clustering On Subsets of Attributes (COSA): Friedman and Meulman, JRSS B, 2004


Two-way Clustering Techniques
          Use conventional one-way methods iteratively
                  Sample clusters within gene clusters
                          Inter-related two-way clustering: Tang et al, BIBE 01
                          EMMIX-GENE: McLachlan et al, Bioinformatics, 2002
                  Clusters within two-way clusters
                          Coupled Two-Way Clustering (CTWC): Getz et al, PNAS, 2003

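
The idea of applying a conventional one-way method iteratively can be sketched as follows: cluster the genes first, then cluster the samples separately within each gene cluster. This is only an illustration under stated assumptions. Plain k-means with a deterministic farthest-point start stands in for the "conventional one-way method"; the cited methods use their own procedures, and the function names and toy matrix here are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centres.append(list(max(points, key=lambda pt: min(dist2(pt, c) for c in centres))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(pt, centres[j])) for pt in points]
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels

def two_way(X, k_genes, k_samples):
    """Cluster genes (columns), then cluster samples (rows) within each gene cluster."""
    gene_profiles = [list(col) for col in zip(*X)]
    gene_labels = kmeans(gene_profiles, k_genes)
    result = {}
    for g in range(k_genes):
        cols = [j for j, lab in enumerate(gene_labels) if lab == g]
        sub = [[row[j] for j in cols] for row in X]  # samples restricted to this gene cluster
        result[g] = (cols, kmeans(sub, k_samples))
    return gene_labels, result

# Toy n x p matrix (4 samples, 4 genes); the values are invented.
X = [
    [5, 5, 5, 5],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 0, 0],
]
gene_labels, result = two_way(X, 2, 2)
print(gene_labels)  # [0, 0, 1, 1]
print(result[0])    # ([0, 1], [0, 0, 1, 1])
print(result[1])    # ([2, 3], [0, 1, 0, 1])
```

The two gene clusters induce different groupings of the same four samples, which is the kind of structure a single one-way clustering of the samples would miss.
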
Co-clustering Techniques
          Simultaneously cluster both genes and samples
                  Two-way partition
                          Spectral bi-clustering: Kluger, Genome Res., 2003
                          Co-clustering: Cho, SIAM ICDM 04
                  Conjugate clusters
                          Double Conjugated Clustering (DCC): Busygin et al, SIAM ICDM 02

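
A two-way partition assigns every sample to one row cluster and every gene to one column cluster at the same time, so the matrix is approximated by a grid of constant blocks. The sketch below is a simple alternating block-mean heuristic written for illustration; it is not the algorithm of any of the cited papers, and the function names, the starting assignment and the toy matrix are all assumptions.

```python
def block_means(X, rows, cols, R, C):
    """Mean of each (row cluster, column cluster) block; 0.0 for an empty block."""
    total = [[0.0] * C for _ in range(R)]
    count = [[0] * C for _ in range(R)]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            total[r][c] += X[i][j]
            count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(C)] for r in range(R)]

def cocluster(X, R, C, iters=10):
    """Alternately reassign rows and columns to the best-fitting block means."""
    n, p = len(X), len(X[0])
    rows = [i % R for i in range(n)]  # arbitrary deterministic start
    cols = [j % C for j in range(p)]
    for _ in range(iters):
        M = block_means(X, rows, cols, R, C)
        rows = [min(range(R),
                    key=lambda r: sum((X[i][j] - M[r][cols[j]]) ** 2 for j in range(p)))
                for i in range(n)]
        M = block_means(X, rows, cols, R, C)
        cols = [min(range(C),
                    key=lambda c: sum((X[i][j] - M[rows[i]][c]) ** 2 for i in range(n)))
                for j in range(p)]
    return rows, cols

# Toy matrix: 6 samples (rows) by 5 genes (columns), with two sample types
# and two gene types interleaved; the values are invented.
X = [
    [4, 4, 0, 4, 0],
    [4, 4, 0, 4, 0],
    [0, 0, 4, 0, 4],
    [4, 4, 0, 4, 0],
    [0, 0, 4, 0, 4],
    [0, 0, 4, 0, 4],
]
rows, cols = cocluster(X, 2, 2)
print(rows)  # [1, 1, 0, 1, 0, 0]
print(cols)  # [1, 1, 0, 1, 0]
```

Like k-means, this heuristic only finds a local optimum and depends on the starting assignment. On a perfectly balanced checkerboard the alternating start above gives identical block means and the procedure stalls, so in practice several starts would be tried.
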
Biclustering Techniques
          Retrieve isolated two-way clusters: biclusters
                  Clusters based on latent model
                          Rich probabilistic models: Segal et al, Bioinformatics, 2001
                  Biclusters
                          SAMBA: Tanay et al, Bioinformatics, 2002
                          Plaid models: Lazzeroni and Owen, Statist. Sinica, 2002

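
As one concrete instance of a biclustering model, the plaid model is usually written as a sum of layers, each layer being one bicluster. The notation below is a sketch of that common form and is not taken from the slides: here i indexes genes and j indexes samples, layer k = 0 is a background layer containing every gene and sample, and the error term is written explicitly as an assumption.

```latex
Y_{ij} = \sum_{k=0}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik} \, \kappa_{jk} + \varepsilon_{ij},
\qquad \rho_{ik},\ \kappa_{jk} \in \{0, 1\}
```

Here \rho_{ik} = 1 when gene i belongs to bicluster k and \kappa_{jk} = 1 when sample j does. Because the memberships are not forced to sum to one over k, a gene or sample may sit in several biclusters or in none beyond the background, which matches the "isolated two-way clusters" described above.
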
Current Situation

          Many novel methods, few used in practice
                  Molecular biologists often have limited (access to) statistical
                  expertise
                  Limited number of methods in publicly available software
          Little work on performance evaluation
          Development of methods continues
                  Improved algorithms
                  Time series
                  Three-way data
                  Integration of other sources of data


