Welcome to my presentation
on
A Comparative Study of Clustering for Gene
Expression Data in Bioinformatics
Roll: 08054746
Reg: 1484
Department of Statistics
Rajshahi University
Rajshahi-6205
Md. Bipul Hossen, Dept. of Statistics, University of Rajshahi

Outline
1. Why choose a clustering technique?
2. Some Objectives
3. Methods and materials
4. Results and Discussions
5. Conclusion

1. Why choose a clustering technique?
Cluster analysis is routinely run as a first step in microarray
data analysis, to summarize the data and group genes.
Gene expression data are very noisy and contain a mixture of
expression patterns, with both down-regulated and
up-regulated genes.
We therefore present a comparative study of four clustering
algorithms and two proximity measures, applied to the
commonly used iris data, simulated data and six real cancer
gene expression data sets.

2. Some Objectives
• Find significant clusters according to the similarities,
intensities and regulation among their objects.
• Compare several hierarchical clustering (HC) methods with
K-means under two proximity measures.
• Assess the quality and reliability of the clustering with the
Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices.
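
The comparison tables later in the deck name the two proximity measures as Euclidean and Pearson, but the slides do not define them. The sketch below assumes the common convention that the Pearson distance is one minus the Pearson correlation; the profiles g1, g2 and g3 are hypothetical and serve only to show how the two measures react to intensity and to pattern.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance; sensitive to differences in intensity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed definition); compares profile shape."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical expression profiles, for illustration only.
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]  # same pattern as g1 at twice the intensity
g3 = [4.0, 3.0, 2.0, 1.0]  # opposite pattern to g1

print(euclidean_distance(g1, g2))  # about 5.48: the intensities differ
print(pearson_distance(g1, g2))    # 0.0: identical pattern
print(pearson_distance(g1, g3))    # 2.0: opposite pattern
```

Under this assumption, two genes that rise and fall together are close in Pearson distance even when their intensities differ, while Euclidean distance keeps them apart.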

Bioinformatics Lab, Dept. of Statistics, University of Rajshahi
Methods
Hierarchical clustering (HC):
1. Single Linkage or Nearest Neighbor Method
2. Complete Linkage or Furthest Neighbor Method
3. Average Linkage Method
K-means clustering
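
The three linkage methods differ only in how they measure the distance between two clusters. A minimal sketch of the three rules follows, assuming Euclidean distance between points; the function names and the two toy clusters are illustrative and not taken from the slides.

```python
import math
from itertools import product


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(A, B, dist=euclidean):
    """Nearest neighbor: smallest distance over all cross-cluster pairs."""
    return min(dist(a, b) for a, b in product(A, B))


def complete_linkage(A, B, dist=euclidean):
    """Furthest neighbor: largest distance over all cross-cluster pairs."""
    return max(dist(a, b) for a, b in product(A, B))


def average_linkage(A, B, dist=euclidean):
    """Mean distance over all cross-cluster pairs."""
    pairs = [dist(a, b) for a, b in product(A, B)]
    return sum(pairs) / len(pairs)


# Hypothetical clusters on a line, for illustration only.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
print(average_linkage(cluster_a, cluster_b))   # 4.5
```

At each step a hierarchical method merges the pair of clusters with the smallest such distance, so the choice of rule decides which clusters are joined first.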
Davies–Bouldin (DB) Index
The Davies–Bouldin index is a metric for evaluating
clustering algorithms (Davies and Bouldin, 1979). It is an
internal evaluation scheme that measures cluster
separation; lower values indicate a better clustering.
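
The slide's formula did not survive extraction, so the sketch below uses the standard textbook form of the index as an assumption: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. Euclidean distance and the two toy clusterings are also assumptions made for illustration.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def davies_bouldin(clusters):
    """Mean over clusters of the worst (S_i + S_j) / M_ij ratio."""
    centers = [centroid(c) for c in clusters]
    # S_i: average distance of a cluster's points to its centroid.
    scatter = [sum(euclidean(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


# Hypothetical clusterings: the same scatter, different separation.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_by = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]

print(davies_bouldin(far_apart))  # 0.2
print(davies_bouldin(close_by))   # 0.5
```

The well-separated clustering scores lower, which is why the later slides pick the number of clusters with the smallest DB value.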

Calinski-Harabasz (CH) Index

$$CH = \frac{SSB / (k - 1)}{SSW / (N - k)}$$

where SSB is the overall between-cluster variance, SSW is the overall within-cluster
variance, k is the number of clusters, and N is the number of observations.
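
As a worked illustration of the quantities SSB, SSW, k and N, the sketch below computes the index in its usual form, the ratio of SSB / (k - 1) to SSW / (N - k), with both terms taken as sums of squared Euclidean distances. That form and the toy clusters are assumptions, since the slide's own formula is not legible in this transcript.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def calinski_harabasz(clusters):
    all_points = [p for c in clusters for p in c]
    n_obs, k = len(all_points), len(clusters)
    grand_mean = centroid(all_points)
    # SSB: cluster centroids around the grand mean, weighted by cluster size.
    ssb = sum(len(c) * sq_dist(centroid(c), grand_mean) for c in clusters)
    # SSW: points around their own cluster centroid.
    ssw = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (ssb / (k - 1)) / (ssw / (n_obs - k))


# Hypothetical clustering: N = 4 observations in k = 2 clusters.
toy_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]

# SSB = 2*25 + 2*25 = 100, SSW = 4, so CH = (100 / 1) / (4 / 2).
print(calinski_harabasz(toy_clusters))  # 50.0
```

Larger values mean the clusters are far apart relative to their internal spread, which is how the comparison tables later in the deck are read.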

Data sets
Dataset             Chip   Tissue   n     #C   Dist. Classes     m       d
Armstrong-V2 [2]    Affy   Blood    72    3    24,20,28          12582   2194
Bhattacharjee [3]   Affy   Lung     203   5    139,17,6,21,20    12600   1543
Nutt-V1 [6]         Affy   Brain    50    4    14,7,14,15        12625   1377
Alizadeh-V2 [1]     cDNA   Blood    62    3    42,9,11           4022    2093
Garber [4]          cDNA   Lung     66    4    17,40,4,5         24192   4553
Liang [5]           cDNA   Brain    37    3    28,6,3            24192   1411

n: number of samples; #C: number of classes; Dist. Classes: samples per class;
m: number of genes; d: number of genes after filtering.
Fig: Dendrogram of g1 to g10, with cuts marking 2 Clusters and 3 Clusters

In this example, the objects g1, g2, g3, g4, g5, g6, g7, g8, g9 and g10 have been clustered. The
places at the bottom of the tree, where the object names are written, are called leaves. The
junctions are called nodes. A hierarchical clustering algorithm can be used to find groups
in the data by cutting the tree at a certain height. For instance, the example can be read as
two groups, (g2, g3, g1, g8) and (g6, g10, g5, g7, g4, g9); as three groups,
(g2, g3, g1, g8), (g6, g10) and (g5, g7, g4, g9); or as ten groups, each containing only one leaf.
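
Cutting the tree so that a given number of groups remains is equivalent to stopping the agglomerative merging once that many clusters are left. The sketch below illustrates this with complete linkage and Euclidean distance; the six toy profiles are hypothetical and are not the data behind the dendrogram on the slide.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def complete_linkage(A, B, data):
    """Largest distance between a member of A and a member of B."""
    return max(euclidean(data[a], data[b]) for a in A for b in B)


def agglomerate(data, n_groups):
    """Merge the two closest clusters until n_groups remain."""
    clusters = [[name] for name in data]
    while len(clusters) > n_groups:
        best = None  # (distance, index i, index j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_linkage(clusters[i], clusters[j], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical profiles, for illustration only.
toy = {
    "g1": (0.0, 0.0), "g2": (0.5, 0.0),
    "g3": (5.0, 0.0), "g4": (5.5, 0.0),
    "g5": (10.0, 0.0), "g6": (10.4, 0.0),
}

three_groups = agglomerate(toy, 3)
two_groups = agglomerate(toy, 2)
print(three_groups)  # [['g1', 'g2'], ['g3', 'g4'], ['g5', 'g6']]
print(two_groups)    # [['g1', 'g2'], ['g3', 'g4', 'g5', 'g6']]
```

Each coarser grouping is obtained by merging groups of the finer one, which is the nesting that a dendrogram displays.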
Hierarchical Clustering of Simulated Data

Fig: Heat map

The dendrogram drawn in green shows the best result, and the heat map is
built with that method: complete-linkage HC with Euclidean distance gives
a better result than the other methods.
K-means of Simulated Data

Table: Davies-Bouldin index
No. of Clusters   Cluster Sizes   DB index
K=2               20,40           0.897
K=3               20,20,20        0.321
K=4               12,20,8,20      0.797
K=5               4,4,12,20,20    0.825

From the table above, the DB index takes its lowest value when the
number of clusters is k=3. We may therefore conclude that three
clusters are present in this data set.
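
The selection rule applied here, choosing the number of clusters with the lowest DB index, can be written out with the values from the table. The variable names are illustrative.

```python
# DB index per number of clusters, copied from the simulated-data table.
db_index = {2: 0.897, 3: 0.321, 4: 0.797, 5: 0.825}

# Lower DB is better, so pick the k with the smallest value.
best_k = min(db_index, key=db_index.get)
print(best_k)  # 3
```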
HC of Armstrong-V2 Data (d)

Several HC of Nutt-V1 Data (c)

K-means of Alizadeh-V2 Data

Table: Davies-Bouldin index
No. of Clusters   Cluster Sizes    DB index
K=2               ...              ...
K=3               ..., ..., ...    ...
K=4               ...              ...
K=5               22,9,3,10,18     2.

The table shows that the DB index takes its lowest value when the number of
clusters is k=3. The cluster sizes are ..., ... and ..., and the actual cluster
sizes are 42, 9 and 11.
We may therefore conclude that three clusters are present in the Alizadeh-V2 data.
K-means of Liang Data

Table: Davies-Bouldin index
No. of Clusters   Cluster Sizes    DB index
K=2               29,8             1.23
K=3               ..., ..., ...    ...
K=4               6,9,3,19         2.09
K=5               1,2,19,14,1      1.215

Table 4.3 shows that the DB index takes its lowest value when the number of
clusters is k=3. The cluster sizes are ..., ... and 3, and the actual cluster
sizes are 28, 6 and 3.
We may therefore conclude that three clusters are present in the Liang data.
Several HC of Liang Data (c, d, e, f)
Class sizes: 28,6,3

Heat map of Liang Data

Compare HC with K-means for Affymetrix data sets
Dataset         Distance Method   Cluster Method   Calinski-Harabasz (CH)
Armstrong-V2    Euclidean         Single           1.889
                Euclidean         Complete         11.803
                Euclidean         Average          6.674
                Pearson           Single           0.914
                Pearson           Complete         12.559
                Pearson           Average          10.393
                                  K-means          11.943
Bhattacharjee   Euclidean         Single           1.786
                Euclidean         Complete         34.702
                Euclidean         Average          26.850
                Pearson           Single           1.700
                Pearson           Complete         26.512
                Pearson           Average          12.902
                                  K-means          22.924
Nutt-V1         Euclidean         Single           3.167
                Euclidean         Complete         7.938
                Euclidean         Average          5.269
                Pearson           Single           0.941
                Pearson           Complete         4.273
                Pearson           Average          2.987
                                  K-means          6.051
Compare HC with K-means for Affymetrix data sets by visualization technique

Fig: Mean of the CH index for Affy Chip (bar chart; x-axis: Single, Average, Complete, K-Means; y-axis: CH index; series: Pearson, Euclidean)

From the graph above, complete linkage with Euclidean distance achieves a mean CH index
of 18.14, which is larger than that of single linkage, average linkage and K-means under
their respective proximity measures. We may therefore conclude that the complete linkage
method gives the best result for the Affymetrix data sets.
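
The means plotted in the graph can be recomputed from the CH values in the Affymetrix comparison table. The sketch below averages each method over the three data sets (Armstrong-V2, Bhattacharjee, Nutt-V1); the grouping of values follows the table as reconstructed from this transcript, and the dictionary layout is illustrative.

```python
from statistics import mean

# CH values per method, one entry per Affymetrix data set
# (Armstrong-V2, Bhattacharjee, Nutt-V1), copied from the comparison table.
ch_values = {
    "Single / Euclidean": [1.889, 1.786, 3.167],
    "Complete / Euclidean": [11.803, 34.702, 7.938],
    "Average / Euclidean": [6.674, 26.850, 5.269],
    "Single / Pearson": [0.914, 1.700, 0.941],
    "Complete / Pearson": [12.559, 26.512, 4.273],
    "Average / Pearson": [10.393, 12.902, 2.987],
    "K-means": [11.943, 22.924, 6.051],
}

mean_ch = {method: mean(values) for method, values in ch_values.items()}
best_method = max(mean_ch, key=mean_ch.get)

for method, value in mean_ch.items():
    print(f"{method:22s} {value:7.3f}")
print("best:", best_method)
```

The mean for complete linkage with Euclidean distance comes out at about 18.15, which agrees with the 18.14 quoted above up to rounding, and it is the largest of the seven means.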
Compare HC with K-means for cDNA data sets
Dataset         Distance Method   Cluster Method   Calinski-Harabasz (CH)
Alizadeh-V2     Euclidean         Single           2.047
                Euclidean         Complete         11.161
                Euclidean         Average          11.068
                Pearson           Single           0.980
                Pearson           Complete         11.229
                Pearson           Average          10.319
                                  K-means          13.003
Garber          Euclidean         Single           2.772
                Euclidean         Complete         19.097
                Euclidean         Average          5.166
                Pearson           Single           0.855
                Pearson           Complete         7.693
                Pearson           Average          18.912
                                  K-means          9.269
Liang           Euclidean         Single           9.057
                Euclidean         Complete         19.665
                Euclidean         Average          10.279
                Pearson           Single           19.665
                Pearson           Complete         19.665
                Pearson           Average          19.665
                                  K-means          23.781

Compare HC with K-means for cDNA data sets by visualization technique

Fig: Mean of the CH index for cDNA Chip (bar chart; x-axis: Single, Complete, Average, K-Means; y-axis: CH index; series: Euclidean, Pearson)

From the graph above, K-means achieves a mean CH index of 17.01, which is
larger than that of single, complete and average linkage under their respective
proximity measures. We may therefore conclude that the K-means method gives the
best result for the cDNA data sets.
Conclusions
Our results reveal that complete linkage with Euclidean
distance exhibited the best performance for the Affymetrix data
sets. For the cDNA data sets, K-means clustering exhibited the
best performance in terms of recovering the true structure of
the data sets. To the best of our knowledge, comparative
studies of several HC methods and K-means using the CH and DB
validity indices are poorly documented in the literature.

Future Research Interest
1. Compare hierarchical clustering methods with the
Self-Organizing Maps method and other recent
clustering methods.
2. Investigate the performance of the different hierarchical
clustering methods against other existing methods using
the false discovery rate (FDR), misclassification
error rate (MER), receiver operating characteristic (ROC)
and area under the ROC curve, with a resampling technique.
3. Compare supervised and unsupervised methods
for gene expression data.
Thank you

References

[1] Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 403:503-511.
[2] Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet. 30:41-47.
[3] Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M (2001). Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA. 98(24):13790-13795.
[4] Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, Rijn M van de, Rosen GD, Perou CM, Whyte RI, Altman RB, Brown PO, Botstein D, Petersen I (2001). Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci USA. 98(24):13784-13789.
[5] Liang Y, Diehn M, Watson N, Bollen AW, Aldape KD, Nicholas MK, Lamborn KR, Berger MS, Botstein D, Brown PO, Israel MA (2005). Gene expression profiling reveals molecularly and clinically distinct subtypes of glioblastoma multiforme. Proc Natl Acad Sci USA. 102(16):5814-5819.
[6] Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, Ladd C, Pohl U, Hartmann C, McLaughlin ME, Batchelor TT, Black PM, von Deimling A, Pomeroy SL, Golub TR, Louis DN (2003). Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 63(7):1602-1607.

 
Role of Mukta Pishti in the Management of Hyperthyroidism
Role of Mukta Pishti in the Management of HyperthyroidismRole of Mukta Pishti in the Management of Hyperthyroidism
Role of Mukta Pishti in the Management of Hyperthyroidism
 
Top 10 Best Ayurvedic Kidney Stone Syrups in India
Top 10 Best Ayurvedic Kidney Stone Syrups in IndiaTop 10 Best Ayurvedic Kidney Stone Syrups in India
Top 10 Best Ayurvedic Kidney Stone Syrups in India
 
Cell Therapy Expansion and Challenges in Autoimmune Disease
Cell Therapy Expansion and Challenges in Autoimmune DiseaseCell Therapy Expansion and Challenges in Autoimmune Disease
Cell Therapy Expansion and Challenges in Autoimmune Disease
 
Local Advanced Lung Cancer: Artificial Intelligence, Synergetics, Complex Sys...
Local Advanced Lung Cancer: Artificial Intelligence, Synergetics, Complex Sys...Local Advanced Lung Cancer: Artificial Intelligence, Synergetics, Complex Sys...
Local Advanced Lung Cancer: Artificial Intelligence, Synergetics, Complex Sys...
 

A comparative study of Clustering for Gene expression data in Bioinformatics

  • 1. Welcome to my presentation on "A Comparative Study of Clustering for Gene Expression Data in Bioinformatics". Roll: 08054746. Reg: 1484. Department of Statistics, Rajshahi University, Rajshahi-6205. Md. Bipul Hossen, Dept. of Statistics, University of Rajshahi
  • 2. Outline: 1. Why choose a clustering technique? 2. Some objectives 3. Methods and materials 4. Results and discussion 5. Conclusion
  • 3. 1. Why choose a clustering technique? Cluster analysis programs are routinely run as a first step in microarray data analysis, to summarize the data and to group genes. Gene expression data are very noisy and contain a mixture of expression patterns, with both down-regulated and up-regulated genes. We therefore present a comparative study of four clustering algorithms and two proximity measures, applied to the commonly used iris data, simulated data, and six real cancer gene expression data sets.
  • 4. 2. Some objectives: (i) find significant clusters according to the similarities, intensities, and regulation among their objects; (ii) compare several hierarchical clustering (HC) methods with K-means under two proximity measures; (iii) assess the quality and reliability of the clustering with the Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices. Bioinformatics Lab, Dept. of Statistics, University of Rajshahi
  • 5. Methods: 1. Single linkage (nearest-neighbor) method 2. Complete linkage (furthest-neighbor) method 3. Average linkage method 4. K-means clustering
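The three linkage criteria differ only in how the pairwise distances between two clusters are summarized. The following is a minimal Python sketch, not the presenter's code. The function names are hypothetical, and taking "Pearson" proximity to mean 1 minus the Pearson correlation is an assumption, since the slides do not define it.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation (assumed convention for the 'Pearson' measure)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)

def linkage_distance(A, B, dist, method):
    """Distance between clusters A and B under the chosen linkage criterion."""
    pairwise = [dist(a, b) for a in A for b in B]
    if method == "single":      # nearest neighbor: smallest pairwise distance
        return min(pairwise)
    if method == "complete":    # furthest neighbor: largest pairwise distance
        return max(pairwise)
    if method == "average":     # mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Two small illustrative clusters (made-up points, not from the slides).
A = [[0.0, 0.0], [0.0, 1.0]]
B = [[3.0, 0.0], [4.0, 0.0]]
single = linkage_distance(A, B, euclidean, "single")
complete = linkage_distance(A, B, euclidean, "complete")
average = linkage_distance(A, B, euclidean, "average")
```

For any pair of clusters the single-linkage distance is never larger than the average-linkage distance, which in turn is never larger than the complete-linkage distance.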
  • 6. Davies–Bouldin (DB) Index: The Davies–Bouldin index is a metric for evaluating clustering algorithms (Davies and Bouldin, 1979). It is an internal evaluation scheme that measures cluster separation.
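The slide does not reproduce the formula, so the sketch below assumes the standard definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values indicate compact, well-separated clusters. The helper names and the toy points are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(clusters):
    """DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / d(c_i, c_j)."""
    cents = [centroid(c) for c in clusters]
    # S_i: mean distance of the members of cluster i to its centroid
    scatter = [sum(euclidean(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Same within-cluster scatter, different separation between the two clusters.
far_apart = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
close_by = [[[0.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [3.0, 2.0]]]
db_far = davies_bouldin(far_apart)
db_close = davies_bouldin(close_by)
```

Moving the clusters closer together while keeping their scatter fixed raises the index, which is why the later slides read the lowest DB value as the preferred number of clusters.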
  • 7. Calinski-Harabasz (CH) Index: $\mathrm{CH} = \dfrac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(N-k)}$, where SSB is the overall between-cluster variance, SSW is the overall within-cluster variance, k is the number of clusters, and N is the number of observations.
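A minimal Python sketch of the index, assuming the usual form in which SSB is divided by k - 1 and SSW by N - k before the ratio is taken. It also assumes SSB and SSW are computed as sums of squared Euclidean distances. The function names and the toy data are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def calinski_harabasz(clusters):
    all_points = [p for c in clusters for p in c]
    N, k = len(all_points), len(clusters)
    grand = centroid(all_points)
    # SSB: size-weighted squared distance of each cluster centroid to the grand mean
    ssb = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    # SSW: squared distance of every point to its own cluster centroid
    ssw = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (ssb / (k - 1)) / (ssw / (N - k))

# Two tight, well-separated toy clusters: SSB = 100, SSW = 4, N = 4, k = 2.
clusters = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
ch = calinski_harabasz(clusters)  # (100 / 1) / (4 / 2) = 50.0
```

Unlike the DB index, larger CH values indicate a better partition, which is how the comparison tables later in the deck are read.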
  • 8. Data sets
    Dataset             Chip  Tissue  n    #C  Dist. Classes    m      d
    Armstrong-V2 [2]    Affy  Blood   72   3   24,20,28         12582  2194
    Bhattacharjee [3]   Affy  Lung    203  5   139,17,6,21,20   12600  1543
    Nutt-V1 [6]         Affy  Brain   50   4   14,7,14,15       12625  1377
    Alizadeh-V2 [1]     cDNA  Blood   62   3   42,9,11          4022   2093
    Garber [4]          cDNA  Lung    66   4   17,40,4,5        24192  4553
    Liang [5]           cDNA  Brain   37   3   28,6,3           24192  1411
  • 9. Fig: dendrogram (cut labels: 2 Clusters, 3 Clusters). In this example, the objects g1, g2, g3, g4, g5, g6, g7, g8, g9 and g10 have been clustered. The places at the bottom of the tree, where the object names are written, are called leaves; the junctions are called nodes. A hierarchical clustering algorithm can be used to find groups in the data by cutting the tree at a certain height. For instance, the example can be read as two groups, (g2, g3, g1, g8) and (g6, g10, g5, g7, g4, g9); as three groups, (g2, g3, g1, g8), (g6, g10) and (g5, g7, g4, g9); or as ten groups, each containing only one leaf.
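Cutting the tree so that k branches remain gives the same groups as running the bottom-up merging and stopping once k clusters are left. The following is a didactic Python sketch of that procedure under Euclidean distance. The function name `agglomerate` and the one-dimensional toy data are hypothetical and are not the g1 to g10 objects of the figure.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, k, method="complete"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    combine = {"single": min, "complete": max,
               "average": lambda d: sum(d) / len(d)}[method]
    clusters = [[i] for i in range(len(points))]  # start with one leaf per object
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([euclidean(points[i], points[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # this merge is one node of the tree
        del clusters[b]
    return [sorted(c) for c in clusters]

# Five toy objects on a line, identified by index 0..4.
points = [[0.0], [0.5], [5.0], [5.4], [10.0]]
three_groups = agglomerate(points, 3)  # [[0, 1], [2, 3], [4]]
two_groups = agglomerate(points, 2)    # [[0, 1], [2, 3, 4]]
```

The two-group result is obtained from the three-group result by one further merge, which mirrors how the groups in the example above are nested within each other.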
  • 10. Hierarchical Clustering of Simulated Data. Fig: Heat map. The dendrogram drawn in green shows the best result, and the heat map is built with that method: complete-linkage HC with Euclidean distance gives a better result than the other methods.
  • 11. K-means of Simulated Data. Table: Davies-Bouldin index
    No. of clusters  Cluster sizes   DB index
    K=2              20,40           0.897
    K=3              20,20,20        0.321
    K=4              12,20,8,20      0.797
    K=5              4,4,12,20,20    0.825
    The table shows that the DB index is lowest at k=3. We may therefore conclude that three clusters are present in this data set.
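The selection rule used here is simply "choose the k with the smallest DB index". A short Python sketch using the four values from the table above (the variable names are hypothetical):

```python
# DB index for each candidate number of clusters, as reported for the simulated data.
db_by_k = {2: 0.897, 3: 0.321, 4: 0.797, 5: 0.825}

# Lower DB means more compact, better separated clusters, so take the minimum.
best_k = min(db_by_k, key=db_by_k.get)
```

With these values `best_k` is 3, which matches the conclusion drawn from the table.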
  • 12. HC of Armstrong-V2 Data (d)
  • 13. Several HC of Nutt-V1 Data (c)
  • 14. K-means of Alizadeh-V2 Data. Table: Davies-Bouldin index (columns: No. of clusters, Cluster sizes, DB index). K=2 K=3 , , K=4 K=5 22 9 3 10 18 2. The table shows that the DB index is lowest at k=3. The cluster sizes are …, … and …, and the actual class sizes are 42, 9 and 11. We may therefore conclude that three clusters are present in the Alizadeh-V2 data.
  • 15. K-means of Liang Data. Table: Davies-Bouldin index
    No. of clusters  Cluster sizes   DB index
    K=2              29,8            1.23
    K=3              …,…,3           …
    K=4              6,9,3,19        2.09
    K=5              1,2,19,14,1     1.215
    Table 4.3 shows that the DB index is lowest at k=3. The cluster sizes are …, … and 3, and the actual class sizes are 28, 6 and 3. We may therefore conclude that three clusters are present in the Liang data.
  • 16. Several HC of Liang Data (c, d, e, f). Class sizes: 28,6,3
  • 17. Heat map of Liang Data
  • 18. Compare HC with K-means for Affymetrix data sets
    Dataset        Distance   Cluster method  Calinski-Harabasz (CH)
    Armstrong-V2   Euclidean  Single          1.889
                   Euclidean  Complete        11.803
                   Euclidean  Average         6.674
                   Pearson    Single          0.914
                   Pearson    Complete        12.559
                   Pearson    Average         10.393
                   -          K-means         11.943
    Bhattacharjee  Euclidean  Single          1.786
                   Euclidean  Complete        34.702
                   Euclidean  Average         26.850
                   Pearson    Single          1.700
                   Pearson    Complete        26.512
                   Pearson    Average         12.902
                   -          K-means         22.924
    Nutt-V1        Euclidean  Single          3.167
                   Euclidean  Complete        7.938
                   Euclidean  Average         5.269
                   Pearson    Single          0.941
                   Pearson    Complete        4.273
                   Pearson    Average         2.987
                   -          K-means         6.051
  • 19. Compare HC with K-means for Affymetrix data sets by visualization technique. Fig: Mean of the CH index for Affy chip (y-axis: CH index; x-axis: Single, Average, Complete, K-means; series: Pearson, Euclidean). The graph shows that complete linkage with Euclidean distance achieves a mean CH index of 18.14, which is larger than the values for single linkage, average linkage, and K-means under either proximity measure. We may therefore conclude that the complete linkage method gives the best result for the Affymetrix data sets.
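The plotted means can be recomputed from the CH values in the Affymetrix comparison table on the previous slide. The Python sketch below does this, assuming the three blocks of that table correspond to the three Affymetrix data sets. The mean does not depend on which block belongs to which data set. Variable names are hypothetical.

```python
# CH values per (distance, method), one entry per Affymetrix data set,
# in the order the three blocks appear in the comparison table.
ch = {
    ("Euclidean", "Single"):   [1.889, 1.786, 3.167],
    ("Euclidean", "Complete"): [11.803, 34.702, 7.938],
    ("Euclidean", "Average"):  [6.674, 26.850, 5.269],
    ("Pearson", "Single"):     [0.914, 1.700, 0.941],
    ("Pearson", "Complete"):   [12.559, 26.512, 4.273],
    ("Pearson", "Average"):    [10.393, 12.902, 2.987],
    ("-", "K-means"):          [11.943, 22.924, 6.051],
}

# Mean CH across data sets for each combination; larger is better.
mean_ch = {key: sum(values) / len(values) for key, values in ch.items()}
best = max(mean_ch, key=mean_ch.get)
```

Here `best` is `("Euclidean", "Complete")` with a mean of about 18.148, which agrees with the reported 18.14 up to rounding. K-means averages about 13.64 on the same data sets.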
  • 20. Compare HC with K-means for cDNA data sets
    Dataset        Distance   Cluster method  Calinski-Harabasz (CH)
    Alizadeh-V2    Euclidean  Single          2.047
                   Euclidean  Complete        11.161
                   Euclidean  Average         11.068
                   Pearson    Single          0.980
                   Pearson    Complete        11.229
                   Pearson    Average         10.319
                   -          K-means         13.003
    Single 2.772 Garber Euclidean Euclidean Euclidean Pearson Pearson Pearson Liang Complete 19.097 5.166 0.855 7.693 18.912 9.269 9.057 19.665 10.279 19.665 19.665 19.665 K-means Euclidean Euclidean Euclidean Pearson Pearson Pearson Average Single Complete Average K-means Single Complete Average Single Complete Average 23.781
  • 21. Compare HC with K-means for cDNA data sets by visualization technique. Fig: Mean of the CH index for cDNA chip (y-axis: CH index; x-axis: Single, Complete, Average, K-means; series: Euclidean, Pearson). The graph shows that K-means achieves a mean CH index of 17.01, which is larger than the values for single, complete, and average linkage under either proximity measure. We may therefore conclude that the K-means method gives the best result for the cDNA data sets.
  • 22. Conclusions: Our results show that complete linkage with Euclidean distance performed best for the Affymetrix data sets. For the cDNA data sets, K-means clustering performed best at recovering the true structure of the data. To the best of our knowledge, comparative studies of several HC methods and K-means using the CH and DB validity indices are poorly documented in the literature.
  • 23. Future Research Interest: 1. Compare hierarchical clustering with Self-Organizing Maps and other recent clustering methods. 2. Investigate the performance of the different hierarchical clustering methods against other existing methods in terms of false discovery rate (FDR), misclassification error rate (MER), receiver operating characteristic (ROC), and area under the ROC curve, using resampling techniques. 3. Compare supervised and unsupervised methods for gene expression data.
  • 25. References
    [1] Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000); Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling; Nature. 403:503-511.
    [2] Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ (2002); MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia; Nat Genet. 30:41-47.
    [3] Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M (2001); Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses; Proc Natl Acad Sci USA. 98(24):13790-13795.
    [4] Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, Rijn M van de, Rosen GD, Perou CM, Whyte RI, Altman RB, Brown PO, Botstein D, Petersen I (2001); Diversity of gene expression in adenocarcinoma of the lung; Proc Natl Acad Sci USA. 98(24):13784-13789.
    [5] Liang Y, Diehn M, Watson N, Bollen AW, Aldape KD, Nicholas MK, Lamborn KR, Berger MS, Botstein D, Brown PO, Israel MA (2005); Gene expression profiling reveals molecularly and clinically distinct subtypes of glioblastoma multiforme; Proc Natl Acad Sci USA. 102(16):5814-5819.
    [6] Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, Ladd C, Pohl U, Hartmann C, McLaughlin ME, Batchelor TT, Black PM, von Deimling A, Pomeroy SL, Golub TR, Louis DN (2003); Gene expression-based classification of malignant gliomas correlates better with survival than histological classification; Cancer Res. 63(7):1602-1607.