Multi-Objective Optimization for Clustering of
Medical Publications
Asif Ekbal1

Sriparna Saha1

India Institute of Techno...

A. Ekbal, S. Saha, D. Mollá, and K. Ravikumar (2013). Multi-Objective Optimization for Clustering of Medical Publications. Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013), pp. 53-61, Brisbane, Australia. http://aclweb.org/anthology/U/U13/

Published in: Technology, Health & Medicine

Multi-Objective Optimization for Clustering of Medical Publications

  1. Multi-Objective Optimization for Clustering of Medical Publications
     Asif Ekbal (1), Sriparna Saha (1), Diego Mollá (2), K Ravikumar (1)
     (1) Indian Institute of Technology, Patna, Bihar, India
     (2) Centre for Language Technology, Macquarie University, Sydney, Australia
     ALTA 2013, Brisbane, Australia
  2. Contents: Clustering for Evidence Based Medicine; Clustering as a MOO Problem; AMOSA-clus; Results.
  4. Evidence Based Medicine
     http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/
  5. The Dream
  6. The Bottom-line Answer
  7. A Means of Getting There
     Input:
     QUESTION: Which treatments work best for hemorrhoids?
     DOCUMENTS: [11289288] [12972967] [1442682] [15486746] [16235372] [16252313] [17054255] [17380367]
     clustering ⇒ summarisation
     Output:
     1. Excision is the most effective treatment for thrombosed external hemorrhoids. [11289288] [12972967] [15486746]
     2. For prolapsed internal hemorrhoids, the best definitive treatment is traditional hemorrhoidectomy. [17054255] [17380367]
     3. Of nonoperative techniques, rubber band ligation produces the lowest rate of recurrence. [1442682] [16252313] [16235372]
  8. This Work
     Each question is formulated as an independent clustering task.
     Input:
     QUESTION: Which treatments work best for hemorrhoids?
     DOCUMENTS: [11289288] [12972967] [1442682] [15486746] [16235372] [16252313] [17054255] [17380367]
     clustering ⇒
     Output:
     1. [11289288] [12972967] [15486746]
     2. [17054255] [17380367]
     3. [1442682] [16252313] [16235372]
  9. Related Work
     Uses of Document Clustering: web search; topic detection and tracking; training data expansion; multi-document summarisation.
     Clustering in EBM: cluster search results; cluster based on interventions; Shash & Mollá (2013): k-means clustering on our data set.
  11. Clustering and Multi-Objective Optimization
      Most existing clustering techniques are based on a single criterion of goodness.
      Several criteria of goodness have been proposed.
      So why not try several criteria at once?
      Internal Validity: BIC-index, CH-index, Silhouette-index, DB-index, ...
      External Validity: Minkowski scores, F-measures, ...
  12. Information in Internal Validity Indices
      Compactness: Measures the distance among the elements of a cluster. We want clusters with short distances between their elements.
      Separability: Measures the distance between clusters. We want relatively large distances between clusters.
  13. I-Index (Maulik & Bandyopadhyay, 2002)
      $I(K) = \left(\frac{1}{K} \times \frac{E_1}{E_K} \times D_K\right)^p$
      where
      $K$ = number of clusters
      $E_K = \sum_{k=1}^{K} \sum_{j=1}^{n_k} d_e(c_k, x_j^k)$
      $D_K = \max_{i,j=1}^{K} d_e(c_i, c_j)$
      $c_j$ = centroid of the $j$th cluster
      $x_j^k$ = $j$th point of the $k$th cluster
      $n_k$ = total number of points present in the $k$th cluster
      $E_1 / E_K$ increases $I$ as the clusters become more compact.
      $D_K$ increases $I$ as the separation between clusters increases.
      ($p$ is a parameter set to 2 in this paper.)
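The I-index on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `i_index`, `centroid` and `euclidean` are hypothetical helper names, and $E_1$ is taken to be $E_K$ evaluated with all points in a single cluster, which the slide does not spell out.

```python
import math


def euclidean(a, b):
    """Euclidean distance d_e between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def i_index(clusters, p=2):
    """I-index of a partition given as a list of non-empty point lists.

    Assumption: E_1 is the sum of distances of all points to the
    centroid of the whole data set (the K = 1 case of E_K).
    Undefined (division by zero) if every point sits on its centroid.
    """
    k = len(clusters)
    centroids = [centroid(c) for c in clusters]
    all_points = [x for c in clusters for x in c]
    grand = centroid(all_points)
    e_1 = sum(euclidean(grand, x) for x in all_points)
    e_k = sum(euclidean(ck, x) for ck, c in zip(centroids, clusters) for x in c)
    d_k = max(euclidean(ci, cj) for ci in centroids for cj in centroids)
    return ((1 / k) * (e_1 / e_k) * d_k) ** p


# Two tight, well separated clusters versus a poor split of the same points.
good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
print(i_index(good) > i_index(bad))
```

The comparison at the end illustrates why the index is maximized: compact, well separated partitions score higher.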
  14. XB-Index (Xie & Beni, 1991)
      $XB(K) = \frac{\sum_{i=1}^{K} \sum_{j=1}^{n} u_{ij}^2 \lVert x_j - c_i \rVert^2}{n \left(\min_{i \neq k} \lVert c_i - c_k \rVert^2\right)}$
      where
      $K$ = number of clusters
      $c_j$ = centroid of the $j$th cluster
      $x_j^k$ = $j$th point of the $k$th cluster
      $n$ = total number of points present in the dataset
      $[u_{ij}]_{K \times n}$ = cluster membership matrix
      The numerator quantifies the compactness of the clusters.
      The denominator quantifies the separation between clusters.
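A minimal Python sketch of the XB-index follows, assuming squared Euclidean norms in both numerator and denominator and a crisp (0/1) membership matrix in the usage example. The names `xb_index` and `sq_dist` are hypothetical and not taken from the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xb_index(points, centroids, u):
    """XB-index; u[i][j] is the membership of point j in cluster i.

    Needs at least two distinct centroids, otherwise the separation
    term in the denominator is undefined or zero.
    """
    n = len(points)
    k = len(centroids)
    compactness = sum(
        u[i][j] ** 2 * sq_dist(points[j], centroids[i])
        for i in range(k)
        for j in range(n)
    )
    separation = min(
        sq_dist(centroids[a], centroids[b])
        for a in range(k)
        for b in range(k)
        if a != b
    )
    return compactness / (n * separation)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centroids = [(0, 1), (10, 1)]
membership = [[1, 1, 0, 0], [0, 0, 1, 1]]  # crisp assignment (assumption)
print(xb_index(points, centroids, membership))
```

Lower values are better here, which is why this index is minimized while the I-index is maximized.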
  15. MOO: The Pareto Optimal Front
      [Figure: five candidate solutions, labelled 1 to 5, plotted against the objectives f1 (maximize) and f2 (minimize).]
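To make the idea of a Pareto optimal front concrete, here is a small sketch for the two-objective case on this slide, with f1 maximized and f2 minimized. The function names and the sample objective values are made up for illustration and do not correspond to the five points in the figure.

```python
def dominates(a, b):
    """True if solution a dominates b; each is an (f1, f2) pair.

    f1 is maximized and f2 is minimized: a must be at least as good
    on both objectives and strictly better on at least one.
    """
    at_least_as_good = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return at_least_as_good and strictly_better


def pareto_front(solutions):
    """Solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


# Hypothetical (f1, f2) values.
candidates = [(1, 1), (2, 2), (3, 3), (2, 3), (1, 2)]
print(pareto_front(candidates))
```

Every member of the front is a trade-off: improving one objective would worsen the other.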
  17. String Representation
      AMOSA-clus implements simulated annealing (SA).
      Centroid-based real encoding: each member of the archive is encoded as a string that represents the centroids of the partitions.
      Each centroid is indivisible.
      Given a fixed maximum number of clusters $K_{max}$, the initial number of clusters and their centroids are determined randomly.
      < 12.3 1.4 22.1 0.01 0.0 15.3 10.2 7.5 >
      Represents four cluster centroids: (12.3, 1.4), (22.1, 0.01), (0.0, 15.3), (10.2, 7.5)
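The centroid-based encoding can be illustrated with a pair of helper functions that convert between a flat string of real numbers and a list of centroids. `decode` and `encode` are hypothetical names, and the dimensionality `dim` is assumed to be known to the caller.

```python
def decode(string, dim):
    """Split a flat list of reals into centroids of dimension dim."""
    if dim <= 0 or len(string) % dim != 0:
        raise ValueError("string length must be a multiple of dim")
    return [tuple(string[i:i + dim]) for i in range(0, len(string), dim)]


def encode(centroids):
    """Flatten a list of centroids back into a single string of reals."""
    return [value for c in centroids for value in c]


# The example string from the slide, with two-dimensional centroids.
example = [12.3, 1.4, 22.1, 0.01, 0.0, 15.3, 10.2, 7.5]
print(decode(example, 2))
```

Keeping each centroid as one tuple reflects the statement that a centroid is indivisible: operators act on whole centroids, never on part of one.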
  18. Assignment of Points to the Clusters
      Assignment of points and update of cluster centroids resembles an iteration of the K-means clustering algorithm.
      1. A point $j$ is assigned to the cluster $k$ whose centroid has the minimum distance to $j$:
         $k = \arg\min_{i=1,\ldots,K} d(x_j, c_i)$    (1)
      2. After all points are assigned to a cluster, the cluster centroids are updated:
         $c_i = \frac{\sum_{j=1}^{n_i} x_j^i}{n_i}, \quad 1 \leq i \leq K$    (2)
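Steps (1) and (2) can be sketched as follows, using Euclidean distance as one possible choice for $d$. The function names are hypothetical, and leaving the centroid of an empty cluster unchanged is an assumption, since the slide does not say how that case is handled.

```python
import math


def assign(points, centroids):
    """Step (1): index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
        for x in points
    ]


def update(points, labels, centroids):
    """Step (2): replace each centroid by the mean of its members."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [x for x, label in zip(points, labels) if label == i]
        if members:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
    return new_centroids


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
start = [(1, 1), (9, 1)]
labels = assign(points, start)
print(labels, update(points, labels, start))
```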
  19. Search Operators
      Mutation 1: Perturb the centroid of a random cluster using a Laplacian distribution: $p(\epsilon) \propto e^{-|\epsilon - \mu| / \delta}$
      Mutation 2: Delete a random cluster centroid.
      Mutation 3: Add a new cluster centroid.
      Example string: < 3.5 1.5 2.1 4.9 1.6 1.2 >
      1. If we choose centroid 2, then update centroid (2.1, 4.9). The new string is: < 3.5 1.5 1.2 3.6 1.6 1.2 >
      2. If we choose centroid 3, the new string will be: < 3.5 1.5 2.1 4.9 >
      3. New string: < 3.5 1.5 2.1 4.9 1.6 1.2 9.7 2.5 >
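The three mutation operators can be sketched on a list of centroid tuples. Several details here are assumptions rather than facts from the slides: the Laplacian is centred on the current coordinate value, `delta` is a free scale parameter, and the new centroid for mutation 3 is supplied by the caller. The Laplace sample is drawn as the difference of two exponential variates, a standard construction.

```python
import random


def laplace(mu, delta, rng):
    """Laplace(mu, delta) sample: difference of two exponential draws."""
    return mu + rng.expovariate(1 / delta) - rng.expovariate(1 / delta)


def perturb(centroids, index, delta, rng):
    """Mutation 1: perturb every coordinate of one centroid."""
    mutated = list(centroids)
    mutated[index] = tuple(laplace(v, delta, rng) for v in centroids[index])
    return mutated


def delete(centroids, index):
    """Mutation 2: remove one centroid."""
    return centroids[:index] + centroids[index + 1:]


def insert(centroids, new_centroid):
    """Mutation 3: append a new centroid (chosen by the caller)."""
    return centroids + [tuple(new_centroid)]


rng = random.Random(0)  # fixed seed so runs are repeatable
current = [(3.5, 1.5), (2.1, 4.9), (1.6, 1.2)]
print(perturb(current, 1, 1.0, rng))
print(delete(current, 2))
print(insert(current, (9.7, 2.5)))
```

Each operator returns a new list and leaves the input untouched, which is convenient when a simulated annealing step has to compare the current solution with the mutated one before accepting it.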
  20. Selecting a Solution
      The algorithm produces a set of alternative solutions. Each solution is optimal according to some criteria.
      Unsupervised Setting: Choose one solution randomly.
      Semi-supervised Setting: Each question has a portion of known clustering assignments. Select the solution with best entropy in known assignments.
      [Figure: the Pareto front plot, with solutions labelled 1 to 5 against f1 (maximize) and f2 (minimize).]
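The semi-supervised selection step can be sketched as below. The slides do not define the entropy measure, so this example assumes the common size-weighted cluster entropy computed only over documents with known assignments, where lower is better. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(predicted, known):
    """Size-weighted entropy of known labels within predicted clusters.

    predicted: dict mapping document id -> predicted cluster
    known:     dict mapping document id -> known cluster (a subset)
    """
    groups = {}
    for doc, gold in known.items():
        groups.setdefault(predicted[doc], []).append(gold)
    total = len(known)
    entropy = 0.0
    for golds in groups.values():
        size = len(golds)
        h = -sum(
            (c / size) * math.log2(c / size) for c in Counter(golds).values()
        )
        entropy += (size / total) * h
    return entropy


def select_solution(solutions, known):
    """Pick the solution with the lowest entropy on the known part."""
    return min(solutions, key=lambda s: clustering_entropy(s, known))


known = {"a": 0, "b": 0, "c": 1}
solution_1 = {"a": 0, "b": 0, "c": 1, "d": 1}
solution_2 = {"a": 0, "b": 1, "c": 1, "d": 0}
print(select_solution([solution_1, solution_2], known) is solution_1)
```

In the unsupervised setting no such signal exists, which is why the slide falls back to a random choice among the solutions.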
  22. Data
      Clinical Inquiries from the Journal of Family Practice.
      276 clinical questions (276 clustering tasks).
      Each question has an average of 5.89 documents.
      Example: Which treatments work best for hemorrhoids?
      1. Excision is the most effective treatment for thrombosed external hemorrhoids. [11289288] [12972967] [15486746]
      2. For prolapsed internal hemorrhoids, the best definitive treatment is traditional hemorrhoidectomy. [17054255] [17380367]
      3. Of nonoperative techniques, rubber band ligation produces the lowest rate of recurrence. [1442682] [16252313] [16235372]
  23. Results
      Distance    AMOSA-clus1  AMOSA-clus1  AMOSA-clus2  AMOSA-clus2  K-means
      Measure     best         average      best         average      (baseline)
      Euclidean   0.190        0.249        0.177        0.235        0.240
      Cosine      0.187        0.231        0.177        0.230        0.237
      Unsupervised: Average solution is slightly better than baseline (differences statistically significant).
      Semi-supervised: Best solution is clearly better than baseline (differences statistically significant).
  24. Finding the Number of Clusters
      (The results table from slide 23 is shown again on this slide.)
      AMOSA-clus1: Number of clusters as given by the original data. Average 2.38 clusters.
      AMOSA-clus2: Try several numbers of clusters and select the solution that optimises I-index and XB-index.
      Euclidean distance: Average 2.34 clusters.
      Cosine distance: Average 2.51 clusters.
  25. Finding the Number of Clusters
      $\text{error} = \frac{\sum_i (\text{target}_i - \text{predicted}_i)^2}{\text{\# of questions}}$
      Method                  Error
      AMOSA-clus2 Cosine      1.90
      AMOSA-clus2 Euclidean   1.91
      k=1                     3.91
      k=2                     2.14
      k=3                     2.38
      k=4                     4.61
      Rule of Thumb           2.56
      Cover                   1.98
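The error measure for the predicted number of clusters can be sketched as a mean of squared differences over questions. This reads the slide's formula as the sum of squared differences divided by the number of questions, which is an assumption about the extracted text. The function name and the toy numbers are hypothetical and are not the paper's data.

```python
def cluster_count_error(target, predicted):
    """Mean squared difference between target and predicted cluster counts.

    target, predicted: equal-length sequences, one entry per question.
    """
    if len(target) != len(predicted) or not target:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)


# Toy example with three questions.
print(cluster_count_error([2, 3, 1], [2, 2, 3]))
```

A fixed guess such as k=2 can be scored with the same function by passing a constant list as `predicted`, which is how the k=1 to k=4 rows of the table can be understood.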
  26. Conclusions
      Conclusions
      Unsupervised setting: slight improvement over k-means baseline.
      Semi-supervised setting: clear improvement over k-means baseline.
      Number of clusters: better than standard methods.
      Further Work
      Test on other domains.
      Test using other cluster validity indices.
      Compare with other semi-supervised methods.
