SlideShare a Scribd company logo
1 of 39
Download to read offline
K-Means Clustering Problem
            Ahmad Sabiq
          Febri Maspiyanti
       Indah Kuntum Khairina
          Wiwin Farhania
              Yonatan
What is k-means?
• To partition n objects into k clusters, based on
  attributes.
  – Objects of the same cluster are close their
    attributes are related to each other.
  – Objects of different clusters are far apart their
    attributes are very dissimilar.
Algorithm
• Input: n objects, k (integer k ≤ n)
• Output: k clusters
• Steps:
   1. Select k initial centroids.
   2. Calculate the distance between each object and
      each centroid.
   3. Assign each object to the cluster with the nearest
      centroid.
   4. Recalculate each centroid.
   5. If the centroids don’t change, stop (convergence).
      Otherwise, back to step 2.
• Complexity: O(k.n.d.total_iteration)
Initialization
• Why is it important? What does it affect?
  – Clustering result local optimum!
  – Total iteration / complexity
Good Initialization
3 clusters with 2 iterations…
Bad Initialization
3 clusters with 4 iterations…
Initialization Methods
1.   Random
2.   Forgy
3.   Macqueen
4.   Kaufman
Random
• Algorithm:
  1. Assigns each object to a random cluster.
  2. Computes the initial centroid of each cluster.
Random
Random
Random
9
8
7
6
5
4
3
2
1
0
    0   5   10    15   20   25   30   35
Forgy
• Algorithm:
  1. Chooses k objects at random and uses them as the initial
     centroids.
Forgy
9
8
7
6
5
4
3
2
1
0
    0   5   10   15   20   25   30   35
MacQueen
• Algorithm:
  1. Chooses k objects at random and uses them as the initial
     centroids.
  2. Assign each object to the cluster with the nearest
     centroid.
  3. After each assignment, recalculate the centroid.
MacQueen
9
8
7
6
5
4
3
2
1
0
    0   5   10     15   20   25   30   35
MacQueen
MacQueen
MacQueen
MacQueen
MacQueen
MacQueen
MacQueen
MacQueen
MacQueen
Kaufman
Kaufman
Kaufman
Kaufman
Kaufman
Kaufman
Kaufman
Kaufman
Kaufman
                        C=0




d = 24,33

            D = 15,52
Kaufman
          C=0


          C=0   C=0

          C=0




          C=0
Kaufman
                       C=0


                       C=0   C=0

                       C=0



∑C1 = 2,74
                       C=0
Kaufman
                                       ∑C5 = 52,55

                                       ∑C6 = 55,88   ∑C9 = 42,69

                                  ∑C7 = 53,77




∑C1 = 2,74                           ∑C8 = 51,16

         ∑C2 = 12,,21


         ∑C3 = 12,36



        ∑C3 = 8,38
Kaufman
                                       ∑C5 = 52,55

                                       ∑C6 = 55,88   ∑C9 = 42,69

                                  ∑C7 = 53,77




∑C1 = 2,74                           ∑C8 = 51,16

         ∑C2 = 12,,21


         ∑C3 = 12,36



        ∑C3 = 8,38
Reference
1. J.M. Peña, J.A. Lozano, and P. Larrañaga. An Empirical
   Comparison of Four Initialization Methods for the K-
   Means Algorithm. Pattern Recognition Letters, vol. 20,
   pp. 1027–1040. 1999.
2. J.R. Cano, O. Cordón, F. Herrera, and L. Sánchez. A
   Greedy Randomized Adaptive Search Procedure
   Applied to the Clustering Problem as an Initialization
   Process Using K-Means as a Local Search Procedure.
   Journal of Intelligent and Fuzzy Systems, vol. 12, pp.
   235 – 242. 2002.
3. L. Kaufman and P.J. Rousseeuw. Finding Groups in
   Data: An Introduction to Cluster Analysis. Wiley. 1990.
Questions
1. Kenapa inisialisasi penting pada k-means?
2. Metode inisialisasi apa yang memiliki greedy
   choice property?
3. Jelaskan kompleksitas O(nkd) pada metode
   Random.

More Related Content

What's hot

はじパタ6章前半
はじパタ6章前半はじパタ6章前半
はじパタ6章前半
T T
 
指数時間アルゴリズムの最先端
指数時間アルゴリズムの最先端指数時間アルゴリズムの最先端
指数時間アルゴリズムの最先端
Yoichi Iwata
 

What's hot (20)

PRML第6章「カーネル法」
PRML第6章「カーネル法」PRML第6章「カーネル法」
PRML第6章「カーネル法」
 
AtCoder Beginner Contest 021 解説
AtCoder Beginner Contest 021 解説AtCoder Beginner Contest 021 解説
AtCoder Beginner Contest 021 解説
 
第5章混合分布モデルによる逐次更新型異常検知
第5章混合分布モデルによる逐次更新型異常検知第5章混合分布モデルによる逐次更新型異常検知
第5章混合分布モデルによる逐次更新型異常検知
 
안.전.제.일. 강화학습!
안.전.제.일. 강화학습!안.전.제.일. 강화학습!
안.전.제.일. 강화학습!
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
AtCoder Regular Contest 027 解説
AtCoder Regular Contest 027 解説AtCoder Regular Contest 027 解説
AtCoder Regular Contest 027 解説
 
Juliaで学ぶ Hamiltonian Monte Carlo (NUTS 入り)
Juliaで学ぶ Hamiltonian Monte Carlo (NUTS 入り)Juliaで学ぶ Hamiltonian Monte Carlo (NUTS 入り)
Juliaで学ぶ Hamiltonian Monte Carlo (NUTS 入り)
 
Arc041
Arc041Arc041
Arc041
 
はじパタ6章前半
はじパタ6章前半はじパタ6章前半
はじパタ6章前半
 
異常検知と変化検知で復習するPRML
異常検知と変化検知で復習するPRML異常検知と変化検知で復習するPRML
異常検知と変化検知で復習するPRML
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed Bandits
 
선형대수 05. 열벡터공간
선형대수 05. 열벡터공간선형대수 05. 열벡터공간
선형대수 05. 열벡터공간
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed bandit
 
指数時間アルゴリズムの最先端
指数時間アルゴリズムの最先端指数時間アルゴリズムの最先端
指数時間アルゴリズムの最先端
 
PRML輪読#6
PRML輪読#6PRML輪読#6
PRML輪読#6
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
Kmeans
KmeansKmeans
Kmeans
 
AtCoder Regular Contest 034 解説
AtCoder Regular Contest 034 解説AtCoder Regular Contest 034 解説
AtCoder Regular Contest 034 解説
 
Bayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-LearningBayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-Learning
 
AtCoder Regular Contest 036 解説
AtCoder Regular Contest 036 解説AtCoder Regular Contest 036 解説
AtCoder Regular Contest 036 解説
 

Viewers also liked

Маркетинг финансовых услуг - выступление для студентов
Маркетинг финансовых услуг - выступление для студентовМаркетинг финансовых услуг - выступление для студентов
Маркетинг финансовых услуг - выступление для студентов
Cyril Savitsky
 
Experimental design
Experimental designExperimental design
Experimental design
Dan Toma
 
سبيلك الى الثروة و النجاح
سبيلك الى الثروة و النجاحسبيلك الى الثروة و النجاح
سبيلك الى الثروة و النجاح
Morad Kheloufi Kheloufi
 
#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока
#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока
#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока
School of Efficient Language Studying Lingvocat.com/ Школа результативных языков Lingvocat.com
 
Trulia Metro Movers Report - Winter 2012
Trulia Metro Movers Report - Winter 2012Trulia Metro Movers Report - Winter 2012
Trulia Metro Movers Report - Winter 2012
Trulia
 

Viewers also liked (20)

Kmeans plusplus
Kmeans plusplusKmeans plusplus
Kmeans plusplus
 
Clustering, k means algorithm
Clustering, k means algorithmClustering, k means algorithm
Clustering, k means algorithm
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
PRML 9.1-9.2: K-means Clustering & Mixtures of Gaussians
PRML 9.1-9.2: K-means Clustering & Mixtures of GaussiansPRML 9.1-9.2: K-means Clustering & Mixtures of Gaussians
PRML 9.1-9.2: K-means Clustering & Mixtures of Gaussians
 
Kmeans
KmeansKmeans
Kmeans
 
The Public Opinion Landscape: Election 2016
The Public Opinion Landscape: Election 2016The Public Opinion Landscape: Election 2016
The Public Opinion Landscape: Election 2016
 
Comprension de lectura de los mexicanos
Comprension de lectura de los mexicanosComprension de lectura de los mexicanos
Comprension de lectura de los mexicanos
 
广东证券见记者发表
广东证券见记者发表广东证券见记者发表
广东证券见记者发表
 
 
Zaragoza turismo 243
Zaragoza turismo 243Zaragoza turismo 243
Zaragoza turismo 243
 
Маркетинг финансовых услуг - выступление для студентов
Маркетинг финансовых услуг - выступление для студентовМаркетинг финансовых услуг - выступление для студентов
Маркетинг финансовых услуг - выступление для студентов
 
Experimental design
Experimental designExperimental design
Experimental design
 
سبيلك الى الثروة و النجاح
سبيلك الى الثروة و النجاحسبيلك الى الثروة و النجاح
سبيلك الى الثروة و النجاح
 
Mumbai - Zappos - Downtown Project - Dec 10, 2015
Mumbai - Zappos - Downtown Project - Dec 10, 2015Mumbai - Zappos - Downtown Project - Dec 10, 2015
Mumbai - Zappos - Downtown Project - Dec 10, 2015
 
#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока
#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока
#СтанемБлиже: спецкурс по межкультурной коммуникации с туристами с Востока
 
Who Needs Love! In Japan, Many Couples Don't- by Nicholas D. Kristof
Who Needs Love! In Japan, Many Couples Don't- by Nicholas D. KristofWho Needs Love! In Japan, Many Couples Don't- by Nicholas D. Kristof
Who Needs Love! In Japan, Many Couples Don't- by Nicholas D. Kristof
 
Kmeans
KmeansKmeans
Kmeans
 
Trulia Metro Movers Report - Winter 2012
Trulia Metro Movers Report - Winter 2012Trulia Metro Movers Report - Winter 2012
Trulia Metro Movers Report - Winter 2012
 

Similar to Kmeans initialization

Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
Junghoon Kim
 
TunUp final presentation
TunUp final presentationTunUp final presentation
TunUp final presentation
Gianmario Spacagna
 

Similar to Kmeans initialization (20)

Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Clustering Theory
Clustering TheoryClustering Theory
Clustering Theory
 
K means-1
K means-1K means-1
K means-1
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
DMTM 2015 - 08 Representative-Based Clustering
DMTM 2015 - 08 Representative-Based ClusteringDMTM 2015 - 08 Representative-Based Clustering
DMTM 2015 - 08 Representative-Based Clustering
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
 
Data Mining Lecture_7.pptx
Data Mining Lecture_7.pptxData Mining Lecture_7.pptx
Data Mining Lecture_7.pptx
 
K means clustering
K means clusteringK means clustering
K means clustering
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Mathematics online: some common algorithms
Mathematics online: some common algorithmsMathematics online: some common algorithms
Mathematics online: some common algorithms
 
TunUp final presentation
TunUp final presentationTunUp final presentation
TunUp final presentation
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
Clustering
ClusteringClustering
Clustering
 
Bioalgo 2012-03-randomized
Bioalgo 2012-03-randomizedBioalgo 2012-03-randomized
Bioalgo 2012-03-randomized
 
Ch12 randalgs
Ch12 randalgsCh12 randalgs
Ch12 randalgs
 
Data Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareData Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner software
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Kmeans initialization

  • 1. K-Means Clustering Problem Ahmad Sabiq Febri Maspiyanti Indah Kuntum Khairina Wiwin Farhania Yonatan
  • 2. What is k-means? • To partition n objects into k clusters, based on attributes. – Objects of the same cluster are close their attributes are related to each other. – Objects of different clusters are far apart their attributes are very dissimilar.
  • 3. Algorithm • Input: n objects, k (integer k ≤ n) • Output: k clusters • Steps: 1. Select k initial centroids. 2. Calculate the distance between each object and each centroid. 3. Assign each object to the cluster with the nearest centroid. 4. Recalculate each centroid. 5. If the centroids don’t change, stop (convergence). Otherwise, back to step 2. • Complexity: O(k.n.d.total_iteration)
  • 4. Initialization • Why is it important? What does it affect? – Clustering result local optimum! – Total iteration / complexity
  • 5. Good Initialization 3 clusters with 2 iterations…
  • 6. Bad Initialization 3 clusters with 4 iterations…
  • 7. Initialization Methods 1. Random 2. Forgy 3. Macqueen 4. Kaufman
  • 8. Random • Algorithm: 1. Assigns each object to a random cluster. 2. Computes the initial centroid of each cluster.
  • 11. Random 9 8 7 6 5 4 3 2 1 0 0 5 10 15 20 25 30 35
  • 12. Forgy • Algorithm: 1. Chooses k objects at random and uses them as the initial centroids.
  • 13. Forgy 9 8 7 6 5 4 3 2 1 0 0 5 10 15 20 25 30 35
  • 14. MacQueen • Algorithm: 1. Chooses k objects at random and uses them as the initial centroids. 2. Assign each object to the cluster with the nearest centroid. 3. After each assignment, recalculate the centroid.
  • 15. MacQueen 9 8 7 6 5 4 3 2 1 0 0 5 10 15 20 25 30 35
  • 33. Kaufman C=0 d = 24,33 D = 15,52
  • 34. Kaufman C=0 C=0 C=0 C=0 C=0
  • 35. Kaufman C=0 C=0 C=0 C=0 ∑C1 = 2,74 C=0
  • 36. Kaufman ∑C5 = 52,55 ∑C6 = 55,88 ∑C9 = 42,69 ∑C7 = 53,77 ∑C1 = 2,74 ∑C8 = 51,16 ∑C2 = 12,,21 ∑C3 = 12,36 ∑C3 = 8,38
  • 37. Kaufman ∑C5 = 52,55 ∑C6 = 55,88 ∑C9 = 42,69 ∑C7 = 53,77 ∑C1 = 2,74 ∑C8 = 51,16 ∑C2 = 12,,21 ∑C3 = 12,36 ∑C3 = 8,38
  • 38. Reference 1. J.M. Peña, J.A. Lozano, and P. Larrañaga. An Empirical Comparison of Four Initialization Methods for the K- Means Algorithm. Pattern Recognition Letters, vol. 20, pp. 1027–1040. 1999. 2. J.R. Cano, O. Cordón, F. Herrera, and L. Sánchez. A Greedy Randomized Adaptive Search Procedure Applied to the Clustering Problem as an Initialization Process Using K-Means as a Local Search Procedure. Journal of Intelligent and Fuzzy Systems, vol. 12, pp. 235 – 242. 2002. 3. L. Kaufman and P.J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley. 1990.
  • 39. Questions 1. Kenapa inisialisasi penting pada k-means? 2. Metode inisialisasi apa yang memiliki greedy choice property? 3. Jelaskan kompleksitas O(nkd) pada metode Random.