Unsupervised Learning: k-means
CS771: Introduction to Machine Learning
Nisheeth
Unsupervised Learning
2
 It’s about learning interesting/useful structures in the data
(unsupervisedly!)
 There is no supervision (no labels/responses), only inputs 𝒙1, 𝒙2, … , 𝒙𝑁
 Some examples of unsupervised learning
 Clustering: Grouping similar inputs together (and dissimilar ones far apart)
 Dimensionality Reduction: Reducing the data dimensionality
 Estimating the probability density of data (which distribution “generated” the data)
 Most unsup. learning algos can also be seen as learning a new
representation of data
 For example, hard clustering can be used to get a one-hot representation
Already looked at some basic approaches (e.g., estimating a Gaussian given observed data)
Each point belongs deterministically to a single cluster. In contrast, there is also soft/probabilistic clustering, in which 𝒛𝑛 will be a probability vector that sums to 1 (will see later)
Clustering
3
 Given: 𝑁 unlabeled examples 𝒙1, 𝒙2, … , 𝒙𝑁 ; desired no. of partitions 𝐾
 Goal: Group the examples into 𝐾 “homogeneous” partitions
 Loosely speaking, it is classification without ground truth labels
 A good clustering is one that achieves
 High within-cluster similarity
 Low inter-cluster similarity
In some cases, we may not know the right number of clusters in the data and may want to learn that (techniques exist for doing this but they are beyond our scope)
Similarity can be Subjective
4
 Clustering only looks at similarities b/w inputs, since no labels are given
 Without labels, similarity can be hard to define
 Thus using the right distance/similarity is very important in clustering
 In some sense, related to asking: “Clustering based on what”?
Picture courtesy: http://www.guy-sports.com/humor/videos/powerpoint presentation dogs.htm
Clustering: Some Examples
5
 Document/Image/Webpage Clustering
 Image Segmentation (clustering pixels)
 Clustering web-search results
 Clustering (people) nodes in (social) networks/graphs
 .. and many more..
Picture courtesy: http://people.cs.uchicago.edu/∼pff/segment/
Types of Clustering
6
 Flat or Partitional clustering
 Partitions are independent of each other
 Hierarchical clustering
 Partitions can be visualized using a tree structure (a dendrogram)
In hierarchical clustering, we can look at a clustering for any given number of clusters by "cutting" the dendrogram at an appropriate level (so 𝐾 does not have to be specified)
Hierarchical clustering
gives us a clustering at
multiple levels of
granularity
Flat Clustering: K-means algorithm (Lloyd, 1957)
7
 Input: 𝑁 unlabeled examples 𝒙1, 𝒙2, … , 𝒙𝑁; 𝒙𝑛 ∈ ℝ^𝐷; desired no. of partitions 𝐾
 Desired Output: Cluster assignments of these 𝑁 examples and 𝐾
cluster means
 Initialize: The 𝐾 cluster means denoted by 𝝁1, 𝝁2, … , 𝝁𝐾, with each 𝝁𝑘 ∈ ℝ^𝐷
 Usually initialized randomly, but good initialization is crucial; many smarter
initialization heuristics exist (e.g., K-means++, Arthur & Vassilvitskii, 2007)
 Iterate:
 (Re-)Assign each input 𝒙𝑛 to its closest cluster center (based on the smallest Euclidean distance)
 Update the cluster means (a code sketch of this loop is given at the end of this slide)
𝒞𝑘: Set of
examples assigned
to cluster 𝑘 with
center 𝝁𝑘
Some ways to declare convergence, based on two successive iterations:
- Cluster means don't change
- Cluster assignments don't change
- Clustering "loss" doesn't change by much
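To make the above loop concrete, here is a minimal numpy sketch of Lloyd's algorithm; the function name lloyd_kmeans, the random choice of 𝐾 data points as initial means, and the stopping rule (unchanged assignments) are illustrative choices rather than anything prescribed by the slides.

import numpy as np

def lloyd_kmeans(X, K, max_iters=100, seed=0):
    """K-means (Lloyd's algorithm): alternate the assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    # Initialize means by picking K distinct inputs at random (K-means++ is a smarter choice)
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    z = None
    for _ in range(max_iters):
        # (Re-)assign: each x_n goes to the closest current mean (smallest squared Euclidean distance)
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # N x K matrix of squared distances
        z_new = dists.argmin(axis=1)
        if z is not None and np.array_equal(z_new, z):
            break  # assignments unchanged between two successive iterations => converged
        z = z_new
        # Update: each mean becomes the average of the points currently assigned to it
        for k in range(K):
            if np.any(z == k):          # keep the old mean if a cluster happens to go empty
                mu[k] = X[z == k].mean(axis=0)
    return z, mu

# Example usage on toy 2-D data (illustrative):
# X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
# z, mu = lloyd_kmeans(X, K=3)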
K-means algorithm: Summarized Another Way
8
 Notation: 𝑧𝑛 ∈ {1, 2, … , 𝐾} or 𝒛𝑛 is a 𝐾-dim one-hot vector
 (𝑧𝑛𝑘 = 1 and 𝑧𝑛 = 𝑘 mean the same)
This basic 𝐾-means models each cluster by a single mean 𝜇𝑘 and ignores the size/shape of clusters
Can be fixed by modeling each cluster by a probability distribution, such as a Gaussian (e.g., the Gaussian Mixture Model; will see later)
The 𝐾-means algo can also be seen as doing compression by "quantization": representing each of the 𝑁 inputs by one of the 𝐾 < 𝑁 means (a small numpy illustration follows below)
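To illustrate the quantization view, here is a small numpy sketch (assuming some assignment vector z and mean matrix mu, e.g. as returned by the lloyd_kmeans sketch above; the helper name quantize is hypothetical): each input is replaced by the mean of its cluster, i.e. X is approximated by Z𝝁.

import numpy as np

def quantize(X, z, mu):
    """Compression by quantization: replace each x_n by the mean of its assigned cluster."""
    K = mu.shape[0]
    Z = np.eye(K)[z]          # N x K matrix whose rows are the one-hot vectors z_n
    X_hat = Z @ mu            # reconstruction: row n of X_hat is mu_{z_n}
    return Z, X_hat

# The reconstruction error ||X - X_hat||_F^2 is exactly the K-means loss for (z, mu)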
K-means: An Illustration
9
K-means = LwP (Learning with Prototypes) with Unlabeled Data
10
 Guess the means
 Predict the labels (cluster ids)
 Recompute the means using
these predicted labels
 Repeat until converged
The K-means Algorithm: Some Comments
11
 One of the most popular clustering algorithms
 Very widely used, guaranteed to converge (to a local minimum; will see a proof)
 Can also be used as a sub-routine in graph clustering (in the Spectral
Clustering algorithm): Inputs are given as an 𝑁 × 𝑁 adjacency matrix A
(𝐴𝑛𝑚 = 0/1)
 Perform a spectral decomposition of the graph Laplacian of A to get F (an 𝑁 ×
𝐾 matrix)
 Run 𝐾-means using the rows of the F matrix as the inputs (a sketch of this pipeline is given at the end of this slide)
 Has some shortcomings but can be improved upon, e.g.,
 Can be kernelized (using kernels or using kernel-based landmarks/random features)
 More flexible cluster sizes/shapes via probabilistic models (e.g., every cluster is a Gaussian)
 Soft-clustering (fractional/probabilistic memberships): 𝒛𝑛 is not one-hot but a probability vector that sums to 1 (will see later)
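As a rough sketch of that graph-clustering pipeline (the unnormalized variant of spectral clustering; the helper name spectral_kmeans and the use of scikit-learn's KMeans are assumptions for illustration, not something the slides specify):

import numpy as np
from sklearn.cluster import KMeans  # any K-means implementation works here

def spectral_kmeans(A, K):
    """Cluster graph nodes given an N x N 0/1 adjacency matrix A (unnormalized variant)."""
    D = np.diag(A.sum(axis=1))          # degree matrix
    L = D - A                           # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    F = eigvecs[:, :K]                  # N x K matrix: eigenvectors of the K smallest eigenvalues
    # Run K-means with the rows of F as the inputs; the cluster of row n is the cluster of node n
    return KMeans(n_clusters=K, n_init=10).fit_predict(F)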
What Loss Function is K-means Optimizing?
12
 Let 𝝁1, 𝝁2, … , 𝝁𝐾 be the K cluster centroids/means
 Let 𝑧𝑛𝑘 ∈ {0, 1} be s.t. 𝑧𝑛𝑘 = 1 if 𝒙𝑛 belongs to cluster 𝑘, and 0
otherwise
 Define the distortion or “loss” for the cluster assignment of 𝒙𝑛
 Total distortion over all points defines the 𝐾-means “loss function”
 The 𝐾-means problem is to minimize this objective w.r.t. µ and 𝒁
 Alternating optimization on this loss gives exactly the K-means (Lloyd's) algorithm we saw earlier (the loss is written out explicitly at the end of this slide)
[Figure: the loss viewed as a matrix factorization X ≈ Z𝝁, where X is 𝑁 × 𝐷, row 𝑛 of the 𝑁 × 𝐾 matrix Z is the one-hot vector 𝒛𝑛 = [𝑧𝑛1, 𝑧𝑛2, … , 𝑧𝑛𝐾] encoding the cluster of 𝒙𝑛, and row 𝑘 of the 𝐾 × 𝐷 matrix 𝝁 is 𝝁𝑘]
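For reference, the per-point distortion and the total 𝐾-means loss referred to above can be written (a standard form, consistent with the definitions of 𝑧𝑛𝑘 and 𝝁𝑘 on this slide) as:

\ell(\boldsymbol{x}_n, \boldsymbol{z}_n) = \sum_{k=1}^{K} z_{nk}\, \lVert \boldsymbol{x}_n - \boldsymbol{\mu}_k \rVert_2^2 ,
\qquad
L(\boldsymbol{Z}, \boldsymbol{\mu}) = \sum_{n=1}^{N} \ell(\boldsymbol{x}_n, \boldsymbol{z}_n) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk}\, \lVert \boldsymbol{x}_n - \boldsymbol{\mu}_k \rVert_2^2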
K-means Loss: Several Forms, Same Meaning!
13
 Notation: 𝑿 is 𝑁 × 𝐷
 𝒁 is 𝑁 × 𝐾 (each row is a one-hot 𝒛𝑛, or equivalently 𝑧𝑛 ∈ {1, 2, … , 𝐾})
 𝝁 is 𝐾 × 𝐷 (each row is a 𝝁𝑘)
 Note: Most unsup. learning algos try to minimize a distortion or recon
error
[Equation labels: the distortion on assigning 𝒙𝑛 to cluster 𝑧𝑛, and the total "distortion" or reconstruction error; the corresponding forms are written out at the end of this slide]
Replacing the squared ℓ2 (Euclidean) distance by the ℓ1 distance gives the K-medoids algorithm (more robust to outliers)
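The equivalent forms alluded to in the title, written out in this notation (a standard reconstruction; \lVert \cdot \rVert_F denotes the Frobenius norm):

L(\boldsymbol{Z}, \boldsymbol{\mu})
= \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk}\, \lVert \boldsymbol{x}_n - \boldsymbol{\mu}_k \rVert_2^2
= \sum_{n=1}^{N} \lVert \boldsymbol{x}_n - \boldsymbol{\mu}_{z_n} \rVert_2^2
= \lVert \boldsymbol{X} - \boldsymbol{Z}\boldsymbol{\mu} \rVert_F^2

The last form makes the reconstruction-error reading explicit: row 𝑛 of 𝒁𝝁 is exactly 𝝁_{𝑧𝑛}, the mean used to "reconstruct" 𝒙𝑛.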
Optimizing the K-means Loss Function
14
 So the 𝐾-means problem is the constrained minimization over 𝒁 and 𝝁 written out at the end of this slide
 Can’t optimize it jointly for 𝒁 and 𝝁. Let’s try alternating optimization for
𝒁 and 𝝁
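Concretely, a standard statement of the problem consistent with the loss above, with the one-hot constraint on each row of 𝒁 made explicit:

\min_{\boldsymbol{Z},\, \boldsymbol{\mu}} \;\; \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk}\, \lVert \boldsymbol{x}_n - \boldsymbol{\mu}_k \rVert_2^2
\quad \text{s.t.} \quad z_{nk} \in \{0, 1\} \;\; \forall\, n, k, \qquad \sum_{k=1}^{K} z_{nk} = 1 \;\; \forall\, n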
Solving for Z
15
 Solving for 𝒁 with 𝝁 fixed at its current estimate
 Still not easy: 𝒁 is discrete and the above is an NP-hard problem
 Combinatorial optimization: 𝐾^𝑁 possibilities for 𝒁 (𝑁 × 𝐾 matrix with one-hot rows)
 Greedy approach: Optimize 𝒁 one row (𝒛𝑛) at a time, keeping all the other rows (and the cluster means 𝜇1, 𝜇2, … , 𝜇𝐾) fixed
 Easy to see that this is minimized by assigning 𝒙𝑛 to the closest mean (the resulting update is written out at the end of this slide)
 This is exactly what the 𝐾-means (Lloyd’s) algo does!
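Written out, the resulting assignment step (a standard reconstruction, in the one-hot notation used earlier, with hats denoting the current estimates of the means):

z_{nk} =
\begin{cases}
1 & \text{if } k = \arg\min_{j \in \{1, \dots, K\}} \lVert \boldsymbol{x}_n - \widehat{\boldsymbol{\mu}}_j \rVert_2^2 \\
0 & \text{otherwise}
\end{cases}
\qquad \text{equivalently} \qquad
z_n = \arg\min_{k} \lVert \boldsymbol{x}_n - \widehat{\boldsymbol{\mu}}_k \rVert_2^2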
Solving for 𝝁
16
 Solving for 𝝁 with 𝒁 fixed at its current estimate
 Not difficult to solve (each 𝜇𝑘 is a real-valued vector, can optimize easily)
 Note that each 𝜇𝑘 can be optimized for independently
 (Verify) This is minimized by setting 𝜇𝑘 to be the mean of the points currently in cluster 𝑘 (written out at the end of this slide)
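Written out (a standard reconstruction): setting the gradient of the loss w.r.t. 𝝁𝑘, with 𝒁 fixed, to zero gives the mean of the points in cluster 𝑘:

\frac{\partial}{\partial \boldsymbol{\mu}_k} \sum_{n=1}^{N} \sum_{j=1}^{K} z_{nj}\, \lVert \boldsymbol{x}_n - \boldsymbol{\mu}_j \rVert_2^2
= -2 \sum_{n=1}^{N} z_{nk}\, (\boldsymbol{x}_n - \boldsymbol{\mu}_k) = 0
\;\;\Longrightarrow\;\;
\boldsymbol{\mu}_k = \frac{\sum_{n=1}^{N} z_{nk}\, \boldsymbol{x}_n}{\sum_{n=1}^{N} z_{nk}}
= \frac{1}{\lvert \mathcal{C}_k \rvert} \sum_{n \in \mathcal{C}_k} \boldsymbol{x}_n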
Convergence of K-means algorithm
17
 Each step (updating 𝒁 or 𝝁) can never increase the 𝐾-means loss
 When we update 𝒁 from 𝒁(𝑡−1) to 𝒁(𝑡), the loss cannot increase, because each 𝒙𝑛 gets re-assigned to its closest current mean
 When we update 𝝁 from 𝝁(𝑡−1) to 𝝁(𝑡), the loss cannot increase, because each new 𝝁𝑘 is the optimal (mean) choice for the fixed assignments 𝒁(𝑡)
 Thus the 𝐾-means algorithm monotonically decreases the objective (the two inequalities are written out below)
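The two inequalities, written out (𝐿 is the 𝐾-means loss defined earlier; a standard reconstruction of the argument):

L\big(\boldsymbol{Z}^{(t)}, \boldsymbol{\mu}^{(t-1)}\big) \;\le\; L\big(\boldsymbol{Z}^{(t-1)}, \boldsymbol{\mu}^{(t-1)}\big)
\qquad \text{and} \qquad
L\big(\boldsymbol{Z}^{(t)}, \boldsymbol{\mu}^{(t)}\big) \;\le\; L\big(\boldsymbol{Z}^{(t)}, \boldsymbol{\mu}^{(t-1)}\big)

Chaining these shows the loss never increases across iterations; since it is bounded below by 0 and there are only finitely many possible assignment matrices 𝒁, the algorithm must converge (to a local optimum of the objective).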
K-means: Choosing K
18
 One way to select 𝐾 for the 𝐾-means algorithm is to try different values of 𝐾, plot the 𝐾-means objective versus 𝐾, and look at the "elbow-point" (a code sketch of this procedure is given at the end of this slide)
 Can also use an information criterion such as AIC (Akaike Information Criterion) and choose the 𝐾 which gives the smallest AIC (small loss + large 𝐾 values penalized)
[Plot: 𝐾-means objective vs. 𝐾 for an example dataset; 𝐾 = 6 is the elbow point]
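A minimal sketch of the elbow procedure, assuming scikit-learn's KMeans (whose inertia_ attribute is the fitted 𝐾-means objective); the candidate range of 𝐾 and the plotting code are illustrative:

from sklearn.cluster import KMeans

def elbow_curve(X, K_values):
    """Fit K-means for each candidate K and record the final objective (inertia)."""
    losses = []
    for K in K_values:
        km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
        losses.append(km.inertia_)  # sum of squared distances of points to their closest mean
    return losses

# Example usage (illustrative): plot the objective vs. K and look for the "elbow"
# import numpy as np
# import matplotlib.pyplot as plt
# X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [6, 0], [0, 6])])
# Ks = list(range(1, 11))
# plt.plot(Ks, elbow_curve(X, Ks), marker="o"); plt.xlabel("K"); plt.ylabel("K-means objective"); plt.show()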
Coming up next
19
 Improvements to K-means
 Soft-assignments
 Handling complex cluster shapes (basic K-means assumes spherical
clusters)
 Evaluating clustering algorithms (how to evaluate without true labels)
 Probabilistic approaches to clustering