Machine Learning
Unsupervised Learning: Clustering
Peter Chen
HyperPlanar, Chief Data Scientist
•20 Years Industry Experience –
Quantitative Investment Management,
Startups, Retail, Consulting, Software,
Energy, etc.
•Petco, Sempra Energy, Mitchell
International, EMC, Vistage, etc.
• Analytics & Data Science consulting
across a number of industries & clients
•B.S. from M.I.T.; graduate degrees from
Harvard University
•What is Clustering?
•K-means Algorithms
•Gaussian Mixture Models
•Hierarchical Clustering
•Methods for Selecting the number of clusters
•Evaluating the Quality of the Clustering: Silhouette Plots
•Applications in Industry
Topics
Prerequisites
•Python Programming
o Understand and read basic Python code
o Know the pandas, numpy, and matplotlib libraries
•Basic Mathematics
o Basic Probability
o Statistics
•Provides a basic conceptual understanding of how clustering works
•Provides intuitive understanding of the mathematics behind various
clustering algorithms
•Walks through Python code examples showing how to use various clustering
algorithms
•Shows how clustering is applied in various industry applications
What am I going to get from this course?
SECTION 1
Course Overview and Introductions
• What is Clustering?
• K-means clustering
• Gaussian Mixture Models (GMM)
• Hierarchical Clustering
• How to select the best number of clusters
• Industry Applications
Course Overview
• Grouping/Clustering is such a natural thing that humans do it
all the time!
•Kids separate Halloween candies by type
What is clustering?
• Biologists classify and group animals using
the system below
What is clustering?
• Watching tons of videos and instantly recognizing they are all different
types of cats. (Note: A nontrivial amount of time on the internet is spent
watching cat videos! )
What is clustering?
•Can we teach machines to do clustering in a somewhat
automated fashion?
•Can a machine find groupings of similar things without us
explicitly telling it how to do it?
Big Question?
•Yes!
•Using the machine learning technique of clustering, we can save
time and money.
•For example, suppose we have a million images in our
database and we want to label them automatically
•Hiring people to manually review the million images would
be costly and time-intensive
Big Answer
• We feed the clustering algorithm thousands or even millions of
images and let it group/cluster/categorize them, for example into
types of cats
Big Answer
• Create groups with high similarity among the members of the
same cluster
•Create groups with significant dissimilarity (differences)
between members of two different clusters
More exact definition of Clustering
SECTION 2
K-Means Clustering
•Partitions the data set into k clusters
•Each data point belongs to the cluster with the nearest mean
K-means Clustering: High level Idea
•Create some sample data with 5 clusters
K-means Algorithm: Code samples
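The code from this slide is not reproduced in the transcript. A minimal sketch of one way to create such data with NumPy is shown below; the seed, the number of points per cluster, the range of the centers, and the noise scale are all illustrative assumptions, not values from the course. scikit-learn's `make_blobs` is a common alternative.

```python
import numpy as np

# Assumed settings (not from the slides): 5 clusters in 2-D, 100 points each.
rng = np.random.default_rng(seed=0)
n_clusters, points_per_cluster = 5, 100

# Draw one random center per cluster inside the square [-10, 10] x [-10, 10].
centers = rng.uniform(low=-10, high=10, size=(n_clusters, 2))

# Scatter points around each center with isotropic Gaussian noise.
X = np.vstack([
    center + rng.normal(scale=0.8, size=(points_per_cluster, 2))
    for center in centers
])

# Record which center generated each point (k-means itself never sees this).
true_labels = np.repeat(np.arange(n_clusters), points_per_cluster)

print(X.shape)  # (500, 2)
```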
•Plot the same 5 clusters
K-means Algorithm: Code samples
•K-means automatically identifies the 5 clusters (color coded)
K-means Algorithm: Code samples
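The slide's code is missing from the transcript. The sketch below shows how this step might look with scikit-learn's `KMeans`, assuming scikit-learn is installed; the synthetic data and every parameter value are illustrative choices rather than the course's own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 5 Gaussian blobs (all parameters chosen for illustration).
X, y_true = make_blobs(n_samples=500, centers=5, cluster_std=0.6, random_state=0)

# Fit k-means with k=5; n_init=10 reruns the algorithm from 10 different
# initializations and keeps the best result.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_.shape)  # (5, 2)
print(np.bincount(labels))            # number of points assigned to each cluster
```

The `labels` array can be passed as the color argument of a scatter plot to reproduce a color-coded figure like the one the slide describes.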
How does k-means do that?
K-means Algorithm
• Pick k random cluster centers
• Repeat until converged:
Expectation step: Assign each point to the nearest cluster center
Maximization step: Set each cluster center to the mean of the points assigned to it
k-means Algorithm: Expectation-Maximization
• Step 1: Decide the number of clusters (k)
• Step 2: Randomly initialize the center (centroid) of each cluster
• Step 3: Calculate the distance of each observation from each cluster centroid
• Step 4: Assign each observation to the cluster whose centroid is nearest
• Step 5: Recalculate each cluster centroid as the mean of ALL the observations in the cluster
• Step 6: Repeat the process starting from Step 3
• Step 7: Stop when no observation is reassigned from one cluster to another
k-means Algorithm: Step by Step
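The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code: the function name `kmeans`, the choice to initialize centroids from k randomly selected observations, the `max_iter` safeguard, and the handling of empty clusters are all assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means following Steps 1-7; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct observations as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every observation to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 4: assign each observation to its nearest centroid.
        new_labels = dists.argmin(axis=1)
        # Step 7: stop when no observation changed cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Tiny example: two obvious groups on a line.
X = np.array([[0.0], [0.2], [0.4], [10.0], [10.2], [10.4]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids.ravel())
```

Which group ends up labeled 0 or 1 depends on the random initialization; only the grouping itself is meaningful.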
• We need to normalize our data points to get the clustering
right, because the input features are usually on different scales.
•In Python code (min-max scaling, where df is a pandas DataFrame of numeric columns):
•df_norm = (df - df.min()) / (df.max() - df.min())
k-means Algorithm: Normalizing
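A small worked example may make the effect of this scaling concrete. The column names and values below are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data: two features on very different scales.
df = pd.DataFrame({
    "age": [20, 30, 40, 60],
    "income": [20_000, 50_000, 80_000, 120_000],
})

# Min-max scaling: each column is mapped linearly onto [0, 1].
df_norm = (df - df.min()) / (df.max() - df.min())
print(df_norm)
```

After scaling, both columns span the same range, so neither dominates the distance computation. Note that a column with a constant value would produce a division by zero (NaN) under this formula and needs separate handling.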
Similarity Measures
• We have to define the distance between two points
• Here are some popular distance measures:
1) Euclidean distance
2) Manhattan distance
3) Minkowski distance
k-means Algorithm: Similarity measures
• Square root of the sum of squared differences between the components of the two vectors
Similarity measures: Euclidean distance
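The formula shown on the slide is not reproduced in the transcript. The standard definition, for two n-dimensional vectors a and b (the symbols are chosen here for illustration), is:

```latex
d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
```

For example, for a = (0, 0) and b = (3, 4) the distance is sqrt(9 + 16) = 5.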
• Sum of the absolute differences between the components of the two vectors
Similarity measures: Manhattan distance
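The slide's formula is missing from the transcript. The standard definition, for two n-dimensional vectors a and b (the symbols are chosen here for illustration), is:

```latex
d(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \lvert a_i - b_i \rvert
```

For example, for a = (0, 0) and b = (3, 4) the distance is 3 + 4 = 7.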
• Generalized distance metric with a parameter p: with p=1 it becomes the
Manhattan distance, with p=2 the Euclidean distance, etc.
Similarity measures: Minkowski distance
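A short sketch can confirm the two special cases numerically. The helper name `minkowski` and the example vectors are illustrative assumptions; SciPy provides a comparable function in `scipy.spatial.distance`.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a, b = [0, 0], [3, 4]
print(minkowski(a, b, p=1))  # Manhattan: |3| + |4| = 7.0
print(minkowski(a, b, p=2))  # Euclidean: sqrt(9 + 16) = 5.0
```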
• Every distance metric must satisfy the following mathematical
properties:
• d(a, b) ≥ 0 Non-negativity: the distance between two points is never negative.
• d(a, b) = 0 ↔ a = b Identity: the distance is zero if and only if the two points are the same.
• d(a, b) = d(b, a) Symmetry.
• d(a, c) ≤ d(a, b) + d(b, c) Triangle inequality: a detour through b is never
shorter than going directly from a to c.
Can we just invent any distance metric?
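To see that not every plausible-looking function qualifies, the sketch below checks the triangle inequality for one triple of points. It compares the Euclidean distance with the squared Euclidean distance, which is non-negative and symmetric but fails the triangle inequality. The helper names are illustrative assumptions.

```python
import numpy as np

def euclidean(a, b):
    """Ordinary Euclidean distance between two vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def squared_euclidean(a, b):
    """Squared Euclidean distance: looks like a distance, but is not a metric."""
    return euclidean(a, b) ** 2

def satisfies_triangle(d, a, b, c):
    """Check d(a, c) <= d(a, b) + d(b, c) for one triple of points."""
    return d(a, c) <= d(a, b) + d(b, c)

# Three points on a line at positions 0, 1, and 2.
a, b, c = [0.0], [1.0], [2.0]
print(satisfies_triangle(euclidean, a, b, c))          # True:  2 <= 1 + 1
print(satisfies_triangle(squared_euclidean, a, b, c))  # False: 4 >  1 + 1
```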
Issues
•A globally optimal result is not guaranteed; the algorithm may converge to a local optimum
•The number of clusters must be selected beforehand
•Limited to linear cluster boundaries (hard spherical boundaries)
•Can be slow for large numbers of samples
k-means Algorithm: Issues
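The first issue can be reproduced on a tiny example. The sketch below runs the k-means updates from two different starting points on the same four observations; the function name `lloyd`, the data, and both sets of initial centroids are illustrative assumptions. One start reaches the natural left/right split, while the other gets stuck in a much worse top/bottom split.

```python
import numpy as np

def lloyd(X, centroids, n_iter=20):
    """Run k-means updates from given initial centroids; return labels and SSE."""
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its points.
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Sum of squared distances from each point to its assigned centroid.
    sse = float(np.sum((X - centroids[labels]) ** 2))
    return labels, sse

# Four corners of a wide rectangle; the natural split is left vs. right.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

_, sse_good = lloyd(X, centroids=[[0.0, 0.5], [4.0, 0.5]])
_, sse_bad = lloyd(X, centroids=[[2.0, 0.0], [2.0, 1.0]])
print(sse_good, sse_bad)  # 1.0 16.0
```

A common remedy is to run the algorithm from several random initializations and keep the result with the lowest sum of squared distances.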
• Please see the IPython “Clustering Mini-Project” notebook on the course website
Mini-Project: Complete Code Solution
Section Name here 33

More Related Content

What's hot

K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
Carlos Castillo (ChaTo)
 
Introduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-LearnIntroduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-Learn
Amol Agrawal
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
Krish_ver2
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learn
Matt Hagy
 
Example of iterative deepening search & bidirectional search
Example of iterative deepening search & bidirectional searchExample of iterative deepening search & bidirectional search
Example of iterative deepening search & bidirectional search
Abhijeet Agarwal
 
Hyperparameter Optimization with Hyperband Algorithm
Hyperparameter Optimization with Hyperband AlgorithmHyperparameter Optimization with Hyperband Algorithm
Hyperparameter Optimization with Hyperband Algorithm
Deep Learning Italia
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
Ashwin Shenoy M
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
Krish_ver2
 
K-means Clustering with Scikit-Learn
K-means Clustering with Scikit-LearnK-means Clustering with Scikit-Learn
K-means Clustering with Scikit-Learn
Sarah Guido
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
Unsupervised Learning
Unsupervised LearningUnsupervised Learning
Unsupervised Learning
SAHEEL FAL DESAI
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish Garg
 
Training machine learning knn 2017
Training machine learning knn 2017Training machine learning knn 2017
Training machine learning knn 2017
Iwan Sofana
 
Volume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCL
Volume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCLVolume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCL
Volume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCL
Nitesh Bhatia
 
Nearest neighbour algorithm
Nearest neighbour algorithmNearest neighbour algorithm
Nearest neighbour algorithm
Anmitas1
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means ClusteringJunghoon Kim
 
Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)
Benjamin Bengfort
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
Pravinkumar Landge
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
Subhas Kumar Ghosh
 

What's hot (20)

K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
 
Introduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-LearnIntroduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-Learn
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learn
 
Example of iterative deepening search & bidirectional search
Example of iterative deepening search & bidirectional searchExample of iterative deepening search & bidirectional search
Example of iterative deepening search & bidirectional search
 
KNN
KNNKNN
KNN
 
Hyperparameter Optimization with Hyperband Algorithm
Hyperparameter Optimization with Hyperband AlgorithmHyperparameter Optimization with Hyperband Algorithm
Hyperparameter Optimization with Hyperband Algorithm
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
K-means Clustering with Scikit-Learn
K-means Clustering with Scikit-LearnK-means Clustering with Scikit-Learn
K-means Clustering with Scikit-Learn
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Unsupervised Learning
Unsupervised LearningUnsupervised Learning
Unsupervised Learning
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
 
Training machine learning knn 2017
Training machine learning knn 2017Training machine learning knn 2017
Training machine learning knn 2017
 
Volume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCL
Volume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCLVolume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCL
Volume Rendering of Unstructured Tetrahedral Grids using Intel / nVidia OpenCL
 
Nearest neighbour algorithm
Nearest neighbour algorithmNearest neighbour algorithm
Nearest neighbour algorithm
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
 
Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)Dynamics in graph analysis (PyData Carolinas 2016)
Dynamics in graph analysis (PyData Carolinas 2016)
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 

Similar to Unsupervised Learning: Clustering

Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
Soumya Mukherjee
 
background.pptx
background.pptxbackground.pptx
background.pptx
KabileshCm
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
talktoharry
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
Poonam Kshirsagar
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
Damian R. Mingle, MBA
 
Ml ppt at
Ml ppt atMl ppt at
Ml ppt at
pradeep kumar
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
Madan Golla
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
Te-Yen Liu
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
IRJET Journal
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
Editor IJCATR
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...KamleshKumar394
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and Innovations
QuantUniversity
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and Applications
QuantUniversity
 
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET Journal
 
K means report
K means reportK means report
K means report
Gaurav Handa
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
Manish Pandey
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...butest
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
WeCloudData
 
Evaluation of programs codes using machine learning
Evaluation of programs codes using machine learningEvaluation of programs codes using machine learning
Evaluation of programs codes using machine learning
Vivek Maskara
 

Similar to Unsupervised Learning: Clustering (20)

Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
background.pptx
background.pptxbackground.pptx
background.pptx
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
 
Ml ppt at
Ml ppt atMl ppt at
Ml ppt at
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and Innovations
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and Applications
 
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
 
K means report
K means reportK means report
K means report
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
 
Evaluation of programs codes using machine learning
Evaluation of programs codes using machine learningEvaluation of programs codes using machine learning
Evaluation of programs codes using machine learning
 

More from Experfy

Predictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life InsurancePredictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life Insurance
Experfy
 
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Experfy
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
Experfy
 
Apache HBase Crash Course - Quick Tutorial
Apache HBase Crash Course - Quick Tutorial Apache HBase Crash Course - Quick Tutorial
Apache HBase Crash Course - Quick Tutorial
Experfy
 
Machine Learning in AI
Machine Learning in AIMachine Learning in AI
Machine Learning in AI
Experfy
 
A Gentle Introduction to Genomics
A Gentle Introduction to GenomicsA Gentle Introduction to Genomics
A Gentle Introduction to Genomics
Experfy
 
A Comprehensive Guide to Insurance Technology - InsurTech
A Comprehensive Guide to Insurance Technology - InsurTechA Comprehensive Guide to Insurance Technology - InsurTech
A Comprehensive Guide to Insurance Technology - InsurTech
Experfy
 
Health Insurance 101
Health Insurance 101Health Insurance 101
Health Insurance 101
Experfy
 
Financial Derivatives
Financial Derivatives Financial Derivatives
Financial Derivatives
Experfy
 
AI for executives
AI for executives AI for executives
AI for executives
Experfy
 
Cloud Native Computing Foundation: How Virtualization and Containers are Chan...
Cloud Native Computing Foundation: How Virtualization and Containers are Chan...Cloud Native Computing Foundation: How Virtualization and Containers are Chan...
Cloud Native Computing Foundation: How Virtualization and Containers are Chan...
Experfy
 
Microsoft Azure Power BI
Microsoft Azure Power BIMicrosoft Azure Power BI
Microsoft Azure Power BI
Experfy
 
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
Experfy
 
Sales Forecasting
Sales ForecastingSales Forecasting
Sales Forecasting
Experfy
 
Uncertain Knowledge and Reasoning in Artificial Intelligence
Uncertain Knowledge and Reasoning in Artificial IntelligenceUncertain Knowledge and Reasoning in Artificial Intelligence
Uncertain Knowledge and Reasoning in Artificial Intelligence
Experfy
 
Introduction to Healthcare Analytics
Introduction to Healthcare Analytics Introduction to Healthcare Analytics
Introduction to Healthcare Analytics
Experfy
 
Blockchain Technology Fundamentals
Blockchain Technology FundamentalsBlockchain Technology Fundamentals
Blockchain Technology Fundamentals
Experfy
 
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
Experfy
 
Apache Spark SQL- Installing Spark
Apache Spark SQL- Installing SparkApache Spark SQL- Installing Spark
Apache Spark SQL- Installing Spark
Experfy
 
Econometric Analysis | Methods and Applications
Econometric Analysis | Methods and ApplicationsEconometric Analysis | Methods and Applications
Econometric Analysis | Methods and Applications
Experfy
 

More from Experfy (20)

Predictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life InsurancePredictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life Insurance
 
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
 
Apache HBase Crash Course - Quick Tutorial
Apache HBase Crash Course - Quick Tutorial Apache HBase Crash Course - Quick Tutorial
Apache HBase Crash Course - Quick Tutorial
 
Machine Learning in AI
Machine Learning in AIMachine Learning in AI
Machine Learning in AI
 
A Gentle Introduction to Genomics
A Gentle Introduction to GenomicsA Gentle Introduction to Genomics
A Gentle Introduction to Genomics
 
A Comprehensive Guide to Insurance Technology - InsurTech
A Comprehensive Guide to Insurance Technology - InsurTechA Comprehensive Guide to Insurance Technology - InsurTech
A Comprehensive Guide to Insurance Technology - InsurTech
 
Health Insurance 101
Health Insurance 101Health Insurance 101
Health Insurance 101
 
Financial Derivatives
Financial Derivatives Financial Derivatives
Financial Derivatives
 
AI for executives
AI for executives AI for executives
AI for executives
 
Cloud Native Computing Foundation: How Virtualization and Containers are Chan...
Cloud Native Computing Foundation: How Virtualization and Containers are Chan...Cloud Native Computing Foundation: How Virtualization and Containers are Chan...
Cloud Native Computing Foundation: How Virtualization and Containers are Chan...
 
Microsoft Azure Power BI
Microsoft Azure Power BIMicrosoft Azure Power BI
Microsoft Azure Power BI
 
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
 
Sales Forecasting
Sales ForecastingSales Forecasting
Sales Forecasting
 
Uncertain Knowledge and Reasoning in Artificial Intelligence
Uncertain Knowledge and Reasoning in Artificial IntelligenceUncertain Knowledge and Reasoning in Artificial Intelligence
Uncertain Knowledge and Reasoning in Artificial Intelligence
 
Introduction to Healthcare Analytics
Introduction to Healthcare Analytics Introduction to Healthcare Analytics
Introduction to Healthcare Analytics
 
Blockchain Technology Fundamentals
Blockchain Technology FundamentalsBlockchain Technology Fundamentals
Blockchain Technology Fundamentals
 
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
 
Apache Spark SQL- Installing Spark
Apache Spark SQL- Installing SparkApache Spark SQL- Installing Spark
Apache Spark SQL- Installing Spark
 
Econometric Analysis | Methods and Applications
Econometric Analysis | Methods and ApplicationsEconometric Analysis | Methods and Applications
Econometric Analysis | Methods and Applications
 

Recently uploaded

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx


Unsupervised Learning: Clustering

  • 2. Machine Learning. Unsupervised Learning: Clustering. Peter Chen, Chief Data Scientist, HyperPlanar
  • 3. Peter Chen, Chief Data Scientist, HyperPlanar
      • 20 years of industry experience: quantitative investment management, startups, retail, consulting, software, energy, etc.
      • Petco, Sempra Energy, Mitchell International, EMC, Vistage, etc.
      • Analytics and data science consulting across a number of industries and clients
      • B.S. from M.I.T.; graduate degrees from Harvard University
  • 4. Topics
      • What is Clustering?
      • K-means Algorithms
      • Gaussian Mixture Models
      • Hierarchical Clustering
      • Methods for Selecting the Number of Clusters
      • Evaluating the Quality of the Clustering: Silhouette Plots
      • Applications in Industry
  • 5. Prerequisites
      • Python Programming
          o Understand and read basic Python code
          o Know the pandas, numpy, and matplotlib libraries
      • Basic Mathematics
          o Basic probability
          o Statistics
  • 6. What am I going to get from this course?
      • A basic conceptual understanding of how clustering works
      • An intuitive understanding of the mathematics behind various clustering algorithms
      • A walk-through of Python code examples showing how to use various clustering algorithms
      • A look at how clustering is applied in various industries
  • 7. SECTION 1: Course Overview and Introductions
  • 8. Course Overview
      • What is Clustering?
      • K-means Clustering
      • Gaussian Mixture Models (GMM)
      • Hierarchical Clustering
      • How to select the best number of clusters
      • Industry Applications
  • 9. What is clustering?
      • Grouping/clustering is such a natural thing that humans do it all the time!
      • Kids separate Halloween candies by type
  • 10. What is clustering?
      • Biologists classify and group animals using the system shown on the slide
  • 11. What is clustering?
      • Watching tons of videos and instantly recognizing that they all show different types of cats. (Note: a nontrivial amount of time on the internet is spent watching cat videos!)
  • 12. Big Question
      • Can we teach machines to do clustering in a somewhat automated fashion?
      • Can a machine find groupings of similar things without us explicitly telling it how to do it?
  • 13. Big Answer
      • Yes!
      • Using the machine learning technique of clustering, we can save time and money.
      • For example, suppose we have a million images in our database and we want to label them automatically.
      • Hiring people to manually review the million images would be costly and time-intensive.
  • 14. Big Answer
      • We feed the clustering algorithm thousands or millions of images and let it group/cluster/categorize them, e.g. into cats
  • 15. More exact definition of Clustering
      • Create groups with high similarity among the members of the same cluster
      • Create groups with significant dissimilarity (differences) between members of different clusters
  • 17. K-means Clustering: High-level idea
      • Partitions the data set into k clusters
      • Each data point belongs to the cluster with the nearest mean
  • 18. K-means Algorithm: Code samples
      • Create some sample data with 5 clusters
  • 19. K-means Algorithm: Code samples
      • Plot the same 5 clusters
  • 20. K-means Algorithm: Code samples
      • K-means automatically identifies the 5 clusters (color coded)
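The code shown on slides 18 and 19 did not survive in this transcript. The block below is only a minimal sketch of how such sample data might be created, assuming plain numpy; the function name `make_sample_clusters` and all of its parameters (points per cluster, spread, seed, the range of the centers) are illustrative assumptions, not the course's own code.

```python
import numpy as np

def make_sample_clusters(n_per_cluster=60, n_clusters=5, spread=0.6, seed=0):
    """Draw 2-D points around randomly placed cluster centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Place the true centers uniformly in a square; the range is an arbitrary choice.
    centers = rng.uniform(-10, 10, size=(n_clusters, 2))
    # Around each center, draw Gaussian noise with the given spread.
    points = np.vstack([
        rng.normal(loc=center, scale=spread, size=(n_per_cluster, 2))
        for center in centers
    ])
    # Remember which cluster each point was generated from.
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    return points, labels, centers

X, true_labels, true_centers = make_sample_clusters()
print(X.shape)  # (300, 2)
```

With matplotlib available, `plt.scatter(X[:, 0], X[:, 1], c=true_labels)` would give a color-coded scatter plot of the generated points.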
  • 21. K-means Algorithm
      • How does k-means do that?
  • 22. K-means Algorithm: Expectation-Maximization
      • Pick k random cluster centers
      • Repeat until converged:
          o Expectation step: assign each point to the nearest cluster center
          o Maximization step: set each cluster center to the mean of the points assigned to it
  • 23. K-means Algorithm: Step by step
      • Step 1: Decide the number of clusters (k).
      • Step 2: Randomly choose an initial center (centroid) for each cluster.
      • Step 3: Calculate the distance from each observation to each cluster centroid.
      • Step 4: Assign each observation to the cluster whose centroid is nearest.
      • Step 5: Recalculate each cluster centroid as the mean of all the observations in the cluster.
      • Step 6: Repeat the process starting from Step 3.
      • Step 7: Stop when no observation is reassigned from one cluster to another.
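The steps above can be sketched in a few lines of numpy. This is an illustrative sketch, not the course's implementation: the function name `kmeans`, the `max_iter` cap, choosing the initial centroids from among the observations, and keeping the old centroid when a cluster becomes empty are all assumptions made here.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct observations as the initial centroids (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: distance from every observation to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 4: assign each observation to its nearest centroid.
        new_labels = dists.argmin(axis=1)
        # Step 7: stop when no observation changed cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: move each centroid to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.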
  • 24. K-means Algorithm: Normalizing
      • We need to normalize our data points to get the clustering right, because the input features are usually on different scales and the feature with the largest range would otherwise dominate the distance.
      • Min-max scaling in Python code: df_norm = (df - df.min()) / (df.max() - df.min())
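As a small worked example of the normalization formula on this slide, the sketch below applies it to a made-up pandas DataFrame. The column names and values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical data: two features on very different scales.
df = pd.DataFrame({
    "age": [20, 30, 40, 60],
    "income": [20000, 50000, 80000, 120000],
})

# Min-max scaling, column by column: every feature ends up in [0, 1].
df_norm = (df - df.min()) / (df.max() - df.min())

# The age column becomes 0.0, 0.25, 0.5, 1.0.
print(df_norm)
```

Note that a column whose values are all identical has `max - min` equal to zero, so this formula produces NaN for it; such columns would need to be dropped or handled separately.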
  • 26. K-means Algorithm: Similarity measures
      • We have to define the distance between two points
      • Here are some popular distance measures:
          1) Euclidean distance
          2) Manhattan distance
          3) Minkowski distance
  • 27. Similarity measures: Euclidean distance
      • Square root of the sum of the squared differences between the coordinates of the two vectors
  • 28. Similarity measures: Manhattan distance
      • Sum of the absolute differences between the coordinates of the two vectors
  • 29. Similarity measures: Minkowski distance
      • Generalized distance metric with a parameter p: p = 1 gives the Manhattan distance, p = 2 gives the Euclidean distance, etc.
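The formulas on these slides were images and are not in the transcript. The standard Minkowski distance is d(a, b) = (sum over i of |a_i - b_i|^p)^(1/p), and the sketch below shows how one function covers all three measures. The function name `minkowski` is chosen here for illustration.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance of order p between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a, b = [0, 0], [3, 4]
print(minkowski(a, b, p=1))  # 7.0  (Manhattan: |3| + |4|)
print(minkowski(a, b, p=2))  # 5.0  (Euclidean: sqrt(3^2 + 4^2))
```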
  • 30. Can we just invent any distance metric?
      • All distance metrics must satisfy the following mathematical properties:
          o d(a, b) ≥ 0: the distance between two points must be non-negative.
          o d(a, b) = 0 ↔ a = b: the distance between two points is zero iff they are the same point.
          o d(a, b) = d(b, a): symmetry.
          o d(a, c) ≤ d(a, b) + d(b, c): triangle inequality. The shortest distance between two points is a straight line.
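One way to build intuition for these properties is to test a candidate distance function on a handful of points. The sketch below is a hypothetical helper (`violates_metric_axioms` is not from the course) and only a spot check on sample points, not a proof. It shows that the Euclidean distance passes, while the squared Euclidean distance breaks the triangle inequality.

```python
import itertools
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def violates_metric_axioms(dist, points, tol=1e-9):
    """Return the name of the first violated property, or None (spot check only)."""
    for a, b in itertools.product(points, repeat=2):
        d = dist(a, b)
        if d < 0:
            return "non-negativity"
        # Zero distance should occur exactly when the points coincide.
        if (d <= tol) != bool(np.allclose(a, b)):
            return "identity"
        if abs(d - dist(b, a)) > tol:
            return "symmetry"
    for a, b, c in itertools.product(points, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            return "triangle inequality"
    return None

rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))
print(violates_metric_axioms(euclidean, points))  # None

# Squared Euclidean distance: d(0, 2) = 4 but d(0, 1) + d(1, 2) = 2.
squared = lambda a, b: euclidean(a, b) ** 2
line_points = np.array([[0.0], [1.0], [2.0]])
print(violates_metric_axioms(squared, line_points))  # triangle inequality
```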
  • 32. K-means Algorithm: Issues
      • The globally optimal result may not be achieved
      • The number of clusters must be selected beforehand
      • Limited to linear cluster boundaries (hard spherical boundaries)
      • Can be slow for large numbers of samples
  • 33. Mini-Project: Complete Code Solution
      • Please see the IPython “Clustering Mini-Project” notebook on the course website
