BAS 250
Lesson 3: K-Means Clustering
 Effectively employ the CRISP-DM method
 Develop a k-means cluster data mining model
 Interpret output generated by model
This Week’s Learning Objectives
 Explain what k-Means clusters are, how they are
found, and their benefits
 Demonstrate the necessary format for data in
order to create k-Means clusters
 Interpret the clusters generated by a k-Means
model and explain their significance, if any
K-Means Clustering
 Clustering means: “Grouping of data or dividing a large data set
into smaller data sets of some similarity”
 The “k” in k-Means clustering stands for the number of groups, or
clusters. You choose this number yourself, but the grouping is
unsupervised learning: no labeled outcomes are supplied.
 Enables the user to find natural groups within a data set by
comparing observations to the means of their attribute values
 Means are susceptible to undue influence by extreme outliers, so
watching for inconsistent data is very important with k-Means
K-Means Clustering
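To see how strongly a single extreme value can pull a mean, here is a small sketch in Python. The weight values are made up for illustration and do not come from the course data set; the outlier is assumed to be a data-entry error.

```python
from statistics import mean, median

# Hypothetical weights (lbs) for five policy holders; not from the course data.
weights = [150, 155, 160, 165, 170]
# The same list with one assumed data-entry error (1600 instead of 160).
weights_with_outlier = [150, 155, 1600, 165, 170]

clean_mean = mean(weights)
skewed_mean = mean(weights_with_outlier)
skewed_median = median(weights_with_outlier)

print(f"mean without outlier: {clean_mean:.1f}")
print(f"mean with outlier:    {skewed_mean:.1f}")
print(f"median with outlier:  {skewed_median:.1f}")
```

One bad value nearly triples the mean while the median barely moves, which is why inconsistent values are worth checking before clustering on means.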
 The k-Means algorithm starts from k sampled observations, assigns every
other observation to the nearest one, and computes each group’s means
 The process is repeated in order to ‘circle in’ on the best matches:
observations are reassigned to the closest group mean and the means are
recomputed until they stop changing, at which point the groups become
the clusters
 Sometimes takes a while to run, especially if using a large number
of ‘max runs’ or seeking a large number of clusters (k)
K-Means Clustering
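The assign-and-recompute loop described above can be sketched in plain Python. This is a minimal illustration of the standard k-means procedure, not the implementation used by the course software; the function names, the `seed` parameter, and the six (weight, cholesterol) points are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, labels) for `points`, a list of numeric tuples."""
    rng = random.Random(seed)
    # Start from k observations sampled from the data.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each observation to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # the means stopped changing
        centroids = new_centroids
    return centroids, labels

# Hypothetical (weight, cholesterol) observations forming two obvious groups.
points = [(110, 120), (115, 125), (112, 118), (190, 230), (195, 240), (188, 235)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The `max_iterations` cap plays a similar role to a run limit: a larger cap or a larger k means more distance calculations, which is where the running time goes on big data sets.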
 For every business problem going forward, you will work
through the CRISP-DM method to complete your work.
K-Means Clustering: CRISP-DM
 Context:
o You work for a major health insurance provider and have been
asked to create a weight and cholesterol management program
for policy holders to reduce policy payouts due to heart disease.
You have a limited budget to communicate with potential policy
holders who would benefit from such a program. Your message
must be targeted to those who are most at risk for heart disease
due to weight issues and high cholesterol.
K-Means Clustering
 Business Understanding:
o You will need to search through thousands of
policy holders to find groups of people with similar
characteristics and develop programs and
communications that will be relevant to these
groups.
K-Means Clustering
 Data Understanding:
o Instead of searching thousands of policy holders, you have access to a clean sample of
roughly 550. There are 3 attributes. Each row is a policy holder. If gender = 1, then male. If
gender = 0, then female.
K-Means Clustering
 Data Preparation:
o None of the values seem to be inconsistent.
 No missing values and the standard deviations are reasonable.
K-Means Clustering
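The two checks named on this slide, missing values and standard deviations, can be sketched in Python as follows. The rows and column names are hypothetical, and one missing value is planted on purpose to show what the check would report; the course data set is described as having none.

```python
from statistics import mean, stdev

# Hypothetical (weight, cholesterol, gender) rows; None marks a missing value.
rows = [
    (150, 180, 1),
    (165, 210, 0),
    (None, 195, 1),
    (140, 170, 0),
]
columns = ["weight", "cholesterol", "gender"]

missing = {}
summary = {}
for i, name in enumerate(columns):
    values = [row[i] for row in rows]
    present = [v for v in values if v is not None]
    # Count the gaps, then summarize only the values that are present.
    missing[name] = len(values) - len(present)
    summary[name] = (mean(present), stdev(present))

for name in columns:
    avg, sd = summary[name]
    print(f"{name}: missing={missing[name]} mean={avg:.2f} stdev={sd:.2f}")
```

A standard deviation that is very large relative to the mean is a hint that an attribute may contain outliers worth inspecting.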
 Data Modeling:
o We will use k-means clustering to determine the natural
groups. We will not be predicting who will have heart
disease, as k-means is not predictive.
o We want more than 2 clusters (high and low risk of
heart disease), as there are likely a number of different
types of groups.
o For this exercise, we will use 4 potential groups.
K-Means Clustering
 Once we run the cluster process…below is the output.
Observations:
• Clusters are fairly balanced.
• We will keep these groups for evaluation.
K-Means Clustering
Evaluation:
 The Centroid Table shows the means for each attribute in the four (k)
clusters
K-Means Clustering
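A centroid table is simply the per-cluster mean of every attribute. The sketch below computes one in Python from rows that already carry a cluster label; the rows and their values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict

# Hypothetical (weight, cholesterol, gender) rows with assigned cluster labels.
labeled_rows = [
    ((120, 130, 0), "cluster_0"),
    ((124, 134, 0), "cluster_0"),
    ((185, 220, 1), "cluster_3"),
    ((191, 228, 1), "cluster_3"),
]

# Collect the rows belonging to each cluster.
groups = defaultdict(list)
for row, label in labeled_rows:
    groups[label].append(row)

# Average each attribute column within each cluster.
centroid_table = {
    label: tuple(sum(col) / len(col) for col in zip(*members))
    for label, members in groups.items()
}

for label, centroid in sorted(centroid_table.items()):
    print(label, centroid)
```

Because gender is coded 0 or 1, its centroid value reads as the share of males in the cluster: 1.0 means all male, 0.0 means all female.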
 To see who these policy holders are, you can see more details by
going to “Folder View” and clicking on “cluster_3”.
 Observation #6 refers to a policy holder’s information.
K-Means Clustering
Deployment:
 To deploy the information from the analysis, we will go back to the design tab…
 Add a filter process to choose only “cluster_3” using the attribute_value_filter
K-Means Clustering
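Outside the modeling tool, the same filtering step amounts to keeping only the rows whose cluster label matches. The Python sketch below uses hypothetical rows; taking the smallest weight and cholesterol in the filtered group is shown only as one possible way to summarize the cluster, not as the method used in the course.

```python
# Hypothetical rows: (observation id, weight, cholesterol, gender, cluster label).
clustered = [
    (1, 120, 130, 0, "cluster_0"),
    (2, 185, 220, 1, "cluster_3"),
    (3, 178, 212, 1, "cluster_3"),
    (4, 160, 215, 0, "cluster_2"),
]

target = "cluster_3"
# Keep only the observations assigned to the target cluster.
selected = [row for row in clustered if row[4] == target]

# Illustrative summary of the filtered group (an assumption, see the lead-in).
min_weight = min(row[1] for row in selected)
min_cholesterol = min(row[2] for row in selected)

print([row[0] for row in selected])
print(min_weight, min_cholesterol)
```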
Deployment:
The results from filtering on cluster_3…
K-Means Clustering
Deployment:
You can now go back to your company’s database and issue a
SQL query to pull all records…
SELECT First_Name, Last_Name, Policy_Num, Address, Phone_Num
FROM PolicyHolders_view
WHERE Weight >= 167
  AND Cholesterol >= 204
  AND Gender = 1;  -- 1 = male
K-Means Clustering
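To try the query without access to a company database, it can be run against a small in-memory SQLite table from Python. In this sketch `PolicyHolders_view` is created as an ordinary table, the column types are guesses, and the three policy holders are invented; only the column names, the thresholds, and the gender coding come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the company view; column types are assumptions.
conn.execute(
    "CREATE TABLE PolicyHolders_view ("
    "First_Name TEXT, Last_Name TEXT, Policy_Num TEXT, Address TEXT, "
    "Phone_Num TEXT, Weight REAL, Cholesterol REAL, Gender INTEGER)"
)
# Invented sample rows for illustration only.
sample = [
    ("Al", "Reed", "P001", "1 Oak St", "555-0101", 190, 230, 1),
    ("Bo", "Lane", "P002", "2 Elm St", "555-0102", 150, 180, 1),
    ("Cy", "Hart", "P003", "3 Ash St", "555-0103", 185, 225, 0),
]
conn.executemany(
    "INSERT INTO PolicyHolders_view VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample
)

# Same query as on the slide, with the thresholds passed as parameters.
query = """
SELECT First_Name, Last_Name, Policy_Num, Address, Phone_Num
FROM PolicyHolders_view
WHERE Weight >= ? AND Cholesterol >= ? AND Gender = ?
"""
matches = conn.execute(query, (167, 204, 1)).fetchall()
conn.close()

print(matches)
```

A threshold query like this selects everyone above fixed cut-offs rather than the observations closest to a centroid, so it approximates the cluster rather than reproducing it exactly.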
Deployment:
 By targeting the group at highest risk of heart disease, you can reduce
the payouts, thus increasing profits for your company.
Note: Your next targeted communication would be to “cluster_2”. This
group is women with a high risk of heart disease. The message for this group
may need to be communicated differently than the message for men.
K-Means Clustering
 k-Means does not predict values; it simply
takes known indicators from the attributes in a data set
and groups observations together based on their
similarity to group averages
 It helps the user to understand where one group
begins and the other ends; in other words, where the
natural breaks occur between groups in a data set
K-Means Clustering: Summary
 Effectively employ the CRISP-DM method
 Develop a k-means cluster data mining model
 Interpret output generated by models
Summary - Learning Objectives
“This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s
Employment and Training Administration. The solution was created by the grantee and does not
necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor
makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such
information, including any information on linked sites and including, but not limited to, accuracy of the
information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building Capacity in
Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative
Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/
Copyright Information
