SlideShare a Scribd company logo
Presentation on
Unsupervised Learning
ANKUSH PAL
MBA – 172
12019001001172
Supervised learning vs. unsupervised
learning
 Supervised learning: discover patterns in the data that relate data attributes
with a target (class) attribute.
 These patterns are then utilized to predict the values of the target attribute in
future data instances.
 Unsupervised learning: The data have no target attribute.
 We want to explore the data to find some intrinsic structures in them.
CS583, Bing Liu, UIC
2
K-means clustering
 K-means is a partitional clustering algorithm
 Let the set of data points (or instances) D be
{x1, x2, …, xn},
where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X  Rr, and r is the
number of attributes (dimensions) in the data.
 The k-means algorithm partitions the given data into k clusters.
 Each cluster has a cluster center, called centroid.
 k is specified by the user
CS583, Bing Liu, UIC
3
K-means algorithm
 Given k, the k-means algorithm works as follows:
1) Randomly choose k data points (seeds) to be the initial centroids, cluster centers
2) Assign each data point to the closest centroid
3) Re-compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to 2).
CS583, Bing Liu, UIC
4
Data standardization
 In the Euclidean space, standardization of attributes is
recommended so that all attributes can have equal
impact on the computation of distances.
 Consider the following pair of data points
 xi: (0.1, 20) and xj: (0.9, 720).
 The distance is almost completely dominated by (720-
20) = 700.
 Standardize attributes: to force the attributes to have a
common value range
CS583, Bing Liu, UIC
5
,
700.000457
)
20
720
(
)
1
.
0
9
.
0
(
)
,
( 2
2





j
i
dist x
x
Interval-scaled attributes
 Their values are real numbers following a linear scale.
 The difference in Age between 10 and 20 is the same as that between 40 and 50.
 The key idea is that intervals keep the same importance through out the scale
 Two main approaches to standardize interval scaled attributes, range and z-
score. f is an attribute
CS583, Bing Liu, UIC
6
,
)
min(
)
max(
)
min(
)
(
f
f
f
x
x
range
if
if



Cluster Evaluation: hard problem
 The quality of a clustering is very hard to evaluate because
 We do not know the correct clusters
 Some methods are used:
 User inspection
 Study centroids, and spreads
 Rules from a decision tree.
 For text documents, one can read some documents in clusters.
CS583, Bing Liu, UIC
7
Cluster evaluation: ground truth
 We use some labeled data (for classification)
 Assumption: Each class is a cluster.
 After clustering, a confusion matrix is constructed. From the matrix, we
compute various measurements, entropy, purity, precision, recall and F-score.
 Let the classes in the data D be C = (c1, c2, …, ck). The clustering method produces k
clusters, which divides D into k disjoint subsets, D1, D2, …, Dk.
CS583, Bing Liu, UIC
8
Supervised learning for unsupervised
learning
 Decision tree algorithm is not directly applicable.
 it needs at least two classes of data.
 A clustering data set has no class label for each data point.
 The problem can be dealt with by a simple idea.
 Regard each point in the data set to have a class label Y.
 Assume that the data space is uniformly distributed with another
type of points, called non-existing points. We give them the
class, N.
 With the N points added, the problem of partitioning the
data space into data and empty regions becomes a
supervised classification problem.
CS583, Bing Liu, UIC
9
An example
CS583, Bing Liu, UIC
10
A decision tree method is used for partitioning in (B).
Building the Tree
 The main computation in decision tree building is to
evaluate entropy (for information gain):
 Can it be evaluated without adding N points? Yes.
 Pr(cj) is the probability of class cj in data set D, and |C| is
the number of classes, Y and N (2 classes).
 To compute Pr(cj), we only need the number of Y (data) points
and the number of N (non-existing) points.
 We already have Y (or data) points, and we can compute the
number of N points on the fly. Simple: as we assume that the N
points are uniformly distributed in the space.
CS583, Bing Liu, UIC
11
)
Pr(
log
)
Pr(
)
(
|
|
1
2 j
C
j
j c
c
D
entropy 



An example
 The space has 25 data (Y) points and 25 N points.
Assume the system is evaluating a possible cut S.
 # N points on the left of S is 25 * 4/10 = 10. The number of Y
points is 3.
 Likewise, # N points on the right of S is 15 (= 25 - 10).The
number of Y points is 22.
 With these numbers, entropy can be computed.
CS583, Bing Liu, UIC 12

More Related Content

What's hot

Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
K means clustering
K means clusteringK means clustering
K means clustering
keshav goyal
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
Saad Elbeleidy
 
Perceptron & Neural Networks
Perceptron & Neural NetworksPerceptron & Neural Networks
Perceptron & Neural Networks
NAGUR SHAREEF SHAIK
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
rameswara reddy venkat
 
Feature selection
Feature selectionFeature selection
Feature selection
Dong Guo
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
mrizwan969
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
amalalhait
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Md. Main Uddin Rony
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
CloudxLab
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
Krish_ver2
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
Mahbubur Rahman Shimul
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
Knoldus Inc.
 
Clustering
ClusteringClustering
Clustering
M Rizwan Aqeel
 
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Edureka!
 

What's hot (20)

Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Perceptron & Neural Networks
Perceptron & Neural NetworksPerceptron & Neural Networks
Perceptron & Neural Networks
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
 
Clustering
ClusteringClustering
Clustering
 
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
 

Similar to Presentation on unsupervised learning

CS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptCS583-unsupervised-learning.ppt
CS583-unsupervised-learning.ppt
HathiramN1
 
CS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningCS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learning
ssuserb02eff
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringIAEME Publication
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringprjpublications
 
I1803026164
I1803026164I1803026164
I1803026164
IOSR Journals
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
midi
 
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
cscpconf
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
VIKASGUPTA127897
 
A Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAA Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCA
Editor Jacotech
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_applicationCemal Ardil
 
Dimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptxDimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptx
RohanBorgalli
 
20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf
MariaKhan905189
 
Reduct generation for the incremental data using rough set theory
Reduct generation for the incremental data using rough set theoryReduct generation for the incremental data using rough set theory
Reduct generation for the incremental data using rough set theory
csandit
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporaryprjpublications
 
11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.ppt
SueMiu
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
yang947066
 
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfDWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
ChristinaGayenMondal
 

Similar to Presentation on unsupervised learning (20)

Lect4
Lect4Lect4
Lect4
 
CS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptCS583-unsupervised-learning.ppt
CS583-unsupervised-learning.ppt
 
CS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningCS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learning
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
I1803026164
I1803026164I1803026164
I1803026164
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
A Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAA Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCA
 
1376846406 14447221
1376846406  144472211376846406  14447221
1376846406 14447221
 
11 clusadvanced
11 clusadvanced11 clusadvanced
11 clusadvanced
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_application
 
Dimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptxDimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptx
 
20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf
 
Reduct generation for the incremental data using rough set theory
Reduct generation for the incremental data using rough set theoryReduct generation for the incremental data using rough set theory
Reduct generation for the incremental data using rough set theory
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporary
 
11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.ppt
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfDWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
 

More from ANKUSH PAL

Start up eco-system in india2
Start up eco-system in india2Start up eco-system in india2
Start up eco-system in india2
ANKUSH PAL
 
Is paperless office a reality
Is paperless office a realityIs paperless office a reality
Is paperless office a reality
ANKUSH PAL
 
Merger of public sector banks & it’s impact on private sector banks
Merger of public sector banks & it’s impact on private sector banksMerger of public sector banks & it’s impact on private sector banks
Merger of public sector banks & it’s impact on private sector banks
ANKUSH PAL
 
Low involvement purchase and brand personality
Low involvement purchase and brand personalityLow involvement purchase and brand personality
Low involvement purchase and brand personality
ANKUSH PAL
 
Idfc vishesh savings account
Idfc vishesh savings accountIdfc vishesh savings account
Idfc vishesh savings account
ANKUSH PAL
 
Traditional astronomy
Traditional astronomyTraditional astronomy
Traditional astronomy
ANKUSH PAL
 
Oral and written communication
Oral and written communicationOral and written communication
Oral and written communication
ANKUSH PAL
 
Greek astronomy
Greek  astronomyGreek  astronomy
Greek astronomy
ANKUSH PAL
 
Distance Education.ppt
Distance Education.pptDistance Education.ppt
Distance Education.ppt
ANKUSH PAL
 

More from ANKUSH PAL (9)

Start up eco-system in india2
Start up eco-system in india2Start up eco-system in india2
Start up eco-system in india2
 
Is paperless office a reality
Is paperless office a realityIs paperless office a reality
Is paperless office a reality
 
Merger of public sector banks & it’s impact on private sector banks
Merger of public sector banks & it’s impact on private sector banksMerger of public sector banks & it’s impact on private sector banks
Merger of public sector banks & it’s impact on private sector banks
 
Low involvement purchase and brand personality
Low involvement purchase and brand personalityLow involvement purchase and brand personality
Low involvement purchase and brand personality
 
Idfc vishesh savings account
Idfc vishesh savings accountIdfc vishesh savings account
Idfc vishesh savings account
 
Traditional astronomy
Traditional astronomyTraditional astronomy
Traditional astronomy
 
Oral and written communication
Oral and written communicationOral and written communication
Oral and written communication
 
Greek astronomy
Greek  astronomyGreek  astronomy
Greek astronomy
 
Distance Education.ppt
Distance Education.pptDistance Education.ppt
Distance Education.ppt
 

Recently uploaded

ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 

Recently uploaded (20)

ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 

Presentation on unsupervised learning

  • 1. Presentation on Unsupervised Learning ANKUSH PAL MBA – 172 12019001001172
  • 2. Supervised learning vs. unsupervised learning  Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute.  These patterns are then utilized to predict the values of the target attribute in future data instances.  Unsupervised learning: The data have no target attribute.  We want to explore the data to find some intrinsic structures in them. CS583, Bing Liu, UIC 2
  • 3. K-means clustering  K-means is a partitional clustering algorithm  Let the set of data points (or instances) D be {x1, x2, …, xn}, where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X  Rr, and r is the number of attributes (dimensions) in the data.  The k-means algorithm partitions the given data into k clusters.  Each cluster has a cluster center, called centroid.  k is specified by the user CS583, Bing Liu, UIC 3
  • 4. K-means algorithm  Given k, the k-means algorithm works as follows: 1) Randomly choose k data points (seeds) to be the initial centroids, cluster centers 2) Assign each data point to the closest centroid 3) Re-compute the centroids using the current cluster memberships. 4) If a convergence criterion is not met, go to 2). CS583, Bing Liu, UIC 4
  • 5. Data standardization  In the Euclidean space, standardization of attributes is recommended so that all attributes can have equal impact on the computation of distances.  Consider the following pair of data points  xi: (0.1, 20) and xj: (0.9, 720).  The distance is almost completely dominated by (720- 20) = 700.  Standardize attributes: to force the attributes to have a common value range CS583, Bing Liu, UIC 5 , 700.000457 ) 20 720 ( ) 1 . 0 9 . 0 ( ) , ( 2 2      j i dist x x
  • 6. Interval-scaled attributes  Their values are real numbers following a linear scale.  The difference in Age between 10 and 20 is the same as that between 40 and 50.  The key idea is that intervals keep the same importance through out the scale  Two main approaches to standardize interval scaled attributes, range and z- score. f is an attribute CS583, Bing Liu, UIC 6 , ) min( ) max( ) min( ) ( f f f x x range if if   
  • 7. Cluster Evaluation: hard problem  The quality of a clustering is very hard to evaluate because  We do not know the correct clusters  Some methods are used:  User inspection  Study centroids, and spreads  Rules from a decision tree.  For text documents, one can read some documents in clusters. CS583, Bing Liu, UIC 7
  • 8. Cluster evaluation: ground truth  We use some labeled data (for classification)  Assumption: Each class is a cluster.  After clustering, a confusion matrix is constructed. From the matrix, we compute various measurements, entropy, purity, precision, recall and F-score.  Let the classes in the data D be C = (c1, c2, …, ck). The clustering method produces k clusters, which divides D into k disjoint subsets, D1, D2, …, Dk. CS583, Bing Liu, UIC 8
  • 9. Supervised learning for unsupervised learning  Decision tree algorithm is not directly applicable.  it needs at least two classes of data.  A clustering data set has no class label for each data point.  The problem can be dealt with by a simple idea.  Regard each point in the data set to have a class label Y.  Assume that the data space is uniformly distributed with another type of points, called non-existing points. We give them the class, N.  With the N points added, the problem of partitioning the data space into data and empty regions becomes a supervised classification problem. CS583, Bing Liu, UIC 9
  • 10. An example CS583, Bing Liu, UIC 10 A decision tree method is used for partitioning in (B).
  • 11. Building the Tree  The main computation in decision tree building is to evaluate entropy (for information gain):  Can it be evaluated without adding N points? Yes.  Pr(cj) is the probability of class cj in data set D, and |C| is the number of classes, Y and N (2 classes).  To compute Pr(cj), we only need the number of Y (data) points and the number of N (non-existing) points.  We already have Y (or data) points, and we can compute the number of N points on the fly. Simple: as we assume that the N points are uniformly distributed in the space. CS583, Bing Liu, UIC 11 ) Pr( log ) Pr( ) ( | | 1 2 j C j j c c D entropy    
  • 12. An example  The space has 25 data (Y) points and 25 N points. Assume the system is evaluating a possible cut S.  # N points on the left of S is 25 * 4/10 = 10. The number of Y points is 3.  Likewise, # N points on the right of S is 15 (= 25 - 10).The number of Y points is 22.  With these numbers, entropy can be computed. CS583, Bing Liu, UIC 12