MACHINE LEARNING Clustering
WHAT’S IN THE MENU - RECOMMENDATIONS
1. Why so popular
2. Supervised vs Unsupervised Learning
3. Topic2
4. Topic3
5. Topic4
6. Wrap-up
MACHINE LEARNING
http://videolectures.net/Top/Computer_Science/Machine_Learning/
WHY IS MACHINE LEARNING (CS 229) THE MOST
POPULAR COURSE AT STANFORD? - ANDREW NG
WHAT CAN YOU TELL ME ABOUT X?
Supervised vs unsupervised learning
Supervised learning
Typical methods: regression and classification
Given an object with an observed set of features X1, …, Xn
and a response Y, the goal is to predict Y using X1,
…, Xn
Unsupervised learning
Typical methods: principal component analysis (PCA),
expectation maximization (EM) and clustering (k-means
and its variations)
Given an object with an observed set of features X1, …, Xn,
the goal is to discover relationships or groups among
variables or observations. Clustering algorithms try to find
natural groupings in the data, that is, sets of similar observations.
APPLICATIONS
Market segmentation: given market research results, how can you find the best
customer segments?
Anomaly detection: find fraud, detect network attacks, or discover problems in
servers or other sensor-equipped machinery. It is important to be able to find new
types of anomalies that have never been seen before.
Healthcare: assigning areas to hospitals based on how accident-prone each area is, gene clustering
GROUPING UNLABELED ITEMS USING K-MEANS
CLUSTERING
SWAT
Strengths:
Will always converge
Scales well
Weaknesses:
Can converge to a local minimum
Slow on very large datasets
Results depend on choosing a suitable k
Advantages:
Easy to implement
GROUPING UNLABELED ITEMS USING K-MEANS
CLUSTERING
SIMILARITY
There are several ways of measuring similarity between observations.
Manhattan distance
Euclidean distance
Cosine distance
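The three measures listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not taken from the slides: the function names and the choice to represent an observation as a tuple of numbers are assumptions made for the example.

```python
import math

def manhattan_distance(a, b):
    # Sum of absolute coordinate differences (L1 norm of a - b).
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance (L2 norm of a - b).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends on the angle between a and b, not their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

p, q = (1.0, 2.0), (4.0, 6.0)
print(manhattan_distance(p, q), euclidean_distance(p, q), cosine_distance(p, q))
```

Manhattan and Euclidean distance grow with the magnitude of the difference between two observations, while cosine distance ignores magnitude, which is one reason it is often preferred for sparse, high-dimensional data such as text.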
K-MEANS PSEUDO CODE
Randomly create k points for starting centroids
While any point has changed cluster assignment (repeat until convergence)
    ----------------------------------------------------------------
    Cluster assignment step
    For every point in the dataset
        For every centroid
            Calculate the distance between the centroid and the point
        Assign the point to the cluster with the lowest distance
    ----------------------------------------------------------------
    Move centroid step
    For every cluster calculate the mean of the points in that cluster
        Assign the centroid to the mean
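The cluster assignment step and the move centroid step can be sketched as a short, dependency-free Python function. This is an illustrative sketch under stated assumptions rather than the presenter's implementation: the name `kmeans`, the `max_iter` and `seed` parameters, the use of Euclidean distance, and the choice to start from k randomly sampled data points are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumption: "randomly create k points" is done by sampling k distinct data points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)

    for _ in range(max_iter):
        changed = False
        # Cluster assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            if nearest != assignments[i]:
                assignments[i] = nearest
                changed = True
        if not changed:
            break  # no point changed cluster, so the algorithm has converged
        # Move centroid step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On this toy dataset the two groups are far apart, so the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random start.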
COST FUNCTION & RANDOM INITIALIZATION
for i = 1 to 100 {
    randomly initialize k-means
    run k-means and get cluster assignments c(1 to m) and centroid positions µ(1 to K)
    compute cost function J(c(1 to m), µ(1 to K))
}
Pick the clustering that gave the lowest J(c(1 to m), µ(1 to K))
Cluster assignment step: minimize J with respect to c(1 to m) while holding µ(1 to K) fixed
Move centroid step: minimize J with respect to µ(1 to K) while holding c(1 to m) fixed
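The restart loop above can be sketched in Python as follows. The slide does not spell out J, so this sketch assumes the usual distortion measure: the mean squared Euclidean distance between each point and the centroid of its assigned cluster. All function names (`assign`, `move`, `cost`, `run_once`, `best_of`) and parameters are hypothetical, chosen only for this illustration.

```python
import math
import random

def assign(points, centroids):
    # Cluster assignment step: minimizes J over the assignments c, centroids fixed.
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def move(points, labels, centroids):
    # Move centroid step: minimizes J over the centroids, assignments fixed.
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, a in zip(points, labels) if a == j]
        moved.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return moved

def cost(points, labels, centroids):
    # Assumed J: mean squared distance from each point to its assigned centroid.
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, labels)) / len(points)

def run_once(points, k, rng, max_iter=100):
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initialization
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = move(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return cost(points, labels, centroids), labels, centroids

def best_of(points, k, restarts=100, seed=0):
    # Run k-means from several random starts and keep the run with the lowest J.
    rng = random.Random(seed)
    return min((run_once(points, k, rng) for _ in range(restarts)), key=lambda r: r[0])

points = [(0, 0), (0, 2), (10, 0), (10, 2), (20, 0), (20, 2)]
best_j, best_labels, best_centroids = best_of(points, k=3)
print(best_j, best_labels, best_centroids)
```

Because each of the two steps can only lower J or leave it unchanged, a single run settles into a local minimum. Repeating from different random starts and keeping the lowest J is a simple way to reduce the chance of reporting a poor local minimum.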
PERFORMANCE CONSIDERATION
K-means
K-means has a computational complexity of O(iKnm), where
i is the number of iterations,
K the number of clusters,
n the number of observations,
m the number of features.
Improvements:
•Reducing the average number of iterations.
•Parallel implementation of K-means by leveraging Hadoop or Spark.
•Reducing the number of outliers and candidate features by noise filtering with a smoothing
algorithm.
•Reducing the dimensionality of the data (the number of features m).
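To make the O(iKnm) figure concrete, the arithmetic can be sketched as a small Python helper. The function name and the sample workload sizes are hypothetical and chosen only for illustration. The result is a rough count of coordinate-level operations, not a runtime prediction.

```python
def kmeans_operation_estimate(iterations, clusters, observations, features):
    """Rough operation count i * K * n * m for one k-means run."""
    return iterations * clusters * observations * features

# Hypothetical workload: 20 iterations, 10 clusters, 1,000,000 observations, 50 features.
base = kmeans_operation_estimate(20, 10, 1_000_000, 50)
# Same workload after reducing the data from 50 to 10 features.
reduced = kmeans_operation_estimate(20, 10, 1_000_000, 10)
print(base, reduced, base // reduced)
```

Because the cost is a product of the four factors, cutting any one of them (fewer iterations, fewer features) reduces the estimate by the same proportion.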
FRAMEWORKS
Java: Weka, Mahout, Spark
Python: scikit-learn, PySpark, Pylearn2 (Theano)
C++: Shogun
.NET: Encog
https://github.com/josephmisiti/awesome-machine-learning
PLATFORMS - IBM BLUEMIX
PLATFORMS – MICROSOFT AZURE ML
REFERENCES
http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
http://www-bcf.usc.edu/~gareth/ISL/
BOOKS
