Parallel K-Means
Tien-Yang Wu, NCKU
Outline
Introduction
K-Means Algorithm
Parallel K-Means Based on MapReduce
Experimental Results
K-Means on Spark
Introduction
Existing parallel clustering algorithms have two limitations:
they assume that all objects can reside in main memory at the same time, and
their parallel systems provide only restricted programming models.
Therefore, parallel clustering algorithms oriented toward very large datasets should be developed.
K-Means Algorithm
First, it randomly selects k objects from the whole dataset to serve as the initial cluster centers.
Each remaining object is assigned to the cluster it is most similar to, based on the distance between the object and the cluster center.
The new mean for each cluster is then calculated. This
process iterates until the criterion function converges.
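As a reference point for the parallel version, the three steps above can be sketched as a serial Python function. This is a minimal sketch, assuming Euclidean distance and points given as tuples; the name `kmeans`, the `seed` parameter, and the "centers stopped moving" convergence test are illustrative choices, not taken from the slides.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, k, max_iter=100, seed=0):
    """Serial k-means sketch. points: list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 1: randomly select k objects as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster keeps its old center
        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

On the six example points used later in these slides, this converges to centers (2, 2) and (12, 12).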
Parallel K-Means Based on MapReduce
The most intensive computation in k-means is the calculation of distances: each iteration requires nk distance computations, where n is the number of objects and k is the number of clusters.
The distance computations between one object and the centers are independent of the distance computations between other objects and the centers.
Therefore, the distance computations for different objects can be executed in parallel.
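To illustrate the independence argument, here is a minimal sketch that splits the objects into chunks and assigns each chunk to its nearest centers in a separate worker. It assumes Euclidean distance; `nearest`, `assign_chunk`, and `parallel_assign` are hypothetical names, and threads merely stand in for cluster nodes (in CPython they will not give a real speedup for CPU-bound work). Each object still costs k distance computations, so one pass performs nk of them in total, just spread across workers.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def nearest(point, centers):
    """Index of the center closest to `point` (k distance computations)."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))


def assign_chunk(chunk, centers):
    # Each object needs only the read-only centers, never other objects,
    # so chunks can be processed independently.
    return [nearest(p, centers) for p in chunk]


def parallel_assign(points, centers, workers=2):
    if not points:
        return []
    # Split the objects into at most `workers` contiguous chunks.
    size = -(-len(points) // workers)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(assign_chunk, chunks, [centers] * len(chunks))
    # Concatenate the per-chunk labels back in the original order.
    return [label for part in results for label in part]
```

Because the chunks share nothing but the centers, the result is the same as assigning the objects one by one.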
Example data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13)
Target clustering: class 1 = {(1,1), (2,2), (3,3)}; class 2 = {(11,11), (12,12), (13,13)}
Two initial centroids are chosen at random: c1 = (1,1), c2 = (11,11).
The data are stored across two nodes; both nodes use the centroids c1 = (1,1) and c2 = (11,11).
The six objects are split across the two nodes:
node1: (1,1), (12,12), (3,3)
node2: (11,11), (2,2), (13,13)
Centroids: c1 = (1,1), c2 = (11,11)
Data flow: node1 → map → combine; node2 → map → combine; the combine outputs of both nodes go to a single reduce step.
Map on node1: the object (3,3) is compared with c1 = (1,1) and c2 = (11,11) and is assigned to c1 (1,1), its nearest centroid.
Output <key, value>: (1,1), {(3,3), (3,3)}
In the map output (1,1), {(3,3), (3,3)}:
key = the centroid (1,1)
value = {(3,3), (3,3)}: a temporary value used to calculate the new centroid, followed by the object itself
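The map step for a single object can be sketched as follows, using the value layout from the slides: a partial sum (which, for one object, is the object itself) followed by the object. `kmeans_map` is a hypothetical helper, and Euclidean distance is assumed.

```python
from math import dist


def kmeans_map(point, centers):
    """Emit one <key, value> pair for a single object (hypothetical helper).

    key   = the nearest centroid
    value = (partial sum, object); for one object the partial sum is
            the object itself, as in (1,1), {(3,3), (3,3)}.
    """
    nearest = min(centers, key=lambda c: dist(point, c))
    return nearest, (point, point)
```

For example, `kmeans_map((12, 12), [(1, 1), (11, 11)])` returns `((11, 11), ((12, 12), (12, 12)))`.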
Map output of node1 (key, value):
(1,1), {(1,1), (1,1)}
(11,11), {(12,12), (12,12)}
(1,1), {(3,3), (3,3)}
Map outputs of both nodes (key, value):
node1: (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}
node2: (11,11), {(11,11), (11,11)}; (1,1), {(2,2), (2,2)}; (11,11), {(13,13), (13,13)}
Combine: next, each node runs a combine step over its own map output (for node1, the three pairs listed above).
Combine on node1 merges the values that share the same key:
input: (1,1), {(1,1), (1,1)}; (11,11), {(12,12), (12,12)}; (1,1), {(3,3), (3,3)}
output: (1,1), {(4,4), {(1,1), (3,3)}, 2}
        (11,11), {(12,12), (12,12), 1}
In the combine output (1,1), {(4,4), {(1,1), (3,3)}, 2}:
key = the centroid (1,1)
value = a temporary value used to calculate the new centroid (the partial sum (4,4) = (1,1) + (3,3)), the objects {(1,1), (3,3)}, and the number of objects, 2
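A combiner that merges one node's map output by key, producing the (partial sum, objects, count) values described above, might look like this. `kmeans_combine` is a hypothetical name; objects are kept in a Python list rather than a set, and the partial sum is added coordinate by coordinate.

```python
from collections import defaultdict


def kmeans_combine(map_output):
    """Merge one node's map output by key (hypothetical helper).

    map_output: list of (centroid, (partial_sum, obj)) pairs.
    Returns {centroid: (partial_sum, [objects], count)}.
    """
    merged = defaultdict(lambda: [None, [], 0])
    for centroid, (vec, obj) in map_output:
        acc = merged[centroid]
        # Add this pair's partial sum to the running coordinate-wise sum.
        acc[0] = vec if acc[0] is None else tuple(a + b for a, b in zip(acc[0], vec))
        acc[1].append(obj)
        acc[2] += 1
    return {c: (s, objs, n) for c, (s, objs, n) in merged.items()}
```

Run on node1's three map pairs, it yields (4,4) with two objects for key (1,1) and (12,12) with one object for key (11,11).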
Combine outputs of both nodes (key, value):
node1: (1,1), {(4,4), {(1,1), (3,3)}, 2}; (11,11), {(12,12), (12,12), 1}
node2: (1,1), {(2,2), (2,2), 1}; (11,11), {(24,24), {(11,11), (13,13)}, 2}
Reduce: the combine outputs of both nodes are grouped, and values with the same key are reduced together:
(1,1): {(4,4), {(1,1), (3,3)}, 2} and {(2,2), (2,2), 1}
(11,11): {(12,12), (12,12), 1} and {(24,24), {(11,11), (13,13)}, 2}
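Before reducing, the framework groups every node's combine output by key. A minimal in-memory stand-in for that shuffle step, assuming each node's output is a list of (key, value) pairs, could be written as below; `shuffle` is a hypothetical helper, not a Hadoop API.

```python
from collections import defaultdict


def shuffle(node_outputs):
    """Group (key, value) pairs from all nodes by key (hypothetical helper).

    node_outputs: one list of (key, value) pairs per node.
    Returns {key: [values from every node, in node order]}.
    """
    groups = defaultdict(list)
    for output in node_outputs:
        for key, value in output:
            groups[key].append(value)
    return dict(groups)
```

Each key's list is then handed to one reduce call.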
Reduce for key (1,1):
input: {(4,4), {(1,1), (3,3)}, 2} and {(2,2), (2,2), 1}
output: (1,1), {(2,2), {(1,1), (2,2), (3,3)}}
The new centroid is the sum of the partial sums divided by the sum of the counts:
((4 + 2) / (2 + 1), (4 + 2) / (2 + 1)) = (2,2)
so the new cluster {(1,1), (2,2), (3,3)} has centroid (2,2).
In the reduce output (1,1), {(2,2), {(1,1), (2,2), (3,3)}}:
key = the old centroid (1,1)
value = the new centroid (2,2) and the objects of the new cluster
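The reduce step for one key can be sketched as below: it adds the partial sums and counts from all nodes, divides to get the new centroid, and gathers the cluster's objects. `kmeans_reduce` is a hypothetical helper, and the value layout (partial sum, objects, count) follows the combine output in these slides.

```python
def kmeans_reduce(centroid, combined_values):
    """Merge all combine outputs for one key (hypothetical helper).

    combined_values: list of (partial_sum, [objects], count) from every node.
    Returns (old centroid, (new centroid, objects of the new cluster)).
    """
    total = [0] * len(centroid)
    count = 0
    members = []
    for partial_sum, objs, n in combined_values:
        # Accumulate the coordinate-wise sum, the member objects and the count.
        total = [t + s for t, s in zip(total, partial_sum)]
        members.extend(objs)
        count += n
    new_centroid = tuple(t / count for t in total)
    return centroid, (new_centroid, members)
```

For key (1,1) this reproduces the slide's arithmetic: (4 + 2) / (2 + 1) = 2 in each coordinate.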
Reduce output for both keys:
(1,1), {(2,2), {(1,1), (2,2), (3,3)}}
(11,11), {(12,12), {(11,11), (12,12), (13,13)}}
The centroids are updated and the next iteration starts; this repeats until the centroids converge or the maximum number of iterations is reached.
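Putting the steps together, the following sketch simulates the whole iterative job in one process: each input split plays the role of a node, map and combine collapse into a per-node (partial sum, count) accumulation, and reduce divides the merged sums by the merged counts. It is an illustration under stated assumptions, not Hadoop code: it uses Euclidean distance, keys on the centroid's index rather than its coordinates (so two identical centroids cannot collide), drops the per-cluster object lists, and stops when no centroid moves more than `tol` or after `max_iter` passes. All names are hypothetical.

```python
from collections import defaultdict
from math import dist


def mapreduce_iteration(nodes, centers):
    """One simulated MapReduce pass of k-means (a sketch, not Hadoop code).

    nodes:   one list of objects per node (the input splits).
    centers: current centroids, read by every mapper.
    Returns the new centroids, in the same order as `centers`.
    """
    shuffled = defaultdict(list)  # centroid index -> (sum, count) from each node
    for split in nodes:
        # map + combine on one node: partial sum and count per centroid
        local = {}
        for p in split:
            j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            s, n = local.get(j, ((0,) * len(p), 0))
            local[j] = (tuple(a + b for a, b in zip(s, p)), n + 1)
        for j, value in local.items():
            shuffled[j].append(value)
    # reduce: merge the partial sums per centroid and divide by the total count
    new_centers = []
    for j, c in enumerate(centers):
        parts = shuffled.get(j)
        if not parts:
            new_centers.append(c)  # no objects assigned: keep the old centroid
            continue
        total = [sum(col) for col in zip(*(s for s, _ in parts))]
        count = sum(n for _, n in parts)
        new_centers.append(tuple(t / count for t in total))
    return new_centers


def parallel_kmeans(nodes, centers, max_iter=20, tol=1e-9):
    """Repeat MapReduce passes until no centroid moves more than `tol`."""
    for _ in range(max_iter):
        new_centers = mapreduce_iteration(nodes, centers)
        if all(dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            return new_centers
        centers = new_centers
    return centers
```

With the slides' split, `parallel_kmeans([[(1, 1), (12, 12), (3, 3)], [(11, 11), (2, 2), (13, 13)]], [(1, 1), (11, 11)])` returns `[(2.0, 2.0), (12.0, 12.0)]` after the second pass, matching the reduce output above.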
Experimental Results
Hardware: each machine has two 2.8 GHz cores and 4 GB of memory.
Speedup: not linear, because of the communication cost between nodes.
Scaleup: performance measured on datasets of 1 GB, 2 GB, 3 GB, and 4 GB (chart not reproduced).
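For reading the speedup and scaleup results, the commonly used definitions can be written as two small functions. These definitions are assumed here rather than quoted from the slides, and the argument names are hypothetical. Linear speedup on m machines would equal m, and ideal scaleup stays at 1.0.

```python
def speedup(t_one_machine, t_m_machines):
    """Speedup: runtime on 1 machine / runtime on m machines, same dataset.

    Linear (ideal) speedup equals m.
    """
    return t_one_machine / t_m_machines


def scaleup(t_one_machine_data_d, t_m_machines_data_md):
    """Scaleup: runtime of 1 machine on data D / runtime of m machines on m*D.

    Ideal scaleup is 1.0: an m-times larger system handles an m-times
    larger job in the same time.
    """
    return t_one_machine_data_d / t_m_machines_data_md
```

A speedup below m, as reported above, means part of each added machine's capacity is spent on communication rather than distance computations.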
K-Means on Spark
Reference
Weizhong Zhao, Huifang Ma, and Qing He. Parallel K-Means Clustering Based on MapReduce. 2009.
Microsoft AI Transformation Partner Playbook.pdf
 

Parallel K-Means

  • 2. Outline: Introduction; K-Means Algorithm; Parallel K-Means Based on MapReduce; Experimental Results; K-Means on Spark
  • 3. Introduction: Existing parallel clustering algorithms assume that all objects can reside in main memory at the same time, and their parallel systems provide restricted programming models.
  • 4. Introduction: Existing parallel clustering algorithms assume that all objects can reside in main memory at the same time, and their parallel systems provide restricted programming models. Therefore, parallel clustering algorithms designed for large datasets should be developed.
  • 6. K-Means Algorithm: First, it randomly selects k objects from the dataset to serve as the initial cluster centers.
  • 7. K-Means Algorithm: Each remaining object is assigned to the cluster it is most similar to, based on the distance between the object and the cluster center.
  • 8. K-Means Algorithm: The new mean of each cluster is then calculated. This process iterates until the criterion function converges.
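The three steps above can be sketched serially in Python. This is a minimal illustration, not code from the slides: the function name `kmeans`, the `max_iter` cap, and the use of "centers stop moving" as the stopping rule (standing in for convergence of the criterion function) are all assumptions.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, k, max_iter=100, seed=None):
    """Serial K-Means sketch: random initial centers, assign, recompute means."""
    rng = random.Random(seed)
    # Step 1: randomly select k objects as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        # Assumed stopping rule: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

On the six example points used in the following slides, every random starting pair converges to the same split, so the centers should end at (2, 2) and (12, 12) regardless of the seed.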
  • 9. Parallel K-Means Based on MapReduce: The most computationally intensive step is the distance calculation; each iteration requires nk distance computations (n objects, k cluster centers).
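As a rough cost count, assuming d-dimensional objects (the slide does not state the dimension), one iteration performs:

```latex
n \cdot k \ \text{distance computations}, \qquad \text{each } O(d), \qquad \text{total } O(nkd) \ \text{per iteration}
```

For the six two-dimensional example points and two centroids used in the following slides, that is 6 × 2 = 12 distance computations per iteration.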
  • 10. Parallel K-Means Based on MapReduce: The distance computations between one object and the centers are independent of the distance computations between any other object and the centers. Therefore, distance computations for different objects can be executed in parallel.
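Because each object's assignment depends only on that object and the current centers, the objects can be split into chunks and assigned independently. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for separate machines; the helper names `assign_chunk` and `parallel_assign` are hypothetical. CPython threads mainly illustrate the independence here rather than speed up this CPU-bound loop, so a real deployment would use processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def assign_chunk(chunk, centers):
    """Index of the nearest center for each object in one chunk.
    Each result depends only on that object and the centers."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in chunk]


def parallel_assign(points, centers, workers=2):
    """Split objects into chunks, assign each chunk independently, then concatenate."""
    if not points:
        return []
    size = -(-len(points) // workers)  # ceiling division: objects per chunk
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the chunks.
        results = pool.map(assign_chunk, chunks, [centers] * len(chunks))
        return [label for part in results for label in part]
```

Concatenating the per-chunk labels gives the same result as one serial pass over all objects, which is the property the MapReduce version relies on.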
  • 11. Parallel K-Means Based on MapReduce: Example data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13). Target: two classes, class 1 = {(1,1), (2,2), (3,3)} and class 2 = {(11,11), (12,12), (13,13)}.
  • 12. Parallel K-Means Based on MapReduce: Data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13). Randomly select two initial centroids: c1 = (1,1), c2 = (11,11).
  • 13. Parallel K-Means Based on MapReduce: Data: (1,1), (2,2), (3,3), (11,11), (12,12), (13,13). The data is stored on two nodes. Centroids: c1 = (1,1), c2 = (11,11).
  • 14. Parallel K-Means Based on MapReduce: The data is split across the two nodes: node1 holds (1,1), (12,12), (3,3); node2 holds (11,11), (2,2), (13,13). Centroids: c1 = (1,1), c2 = (11,11).
  • 15. Parallel K-Means Based on MapReduce: node1 = {(1,1), (12,12), (3,3)} and node2 = {(11,11), (2,2), (13,13)} each run map and then combine on their local data; a single reduce step merges the combined results. Centroids: c1 = (1,1), c2 = (11,11).
  • 16. Parallel K-Means Based on MapReduce: node1 (1,1), (12,12), (3,3), map on object (3,3): with centroids c1 = (1,1) and c2 = (11,11), (3,3) is assigned to c1 = (1,1). Output <key, value> = <(1,1), {(3,3), (3,3)}>.
  • 17. Parallel K-Means Based on MapReduce: In the output <key, value> = <(1,1), {(3,3), (3,3)}>, the key (1,1) is the centroid; the value {(3,3), (3,3)} holds a temporary value used to calculate the new centroid, followed by the object itself.
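The map step on slides 16 and 17 can be sketched as a plain Python function. This is an illustration, not Hadoop API code: the name `kmeans_map` is hypothetical, and the value is modeled as a `(temporary value, object)` tuple following the slide layout, where the temporary value in the map step is just the object's own coordinates.

```python
from math import dist


def kmeans_map(obj, centroids):
    """Map step for one object.
    key   = nearest centroid (Euclidean distance; ties go to the first centroid),
    value = (temporary value for the new centroid, the object itself)."""
    nearest = min(centroids, key=lambda c: dist(obj, c))
    return nearest, (obj, obj)
```

Applied to node1's objects (1,1), (12,12), (3,3) with centroids (1,1) and (11,11), it should reproduce the three map outputs listed for node1.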
  • 18. Parallel K-Means Based on MapReduce: node1 (1,1), (12,12), (3,3) map outputs <key, value> with centroids c1 = (1,1), c2 = (11,11): <(1,1), {(1,1), (1,1)}>, <(11,11), {(12,12), (12,12)}>, <(1,1), {(3,3), (3,3)}>.
  • 19. Parallel K-Means Based on MapReduce: Map outputs <key, value> on both nodes with centroids c1 = (1,1), c2 = (11,11). node1 (1,1), (12,12), (3,3): <(1,1), {(1,1), (1,1)}>, <(11,11), {(12,12), (12,12)}>, <(1,1), {(3,3), (3,3)}>. node2 (11,11), (2,2), (13,13): <(11,11), {(11,11), (11,11)}>, <(1,1), {(2,2), (2,2)}>, <(11,11), {(13,13), (13,13)}>.
  • 20. Parallel K-Means Based on MapReduce: node1 (1,1), (12,12), (3,3), with centroids c1 = (1,1), c2 = (11,11), passes its map outputs <(1,1), {(1,1), (1,1)}>, <(11,11), {(12,12), (12,12)}>, <(1,1), {(3,3), (3,3)}> to combine.
  • 21. Parallel K-Means Based on MapReduce: Combine merges the node1 map outputs <(1,1), {(1,1), (1,1)}>, <(11,11), {(12,12), (12,12)}>, <(1,1), {(3,3), (3,3)}> that share the same key, producing <(1,1), {(4,4), {(1,1), (3,3)}, 2}> and <(11,11), {(12,12), {(12,12)}, 1}>.
  • 22. Parallel K-Means Based on MapReduce: In the combine output <(1,1), {(4,4), {(1,1), (3,3)}, 2}>, the key (1,1) is the centroid; the value holds the temporary sum used to calculate the new centroid (4,4), the objects {(1,1), (3,3)}, and the number of objects, 2. The other combine output is <(11,11), {(12,12), {(12,12)}, 1}>.
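A minimal sketch of the combine step on one node, assuming map outputs shaped as `(centroid, (temporary value, object))` pairs as on slides 16 to 18. The function name `kmeans_combine` is hypothetical; the result maps each centroid to `(partial coordinate sum, objects, object count)`, mirroring the slide's {sum, objects, count} value.

```python
from collections import defaultdict


def kmeans_combine(map_outputs):
    """Combine step on one node: merge map outputs that share a key.
    Returns {centroid: (partial coordinate sum, objects, object count)}."""
    sums = {}
    objects = defaultdict(list)
    for key, (temp, obj) in map_outputs:
        # Add this object's coordinates to the running sum for its centroid.
        prev = sums.get(key, (0,) * len(temp))
        sums[key] = tuple(a + b for a, b in zip(prev, temp))
        objects[key].append(obj)
    return {key: (sums[key], objects[key], len(objects[key])) for key in sums}
```

Summing locally before the shuffle means each node sends one record per centroid instead of one per object, which is the usual reason to add a combiner.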
  • 23. Parallel K-Means Based on MapReduce: Combine on both nodes. node1 map outputs <(1,1), {(1,1), (1,1)}>, <(11,11), {(12,12), (12,12)}>, <(1,1), {(3,3), (3,3)}> combine to <(1,1), {(4,4), {(1,1), (3,3)}, 2}> and <(11,11), {(12,12), {(12,12)}, 1}>. node2 map outputs <(11,11), {(11,11), (11,11)}>, <(1,1), {(2,2), (2,2)}>, <(11,11), {(13,13), (13,13)}> combine to <(1,1), {(2,2), {(2,2)}, 1}> and <(11,11), {(24,24), {(11,11), (13,13)}, 2}>.
  • 24. Parallel K-Means Based on MapReduce: Reduce receives the combine outputs from both nodes (node1: <(1,1), {(4,4), {(1,1), (3,3)}, 2}>, <(11,11), {(12,12), {(12,12)}, 1}>; node2: <(1,1), {(2,2), {(2,2)}, 1}>, <(11,11), {(24,24), {(11,11), (13,13)}, 2}>) and merges the values that share the same key.
  • 25. Parallel K-Means Based on MapReduce: Reduce on key (1,1): merging <(1,1), {(4,4), {(1,1), (3,3)}, 2}> and <(1,1), {(2,2), {(2,2)}, 1}> gives <(1,1), {(2,2), {(1,1), (2,2), (3,3)}}>.
  • 26. Parallel K-Means Based on MapReduce: From <(1,1), {(4,4), {(1,1), (3,3)}, 2}> and <(1,1), {(2,2), {(2,2)}, 1}>, the new centroid is ((4+2)/(2+1), (4+2)/(2+1)) = (2,2). The cluster {(1,1), (2,2), (3,3)} therefore has centroid (2,2), giving <(1,1), {(2,2), {(1,1), (2,2), (3,3)}}>.
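The reduce step for a single key can be sketched the same way, assuming combiner values shaped as `(partial sum, objects, count)`. The function name `kmeans_reduce` is hypothetical; it returns the old centroid as the key with `(new centroid, cluster objects)` as the value, and the new centroid is the sum of all partial sums divided by the total count, as in the (4+2)/(2+1) calculation.

```python
def kmeans_reduce(key, combined_values):
    """Reduce step for one key (the old centroid).
    combined_values: non-empty list of (partial sum, objects, count) from every combiner.
    Returns (old centroid, (new centroid, objects of the new cluster))."""
    total = None
    members = []
    count = 0
    for partial_sum, objs, n in combined_values:
        # Add this node's partial sum to the running total.
        total = partial_sum if total is None else tuple(a + b for a, b in zip(total, partial_sum))
        members.extend(objs)
        count += n
    # New centroid = (sum of all partial sums) / (total number of objects).
    new_centroid = tuple(s / count for s in total)
    return key, (new_centroid, members)
```

For key (1,1) this gives ((4+2)/3, (4+2)/3) = (2.0, 2.0); for key (11,11) it gives ((12+24)/3, (12+24)/3) = (12.0, 12.0).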
  • 27. Parallel K-Means Based on MapReduce: In the reduce output <(1,1), {(2,2), {(1,1), (2,2), (3,3)}}>, the key (1,1) is the old centroid; the value holds the new centroid (2,2) and the objects of the new cluster {(1,1), (2,2), (3,3)}.
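  • 28. Parallel K-Means Based on MapReduce: Reduce outputs <(1,1), {(2,2), {(1,1), (2,2), (3,3)}}> and <(11,11), {(12,12), {(11,11), (12,12), (13,13)}}>. The centroids are updated to (2,2) and (12,12) and the next iteration begins; iterations continue until the centroids converge or the maximum number of iterations is reached.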
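Putting the steps together, the sketch below simulates the whole loop in one Python process: each "node" is just a list of points, map and combine run per node, and reduce merges the per-node results before the centroids are updated. The node layout and starting centroids match the example; the function names and the `max_iter` default are assumptions, and no actual distributed framework is involved.

```python
from collections import defaultdict
from math import dist


def map_combine(node_data, centroids):
    """Map + combine on one node: {nearest centroid: (coordinate sum, objects, count)}."""
    out = {}
    for obj in node_data:
        key = min(centroids, key=lambda c: dist(obj, c))  # map: nearest centroid
        s, objs, n = out.get(key, ((0,) * len(obj), [], 0))
        out[key] = (tuple(a + b for a, b in zip(s, obj)), objs + [obj], n + 1)  # combine
    return out


def reduce_all(node_outputs):
    """Reduce: merge every node's partial results per key, then divide sum by count."""
    merged = defaultdict(lambda: [None, [], 0])
    for out in node_outputs:
        for key, (s, objs, n) in out.items():
            m = merged[key]
            m[0] = s if m[0] is None else tuple(a + b for a, b in zip(m[0], s))
            m[1].extend(objs)
            m[2] += n
    return {key: (tuple(x / n for x in s), objs) for key, (s, objs, n) in merged.items()}


def parallel_kmeans(nodes, centroids, max_iter=20):
    """Driver: one simulated MapReduce job per iteration until centroids stop changing."""
    for _ in range(max_iter):
        results = reduce_all([map_combine(data, centroids) for data in nodes])
        # A centroid that received no objects keeps its old position.
        new_centroids = [results[c][0] if c in results else c for c in centroids]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

On the example, the first iteration should match the reduce output on slide 28, and the second iteration leaves the centroids at (2, 2) and (12, 12), so the loop stops there.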
  • 29. Experimental Results: Each machine used in the experiments has two 2.8 GHz cores and 4 GB of memory.
  • 33. Reference: Weizhong Zhao, Huifang Ma, and Qing He, "Parallel K-Means Clustering Based on MapReduce," 2009. K-Means algorithm.