SlideShare a Scribd company logo
Parallel K-Means
ncku Tien-Yang Wu
outline
Introduction
K-Means Algorithm
Parallel K-Means Based on MapReduce
Experimental Results
K-Means on spark
Introduction
They assume that all objects can reside in main
memory at the same time.
Their parallel systems have provided restricted
programming models.
Introduction
They assume that all objects can reside in main
memory at the same time.
Their parallel systems have provided restricted
programming models.
dataset oriented parallel clustering algorithms should be
developed.
K-Means Algorithm
K-Means Algorithm
Firstly, it randomly selects k objects from the whole objects
which represent initial cluster centers.
K-Means Algorithm
Each remaining object is assigned to the cluster to which it
is the most similar, based on the distance between the
object and the cluster center.
K-Means Algorithm
The new mean for each cluster is then calculated. This
process iterates until the criterion function converges.
Parallel K-Means Based
on MapReduce
most intensive calculation to occur is the calculation of
distances.
each iteration require nk distance
Parallel K-Means Based
on MapReduce
the distance computations between one object with the
centers is irrelevant to the distance computations
between other objects with the corresponding centers.
distance computations between different objects with
centers can be parallel executed.
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
data target
1,1
2,2
3,3
11,11
12,12
13,13
1 class
2 class
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
random two centroid
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
store two nodes
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
map
map
combine
combine
reduce
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1
map
3,3
c1:(1,1)
c2:(11,11)
assign to c1(1,1)
(1,1) , {(3,3),(3,3)}
key value
output<key,value>
Parallel K-Means Based
on MapReduce
(1,1) , {(3,3),(3,3)}
key value
centroid
temporary to calculate new centroid, the object
(1,1)
{(3,3),(3,3)}
output<key,value>
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1
map
c1:(1,1)
c2:(11,11)
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
map
map
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
(11,11) , {(11,11),(11,11)}
(1,1) , {(2,2),(2,2)}
key value
(11,11) , {(13,13),(13,13)}
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1 c1:(1,1)
c2:(11,11)
map
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
combine
Parallel K-Means Based
on MapReduce
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
combine
(1,1) , {(4,4),{(1,1),(3,3),2}
(11,11) , {(12,12),(12,12),1}
key value
same key combine
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
output<key,value>
centroid
temporary to calculate new centroid, the objects
,number of objects
(1,1)
{(4,4),{(1,1),(3,3)},2}
Parallel K-Means Based
on MapReduce
combine
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
(11,11) , {(11,11),(11,11)}
(1,1) , {(2,2),(2,2)}
key value
(11,11) , {(13,13),(13,13)}
combine
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
(1,1) , {(2,2),(2,2),1}
(11,11) , {(24,24),{(11,11),(13,13)},2}
key value
Parallel K-Means Based
on MapReduce
reduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
(1,1) , {(2,2),(2,2),1}
(11,11) , {(24,24),{(11,11),(13,13)},2}
key value
same key reduce
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
reduce
same key reduce
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(4+2)/(2+1) ,(4+2)/(2+1) = 2,2
2,2 = new centroid
1,1
2,2
3,3
centroid is 2,2
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
centroid
new centroid, the objects
,new cluster
Parallel K-Means Based
on MapReduce
reduce
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(11,11) , {(12,12),{(11,11),(12,12),(13,13)}
update new centroid and next iteration
until converge or arrive to iteration number
Experimental Results
two 2.8 GHz cores and 4GB of memory
Experimental Results
Speedup
non linear,communication cost
Experimental Results
Scale up
performance
datasets 1GB 2GB 3GB 4GB
K-Means on spark
Reference
Parallel K-Means Clustering Based on MapReduce
Weizhong Zhao1,2, Huifang Ma1,2, and Qing He1
2009
K means algorithm

More Related Content

What's hot

From Renamer Plugin to Polyglot IDE
From Renamer Plugin to Polyglot IDEFrom Renamer Plugin to Polyglot IDE
From Renamer Plugin to Polyglot IDE
intelliyole
 
A Star Search
A Star SearchA Star Search
A Star Search
Computing Cage
 
Backtracking
BacktrackingBacktracking
Backtracking
Vikas Sharma
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Hill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligenceHill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligence
sandeep54552
 
Data Mining
Data MiningData Mining
Data Mining
NafiulIslamNakib
 
Machine learning
Machine learningMachine learning
Machine learning
Dr Geetha Mohan
 
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
The Mechanics of Testing Large Data Pipelines (QCon London 2016)The Mechanics of Testing Large Data Pipelines (QCon London 2016)
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
Mathieu Bastian
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Stanley Wang
 
Presto
PrestoPresto
Presto
Knoldus Inc.
 
Data Modeling for MongoDB
Data Modeling for MongoDBData Modeling for MongoDB
Data Modeling for MongoDB
MongoDB
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
swapnac12
 
implementation of travelling salesman problem with complexity ppt
implementation of travelling salesman problem with complexity pptimplementation of travelling salesman problem with complexity ppt
implementation of travelling salesman problem with complexity ppt
AntaraBhattacharya12
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
Milind Bhandarkar
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Databricks
 
Solving recurrences
Solving recurrencesSolving recurrences
Solving recurrences
Megha V
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Databricks
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
Shivanee garg
 

What's hot (20)

From Renamer Plugin to Polyglot IDE
From Renamer Plugin to Polyglot IDEFrom Renamer Plugin to Polyglot IDE
From Renamer Plugin to Polyglot IDE
 
A Star Search
A Star SearchA Star Search
A Star Search
 
Backtracking
BacktrackingBacktracking
Backtracking
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Hadoop2.2
Hadoop2.2Hadoop2.2
Hadoop2.2
 
Hill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligenceHill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligence
 
Data Mining
Data MiningData Mining
Data Mining
 
Machine learning
Machine learningMachine learning
Machine learning
 
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
The Mechanics of Testing Large Data Pipelines (QCon London 2016)The Mechanics of Testing Large Data Pipelines (QCon London 2016)
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Presto
PrestoPresto
Presto
 
Data Modeling for MongoDB
Data Modeling for MongoDBData Modeling for MongoDB
Data Modeling for MongoDB
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
implementation of travelling salesman problem with complexity ppt
implementation of travelling salesman problem with complexity pptimplementation of travelling salesman problem with complexity ppt
implementation of travelling salesman problem with complexity ppt
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Solving recurrences
Solving recurrencesSolving recurrences
Solving recurrences
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 

Viewers also liked

Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Titus Damaiyanti
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
Varad Meru
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoopTianwei Liu
 
Hadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringHadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringGeorge Ang
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
Lynn Langit
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
mobius.cn
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
Farzad Nozarian
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
Amund Tveit
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreduce
makoto onizuka
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
IJRES Journal
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
Subhas Kumar Ghosh
 
Collaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopCollaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on Hadoop
Tien-Yang (Aiden) Wu
 
Spark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticSpark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticALTIC Altic
 
K means
K meansK means
Hadoop introduction 2
Hadoop introduction 2Hadoop introduction 2
Hadoop introduction 2Tianwei Liu
 
Hidden markov model
Hidden markov modelHidden markov model
Hidden markov model
Tien-Yang (Aiden) Wu
 

Viewers also liked (20)

Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoop
 
Hadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringHadoop Design and k -Means Clustering
Hadoop Design and k -Means Clustering
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreduce
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
MachineLearning_MPI_vs_Spark
MachineLearning_MPI_vs_SparkMachineLearning_MPI_vs_Spark
MachineLearning_MPI_vs_Spark
 
Collaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopCollaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on Hadoop
 
Spark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticSpark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, altic
 
K means
K meansK means
K means
 
Hadoop introduction 2
Hadoop introduction 2Hadoop introduction 2
Hadoop introduction 2
 
Hidden markov model
Hidden markov modelHidden markov model
Hidden markov model
 

Similar to Parallel-kmeans

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
AlaaZ
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Florent Renucci
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
Geoffrey Fox
 
Graph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph GenerationGraph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph Generation
Sangmin Woo
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduce
Thibault Debatty
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
Md Abul Hayat
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-Joins
Jonny Daenen
 
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection ProblemTheories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
Seongcheol Baek
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
Andrea Antonello
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
CheeWeiTan10
 
M0174491101
M0174491101M0174491101
M0174491101
IOSR Journals
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
Eng. Dr. Dennis N. Mwighusa
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingzukun
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...
IJECEIAES
 
Visualization of general defined space data
Visualization of general defined space dataVisualization of general defined space data
Visualization of general defined space data
ijcga
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
Kyong-Ha Lee
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
IJECEIAES
 
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Ismael Torres-Pizarro, PhD, PE, Esq.
 

Similar to Parallel-kmeans (20)

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Graph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph GenerationGraph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph Generation
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduce
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-Joins
 
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection ProblemTheories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
Knn solution
Knn solutionKnn solution
Knn solution
 
M0174491101
M0174491101M0174491101
M0174491101
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracing
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...
 
Visualization of general defined space data
Visualization of general defined space dataVisualization of general defined space data
Visualization of general defined space data
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
 

More from Tien-Yang (Aiden) Wu

Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
Tien-Yang (Aiden) Wu
 
沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn
Tien-Yang (Aiden) Wu
 
Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...
Tien-Yang (Aiden) Wu
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
Tien-Yang (Aiden) Wu
 
RDD
RDDRDD
Semantic ui教學
Semantic ui教學Semantic ui教學
Semantic ui教學
Tien-Yang (Aiden) Wu
 
響應式網頁教學
響應式網頁教學響應式網頁教學
響應式網頁教學
Tien-Yang (Aiden) Wu
 
NoSQL & JSON
NoSQL & JSONNoSQL & JSON
NoSQL & JSON
Tien-Yang (Aiden) Wu
 
Weebly上手教學
Weebly上手教學Weebly上手教學
Weebly上手教學
Tien-Yang (Aiden) Wu
 
簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler
Tien-Yang (Aiden) Wu
 
Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設
Tien-Yang (Aiden) Wu
 

More from Tien-Yang (Aiden) Wu (11)

Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn
 
Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
RDD
RDDRDD
RDD
 
Semantic ui教學
Semantic ui教學Semantic ui教學
Semantic ui教學
 
響應式網頁教學
響應式網頁教學響應式網頁教學
響應式網頁教學
 
NoSQL & JSON
NoSQL & JSONNoSQL & JSON
NoSQL & JSON
 
Weebly上手教學
Weebly上手教學Weebly上手教學
Weebly上手教學
 
簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler
 
Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設
 

Recently uploaded

Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Yara Milbes
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 

Recently uploaded (20)

Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 

Parallel-kmeans

  • 2. outline Introduction K-Means Algorithm Parallel K-Means Based on MapReduce Experimental Results K-Means on spark
  • 3. Introduction They assume that all objects can reside in main memory at the same time. Their parallel systems have provided restricted programming models.
  • 4. Introduction They assume that all objects can reside in main memory at the same time. Their parallel systems have provided restricted programming models. dataset oriented parallel clustering algorithms should be developed.
  • 6. K-Means Algorithm Firstly, it randomly selects k objects from the whole objects which represent initial cluster centers.
  • 7. K-Means Algorithm Each remaining object is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster center.
  • 8. K-Means Algorithm The new mean for each cluster is then calculated. This process iterates until the criterion function converges.
  • 9. Parallel K-Means Based on MapReduce most intensive calculation to occur is the calculation of distances. each iteration require nk distance
  • 10. Parallel K-Means Based on MapReduce the distance computations between one object with the centers is irrelevant to the distance computations between other objects with the corresponding centers. distance computations between different objects with centers can be parallel executed.
  • 11. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 data target 1,1 2,2 3,3 11,11 12,12 13,13 1 class 2 class
  • 12. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 random two centroid c1:(1,1) c2:(11,11)
  • 13. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 store two nodes c1:(1,1) c2:(11,11)
  • 14. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11)
  • 15. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11) map map combine combine reduce
  • 16. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 map 3,3 c1:(1,1) c2:(11,11) assign to c1(1,1) (1,1) , {(3,3),(3,3)} key value output<key,value>
  • 17. Parallel K-Means Based on MapReduce (1,1) , {(3,3),(3,3)} key value centroid temporary to calculate new centroid, the object (1,1) {(3,3),(3,3)} output<key,value>
  • 18. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 map c1:(1,1) c2:(11,11) (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value
  • 19. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11) map map (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value (11,11) , {(11,11),(11,11)} (1,1) , {(2,2),(2,2)} key value (11,11) , {(13,13),(13,13)}
  • 20. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 c1:(1,1) c2:(11,11) map (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value combine
  • 21. Parallel K-Means Based on MapReduce (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value combine (1,1) , {(4,4),{(1,1),(3,3),2} (11,11) , {(12,12),(12,12),1} key value same key combine
  • 22. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value output<key,value> centroid temporary to calculate new centroid, the objects ,number of objects (1,1) {(4,4),{(1,1),(3,3)},2}
  • 23. Parallel K-Means Based on MapReduce combine (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value (11,11) , {(11,11),(11,11)} (1,1) , {(2,2),(2,2)} key value (11,11) , {(13,13),(13,13)} combine (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value (1,1) , {(2,2),(2,2),1} (11,11) , {(24,24),{(11,11),(13,13)},2} key value
  • 24. Parallel K-Means Based on MapReduce reduce (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value (1,1) , {(2,2),(2,2),1} (11,11) , {(24,24),{(11,11),(13,13)},2} key value same key reduce
  • 25. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} reduce same key reduce (1,1) , {(2,2),{(1,1),(2,2),(3,3)}
  • 26. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (4+2)/(2+1) ,(4+2)/(2+1) = 2,2 2,2 = new centroid 1,1 2,2 3,3 centroid is 2,2
  • 27. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} centroid new centroid, the objects ,new cluster
  • 28. Parallel K-Means Based on MapReduce reduce (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (11,11) , {(12,12),{(11,11),(12,12),(13,13)} update new centroid and next iteration until converge or arrive to iteration number
  • 29. Experimental Results two 2.8 GHz cores and 4GB of memory
  • 33. Reference Parallel K-Means Clustering Based on MapReduce Weizhong Zhao1,2, Huifang Ma1,2, and Qing He1 2009 K means algorithm