Enhancing the performance of
K-Means algorithm
Plan
• Basic K-Means Algorithm
• Converting the basic K-Means algorithm to a
concurrent version
• Implementation of K-Means algorithm using
C#
• Analysis of Results
• Conclusion
Basic K-Means Algorithm
Definition
• K-Means clustering is a method of cluster
analysis which aims to partition n
observations into k clusters in which each
observation belongs to the cluster with the
nearest mean.
Definition
• The K-Means problem is to find cluster centers
that minimize the sum of squared distances
from each data point being clustered to its
cluster center (the center that is closest to it).
• A very common measure is the sum of
distances or sum of squared Euclidean
distances from the mean of each cluster.
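Written out, the objective described above takes the following form. The symbols are notation introduced here for illustration and do not appear in the slides: x_i are the n observations, C_j is the set of observations assigned to cluster j, and mu_j is the mean of C_j.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

K-Means looks for the assignment of observations to clusters that makes J as small as possible.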
Basic K-Means
Algorithm Steps
Step 1
The algorithm arbitrarily selects k
points as the initial cluster centers
(“means”).
Step 2
Each point in the dataset is assigned to
the closest cluster, based upon the
Euclidean distance between each point
and each cluster center.
Step 3
Each cluster center is recomputed as
the average of the points in that
cluster.
Steps 2 and 3 repeat until the clusters converge.
Convergence
Convergence means that either no
observations change clusters when
steps 2 and 3 are repeated, or that the
changes do not make a material
difference in the definition of the
clusters.
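The three steps and the convergence test can be put together in a short single-threaded sketch. This is only an illustration under stated assumptions, not the code of the demo application: it assumes .NET 6 or later (top-level statements), and the toy data, the seed and the names BasicKMeans and SquaredDistance are hypothetical. Convergence is taken here to mean that no point changed cluster.

```csharp
using System;
using System.Linq;

// Hypothetical toy data: six 2-D points forming two obvious groups.
double[][] points =
{
    new[] { 1.0, 1.0 }, new[] { 1.5, 2.0 }, new[] { 1.0, 1.5 },
    new[] { 8.0, 8.0 }, new[] { 9.0, 8.5 }, new[] { 8.5, 9.0 },
};

int[] labels = BasicKMeans(points, k: 2, maxIterations: 100, seed: 42);
Console.WriteLine(string.Join(",", labels));

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(double[] a, double[] b)
{
    double sum = 0;
    for (int d = 0; d < a.Length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

int[] BasicKMeans(double[][] data, int k, int maxIterations, int seed)
{
    var random = new Random(seed);
    // Step 1: pick k distinct data points as the initial centers.
    double[][] centers = data.OrderBy(_ => random.Next()).Take(k)
                             .Select(p => (double[])p.Clone()).ToArray();
    int[] assignment = Enumerable.Repeat(-1, data.Length).ToArray();

    for (int iteration = 0; iteration < maxIterations; iteration++)
    {
        bool changed = false;
        // Step 2: assign each point to the closest center.
        for (int i = 0; i < data.Length; i++)
        {
            int best = 0;
            for (int c = 1; c < k; c++)
                if (SquaredDistance(data[i], centers[c]) < SquaredDistance(data[i], centers[best]))
                    best = c;
            if (assignment[i] != best) { assignment[i] = best; changed = true; }
        }
        // Convergence: no observation changed cluster.
        if (!changed) break;

        // Step 3: recompute each center as the mean of its points.
        for (int c = 0; c < k; c++)
        {
            var members = data.Where((_, i) => assignment[i] == c).ToArray();
            if (members.Length == 0) continue; // keep the old center for an empty cluster
            for (int d = 0; d < centers[c].Length; d++)
                centers[c][d] = members.Average(p => p[d]);
        }
    }
    return assignment;
}
```

On this toy data the first three points end up in one cluster and the last three in the other; which of the two clusters gets index 0 depends on the random initial centers.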
K-Means Algorithm Steps Schema
K-Means Algorithm Deficiencies
• The K-Means algorithm has at least two major
theoretical shortcomings:
• The worst-case running time of the algorithm
has been shown to be super-polynomial in the
input size.
• The approximation found can be arbitrarily bad
with respect to the objective function compared
to the optimal clustering.
Our Work
Basic K-Means will be converted into a
Concurrent K-Means version that uses
dedicated .NET Framework libraries to
take advantage of multithreading.
Our Work
This Concurrent version of K-Means
preserves all the benefits of Basic K-Means
and runs much faster: in our experiments
with 20 or more means, it cut execution
time by roughly 70% to 85% compared with
Basic K-Means.
Converting Basic K-Means
algorithm into Concurrent
First Step
First, we must identify the tasks
that contain independent sub-tasks
which can be executed in parallel.
Identifying sub-Tasks
Consider the K-Means algorithm as follows:
1) Pick Random Center Points
2) Assign Points To Centers
3) Calculate New Centers
4) Check If Centers Are Equal To The Previous Centers
(if so, quit; else go to 2)
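Basic K-Means Algorithm execution
[Diagram: steps 1, 2, 3, 4, End executed on a single thread; after step 4 the flow returns to step 2 when there is no convergence and goes to End on convergence.]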
Identifying sub-Tasks
In step 2, we loop over every point and
determine which center is closest to it.
Since no shared state is modified during
this lookup, we can easily run this
process in parallel.
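A minimal sketch of step 2 run in parallel is shown here, assuming .NET 6 or later and the System.Threading.Tasks.Parallel class. The names ClosestCenter and AssignPointsParallel and the sample data are hypothetical. Each iteration reads the shared points and centers but writes only to its own slot of the result array, which is why no locking is needed.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sample data: four points and two fixed centers.
double[][] points =
{
    new[] { 0.0, 0.0 }, new[] { 0.5, 0.2 }, new[] { 9.0, 9.0 }, new[] { 10.0, 9.5 },
};
double[][] centers = { new[] { 0.0, 0.0 }, new[] { 10.0, 10.0 } };

int[] assignment = AssignPointsParallel(points, centers);
Console.WriteLine(string.Join(",", assignment)); // prints 0,0,1,1

// Returns the index of the center with the smallest squared Euclidean distance.
int ClosestCenter(double[] point, double[][] centerList)
{
    int best = 0;
    double bestDistance = double.MaxValue;
    for (int c = 0; c < centerList.Length; c++)
    {
        double distance = 0;
        for (int d = 0; d < point.Length; d++)
        {
            double diff = point[d] - centerList[c][d];
            distance += diff * diff;
        }
        if (distance < bestDistance) { bestDistance = distance; best = c; }
    }
    return best;
}

// Step 2 in parallel: one independent lookup per point.
int[] AssignPointsParallel(double[][] data, double[][] centerList)
{
    var result = new int[data.Length];
    Parallel.For(0, data.Length, i =>
    {
        // Each iteration writes only result[i], so iterations do not interfere.
        result[i] = ClosestCenter(data[i], centerList);
    });
    return result;
}
```

Parallel.For with an index is used instead of Parallel.ForEach only so that each result can be stored at the position of its point.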
Identifying sub-Tasks
In step 3, when we calculate new centers,
we simply loop over all of the points in a
given group and calculate their
“average” location (or centroid).
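One possible way to parallelize step 3 is to compute each centroid in its own parallel iteration, as sketched here. This is an assumption about the design, not a copy of the demo application; it assumes .NET 6 or later, and the name RecomputeCentersParallel and the sample data are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sample data: four points already assigned to two clusters.
double[][] points =
{
    new[] { 0.0, 0.0 }, new[] { 2.0, 0.0 }, new[] { 4.0, 4.0 }, new[] { 6.0, 8.0 },
};
int[] assignment = { 0, 0, 1, 1 };

double[][] newCenters = RecomputeCentersParallel(points, assignment, 2);
foreach (var center in newCenters)
    Console.WriteLine(string.Join(",", center));

// Step 3 in parallel: one independent centroid computation per cluster.
double[][] RecomputeCentersParallel(double[][] data, int[] labels, int k)
{
    int dimensions = data[0].Length;
    var result = new double[k][];
    Parallel.For(0, k, c =>
    {
        var sum = new double[dimensions];
        int count = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (labels[i] != c) continue;
            for (int d = 0; d < dimensions; d++) sum[d] += data[i][d];
            count++;
        }
        // An empty cluster is left at the origin in this sketch;
        // a real implementation would need a policy for that case.
        if (count > 0)
            for (int d = 0; d < dimensions; d++) sum[d] /= count;
        result[c] = sum; // each iteration writes only its own slot
    });
    return result;
}
```

For the sample data the two centroids are (1, 0) and (5, 6). With this design the amount of parallel work grows with the number of clusters.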
Identifying sub-Tasks
Steps 2 and 3 are the best candidates for
parallelism because they are composed of
independent loops executed over the
data points.
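Concurrent K-Means Algorithm execution
[Diagram: steps 1, 2, 3, 4, End, with a legend distinguishing linear execution from parallel execution; after step 4 the flow returns to step 2 when there is no convergence and goes to End on convergence.]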
Implementation of K-Means
algorithm using C#
Basic K-Means algorithm
For step 2, all we need to do is loop through
each point and check every center until we find
the closest one.
If we weren’t concerned with writing a parallel
application, we could simply loop over the
points with a normal foreach statement:
foreach (var point in points)
{
    // content goes here
}
Concurrent K-Means algorithm
But if we leverage the
System.Threading.Tasks.Parallel class in .NET 4.0, we
can simply write this:
Parallel.ForEach(points, point =>
{
    // content goes here
});
The same approach is repeated in step 3.
Demo Application
Application Snapshots
Analysis of Results
Test machine
• Experiments were run on the following
machine:
• CPU = Intel(R) Xeon(R) X5690 @ 3.47 GHz
• RAM = 63.9 GB
• Operating System = Microsoft Windows Server
2003 Enterprise X64 Edition Service Pack 2
• Number of Processors = 24
• Application Type = 64-bit
K = 10
Means (k)  Data Points  One Thread (sec)  Multi-Threaded (sec)  Difference (sec)  Iterations  Enhancement (%)
10         5000         0.3880071         0.2134675             0.1745396         46          44.98360984
10         10000        0.7593024         0.398528              0.3607744         48          47.51392857
10         15000        0.7250237         0.3331953             0.3918284         48          54.04352989
10         20000        1.2642376         0.5171551             0.7470825         21          59.09352008
10         25000        0.8343164         0.3451272             0.4891892         21          58.63353519
10         30000        2.2632929         0.913688              1.3496049         47          59.63014774
10         35000        1.907018          0.7550718             1.1519462         34          60.40562805
10         40000        2.4957917         0.9887817             1.50701           39          60.3820423
10         45000        3.2316701         1.2320773             1.9995928         44          61.87490487
10         50000        4.0127932         1.4904087             2.5223845         49          62.85857193
[Chart, K = 10: execution time (seconds) versus number of data points, with one series for One-Threaded and one for Multi-Threaded runs.]
K = 20
Means (k)  Data Points  One Thread (sec)  Multi-Threaded (sec)  Difference (sec)  Iterations  Enhancement (%)
20         5000         0.7284919         0.2125982             0.5158937         47          70.81666934
20         10000        1.2648146         0.3195409             0.9452737         41          74.7361471
20         15000        1.6659779         0.3957833             1.2701946         36          76.24318426
20         20000        5.0632423         1.2135201             3.8497222         84          76.03274684
20         25000        4.2068176         1.0020813             3.2047363         56          76.17958763
20         30000        7.151855          1.6554456             5.4964094         80          76.85291998
20         35000        6.0900071         1.4264851             4.663522          58          76.57662665
20         40000        4.9248527         1.1537625             3.7710902         41          76.57264957
20         45000        14.0519236        3.2482402             10.8036834        104         76.8840175
20         50000        6.5857465         1.5168731             5.0688734         44          76.9673324
[Chart, K = 20: execution time (seconds) versus number of data points, with one series for One-Threaded and one for Multi-Threaded runs.]
K = 30
Means (k)  Data Points  One Thread (sec)  Multi-Threaded (sec)  Difference (sec)  Iterations  Enhancement (%)
30         5000         0.6833344         0.1506775             0.5326569         30          77.94966857
30         10000        2.9058289         0.5835878             2.3222411         66          79.9166496
30         15000        3.9962787         0.7419719             3.2543068         60          81.43342956
30         20000        4.7729457         0.8792443             3.8937014         54          81.57858155
30         25000        13.3885657        2.3911529             10.9974128        121         82.14033561
30         30000        5.942487          1.0777533             4.8647337         45          81.86359852
30         35000        9.0325469         1.6179971             7.4145498         59          82.0870335
30         40000        14.0488585        2.4782393             11.5706192        80          82.35985294
30         45000        15.148019         2.6497895             12.4982295        77          82.50735294
30         50000        15.9880739        2.7855552             13.2025187        73          82.57729344
[Chart, K = 30: execution time (seconds) versus number of data points, with one series for One-Threaded and one for Multi-Threaded runs.]
K = 40
Means (k)  Data Points  One Thread (sec)  Multi-Threaded (sec)  Difference (sec)  Iterations  Enhancement (%)
40         5000         0.8754205         0.1490575             0.726363          30          82.97303981
40         10000        3.9124465         0.6024399             3.3100066         68          84.60196453
40         15000        6.7258824         0.9795056             5.7463768         78          85.43677184
40         20000        8.4592087         1.1685599             7.2906488         73          86.18594314
40         25000        8.5551805         1.1898307             7.3653498         59          86.09227824
40         30000        14.9347712        2.0584344             12.8763368        86          86.21716816
40         35000        24.0051212        3.2160665             20.7890547        119         86.6025817
40         40000        28.2736811        3.7496219             24.5240592        122         86.73811915
40         45000        16.3791093        2.1855015             14.1936078        63          86.65677443
40         50000        16.4799443        2.1400651             14.3398792        57          87.01412419
[Chart, K = 40: execution time (seconds) versus number of data points, with one series for One-Threaded and one for Multi-Threaded runs.]
Results Analysis
• For K = 10, the enhancement is 44.98% at
5,000 data points and grows to 62.86% at
50,000 data points.
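The Enhancement (%) values in the tables are consistent with dividing the Difference column by the One Thread column. The sketch below checks this for the first row of the K = 10 table; the variable names are hypothetical, and the speedup ratio is an extra figure computed here that does not appear in the slides. It assumes .NET 6 or later.

```csharp
using System;

// First row of the K = 10 table (5000 data points).
double oneThreadSeconds = 0.3880071;
double multiThreadSeconds = 0.2134675;

// Difference (sec) column: time saved by the multi-threaded run.
double difference = oneThreadSeconds - multiThreadSeconds;

// Enhancement (%) column: time saved as a share of the single-threaded time.
double enhancementPercent = difference / oneThreadSeconds * 100.0;

// Speedup ratio (not in the slides): how many times faster the run is.
double speedup = oneThreadSeconds / multiThreadSeconds;

Console.WriteLine($"Difference:  {difference} s");
Console.WriteLine($"Enhancement: {enhancementPercent} %");
Console.WriteLine($"Speedup:     {speedup}x");
```

For this row the enhancement comes out at about 44.98%, matching the table, which corresponds to a speedup of roughly 1.82 times.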
Results Analysis
• For K = 20, the enhancement is 70.82% at
5,000 data points and grows to 76.97% at
50,000 data points.
Results Analysis
• For K = 30, the enhancement is 77.95% at
5,000 data points and grows to 82.58% at
50,000 data points.
Results Analysis
• For K = 40, the enhancement is 82.97% at
5,000 data points and grows to 87.01% at
50,000 data points.
Results Analysis
• Regressing the enhancement (Y) on the
number of means (X1) and the number of data
points (X2) gives the equation below:
• Enhancement (%) = 49.762 + 0.871 × (Number of
means) + 0.0001 × (Number of data points)
• R² = 0.83251942
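To see what the fitted equation predicts, it can be evaluated at two corners of the experiment grid. The function name PredictEnhancement is hypothetical, the coefficients are the rounded values quoted above (so results are approximate), and the sketch assumes .NET 6 or later.

```csharp
using System;

// Fitted equation from the slide, using its rounded coefficients.
double PredictEnhancement(int means, int dataPoints) =>
    49.762 + 0.871 * means + 0.0001 * dataPoints;

// Largest configuration tested: K = 40, 50000 data points.
double predictedLarge = PredictEnhancement(40, 50000);
// Smallest configuration tested: K = 10, 5000 data points.
double predictedSmall = PredictEnhancement(10, 5000);

Console.WriteLine($"K = 40, n = 50000: predicted {predictedLarge} %, measured 87.01412419 %");
Console.WriteLine($"K = 10, n = 5000:  predicted {predictedSmall} %, measured 44.98360984 %");
```

The prediction is about 89.6% for the largest configuration, close to the measured 87.01%, and about 58.97% for the smallest, well above the measured 44.98%. A linear fit with R² of about 0.83 leaves noticeable error at the low end of the grid.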
Conclusion
• This equation shows that there is a strong
correlation between the enhancement and
the number of means: the larger the number
of means, the greater the enhancement.
We get this result because we used parallel
loops when looping over clusters, so
multithreading does more of the work
when there are more means.
Future Work
In this project we parallelized only two steps
of the K-Means algorithm (steps 2 and 3).
In future work, the remaining steps of the
algorithm could also be made concurrent.

Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Enhancing the performance of kmeans algorithm

  • 1. Enhancing the performance of K-Means algorithm
  • 2. Plan • Basic K-Means Algorithm • Converting basic K-Means algorithm to concurrent • Implementation of K-Means algorithm using C# • Analysis of Results • Conclusion
  • 4. Definition • K-Means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
  • 5. Definition • The K-Means problem is to find cluster centers that minimize the sum of squared distances from each data point being clustered to its cluster center (the center that is closest to it). • A very common measure is the sum of distances or sum of squared Euclidean distances from the mean of each cluster.
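In symbols, the objective described on this slide can be written as below. The notation is ours, not the slides': x_i is a data point, C_j is the set of points assigned to cluster j, and μ_j is the mean of that cluster.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```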
  • 7. Step 1 The algorithm arbitrarily selects k points as the initial cluster centers (“means”).
  • 8. Step 2 Each point in the dataset is assigned to the closest cluster, based on the Euclidean distance between the point and each cluster center.
  • 9. Step 3 Each cluster center is recomputed as the average of the points in that cluster. Steps 2 and 3 repeat until the clusters converge.
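The three steps can be sketched as a single-threaded C# routine. This is a minimal illustration, not the authors' implementation: the class name `BasicKMeans`, the `maxIterations` cap, the fixed `seed`, and the rule that an empty cluster keeps its previous center are all assumptions made for the example.

```csharp
using System;
using System.Linq;

public static class BasicKMeans
{
    // Squared Euclidean distance between two points of equal dimension.
    public static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int d = 0; d < a.Length; d++)
        {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index assigned to each point.
    public static int[] Cluster(double[][] points, int k, int maxIterations = 100, int seed = 0)
    {
        var rng = new Random(seed);
        // Step 1: pick k different data points at random as the initial centers.
        double[][] centers = points.OrderBy(_ => rng.Next()).Take(k)
                                   .Select(p => (double[])p.Clone()).ToArray();
        int dim = points[0].Length;
        var assignment = Enumerable.Repeat(-1, points.Length).ToArray();

        for (int iter = 0; iter < maxIterations; iter++)
        {
            bool changed = false;
            // Step 2: assign each point to its closest center.
            for (int i = 0; i < points.Length; i++)
            {
                int best = 0;
                double bestDist = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double dist = SquaredDistance(points[i], centers[c]);
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no point changed cluster

            // Step 3: recompute each center as the mean of its points.
            var sums = new double[k][];
            var counts = new int[k];
            for (int c = 0; c < k; c++) sums[c] = new double[dim];
            for (int i = 0; i < points.Length; i++)
            {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0) // assumption: an empty cluster keeps its old center
                    for (int d = 0; d < dim; d++) centers[c][d] = sums[c][d] / counts[c];
        }
        return assignment;
    }
}
```

Calling `BasicKMeans.Cluster(points, 2)` on two well-separated groups of points should place each group in its own cluster; which group receives index 0 depends on the random initialization.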
  • 10. Convergence Convergence means that either no observations change clusters when steps 2 and 3 are repeated or that the changes do not make a material difference in the definition of the clusters
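Both convergence criteria can be expressed as small checks. The sketch below is illustrative only: the slides do not quantify a "material difference", so the `tolerance` parameter and the choice of Euclidean center movement as the measure are assumptions.

```csharp
using System;

public static class Convergence
{
    // Criterion 1: no observation changed cluster between two passes.
    public static bool AssignmentsStable(int[] previous, int[] current)
    {
        for (int i = 0; i < current.Length; i++)
            if (previous[i] != current[i]) return false;
        return true;
    }

    // Criterion 2 (assumed form): every center moved by at most `tolerance`,
    // measured as Euclidean distance between its old and new position.
    public static bool CentersStable(double[][] oldCenters, double[][] newCenters, double tolerance)
    {
        for (int c = 0; c < newCenters.Length; c++)
        {
            double sq = 0.0;
            for (int d = 0; d < newCenters[c].Length; d++)
            {
                double diff = newCenters[c][d] - oldCenters[c][d];
                sq += diff * diff;
            }
            if (Math.Sqrt(sq) > tolerance) return false;
        }
        return true;
    }
}
```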
  • 12. K-Means Algorithm Deficiencies • The k-means algorithm has at least two major theoretical shortcomings: (1) the worst-case running time of the algorithm has been shown to be super-polynomial in the input size; (2) the approximation found can be arbitrarily bad with respect to the objective function compared to the optimal clustering.
  • 13. Our Work Basic K-Means will be converted into a Concurrent K-Means version that uses .NET Framework libraries to take advantage of multi-threading.
  • 14. Our Work This concurrent version of K-Means preserves all the benefits of Basic K-Means and runs considerably faster: in our experiments, execution time dropped by roughly 70% to 85% compared with Basic K-Means.
  • 16. First Step First we must identify the Task containing independent sub-tasks that can be executed in parallel.
  • 17. Identifying sub-Tasks Consider the K-Means algorithm as follows: 1) Pick Random Center Points 2) Assign Points To Centers 3) Calculate New Centers 4) Check If Centers Are Equal (if so, quit; else go to 2)
  • 18. Basic K-Means Algorithm execution [Diagram: steps 1, 2, 3, 4 and End on a single thread, with branches labeled "no convergence" and "convergence".]
  • 19. Identifying sub-Tasks In step 2, we loop over every point and determine which center is closest to it. Since no state is modified during this lookup, we can easily make this process parallel.
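A hedged sketch of step 2 run in parallel is shown below. It uses `Parallel.For` with an index so that each iteration writes only its own slot of the result array; the class and method names are hypothetical, and the slides do not show how the authors stored the assignments.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelAssignment
{
    // Step 2 in parallel: returns the index of the closest center for each point.
    public static int[] Assign(double[][] points, double[][] centers)
    {
        var assignment = new int[points.Length];
        // Iteration i only reads shared data and writes only assignment[i],
        // so iterations do not interfere with one another.
        Parallel.For(0, points.Length, i =>
        {
            int best = 0;
            double bestDist = double.MaxValue;
            for (int c = 0; c < centers.Length; c++)
            {
                double dist = 0.0;
                for (int d = 0; d < points[i].Length; d++)
                {
                    double diff = points[i][d] - centers[c][d];
                    dist += diff * diff; // squared Euclidean distance
                }
                if (dist < bestDist) { bestDist = dist; best = c; }
            }
            assignment[i] = best;
        });
        return assignment;
    }
}
```

Comparing squared distances gives the same closest center as comparing Euclidean distances and avoids a square root per comparison.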
  • 20. Identifying sub-Tasks In step 3, when we calculate new centers, we simply loop over all of the points in a given group and calculate their “average” location (or centroid).
  • 21. Identifying sub-Tasks Steps 2 and 3 are the best candidates for parallelism because they consist of independent loops executed over the data points.
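Step 3 can be parallelized across clusters, since each new center depends only on the points of its own group. The sketch below is one possible arrangement, not the authors' code: the names are hypothetical, and keeping the old center for an empty cluster is an assumption.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelCentroids
{
    // Step 3 in parallel over clusters: center c becomes the mean of the points assigned to c.
    public static double[][] Recompute(double[][] points, int[] assignment, double[][] oldCenters)
    {
        int k = oldCenters.Length;
        int dim = oldCenters[0].Length;
        var centers = new double[k][];
        // Each iteration writes only centers[c], so clusters are processed independently.
        Parallel.For(0, k, c =>
        {
            var sum = new double[dim];
            int count = 0;
            for (int i = 0; i < points.Length; i++)
            {
                if (assignment[i] != c) continue;
                count++;
                for (int d = 0; d < dim; d++) sum[d] += points[i][d];
            }
            if (count == 0)
            {
                // Assumption: an empty cluster keeps its previous center.
                centers[c] = (double[])oldCenters[c].Clone();
                return;
            }
            for (int d = 0; d < dim; d++) sum[d] /= count;
            centers[c] = sum;
        });
        return centers;
    }
}
```

This version scans every point once per cluster, which keeps the loop bodies free of shared writes at the cost of extra reads; grouping the points by cluster beforehand would avoid the repeated scan.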
  • 22. Concurrent K-Means Algorithm execution [Diagram: steps 1, 2, 3, 4 and End, with regions labeled "Linear execution" and "Parallel execution" and branches labeled "no convergence" and "convergence".]
  • 24. Basic K-Means algorithm For step 2, all we need to do is loop through each point and check every center to find the closest one. If we weren’t concerned with writing a parallel application, we could simply loop over the points with a normal foreach statement: foreach (var point in points) { // content goes here }
  • 25. Concurrent K-Means algorithm But if we leverage the System.Threading.Tasks.Parallel class in .NET 4.0, we can simply write this: Parallel.ForEach(points, point => { // content goes here }); The same approach is used in step 3.
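The loop body on this slide is elided, so the following is only an illustration of one thing any such body has to respect: when several iterations update the same shared variable, the update must be made thread-safe. The example counts the points in each cluster with `Parallel.ForEach` and `Interlocked.Increment`; the class name and the counting task are hypothetical and are not taken from the slides.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ClusterSizes
{
    // Counts how many points fall in each of the k clusters.
    public static int[] Count(IEnumerable<int> assignment, int k)
    {
        var counts = new int[k];
        Parallel.ForEach(assignment, cluster =>
        {
            // Several threads may update the same counter at once, so a plain
            // counts[cluster]++ would be a data race; Interlocked.Increment
            // performs the update atomically.
            Interlocked.Increment(ref counts[cluster]);
        });
        return counts;
    }
}
```

For example, `ClusterSizes.Count(new[] { 0, 1, 1, 2, 1 }, 3)` yields the counts 1, 3 and 1 for clusters 0, 1 and 2.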
  • 30. Used machine • Experiments were run on the following machine: • CPU = Intel(R) Xeon(R) X5690 @ 3.47 GHz / 63.9 GB of RAM • Operating System = Microsoft Windows Server 2003 Enterprise x64 Edition Service Pack 2 • Number of Processors = 24 • Application Type = 64-bit
  • 31. K = 10
      Means (k) | Data points | One thread (sec) | Multi-threaded (sec) | Difference (sec) | Iterations | Enhancement (%)
      10 | 5000 | 0.3880071 | 0.2134675 | 0.1745396 | 46 | 44.98360984
      10 | 10000 | 0.7593024 | 0.398528 | 0.3607744 | 48 | 47.51392857
      10 | 15000 | 0.7250237 | 0.3331953 | 0.3918284 | 48 | 54.04352989
      10 | 20000 | 1.2642376 | 0.5171551 | 0.7470825 | 21 | 59.09352008
      10 | 25000 | 0.8343164 | 0.3451272 | 0.4891892 | 21 | 58.63353519
      10 | 30000 | 2.2632929 | 0.913688 | 1.3496049 | 47 | 59.63014774
      10 | 35000 | 1.907018 | 0.7550718 | 1.1519462 | 34 | 60.40562805
      10 | 40000 | 2.4957917 | 0.9887817 | 1.50701 | 39 | 60.3820423
      10 | 45000 | 3.2316701 | 1.2320773 | 1.9995928 | 44 | 61.87490487
      10 | 50000 | 4.0127932 | 1.4904087 | 2.5223845 | 49 | 62.85857193
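The Enhancement (%) column is consistent with the time saved expressed as a percentage of the single-threaded time, that is, Difference divided by One thread, times 100. The helper below reproduces that calculation; the class and method names are ours, and the speedup factor is an additional, commonly used way of reading the same two timings that the slides do not report.

```csharp
public static class Enhancement
{
    // Percentage of the single-threaded time saved by the multi-threaded run.
    public static double Percent(double oneThreadSec, double multiThreadSec)
    {
        return (oneThreadSec - multiThreadSec) / oneThreadSec * 100.0;
    }

    // Speedup factor: single-threaded time divided by multi-threaded time.
    public static double Speedup(double oneThreadSec, double multiThreadSec)
    {
        return oneThreadSec / multiThreadSec;
    }
}
```

For the first row (K = 10, 5000 points), `Enhancement.Percent(0.3880071, 0.2134675)` gives approximately 44.98, matching the table, and `Enhancement.Speedup` for the same timings is approximately 1.82.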
  • 32. K = 10 [Chart: execution time (seconds) versus number of data points, with One-Threaded and Multi-Threaded series.]
  • 33. K = 20
      Means (k) | Data points | One thread (sec) | Multi-threaded (sec) | Difference (sec) | Iterations | Enhancement (%)
      20 | 5000 | 0.7284919 | 0.2125982 | 0.5158937 | 47 | 70.81666934
      20 | 10000 | 1.2648146 | 0.3195409 | 0.9452737 | 41 | 74.7361471
      20 | 15000 | 1.6659779 | 0.3957833 | 1.2701946 | 36 | 76.24318426
      20 | 20000 | 5.0632423 | 1.2135201 | 3.8497222 | 84 | 76.03274684
      20 | 25000 | 4.2068176 | 1.0020813 | 3.2047363 | 56 | 76.17958763
      20 | 30000 | 7.151855 | 1.6554456 | 5.4964094 | 80 | 76.85291998
      20 | 35000 | 6.0900071 | 1.4264851 | 4.663522 | 58 | 76.57662665
      20 | 40000 | 4.9248527 | 1.1537625 | 3.7710902 | 41 | 76.57264957
      20 | 45000 | 14.0519236 | 3.2482402 | 10.8036834 | 104 | 76.8840175
      20 | 50000 | 6.5857465 | 1.5168731 | 5.0688734 | 44 | 76.9673324
  • 34. K = 20 [Chart: execution time (seconds) versus number of data points, with One-Threaded and Multi-Threaded series.]
  • 35. K = 30
      Means (k) | Data points | One thread (sec) | Multi-threaded (sec) | Difference (sec) | Iterations | Enhancement (%)
      30 | 5000 | 0.6833344 | 0.1506775 | 0.5326569 | 30 | 77.94966857
      30 | 10000 | 2.9058289 | 0.5835878 | 2.3222411 | 66 | 79.9166496
      30 | 15000 | 3.9962787 | 0.7419719 | 3.2543068 | 60 | 81.43342956
      30 | 20000 | 4.7729457 | 0.8792443 | 3.8937014 | 54 | 81.57858155
      30 | 25000 | 13.3885657 | 2.3911529 | 10.9974128 | 121 | 82.14033561
      30 | 30000 | 5.942487 | 1.0777533 | 4.8647337 | 45 | 81.86359852
      30 | 35000 | 9.0325469 | 1.6179971 | 7.4145498 | 59 | 82.0870335
      30 | 40000 | 14.0488585 | 2.4782393 | 11.5706192 | 80 | 82.35985294
      30 | 45000 | 15.148019 | 2.6497895 | 12.4982295 | 77 | 82.50735294
      30 | 50000 | 15.9880739 | 2.7855552 | 13.2025187 | 73 | 82.57729344
  • 36. K = 30 [Chart: execution time (seconds) versus number of data points, with One-Threaded and Multi-Threaded series.]
  • 37. K = 40
      Means (k) | Data points | One thread (sec) | Multi-threaded (sec) | Difference (sec) | Iterations | Enhancement (%)
      40 | 5000 | 0.8754205 | 0.1490575 | 0.726363 | 30 | 82.97303981
      40 | 10000 | 3.9124465 | 0.6024399 | 3.3100066 | 68 | 84.60196453
      40 | 15000 | 6.7258824 | 0.9795056 | 5.7463768 | 78 | 85.43677184
      40 | 20000 | 8.4592087 | 1.1685599 | 7.2906488 | 73 | 86.18594314
      40 | 25000 | 8.5551805 | 1.1898307 | 7.3653498 | 59 | 86.09227824
      40 | 30000 | 14.9347712 | 2.0584344 | 12.8763368 | 86 | 86.21716816
      40 | 35000 | 24.0051212 | 3.2160665 | 20.7890547 | 119 | 86.6025817
      40 | 40000 | 28.2736811 | 3.7496219 | 24.5240592 | 122 | 86.73811915
      40 | 45000 | 16.3791093 | 2.1855015 | 14.1936078 | 63 | 86.65677443
      40 | 50000 | 16.4799443 | 2.1400651 | 14.3398792 | 57 | 87.01412419
  • 38. K = 40 [Chart: execution time (seconds) versus number of data points, with One-Threaded and Multi-Threaded series.]
  • 39. Results Analysis • For K = 10, the enhancement is 44.98% at 5000 data points and rises to 62.86% at 50000 data points.
  • 40. Results Analysis • For K = 20, the enhancement is 70.82% at 5000 data points and rises to 76.97% at 50000 data points.
  • 41. Results Analysis • For K = 30, the enhancement is 77.95% at 5000 data points and rises to 82.58% at 50000 data points.
  • 42. Results Analysis • For K = 40, the enhancement is 82.97% at 5000 data points and rises to 87.01% at 50000 data points.
  • 43. Results Analysis • Regressing the enhancement (Y) on the number of means (X1) and the number of data points (X2) gives the following equation: • Enhancement = 49.762 + 0.871 × (number of means) + 0.0001 × (number of data points) • R² = 0.83251942
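As a worked example, the fitted equation can be evaluated directly. The helper below uses the coefficients exactly as printed on the slide, which are rounded (the data-points coefficient in particular), so its predictions are approximate; the class and method names are hypothetical.

```csharp
public static class EnhancementModel
{
    // Fitted regression from the slide; coefficients are as printed there (rounded).
    public static double Predict(int numberOfMeans, int numberOfDataPoints)
    {
        return 49.762 + 0.871 * numberOfMeans + 0.0001 * numberOfDataPoints;
    }
}
```

With K = 40 and 50000 data points the equation gives 49.762 + 34.84 + 5 = 89.602, against a measured 87.01; with K = 10 and 5000 data points it gives 49.762 + 8.71 + 0.5 = 58.972, against a measured 44.98.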
  • 45. Conclusion • This equation shows a strong positive relationship between the enhancement and the number of means: the larger the number of means, the better the enhancement. We get this result because we used parallel loops when looping over clusters, so multi-threading is exploited more when there are more means.
  • 46. Future Work In this project we parallelized only two tasks in the K-Means algorithm (Steps 2 and 3). In future work, we could make the whole algorithm concurrent.