1. Electricity consumption optimization using K-means
clustering algorithm
Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and
Dong Ryeol Shin
Sungkyunkwan University, South Korea.
3. Introduction
• Electricity consumption
• Power grid
• In the power grid, we measure the consumption through sensors
• Industrial consumption
• Housing consumption
• Factories consumption
• Housing Consumption
• Front end (Consumer End)
• Back end (electrical company end)
4. Introduction (Cont.)
• Back end (Company End)
• Dataset for consumption UCIRVINE
• So many techniques that solves the optimization problem of
electricity but, none of them focus on housing electricity
optimization,
• Reducing the cost
• Factors of overcharge
• Prediction
Are not available
5. Introduction (Cont.)
• Solution
• K-mean algorithm
• Why chose k-mean cluster
• Predict the answer from the dataset
• No any answer is available in terms of k-mean
• Why predicting the answers
• No clear result
• In this paper electricity usage of home is analyzed through k-means clustering
algorithm for obtaining the optimal home usage electricity usage of home is
• 3A is analyzed through K-means clustering algorithm for obtaining the optimal home usage
electricity data points The Calinski-Harabasz Index, davis-boulden index and silhouette_score find
detailed optimal number of clusters in the K-means algorithm and present the application
scenario of the machine learning algorithm.
• 3B is reducing the 1/8 dataset and result the same result
• The proposed approach delivers us efficient and meaning prediction results never
obtained before.
8. Evaluation
• Declaration & Resources
System PC configuration Software
3A 1st paper execution snapshot
3B 2nd paper execution snapshot
9. Analysis
From K-means algorithms calculate proper cluster things is
very important, from the data, estimate Silhouette_score, the
result is K – 7 each cluster centroid and data prices silhouette
score are 0.799 is the optimal score. From the formal Caliski-
Harabasz Index results are 560.3999 is the optimal result. Using
this k-means algorithm the fact is figure 11.
K = 7
10. Software
PC Perfomance
Software OS Software Ram Processor Harddisk
Anadconda3 + Pycham3 Window 10 Professional 16.0GB i7-6600U CPU @2.60GHz 420GB SSD
11. System PC configuration Software
• Dataset UC Irvine Machine learning
https://archive.ics.uci.edu/ml/index.php
• sci-kit learn, Anaconda 3 open-source personally can easily
follow it and because using BSD License to real works don’t
have difficulties to that.
13. 3B Second paper execution(1)
1/8 dataset K =1 1/8 dataset K = 2 1/8 dataset K =3 1/8 dataset K =4 1/8 dataset K =5
1/8 dataset K =6 1/8 dataset K = 7 1/8 dataset K = 8 1/8 dataset K = 9 1/8 dataset K = 10
1/8 dataset K = 11
14. Conclusion
• Technique
• How we solve it 1st point
Using the k-means clustering algorithm and
• Association with multiple answers
• Electricity
• Blight side of conclusion
15. Published paper
1. Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin “Comparative Analysis of
Electricity Consumption at Home through a Silhouette-score prospective” , ICACT 2019 , South Korea , 2019
Sungkyunkwan University, Korea
2. Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin “Analysis of Electricity
Consumption at Home Using K-means Clustering Algorithm”, ICACT 2019 , South Korea , 2019
Sungkyunkwan University, Korea
17. K-means clustering
• K-means algorithm is one of the clustering methods for
divided, divided is giving the data among the many partitions.
For example, receive data object n, divided data is input data
divided K (≤ n) data, each group consisting of cluster below
equation is the at K-means algorithm when cluster consists of
algorithms using cost function use it [11]
argmin
𝑖 =1
𝑘
𝑥 ∈ 𝑆 𝑖
𝑥 − 𝜇𝑖
2
18. Calinski-Harabasz Index
• How to be well to be clustering inner way is Caliski-Harabasz Index, Davies-Bouldin
index, Dunn index, Silhouette score. In this paper. Evaluate via Clainiski-Harabasz Index
and silhouette score evaluate it.
• From the Cluster Calinski-Harabasz Index s I the clusters distributed average and
cluster distributed ratio will give it to you.
𝑠 𝑘 =
𝑇𝑟(𝐵 𝑘)
𝑇𝑟(𝑊𝑘)
×
𝑁 − 𝑘
𝑘 − 1
• For this Bk is the distributed matrix from each group Wk is the cluster distributed
defined.
𝑊𝑘 =
𝑞=1
𝑘
𝑥∈𝐶 𝑞
(𝑥 − 𝑐 𝑞)(𝑥 − 𝑐 𝑞) 𝑇
𝐵 𝑘 =
𝑞
𝑛 𝑞(𝐶 𝑞 − 𝑐)(𝐶 𝑞 − 𝑐) 𝑇
19. Silhouette score
• Silhouette score is the easy way to in data I each data cluster in
data’s definition an (i) each data is not clustered inner and data’s
definition b(i) silhouette score s(i) is equal to calculate that
s i =
𝑏 𝑖 − 𝑎(𝑖)
max { 𝑎 𝑖 , 𝑏 𝑖 }
• From this calculate s(i) is equal to that function
−1 ≤ s i ≤ 1
• S(i) is the close to 1 is the data I is the correct cluster to each
thing, close to -1 cannot distribute cluster is distributed, from this
paper machine Using the machine learning library scikit-learn in
the house hold power consumption clustering [7].
20. Davies-Bouldin index
• If the ground truth labels are not known, the Davies-Bouldin index
(sklearn. Metrixdavis Boulden)
𝑅𝑖𝑗 =
𝑠 𝑖+𝑠 𝑗
𝑑 𝑖𝑗
• Then the Davis-Bouldin Index is defined as
DB =
1
𝑘
𝑖 = 1 𝑘 max
𝑖≠𝑗
𝑅𝑖𝑗
• The zero is the lowest score a possible. Score. Values closer to
zero indicate a better partition. But the problem is this algorithm
do not attach it in the Scikit-learn library and only explain it in the
document page but cannot experiment easily.
21. Conclusion
• From the paper, Household power consumption via k-means clustering, Used library
which is sci-kit learn, Anaconda 3 open-source personally can easily follow it and
because using BSD License to real works don’t have difficulties to that.
• Not only the K-means algorithm, PCA Algorithms, but also SVM algorithm etc other
machine learning algorithms clustering can also do it.
• From this result, in real life household power consumptions diverse analytics. And
electricity transformer, Transmission power can management period can estimate it.
And each data using electricity consumption. It can be used for progressive taxation,
regional to regional demand forecasting, maintenance of power plants and facilities.
Can do it.
• In the Gas company can estimate via k-means algorithms and also can estimate about
the gas consumption rate to via K-means clustering and index.
23. Comparative Analysis of Electricity Consumption at
Home through a Silhouette-score prospective
Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin
SungKyunKwan University, South Korea
24. Abstract
• Machine learning is a state-of-the-art sub-project of artificial intelligence, that is been evolved
for finding large-scale intelligent analytics in the distributed computing environment. In this
paper, we perform comparative analytics onto dataset collected for the electricity usage of
home based on the K-mean clustering algorithm using comparison to silhouette score with a
ratio of 1/8 dataset. The performance evaluation shows that, the comparison index is similar in
numbers of silhouette score even if dataset is smaller than before.
25. K-means clustering
• K-means clustering
K-means algorithm is one of the clustering methods for divided,
divided is giving the data among the many partitions. For example,
receive data object n, divided data is input data divided K (≤ n)
data, each group consisting of cluster below equation is the at K-
means algorithm when cluster consists of algorithms using cost
function use it [11]
argmin
𝑖 =1
𝑘
𝑥 ∈ 𝑆 𝑖
𝑥 − 𝜇𝑖
2
26. Silhouette score
• Silhouette score is the easy way to in data I each data cluster in data’s definition an (i) each
data is not clustered inner and data’s definition b(i) silhouette score s(i) is equal to calculate that
s i =
𝑏 𝑖 − 𝑎(𝑖)
max { 𝑎 𝑖 , 𝑏 𝑖 }
• From this calculate s(i) is equal to that function
−1 ≤ s i ≤ 1
• S(i) is the close to 1 is the data I is the correct cluster to each thing, close to -1 cannot
distribute cluster is distributed, from this paper machine Using the machine learning library
scikit-learn in the house hold power consumption clustering
27. Analysis
• From K-means algorithms calculate proper cluster things is very important, from the data,
estimate Silhouette_score, the result is K = 7 each cluster centroid and data prices silhouette
score is 0.799 is the optimal score. Even if dataset is so small but the 1/8 datasets K= 7 each
cluster centroid and data prices silhouette score 0.810 is the optimal score. From this K-means
algorithm cluster 7th, ( all dataset , 1/8 dataset ) each group’s centroid and each centroid
distance will be an optimal value. From this result, the dataset is decrease but the K-means
clustering ‘s class vector space. Its optimal cluster is same situation with before original Dataset
Household power consumption rate via clustering.
Figure 12. Shiloutette score according to change of cluster number. Figure 13. 1/8 dataset Silhouette score according to change of cluster number.
28. Conclusion
From the paper, Household power consumption via k-means clustering,
Used library which is sci-kit learn, Anaconda 3 open-source personally
can easily follow it and because using BSD License to real works don’t
have difficulties to that. From this result even if reduce the dataset 1/8
but the silhouette score and all the clustering result is same with before.
But the population will increase it can show the clearer result for the
classification and vector space. Large dataset to small dataset is clear to
show to the specific result for the Silhouette score but opposite site is
not clearly allowed. Because 4-dimension vector dataset. From the
experiment reduce the estimated time if received huge dataset from
analysis.
Machine learning is a state-of-the-art sub-project of artificial intelligence, that is been evolved for finding large-scale intelligent analytics in the distributed computing environment. In this paper, we perform comparative analytics onto dataset collected for the electricity usage of home based on the K-mean clustering algorithm using comparison to silhouette score with a ratio of 1/8 dataset. The performance evaluation shows that, the comparison index is similar in numbers of silhouette score even if dataset is smaller than before.
K-means clustering
K-means algorithm is one of the clustering methods for divided, divided is giving the data among the many partitions. For example, receive data object n, divided data is input data divided K (≤n) data, each group consisting of cluster below equation is the at K-means algorithm when cluster consists of algorithms using cost function use it [11]
Silhouette score is the easy way to in data I each data cluster in data’s definition an (i) each data is not clustered inner and data’s definition b(i) silhouette score s(i) is equal to calculate that
From K-means algorithms calculate proper cluster things is very important, from the data, estimate Silhouette_score, the result is K = 7 each cluster centroid and data prices silhouette score is 0.799 is the optimal score. Even if dataset is so small but the 1/8 datasets K= 7 each cluster centroid and data prices silhouette score 0.810 is the optimal score. From this K-means algorithm cluster 7th, ( all dataset , 1/8 dataset ) each group’s centroid and each centroid distance will be an optimal value. From this result, the dataset is decrease but the K-means clustering ‘s class vector space. Its optimal cluster is same situation with before original Dataset Household power consumption rate via clustering.
From the paper, Household power consumption via k-means clustering, Used library which is sci-kit learn, Anaconda 3 open-source personally can easily follow it and because using BSD License to real works don’t have difficulties to that. From this result even if reduce the dataset 1/8 but the silhouette score and all the clustering result is same with before. But the population will increase it can show the clearer result for the classification and vector space. Large dataset to small dataset is clear to show to the specific result for the Silhouette score but opposite site is not clearly allowed. Because 4-dimension vector dataset. From the experiment reduce the estimated time if received huge dataset from analysis.