presentation 2019 04_09_rev1

Analysis of Electricity Consumption at Home Using K-
means Clustering Algorithm
Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and
Dong Ryeol Shin
Sungkyunkwan University, South Korea.

Present
• Part A – General Solution
• Part B – Precise Solution

Outline
• Introduction
• Related work
• Proposed Approach
• Evaluation
• Conclusion
• References

Evaluation
• Declaration & Resources
3A execution snapshot included
3B Second paper execution

Conclusion
• Technique
• How we solve it 1st point
• Association with multiple answers
• Electricity
• Blight side of conclusion

Published paper
Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective
Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin Sungkyunkwan University, Korea
Analysis of Electricity Consumption at Home Using K-means Clustering Algorithm Hyun Wong Choi, Nawab
Muhammad Faseeh Qureshi and Dong Ryeol Shin Sungkyunkwan University, Korea

Association
• Bigger dataset
• Reducing dataset

K-means clustering
• K-means algorithm is one of the clustering methods for
divided, divided is giving the data among the many partitions.
For example, receive data object n, divided data is input data
divided K (≤ n) data, each group consisting of cluster below
equation is the at K-means algorithm when cluster consists of
algorithms using cost function use it [11]
argmin
𝑖 =1
𝑘
𝑥 ∈ 𝑆 𝑖
𝑥 − 𝜇𝑖
2

K-means clustering (1)
K = 2 K = 3

K = 4 K = 5

K = 6 K = 7

K = 9K = 8

Calinski-Harabasz Index
• How to be well to be clustering inner way is Caliski-Harabasz Index, Davies-Bouldin
index, Dunn index, Silhouette score. In this paper. Evaluate via Clainiski-Harabasz Index
and silhouette score evaluate it.
• From the Cluster Calinski-Harabasz Index s I the clusters distributed average and
cluster distributed ratio will give it to you.
𝑠 𝑘 =
𝑇𝑟(𝐵 𝑘)
𝑇𝑟(𝑊𝑘)
×
𝑁 − 𝑘
𝑘 − 1
• For this Bk is the distributed matrix from each group Wk is the cluster distributed
defined.
𝑊𝑘 =
𝑞=1
𝑘
𝑥∈𝐶 𝑞
(𝑥 − 𝑐 𝑞)(𝑥 − 𝑐 𝑞) 𝑇
𝐵 𝑘 =
𝑞
𝑛 𝑞(𝐶 𝑞 − 𝑐)(𝐶 𝑞 − 𝑐) 𝑇

Silhouette score
• Silhouette score is the easy way to in data I each data cluster in
data’s definition an (i) each data is not clustered inner and data’s
definition b(i) silhouette score s(i) is equal to calculate that
s i =
𝑏 𝑖 − 𝑎(𝑖)
max { 𝑎 𝑖 , 𝑏 𝑖 }
• From this calculate s(i) is equal to that function
−1 ≤ s i ≤ 1
• S(i) is the close to 1 is the data I is the correct cluster to each
thing, close to -1 cannot distribute cluster is distributed, from this
paper machine Using the machine learning library scikit-learn in
the house hold power consumption clustering [7].

Davies-Bouldin index
• If the ground truth labels are not known, the Davies-Bouldin index
(sklearn. Metrixdavis Boulden)
𝑅𝑖𝑗 =
𝑠 𝑖+𝑠 𝑗
𝑑 𝑖𝑗
• Then the Davis-Bouldin Index is defined as
DB =
1
𝑘
𝑖 = 1 𝑘 max
𝑖≠𝑗
𝑅𝑖𝑗
• The zero is the lowest score a possible. Score. Values closer to
zero indicate a better partition. But the problem is this algorithm
do not attach it in the Scikit-learn library and only explain it in the
document page but cannot experiment easily.

Analysis
From K-means algorithms calculate proper cluster things is
very important, from the data, estimate Silhouette_score, the
result is K – 7 each cluster centroid and data prices silhouette
score are 0.799 is the optimal score. From the formal Caliski-
Harabasz Index results are 560.3999 is the optimal result. Using
this k-means algorithm the fact is figure 11.
K = 7

Conclusion
• From the paper, Household power consumption via k-means clustering, Used library
which is sci-kit learn, Anaconda 3 open-source personally can easily follow it and
because using BSD License to real works don’t have difficulties to that.
• Not only the K-means algorithm, PCA Algorithms, but also SVM algorithm etc other
machine learning algorithms clustering can also do it.
• From this result, in real life household power consumptions diverse analytics. And
electricity transformer, Transmission power can management period can estimate it.
And each data using electricity consumption. It can be used for progressive taxation,
regional to regional demand forecasting, maintenance of power plants and facilities.
Can do it.
• In the Gas company can estimate via k-means algorithms and also can estimate about
the gas consumption rate to via K-means clustering and index.

Comparative Analysis of Electricity Consumption at
Home through a Silhouette-score prospective
Hyun Wong Choi, Nawab Muhammad Faseeh Qureshi and Dong Ryeol Shin
SungKyunKwan University, South Korea

Abstract
• Machine learning is a state-of-the-art sub-project of artificial intelligence, that is been evolved
for finding large-scale intelligent analytics in the distributed computing environment. In this
paper, we perform comparative analytics onto dataset collected for the electricity usage of
home based on the K-mean clustering algorithm using comparison to silhouette score with a
ratio of 1/8 dataset. The performance evaluation shows that, the comparison index is similar in
numbers of silhouette score even if dataset is smaller than before.

K-means clustering
• K-means clustering
K-means algorithm is one of the clustering methods for divided,
divided is giving the data among the many partitions. For example,
receive data object n, divided data is input data divided K (≤ n)
data, each group consisting of cluster below equation is the at K-
means algorithm when cluster consists of algorithms using cost
function use it [11]
argmin
𝑖 =1
𝑘
𝑥 ∈ 𝑆 𝑖
𝑥 − 𝜇𝑖
2

Silhouette score
• Silhouette score is the easy way to in data I each data cluster in data’s definition an (i) each
data is not clustered inner and data’s definition b(i) silhouette score s(i) is equal to calculate that
s i =
𝑏 𝑖 − 𝑎(𝑖)
max { 𝑎 𝑖 , 𝑏 𝑖 }
• From this calculate s(i) is equal to that function
−1 ≤ s i ≤ 1
• S(i) is the close to 1 is the data I is the correct cluster to each thing, close to -1 cannot
distribute cluster is distributed, from this paper machine Using the machine learning library
scikit-learn in the house hold power consumption clustering

K =1 K =2

K =3 K =4

K =5 K =6

K =7 K =8

K =9 K =10

K-means clustering - 1/8 dataset (1)
1/8 dataset K =1 1/8 dataset K = 2

1/8 dataset K =3 1/8 dataset K =4

1/8 dataset K =5 1/8 dataset K =6

1/8 dataset K = 7 1/8 dataset K = 8

1/8 dataset K = 9 1/8 dataset K = 10

1/8 dataset K = 11

Analysis
• From K-means algorithms calculate proper cluster things is very important, from the data,
estimate Silhouette_score, the result is K = 7 each cluster centroid and data prices silhouette
score is 0.799 is the optimal score. Even if dataset is so small but the 1/8 datasets K= 7 each
cluster centroid and data prices silhouette score 0.810 is the optimal score. From this K-means
algorithm cluster 7th, ( all dataset , 1/8 dataset ) each group’s centroid and each centroid
distance will be an optimal value. From this result, the dataset is decrease but the K-means
clustering ‘s class vector space. Its optimal cluster is same situation with before original Dataset
Household power consumption rate via clustering.
Figure 12. Shiloutette score according to change of cluster number. Figure 13. 1/8 dataset Silhouette score according to change of cluster number.

Conclusion
From the paper, Household power consumption via k-means clustering,
Used library which is sci-kit learn, Anaconda 3 open-source personally
can easily follow it and because using BSD License to real works don’t
have difficulties to that. From this result even if reduce the dataset 1/8
but the silhouette score and all the clustering result is same with before.
But the population will increase it can show the clearer result for the
classification and vector space. Large dataset to small dataset is clear to
show to the specific result for the Silhouette score but opposite site is
not clearly allowed. Because 4-dimension vector dataset. From the
experiment reduce the estimated time if received huge dataset from
analysis.

presentation 2019 04_09_rev1

More Related Content

What's hot

Similar to presentation 2019 04_09_rev1

More from Hyun Wong Choi

Recently uploaded

presentation 2019 04_09_rev1

Editor's Notes