Master’s Dissertation
Comparative Analysis of Electricity
Consumption at Home through a
Silhouette-score perspective
Hyun Wong Choi
Department of Electrical and Computer Engineering
The Graduate School
Sungkyunkwan University
Comparative Analysis of Electricity
Consumption at Home through a
Silhouette-score perspective
Hyun Wong Choi
A Dissertation Submitted to the Department of
Electrical and Computer Engineering and
the Graduate School of Sungkyunkwan University
in partial fulfillment of the requirements
for the degree of Master of Science in Engineering
April 2019
Approved by
Professor Dr. Dong Ryeol Shin
This certifies that the dissertation of
Hyun Wong Choi is approved.
Committee Chair:
Prof. Dr. MUHAMMAD MANNAN SAEED
Committee Member: Prof. Dr. Eung Mo Kim
Major Advisor: Prof. Dr. Dong Ryeol Shin
Co-Advisor:
Prof. Dr. Nawab Muhammad Faseeh Querish
The Graduate School
Sungkyunkwan University
June 2019
Contents

List of Figures
Abstract
Chapter 1  Introduction
Chapter 2  Overview & Motivation
Chapter 3  Paper-1 Analysis of Electricity Consumption at Home through a Silhouette-score perspective
    3.1 Introduction
    3.2 Paper-1 Evaluation
        3.2.1 Experimental Environment
        3.2.2 Experimental Dataset
    3.3 Previous Work
    3.4 Proposed Approach
    3.5 Experimental Results
    3.6 Related Work
    3.7 Summary
Chapter 4  Paper-2 Comparative Analysis of Electricity Consumption at Home through a Silhouette-score perspective
    4.1 Introduction
    4.2 Related Work
    4.3 Paper-2 Methodology
        4.3.1 Experimental Environment
        4.3.2 Experimental Dataset
        4.3.3 Experimental Results
    4.4 Summary
Chapter 5  Conclusion
Acknowledgement
References
List of Figures

Figure 1. Clustering result at K = 1
Figure 2. Clustering result at K = 2
Figure 3. Clustering result at K = 3
Figure 4. Clustering result at K = 4
Figure 5. Clustering result at K = 5
Figure 6. Clustering result at K = 6
Figure 7. Clustering result at K = 7
Figure 8. Clustering result at K = 8
Figure 9. Clustering result at K = 9
Figure 10. Clustering result at K = 10
Figure 11. Silhouette score according to change of cluster number
Figure 12. Clustering result at K = 7
Figure 13. 1/8 dataset cluster K = 1
Figure 14. 1/8 dataset cluster K = 2
Figure 15. 1/8 dataset cluster K = 3
Figure 16. 1/8 dataset cluster K = 4
Figure 17. 1/8 dataset cluster K = 5
Figure 18. 1/8 dataset cluster K = 6
Figure 19. 1/8 dataset cluster K = 7
Figure 20. 1/8 dataset cluster K = 8
Figure 21. 1/8 dataset cluster K = 9
Figure 22. 1/8 dataset cluster K = 10
Figure 23. 1/8 dataset cluster K = 11
Figure 24. Silhouette score according to change of cluster number
Figure 25. 1/8 dataset silhouette score according to change of cluster number
Abstract

Machine learning is a modern field that has emerged as a new tool for data analytics in distributed computing environments. In several respects, machine learning has improved processing capacity along with the effectiveness of analysis. In the first part of this work, the electricity usage of the home is analyzed through the K-means clustering algorithm to obtain the optimal home-usage electricity data points. The Davies-Bouldin index and the silhouette score find the detailed optimal number of clusters for the K-means algorithm, and an application scenario of machine learning clustering analytics is presented.

Machine learning is also a state-of-the-art branch of artificial intelligence that has evolved for large-scale intelligent analytics in distributed computing environments. In the second part of this work, we perform comparative analytics on a dataset collected for home electricity usage, clustering with K-means and comparing silhouette scores against a 1/8-ratio subsample of the dataset. The performance evaluation shows that the silhouette scores remain similar even though the dataset is smaller than before.

Keywords: Machine Learning, K-means clustering
Chapter 1
Introduction
Electricity consumption is measured from the power grid through sensors. Consumption divides into industrial consumption (factories) and housing consumption. Housing consumption involves a front end (the consumer side) and a back end (the electricity company side). The dataset used for consumption analysis comes from the UC Irvine Machine Learning Repository.

Many techniques address the optimization problem of electricity, but none of them focus on housing electricity optimization; reducing the cost, identifying the factors of overcharge, and prediction are not available.

As a solution, we use the K-means algorithm. K-means clustering is chosen because it can predict an answer from the dataset when no labeled answer is available, and because without such clustering there is no clear result.

In this paper, the electricity usage of the home is analyzed through the K-means clustering algorithm to obtain the optimal home-usage electricity data points. First, the Calinski-Harabasz index, Davies-Bouldin index, and silhouette score find the detailed optimal number of clusters in the K-means algorithm, and the application scenario of the machine learning algorithm is presented. Second, reducing the dataset to 1/8 of its size yields the same result. The proposed approach delivers efficient and meaningful prediction results never obtained before.
Machine learning is an analysis mechanism that fetches and identifies matching patterns from existing datasets to form new results. This paper discusses comparative analytics for unsupervised learning algorithms, in which we compare the K-means clustering result on a reduced dataset against the silhouette score result of the full dataset. We performed the analysis and found that the Davies-Bouldin index did not work smoothly in the scikit-learn library, so we cross-checked the Calinski-Harabasz index and silhouette score alongside the Davies-Bouldin index and compared the results of each, learning that when we reduce the dataset to the mentioned proportion, the reduced dataset shows roughly half the score of the traditional dataset.
Chapter 2
Overview & Motivation

In real life, household power consumption admits diverse analytics: electricity transformer and transmission power management periods can be estimated from it. Per-household electricity consumption data can be used for progressive taxation, region-to-region demand forecasting, and maintenance of power plants and facilities. Similarly, a gas company or a car company can estimate its consumption rate via the K-means algorithm and the accompanying cluster indices.

This work was motivated by the Google AI TensorFlow Conference 2017.
Chapter 3
Paper-1 Analysis of Electricity Consumption at Home
through a Silhouette-score perspective
3.1 Introduction

Machine learning is a subfield of artificial intelligence that is used to develop algorithms and techniques enabling computers to learn [1]. It is used to train computers for various tasks such as (i) distinguishing whether received e-mails are spam or not, (ii) data classification, (iii) association rule identification, and (iv) character recognition.

Machine learning includes a series of processes in which a computer (i) looks for similar patterns, (ii) generates a novel classification system, (iii) analyzes data, and (iv) produces meaningful results. It is a kind of artificial intelligence that makes predictions based on results when supported by analytics algorithms. Machine learning is a step-by-step evolution from big data analytics toward predicting future actions and making decisions on its own through past learned results. The key issues in building a successful prediction model remain increasing the probability and reducing the error, and these problems are resolved through numerous iterative learnings [2].

At the heart of machine learning are representation and generalization, where representation is an evaluation of data and generalization is the processing of future data. Unsupervised learning is a type of machine learning that is used primarily to determine how data is organized. Unlike supervised learning or reinforcement learning, this method does not give a target value for input values [3].

Unsupervised learning is closely related to density estimation in statistics. It can summarize and describe the main characteristics of the data; clustering is one example. In this paper, we use the K-means algorithm to measure the optimal number of clusters based on the Calinski-Harabasz index, the silhouette score, and the Davies-Bouldin index, and then apply it to household electricity consumption analysis.
3.2 Paper-1 Evaluation

3.2.1 Experimental Environment
Software: Anaconda3 + PyCharm 3
OS: Windows 10 Professional
RAM: 16.0 GB
Processor: Intel i7-6600U CPU @ 2.60 GHz
Hard disk: 420 GB SSD
3.2.2 Experimental Dataset
1. date: date in format dd/mm/yyyy
2. time: time in format hh:mm:ss
3. global_active_power: household global minute-averaged active power (in kilowatts)
4. global_reactive_power: household global minute-averaged reactive power (in kilowatts)
5. voltage: minute-averaged voltage (in volts)
6. global_intensity: household global minute-averaged current intensity (in amperes)
7. sub_metering_1: energy sub-metering No. 1 (in watt-hours of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven, and a microwave (hot plates are not electric but gas powered).
8. sub_metering_2: energy sub-metering No. 2 (in watt-hours of active energy). It corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator, and a light.
9. sub_metering_3: energy sub-metering No. 3 (in watt-hours of active energy). It corresponds to an electric water-heater and an air-conditioner.
Global_active_power Global_reactive_power Voltage Global_intensity Sub_metering_1 Sub_metering_2 Sub_metering_3
0 4.216 0.418 234.84 18.4 0 1 17
1 5.36 0.436 233.63 23 0 1 16
2 5.374 0.498 233.29 23 0 2 17
3 5.388 0.502 233.74 23 0 1 17
4 3.666 0.528 235.68 15.8 0 1 17
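As a sketch of how this dataset can be loaded, the UCI file is semicolon-delimited, with '?' marking missing values. The snippet below builds a small in-memory sample in the same format (the file name and exact rows are illustrative, not the full dataset):

```python
import io
import pandas as pd

# A small in-memory sample in the same semicolon-delimited format as the
# UCI "Individual household electric power consumption" file.
raw = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000\n"
    "16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000\n"
    "16/12/2006;17:26:00;?;?;?;?;?;?;?\n"
)

# sep=';' splits on the delimiter; na_values='?' turns the dataset's
# missing-value marker into NaN so incomplete rows can be dropped.
df = pd.read_csv(raw, sep=";", na_values="?").dropna()

# The experiments use these two columns as the X and Y axes.
X = df[["Global_active_power", "Global_reactive_power"]].to_numpy()
print(X.shape)  # → (2, 2)
```

For the real file, the same `sep=";"` and `na_values="?"` arguments apply to `pd.read_csv` on the downloaded path.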
The household power consumption dataset was downloaded from the University of California, Irvine Machine Learning Repository. The file is divided by a delimiter into Global_active_power, Global_reactive_power, Voltage, and Global_intensity columns; the experiments plot Global_active_power and Global_reactive_power on the X and Y axes.
3.3 Previous Work

3.3.1 Machine Learning
Machine learning is similar to data mining, but it differs in that it predicts data based on learned attributes, mainly through training data. In addition to the three main techniques of unsupervised learning, supervised learning, and reinforcement learning, various other machine learning techniques such as semi-supervised learning and deep learning algorithms have been developed and used.
3.3.2 Clustering
Clustering is a method of data mining by defining a cluster of data
considering the characteristics of given data and finding a representative
point that can represent the data group. A cluster is a group of data with
similar characteristics. If the characteristics of the data are different, they
must belong to different clusters. It is the main task of exploratory data
mining, and a common technique for statistical data analysis, used in many
fields, including pattern recognition, information retrieval, machine learning,
and computer graphics [3].
The goals are (1) maximizing inter-cluster variance and (2) minimizing intra-cluster variance.
Note, however, that clustering should be distinguished from
Classification. Clustering is unsupervised learning without correct answers.
In other words, we group similar objects without group information of each
object. Classification, on the other hand, is supervised learning. When you
carry out classification tasks, you will learn to predict the dependent variable
(Y) with the independent variable (X) of the data [4].
3.3.3 Cluster Validity Assessment
Since clustering tasks have no correct answers, they cannot be evaluated with simple indicators such as accuracy, as in a typical supervised machine learning algorithm. As the examples below show, it is not easy to find the optimal number of clusters without correct answers. Cluster analysis itself is not one specific algorithm but a general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances between cluster members and dense areas of the data space.
3.3.4 Scikit-learn
In general, a learning problem considers a set of n data samples and then tries to predict properties of unknown data. If each sample is more than a single number, for instance a multi-dimensional entry, it is said to have several attributes or features.

In supervised learning, the data comes with additional attributes that we want to predict. One such problem is classification: samples belong to two or more classes, and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning, where one has a limited number of categories and tries to label each of the n samples provided with the correct category or class.

Scikit-learn is a machine learning platform distributed as a broad Python module. The package offers a high-level interface with thorough documentation and a well-designed API, and its BSD license permits both academic and commercial use. Source code and documentation can be downloaded from its website [10]. Many supervised and unsupervised learning problems are covered in scikit-learn; solutions such as generalized linear models, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, and stochastic gradient descent models are also included.
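As a minimal illustration of the supervised-classification workflow just described (handwritten digit recognition), the sketch below uses scikit-learn's bundled digits dataset; the choice of a linear support vector machine is only an example, not the method used elsewhere in this work:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Bundled 8x8 handwritten-digit images: each sample is a 64-feature vector.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Fit a classifier on labeled data, then score it on unseen samples.
clf = SVC(kernel="linear").fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(round(accuracy, 2))
```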
3.4 Proposed Approach

The K-means algorithm is a partitioning clustering method: it divides the data among a number of partitions. For example, given n data objects, the input data is divided into K (K ≤ n) groups, each group forming a cluster. The equation below is the cost function that the K-means algorithm minimizes when forming the clusters [11]:

    argmin_S  Σ_{i=1}^{k}  Σ_{x ∈ S_i}  ‖x − μ_i‖²

In other words, the data objects are divided into the K groups such that the dissimilarity of the partition is reduced through this cost function; under this formulation the similarity within each group increases while the similarity between different groups decreases [12]. The K-means algorithm sums, for each group, the squared distances between the centroid and the group's data objects; from this result the assignment of data objects to groups is updated and the clustering progresses [5].
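This cost function is exactly what scikit-learn's KMeans minimizes; its inertia_ attribute holds the value of the summed squared distances. A small sketch on synthetic data (the blob parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# inertia_ is the K-means objective: the sum over clusters of ||x - mu_i||^2
# for every point x assigned to cluster i.
manual = sum(
    np.sum((X[km.labels_ == i] - km.cluster_centers_[i]) ** 2)
    for i in range(3)
)
print(np.isclose(km.inertia_, manual))  # the two computations agree
```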
Internal measures of how well a clustering has been formed include the Calinski-Harabasz index, the Davies-Bouldin index, the Dunn index, and the silhouette score. In this paper we evaluate via the Calinski-Harabasz index and the silhouette score.

For a clustering with k clusters, the Calinski-Harabasz index s(k) gives the ratio of the between-cluster dispersion to the within-cluster dispersion:

    s(k) = [Tr(B_k) / Tr(W_k)] × [(N − k) / (k − 1)]

where B_k is the between-group dispersion matrix and W_k is the within-cluster dispersion matrix, defined as:

    W_k = Σ_{q=1}^{k} Σ_{x ∈ C_q} (x − c_q)(x − c_q)^T
    B_k = Σ_q n_q (c_q − c)(c_q − c)^T

Here N is the number of data points, C_q is the set of points in cluster q, c_q is the centroid of cluster q, c is the centroid of all the data, and n_q is the number of points in cluster q.
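The index defined above is available as sklearn.metrics.calinski_harabasz_score. The sketch below checks the library value against a direct computation of Tr(B_k)/Tr(W_k) × (N − k)/(k − 1) on synthetic data (the data parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=200, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

N, k = len(X), 4
c = X.mean(axis=0)                           # overall data centroid
tr_W = tr_B = 0.0
for q in range(k):
    Xq = X[labels == q]
    cq = Xq.mean(axis=0)                     # centroid of cluster q
    tr_W += np.sum((Xq - cq) ** 2)           # Tr(W_k): within-cluster dispersion
    tr_B += len(Xq) * np.sum((cq - c) ** 2)  # Tr(B_k): between-cluster dispersion
s_k = (tr_B / tr_W) * (N - k) / (k - 1)

print(np.isclose(s_k, calinski_harabasz_score(X, labels)))  # manual == library
```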
The silhouette score is a simple measure: for each data point i, let a(i) be the mean distance from i to the other points in its own cluster, and b(i) the mean distance from i to the points of the nearest cluster that i does not belong to. The silhouette score s(i) is then calculated as:

    s(i) = (b(i) − a(i)) / max{a(i), b(i)}

which satisfies

    −1 ≤ s(i) ≤ 1

A value of s(i) close to 1 means point i is assigned to the correct cluster, while a value close to −1 means the point would fit better in another cluster. In this paper we use the machine learning library scikit-learn for household power consumption clustering [7].
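scikit-learn exposes this measure as sklearn.metrics.silhouette_score, which returns the mean s(i) over all samples. A quick sketch on synthetic data (the parameters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data with 3 tight groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.4, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Mean s(i) over all samples; always bounded by [-1, 1], and close to 1
# for tight, well-separated clusters.
score = silhouette_score(X, labels)
print(-1.0 <= score <= 1.0)  # → True
```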
The household power consumption dataset was downloaded from the University of California, Irvine Machine Learning Repository [8]. The file is divided by a delimiter into Global_active_power, Global_reactive_power, Voltage, and Global_intensity columns; Global_active_power and Global_reactive_power are plotted on the X and Y axes in the experiments. The Python distribution used is Anaconda3. The key point of the K-means algorithm is to keep K clusters while reducing the distance within each cluster, labeling the input data. Figure 1 shows the K-means result before checking the Calinski-Harabasz index and silhouette score; Figures 1 to 11 show the K-means clustering results for the household power consumption dataset from the UCI repository.
3.5 Experimental Results
Figure 1. Clustering result at K = 1    Figure 2. Clustering result at K = 2
Figure 3. Clustering result at K = 3    Figure 4. Clustering result at K = 4
Figure 5. Clustering result at K = 5    Figure 6. Clustering result at K = 6
Figure 7. Clustering result at K = 7    Figure 8. Clustering result at K = 8
Figure 9. Clustering result at K = 9    Figure 10. Clustering result at K = 10
After clustering reduces the distance within each cluster, the Calinski-Harabasz index is calculated for each number of clusters; as the number of clusters increases, the index decreases. If K is estimated too low, it remains unclear whether a cluster should be partitioned once more, and for electricity consumption rates this is the most important point.

Figure 11. Silhouette score according to change of cluster number.

As with the Calinski-Harabasz estimation, we calculate the silhouette score. As the number of clusters increases with K, the silhouette score decreases, and a low K at a local peak indicates the optimal K. Determining the proper number of clusters is very important in the K-means algorithm. Estimating the silhouette score from the data, the result is K = 7: the silhouette score between each cluster centroid and the data points is 0.799, which is the optimal score.
From the formal Calinski-Harabasz index, 560.3999 is the optimal result. The outcome of applying the K-means algorithm is shown in Figure 11. With K-means clustering at K = 7, the distance between each group's centroid and the other centroids reaches an optimal value; from this result, the centroids can divide the household power consumption rate via clustering.

Figure 12. Clustering result at K = 7
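The selection procedure described here, running K-means for several values of K and keeping the K with the best silhouette score, can be sketched as follows. Synthetic data with 7 generated groups stands in for the household dataset:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in with 7 generated groups.
X, _ = make_blobs(n_samples=700, centers=7, cluster_std=0.5, random_state=7)

scores = {}
for k in range(2, 11):  # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the K whose clustering has the highest mean silhouette.
best_k = max(scores, key=scores.get)
print(best_k)
```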
Davies-Bouldin index

If the ground-truth labels are not known, the Davies-Bouldin index (described in the scikit-learn metrics documentation) can be used. Let s_i be the average distance of the points in cluster i to its centroid, and d_ij the distance between the centroids of clusters i and j. Define:

    R_ij = (s_i + s_j) / d_ij

Then the Davies-Bouldin index is defined as:

    DB = (1/k) Σ_{i=1}^{k} max_{j≠i} R_ij

Zero is the lowest possible score, and values closer to zero indicate a better partition. The problem is that this metric was not shipped in the scikit-learn library at the time: it was only explained on the documentation page, so it could not be experimented with easily.

Evaluation result

K | Silhouette score | Calinski-Harabasz index | Davies-Bouldin index
5 | 0.8117           | -                       | N/A
6 | 0.6511           | -                       | N/A
7 | 0.7719           | 560.3999                | N/A
8 | 0.7037           | -                       | N/A
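The index was only documented, not shipped, at the time of this work; in scikit-learn 0.20 and later it is available as sklearn.metrics.davies_bouldin_score. A sketch assuming a recent scikit-learn, on illustrative synthetic data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic stand-in data: 3 compact, separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=3)
labels = KMeans(n_clusters=3, n_init=10, random_state=3).fit_predict(X)

# Average over clusters of the worst-case ratio R_ij = (s_i + s_j) / d_ij;
# values closer to zero indicate a better partition.
db = davies_bouldin_score(X, labels)
print(db >= 0.0)  # → True (zero is the lowest possible score)
```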
3.6 Related Work

Machine learning is a subfield of artificial intelligence that is used to develop algorithms and techniques enabling computers to learn [1]. It is used to train computers for various tasks such as (i) distinguishing whether received e-mails are spam or not, (ii) data classification, (iii) association rule identification, and (iv) character recognition.

Machine learning includes a series of processes in which a computer (i) looks for similar patterns, (ii) generates a novel classification system, (iii) analyzes data, and (iv) produces meaningful results. It is a kind of artificial intelligence that makes predictions based on results when supported by analytics algorithms. Machine learning is a step-by-step evolution from big data analytics toward predicting future actions and making decisions on its own through past learned results. The key issues in building a successful prediction model remain increasing the probability and reducing the error, and these problems are resolved through numerous iterative learnings [2].

At the heart of machine learning are representation and generalization, where representation is an evaluation of data and generalization is the processing of future data. Unsupervised learning is a type of machine learning that is used primarily to determine how data is organized. Unlike supervised learning or reinforcement learning, this method does not give a target value for input values [3].

Unsupervised learning is closely related to density estimation in statistics. It can summarize and describe the main characteristics of the data; clustering is one example. In this paper, we use the K-means algorithm to measure the optimal number of clusters based on the Calinski-Harabasz index, the silhouette score, and the Davies-Bouldin index, and then apply it to household electricity consumption analysis.
3.7 Summary

This paper clustered household power consumption via K-means using the scikit-learn library and the Anaconda3 open-source distribution, which individuals can easily follow; thanks to the BSD license there are no difficulties in using them for real work. Besides the K-means algorithm, other machine learning algorithms such as PCA and SVM can also be used for clustering. From this result, diverse analytics of real-life household power consumption become possible: electricity transformer and transmission power management periods can be estimated, and per-household electricity consumption data can be used for progressive taxation, region-to-region demand forecasting, and maintenance of power plants and facilities. In the gas domain, a company can likewise estimate the gas consumption rate via K-means clustering and its indices.
Chapter 4
Paper-2 Comparative Analysis of Electricity Consumption at Home
through a Silhouette-score perspective
4.1 Introduction

Machine learning is an analysis mechanism that fetches and identifies matching patterns from existing datasets to form new results. This paper discusses comparative analytics for unsupervised learning algorithms, in which we compare the K-means clustering result on a reduced dataset to the silhouette score result of the full dataset. We performed the analysis and found that the Davies-Bouldin index did not work smoothly in the scikit-learn library, so we cross-checked the Calinski-Harabasz index and silhouette score alongside the Davies-Bouldin index and compared the results of each, learning that when we reduce the dataset to the mentioned proportion, the reduced dataset shows roughly half the score of the traditional dataset.
4.2 Related Work

Machine learning is a field of artificial intelligence used to develop algorithms and techniques that enable computers to learn [1]. It is used to train computers to distinguish whether received e-mails are spam or not, and there are various applications such as data classification, association rule identification, and character recognition, which comply with standard machine learning perspectives.

It includes a series of processes in which a computer finds its own patterns, creates a new classification system, analyzes the data, and produces meaningful results. Successful prediction comes with an increase in probability and a decrease in error, and machine learning sorts out these issues through various iterative learnings [2]. Among these approaches, supervised learning is closely related to the learning methods of reinforcement mechanisms [3].
Clustering is a process of mining a dataset by defining clusters of data that reflect the characteristics of the input and finding a representative point for each data group. In this way, a cluster is a group of relevant data elements with similar characteristics; if the characteristics are not the same, the elements belong to contrasting clusters [3]. Clustering is unsupervised learning without correct answers: objects carrying similar information are grouped together. Classification, however, is supervised learning; when you perform classification operations, the system learns to predict the dependent variable (Y) from the independent variable (X) of the data [4].

Scikit-learn is a machine learning platform distributed as a broad Python module. The package offers a high-level interface with thorough documentation and a well-designed API, and its BSD license permits both academic and commercial use. Source code and documentation can be downloaded from its website [10]. Many supervised and unsupervised learning problems are covered in scikit-learn; solutions such as generalized linear models, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, and stochastic gradient descent models are also included.
4.3 Paper-2 Methodology

The K-means algorithm is a partitioning clustering method: it divides the data among a number of partitions. For example, given n data objects, the input data is divided into K (K ≤ n) groups, each group forming a cluster. The equation below is the cost function that the K-means algorithm minimizes when forming the clusters [11]:

    argmin_S  Σ_{i=1}^{k}  Σ_{x ∈ S_i}  ‖x − μ_i‖²

In other words, the data objects are divided into the K groups such that the dissimilarity of the partition is reduced through this cost function; under this formulation the similarity within each group increases while the similarity between different groups decreases [12]. The K-means algorithm sums, for each group, the squared distances between the centroid and the group's data objects; from this result the assignment of data objects to groups is updated and the clustering progresses [5].
- 36 -
Silhouette score is the easy way to in data I each data cluster in data’s
definition an (i) each data is not clustered inner and data’s definition b(i)
silhouette score s(i) is equal to calculate that
s(i) =
𝑏(𝑖) − 𝑎(𝑖)
max { 𝑎(𝑖), 𝑏(𝑖)}
From this calculate s(i) is equal to that function
−1 ≤ s(i) ≤ 1
An s(i) close to 1 means that data point i is assigned to the correct cluster, while
an s(i) close to −1 means it would fit better in a neighboring cluster. In this paper,
the machine learning library scikit-learn is used to cluster the household power
consumption data [7]. The household power consumption dataset is downloaded from the
University of California, Irvine (UCI) Machine Learning Repository [8]; it is a
delimiter-separated file whose attributes include Global_active_power,
Global_reactive_power, Voltage, and Global_intensity. The experiments plot
Global_active_power and Global_reactive_power on the X and Y axes.
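In scikit-learn, a(i), b(i), and s(i) need not be coded by hand: `silhouette_samples` returns the per-point s(i) and `silhouette_score` their mean. A minimal sketch on synthetic data follows; the data and parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

# Two tight synthetic groups, so the clustering is unambiguous.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, size=(40, 2)),
               rng.normal(4.0, 0.2, size=(40, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

s = silhouette_samples(X, labels)    # per-point s(i), each in [-1, 1]
score = silhouette_score(X, labels)  # mean of s(i) over all points
```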
The Python environment is Anaconda3. The key idea of the K-means algorithm is to
keep the data in K clusters while reducing the within-cluster distances; K-means
assigns a label to each input record. Figure 13 shows the K-means result before
checking the Calinski-Harabasz index and the silhouette score. Figures 13 to 23 show
the K-means clustering results on the 1/8 dataset of household power consumption,
obtained by reducing the original UCI Machine Learning Repository dataset to 1/8 of
its size.
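The loading and 1/8-reduction steps can be sketched as follows. The UCI file is semicolon-delimited with '?' marking missing values; the snippet substitutes a small in-memory sample (taken from the dataset's first records) for the actual file, whose assumed name is household_power_consumption.txt, and keeps every 8th row to mimic the 1/8 dataset:

```python
import io
import pandas as pd

# In-memory stand-in for the UCI 'household_power_consumption.txt'
# file, which is ';'-delimited and uses '?' for missing values.
sample = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.0\n"
    "16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.0\n"
)
df = pd.read_csv(sample, sep=";", na_values="?")

# Keep every 8th row to build the reduced "1/8 dataset".
df_eighth = df.iloc[::8]
```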
3.5.1. Experimental Environment
Software: Anaconda3 + PyCharm3
OS: Windows 10 Professional
RAM: 16.0 GB
Processor: Intel Core i7-6600U CPU @ 2.60 GHz
Hard disk: 420 GB SSD
3.5.2. Experimental Dataset
1. date: date in format dd/mm/yyyy
2. time: time in format hh:mm:ss
3. global_active_power: household global minute-averaged active power (in kilowatts)
4. global_reactive_power: household global minute-averaged reactive power (in
kilowatts)
5. voltage: minute-averaged voltage (in volts)
6. global_intensity: household global minute-averaged current intensity (in amperes)
7. sub_metering_1: energy sub-metering No. 1 (in watt-hours of active energy). It
corresponds to the kitchen, containing mainly a dishwasher, an oven, and a
microwave (hot plates are not electric but gas powered).
8. sub_metering_2: energy sub-metering No. 2 (in watt-hours of active energy). It
corresponds to the laundry room, containing a washing machine, a tumble-drier, a
refrigerator, and a light.
9. sub_metering_3: energy sub-metering No. 3 (in watt-hours of active energy). It
corresponds to an electric water-heater and an air-conditioner.
   Global_active_power  Global_reactive_power  Voltage  Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3
0  4.216                0.418                  234.84   18.4              0               1               17
1  5.360                0.436                  233.63   23.0              0               1               16
2  5.374                0.498                  233.29   23.0              0               2               17
3  5.388                0.502                  233.74   23.0              0               1               17
4  3.666                0.528                  235.68   15.8              0               1               17
3.5.3. Experimental Results
Figure 13. 1/8 dataset cluster K=1
Figure 14. 1/8 dataset cluster K=2
Figure 15. 1/8 dataset cluster K=3
Figure 16. 1/8 dataset cluster K=4
Figure 17. 1/8 dataset cluster K=5
Figure 18. 1/8 dataset cluster K=6
Figure 19. 1/8 dataset cluster K=7
Figure 20. 1/8 dataset cluster K=8
Figure 21. 1/8 dataset cluster K=9
Figure 22. 1/8 dataset cluster K=10
Figure 23. 1/8 dataset cluster K=11
Figure 24. Silhouette score according to change of cluster number.
Figure 25. 1/8 dataset silhouette score according to change of cluster number.
Determining the proper number of clusters for the K-means algorithm is very
important. Estimating the silhouette score on the full dataset gives an optimal
score of 0.799 at K = 7, measured over each cluster centroid and its data points.
Even though the 1/8 dataset is much smaller, it also reaches its optimal silhouette
score, 0.810, at K = 7. Thus for both the full dataset and the 1/8 dataset, K = 7
gives the optimal distances between each group's centroid and its data points. This
shows that even when the dataset is reduced, the class vector space of the K-means
clustering is preserved: the optimal number of clusters is the same as for the
original household power consumption dataset.
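The optimal-cluster search described above can be sketched as a scan over K, picking the K with the highest silhouette score. The sketch below uses synthetic data with three planted groups; the seeds, ranges, and data are illustrative assumptions, not the thesis results:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Three planted clusters along the diagonal; the scan should recover K = 3.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.15, size=(30, 2)) for c in (0.0, 2.0, 4.0)])

# Silhouette score for each candidate K (K = 1 is undefined for silhouette).
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
best_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
ch = calinski_harabasz_score(X, best_labels)  # companion index used in the thesis
```

Applying the same scan to the full and 1/8 datasets is what yields the K = 7 optimum reported above.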
Summary
This paper clustered household power consumption with K-means using the scikit-learn
library and Anaconda3. Because both are open source under a BSD license, anyone can
easily reproduce the experiments and use them in real applications without licensing
difficulties. The results show that even when the dataset is reduced to 1/8 of its
size, the silhouette score and the clustering results remain the same as before. As
the population increases, the classification and the vector space become clearer.
Reducing a large dataset to a small one still shows a clear silhouette-score result,
but the opposite direction is not guaranteed, because the dataset is a
four-dimensional vector space. The experiment therefore reduces the estimated
analysis time when a huge dataset is received.
Chapter 5
Conclusion
This dissertation approaches diverse aspects of K-means clustering applications. The
first attempt was to reduce the time consumption of the K-means algorithm itself;
the approach then changed to how to reduce the analysis time for a large dataset.
These days, machine learning algorithms can estimate, for example, when a part
should be replaced (its life span). All of the experiments used the scikit-learn
library with Anaconda3; since both are open source under a BSD license, they can
easily be implemented in any environment. The first experiment analyzed diverse
indexes. The second experiment showed that when a dataset is huge and it takes time
to determine how many centroids give a proper K-means clustering, the time can be
reduced by comparing with a 1/8 dataset, at the cost of a limited classification and
vector space. The experiment therefore reduces the estimated analysis time when a
huge dataset is received.
Acknowledgment
I participated in a total of 114 conferences and gave 7 presentations during my
graduate school life. IEEE Globecom 2017 was the most impressive, and the
experiments in this dissertation were motivated by the Google AI TensorFlow
Conference 2017. I would like to express my gratitude to my advisor, Professor
Dr. Dong-Ryul Shin, president of Sungkyunkwan University, and to my co-advisor,
Assistant Professor Dr. Nawab Muhammad Faseeh Qureshi, for always guiding me with a
smile throughout our co-work.
Thank you to SKKU Fellow Professor Hee-Yong Yoon, director of the Mobile Computing
Lab, for helping me join Sungkyunkwan University. I am also thankful to
Dr. Choon-Sung Nam, who helped me settle into the open lab, and to Dr. Kee-Hyun
Choi, Muhammad Hamza, Junaid, Woo-Hyun Kim, and Chung So.
I would like to extend my sincere thanks to my mother, Bong-Soon Lee, Professor at
Dongnam Health University, who supported me throughout my degree, and to my father,
Han-Chung Choi, a founding member of the LG Electronics Pyeongtaek Campus
(currently a director at Onnuri ENG).
I am also grateful to my older brother, Hyun-Suk Choi, deputy manager at POSCO E&C,
who often drove me home during my degree, to my sister-in-law, Ye-ul Ahn, a nurse at
Seoul National University Bundang Hospital, and to my cute nephew, Youn-Woo.
I give my thanks to my cousins, who were milestones during my degree.
International Scholar Pooh ® Hyun-Wong Choi
June 19, 2019
References
[1] https://en.wikipedia.org/wiki/K-means_clustering
[2] https://en.wikipedia.org/wiki/Cluster_analysis
[3] https://en.wikipedia.org/wiki/Silhouette_(clustering)
[4] https://github.com/sarguido.
[5] http://archive.ics.uci.edu/ml/datasets.html.
[6] http://scikit-learn.org/stable/modules/clustering.html#calinski-harabaz-index
[7] http://scikit-learn.org/stable/.
[8] Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis.
Communications in Statistics, 3(1), 1-27.
[9] Kanungo, Tapas et al. “An Efficient k-Means Clustering Algorithm: Analysis and
Implementation.” IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002): 881-892.
[10]Arthur, David, and Sergei Vassilvitskii. “k-means++: The advantages of careful
seeding.” Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete
algorithms, Society for Industrial and Applied Mathematics (2007): 1027-1035.
[11]Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001, June). Constrained k-
means clustering with background knowledge. In ICML (Vol. 1, pp. 577-584).
[12]Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering
algorithm. Journal of the Royal Statistical Society. Series C (Applied
Statistics), 28(1), 100-108.
[13]Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu,
A. Y. (2002). An efficient k-means clustering algorithm: Analysis and
implementation. IEEE Transactions on Pattern Analysis & Machine Intelligence, (7),
881-892.
[14]Alsabti, K., Ranka, S., & Singh, V. (1997). An efficient k-means clustering algorithm.
[15]Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering
algorithm. Pattern recognition, 36(2), 451-461.
[16]Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... &
Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of machine
learning research, 12(Oct), 2825-2830.
[17]Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... &
Layton, R. (2013). API design for machine learning software: experiences from the
scikit-learn project. arXiv preprint arXiv:1309.0238.
[18]Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., ...
& Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-
learn. Frontiers in neuroinformatics, 8, 14.
[19]Fabian, P., Gaël, V., Alexandre, G., Vincent, M., Bertrand, T., Olivier, G., ... &
Alexandre, P. (2011). Scikit-learn: Machine learning in Python. Journal of Machine
Learning Research, 12, 2825-2830.
[20]Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support
vector machines. IEEE Intelligent Systems and their applications, 13(4), 18-28.
[21]Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of
machine learning research 12.Oct (2011): 2825-2830.
[22]Alsabti, Khaled, Sanjay Ranka, and Vineet Singh. "An efficient k-means clustering
algorithm." (1997).
[23]Ding, Chris, and Xiaofeng He. "K-means clustering via principal component
analysis." Proceedings of the twenty-first international conference on Machine
learning. ACM, 2004.
[24]Paneque-Gálvez, Jaime, et al. "Small drones for community-based forest monitoring:
An assessment of their feasibility and potential in tropical areas." Forests 5.6 (2014):
1481-1507.
[25]Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of
machine learning research 12.Oct (2011): 2825-2830.
[26]Bishop, Christopher M. Pattern recognition and machine learning. springer, 2006.
[27]Rasmussen, Carl Edward. "Gaussian processes in machine learning." Summer
School on Machine Learning. Springer, Berlin, Heidelberg, 2003.
[28]Hartigan, John A., and Manchek A. Wong. "Algorithm AS 136: A k-means clustering
algorithm." Journal of the Royal Statistical Society. Series C (Applied Statistics) 28.1
(1979): 100-108.
[29]Paneque-Gálvez, Jaime, et al. "Small drones for community-based forest monitoring:
An assessment of their feasibility and potential in tropical areas." Forests 5.6 (2014):
1481-1507.
[30]Sass, Ron, et al. "Reconfigurable computing cluster (RCC) project: Investigating the
feasibility of FPGA-based petascale computing." 15th Annual IEEE Symposium on
Field-Programmable Custom Computing Machines (FCCM 2007). IEEE, 2007.
[31] Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern classification. John
Wiley & Sons, 2012.
[32]Cover, Thomas M., and Peter E. Hart. "Nearest neighbor pattern
classification." IEEE transactions on information theory13.1 (1967): 21-27.
[33]Breiman, Leo. Classification and regression trees. Routledge, 2017.
[34]Haralick, Robert M., and Karthikeyan Shanmugam. "Textural features for image
classification." IEEE Transactions on systems, man, and cybernetics 6 (1973): 610-
621.
[35]Chapelle, Olivier, Bernhard Scholkopf, and Alexander Zien. "Semi-supervised
learning (chapelle, o. et al., eds.; 2006)[book reviews]." IEEE Transactions on
Neural Networks 20.3 (2009): 542-542.
[36]Zhu, Xiaojin, Zoubin Ghahramani, and John D. Lafferty. "Semi-supervised learning
using gaussian fields and harmonic functions." Proceedings of the 20th International
conference on Machine learning (ICML-03). 2003.
[37]Caruana, Rich, and Alexandru Niculescu-Mizil. "An empirical comparison of
supervised learning algorithms." Proceedings of the 23rd international conference
on Machine learning. ACM, 2006.
[38]Jain, Anil K. "Data clustering: 50 years beyond K-means." Pattern recognition
letters 31.8 (2010): 651-666.
[39]Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation
learning with deep convolutional generative adversarial networks." arXiv preprint
arXiv:1511.06434 (2015).
[40]Figueiredo, Mario A. T., and Anil K. Jain. "Unsupervised learning of finite mixture
models." IEEE Transactions on Pattern Analysis & Machine Intelligence 3 (2002):
381-396.
[41]Lovmar, Lovisa, et al. "Silhouette scores for assessment of SNP genotype clusters."
BMC genomics 6.1 (2005): 35.
[42]Collins, Robert T., Ralph Gross, and Jianbo Shi. "Silhouette-based human
identification from body shape and gait." Proceedings of fifth IEEE international
conference on automatic face gesture recognition. IEEE, 2002.
[43]Gat-Viks, Irit, Roded Sharan, and Ron Shamir. "Scoring clustering solutions by their
biological relevance." Bioinformatics 19.18 (2003): 2381-2389.
[44]Maulik, Ujjwal, and Sanghamitra Bandyopadhyay. "Performance evaluation of some
clustering algorithms and validity indices." IEEE Transactions on pattern analysis
and machine intelligence 24.12 (2002): 1650-1654.
[45]Łukasik, Szymon, et al. "Clustering using flower pollination algorithm and calinski-
harabasz index." 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE,
2016.
[46]Desgraupes, Bernard. "Clustering indices." University of Paris Ouest-Lab Modal’X
1 (2013): 34.
[47]Petrovic, Slobodan. "A comparison between the silhouette index and the davies-
bouldin index in labelling ids clusters." Proceedings of the 11th Nordic Workshop of
Secure IT Systems. sn, 2006.
[48]Maulik, Ujjwal, and Sanghamitra Bandyopadhyay. "Performance evaluation of some
clustering algorithms and validity indices." IEEE Transactions on pattern analysis
and machine intelligence 24.12 (2002): 1650-1654.
[49]Petrovic, Slobodan. "A comparison between the silhouette index and the davies-
bouldin index in labelling ids clusters." Proceedings of the 11th Nordic Workshop of
Secure IT Systems. sn, 2006.
[50] https://scikit-learn.org/stable/
[51] https://www.anaconda.com/
[52] https://www.jetbrains.com/pycharm/
[53] Petrovic, Slobodan. "A comparison between the silhouette index and the davies-
bouldin index in labelling ids clusters." Proceedings of the 11th Nordic Workshop of
Secure IT Systems. sn, 2006.
[54] Bandyopadhyay, Sanghamitra, and Ujjwal Maulik. "Nonparametric genetic
clustering: comparison of validity indices." IEEE Transactions on Systems, Man, and
Cybernetics, Part C (Applications and Reviews) 31.1 (2001): 120-125.
[55] https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption
[56] https://github.com/sarguido
Abstract (Korean)
Analysis of Household Electricity Consumption Using the Silhouette Score
Sungkyunkwan University
Department of Electrical and Computer Engineering, Graduate School
Hyun Wong Choi
Machine learning is a modern field that has emerged as a new tool for data analysis
in distributed computing environments. It improves processing power along with
analysis efficiency in several respects. In this dissertation, household electricity
usage is analyzed with the K-means clustering algorithm to obtain the optimal
available power data points. The Davies-Bouldin index and the silhouette score are
used to find the optimal number of clusters for the K-means algorithm, presenting
an application scenario of machine-learning clustering analysis.
Machine learning is a state-of-the-art sub-field of artificial intelligence that
has evolved toward large-scale intelligent analysis in distributed computing
environments. This dissertation performs a comparative analysis of the dataset
collected on household electricity usage, based on the K-means clustering algorithm,
through a comparison with the silhouette score of the 1/8 dataset. The performance
evaluation shows that even though the dataset is smaller than before, the silhouette
scores remain similar.
Keywords: Machine Learning, K-means clustering, scikit-learn, Silhouette score,
Calinski-Harabasz Index
Comparative Analysis of Electricity Consumption at Home through a Silhouette-score
Prospective
2019 Hyun Wong Choi (최현웅)
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 

Recently uploaded (20)

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 

Comparative Analysis of Electricity Consumption at Home through K-Means Clustering

  • 1. Master’s Dissertation Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective Hyun Wong Choi Department of Electrical and Computer Engineering The Graduate School Sungkyunkwan University
  • 2. Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective Hyun Wong Choi Department of Electrical and Computer Engineering The Graduate School Sungkyunkwan University
  • 3. Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective Hyun Wong Choi A Dissertation Submitted to the Department of Electrical and Computer Engineering and the Graduate School of Sungkyunkwan University in partial fulfillment of the requirements for the degree of Master of Science in Engineering April 2019 Approved by Professor Dr. Dong Ryeol Shin
  • 4. This certifies that the dissertation of Hyun Wong Choi is approved. Committee Chair: Prof. Dr. MUHAMMAD MANNAN SAEED Committee Member: Prof. Dr. Eung Mo Kim Major Advisor: Prof. Dr. Dong Ryeol Shin Co-Advisor: Prof. Dr. Nawab Muhammad Faseeh Querish The Graduate School Sungkyunkwan University June 2019
  • 5. - 3 - Contents
List of Figures 5
Abstract 6
Chapter 1 Introduction 7
Chapter 2 Overview & Motivation 10
Chapter 3 Paper-1 Analysis of Electricity Consumption at Home through a Silhouette-score prospective 11
  3.1. Introduction 11
  2.4 Paper-1 Evaluation 13
  2.4.1 Experimental Environment 13
  2.4.2 Experimental Dataset 14
  3.2. Previous work 15
  3.3. Proposed Approach 19
  1.1.1. Experimental Results 23
  3.4. Related work 29
  3.1 Summary 31
Chapter 4 Paper-2 Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective 32
  4.1 Introduction 32
  4.2 Related work 33
  3.5. Paper-2 Methodology 35
  1.1.1. Experimental Environment 37
  1.1.2. Experimental Dataset 38
  1.1.3. Experimental Results 39
  Summary 44
Chapter 5 Conclusion 45
Acknowledgement 46
  • 7. - 5 - List of Figures
Fig. 1. Clustering result at K = 1 23
Fig. 2. Clustering result at K = 2 23
Fig. 3. Clustering result at K = 3 23
Fig. 4. Clustering result at K = 4 23
Fig. 5. Clustering result at K = 5 24
Fig. 6. Clustering result at K = 6 24
Fig. 7. Clustering result at K = 7 24
Fig. 8. Clustering result at K = 8 24
Fig. 9. Clustering result at K = 9 25
Fig. 10. Clustering result at K = 10 25
Fig. 11. Silhouette score according to change of cluster number. 26
Fig. 12. Clustering result at K = 7 27
Fig. 13. 1/8 dataset cluster K = 1 39
Fig. 14. 1/8 dataset cluster K = 2 39
Fig. 15. 1/8 dataset cluster K = 3 39
Fig. 16. 1/8 dataset cluster K = 4 39
Fig. 17. 1/8 dataset cluster K = 5 40
Fig. 18. 1/8 dataset cluster K = 6 40
Fig. 19. 1/8 dataset cluster K = 7 40
Fig. 20. 1/8 dataset cluster K = 8 40
Fig. 21. 1/8 dataset cluster K = 9 40
Fig. 22. 1/8 dataset cluster K = 10 41
Fig. 23. 1/8 dataset cluster K = 11 41
Fig. 24. Silhouette score according to change of cluster number. 42
Fig. 25. 1/8 dataset Silhouette score according to change of cluster number. 42
  • 8. - 6 - Abstract
Machine learning is a modern field that has emerged as a new tool for data analytics in a distributed computing environment. In several respects, machine learning has improved processing capacity along with the effectiveness of analysis. In this paper, the electricity usage of the home is analyzed through the K-means clustering algorithm to obtain the optimal home-usage electricity data points. The Davies-Bouldin index and the silhouette score find the detailed optimal number of clusters for the K-means algorithm, and an application scenario of machine-learning clustering analytics is presented.
Machine learning is a state-of-the-art subfield of artificial intelligence that has evolved for large-scale intelligent analytics in the distributed computing environment. In this paper, we perform comparative analytics on a dataset collected for the electricity usage of the home, based on the K-means clustering algorithm, comparing silhouette scores against a 1/8-ratio dataset. The performance evaluation shows that the silhouette scores remain similar in value even though the datasets are smaller than before.
Keywords: Machine Learning, K-means clustering
  • 9. - 7 - Chapter 1 Introduction
Electricity consumption comes from the power grid, where we measure consumption through sensors. Consumption divides into industrial consumption, housing consumption, and factory consumption. Housing consumption involves a front end (the consumer end) and a back end (the electricity company end). The dataset for consumption comes from UC Irvine. Many techniques solve the optimization problem of electricity, but none of them focus on housing electricity optimization: reducing the cost, identifying the factors of overcharge, and prediction are not available. The solution is the K-means algorithm. K-means clustering was chosen to predict the answer from the dataset, since no labeled answer is available for K-means.
  • 10. - 8 - Why predict the answers? Because otherwise there is no clear result. In this paper, the electricity usage of the home is analyzed through the K-means clustering algorithm to obtain the optimal home-usage electricity data points (Part 3A). The Calinski-Harabasz index, Davies-Bouldin index, and silhouette score find the detailed optimal number of clusters in the K-means algorithm and present the application scenario of the machine learning algorithm. Part 3B reduces the dataset to 1/8 and arrives at the same result. The proposed approach delivers efficient and meaningful prediction results never obtained before.
  • 11. - 9 - Machine learning is an analyzing mechanism that fetches and identifies matching patterns from existing datasets to form newer results. This paper discusses comparative analytics related to unsupervised learning algorithms, in which we compare the K-means clustering result, at a ratio of half the dataset, against the silhouette-score result. We performed the analysis and concluded that the Davies-Bouldin index does not work smoothly in the scikit-learn library, so we performed a check analysis of the Calinski-Harabasz index and the silhouette score alongside the Davies-Bouldin index and compared the results of each, learning that when we reduce the dataset to the mentioned proportion, the resulting dataset shows half the score of the traditional dataset.
  • 12. - 10 - Chapter 2 Overview & Motivation
In real life, household power consumption supports diverse analytics: the management period of electricity transformers and transmission power can be estimated from it, and each household's electricity consumption data can be used for progressive taxation, region-to-region demand forecasting, and maintenance of power plants and facilities. A gas company or a car company can likewise estimate consumption (for example, the gas consumption rate) via the K-means clustering algorithm and its indices. This work was motivated by the Google AI TensorFlow Conference 2017.
  • 13. - 11 - Chapter 3 Paper-1 Analysis of Electricity Consumption at Home through a Silhouette-score prospective
3.1. Introduction
Machine learning is a subfield of artificial intelligence that develops algorithms and techniques enabling computers to learn [1]. It is used to train the computer for various tasks, such as (i) distinguishing whether received e-mails are spam or not, (ii) data classification, (iii) association rule identification, and (iv) character recognition. Machine learning comprises a series of processes in which a computer (i) looks for similar patterns, (ii) generates a novel classification system, (iii) performs data analytics, and (iv) produces meaningful results. It is a kind of artificial intelligence whose predictions can be made from results when supported by analytics algorithms. Machine learning is a step-by-step evolutionary process that leads from big data analytics to predicting future actions and making decisions on its own from past learned results. The key issues in building a successful prediction model remain increasing the probability and reducing the error, and these problems are resolved through numerous iterative learnings [2].
  • 14. - 12 - At the heart of machine learning are representation and generalization, where representation is an evaluation of the data and generalization is the processing of future data. Unsupervised learning is a type of machine learning used primarily to determine how data is organized. Unlike supervised learning or reinforcement learning, this method is not given a target value for its input values [3]. Unsupervised learning is closely related to density estimation in statistics; it can summarize and describe the main characteristics of the data, and clustering is one example of it. In this paper, we use the K-means algorithm to measure the optimal number of clusters based on the Calinski-Harabasz index, the silhouette score, and the Davies-Bouldin index, and then apply it to household electricity consumption analysis.
  • 15. - 13 - 2.4 Paper-1 Evaluation
2.4.1 Experimental Environment
Software: Anaconda3 + PyCharm
OS: Windows 10 Professional
RAM: 16.0 GB
Processor: Intel Core i7-6600U CPU @ 2.60 GHz
Hard disk: 420 GB SSD
  • 16. - 14 - 2.4.2 Experimental Dataset
1. date: date in format dd/mm/yyyy
2. time: time in format hh:mm:ss
3. global_active_power: household global minute-averaged active power (in kilowatts)
4. global_reactive_power: household global minute-averaged reactive power (in kilowatts)
5. voltage: minute-averaged voltage (in volts)
6. global_intensity: household global minute-averaged current intensity (in amperes)
7. sub_metering_1: energy sub-metering No. 1 (in watt-hours of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
8. sub_metering_2: energy sub-metering No. 2 (in watt-hours of active energy). It corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator, and a light.
9. sub_metering_3: energy sub-metering No. 3 (in watt-hours of active energy). It corresponds to an electric water-heater and an air-conditioner.

First five rows of the dataset:

   Global_active_power  Global_reactive_power  Voltage  Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3
0  4.216                0.418                  234.84   18.4              0               1               17
1  5.36                 0.436                  233.63   23                0               1               16
2  5.374                0.498                  233.29   23                0               2               17
3  5.388                0.502                  233.74   23                0               1               17
4  3.666                0.528                  235.68   15.8              0               1               17
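As a minimal sketch of how such a semicolon-delimited file can be read with only the standard library (the timestamps below are illustrative and the variable names are ours, not the dissertation's actual code):

```python
import csv
import io

# A few sample rows in the UCI household-power-consumption layout,
# which is semicolon-delimited (column names follow the dataset docs;
# the dates/times here are illustrative placeholders).
sample = """Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
16/12/2006;17:24:00;4.216;0.418;234.84;18.4;0;1;17
16/12/2006;17:25:00;5.36;0.436;233.63;23;0;1;16
16/12/2006;17:26:00;5.374;0.498;233.29;23;0;2;17
"""

rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))

# Keep the two features used as the X and Y axes of the clustering plots.
points = [(float(r["Global_active_power"]), float(r["Global_reactive_power"]))
          for r in rows]
print(points[0])
```

For the real file one would pass an open file handle instead of the `io.StringIO` wrapper; the delimiter handling is the essential part.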
  • 17. - 15 - The household power consumption dataset is downloaded from the University of California, Irvine Machine Learning Repository and then used; the file is split on its delimiter into Global_active_power, Global_reactive_power, Voltage, and Global_intensity. Global_active_power and Global_reactive_power are used as the X and Y axes of the experiment.
3.2. Previous work
3.2.1 Machine Learning
Machine learning is like data mining, but it differs in that it predicts data based on learned attributes, mainly through training data. In addition to the three main techniques of unsupervised learning, supervised learning, and reinforcement learning, various other machine learning techniques, such as semi-supervised learning and deep learning algorithms, have been developed and used.
  • 18. - 16 - 3.2.2 Clustering
Clustering is a data mining method that defines clusters of data by considering the characteristics of the given data and finding a representative point that can represent each data group. A cluster is a group of data with similar characteristics; if the characteristics of the data differ, they must belong to different clusters. Clustering is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields including pattern recognition, information retrieval, machine learning, and computer graphics [3]. It aims at (1) maximizing the inter-cluster variance and (2) minimizing the intra-cluster variance. Note, however, that clustering should be distinguished from classification. Clustering is unsupervised learning without correct answers: we group similar objects without group information about each object. Classification, on the other hand, is supervised learning: when you carry out classification tasks, you learn to predict the dependent variable (Y) from the independent variable (X) of the data [4].
  • 19. - 17 - 3.2.3 Cluster Validity Assessment
Since clustering tasks have no correct answers, they cannot be evaluated with indicators such as simple accuracy, as in a typical machine learning algorithm. As the example below shows, it is not easy to find the optimal number of clusters without correct answers. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances between cluster members and dense areas of the data space.
  • 20. - 18 - 3.2.4 Scikit-learn
In general, a learning problem considers a set of n samples of data and then tries to predict the properties of unknown data. If each sample is more than a single number, for instance a multi-dimensional entry, it is said to have several attributes or features. In supervised learning, the data comes with additional attributes that we want to predict. Classification: samples belong to two or more classes, and we want to learn from already-labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning, where one has a limited number of categories and, for each of the n samples provided, one tries to label them with the correct category or class. Scikit-learn is a machine learning platform distributed as a Python module; the package offers a high-level API with proper documentation and is available under the BSD license for academic or commercial use. Source code and documentation can be downloaded from the website [10]. Many supervised and unsupervised learning problems are covered in scikit-learn: generalized linear models, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, and stochastic gradient descent models are also included.
  • 21. - 19 - 3.3. Proposed Approach
The K-means algorithm is one of the partitioning clustering methods: given n data objects, it divides the data into K (≤ n) groups, each group forming a cluster. K-means forms the clusters by minimizing the following cost function [11]:

arg min_S Σ_{i=1}^{k} Σ_{x ∈ S_i} ‖x − μ_i‖²

In other words, the data objects are divided into K groups while the cost function (the dissimilarity) is reduced, so that the similarity within each group increases and the similarity between different groups decreases [12]. The K-means algorithm repeatedly recomputes each centroid as the mean of its group's data objects, and from this result the assignment of data objects to groups is updated and the clustering progresses [5].
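The assignment/update loop described above can be sketched in a few lines of plain Python (a minimal Lloyd's-algorithm illustration on toy 2-D points, with the within-cluster cost from the formula above; function and variable names are ours, not the dissertation's actual implementation):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialise from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centroids[i] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

def wcss(points, centroids, labels):
    """The cost being minimised: sum of squared distances to assigned centroids."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Two obvious groups of (active power, reactive power)-like toy points.
pts = [(0.3, 0.1), (0.4, 0.2), (0.35, 0.15), (5.0, 0.5), (5.2, 0.4), (5.1, 0.45)]
cents, labs = kmeans(pts, 2)
print(labs, round(wcss(pts, cents, labs), 4))
```

In the experiments themselves this role is played by scikit-learn's KMeans; the sketch only makes the cost function and the two alternating steps explicit.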
  • 22. - 20 - Internal measures of clustering quality include the Calinski-Harabasz index, the Davies-Bouldin index, the Dunn index, and the silhouette score. In this paper we evaluate the clustering via the Calinski-Harabasz index and the silhouette score. The Calinski-Harabasz index s(k) is the ratio of the between-cluster dispersion to the within-cluster dispersion:

s(k) = (Tr(B_k) / Tr(W_k)) × ((N − k) / (k − 1))

where W_k is the within-cluster dispersion matrix and B_k is the between-cluster dispersion matrix:

W_k = Σ_{q=1}^{k} Σ_{x ∈ C_q} (x − c_q)(x − c_q)^T

B_k = Σ_q n_q (c_q − c)(c_q − c)^T

Here N is the number of data points, C_q is the set of points in cluster q, c_q is the centroid of cluster q, c is the centroid of all the data, and n_q is the number of points in cluster q.
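A hand computation on toy 1-D data may make the formula concrete (a sketch under the definitions above, not the scikit-learn implementation; for 1-D data the traces of B_k and W_k reduce to the squared scalar terms used here):

```python
def calinski_harabasz(points, labels, k):
    """CH index for 1-D data: between-cluster over within-cluster dispersion."""
    n = len(points)
    overall = sum(points) / n                      # centroid c of all data
    tr_b = tr_w = 0.0
    for q in range(k):
        members = [x for x, l in zip(points, labels) if l == q]
        c_q = sum(members) / len(members)          # centroid of cluster q
        tr_b += len(members) * (c_q - overall) ** 2    # contributes to Tr(B_k)
        tr_w += sum((x - c_q) ** 2 for x in members)   # contributes to Tr(W_k)
    return (tr_b / tr_w) * (n - k) / (k - 1)

# Two tight, well-separated 1-D clusters -> a large CH value.
data = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
labels = [0, 0, 0, 1, 1, 1]
print(round(calinski_harabasz(data, labels, 2), 1))
```

In practice one would call scikit-learn's `calinski_harabasz_score` on the multi-dimensional data; the sketch only shows what the ratio measures.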
  • 23. - 21 - The silhouette score is an easy measure computed for each data point i from a(i), the mean distance between i and the other points of its own cluster, and b(i), the mean distance between i and the points of the nearest cluster that i does not belong to:

s(i) = (b(i) − a(i)) / max{a(i), b(i)}

From this it follows that −1 ≤ s(i) ≤ 1. A value of s(i) close to 1 means the data point has been assigned to the correct cluster, while a value close to −1 means it has been assigned to the wrong cluster. In this paper we cluster the household power consumption data using the machine learning library scikit-learn [7].
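The definition of s(i) can be checked with a small pure-Python sketch (illustrative only; in practice one would call `sklearn.metrics.silhouette_score`, and the names below are ours):

```python
import math

def silhouette(points, labels):
    """Mean silhouette over all points: s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    n = len(points)
    scores = []
    for i in range(n):
        # a(i): mean distance to the other members of i's own cluster.
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        # b(i): mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(points[i], points[j])
                for j in range(n) if labels[j] == m) / labels.count(m)
            for m in set(labels) if m != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Two compact, well-separated clusters -> mean silhouette close to 1.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labs = [0, 0, 0, 1, 1, 1]
print(round(silhouette(pts, labs), 3))
```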
  • 24. - 22 - The household power consumption dataset is downloaded from the University of California, Irvine Machine Learning Repository [8] and then used; the file is split on its delimiter into Global_active_power, Global_reactive_power, Voltage, and Global_intensity, and Global_active_power and Global_reactive_power are used as the X and Y axes of the experiment. The Python distribution is Anaconda3. The key point of the K-means algorithm is to keep K clusters over the data while reducing the distance within each cluster; the K-means algorithm assigns labels to the input data. Figure 1 shows the result of executing the K-means algorithm before checking the Calinski-Harabasz index and silhouette score. Figures 1 to 11 show the K-means clustering results for the household power consumption data from the UC Irvine repository.
  • 25. - 23 - 1.1.1. Experimental Results
Figure 1. Clustering result at K = 1
Figure 2. Clustering result at K = 2
Figure 3. Clustering result at K = 3
Figure 4. Clustering result at K = 4
  • 26. - 24 - Figure 5. Clustering result at K = 5
Figure 6. Clustering result at K = 6
Figure 7. Clustering result at K = 7
Figure 8. Clustering result at K = 8
  • 27. - 25 - Figure 9. Clustering result at K = 9
Figure 10. Clustering result at K = 10
  • 28. - 26 - After the distance within each cluster has been reduced, we calculate each clustering's Calinski-Harabasz index. The Calinski-Harabasz index decreases as the number of clusters K increases, so when the ratio is too low, estimating whether the cluster partition should be split once more is important for the electricity consumption rate; this is the most important fact.
Figure 11. Silhouette score according to change of cluster number.
As with the Calinski-Harabasz index, we calculate the silhouette score. The silhouette score decreases as the number of clusters increases, with the optimal K represented by a low spread. Determining the proper number of clusters for the K-means algorithm is very important: estimating the silhouette score from the data, the result is that at K = 7 the silhouette score between each cluster centroid and the data points is 0.799, the optimal score.
Likewise, the Calinski-Harabasz Index reaches its optimal result, 560.3999, at the same K. Figure 11 summarizes this K-means evaluation. With seven clusters, the distances between each group's centroid and the other centroids reach an optimal configuration, so the centroids cleanly separate household power consumption rates via clustering.

Figure 12. Clustering result at K=7
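The Calinski-Harabasz check can be sketched the same way with scikit-learn's `calinski_harabasz_score` (higher is better, unlike Davies-Bouldin). Again a tight synthetic three-blob dataset stands in for the household data, so the concrete value here is illustrative, not the dissertation's 560.3999.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Three tight, well-separated synthetic clusters.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.5, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
score = calinski_harabasz_score(X, labels)  # ratio of between- to within-cluster dispersion
print(round(score, 1))
```

Well-separated clusters give a large ratio; splitting them further drags the index back down, which is exactly the drop-off used above to pick K.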
Davies-Bouldin index

If the ground-truth labels are not known, the Davies-Bouldin index (sklearn.metrics.davies_bouldin_score) can be used. With $s_i$ the average distance of the points in cluster $i$ to its centroid and $d_{ij}$ the distance between the centroids of clusters $i$ and $j$, define

$$R_{ij} = \frac{s_i + s_j}{d_{ij}}$$

Then the Davies-Bouldin index is defined as

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{i \neq j} R_{ij}$$

Zero is the lowest possible score; values closer to zero indicate a better partition. The problem is that, at the time of the experiments, this metric was only described in the scikit-learn documentation and was not shipped in the library, so it could not be evaluated easily.

Evaluation result

K   Silhouette Score   Calinski-Harabasz Index   Davies-Bouldin index
5   0.8117             -                         N/A
6   0.6511             -                         N/A
7   0.7719             560.3999                  N/A
8   0.7037             -                         N/A
3.4. Related work

Machine learning is a subfield of artificial intelligence that is used to develop algorithms and techniques for enabling computers to learn [1]. It is used to train the computer for various tasks, such as (i) distinguishing whether received e-mails are spam or not, (ii) data classification, (iii) association rule identification, and (iv) character recognition. Machine learning involves a series of processes in which a computer (i) looks for similar patterns, (ii) generates a novel classification system, (iii) analyzes the data, and (iv) produces meaningful results. It is a kind of artificial intelligence that can make predictions from results when supported by analytics algorithms. Machine learning is a step-by-step evolution that leads from big data analytics to predicting future actions and making decisions on its own through past learned results. The key issues for a successful prediction model remain increasing the probability and reducing the error, and these problems are resolved through numerous iterative learnings [2]. At the heart of machine learning are representation and generalization, where representation is an evaluation of the data and generalization is the processing of future data. Unsupervised learning is a type of machine learning that is used primarily to determine how data is organized. Unlike supervised learning or reinforcement learning, this method does not give a target value for the input values [3].
Unsupervised learning is closely related to density estimation in statistics. It can summarize and describe the main characteristics of the data; clustering is a representative example. In this paper, we use the K-means algorithm to measure the optimal number of clusters based on the Calinski-Harabasz Index, Silhouette score, and Davies-Bouldin index, and then apply it to household electricity consumption analysis.
3.1 Summary

This paper clustered household power consumption with K-means using the scikit-learn library and the Anaconda3 open-source distribution; anyone can easily follow the experiments, and the BSD license poses no difficulties for practical use. Beyond the K-means and PCA algorithms, other machine learning techniques such as SVM can also be used for clustering. The results enable diverse analyses of real-life household power consumption: estimating maintenance periods for electricity transformers and transmission equipment, and using each household's consumption data for progressive taxation, region-to-region demand forecasting, and maintenance of power plants and facilities. A gas company could likewise estimate gas consumption rates with K-means clustering and the same indexes.
Chapter 4
Paper-2
Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective

4.1 Introduction
Machine learning is an analysis mechanism that fetches and identifies matching patterns in existing datasets to form new results. This paper presents comparative analytics of unsupervised learning algorithms, in which we compare K-means clustering results on a proportionally reduced dataset against the Silhouette score results on the full dataset. Our analysis found that the Davies-Bouldin index does not work smoothly in scikit-learn, so we checked the Calinski-Harabasz Index and Silhouette score alongside the Davies-Bouldin index and compared the results for each, learning that when the dataset is reduced to the mentioned proportion, the resulting scores remain comparable to those of the full dataset.
4.2 Related work

Machine learning is a field of artificial intelligence used to develop algorithms and techniques that enable computers to learn [1]. It is used to train the computer to distinguish whether received e-mails are spam or not, and there are various applications, such as data classification, association rule identification, and character recognition, which comply with standard machine learning perspectives. It includes a series of processes in which a computer finds its own patterns, creates a new classification system, analyzes the data, and produces meaningful results. Successful prediction occurs with an increase in probability and a decrease in error, and machine learning sorts out these issues with various iterative learnings [2]. Among these, the learning methods are commonly summarized as supervised, unsupervised, and reinforcement learning [3].
Clustering is a process of mining a dataset by defining clusters of data: it considers the characteristics of the input and finds a representative way to point out each data group. In this sense, a cluster is a group of related data elements with similar characteristics; if the features are not the same, the elements belong to contrasting clusters [3]. Clustering is unsupervised learning, without known correct answers: objects carrying the same information are grouped together with similar elements. Classification, in contrast, is related to supervised learning: when you perform classification, the system learns to predict the dependent variable (Y) from the independent variable (X) of the data [4].

Scikit-learn is a machine learning package in the form of a broad Python module: a high-level language that users can apply easily, with high-quality documentation and a well-designed API. Under its BSD license it can be used academically or commercially; source code and documentation can be downloaded from the website [10]. Many supervised and unsupervised learning problems are covered by scikit-learn, and solutions for generalized linear models, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, and stochastic gradient descent models are also included.
3.5. Paper-2 Methodology

The K-means algorithm is a partitional clustering method that divides the data among a number of partitions. Given n data objects, the input data are divided into K (≤ n) groups, each forming a cluster. The equation below is the cost function that K-means minimizes when forming the clusters [11]:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

In other words, the data objects are divided into K groups so that the partition minimizes the dissimilarity expressed by this cost function. Under this criterion, similarity within each group increases while similarity between different groups decreases [12]. The K-means algorithm sums, over each group, the squared distances between the centroid and the group's data objects; based on this value, the group assignments are updated and the clustering progresses [5].
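The cost function above is exactly what scikit-learn exposes as `inertia_` on a fitted `KMeans` model. A tiny one-dimensional sketch makes the correspondence checkable by hand (two obvious groups, {0, 1} and {10, 11}):

```python
import numpy as np
from sklearn.cluster import KMeans

# Tiny 1-D example: two obvious clusters, {0, 1} and {10, 11}.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Recompute the cost sum_i sum_{x in S_i} ||x - mu_i||^2 by hand
# from the fitted labels and centroids.
cost = sum(np.sum((X[km.labels_ == i] - c) ** 2)
           for i, c in enumerate(km.cluster_centers_))
print(km.inertia_, cost)
```

With centroids at 0.5 and 10.5, every point sits 0.5 from its centroid, so the cost is 4 × 0.25 = 1.0, and `inertia_` matches the hand computation.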
The Silhouette score is computed as follows: for data point i, let a(i) be the mean distance from i to the other points in its own cluster, and b(i) the mean distance from i to the points of the nearest cluster to which i does not belong. The silhouette score s(i) is then

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

and satisfies −1 ≤ s(i) ≤ 1. A value of s(i) close to 1 means point i is assigned to the correct cluster, while a value close to −1 means it is assigned to the wrong cluster. This paper uses the machine learning library scikit-learn for household power consumption clustering [7]. The household power consumption dataset was downloaded from the University of California, Irvine Machine Learning Repository [8]; it is delimiter-separated and contains the attributes Global_active_power, Global_reactive_power, Voltage, and Global_intensity. Global_active_power and Global_reactive_power are used as the X and Y axes in the experiments.
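The formula for s(i) can be checked against scikit-learn's per-point `silhouette_samples` on a small example where a(i) and b(i) are computable by hand:

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Four 1-D points in two obvious clusters.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])

# Hand computation for point 0: a(0) = 1 (mean distance within its cluster),
# b(0) = (10 + 11) / 2 = 10.5 (mean distance to the other cluster),
# so s(0) = (b - a) / max(a, b) = (10.5 - 1) / 10.5.
s0_manual = (10.5 - 1.0) / 10.5

s = silhouette_samples(X, labels)  # per-point silhouette values
print(s[0], s0_manual)
```

The library value agrees with the hand-computed s(0) ≈ 0.905, close to 1 as expected for a point that clearly belongs to its cluster.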
The Python distribution is Anaconda3. The key idea of the K-means algorithm is to partition the data into K clusters while reducing within-cluster distances; the algorithm assigns a cluster label to each input point. Figure 13 shows a K-means result before the Calinski-Harabasz Index and Silhouette score are checked. Figures 13 to 23 show the K-means clustering results for the household power consumption data from the UCI repository with the dataset reduced to 1/8 of the original.

1.1.1. Experimental Environment
Software: Anaconda3 + PyCharm 3
OS: Windows 10 Professional
RAM: 16.0 GB
Processor: i7-6600U CPU @ 2.60 GHz
Hard disk: 420 GB SSD
1.1.2. Experimental Dataset
1. date: date in format dd/mm/yyyy
2. time: time in format hh:mm:ss
3. global_active_power: household global minute-averaged active power (in kilowatts)
4. global_reactive_power: household global minute-averaged reactive power (in kilowatts)
5. voltage: minute-averaged voltage (in volts)
6. global_intensity: household global minute-averaged current intensity (in amperes)
7. sub_metering_1: energy sub-metering No. 1 (in watt-hours of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
8. sub_metering_2: energy sub-metering No. 2 (in watt-hours of active energy). It corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator, and a light.
9. sub_metering_3: energy sub-metering No. 3 (in watt-hours of active energy). It corresponds to an electric water-heater and an air-conditioner.

   Global_active_power  Global_reactive_power  Voltage  Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3
0  4.216                0.418                  234.84   18.4              0               1               17
1  5.360                0.436                  233.63   23.0              0               1               16
2  5.374                0.498                  233.29   23.0              0               2               17
3  5.388                0.502                  233.74   23.0              0               1               17
4  3.666                0.528                  235.68   15.8              0               1               17
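The 1/8 reduction used in this chapter can be realized as a simple systematic subsample (keep every 8th row). This is a hypothetical illustration on a toy frame, since the dissertation does not specify its exact reduction method:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the full dataset.
df = pd.DataFrame({"Global_active_power": np.arange(800, dtype=float)})

# Keep every 8th row: the reduced frame has 1/8 of the rows.
df_eighth = df.iloc[::8].reset_index(drop=True)
print(len(df), len(df_eighth))  # 800 100
```

Systematic striding preserves the time-ordered spread of the data, which matters if the cluster structure drifts over the recording period.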
1.1.3. Experimental Results
Figure 13. 1/8 dataset cluster K=1
Figure 14. 1/8 dataset cluster K=2
Figure 15. 1/8 dataset cluster K=3
Figure 16. 1/8 dataset cluster K=4
Figure 17. 1/8 dataset cluster K=5
Figure 18. 1/8 dataset cluster K=6
Figure 19. 1/8 dataset cluster K=7
Figure 20. 1/8 dataset cluster K=8
Figure 21. 1/8 dataset cluster K=9
Figure 22. 1/8 dataset cluster K=10
Figure 23. 1/8 dataset cluster K=11
Figure 24. Silhouette score according to change of cluster number.
Figure 25. 1/8 dataset Silhouette score according to change of cluster number.
Choosing the proper number of clusters is the crucial step in the K-means algorithm. Estimating the Silhouette score from the full dataset gives an optimum at K = 7, where the score between each cluster centroid and its data points is 0.799. Even though the 1/8 dataset is much smaller, it also gives an optimum at K = 7, with a silhouette score of 0.810. With seven clusters, on both the full dataset and the 1/8 dataset, the distances between each group's centroid and the other centroids reach an optimal configuration. Thus, even after the dataset is reduced, the class structure of the K-means clustering in vector space is preserved: the optimal number of clusters for household power consumption is the same as with the original dataset.
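The full-versus-1/8 comparison above can be sketched end to end. Well-separated synthetic blobs stand in for the household data (so the concrete scores differ from the dissertation's 0.799 and 0.810), but the point carries over: the silhouette-optimal K survives the 1/8 reduction.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Well-separated synthetic clusters; the cluster structure survives subsampling.
X, _ = make_blobs(n_samples=800, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.5, random_state=0)
X_eighth = X[::8]  # keep every 8th row: 1/8 of the data

def best_silhouette_k(data, ks=range(2, 6)):
    """Return the K in ks with the highest silhouette score."""
    return max(ks, key=lambda k: silhouette_score(
        data, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)))

print(best_silhouette_k(X), best_silhouette_k(X_eighth))
```

Both runs pick the same K, which is the practical payoff claimed here: the expensive K sweep can be done on the reduced dataset first.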
Summary

This paper clustered household power consumption with K-means using the scikit-learn library and the Anaconda3 open-source distribution; anyone can easily follow the experiments, and the BSD license poses no difficulties for practical use. The results show that even after reducing the dataset to 1/8, the silhouette score and all clustering results remain the same as before. As the population grows, the classification and vector space become clearer. Going from a large dataset to a small one clearly preserves the Silhouette score result, but the opposite direction is not guaranteed, because the dataset lives in a four-dimensional vector space. The experiment therefore shows that the estimated analysis time can be reduced when a huge dataset is received.
Chapter 5
Conclusion

This dissertation approaches diverse aspects of K-means clustering applications. The first attempt tried to reduce the K-means algorithm's time consumption; the next changed perspective to how to reduce the processing time for a large dataset. These days, machine learning algorithms can estimate, for example, when a part should be changed (its life span). All experiments used the scikit-learn library with Anaconda3; being open source under the BSD license, they can easily be implemented in any environment. The first experiment analyzed diverse indexes. The second experiment showed that when a dataset is huge and determining the proper number of K-means centroids takes time, the time can be reduced by comparing against a 1/8 dataset, at the cost of limited classification and vector-space resolution. The experiments thus reduce the estimated analysis time when a huge dataset is received.
Acknowledgment

I participated in a total of 114 conferences and gave 7 presentations during my graduate school life. IEEE Globecom 2017 was the most impressive, and this paper's experiments drew their motivation from the Google AI TensorFlow Conference 2017. I would like to express my gratitude to my advisor, Professor Dr. Dong-Ryul Shin, president of Sungkyunkwan University, and to my co-advisor, Assistant Professor Dr. Nawab Muhammad Faseeh Qureshi, for their guidance and co-work. Thank you to SKKU Fellow Professor Hee-Yong Yoon, director of the Mobile Computing Lab, for helping me join Sungkyunkwan University. I am also thankful to Dr. Kee-Hyun Kim, Muhammad Hamza, Junaid, Woo-Hyun Kim, and Chung So. I would like to extend my sincere thanks to my mother, Bong-Soon Lee, professor at Dongnam Health University, who supported me throughout my degree, and to my father, Han-Chung Choi, a founding member of the LG Electronics Pyeongtaek Campus. I am also grateful to my older brother, Hyun-Suk Choi, deputy manager at POSCO E&C, who often drove me home during my degree, to Ye-ul Ahn, a nurse at Seoul National University Bundang Hospital, and to my cute nephew Youn-Woo. Finally, I give my thanks to my cousins, who were milestones along the way.

International Scholar Pooh ® Hyun-Wong Choi
June 19, 2019
References
[1] https://en.wikipedia.org/wiki/K-means_clustering
[2] https://en.wikipedia.org/wiki/Cluster_analysis
[3] https://en.wikipedia.org/wiki/Silhouette_(clustering)
[4] https://github.com/sarguido
[5] http://archive.ics.uci.edu/ml/datasets.html
[6] http://scikit-learn.org/stable/modules/clustering.html#calinski-harabaz-index
[7] http://scikit-learn.org/stable/
[8] Calinski, T., and J. Harabasz. "A dendrite method for cluster analysis." Communications in Statistics 3.1 (1974): 1-27.
[9] Kanungo, Tapas, et al. "An efficient k-means clustering algorithm: Analysis and implementation." IEEE Trans. Pattern Anal. Mach. Intell. 24.7 (2002): 881-892.
[10] Arthur, David, and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics (2007): 1027-1035.
[11] Wagstaff, Kiri, et al. "Constrained k-means clustering with background knowledge." ICML. Vol. 1. 2001: 577-584.
[12] Hartigan, John A., and Manchek A. Wong. "Algorithm AS 136: A k-means clustering algorithm." Journal of the Royal Statistical Society. Series C (Applied Statistics) 28.1 (1979): 100-108.
[13] Kanungo, Tapas, et al. "An efficient k-means clustering algorithm: Analysis and implementation." IEEE Transactions on Pattern Analysis & Machine Intelligence 24.7 (2002): 881-892.
[14] Alsabti, Khaled, Sanjay Ranka, and Vineet Singh. "An efficient k-means clustering algorithm." (1997).
[15] Likas, Aristidis, Nikos Vlassis, and Jakob J. Verbeek. "The global k-means clustering algorithm." Pattern Recognition 36.2 (2003): 451-461.
[16] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
[17] Buitinck, Lars, et al. "API design for machine learning software: experiences from the scikit-learn project." arXiv preprint arXiv:1309.0238 (2013).
[18] Abraham, Alexandre, et al. "Machine learning for neuroimaging with scikit-learn." Frontiers in Neuroinformatics 8 (2014): 14.
[19] Fabian, P., et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
[20] Hearst, Marti A., et al. "Support vector machines." IEEE Intelligent Systems and their Applications 13.4 (1998): 18-28.
[21] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
[22] Alsabti, Khaled, Sanjay Ranka, and Vineet Singh. "An efficient k-means clustering algorithm." (1997).
[23] Ding, Chris, and Xiaofeng He. "K-means clustering via principal component analysis." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.
[24] Paneque-Gálvez, Jaime, et al. "Small drones for community-based forest monitoring: An assessment of their feasibility and potential in tropical areas." Forests 5.6 (2014): 1481-1507.
[25] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
[26] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
[27] Rasmussen, Carl Edward. "Gaussian processes in machine learning." Summer School on Machine Learning. Springer, Berlin, Heidelberg, 2003.
[28] Hartigan, John A., and Manchek A. Wong. "Algorithm AS 136: A k-means clustering algorithm." Journal of the Royal Statistical Society. Series C (Applied Statistics) 28.1 (1979): 100-108.
[29] Paneque-Gálvez, Jaime, et al. "Small drones for community-based forest monitoring: An assessment of their feasibility and potential in tropical areas." Forests 5.6 (2014): 1481-1507.
[30] Sass, Ron, et al. "Reconfigurable computing cluster (RCC) project: Investigating the feasibility of FPGA-based petascale computing." 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2007). IEEE, 2007.
[31] Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern Classification. John Wiley & Sons, 2012.
[32] Cover, Thomas M., and Peter E. Hart. "Nearest neighbor pattern classification." IEEE Transactions on Information Theory 13.1 (1967): 21-27.
[33] Breiman, Leo. Classification and Regression Trees. Routledge, 2017.
[34] Haralick, Robert M., and Karthikeyan Shanmugam. "Textural features for image classification." IEEE Transactions on Systems, Man, and Cybernetics 6 (1973): 610-621.
[35] Chapelle, Olivier, Bernhard Scholkopf, and Alexander Zien, eds. Semi-Supervised Learning. 2006. Reviewed in IEEE Transactions on Neural Networks 20.3 (2009): 542-542.
[36] Zhu, Xiaojin, Zoubin Ghahramani, and John D. Lafferty. "Semi-supervised learning using Gaussian fields and harmonic functions." Proceedings of the 20th International Conference on Machine Learning (ICML-03). 2003.
[37] Caruana, Rich, and Alexandru Niculescu-Mizil. "An empirical comparison of supervised learning algorithms." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
[38] Jain, Anil K. "Data clustering: 50 years beyond K-means." Pattern Recognition Letters 31.8 (2010): 651-666.
[39] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
[40] Figueiredo, Mario A. T., and Anil K. Jain. "Unsupervised learning of finite mixture models." IEEE Transactions on Pattern Analysis & Machine Intelligence 24.3 (2002): 381-396.
[41] Lovmar, Lovisa, et al. "Silhouette scores for assessment of SNP genotype clusters." BMC Genomics 6.1 (2005): 35.
[42] Collins, Robert T., Ralph Gross, and Jianbo Shi. "Silhouette-based human identification from body shape and gait." Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, 2002.
[43] Gat-Viks, Irit, Roded Sharan, and Ron Shamir. "Scoring clustering solutions by their biological relevance." Bioinformatics 19.18 (2003): 2381-2389.
[44] Maulik, Ujjwal, and Sanghamitra Bandyopadhyay. "Performance evaluation of some clustering algorithms and validity indices." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.12 (2002): 1650-1654.
[45] Łukasik, Szymon, et al. "Clustering using flower pollination algorithm and Calinski-Harabasz index." 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2016.
[46] Desgraupes, Bernard. "Clustering indices." University of Paris Ouest-Lab Modal'X 1 (2013): 34.
[47] Petrovic, Slobodan. "A comparison between the silhouette index and the Davies-Bouldin index in labelling IDS clusters." Proceedings of the 11th Nordic Workshop of Secure IT Systems. 2006.
[48] Maulik, Ujjwal, and Sanghamitra Bandyopadhyay. "Performance evaluation of some clustering algorithms and validity indices." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.12 (2002): 1650-1654.
[49] Petrovic, Slobodan. "A comparison between the silhouette index and the Davies-Bouldin index in labelling IDS clusters." Proceedings of the 11th Nordic Workshop of Secure IT Systems. 2006.
[50] https://scikit-learn.org/stable/
[51] https://www.anaconda.com/
[52] https://www.jetbrains.com/pycharm/
[53] Petrovic, Slobodan. "A comparison between the silhouette index and the Davies-Bouldin index in labelling IDS clusters." Proceedings of the 11th Nordic Workshop of Secure IT Systems. 2006.
[54] Bandyopadhyay, Sanghamitra, and Ujjwal Maulik. "Nonparametric genetic clustering: comparison of validity indices." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 31.1 (2001): 120-125.
[55] https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption
[56] https://github.com/sarguido
Abstract (Korean abstract, translated)

Comparative Analysis of Electricity Consumption at Home through a Silhouette-score prospective

Hyun Wong Choi
Department of Electrical and Computer Engineering, The Graduate School, Sungkyunkwan University

Machine learning is a modern field that has emerged as a new tool for data analysis in distributed computing environments. It has several aspects that improve processing power along with analysis efficiency. This dissertation analyzes household electricity usage through the K-means clustering algorithm to obtain the optimal available power data points. The Davies-Bouldin index and Silhouette score find the optimal number of clusters in the K-means algorithm, presenting an application scenario for machine learning clustering analysis. Machine learning is a state-of-the-art subfield of artificial intelligence that has evolved to find large-scale intelligent analytics in distributed computing environments. This dissertation performs a comparative analysis of the dataset collected for household electricity usage, based on the K-means clustering algorithm, through a comparison with the silhouette scores of the 1/8 dataset. The performance evaluation shows that even though the dataset is smaller than before, the silhouette scores remain similar.

Keywords: Machine Learning, K-means clustering, Sci-kit Learn, Silhouette score, Calinski-Harabasz Index