Aparna K et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 4, Issue 2( Version 1), February 2014, pp.214-217

RESEARCH ARTICLE

www.ijera.com

OPEN ACCESS

Selection of Initial Seed Values for K-Means Algorithm Using
Taguchi Method as an Optimization Technique
Aparna K*, Mydhili K Nair**
*Associate Professor, Dept of MCA, BMS Institute of Technology, Bangalore, Karnataka, INDIA
**Associate Professor, Dept. Of ISE, M S Ramaiah Institute of Technology, Bangalore, Karnataka, INDIA

ABSTRACT
This paper proposes an enhancement of the performance of the traditional K-Means algorithm of partitional
clustering by using the Taguchi method as an optimization technique. The K-Means algorithm requires the desired
number of clusters to be known a priori. Given the desired number of clusters, the initial seed values are
selected randomly. The K-Means algorithm has no specific mechanism for choosing appropriate
initial seeds, and different initial seeds may produce very different clustering results.
This is especially true when the target sample contains many outliers. In addition, random selection of initial
seeds often causes the clustering to converge to a local optimum. It is therefore important to select
appropriate initial seeds when using the traditional clustering method. To select appropriate initial
seeds, we propose the use of the Taguchi method as a tool. Using the Taguchi method, the initial seed
values are selected based on signal-to-noise ratios, and with these values as the initial input, the K-Means
algorithm proceeds to form the clusters for the given data.
Keywords: K-Means algorithm, Taguchi method, Clustering Technique, Partitional Clustering, Data Mining,
Orthogonal Array

I. INTRODUCTION

Clustering high dimensional data is a common task in real-world data
mining applications. Developing efficient clustering techniques for high
dimensional datasets is challenging because of the curse of
dimensionality [6]. The aim of clustering is to find structure in data,
and it is therefore exploratory in nature. Clustering has a long and rich
history in a variety of scientific fields. One of the most popular and
simple clustering algorithms is K-Means. The K-Means validity measure is
based on the intra-cluster and inter-cluster distance measures, which
allows the number of clusters to be determined automatically. The main
disadvantage of the K-Means algorithm is that the number of clusters 'K'
must be supplied as a parameter [4]. Moreover, to start the K-Means
algorithm, initial seed values based on the number of clusters required
are selected randomly. There is no specific mechanism to select these
initial seed values, and hence the results can vary from run to run or
for every set of initial seed values. The objective of this paper is to
use the Taguchi technique to select the initial seed values. Once the
seed values are selected, the K-Means algorithm is continued until the
desired number of clusters is obtained. The initial seed values are
identified by their rank based on signal-to-noise ratios using the
Taguchi method [7].
The rest of the paper is organized as follows: Section II gives an
overview of the K-Means algorithm and the Taguchi method. Section III
illustrates the implementation of the work. The performance and results
are shown in Section IV and the conclusion is given in Section V.

II. OVERVIEW OF TECHNIQUES USED

A. K-Means Algorithm
The K-Means algorithm is composed of the
following steps:
Step 1: Place K points into the space represented by
the objects that are being clustered. These points
represent the initial group centroids.
Step 2: Assign each object to the group that has the
closest centroid.
Step 3: When all objects have been assigned,
recalculate the positions of the K centroids.
Step 4: Repeat Steps 2 and 3 until the centroids no
longer move.
The K-Means clustering algorithm partitions the
given data set into K clusters of points.
That is, the K-Means algorithm clusters all the data
points in the data set such that each point falls in one
and only one of the K partitions. Points that are similar
to one another fall in the same cluster, while dissimilar
points fall in different clusters. In clustering algorithms,
points are grouped by some notion of "closeness" or
"similarity." In K-Means, the default measure of closeness is the
Euclidean distance. The Euclidean distance formula
is
$$d(X, Y) = \left( |X_1 - Y_1|^2 + |X_2 - Y_2|^2 + \dots + |X_{N-1} - Y_{N-1}|^2 + |X_N - Y_N|^2 \right)^{1/2}$$
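Steps 1 to 4 above, together with the Euclidean distance, can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `euclidean_distance` and `k_means`, the `max_iter` cap, the handling of empty clusters and the toy data are all assumptions made for the example. The seeds are passed in explicitly so that any seed-selection scheme can supply them.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def k_means(points, seeds, max_iter=100):
    """Run K-Means from explicit initial seeds; returns (centroids, labels)."""
    # Step 1: the seeds are the initial group centroids.
    centroids = [list(s) for s in seeds]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to the group with the closest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: euclidean_distance(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        # Step 4: stop once assignments and centroids no longer change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, seeds=[(1.0, 1.0), (8.0, 8.0)])
```

Because the result depends on `seeds`, running the same data with different seeds is a simple way to observe the sensitivity to initialization discussed in this paper.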
B. Taguchi Method
Design of Experiments (DOE) is a collection
of statistical methods for studying the effects of
input variables (independent variables) and
their interactions on a response variable (dependent
variable) [2]. The Taguchi method provides a simple,
efficient and systematic approach to optimizing designs
for performance, quality and cost. The method
combines engineering and statistical knowledge
to optimize design and manufacturing processes so as to
achieve high quality at low cost and in little time [1].
Taguchi design is an important tool for robust design.
It helps to identify the process parameters and their
levels that put the quality characteristic on target, and
to limit the number of experiments needed for optimization [3]. In
this paper, the Taguchi Design of Experiments (DOE)
approach has been used to analyze the selection of
initial seed values in Step 1 of the above K-Means
algorithm.

III. EXPERIMENTAL DESIGN

The experiment using the Taguchi method was
designed to calculate the initial seed values from a
large data set in order to predict the value of relative
humidity. Relative humidity is calculated from
the values of temperature, dew point temperature
and two constants, based on the August-Roche-Magnus
approximation [5] (taken from the Clausius-Clapeyron
relation). Given these values, the relative
humidity is calculated using the formula

$$RH = 100 \times \frac{\exp\left(\frac{a\,T_d}{b + T_d}\right)}{\exp\left(\frac{a\,T}{b + T}\right)}$$

where T is the temperature in °C, Td is the dew point
temperature in °C, and a and b are the two constants.
The range of values considered valid based on the
August-Roche-Magnus approximation is
0°C < T < 60°C
0°C < Td < 50°C
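As an illustration, the relative humidity calculation can be sketched in Python. The sketch assumes the usual August-Roche-Magnus form, RH = 100 * exp(a*Td/(b+Td)) / exp(a*T/(b+T)); the function names `relative_humidity` and `in_valid_range` are hypothetical names chosen for this example and do not come from the paper.

```python
import math


def in_valid_range(t, td):
    """True if temperature and dew point (deg C) lie in the stated valid range."""
    return 0 < t < 60 and 0 < td < 50


def relative_humidity(t, td, a, b):
    """Relative humidity (%) from temperature t and dew point td, both in deg C.

    a and b are the two constants of the August-Roche-Magnus approximation.
    """
    at_dew_point = math.exp(a * td / (b + td))
    at_temperature = math.exp(a * t / (b + t))
    return 100.0 * at_dew_point / at_temperature


# Level 1 of every factor in Table I: T = 23, Td = 20, a = 17.212, b = 235.48.
rh_run1 = relative_humidity(t=23, td=20, a=17.212, b=235.48)
```

With these level 1 values the function evaluates to approximately 83.18, which agrees with the first observed response of experiment 1 in Table III.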
The K-Means algorithm was then used to
form clusters based on similar values of the four
factors considered. Table I below shows the four factors
and three levels used in the experiment. With three
levels assigned to each of these factors, a full
factorial experimental design using each of these values
would require 3^4 = 81 runs. The fractional factorial
design reduced the number of runs to nine. An orthogonal
array of type L9 was used and is represented in Table II
below.
TABLE I
LEVEL OF EXPERIMENTAL PARAMETERS
Symbol  Factor            Level 1   Level 2   Level 3
A       Temperature (°C)  23        28        33
B       Dew Point (°C)    20        25        27
C       Constant a        17.212    17.368    17.625
D       Constant b        235.48    238.88    243.04
TABLE II
TAGUCHI'S L9 (3^4) ORTHOGONAL ARRAY
            Factors
Std. order  A     B     C         D
1           23    20    17.212    235.48
2           23    25    17.368    238.88
3           23    27    17.625    243.04
4           28    20    17.368    243.04
5           28    25    17.625    235.48
6           28    27    17.212    238.88
7           33    20    17.625    238.33
8           33    25    17.212    243.04
9           33    27    17.368    235.48
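The reduction from a full factorial design to the L9 array can be illustrated with a short Python sketch. It encodes the standard L9 array as level numbers, checks the orthogonality property (every pair of columns contains each combination of levels exactly once), and maps the level numbers to the values of Table I. The names `LEVELS`, `L9`, `is_orthogonal` and `runs` are illustrative assumptions.

```python
from itertools import combinations, product

# Level values from Table I (factor symbol -> values for levels 1, 2, 3).
LEVELS = {
    "A": [23, 28, 33],
    "B": [20, 25, 27],
    "C": [17.212, 17.368, 17.625],
    "D": [235.48, 238.88, 243.04],
}

# Standard L9 array: one row per run, entries are level numbers for A, B, C, D.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]


def is_orthogonal(array, n_levels=3):
    """True if every pair of columns holds each level pair exactly once."""
    all_pairs = set(product(range(1, n_levels + 1), repeat=2))
    if len(array) != len(all_pairs):
        return False
    for c1, c2 in combinations(range(len(array[0])), 2):
        if {(row[c1], row[c2]) for row in array} != all_pairs:
            return False
    return True


# A full factorial design would need every combination of the levels.
full_factorial_runs = len(list(product(*LEVELS.values())))

# Translate level numbers into factor values, one tuple (A, B, C, D) per run.
# Note: Table II prints 238.33 for factor D in run 7, while level 2 of D in
# Table I is 238.88; this sketch uses the Table I value.
runs = [
    tuple(LEVELS[factor][level - 1] for factor, level in zip("ABCD", row))
    for row in L9
]
```

Here `full_factorial_runs` is 81 while `runs` has only nine entries, which is the saving that the orthogonal array provides.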

IV. PERFORMANCE AND RESULTS

Nine experiments were conducted based on
the above values and the complete response table for
these data is shown in Table III below.

TABLE III
EXPERIMENTAL DATA AND SAMPLE STATISTICS
Expt.  Observed Response values for RH  Mean       Standard    Logarithmic value   S/N Ratio
No.                                                Deviation   of Std. Deviation
1      83.183     82.123     82.781     82.696     0.53505     -0.62540            43.7818
2      112.758    110.678    111.345    111.594    1.06211     0.06026             40.4294
3      126.935    127.987    125.123    126.682    1.44867     0.37065             38.8348
4      62.272     63.989     61.712     62.658     1.18645     0.17097             34.4545
5      83.409     83.001     82.990     83.133     0.23856     -1.43312            50.8434
6      94.370     95.190     94.559     94.706     0.42929     -0.84562            46.8726
7      45.883     44.761     45.083     45.242     0.57777     -0.54858            37.8758
8      63.616     64.109     63.923     63.883     0.24882     -1.39104            48.1901
9      70.595     70.231     71.092     70.639     0.43222     -0.83882            44.2668

In this paper, optimal humidity is taken as
the indicator of a better initial value. Therefore, the
nominal-is-best criterion is selected for obtaining initial values.

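The S/N Ratio for nominal-is-best is calculated using
the formula:
$$\text{S/N Ratio} = 10 \log_{10}\left(\frac{\bar{Y}^2}{s^2}\right)$$
where $\bar{Y}$ is the mean and $s$ is the standard deviation of the
observed responses of an experiment.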
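The nominal-is-best S/N ratio can be computed directly from the observed responses of one experiment, as in the following sketch. It assumes that s is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the function name `sn_nominal_is_best` is an illustrative choice.

```python
import math
import statistics


def sn_nominal_is_best(observations):
    """Nominal-is-best S/N ratio: 10 * log10(Ybar**2 / s**2)."""
    y_bar = statistics.mean(observations)
    s = statistics.stdev(observations)  # sample standard deviation (n - 1)
    return 10.0 * math.log10(y_bar ** 2 / s ** 2)


# Observed RH responses of experiment 1 in Table III.
run1 = [83.183, 82.123, 82.781]
mean_run1 = statistics.mean(run1)
stdev_run1 = statistics.stdev(run1)
sn_run1 = sn_nominal_is_best(run1)
```

For experiment 1 this gives a mean of about 82.70, a standard deviation of about 0.535 and an S/N ratio of about 43.78, close to the values listed in Table III.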

The plot of factor effects on S/N Ratio is shown below in Fig 1.
[Figure: Main Effects Plot for SN ratios (Data Means). Panels: A, B, C, D; x-axis: factor level (1, 2, 3); y-axis: Mean of SN ratios. Signal-to-noise: Nominal is best (10*Log10(Ybar**2/s**2))]
Fig 1. Plot of factor effects on S/N Ratio
The Taguchi analysis for prediction of optimal values is shown in the response tables below.

Response Table for Signal to Noise Ratios
Nominal is best (10*Log10(Ybar**2/s**2))
Level   A        B        C        D
1       41.02    38.70    46.28    46.30
2       44.06    46.49    39.72    41.73
3       43.44    43.32    42.52    40.49
Delta   3.04     7.78     6.56     5.80
Rank    4        1        2        3

Response Table for Means
Level   A        B        C        D
1       106.99   63.53    80.43    78.82
2       80.17    86.20    81.63    83.85
3       59.92    97.34    85.02    84.41
Delta   47.07    33.81    4.59     5.58
Rank    1        2        4        3

Response Table for Standard Deviations
Level   A        B        C        D
1       1.0153   0.7664   0.4044   0.4019
2       0.6181   0.5165   0.8936   0.6897
3       0.4196   0.7701   0.7550   0.9613
Delta   0.5957   0.2536   0.4892   0.5594
Rank    1        4        3        2

Predicted values
S/N Ratio   Mean      StDev     Ln (StDev)
35.4470     92.3112   1.17344   0.312732

Factor levels for predictions
A   B   C   D
1   1   3   2
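The entries of a response table can be derived from the L9 array and the per-experiment statistics of Table III. The sketch below does this for the S/N ratios: for each factor it averages the responses of the runs at each level, takes Delta as the largest minus the smallest level mean, and ranks the factors by Delta. The names `L9`, `SN_RATIOS` and `response_table` are assumptions made for this illustration.

```python
# L9 array as level numbers for factors A, B, C, D (one row per experiment).
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

# S/N ratios of the nine experiments, copied from Table III.
SN_RATIOS = [43.7818, 40.4294, 38.8348, 34.4545, 50.8434,
             46.8726, 37.8758, 48.1901, 44.2668]


def response_table(array, responses, n_levels=3):
    """Return (level_means, deltas, ranks), each indexed by factor."""
    n_factors = len(array[0])
    level_means = []
    for f in range(n_factors):
        means = []
        for level in range(1, n_levels + 1):
            # Average the responses of all runs where factor f is at this level.
            values = [r for row, r in zip(array, responses) if row[f] == level]
            means.append(sum(values) / len(values))
        level_means.append(means)
    # Delta: spread between the best and worst level mean of each factor.
    deltas = [max(m) - min(m) for m in level_means]
    # Rank 1 goes to the factor with the largest delta.
    order = sorted(range(n_factors), key=lambda f: deltas[f], reverse=True)
    ranks = [0] * n_factors
    for rank, f in enumerate(order, start=1):
        ranks[f] = rank
    return level_means, deltas, ranks


level_means, deltas, ranks = response_table(L9, SN_RATIOS)
```

Applied to the S/N ratios, this yields the ranks 4, 1, 2, 3 for factors A, B, C, D, matching the response table for signal-to-noise ratios. Passing the Mean or Standard Deviation column of Table III instead gives the other two response tables in the same way.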

Hence the levels predicted by the Taguchi method are
level 1 for factors A and B, level 3 for factor C and
level 2 for factor D. These values can be taken as
the initial seed values for performing the K-Means
algorithm.
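The predicted values for a chosen combination of levels can be approximated with the additive model commonly used in Taguchi analysis: the sum of the level means of the chosen levels minus (number of factors - 1) times the grand mean. This is an assumption, since the paper does not state how the predicted values were obtained. The names `predict`, `CHOSEN`, `LEVELS` and `seed` are illustrative.

```python
# L9 array as level numbers for factors A, B, C, D (one row per experiment).
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

# Per-experiment S/N ratios and means, copied from Table III.
SN_RATIOS = [43.7818, 40.4294, 38.8348, 34.4545, 50.8434,
             46.8726, 37.8758, 48.1901, 44.2668]
MEANS = [82.696, 111.594, 126.682, 62.658, 83.133,
         94.706, 45.242, 63.883, 70.639]

# Level values from Table I.
LEVELS = {
    "A": [23, 28, 33],
    "B": [20, 25, 27],
    "C": [17.212, 17.368, 17.625],
    "D": [235.48, 238.88, 243.04],
}

# Levels selected for factors A, B, C, D.
CHOSEN = (1, 1, 3, 2)


def predict(array, responses, chosen_levels):
    """Additive-model prediction for the given level of each factor."""
    grand_mean = sum(responses) / len(responses)
    total = 0.0
    for f, level in enumerate(chosen_levels):
        values = [r for row, r in zip(array, responses) if row[f] == level]
        total += sum(values) / len(values)
    return total - (len(chosen_levels) - 1) * grand_mean


predicted_sn = predict(L9, SN_RATIOS, CHOSEN)
predicted_mean = predict(L9, MEANS, CHOSEN)

# Factor values (A, B, C, D) corresponding to the chosen levels.
seed = tuple(LEVELS[f][lvl - 1] for f, lvl in zip("ABCD", CHOSEN))
```

Under this assumption the sketch gives approximately 35.45 for the S/N ratio and 92.31 for the mean, in line with the predicted values reported above, and `seed` holds the factor values (23, 20, 17.625, 238.88) for the selected levels.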
V. CONCLUSION

Taguchi's prediction gives the optimal factor
levels. Hence we can use these predicted values as
the initial seed values for the K-Means algorithm.
This gives better quality clusters than
selecting the initial seed values randomly. It also
provides a standard mechanism for choosing the seeds
before the final results, that is, the clusters, are obtained.

REFERENCES
[1]. W. H. Yang, Y. S. Tarng, Design optimization of cutting parameters for
     turning operations based on Taguchi method, Journal of Materials
     Processing Technology, 84, 1998, pp 122-129.
[2]. Howard S Gitlow, Alan J Oppenheim, Rosa Oppenheim, David M Levine,
     Quality Management, McGraw Hill, 2009, pp 427-429.
[3]. Jaharah A Ghani, et al., Philosophy of Taguchi Approach and method in
     Design of Experiment, Asian Journal of Scientific Research, 6(1),
     pp 27-37, 2013.
[4]. Joydeep Ghosh and Alexander Liu, "K-Means-type Algorithms: A Generalized
     Convergence Theorem and characterized Local Optimality", Journal of
     Machine Learning Research (JMLR), Vol. 6, pp. 1705-1749, 2005.
[5]. Clausius-Clapeyron Equation, Wikipedia (redirected from
     August-Roche-Magnus Approximation).
[6]. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
     Elsevier, 2nd Edition.
[7]. Mustafa Zaidi, Bushra A Saeed, I. Amin, Nukman Yusoff, "Experimental
     Data Mining Techniques", International Journal of Computer Science
     Issues, Vol 9, Issue 3, No. 3, May 2012.
[8]. Huei Chun Wang, Chih-Chou Chiu, Chao-Ton Su, "Data Classification using
     the Mahalanobis-Taguchi System", Journal of the Chinese Institute of
     Industrial Engineers, Vol 21, No. 6, pp. 608-618, 2004.

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 

Recently uploaded (20)

Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Af4201214217

Aparna K et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 4, Issue 2 (Version 1), February 2014, pp. 214-217
RESEARCH ARTICLE, OPEN ACCESS, www.ijera.com

Selection of Initial Seed Values for K-Means Algorithm Using Taguchi Method as an Optimization Technique

Aparna K*, Mydhili K Nair**
*Associate Professor, Dept of MCA, BMS Institute of Technology, Bangalore, Karnataka, INDIA
**Associate Professor, Dept. of ISE, M S Ramaiah Institute of Technology, Bangalore, Karnataka, INDIA

ABSTRACT
This paper proposes an enhancement of the performance of the traditional K-Means algorithm of partitional clustering by using the Taguchi method as an optimization technique. The K-Means algorithm requires the desired number of clusters to be known a priori. Given the desired number of clusters, the initial seed values are selected randomly. The K-Means algorithm does not have any specific mechanism to choose appropriate initial seeds, and selecting different initial seeds may generate large differences in the clustering results. This is especially true when the target sample contains many outliers. In addition, random selection of initial seeds often causes the clustering to fall into a local optimum. Therefore, it is very important to select appropriate initial seeds when using the traditional clustering method. To select the appropriate initial seeds, we propose the use of the Taguchi method as a tool. Using the Taguchi method, the initial seed values are selected based on the signal-to-noise ratios, and with these values as the initial input, the K-Means algorithm can be continued to form the clusters for the given data.

Keywords: K-Means algorithm, Taguchi method, Clustering Technique, Partitional Clustering, Data Mining, Orthogonal Array

I. INTRODUCTION
Clustering high-dimensional data is a common requirement in real-world data mining applications.
Developing efficient clustering techniques for high-dimensional datasets is challenging because of the curse of dimensionality [6]. The aim of clustering is to find structure in data, and it is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms is K-Means. The K-Means validity measure is based on the intra-cluster and inter-cluster distance measures, which allows the number of clusters to be determined automatically. The main disadvantage of the K-Means algorithm is that the number of clusters 'K' must be supplied as a parameter [4]. Moreover, to start the K-Means algorithm, initial seed values based on the number of clusters required are selected randomly. There is no specific mechanism to select these initial seed values, and hence the results can vary from run to run, depending on the set of initial seed values chosen.

The objective of this paper is to make use of the Taguchi technique to select the initial seed values. Once the seed values are selected, the K-Means algorithm is run until the desired number of clusters is obtained. The initial seed values are identified by their rank based on signal-to-noise ratios using the Taguchi method [7].

The rest of the paper is organized as follows: Section II gives an overview of the K-Means algorithm and the Taguchi method. Section III illustrates the implementation of the work. The performance and results are shown in Section IV, and the conclusion is given in Section V.

II. OVERVIEW OF TECHNIQUES USED
A. K-Means Algorithm
The K-Means algorithm is composed of the following steps:
Step 1: Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
Step 2: Assign each object to the group that has the closest centroid.
Step 3: When all objects have been assigned, recalculate the positions of the K centroids.
Step 4: Repeat Steps 2 and 3 until the centroids no longer move.

The K-Means clustering algorithm partitions the given data set into k clusters of points. That is, the K-Means algorithm clusters all the data points in the data set such that each point falls in one and only one of the k partitions. Points that are similar to each other fall in the same cluster, while dissimilar points fall in different clusters. In clustering algorithms, points are grouped by some notion of "closeness" or "similarity." In K-Means, the default measure of closeness is the Euclidean distance. The Euclidean distance between two N-dimensional points X and Y is

$$ d(X, Y) = \left( \sum_{i=1}^{N} |X_i - Y_i|^2 \right)^{1/2} $$

B. Taguchi Method
Design of Experiments (DOE) is a collection of statistical methods for studying the effects of input variables (independent variables) and their interactions on a response variable (dependent variable) [2]. The Taguchi method provides a simple, efficient and systematic approach to optimize designs for performance, quality and cost. This method combines engineering and statistical knowledge to optimize design and manufacturing processes to achieve high quality at low cost and time [1]. Taguchi design is an important tool for robust design. It helps to identify the process parameters and their levels that put the quality characteristic on target, and it limits the number of experiments needed for optimization [3]. In this paper, the Taguchi Design of Experiments (DOE) approach has been used to analyze the selection of initial seed values in Step 1 of the above K-Means algorithm.

III. EXPERIMENTAL DESIGN
The Taguchi experiment was set up to select the initial seed values from a large data set in order to predict the value of relative humidity. Relative humidity is calculated from the values of temperature, dew point temperature and two constants, based on the August-Roche-Magnus approximation [5] (taken from the Clausius-Clapeyron relation). Given these values, the relative humidity is calculated using the formula

$$ RH = 100 \cdot \frac{\exp\left( \frac{a\,T_d}{b + T_d} \right)}{\exp\left( \frac{a\,T}{b + T} \right)} $$

where T is the temperature in °C, Td is the dew point temperature in °C, and a and b are the two constants. The range of values considered valid based on the August-Roche-Magnus approximation is
0°C < T < 60°C
0°C < Td < 50°C

The K-Means algorithm was then used to form clusters based on similar values of the four factors considered. Table I below shows the four factors and three levels used in the experiment. With three levels assigned to each of the four factors, a full factorial experimental design would require 3^4 = 81 runs. The fractional factorial design reduced the number of runs to nine. The orthogonal array of L9 type was used and is represented in Table II below.

TABLE I
LEVEL OF EXPERIMENTAL PARAMETERS
Symbol  Factor             Level 1   Level 2   Level 3
A       Temperature (°C)   23        28        33
B       Dew Point (°C)     20        25        27
C       Constant a         17.212    17.368    17.625
D       Constant b         235.48    238.88    243.04

TABLE II
TAGUCHI'S L9 (3^4) ORTHOGONAL ARRAY
Std. order   A    B    C        D
1            23   20   17.212   235.48
2            23   25   17.368   238.88
3            23   27   17.625   243.04
4            28   20   17.368   243.04
5            28   25   17.625   235.48
6            28   27   17.212   238.88
7            33   20   17.625   238.33
8            33   25   17.212   243.04
9            33   27   17.368   235.48

IV. PERFORMANCE AND RESULTS
Nine experiments were conducted based on the above values, and the complete response table for these data is shown in Table III below.

TABLE III
EXPERIMENTAL DATA AND SAMPLE STATISTICS
Expt. No.   Observed response values for RH   Mean      Standard Deviation   Logarithmic value of Std. Deviation   S/N Ratio
1           83.183    82.123    82.781        82.696    0.53505              -0.62540                              43.7818
2           112.758   110.678   111.345       111.594   1.06211              0.06026                               40.4294
3           126.935   127.987   125.123       126.682   1.44867              0.37065                               38.8348
4           62.272    63.989    61.712        62.658    1.18645              0.17097                               34.4545
5           83.409    83.001    82.990        83.133    0.23856              -1.43312                              50.8434
6           94.370    95.190    94.559        94.706    0.42929              -0.84562                              46.8726
7           45.883    44.761    45.083        45.242    0.57777              -0.54858                              37.8758
8           63.616    64.109    63.923        63.883    0.24882              -1.39104                              48.1901
9           70.595    70.231    71.092        70.639    0.43222              -0.83882                              44.2668

In this paper, optimal humidity is taken as the indicator of a better initial value. Therefore, the nominal-is-best criterion is selected for obtaining the initial values. The S/N ratio for nominal-is-best is calculated using the formula

$$ S/N = 10 \log_{10}\left( \frac{\bar{y}^2}{s^2} \right) $$

where \bar{y} is the mean and s is the standard deviation of the observed responses of an experiment. The plot of factor effects on S/N ratio is shown below in Fig 1.

[Figure: Main Effects Plot for SN ratios (data means). One panel per factor A, B, C and D; x-axis: level (1, 2, 3); y-axis: mean of SN ratios. Signal-to-noise: nominal is best.]
Fig 1. Plot of factor effects on S/N Ratio

The Taguchi analysis for prediction of optimal values is shown in the response tables below.

Response Table for Signal to Noise Ratios
Nominal is best (10*Log10(Ybar**2/s**2))
Level   A       B       C       D
1       41.02   38.70   46.28   46.30
2       44.06   46.49   39.72   41.73
3       43.44   43.32   42.52   40.49
Delta   3.04    7.78    6.56    5.80
Rank    4       1       2       3

Response Table for Means
Level   A        B       C       D
1       106.99   63.53   80.43   78.82
2       80.17    86.20   81.63   83.85
3       59.92    97.34   85.02   84.41
Delta   47.07    33.81   4.59    5.58
Rank    1        2       4       3

Response Table for Standard Deviations
Level   A        B        C        D
1       1.0153   0.7664   0.4044   0.4019
2       0.6181   0.5165   0.8936   0.6897
3       0.4196   0.7701   0.7550   0.9613
Delta   0.5957   0.2536   0.4892   0.5594
Rank    1        4        3        2

Predicted values
S/N Ratio   Mean      StDev     Ln (StDev)
35.4470     92.3112   1.17344   0.312732

Factor levels for predictions
A   B   C   D
1   1   3   2

Hence the values predicted by the Taguchi method are level 1 for factors A and B, level 3 for factor C and level 2 for factor D.
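The nominal-is-best S/N ratios and the response table for S/N ratios reported above can be approximately recomputed from the observed RH values. The following is a minimal Python sketch, not the authors' implementation: it assumes the sample standard deviation (n - 1 denominator) and encodes the L9 array by level index rather than by factor value. The names `observed`, `l9_levels`, `sn_nominal_is_best` and `response_table` are illustrative only.

```python
import math
from statistics import mean, stdev

# Observed RH responses for the nine L9 runs, copied from Table III.
observed = [
    (83.183, 82.123, 82.781),
    (112.758, 110.678, 111.345),
    (126.935, 127.987, 125.123),
    (62.272, 63.989, 61.712),
    (83.409, 83.001, 82.990),
    (94.370, 95.190, 94.559),
    (45.883, 44.761, 45.083),
    (63.616, 64.109, 63.923),
    (70.595, 70.231, 71.092),
]

# Level (1-3) of factors A, B, C, D in each run, read off Table II.
l9_levels = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]


def sn_nominal_is_best(values):
    """S/N = 10 * log10(ybar^2 / s^2), with s the sample standard deviation."""
    ybar = mean(values)
    s = stdev(values)  # assumption: n - 1 denominator
    return 10 * math.log10(ybar ** 2 / s ** 2)


def response_table(responses):
    """Average a per-run response over the runs at each level of each factor."""
    table = {}
    for f, name in enumerate("ABCD"):
        table[name] = [
            mean(r for r, lv in zip(responses, l9_levels) if lv[f] == level)
            for level in (1, 2, 3)
        ]
    return table


sn = [sn_nominal_is_best(run) for run in observed]
sn_table = response_table(sn)

# Delta is the spread of the level means; a larger delta means a larger effect.
delta = {name: max(vals) - min(vals) for name, vals in sn_table.items()}
ranking = sorted(delta, key=delta.get, reverse=True)

for name in "ABCD":
    print(name, [round(v, 2) for v in sn_table[name]], round(delta[name], 2))
print("Rank order (largest effect first):", ranking)
```

Small differences in the last decimal places relative to the tabulated values are to be expected, since the tables were presumably produced by a statistics package with its own rounding.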
The factor levels predicted by the Taguchi method can be taken as the initial seed values for performing the K-Means algorithm.
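To illustrate how externally chosen seeds replace the random selection in Step 1, here is a minimal pure-Python sketch of K-Means that takes the initial centroids as an argument. It is an assumption-laden illustration: the paper does not specify how the predicted factor levels are expanded into K seed points, so the toy data, the two seed points and the function names `euclidean` and `kmeans` are all hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points, seeds, max_iter=100):
    """K-Means with caller-supplied initial centroids (Step 1).

    Returns the final centroids and the clusters from the last assignment pass.
    """
    centroids = [tuple(s) for s in seeds]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: euclidean(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D toy data and hypothetical seed points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
seeds = [(1.0, 1.0), (8.0, 8.0)]

centroids, clusters = kmeans(points, seeds)
print(centroids)
print(clusters)
```

Because the seeds are passed in explicitly, repeated runs on the same data give the same clusters, which is the reproducibility benefit the paper attributes to a fixed seed-selection mechanism.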
V. CONCLUSION
The Taguchi prediction gives the optimal factor levels. Hence these predicted values can be used as the initial seed values for the K-Means algorithm. This gives better-quality clusters than selecting the initial seed values randomly. It also provides a standard mechanism for choosing the seeds before the final results, that is, the clusters, are obtained.

REFERENCES
[1]. W. H. Yang, Y. S. Tang, Design optimization of cutting parameters for turning operations based on Taguchi method, Journal of Materials Processing Technology, 84, 1998, pp. 122-129.
[2]. Howard S Gitlow, Alan J Oppenheim, Rosa Oppenheim, David M Levine, Quality Management, McGraw Hill, 2009, pp. 427-429.
[3]. Jaharah A Ghani, et al., Philosophy of Taguchi Approach and Method in Design of Experiment, Asian Journal of Scientific Research, 6(1), pp. 27-37, 2013.
[4]. Joydeep Ghosh and Alexander Liu, "K-Means-type Algorithms: A Generalized Convergence Theorem and Characterized Local Optimality", Journal of Machine Learning Research (JMLR), Vol. 6, pp. 1705-1749, 2005.
[5]. Clausius-Clapeyron Equation, Wikipedia (redirected from August-Roche-Magnus Approximation).
[6]. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Elsevier, 2nd Edition.
[7]. Mustafa Zaidi, Bushra A Saeed, I. Amin, Nukman Yusoff, "Experimental Data Mining Techniques", International Journal of Computer Science Issues, Vol. 9, Issue 3, No. 3, May 2012.
[8]. Huei Chun Wang, Chih-Chou Chiu, Chao-Ton Su, "Data Classification using the Mahalanobis-Taguchi System", Journal of the Chinese Institute of Industrial Engineers, Vol. 21, No. 6, pp. 608-618, 2004.