Dr. C. Chandrasekar et al Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 4, Issue 4( Version 1), April 2014, pp.209-211
Clustering Large Databases Using GMM
Dr. C. Chandrasekar*, M. Srisankar**
* (Assistant Professor, Department of Computer Applications, Sree Narayana Guru College, Coimbatore – 105)
** (Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore - 46)
ABSTRACT
Discovering clusters in data is essential for revealing the main links in categorical data regulatory networks.
Previous clustering methods have many problems, especially when grouping data with
mixed data types. This study analyzes those existing methods and presents a new approach for
clustering mixed data items. Many methods exist for clustering data of a similar kind, whereas
only very few methods exist for clustering mixed data items, which leads to the need for a better clustering
technique for the classification of mixed data. For exploratory data analysis, clustering with Gaussian
mixture models is widely used.
Keywords - Clustering methods, High dimensional data, K-means clustering, GMM, Adult data set, Categorical
data
I. INTRODUCTION
Data clustering is a basic data mining technique
used in knowledge discovery. Clustering is one of
the most widely used methods in data mining; its
core is grouping the data according to similarity, as
defined by some distance measure. The
problem of clustering has become increasingly
important in recent years. Data mining offers a suite
of algorithms, each addressing a different task and in
the process elucidating a unique facet of the data [1].
Of the many facets of data mining, we are
particularly interested in clustering problems, i.e. the
process of finding similarities in the data and then
grouping similar data into identifiable clusters.
Cluster analysis or clustering is the task of grouping a
set of objects in such a way that objects in the same
group (called a cluster) are more similar (in some
sense or another) to each other than to those in other
groups (clusters). Common applications of clustering
include document grouping, scientific data analysis
and customer/market segmentation. Usually,
clustering involves partitioning the given data,
which consists of n points in m dimensions, into k
clusters. The clustering must be such that the data
points within a cluster are highly similar to one
another. The problems that exist in clustering
techniques are: identifying a suitable measure of
similarity among the data, finding suitable methods
for identifying similar data in an unsupervised way,
and deriving a description that can efficiently
differentiate the data of one cluster from another.
Clustering is a main task of exploratory data mining
and a common technique for statistical data analysis,
used in many areas, such as machine learning, image
analysis, information retrieval and pattern
recognition.
II. BACKGROUND STUDY
An efficient ensemble method for mixed
numeric and categorical data was proposed by
earlier researchers [4]. Most existing clustering
algorithms focus on numerical data, whose inherent
geometric properties can be exploited naturally
to define distance functions between data points.
However, much of the data in databases is
categorical, where attribute values cannot be
naturally ordered as numerical values. Because of
the differences in the characteristics of these kinds
of data, attempts to develop criterion functions for
mixed data [2] have not been successful. The previous
method uses a novel divide-and-conquer technique to
overcome this issue. First, the original mixed dataset
is divided into sub-datasets (numeric and
categorical); next, existing well-established
clustering algorithms designed for each type of
dataset are applied to produce the corresponding
clusters. Finally, the clustering results on the
categorical and numeric datasets are combined as a
categorical dataset, on which a clustering method
for categorical data is used to produce the final
result. The main contribution of this study is to
present an algorithm for the mixed-attribute
clustering problem, into which existing clustering
algorithms such as k-means [5], shown in Fig 1, can
be easily integrated. The majority of existing
clustering algorithms encounter serious scalability
and/or accuracy related problems when used on
databases with a large number of records and/or
attributes. Only a few methods can handle numeric,
nominal, and mixed data.
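The divide-and-conquer procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [4]: the toy records, the attribute choices, and the helper names (`kmeans`, `kmodes`, `cluster_mixed`) are hypothetical, plain k-means stands in for the numeric clusterer, and a simple k-modes routine (in the spirit of [5]) stands in for the categorical clusterer.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on numeric tuples; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def kmodes(rows, k, iters=20, seed=0):
    """Plain k-modes on categorical tuples; returns one cluster label per row."""
    modes = random.Random(seed).sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: mismatches(r, modes[c])) for r in rows]
        # Update step: each mode takes the most frequent value per attribute.
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels

def cluster_mixed(records, k):
    """Split, cluster each part, then cluster the combined label pairs."""
    numeric = [num for num, _ in records]
    categorical = [cat for _, cat in records]
    num_labels = kmeans(numeric, k)
    cat_labels = kmodes(categorical, k)
    # The two label columns form a new, purely categorical dataset.
    combined = [(f"n{a}", f"c{b}") for a, b in zip(num_labels, cat_labels)]
    return kmodes(combined, k)

# Hypothetical records: ((age, hours per week), (marital status, work class)).
records = [
    ((25, 38), ("single", "private")),
    ((28, 40), ("single", "private")),
    ((31, 42), ("single", "government")),
    ((52, 45), ("married", "government")),
    ((58, 50), ("married", "private")),
    ((61, 48), ("married", "government")),
]
labels = cluster_mixed(records, k=2)
print(labels)
```

Because the final step only sees cluster labels, any numeric or categorical clusterer could be substituted for the two stand-ins without changing `cluster_mixed`.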
RESEARCH ARTICLE OPEN ACCESS
Fig 1. Clustering high dimensional data using k-means
III. METHODOLOGY
Clustering in high dimensional spaces is
often problematic, as theoretical results [3]
question the meaning of the nearest neighbor in high
dimensional spaces. When the data items are
categorical or mixed, similarity is difficult to
determine with the Euclidean distance. The enormous
amounts of data collected from different fields, such
as banks, the health sector, universities, colleges,
schools, web-log data and biological sequence data,
are categorical and need a good clustering method.
It is very difficult to group categorical data into
meaningful categories: this requires a good distance
measure that captures data similarity sufficiently,
used in combination with an effective clustering
algorithm, and for mixed numeric and categorical
data only very few methods are available. Many
methods exist for clustering data of a similar kind,
whereas only very few methods exist for clustering
mixed data items, which leads to the need for a
better clustering technique for the classification of
mixed data. The clusters identified in the present
study differ somewhat from those of earlier studies.
Previous methods often cluster the Adult dataset into
groups of persons by age, salary, marital status
(married or unmarried), country and cognitive
performance. Even when the test performance on the
Adult dataset was compared with that of the male and
female categories, the categories in the present
study produced few results that would be
considered clinically impaired. For exploratory data
analysis, clustering with Gaussian mixture
models is widely used.
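The difficulty of applying Euclidean distance to categorical attributes can be illustrated with a small sketch. The category names and integer codes below are hypothetical and chosen only for illustration. Arbitrary integer codes impose an ordering that the categories do not have, whereas binary (one-hot) encoding makes every pair of distinct categories equally distant.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical integer codes for a nominal attribute (marital status).
codes = {"married": 0, "single": 1, "divorced": 2}

# With integer codes, "divorced" appears twice as far from "married" as
# "single" does, although the categories have no natural order.
d_married_single = euclidean([codes["married"]], [codes["single"]])
d_married_divorced = euclidean([codes["married"]], [codes["divorced"]])

def one_hot(value, categories):
    """Binary indicator vector with a 1 in the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]

categories = sorted(codes)

# With one-hot vectors, all distinct categories are sqrt(2) apart.
h_married_single = euclidean(one_hot("married", categories), one_hot("single", categories))
h_married_divorced = euclidean(one_hot("married", categories), one_hot("divorced", categories))

print(d_married_single, d_married_divorced)
print(h_married_single, h_married_divorced)
```

One-hot encoding removes the artificial ordering, but it also discards any notion of graded similarity between categories, which is one reason dedicated methods for mixed data are needed.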
Fig 2. GMM Clustering
IV. EXPERIMENTAL RESULT
The experimental evaluation of the proposed
technique was carried out using the Adult
Data Set. The number of clusters produced by the
proposed method is smaller than with the
other approaches, and no outliers were found
using the proposed method. The proposed algorithm
uses an active sampling mechanism and requires
at most a single scan through the data. The
ability to generalize the findings to other
instances is limited in many approaches. One
possible merit lies in the operational definition
of age, salary, marital status (married or unmarried)
and country used in this study, as in Fig 4. We
demonstrate the high quality of the clustering
solutions that were found, their explanatory power,
and the proposed model's good scalability. The GMM
method has good accuracy, uses a single tunable
parameter, and can operate successfully with limited
memory resources.
4.1 GMM Method
Gaussian mixture models [6] are among
the most statistically mature methods for
clustering. In this form of clustering, it is assumed
that individual data points are generated by first
choosing one of a set of multivariate Gaussians and
then sampling from it. Under this assumption,
clustering becomes a well-defined computational
operation. Learning such a model from data turns out
to be an optimization problem, and the optimization
method used is called Expectation
Maximization (EM).
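The EM procedure for a Gaussian mixture can be sketched as below. This is a minimal illustration and not the modified GMM technique used in this study: it is restricted to one-dimensional data for brevity (the text assumes multivariate Gaussians), the initialization scheme is an assumption, and the function name `em_gmm_1d` and the synthetic data are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means spread over the sorted data,
    # a shared overall variance, and equal mixing weights.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall = sum(data) / n
    variances = [sum((x - overall) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the
        # responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances

# Synthetic data drawn from two well-separated Gaussians (means 0 and 10).
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
weights, means, variances = em_gmm_1d(data)
print(weights, means, variances)
```

A hard cluster assignment, if needed, can be obtained by assigning each point to the component with the highest responsibility.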
4.2 Adult data set
The Adult data set from the UCI repository
contains a large amount of data with both numeric and
nominal values. There are a total of 18,322 records,
split in a 2:1 ratio to form a train and test set. The
target class groups persons by age, income and marital
status (married or unmarried). After converting the nominal
attributes into binary values, the training data were
clustered into 14 clusters. These clusters were then
labeled with the predominant class. On the test data,
k-means accuracy was 77.2%, whereas the Gaussian
mixture model achieved an accuracy of 91%. The
distance-based k-means algorithm produced clusters
that did not separate the rare class well from the
dominant class.
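The evaluation step described above, labeling each cluster with its predominant class and then scoring the test data, can be sketched as follows. The cluster assignments, the class names "low" and "high", and the helper names are hypothetical; the sketch assumes that a fitted clustering model has already assigned every training and test record to a cluster.

```python
from collections import Counter

def label_clusters(cluster_ids, classes):
    """Map each cluster id to the most frequent class among its members."""
    counts = {}
    for cid, cls in zip(cluster_ids, classes):
        counts.setdefault(cid, Counter())[cls] += 1
    return {cid: c.most_common(1)[0][0] for cid, c in counts.items()}

def accuracy(cluster_ids, classes, cluster_to_class):
    """Fraction of records whose cluster label matches their true class."""
    hits = sum(cluster_to_class.get(cid) == cls for cid, cls in zip(cluster_ids, classes))
    return hits / len(classes)

# Hypothetical training assignments and true classes.
train_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
train_classes = ["low", "low", "high", "high", "high", "low", "low", "low"]
mapping = label_clusters(train_clusters, train_classes)

# Hypothetical test assignments: each test record is predicted to have
# the predominant class of the cluster it falls into.
test_clusters = [0, 1, 2, 1]
test_classes = ["low", "high", "high", "high"]
acc = accuracy(test_clusters, test_classes, mapping)
print(mapping, acc)
```

With an imbalanced target, a cluster that mixes the rare and dominant classes is always labeled with the dominant one, so every rare-class record in it counts as an error.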
Fig 3. Scatter Plot of Multimodal Data
Fig 4. Plotting Multimodal Data with age, salary,
married,
V. CONCLUSION
This research focuses on an effective clustering
method for mixed-type data. Various methods are
available for clustering categorical data, but
those techniques have some drawbacks. To
overcome this issue, clustering of mixed data can
be performed using many approaches, but these
approaches are time-consuming for
classification. To address this, a modified Gaussian
Mixture Model technique is used in this study on
the Adult dataset. The experiment is performed with
the Adult Data Set, and it can be observed that
the proposed technique achieves an improved
classification result compared to the previous
experimental methods. The study suggests that the
proposed approach can be used to cluster mixed
data items with better classification accuracy.
REFERENCES
[1] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, ….
Advances in Knowledge
Discovery and Data Mining. AAAI Press,
Menlo Park, CA, 1996.
[2] Hari Prasad, D and M. Punithavilli, 2010a.
A review on data clustering algorithms for
mixed data. Global J. Comput. Sci.
Technol., 10: 43-48.
[3] K. Beyer, J. Goldstein, R. Ramakrishnan,
and U. Shaft. When is nearest neighbor
meaningful? In Proc. of the Int. Conf. on
Database Theory, pages 217–235, 1999.
[4] Reddy, M.V.J. and B. Kavitha, 2010.
Efficient ensemble algorithm for mixed
numeric and categorical data. Proceedings of
the IEEE International Conference on
Computational Intelligence and Computing
Research, Dec. 28-29, IEEE Xplore Press,
Coimbatore, pp: 1-4. DOI:
10.1109/ICCIC.2010.5705738.
[5] Huang, Z. (1998). Extensions to the K-means
Algorithm for Clustering Large
Datasets with Categorical Values. Data
Mining and Knowledge Discovery, 2, pp.
283-304.
[6] K. Zhang and J. T. Kwok. Simplifying
mixture models through function
approximation. In B. Scholkopf, J. Platt, and
T. Hoffman, editors, Advances in Neural
Information Processing Systems 17, pages
1577–1584, Cambridge, MA, 2007. MIT
Press.

More Related Content

What's hot

EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
ijistjournal
 
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
ijeei-iaes
 
Ijetr021251
Ijetr021251Ijetr021251
Ijmet 10 02_050
Ijmet 10 02_050Ijmet 10 02_050
Ijmet 10 02_050
IAEME Publication
 
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Universitas Pembangunan Panca Budi
 
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
A Formal Machine Learning or Multi Objective Decision Making System for Deter...A Formal Machine Learning or Multi Objective Decision Making System for Deter...
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
Editor IJCATR
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
ijistjournal
 
Du35687693
Du35687693Du35687693
Du35687693
IJERA Editor
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD Editor
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESS
IJDKP
 
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and PredictionUsing ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
ijtsrd
 
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
IJECEIAES
 
A Systematic Review on the Educational Data Mining and its Implementation in ...
A Systematic Review on the Educational Data Mining and its Implementation in ...A Systematic Review on the Educational Data Mining and its Implementation in ...
A Systematic Review on the Educational Data Mining and its Implementation in ...
United International Journal for Research & Technology
 
[IJCT-V3I2P26] Authors: Sunny Sharma
[IJCT-V3I2P26] Authors: Sunny Sharma[IJCT-V3I2P26] Authors: Sunny Sharma
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
iosrjce
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
IJERA Editor
 
Implementation of Prototype Based Credal Classification approach For Enhanced...
Implementation of Prototype Based Credal Classification approach For Enhanced...Implementation of Prototype Based Credal Classification approach For Enhanced...
Implementation of Prototype Based Credal Classification approach For Enhanced...
IRJET Journal
 
Ijetr021252
Ijetr021252Ijetr021252
Towards reducing the
Towards reducing theTowards reducing the
Towards reducing the
IJDKP
 

What's hot (19)

EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
 
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
 
Ijetr021251
Ijetr021251Ijetr021251
Ijetr021251
 
Ijmet 10 02_050
Ijmet 10 02_050Ijmet 10 02_050
Ijmet 10 02_050
 
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
 
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
A Formal Machine Learning or Multi Objective Decision Making System for Deter...A Formal Machine Learning or Multi Objective Decision Making System for Deter...
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
 
Du35687693
Du35687693Du35687693
Du35687693
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESS
 
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and PredictionUsing ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
 
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
 
A Systematic Review on the Educational Data Mining and its Implementation in ...
A Systematic Review on the Educational Data Mining and its Implementation in ...A Systematic Review on the Educational Data Mining and its Implementation in ...
A Systematic Review on the Educational Data Mining and its Implementation in ...
 
[IJCT-V3I2P26] Authors: Sunny Sharma
[IJCT-V3I2P26] Authors: Sunny Sharma[IJCT-V3I2P26] Authors: Sunny Sharma
[IJCT-V3I2P26] Authors: Sunny Sharma
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
 
Implementation of Prototype Based Credal Classification approach For Enhanced...
Implementation of Prototype Based Credal Classification approach For Enhanced...Implementation of Prototype Based Credal Classification approach For Enhanced...
Implementation of Prototype Based Credal Classification approach For Enhanced...
 
Ijetr021252
Ijetr021252Ijetr021252
Ijetr021252
 
Towards reducing the
Towards reducing theTowards reducing the
Towards reducing the
 

Viewers also liked

Ai4301188192
Ai4301188192Ai4301188192
Ai4301188192
IJERA Editor
 
Emotional intelligence
Emotional intelligence Emotional intelligence
Emotional intelligence
IMH chennai
 
Research on Automobile Exterior Color and Interior Color Matching
Research on Automobile Exterior Color and Interior Color MatchingResearch on Automobile Exterior Color and Interior Color Matching
Research on Automobile Exterior Color and Interior Color Matching
IJERA Editor
 
Mirce Functionability Equation
Mirce Functionability EquationMirce Functionability Equation
Mirce Functionability Equation
IJERA Editor
 
Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...
Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...
Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...
IJERA Editor
 
Ae04605217226
Ae04605217226Ae04605217226
Ae04605217226
IJERA Editor
 
U0440697101
U0440697101U0440697101
U0440697101
IJERA Editor
 
T4301102105
T4301102105T4301102105
T4301102105
IJERA Editor
 
A43020105
A43020105A43020105
A43020105
IJERA Editor
 
A044080105
A044080105A044080105
A044080105
IJERA Editor
 
Efficient Facial Expression and Face Recognition using Ranking Method
Efficient Facial Expression and Face Recognition using Ranking MethodEfficient Facial Expression and Face Recognition using Ranking Method
Efficient Facial Expression and Face Recognition using Ranking Method
IJERA Editor
 
Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...
Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...
Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...
IJERA Editor
 
I044064447
I044064447I044064447
I044064447
IJERA Editor
 
Shortest Tree Routing With Security In Wireless Sensor Networks
Shortest Tree Routing With Security In Wireless Sensor NetworksShortest Tree Routing With Security In Wireless Sensor Networks
Shortest Tree Routing With Security In Wireless Sensor Networks
IJERA Editor
 
Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...
Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...
Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...
IJERA Editor
 
S04506102109
S04506102109S04506102109
S04506102109
IJERA Editor
 
UNICEF: Believe in zero campaign 2009
UNICEF: Believe in zero campaign 2009UNICEF: Believe in zero campaign 2009
UNICEF: Believe in zero campaign 2009
Post Media
 
A Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic Structure
A Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic StructureA Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic Structure
A Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic Structure
IJERA Editor
 
M045077578
M045077578M045077578
M045077578
IJERA Editor
 
X4502151157
X4502151157X4502151157
X4502151157
IJERA Editor
 

Viewers also liked (20)

Ai4301188192
Ai4301188192Ai4301188192
Ai4301188192
 
Emotional intelligence
Emotional intelligence Emotional intelligence
Emotional intelligence
 
Research on Automobile Exterior Color and Interior Color Matching
Research on Automobile Exterior Color and Interior Color MatchingResearch on Automobile Exterior Color and Interior Color Matching
Research on Automobile Exterior Color and Interior Color Matching
 
Mirce Functionability Equation
Mirce Functionability EquationMirce Functionability Equation
Mirce Functionability Equation
 
Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...
Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...
Degradation and Microbiological Validation of Meropenem Antibiotic in Aqueous...
 
Ae04605217226
Ae04605217226Ae04605217226
Ae04605217226
 
U0440697101
U0440697101U0440697101
U0440697101
 
T4301102105
T4301102105T4301102105
T4301102105
 
A43020105
A43020105A43020105
A43020105
 
A044080105
A044080105A044080105
A044080105
 
Efficient Facial Expression and Face Recognition using Ranking Method
Efficient Facial Expression and Face Recognition using Ranking MethodEfficient Facial Expression and Face Recognition using Ranking Method
Efficient Facial Expression and Face Recognition using Ranking Method
 
Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...
Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...
Rain Water Harvesting and Impact of Microbial Pollutants on Ground Water Rese...
 
I044064447
I044064447I044064447
I044064447
 
Shortest Tree Routing With Security In Wireless Sensor Networks
Shortest Tree Routing With Security In Wireless Sensor NetworksShortest Tree Routing With Security In Wireless Sensor Networks
Shortest Tree Routing With Security In Wireless Sensor Networks
 
Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...
Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...
Growth and Characterization of Barium doped Potassium Hydrogen Phthalate Sing...
 
S04506102109
S04506102109S04506102109
S04506102109
 
UNICEF: Believe in zero campaign 2009
UNICEF: Believe in zero campaign 2009UNICEF: Believe in zero campaign 2009
UNICEF: Believe in zero campaign 2009
 
A Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic Structure
A Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic StructureA Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic Structure
A Survey Analysis on CMOS Integrated Circuits with Clock-Gated Logic Structure
 
M045077578
M045077578M045077578
M045077578
 
X4502151157
X4502151157X4502151157
X4502151157
 

Similar to Ae044209211

Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
IJECEIAES
 
U0 vqmtq2otq=
U0 vqmtq2otq=U0 vqmtq2otq=
F04463437
F04463437F04463437
F04463437
IOSR-JEN
 
Curse of Dimensionality in Paradoxical High Dimensional Clinical Datasets � A...
Curse of Dimensionality in Paradoxical High Dimensional Clinical Datasets � A...Curse of Dimensionality in Paradoxical High Dimensional Clinical Datasets � A...
Curse of Dimensionality in Paradoxical High Dimensional Clinical Datasets � A...
ijcnes
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
idescitation
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
eSAT Journals
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
eSAT Publishing House
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESS
IJDKP
 
Student Performance Evaluation in Education Sector Using Prediction and Clust...
Student Performance Evaluation in Education Sector Using Prediction and Clust...Student Performance Evaluation in Education Sector Using Prediction and Clust...
Student Performance Evaluation in Education Sector Using Prediction and Clust...
IJSRD
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
IJCSIS Research Publications
 
Multi-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data MiningMulti-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data Mining
IOSR Journals
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problem
csandit
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
IJECEIAES
 
An approach for improved students’ performance prediction using homogeneous ...
An approach for improved students’ performance prediction  using homogeneous ...An approach for improved students’ performance prediction  using homogeneous ...
An approach for improved students’ performance prediction using homogeneous ...
IJECEIAES
 
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUESTUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
IJDKP
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
PRAWEEN KUMAR
 
B0930610
B0930610B0930610
B0930610
IOSR Journals
 
H017625354
H017625354H017625354
H017625354
IOSR Journals
 

Ae044209211

Dr. C. Chandrasekar et al, Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 4, Issue 4 (Version 1), April 2014, pp. 209-211
Clustering Large Databases Using Gmm
Dr. C. Chandrasekar*, M. Srisankar**
* (Assistant Professor, Department of Computer Applications, Sree Narayana Guru College, Coimbatore – 105)
** (Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore - 46)
ABSTRACT
Discovering clusters in data is essential for revealing the main links in categorical data regulatory networks. Previous clustering methods have many problems, especially when grouping data with mixed data types. This study analyzes those existing methods and proposes a new approach for clustering mixed data items. Many methods exist for clustering data of a single, similar kind, whereas only very few exist for clustering mixed data items, which creates the need for a better clustering technique for classifying mixed data. In exploratory data analysis, clustering with Gaussian mixture models is widely used.
Keywords - Clustering methods, High dimensional data, K-means clustering, GMM, Adult data set, Categorical data
I. INTRODUCTION
Data clustering is a basic data mining technique in knowledge discovery. Clustering is among the most widely used methods in data mining, and its core is grouping the whole data set by similarity, as judged by some distance measure. The problem of clustering has become increasingly important in recent years. Data mining offers a suite of algorithms, each addressing a different task and in the process elucidating a unique facet of the data [1]. Of the many facets of data mining, we are particularly interested in clustering problems, i.e. the process of finding similarities in the data and then grouping similar data into identifiable clusters.
Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). Typical applications of clustering include document grouping, scientific data analysis and customer/market segmentation. Usually, clustering involves partitioning the given data, which consist of n points in m dimensions, into k clusters, such that the data points within a cluster are highly similar to one another. The problems that arise in clustering techniques are: identifying a suitable measure of similarity among various data; finding suitable methods for identifying similar data in an unsupervised way; and deriving a description that efficiently distinguishes the data of one cluster from the rest. Clustering is a vital task of exploratory data mining and a common method for statistical data analysis, used in many areas such as machine learning, image analysis, information retrieval and pattern recognition.
II. BACKGROUND STUDY
An efficient ensemble method for mixed numeric and categorical data was proposed by earlier researchers [4]. Most existing clustering algorithms focus on numerical data, whose inherent geometric properties can be exploited naturally to define distance functions between data points. However, much of the data held in databases is categorical, where attribute values cannot be naturally ordered as numerical values. Because of the differences in the characteristics of these two kinds of data, attempts to develop criterion functions for mixed data [2] have not been successful. The previous method is a novel divide-and-conquer technique that addresses this issue.
First, the original mixed dataset is divided into sub-datasets by attribute type. Next, existing well-established clustering methods developed for each type of dataset are applied to produce the corresponding clusters. Finally, the clustering results on the categorical and numeric datasets are combined as a categorical dataset, on which the clustering method is then applied to produce the final result. The main contribution of this experimental study is an algorithm for the mixed-attribute clustering problem, into which existing clustering algorithms such as k-means [5], as shown in Fig 1, can be easily integrated. The majority of existing clustering algorithms encounter serious scalability and/or accuracy problems when used on databases with a large number of records and/or attributes. Only a few methods can handle numeric, nominal, and mixed data.
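The divide-and-conquer procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the numeric part is clustered with plain k-means, the categorical part with a k-modes-style variant (simple-matching distance, per-attribute mode as the centre), and the two label vectors are then clustered again as a two-attribute categorical dataset. The function names (`lloyd`, `cluster_mixed`), the deterministic "first k distinct rows" initialisation and the six toy records are all hypothetical.

```python
from collections import Counter


def first_distinct(rows, k):
    """Deterministic initial centres: the first k distinct rows, in input order."""
    seen = []
    for row in rows:
        if row not in seen:
            seen.append(row)
        if len(seen) == k:
            return seen
    raise ValueError("need at least k distinct rows")


def lloyd(rows, k, dist, centre, iters=20):
    """Alternate between assigning rows to the nearest centre and updating centres."""
    centres = first_distinct(rows, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(row, centres[c])) for row in rows]
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = centre(members)
    return labels


def sq_euclid(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(members):
    """Component-wise mean of numeric tuples (k-means centre)."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def mismatches(p, q):
    """Simple-matching distance: number of attributes that differ."""
    return sum(a != b for a, b in zip(p, q))


def mode(members):
    """Per-attribute most frequent value (k-modes centre)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def cluster_mixed(numeric, categorical, k):
    """Cluster each sub-dataset, then cluster the pair of label vectors."""
    num_labels = lloyd(numeric, k, sq_euclid, mean)
    cat_labels = lloyd(categorical, k, mismatches, mode)
    combined = list(zip(num_labels, cat_labels))  # treated as categorical data
    return lloyd(combined, k, mismatches, mode)


# Hypothetical toy records: (age, hours per week) and (work class, marital status).
numeric = [(25, 40), (27, 38), (30, 42), (55, 20), (58, 18), (60, 22)]
categorical = [("private", "single"), ("private", "single"), ("private", "married"),
               ("gov", "married"), ("gov", "married"), ("gov", "married")]
final = cluster_mixed(numeric, categorical, 2)
```

Because the combined dataset holds only cluster labels, the last step never compares numeric and categorical values directly, which is the point of splitting the data first.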
Fig 1. Clustering high dimensional data using k-means
III. METHODOLOGY
Clustering in high dimensional spaces is often problematic, as theoretical results [3] have questioned the meaning of nearest-neighbour matching in high dimensional spaces. When the data items are categorical or mixed, similarity is difficult to measure with the Euclidean distance. The enormous volumes of categorical data collected from different fields, such as banks, the health sector, universities, colleges, schools, web logs and biological sequences, need a good clustering method. It is very difficult to group categorical data into meaningful categories: this requires a good distance measure that captures data similarity well, used in combination with an effective clustering algorithm, and for mixed numeric and categorical data only very few methods are available. This leads to the need for a better clustering technique for classifying mixed data. The clusters identified in the present study differ somewhat from those of earlier studies. Previous methods often cluster the adult dataset into groups of persons by age, salary, marital status (married or unmarried) and country cognitive performance. Even when adult dataset test performance was compared with that of the male and female categories, the categories in the present study produced few performances that would be considered clinically impaired. In exploratory data analysis, clustering with Gaussian mixture models is widely used.
Fig 2. GMM Clustering
IV. EXPERIMENTAL RESULT
The experimental study of the proposed technique was carried out using the Adult Data Set.
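For readers who want a concrete picture of the Gaussian mixture clustering named at the end of Section III, here is a minimal sketch of fitting a mixture by Expectation Maximization. It is an illustration under simplifying assumptions, not the modified GMM evaluated in this paper: the data are one-dimensional, the ages are made up, and the starting values come from splitting the sorted data into equal chunks. The names `fit_gmm_1d` and `normal_pdf` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture to xs with EM."""
    n = len(xs)
    # Deterministic start (an assumption of this sketch): sort the data and
    # split it into k chunks, then use each chunk's mean, variance and share.
    ordered = sorted(xs)
    size = n // k
    chunks = [ordered[j * size:(j + 1) * size] for j in range(k - 1)]
    chunks.append(ordered[(k - 1) * size:])
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), min_var)
                 for c, m in zip(chunks, means)]
    weights = [len(c) / n for c in chunks]

    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var)
    return weights, means, variances, resp


# Hypothetical ages: five younger and three older persons.
ages = [22, 25, 27, 30, 33, 60, 63, 66]
weights, means, variances, resp = fit_gmm_1d(ages)
# Hard cluster label = component with the largest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

Unlike k-means, each point receives a soft responsibility for every component, and each component has its own variance and weight, so clusters of unequal size and spread can be modelled. The floor `min_var` is only a guard against a component collapsing onto a single point.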
The number of clusters produced by the proposed method is smaller than with the other approaches, and no outliers were found using the proposed method. The proposed algorithm uses an active sampling mechanism and requires at most a single scan through the data. The ability to generalize our findings to other instances is limited in many approaches. One possible merit lies in the operational definition of age, salary, marital status (married or unmarried) and country used in this study, as in Fig 4. We demonstrate the high quality of the clustering solutions found, their explanatory power, and the proposed model's good scalability. The GMM method has good accuracy, uses a single tunable parameter, and can function successfully with limited memory resources.
4.1 GMM Method
Gaussian Mixture Models [6] are among the most statistically mature approaches to clustering. This form of clustering assumes that individual data points are generated by first choosing one of a set of multivariate Gaussians and then sampling from it. Learning such a model from data can be treated as a well-defined operation, and it turns out to be an optimization problem. The optimization method used is called Expectation Maximization (EM).
4.2 Adult data set
The Adult data set from the UCI repository contains a large amount of data with both numeric and nominal values. There are a total of 18,322 records, split in a 2:1 ratio to form a training set and a test set. The target class describes a person by age, income and marital status (married or unmarried). After converting the nominal attributes into binary values, the training data were clustered into 14 clusters. These clusters were then labeled with the predominant class. On the test data, k-Means accuracy was 77.2%, whereas the Gaussian Mixture Model achieved an accuracy of 91%. The distance-based k-Means algorithm produced clusters that did not separate the rare class well from the dominant class.
Fig 3. Scatter Plot of Multimodal Data
Fig 4. Plotting Multimodal Data with age, salary, married,
V. CONCLUSION
This research focuses on an effective clustering method for mixed-category data. Various methods are available for clustering categorical data, but those techniques have drawbacks. To overcome this, clustering of mixed data can be performed using many approaches, but these approaches take more time for classification. To address this, a new modified Gaussian Mixture Model technique is used in this study on the Adult dataset. The experiment was performed using the Adult Data Set, and it shows that the proposed technique achieves an improved classification result compared with the previous experimental methods. The study suggests that the proposed approach can be used to cluster mixed data items with better classification accuracy.
REFERENCES
[1] U. Fayyad, G. Piatetsky-Shapiro, Psmthy… - citeulike.org. Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA, 1996.
[2] Hari Prasad, D. and M. Punithavilli, 2010a. A review on data clustering algorithms for mixed data. Global J. Comput. Sci. Technol., 10: 43-48.
[3] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbors meaningful. In Proc. of the Int. Conf. Database Theories, pages 217–235, 1999.
[4] Reddy, M.V.J. and B. Kavitha, 2010. Efficient ensemble algorithm for mixed numeric and categorical data. Proceedings of the IEEE International Conference on Computational Intelligence and Computing Research, Dec. 28-29, IEEE Xplore Press, Coimbatore, pp: 1-4. DOI: 10.1109/ICCIC.2010.5705738.
[5] Huang, Z. (1998). Extensions to the K-means Algorithm for Clustering Large Datasets with Categorical Values. Data Mining and Knowledge Discovery, 2, p. 283-304 (Publitemid 128695480)
[6] K. Zhang and J. T. Kwok. Simplifying mixture models through function approximation. In B. Scholkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 17, pages 1577–1584, Cambridge, MA, 2007. MIT Press.