SlideShare a Scribd company logo
International Journal of Informatics and Communication Technology (IJ-ICT)
Vol.6, No.3, December 2017, pp. 199~208
ISSN: 2252-8776, DOI: 10.11591/ijict.v6i3.pp199-208  199
Journal homepage: http://iaesjournal.com/online/index.php/IJICT
Density Based Clustering with Integrated One-Class SVM for
Noise Reduction
K. Nafees Ahmed*, Dr. T. Abdul Razak
*Departement of Computer Science, Jamal Mohamed College, Tamil nadu, India
Article Info ABSTRACT
Article history:
Received Aug 12nd
, 2017
Revised Oct 26th
, 2017
Accepted Nov 6th
, 2017
Information extraction from data is one of the key necessities for data
analysis. Unsupervised nature of data leads to complex computational
methods for analysis. This paper presents a density based spatial clustering
technique integrated with one-class Support Vector Machine (SVM), a
machine learning technique for noise reduction, a modified variant of
DBSCAN called Noise Reduced DBSCAN (NRDBSCAN). Analysis of
DBSCAN exhibits its major requirement of accurate thresholds, absence of
which yields suboptimal results. However, identifying accurate threshold
settings is unattainable. Noise is one of the major side-effects of the threshold
gap. The proposed work reduces noise by integrating a machine learning
classifier into the operation structure of DBSCAN. The Experimental results
indicate high homogeneity levels in the clustering process.
Keyword:
Clustering
DBSCAN
Machine Learning Classifier
Noise Reduction
One-class SVM Copyright © 2017 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
K. Nafees Ahmed,
Departement of Computer Science,
Jamal Mohamed College,
7 Race Course Road, Khaja Nagar, Tiruchirappalli, Tamil Nadu, 620020, India.
Email: nafeesjmc@gmail.com
1. INTRODUCTION
The current Internet age is data rich, but information poor. Meaningful data is the chief requirement
of the current age. This has led to an enormous increase in the data processing techniques. However, the
major drawback is inherently embedded in the data itself. The distribution of datasets vary and hence a single
technique that was designed to process data from a domain would not possibly provide effective results when
applied to the data from the same domain, due to variations in the data distribution levels as time progresses.
Hence techniques that are as ductile as the data itself are the only ones that can persist.
Unsupervised data processing has always been a challenge due to their unpredictable nature. Ductile
and tractable algorithms are the major requirements for processing such data. Domains with such
requirements range from image processing to web information processing. The requirements for such
processing techniques are constantly on the raise, with the increase in the amount of data being available.
Extracting meaningful information from such data requires grouping them to find commonalities
existing between them. This aids in better interpretation of data. Clustering is the process of grouping data
such that data within a group/cluster is more cohesive compared to data in different clusters. The process of
clustering makes the data meaningful for various applications. Clustering methods are usually classified as
partitional and hierarchical clustering techniques. Partitional clustering techniques perform flat clustering
based on a single decision criterion such as distance or density. Distance based clustering techniques include
K-Means [1] clustering and CLARA [2] to name a few. Density based clustering techniques are currently on
the raise due to their flexible operational nature and the stable solution sets generated by them. Density based
clustering techniques include DBSCAN [3] and DenClue [4], OPTICS [5] to name a few. Hierarchical
clustering algorithms are divided into agglomerative and divisive, corresponding to their basic operational
nature, bottom-up or top-down [6].
 ISSN: 2252-8776
IJECE Vol. 6, No. 3, December 2017 : 199– 208
200
Density based clustering techniques are currently on the increase due to the increase in the
requirements for spatial data processing. However, most of these techniques are derivatives from DBSCAN,
extending it according to their operating domains. Extensions are usually in terms of reduction in time,
parameter automation, input reduction in terms of feature selection and ability to handle varied densities.
FDBSCAN [7] and IDBSCAN [8] are two approaches concentrating on the reduction in time component.
Both operate by identifying a minimal set of highly representative points for their operation rather than
utilizing the entire data points. In addition to this, the IDBSCAN utilized a grid based data selection for
reducing the input levels.
Parameter fine-tuning based algorithms includes k-VDBSCAN [9], DBCLASD [10] and APSCAN
[11]. The k-VDBSCAN is a parameter free technique that automatically identifies the parameters.
DBCLASD operates on large data and is a parameter free clustering technique, enabling gradual cluster
expansion on the basis of the neighbors and their densities. APSCAN utilizes Affinity Propagation technique
to identify the cluster densities based on local data variations.
A density based clustering technique aiding in the discovery of clusters with varying densities
within a single dataset was proposed by Zhu et al in [12]. In general, the input data is considered to contain
data distributed in uniform densities. However, some unusual real time data such as population maps tend to
contain such data. This leads to a complex issue, as increasing the density levels for the entire process
includes several outliers into clusters, while reducing the density levels misses several legitimate clusters.
Two approaches exist in literature to solve this issue, namely modifying the algorithm appropriately and
rescaling the data. The technique proposed in [12] utilizes the latter by rescaling the data to appropriately
identify the clusters. DBSCAN is applied to the rescaled data to identify clusters. A similar technique that
identifies varied density based clusters was proposed by Louhichi et al. in [13]. The operational process of
this technique is divided into two phases. The first phase identifies the density levels of the input data using
the exponential spline technique on the distance matrix. Second phase utilizes the density values determined
from the first phase as local threshold parameters to identify clusters. Other density based clustering
techniques include VDBSCAN [14], GMDBSCAN [15], DDSC [16], EDBSCAN [17] etc.
This paper concentrates on developing a density based spatial clustering technique. DBSCAN, being
the precursor of such techniques, is adopted by most of the techniques involving non-uniform clusters. It was
identified that DBSCAN has high potency of generating outliers. On further assessment it was identified that
several of these outliers effectively fit into the formed clusters. However, they still remain outliers due to the
parameters defined for the clustering process. The parameter setting process is always optimal and is
identified by the data experts through data analysis and trial and error. As resolving this issue is a complex
task, this work presents NRDBSCAN with one-class SVM to reduce the noise levels.
2. RESEARCH METHOD
Density based spatial clustering proposes a grouping mechanism, that operates on the basis of both
distance and the node density. The major advantage of using such an approach is that it does not rely on
centroid based operations, hence the inconsistencies in the formation of clusters are eliminated. Further,
density based clustering is an unsupervised approach with no prior information requirements. Density based
clustering approaches has the ability to identify clusters of arbitrary shapes and sizes rather than being
confined to the traditional circular clusters. It was identified that DBSCAN based techniques exhibits high
noise levels. Even though the base clustering process was identified to be efficient, the noise levels generated
by DBSCAN were found to be excessive. This paper proposes NRDBSCAN, a hybridized density based
clustering approach that initially clusters the data, after which it tries to incorporate noise into any of the
existing definite clusters, hence reducing the noise levels.
Prior to the actual clustering process, the input data is processed and is converted to the required
format. A schema analysis is performed on the data to identify the data types of the contents. The proposed
architecture accepts only numerical data, hence textual data are eliminated and categorical and ordinal data
are converted to numerical formats. The data pre-processing is followed by the actual clustering process.
Algorithm for the proposed NRDBSCAN is shown below.
IJ-ICT ISSN: 2252-8776 
Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed)
201
NRDBSCAN Algorithm
1. Input base data
2. Input threshold levels (minPts, maxDist)
3. Initialize first cluster with a random node n
4. neighbors  identifyNeighbors(n)
5. If the neighbour count<minPts,
a. Set n into a separate cluster (Outlier)
6. Else expandCluster(n,neighbors)
7. Perform this process until all the nodes are grouped into a
cluster
8. For each identified cluster C
a. If density of C < 1
i. noise  C
9. Until Noise is not empty
a. For each identified cluster C
i. Prediction predict(C,noise)
ii. Add all true predictions to cluster C
iii. Delete corresponding entries from noise
b. if no entries are deleted for the last two iterations
i. Allocate new clusters for each entry in noise
ii. Empty noise
function identifyNeighbors (n)
1. For all nodes n1 in data
a. If distance(n,n1)<maxDist
i. Add n1 to neighborList
2. return neighborList
function expandCluster(n,neighbors)
1. Add n to the cluster C
2. For each neighbour n1
a. neighborL1  identifyNeighbors(n1)
b. if neighborL1 count >= minPts
i. Add n1 to C
function predict(C,noise)
1. Initialize one-class SVM with polynomial kernel
2. Set the degree of kernel to be the dimensions of C
3. Train SVM with C
4. Predictions  Apply noise to trained kernel
5. Return Predictions
2.1. Initial Level Cluster Formulation using DBSCAN
The pre-processed data is analyzed and the maximum acceptable distance (maxDist) and the
minimum neighbor requirements (minPts) are identified using the data distribution. However, accurately
identifying the best parameters is not possible in a single iteration. This is a trial and error based fine-tuning
process. Further, these parameters vary with the dataset being used. Hence distribution based transient values
are initially identified and the final parameters are identified by methodically increasing and decreasing the
parameter values to identify the best parameter set for the current data under analysis.
2.1.1. Cluster Identification Phase
In this phase, the process begins with a random node in the dataset. The initial cluster is composed
of this single node n. Neighbors of the selected node (n) are identified using maxDist as the threshold. If the
node satisfies the minimum neighbor requirements (minPts), it is considered to be a part of a cluster and not
an outlier. Hence, this is followed by the cluster expansion phase.
2.1.2. Cluster Expansion Phase
This phase finds the neighbors of each node in n. If each of these nodes satisfies the minPts
requirement they are incorporated as an entity in the cluster. This process is continued until all the
appropriate nodes are grouped into the cluster. However, several clusters might exist in a dataset. Hence
 ISSN: 2252-8776
IJECE Vol. 6, No. 3, December 2017 : 199– 208
202
some points would remain outside the composed cluster. A random such point is considered as the base for
the next cluster. The neighbor identification and cluster expansion phase are repeated to identify the points
corresponding to the new cluster. This process is repeated until all the points are categorized into a cluster.
Though all points are categorized into clusters, not all clusters would be composed with high or even
considerable densities. Some clusters would be composed of single node. These nodes are intended to be
outliers. Major reasons for the occurrence of outliers are improper parameter tuning, their actual presence or
inappropriate node validation in the expansion phase. Though the actual presence of outliers needs to be
considered, the other two issues also play a vital role in the occurrence of outliers in DBSCAN. Hence a
second level examination would incorporate several nodes that could have been components of the existing
clusters, leading to a reduction in the outliers.
2.2. One-Class SVM based Noise Reduction
DBSCAN eliminates several nodes, considering them as outliers. However, they might not
necessarily correspond to outliers, rather they could be components of the defined clusters. Utilizing the
conventional conditional checks utilized by DBSCAN again ultimately has the same effects. Hence the
proposed NRDBSCAN incorporates a machine learning classifier, One-Class Support Vector Machine
(SVM) for the categorization process.
Utilizing all the clusters for training a binary/multi-class classifier and then performing classification
on the detected outliers has several downsides. The major issue is that binary/multi-class classifier definitely
categorizes the data into any one of the groups. Hence all the outliers would definitely be grouped into a
cluster, leading to zero outliers. However, the proposed work is intended on grouping only the appropriate
nodes and retaining the outliers. Hence a one-class classifier would be the best suited approach. One-class
classifier is a special case of classifiers, where the pattern of a single class would be well known, while the
patterns that do not confine to the well-known trained patterns would be considered as outliers.
Output from DBSCAN is in terms of clusters. Data within clusters have obvious associations, hence
these can be used as the training data for the classifiers. The noise reduction phase operates by training the
one-class SVM using data from one cluster and performing predictions on the detected outliers. Outliers
categorized as components of the cluster used for training are incorporated into the cluster. The residual
outliers are considered for processing by training the one-class SVM with data from the next cluster. This
process is continued for all the defined clusters with considerable density levels.
One-class SVM was suggested by Scholkopf et al. in [18]. It operates by separating the data points
from the origin and considering the origin alone as the second class. The base operational nature of one-class
SVM is to maximize the distance from the hyperplane created by the data points to the origin. The resultant
of this is a binary function that effectively captures the regions in the input space, returning +1 for data
contained in the hyperplane and -1 for others. One-class SVM used in the NRDBSCAN utilizes the
polynomial kernel [19], as the input data has the tendency to contain several dimensions.
Not all data are expected to be components of the clusters. After the entire process, some residual data
remains, which are categorized as outliers.
3. RESULTS AND ANALYSIS
DBSCAN was implemented in C#.NET and the noise reduction components were incorporated into
the operational process. Analysis of the NRDBSCAN was performed by using four benchmark datasets.
Clustering specific datasets such as Iris and Banana and spatial datasets such as Quake and Forest were used
for identifying the efficiency of the algorithm.
Cluster densities and the number of clusters obtained from each of the datasets using NRDBSCAN
is shown in figures(1-4). Cluster densities correspond to the number of nodes in each of the formulated
cluster. A density of one indicates an outlier. It could be observed that the clusters formulated using the
proposed approach exhibits low outlier levels and clusters of considerable densities. Iris and Quake,
containing low variations exhibit low outlier levels, while Forest and Banana exhibits high variations, hence
high outliers. However, it could be observed that the clusters other than outliers exhibit high density levels.
IJ-ICT ISSN: 2252-8776 
Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed)
203
Figure 1. Cluster Density (Iris)
Figure 2. Cluster Density (Banana)
Figure 3. Cluster Density (Quake)
50
54
11
23
4
7
1
0
10
20
30
40
50
60
1 2 3 4 5 6 7
No.ofNodes
Cluster Number
Cluster Density (Iris)
5066
131 9 40 16 13 13 6 1 1 1 1 1 1
0
1000
2000
3000
4000
5000
6000
1 2 3 4 5 6 7 8 9 10 11 12 13 14
No.ofNodes
Cluster Number
Cluster Density (Banana)
2045
21 8 7 9 18 12 22 26 9 1
0
500
1000
1500
2000
2500
1 2 3 4 5 6 7 8 9 10 11
No.ofNodes
Cluster Number
Cluster Density (Quake)
 ISSN: 2252-8776
IJECE Vol. 6, No. 3, December 2017 : 199– 208
204
Figure 4. Cluster Density (Forest)
Intra-cluster radius of the proposed NRDBSCAN is presented in figures(5-8). It could be observed
that the proposed technique exhibits low to moderate intra cluster distance levels. Moderate levels are due to
the varied shaped clusters formed by the algorithm. Inter-cluster distance exhibited by the algorithm is
presented in figure 9. It could be observed that low inter cluster distances are exhibited by NRDBSCAN.
This validates our claim of varied shaped clusters with different density levels.
Figure 5. Intra Cluster Radius (Iris)
494
4 5 4 1 1 1 1 1 1 1 1 1 1
0
100
200
300
400
500
600
1 2 3 4 5 6 7 8 9 10 11 12 13 14
No.ofNodes
Cluster Number
Cluster Density (Forest)
0
0.5
1
1.5
2
2.5
3
1 2 3 4 5 6 7
NodeDistance
Cluster Number
IntraCluster Radius (Iris)
IJ-ICT ISSN: 2252-8776 
Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed)
205
Figure 6. Intra Cluster Radius (Banana)
Figure 7. Intra Cluster Radius (Quake)
Figure 8. Intra Cluster Radius (Forest)
0
0.5
1
1.5
2
2.5
3
3.5
1 2 3 4 5 6 7 8 9 10 11 12 13 14
NodeDistance
Cluster Number
IntraCluster Radius (Banana)
0
0.5
1
1.5
2
2.5
3
3.5
4
1 2 3 4 5 6 7 8 9 10 11
NodeDistance
Cluster Number
IntraCluster Radius (Quake)
0
100
200
300
400
500
600
1 2 3 4 5 6 7 8 9 10 11 12 13 14
NodeDistance
Cluster Number
IntraCluster Radius (Forest)
 ISSN: 2252-8776
IJECE Vol. 6, No. 3, December 2017 : 199– 208
206
Figure 9. Inter Cluster Distance
A comparison in terms of time is carried out between DBSCAN, Modified PSO based DBSCAN
[20] and the proposed NRDBSCAN is presented in figure 10. It could be observed that the time taken for
modified PSO is the highest. However, the time taken by NRDBSCAN is higher compared to DBSCAN.
Even-though that is the case, the difference is in terms of 0.3 sec (maximum). Hence the significance of the
time increase is considered to be low.
Figure 10. Time Comparison
A comparison in terms of the number of clusters generated by DBSCAN and NRDBSCAN is shown
in figure 11. It could be observed that the number of clusters generated by NRDBSCAN is at-least 60% less
than the conventional DBSCAN. This exhibits the effective noise reduction levels exhibited by the proposed
approach.
0
0.5
1
1.5
2
2.5
3
3.5
4
Iris Banana Quake Forest
Distance
Inter Cluster Distance
Iris Banana Quake Forest
Time (DBSCAN) 0.002 0.046 0.0361 0.0126
Time
(NRDBSCAN)
0.005 0.3605 0.0706 0.0217
Time (Modified
PSO)
0.193 3.972 3.421 1.794
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
Time(sec)
IJ-ICT ISSN: 2252-8776 
Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed)
207
Figure 11. Comparison in terms of number of clusters
4. CONCLUSION
Clustering, being one of the major techniques for data analysis has seen several adaptations due to
the changing constraints. The major adaptations include identifying varied shaped clusters. This paper
proposes an enhancement of the DBSCAN that is a pioneer algorithm in identifying varied shaped and varied
density clusters. The major downside of DBSCAN was observed to be its requirement for accurate parameter
setting. A slightly varied setting results in the increase in outliers. However, identifying the perfect parameter
is not feasible. Hence the proposed NRDBSCAN enhances the DBSCAN approach by introducing a noise
reduction component based on one-class classifier model. One-class SVM was used to perform this process.
Results exhibits significant reduction in the noise levels. Current algorithm operates effectively on fixed
density clusters. Future works will concentrate on porting the algorithm to operate on datasets with varied
densities and effectively identify clusters with varied density levels.
REFERENCES
[1] J. A. Hartigan and M.A. Wong, “Algorithm AS 136: A k-means Clustering Algorithm,” Journal of Royal Statistical
Society,Series C (Applied Statistics) vol. 28, pp. 100-108, Jan 1979.
[2] C.P. Wei, et al., “Empirical Comparison of Fast Clustering Algorithms for Large data sets,” in System Sciences,
2000 IEEE Proceeding of 33rd
Annual Hawaii International Conference, 2000, pp. 1-10.
[3] M.Ester, et al., “A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in
2nd
International Conference on Knowledge Discovery and Data Mining. KDD-1996, vol. 96, pp. 226-231. 1996.
[4] A. Hinneburg and D.A. Keim, “An Efficient Approach to Clustering in Large Multimedia Databases with Noise,”
in International Conference on Knowlegde Discovery and Data Mining, KDD-1998, vol. 98, pp. 58-65, 1998.
[5] M. Ankerst, et al., “OPTICS: Ordering Points to Identify the Clustering Structure,” in ACM SIGMOD Record 1999,
vol. 28, pp. 49-60, 1999.
[6] E. Güngör and A. Özmen, “Distance and Density based Clustering Algorithm using Gaussian Kernel,” Expert
Systems with Appilcations, vol. 1, pp. 10-20, 2017.
[7] S. Zhou, et al., “FDBSCAN: A Fast DBSCAN Algorithm,” Ruan Jian Xue Bao, vol. 11, pp. 735-744, 2000.
[8] C.F. Tsain and H.F. Yeh, “Npust: An Efficient Clustering Algorithm using Partition Space Technique for Large
Databases,” in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent
Systems, pp. 787-796, 2009, Springer Berlin Heidelberg.
[9] A.R. Chowdhury, et al., “An Efficient Method forSsubjectively Choosing Parameter „k‟Automatically in
VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise)Algorithm,” in Computer and
Automation Engineering, 2010, ICCAE 2010. Second International Conference, IEEE, pp. 38-41.
[10] M. Parimala, et al., “A Survey on Density Based Clustering Algorithms for Mining Large Spatial Databases,”
International Journal of Advanced Science and Technology, vol. 31, pp. 59-66, 2011.
[11] X. Chen, et al., “APSCAN: A Parameter Free Algorithm for Clustering,” Pattern Recognition Letters, vol. 32, pp.
973-986, 2011.
[12] Y. Zhu, et al., “Density-ratio Based Clustering for Discovering Clusters with Varying Densities,” Pattern
Recognition, vol. 60, pp. 983-997, 2016.
[13] S. Louhichi, et al., “Unsupervised Varied Density Based Clustering Algorithm using Spline,” Pattern Recognition
Letters, 2016.
0
50
100
150
200
250
300
Iris Banana Quake Forest
ClusterNumber
No of Clusters
# Clusters (DBSCAN) # Clusters (NRDBSCAN)
 ISSN: 2252-8776
IJECE Vol. 6, No. 3, December 2017 : 199– 208
208
[14] P. Liu, et al., “VDBSCAN: Varied Density Based Spatial Clustering of Applications with Noise,” in Service
Systems and Service Management, 2007, IEEE International Conference, 2007, pp. 1-4.
[15] C. Xiaoyum, et al., “GMDBSCAN: Multi-Density DBSCAN Cluster Based on Grid,” in e-Business Engineering,
2008, ICEBE 2008. IEEE International Conference, 2008, pp. 780-783.
[16] B. Borah and D.K. Bhattacharyya, “DDSC: A Density Differentiated Spatial Clustering Technique,” Journal of
computers, vol. 3, pp. 72-79, 2008.
[17] A. Ram, et al., “An Enhanced Density Based Spatial Clustering of Applications with Noise,” in Advanced
Computing Conference, 2009. IACC 2009. IEEE International, 2009, pp. 1475-1478.
[18] B. Schölkopf, et al., “Estimating the Support of a High-dimensional Distribution,” Neural Computation, 2001, pp.
1443-1471.
[19] L.M. Manevitz and M. Yousef, “One-class SVMs for Document Classification,” Journal of Machine Learning
Research, 2001, vol. 2, pp. 139-154.
[20] K. Nafees Ahmed and T. Abdul Razak, “Density Based Clustering using Modified PSO based Neighbor Selection,”
International Journal of Computer Science and Engineering, vol. 9, pp. 192-199.

More Related Content

What's hot

IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET Journal
 
50120130406035
5012013040603550120130406035
50120130406035
IAEME Publication
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
ijcseit
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33
Oyeniyi Samuel
 
Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...
IRJET Journal
 
Data collection in multi application sharing wireless sensor networks
Data collection in multi application sharing wireless sensor networksData collection in multi application sharing wireless sensor networks
Data collection in multi application sharing wireless sensor networks
Pvrtechnologies Nellore
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
CSCJournals
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET Journal
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET Journal
 
A Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash TablesA Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash Tables
IOSR Journals
 
MultiModal Retrieval Image
MultiModal Retrieval ImageMultiModal Retrieval Image
MultiModal Retrieval Image
Konstantinos Zagoris
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
IJECEIAES
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
IRJET Journal
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
ijdkp
 
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...
ijcseit
 
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATIONGRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
ijdms
 
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
IJECEIAES
 
Improved text clustering with
Improved text clustering withImproved text clustering with
Improved text clustering with
IJDKP
 

What's hot (18)

IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
 
50120130406035
5012013040603550120130406035
50120130406035
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33
 
Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...
 
Data collection in multi application sharing wireless sensor networks
Data collection in multi application sharing wireless sensor networksData collection in multi application sharing wireless sensor networks
Data collection in multi application sharing wireless sensor networks
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data Stream
 
A Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash TablesA Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash Tables
 
MultiModal Retrieval Image
MultiModal Retrieval ImageMultiModal Retrieval Image
MultiModal Retrieval Image
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
 
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA...
 
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATIONGRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
 
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
 
Improved text clustering with
Improved text clustering withImproved text clustering with
Improved text clustering with
 

Similar to 7. 10083 12464-1-pb

Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...
Alexander Decker
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
IOSR Journals
 
50120140501016
5012014050101650120140501016
50120140501016
IAEME Publication
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
IRJET Journal
 
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASEA NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE
indexPub
 
Intrusion Detection System using K-Means Clustering and SMOTE
Intrusion Detection System using K-Means Clustering and SMOTEIntrusion Detection System using K-Means Clustering and SMOTE
Intrusion Detection System using K-Means Clustering and SMOTE
IRJET Journal
 
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
ijasuc
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
Alexander Decker
 
Ir3116271633
Ir3116271633Ir3116271633
Ir3116271633
IJERA Editor
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
IRJET Journal
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving data
iaemedu
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
cscpconf
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
ijcsit
 
sensors-20-01011-v2 (1).pdf
sensors-20-01011-v2 (1).pdfsensors-20-01011-v2 (1).pdf
sensors-20-01011-v2 (1).pdf
Shishir Ahmed
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
IJCSIS Research Publications
 
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptxPresentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
AkimPardede2
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
SaiPragnaKancheti
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
SaiPragnaKancheti
 
Fuzzy In Remote Classification
Fuzzy In Remote ClassificationFuzzy In Remote Classification
Fuzzy In Remote Classification
University of Oradea
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
csandit
 

Similar to 7. 10083 12464-1-pb (20)

Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
 
50120140501016
5012014050101650120140501016
50120140501016
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
 
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASEA NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE
 
Intrusion Detection System using K-Means Clustering and SMOTE
Intrusion Detection System using K-Means Clustering and SMOTEIntrusion Detection System using K-Means Clustering and SMOTE
Intrusion Detection System using K-Means Clustering and SMOTE
 
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
Ir3116271633
Ir3116271633Ir3116271633
Ir3116271633
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving data
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
 
sensors-20-01011-v2 (1).pdf
sensors-20-01011-v2 (1).pdfsensors-20-01011-v2 (1).pdf
sensors-20-01011-v2 (1).pdf
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptxPresentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
Fuzzy In Remote Classification
Fuzzy In Remote ClassificationFuzzy In Remote Classification
Fuzzy In Remote Classification
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
 

More from IAESIJEECS

08 20314 electronic doorbell...
08 20314 electronic doorbell...08 20314 electronic doorbell...
08 20314 electronic doorbell...
IAESIJEECS
 
07 20278 augmented reality...
07 20278 augmented reality...07 20278 augmented reality...
07 20278 augmented reality...
IAESIJEECS
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
IAESIJEECS
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...
IAESIJEECS
 
04 20268 power loss reduction ...
04 20268 power loss reduction ...04 20268 power loss reduction ...
04 20268 power loss reduction ...
IAESIJEECS
 
03 20237 arduino based gas-
03 20237 arduino based gas-03 20237 arduino based gas-
03 20237 arduino based gas-
IAESIJEECS
 
02 20274 improved ichi square...
02 20274 improved ichi square...02 20274 improved ichi square...
02 20274 improved ichi square...
IAESIJEECS
 
01 20264 diminution of real power...
01 20264 diminution of real power...01 20264 diminution of real power...
01 20264 diminution of real power...
IAESIJEECS
 
08 20272 academic insight on application
08 20272 academic insight on application08 20272 academic insight on application
08 20272 academic insight on application
IAESIJEECS
 
07 20252 cloud computing survey
07 20252 cloud computing survey07 20252 cloud computing survey
07 20252 cloud computing survey
IAESIJEECS
 
06 20273 37746-1-ed
06 20273 37746-1-ed06 20273 37746-1-ed
06 20273 37746-1-ed
IAESIJEECS
 
05 20261 real power loss reduction
05 20261 real power loss reduction05 20261 real power loss reduction
05 20261 real power loss reduction
IAESIJEECS
 
04 20259 real power loss
04 20259 real power loss04 20259 real power loss
04 20259 real power loss
IAESIJEECS
 
03 20270 true power loss reduction
03 20270 true power loss reduction03 20270 true power loss reduction
03 20270 true power loss reduction
IAESIJEECS
 
02 15034 neural network
02 15034 neural network02 15034 neural network
02 15034 neural network
IAESIJEECS
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancement
IAESIJEECS
 
08 17079 ijict
08 17079 ijict08 17079 ijict
08 17079 ijict
IAESIJEECS
 
07 20269 ijict
07 20269 ijict07 20269 ijict
07 20269 ijict
IAESIJEECS
 
06 10154 ijict
06 10154 ijict06 10154 ijict
06 10154 ijict
IAESIJEECS
 
05 20255 ijict
05 20255 ijict05 20255 ijict
05 20255 ijict
IAESIJEECS
 

More from IAESIJEECS (20)

08 20314 electronic doorbell...
08 20314 electronic doorbell...08 20314 electronic doorbell...
08 20314 electronic doorbell...
 
07 20278 augmented reality...
07 20278 augmented reality...07 20278 augmented reality...
07 20278 augmented reality...
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...
 
04 20268 power loss reduction ...
04 20268 power loss reduction ...04 20268 power loss reduction ...
04 20268 power loss reduction ...
 
03 20237 arduino based gas-
03 20237 arduino based gas-03 20237 arduino based gas-
03 20237 arduino based gas-
 
02 20274 improved ichi square...
02 20274 improved ichi square...02 20274 improved ichi square...
02 20274 improved ichi square...
 
01 20264 diminution of real power...
01 20264 diminution of real power...01 20264 diminution of real power...
01 20264 diminution of real power...
 
08 20272 academic insight on application
08 20272 academic insight on application08 20272 academic insight on application
08 20272 academic insight on application
 
07 20252 cloud computing survey
07 20252 cloud computing survey07 20252 cloud computing survey
07 20252 cloud computing survey
 
06 20273 37746-1-ed
06 20273 37746-1-ed06 20273 37746-1-ed
06 20273 37746-1-ed
 
05 20261 real power loss reduction
05 20261 real power loss reduction05 20261 real power loss reduction
05 20261 real power loss reduction
 
04 20259 real power loss
04 20259 real power loss04 20259 real power loss
04 20259 real power loss
 
03 20270 true power loss reduction
03 20270 true power loss reduction03 20270 true power loss reduction
03 20270 true power loss reduction
 
02 15034 neural network
02 15034 neural network02 15034 neural network
02 15034 neural network
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancement
 
08 17079 ijict
08 17079 ijict08 17079 ijict
08 17079 ijict
 
07 20269 ijict
07 20269 ijict07 20269 ijict
07 20269 ijict
 
06 10154 ijict
06 10154 ijict06 10154 ijict
06 10154 ijict
 
05 20255 ijict
05 20255 ijict05 20255 ijict
05 20255 ijict
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 

7. 10083 12464-1-pb

  • 1. International Journal of Informatics and Communication Technology (IJ-ICT) Vol.6, No.3, December 2017, pp. 199~208 ISSN: 2252-8776, DOI: 10.11591/ijict.v6i3.pp199-208  199 Journal homepage: http://iaesjournal.com/online/index.php/IJICT Density Based Clustering with Integrated One-Class SVM for Noise Reduction K. Nafees Ahmed*, Dr. T. Abdul Razak *Departement of Computer Science, Jamal Mohamed College, Tamil nadu, India Article Info ABSTRACT Article history: Received Aug 12nd , 2017 Revised Oct 26th , 2017 Accepted Nov 6th , 2017 Information extraction from data is one of the key necessities for data analysis. Unsupervised nature of data leads to complex computational methods for analysis. This paper presents a density based spatial clustering technique integrated with one-class Support Vector Machine (SVM), a machine learning technique for noise reduction, a modified variant of DBSCAN called Noise Reduced DBSCAN (NRDBSCAN). Analysis of DBSCAN exhibits its major requirement of accurate thresholds, absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. The Experimental results indicate high homogeneity levels in the clustering process. Keyword: Clustering DBSCAN Machine Learning Classifier Noise Reduction One-class SVM Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved. Corresponding Author: K. Nafees Ahmed, Departement of Computer Science, Jamal Mohamed College, 7 Race Course Road, Khaja Nagar, Tiruchirappalli, Tamil Nadu, 620020, India. Email: nafeesjmc@gmail.com 1. INTRODUCTION The current Internet age is data rich, but information poor. Meaningful data is the chief requirement of the current age. This has led to an enormous increase in the data processing techniques. However, the major drawback is inherently embedded in the data itself. The distribution of datasets vary and hence a single technique that was designed to process data from a domain would not possibly provide effective results when applied to the data from the same domain, due to variations in the data distribution levels as time progresses. Hence techniques that are as ductile as the data itself are the only ones that can persist. Unsupervised data processing has always been a challenge due to their unpredictable nature. Ductile and tractable algorithms are the major requirements for processing such data. Domains with such requirements range from image processing to web information processing. The requirements for such processing techniques are constantly on the raise, with the increase in the amount of data being available. Extracting meaningful information from such data requires grouping them to find commonalities existing between them. This aids in better interpretation of data. Clustering is the process of grouping data such that data within a group/cluster is more cohesive compared to data in different clusters. The process of clustering makes the data meaningful for various applications. Clustering methods are usually classified as partitional and hierarchical clustering techniques. Partitional clustering techniques perform flat clustering based on a single decision criterion such as distance or density. Distance based clustering techniques include K-Means [1] clustering and CLARA [2] to name a few. Density based clustering techniques are currently on the raise due to their flexible operational nature and the stable solution sets generated by them. Density based clustering techniques include DBSCAN [3] and DenClue [4], OPTICS [5] to name a few. Hierarchical clustering algorithms are divided into agglomerative and divisive, corresponding to their basic operational nature, bottom-up or top-down [6].
  • 2.  ISSN: 2252-8776 IJECE Vol. 6, No. 3, December 2017 : 199– 208 200 Density based clustering techniques are currently on the increase due to the increase in the requirements for spatial data processing. However, most of these techniques are derivatives from DBSCAN, extending it according to their operating domains. Extensions are usually in terms of reduction in time, parameter automation, input reduction in terms of feature selection and ability to handle varied densities. FDBSCAN [7] and IDBSCAN [8] are two approaches concentrating on the reduction in time component. Both operate by identifying a minimal set of highly representative points for their operation rather than utilizing the entire data points. In addition to this, the IDBSCAN utilized a grid based data selection for reducing the input levels. Parameter fine-tuning based algorithms includes k-VDBSCAN [9], DBCLASD [10] and APSCAN [11]. The k-VDBSCAN is a parameter free technique that automatically identifies the parameters. DBCLASD operates on large data and is a parameter free clustering technique, enabling gradual cluster expansion on the basis of the neighbors and their densities. APSCAN utilizes Affinity Propagation technique to identify the cluster densities based on local data variations. A density based clustering technique aiding in the discovery of clusters with varying densities within a single dataset was proposed by Zhu et al in [12]. In general, the input data is considered to contain data distributed in uniform densities. However, some unusual real time data such as population maps tend to contain such data. This leads to a complex issue, as increasing the density levels for the entire process includes several outliers into clusters, while reducing the density levels misses several legitimate clusters. Two approaches exist in literature to solve this issue, namely modifying the algorithm appropriately and rescaling the data. The technique proposed in [12] utilizes the latter by rescaling the data to appropriately identify the clusters. DBSCAN is applied to the rescaled data to identify clusters. A similar technique that identifies varied density based clusters was proposed by Louhichi et al. in [13]. The operational process of this technique is divided into two phases. The first phase identifies the density levels of the input data using the exponential spline technique on the distance matrix. Second phase utilizes the density values determined from the first phase as local threshold parameters to identify clusters. Other density based clustering techniques include VDBSCAN [14], GMDBSCAN [15], DDSC [16], EDBSCAN [17] etc. This paper concentrates on developing a density based spatial clustering technique. DBSCAN, being the precursor of such techniques, is adopted by most of the techniques involving non-uniform clusters. It was identified that DBSCAN has high potency of generating outliers. On further assessment it was identified that several of these outliers effectively fit into the formed clusters. However, they still remain outliers due to the parameters defined for the clustering process. The parameter setting process is always optimal and is identified by the data experts through data analysis and trial and error. As resolving this issue is a complex task, this work presents NRDBSCAN with one-class SVM to reduce the noise levels. 2. RESEARCH METHOD Density based spatial clustering proposes a grouping mechanism, that operates on the basis of both distance and the node density. The major advantage of using such an approach is that it does not rely on centroid based operations, hence the inconsistencies in the formation of clusters are eliminated. Further, density based clustering is an unsupervised approach with no prior information requirements. Density based clustering approaches has the ability to identify clusters of arbitrary shapes and sizes rather than being confined to the traditional circular clusters. It was identified that DBSCAN based techniques exhibits high noise levels. Even though the base clustering process was identified to be efficient, the noise levels generated by DBSCAN were found to be excessive. This paper proposes NRDBSCAN, a hybridized density based clustering approach that initially clusters the data, after which it tries to incorporate noise into any of the existing definite clusters, hence reducing the noise levels. Prior to the actual clustering process, the input data is processed and is converted to the required format. A schema analysis is performed on the data to identify the data types of the contents. The proposed architecture accepts only numerical data, hence textual data are eliminated and categorical and ordinal data are converted to numerical formats. The data pre-processing is followed by the actual clustering process. Algorithm for the proposed NRDBSCAN is shown below.
  • 3. IJ-ICT ISSN: 2252-8776  Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed) 201 NRDBSCAN Algorithm 1. Input base data 2. Input threshold levels (minPts, maxDist) 3. Initialize first cluster with a random node n 4. neighbors  identifyNeighbors(n) 5. If the neighbour count<minPts, a. Set n into a separate cluster (Outlier) 6. Else expandCluster(n,neighbors) 7. Perform this process until all the nodes are grouped into a cluster 8. For each identified cluster C a. If density of C < 1 i. noise  C 9. Until Noise is not empty a. For each identified cluster C i. Prediction predict(C,noise) ii. Add all true predictions to cluster C iii. Delete corresponding entries from noise b. if no entries are deleted for the last two iterations i. Allocate new clusters for each entry in noise ii. Empty noise function identifyNeighbors (n) 1. For all nodes n1 in data a. If distance(n,n1)<maxDist i. Add n1 to neighborList 2. return neighborList function expandCluster(n,neighbors) 1. Add n to the cluster C 2. For each neighbour n1 a. neighborL1  identifyNeighbors(n1) b. if neighborL1 count >= minPts i. Add n1 to C function predict(C,noise) 1. Initialize one-class SVM with polynomial kernel 2. Set the degree of kernel to be the dimensions of C 3. Train SVM with C 4. Predictions  Apply noise to trained kernel 5. Return Predictions 2.1. Initial Level Cluster Formulation using DBSCAN The pre-processed data is analyzed and the maximum acceptable distance (maxDist) and the minimum neighbor requirements (minPts) are identified using the data distribution. However, accurately identifying the best parameters is not possible in a single iteration. This is a trial and error based fine-tuning process. Further, these parameters vary with the dataset being used. Hence distribution based transient values are initially identified and the final parameters are identified by methodically increasing and decreasing the parameter values to identify the best parameter set for the current data under analysis. 2.1.1. Cluster Identification Phase In this phase, the process begins with a random node in the dataset. The initial cluster is composed of this single node n. Neighbors of the selected node (n) are identified using maxDist as the threshold. If the node satisfies the minimum neighbor requirements (minPts), it is considered to be a part of a cluster and not an outlier. Hence, this is followed by the cluster expansion phase. 2.1.2. Cluster Expansion Phase This phase finds the neighbors of each node in n. If each of these nodes satisfies the minPts requirement they are incorporated as an entity in the cluster. This process is continued until all the appropriate nodes are grouped into the cluster. However, several clusters might exist in a dataset. Hence
  • 4.  ISSN: 2252-8776 IJECE Vol. 6, No. 3, December 2017 : 199– 208 202 some points would remain outside the composed cluster. A random such point is considered as the base for the next cluster. The neighbor identification and cluster expansion phase are repeated to identify the points corresponding to the new cluster. This process is repeated until all the points are categorized into a cluster. Though all points are categorized into clusters, not all clusters would be composed with high or even considerable densities. Some clusters would be composed of single node. These nodes are intended to be outliers. Major reasons for the occurrence of outliers are improper parameter tuning, their actual presence or inappropriate node validation in the expansion phase. Though the actual presence of outliers needs to be considered, the other two issues also play a vital role in the occurrence of outliers in DBSCAN. Hence a second level examination would incorporate several nodes that could have been components of the existing clusters, leading to a reduction in the outliers. 2.2. One-Class SVM based Noise Reduction DBSCAN eliminates several nodes, considering them as outliers. However, they might not necessarily correspond to outliers, rather they could be components of the defined clusters. Utilizing the conventional conditional checks utilized by DBSCAN again ultimately has the same effects. Hence the proposed NRDBSCAN incorporates a machine learning classifier, One-Class Support Vector Machine (SVM) for the categorization process. Utilizing all the clusters for training a binary/multi-class classifier and then performing classification on the detected outliers has several downsides. The major issue is that binary/multi-class classifier definitely categorizes the data into any one of the groups. Hence all the outliers would definitely be grouped into a cluster, leading to zero outliers. However, the proposed work is intended on grouping only the appropriate nodes and retaining the outliers. Hence a one-class classifier would be the best suited approach. One-class classifier is a special case of classifiers, where the pattern of a single class would be well known, while the patterns that do not confine to the well-known trained patterns would be considered as outliers. Output from DBSCAN is in terms of clusters. Data within clusters have obvious associations, hence these can be used as the training data for the classifiers. The noise reduction phase operates by training the one-class SVM using data from one cluster and performing predictions on the detected outliers. Outliers categorized as components of the cluster used for training are incorporated into the cluster. The residual outliers are considered for processing by training the one-class SVM with data from the next cluster. This process is continued for all the defined clusters with considerable density levels. One-class SVM was suggested by Scholkopf et al. in [18]. It operates by separating the data points from the origin and considering the origin alone as the second class. The base operational nature of one-class SVM is to maximize the distance from the hyperplane created by the data points to the origin. The resultant of this is a binary function that effectively captures the regions in the input space, returning +1 for data contained in the hyperplane and -1 for others. One-class SVM used in the NRDBSCAN utilizes the polynomial kernel [19], as the input data has the tendency to contain several dimensions. Not all data are expected to be components of the clusters. After the entire process, some residual data remains, which are categorized as outliers. 3. RESULTS AND ANALYSIS DBSCAN was implemented in C#.NET and the noise reduction components were incorporated into the operational process. Analysis of the NRDBSCAN was performed by using four benchmark datasets. Clustering specific datasets such as Iris and Banana and spatial datasets such as Quake and Forest were used for identifying the efficiency of the algorithm. Cluster densities and the number of clusters obtained from each of the datasets using NRDBSCAN is shown in figures(1-4). Cluster densities correspond to the number of nodes in each of the formulated cluster. A density of one indicates an outlier. It could be observed that the clusters formulated using the proposed approach exhibits low outlier levels and clusters of considerable densities. Iris and Quake, containing low variations exhibit low outlier levels, while Forest and Banana exhibits high variations, hence high outliers. However, it could be observed that the clusters other than outliers exhibit high density levels.
  • 5. IJ-ICT ISSN: 2252-8776  Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed) 203 Figure 1. Cluster Density (Iris) Figure 2. Cluster Density (Banana) Figure 3. Cluster Density (Quake) 50 54 11 23 4 7 1 0 10 20 30 40 50 60 1 2 3 4 5 6 7 No.ofNodes Cluster Number Cluster Density (Iris) 5066 131 9 40 16 13 13 6 1 1 1 1 1 1 0 1000 2000 3000 4000 5000 6000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 No.ofNodes Cluster Number Cluster Density (Banana) 2045 21 8 7 9 18 12 22 26 9 1 0 500 1000 1500 2000 2500 1 2 3 4 5 6 7 8 9 10 11 No.ofNodes Cluster Number Cluster Density (Quake)
  • 6.  ISSN: 2252-8776 IJECE Vol. 6, No. 3, December 2017 : 199– 208 204 Figure 4. Cluster Density (Forest) Intra-cluster radius of the proposed NRDBSCAN is presented in figures(5-8). It could be observed that the proposed technique exhibits low to moderate intra cluster distance levels. Moderate levels are due to the varied shaped clusters formed by the algorithm. Inter-cluster distance exhibited by the algorithm is presented in figure 9. It could be observed that low inter cluster distances are exhibited by NRDBSCAN. This validates our claim of varied shaped clusters with different density levels. Figure 5. Intra Cluster Radius (Iris) 494 4 5 4 1 1 1 1 1 1 1 1 1 1 0 100 200 300 400 500 600 1 2 3 4 5 6 7 8 9 10 11 12 13 14 No.ofNodes Cluster Number Cluster Density (Forest) 0 0.5 1 1.5 2 2.5 3 1 2 3 4 5 6 7 NodeDistance Cluster Number IntraCluster Radius (Iris)
  • 7. IJ-ICT ISSN: 2252-8776  Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed) 205 Figure 6. Intra Cluster Radius (Banana) Figure 7. Intra Cluster Radius (Quake) Figure 8. Intra Cluster Radius (Forest) 0 0.5 1 1.5 2 2.5 3 3.5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 NodeDistance Cluster Number IntraCluster Radius (Banana) 0 0.5 1 1.5 2 2.5 3 3.5 4 1 2 3 4 5 6 7 8 9 10 11 NodeDistance Cluster Number IntraCluster Radius (Quake) 0 100 200 300 400 500 600 1 2 3 4 5 6 7 8 9 10 11 12 13 14 NodeDistance Cluster Number IntraCluster Radius (Forest)
  • 8.  ISSN: 2252-8776 IJECE Vol. 6, No. 3, December 2017 : 199– 208 206 Figure 9. Inter Cluster Distance A comparison in terms of time is carried out between DBSCAN, Modified PSO based DBSCAN [20] and the proposed NRDBSCAN is presented in figure 10. It could be observed that the time taken for modified PSO is the highest. However, the time taken by NRDBSCAN is higher compared to DBSCAN. Even-though that is the case, the difference is in terms of 0.3 sec (maximum). Hence the significance of the time increase is considered to be low. Figure 10. Time Comparison A comparison in terms of the number of clusters generated by DBSCAN and NRDBSCAN is shown in figure 11. It could be observed that the number of clusters generated by NRDBSCAN is at-least 60% less than the conventional DBSCAN. This exhibits the effective noise reduction levels exhibited by the proposed approach. 0 0.5 1 1.5 2 2.5 3 3.5 4 Iris Banana Quake Forest Distance Inter Cluster Distance Iris Banana Quake Forest Time (DBSCAN) 0.002 0.046 0.0361 0.0126 Time (NRDBSCAN) 0.005 0.3605 0.0706 0.0217 Time (Modified PSO) 0.193 3.972 3.421 1.794 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Time(sec)
  • 9. IJ-ICT ISSN: 2252-8776  Density Based Clustering with Integrated One-Class SVM… (K. Nafees Ahmed) 207 Figure 11. Comparison in terms of number of clusters 4. CONCLUSION Clustering, being one of the major techniques for data analysis has seen several adaptations due to the changing constraints. The major adaptations include identifying varied shaped clusters. This paper proposes an enhancement of the DBSCAN that is a pioneer algorithm in identifying varied shaped and varied density clusters. The major downside of DBSCAN was observed to be its requirement for accurate parameter setting. A slightly varied setting results in the increase in outliers. However, identifying the perfect parameter is not feasible. Hence the proposed NRDBSCAN enhances the DBSCAN approach by introducing a noise reduction component based on one-class classifier model. One-class SVM was used to perform this process. Results exhibits significant reduction in the noise levels. Current algorithm operates effectively on fixed density clusters. Future works will concentrate on porting the algorithm to operate on datasets with varied densities and effectively identify clusters with varied density levels. REFERENCES [1] J. A. Hartigan and M.A. Wong, “Algorithm AS 136: A k-means Clustering Algorithm,” Journal of Royal Statistical Society,Series C (Applied Statistics) vol. 28, pp. 100-108, Jan 1979. [2] C.P. Wei, et al., “Empirical Comparison of Fast Clustering Algorithms for Large data sets,” in System Sciences, 2000 IEEE Proceeding of 33rd Annual Hawaii International Conference, 2000, pp. 1-10. [3] M.Ester, et al., “A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in 2nd International Conference on Knowledge Discovery and Data Mining. KDD-1996, vol. 96, pp. 226-231. 1996. [4] A. Hinneburg and D.A. Keim, “An Efficient Approach to Clustering in Large Multimedia Databases with Noise,” in International Conference on Knowlegde Discovery and Data Mining, KDD-1998, vol. 98, pp. 58-65, 1998. [5] M. Ankerst, et al., “OPTICS: Ordering Points to Identify the Clustering Structure,” in ACM SIGMOD Record 1999, vol. 28, pp. 49-60, 1999. [6] E. Güngör and A. Özmen, “Distance and Density based Clustering Algorithm using Gaussian Kernel,” Expert Systems with Appilcations, vol. 1, pp. 10-20, 2017. [7] S. Zhou, et al., “FDBSCAN: A Fast DBSCAN Algorithm,” Ruan Jian Xue Bao, vol. 11, pp. 735-744, 2000. [8] C.F. Tsain and H.F. Yeh, “Npust: An Efficient Clustering Algorithm using Partition Space Technique for Large Databases,” in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 787-796, 2009, Springer Berlin Heidelberg. [9] A.R. Chowdhury, et al., “An Efficient Method forSsubjectively Choosing Parameter „k‟Automatically in VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise)Algorithm,” in Computer and Automation Engineering, 2010, ICCAE 2010. Second International Conference, IEEE, pp. 38-41. [10] M. Parimala, et al., “A Survey on Density Based Clustering Algorithms for Mining Large Spatial Databases,” International Journal of Advanced Science and Technology, vol. 31, pp. 59-66, 2011. [11] X. Chen, et al., “APSCAN: A Parameter Free Algorithm for Clustering,” Pattern Recognition Letters, vol. 32, pp. 973-986, 2011. [12] Y. Zhu, et al., “Density-ratio Based Clustering for Discovering Clusters with Varying Densities,” Pattern Recognition, vol. 60, pp. 983-997, 2016. [13] S. Louhichi, et al., “Unsupervised Varied Density Based Clustering Algorithm using Spline,” Pattern Recognition Letters, 2016. 0 50 100 150 200 250 300 Iris Banana Quake Forest ClusterNumber No of Clusters # Clusters (DBSCAN) # Clusters (NRDBSCAN)
  • 10.  ISSN: 2252-8776 IJECE Vol. 6, No. 3, December 2017 : 199– 208 208 [14] P. Liu, et al., “VDBSCAN: Varied Density Based Spatial Clustering of Applications with Noise,” in Service Systems and Service Management, 2007, IEEE International Conference, 2007, pp. 1-4. [15] C. Xiaoyum, et al., “GMDBSCAN: Multi-Density DBSCAN Cluster Based on Grid,” in e-Business Engineering, 2008, ICEBE 2008. IEEE International Conference, 2008, pp. 780-783. [16] B. Borah and D.K. Bhattacharyya, “DDSC: A Density Differentiated Spatial Clustering Technique,” Journal of computers, vol. 3, pp. 72-79, 2008. [17] A. Ram, et al., “An Enhanced Density Based Spatial Clustering of Applications with Noise,” in Advanced Computing Conference, 2009. IACC 2009. IEEE International, 2009, pp. 1475-1478. [18] B. Schölkopf, et al., “Estimating the Support of a High-dimensional Distribution,” Neural Computation, 2001, pp. 1443-1471. [19] L.M. Manevitz and M. Yousef, “One-class SVMs for Document Classification,” Journal of Machine Learning Research, 2001, vol. 2, pp. 139-154. [20] K. Nafees Ahmed and T. Abdul Razak, “Density Based Clustering using Modified PSO based Neighbor Selection,” International Journal of Computer Science and Engineering, vol. 9, pp. 192-199.