SlideShare a Scribd company logo
1 of 5
Download to read offline
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 166
SCALABLE AND EFFICIENT CLUSTER-BASED FRAMEWORK FOR
MULTIDIMENSIONAL INDEXING
Aretty Narayana1
, M. Srinivasa Rao2
, R.V.Krishnaiah3
1
M.Tech Student, 2
Associate Professor, 3
Principal, Dept. of CSE, DRKCET, Hyderabad, AP, India
arettynarayan4u@gmail.com, msrao@drkgroup.org, r.v.krishnaiah@gmail.com
Abstract
Indexing high dimensional data has its utility in many real world applications. Especially the information retrieval process is
dramatically improved. The existing techniques could overcome the problem of “Curse of Dimensionality” of high dimensional data
sets by using a technique known as Vector Approximation-File which resulted in sub-optimal performance. When compared with VA-
File clustering results in more compact data set as it uses inter-dimensional correlations. However, pruning of unwanted clusters is
important. The existing pruning techniques are based on bounding rectangles, bounding hyper spheres have problems in NN search.
To overcome this problem Ramaswamy and Rose proposed an approach known as adaptive cluster distance bounding for high
dimensional indexing which also includes an efficient spatial filtering. In this paper we implement this high-dimensional indexing
approach. We built a prototype application to for proof of concept. Experimental results are encouraging and the prototype can be
used in real time applications.
Index Terms–Clustering, high dimensional indexing, similarity measures, and multimedia databases
---------------------------------------------------------------------***-------------------------------------------------------------------------
1. INTRODUCTION
Data mining has been around for many years for
programmatically analyzing huge amount of historical data.
Clustering is one of the data mining techniques which is
widely used to group similar objects into a clusters. However,
clustering is not easy with high-dimensional data which is
being produced in many domains such as digital multimedia,
CAD/CAM systems, Geographical Information Systems
(GIS), stock markets, and medical imaging. The high
dimensional data is very huge ranging from 100 GB to 100 TB
or more. Enterprises have to deal with such huge data. Such
data is accessed using NN queries and spatial queries. Many
researches came into existence on queries high dimensional
data [1], [2]. These techniques use similarity measures like
Euclidean Distance while making clustering. In [3] search
performance has been improved using R-tree like structures.
All the existing techniques expect the data sets to have
uniform distribution of correlations in order to overcome the
problem of “curse of dimensionality”. As the nearest and
distant neighbors are not distinguishable, indexing such
datasets is not possible. The real world data sets on the other
hand have non-uniform distribution of correlations. Therefore
these data sets can be indexed really using techniques such as
Euclidean Distance. The usage of ED is a significant research
activity which is used in many applications including content
– based image retrieval [4], [5]. In this paper we focused on
the real world datasets which are high-dimensional in nature
for indexing.
2. PRIOR WORKS
In case of low dimensional data techniques such as hyper-
rectangles [6], [7], hyper spheres [8] or combination of both
[9] were used for NN searches and effective data retrieval.
Manyresearches were found in the literature to deal with high-
dimensional data. Multi-dimensional indexing always
outperforms low dimensional structures as they can access
data quickly when compared with sequential scan. However,
Weber et al. [10] proved that when dimensionality is more
than 10, sequential scan is better than using indexing. The
degradation of performance is due to the “curse of
dimensionality” concept proposed in [11]. VA-files concept
became popular to overcome thisproblem. The VA-File
technique divides space into many hyper-rectangular cells and
the approximation file holds strings pertaining to encoded bits
that represent non-empty cell locations. The vector
approximation ultimately leads to scalar quantization which
overcomes curse of dimensionality. LDR [12] uses non-linear
approximation in order to perform sequential scan. It is
achieved by using dimensionality reduction and also
clustering. There are some hybrid methods such as IQ-Tree
[13] and A-Tree [14] that makes use of both VA and tree
based index. Other methods like Pyramid Tree [15], iDistnce
[16] and LDC [17] reduce transformations based on local
dimensionality. A reference point is used to evaluate partitions
made on dataset. Each partition has feature vectors that are
indexed using their centroid-distance. When queries are made
the spheres increase gradually until the cluster sphere is
intersected. For quality reasons, the query processing
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 167
identifies centroid – distances. The radius for search is
adjusted in such a way that the NN query returns exact results.
Another experiment is made using approximation layer in
LDC [17] by creating a box identification code which
represents resident points. Once set of candidates are
identified, then the B+ tree is used to filter out unwanted
points. The resultant elements are finally retrieved in the NN
result. Care is taken to control search space in order to reduce
disk IO and also ensure that accurate results are retrieved by
NN queries.
3. CLUSTER DISTANCE BOUNDING
Our procedure to measure distances to clusters is based on the
cluster distance bounding proposed by Ramaswamy and Rose
[18]. The architectural overview of the frameworkproposed by
them is as shown in fig. 1.
Fig. 1 –Architecture of Proposed Index Structure (excerpt
from [18])
As seen in fig. 1the index structure contains array of cluster
centroids which point to the cluster locations. The whole
structure is like an index in book which will improve the query
processing speed. In the same fashion, the NN queries and
other queries on high-dimensional data can be processed
faster. The data objects are grouped into number of clusters
namely Voronoi clusters. The cluster-based indexing makes it
more flexible and adaptable thus making the whole system
scalable. This architecture makes the query retrieval very
efficient and faster than other existing clustering algorithms
and related indexes.
Their algorithms are used to implement the prototype which
demonstrates the high dimensional indexing process. The first
algorithm we used in the prototype application is to make
voronoi clusters. The algorithms are presented in fig. 2, fig. 3
and fig. 4. More information about the algorithms and
notations used inthe algorithms can be found in [18]. It is as
shown in fig. 1
Algorithm 1 VORNOI-CLUSTERS(X, K)
1: // Generic clustering algorithm returns
// K cluster centroids
{cm}K
=1 ---- GenericCluster(X,K)
2: set l=0, X1= Ø, X2= Ø…… XK= Ø
3: while l< |X| do
4: l=l+1
5: //Find the centroid nearest to data element X1
K=arg min d(Xl , Cm)
6: //Move Xl to the corresponding Voronoi partition
XK = XK Ụ {Xl}
7: end while
8: return {Xm
K
=1}, {cm}K
=1
Fig. 2–Algorithm for Voronoi Clusters (excerpt from [18])
As can be seen in fig. 2, the algorithm takes a dataset and K
value as input and generates and returns required Voronoi
clusters. Initially a generic clustering algorithm like K-means
is used to get cluster centroids. The algorithm further
processes and finally returns Voronoi clusters. In fig. 3
algorithm 2 is presented. It is meant for making kNN search.
Algorithm 2 KNN-SEARCH(q)
1: //Initialize
set FLAG=0, count = 0,N = 0, kNN = φ
2: //Evaluate query-cluster distance bounds
dLB[] ←HyperplaneBound(q)
3: //Sort the query-cluster distance bounds in
ascending
//order
{dsort
LB[], o[]} ←SortArray(dLB,’ascend’)
4: while FLAG==0 do
5: count = count + 1
6: //Find the kNNs upto current cluster
{Nc, kNN} ←FindkNNsIn(q,Xo[count], kNN)
7: //Update number of elements scanned
N = N + Nc
8: //Find the kNN radius
dkNN=Farthest(q, kNN)
9: if count < K then
10: if N >k then
11: if dkNN< dsort
LB [count + 1] then
12: set FLAG=1 //kNNs found, search ends
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 168
13: end if
14: end if
15: else
16: set FLAG=1 //all clusters scanned, search ends
17: end if
18: end while
19: return kNN
Fig. 3 – kNN Search Algorithm
As seen in fig. 3, it is evident that the kNN algorithm is meant
for performing k-Nearest Neighbor Search. It takes a query as
input and returns k-Nearest Neighbors as possible search
results. Algorithm 3 is meant for providing some support to
algorithm 2.
Algorithm 3 FindkNNsIn(q,A, I)
1: set Nc= 0, F=Open(A), kNN = I
2: while !(EOF(F)) do
3: // Load the next cluster page
C=LoadNextPage(F)
4: //Merge kNN list with current page
Xcand = C Ụ kNN
5: //Find the kNNs within the candidate list
kNN[] ←FindkNN(q,Xcand)
6: //Update number of elements scanned
Nc= Nc+ |C|
7: end while
8: return Nc, kNN
Fig. 4–Algorithm for some intermediary functionality
As seen in fig. 4, it is evident that it performs some
functionality and returns intermediary results which are used
by its caller. Its caller is algorithm 2 that is meant for k-NN
search which returns k-Nearest Neighbors that satisfy the
given query.
4. RESULTS
We built a prototype application to demonstrate the proof of
concept of the paper. We have used many real time data sets to
apply the index structure proposed in [18]. The datasets used
in experiments include SENSORS, HISTOGRAM, AERIAL,
CORTINA and BIO-RETINA. The details of dimensions,
number of vectors and size in terms of pages of these data sets
are presented in table 1.
Table 1- Dataset details
As can be seen in table 1, the dimensionality of data sets and
the number of vectors involved in each dataset are given. The
environment used for developing prototype application
includes a PC with 4GB RAM, Core 2 Dual processor running
Windows XP operating system. Java platform is used to build
the application. NetBeans is used as an IDE (Integrated
Development Environment). Experiments are made in terms
of number of sequential pages scanned, number of random IO
operations, with KNNs and various datasets.
Fig. 5–IO Performance of Distance Bounds (BIO-RETINA
dataset)
As can be seen in fig.5, the horizontal axis represents number
of random IOs while the vertical axis represents number of
sequential pages. As shown in results hyperlane bounds
outperform the other bounds.
Fig. 6 – IO Performance of Distance Bounds (AERIAL
dataset)
0
500
1000
1500
2000
2500
3000
3500
1 2 3 4 5 6 7 8
NumberofSequential
Pages
No.Of Random IOS (log scale)
MBR
Sphere
Bound
Hyperplane(f
wd.comp.)
Hyperplane(f
ull)
0
2000
4000
6000
8000
1 2 3 4 5 6 7 8 9
Numberof
SequentialPages
No.of Random IOS(log scale)
MBR
Sphere
Bound
Hyperplane(
fwd.comp.)
Hyperplane(
full)
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 169
As can be seen in fig.6, the horizontal axis represents number
of random IOs while the vertical axis represents number of
sequential pages. As shown in results hyperlane bounds
outperform the other bounds.
Fig. 7 – IO Performance of Distance Bounds (SENSORS
dataset)
As can be seen in fig. 7, the horizontal axis represents number
of random IOs while the vertical axis represents number of
sequential pages. As shown in results hyperlane bounds
outperform the other bounds.
Fig. 8 – IO Performance of Distance Bounds (SENSORS
dataset)
As can be seen in fig. 8, the horizontal axis represents number
of random IOs while the vertical axis represents number of
sequential pages. As shown in results hyperlane bounds
outperform the other bounds.
CONCLUSIONS
Dealing with real multi-dimensional datasets that have
correlations distributed in non-uniform fashion is not easy.
Indexing such data is a challenging task. The existing indexing
methods used VA-File through scalar quantization. However,
this approach is proved to be suboptimal. Therefore, in this
paper, we implement a new approach proposed by
Ramaswamy and Rose [18] which groups datasets into
Voronoi clusters. It is achieved by using similarity measures
such as Mahalanobis and Euclidean. This has resulted in the
reduction of IOs. We also built a prototype application which
demonstrates the utility of the proposed approach. We made
experiments using various multi-dimensional data sets. The
empirical results revealed that the indexing approach followed
in the prototype is effective and can be used in the real world
applications.
REFERENCES
[1] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft,
“When is“nearest neighbor” meaningful?” in ICDT, 1999, pp.
217–235.
[2] C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the
surprisingbehavior of distance metrics in high dimensional
spaces,” in ICDT, 2001,pp. 420–434.
[3] B. U. Pagel, F. Korn, and C. Faloutsos, “Deflating the
dimensionalitycurse using multiple fractal dimensions,” in
ICDE, 2000, pp. 589–598.
[4] T. Huang and X. S. Zhou, “Image retrieval with relevance
feedback:From heuristic weight adjustment to optimal
learning methods,” in ICIP,vol. 3, 2001, pp. 2–5.
[5] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon,
“Information-theoreticmetric learning,” in ICML, 2007, pp.
209–216.
[6] A. Guttman, “R-trees: A dynamic index structure for
spatial searching.”in SIGMOD, 1984, pp. 47–57.
[7] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger,
“The R*-tree:an efficient and robust access method for points
and rectangles,” inSIGMOD, 1990, pp. 322–331.
[8] D. A. White and R. Jain, “Similarity indexing with the SS-
tree,” inICDE, 1996, pp. 516–523.
[9] N. Katayama and S. Satoh, “The SR-tree: An index
structure for highdimensionalnearest neighbor queries.” in
SIGMOD, May 1997, pp. 369–380.
[10] R. Weber, H. Schek, and S. Blott, “A quantitative
analysis and performancestudy for similarity-search methods
in high-dimensional spaces.”in VLDB, August 1998, pp. 194–
205.
[11] R. Bellman, Adaptive Control Processes: A Guided Tour.
Princeton,NJ: Princeton University Press, 1961.
[12] K. Chakrabarti and S. Mehrotra, “Local dimensionality
reduction: A newapproach to indexing high dimensional
spaces.” in VLDB, September2000, pp. 89–100.
[13] S. Berchtold, C. Bohm, H. V. Jagadish, H. P. Kriegel, and
J. Sander,“Independent Quantization: An index compression
technique for highdimensionaldata spaces,” in ICDE, 2000,
pp. 577–588.
[14] Y. Sakurai, M. Yoshikawa, S. Uemura, and H. Kojima,
“The A-tree: Anindex structure for high-dimensional spaces
using relative approximation,”in VLDB, September 2000, pp.
516–526.
0
200
400
600
800
1000
1200
1400
1 2 3 4
NumberofSequentialPages
No.of Random IOS(log scale)
MBR
Sphere Bound
Hyperplane(fwd.c
omp.)
Hyperplane(full)
0
0.5
1
1.5
2
2.5
1 2 3 4 5 6 7
NumberofSequential
Pages
No.of Random IOS(log scale)
10NN
20NN
50NN
IJRET: International Journal of Research in Engineering and Technology
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @
[15] S. Berchtold, C. Bohm, and H. Kriegel, “The Pyramid
technique:Towards breaking the curse of dimensionality,” in
SIGMOD, 1998, pp.142–153.
[16] C. Yu, B. C. Ooi, K. L. Tan, and H. V. Jagadish,
“Indexing the distance:An efficient method to knn
processing.” in VLDB, September 2001, pp.421
[17] N. Koudas, B. C. Ooi, H. T. Shen, and A. K. H. Tung,
“LDC: Enablingsearch by partial distance in a hyper
dimensional space.” in ICDE, 2004,pp. 6–17.
[18] Sharadh Ramaswamyand Kenneth Rose
Cluster Distance Bounding for High Dimensional Indexing”.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA
ENGINEERING, VOL.23, NO. 6, June 2011. P1
BIOGRAPHIES:
Aretty Narayana has completed B.Tech (E.C.E)
from Srinivas Reddy Institute of Technology and
pursuing M.Tech (C.S.E) in DRK College of
Engineering and Technology, JNTUH,
Hyderabad, Andhra Pradesh, India. His main
research interest includes Data Mining, Database and DWH.
M. Srinivasa Rao is working as an Associate
Professor in DRK College of Engineering and
Technology, JNTUH, Hyderabad, Andhra
Pradesh, India. He is pursuing Ph.D in
Information Security. He has completed M.Tech
(C.S.E) from JNTUH. His main research
interest includes Information Security and Computer Ad
Networks.
Dr. R. V. Krishnaiah, did M.Tech (EIE) from
NIT Waranagal, MTech (CSE) form JNTU
Ph.D, from JNTU Ananthapur, He has
memberships in professional bodies MIE,
MIETE, MISTE. His main research
include Image Processing, Security systems, Sensors,
Intelligent Systems, Computer networks, Data mining,
Software Engineering, network protection and security
control. He has published many papers and Editorial Member
and Reviewer for some national and international journals.
IJRET: International Journal of Research in Engineering and Technology eISSN:
__________________________________________________________________________________________
2013, Available @ http://www.ijret.org
] S. Berchtold, C. Bohm, and H. Kriegel, “The Pyramid-
que:Towards breaking the curse of dimensionality,” in
] C. Yu, B. C. Ooi, K. L. Tan, and H. V. Jagadish,
“Indexing the distance:An efficient method to knn
, September 2001, pp.421–430.
Ooi, H. T. Shen, and A. K. H. Tung,
“LDC: Enablingsearch by partial distance in a hyper-
17.
] Sharadh Ramaswamyand Kenneth Rose, “Adaptive
Cluster Distance Bounding for High Dimensional Indexing”.
S ON KNOWLEDGE AND DATA
ENGINEERING, VOL.23, NO. 6, June 2011. P1-15.
has completed B.Tech (E.C.E)
from Srinivas Reddy Institute of Technology and
pursuing M.Tech (C.S.E) in DRK College of
Engineering and Technology, JNTUH,
Hyderabad, Andhra Pradesh, India. His main
research interest includes Data Mining, Database and DWH.
is working as an Associate
Professor in DRK College of Engineering and
Technology, JNTUH, Hyderabad, Andhra
Pradesh, India. He is pursuing Ph.D in
Information Security. He has completed M.Tech
(C.S.E) from JNTUH. His main research
st includes Information Security and Computer Ad-Hoc
, did M.Tech (EIE) from
CSE) form JNTU,
, from JNTU Ananthapur, He has
memberships in professional bodies MIE,
MIETE, MISTE. His main research interests
include Image Processing, Security systems, Sensors,
Intelligent Systems, Computer networks, Data mining,
Software Engineering, network protection and security
control. He has published many papers and Editorial Member
nal and international journals.
eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
170

More Related Content

What's hot

A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataAlexander Decker
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...IJECEIAES
 
Fault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms andFault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms andeSAT Publishing House
 
Fault diagnosis using genetic algorithms and principal curves
Fault diagnosis using genetic algorithms and principal curvesFault diagnosis using genetic algorithms and principal curves
Fault diagnosis using genetic algorithms and principal curveseSAT Journals
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
 
Privacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transfPrivacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transfIAEME Publication
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingIRJET Journal
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsCSCJournals
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1bPRAWEEN KUMAR
 
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...TELKOMNIKA JOURNAL
 
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...ijcsit
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving dataiaemedu
 

What's hot (18)

50120130406022
5012013040602250120130406022
50120130406022
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 
Fault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms andFault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms and
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
 
Fault diagnosis using genetic algorithms and principal curves
Fault diagnosis using genetic algorithms and principal curvesFault diagnosis using genetic algorithms and principal curves
Fault diagnosis using genetic algorithms and principal curves
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clustering
 
Az36311316
Az36311316Az36311316
Az36311316
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
 
A0360109
A0360109A0360109
A0360109
 
Privacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transfPrivacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transf
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...
 
20320140501002 2
20320140501002 220320140501002 2
20320140501002 2
 
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE...
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving data
 

Viewers also liked

Websites using touchless interaction having graphical
Websites using touchless interaction having graphicalWebsites using touchless interaction having graphical
Websites using touchless interaction having graphicaleSAT Publishing House
 
Design and analysis of low power pseudo differential
Design and analysis of low power pseudo differentialDesign and analysis of low power pseudo differential
Design and analysis of low power pseudo differentialeSAT Publishing House
 
Advanced control systems in two wheeler and finding the collision site of the...
Advanced control systems in two wheeler and finding the collision site of the...Advanced control systems in two wheeler and finding the collision site of the...
Advanced control systems in two wheeler and finding the collision site of the...eSAT Publishing House
 
Devising a new technique to reduce highway barrier accidents
Devising a new technique to reduce highway barrier accidentsDevising a new technique to reduce highway barrier accidents
Devising a new technique to reduce highway barrier accidentseSAT Publishing House
 
Detect and immune mobile cloud infrastructure
Detect and immune mobile cloud infrastructureDetect and immune mobile cloud infrastructure
Detect and immune mobile cloud infrastructureeSAT Publishing House
 
Multistage interconnection networks a transition to optical
Multistage interconnection networks a transition to opticalMultistage interconnection networks a transition to optical
Multistage interconnection networks a transition to opticaleSAT Publishing House
 
Power house automation using wireless communication
Power house automation using wireless communicationPower house automation using wireless communication
Power house automation using wireless communicationeSAT Publishing House
 
Flow balanced routing in wireless sensor networks
Flow balanced routing in wireless sensor networksFlow balanced routing in wireless sensor networks
Flow balanced routing in wireless sensor networkseSAT Publishing House
 
Desinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based onDesinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based oneSAT Publishing House
 
Evaluvation of noise level and its adverse effect in metal die manufacuturing...
Evaluvation of noise level and its adverse effect in metal die manufacuturing...Evaluvation of noise level and its adverse effect in metal die manufacuturing...
Evaluvation of noise level and its adverse effect in metal die manufacuturing...eSAT Publishing House
 
Enhancing the capability of supply chain by
Enhancing the capability of supply chain byEnhancing the capability of supply chain by
Enhancing the capability of supply chain byeSAT Publishing House
 
Data detection with a progressive parallel ici canceller in mimo ofdm
Data detection with a progressive parallel ici canceller in mimo ofdmData detection with a progressive parallel ici canceller in mimo ofdm
Data detection with a progressive parallel ici canceller in mimo ofdmeSAT Publishing House
 
Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...
Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...
Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...eSAT Publishing House
 
Indoor localization in sensor network with
Indoor localization in sensor network withIndoor localization in sensor network with
Indoor localization in sensor network witheSAT Publishing House
 
Pygmy deposit system through android cloud
Pygmy deposit system through android cloudPygmy deposit system through android cloud
Pygmy deposit system through android cloudeSAT Publishing House
 
A survey on mobility models for vehicular ad hoc networks
A survey on mobility models for vehicular ad hoc networksA survey on mobility models for vehicular ad hoc networks
A survey on mobility models for vehicular ad hoc networkseSAT Publishing House
 
Atenolol epoxide reaction data analysis in batch reactor
Atenolol epoxide reaction data analysis in batch reactorAtenolol epoxide reaction data analysis in batch reactor
Atenolol epoxide reaction data analysis in batch reactoreSAT Publishing House
 
Comparative study on performance of coagulants in
Comparative study on performance of coagulants inComparative study on performance of coagulants in
Comparative study on performance of coagulants ineSAT Publishing House
 

Viewers also liked (20)

Websites using touchless interaction having graphical
Websites using touchless interaction having graphicalWebsites using touchless interaction having graphical
Websites using touchless interaction having graphical
 
Design and analysis of low power pseudo differential
Design and analysis of low power pseudo differentialDesign and analysis of low power pseudo differential
Design and analysis of low power pseudo differential
 
Advanced control systems in two wheeler and finding the collision site of the...
Advanced control systems in two wheeler and finding the collision site of the...Advanced control systems in two wheeler and finding the collision site of the...
Advanced control systems in two wheeler and finding the collision site of the...
 
Devising a new technique to reduce highway barrier accidents
Devising a new technique to reduce highway barrier accidentsDevising a new technique to reduce highway barrier accidents
Devising a new technique to reduce highway barrier accidents
 
Scrum an agile process
Scrum an agile processScrum an agile process
Scrum an agile process
 
Detect and immune mobile cloud infrastructure
Detect and immune mobile cloud infrastructureDetect and immune mobile cloud infrastructure
Detect and immune mobile cloud infrastructure
 
Multistage interconnection networks a transition to optical
Multistage interconnection networks a transition to opticalMultistage interconnection networks a transition to optical
Multistage interconnection networks a transition to optical
 
Power house automation using wireless communication
Power house automation using wireless communicationPower house automation using wireless communication
Power house automation using wireless communication
 
Flow balanced routing in wireless sensor networks
Flow balanced routing in wireless sensor networksFlow balanced routing in wireless sensor networks
Flow balanced routing in wireless sensor networks
 
Desinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based onDesinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based on
 
Evaluvation of noise level and its adverse effect in metal die manufacuturing...
Evaluvation of noise level and its adverse effect in metal die manufacuturing...Evaluvation of noise level and its adverse effect in metal die manufacuturing...
Evaluvation of noise level and its adverse effect in metal die manufacuturing...
 
Enhancing the capability of supply chain by
Enhancing the capability of supply chain byEnhancing the capability of supply chain by
Enhancing the capability of supply chain by
 
Data detection with a progressive parallel ici canceller in mimo ofdm
Data detection with a progressive parallel ici canceller in mimo ofdmData detection with a progressive parallel ici canceller in mimo ofdm
Data detection with a progressive parallel ici canceller in mimo ofdm
 
Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...
Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...
Analysis of fuel consumption and oxides of nitrogen using oxygen enriched air...
 
Indoor localization in sensor network with
Indoor localization in sensor network withIndoor localization in sensor network with
Indoor localization in sensor network with
 
Pygmy deposit system through android cloud
Pygmy deposit system through android cloudPygmy deposit system through android cloud
Pygmy deposit system through android cloud
 
A survey on mobility models for vehicular ad hoc networks
A survey on mobility models for vehicular ad hoc networksA survey on mobility models for vehicular ad hoc networks
A survey on mobility models for vehicular ad hoc networks
 
Atenolol epoxide reaction data analysis in batch reactor
Atenolol epoxide reaction data analysis in batch reactorAtenolol epoxide reaction data analysis in batch reactor
Atenolol epoxide reaction data analysis in batch reactor
 
Design of green data center
Design of green data centerDesign of green data center
Design of green data center
 
Comparative study on performance of coagulants in
Comparative study on performance of coagulants inComparative study on performance of coagulants in
Comparative study on performance of coagulants in
 

Similar to Scalable and efficient cluster based framework for

A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningijitcs
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...IOSR Journals
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using dataeSAT Publishing House
 
Variance rover system
Variance rover systemVariance rover system
Variance rover systemeSAT Journals
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A ReviewIOSRjournaljce
 
CLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATACLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATAcsandit
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Alexander Decker
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...IOSR Journals
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataeSAT Journals
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONEFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONAM Publications,India
 
Analysis and implementation of modified k medoids
Analysis and implementation of modified k medoidsAnalysis and implementation of modified k medoids
Analysis and implementation of modified k medoidseSAT Publishing House
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...IRJET Journal
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporaryprjpublications
 

Similar to Scalable and efficient cluster based framework for (20)

Noura2
Noura2Noura2
Noura2
 
A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learning
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A Review
 
CLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATACLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATA
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...Analysis of mass based and density based clustering techniques on numerical d...
Analysis of mass based and density based clustering techniques on numerical d...
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONEFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
 
Analysis and implementation of modified k medoids
Analysis and implementation of modified k medoidsAnalysis and implementation of modified k medoids
Analysis and implementation of modified k medoids
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporary
 

More from eSAT Publishing House

Likely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnamLikely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnameSAT Publishing House
 
Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...eSAT Publishing House
 
Hudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnamHudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnameSAT Publishing House
 
Groundwater investigation using geophysical methods a case study of pydibhim...
Groundwater investigation using geophysical methods  a case study of pydibhim...Groundwater investigation using geophysical methods  a case study of pydibhim...
Groundwater investigation using geophysical methods a case study of pydibhim...eSAT Publishing House
 
Flood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaFlood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaeSAT Publishing House
 
Enhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingEnhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingeSAT Publishing House
 
Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...eSAT Publishing House
 
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...eSAT Publishing House
 
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...eSAT Publishing House
 
Shear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a reviewShear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a revieweSAT Publishing House
 
Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...eSAT Publishing House
 
Risk analysis and environmental hazard management
Risk analysis and environmental hazard managementRisk analysis and environmental hazard management
Risk analysis and environmental hazard managementeSAT Publishing House
 
Review study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallsReview study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallseSAT Publishing House
 
Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...eSAT Publishing House
 
Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...eSAT Publishing House
 
Coastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaCoastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaeSAT Publishing House
 
Can fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structuresCan fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structureseSAT Publishing House
 
Assessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingsAssessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingseSAT Publishing House
 
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...eSAT Publishing House
 
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...eSAT Publishing House
 

More from eSAT Publishing House (20)

Likely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnamLikely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnam
 
Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...
 
Hudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnamHudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnam
 
Groundwater investigation using geophysical methods a case study of pydibhim...
Groundwater investigation using geophysical methods  a case study of pydibhim...Groundwater investigation using geophysical methods  a case study of pydibhim...
Groundwater investigation using geophysical methods a case study of pydibhim...
 
Flood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaFlood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, india
 
Enhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingEnhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity building
 
Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...
 
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
 
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
 
Shear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a reviewShear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a review
 
Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...
 
Risk analysis and environmental hazard management
Risk analysis and environmental hazard managementRisk analysis and environmental hazard management
Risk analysis and environmental hazard management
 
Review study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallsReview study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear walls
 
Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...
 
Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...
 
Coastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaCoastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of india
 
Can fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structuresCan fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structures
 
Assessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingsAssessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildings
 
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
 
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
 

Recently uploaded

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 

Recently uploaded (20)

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 

Scalable and efficient cluster based framework for

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 166 SCALABLE AND EFFICIENT CLUSTER-BASED FRAMEWORK FOR MULTIDIMENSIONAL INDEXING Aretty Narayana1 , M. Srinivasa Rao2 , R.V.Krishnaiah3 1 M.Tech Student, 2 Associate Professor, 3 Principal, Dept. of CSE, DRKCET, Hyderabad, AP, India arettynarayan4u@gmail.com, msrao@drkgroup.org, r.v.krishnaiah@gmail.com Abstract Indexing high dimensional data has its utility in many real world applications. Especially the information retrieval process is dramatically improved. The existing techniques could overcome the problem of “Curse of Dimensionality” of high dimensional data sets by using a technique known as Vector Approximation-File which resulted in sub-optimal performance. When compared with VA- File clustering results in more compact data set as it uses inter-dimensional correlations. However, pruning of unwanted clusters is important. The existing pruning techniques are based on bounding rectangles, bounding hyper spheres have problems in NN search. To overcome this problem Ramaswamy and Rose proposed an approach known as adaptive cluster distance bounding for high dimensional indexing which also includes an efficient spatial filtering. In this paper we implement this high-dimensional indexing approach. We built a prototype application to for proof of concept. Experimental results are encouraging and the prototype can be used in real time applications. Index Terms–Clustering, high dimensional indexing, similarity measures, and multimedia databases ---------------------------------------------------------------------***------------------------------------------------------------------------- 1. INTRODUCTION Data mining has been around for many years for programmatically analyzing huge amount of historical data. Clustering is one of the data mining techniques which is widely used to group similar objects into a clusters. However, clustering is not easy with high-dimensional data which is being produced in many domains such as digital multimedia, CAD/CAM systems, Geographical Information Systems (GIS), stock markets, and medical imaging. The high dimensional data is very huge ranging from 100 GB to 100 TB or more. Enterprises have to deal with such huge data. Such data is accessed using NN queries and spatial queries. Many researches came into existence on queries high dimensional data [1], [2]. These techniques use similarity measures like Euclidean Distance while making clustering. In [3] search performance has been improved using R-tree like structures. All the existing techniques expect the data sets to have uniform distribution of correlations in order to overcome the problem of “curse of dimensionality”. As the nearest and distant neighbors are not distinguishable, indexing such datasets is not possible. The real world data sets on the other hand have non-uniform distribution of correlations. Therefore these data sets can be indexed really using techniques such as Euclidean Distance. The usage of ED is a significant research activity which is used in many applications including content – based image retrieval [4], [5]. In this paper we focused on the real world datasets which are high-dimensional in nature for indexing. 2. PRIOR WORKS In case of low dimensional data techniques such as hyper- rectangles [6], [7], hyper spheres [8] or combination of both [9] were used for NN searches and effective data retrieval. Manyresearches were found in the literature to deal with high- dimensional data. Multi-dimensional indexing always outperforms low dimensional structures as they can access data quickly when compared with sequential scan. However, Weber et al. [10] proved that when dimensionality is more than 10, sequential scan is better than using indexing. The degradation of performance is due to the “curse of dimensionality” concept proposed in [11]. VA-files concept became popular to overcome thisproblem. The VA-File technique divides space into many hyper-rectangular cells and the approximation file holds strings pertaining to encoded bits that represent non-empty cell locations. The vector approximation ultimately leads to scalar quantization which overcomes curse of dimensionality. LDR [12] uses non-linear approximation in order to perform sequential scan. It is achieved by using dimensionality reduction and also clustering. There are some hybrid methods such as IQ-Tree [13] and A-Tree [14] that makes use of both VA and tree based index. Other methods like Pyramid Tree [15], iDistnce [16] and LDC [17] reduce transformations based on local dimensionality. A reference point is used to evaluate partitions made on dataset. Each partition has feature vectors that are indexed using their centroid-distance. When queries are made the spheres increase gradually until the cluster sphere is intersected. For quality reasons, the query processing
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 167 identifies centroid – distances. The radius for search is adjusted in such a way that the NN query returns exact results. Another experiment is made using approximation layer in LDC [17] by creating a box identification code which represents resident points. Once set of candidates are identified, then the B+ tree is used to filter out unwanted points. The resultant elements are finally retrieved in the NN result. Care is taken to control search space in order to reduce disk IO and also ensure that accurate results are retrieved by NN queries. 3. CLUSTER DISTANCE BOUNDING Our procedure to measure distances to clusters is based on the cluster distance bounding proposed by Ramaswamy and Rose [18]. The architectural overview of the frameworkproposed by them is as shown in fig. 1. Fig. 1 –Architecture of Proposed Index Structure (excerpt from [18]) As seen in fig. 1the index structure contains array of cluster centroids which point to the cluster locations. The whole structure is like an index in book which will improve the query processing speed. In the same fashion, the NN queries and other queries on high-dimensional data can be processed faster. The data objects are grouped into number of clusters namely Voronoi clusters. The cluster-based indexing makes it more flexible and adaptable thus making the whole system scalable. This architecture makes the query retrieval very efficient and faster than other existing clustering algorithms and related indexes. Their algorithms are used to implement the prototype which demonstrates the high dimensional indexing process. The first algorithm we used in the prototype application is to make voronoi clusters. The algorithms are presented in fig. 2, fig. 3 and fig. 4. More information about the algorithms and notations used inthe algorithms can be found in [18]. It is as shown in fig. 1 Algorithm 1 VORNOI-CLUSTERS(X, K) 1: // Generic clustering algorithm returns // K cluster centroids {cm}K =1 ---- GenericCluster(X,K) 2: set l=0, X1= Ø, X2= Ø…… XK= Ø 3: while l< |X| do 4: l=l+1 5: //Find the centroid nearest to data element X1 K=arg min d(Xl , Cm) 6: //Move Xl to the corresponding Voronoi partition XK = XK Ụ {Xl} 7: end while 8: return {Xm K =1}, {cm}K =1 Fig. 2–Algorithm for Voronoi Clusters (excerpt from [18]) As can be seen in fig. 2, the algorithm takes a dataset and K value as input and generates and returns required Voronoi clusters. Initially a generic clustering algorithm like K-means is used to get cluster centroids. The algorithm further processes and finally returns Voronoi clusters. In fig. 3 algorithm 2 is presented. It is meant for making kNN search. Algorithm 2 KNN-SEARCH(q) 1: //Initialize set FLAG=0, count = 0,N = 0, kNN = φ 2: //Evaluate query-cluster distance bounds dLB[] ←HyperplaneBound(q) 3: //Sort the query-cluster distance bounds in ascending //order {dsort LB[], o[]} ←SortArray(dLB,’ascend’) 4: while FLAG==0 do 5: count = count + 1 6: //Find the kNNs upto current cluster {Nc, kNN} ←FindkNNsIn(q,Xo[count], kNN) 7: //Update number of elements scanned N = N + Nc 8: //Find the kNN radius dkNN=Farthest(q, kNN) 9: if count < K then 10: if N >k then 11: if dkNN< dsort LB [count + 1] then 12: set FLAG=1 //kNNs found, search ends
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 168 13: end if 14: end if 15: else 16: set FLAG=1 //all clusters scanned, search ends 17: end if 18: end while 19: return kNN Fig. 3 – kNN Search Algorithm As seen in fig. 3, it is evident that the kNN algorithm is meant for performing k-Nearest Neighbor Search. It takes a query as input and returns k-Nearest Neighbors as possible search results. Algorithm 3 is meant for providing some support to algorithm 2. Algorithm 3 FindkNNsIn(q,A, I) 1: set Nc= 0, F=Open(A), kNN = I 2: while !(EOF(F)) do 3: // Load the next cluster page C=LoadNextPage(F) 4: //Merge kNN list with current page Xcand = C Ụ kNN 5: //Find the kNNs within the candidate list kNN[] ←FindkNN(q,Xcand) 6: //Update number of elements scanned Nc= Nc+ |C| 7: end while 8: return Nc, kNN Fig. 4–Algorithm for some intermediary functionality As seen in fig. 4, it is evident that it performs some functionality and returns intermediary results which are used by its caller. Its caller is algorithm 2 that is meant for k-NN search which returns k-Nearest Neighbors that satisfy the given query. 4. RESULTS We built a prototype application to demonstrate the proof of concept of the paper. We have used many real time data sets to apply the index structure proposed in [18]. The datasets used in experiments include SENSORS, HISTOGRAM, AERIAL, CORTINA and BIO-RETINA. The details of dimensions, number of vectors and size in terms of pages of these data sets are presented in table 1. Table 1- Dataset details As can be seen in table 1, the dimensionality of data sets and the number of vectors involved in each dataset are given. The environment used for developing prototype application includes a PC with 4GB RAM, Core 2 Dual processor running Windows XP operating system. Java platform is used to build the application. NetBeans is used as an IDE (Integrated Development Environment). Experiments are made in terms of number of sequential pages scanned, number of random IO operations, with KNNs and various datasets. Fig. 5–IO Performance of Distance Bounds (BIO-RETINA dataset) As can be seen in fig.5, the horizontal axis represents number of random IOs while the vertical axis represents number of sequential pages. As shown in results hyperlane bounds outperform the other bounds. Fig. 6 – IO Performance of Distance Bounds (AERIAL dataset) 0 500 1000 1500 2000 2500 3000 3500 1 2 3 4 5 6 7 8 NumberofSequential Pages No.Of Random IOS (log scale) MBR Sphere Bound Hyperplane(f wd.comp.) Hyperplane(f ull) 0 2000 4000 6000 8000 1 2 3 4 5 6 7 8 9 Numberof SequentialPages No.of Random IOS(log scale) MBR Sphere Bound Hyperplane( fwd.comp.) Hyperplane( full)
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 169 As can be seen in fig.6, the horizontal axis represents number of random IOs while the vertical axis represents number of sequential pages. As shown in results hyperlane bounds outperform the other bounds. Fig. 7 – IO Performance of Distance Bounds (SENSORS dataset) As can be seen in fig. 7, the horizontal axis represents number of random IOs while the vertical axis represents number of sequential pages. As shown in results hyperlane bounds outperform the other bounds. Fig. 8 – IO Performance of Distance Bounds (SENSORS dataset) As can be seen in fig. 8, the horizontal axis represents number of random IOs while the vertical axis represents number of sequential pages. As shown in results hyperlane bounds outperform the other bounds. CONCLUSIONS Dealing with real multi-dimensional datasets that have correlations distributed in non-uniform fashion is not easy. Indexing such data is a challenging task. The existing indexing methods used VA-File through scalar quantization. However, this approach is proved to be suboptimal. Therefore, in this paper, we implement a new approach proposed by Ramaswamy and Rose [18] which groups datasets into Voronoi clusters. It is achieved by using similarity measures such as Mahalanobis and Euclidean. This has resulted in the reduction of IOs. We also built a prototype application which demonstrates the utility of the proposed approach. We made experiments using various multi-dimensional data sets. The empirical results revealed that the indexing approach followed in the prototype is effective and can be used in the real world applications. REFERENCES [1] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is“nearest neighbor” meaningful?” in ICDT, 1999, pp. 217–235. [2] C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the surprisingbehavior of distance metrics in high dimensional spaces,” in ICDT, 2001,pp. 420–434. [3] B. U. Pagel, F. Korn, and C. Faloutsos, “Deflating the dimensionalitycurse using multiple fractal dimensions,” in ICDE, 2000, pp. 589–598. [4] T. Huang and X. S. Zhou, “Image retrieval with relevance feedback:From heuristic weight adjustment to optimal learning methods,” in ICIP,vol. 3, 2001, pp. 2–5. [5] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon, “Information-theoreticmetric learning,” in ICML, 2007, pp. 209–216. [6] A. Guttman, “R-trees: A dynamic index structure for spatial searching.”in SIGMOD, 1984, pp. 47–57. [7] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “The R*-tree:an efficient and robust access method for points and rectangles,” inSIGMOD, 1990, pp. 322–331. [8] D. A. White and R. Jain, “Similarity indexing with the SS- tree,” inICDE, 1996, pp. 516–523. [9] N. Katayama and S. Satoh, “The SR-tree: An index structure for highdimensionalnearest neighbor queries.” in SIGMOD, May 1997, pp. 369–380. [10] R. Weber, H. Schek, and S. Blott, “A quantitative analysis and performancestudy for similarity-search methods in high-dimensional spaces.”in VLDB, August 1998, pp. 194– 205. [11] R. Bellman, Adaptive Control Processes: A Guided Tour. Princeton,NJ: Princeton University Press, 1961. [12] K. Chakrabarti and S. Mehrotra, “Local dimensionality reduction: A newapproach to indexing high dimensional spaces.” in VLDB, September2000, pp. 89–100. [13] S. Berchtold, C. Bohm, H. V. Jagadish, H. P. Kriegel, and J. Sander,“Independent Quantization: An index compression technique for highdimensionaldata spaces,” in ICDE, 2000, pp. 577–588. [14] Y. Sakurai, M. Yoshikawa, S. Uemura, and H. Kojima, “The A-tree: Anindex structure for high-dimensional spaces using relative approximation,”in VLDB, September 2000, pp. 516–526. 0 200 400 600 800 1000 1200 1400 1 2 3 4 NumberofSequentialPages No.of Random IOS(log scale) MBR Sphere Bound Hyperplane(fwd.c omp.) Hyperplane(full) 0 0.5 1 1.5 2 2.5 1 2 3 4 5 6 7 NumberofSequential Pages No.of Random IOS(log scale) 10NN 20NN 50NN
  • 5. IJRET: International Journal of Research in Engineering and Technology __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ [15] S. Berchtold, C. Bohm, and H. Kriegel, “The Pyramid technique:Towards breaking the curse of dimensionality,” in SIGMOD, 1998, pp.142–153. [16] C. Yu, B. C. Ooi, K. L. Tan, and H. V. Jagadish, “Indexing the distance:An efficient method to knn processing.” in VLDB, September 2001, pp.421 [17] N. Koudas, B. C. Ooi, H. T. Shen, and A. K. H. Tung, “LDC: Enablingsearch by partial distance in a hyper dimensional space.” in ICDE, 2004,pp. 6–17. [18] Sharadh Ramaswamyand Kenneth Rose Cluster Distance Bounding for High Dimensional Indexing”. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.23, NO. 6, June 2011. P1 BIOGRAPHIES: Aretty Narayana has completed B.Tech (E.C.E) from Srinivas Reddy Institute of Technology and pursuing M.Tech (C.S.E) in DRK College of Engineering and Technology, JNTUH, Hyderabad, Andhra Pradesh, India. His main research interest includes Data Mining, Database and DWH. M. Srinivasa Rao is working as an Associate Professor in DRK College of Engineering and Technology, JNTUH, Hyderabad, Andhra Pradesh, India. He is pursuing Ph.D in Information Security. He has completed M.Tech (C.S.E) from JNTUH. His main research interest includes Information Security and Computer Ad Networks. Dr. R. V. Krishnaiah, did M.Tech (EIE) from NIT Waranagal, MTech (CSE) form JNTU Ph.D, from JNTU Ananthapur, He has memberships in professional bodies MIE, MIETE, MISTE. His main research include Image Processing, Security systems, Sensors, Intelligent Systems, Computer networks, Data mining, Software Engineering, network protection and security control. He has published many papers and Editorial Member and Reviewer for some national and international journals. IJRET: International Journal of Research in Engineering and Technology eISSN: __________________________________________________________________________________________ 2013, Available @ http://www.ijret.org ] S. Berchtold, C. Bohm, and H. Kriegel, “The Pyramid- que:Towards breaking the curse of dimensionality,” in ] C. Yu, B. C. Ooi, K. L. Tan, and H. V. Jagadish, “Indexing the distance:An efficient method to knn , September 2001, pp.421–430. Ooi, H. T. Shen, and A. K. H. Tung, “LDC: Enablingsearch by partial distance in a hyper- 17. ] Sharadh Ramaswamyand Kenneth Rose, “Adaptive Cluster Distance Bounding for High Dimensional Indexing”. S ON KNOWLEDGE AND DATA ENGINEERING, VOL.23, NO. 6, June 2011. P1-15. has completed B.Tech (E.C.E) from Srinivas Reddy Institute of Technology and pursuing M.Tech (C.S.E) in DRK College of Engineering and Technology, JNTUH, Hyderabad, Andhra Pradesh, India. His main research interest includes Data Mining, Database and DWH. is working as an Associate Professor in DRK College of Engineering and Technology, JNTUH, Hyderabad, Andhra Pradesh, India. He is pursuing Ph.D in Information Security. He has completed M.Tech (C.S.E) from JNTUH. His main research st includes Information Security and Computer Ad-Hoc , did M.Tech (EIE) from CSE) form JNTU, , from JNTU Ananthapur, He has memberships in professional bodies MIE, MIETE, MISTE. His main research interests include Image Processing, Security systems, Sensors, Intelligent Systems, Computer networks, Data mining, Software Engineering, network protection and security control. He has published many papers and Editorial Member nal and international journals. eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ 170