SlideShare a Scribd company logo
1 of 3
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.75-77
ISSN: 2278-2419
75
Agglomerative Clustering Onvertically Partitioned
Data–Distributed Database Mining
R.Senkamalavalli1
,T.Bhuvaneswari2
1
Research Scholar,Department of Computer Science and Engg.,SCSVMV University, Enathur,Kanchipuram
2
Assistant Professor,Department of Computer Science and Applications,Queen Mary’s College (autonomous),
Mylapore, Chennai
Email: sengu_cool@yahoo.com, t_bhuvaneswari@yahoo.com
Abstract—Mining distributed databases is emerging as a
fundamental computational problem. A common approach for
mining distributed databases is to move all of the data from
each database to a central site and a single model is built.
Privacy concerns in many application domains prevents
sharing of data, which limits data mining technology to
identify patterns and trends from large amount of data.
Traditional data mining algorithms have been developed within
a centralized model. However, distributed knowledge
discovery has been proposed by many researchers as a solution
to privacy preserving data mining techniques. By vertically
partitioned data, each site contains some attributes of the
entities in the environment.In this paper, we present a method
for Agglomerative clustering algorithm in situations where
different sites contain different attributes for a common set of
entities for verticallypartitioned data. Using association rules
data are partitioned into vertically.
Keywords - Datamining; agglomerative clustering; distributed
data;association rule.
I. INTRODUCTION
Data Mining is the technique used by analysts to find out the
hidden and unknown pattern from the collection of data.
Although the organizations gather large volumes of data, it is
of no use if "knowledge" or "beneficial information" cannot be
inferred from it. Unlike the statistical methods the data mining
techniques extracts interesting information. The operations like
classification, clustering, association rule mining, etc. are used
for data mining purposes.The term data distribution means the
manner in which the data has been stored at the sites (DB
servers). Primarily there are two types of data distribution i)
Centralized Data and ii) Partitioned Data. In a centralized data
environment all data is stored at single site. While in
distributed environment all data is distributed among different
sites. Distributed data can further be divided ini) Horizontally
and ii) Vertically distributed environments (Fig.1). In
horizontal distribution the different sites stores the same
attributes for different sets of records. In vertical distribution
the sites stores different attributes for the same set of
records.By vertically partitioned, we mean that each site
contains some elements of a transaction. Using the traditional
online example, one site may contain book purchases, while
another has electronic purchases. Using a key such as credit
card number and date, we can join these to identify
relationships between purchases of books and electronic goods.
However, this discloses the individual purchases at each site,
possibly violating consumer privacy agreements.
Fig. 1. Classification of dataset
Clustering is the method by which like records are grouped
together. Usually this is done to give the end user a high level
view of what is going on in the database. Clustering is
sometimes used to mean segmentation. Technically it can be
defined as the task of grouping a set of objects in such a
manner that objects in same group (cluster) are more similar to
each other than to those in other groups (clusters). It is main
task carried out for machine learning, pattern recognition,
information retrieval, etc. Clustering can be partitioned in i)
Hierarchical ii) Partition Based iii) Density Based Clustering
(Fig.2). The hierarchy of clusters is usually viewed as a tree
where the smallest clusters merge together to create the next
highest level of clusters and those at that level merge together
to create the next highest level of clusters. There are two main
types of hierarchical clustering algorithms:
 Agglomerative - Agglomerative clustering techniques start
with as many clusters as there are records where each
cluster contains just one record. The clusters that are
nearest each other are merged together to form the next
largest cluster. This merging is continued until a hierarchy
of clusters is built with just a single cluster containing all
the records at the top of the hierarchy(Fig.3).
 Divisive - Divisive clustering techniques take the opposite
approach from agglomerative techniques. These
techniques start with all the records in one cluster and then
try to split that cluster into smaller pieces and then in turn
to try to split those smaller pieces.
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.75-77
ISSN: 2278-2419
76
Fig. 2. Categories of Clustering
Fig. 3: Concept of Clustering
II. RELATED WORK
Vaidya et.al [1] presents the k-means technique to preserve
privacy of vertically partitioned data.Hwanjo Yu et.al [2]
suggests an algorithm for privacy preservation for Support
Vector Machines(SVM) based classification using local and
global models. Local models are relevant to each participating
party that is not disclosed to others while generating global
model jointly. Global model remains the same for every party
which is then used for classifying new data objects.Liu et.al [3]
represents two protocols for privacy preserving clustering to
work upon horizontally and vertically partitioned data
separately.Inan et.al [4] suggest methods for constructing
dissimilarity matrices of objects from different sites in privacy
preserving manner. Krishna Prasadet.al [5], mentioned a
procedure for securely running BIRCH algorithm over
arbitrarily partitioned database. Secure protocols are mentioned
in it for distance metrics and procedure is suggested for using
these metrics in securely computing clusters. Pinkas[6]
represents various cryptographic techniques for privacy
preserving. Vaidya[7] presents various techniques of privacy
preserving for different procedures of data mining. An
algorithm is suggested for privacy preserving association rules.
A subroutine in this work suggests procedure for securely
finding the closest cluster in k-means clustering for privacy
preservation.Nishant[8] suggests scaling transformation on
centralized data to preserve privacy for clustering. K-means
clustering [9, 10] is a simple technique to group items into k
clusters. The basic idea behind k-means clustering is as
follows: Each item is placed in its closest cluster, and the
cluster centers are then adjusted based on the data placement.
This repeats until the positions stabilize.
III. AGGLOMERATIVE CLUSTERING ON
VERTICALLY PARTITIONED DATA
The proposed work is about vertically portioned data mining
using clustering technique. In this system, we consider the
heterogeneous database scenario considered a vertical
partitioning of the database between two parties A and
B(Fig.4). The association rule mining problem can be formally
stated as follows: Let I = i1,i2, …ip be a set of literals, called
items. Let D be a set of transactions, where each transaction T
is a set of items such that T I. Associated with each
transaction is a unique identifier, called its TID (Transaction
Identifier). We say that a transaction T contains X, a set of
some items in I, if X T. An association rule is an implication
of the form, X Y, where X  I, Y I, and XY =. The rule
XY holds in the transaction set D with confidence c if c% of
transactions in D that contain X also contain Y . The rule X 
Y has support s in the transaction set D if s% of transactions in
D contain X Y.In this clustering of the Databases will be
done so that the responsibility of finding a frequent n item set
can be distributed over clusters which will increase the
response time as well as decrease the number of messages need
be passed and thus avoiding the bottleneck around central
site.The vertically partitioned data are clustered into n clusters
using agglomerative clustering algorithm. Then these clusters
are separated into n databases. Finally, this information can be
stored in external data base for further usage.
Fig.4. Proposedsystem architecture
IV. AGGLOMERATIVE CLUSTERING ALGORITHM
Agglomerative hierarchical clustering is a bottom-up clustering
method where clusters have sub-clusters, which in turn have
sub-clusters, etc.A good clustering method will produce high
quality clusters with high intra-class similarity and low inter-
class similarity. The quality of a clustering result depends on
both the similarity measure used by the method and its
implementation. The quality of a clustering method is also
measured by its ability to discover some or all of the hidden
patternsAlgorithmic steps-Let X = {x1, x2, x3, ..., xn} be the
set of data points.
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 04 Issue: 02 December 2015 Page No.75-77
ISSN: 2278-2419
77
Step 1:Begin with the disjoint clustering having level L(0) = 0
and sequence number m = 0.
Step 2:Find the least distance pair of clusters in the current
clustering, say pair (r), (s), according to d[(r),(s)] = min
d[(i),(j)] where the minimum is over all pairs of clusters in the
current clustering.
Step 3:Increment the sequence number: m = m +1.Merge
clusters (r) and (s) into a single cluster to form the next
clustering m. Set the level of this clustering to L(m) =
d[(r),(s)].
Step 4:Update the distance matrix, D, by deleting the rows and
columns corresponding to clusters (r) and (s) and adding a row
and column corresponding to the newly formed cluster. The
distance between the new cluster, denoted (r,s) and old
cluster(k) is defined in this way: d[(k), (r,s)] = min (d[(k),(r)],
d[(k),(s)]).
Step 5:If all the data points are in one cluster then stop, else
repeat from step 2.
V. FINDINGS AND RESULTS
The workwas implemented in one way clustering and
efficiency of data obtained was not accurate. This proposed
method can be implemented in twoway clustering technique
which will give better results.
VI. CONCLUSION
This proposed architecture has been implemented in future for
further development using the data mining Toolbox under
WEGA Software.Clustering is an unsupervised learning
technique, and as such, there is no absolutely correct answer.
For this reason and depending on the particular application of
the clustering, fewer or greater numbers of clusters may be
desired. The future enhancement of this is to add global
caching as caching can be used since data in warehouse tend to
change a little over time. Techniques of clustering the
databases can be debated upon, since more efficient the
division of sites, more efficient will be association rules.
REFERENCES
[1] J. Vaidya and C. Clifton, “Privacy preserving K-means clustering over
vertically partitioned data”, Proceedings of the ninth ACMSIGKDD
international conference on Knowledge discovery and data mining
Washington, DC, pp. 206-215, august 24-272003.
[2] H. Yu, J.Vaidya and X. Jiang, “Privacy preserving SVM classification on
vertically partitioned data ", AdvinKnowledge Disc Data Min. vol. 3918,
pp. 647-656, 2006
[3] J. Liu, J. Luo, J. Z. Huang and L. Xiong, "Privacy preserving distributed
DBSCAN clustering", PAIS 2012, vol 6, pp.177-185.
[4] A. Inan, S. Kaya, Y.Saygin, E.Savas, A.Hintoglu and A. Levi , "Privacy
preserving clustering on horizontally partitioneddata", PDM, vol 63, pp.
646-666, 2006.
[5] P. Krishna Prasad and C. PanduRangan, "Privacy preserving BIRCH
algorithm for clustering over arbitrarily partitioned databases", pp. 146-
157, August 2007.
[6] B.Pinkas, "Cryptographic techniques for privacy preserving data
mining".Interntional Journal of Applied Cryptography (IJACT) 3(1): 21-
45 ,2013.
[7] J. S.Vaidya. “A thesis on privacy preserving data mining over Vertically
Partitioned Data". (Unpublished)
[8] P.KhatriNishant, G.PreetiandP. Tusal, “Privacy preserving clustering on
centralized data through scaling transfermation”.International journal of
computer engineering&technology (IJCET).vol4,Issue3,pp449-
454,2013.
[9] R. Duda and P. E. Hart. Pattern classification and scene analysis. Wiley,
New York.1973.
[10] K. Fukunaga. Introduction to statistical pattern recognition. Academic
Press, San Diego, CA, 1990.
[11] S. Paul, “An optimized distributed association rule mining algorithm in
parallel and distributed data mining With Xml data for improved
response time”, Int J Comp Sci Info Technol.vol. 2, pp. 2, April 2010.
[12] R. J. Gil-Garcia, J. M. Badia-Contelles, and A. Pons-Porrata. Extended
star clustering algorithm. Lecture Notes Comp Sci.vol. 2905, pp.480-487,
2003.
[13] G. Lance and W.Williams. A general theory of classificatory sorting
strategies. 1:Hierarchical systems. Comp J, vol. 9, pp.373-380,1967.
[14] B. Larsen and C. Aone. Fast and effective text mining using linear-time
document clustering. KDD vol. 99, pp. 16-22, 1999.
[15] A. Pons-Porrata, R. Berlanga-Llavori, and J. Ruiz-Shulcloper. On-line
event and topic detection by using thecompact sets clustering algorithm. J
Intelli Fuzzy Sys, vol. 3-4, pp. 185-194, 2002.
[16] K. Wagstaff and C. Cardie, “Clustering with instance-level constraints”,
Proceedings of the 17th International Conference on Machine Learning
(ICML 2000), Stanford, CA, pp. 1103-1110, 2000.
[17] I. Davidson and S. S. Ravi, “Clustering with constraints and the k-Means
algorithm”, the 5th SIAM Data Mining Conf. 2005.
[18] I. Davidson and S. S. Ravi, “Hierarchical clustering with constraints:
Theory and practice”, the 9th European Principles and Practice of KDD,
PKDD 2005.
[19] I. Davidson and S. S. Ravi, “Intractability and clustering with
constraints”, Proceedings of the 24th international conference on
Machine learning, 2007.

More Related Content

Similar to Agglomerative Clustering Onvertically Partitioned Data–Distributed Database Mining

Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Journals
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Publishing House
 
Bs31267274
Bs31267274Bs31267274
Bs31267274IJMER
 
Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis IntroductionPrasiddhaSarma
 
A Survey on Cluster Based Outlier Detection Techniques in Data Stream
A Survey on Cluster Based Outlier Detection Techniques in Data StreamA Survey on Cluster Based Outlier Detection Techniques in Data Stream
A Survey on Cluster Based Outlier Detection Techniques in Data StreamIIRindia
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957IJMER
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxSaiPragnaKancheti
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxSaiPragnaKancheti
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporaryprjpublications
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)Pratik Meshram
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
An Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data MiningAn Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data MiningGina Rizzo
 
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATION
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATIONMDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATION
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATIONcscpconf
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysisinventy
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
 
B colouring
B colouringB colouring
B colouringxs76250
 

Similar to Agglomerative Clustering Onvertically Partitioned Data–Distributed Database Mining (20)

Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
Bs31267274
Bs31267274Bs31267274
Bs31267274
 
Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis Introduction
 
A Survey on Cluster Based Outlier Detection Techniques in Data Stream
A Survey on Cluster Based Outlier Detection Techniques in Data StreamA Survey on Cluster Based Outlier Detection Techniques in Data Stream
A Survey on Cluster Based Outlier Detection Techniques in Data Stream
 
Az36311316
Az36311316Az36311316
Az36311316
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporary
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
I43055257
I43055257I43055257
I43055257
 
An Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data MiningAn Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data Mining
 
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATION
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATIONMDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATION
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATION
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysis
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 
B colouring
B colouringB colouring
B colouring
 

More from IIRindia

An Investigation into Brain Tumor Segmentation Techniques
An Investigation into Brain Tumor Segmentation Techniques An Investigation into Brain Tumor Segmentation Techniques
An Investigation into Brain Tumor Segmentation Techniques IIRindia
 
E-Agriculture - A Way to Digitalization
E-Agriculture - A Way to DigitalizationE-Agriculture - A Way to Digitalization
E-Agriculture - A Way to DigitalizationIIRindia
 
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...IIRindia
 
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...IIRindia
 
Silhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log AnalysisSilhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log AnalysisIIRindia
 
Analysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based SystemAnalysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based SystemIIRindia
 
A Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data MiningA Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data MiningIIRindia
 
Image Segmentation Based Survey on the Lung Cancer MRI Images
Image Segmentation Based Survey on the Lung Cancer MRI ImagesImage Segmentation Based Survey on the Lung Cancer MRI Images
Image Segmentation Based Survey on the Lung Cancer MRI ImagesIIRindia
 
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...IIRindia
 
Feature Based Underwater Fish Recognition Using SVM Classifier
Feature Based Underwater Fish Recognition Using SVM ClassifierFeature Based Underwater Fish Recognition Using SVM Classifier
Feature Based Underwater Fish Recognition Using SVM ClassifierIIRindia
 
A Survey on Educational Data Mining Techniques
A Survey on Educational Data Mining TechniquesA Survey on Educational Data Mining Techniques
A Survey on Educational Data Mining TechniquesIIRindia
 
V5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docxV5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docxIIRindia
 
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...IIRindia
 
A Clustering Based Collaborative and Pattern based Filtering approach for Big...
A Clustering Based Collaborative and Pattern based Filtering approach for Big...A Clustering Based Collaborative and Pattern based Filtering approach for Big...
A Clustering Based Collaborative and Pattern based Filtering approach for Big...IIRindia
 
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...IIRindia
 
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...IIRindia
 
A Review of Edge Detection Techniques for Image Segmentation
A Review of Edge Detection Techniques for Image SegmentationA Review of Edge Detection Techniques for Image Segmentation
A Review of Edge Detection Techniques for Image SegmentationIIRindia
 
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...IIRindia
 
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...IIRindia
 
Survey on Segmentation Techniques for Spinal Cord Images
Survey on Segmentation Techniques for Spinal Cord ImagesSurvey on Segmentation Techniques for Spinal Cord Images
Survey on Segmentation Techniques for Spinal Cord ImagesIIRindia
 

More from IIRindia (20)

An Investigation into Brain Tumor Segmentation Techniques
An Investigation into Brain Tumor Segmentation Techniques An Investigation into Brain Tumor Segmentation Techniques
An Investigation into Brain Tumor Segmentation Techniques
 
E-Agriculture - A Way to Digitalization
E-Agriculture - A Way to DigitalizationE-Agriculture - A Way to Digitalization
E-Agriculture - A Way to Digitalization
 
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
A Survey on the Analysis of Dissolved Oxygen Level in Water using Data Mining...
 
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...
 
Silhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log AnalysisSilhouette Threshold Based Text Clustering for Log Analysis
Silhouette Threshold Based Text Clustering for Log Analysis
 
Analysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based SystemAnalysis and Representation of Igbo Text Document for a Text-Based System
Analysis and Representation of Igbo Text Document for a Text-Based System
 
A Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data MiningA Survey on E-Learning System with Data Mining
A Survey on E-Learning System with Data Mining
 
Image Segmentation Based Survey on the Lung Cancer MRI Images
Image Segmentation Based Survey on the Lung Cancer MRI ImagesImage Segmentation Based Survey on the Lung Cancer MRI Images
Image Segmentation Based Survey on the Lung Cancer MRI Images
 
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
The Preface Layer for Auditing Sensual Interacts of Primary Distress Conceali...
 
Feature Based Underwater Fish Recognition Using SVM Classifier
Feature Based Underwater Fish Recognition Using SVM ClassifierFeature Based Underwater Fish Recognition Using SVM Classifier
Feature Based Underwater Fish Recognition Using SVM Classifier
 
A Survey on Educational Data Mining Techniques
A Survey on Educational Data Mining TechniquesA Survey on Educational Data Mining Techniques
A Survey on Educational Data Mining Techniques
 
V5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docxV5_I2_2016_Paper11.docx
V5_I2_2016_Paper11.docx
 
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
A Study on MRI Liver Image Segmentation using Fuzzy Connected and Watershed T...
 
A Clustering Based Collaborative and Pattern based Filtering approach for Big...
A Clustering Based Collaborative and Pattern based Filtering approach for Big...A Clustering Based Collaborative and Pattern based Filtering approach for Big...
A Clustering Based Collaborative and Pattern based Filtering approach for Big...
 
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
Hadoop and Hive Inspecting Maintenance of Mobile Application for Groceries Ex...
 
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...
 
A Review of Edge Detection Techniques for Image Segmentation
A Review of Edge Detection Techniques for Image SegmentationA Review of Edge Detection Techniques for Image Segmentation
A Review of Edge Detection Techniques for Image Segmentation
 
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
Leanness Assessment using Fuzzy Logic Approach: A Case of Indian Horn Manufac...
 
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
Comparative Analysis of Weighted Emphirical Optimization Algorithm and Lazy C...
 
Survey on Segmentation Techniques for Spinal Cord Images
Survey on Segmentation Techniques for Spinal Cord ImagesSurvey on Segmentation Techniques for Spinal Cord Images
Survey on Segmentation Techniques for Spinal Cord Images
 

Recently uploaded

Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 

Recently uploaded (20)

Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Agglomerative Clustering Onvertically Partitioned Data–Distributed Database Mining

  • 1. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.75-77 ISSN: 2278-2419 75 Agglomerative Clustering Onvertically Partitioned Data–Distributed Database Mining R.Senkamalavalli1 ,T.Bhuvaneswari2 1 Research Scholar,Department of Computer Science and Engg.,SCSVMV University, Enathur,Kanchipuram 2 Assistant Professor,Department of Computer Science and Applications,Queen Mary’s College (autonomous), Mylapore, Chennai Email: sengu_cool@yahoo.com, t_bhuvaneswari@yahoo.com Abstract—Mining distributed databases is emerging as a fundamental computational problem. A common approach for mining distributed databases is to move all of the data from each database to a central site and a single model is built. Privacy concerns in many application domains prevents sharing of data, which limits data mining technology to identify patterns and trends from large amount of data. Traditional data mining algorithms have been developed within a centralized model. However, distributed knowledge discovery has been proposed by many researchers as a solution to privacy preserving data mining techniques. By vertically partitioned data, each site contains some attributes of the entities in the environment.In this paper, we present a method for Agglomerative clustering algorithm in situations where different sites contain different attributes for a common set of entities for verticallypartitioned data. Using association rules data are partitioned into vertically. Keywords - Datamining; agglomerative clustering; distributed data;association rule. I. INTRODUCTION Data Mining is the technique used by analysts to find out the hidden and unknown pattern from the collection of data. Although the organizations gather large volumes of data, it is of no use if "knowledge" or "beneficial information" cannot be inferred from it. Unlike the statistical methods the data mining techniques extracts interesting information. The operations like classification, clustering, association rule mining, etc. are used for data mining purposes.The term data distribution means the manner in which the data has been stored at the sites (DB servers). Primarily there are two types of data distribution i) Centralized Data and ii) Partitioned Data. In a centralized data environment all data is stored at single site. While in distributed environment all data is distributed among different sites. Distributed data can further be divided ini) Horizontally and ii) Vertically distributed environments (Fig.1). In horizontal distribution the different sites stores the same attributes for different sets of records. In vertical distribution the sites stores different attributes for the same set of records.By vertically partitioned, we mean that each site contains some elements of a transaction. Using the traditional online example, one site may contain book purchases, while another has electronic purchases. Using a key such as credit card number and date, we can join these to identify relationships between purchases of books and electronic goods. However, this discloses the individual purchases at each site, possibly violating consumer privacy agreements. Fig. 1. Classification of dataset Clustering is the method by which like records are grouped together. Usually this is done to give the end user a high level view of what is going on in the database. Clustering is sometimes used to mean segmentation. Technically it can be defined as the task of grouping a set of objects in such a manner that objects in same group (cluster) are more similar to each other than to those in other groups (clusters). It is main task carried out for machine learning, pattern recognition, information retrieval, etc. Clustering can be partitioned in i) Hierarchical ii) Partition Based iii) Density Based Clustering (Fig.2). The hierarchy of clusters is usually viewed as a tree where the smallest clusters merge together to create the next highest level of clusters and those at that level merge together to create the next highest level of clusters. There are two main types of hierarchical clustering algorithms:  Agglomerative - Agglomerative clustering techniques start with as many clusters as there are records where each cluster contains just one record. The clusters that are nearest each other are merged together to form the next largest cluster. This merging is continued until a hierarchy of clusters is built with just a single cluster containing all the records at the top of the hierarchy(Fig.3).  Divisive - Divisive clustering techniques take the opposite approach from agglomerative techniques. These techniques start with all the records in one cluster and then try to split that cluster into smaller pieces and then in turn to try to split those smaller pieces.
  • 2. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.75-77 ISSN: 2278-2419 76 Fig. 2. Categories of Clustering Fig. 3: Concept of Clustering II. RELATED WORK Vaidya et.al [1] presents the k-means technique to preserve privacy of vertically partitioned data.Hwanjo Yu et.al [2] suggests an algorithm for privacy preservation for Support Vector Machines(SVM) based classification using local and global models. Local models are relevant to each participating party that is not disclosed to others while generating global model jointly. Global model remains the same for every party which is then used for classifying new data objects.Liu et.al [3] represents two protocols for privacy preserving clustering to work upon horizontally and vertically partitioned data separately.Inan et.al [4] suggest methods for constructing dissimilarity matrices of objects from different sites in privacy preserving manner. Krishna Prasadet.al [5], mentioned a procedure for securely running BIRCH algorithm over arbitrarily partitioned database. Secure protocols are mentioned in it for distance metrics and procedure is suggested for using these metrics in securely computing clusters. Pinkas[6] represents various cryptographic techniques for privacy preserving. Vaidya[7] presents various techniques of privacy preserving for different procedures of data mining. An algorithm is suggested for privacy preserving association rules. A subroutine in this work suggests procedure for securely finding the closest cluster in k-means clustering for privacy preservation.Nishant[8] suggests scaling transformation on centralized data to preserve privacy for clustering. K-means clustering [9, 10] is a simple technique to group items into k clusters. The basic idea behind k-means clustering is as follows: Each item is placed in its closest cluster, and the cluster centers are then adjusted based on the data placement. This repeats until the positions stabilize. III. AGGLOMERATIVE CLUSTERING ON VERTICALLY PARTITIONED DATA The proposed work is about vertically portioned data mining using clustering technique. In this system, we consider the heterogeneous database scenario considered a vertical partitioning of the database between two parties A and B(Fig.4). The association rule mining problem can be formally stated as follows: Let I = i1,i2, …ip be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T I. Associated with each transaction is a unique identifier, called its TID (Transaction Identifier). We say that a transaction T contains X, a set of some items in I, if X T. An association rule is an implication of the form, X Y, where X  I, Y I, and XY =. The rule XY holds in the transaction set D with confidence c if c% of transactions in D that contain X also contain Y . The rule X  Y has support s in the transaction set D if s% of transactions in D contain X Y.In this clustering of the Databases will be done so that the responsibility of finding a frequent n item set can be distributed over clusters which will increase the response time as well as decrease the number of messages need be passed and thus avoiding the bottleneck around central site.The vertically partitioned data are clustered into n clusters using agglomerative clustering algorithm. Then these clusters are separated into n databases. Finally, this information can be stored in external data base for further usage. Fig.4. Proposedsystem architecture IV. AGGLOMERATIVE CLUSTERING ALGORITHM Agglomerative hierarchical clustering is a bottom-up clustering method where clusters have sub-clusters, which in turn have sub-clusters, etc.A good clustering method will produce high quality clusters with high intra-class similarity and low inter- class similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patternsAlgorithmic steps-Let X = {x1, x2, x3, ..., xn} be the set of data points.
  • 3. Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications Volume: 04 Issue: 02 December 2015 Page No.75-77 ISSN: 2278-2419 77 Step 1:Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0. Step 2:Find the least distance pair of clusters in the current clustering, say pair (r), (s), according to d[(r),(s)] = min d[(i),(j)] where the minimum is over all pairs of clusters in the current clustering. Step 3:Increment the sequence number: m = m +1.Merge clusters (r) and (s) into a single cluster to form the next clustering m. Set the level of this clustering to L(m) = d[(r),(s)]. Step 4:Update the distance matrix, D, by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row and column corresponding to the newly formed cluster. The distance between the new cluster, denoted (r,s) and old cluster(k) is defined in this way: d[(k), (r,s)] = min (d[(k),(r)], d[(k),(s)]). Step 5:If all the data points are in one cluster then stop, else repeat from step 2. V. FINDINGS AND RESULTS The workwas implemented in one way clustering and efficiency of data obtained was not accurate. This proposed method can be implemented in twoway clustering technique which will give better results. VI. CONCLUSION This proposed architecture has been implemented in future for further development using the data mining Toolbox under WEGA Software.Clustering is an unsupervised learning technique, and as such, there is no absolutely correct answer. For this reason and depending on the particular application of the clustering, fewer or greater numbers of clusters may be desired. The future enhancement of this is to add global caching as caching can be used since data in warehouse tend to change a little over time. Techniques of clustering the databases can be debated upon, since more efficient the division of sites, more efficient will be association rules. REFERENCES [1] J. Vaidya and C. Clifton, “Privacy preserving K-means clustering over vertically partitioned data”, Proceedings of the ninth ACMSIGKDD international conference on Knowledge discovery and data mining Washington, DC, pp. 206-215, august 24-272003. [2] H. Yu, J.Vaidya and X. Jiang, “Privacy preserving SVM classification on vertically partitioned data ", AdvinKnowledge Disc Data Min. vol. 3918, pp. 647-656, 2006 [3] J. Liu, J. Luo, J. Z. Huang and L. Xiong, "Privacy preserving distributed DBSCAN clustering", PAIS 2012, vol 6, pp.177-185. [4] A. Inan, S. Kaya, Y.Saygin, E.Savas, A.Hintoglu and A. Levi , "Privacy preserving clustering on horizontally partitioneddata", PDM, vol 63, pp. 646-666, 2006. [5] P. Krishna Prasad and C. PanduRangan, "Privacy preserving BIRCH algorithm for clustering over arbitrarily partitioned databases", pp. 146- 157, August 2007. [6] B.Pinkas, "Cryptographic techniques for privacy preserving data mining".Interntional Journal of Applied Cryptography (IJACT) 3(1): 21- 45 ,2013. [7] J. S.Vaidya. “A thesis on privacy preserving data mining over Vertically Partitioned Data". (Unpublished) [8] P.KhatriNishant, G.PreetiandP. Tusal, “Privacy preserving clustering on centralized data through scaling transfermation”.International journal of computer engineering&technology (IJCET).vol4,Issue3,pp449- 454,2013. [9] R. Duda and P. E. Hart. Pattern classification and scene analysis. Wiley, New York.1973. [10] K. Fukunaga. Introduction to statistical pattern recognition. Academic Press, San Diego, CA, 1990. [11] S. Paul, “An optimized distributed association rule mining algorithm in parallel and distributed data mining With Xml data for improved response time”, Int J Comp Sci Info Technol.vol. 2, pp. 2, April 2010. [12] R. J. Gil-Garcia, J. M. Badia-Contelles, and A. Pons-Porrata. Extended star clustering algorithm. Lecture Notes Comp Sci.vol. 2905, pp.480-487, 2003. [13] G. Lance and W.Williams. A general theory of classificatory sorting strategies. 1:Hierarchical systems. Comp J, vol. 9, pp.373-380,1967. [14] B. Larsen and C. Aone. Fast and effective text mining using linear-time document clustering. KDD vol. 99, pp. 16-22, 1999. [15] A. Pons-Porrata, R. Berlanga-Llavori, and J. Ruiz-Shulcloper. On-line event and topic detection by using thecompact sets clustering algorithm. J Intelli Fuzzy Sys, vol. 3-4, pp. 185-194, 2002. [16] K. Wagstaff and C. Cardie, “Clustering with instance-level constraints”, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), Stanford, CA, pp. 1103-1110, 2000. [17] I. Davidson and S. S. Ravi, “Clustering with constraints and the k-Means algorithm”, the 5th SIAM Data Mining Conf. 2005. [18] I. Davidson and S. S. Ravi, “Hierarchical clustering with constraints: Theory and practice”, the 9th European Principles and Practice of KDD, PKDD 2005. [19] I. Davidson and S. S. Ravi, “Intractability and clustering with constraints”, Proceedings of the 24th international conference on Machine learning, 2007.