ACEEE Int. J. on Control System and Instrumentation, Vol. 03, No. 01, Feb 2012



Record Ordering Heuristics for Disclosure Control through Microaggregation

Brook Heaton (1) and Sumitra Mukherjee (2)

(1) Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL, USA. Email: willheat@nova.edu
(2) Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL, USA. Email: sumitra@nova.edu

© 2012 ACEEE
DOI: 01.IJCSI.03.01.11

Abstract: Statistical disclosure control (SDC) methods reconcile the need to release information to researchers with the need to protect the privacy of individual records. Microaggregation is an SDC method that protects data subjects by guaranteeing k-anonymity: records are partitioned into groups of size at least k, and actual data values are replaced by the group means so that each record in a group is indistinguishable from at least k-1 other records. The goal is to create groups of similar records such that information loss due to data modification is minimized, where information loss is measured by the sum of squared deviations between the actual data values and their group means. Since optimal multivariate microaggregation is NP-hard, heuristics have been developed for microaggregation. It has been shown that, for a given ordering of records, the optimal partition consistent with that ordering can be efficiently computed, and some of the best existing microaggregation methods are based on this approach. This paper improves on previous heuristics by adapting tour construction and tour improvement heuristics for the traveling salesman problem (TSP) for microaggregation. Specifically, the Greedy heuristic and the Quick Boruvka heuristic are investigated for tour construction, and the 2-opt, 3-opt, and Lin-Kernighan heuristics are used for tour improvement. Computational experiments using benchmark datasets indicate that our method results in lower information loss than extant microaggregation heuristics.

Index Terms: Disclosure Control, Microaggregation, Privacy protection, Tour construction heuristics, Tour improvement heuristics, Shortest path.

I. INTRODUCTION

Many databases, such as health information systems and U.S. census data, contain information that is valuable to researchers for statistical purposes. At the same time, these databases contain private information about individuals. There is a tradeoff between providing information for the benefit of society and restricting information for the benefit of individuals. Statistical disclosure control (SDC) is a set of techniques for providing access to aggregate information in statistical databases while at the same time protecting the privacy of individual data subjects. See [1], [12], and [27] for good overviews of statistical disclosure control methods. Microaggregation is a popular SDC method that protects data subjects by guaranteeing k-anonymity (see [2], [23], and [26]). Under microaggregation, records are partitioned into groups of size at least k, and actual data values are replaced by the group means so that each record in the group is indistinguishable from at least k-1 other records. Replacing actual data values with group means leads to information loss and renders the data less valuable to users. Optimal microaggregation aims to minimize information loss by grouping together records that are very similar, thereby maximizing homogeneity among the records of each group.

The optimal microaggregation problem can be formally defined as follows: Given a data set with n records and p numerical attributes per record, partition the records into groups containing at least k records each, such that the sum of squared errors (SSE) within groups is minimized. SSE is defined as

$$\mathrm{SSE} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2,$$

where $g$ is the number of groups, $n_i$ is the number of records in the ith group, $X_{ij}$ is the jth record of the ith group, and $\bar{X}_i$ is the mean for the ith group.

This problem has been shown to be NP-hard when p>1 [21]. Since microaggregation is extensively used for SDC, several heuristics have been proposed that lead to low information loss (see e.g. [4], [6], [7], [9], [10], [11], [13], [16], [17], [18], [20], [24], and [25]). There is a need to develop new heuristics that perform better than those currently known, either by producing groups with lower information loss, or by achieving the same information loss at lower computational cost.

Optimal univariate microaggregation (i.e., involving a single attribute) has been shown to be solvable in polynomial time by taking an ordered list of records and using a shortest-path algorithm to compute the lowest information loss k-partition for the given ordered list [14]. A k-partition is a set of groups such that every group contains at least k elements. Optimality is guaranteed, since univariate data can be strictly ordered. Practical applications, however, require microaggregation to be performed on records containing multiple attributes. While multivariate data cannot be strictly ordered, Domingo-Ferrer et al. [9] show that, for a given ordering of records, the best partition consistent with that ordering can be identified efficiently as a shortest path in a network using Hansen and Mukherjee's method [14]. Exploiting this idea, they develop the Multivariate Hansen-Mukherjee (MHM) algorithm and propose several heuristics for ordering the records. Empirical results indicate that MHM, when used with good ordering heuristics, outperforms extant microaggregation methods.

This paper builds on the work of [9] by investigating new methods for ordering records as a first step in multivariate microaggregation. The techniques for ordering records are based on the tour construction and improvement heuristics for the traveling salesman problem (TSP). Using previously studied benchmark datasets, our method is shown to outperform extant microaggregation heuristics. Section II discusses the MHM based microaggregation method and record ordering heuristics proposed in [9]. Section III introduces our TSP based tour construction and improvement heuristics. Section IV describes the computational experiments used to compare our method with extant microaggregation heuristics. Section V presents our results.

II. MHM MICROAGGREGATION HEURISTIC

A. The MHM Algorithm

The MHM algorithm introduced in [9] involves constructing a graph based on an ordered list of records, and finding the shortest path in the graph. The arcs in the shortest path correspond to a partition of the records that is guaranteed to be the lowest cost partition consistent with the specified ordering. A partition is said to be consistent with an ordering if the following is true: for any three records $x_i$, $x_j$, and $x_k$ where $i<j<k$, if $x_i$ and $x_k$ belong to the same group g, then $x_j$ also belongs to group g.

The graph is constructed as follows: Given an ordered set of n records, create n nodes labeled 1 to n. Additionally, create a node labeled 0. For each pair of nodes (i, j) where i+k <= j < i+2k, create an arc directed from i to j. Each arc (i, j) thus represents a group containing records i+1 through j, whose size is between k and 2k-1. The length of the arc is

$$L(i,j) = \sum_{h=i+1}^{j} (x_h - \bar{x}_{(i,j)})^2,$$

where $x_h$ is a multivariate record, and $\bar{x}_{(i,j)}$ is the centroid of records i+1 through j. Once the graph is constructed, a shortest-path algorithm such as Dijkstra's algorithm [8] is used to determine the shortest path from node 0 to node n. The length of the shortest path gives the information loss corresponding to the best partition consistent with the specified ordering.

B. Existing Record Ordering Heuristics

Four heuristics for ordering records have been proposed in [9]: nearest-point-next (NPN), maximum distance (MD-MHM), maximum distance to average vector (MDAV-MHM), and centroid-based fixed-size (CBFS-MHM).

The NPN heuristic selects as the first record the record furthest away from the centroid of the entire dataset. The record closest to the initial record is designated as the second record. The third record is the one closest to the second record. This process continues until all of the records have been added to the tour. The NPN heuristic runs in $O(n^2)$ time, where n is the number of records in the dataset.

The MD-MHM heuristic works as follows: Find the two records r and s that are furthest from one another in the dataset, using Euclidean distance. Find the k-1 closest records to r and form a group. Find the k-1 closest records to s and form a group. Repeat this process, using the remaining records that have not been assigned to a group, until all the remaining records are assigned. Stitch the groups together by applying NPN to their centroids. The MD-MHM heuristic runs in $O(n^3)$ time.

The MDAV-MHM heuristic is very similar to MD, but considers distances of records to group centroids, rather than distances to other records. The MDAV heuristic runs in $O(n^2)$ time.

The CBFS-MHM heuristic designates as the first record the record r that is furthest from the centroid. The k-1 closest records to r are added to its group. This process is repeated using the remaining records until all the records are assigned. The complexity of CBFS is also $O(n^2)$.

III. OUR ORDERING AND IMPROVEMENT HEURISTICS

We use two TSP based tour construction heuristics, Greedy and Quick Boruvka (QB), and three tour improvement heuristics: 2-opt, 3-opt, and Lin-Kernighan (LK). The improvement heuristics are applied not only to the tours generated by Greedy and QB but also to those produced by the record ordering heuristics discussed in the previous section: NPN, MD-MHM, MDAV-MHM, and CBFS-MHM.

The Greedy heuristic produces an ordering of records using several steps. First, the distances between every pair of records in the dataset are calculated and sorted in ascending order. The pair of records with the smallest distance is chosen and added to the path. The remaining pairs are iteratively considered in order, and added to the path as long as they do not introduce sub-tours or vertices of degree greater than 2. The computational complexity of the Greedy heuristic is $O(n^2 \log n)$.

In Quick Boruvka, the records are first sorted by one of the attributes to generate an initial ordering of records. For each record in the initial ordering, the distance between the current record and the other records in the dataset is computed. The pair of records with the minimum distance is added, subject to the same conditions outlined in the Greedy heuristic above. The QB heuristic also has complexity $O(n^2 \log n)$, but empirically runs slightly faster than the Greedy heuristic [3].

The 2-opt heuristic is a tour improvement heuristic; that is, it starts with a tour and iteratively attempts to improve it. It operates by first randomly selecting two non-connected edges (a total of four points) from the tour. If reconnecting the edges results in a shorter overall tour, the originally selected edges are replaced with the newly computed ones. This process continues until either no further exchanges result in a shorter tour, or some other stopping criterion is met.

In the 3-opt heuristic, three disconnected edges are broken, and every possible way of reconnecting the edges into a valid tour is examined. If a tour with shorter distance is found, the edges are swapped, and the process repeats until no more exchanges are possible or some other stopping criterion is met.

The Lin-Kernighan (LK) heuristic is a tour improvement heuristic that is similar to 2-opt and 3-opt. Like 2-opt and 3-opt, LK takes an existing tour and iteratively swaps edges to create a better tour. Lin and Kernighan [19] note that restricting a move to two edges (as in 2-opt) or three edges (as in 3-opt) is arbitrary, since it is possible that swapping more than three edges may actually result in a better tour. Therefore, LK attempts to build a series of sequential k-opt moves (sometimes referred to as swaps, flips, or exchanges), rather than a 2-opt or 3-opt move, that will result in an overall shorter tour. A good overview of these heuristics can be found in [22].

IV. EXPERIMENTAL DESIGN

A. Datasets

We used three benchmark datasets that were used in previous studies [9] to evaluate our proposed methods. The first dataset, "Tarragona", consists of 834 records with 13 variables. The second dataset, "EIA", consists of 4092 records with 10 variables. Finally, the "Census" dataset consists of 1080 records from the U.S. census, with 13 variables. Consistent with previous studies, the variables were normalized to ensure that results are not dominated by any single variable.

B. Record Ordering Algorithms

The record ordering algorithms used by Domingo-Ferrer et al. [9] are NPN, MD-MHM, CBFS-MHM, and MDAV-MHM. These algorithms were run on the three selected datasets. Additionally, two new record ordering algorithms introduced in this study, Greedy and Quick Boruvka, were also run. Three different algorithms for improving existing record orderings (2-opt, 3-opt, and LK) were used with all tour construction heuristics.

C. Comparisons

The six microaggregation algorithms (i.e., NPN, MD-MHM, MDAV-MHM, CBFS-MHM, Greedy, and QB) were run on the three datasets (Census, Tarragona, and EIA) using five typical values of k (3, 4, 5, 6, and 10). These values of k are consistent with those used in [9]. The three tour improvement heuristics (2-opt, 3-opt, and LK) were then applied to the resulting record orderings. That is, for a given dataset and a given value of k, four different experiments were run using each record ordering heuristic (the original algorithm with no improvement, plus the three improvement heuristics). Consistent with previous studies, we report the normalized information loss IL = 100 * SSE/SST (with possible values between 0 and 100), where

$$\mathrm{SST} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2,$$

and $\bar{X}$ is the centroid of the entire dataset. All of the algorithms were implemented in the Java programming language, using the Sun Java 6.0 SDK. All algorithms were run on a computer with a dual 3.0 GHz processor and 2GB of RAM, running the Linux operating system (Ubuntu 10.04).

V. RESULTS

This section details the results from applying the MD-MHM, MDAV-MHM, CBFS-MHM, NPN-MHM, Greedy-MHM, and QB-MHM algorithms, along with their corresponding 2-opt, 3-opt, and LK improvement heuristics, against the Census, Tarragona, and EIA datasets for five different values of k. The information loss under each method is reported. For each specified value of k, the lowest information loss obtained for each dataset is highlighted in bold to emphasize the best performing method. Table 1 shows that in all but one case (k=10 for Census), our tour improvement heuristics improved upon the results obtained using MD-MHM. Results for the MDAV-MHM and CBFS-MHM families of algorithms are very similar, as shown in Tables 2 and 3. Reductions in information loss are often substantial. For example, with the EIA dataset, the MDAV-MHM+LK heuristic for k=10 resulted in an information loss of 1.968, which is 31% lower than the information loss (2.867) obtained using MDAV-MHM alone.

TABLE I. INFORMATION LOSS UNDER MD-MHM AND IMPROVEMENT HEURISTICS

TABLE II. INFORMATION LOSS UNDER MDAV-MHM AND IMPROVEMENT HEURISTICS

TABLE III. INFORMATION LOSS UNDER CBFS-MHM AND IMPROVEMENT HEURISTICS
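
As a hedged illustration of the evaluation metric from Section IV.C, the following Python sketch computes the normalized information loss IL = 100 * SSE/SST for a given partition, and reproduces the percentage-reduction arithmetic quoted above for the EIA dataset (2.867 versus 1.968). The function names (`centroid`, `sum_sq_dev`, `information_loss`, `percent_reduction`) are illustrative assumptions and are not taken from the authors' Java implementation. Records are assumed to be tuples of already-normalized numbers, and the dataset is assumed to contain at least two distinct records so that SST is nonzero.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(records: List[Record]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length records."""
    n = len(records)
    return [sum(r[d] for r in records) / n for d in range(len(records[0]))]


def sum_sq_dev(records: List[Record], center: Sequence[float]) -> float:
    """Sum of squared Euclidean distances from each record to `center`."""
    return sum((x - c) ** 2 for r in records for x, c in zip(r, center))


def information_loss(groups: List[List[Record]]) -> float:
    """Normalized information loss IL = 100 * SSE / SST for a partition."""
    all_records = [r for g in groups for r in g]
    # SSE: deviations of records from their own group centroid.
    sse = sum(sum_sq_dev(g, centroid(g)) for g in groups)
    # SST: deviations of records from the centroid of the whole dataset.
    sst = sum_sq_dev(all_records, centroid(all_records))
    return 100.0 * sse / sst


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction of `after` with respect to `before`, in percent."""
    return 100.0 * (before - after) / before
```

For example, `percent_reduction(2.867, 1.968)` evaluates to roughly 31.4, which agrees with the 31% reduction reported for MDAV-MHM+LK at k=10.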

Tables 4, 5, and 6 present the results for the NPN-MHM, QB-MHM, and Greedy-MHM families of algorithms, respectively. Our tour improvement heuristics resulted in lower information loss in every instance.

TABLE IV. INFORMATION LOSS UNDER NPN-MHM AND IMPROVEMENT HEURISTICS

TABLE V. INFORMATION LOSS UNDER QB-MHM AND IMPROVEMENT HEURISTICS

TABLE VI. INFORMATION LOSS UNDER GREEDY-MHM AND IMPROVEMENT HEURISTICS

TABLE VII. BEST COMBINATION OF TOUR CONSTRUCTION AND IMPROVEMENT HEURISTICS

Table 7 summarizes the results from Tables 1-6 as follows: the best performing combination of tour construction and tour improvement heuristics for each dataset is identified for each specified k. The row labeled "Best under" identifies the specific improvement heuristic that produced the best results. For example, with the Census dataset, when k=3, the best results are obtained using a combination of Greedy-MHM and the LK tour improvement heuristic, with an information loss of 5.002%. While no tour construction heuristic dominates, out of 15 total instances, the best result was found using the LK improvement heuristic 7 times and using the 3-opt improvement heuristic 6 times.

Table 8 shows the difference between the lowest information loss obtained using extant microaggregation methods (the minimum value found from the MD, MDAV, CBFS, MD-MHM, CBFS-MHM, NPN-MHM, and MDAV-MHM heuristics) and the lowest information loss obtained using the new methods described in this paper. Under our methods, information losses are decreased by 4%, 5.5%, and 13.9% on average for the Census, Tarragona, and EIA datasets, respectively. In all but one instance (k=10 with the Census dataset) our proposed method outperformed existing methods and should hence be considered a promising microaggregation heuristic.
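
The MHM step of Section II.A can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the authors' implementation: the function names `group_sse` and `mhm_partition` are hypothetical, records are assumed to be tuples of already-normalized numbers given in the chosen order, and the shortest path is found with a simple forward dynamic program instead of Dijkstra's algorithm [8]. The two give the same path length here because every arc points from a lower-numbered node to a higher-numbered one, so the graph is acyclic.

```python
from typing import List, Sequence, Tuple

Record = Sequence[float]


def group_sse(records: List[Record]) -> float:
    """Sum of squared deviations of the records from their own centroid."""
    n = len(records)
    dims = len(records[0])
    mean = [sum(r[d] for r in records) / n for d in range(dims)]
    return sum((r[d] - mean[d]) ** 2 for r in records for d in range(dims))


def mhm_partition(ordered: List[Record], k: int) -> Tuple[float, List[List[Record]]]:
    """Best k-partition consistent with the given ordering.

    Node i stands for "the first i records are already grouped". An arc
    (i, j) with i + k <= j < i + 2k groups records i+1..j (1-indexed),
    which is the slice ordered[i:j], and its length is that group's SSE.
    """
    n = len(ordered)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    inf = float("inf")
    dist = [inf] * (n + 1)   # shortest known distance from node 0
    pred = [-1] * (n + 1)    # predecessor on that shortest path
    dist[0] = 0.0
    for i in range(n + 1):
        if dist[i] == inf:
            continue  # node i cannot be reached with groups of valid size
        for j in range(i + k, min(i + 2 * k - 1, n) + 1):
            cost = dist[i] + group_sse(ordered[i:j])
            if cost < dist[j]:
                dist[j] = cost
                pred[j] = i
    # Walk predecessors back from node n to recover the groups.
    groups: List[List[Record]] = []
    j = n
    while j > 0:
        i = pred[j]
        groups.append(ordered[i:j])
        j = i
    groups.reverse()
    return dist[n], groups
```

The returned distance is the unnormalized SSE of the best partition for that ordering; dividing by SST and multiplying by 100 gives the normalized information loss used in the tables.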

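
The Greedy construction and 2-opt improvement steps of Section III can be sketched in Python as follows. The sketch makes several assumptions that are not from the paper: it builds an open path instead of a closed tour (an ordering of records is all that the MHM step needs), it uses a union-find structure to reject edges that would close a sub-tour, and it replaces the random edge selection described for 2-opt with a deterministic sweep that accepts any segment reversal that shortens the path. The names `greedy_order`, `path_length`, and `two_opt` are illustrative, and the 2-opt routine recomputes the full path length for clarity, so it is slower than a production implementation would be.

```python
import itertools
import math
from typing import List, Sequence

Record = Sequence[float]


def greedy_order(records: List[Record]) -> List[int]:
    """Order record indices by greedily joining the closest valid pairs."""
    n = len(records)
    if n < 2:
        return list(range(n))
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda p: math.dist(records[p[0]], records[p[1]]))
    parent = list(range(n))  # union-find forest over path fragments

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    neighbors: List[List[int]] = [[] for _ in range(n)]
    edges = 0
    for a, b in pairs:
        # Reject edges that create degree > 2 or close a cycle (sub-tour).
        if len(neighbors[a]) < 2 and len(neighbors[b]) < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            neighbors[a].append(b)
            neighbors[b].append(a)
            edges += 1
            if edges == n - 1:
                break
    # Walk the resulting Hamiltonian path from one of its two endpoints.
    start = min(v for v in range(n) if len(neighbors[v]) == 1)
    order, prev, cur = [start], -1, start
    while len(order) < n:
        nxt = next(v for v in neighbors[cur] if v != prev)
        order.append(nxt)
        prev, cur = cur, nxt
    return order


def path_length(records: List[Record], order: List[int]) -> float:
    """Total Euclidean length of the open path visiting records in `order`."""
    return sum(math.dist(records[a], records[b]) for a, b in zip(order, order[1:]))


def two_opt(records: List[Record], order: List[int]) -> List[int]:
    """Reverse segments of the path while doing so shortens it."""
    order = list(order)
    n = len(order)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(records, candidate) < path_length(records, order) - 1e-12:
                    order = candidate
                    improved = True
    return order
```

The index list returned by `two_opt` can be used to reorder the records before the shortest-path partitioning step of Section II.A.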



TABLE VIII. RESULTS COMPARED TO EXTANT METHODS

REFERENCES

[1] Adam, N. R., & Wortmann, J. C. (1989). Security-control methods for statistical databases: a comparative study. ACM Computing Surveys, 21(4), 515-556.
[2] Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., & Zhu, A. (2005). Approximation algorithms for k-anonymity. Journal of Privacy Technology, Paper No. 20051120001.
[3] Applegate, D., Cook, W., & Rohe, A. (2003). Chained Lin-Kernighan for large traveling salesman problems. INFORMS Journal on Computing, 15(1), 82-92.
[4] Balleste, A. M., Solanas, A., Domingo-Ferrer, J., & Mateo-Sanz, J. M. (2007). A genetic approach to multivariate microaggregation for database privacy. IEEE 23rd International Conference on Data Engineering Workshop, 180-185.
[5] Bentley, J. L. (1992). Fast algorithms for geometric traveling salesman problems. ORSA Journal on Computing, 4, 387-411.
[6] Chang, C.-C., Li, Y.-C., & Huang, W.-H. (2007). TFRP: An efficient microaggregation algorithm for statistical disclosure control. Journal of Systems and Software, 80(11), 1866-1878.
[7] Defays, D., & Anwar, N. (1995). Micro-aggregation: a generic method. Proceedings of the 2nd International Symposium on Statistical Confidentiality. Eurostat, Luxembourg, 69-78.
[8] Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269-271.
[9] Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J. M., & Sebé, F. (2006). Efficient multivariate data-oriented microaggregation. The VLDB Journal, 15, 355-369.
[10] Domingo-Ferrer, J., & Mateo-Sanz, J. M. (2002). Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, 14(1), 189-201.
[11] Domingo-Ferrer, J., Sebe, F., & Solanas, A. A polynomial-time approximation to optimal multivariate microaggregation. Computers & Mathematics with Applications, 55(4), 714-732.
[12] Duncan, G. T., Elliot, M., & Salazar-González, J. (2011). Statistical Confidentiality: Principles and Practice. Springer.
[13] Fayyoumi, E., & Oommen, B. J. (2009). Achieving microaggregation for secure statistical databases using fixed-structure partitioning-based learning automata. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39(5), 1192-1205.
[14] Hansen, S. L., & Mukherjee, S. (2003). A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering, 15(4), 1043-1044.
[15] Hundepool, A., Wetering, A., Ramaswamy, R., Franconi, L., Capobianchi, A., Wolf, P., Domingo-Ferrer, J., Torra, V., Brand, R., & Giessing, A. (2004). µ-ARGUS Ver. 4.0 Software and User's Manual. Voorburg, The Netherlands: Statistics Netherlands.
[16] Kabir, M. E., Wang, H., & Zhang, Y. (2010). A pairwise-systematic microaggregation for statistical disclosure control. International Conference on Data Mining 2010, 266-273.
[17] Laszlo, M., & Mukherjee, S. (2005). Minimum spanning tree partitioning algorithm for microaggregation. IEEE Transactions on Knowledge and Data Engineering, 17(7), 902-911.
[18] Laszlo, M., & Mukherjee, S. (2009). Approximation bounds for minimum information loss microaggregation. IEEE Transactions on Knowledge and Data Engineering, 21(11), 1643-1647.
[19] Lin, S., & Kernighan, B. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21(2), 498-516.
[20] Mateo-Sanz, J. M., & Domingo-Ferrer, J. (1999). A method for data oriented multivariate microaggregation. In J. Domingo-Ferrer (Ed.), Statistical Data Protection (pp. 89-99). Luxembourg: Office for Official Publications of the European Communities.
[21] Oganian, A., & Domingo-Ferrer, J. (2001). On the complexity of optimal microaggregation for statistical disclosure control. Statistical Journal of the United Nations Economic Commission for Europe, 18(4), 345-354.
[22] Rego, C., Gamboa, D., Glover, F., & Osterman, C. (2011). Traveling salesman problem heuristics: leading methods, implementations, and latest advances. European Journal of Operations Research, 211(3), 427-441.
[23] Samarati, P. (2001). Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6), 1010-1027.
[24] Sande, G. (2002). Exact and approximate methods for data directed microaggregation in one or more dimensions. International Journal of Uncertainty and Fuzziness in Knowledge Based Systems, 10(5), 459-476.
[25] Solanas, A., Martinez-Balleste, A., Domingo-Ferrer, J., & Mateo-Sanz, M. (2006). A 2^d-tree-based blocking method for microaggregating very large data sets. In First International Conference on Availability, Reliability and Security (ARES'06), 922-928.
[26] Sweeney, L. (2002). K-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 557-570.
[27] Willenborg, L., & de Waal, T. (2001). Elements of statistical disclosure control. New York: Springer-Verlag.




© 2012 ACEEE
DOI: 01.IJCSI.03.01.11

Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 

Record Ordering Heuristics for Disclosure Control through Microaggregation

ACEEE Int. J. on Control System and Instrumentation, Vol. 03, No. 01, Feb 2012
© 2012 ACEEE, DOI: 01.IJCSI.03.01.11

Brook Heaton (1) and Sumitra Mukherjee (2)
(1) Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL, USA. Email: willheat@nova.edu
(2) Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL, USA. Email: sumitra@nova.edu

Abstract: Statistical disclosure control (SDC) methods reconcile the need to release information to researchers with the need to protect the privacy of individual records. Microaggregation is an SDC method that protects data subjects by guaranteeing k-anonymity: records are partitioned into groups of size at least k, and actual data values are replaced by the group means so that each record in a group is indistinguishable from at least k-1 other records. The goal is to create groups of similar records such that information loss due to data modification is minimized, where information loss is measured by the sum of squared deviations between the actual data values and their group means. Since optimal multivariate microaggregation is NP-hard, heuristics have been developed for microaggregation. It has been shown that, for a given ordering of records, the optimal partition consistent with that ordering can be computed efficiently, and some of the best existing microaggregation methods are based on this approach. This paper improves on previous heuristics by adapting tour construction and tour improvement heuristics for the traveling salesman problem (TSP) to microaggregation. Specifically, the Greedy heuristic and the Quick Boruvka heuristic are investigated for tour construction, and the 2-opt, 3-opt, and Lin-Kernighan heuristics are used for tour improvement. Computational experiments using benchmark datasets indicate that our method results in lower information loss than extant microaggregation heuristics.

Index Terms: Disclosure Control, Microaggregation, Privacy protection, Tour construction heuristics, Tour improvement heuristics, Shortest path.

I. INTRODUCTION

Many databases, such as health information systems and U.S. census data, contain information that is valuable to researchers for statistical purposes. At the same time, these databases contain private information about individuals. There is a tradeoff between providing information for the benefit of society and restricting information for the benefit of individuals. Statistical disclosure control (SDC) is a set of techniques for providing access to aggregate information in statistical databases while protecting the privacy of individual data subjects. See [1], [12], and [27] for good overviews of statistical disclosure control methods. Microaggregation is a popular SDC method that protects data subjects by guaranteeing k-anonymity (see [2], [23], and [26]). Under microaggregation, records are partitioned into groups of size at least k, and actual data values are replaced by the group means so that each record in a group is indistinguishable from at least k-1 other records. Replacing actual data values with group means leads to information loss and renders the data less valuable to users. Optimal microaggregation aims to minimize information loss by grouping together records that are very similar, thereby maximizing homogeneity among the records of each group.

The optimal microaggregation problem can be formally defined as follows: given a data set with n records and p numerical attributes per record, partition the records into groups containing at least k records each, such that the sum of squared errors (SSE) within groups is minimized. SSE is defined as

$$\mathrm{SSE} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} \lVert X_{ij} - \bar{X}_i \rVert^2,$$

where g is the number of groups, n_i is the number of records in the ith group, X_{ij} is the jth record of the ith group, and \bar{X}_i is the mean of the ith group.

This problem has been shown to be NP-hard when p > 1 [21]. Since microaggregation is extensively used for SDC, several heuristics have been proposed that lead to low information loss (see e.g. [4], [6], [7], [9], [10], [11], [13], [16], [17], [18], [20], [24], and [25]). There is a need to develop new heuristics that perform better than those currently known, either by producing groups with lower information loss or by achieving the same information loss at lower computational cost.

Optimal univariate microaggregation (i.e., involving a single attribute) has been shown to be solvable in polynomial time by taking an ordered list of records and using a shortest-path algorithm to compute the lowest information loss k-partition for the given ordered list [14]. A k-partition is a set of groups such that every group contains at least k elements. Optimality is guaranteed, since univariate data can be strictly ordered. Practical applications, however, require microaggregation to be performed on records containing multiple attributes. While multivariate data cannot be strictly ordered, Domingo-Ferrer et al. [9] show that, for a given ordering of records, the best partition consistent with that ordering can be identified efficiently as a shortest path in a network using Hansen and Mukherjee's method [14]. Exploiting this idea, they develop the Multivariate Hansen-Mukherjee (MHM) algorithm and propose several heuristics for ordering the records. Empirical results indicate that MHM, when used with good ordering heuristics, outperforms extant microaggregation methods.

This paper builds on the work of [9] by investigating new methods for ordering records as a first step in multivariate microaggregation. The techniques for ordering records are based on tour construction and improvement heuristics for the traveling salesman problem (TSP). Using previously studied benchmark datasets, our method is shown to outperform extant microaggregation heuristics. Section II discusses the MHM-based microaggregation method and the record ordering heuristics proposed in [9]. Section III introduces our TSP-based tour construction and improvement heuristics. Section IV describes the computational experiments used to compare our method with extant microaggregation heuristics. Section V presents our results.

II. MHM MICROAGGREGATION HEURISTIC

A. The MHM Algorithm

The MHM algorithm introduced in [9] involves constructing a graph based on an ordered list of records and finding the shortest path in the graph. The arcs in the shortest path correspond to a partition of the records that is guaranteed to be the lowest cost partition consistent with the specified ordering. A partition is said to be consistent with an ordering if the following is true: for any three records x_i, x_j, and x_k where i < j < k, if x_i and x_k belong to the same group g, then x_j also belongs to group g.

The graph is constructed as follows. Given an ordered set of n records, create n nodes labeled 1 to n. Additionally, create a node labeled 0. For each pair of nodes (i, j) where i + k <= j < i + 2k, create an arc directed from i to j. The length of the arc is

$$L(i, j) = \sum_{h=i+1}^{j} \lVert x_h - \bar{x}_{(i,j)} \rVert^2,$$

where x_h is a multivariate record and \bar{x}_{(i,j)} is the centroid of records i+1 through j. Once the graph is constructed, a shortest-path algorithm such as Dijkstra's algorithm [8] is used to determine the shortest path from node 0 to node n. The length of the shortest path gives the information loss corresponding to the best partition consistent with the specified ordering.

B. Existing Record Ordering Heuristics

Four heuristics for ordering records have been proposed in [9]: nearest-point-next (NPN), maximum distance (MD-MHM), maximum distance to average vector (MDAV-MHM), and centroid-based fixed-size (CBFS-MHM).

The NPN heuristic selects as the first record the one furthest from the centroid of the entire dataset. The record closest to the initial record is designated as the second record. The third record is the one closest to the second record. This process continues until all of the records have been added to the tour. The NPN heuristic runs in O(n^2) time, where n is the number of records in the dataset.

The MD-MHM heuristic works as follows: find the two records r and s that are furthest from one another in the dataset, using Euclidean distance. Find the k-1 closest records to r and form a group. Find the k-1 closest records to s and form a group. Repeat this process on the records that have not yet been assigned to a group until all records are assigned. Stitch the groups together by applying NPN to their centroids. The MD-MHM heuristic runs in O(n^3) time.

The MDAV-MHM heuristic is very similar to MD-MHM, but considers distances of records to group centroids rather than distances to other records. The MDAV heuristic runs in O(n^2) time.

The CBFS-MHM heuristic designates as the first record the record r that is furthest from the centroid. The k-1 closest records to r are added to its group. This process is repeated using the remaining records until all the records are assigned. The complexity of CBFS is also O(n^2).

III. OUR ORDERING AND IMPROVEMENT HEURISTICS

We use two TSP-based tour construction heuristics, Greedy and Quick Boruvka (QB), and three tour improvement heuristics: 2-opt, 3-opt, and Lin-Kernighan (LK). The improvement heuristics are applied not only to the tours generated by Greedy and QB but also to those produced by the record ordering heuristics discussed in the previous section (NPN, MD-MHM, MDAV-MHM, and CBFS-MHM).

The Greedy heuristic produces an ordering of records in several steps. First, the distances between every pair of records in the dataset are calculated and sorted in ascending order. The pair of records with the smallest distance is chosen and added to the path. The remaining pairs are considered iteratively in order and added to the path as long as they do not introduce sub-tours or vertices of degree greater than 2. The computational complexity of the Greedy heuristic is O(n^2 log n).

In Quick Boruvka, the records are first sorted by one of the attributes to generate an initial ordering of records. For each record in the initial ordering, the distance between the current record and the other records in the dataset is computed. The pair of records with the minimum distance is added, subject to the same conditions outlined for the Greedy heuristic above. The QB heuristic also has complexity O(n^2 log n), but empirically runs slightly faster than the Greedy heuristic [3].

The 2-opt heuristic is a tour improvement heuristic; that is, it starts with a tour and iteratively attempts to improve it. It operates by first randomly selecting two non-connected edges (a total of four points) from the tour. If reconnecting the edges results in a shorter overall tour, the originally selected edges are replaced with the newly computed ones. This process continues until either no further exchanges result in a shorter tour or some other stopping criterion is met.

In the 3-opt heuristic, three disconnected edges are broken, and every possible way of reconnecting the edges into a valid tour is examined. If a tour with shorter distance is found, the edges are swapped, and the process repeats until no more exchanges are possible or some other stopping criterion is met.

The Lin-Kernighan (LK) heuristic is a tour improvement heuristic that is similar to 2-opt and 3-opt. Like 2-opt and 3-opt, LK takes an existing tour and iteratively swaps edges to create a better tour. Lin and Kernighan [19] note that the choice of swapping two edges in 2-opt or three edges in 3-opt is arbitrary, since swapping more than three edges may actually result in a better tour. Therefore, LK attempts to build a series of sequential k-opt moves (sometimes referred to as swaps, flips, or exchanges), rather than a single 2-opt or 3-opt move, that will result in an overall shorter tour. A good overview of these heuristics can be found in [22].

IV. EXPERIMENTAL DESIGN

A. Datasets

We used three benchmark datasets that were used in previous studies [9] to evaluate our proposed methods. The first dataset, "Tarragona", consists of 834 records with 13 variables. The second dataset, "EIA", consists of 4092 records with 10 variables. Finally, the "Census" dataset consists of 1080 records from the U.S. census, with 13 variables. Consistent with previous studies, the variables were normalized to ensure that results are not dominated by any single variable.

B. Record Ordering Algorithms

The record ordering algorithms used by Domingo-Ferrer et al. [9] are NPN, MD-MHM, CBFS-MHM, and MDAV-MHM. These algorithms were run on the three selected datasets. Additionally, the two new record ordering algorithms introduced in this study, Greedy and Quick Boruvka, were also run. Three different algorithms for improving existing record orderings (2-opt, 3-opt, and LK) were used with all tour construction heuristics.

C. Comparisons

The six microaggregation algorithms (i.e., NPN, MD-MHM, MDAV-MHM, CBFS-MHM, Greedy, and QB) were run on the three datasets (Census, Tarragona, and EIA) using five typical values of k (3, 4, 5, 6, and 10). These values of k are consistent with those used in [9]. The three tour improvement heuristics (2-opt, 3-opt, and LK) were then applied to the resulting record orderings. That is, for a given dataset and a given value of k, four different experiments were run using each record ordering heuristic (the original algorithm with no improvement, plus the three improvement heuristics). Consistent with previous studies, we report the normalized information loss IL = 100 * SSE/SST (with possible values between 0 and 100), where

$$\mathrm{SST} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} \lVert X_{ij} - \bar{X} \rVert^2,$$

and \bar{X} is the centroid of the entire dataset.

All of the algorithms were implemented in the Java programming language, using the Sun Java 6.0 SDK. All algorithms were run on a computer with a dual 3.0 GHz processor and 2 GB of RAM, running the Linux operating system (Ubuntu 10.04).

V. RESULTS

This section details the results from applying the MD-MHM, MDAV-MHM, CBFS-MHM, NPN-MHM, Greedy-MHM, and QB-MHM algorithms, along with their corresponding 2-opt, 3-opt, and LK improvement heuristics, to the Census, Tarragona, and EIA datasets for five different values of k. The information loss under each method is reported. For each specified value of k, the lowest information loss obtained for each dataset is highlighted in bold to emphasize the best performing method.

Table I shows that in all but one case (k=10 for Census), our tour improvement heuristics improved upon the results obtained using MD-MHM. Results for the MDAV-MHM and CBFS-MHM families of algorithms are very similar, as shown in Tables II and III. Reductions in information loss are often substantial. For example, with the EIA dataset, the MDAV-MHM+LK heuristic for k=10 resulted in an information loss of 1.968, which is 31% lower than the information loss (2.867) obtained using MDAV-MHM alone.

TABLE I. INFORMATION LOSS UNDER MD-MHM AND IMPROVEMENT HEURISTICS

TABLE II. INFORMATION LOSS UNDER MDAV-MHM AND IMPROVEMENT HEURISTICS

TABLE III. INFORMATION LOSS UNDER CBFS-MHM AND IMPROVEMENT HEURISTICS
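The normalized information loss reported in these tables, IL = 100 * SSE/SST, and the relative reductions quoted in the text can be reproduced with a few lines of code. The following is a minimal Python sketch, not the authors' Java implementation; the function names (`information_loss`, `relative_reduction`) and the toy data are assumptions made purely for illustration.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(records: List[Record]) -> List[float]:
    """Attribute-wise mean of a non-empty list of records."""
    n = len(records)
    return [sum(r[d] for r in records) / n for d in range(len(records[0]))]


def squared_error(records: List[Record], center: Sequence[float]) -> float:
    """Sum of squared Euclidean distances from each record to `center`."""
    return sum((r[d] - center[d]) ** 2
               for r in records for d in range(len(center)))


def information_loss(groups: List[List[Record]]) -> float:
    """Normalized information loss IL = 100 * SSE / SST for a partition."""
    all_records = [r for g in groups for r in g]
    # SSE: deviations from each group's own centroid.
    sse = sum(squared_error(g, centroid(g)) for g in groups)
    # SST: deviations from the centroid of the entire dataset.
    sst = squared_error(all_records, centroid(all_records))
    return 100.0 * sse / sst


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical toy partition: two tight groups that lie far apart.
toy_groups = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
toy_il = information_loss(toy_groups)

# Values quoted in the text for EIA, k=10: MDAV-MHM alone vs. MDAV-MHM+LK.
eia_reduction = relative_reduction(2.867, 1.968)
```

For the toy partition, SSE is 4 and SST is 104, so IL is about 3.85. Applying `relative_reduction` to the two EIA values from the text gives roughly 31.4, in line with the 31% stated above.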
Tables IV, V, and VI present the results for the NPN-MHM, QB-MHM, and Greedy-MHM families of algorithms, respectively. Our tour improvement heuristics resulted in lower information loss in every instance.

TABLE IV. INFORMATION LOSS UNDER NPN-MHM AND IMPROVEMENT HEURISTICS

TABLE V. INFORMATION LOSS UNDER QB-MHM AND IMPROVEMENT HEURISTICS

TABLE VI. INFORMATION LOSS UNDER GREEDY-MHM AND IMPROVEMENT HEURISTICS

Table VII summarizes the results from Tables I to VI as follows: for each dataset and each specified k, the best performing combination of tour construction and tour improvement heuristics is identified. The row labeled "Best under" identifies the specific improvement heuristic that produced the best results. For example, with the Census dataset, when k=3, the best results are obtained using a combination of Greedy-MHM and the LK tour improvement heuristic, with an information loss of 5.002%. While no tour construction heuristic dominates, out of 15 total instances, the best result was found using the LK improvement heuristic 7 times and using the 3-opt improvement heuristic 6 times.

TABLE VII. BEST COMBINATION OF TOUR CONSTRUCTION AND IMPROVEMENT HEURISTICS

Table VIII shows the difference between the lowest information loss obtained using extant microaggregation methods (the minimum value found from the MD, MDAV, CBFS, MD-MHM, CBFS-MHM, NPN-MHM, and MDAV-MHM heuristics) and the lowest obtained using the new methods described in this paper. Under our methods, information loss decreases on average by 4%, 5.5%, and 13.9% for the Census, Tarragona, and EIA datasets, respectively. In all but one instance (k=10 with the Census dataset), our proposed method outperformed existing methods and should hence be considered a promising microaggregation heuristic.
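The two-stage pipeline evaluated in these results (order the records, then find the best partition consistent with that ordering) can be sketched in a few dozen lines. The Python below is an illustrative sketch under stated assumptions, not the authors' Java code: it implements the NPN ordering and the MHM shortest-path step as described in Section II, and omits the tour improvement step (2-opt, 3-opt, LK). Because every arc in the MHM graph runs from a lower-numbered node to a higher-numbered one, the sketch swaps in a simple dynamic program over nodes 0..n in place of Dijkstra's algorithm. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Record = Sequence[float]


def centroid(records: List[Record]) -> List[float]:
    n = len(records)
    return [sum(r[d] for r in records) / n for d in range(len(records[0]))]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_sse(records: List[Record]) -> float:
    """Arc length L(i, j): squared deviations from the group centroid."""
    c = centroid(records)
    return sum(sq_dist(r, c) for r in records)


def npn_order(records: List[Record]) -> List[int]:
    """Nearest-point-next: start at the record furthest from the dataset
    centroid, then repeatedly append the nearest unvisited record."""
    c = centroid(records)
    remaining = set(range(len(records)))
    current = max(remaining, key=lambda i: sq_dist(records[i], c))
    order = [current]
    remaining.remove(current)
    while remaining:
        current = min(remaining,
                      key=lambda i: sq_dist(records[i], records[current]))
        order.append(current)
        remaining.remove(current)
    return order


def mhm_partition(ordered: List[Record],
                  k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Shortest path from node 0 to node n, with arcs (i, j) for
    i + k <= j < i + 2k. Returns (SSE, groups); each group (i, j)
    covers ordered[i:j], i.e. records i+1..j in the paper's numbering."""
    n = len(ordered)
    if n < k:
        raise ValueError("need at least k records")
    best = [math.inf] * (n + 1)   # best[j]: shortest distance 0 -> j
    prev = [-1] * (n + 1)         # predecessor of j on that path
    best[0] = 0.0
    for j in range(k, n + 1):
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            if best[i] == math.inf:
                continue          # node i is not reachable from node 0
            cost = best[i] + group_sse(ordered[i:j])
            if cost < best[j]:
                best[j], prev[j] = cost, i
    groups = []
    j = n
    while j > 0:                  # walk the predecessor chain back to 0
        groups.append((prev[j], j))
        j = prev[j]
    groups.reverse()
    return best[n], groups


# Hypothetical toy data: two well-separated clusters, given out of order.
records = [(12.0, 0.0), (0.0, 0.0), (10.0, 0.0), (2.0, 0.0),
           (13.0, 0.0), (1.0, 0.0), (11.0, 0.0)]
order = npn_order(records)
ordered = [records[i] for i in order]
loss, groups = mhm_partition(ordered, k=3)
```

On this toy input the NPN ordering sorts the records along the first attribute, and the shortest path splits them into a group of three and a group of four with a total SSE of 7.0. A tour improvement heuristic would be applied to `order` before the call to `mhm_partition`.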
TABLE VIII. RESULTS COMPARED TO EXTANT METHODS

REFERENCES

[1] Adam, N. R., & Wortmann, J. C. (1989). Security-control methods for statistical databases: a comparative study. ACM Computing Surveys, 21(4), 515-556.
[2] Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., & Zhu, A. (2005). Approximation algorithms for k-anonymity. Journal of Privacy Technology, Paper No. 20051120001.
[3] Applegate, D., Cook, W., & Rohe, A. (2003). Chained Lin-Kernighan for large traveling salesman problems. INFORMS Journal on Computing, 15(1), 82-92.
[4] Balleste, A. M., Solanas, A., Domingo-Ferrer, J., & Mateo-Sanz, J. M. (2007). A genetic approach to multivariate microaggregation for database privacy. IEEE 23rd International Conference on Data Engineering Workshop, 180-185.
[5] Bentley, J. L. (1992). Fast algorithms for geometric traveling salesman problems. ORSA Journal on Computing, 4, 387-411.
[6] Chang, C.-C., Li, Y.-C., & Huang, W.-H. (2007). TFRP: An efficient microaggregation algorithm for statistical disclosure control. Journal of Systems and Software, 80(11), 1866-1878.
[7] Defays, D., & Anwar, N. (1995). Micro-aggregation: a generic method. Proceedings of the 2nd International Symposium on Statistical Confidentiality. Eurostat, Luxemburg, 69-78.
[8] Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269-271.
[9] Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J. M., & Sebé, F. (2006). Efficient multivariate data-oriented microaggregation. The VLDB Journal, 15, 355-369.
[10] Domingo-Ferrer, J., & Mateo-Sanz, J. M. (2002). Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, 14(1), 189-201.
[11] Domingo-Ferrer, J., Sebe, F., & Solanas, A. A polynomial-time approximation to optimal multivariate microaggregation. Computers & Mathematics with Applications, 55(4), 714-732.
[12] Duncan, G. T., Elliot, M., & Salazar-González, J. (2011). Statistical Confidentiality: Principles and Practice. Springer.
[13] Fayyoumi, E., & Oommen, B. J. (2009). Achieving microaggregation for secure statistical databases using fixed-structure partitioning-based learning automata. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39(5), 1192-1205.
[14] Hansen, S. L., & Mukherjee, S. (2003). A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering, 15(4), 1043-1044.
[15] Hundepool, A., Wetering, A., Ramaswamy, R., Franconi, L., Capobianchi, A., Wolf, P., Domingo-Ferrer, J., Torra, V., Brand, R., & Giessing, A. (2004). µ-ARGUS Ver. 4.0 Software and User's Manual. Voorburg, The Netherlands: Statistics Netherlands.
[16] Kabir, M. E., Wang, H., & Zhang, Y. (2010). A pairwise-systematic microaggregation for statistical disclosure control. International Conference on Data Mining 2010, 266-273.
[17] Laszlo, M., & Mukherjee, S. (2005). Minimum spanning tree partitioning algorithm for microaggregation. IEEE Transactions on Knowledge and Data Engineering, 17(7), 902-911.
[18] Laszlo, M., & Mukherjee, S. (2009). Approximation bounds for minimum information loss microaggregation. IEEE Transactions on Knowledge and Data Engineering, 21(11), 1643-1647.
[19] Lin, S., & Kernighan, B. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21(2), 498-516.
[20] Mateo-Sanz, J. M., & Domingo-Ferrer, J. (1999). A method for data oriented multivariate microaggregation. In J. Domingo-Ferrer (Ed.), Statistical Data Protection (pp. 89-99). Luxemburg: Office for Official Publications of the European Communities.
[21] Oganian, A., & Domingo-Ferrer, J. (2001). On the complexity of optimal microaggregation for statistical disclosure control. Statistical Journal of the United Nations Economic Commission for Europe, 18(4), 345-354.
[22] Rego, C., Gamboa, D., Glover, F., & Osterman, C. (2011). Traveling salesman problem heuristics: leading methods, implementations, and latest advances. European Journal of Operational Research, 211(3), 427-441.
[23] Samarati, P. (2001). Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6), 1010-1027.
[24] Sande, G. (2002). Exact and approximate methods for data directed microaggregation in one or more dimensions. International Journal of Uncertainty and Fuzziness in Knowledge Based Systems, 10(5), 459-476.
[25] Solanas, A., Martinez-Balleste, A., Domingo-Ferrer, J., & Mateo-Sanz, M. (2006). A 2^d-tree-based blocking method for microaggregating very large data sets. In First International Conference on Availability, Reliability and Security (ARES'06), 922-928.
[26] Sweeney, L. (2002). K-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 557-570.
[27] Willenborg, L., & de Waal, T. (2001). Elements of statistical disclosure control. New York: Springer-Verlag.