Approximate Protocol for Privacy-Preserving Association Rule Mining


A discussion of the research paper 'An Efficient Approximate Protocol for Privacy-Preserving Association Rule Mining' by Murat Kantarcioglu, Robert Nix, and Jaideep Vaidya

Published in: Data & Analytics

  2. CONTENTS • Introduction • Two Types of Techniques • Problem • Proposed Solution • Related Work • Bloom Filters • Redefine the Problem • Approximate Threshold Dot Product Algorithm • Security • Computational and Communicational Cost • Experimental Results • Accuracy • Efficiency • Conclusion
  3. INTRODUCTION • Association Rule Mining • An important data mining model studied extensively by the database and data mining community • A method for discovering interesting relations between variables in large databases • "Beer and diapers" story • Buying diapers ==> buying beer • Privacy Preservation in Association Rule Mining • Multiple parties are interested in learning the global association rules • None is willing to reveal the data held at its own site
  4. TWO TYPES OF TECHNIQUES • Perturbation-based methods • Locally perturb data before delivering it to the data miner • Special techniques are used to reconstruct the original distribution • The mining algorithm needs to be modified to account for the perturbed data • Raise security concerns • Secure Multiparty Computation Techniques • Each party builds a decision tree • Only the final decision tree is shared, not any other data • Use cryptographic techniques for security • Computationally intensive
  5. PROBLEM How can we mine data in an efficient and provably secure way?
  6. PROPOSED SOLUTION An approximate protocol for computing the dot product of two vectors owned by two different parties
  7. RELATED WORK • A similar approximation protocol has already been proposed using sampling techniques • A solution using Bloom filters exists • Rule mining is done centrally • Goethals's encryption mechanism • Simple and secure • Calculates the exact dot product • Run time O(n)
  8. BLOOM FILTERS • A probabilistic data structure • Used to test membership in a set • False positives are possible • No false negatives • Can be used to approximate the intersection size between two sets
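
The properties listed on this slide can be illustrated with a minimal Bloom filter sketch. The class name, the choice of SHA-256, and the way k hash positions are derived by salting are assumptions made for illustration; the slides do not say which hash functions the paper uses.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True may be a false positive; False is never wrong.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
for word in ["beer", "diapers"]:
    bf.add(word)

print("beer" in bf)   # inserted items are always reported as present
print("milk" in bf)   # usually False, but a false positive is possible
```

Every inserted item sets at most k bits, so an added item can never be reported as absent, while an item that was never added can collide with bits set by others.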
  9. REDEFINE PROBLEM • Compute the scalar product • Check whether the scalar product of two distributed vectors is at least some threshold t: X1 · X2 = |S1 ∩ S2| ≥ t
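
The identity on this slide holds when X1 and X2 are 0/1 indicator vectors of the sets S1 and S2. A small sketch, with made-up sets and hypothetical helper names, computed in the clear with no privacy protection:

```python
def to_indicator(items, n):
    """Return the 0/1 vector of length n that marks which indices are in items."""
    return [1 if i in items else 0 for i in range(n)]


def meets_threshold(x1, x2, t):
    """Check whether the dot product of two vectors is at least t."""
    return sum(a * b for a, b in zip(x1, x2)) >= t


# Hypothetical example data: indices of the transactions each party holds.
s1 = {0, 2, 3, 7}
s2 = {2, 3, 5, 7}
n = 8

x1 = to_indicator(s1, n)
x2 = to_indicator(s2, n)
dot = sum(a * b for a, b in zip(x1, x2))

print(dot, len(s1 & s2))          # the two values coincide
print(meets_threshold(x1, x2, 3))
```

Each position contributes 1 to the dot product only when both vectors have a 1 there, which is exactly when the index belongs to both sets.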
  10. APPROXIMATE THRESHOLD DOT PRODUCT ALGORITHM • Each party creates its own Bloom filter using common parameters • Size of the Bloom filter: m • Hash functions: h1, h2, ..., hk • The parties run the secure dot product algorithm on their private Bloom filters and obtain random shares of the dot product result • Each party takes part in a secure multiplication protocol, using its private dot product share, to obtain a random share of the multiplication result • Finally, the parties run a secure comparison protocol to approximate the final result
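
The data flow of these steps can be sketched in the clear, without any cryptography. Two things here are assumptions rather than the paper's method: the intersection estimate uses the standard Bloom filter cardinality estimate -(m/k) ln(1 - ones/m), which may differ from the formula the authors evaluate securely, and the "random shares" are modeled as additive shares modulo a hypothetical constant MODULUS. All function names are illustrative.

```python
import hashlib
import math
import secrets

MODULUS = 2 ** 32  # hypothetical share modulus, chosen only for this sketch


def bloom(items, m, k):
    """Build an m-bit Bloom filter with k salted SHA-256 hashes (assumption)."""
    bits = [0] * m
    for item in items:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def estimate_size(ones, m, k):
    """Standard estimate of how many items produced `ones` set bits."""
    return -(m / k) * math.log(1 - ones / m)


def estimate_intersection(dot, ones1, ones2, m, k):
    """Estimate |S1 ∩ S2| from the filters' dot product and bit counts."""
    union_ones = ones1 + ones2 - dot  # bits set in the OR of both filters
    return (estimate_size(ones1, m, k) + estimate_size(ones2, m, k)
            - estimate_size(union_ones, m, k))


def split_into_shares(value, modulus):
    """Split value into two additive shares; each alone looks random."""
    share_a = secrets.randbelow(modulus)
    return share_a, (value - share_a) % modulus


m, k, t = 4096, 3, 50
s1 = set(range(0, 300))
s2 = set(range(200, 500))  # true intersection size is 100

b1, b2 = bloom(s1, m, k), bloom(s2, m, k)
dot = sum(x * y for x, y in zip(b1, b2))  # done securely in the protocol
share_a, share_b = split_into_shares(dot, MODULUS)

estimate = estimate_intersection(dot, sum(b1), sum(b2), m, k)
meets_threshold = estimate >= t
print(round(estimate, 1), meets_threshold)
```

In the protocol described on the slide, the dot product is never reconstructed in the open: each party keeps only its share, and the later multiplication and comparison steps operate on shares. The sketch recombines everything locally only to show what quantity is being approximated.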
  11. SECURITY • Preserved under the following assumptions • Parties are semi-honest • Dot product, multiplication and comparison protocols are secure
  12. COMPUTATIONAL AND COMMUNICATIONAL COST • O(nk) for hashing into the Bloom filters, the rest is O(1) • Hashing cost is negligible compared to public-key operations • m << n --> faster • Flexible: a better secure dot product protocol can be plugged in if one is found in the future • Communication cost is proportional to m --> low cost
  13. EXPERIMENTAL RESULTS • Consider the effect on the algorithm's performance of: • vector length (l) • vector density (d) • the actual intersection of the two vectors (i) • the Bloom filter parameters • m (length of filter) • k (number of hash functions)
  14. ACCURACY - 1 Increasing k --> more distortion --> lower accuracy (when the filter length is small)
  15. ACCURACY - 2 Increasing the filter length --> higher accuracy (less distortion and fewer collisions)
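
The trends on the last two slides match the textbook behaviour of Bloom filters, which can be checked with the standard formulas for expected fill and false-positive rate. This is a rough proxy under stated assumptions, not a reproduction of the paper's experiments: the item count of 200 and the filter sizes are made-up values, and the false-positive rate stands in for "distortion".

```python
def expected_fill(n_items, m, k):
    """Expected fraction of set bits after inserting n_items with k hashes."""
    return 1 - (1 - 1 / m) ** (k * n_items)


def false_positive_rate(n_items, m, k):
    """Standard approximation: all k probed bits happen to be set."""
    return expected_fill(n_items, m, k) ** k


n_items = 200  # hypothetical number of non-zero entries per vector
for m in (256, 4096):
    for k in (1, 2, 4, 8):
        fill = expected_fill(n_items, m, k)
        fpr = false_positive_rate(n_items, m, k)
        print(f"m={m:5d} k={k} fill={fill:.3f} false_positive_rate={fpr:.4f}")
```

With the small filter, raising k saturates the bit array and the false-positive rate climbs, while the longer filter has room for the extra hash functions and the rate drops.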
  16. ACCURACY - 3 Even for a large vector, the same accuracy can be achieved with a sublinear increase in filter length
  17. ACCURACY - 4 • No error at zero density • Error increases drastically at high densities • Good for sparse vectors
  18. ACCURACY - 5 Error stays below 1% throughout
  19. EFFICIENCY Run time of 27 min 57 s for the exact version vs. 4 min 04 s for the approximate protocol
  20. CONCLUSIONS • Proposes an efficient and secure protocol to approximately compute the scalar product in a privacy-preserving manner • Efficiency is gained by accepting an approximation rather than an exact answer • Extending the protocol to more than two parties is future work
  21. Q & A