Published in:
Data & Analytics

License: CC Attribution License

- 1. AN EFFICIENT APPROXIMATE PROTOCOL FOR PRIVACY PRESERVING ASSOCIATION RULE MINING MURAT KANTARCIOGLU, ROBERT NIX AND JAIDEEP VAIDYA (2009) - BY PUSHPALANKA JAYAWARDHANA 158217G
- 2. CONTENTS • Introduction • Two Types of Techniques • Problem • Proposing Solution • Related Work • Bloom Filters • Redefine the Problem • Approximate Threshold Dot Product Algorithm • Security • Computational and Communicational Cost • Experimental Results • Accuracy • Efficiency • Conclusion
- 3. INTRODUCTION • Association Rule Mining • An important data mining model studied extensively by the database and data mining community • A method for discovering interesting relations between variables in large databases • "Beer and diaper" story • Buying diapers ==> buying beer • Privacy Preservation in Association Rule Mining • Multiple parties are interested in learning the global association rules • None is willing to reveal the data held at its own site
- 4. TWO TYPES OF TECHNIQUES • Perturbation Based Methods • Locally perturb data before delivering it to the data miner • Special techniques are used to reconstruct the original distribution • The mining algorithm needs to be modified to account for the perturbed data • Have security concerns • Secure Multiparty Computation Techniques • Each party builds a decision tree • Only the final decision tree is shared, not any other data • Use cryptographic techniques for security • Computationally intensive
- 5. PROBLEM? How can we mine data in an efficient and provably secure way?
- 6. PROPOSING SOLUTION An approximate protocol for computing the dot product of two vectors owned by two different parties
- 7. RELATED WORK • A similar approximation protocol has already been proposed using sampling techniques • A solution using Bloom filters exists • Rule mining is done centrally • Goethals's encryption mechanism • Simple and secure • Calculates the exact dot product • Run time O(n)
- 8. BLOOM FILTERS • A probabilistic data structure • Used to test membership in a set • False positives are possible • No false negatives • Can be used to approximate the intersection size between two sets
- 9. REDEFINE PROBLEM • Compute the scalar product • Check whether the scalar product of two distributed binary vectors is at least some threshold t: X1 · X2 = |S1 ∩ S2| ≥ t
- 10. APPROXIMATE THRESHOLD DOT PRODUCT ALGORITHM • Each party creates its own Bloom filter, using common parameters • Size of the Bloom filter: m • Hash functions: h1, h2, ..., hk • The parties run the secure dot product protocol on their private Bloom filters and obtain random shares of the dot product result • Each party takes part in a secure multiplication protocol, using its private dot product share, to obtain a random share of the multiplication result • Finally, each party takes part in a secure comparison protocol to approximate the final result
- 11. SECURITY • Preserved under the following assumptions • Parties are semi-honest • Dot product, multiplication and comparison protocols are secure
- 12. COMPUTATIONAL AND COMMUNICATIONAL COST • O(nk) for hashing into the Bloom filters; the rest is O(1) • Hashing cost is negligible compared to public key operations • m << n --> faster • Flexible: a better secure dot product protocol can be plugged in if one is found in the future • Communication cost is proportional to m --> low cost
- 13. EXPERIMENTAL RESULTS • Consider the effect on the performance of the algorithm of: • vector length (l) • vector density (d) • the actual intersection of the two vectors (i) • the Bloom filter parameters • m (length of filter) • k (number of hash functions)
- 14. ACCURACY - 1 Increase k --> more distortion --> less accuracy (when the filter length is small)
- 15. ACCURACY - 2 Increase filter length --> higher accuracy (less distortion and fewer collisions)
- 16. ACCURACY - 3 Even for a large vector, the same accuracy can be achieved with a sub-linear increase in filter length
- 17. ACCURACY - 4 • No error at 0 density • Error increases drastically at high densities • Good for sparse vectors
- 18. ACCURACY - 5 < 1% error in all cases
- 19. EFFICIENCY Run time compared to the exact version: 27 min 57 s (exact) vs. 4 min 04 s (approximate)
- 20. CONCLUSIONS • Proposes an efficient and secure protocol to approximately compute the scalar product in a privacy preserving manner • Efficiency is gained by allowing an approximation rather than an exact answer • Extending the protocol to more than two parties is future work
- 21. Q & A
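
The Bloom filter idea from slides 8 to 10 can be illustrated with a plaintext, non-private sketch. Several things here are assumptions and not taken from the slides: the salted SHA-256 hashing, the function names, and the estimator itself, which is derived from the standard expectation that a filter holding n items has about m(1 - 1/m)^(kn) zero bits. The secure dot product, multiplication and comparison protocols are replaced by ordinary arithmetic, so this only shows what quantity the parties would be approximating, not how privacy is preserved.

```python
import hashlib
import math


def bloom_filter(items, m, k):
    """Build an m-bit Bloom filter (list of 0/1) with k salted SHA-256 hashes.

    The hashing scheme is an illustrative assumption, not the paper's choice.
    """
    bits = [0] * m
    for item in items:
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def might_contain(bits, item, k):
    """Membership test: may return false positives, never false negatives."""
    m = len(bits)
    for seed in range(k):
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        if bits[int.from_bytes(digest[:8], "big") % m] == 0:
            return False
    return True


def zero_counts(bf1, bf2):
    """Return (z1, z2, z12): zero bits in each filter and in both at once.

    z12 is the dot product of the complemented filters, which is the value
    a secure dot product protocol would hand out as random shares.
    """
    z1 = len(bf1) - sum(bf1)
    z2 = len(bf2) - sum(bf2)
    z12 = sum((1 - a) * (1 - b) for a, b in zip(bf1, bf2))
    return z1, z2, z12


def estimate_intersection(bf1, bf2, k):
    """Estimate |S1 n S2| from z1 * z2 / (m * z12) ~ (1 - 1/m)^(k * |S1 n S2|)."""
    m = len(bf1)
    z1, z2, z12 = zero_counts(bf1, bf2)
    if z12 == 0:
        raise ValueError("filters are saturated; increase m")
    return math.log(z1 * z2 / (m * z12)) / (k * math.log(1 - 1 / m))


def meets_threshold(bf1, bf2, k, t):
    """Threshold test without logarithms: one product and one comparison."""
    m = len(bf1)
    z1, z2, z12 = zero_counts(bf1, bf2)
    return z1 * z2 <= m * z12 * (1 - 1 / m) ** (k * t)


if __name__ == "__main__":
    s1 = set(range(0, 300))    # party 1's private set
    s2 = set(range(200, 500))  # party 2's private set; true overlap is 100
    m, k = 4096, 3             # common parameters agreed by both parties
    bf1, bf2 = bloom_filter(s1, m, k), bloom_filter(s2, m, k)
    print("estimated intersection:", estimate_intersection(bf1, bf2, k))
    print("at least 50 in common? ", meets_threshold(bf1, bf2, k, 50))
```

Writing the threshold test as a product followed by a comparison mirrors the order of steps on slide 10, where a multiplication protocol is followed by a comparison protocol. In the real protocol none of `z1`, `z2` or `z12` would be visible to the other party in the clear.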

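The reformulation on slide 9 rests on the fact that the dot product of two binary vectors equals the size of the intersection of the sets they encode. The toy example below, with made-up transaction data and hypothetical function names, shows this for the "beer and diaper" rule: each vector marks which transactions contain an item, so the dot product counts the transactions containing both.

```python
def support_count(x1, x2):
    """Dot product of two 0/1 vectors = number of positions set in both."""
    return sum(a * b for a, b in zip(x1, x2))


def is_frequent(x1, x2, t):
    """Threshold test from slide 9: X1 . X2 >= t."""
    return support_count(x1, x2) >= t


# Illustrative data: one entry per transaction, 1 if the item was bought.
diapers = [1, 1, 0, 1, 0, 1]  # held by party 1
beer = [1, 0, 0, 1, 1, 1]     # held by party 2

both = support_count(diapers, beer)  # transactions with both items
confidence = both / sum(diapers)     # confidence of diapers ==> beer

if __name__ == "__main__":
    print("support count:", both)
    print("confidence:", confidence)
    print("frequent at t=3:", is_frequent(diapers, beer, 3))
```

In the privacy preserving setting neither party may see the other's vector, which is why the slides replace this direct computation with a secure protocol.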