Privacy in Data Mining

  1. Privacy in Data Mining
     Presented by Kalyan K Beemanapalli and Vamshi Kodithava
     Graduate Students, Department of CS
  2. Outline
     • Definitions, Introduction, Importance and Relevance to the Class
     • Key Issues and Key Results
     • Techniques Developed
     • Open Issues/Research Directions
     • References
     • Conclusion
  3. Definitions
     • Data mining: Data mining, or knowledge discovery in databases (KDD), is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
     • Privacy: An individual's desire and ability to keep certain information about themselves hidden from others. What, then, is corporate privacy?
     • Relationship: The primary goal of data mining is to extract hidden relationships and patterns among data items; that same goal is the source of its privacy concerns.
  4. Importance
     • Example from health informatics: privacy breaches caused by data mining
     • The database inference problem: deducing sensitive information from data a user is legitimately allowed to see
     • Data mining makes the inference problem considerably more dangerous, because its purpose is to uncover non-obvious patterns
     • Hence, data mining algorithms are being revisited from the standpoint of privacy, security and civil liberties
     • Relevance to the class: before applying data mining techniques, it is important to understand their positive and negative implications
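To make the inference problem concrete, here is a minimal Python sketch of a classic statistical-database illustration (often called a tracker or differencing attack). The table, the names and the `MIN_QUERY_SET` policy are hypothetical: the database refuses aggregate queries over fewer than two rows, yet two permitted queries together still reveal one person's salary.

```python
from typing import Callable, List, Tuple

# Hypothetical toy table (name, department, salary); all values are made up.
Row = Tuple[str, str, int]
EMPLOYEES: List[Row] = [
    ("Alice", "Research", 120_000),
    ("Bob", "Research", 95_000),
    ("Carol", "Sales", 70_000),
    ("Dave", "Sales", 65_000),
]

MIN_QUERY_SET = 2  # assumed policy: refuse aggregates over fewer than 2 rows


def sum_salary(predicate: Callable[[Row], bool]) -> int:
    """Answer SUM(salary) over matching rows, enforcing a minimum query-set size."""
    rows = [r for r in EMPLOYEES if predicate(r)]
    if len(rows) < MIN_QUERY_SET:
        raise PermissionError("query set too small; individual data would be revealed")
    return sum(salary for _, _, salary in rows)


# Each query below passes the size check on its own...
everyone = sum_salary(lambda r: True)
everyone_but_alice = sum_salary(lambda r: r[0] != "Alice")
# ...but subtracting one answer from the other reveals Alice's salary.
inferred_alice_salary = everyone - everyone_but_alice
```

Query-set-size limits alone are therefore not sufficient. Data mining, which combines many aggregate patterns automatically, makes this kind of inference easier to carry out at scale.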
  6. Privacy Preserving Data Mining
     • Approaches to preserving privacy differ along five dimensions:
       - Data distribution
       - Data modification
       - Data mining algorithm
       - Data or rule hiding
       - Privacy preservation
     • Concerns: performance, scalability, data utility, level of uncertainty, and resistance to different data mining techniques
     • There is a very thin line between exploiting the power of data mining techniques and preserving privacy.
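The data modification dimension can be illustrated with a short Python sketch of two simple methods: blocking (replacing a sensitive value with an unknown marker) and swapping (permuting one attribute's values across records). Commonly cited modification methods also include perturbation, aggregation or merging, and sampling. The records, field names and the `block`/`swap` helpers are hypothetical illustrations, not code from the cited surveys.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, object]


def block(records: List[Record], field: str,
          should_block: Callable[[Record], bool]) -> List[Record]:
    """Blocking: replace selected values of `field` with the unknown marker '?'."""
    return [{**r, field: "?"} if should_block(r) else dict(r) for r in records]


def swap(records: List[Record], field: str, rng: random.Random) -> List[Record]:
    """Swapping: randomly permute the values of `field` across records."""
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


# Hypothetical toy records; all values are made up.
patients: List[Record] = [
    {"id": 1, "age": 34, "diagnosis": "flu"},
    {"id": 2, "age": 51, "diagnosis": "diabetes"},
    {"id": 3, "age": 29, "diagnosis": "HIV"},
    {"id": 4, "age": 62, "diagnosis": "flu"},
]

# Hide one sensitive diagnosis; separately, decouple ages from their records.
blocked = block(patients, "diagnosis", lambda r: r["diagnosis"] == "HIV")
swapped = swap(patients, "age", random.Random(7))
```

Swapping keeps the column's overall distribution (here, the multiset of ages) intact while breaking the link between each age and its record, which is the data-utility versus privacy trade-off in miniature.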
  7. Heuristic-Based and Other Techniques
     • Association rule confusion
     • Classification rule confusion
     • Privacy-preserving clustering
     • Cryptography-based techniques
     • Reconstruction-based techniques
     • These techniques have been combined with various data modification and data distribution techniques, yielding numerous algorithms
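Association rule confusion aims to keep sensitive rules from being mined. One common heuristic, sketched here in Python under assumed inputs, lowers the support of the itemset behind a sensitive rule by deleting one of its items from supporting transactions until the support falls below the mining threshold. The helper names, the toy baskets and the rule for choosing which item to delete are all hypothetical.

```python
from typing import FrozenSet, List, Optional, Set

Transaction = Set[str]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def hide_itemset(db: List[Transaction], sensitive: FrozenSet[str],
                 min_support: float, victim: Optional[str] = None) -> List[Transaction]:
    """Greedy support reduction: drop one item of `sensitive` from supporting
    transactions until the itemset's support falls below `min_support`."""
    sanitized = [set(t) for t in db]  # work on a copy; the input is left untouched
    # Hypothetical choice of which item to delete: the alphabetically smallest one.
    victim = victim if victim is not None else min(sensitive)
    if victim not in sensitive:
        raise ValueError("victim must be an item of the sensitive itemset")
    count = sum(1 for t in sanitized if sensitive <= t)
    for t in sanitized:
        if count < min_support * len(sanitized):
            break  # the itemset (and any rule built from it) is now below threshold
        if sensitive <= t:
            t.discard(victim)
            count -= 1
    return sanitized


# Hypothetical market-basket data; the pair {bread, milk} is treated as sensitive.
baskets = [
    {"bread", "milk", "beer"},
    {"bread", "milk"},
    {"bread", "milk", "diapers"},
    {"milk", "diapers"},
    {"bread", "beer"},
]
secret = frozenset({"bread", "milk"})
cleaned = hide_itemset(baskets, secret, min_support=0.4)
```

The side effect is visible even in this toy case: `bread` alone drops from 4 of 5 baskets to 2 of 5, so non-sensitive rules involving bread may be lost too. Limiting such collateral damage is what more careful hiding algorithms try to do.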
  8. Reconstruction-Based Techniques
     • Outcome of work by Rakesh Agrawal and Ramakrishnan Srikant at IBM Almaden Research Center
     • Step 1: Create a randomized data sample by perturbing the values in individual data records
     • Step 2: Reconstruct the distributions of the original data, not the values in individual records
     • Step 3: Build a decision tree classifier from the reconstructed distributions
     • For reconstruction they used a Bayesian approach, and they proposed three algorithms (Global, ByClass and Local) for building decision trees from reconstructed distributions
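Steps 1 and 2 can be sketched in Python. The sketch follows the iterative Bayesian update described by Agrawal and Srikant, discretized into bins: given released values w_i = x_i + y_i and a known noise density f_Y, each round sets Pr'(X in bin a) = (1/n) Σ_i f_Y(w_i − m_a) Pr(X in bin a) / Σ_t f_Y(w_i − m_t) Pr(X in bin t), where m_a is the midpoint of bin a. Assumptions: the noise is uniform on [−alpha, alpha], bin midpoints stand in for whole bins, and the function names, bin count and stopping rule are illustrative choices rather than the paper's exact settings.

```python
import random
from typing import List, Tuple


def uniform_density(y: float, alpha: float) -> float:
    """Density of the noise distribution Uniform[-alpha, alpha] at y."""
    return 1.0 / (2.0 * alpha) if -alpha <= y <= alpha else 0.0


def perturb(values: List[float], alpha: float, rng: random.Random) -> List[float]:
    """Step 1: release w_i = x_i + y_i with independent uniform noise y_i."""
    return [x + rng.uniform(-alpha, alpha) for x in values]


def reconstruct(perturbed: List[float], alpha: float, lo: float, hi: float,
                bins: int = 10, iters: int = 100, tol: float = 1e-6
                ) -> Tuple[List[float], List[float]]:
    """Step 2: iterative Bayesian estimate of Pr(X in bin) over [lo, hi]."""
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    probs = [1.0 / bins] * bins  # uniform starting estimate
    for _ in range(iters):
        new = [0.0] * bins
        for w in perturbed:
            # Posterior weight of each bin given this observation (Bayes' rule).
            weights = [uniform_density(w - m, alpha) * p for m, p in zip(mids, probs)]
            total = sum(weights)
            if total == 0.0:
                continue  # no bin midpoint can explain w under this discretization
            for k in range(bins):
                new[k] += weights[k] / total
        mass = sum(new)
        if mass == 0.0:
            raise ValueError("no perturbed value is compatible with the bins")
        new = [v / mass for v in new]  # average over the records actually used
        change = max(abs(a - b) for a, b in zip(new, probs))
        probs = new
        if change < tol:
            break
    return mids, probs


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical bimodal attribute: half the records in [0, 2], half in [8, 10].
    original = [rng.uniform(0, 2) for _ in range(500)] + [rng.uniform(8, 10) for _ in range(500)]
    released = perturb(original, alpha=2.0, rng=rng)
    mids, probs = reconstruct(released, alpha=2.0, lo=0.0, hi=10.0)
    for m, p in zip(mids, probs):
        print(f"{m:4.1f}  {p:.3f}")
```

In the demo, the reconstructed histogram should place most of its mass back over [0, 2] and [8, 10], even though no individual original value is recovered. Step 3 (growing the decision tree from these distributions) is not sketched here.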
  9. Accuracy Analysis
  11. Open Issues/Research Directions
     • Data warehousing and the inference problem
     • Preparing perturbed databases that support a combination of algorithms
     • Social effects: work with social scientists to preserve privacy across cultures
     • Formulating legal rules and developing data mining algorithms accordingly
     • Privacy inference controller
     • Quantifying privacy?
     • Could something similar to the ACID properties be formulated for perturbed databases?
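For the "Quantifying privacy?" question, one existing starting point is the metric used by Agrawal and Srikant (2000, in the references): if the original value can be estimated with c% confidence to lie in an interval, the width of that interval is the amount of privacy at that confidence level. A minimal Python sketch for two noise distributions, uniform on [−alpha, alpha] and Gaussian with standard deviation sigma, assuming the width depends only on the noise; the function names are hypothetical.

```python
from statistics import NormalDist


def uniform_privacy(alpha: float, confidence: float) -> float:
    """Width of the interval containing the original value with the given
    confidence when the noise is Uniform[-alpha, alpha]."""
    return confidence * 2.0 * alpha


def gaussian_privacy(sigma: float, confidence: float) -> float:
    """Same interval width when the noise is Normal(0, sigma^2)."""
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)  # two-sided standard-normal quantile
    return 2.0 * z * sigma


for c in (0.5, 0.95, 0.999):
    print(f"{c:.1%}: uniform {uniform_privacy(1.0, c):.3f} x alpha, "
          f"gaussian {gaussian_privacy(1.0, c):.3f} x sigma")
```

At 95% confidence, uniform noise gives 1.9 alpha and Gaussian noise about 3.92 sigma, which gives one way to compare noise choices at an equal privacy level. The metric does not account for what an attacker can learn from the reconstructed distribution itself, which is one reason quantifying privacy remains open.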
  12. References
     • Security and Privacy Implications of Data Mining, 1996, Chris Clifton and Don Marks
     • Defining Privacy for Data Mining, Chris Clifton, Murat Kantarcioglu and Jaideep Vaidya, Purdue University
     • Data Mining, National Security, Privacy and Civil Liberties, Bhavani Thuraisingham, The National Science Foundation
     • A Framework for Privacy Preserving Classification in Data Mining, 2004, Md. Zahidul Islam and Ljiljana Brankovic
     • Privacy Preserving Mining of Association Rules, 2002, Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal and Johannes Gehrke, IBM Almaden Research Center
  13. References (continued)
     • Privacy Preserving Data Mining, 2000, Rakesh Agrawal and Ramakrishnan Srikant, IBM Almaden Research Center
     • Privacy Preserving Data Mining, Advances in Cryptology, 2000, Y. Lindell and Benny Pinkas
     • Detecting Privacy and Ethical Sensitivity in Data Mining Results, 2004, Peter Fule and John Roddick
     • Limiting Privacy Breaches in Privacy Preserving Data Mining, 2003, Alexandre Evfimievski, Johannes Gehrke and Ramakrishnan Srikant
     • State-of-the-art in Privacy Preserving Data Mining, Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino, Provenza, Yucel Saygin and Yannis Theodoridis
  14. Conclusion
     • Presented an overview of, and brief insight into, new research on privacy in data mining
     • Statistical databases were the first area to consider the privacy issues that arise when data is analyzed
     • Surveyed techniques and approaches researchers use to preserve privacy while mining data
     • Identified open issues that have to be addressed and potential research directions
     • Privacy, secrecy, ethical sensitivity, national security, civil liberties, and more
  15. Conclusion (continued)
     • There is much more to say about privacy in general, and about privacy in data mining specifically.
     • Data mining can be used to its fullest only if researchers address the problem of privacy and develop techniques in this direction.
  16. Queries
     Thank You.
