Privacy in Data Mining

Transcript

  • 1. Privacy in Data Mining. Presented by Kalyan K Beemanapalli (Graduate Student, Department of CS) and Vamshi Kodithava (Graduate Student, Department of CS)
  • 2. Outline
      - Definitions, Introduction, Importance and Relevance to the Class
      - Key Issues and Key Results
      - Techniques Developed
      - Open Issues / Research Directions
      - References
      - Conclusion
  • 3. Definitions
      - Data Mining: Data mining, or knowledge discovery in databases (KDD), is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
      - Privacy: An individual's desire and ability to keep certain information about themselves hidden from others. What is corporate privacy?
      - Relationship: The primary goal of data mining is to extract hidden relationships and patterns among data items, which is the source of the privacy concerns around data mining.
  • 4. Importance
      - Example from health informatics of privacy breaches due to data mining
      - The database inference problem
      - Data mining makes the inference problem considerably more dangerous
      - Hence, data mining algorithms are being revisited from the angle of privacy, security and civil liberties
      - Relevance to class: Before applying data mining techniques, it is important to know their positive and negative implications
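The slides name the database inference problem without illustrating it. One common form of it is a differencing attack, where two aggregate queries that each look harmless are combined to reveal one person's value. The sketch below is only an illustration under assumed conditions: the `patients` table, its field names, and the helper functions are hypothetical and do not come from the presentation.

```python
# Hypothetical toy table; none of these records come from the presentation.
patients = [
    {"zip": "55401", "age": 34, "annual_cost": 1200},
    {"zip": "55401", "age": 41, "annual_cost": 300},
    {"zip": "55401", "age": 67, "annual_cost": 8400},
    {"zip": "55455", "age": 67, "annual_cost": 950},
]


def total_cost(rows, predicate):
    """Aggregate query: returns only a sum, never an individual row."""
    return sum(r["annual_cost"] for r in rows if predicate(r))


def infer_individual_cost(rows, zip_code, age):
    """Differencing attack: subtract two sums that differ by one person.

    This only isolates a single individual when exactly one record
    matches both zip_code and age.
    """
    everyone = total_cost(rows, lambda r: r["zip"] == zip_code)
    everyone_else = total_cost(
        rows, lambda r: r["zip"] == zip_code and r["age"] != age
    )
    return everyone - everyone_else


if __name__ == "__main__":
    # The attacker never sees a row, yet recovers one patient's cost.
    print(infer_individual_cost(patients, "55401", 67))
```

Neither query returns an individual record, which is why controlling inference is harder than controlling direct access.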
  • 5. Outline
      - Definitions, Introduction, Importance and Relevance to the Class
      - Key Issues and Key Results
      - Techniques Developed
      - Open Issues / Research Directions
      - References
      - Conclusion
  • 6. Privacy Preserving Data Mining
      - Approaches adopted for preserving privacy
          - Data Distribution
          - Data Modification
          - Data Mining Algorithm
          - Data or Rule Hiding
          - Privacy Preservation
      - Concerns: Performance, Scalability, Data Utility, Level of Uncertainty, Resistance
      - There is a very thin line between utilizing the power of data mining techniques and preserving privacy.
  • 7. Heuristic-Based Techniques
      - Association Rule Confusion
      - Classification Rule Confusion
      - Privacy Preserving Clustering
      - Cryptography-Based Techniques
      - Reconstruction-Based Techniques
      - These techniques are combined with various data modification and data distribution techniques, and numerous algorithms have been developed as a result.
  • 8. Reconstruction-Based Techniques
      - Outcome of the work done by Rakesh Agrawal and Ramakrishnan Srikant at IBM Research, Almaden
      - Step 1: Create a randomized data sample by perturbing individual data records
      - Step 2: Reconstruct distributions, not the values in individual records
      - Step 3: Build the decision tree classifier using the reconstructed distributions
      - For reconstruction they used a Bayesian approach, and they proposed three algorithms for building decision trees that rely on reconstructed distributions
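Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes Gaussian noise with a publicly known standard deviation, a fixed set of bins represented by their midpoints, and an iterative Bayesian update of the per-bin probabilities. The function names, the bin layout, and the synthetic data are all hypothetical. Step 3 (building the decision tree) is not shown.

```python
import math
import random


def perturb(values, sigma, rng):
    """Step 1: add independent Gaussian noise (std. dev. sigma) to each record."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def noise_density(y, sigma):
    """Density of the publicly known noise distribution at y."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct_distribution(perturbed, midpoints, sigma, iterations=50):
    """Step 2: estimate the share of ORIGINAL values falling in each bin.

    Only the perturbed values and the noise distribution are used;
    no individual original value is ever recovered.
    """
    k = len(midpoints)
    estimate = [1.0 / k] * k  # start from a uniform guess
    # Likelihood of observing w if the original value sat at midpoint a.
    likelihood = [[noise_density(w - a, sigma) for a in midpoints] for w in perturbed]
    for _ in range(iterations):
        updated = [0.0] * k
        for row in likelihood:
            denom = sum(l * p for l, p in zip(row, estimate))
            if denom == 0.0:
                continue  # observation explained by no bin; skip it
            for j in range(k):
                # Bayes' rule: posterior probability that this record came from bin j.
                updated[j] += row[j] * estimate[j] / denom
        total = sum(updated)
        estimate = [u / total for u in updated]
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "ages" with two clusters, purely for illustration.
    ages = [rng.uniform(20, 30) for _ in range(500)]
    ages += [rng.uniform(70, 80) for _ in range(500)]
    noisy = perturb(ages, sigma=10.0, rng=rng)
    midpoints = [5 + 10 * i for i in range(10)]
    estimate = reconstruct_distribution(noisy, midpoints, sigma=10.0)
    for m, p in zip(midpoints, estimate):
        print(f"bin around {m:>2}: {p:.3f}")
```

The design point this illustrates is that the miner only ever sees `noisy`, yet can recover an approximation of the overall shape of the original distribution, which is what the classifier in Step 3 is trained on.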
  • 9. Accuracy Analysis
  • 10. Outline
      - Definitions, Introduction, Importance and Relevance to the Class
      - Key Issues and Key Results
      - Techniques Developed
      - Open Issues / Research Directions
      - References
      - Conclusion
  • 11. Open Issues / Research Directions
      - Data warehousing and the inference problem
      - Preparing perturbed databases for a combination of algorithms
      - Social effects (work with social scientists to preserve privacy across cultures)
      - Formulating legal rules and developing data mining algorithms accordingly
      - Privacy inference controller
      - Quantifying privacy?
      - How about formulating something similar to the ACID properties for perturbed databases?
  • 12. References
      - Security and Privacy Implications of Data Mining, 1996, Chris Clifton and Don Marks
      - Defining Privacy for Data Mining, Chris Clifton, Murat Kantarcioglu and Jaideep Vaidya, Purdue University
      - Data Mining, National Security, Privacy and Civil Liberties, Bhavani Thuraisingham, The National Science Foundation
      - A Framework for Privacy Preserving Classification in Data Mining, 2004, Md. Zahidul Islam and Ljiljana Brankovic
      - Privacy Preserving Mining of Association Rules, 2002, Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal and Johannes Gehrke, IBM Almaden Research Center
  • 13. References (continued)
      - Privacy Preserving Data Mining, 2000, Rakesh Agrawal and Ramakrishnan Srikant, IBM Research, Almaden
      - Privacy Preserving Data Mining, Advances in Cryptology, 2000, Y. Lindell and Benny Pinkas
      - Detecting Privacy and Ethical Sensitivity in Data Mining Results, 2004, Peter Fule and John Roddick
      - Limiting Privacy Breaches in Privacy Preserving Data Mining, 2003, Alexandre Evfimievski, Johannes Gehrke and Ramakrishnan Srikant
      - State-of-the-art in Privacy Preserving Data Mining, Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino, Provenza, Yucel Saygin and Yannis Theodoridis
  • 14. Conclusion
      - Presented an overview of, and brief insight into, new research in the area of data mining
      - The statistical database community was the first to think about privacy issues when data is being analyzed
      - Techniques and approaches used by researchers for preserving privacy while mining the data
      - Various open issues which have to be addressed, and the potential research directions
      - Privacy, secrecy, ethical sensitivity, national security, civil liberties...
  • 15. Conclusion (continued)
      - There is much more to say about privacy in general, and about privacy in data mining specifically.
      - Data mining can be used to its fullest ability only if researchers address the problem of privacy and develop techniques in this direction.
  • 16. Queries? Thank you.
