Build profiles of which customers are likely to use which services
Why Should There Be a Standard Process?
Framework for recording experience
Allows projects to be replicated
Aid to project planning and management
“Comfort factor” for new adopters
Demonstrates maturity of Data Mining
Reduces dependency on “stars”
The data mining process must be reliable and repeatable by people with little data mining background.
Data mining is extensively used for knowledge discovery from large databases.
The problem with data mining is that sensitive information that should not be disclosed can often be inferred from non-sensitive information that is freely available.
Thus privacy is becoming an increasingly important issue in many data mining applications.
This has led to the development of privacy preserving data mining.
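The inference risk described above can be sketched with a toy linkage example. All records, names, and field choices below are made up for illustration; the point is only that a table with names removed can be re-joined to a public list on shared non-sensitive attributes.

```python
# Toy, made-up records: a "de-identified" medical table and a public roster.
medical = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "asthma"},
]
public_roster = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_year": 1991, "sex": "F"},
]

# Non-sensitive attributes that appear in both tables (an assumption of this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(sensitive_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return {name: diagnosis} for public rows matching exactly one sensitive row."""
    inferred = {}
    for person in public_rows:
        matches = [
            row for row in sensitive_rows
            if all(row[k] == person[k] for k in keys)
        ]
        if len(matches) == 1:  # a unique match re-identifies the record
            inferred[person["name"]] = matches[0]["diagnosis"]
    return inferred


inferred = link_records(medical, public_roster)
print(inferred)
```

Neither table is sensitive on its own terms (one has no names, the other no diagnoses), yet the join attaches a diagnosis to a named person.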
What Are Data Mining and Privacy Preservation All About?
Perform data mining on the union of two private databases
Data stays private, i.e. no party learns anything but the output
Study of achieving data mining goals without sacrificing the privacy of individuals:
meeting privacy requirements
providing valid data mining results
How Do We Do It?
There are two approaches to preserve privacy:
The first approach protects the privacy of the data with an extended role-based access control scheme, in which sensitive objects are identified in order to protect an individual’s privacy.
The second approach uses cryptographic techniques.
Cryptographic techniques for PPDM
Goal: two or more parties run a data mining algorithm on the union of their databases without revealing any unnecessary information.
Example: consider separate medical institutions that wish to conduct joint research while preserving the privacy of their patients.
Protection of privileged information, along with its use for research.
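As one illustration of the idea, here is a minimal sketch of a secure sum built on additive secret sharing. The slides do not name a specific protocol, so this technique is swapped in as an example. It simulates all parties in a single process and assumes honest-but-curious parties that do not collude; `MODULUS`, `make_shares`, `secure_sum`, and the input counts are hypothetical names and values.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n shares that sum to value modulo modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the protocol: parties exchange shares, then publish partial sums."""
    n = len(private_values)
    # shares_by_owner[i][j] is the share that party i sends to party j
    shares_by_owner = [make_shares(v, n, modulus) for v in private_values]
    # each party j adds up the shares it received and publishes only that sum
    partial_sums = [
        sum(shares_by_owner[i][j] for i in range(n)) % modulus
        for j in range(n)
    ]
    return sum(partial_sums) % modulus


hospital_case_counts = [120, 75, 310]  # hypothetical private inputs
total = secure_sum(hospital_case_counts)
print(total)
```

Each individual share is uniformly random, so a single party sees nothing useful about another party's count; only the combined total is reconstructed. A real deployment would also need secure channels and a defence against collusion, which this sketch leaves out.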
How much privacy?
If a data mining algorithm is run against the union of the databases, and its output becomes known to one or more of the parties, the output itself reveals something about the contents of the other databases.
This leak of information is inevitable, however, if the parties need to learn the output.
What is Cryptography?
The common definition of privacy in the cryptographic community limits the information that is leaked by the distributed computation to be the information that can be learned from the designated output of the computation.
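A small worked case may help show what "learned from the designated output" means. The sketch below uses a joint sum as the computation; the function name and numbers are hypothetical. Even with a perfectly secure protocol, a party can combine the output with its own input.

```python
def infer_sum_of_others(own_input, joint_sum):
    """What one party can deduce from its own input plus the output alone."""
    return joint_sum - own_input


# Two parties: the output pins down the other party's input exactly.
a_private, b_private = 40, 2  # hypothetical private inputs
two_party_output = a_private + b_private
b_as_seen_by_a = infer_sum_of_others(a_private, two_party_output)

# Three parties: party A learns only the combined total of the other two.
c_private = 7
three_party_output = a_private + b_private + c_private
others_as_seen_by_a = infer_sum_of_others(a_private, three_party_output)

print(b_as_seen_by_a, others_as_seen_by_a)
```

Under the cryptographic definition this leakage is acceptable, because it follows from the output itself rather than from the way the output was computed.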
Specific Data Mining Applications
What data mining has done for...
The US Internal Revenue Service needed to improve customer service and...
Scheduled its workforce to provide faster, more accurate answers to questions
What data mining has done for...
The US Drug Enforcement Administration needed to be more effective in its drug “busts” and...
Analyzed suspects’ cell phone usage to focus investigations.
What data mining has done for...
HSBC needed to cross-sell more effectively by identifying profiles that would be interested in higher-yielding investments and...
Reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
Data Mining can be utilized in any organization that needs to find patterns or relationships in their data.
By using the CRISP-DM methodology, analysts can have a reasonable level of assurance that their Data Mining efforts will render useful, repeatable, and valid results.
Five dimensions of PPDM
(1) the distribution of the basic data
(2) how basic data are modified
(3) which mining method is being used
(4) whether basic data or rules are to be hidden
(5) which additional methods for privacy preservation are used
OECD privacy principles applied to PPDM
1. Collection limitation principle – too general to be enforced in PPDM
2. Data quality principle – most of today’s PPDM methods or algorithms assume that data are already prepared to an appropriate quality to be mined
3. Purpose specification principle – extremely relevant for PPDM
4. Use limitation principle – fundamental for PPDM
5. Security safeguard principle – unenforceable in the context of PPDM
6. Openness principle – relevant for PPDM
7. Individual participation principle – Oliveira and Zaïane suggest that the implications of this principle for PPDM should be carefully weighed in light of who owns the basic data; otherwise its application could be too rigid
8. Accountability principle – too general for PPDM