Hiding Sensitive Association Rules

Under the guidance of
Sri G. Sridhar
Assistant Professor

Presented by:
P. Vinay Reddy (11016T1013)
Y. Bharath (11016T0982)
G. Divya (11016T0997)
K. Rashmika (11016T0998)
Outline 
1. Introduction 
2. Scope and application 
3. Problem formulation 
4. Modification schemes for rule hiding 
5. Implementation steps 
6. Flow diagram 
7. Apriori algorithm 
8. Requirements
Introduction 
Securing a large database that contains crucial information
becomes a serious issue when the data is shared over a network
and must be protected against unauthorized access.
Privacy-preserving data mining is a recent research trend that
addresses privacy in data mining and statistical databases.
Association analysis is a powerful tool for discovering
relationships hidden in large databases. Association rule hiding
algorithms provide strong and efficient protection for
confidential and crucial data, and data modification for rule
hiding is one of the most important such approaches.
Terminology 
Data mining: 
Generally, data mining (sometimes called data or 
knowledge discovery) is the process of analyzing data 
from different perspectives and summarizing it into useful 
information. 
Data: Data are any facts, numbers, or text that can be 
processed by a computer. 
Association rule: 
Association rules are if/then statements that help uncover 
relationships between data in a transactional database.
Terminology(cont…) 
Itemset : an itemset is a set of products (items).
Support : the support of an itemset I is the percentage of
transactions in the entire database that contain I.
It is denoted Sup( I ).
Frequent itemset : if the support of an itemset I is not lower
than a minimum support threshold (MST), then I is called a
frequent itemset.
Confidence of an association rule X -> Y : the probability that Y
occurs given that X occurs, i.e. Conf(X -> Y) = Sup(X ∪ Y) / Sup(X).
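To make these definitions concrete, here is a small Java sketch (the class and method names are my own illustration, not part of the presented system) that computes Sup(I) and Conf(X -> Y) over the transaction table used in the Apriori example later in these slides.

import java.util.*;

public class SupportConfidence {

    // Sup(I): fraction of transactions that contain every item of I.
    static double support(List<Set<String>> db, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : db) {
            if (t.containsAll(itemset)) count++;
        }
        return (double) count / db.size();
    }

    // Conf(X -> Y) = Sup(X u Y) / Sup(X).
    static double confidence(List<Set<String>> db, Set<String> x, Set<String> y) {
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);
        return support(db, xy) / support(db, x);
    }

    public static void main(String[] args) {
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        Set<String> x = new HashSet<String>(Arrays.asList("O", "K"));
        Set<String> y = new HashSet<String>(Arrays.asList("E"));
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);

        System.out.println("Sup({O,K,E})   = " + support(db, xy));      // 0.6
        System.out.println("Conf(O,K -> E) = " + confidence(db, x, y)); // 1.0
    }
}

With a minimum support threshold of 60% and a minimum confidence threshold of, say, 80%, the rule O, K -> E would be reported as a strong rule.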
Scope and Application 
• Shopping centers use association rules to
increase sales.
• Amazon uses association rules to recommend
items based on the item you are currently browsing.
• Google auto-complete.
• Customer consumption analysis.
Problem formulation 
Let M={i1,i2,…in} and D ={t1,t2,…tm}, where every tj is a 
subset of M, be the set of all distinct items and the 
transaction database, respectively. Each transaction tj is 
associated with a unique identifier called TID and can be 
represented as a bit-vector b1b2 . . . bn, where bk=1 if ik 
∈ tj. 
Association rule mining
Include X -> Y (as a strong rule) if
1. Sup(X ∪ Y) ≥ MST, and
2. Conf(X -> Y) ≥ MCT.

Association rule hiding
To hide X -> Y we have to make
1. Sup(X ∪ Y) < MST, or
2. Conf(X -> Y) < MCT (either condition suffices).
Modification schemes for rule hiding 
Scheme 1: Deletion of items.
If an item of X ∪ Y is deleted from transactions that support the rule,
1. Sup(X ∪ Y) and
2. Conf(X -> Y) will both be decreased.
Scheme 2: Insertion of items.
Conf(X -> Y) can be reduced by inserting X into
a transaction that does not contain Y, since this raises
Sup(X) without raising Sup(X ∪ Y).
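The following compilable Java sketch illustrates the two schemes on a toy database. The method names and the choice of which transaction to modify are illustrative only; practical algorithms typically pick the transactions and items to modify so as to limit side effects on non-sensitive rules.

import java.util.*;

public class RuleHidingSchemes {

    // Scheme 1: delete an item of X u Y from one transaction that supports
    // the rule, so both Sup(X u Y) and Conf(X -> Y) decrease.
    static void deleteScheme(List<Set<String>> db, Set<String> x, Set<String> y, String victim) {
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);
        for (Set<String> t : db) {
            if (t.containsAll(xy)) {   // a supporting transaction
                t.remove(victim);      // victim is assumed to be an item of X u Y
                return;                // one modification per call
            }
        }
    }

    // Scheme 2: insert the items of X into a transaction that does not contain Y
    // (and does not already contain all of X), so Sup(X) grows while Sup(X u Y)
    // stays the same and Conf(X -> Y) = Sup(X u Y) / Sup(X) drops.
    static void insertScheme(List<Set<String>> db, Set<String> x, Set<String> y) {
        for (Set<String> t : db) {
            if (Collections.disjoint(t, y) && !t.containsAll(x)) {
                t.addAll(x);
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("O", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("O", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "K")));

        Set<String> x = new HashSet<String>(Arrays.asList("O", "K"));
        Set<String> y = new HashSet<String>(Arrays.asList("E"));

        deleteScheme(db, x, y, "E"); // first supporting transaction loses E
        insertScheme(db, x, y);      // {M, K} gains O, raising Sup(X) only
        System.out.println(db);      // e.g. [[O, K], [O, K, E], [K, M, O]] (set print order may vary)
    }
}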
Implementation steps 
1. Define the minimum support threshold (MST) and
the minimum confidence threshold (MCT).
2. Apply the Apriori algorithm to the transaction
database to obtain the frequent itemsets and the
strong rules.
3. From these rules, select the ones to be hidden;
these are called sensitive rules.
4. Strategically employ one of the above-mentioned
schemes to hide the sensitive rules (a minimal
sketch of this hiding loop follows below).
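As a minimal sketch of this pipeline, assume the mining step has already identified O, K -> E as a strong rule on the example database used later in these slides, and that the data owner declares it sensitive (class and variable names are illustrative). The deletion scheme is applied until the rule's support count falls below the minimum support count of 3.

import java.util.*;

public class HideSensitiveRule {

    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    public static void main(String[] args) {
        // Transaction database from the Apriori example (abbreviated items).
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        int mstCount = 3;                                   // minimum support count
        Set<String> x = new HashSet<String>(Arrays.asList("O", "K"));
        Set<String> y = new HashSet<String>(Arrays.asList("E"));
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);

        // Scheme 1: delete Y's item from supporting transactions until Sup(X u Y) < MST.
        while (supportCount(db, xy) >= mstCount) {
            for (Set<String> t : db) {
                if (t.containsAll(xy)) { t.removeAll(y); break; }
            }
        }
        System.out.println("Support count of {O,K,E} after hiding: " + supportCount(db, xy)); // 2
    }
}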
Flow diagram
The Apriori algorithm: 
It is a classic algorithm used in data mining for
learning association rules. It is nowhere near as
complex as it sounds; on the contrary, it is very
simple. For example, consider the following
transaction table.
Transaction ID Items Bought 
T1 {Mango, Onion, Nintendo, Key-chain, Eggs, Yo-yo} 
T2 {Doll, Onion, Nintendo, Key-chain, Eggs, Yo-yo} 
T3 {Mango, Apple, Key-chain, Eggs} 
T4 {Mango, Umbrella, Corn, Key-chain, Yo-yo} 
T5 {Corn, Onion, Onion, Key-chain, Ice-cream, Eggs} 
For simplicity consider 
M=mango 
O=onion and so on…
Terminology
k-itemset : a set of k items. E.g.
{beer, cheese, eggs} is a 3-itemset
{cheese} is a 1-itemset
{honey, ice-cream} is a 2-itemset
support : an itemset has support s% if s% of the records
in the DB contain that itemset.
Lk : Frequent k-itemsets
Ck : Candidate k-itemsets
The Apriori algorithm 
1. Find all frequent 1-itemsets (L1)
2. For (k = 2; while Lk-1 is non-empty; k++)
3. {  Ck = apriori-gen(Lk-1)
4.    For each c in Ck, initialise c.count to zero
5.    For all records r in the DB
6.    { Cr = subset(Ck, r); For each c in Cr, c.count++ }
7.    Lk = { c in Ck | c.count >= minimum support count }  }
8. Return the union of all the Lk
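Below is a self-contained Java sketch of this level-wise procedure. It is a simplified version, not the exact apriori-gen join/prune step: each frequent (k-1)-itemset is extended with every frequent single item and the resulting candidates are filtered by support counting, which finds the same frequent itemsets at the cost of a few extra candidates. All names are illustrative.

import java.util.*;

public class Apriori {

    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    // All itemsets whose support count is at least minCount (level-wise search).
    static List<Set<String>> frequentItemsets(List<Set<String>> db, int minCount) {
        // Collect all distinct items, then keep the frequent single items (L1).
        Set<String> items = new TreeSet<String>();
        for (Set<String> t : db) items.addAll(t);

        List<String> frequentSingles = new ArrayList<String>();
        for (String i : items) {
            if (supportCount(db, Collections.singleton(i)) >= minCount) frequentSingles.add(i);
        }

        List<Set<String>> current = new ArrayList<Set<String>>();
        for (String i : frequentSingles) current.add(new HashSet<String>(Collections.singleton(i)));
        List<Set<String>> all = new ArrayList<Set<String>>(current);

        // Build Lk from Lk-1 until no new frequent itemsets appear.
        while (!current.isEmpty()) {
            Set<Set<String>> candidates = new LinkedHashSet<Set<String>>();
            for (Set<String> prev : current) {
                for (String i : frequentSingles) {
                    if (!prev.contains(i)) {
                        Set<String> c = new HashSet<String>(prev);
                        c.add(i);
                        candidates.add(c); // duplicate candidates collapse via set equality
                    }
                }
            }
            List<Set<String>> next = new ArrayList<Set<String>>();
            for (Set<String> c : candidates) {
                if (supportCount(db, c) >= minCount) next.add(c);
            }
            all.addAll(next);
            current = next;
        }
        return all;
    }

    public static void main(String[] args) {
        // Transaction database from the example on the following slides.
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        for (Set<String> f : frequentItemsets(db, 3)) {
            System.out.println(f + " : " + supportCount(db, f));
        }
        // Expected: M, O, K, E, Y, {M,K}, {O,K}, {O,E}, {K,E}, {K,Y}, {O,K,E}
    }
}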
Example 
With these abbreviations, the original transaction
database is as follows.
Transaction ID Items Bought 
T1 {M, O, N, K, E, Y } 
T2 {D, O, N, K, E, Y } 
T3 {M, A, K, E} 
T4 {M, U, C, K, Y } 
T5 {C, O, O, K, I, E} 
Let minimum support 
count be 3 
1. Find all itemsets with the specified minimum
support (coverage).
2. Use these itemsets to generate
interesting rules.
Example(cont…) 
Step 1: 
Count the number of transactions in which each item occurs
and tabulate the counts as below.
Item    Support count
M 3 
O 3 
N 2 
K 5 
E 4 
Y 3 
D 1 
A 1 
U 1 
C 2 
I 1
Example(cont…) 
Step 2: 
Remove all the items that are bought fewer
than 3 times, because the minimum support count is
assumed to be 3.
So the above table changes to:
Item    Support count
M 3 
O 3 
K 5 
E 4 
Y 3 
These are the single items that are 
most frequently bought.
Example(cont…) 
Step 3: 
Take each possible pair of the frequent items and count how
many times each pair is bought together.
The table now becomes:
Item Pairs Support count 
MO 1 
MK 3 
ME 2 
MY 2 
OK 3 
OE 3 
OY 2 
KE 4 
KY 3 
EY 2 
In total, C(n,2) = n(n-1)/2 item
pairs are possible.
Since n = 5 here, 5(4)/2 = 10 pairs
exist.
Example(cont…) 
Step 4: 
Repeat the same procedure, i.e. remove all the item pairs
with support count < 3, and we are now left with:
Item Pairs Support count 
MK 3 
OK 3 
OE 3 
KE 4 
KY 3 
These are the item pairs that are
bought most frequently.
Example(cont…) 
Step 5: 
To form sets of three items we need one more
rule, termed the self-join.
It simply means that, from the item pairs in the
above table, we find two pairs that share
the same first item, so we get
1. OK and OE, which give OKE
2. KE and KY, which give KEY
Item Set Support count 
OKE 3 
KEY 2 
Here the 3-itemset OKE is bought 3 times, while KEY is bought only
2 times. So OKE is the only frequently bought 3-itemset.
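As a follow-up to step 2 of the example ("use these itemsets to generate interesting rules"), the short Java sketch below (names illustrative) derives the candidate rules with a single-item consequent from the frequent 3-itemset OKE and computes their confidences from the support counts tabulated above.

import java.util.*;

public class RulesFromOKE {

    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    public static void main(String[] args) {
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        List<String> oke = Arrays.asList("O", "K", "E");
        Set<String> whole = new HashSet<String>(oke);

        // For each item, form the rule (OKE minus item) -> item and compute
        // Conf = Sup(OKE) / Sup(antecedent).
        for (String consequent : oke) {
            Set<String> antecedent = new HashSet<String>(whole);
            antecedent.remove(consequent);
            double conf = (double) supportCount(db, whole) / supportCount(db, antecedent);
            System.out.println(antecedent + " -> " + consequent + "  conf = " + conf);
        }
        // {K, E} -> O  conf = 0.75   (Sup(KE) = 4)
        // {O, E} -> K  conf = 1.0    (Sup(OE) = 3)
        // {O, K} -> E  conf = 1.0    (Sup(OK) = 3)  (set print order may vary)
    }
}

With, say, MCT = 80%, the rules O, E -> K and O, K -> E would be reported as strong, and the modification schemes described earlier could then be applied to any of them that the data owner declares sensitive.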
Hardware specifications 
1. Pentium IV processor
2. 256 MB RAM
3. 40 GB hard disk
Software specifications 
Operating System : Windows 2000 and above versions
JDK              : JDK 1.7
Front end        : Java
Thank You