Hiding Sensitive Association Rules

Under the guidance of
Sri G. Sridhar
Assistant Professor

Presented by:
P. Vinay Reddy (11016T1013)
Y. Bharath (11016T0982)
G. Divya (11016T0997)
K. Rashmika (11016T0998)
Outline 
1. Introduction 
2. Scope and application 
3. Problem formulation 
4. Modification schemes for rule hiding 
5. Implementation steps 
6. Flow diagram 
7. Apriori algorithm 
8. Requirements
Introduction 
Securing a large database that contains crucial information
becomes a serious issue when the data is shared over a network
and must be protected against unauthorized access.
Privacy-preserving data mining is a recent research trend that
addresses privacy in data mining and statistical databases.
Association analysis is a powerful tool for discovering
relationships hidden in large databases. Association rule hiding
algorithms provide strong and efficient protection for
confidential and crucial data, and data modification for rule
hiding is one of the most important such approaches.
Terminology 
Data mining: 
Generally, data mining (sometimes called data or 
knowledge discovery) is the process of analyzing data 
from different perspectives and summarizing it into useful 
information. 
Data: Data are any facts, numbers, or text that can be 
processed by a computer. 
Association rule: 
Association rules are if/then statements that help uncover 
relationships between data in a transactional database.
Terminology(cont…) 
Itemset : an itemset is a set of products (items).
Support : the support of an itemset I is the percentage of
transactions in the entire database that contain I.
It is denoted Sup( I ).
Frequent itemset : if the support of an itemset I is not lower
than a minimum support threshold (MST), then I is called a
frequent itemset.
Confidence of an association rule X -> Y : the probability that Y
occurs given that X occurs, i.e. Conf(X -> Y) = Sup(X ∪ Y) / Sup(X).
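To make these definitions concrete, here is a small Java sketch (the class and method names are my own illustration, not part of the presented system) that computes Sup(I) and Conf(X -> Y) over the transaction table used in the Apriori example later in these slides.

import java.util.*;

public class SupportConfidence {

    // Sup(I): fraction of transactions that contain every item of I.
    static double support(List<Set<String>> db, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : db) {
            if (t.containsAll(itemset)) count++;
        }
        return (double) count / db.size();
    }

    // Conf(X -> Y) = Sup(X u Y) / Sup(X).
    static double confidence(List<Set<String>> db, Set<String> x, Set<String> y) {
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);
        return support(db, xy) / support(db, x);
    }

    public static void main(String[] args) {
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        Set<String> x = new HashSet<String>(Arrays.asList("O", "K"));
        Set<String> y = new HashSet<String>(Arrays.asList("E"));
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);

        System.out.println("Sup({O,K,E})   = " + support(db, xy));      // 0.6
        System.out.println("Conf(O,K -> E) = " + confidence(db, x, y)); // 1.0
    }
}

With a minimum support threshold of 60% and a minimum confidence threshold of, say, 80%, the rule O, K -> E would be reported as a strong rule.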
Scope and Application 
• Shopping centers use association rules to
increase sales.
• Amazon uses association rules to recommend
items based on the item you are currently browsing.
• Google auto-complete.
• Customer consumption analysis.
Problem formulation 
Let M={i1,i2,…in} and D ={t1,t2,…tm}, where every tj is a 
subset of M, be the set of all distinct items and the 
transaction database, respectively. Each transaction tj is 
associated with a unique identifier called TID and can be 
represented as a bit-vector b1b2 . . . bn, where bk=1 if ik 
∈ tj. 
Association rule mining
Include X -> Y (as a strong rule) if
1. Sup(X ∪ Y) ≥ MST, and
2. Conf(X -> Y) ≥ MCT.

Association rule hiding
To hide X -> Y we have to make
1. Sup(X ∪ Y) < MST, or
2. Conf(X -> Y) < MCT (either condition suffices).
Modification schemes for rule hiding 
Scheme 1: Deletion of items.
If an item of X ∪ Y is deleted from transactions that support the rule,
1. Sup(X ∪ Y) and
2. Conf(X -> Y) will both be decreased.
Scheme 2: Insertion of items.
Conf(X -> Y) can be reduced by inserting X into
a transaction that does not contain Y, since this raises
Sup(X) without raising Sup(X ∪ Y).
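The following compilable Java sketch illustrates the two schemes on a toy database. The method names and the choice of which transaction to modify are illustrative only; practical algorithms typically pick the transactions and items to modify so as to limit side effects on non-sensitive rules.

import java.util.*;

public class RuleHidingSchemes {

    // Scheme 1: delete an item of X u Y from one transaction that supports
    // the rule, so both Sup(X u Y) and Conf(X -> Y) decrease.
    static void deleteScheme(List<Set<String>> db, Set<String> x, Set<String> y, String victim) {
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);
        for (Set<String> t : db) {
            if (t.containsAll(xy)) {   // a supporting transaction
                t.remove(victim);      // victim is assumed to be an item of X u Y
                return;                // one modification per call
            }
        }
    }

    // Scheme 2: insert the items of X into a transaction that does not contain Y
    // (and does not already contain all of X), so Sup(X) grows while Sup(X u Y)
    // stays the same and Conf(X -> Y) = Sup(X u Y) / Sup(X) drops.
    static void insertScheme(List<Set<String>> db, Set<String> x, Set<String> y) {
        for (Set<String> t : db) {
            if (Collections.disjoint(t, y) && !t.containsAll(x)) {
                t.addAll(x);
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("O", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("O", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "K")));

        Set<String> x = new HashSet<String>(Arrays.asList("O", "K"));
        Set<String> y = new HashSet<String>(Arrays.asList("E"));

        deleteScheme(db, x, y, "E"); // first supporting transaction loses E
        insertScheme(db, x, y);      // {M, K} gains O, raising Sup(X) only
        System.out.println(db);      // e.g. [[O, K], [O, K, E], [K, M, O]] (set print order may vary)
    }
}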
Implementation steps 
1. Define the minimum support threshold (MST) and
the minimum confidence threshold (MCT).
2. Apply the Apriori algorithm to the transaction
database to obtain the frequent itemsets and the
strong rules.
3. From these rules, select the ones to be hidden;
these are called sensitive rules.
4. Strategically employ one of the above-mentioned
schemes to hide the sensitive rules (a minimal
sketch of this hiding loop follows below).
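As a minimal sketch of this pipeline, assume the mining step has already identified O, K -> E as a strong rule on the example database used later in these slides, and that the data owner declares it sensitive (class and variable names are illustrative). The deletion scheme is applied until the rule's support count falls below the minimum support count of 3.

import java.util.*;

public class HideSensitiveRule {

    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    public static void main(String[] args) {
        // Transaction database from the Apriori example (abbreviated items).
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        int mstCount = 3;                                   // minimum support count
        Set<String> x = new HashSet<String>(Arrays.asList("O", "K"));
        Set<String> y = new HashSet<String>(Arrays.asList("E"));
        Set<String> xy = new HashSet<String>(x);
        xy.addAll(y);

        // Scheme 1: delete Y's item from supporting transactions until Sup(X u Y) < MST.
        while (supportCount(db, xy) >= mstCount) {
            for (Set<String> t : db) {
                if (t.containsAll(xy)) { t.removeAll(y); break; }
            }
        }
        System.out.println("Support count of {O,K,E} after hiding: " + supportCount(db, xy)); // 2
    }
}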
Flow diagram
The Apriori algorithm: 
It is a classic algorithm used in data mining for
learning association rules. It is nowhere near as
complex as it sounds; on the contrary, it is very
simple. For example, consider the following
transaction table.
Transaction ID Items Bought 
T1 {Mango, Onion, Nintendo, Key-chain, Eggs, Yo-yo} 
T2 {Doll, Onion, Nintendo, Key-chain, Eggs, Yo-yo} 
T3 {Mango, Apple, Key-chain, Eggs} 
T4 {Mango, Umbrella, Corn, Key-chain, Yo-yo} 
T5 {Corn, Onion, Onion, Key-chain, Ice-cream, Eggs} 
For simplicity consider 
M=mango 
O=onion and so on…
Terminology
k-itemset : a set of k items. E.g.
{beer, cheese, eggs} is a 3-itemset
{cheese} is a 1-itemset
{honey, ice-cream} is a 2-itemset
support : an itemset has support s% if s% of the records
in the DB contain that itemset.
Lk : Frequent k-itemsets
Ck : Candidate k-itemsets
The Apriori algorithm 
1. Find all frequent 1-itemsets (L1)
2. For (k = 2; while Lk-1 is non-empty; k++)
3. {  Ck = apriori-gen(Lk-1)
4.    For each c in Ck, initialise c.count to zero
5.    For all records r in the DB
6.    { Cr = subset(Ck, r); For each c in Cr, c.count++ }
7.    Lk = { c in Ck | c.count >= minimum support count }  }
8. Return the union of all the Lk
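Below is a self-contained Java sketch of this level-wise procedure. It is a simplified version, not the exact apriori-gen join/prune step: each frequent (k-1)-itemset is extended with every frequent single item and the resulting candidates are filtered by support counting, which finds the same frequent itemsets at the cost of a few extra candidates. All names are illustrative.

import java.util.*;

public class Apriori {

    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    // All itemsets whose support count is at least minCount (level-wise search).
    static List<Set<String>> frequentItemsets(List<Set<String>> db, int minCount) {
        // Collect all distinct items, then keep the frequent single items (L1).
        Set<String> items = new TreeSet<String>();
        for (Set<String> t : db) items.addAll(t);

        List<String> frequentSingles = new ArrayList<String>();
        for (String i : items) {
            if (supportCount(db, Collections.singleton(i)) >= minCount) frequentSingles.add(i);
        }

        List<Set<String>> current = new ArrayList<Set<String>>();
        for (String i : frequentSingles) current.add(new HashSet<String>(Collections.singleton(i)));
        List<Set<String>> all = new ArrayList<Set<String>>(current);

        // Build Lk from Lk-1 until no new frequent itemsets appear.
        while (!current.isEmpty()) {
            Set<Set<String>> candidates = new LinkedHashSet<Set<String>>();
            for (Set<String> prev : current) {
                for (String i : frequentSingles) {
                    if (!prev.contains(i)) {
                        Set<String> c = new HashSet<String>(prev);
                        c.add(i);
                        candidates.add(c); // duplicate candidates collapse via set equality
                    }
                }
            }
            List<Set<String>> next = new ArrayList<Set<String>>();
            for (Set<String> c : candidates) {
                if (supportCount(db, c) >= minCount) next.add(c);
            }
            all.addAll(next);
            current = next;
        }
        return all;
    }

    public static void main(String[] args) {
        // Transaction database from the example on the following slides.
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        for (Set<String> f : frequentItemsets(db, 3)) {
            System.out.println(f + " : " + supportCount(db, f));
        }
        // Expected: M, O, K, E, Y, {M,K}, {O,K}, {O,E}, {K,E}, {K,Y}, {O,K,E}
    }
}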
Example 
With these abbreviations, the original transaction
database is as follows.
Transaction ID Items Bought 
T1 {M, O, N, K, E, Y } 
T2 {D, O, N, K, E, Y } 
T3 {M, A, K, E} 
T4 {M, U, C, K, Y } 
T5 {C, O, O, K, I, E} 
Let minimum support 
count be 3 
1. Find all itemsets with the specified minimum
support (coverage).
2. Use these itemsets to generate
interesting rules.
Example(cont…) 
Step 1: 
Count the number of transactions in which each item occurs
and tabulate the counts as below.
Item    Support count
M 3 
O 3 
N 2 
K 5 
E 4 
Y 3 
D 1 
A 1 
U 1 
C 2 
I 1
Example(cont…) 
Step 2: 
Remove all the items that are bought fewer
than 3 times, because the minimum support count is
assumed to be 3.
So the above table changes to:
Item    Support count
M 3 
O 3 
K 5 
E 4 
Y 3 
These are the single items that are 
most frequently bought.
Example(cont…) 
Step 3: 
Take each possible pair of the frequent items and count how
many times each pair is bought together.
The table now becomes:
Item Pairs Support count 
MO 1 
MK 3 
ME 2 
MY 2 
OK 3 
OE 3 
OY 2 
KE 4 
KY 3 
EY 2 
In total, C(n,2) = n(n-1)/2 item
pairs are possible.
Since n = 5 here, 5(4)/2 = 10 pairs
exist.
Example(cont…) 
Step 4: 
Repeat the same procedure, i.e. remove all the item pairs
with support count < 3, and we are now left with:
Item Pairs Support count 
MK 3 
OK 3 
OE 3 
KE 4 
KY 3 
These are the item pairs that are
bought most frequently.
Example(cont…) 
Step 5: 
To form sets of three items we need one more
rule, termed the self-join.
It simply means that, from the item pairs in the
above table, we find two pairs that share
the same first item, so we get
1. OK and OE, which give OKE
2. KE and KY, which give KEY
Item Set Support count 
OKE 3 
KEY 2 
Here the 3-itemset OKE is bought 3 times, while KEY is bought only
2 times. So OKE is the only frequently bought 3-itemset.
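As a follow-up to step 2 of the example ("use these itemsets to generate interesting rules"), the short Java sketch below (names illustrative) derives the candidate rules with a single-item consequent from the frequent 3-itemset OKE and computes their confidences from the support counts tabulated above.

import java.util.*;

public class RulesFromOKE {

    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    public static void main(String[] args) {
        List<Set<String>> db = new ArrayList<Set<String>>();
        db.add(new HashSet<String>(Arrays.asList("M", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("D", "O", "N", "K", "E", "Y")));
        db.add(new HashSet<String>(Arrays.asList("M", "A", "K", "E")));
        db.add(new HashSet<String>(Arrays.asList("M", "U", "C", "K", "Y")));
        db.add(new HashSet<String>(Arrays.asList("C", "O", "K", "I", "E")));

        List<String> oke = Arrays.asList("O", "K", "E");
        Set<String> whole = new HashSet<String>(oke);

        // For each item, form the rule (OKE minus item) -> item and compute
        // Conf = Sup(OKE) / Sup(antecedent).
        for (String consequent : oke) {
            Set<String> antecedent = new HashSet<String>(whole);
            antecedent.remove(consequent);
            double conf = (double) supportCount(db, whole) / supportCount(db, antecedent);
            System.out.println(antecedent + " -> " + consequent + "  conf = " + conf);
        }
        // {K, E} -> O  conf = 0.75   (Sup(KE) = 4)
        // {O, E} -> K  conf = 1.0    (Sup(OE) = 3)
        // {O, K} -> E  conf = 1.0    (Sup(OK) = 3)  (set print order may vary)
    }
}

With, say, MCT = 80%, the rules O, E -> K and O, K -> E would be reported as strong, and the modification schemes described earlier could then be applied to any of them that the data owner declares sensitive.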
Hardware specifications 
1. Pentium IV processor
2. 256 MB RAM
3. 40 GB hard disk
Software specifications 
Operating System : Windows 2000 and above versions
JDK              : JDK 1.7
Front end        : Java
Thank You