Data Mining
Association Rules Mining or
Market Basket Analysis
Prithwis Mukerjee, Ph.D.
Prithwis
Mukerjee 2
Let us describe the problem ...
A retailer sells the following items
 Bread, Cheese, Coffee, Juice, Milk, Tea, Biscuits, Sugar, Newspaper
 And we assume that the shopkeeper keeps track of what each customer purchases :
Trans ID   Items
10         Bread, Cheese, Newspaper
20         Bread, Cheese, Juice
30         Bread, Milk
40         Cheese, Juice, Milk, Coffee
50         Sugar, Tea, Coffee, Biscuits, Newspaper
60         Sugar, Tea, Coffee, Biscuits, Milk, Juice, Newspaper
70         Bread, Cheese
80         Bread, Cheese, Juice, Coffee
90         Bread, Milk
100        Sugar, Tea, Coffee, Bread, Milk, Juice, Newspaper
 He needs to know which items are generally sold together
Associations
Rules expressing relations between items in a
“Market Basket”
{ Sugar and Tea } => {Biscuits}
 Is it true, that if a customer buys Sugar and Tea, she will
also buy biscuits ?
 If so, then
 These items should be ordered together
 But discounts should not be given on these items at the same
time !
We can make a guess but
 It would be better if we could structure this problem in
terms of mathematics
Basic Concepts
Set of n Items on Sale
 I = { i_1, i_2, i_3, ..., i_n }
Transaction
 A subset of I : T ⊆ I
 A set of items purchased in an individual transaction
 With each transaction having m items
 t_i = { i_1, i_2, i_3, ..., i_m } with m ≤ n
 If we have N transactions then we have t_1, t_2, t_3, ..., t_N as the unique identifier for each transaction
D is our total data about all N transactions
 D = { t_1, t_2, t_3, ..., t_N }
An Association Rule
Whenever X appears, Y also appears
 X ⇒ Y
 X ⊆ I, Y ⊆ I, X ∩ Y = ∅
X and Y may be
 Single items or
 Sets of items, where no item appears in both X and Y
X is referred to as the antecedent
Y is referred to as the consequent
Whether a rule like this exists is the focus of
our analysis
Two key concepts
Support (or prevalence)
 How often do X and Y appear together in the basket ?
 If this number is very low then the rule is not worth examining
 Expressed as a fraction of the total number of transactions
 Say 10% or 0.1
Confidence (or predictability)
 Of all the occurrences of X, in what fraction does Y also appear ?
 Expressed as a fraction of all transactions containing X
 Say 80% or 0.8
We are interested in rules that have a
 Minimum value of support : say 25%
 Minimum value of confidence : say 75%
Mathematically speaking ...
Support (X)
 = (Number of transactions in which X appears) / N
 = P(X)
Support (XY)
 = (Number of transactions in which X and Y appear together) / N
 = P(X ∩ Y)
Confidence (X ⇒ Y)
 = Support (XY) / Support (X)
 = P(X ∩ Y) / P(X)
 = Conditional Probability P(Y | X)
Lift : an optional term
 Measures the power of association
 Lift (X ⇒ Y) = P(Y | X) / P(Y)
 A lift above 1 means Y appears more often together with X than it does overall
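The three measures can be computed directly from a list of transactions. The sketch below is only an illustration: it uses the ten retailer transactions from the opening example and tests the rule { Sugar, Tea } ⇒ { Biscuits }. The function names `support`, `confidence` and `lift` are chosen here for readability and are not from the slides.

```python
# Transactions from the retailer example (Trans ID -> items bought).
transactions = {
    10: {"Bread", "Cheese", "Newspaper"},
    20: {"Bread", "Cheese", "Juice"},
    30: {"Bread", "Milk"},
    40: {"Cheese", "Juice", "Milk", "Coffee"},
    50: {"Sugar", "Tea", "Coffee", "Biscuits", "Newspaper"},
    60: {"Sugar", "Tea", "Coffee", "Biscuits", "Milk", "Juice", "Newspaper"},
    70: {"Bread", "Cheese"},
    80: {"Bread", "Cheese", "Juice", "Coffee"},
    90: {"Bread", "Milk"},
    100: {"Sugar", "Tea", "Coffee", "Bread", "Milk", "Juice", "Newspaper"},
}


def support(itemset, data):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in data.values() if itemset <= items)
    return hits / len(data)


def confidence(x, y, data):
    """Support(XY) / Support(X), i.e. an estimate of P(Y | X)."""
    return support(set(x) | set(y), data) / support(x, data)


def lift(x, y, data):
    """P(Y | X) / P(Y)."""
    return confidence(x, y, data) / support(y, data)


x, y = {"Sugar", "Tea"}, {"Biscuits"}
print("support   ", support(x | y, transactions))
print("confidence", round(confidence(x, y, transactions), 3))
print("lift      ", round(lift(x, y, transactions), 3))
```

In this data Sugar and Tea appear together in three baskets and Biscuits joins them in two, so the rule has 20% support and a confidence of 2/3.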
The task at hand ...
Given a large set of transactions, we seek a
procedure ( or algorithm )
 That will discover all association rules
 That have a minimum support of p%
 And a minimum confidence level of q%
 And to do so in an efficient manner
Algorithms
 The Naive or Brute Force Method
 The Improved Naive algorithm
 The Apriori Algorithm
 Improvements to the Apriori algorithm
 FP ( Frequent Pattern ) Algorithm
Let us try the Naive Algorithm manually !
This is the set of transactions that we have ...
Trans ID   Items
100        Bread, Cheese
200        Bread, Cheese, Juice
300        Bread, Milk
400        Cheese, Juice, Milk
 We want to find Association Rules with
 Minimum 50% support and
 Minimum 75% confidence
Itemsets & Frequencies
Which sets are frequent ?
 Since we are looking for a support of 50%, we need a set to appear in at least 2 out of 4 transactions
 Support (X) = (# of times X appears) / N = P(X)
 6 sets meet this criterion
Item Sets Frequency
{Bread} 3
{Cheese } 3
{Juice} 2
{Milk} 2
{Bread, Cheese} 2
{Bread, Juice } 1
{Bread, Milk} 1
{Cheese, Juice} 2
{Cheese, Milk} 1
{Juice, Milk} 1
{Bread, Cheese, Juice} 1
{Bread, Cheese, Milk} 0
{Bread, Juice, Milk} 0
{Cheese, Juice, Milk} 1
{Bread, Cheese, Juice, Milk} 0
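The table above can be reproduced mechanically. The following is a minimal sketch of the naive method, assuming the four transactions of this example and a minimum support of 50%; `naive_frequencies` is a name introduced here for illustration. It counts every non-empty itemset that can be built from the items on sale and then keeps the frequent ones.

```python
from itertools import combinations

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
min_support = 0.5

# All items on sale, in a fixed order so that itemsets are sorted tuples.
items = sorted(set().union(*transactions))


def naive_frequencies(items, data):
    """Count every non-empty itemset that can be built from `items`."""
    freq = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            freq[candidate] = sum(1 for t in data if set(candidate) <= t)
    return freq


frequencies = naive_frequencies(items, transactions)
frequent = {
    s: n for s, n in frequencies.items() if n / len(transactions) >= min_support
}

for itemset, count in frequent.items():
    print(itemset, count)
```

With 4 items the loop visits 15 itemsets; the number doubles with every additional item, which is what makes this approach impractical for a real store.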
A closer look at the “Frequent Set”
Look at itemsets with more than 1 item
 {Bread, Cheese}, {Cheese, Juice}
 4 rules are possible
Look for confidence levels
 Confidence (X ⇒ Y)
 = Support (XY) / Support(X)
Item Sets          Frequency
{Bread}            3
{Cheese}           3
{Juice}            2
{Milk}             2
{Bread, Cheese}    2
{Cheese, Juice}    2
Rule               Confidence
Bread => Cheese    2 / 3    67%
Cheese => Bread    2 / 3    67%
Cheese => Juice    2 / 3    67%
Juice => Cheese    2 / 2    100%
 Only Juice => Cheese meets the minimum confidence of 75%
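The confidence column can be checked with a few lines of code. This is a sketch only: the frequencies are copied from the frequent itemsets of this example, the 75% threshold is the one chosen earlier, and `rules_from` is a helper name made up for the illustration.

```python
from itertools import combinations

# Frequencies of the frequent itemsets in the 4-transaction example.
frequency = {
    frozenset({"Bread"}): 3,
    frozenset({"Cheese"}): 3,
    frozenset({"Juice"}): 2,
    frozenset({"Milk"}): 2,
    frozenset({"Bread", "Cheese"}): 2,
    frozenset({"Cheese", "Juice"}): 2,
}
min_confidence = 0.75


def rules_from(itemset, frequency):
    """Yield (antecedent, consequent, confidence) for every split of `itemset`."""
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            rhs = itemset - lhs
            # Confidence = Support(XY) / Support(X); N cancels out.
            yield lhs, rhs, frequency[itemset] / frequency[lhs]


rules = [r for s in frequency if len(s) > 1 for r in rules_from(s, frequency)]
strong = [(set(l), set(r), c) for l, r, c in rules if c >= min_confidence]

for lhs, rhs, conf in strong:
    print(lhs, "=>", rhs, conf)
```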
The Big Picture
List all itemsets
 Find frequency of each
Identify “frequent sets”
 Based on support
Search for Rules within “frequent sets”
 Based on confidence
Looking Beyond the Retail Store
Counter Terrorism
 Track phone calls made or received from a particular number every day
 Is an incoming call from a particular number followed by a call to another number ?
 Are there any sets of numbers that are always called together ?
Expand the item sets to include
 Electronic fund transfers
 Travel between two locations
 Boarding cards
 Railway reservation
All data is available in electronic format
Major Problem
Exponential Growth of number of Itemsets
 4 items : 2^4 = 16 members
 n items : 2^n members
 As n becomes larger, the problem can no longer be solved in a practical amount of time
All attempts are made to reduce the number of Item sets to be processed
“Improved” Naive algorithm
 Ignore sets with zero frequency
Item Sets Frequency
{Bread} 3
{Cheese } 3
{Juice} 2
{Milk} 2
{Bread, Cheese} 2
{Bread, Juice } 1
{Bread, Milk} 1
{Cheese, Juice} 2
{Cheese, Milk} 1
{Juice, Milk} 1
{Bread, Cheese, Juice} 1
{Bread, Cheese, Milk} 0
{Bread, Juice, Milk} 0
{Cheese, Juice, Milk} 1
{Bread, Cheese, Juice, Milk} 0
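One way to ignore zero-frequency sets, sketched below under the assumption that we simply never generate them, is to enumerate the subsets of each transaction instead of the subsets of the whole item list. The function name `occurring_itemsets` is illustrative and not from the slides.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]


def occurring_itemsets(data):
    """Count only the itemsets that occur in at least one transaction."""
    counts = Counter()
    for t in data:
        ordered = sorted(t)
        for size in range(1, len(ordered) + 1):
            # Every subset of a transaction occurs once in that transaction.
            counts.update(combinations(ordered, size))
    return counts


counts = occurring_itemsets(transactions)
print(len(counts), "itemsets have a non-zero frequency")
```

For this example 12 of the 15 itemsets remain. The work is still exponential in the size of the largest basket, so this is an improvement rather than a cure.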
The Apriori Algorithm
Consists of two PARTS
 First find the frequent itemsets
 Most of the cleverness happens here
 We will do better than the naive algorithm
 Then find the rules
 This is relatively simple
Apriori : Part 1 - Frequent Sets
Step 1
 Scan all transactions and find all frequent items, i.e. those that have a support of at least p%. This is set L_1
Step 2 : Apriori-Gen
 Build potential sets of k items from L_{k-1} by using pairs of itemsets in L_{k-1} that have the first k-2 items in common and one remaining item from each member of the pair.
 This is Candidate set C_k
Step 3 : Find Frequent Item Sets again
 Scan all transactions and find the frequency of each set in C_k. The sets that are frequent give L_k
 If L_k is empty, stop, else go back to step 2
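The three steps can be sketched as a short loop. This is an illustrative reading of the steps above, not a tuned implementation: itemsets are kept as sorted tuples so that "the first k-2 items" is well defined, and the names `apriori_gen` and `frequent_itemsets` are chosen here. The data is the 4-transaction example with 50% minimum support.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Step 2: join pairs of sorted (k-1)-itemsets sharing their first k-2 items."""
    prev = sorted(prev_frequent)
    candidates = set()
    for a, b in combinations(prev, 2):
        if a[:k - 2] == b[:k - 2]:
            candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates


def frequent_itemsets(data, min_support):
    """Return {itemset (sorted tuple): count} for all frequent itemsets."""
    min_count = min_support * len(data)
    # Step 1: frequent single items give L_1.
    items = sorted(set().union(*data))
    level = {(i,): sum(1 for t in data if i in t) for i in items}
    level = {s: n for s, n in level.items() if n >= min_count}
    result, k = {}, 2
    while level:
        result.update(level)
        # Step 2: candidate set C_k built from L_{k-1}.
        candidates = apriori_gen(level, k)
        # Step 3: count the candidates and keep the frequent ones as L_k.
        counts = {c: sum(1 for t in data if set(c) <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        k += 1
    return result


transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
result = frequent_itemsets(transactions, 0.5)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

On this data the loop stops at k = 3, because {Bread, Cheese} and {Cheese, Juice} do not share a first item and so no 3-item candidate can be formed.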
Example
We have 16 items spread over 25 transactions
Item No   Item Name
1         Biscuits
2         Bread
3         Cereal
4         Cheese
5         Chocolate
6         Coffee
7         Donuts
8         Eggs
9         Juice
10        Milk
11        Newspaper
12        Pastry
13        Rolls
14        Sugar
15        Tea
16        Yogurt
TID   Items
1     Biscuits, Bread, Cheese, Coffee, Yogurt
2     Bread, Cereal, Cheese, Coffee
3     Cheese, Chocolate, Donuts, Juice, Milk
4     Bread, Cheese, Coffee, Cereal, Juice
5     Bread, Cereal, Chocolate, Donuts, Juice
6     Milk, Tea
7     Biscuits, Bread, Cheese, Coffee, Milk
8     Eggs, Milk, Tea
9     Bread, Cereal, Cheese, Chocolate, Coffee
10    Bread, Cereal, Chocolate, Donuts, Juice
11    Bread, Cheese, Juice
12    Bread, Cheese, Coffee, Donuts, Juice
13    Biscuits, Bread, Cereal
14    Cereal, Cheese, Chocolate, Donuts, Juice
15    Chocolate, Coffee
16    Donuts
17    Donuts, Eggs, Juice
18    Biscuits, Bread, Cheese, Coffee
19    Bread, Cereal, Chocolate, Donuts, Juice
20    Cheese, Chocolate, Donuts, Juice
21    Milk, Tea, Yogurt
22    Bread, Cereal, Cheese, Coffee
23    Chocolate, Donuts, Juice, Milk, Newspaper
24    Newspaper, Pastry, Rolls
25    Rolls, Sugar, Tea
Apriori : Step 1 – Computing L1
Count the frequency of each item and exclude those that are below minimum support
Item No   Item Name   Frequency
1         Biscuits    4
2         Bread       13
3         Cereal      10
4         Cheese      11
5         Chocolate   9
6         Coffee      9
7         Donuts      10
8         Eggs        2
9         Juice       11
10        Milk        6
11        Newspaper   2
12        Pastry      1
13        Rolls       2
14        Sugar       1
15        Tea         4
16        Yogurt      2
With 25% support (at least 7 of the 25 transactions), this is set L_1
Item No   Item Name   Frequency
2         Bread       13
3         Cereal      10
4         Cheese      11
5         Chocolate   9
6         Coffee      9
7         Donuts      10
9         Juice       11
Step 2 : Computing C_2
 Given L_1 = { Bread, Cereal, Cheese, Chocolate, Coffee, Donuts, Juice }, we now form the candidate pairs of C_2. The 7 items in L_1 form 21 pairs : d*(d-1)/2. This is a quadratic function and not an exponential function.
No   Candidate pair
1    {Bread, Cereal}
2    {Bread, Cheese}
3    {Bread, Chocolate}
4    {Bread, Coffee}
5    {Bread, Donuts}
6    {Bread, Juice}
7    {Cereal, Cheese}
8    {Cereal, Coffee}
9    {Cereal, Chocolate}
10   {Cereal, Donuts}
11   {Cereal, Juice}
12   {Cheese, Chocolate}
13   {Cheese, Coffee}
14   {Cheese, Donuts}
15   {Cheese, Juice}
16   {Chocolate, Coffee}
17   {Chocolate, Donuts}
18   {Chocolate, Juice}
19   {Coffee, Donuts}
20   {Coffee, Juice}
21   {Donuts, Juice}
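The pair count is easy to confirm. The snippet below is a small check rather than part of the algorithm: it builds every pair from the 7 items of L_1 and compares the number of pairs with d*(d-1)/2 and with the 2^d subsets the naive method would have to consider.

```python
from itertools import combinations

L1 = ["Bread", "Cereal", "Cheese", "Chocolate", "Coffee", "Donuts", "Juice"]

# Every unordered pair of frequent items is a candidate 2-itemset.
C2 = list(combinations(L1, 2))

d = len(L1)
print("pairs in C_2      :", len(C2))
print("d * (d - 1) / 2   :", d * (d - 1) // 2)
print("all subsets, 2^d  :", 2 ** d)
```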
From C_2 to L_2 based on minimum support
Candidate 2-Item Set    Freq
{Bread, Cereal}         9
{Bread, Cheese}         8
{Bread, Chocolate}      4
{Bread, Coffee}         8
{Bread, Donuts}         4
{Bread, Juice}          6
{Cereal, Cheese}        5
{Cereal, Coffee}        4
{Cereal, Chocolate}     5
{Cereal, Donuts}        4
{Cereal, Juice}         6
{Cheese, Chocolate}     4
{Cheese, Coffee}        9
{Cheese, Donuts}        3
{Cheese, Juice}         4
{Chocolate, Coffee}     1
{Chocolate, Donuts}     7
{Chocolate, Juice}      7
{Coffee, Donuts}        1
{Coffee, Juice}         2
{Donuts, Juice}         9
With 25% support, this is set L_2
Frequent 2-Item Set     Freq
{Bread, Cereal}         9
{Bread, Cheese}         8
{Bread, Coffee}         8
{Cheese, Coffee}        9
{Chocolate, Donuts}     7
{Chocolate, Juice}      7
{Donuts, Juice}         9
 This is a computationally intensive step
 L_2 is not empty
Step 2 Again : Get C_3
 We combine the appropriate frequent 2-item sets from L_2 (which must have the same first item) and obtain four such itemsets each containing three items
Candidate 3 item set
{Bread, Cereal, Cheese}
{Bread, Cereal, Coffee}
{Bread, Cheese, Coffee}
{Chocolate, Donuts, Juice}
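The join from L_2 to C_3 can be sketched as follows, with itemsets held as sorted tuples. `join_step` follows the rule stated on the slide. `prune_step` is an extra check that the slides do not describe: the standard Apriori candidate generation also discards any candidate that has an infrequent (k-1)-subset, and it is shown here only for comparison. Both function names are illustrative.

```python
from itertools import combinations

L2 = [
    ("Bread", "Cereal"), ("Bread", "Cheese"), ("Bread", "Coffee"),
    ("Cheese", "Coffee"), ("Chocolate", "Donuts"), ("Chocolate", "Juice"),
    ("Donuts", "Juice"),
]


def join_step(prev_frequent, k):
    """Join pairs of (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    out = []
    for a, b in combinations(prev, 2):
        if a[:k - 2] == b[:k - 2]:
            out.append(tuple(sorted(set(a) | set(b))))
    return out


def prune_step(candidates, prev_frequent):
    """Optional: drop candidates that have an infrequent (k-1)-subset."""
    prev = {frozenset(s) for s in prev_frequent}
    return [
        c for c in candidates
        if all(frozenset(sub) in prev for sub in combinations(c, len(c) - 1))
    ]


C3 = join_step(L2, 3)
print("after join :", C3)
print("after prune:", prune_step(C3, L2))
```

The join alone yields the four candidates listed on the slide. The optional prune removes {Bread, Cereal, Cheese} and {Bread, Cereal, Coffee} without scanning the data, because {Cereal, Cheese} and {Cereal, Coffee} are not in L_2.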
Step 3 Again : C_3 to L_3, Again Based on Minimum Support
Candidate 3 item set          Frequency
{Bread, Cereal, Cheese}       4
{Bread, Cereal, Coffee}       4
{Bread, Cheese, Coffee}       8
{Chocolate, Donuts, Juice}    7
With 25% support, this is set L_3
Frequent 3 item set           Frequency
{Bread, Cheese, Coffee}       8
{Chocolate, Donuts, Juice}    7
 The two sets in L_3 do not share their first two items, so C_4 cannot be formed. Hence L_4 cannot be formed and we stop here
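The counting scan for C_3 can be sketched as below. The 25 transactions are transcribed from the example table; their order does not affect the counts. The variable names are illustrative.

```python
# The 25 example transactions (order is irrelevant for counting).
transactions = [
    {"Biscuits", "Bread", "Cheese", "Coffee", "Yogurt"},
    {"Bread", "Cereal", "Cheese", "Coffee"},
    {"Cheese", "Chocolate", "Donuts", "Juice", "Milk"},
    {"Bread", "Cheese", "Coffee", "Cereal", "Juice"},
    {"Bread", "Cereal", "Chocolate", "Donuts", "Juice"},
    {"Milk", "Tea"},
    {"Biscuits", "Bread", "Cheese", "Coffee", "Milk"},
    {"Eggs", "Milk", "Tea"},
    {"Bread", "Cereal", "Cheese", "Chocolate", "Coffee"},
    {"Bread", "Cereal", "Chocolate", "Donuts", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Cheese", "Coffee", "Donuts", "Juice"},
    {"Biscuits", "Bread", "Cereal"},
    {"Cereal", "Cheese", "Chocolate", "Donuts", "Juice"},
    {"Chocolate", "Coffee"},
    {"Donuts"},
    {"Donuts", "Eggs", "Juice"},
    {"Biscuits", "Bread", "Cheese", "Coffee"},
    {"Bread", "Cereal", "Chocolate", "Donuts", "Juice"},
    {"Cheese", "Chocolate", "Donuts", "Juice"},
    {"Milk", "Tea", "Yogurt"},
    {"Bread", "Cereal", "Cheese", "Coffee"},
    {"Chocolate", "Donuts", "Juice", "Milk", "Newspaper"},
    {"Newspaper", "Pastry", "Rolls"},
    {"Rolls", "Sugar", "Tea"},
]

C3 = [
    ("Bread", "Cereal", "Cheese"),
    ("Bread", "Cereal", "Coffee"),
    ("Bread", "Cheese", "Coffee"),
    ("Chocolate", "Donuts", "Juice"),
]

# 25% of 25 transactions is 6.25, so a set needs at least 7 occurrences.
min_count = 0.25 * len(transactions)

# One pass over the data per candidate: count transactions containing it.
counts = {c: sum(1 for t in transactions if set(c) <= t) for c in C3}
L3 = {c: n for c, n in counts.items() if n >= min_count}

for itemset, n in counts.items():
    print(itemset, n, "frequent" if itemset in L3 else "dropped")
```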
Apriori : Part 2 - Find Rules
Rules will be found by looking at
 3-item sets found in L_3
 2-item sets in L_2 that are not subsets of any set in L_3
In each case we
 Calculate confidence (A ⇒ B )
 = P (B | A) = P(A ∩ B ) / P(A)
Some short hand
 { Bread, Cheese, Coffee } is written as { B, C, D }
Rules for Finding Rules !
A 3 item frequent set { BCD } results in 6 rules
 B ⇒ CD, C ⇒ BD, D ⇒ BC
 CD ⇒ B, BD ⇒ C, BC ⇒ D
Also note that
 B ⇒ CD implies the two simpler rules
 B ⇒ D, B ⇒ C
We now look at the two 3-item sets in L_3 ( the highest L set ) and find their confidence levels
 { Bread, Cheese, Coffee }
 { Chocolate, Donuts, Juice }
 Note that the frequencies of these two sets are 8 and 7
Rules from First of 2 Itemsets in L3
One rule drops out because confidence < 70%
 Calculate confidence (X ⇒ Y )
 = P (Y | X) = P(X ∩ Y ) / P(X)
Confidence of association rules from { Bread B, Cheese C, Coffee D }
Rule       Support of BCD   Frequency of LHS   Confidence
B => CD    8                13                 0.615
C => BD    8                11                 0.727
D => BC    8                9                  0.889
CD => B    8                9                  0.889
BD => C    8                8                  1.000
BC => D    8                8                  1.000
 Frequencies of the LHS come from the item frequency table (single items) and from L_2 (pairs)
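The confidence column is a single division per rule. The sketch below reuses the frequencies quoted on the slide (8 for { B, C, D } and the listed frequency for each left-hand side) together with the 70% cut-off used here; the variable names are illustrative.

```python
# Frequency of the full itemset {Bread B, Cheese C, Coffee D}.
support_bcd = 8

# Frequency of each possible left-hand side, as listed on the slide.
lhs_frequency = {"B": 13, "C": 11, "D": 9, "CD": 9, "BD": 8, "BC": 8}
min_confidence = 0.70

confidences = {lhs: support_bcd / freq for lhs, freq in lhs_frequency.items()}
kept = [lhs for lhs, c in confidences.items() if c >= min_confidence]
dropped = [lhs for lhs, c in confidences.items() if c < min_confidence]

for lhs, conf in confidences.items():
    # The right-hand side is whatever remains of BCD.
    rhs = "".join(ch for ch in "BCD" if ch not in lhs)
    status = "kept" if lhs in kept else "dropped"
    print(f"{lhs} => {rhs}: {conf:.3f} ({status})")
```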
Rules from Second of 2 Itemsets in L3
One rule drops out because confidence < 70%
Confidence of association rules from { Chocolate N, Donuts M, Juice P }
Rule       Support of NMP   Frequency of LHS   Confidence
N => MP    7                9                  0.778
M => NP    7                10                 0.700
P => NM    7                11                 0.636
MP => N    7                9                  0.778
NP => M    7                7                  1.000
NM => P    7                7                  1.000
 Frequencies of the LHS come from the item frequency table (single items) and from L_2 (pairs)
Set of 14 Rules obtained from L3
 Each rule with a two-item consequent is split into the two simpler rules listed under it
Rule       No   In words
C => BD
C => B     1    Cheese => Bread
C => D     2    Cheese => Coffee
D => BC
D => B     3    Coffee => Bread
D => C     4    Coffee => Cheese
CD => B    5    Cheese, Coffee => Bread
BD => C    6    Bread, Coffee => Cheese
BC => D    7    Bread, Cheese => Coffee
N => MP
N => M     8    Chocolate => Donuts
N => P     9    Chocolate => Juice
M => NP
M => P     10   Donuts => Juice
M => N     11   Donuts => Chocolate
MP => N    12   Donuts, Juice => Chocolate
NP => M    13   Chocolate, Juice => Donuts
NM => P    14   Chocolate, Donuts => Juice
What about L_2 ?
Look for sets in L_2 that are not subsets of any set in L_3
 { Bread, Cereal } is the only candidate
 Which gives us two more rules
 Bread ⇒ Cereal
 Cereal ⇒ Bread
Which are now added to the 14 rules from L_3 to get 16 rules
No   Rule
15   Bread => Cereal
16   Cereal => Bread
So where are we ?
Apriori Algorithm consists of two PARTS
 First find the frequent itemsets
 Most of the cleverness happens here
 We will do better than the naive algorithm
 Then find the rules
 This is relatively simple
We have just completed the two PARTS
Overall approach to Association Rule Mining (ARM) is as follows
 List all itemsets
 Find frequency of each
 Identify “frequent sets”
 Based on support
 Search for Rules within “frequent sets”
 Based on confidence
Naive Algorithm
 Exponential Time
Apriori Algorithm
 Far fewer itemsets to count in practice, since candidates are built only from frequent sets, although the worst case is still exponential
Observations
Actual values of support and confidence
 25%, 75% are very high values
 In reality one works with far smaller values
“Interestingness” of a rule
 If X and Y are related events, i.e. not independent, then P(X ∩ Y) ≠ P(X)P(Y)
 Interestingness ≈ P(X ∩ Y) - P(X)P(Y)
Triviality of rules
 Rules involving very frequent items can be trivial
 You always buy potatoes when you go to the market and so you can get rules that connect potatoes to many things
Inexplicable rules
 Toothbrush was the most frequent item on Tuesday ??
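The interestingness measure is a one-line calculation. As an illustration, the sketch below applies it to Chocolate and Juice using the frequencies quoted in the 25-transaction example (9, 11 and 7 for the pair). This difference is often called leverage in the literature; that name is not used on the slides.

```python
n_transactions = 25
count_x = 9     # Chocolate
count_y = 11    # Juice
count_xy = 7    # {Chocolate, Juice}

p_x = count_x / n_transactions
p_y = count_y / n_transactions
p_xy = count_xy / n_transactions

# Zero would mean X and Y occur together exactly as often as
# independence predicts; a positive value means more often.
interestingness = p_xy - p_x * p_y
print(round(interestingness, 4))
```

A very frequent item such as the potatoes above has P(Y) close to 1, so P(X)P(Y) is close to P(X ∩ Y) and the measure stays near zero even when the confidence of the rule is high.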
Better Algorithms
Enhancements to the Apriori Algorithm
 AP-TID
 Direct Hashing and Pruning (DHP)
 Dynamic Itemset Counting (DIC)
Frequent Pattern (FP) Tree
 Only frequent items are needed to find association rules, so ignore the others !
 Move the data of only frequent items to a more compact and efficient structure
 A Tree structure or a directed graph is used
 Multiple transactions with the same (frequent) items are stored once, with a count
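The idea of storing shared items once with a count can be sketched as a prefix tree. This is a simplified illustration only: it covers building the tree, while a full FP-tree also keeps links between nodes of the same item and is mined by a separate procedure. The `Node` class and `build_fp_tree` function are names invented for this sketch, and the data is the 4-transaction example with a minimum count of 2.

```python
from collections import Counter


class Node:
    """One item on a path of the tree, with the number of transactions through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(data, min_count):
    freq = Counter(item for t in data for item in t)
    frequent = {i for i, n in freq.items() if n >= min_count}
    root = Node()
    for t in data:
        # Keep frequent items only, most frequent first (ties broken by name),
        # so that transactions with common items share a prefix.
        path = sorted((i for i in t if i in frequent), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def show(node, depth=0):
    for child in sorted(node.children.values(), key=lambda c: c.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
root = build_fp_tree(transactions, min_count=2)
show(root)
```

The first three transactions all pass through a single Bread node with a count of 3, and the first two share the Bread, Cheese path.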
Software Support
KDNuggets.com
 Excellent collections of software available
Bart Goethals
 Free software for Apriori, FP-Tree
ARMiner
 GNU Open Source software from UMass/Boston
DMII
 National University of Singapore
DB2 Intelligent Data Miner
 IBM Corporation
 Equivalent software available from other vendors as well

More Related Content

Viewers also liked

Data mining classification-2009-v0
Data mining classification-2009-v0Data mining classification-2009-v0
Data mining classification-2009-v0Prithwis Mukerjee
 
Bitcoin, Blockchain and Crypto Contracts - Part 3
Bitcoin, Blockchain and Crypto Contracts - Part 3Bitcoin, Blockchain and Crypto Contracts - Part 3
Bitcoin, Blockchain and Crypto Contracts - Part 3
Prithwis Mukerjee
 
Business Intelligence Industry Perspective Session I
Business Intelligence   Industry Perspective Session IBusiness Intelligence   Industry Perspective Session I
Business Intelligence Industry Perspective Session I
Prithwis Mukerjee
 
Game theoretic concepts in Support Vector Machines
Game theoretic concepts in Support Vector MachinesGame theoretic concepts in Support Vector Machines
Game theoretic concepts in Support Vector Machines
Subhayan Mukerjee
 
The incompleteness of reason
The incompleteness of reasonThe incompleteness of reason
The incompleteness of reason
Subhayan Mukerjee
 
Tintin and Contemporary Politics
Tintin and Contemporary PoliticsTintin and Contemporary Politics
Tintin and Contemporary Politics
Subhayan Mukerjee
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
Prithwis Mukerjee
 
ইন্টার্নেট কি এবং কেন ?
ইন্টার্নেট কি এবং কেন ?ইন্টার্নেট কি এবং কেন ?
ইন্টার্নেট কি এবং কেন ?
Prithwis Mukerjee
 
Bitcoin, Blockchain and the Crypto Contracts - Part 2
Bitcoin, Blockchain and the Crypto Contracts - Part 2Bitcoin, Blockchain and the Crypto Contracts - Part 2
Bitcoin, Blockchain and the Crypto Contracts - Part 2
Prithwis Mukerjee
 

Viewers also liked (10)

Data mining intro-2009-v2
Data mining intro-2009-v2Data mining intro-2009-v2
Data mining intro-2009-v2
 
Data mining classification-2009-v0
Data mining classification-2009-v0Data mining classification-2009-v0
Data mining classification-2009-v0
 
Bitcoin, Blockchain and Crypto Contracts - Part 3
Bitcoin, Blockchain and Crypto Contracts - Part 3Bitcoin, Blockchain and Crypto Contracts - Part 3
Bitcoin, Blockchain and Crypto Contracts - Part 3
 
Business Intelligence Industry Perspective Session I
Business Intelligence   Industry Perspective Session IBusiness Intelligence   Industry Perspective Session I
Business Intelligence Industry Perspective Session I
 
Game theoretic concepts in Support Vector Machines
Game theoretic concepts in Support Vector MachinesGame theoretic concepts in Support Vector Machines
Game theoretic concepts in Support Vector Machines
 
The incompleteness of reason
The incompleteness of reasonThe incompleteness of reason
The incompleteness of reason
 
Tintin and Contemporary Politics
Tintin and Contemporary PoliticsTintin and Contemporary Politics
Tintin and Contemporary Politics
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
 
ইন্টার্নেট কি এবং কেন ?
ইন্টার্নেট কি এবং কেন ?ইন্টার্নেট কি এবং কেন ?
ইন্টার্নেট কি এবং কেন ?
 
Bitcoin, Blockchain and the Crypto Contracts - Part 2
Bitcoin, Blockchain and the Crypto Contracts - Part 2Bitcoin, Blockchain and the Crypto Contracts - Part 2
Bitcoin, Blockchain and the Crypto Contracts - Part 2
 

Similar to Data mining arm-2009-v0

MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
nikshaikh786
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
Wake Tech BAS
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
rahulmath80
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
Jyoti Yadav
 
Association Rule Mining in Data Mining
Association Rule Mining in Data Mining Association Rule Mining in Data Mining
Association Rule Mining in Data Mining
Ayesha Ali
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
Sulman Ahmed
 
Association rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithmsAssociation rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithms
Francisco E. Figueroa-Nigaglioni
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
WailaBaba
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule Mining
PALLAB DAS
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Gaurav Aggarwal
 
Association Rule mining
Association Rule miningAssociation Rule mining
Association Rule mining
Megha Sharma
 
What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
Kumar P
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
raju980973
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
Sandeep Prasad
 
Intake 37 DM
Intake 37 DMIntake 37 DM
Intake 37 DM
Mahmoud Ouf
 
ASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptxASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptx
SherishJaved
 
apriori.pptx
apriori.pptxapriori.pptx
apriori.pptx
selvifitria1
 
Case study on Transaction in Grocery Store
Case study on Transaction in Grocery Store Case study on Transaction in Grocery Store
Case study on Transaction in Grocery Store
divyawani2
 
Association Rule Mining || Data Mining
Association Rule Mining || Data MiningAssociation Rule Mining || Data Mining
Association Rule Mining || Data Mining
Iffat Firozy
 

Similar to Data mining arm-2009-v0 (20)

MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
Association Rule Mining in Data Mining
Association Rule Mining in Data Mining Association Rule Mining in Data Mining
Association Rule Mining in Data Mining
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Association rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithmsAssociation rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithms
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule Mining
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Association Rule mining
Association Rule miningAssociation Rule mining
Association Rule mining
 
What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Intake 37 DM
Intake 37 DMIntake 37 DM
Intake 37 DM
 
ASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptxASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptx
 
apriori.pptx
apriori.pptxapriori.pptx
apriori.pptx
 
Case study on Transaction in Grocery Store
Case study on Transaction in Grocery Store Case study on Transaction in Grocery Store
Case study on Transaction in Grocery Store
 
Association Rule Mining || Data Mining
Association Rule Mining || Data MiningAssociation Rule Mining || Data Mining
Association Rule Mining || Data Mining
 

More from Prithwis Mukerjee

Thought controlled devices
Thought controlled devicesThought controlled devices
Thought controlled devices
Prithwis Mukerjee
 
Cloudcasting
CloudcastingCloudcasting
Cloudcasting
Prithwis Mukerjee
 
Currency, Commodity and Bitcoins
Currency, Commodity and BitcoinsCurrency, Commodity and Bitcoins
Currency, Commodity and Bitcoins
Prithwis Mukerjee
 
Data Science
Data ScienceData Science
Data Science
Prithwis Mukerjee
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6Prithwis Mukerjee
 
Thought control
Thought controlThought control
Thought control
Prithwis Mukerjee
 
World of data @ praxis 2013 v2
World of data   @ praxis 2013  v2World of data   @ praxis 2013  v2
World of data @ praxis 2013 v2
Prithwis Mukerjee
 
BIS 08a - Application Development - II Version 2
BIS 08a - Application Development - II Version 2BIS 08a - Application Development - II Version 2
BIS 08a - Application Development - II Version 2
Prithwis Mukerjee
 
Lecture02 - Data Mining & Analytics
Lecture02 - Data Mining & AnalyticsLecture02 - Data Mining & Analytics
Lecture02 - Data Mining & AnalyticsPrithwis Mukerjee
 
Data mining clustering-2009-v0
Data mining clustering-2009-v0Data mining clustering-2009-v0
Data mining clustering-2009-v0Prithwis Mukerjee
 
PPM Lite
PPM LitePPM Lite
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
Prithwis Mukerjee
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business IntelligencePrithwis Mukerjee
 
Business Models for Web 2.0
Business Models for Web 2.0Business Models for Web 2.0
Business Models for Web 2.0
Prithwis Mukerjee
 
BIS01 Living On the Web
BIS01 Living On the WebBIS01 Living On the Web
BIS01 Living On the Web
Prithwis Mukerjee
 
BIS03 Data Modelling - I
BIS03 Data Modelling - IBIS03 Data Modelling - I
BIS03 Data Modelling - I
Prithwis Mukerjee
 
BIS04 Data Modelling - II
BIS04 Data Modelling  - IIBIS04 Data Modelling  - II
BIS04 Data Modelling - II
Prithwis Mukerjee
 
BIS06 Physical Database Models
BIS06 Physical Database ModelsBIS06 Physical Database Models
BIS06 Physical Database Models
Prithwis Mukerjee
 

Data mining arm-2009-v0

  • 1. Data Mining
    Association Rules Mining or Market Basket Analysis
    Prithwis Mukerjee, Ph.D.
  • 2. Let us describe the problem ...
    A retailer sells the following items:
      Bread, Cheese, Coffee, Juice, Milk, Tea, Biscuits, Sugar, Newspaper
      - We assume that the shopkeeper keeps track of what each customer purchases
      - He needs to know which items are generally sold together
    Trans ID   Items
    10         Bread, Cheese, Newspaper
    20         Bread, Cheese, Juice
    30         Bread, Milk
    40         Cheese, Juice, Milk, Coffee
    50         Sugar, Tea, Coffee, Biscuits, Newspaper
    60         Sugar, Tea, Coffee, Biscuits, Milk, Juice, Newspaper
    70         Bread, Cheese
    80         Bread, Cheese, Juice, Coffee
    90         Bread, Milk
    100        Sugar, Tea, Coffee, Bread, Milk, Juice, Newspaper
  • 3. Associations
    Rules expressing relations between items in a “Market Basket”
      { Sugar, Tea } => { Biscuits }
      - Is it true that if a customer buys Sugar and Tea, she will also buy Biscuits?
      - If so, then
          - These items should be ordered together
          - But discounts should not be given on these items at the same time!
    We can make a guess, but
      - It would be better if we could structure this problem in terms of mathematics
  • 4. Basic Concepts
    Set of n items on sale
      - $I = \{ i_1, i_2, i_3, \ldots, i_n \}$
    Transaction
      - A subset of I : T ⊆ I
      - A set of items purchased in an individual transaction
      - With each transaction having m items
      - $t_i = \{ i_1, i_2, i_3, \ldots, i_m \}$ with $m \le n$
      - If we have N transactions then we have $t_1, t_2, t_3, \ldots, t_N$ as the unique identifier for each transaction
    D is our total data about all N transactions
      - $D = \{ t_1, t_2, t_3, \ldots, t_N \}$
  • 5. An Association Rule
    Whenever X appears, Y also appears
      - X ⇒ Y
      - X ⊆ I, Y ⊆ I, X ∩ Y = ∅
    X and Y may be
      - Single items or
      - Sets of items, with no item appearing in both X and Y
    X is referred to as the antecedent
    Y is referred to as the consequent
    Whether a rule like this exists is the focus of our analysis
  • 6. Two key concepts
    Support (or prevalence)
      - How often do X and Y appear together in the basket?
      - If this number is very low then the rule is not worth examining
      - Expressed as a fraction of the total number of transactions
      - Say 10% or 0.1
    Confidence (or predictability)
      - Of all the occurrences of X, in what fraction does Y also appear?
      - Expressed as a fraction of all transactions containing X
      - Say 80% or 0.8
    We are interested in rules that have a
      - Minimum value of support : say 25%
      - Minimum value of confidence : say 75%
  • 7. Mathematically speaking ...
    Support (X)
      = (Number of times X appears) / N
      = P(X)
    Support (XY)
      = (Number of times X and Y appear together) / N
      = P(X ∩ Y)
    Confidence (X ⇒ Y)
      = Support (XY) / Support (X)
      = P(X ∩ Y) / P(X)
      = Conditional probability P(Y | X)
    Lift : an optional term
      - Measures the power of association
      - Lift (X ⇒ Y) = P(Y | X) / P(Y)
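The three measures above can be checked against the ten transactions from the retailer example. The following is a minimal Python sketch, not part of the original slides; the function names `support`, `confidence` and `lift` are chosen here for illustration only.

```python
# Transactions from the retailer example (Trans ID -> items purchased).
transactions = {
    10: {"Bread", "Cheese", "Newspaper"},
    20: {"Bread", "Cheese", "Juice"},
    30: {"Bread", "Milk"},
    40: {"Cheese", "Juice", "Milk", "Coffee"},
    50: {"Sugar", "Tea", "Coffee", "Biscuits", "Newspaper"},
    60: {"Sugar", "Tea", "Coffee", "Biscuits", "Milk", "Juice", "Newspaper"},
    70: {"Bread", "Cheese"},
    80: {"Bread", "Cheese", "Juice", "Coffee"},
    90: {"Bread", "Milk"},
    100: {"Sugar", "Tea", "Coffee", "Bread", "Milk", "Juice", "Newspaper"},
}

def support(itemset, data):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in data.values()) / len(data)

def confidence(x, y, data):
    """Support(XY) / Support(X), i.e. an estimate of P(Y | X)."""
    return support(set(x) | set(y), data) / support(x, data)

def lift(x, y, data):
    """P(Y | X) / P(Y); values above 1 suggest a positive association."""
    return confidence(x, y, data) / support(y, data)

x, y = {"Sugar", "Tea"}, {"Biscuits"}
print("support   ", support(x | y, transactions))
print("confidence", confidence(x, y, transactions))
print("lift      ", lift(x, y, transactions))
```

For { Sugar, Tea } => { Biscuits } this gives a support of 2 in 10 and a confidence of 2 in 3, so the rule would fail both the 25% support and the 75% confidence thresholds suggested earlier.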
  • 8. The task at hand ...
    Given a large set of transactions, we seek a procedure (or algorithm)
      - That will discover all association rules
      - That have a minimum support of p%
      - And a minimum confidence level of q%
      - And to do so in an efficient manner
    Algorithms
      - The Naive or Brute Force Method
      - The Improved Naive algorithm
      - The Apriori Algorithm
      - Improvements to the Apriori algorithm
      - FP (Frequent Pattern) Algorithm
  • 9. Let us try the Naive Algorithm manually !
    This is the set of transactions that we have ...
      - We want to find Association Rules with
          - Minimum 50% support and
          - Minimum 75% confidence
    Trans ID   Items
    100        Bread, Cheese
    200        Bread, Cheese, Juice
    300        Bread, Milk
    400        Cheese, Juice, Milk
  • 10. Itemsets & Frequencies
    Which sets are frequent ?
      - Since we are looking for a support of 50%, we need a set to appear in 2 out of 4 transactions
      - Support (X) = (# of times X appears) / N = P(X)
      - 6 sets meet this criterion
    Item Sets                      Frequency
    {Bread}                        3
    {Cheese}                       3
    {Juice}                        2
    {Milk}                         2
    {Bread, Cheese}                2
    {Bread, Juice}                 1
    {Bread, Milk}                  1
    {Cheese, Juice}                2
    {Cheese, Milk}                 1
    {Juice, Milk}                  1
    {Bread, Cheese, Juice}         1
    {Bread, Cheese, Milk}          0
    {Bread, Juice, Milk}           0
    {Cheese, Juice, Milk}          1
    {Bread, Cheese, Juice, Milk}   0
  • 11. A closer look at the “Frequent Set”
    Look at itemsets with more than 1 item
      - {Bread, Cheese}, {Cheese, Juice}
      - 4 rules are possible
    Look for confidence levels
      - Confidence (X ⇒ Y) = Support (XY) / Support (X)
    Item Sets         Frequency
    {Bread}           3
    {Cheese}          3
    {Juice}           2
    {Milk}            2
    {Bread, Cheese}   2
    {Cheese, Juice}   2
    Rule              Support (XY) / Support (X)   Confidence
    Bread => Cheese   2 / 3                        67%
    Cheese => Bread   2 / 3                        67%
    Cheese => Juice   2 / 3                        67%
    Juice => Cheese   2 / 2                        100%
  • 13. The Big Picture
    List all itemsets
      - Find frequency of each
    Identify “frequent sets”
      - Based on support
    Search for Rules within “frequent sets”
      - Based on confidence
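The three stages of the naive method can be sketched in Python on the four-transaction example (minimum support 50%, minimum confidence 75%). This is an illustrative sketch rather than code from the slides; the variable names are assumptions.

```python
from itertools import combinations

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.50, 0.75

items = sorted(set().union(*transactions))
n = len(transactions)

# Stage 1: list every non-empty itemset and count how often it occurs.
frequency = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        frequency[itemset] = sum(itemset <= basket for basket in transactions)

# Stage 2: keep the "frequent sets", i.e. those meeting minimum support.
frequent = {s: c for s, c in frequency.items() if c / n >= MIN_SUPPORT}

# Stage 3: split each frequent set into LHS => RHS and test the confidence.
rules = []
for itemset, count in frequent.items():
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = count / frequency[lhs]
            if conf >= MIN_CONFIDENCE:
                rules.append((set(lhs), set(itemset - lhs), conf))

print(len(frequency), "itemsets,", len(frequent), "frequent")
for lhs, rhs, conf in rules:
    print(lhs, "=>", rhs, conf)
```

On this data the sketch finds the same 6 frequent sets as the manual walk-through, and only Juice => Cheese clears the 75% confidence threshold.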
  • 14. Looking Beyond the Retail Store
    Counter Terrorism
      - Track phone calls made or received from a particular number every day
      - Is an incoming call from a particular number followed by a call to another number ?
      - Are there any sets of numbers that are always called together ?
    Expand the item sets to include
      - Electronic fund transfers
      - Travel between two locations
          - Boarding cards
          - Railway reservation
    All data is available in electronic format
  • 15. Major Problem
    Exponential growth of the number of itemsets
      - 4 items : $2^4 = 16$ members
      - n items : $2^n$ members
      - As n becomes larger, the problem can no longer be solved in a reasonable amount of time
    All attempts are made to reduce the number of item sets to be processed
    “Improved” Naive algorithm
      - Ignore sets with zero frequency
    Item Sets                      Frequency
    {Bread}                        3
    {Cheese}                       3
    {Juice}                        2
    {Milk}                         2
    {Bread, Cheese}                2
    {Bread, Juice}                 1
    {Bread, Milk}                  1
    {Cheese, Juice}                2
    {Cheese, Milk}                 1
    {Juice, Milk}                  1
    {Bread, Cheese, Juice}         1
    {Bread, Cheese, Milk}          0
    {Bread, Juice, Milk}           0
    {Cheese, Juice, Milk}          1
    {Bread, Cheese, Juice, Milk}   0
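One possible reading of the "improved" naive idea is to generate only the subsets of actual transactions, so that itemsets with zero frequency are never created. The Python sketch below follows that reading; it is an assumption about how the improvement might be coded, not the author's implementation.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

# Count only itemsets that occur inside at least one transaction.
counts = Counter()
for basket in transactions:
    ordered = sorted(basket)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            counts[frozenset(combo)] += 1

n_items = len(set().union(*transactions))
print(len(counts), "of", 2 ** n_items - 1, "non-empty itemsets actually occur")
```

The saving here is small (12 itemsets instead of 15), and the work is still exponential in the length of the longest transaction, which is why a smarter candidate generation is needed.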
  • 16. The APriori Algorithm
    Consists of two PARTS
      - First find the frequent itemsets
          - Most of the cleverness happens here
          - We will do better than the naive algorithm
      - Find the rules
          - This is relatively simpler
  • 17. APriori : Part 1 - Frequent Sets
    Step 1
      - Scan all transactions and find all frequent items that have support above p%. This is set $L_1$
    Step 2 : Apriori-Gen
      - Build potential sets of k items from $L_{k-1}$ by using pairs of itemsets in $L_{k-1}$ that have the first k-2 items in common and one remaining item from each member of the pair
      - This is candidate set $C_k$
    Step 3 : Find Frequent Item Sets again
      - Scan all transactions and find the frequency of the sets in $C_k$; those that are frequent give $L_k$
      - If $L_k$ is empty, stop, else go back to step 2
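The three steps can be put together as a short Python sketch. It implements only the join described in Step 2 and treats "frequent" as support greater than or equal to the threshold, as in the 2-out-of-4 example; the function names `apriori_gen` and `frequent_itemsets` are assumptions made for illustration.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Step 2: join pairs of sorted (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    candidates = set()
    for a, b in combinations(prev, 2):
        if a[:k - 2] == b[:k - 2]:
            candidates.add(frozenset(a) | frozenset(b))
    return candidates

def frequent_itemsets(transactions, min_support):
    """Return {itemset: frequency} for every itemset meeting min_support."""
    n = len(transactions)

    def keep_frequent(candidates):
        # One scan of the data per level: count each candidate, keep the frequent ones.
        counted = {c: sum(c <= basket for basket in transactions) for c in candidates}
        return {c: f for c, f in counted.items() if f / n >= min_support}

    # Step 1: frequent single items (L1).
    level = keep_frequent({frozenset([item]) for item in set().union(*transactions)})
    result, k = {}, 2
    while level:                      # Step 3 stops the loop when Lk is empty
        result.update(level)
        level = keep_frequent(apriori_gen(level.keys(), k))
        k += 1
    return result

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
for itemset, freq in frequent_itemsets(transactions, 0.50).items():
    print(sorted(itemset), freq)
```

Run on the four-transaction example, the sketch counts 4 single items and 6 pairs, then stops because no 3-item candidate can be joined, instead of counting all 15 itemsets.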
  • 19. Example
    We have 16 items spread over 25 transactions
    Item No   Item Name
    1         Biscuits
    2         Bread
    3         Cereal
    4         Cheese
    5         Chocolate
    6         Coffee
    7         Donuts
    8         Eggs
    9         Juice
    10        Milk
    11        Newspaper
    12        Pastry
    13        Rolls
    14        Sugar
    15        Tea
    16        Yogurt
    TID   Items
    1     Biscuits, Bread, Cheese, Coffee, Yogurt
    2     Bread, Cereal, Cheese, Coffee
    3     Cheese, Chocolate, Donuts, Juice, Milk
    4     Bread, Cheese, Coffee, Cereal, Juice
    5     Bread, Cereal, Chocolate, Donuts, Juice
    6     Milk, Tea
    7     Biscuits, Bread, Cheese, Coffee, Milk
    8     Eggs, Milk, Tea
    9     Bread, Cereal, Cheese, Chocolate, Coffee
    10    Bread, Cereal, Chocolate, Donuts, Juice
    11    Bread, Cheese, Juice
    12    Bread, Cheese, Coffee, Donuts, Juice
    13    Biscuits, Bread, Cereal
    14    Cereal, Cheese, Chocolate, Donuts, Juice
    15    Chocolate, Coffee
    16    Donuts
    17    Donuts, Eggs, Juice
    18    Biscuits, Bread, Cheese, Coffee
    19    Bread, Cereal, Chocolate, Donuts, Juice
    20    Cheese, Chocolate, Donuts, Juice
    21    Milk, Tea, Yogurt
    22    Bread, Cereal, Cheese, Coffee
    23    Chocolate, Donuts, Juice, Milk, Newspaper
    24    Newspaper, Pastry, Rolls
    25    Rolls, Sugar, Tea
  • 20. Apriori : Step 1 - Computing $L_1$
    Count the frequency of each item and exclude those that are below minimum support (25% support)
    Item No   Item Name   Frequency
    1         Biscuits    4
    2         Bread       13
    3         Cereal      10
    4         Cheese      11
    5         Chocolate   9
    6         Coffee      9
    7         Donuts      10
    8         Eggs        2
    9         Juice       11
    10        Milk        6
    11        Newspaper   2
    12        Pastry      1
    13        Rolls       2
    14        Sugar       1
    15        Tea         4
    16        Yogurt      2
    This is set $L_1$
    Item No   Item Name   Frequency
    2         Bread       13
    3         Cereal      10
    4         Cheese      11
    5         Chocolate   9
    6         Coffee      9
    7         Donuts      10
    9         Juice       11
  • 22. Step 2 : Computing $C_2$
      - Given $L_1$, we now form the candidate pairs of $C_2$. The 7 items in $L_1$ form 21 pairs : d*(d-1)/2. This is a quadratic function and not an exponential function.
    $L_1$
    Item No   Item Name   Frequency
    2         Bread       13
    3         Cereal      10
    4         Cheese      11
    5         Chocolate   9
    6         Coffee      9
    7         Donuts      10
    9         Juice       11
    $L_1$ to $C_2$
    1    {Bread, Cereal}
    2    {Bread, Cheese}
    3    {Bread, Chocolate}
    4    {Bread, Coffee}
    5    {Bread, Donuts}
    6    {Bread, Juice}
    7    {Cereal, Cheese}
    8    {Cereal, Coffee}
    9    {Cereal, Chocolate}
    10   {Cereal, Donuts}
    11   {Cereal, Juice}
    12   {Cheese, Chocolate}
    13   {Cheese, Coffee}
    14   {Cheese, Donuts}
    15   {Cheese, Juice}
    16   {Chocolate, Coffee}
    17   {Chocolate, Donuts}
    18   {Chocolate, Juice}
    19   {Coffee, Donuts}
    20   {Coffee, Juice}
    21   {Donuts, Juice}
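The count of 21 candidate pairs is easy to confirm. The short Python sketch below is only an illustration and uses the standard library's `itertools.combinations` and `math.comb`.

```python
from itertools import combinations
from math import comb

# The seven frequent single items (L1) from the worked example.
L1 = ["Bread", "Cereal", "Cheese", "Chocolate", "Coffee", "Donuts", "Juice"]

# Every unordered pair of frequent items is a candidate in C2.
C2 = list(combinations(L1, 2))

d = len(L1)
print(len(C2), "candidate pairs; d*(d-1)/2 =", d * (d - 1) // 2)
print("all non-empty itemsets over the same items:", 2 ** d - 1)
```

With d = 7 the pair count is 21, against 127 non-empty itemsets if every combination of the same items had to be counted.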
  • 24. From $C_2$ to $L_2$ based on minimum support (25% support)
      - This is a computationally intensive step
      - $L_2$ is not empty
    Candidate 2-Item Set    Freq
    {Bread, Cereal}         9
    {Bread, Cheese}         8
    {Bread, Chocolate}      4
    {Bread, Coffee}         8
    {Bread, Donuts}         4
    {Bread, Juice}          6
    {Cereal, Cheese}        5
    {Cereal, Coffee}        4
    {Cereal, Chocolate}     5
    {Cereal, Donuts}        4
    {Cereal, Juice}         6
    {Cheese, Chocolate}     4
    {Cheese, Coffee}        9
    {Cheese, Donuts}        3
    {Cheese, Juice}         4
    {Chocolate, Coffee}     1
    {Chocolate, Donuts}     7
    {Chocolate, Juice}      7
    {Coffee, Donuts}        1
    {Coffee, Juice}         2
    {Donuts, Juice}         9
    This is set $L_2$
    Frequent 2-Item Set     Freq
    {Bread, Cereal}         9
    {Bread, Cheese}         8
    {Bread, Coffee}         8
    {Cheese, Coffee}        9
    {Chocolate, Donuts}     7
    {Chocolate, Juice}      7
    {Donuts, Juice}         9
  • 26. Step 2 Again : Get $C_3$
      - We combine the appropriate frequent 2-item sets from $L_2$ (which must have the same first item) and obtain four such itemsets, each containing three items
    This is set $L_2$
    Frequent 2-Item Set     Freq
    {Bread, Cereal}         9
    {Bread, Cheese}         8
    {Bread, Coffee}         8
    {Cheese, Coffee}        9
    {Chocolate, Donuts}     7
    {Chocolate, Juice}      7
    {Donuts, Juice}         9
    $L_2$ to $C_3$
    Candidate 3-Item Set
    {Bread, Cereal, Cheese}
    {Bread, Cereal, Coffee}
    {Bread, Cheese, Coffee}
    {Chocolate, Donuts, Juice}
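The join from the frequent 2-item sets to the 3-item candidates can be reproduced with a few lines of Python. The `join` function follows the "same first item" rule from the slide. The `prune` function is an addition that the slides do not use: it is the subset check found in the standard formulation of Apriori, shown here only for comparison, and both function names are assumptions.

```python
from itertools import combinations

# Frequent 2-item sets (L2) from the worked example.
L2 = [
    ("Bread", "Cereal"), ("Bread", "Cheese"), ("Bread", "Coffee"),
    ("Cheese", "Coffee"), ("Chocolate", "Donuts"), ("Chocolate", "Juice"),
    ("Donuts", "Juice"),
]

def join(prev, k):
    """Merge pairs of sorted (k-1)-itemsets whose first k-2 items agree."""
    prev = sorted(tuple(sorted(s)) for s in prev)
    return {
        frozenset(a) | frozenset(b)
        for a, b in combinations(prev, 2)
        if a[:k - 2] == b[:k - 2]
    }

def prune(candidates, prev, k):
    """Not in the slides: drop candidates having an infrequent (k-1)-subset."""
    prev = {frozenset(s) for s in prev}
    return {
        c for c in candidates
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    }

C3 = join(L2, 3)
C3_pruned = prune(C3, L2, 3)
for candidate in sorted(sorted(c) for c in C3):
    print(candidate, "kept" if frozenset(candidate) in C3_pruned else "would be pruned")
```

The join alone yields the four candidates listed above. With the extra subset check, {Bread, Cereal, Cheese} and {Bread, Cereal, Coffee} would be discarded before any counting, because {Cereal, Cheese} and {Cereal, Coffee} are not in the frequent 2-item sets.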
  • 27. Step 3 Again : $C_3$ to $L_3$, Again Based on Minimum Support (25% support)
      - Since $C_4$ cannot be formed, $L_4$ cannot be formed, so we stop here
    Candidate 3-Item Set          Frequency
    {Bread, Cereal, Cheese}       4
    {Bread, Cereal, Coffee}       4
    {Bread, Cheese, Coffee}       8
    {Chocolate, Donuts, Juice}    7
    Frequent 3-Item Set           Frequency
    {Bread, Cheese, Coffee}       8
    {Chocolate, Donuts, Juice}    7
  • 30. APriori : Part 2 - Find Rules
    Rules will be found by looking at
      - 3-item sets found in $L_3$
      - 2-item sets in $L_2$ that are not subsets of any set in $L_3$
    In each case we
      - Calculate confidence (A ⇒ B)
      - = P(B | A) = P(A ∩ B) / P(A)
    Some shorthand
      - {Bread, Cheese, Coffee} is written as {B, C, D}
  • 31. Rules for Finding Rules !
    A 3-item frequent set {BCD} results in 6 rules
      - B ⇒ CD, C ⇒ BD, D ⇒ BC
      - CD ⇒ B, BD ⇒ C, BC ⇒ D
    Also note that
      - B ⇒ CD can also be written as
      - B ⇒ D, B ⇒ C
    We now look at these two 3-item sets and find their confidence levels
      - {Bread, Cheese, Coffee}
      - {Chocolate, Donuts, Juice}
      - These come from the $L_3$ set (the highest L set); note that the support counts for these rules are 8 and 7
  • 32. Rules from First of 2 Itemsets in $L_3$
    One rule drops out because confidence < 70%
      - Calculate confidence (X ⇒ Y)
      - = P(Y | X) = P(X ∩ Y) / P(X)
    Confidence of association rules from {Bread, Cheese, Coffee}
    Rule       Support of BCD   Frequency of LHS   Confidence
    B => CD    8                13                 0.615
    C => BD    8                11                 0.727
    D => BC    8                9                  0.889
    CD => B    8                9                  0.889
    BD => C    8                8                  1.000
    BC => D    8                8                  1.000
    Item frequencies (Item No, Item Name, Frequency): 1 Biscuits 4; 2 Bread 13; 3 Cereal 10; 4 Cheese 11; 5 Chocolate 9; 6 Coffee 9; 7 Donuts 10; 8 Eggs 2; 9 Juice 11; 10 Milk 6; 11 Newspaper 2; 12 Pastry 1; 13 Rolls 2; 14 Sugar 1; 15 Tea 4; 16 Yogurt 2
  • 33. Rules from First of 2 Itemsets in $L_3$
    One rule drops out because confidence < 70%
    Confidence of association rules from {Bread B, Cheese C, Coffee D}
    Rule       Support of BCD   Frequency of LHS   Confidence
    B => CD    8                13                 0.615
    C => BD    8                11                 0.727
    D => BC    8                9                  0.889
    CD => B    8                9                  0.889
    BD => C    8                8                  1.000
    BC => D    8                8                  1.000
    Frequent 2-Item Set     Freq
    {Bread, Cereal}         9
    {Bread, Cheese}         8
    {Bread, Coffee}         8
    {Cheese, Coffee}        9
    {Chocolate, Donuts}     7
    {Chocolate, Juice}      7
    {Donuts, Juice}         9
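The confidence table can be reproduced from the frequencies quoted in the slides. The Python sketch below is illustrative; `rules_from` is a hypothetical helper name, and the counts are copied from the frequency tables rather than recomputed from the transactions.

```python
from itertools import combinations

# Frequencies as quoted in the slides (B = Bread, C = Cheese, D = Coffee).
freq = {
    frozenset("B"): 13, frozenset("C"): 11, frozenset("D"): 9,
    frozenset("BC"): 8, frozenset("BD"): 8, frozenset("CD"): 9,
    frozenset("BCD"): 8,
}
MIN_CONFIDENCE = 0.70

def rules_from(itemset, freq, min_conf):
    """Split a frequent itemset into LHS => RHS rules and keep the confident ones."""
    itemset = frozenset(itemset)
    kept = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = freq[itemset] / freq[lhs]   # support of whole set / frequency of LHS
            if conf >= min_conf:
                kept.append(("".join(sorted(lhs)),
                             "".join(sorted(itemset - lhs)),
                             round(conf, 3)))
    return kept

rules = rules_from("BCD", freq, MIN_CONFIDENCE)
for lhs, rhs, conf in rules:
    print(f"{lhs} => {rhs}  confidence {conf}")
```

Five of the six rules survive; B => CD falls below the 70% threshold because Bread appears in 13 transactions but only 8 of them also contain Cheese and Coffee.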
  • 34. Rules from Second of 2 Itemsets in $L_3$
    One rule drops out because confidence < 70%
    Confidence of association rules from {Chocolate N, Donuts M, Juice P}
    Rule       Support of NMP   Frequency of LHS   Confidence
    N => MP    7                9                  0.778
    M => NP    7                10                 0.700
    P => NM    7                11                 0.636
    MP => N    7                9                  0.778
    NP => M    7                7                  1.000
    NM => P    7                7                  1.000
    Item frequencies (Item No, Item Name, Frequency): 1 Biscuits 4; 2 Bread 13; 3 Cereal 10; 4 Cheese 11; 5 Chocolate 9; 6 Coffee 9; 7 Donuts 10; 8 Eggs 2; 9 Juice 11; 10 Milk 6; 11 Newspaper 2; 12 Pastry 1; 13 Rolls 2; 14 Sugar 1; 15 Tea 4; 16 Yogurt 2
  • 35. Rules from Second of 2 Itemsets in $L_3$
    One rule drops out because confidence < 70%
    Confidence of association rules from {Chocolate N, Donuts M, Juice P}
    Rule       Support of NMP   Frequency of LHS   Confidence
    N => MP    7                9                  0.778
    M => NP    7                10                 0.700
    P => NM    7                11                 0.636
    MP => N    7                9                  0.778
    NP => M    7                7                  1.000
    NM => P    7                7                  1.000
    Frequent 2-Item Set     Freq
    {Bread, Cereal}         9
    {Bread, Cheese}         8
    {Bread, Coffee}         8
    {Cheese, Coffee}        9
    {Chocolate, Donuts}     7
    {Chocolate, Juice}      7
    {Donuts, Juice}         9
  • 36. Set of 14 Rules obtained from $L_3$
    Source rule   Rule       No.   In words
    C => BD       C => B     1     Cheese => Bread
                  C => D     2     Cheese => Coffee
    D => BC       D => B     3     Coffee => Bread
                  D => C     4     Coffee => Cheese
    CD => B       CD => B    5     Cheese, Coffee => Bread
    BD => C       BD => C    6     Bread, Coffee => Cheese
    BC => D       BC => D    7     Bread, Cheese => Coffee
    N => MP       N => M     8     Chocolate => Donuts
                  N => P     9     Chocolate => Juice
    M => NP       M => P     10    Donuts => Juice
                  M => N     11    Donuts => Chocolate
    MP => N       MP => N    12    Donuts, Juice => Chocolate
    NP => M       NP => M    13    Chocolate, Juice => Donuts
    NM => P       NM => P    14    Chocolate, Donuts => Juice
  • 37. What about $L_2$ ?
    Look for sets in $L_2$ that are not subsets of any set in $L_3$
      - {Bread, Cereal} is the only candidate
      - Which gives us two more rules
          - Bread ⇒ Cereal
          - Cereal ⇒ Bread
    Frequent 2-Item Set           Freq
    {Bread, Cereal}               9
    {Bread, Cheese}               8
    {Bread, Coffee}               8
    {Cheese, Coffee}              9
    {Chocolate, Donuts}           7
    {Chocolate, Juice}            7
    {Donuts, Juice}               9
    Frequent 3-Item Set           Frequency
    {Bread, Cheese, Coffee}       8
    {Chocolate, Donuts, Juice}    7
  • 38. Which are now added to get 16 rules
    Source rule   Rule       No.   In words
    C => BD       C => B     1     Cheese => Bread
                  C => D     2     Cheese => Coffee
    D => BC       D => B     3     Coffee => Bread
                  D => C     4     Coffee => Cheese
    CD => B       CD => B    5     Cheese, Coffee => Bread
    BD => C       BD => C    6     Bread, Coffee => Cheese
    BC => D       BC => D    7     Bread, Cheese => Coffee
    N => MP       N => M     8     Chocolate => Donuts
                  N => P     9     Chocolate => Juice
    M => NP       M => P     10    Donuts => Juice
                  M => N     11    Donuts => Chocolate
    MP => N       MP => N    12    Donuts, Juice => Chocolate
    NP => M       NP => M    13    Chocolate, Juice => Donuts
    NM => P       NM => P    14    Chocolate, Donuts => Juice
                             15    Bread => Cereal
                             16    Cereal => Bread
  • 39. So where are we ?
    Apriori Algorithm
      - Consists of two PARTS
      - First find the frequent itemsets
          - Most of the cleverness happens here
          - We will do better than the naive algorithm
      - Find the rules
          - This is relatively simpler
    We have just completed the two PARTS
    Overall approach to ARM is as follows
      - List all itemsets
          - Find frequency of each
      - Identify “frequent sets”
          - Based on support
      - Search for Rules within “frequent sets”
          - Based on confidence
    Naive Algorithm
      - Exponential Time
    Apriori Algorithm
      - Polynomial Time
  • 40. Observations
    Actual values of support and confidence
      - 25%, 75% are very high values
      - In reality one works with far smaller values
    “Interestingness” of a rule
      - If X and Y are related events, that is, not independent, then P(X ∩ Y) ≠ P(X) P(Y)
      - Interestingness ≈ P(X ∩ Y) - P(X) P(Y)
    Triviality of rules
      - Rules involving very frequent items can be trivial
      - You always buy potatoes when you go to the market, and so you can get rules that connect potatoes to many things
    Inexplicable rules
      - Toothbrush was the most frequent item on Tuesday ??
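The interestingness measure can be tried on the earlier four-transaction example. This is a small Python sketch under the assumption that probabilities are estimated as relative frequencies; the function names `p` and `interestingness` are illustrative. The same difference is commonly known as leverage.

```python
transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

def p(itemset):
    """Relative frequency of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def interestingness(x, y):
    """P(X ∩ Y) - P(X) P(Y): zero when X and Y look independent."""
    return p(set(x) | set(y)) - p(x) * p(y)

print("Juice, Cheese:", interestingness({"Juice"}, {"Cheese"}))
print("Bread, Milk:  ", interestingness({"Bread"}, {"Milk"}))
```

A positive value (Juice and Cheese) means the items occur together more often than independence would predict; a negative value (Bread and Milk) means they occur together less often.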
  • 41. Better Algorithms
    Enhancements to the Apriori Algorithm
      - AP-TID
      - Direct Hashing and Pruning (DHP)
      - Dynamic Itemset Counting (DIC)
    Frequent Pattern (FP) Tree
      - Only frequent items are needed to find association rules, so ignore the others !
      - Move the data of only the frequent items to a more compact and efficient structure
          - A tree structure or a directed graph is used
          - Multiple transactions with the same (frequent) items are stored once, with count information
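The compact structure described for the FP tree can be sketched as a prefix tree with counts. The Python below only builds the tree and does not mine patterns from it; the `Node` class, the `build_fp_tree` function, and the choice to order items by descending frequency with alphabetical tie-breaking are assumptions made for this illustration.

```python
from collections import Counter

class Node:
    """One item on a path; `count` is how many transactions share this prefix."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    item_counts = Counter(item for basket in transactions for item in basket)
    frequent = {item for item, c in item_counts.items() if c >= min_count}
    root = Node()
    for basket in transactions:
        # Keep only frequent items, most frequent first (ties broken alphabetically),
        # so that transactions with common items share the same path.
        path = sorted((i for i in basket if i in frequent),
                      key=lambda i: (-item_counts[i], i))
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item} ({child.count})")
        show(child, depth + 1)

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
root = build_fp_tree(transactions, min_count=2)
show(root)
```

In this small example the three transactions that contain Bread share a single Bread node with a count of 3, which is the "stored once with count information" idea in miniature.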
  • 42. Software Support
    KDNuggets.com
      - Excellent collections of software available
    Bart Goethals
      - Free software for Apriori, FP-Tree
    ARMiner
      - GNU Open Source software from UMass/Boston
    DMII
      - National University of Singapore
    DB2 Intelligent Data Miner
      - IBM Corporation
      - Equivalent software available from other vendors as well