Unit 4: Association Analysis LH 7
Presented By : Tekendra Nath Yogi
Tekendranath@gmail.com
College Of Applied Business And Technology
Contd…
• Outline:
– 4.1. Basics and Algorithms
– 4.2. Frequent Item-set Pattern & Apriori Principle
– 4.3. FP-Growth, FP-Tree
– 4.4. Handling Categorical Attributes
6/30/2019 By: Tekendra Nath Yogi
Association Analysis
• Association rule analysis is a technique to uncover (mine) how
items are associated with each other.
• Such uncovered associations between items are called association
rules.
• When to mine association rules?
– Scenario:
• You are a sales manager
• Customer bought a pc and a digital camera recently
• What should you recommend to her next?
• Association rules are helpful in making your recommendation.
Contd…
• Frequent patterns(item sets):
– Frequent patterns are patterns that appear frequently in a data
set.
– E.g.,
• In a transaction data set, {milk, bread} appearing together is a frequent pattern.
• In a shopping-history database, first buying a PC, then a digital camera, and
then a memory card is another example of a (sequential) frequent pattern.
– Finding frequent patterns plays an essential role in mining
association rules.
Frequent pattern mining
• Frequent pattern mining searches for recurring relationships in a given
data set.
• Frequent pattern mining enables the discovery of interesting
associations between itemsets.
• Such associations can be applicable in many business decision making
processes such as:
– Catalog design
– Basket data analysis
– cross-marketing,
– sale campaign analysis,
– Web log (click stream) analysis, etc
Market Basket analysis
A typical example of frequent pattern(item set) mining for association rules.
• Market basket analysis analyzes customer buying habits by finding
associations between the different items that customers place in their shopping
baskets.
Applications: devising marketing strategies
Example of an association rule: milk ⇒ bread
Definitions
• Itemset:
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count ():
– Frequency of occurrence of an itemset
– E.g. ({Milk, Bread,Diaper}) = 2
• Support:
– Fraction of transactions that contain an
itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset:
– An itemset whose support is greater than or
equal to a minsup threshold
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Definitions
• Association Rule :
• An implication expression of the form X ⇒ Y, where X and Y are
itemsets
• e.g., {Milk, Diaper} ⇒ {Beer}
• It is not very difficult to develop algorithms that will find these
associations in a large database.
• The problem is that such an algorithm will also uncover many other
associations that are of very little value.
• It is necessary to introduce some measures to distinguish interesting
associations from non-interesting ones.
Contd….
• Rule Evaluation Metrics:
– Support (s): This says how popular an itemset is (Prevalence), as
measured by the proportion of transactions in which an itemset appears.
– In Table below, the support of {apple} is 4 out of 8, or 50%. Itemsets can
also contain multiple items. For instance, the support of {apple, beer, rice}
is 2 out of 8, or 25%.
Contd….
• Rule Evaluation Metrics:
– Confidence: This says how likely item Y is purchased when item X is
purchased(Predictability), expressed as {X -> Y}. This is measured by the
proportion of transactions with item X, in which item Y also appears. In
Table below, the confidence of {apple -> beer} is 3 out of 4, or 75%.
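Both metrics can be illustrated in a few lines of Python over the five-transaction dataset used throughout this unit (a minimal sketch; the function names are mine, not from the slides):

```python
# Transactions modelled as Python sets; an itemset is contained in a
# transaction when it is a subset of it.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, db):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, db) / len(db)

def confidence(lhs, rhs, db):
    """c(X -> Y) = sigma(X union Y) / sigma(X)."""
    return support_count(lhs | rhs, db) / support_count(lhs, db)

print(support({"Milk", "Bread", "Diaper"}, transactions))      # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # ~0.667
```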
Example1
• Given data set D is:
• What is the support and confidence of the rule {Milk, Diaper} ⇒ {Beer}?
• Support: percentage of tuples that contain {Milk, Diaper, Beer}
• Confidence: percentage of tuples containing {Milk, Diaper} that also contain Beer
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Rule: {Milk, Diaper} ⇒ {Beer}
s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4 = 40%
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67 = 67%
Example2
• Given data set D is:
TID date items_bought
100 10/10/99 {F,A,D,B}
200 15/10/99 {D,A,C,E,B}
300 19/10/99 {C,A,B,E}
400 20/10/99 {B,A,D}
• What is the support and confidence of the rule {B,D} ⇒ {A}?
• Support: percentage of tuples that contain {A,B,D} = 3/4 × 100% = 75%
• Confidence: (number of tuples that contain {A,B,D}) / (number of tuples that
contain {B,D}) = 3/3 × 100% = 100%
Association Rule Mining Task
• Given a set of transactions T, the goal of association rule mining is to find all
rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
• If a rule A ⇒ B [support, confidence] satisfies both min_sup and min_conf,
then it is a strong rule.
• So, we can say that the goal of association rule mining is to find all strong
rules.
Contd….
• Brute-force approach for association rule mining:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
 Computationally prohibitive!
Contd….
• How? Given data set D is:
– Example of Rules and their support and confidence is:
{Milk,Diaper}  {Beer} (s=0.4, c=0.67)
{Milk,Beer}  {Diaper} (s=0.4, c=1.0)
{Diaper,Beer}  {Milk} (s=0.4, c=0.67)
{Beer}  {Milk,Diaper} (s=0.4, c=0.67)
{Diaper}  {Milk,Beer} (s=0.4, c=0.5)
{Milk}  {Diaper,Beer} (s=0.4, c=0.5)
– Observations:
• All the above rules are binary partitions of the same itemset {Milk, Diaper,
Beer}. Rules originating from the same itemset have identical support but
can have different confidence. Thus, we may decouple the support and
confidence requirements.
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Contd….
• Association rule mining is a two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset, where
each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is still computationally expensive!
Contd…
Given d items, there are 2^d possible candidate itemsets.
Frequent Itemset Generation: Frequent itemset generation is still computationally
expensive!
Contd…
• Reducing Number of Candidates! By using Apriori principle:
– Apriori principle states that:
• If an itemset is frequent, then all of its subsets must also be frequent
– E.g., if {beer, diaper, nuts} is frequent, so is {beer, diaper}
• Apriori principle holds due to the following property of the support
measure:
• i.e., Support of an itemset never exceeds the support of its subsets
• E.g.,
∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
s(Bread) > s(Bread, Beer)
s(Milk) > s(Bread, Milk)
s(Diaper, Beer) > s(Diaper, Beer, Coke)
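The anti-monotone property can be verified exhaustively on the sample database with a short script (a sketch; the helper names are mine, not from the slides):

```python
# For every pair X ⊆ Y, the support of the subset X must be at least the
# support of the superset Y.
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

items = set().union(*transactions)
# Exhaustively check s(X) >= s(Y) for all Y of size 2 and 3 and each
# (size-1-smaller) subset X of Y.
for k in (2, 3):
    for y in combinations(sorted(items), k):
        for x in combinations(y, k - 1):
            assert support(set(x)) >= support(set(y))
print("anti-monotonicity holds on this database")
```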
Contd…
• How is the Apriori property used in the algorithm?
– If any itemset is infrequent, its supersets need not be generated or
tested!
– E.g., if itemset {a, b} is found to be infrequent, then all of its supersets
are pruned: we do not need to take them into account.
The Apriori Algorithm
1. Initially, scan the DB once to get the candidate 1-itemsets C1; the
frequent ones among them form L1 (set k = 1)
2. Use Lk to generate a collection of candidate itemsets Ck+1 of size
(k+1)
3. Scan the database to find which itemsets in Ck+1 are frequent and put
them into Lk+1
4. If Lk+1 is not empty, set k = k+1 and go to step 2; terminate when no
frequent or candidate set can be generated
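The four steps above can be sketched in Python (a minimal illustration, not an optimized implementation; the helper names are assumptions):

```python
# Compact Apriori sketch: Lk maps frequent k-itemsets to support counts.
from itertools import combinations

def apriori(transactions, min_sup):
    transactions = [frozenset(t) for t in transactions]
    # Step 1: one scan for candidate 1-itemsets C1; keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            fs = frozenset([item])
            counts[fs] = counts.get(fs, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Step 2: self-join Lk to build C(k+1), pruning any candidate
        # that has an infrequent k-subset (Apriori principle).
        prev = list(Lk)
        Ck1 = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in Lk for sub in combinations(union, k)
                ):
                    Ck1.add(union)
        # Step 3: scan the database to count candidates; keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in Ck1}
        Lk = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(Lk)
        k += 1  # Step 4: repeat until no candidate survives
    return frequent

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(db, min_sup=2)
print(result[frozenset({"B", "C", "E"})])  # 2
```

Running it on the TDB example from the next slides reproduces L3 = {B, C, E} with support count 2.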
Generating association rules from frequent itemsets
• Generate strong association rules from the frequent itemsets (strong rules
satisfy both minimum support and minimum confidence): for each frequent
itemset, output every binary partition whose confidence meets min_conf.
• Because the rules are generated from frequent itemsets, each of them
automatically satisfies the minimum support.
Contd…
• Example1: Use APRIORI algorithm to generate strong
association rules from the following transaction database. Use
min_sup=2 and min_confidence=75%
Database TDB
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
Contd…
Step 1: Frequent itemset generation (min_sup = 2):
Database TDB:
Tid | Items
10 | A, C, D
20 | B, C, E
30 | A, B, C, E
40 | B, E
1st scan → C1 (with support counts): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
Prune infrequent ({D}:1) → L1: {A}:2, {B}:3, {C}:3, {E}:3
Generate C2 from L1: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan → counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
Prune → L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2
Generate C3 from L2: {B,C,E}
3rd scan → L3: {B,C,E}:2
Contd…
• Step2:Association rule generation and strong association rule filtering:
– The possible association rules for frequent itemset {B, C, E} and their
corresponding confidences are:
• B -> {C,E} confidence1 = 2/3 = 66.67%
• C -> {B,E} confidence2 = 2/3 = 66.67%
• E -> {B,C} confidence3 = 2/3 = 66.67%
• {C,E} -> B confidence4 = 2/2 = 100%
• {B,E} -> C confidence5 = 2/3 = 66.67%
• {B,C} -> E confidence6 = 2/2 = 100%
– Here, minimum confidence is 75% so strong rules are:
• {C,E}->B
• {B,C}->E
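Step 2 above can be sketched in code: the confidences are reproduced by enumerating the binary partitions of {B, C, E} over the four-transaction database (a sketch; the function names are assumptions):

```python
# Rule generation from one frequent itemset: every nonempty proper subset
# becomes a left-hand side; conf = sigma(itemset) / sigma(lhs).
from itertools import combinations

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

def sigma(itemset):
    return sum(1 for t in db if itemset <= t)

def rules_from(itemset, min_conf):
    itemset = frozenset(itemset)
    strong = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = sigma(itemset) / sigma(lhs)
            if conf >= min_conf:
                strong.append((set(lhs), set(itemset - lhs), conf))
    return strong

for lhs, rhs, conf in rules_from({"B", "C", "E"}, min_conf=0.75):
    print(lhs, "->", rhs, f"confidence={conf:.0%}")
```

With min_conf = 75%, only {C, E} ⇒ B and {B, C} ⇒ E survive, matching the slide.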
Contd…
• Example2: Use APRIORI algorithm to generate strong
association rules from the following transaction database. Use
min_sup=2 and min_confidence=75%
Database TDB
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Contd….
Step 1: Frequent itemset generation (min_sup = 2 = 50%):
Scan D → C1 (with support counts): {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
Prune infrequent ({4}:1) → L1: {1}:2, {2}:3, {3}:3, {5}:3
Generate C2 from L1: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
Scan D → counts: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
Prune → L2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
Generate C3 from L2: {2,3,5}
Scan D → L3: {2,3,5}:2
Contd…
• Step2:Association rule generation and strong association rule
filtering:
– The possible association rules for frequent itemset {2,3,5} and their
corresponding confidences are:
• 2 -> {3,5} c1 = 2/3 = 66.67%
• 3 -> {2,5} c2 = 2/3 = 66.67%
• 5 -> {2,3} c3 = 2/3 = 66.67%
• {3,5} -> 2 c4 = 2/2 = 100%
• {2,5} -> 3 c5 = 2/3 = 66.67%
• {2,3} -> 5 c6 = 2/2 = 100%
– Here, minimum confidence is 75% so strong rules are:
• {3,5}->2
• {2,3}->5
Contd…
• Example3:
– Consider the following transactions for association rules analysis:
– Use minimum support(min_sup) = 2 (2/9 = 22%) and
– Minimum confidence = 70%
Contd…
• Step1: Frequent Itemset Generation:
Contd…
• Step2: Generating association rules:
• The data contain frequent itemset X ={I1,I2,I5} .What are the association rules
that can be generated from X?
• The nonempty proper subsets of X are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2},
and {I5}. The resulting association rules, each listed with its confidence, are:
• I1 ∧ I2 ⇒ I5, confidence = 2/4 = 50%
• I1 ∧ I5 ⇒ I2, confidence = 2/2 = 100%
• I2 ∧ I5 ⇒ I1, confidence = 2/2 = 100%
• I1 ⇒ I2 ∧ I5, confidence = 2/6 = 33%
• I2 ⇒ I1 ∧ I5, confidence = 2/7 = 29%
• I5 ⇒ I1 ∧ I2, confidence = 2/2 = 100%
• Here, the minimum confidence threshold is 70%, so only the second, third, and
last rules are output, because these are the only ones generated that are strong.
Contd...
• Problems with A-priori Algorithms:
– It is costly to handle a huge number of candidate sets.
• For example, if there are 10^4 frequent 1-itemsets, the Apriori algorithm
will need to generate more than 10^7 candidate 2-itemsets. Moreover,
to discover a pattern of length 100, it must generate more than
2^100 ≈ 10^30 candidates in total.
– Candidate generation is the inherent cost of the Apriori algorithm, no
matter what implementation technique is applied.
– To mine large data sets for long patterns, this algorithm is NOT a good
idea.
– When the database is scanned to check Ck for creating Lk, a large number of
transactions will be scanned even if they do not contain any k-itemset.
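The quoted counts can be checked with a quick calculation (assuming the candidate 2-itemsets are all pairs of the 10^4 frequent 1-itemsets):

```python
# Verify the candidate-explosion figures quoted on this slide.
import math

c2 = math.comb(10**4, 2)   # pairs of 10^4 frequent 1-itemsets
print(c2)                  # 49995000, i.e. more than 10^7
print(2**100 > 10**30)     # True: subsets of a length-100 pattern
```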
Frequent pattern (FP) growth approach for Mining
Frequent Item-sets
• A frequent pattern growth approach mines Frequent Patterns
Without Candidate Generation.
• FP-Growth involves mainly two steps:
– Build a compact data structure called the FP-tree, and
– Then extract frequent itemsets directly from the FP-tree.
Contd….
• FP-tree Construction from a Transactional DB:
– FP-Tree is constructed using two passes over the data set:
– Pass 1:
1. Scan the DB to find frequent 1-itemsets:
a. Scan the DB and find the support for each item.
b. Discard the infrequent items.
2. Sort the frequent items in descending order of their support count (frequency).
3. Sort the items in each transaction in this descending order.
Use this order when building the FP-tree, so common prefixes can be shared.
– Pass 2: Scan the DB again and construct the FP-tree:
1. FP-growth reads one transaction at a time and maps it to a path.
2. A fixed order is used, so paths can overlap when transactions share items.
3. Pointers (node-links, shown as dotted lines) are maintained between nodes containing the same item.
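The two passes can be sketched as follows (a minimal illustration; the Node class and field names are assumptions, not from the slides):

```python
# FP-tree construction: pass 1 fixes a global frequency order, pass 2
# inserts each reordered transaction as a path, sharing common prefixes.
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Pass 1: support counts, discard infrequent items, fix the order.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_sup}
    order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Pass 2: insert each transaction, sorted by the global order.
    root = Node(None, None)
    header = defaultdict(list)  # item -> its nodes (the node-link chains)
    for t in transactions:
        path = sorted((i for i in t if i in rank), key=rank.get)
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1  # shared prefixes just increment counts
    return root, header

db = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"},
      {"I1","I3"}, {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"},
      {"I1","I2","I3"}]
root, header = build_fp_tree(db, min_sup=2)
print(root.children["I2"].count)  # 7
```

On the nine-transaction database of Example 1 this reproduces the counts built up slide by slide below (I2:7 at the root, I1:4 under I2, and so on).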
Contd…
Fig: Flow chart for FP-tree construction process
Contd..
• Mining Frequent Patterns Using FP-tree:
– Start from each frequent length-1 pattern (called suffix pattern)
– construct its conditional pattern base (set of prefix paths in the FP-tree co-
occurring with the suffix pattern)
– then construct its (conditional) FP-tree.
– The pattern growth is achieved by the concatenation of the suffix pattern
with the frequent patterns generated from a conditional FP-tree.
Contd…
• Example1: Find all frequent itemsets or frequent patterns in the following
database using FP-growth algorithm. Take minimum support=2
TID | List of item IDs
1 I1, I2, I5
2 I2, I4
3 I2, I3
4 I1, I2, I4
5 I1, I3
6 I2, I3
7 I1, I3
8 I1, I2, I3, I5
9 I1, I2, I3
• Now we will build an FP-tree for this database.
• Itemsets are considered in descending order of their support counts.
Contd…
• Constructing candidate 1-itemsets and counting the support count of each:
• Discarding all infrequent itemsets (here none, since every support count
≥ min_sup = 2):
Itemset Support count
I1 6
I2 7
I3 6
I4 2
I5 2
Itemset Support count
I1 6
I2 7
I3 6
I4 2
I5 2
Contd…
• Sorting frequent 1-itemsets in descending order of their support count:
Itemset Support count
I2 7
I1 6
I3 6
I4 2
I5 2
Contd…
• Now, ordering the items of each transaction in D according to the frequent 1-itemset order above:
TID List of items Ordered items
1 I1,I2,I5 I2,I1,I5
2 I2,I4 I2,I4
3 I2,I3 I2,I3
4 I1,I2,I4 I2,I1,I4
5 I1,I3 I1,I3
6 I2,I3 I2,I3
7 I1,I3 I1,I3
8 I1,I2,I3,I5 I2,I1,I3,I5
9 I1,I2,I3 I2,I1,I3
Contd…
• Now drawing the FP-tree by inserting the ordered transactions one by one:
• Transaction 1 (I2, I1, I5): insert the path null → I2:1 → I1:1 → I5:1.
Contd…
• Transaction 2 (I2, I4): the prefix I2 is shared, so increment it to I2:2 and add a new child I4:1 under I2.
Contd…
• Transaction 3 (I2, I3): increment I2 to I2:3 and add a new child I3:1 under I2.
Contd…
• Transaction 4 (I2, I1, I4): increment I2 to I2:4 and I1 to I1:2, then add a new child I4:1 under I1.
Contd…
• Transaction 5 (I1, I3): no prefix is shared with the I2 branch, so start a new branch from the root: I1:1 → I3:1.
Contd…
• Transaction 6 (I2, I3): increment I2 to I2:5 and its child I3 to I3:2.
Contd…
• Transaction 7 (I1, I3): increment the root branch to I1:2 → I3:2.
Contd…
• Transaction 8 (I2, I1, I3, I5): increment I2 to I2:6 and I1 to I1:3, add a new child I3:1 under I1 and a new child I5:1 under that I3.
Contd…
• Transaction 9 (I2, I1, I3): increment I2 to I2:7, I1 to I1:4, and the I3 child of I1 to I3:2.
Contd…
• The completed FP-tree (counts after all nine transactions):
null
├─ I2:7
│  ├─ I1:4
│  │  ├─ I5:1
│  │  ├─ I4:1
│  │  └─ I3:2
│  │     └─ I5:1
│  ├─ I4:1
│  └─ I3:2
└─ I1:2
   └─ I3:2
• To facilitate tree traversal, an item header table (I2:7, I1:6, I3:6, I4:2, I5:2)
is built so that each item points to its occurrences in the tree via a chain of
node-links (the dotted lines).
• FP-tree construction is over! Now we need to find the conditional pattern base
and the conditional FP-tree for each item.
Contd…
• Conditional pattern base for I5 (prefix paths ending at I5): { {I2, I1 : 1}, {I2, I1, I3 : 1} }
• Conditional FP-tree for I5: ⟨I2:2, I1:2⟩ (I3 is dropped because its count 1 < min_sup)
Contd…
• Conditional pattern base for I4: { {I2, I1 : 1}, {I2 : 1} }
• Conditional FP-tree for I4: ⟨I2:2⟩
Contd…
• Conditional pattern base for I3: { {I2, I1 : 2}, {I2 : 2}, {I1 : 2} }
• Conditional FP-trees for I3: ⟨I2:4, I1:2⟩ and ⟨I1:2⟩
Contd…
• Conditional pattern base for I1: { {I2 : 4} }
• Conditional FP-tree for I1: ⟨I2:4⟩
Contd…
• Frequent patterns generated:
Frequent Pattern Generated
I5 {I2,I5:2},{I1,I5:2},{I2,I1,I5: 2}
I4 {I2,I4:2}
I3 {I2,I3:4}, {I1,I3:4}, {I2,I1,I3: 2}
I1 {I2,I1:4}
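The pattern-growth recursion can be sketched compactly. For brevity this sketch works on (prefix-path, count) lists rather than a pointer-based FP-tree with node-links, but the frequent itemsets it produces are the same:

```python
# FP-growth-style mining on prefix-path lists: for each frequent item we
# record the pattern, build its conditional pattern base (the prefixes
# preceding the item), and recurse on that base.
from collections import defaultdict

def fp_growth(paths, min_sup, suffix=frozenset(), out=None):
    if out is None:
        out = {}
    # Count items in the (conditional) pattern base.
    freq = defaultdict(int)
    for path, count in paths:
        for item in path:
            freq[item] += count
    for item, count in freq.items():
        if count < min_sup:
            continue
        pattern = suffix | {item}
        out[pattern] = count
        # Conditional pattern base for `pattern`: prefixes before item.
        cond = []
        for path, c in paths:
            if item in path:
                prefix = path[:path.index(item)]
                if prefix:
                    cond.append((prefix, c))
        fp_growth(cond, min_sup, pattern, out)
    return out

# Transactions already ordered by descending support, as on the slides.
db = [["I2","I1","I5"], ["I2","I4"], ["I2","I3"], ["I2","I1","I4"],
      ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I2","I1","I3","I5"],
      ["I2","I1","I3"]]
patterns = fp_growth([(t, 1) for t in db], min_sup=2)
print(patterns[frozenset({"I2", "I1", "I5"})])  # 2
```

The output reproduces the table above: {I2, I1, I5}:2, {I2, I4}:2, {I2, I3}:4, {I1, I3}:4, {I2, I1, I3}:2, {I2, I1}:4, and so on.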
Contd…
• Summary of generation of conditional pattern base, conditional FP-Tree, and
frequent patterns generated:
Example 2
• Example2: Find all frequent itemsets or frequent patterns in
the following database using FP-growth algorithm. Take
minimum support=3 :
Contd….
• FP-Tree construction:
• Find the frequent 1-itemsets and sort them in descending order of support
count (frequency):
• Then rewrite each transaction in the dataset D so that its frequent items
appear in that sorted order:
Item frequency
f 4
c 4
a 3
b 3
m 3
p 3
TID Items bought (ordered) frequent items
100 {f, a, c, d, g, i, m, p} {f, c, a, m, p}
200 {a, b, c, f, l, m, o} {f, c, a, b, m}
300 {b, f, h, j, o} {f, b}
400 {b, c, k, s, p} {c, b, p}
500 {a, f, c, e, l, p, m, n} {f, c, a, m, p}
Contd…
• (min_support = 3) Transaction 100 ({f, c, a, m, p}): insert the path root → f:1 → c:1 → a:1 → m:1 → p:1.
Contd…
• Transaction 200 ({f, c, a, b, m}): increment f:2, c:2, a:2, then add a new child b:1 under a and a child m:1 under that b.
Contd…
• Transaction 300 ({f, b}): increment f to f:3 and add a new child b:1 under f.
Contd…
• Transaction 400 ({c, b, p}): no prefix is shared with the f branch, so start a new branch from the root: c:1 → b:1 → p:1.
Contd…
• Transaction 500 ({f, c, a, m, p}): increment f:4, c:3, a:3, m:2, p:2. The completed FP-tree:
root
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1
• Header table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3; each entry heads a
chain of node-links to that item's occurrences in the tree.
Contd…
• Mining frequent patterns using the FP-tree: start with the last item in the
order (i.e., p).
• Follow its node-links and traverse only the paths containing p.
• Accumulate all transformed prefix paths of p to form its conditional pattern
base: fcam:2, cb:1
• Construct a new FP-tree from this base by merging the paths and keeping only
the items that appear at least min_sup times. Only c survives (count 3), so
the tree is the single branch ⟨c:3⟩.
• Thus we derive only one frequent pattern containing p (besides p itself): cp:3.
Contd…
• Move to the next item in reverse order, i.e., m.
• Follow its node-links and traverse only the paths containing m.
• Accumulate all transformed prefix paths of m to form the m-conditional
pattern base: fca:2, fcab:1
• b (count 1) is infrequent, so the m-conditional FP-tree contains only the
path ⟨f:3, c:3, a:3⟩.
• All frequent patterns that include m: m, fm, cm, am, fcm, fam, cam, fcam.
Contd…
Item | Conditional pattern-base | Conditional FP-tree
f | Empty | Empty
c | {(f:3)} | ⟨f:3⟩
a | {(fc:3)} | ⟨f:3, c:3⟩
b | {(fca:1), (f:1), (c:1)} | Empty
m | {(fca:2), (fcab:1)} | ⟨f:3, c:3, a:3⟩
p | {(fcam:2), (cb:1)} | ⟨c:3⟩
Why Is Frequent Pattern Growth Fast?
• Performance studies show
– FP-growth is an order of magnitude faster than Apriori
• Reasoning
– No candidate generation, no candidate test
– Uses compact data structure
– Eliminates repeated database scan
– Basic operation is counting and FP-tree building
Handling Categorical Attributes
• So far, we have used only transaction data for mining association rules.
• The data can be in transaction form or table form
Transaction form: t1: a, b
t2: a, c, d, e
t3: a, d, f
Table form:
• Table data need to be converted to transaction form for association rule
mining
Attr1 Attr2 Attr3
a b d
b c e
Contd…
• To convert a table data set to a transaction data set simply change
each value to an attribute–value pair.
• For example, the table above becomes:
t1: (Attr1, a), (Attr2, b), (Attr3, d)
t2: (Attr1, b), (Attr2, c), (Attr3, e)
Contd…
• Each attribute–value pair is considered an item.
• Using only the values is not sufficient in the transaction form because
different attributes may share the same value. For example, without the
attribute names, the value b of Attr1 and the value b of Attr2 would be
indistinguishable.
• After the conversion, the transaction form can be used in mining.
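A minimal sketch of the conversion (the "attr=value" string encoding is one possible choice, not prescribed by the slides):

```python
# Turn each table row into a transaction of attribute-value items so that
# equal values in different columns remain distinguishable.
rows = [
    {"Attr1": "a", "Attr2": "b", "Attr3": "d"},
    {"Attr1": "b", "Attr2": "c", "Attr3": "e"},
]

transactions = [
    {f"{attr}={value}" for attr, value in row.items()} for row in rows
]
print(sorted(transactions[0]))  # ['Attr1=a', 'Attr2=b', 'Attr3=d']
```

The resulting transactions can then be fed to Apriori or FP-growth unchanged.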
Homework
• What is the aim of association rule mining? Why is this aim important in some
application?
• Define the concepts of support and confidence for an association rule.
• Show how the apriori algorithm works on an example dataset.
• What is the basis of the apriori algorithm? Describe the algorithm briefly. Which step
of the algorithm can become a bottleneck?
• A database has five transactions. Let min_sup = 60% and min_conf = 80%. Find all
frequent itemsets using the Apriori algorithm and list all the strong association rules.
Contd…
• Show using an example how FP-tree algorithm solves the association rule
mining (ARM) problem.
• Perform ARM using FP-growth on the following data set with minimum
support = 50% and confidence = 75%
Transaction ID Items
1 Bread, cheese, Eggs, Juice
2 Bread, Cheese, Juice
3 Bread, Milk, Yogurt
4 Bread, Juice, Milk
5 Cheese, Juice, Milk
Thank You !
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
 
datamining and warehousing ppt
datamining  and warehousing pptdatamining  and warehousing ppt
datamining and warehousing ppt
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
6asso
6asso6asso
6asso
 
Devry econ 545 week 3 course project 1 microeconomic analysis
Devry econ 545 week 3 course project 1 microeconomic analysisDevry econ 545 week 3 course project 1 microeconomic analysis
Devry econ 545 week 3 course project 1 microeconomic analysis
 
An introduction to data mining and its techniques
An introduction to data mining and its techniquesAn introduction to data mining and its techniques
An introduction to data mining and its techniques
 
Data Mining Techniques for CRM
Data Mining Techniques for CRMData Mining Techniques for CRM
Data Mining Techniques for CRM
 
ASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptxASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptx
 
Profitable Itemset Mining using Weights
Profitable Itemset Mining using WeightsProfitable Itemset Mining using Weights
Profitable Itemset Mining using Weights
 

BIM Data Mining Unit4 by Tekendra Nath Yogi

  • 1. Unit 4: Association Analysis (LH 7). Presented by: Tekendra Nath Yogi (Tekendranath@gmail.com), College of Applied Business and Technology.
  • 2. Outline: – 4.1. Basics and Algorithms – 4.2. Frequent Item-set Pattern & Apriori Principle – 4.3. FP-Growth, FP-Tree – 4.4. Handling Categorical Attributes
  • 3. Association Analysis • Association rule analysis is a technique to uncover (mine) how items are associated with each other. • Such uncovered associations between items are called association rules. • When to mine association rules? – Scenario: you are a sales manager; a customer recently bought a PC and a digital camera; what should you recommend to her next? Association rules are helpful in making your recommendation.
  • 4. Contd… • Frequent patterns (itemsets): – Frequent patterns are patterns that appear frequently in a data set. – E.g., in a transaction data set, {milk, bread} is a frequent pattern; in a shopping-history database, buying first a PC, then a digital camera, and then a memory card is another example of a frequent pattern. – Finding frequent patterns plays an essential role in mining association rules.
  • 5. Frequent pattern mining • Frequent pattern mining searches for recurring relationships in a given data set. • It enables the discovery of interesting associations between itemsets. • Such associations are applicable in many business decision-making processes, such as: – catalog design – basket data analysis – cross-marketing – sales campaign analysis – Web log (click-stream) analysis, etc.
  • 6. Market basket analysis • A typical example of frequent pattern (itemset) mining for association rules. • Market basket analysis analyzes customer buying habits by finding associations between the different items that customers place in their shopping baskets. • Applications: making marketing strategies. • Example of an association rule: milk → bread.
  • 7. Definitions • Itemset: a collection of one or more items, e.g., {Milk, Bread, Diaper}; a k-itemset is an itemset that contains k items. • Support count (σ): frequency of occurrence of an itemset, e.g., σ({Milk, Bread, Diaper}) = 2. • Support (s): fraction of transactions that contain an itemset, e.g., s({Milk, Bread, Diaper}) = 2/5. • Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold. Transactions: TID 1: Bread, Milk; TID 2: Bread, Diaper, Beer, Eggs; TID 3: Milk, Diaper, Beer, Coke; TID 4: Bread, Milk, Diaper, Beer; TID 5: Bread, Milk, Diaper, Coke.
  • 8. Definitions • Association rule: an implication expression of the form X → Y, where X and Y are itemsets, e.g., {Milk, Diaper} → {Beer}. • It is not very difficult to develop algorithms that find such associations in a large database. • The problem is that such an algorithm will also uncover many other associations that are of very little value. • It is therefore necessary to introduce measures that distinguish interesting associations from uninteresting ones.
  • 9. Contd… • Rule evaluation metrics: – Support (s): how popular an itemset is (prevalence), measured by the proportion of transactions in which the itemset appears. – In the table below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items; for instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.
  • 10. Contd… • Rule evaluation metrics: – Confidence: how likely item Y is to be purchased when item X is purchased (predictability), for a rule X → Y. It is measured by the proportion of transactions with item X in which item Y also appears. In the table below, the confidence of {apple → beer} is 3 out of 4, or 75%.
  • 11. Example 1 • Given the data set D (TID 1: Bread, Milk; TID 2: Bread, Diaper, Beer, Eggs; TID 3: Milk, Diaper, Beer, Coke; TID 4: Bread, Milk, Diaper, Beer; TID 5: Bread, Milk, Diaper, Coke), what are the support and confidence of the rule {Milk, Diaper} → {Beer}? • Support: percentage of tuples that contain {Milk, Diaper, Beer}, i.e., s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4 = 40%. • Confidence: (number of tuples containing {Milk, Diaper, Beer}) / (number of tuples containing {Milk, Diaper}), i.e., c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67 = 67%.
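The support and confidence computations in the example above can be sketched in Python. This is a minimal illustration (the function names and the `transactions` variable are choices made here; the data mirrors the toy database on the slides):

```python
# Toy transaction database from the example slides.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """Of the transactions containing `lhs`, the fraction also containing `rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    both = sum((lhs | rhs) <= t for t in db)
    left = sum(lhs <= t for t in db)
    return both / left

# {Milk, Diaper} -> {Beer}: support = 2/5 = 0.4, confidence = 2/3.
s = support({"Milk", "Diaper", "Beer"}, transactions)
c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
```

Note that support is computed over the whole rule's items, while confidence divides by the count of the antecedent only.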
  • 12. Example 2 • Given the data set D: TID 100 (10/10/99): {F, A, D, B}; TID 200 (15/10/99): {D, A, C, E, B}; TID 300 (19/10/99): {C, A, B, E}; TID 400 (20/10/99): {B, A, D}. • What are the support and confidence of the rule {B, D} → {A}? • Support: percentage of tuples that contain {A, B, D} = 3/4 × 100 = 75%. • Confidence: (number of tuples containing {A, B, D}) / (number of tuples containing {B, D}) = 3/3 × 100 = 100%.
  • 13. Association Rule Mining Task • Given a set of transactions T, the goal of association rule mining is to find all rules having – support ≥ minsup threshold – confidence ≥ minconf threshold. • If a rule A ⇒ B [support, confidence] satisfies min_sup and min_confidence, then it is a strong rule. • So the goal of association rule mining is to find all strong rules.
  • 14. Contd… • Brute-force approach for association rule mining: – List all possible association rules. – Compute the support and confidence for each rule. – Prune rules that fail the minsup and minconf thresholds. ⇒ Computationally prohibitive!
  • 15. Contd… • How? Given the data set D (TID 1: Bread, Milk; TID 2: Bread, Diaper, Beer, Eggs; TID 3: Milk, Diaper, Beer, Coke; TID 4: Bread, Milk, Diaper, Beer; TID 5: Bread, Milk, Diaper, Coke), examples of rules and their support and confidence: {Milk, Diaper} → {Beer} (s=0.4, c=0.67); {Milk, Beer} → {Diaper} (s=0.4, c=1.0); {Diaper, Beer} → {Milk} (s=0.4, c=0.67); {Beer} → {Milk, Diaper} (s=0.4, c=0.67); {Diaper} → {Milk, Beer} (s=0.4, c=0.5); {Milk} → {Diaper, Beer} (s=0.4, c=0.5). – Observations: all the above rules are binary partitions of the same itemset {Milk, Diaper, Beer}; rules originating from the same itemset have identical support but can have different confidence. Thus we may decouple the support and confidence requirements.
  • 16. Contd… • Association rule mining is a two-step approach: 1. Frequent itemset generation: generate all itemsets whose support ≥ minsup. 2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset. • Frequent itemset generation is still computationally expensive!
  • 17. Contd… • Frequent itemset generation: given d items, there are 2^d possible candidate itemsets, so frequent itemset generation is still computationally expensive!
  • 18. Contd… • Reducing the number of candidates by using the Apriori principle: – The Apriori principle states that if an itemset is frequent, then all of its subsets must also be frequent. E.g., if {beer, diaper, nuts} is frequent, so is {beer, diaper}. • The Apriori principle holds due to the anti-monotone property of the support measure: ∀ X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y), i.e., the support of an itemset never exceeds the support of its subsets. E.g., s(Bread) ≥ s(Bread, Beer); s(Milk) ≥ s(Bread, Milk); s(Diaper, Beer) ≥ s(Diaper, Beer, Coke).
  • 19. Contd… • How is the Apriori property used in the algorithm? – If any itemset is infrequent, its supersets should not be generated/tested! Once an itemset is found to be infrequent, its supersets are pruned: e.g., if the itemset {a, b} is infrequent, we do not need to consider any of its supersets.
  • 20. The Apriori Algorithm 1. Initially, scan the DB once to get the candidate itemset C1, find the frequent 1-itemsets from C1, and put them into L1 (k=1). 2. Use Lk to generate a collection of candidate itemsets Ck+1 of size (k+1). 3. Scan the database to find which itemsets in Ck+1 are frequent and put them into Lk+1. 4. If Lk+1 is not empty, set k=k+1 and GOTO 2 (i.e., terminate when no frequent or candidate set can be generated).
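The level-wise steps above can be sketched in Python. This is a minimal, unoptimized sketch (function and variable names are choices made here, not from the slides); it takes an absolute support count as `min_sup` and returns every frequent itemset with its count:

```python
from itertools import combinations

def apriori(db, min_sup):
    """Level-wise frequent-itemset generation (a minimal sketch).

    `db` is a list of item sets, `min_sup` an absolute support count.
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    db = [frozenset(t) for t in db]

    def count(c):
        return sum(c <= t for t in db)

    # C1 -> L1: frequent 1-itemsets.
    items = {i for t in db for i in t}
    L = [frozenset([i]) for i in items if count(frozenset([i])) >= min_sup]
    frequent = {c: count(c) for c in L}
    k = 1
    while L:
        # Join Lk with itself, keeping only (k+1)-sets all of whose
        # k-subsets are frequent (the Apriori pruning step).
        Lset = set(L)
        cands = {a | b for a in L for b in L if len(a | b) == k + 1}
        cands = {c for c in cands
                 if all(frozenset(s) in Lset for s in combinations(c, k))}
        # Scan the DB to keep the candidates that meet min_sup.
        L = [c for c in cands if count(c) >= min_sup]
        frequent.update({c: count(c) for c in L})
        k += 1
    return frequent

# Small transaction database (the worked example from the deck).
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
F = apriori(db, 2)
```

The repeated full-database scan in `count` is the expensive part that FP-growth, covered later, avoids.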
  • 21. Generating association rules from frequent itemsets • Generate strong association rules from frequent itemsets (strong rules satisfy both minimum support and minimum confidence). • Because the rules are generated from frequent itemsets, each one automatically satisfies the minimum support.
  • 22. Contd… • Example 1: use the Apriori algorithm to generate strong association rules from the following transaction database, with min_sup = 2 and min_confidence = 75%. Database TDB: Tid 10: A, C, D; Tid 20: B, C, E; Tid 30: A, B, C, E; Tid 40: B, E.
  • 23. Contd… • Step 1: frequent itemset generation (min_sup = 2): – 1st scan → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3; pruning gives L1: {A}:2, {B}:3, {C}:3, {E}:3. – C2: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}; 2nd scan → counts {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2; pruning gives L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2. – C3: {B,C,E}; 3rd scan → {B,C,E}:2, so L3: {B,C,E}:2.
  • 24. Contd… • Step 2: association rule generation and strong-rule filtering: – Possible association rules for the frequent itemset {B, C, E} and their confidence: B → {C,E}: confidence1 = 2/3 = 66.67%; C → {B,E}: confidence2 = 2/3 = 66.67%; E → {B,C}: confidence3 = 2/3 = 66.67%; {C,E} → B: confidence4 = 2/2 = 100%; {B,E} → C: confidence5 = 2/3 = 66.67%; {B,C} → E: confidence6 = 2/2 = 100%. – With minimum confidence 75%, the strong rules are: {C,E} → B and {B,C} → E.
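The rule-generation step above enumerates every binary partition of a frequent itemset and keeps the partitions that meet the confidence threshold. A minimal Python sketch (names chosen here; the support counts are taken from the worked example on the slides):

```python
from itertools import combinations

def strong_rules(itemset, counts, min_conf):
    """All rules X -> (S minus X) from one frequent itemset S whose
    confidence meets min_conf. `counts` maps frozensets to support counts
    and must contain every nonempty subset of S (a sketch)."""
    S = frozenset(itemset)
    rules = []
    for r in range(1, len(S)):
        for lhs in map(frozenset, combinations(sorted(S), r)):
            conf = counts[S] / counts[lhs]   # sigma(S) / sigma(X)
            if conf >= min_conf:
                rules.append((set(lhs), set(S - lhs), conf))
    return rules

# Support counts from the worked example (min_sup = 2).
counts = {
    frozenset({"B"}): 3, frozenset({"C"}): 3, frozenset({"E"}): 3,
    frozenset({"B", "C"}): 2, frozenset({"B", "E"}): 3,
    frozenset({"C", "E"}): 2, frozenset({"B", "C", "E"}): 2,
}
rules = strong_rules({"B", "C", "E"}, counts, 0.75)
```

Because every rule comes from a frequent itemset, only confidence needs checking here; support was already enforced in step 1.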
  • 25. Contd… • Example 2: use the Apriori algorithm to generate strong association rules from the following transaction database, with min_sup = 2 and min_confidence = 75%. Database TDB: TID 100: 1, 3, 4; TID 200: 2, 3, 5; TID 300: 1, 2, 3, 5; TID 400: 2, 5.
  • 26. Contd… • Step 1: frequent itemset generation (min_sup = 2 = 50%): – Scan D → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3; pruning gives L1: {1}:2, {2}:3, {3}:3, {5}:3. – C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}; scan D → counts {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2; pruning gives L2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2. – C3: {2,3,5}; scan D → {2,3,5}:2, so L3: {2,3,5}:2.
  • 27. Contd… • Step 2: association rule generation and strong-rule filtering: – Possible association rules for the frequent itemset {2, 3, 5} and their confidence: 2 → {3,5}: c1 = 2/3 = 66.67%; 3 → {2,5}: c2 = 2/3 = 66.67%; 5 → {2,3}: c3 = 2/3 = 66.67%; {3,5} → 2: c4 = 2/2 = 100%; {2,5} → 3: c5 = 2/3 = 66.67%; {2,3} → 5: c6 = 2/2 = 100%. – With minimum confidence 75%, the strong rules are: {3,5} → 2 and {2,3} → 5.
  • 28. Contd… • Example 3: – Consider the following transactions for association rule analysis, with minimum support (min_sup) = 2 (2/9 ≈ 22%) and minimum confidence = 70%.
  • 29. Contd… • Step 1: Frequent itemset generation:
  • 30. Contd… • Step 2: generating association rules: • The data contain the frequent itemset X = {I1, I2, I5}. What are the association rules that can be generated from X? • The nonempty proper subsets of X are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The resulting association rules are as shown below, each listed with its confidence. • With a minimum confidence threshold of 70%, only the second, third, and last rules are output, because these are the only ones generated that are strong.
  • 31. Contd... • Problems with the Apriori algorithm: – It is costly to handle a huge number of candidate sets. For example, if there are 10^4 frequent 1-itemsets, the Apriori algorithm needs to generate more than 10^7 candidate 2-itemsets; moreover, for a 100-itemset it must generate more than 2^100 ≈ 10^30 candidates in total. – Candidate generation is an inherent cost of the Apriori algorithm, no matter what implementation technique is applied. – For mining large data sets for long patterns, this algorithm is NOT a good idea. – When the database is scanned to check Ck for creating Lk, a large number of transactions are scanned even if they do not contain any k-itemset.
  • 32. Frequent pattern (FP) growth approach for Mining Frequent Item-sets • The frequent pattern growth approach mines frequent patterns without candidate generation. • FP-Growth involves two main steps: – Build a compact data structure called the FP-tree, and – Then extract frequent itemsets directly from the FP-tree. 6/30/2019 32 By:Tekendra Nath Yogi
  • 33. 6/30/2019 By:Tekendra Nath Yogi 33 Contd…. • FP-tree construction from a transactional DB: – The FP-tree is constructed in two passes over the data set: – Pass 1: 1. Scan the DB to find frequent 1-itemsets: a. Scan the DB and find the support for each item. b. Discard infrequent items. 2. Sort the items in descending order of their frequency (support count). 3. Sort the items in each transaction in descending order of their frequency. Use this order when building the FP-tree, so common prefixes can be shared. – Pass 2: Scan the DB again and construct the FP-tree: 1. FP-growth reads one transaction at a time and maps it to a path. 2. A fixed order is used, so paths can overlap when transactions share items. 3. Pointers are maintained between nodes containing the same item (dotted lines).
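Pass 1 can be sketched as follows, using the nine-transaction database of Example 1 from the following slides. Ties in frequency are broken alphabetically here (an assumption that matches the slide's header table):

```python
from collections import Counter

# The nine transactions of Example 1; min_sup = 2.
DB = [["I1","I2","I5"], ["I2","I4"], ["I2","I3"], ["I1","I2","I4"],
      ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I1","I2","I3","I5"],
      ["I1","I2","I3"]]
min_sup = 2

# Pass 1: count supports, keep frequent items, fix a global frequency order
# (descending support count, alphabetical on ties).
counts = Counter(i for t in DB for i in t)
rank = {i: r for r, i in enumerate(
    sorted((i for i, c in counts.items() if c >= min_sup),
           key=lambda i: (-counts[i], i)))}

# Reorder each transaction by that rank so the FP-tree can share prefixes.
ordered = [sorted((i for i in t if i in rank), key=rank.get) for t in DB]
```

The result matches the "Ordered items" column shown a few slides later, e.g. transaction 1 becomes I2, I1, I5.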
  • 34. June 30, 2019 By:Tekendra Nath Yogi 34 Contd… Fig: Flow chart for FP-tree construction process
  • 35. Contd.. • Mining frequent patterns using the FP-tree: – Start from each frequent length-1 pattern (called the suffix pattern), – construct its conditional pattern base (the set of prefix paths in the FP-tree co-occurring with the suffix pattern), – then construct its (conditional) FP-tree. – Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from its conditional FP-tree. 6/30/2019 35 By:Tekendra Nath Yogi
  • 36. June 30, 2019 By:Tekendra Nath Yogi 36 Contd… • Example1: Find all frequent itemsets or frequent patterns in the following database using the FP-growth algorithm. Take minimum support = 2. TID List of item IDs 1 I1, I2, I5 2 I2, I4 3 I2, I3 4 I1, I2, I4 5 I1, I3 6 I2, I3 7 I1, I3 8 I1, I2, I3, I5 9 I1, I2, I3 • Now we will build an FP-tree of that database. • Item sets are considered in descending order of their support count.
  • 37. June 30, 2019 By:Tekendra Nath Yogi 37 Contd… • Constructing 1-itemsets and counting the support count for each item: • Discarding all infrequent itemsets: since min_sup = 2, every item is frequent, so nothing is discarded. Itemset Support count I1 6 I2 7 I3 6 I4 2 I5 2
  • 38. June 30, 2019 By:Tekendra Nath Yogi 38 Contd… • Sorting frequent 1-itemsets in descending order of their support count: Itemset Support count I2 7 I1 6 I3 6 I4 2 I5 2
  • 39. June 30, 2019 By:Tekendra Nath Yogi 39 Contd… • Now, ordering each itemsets in D based on frequent 1-itemsets above: TID List of items Ordered items 1 I1,I2,I5 I2,I1,I5 2 I2,I4 I2,I4 3 I2,I3 I2,I3 4 I1,I2,I4 I2,I1,I4 5 I1,I3 I1,I3 6 I2,I3 I2,I3 7 I1,I3 I1,I3 8 I1,I2,I3,I5 I2,I1,I3,I5 9 I1,I2,I3 I2,I1,I3
  • 40. June 30, 2019 By:Tekendra Nath Yogi 40 Contd… • Now drawing the FP-tree by inserting the ordered itemsets one by one: null I2:1 null I2:1 I1:1 I5:1 For Transaction 1: I2,I1,I5
  • 41. June 30, 2019 By:Tekendra Nath Yogi 41 Contd… null I2:2 I1:1 I5:1 I4:1 For Transaction 2: I2,I4
  • 42. June 30, 2019 By:Tekendra Nath Yogi 42 Contd… null I2:3 I1:1 I5:1 I3:1 I4:1 For Transaction 3: I2,I3
  • 43. June 30, 2019 By:Tekendra Nath Yogi 43 Contd… null I2:4 I1:2 I5:1 I3:1 I4:1 I4:1 For Transaction 4: I2,I1,I4
  • 44. June 30, 2019 By:Tekendra Nath Yogi 44 Contd… null I2:4 I1:2 I3:1 I4:1 I4:1 For Transaction 5: I1,I3 I5:1 I1:1 I3:1
  • 45. June 30, 2019 By:Tekendra Nath Yogi 45 Contd… null I2:5 I1:2 I3:2 I4:1 I4:1 For Transaction 6: I2,I3 I5:1 I1:1 I3:1
  • 46. June 30, 2019 By:Tekendra Nath Yogi 46 Contd… null I2:5 I1:2 I3:2 I4:1 I4:1 For Transaction 7: I1,I3 I5:1 I1:2 I3:2
  • 47. June 30, 2019 By:Tekendra Nath Yogi 47 Contd… null I2:6 I1:3 I3:1 I3:2 I5:1 I4:1 I4:1 For Transaction 8: I2,I1,I3,I5 I5:1 I1:2 I3:2
  • 48. June 30, 2019 By:Tekendra Nath Yogi 48 Contd… null I2:7 I1:4 I3:2 I3:2 I5:1 I4:1 I4:1 For Transaction 9: I2,I1,I3 I1:2 I3:2 I5:1
  • 49. June 30, 2019 By:Tekendra Nath Yogi 49 Contd… I2 7 I1 6 I3 6 I4 2 I5 2 null I2:7 I1:4 I3:2 I3:2 I5:1 I4:1 I4:1 To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links. I1:2 I3:2 I5:1 FP-tree construction is complete! Now we need to find the conditional pattern base and conditional FP-tree for each item.
  • 50. June 30, 2019 By:Tekendra Nath Yogi 50 Contd… null I2:7 I1:4 I3:2 I3:2 I5:1 I4:1 I4:1 I1:2 I3:2 I5:1 Conditional Pattern Base of I5: { {I2,I1:1}, {I2,I1,I3:1} } Conditional FP-tree for I5: {I2:2, I1:2}
  • 51. June 30, 2019 By:Tekendra Nath Yogi 51 Contd… null I2:7 I1:4 I3:2 I3:2 I5:1 I4:1 I4:1 I1:2 I3:2 I5:1 Conditional Pattern Base of I4: { {I2,I1:1}, {I2:1} } Conditional FP-tree for I4: {I2:2}
  • 52. June 30, 2019 By:Tekendra Nath Yogi 52 Contd… null I2:7 I1:4 I3:2 I3:2 I5:1 I4:1 I4:1 I1:2 I3:2 I5:1 Conditional Pattern Base of I3: { {I2,I1:2}, {I2:2}, {I1:2} } Conditional FP-tree for I3: {I2:4, I1:2}, {I1:2}
  • 53. June 30, 2019 By:Tekendra Nath Yogi 53 Contd… null I2:7 I1:4 I3:2 I3:2 I5:1 I4:1 I4:1 I1:2 I3:2 I5:1 Conditional Pattern Base of I1: { {I2:4} } Conditional FP-tree for I1: {I2:4}
  • 54. June 30, 2019 By:Tekendra Nath Yogi 54 Contd… • Frequent patterns generated: Frequent Pattern Generated I5 {I2,I5:2},{I1,I5:2},{I2,I1,I5: 2} I4 {I2,I4:2} I3 {I2,I3:4}, {I1,I3:4}, {I2,I1,I3: 2} I1 {I2,I1:4}
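At this toy scale, the FP-growth output above can be cross-checked by brute-force enumeration of all itemsets over the nine transactions:

```python
from itertools import combinations

# The nine transactions of Example 1; min_sup = 2.
DB = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"},
      {"I1","I3"}, {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"},
      {"I1","I2","I3"}]
min_sup = 2

# Enumerate every candidate itemset and keep the frequent ones with counts.
items = sorted({i for t in DB for i in t})
frequent = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = sum(1 for t in DB if set(combo) <= t)
        if s >= min_sup:
            frequent[frozenset(combo)] = s
```

The counts agree with the table: e.g. {I2,I1,I5}:2, {I2,I3}:4, {I2,I1}:4.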
  • 55. June 30, 2019 By:Tekendra Nath Yogi 55 Contd… • Summary of generation of conditional pattern base, conditional FP-Tree, and frequent patterns generated:
  • 56. 6/30/2019 By:Tekendra Nath Yogi 56 Example 2 • Example2: Find all frequent itemsets or frequent patterns in the following database using FP-growth algorithm. Take minimum support=3 :
  • 57. 6/30/2019 By:Tekendra Nath Yogi 57 Contd…. • FP-tree construction: • Finding the frequent 1-itemsets and sorting them in descending order of support count (frequency): • Then, rewriting each transaction in D in this sorted frequent-item order: Item frequency f 4 c 4 a 3 b 3 m 3 p 3 TID Items bought (ordered) frequent items 100 {f, a, c, d, g, i, m, p} {f, c, a, m, p} 200 {a, b, c, f, l, m, o} {f, c, a, b, m} 300 {b, f, h, j, o} {f, b} 400 {b, c, k, s, p} {c, b, p} 500 {a, f, c, e, l, p, m, n} {f, c, a, m, p}
  • 58. Contd… root TID freq. Items bought 100 {f, c, a, m, p} 200 {f, c, a, b, m} 300 {f, b} 400 {c, b, p} 500 {f, c, a, m, p} Item frequency f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 f:1 c:1 a:1 m:1 p:1 6/30/2019 58 By:Tekendra Nath Yogi
  • 59. Contd… root Item frequency f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 f:2 c:2 a:2 m:1 p:1 b:1 m:1 TID freq. Items bought 100 {f, c, a, m, p} 200 {f, c, a, b, m} 300 {f, b} 400 {c, b, p} 500 {f, c, a, m, p} 6/30/2019 59 By:Tekendra Nath Yogi
  • 60. Contd… root Item frequency f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 f:3 c:2 a:2 m:1 p:1 b:1 m:1 b:1 TID freq. Items bought 100 {f, c, a, m, p} 200 {f, c, a, b, m} 300 {f, b} 400 {c, b, p} 500 {f, c, a, m, p} 6/30/2019 60 By:Tekendra Nath Yogi
  • 61. Contd… root Item frequency f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 f:3 c:2 a:2 m:1 p:1 b:1 m:1 b:1 TID freq. Items bought 100 {f, c, a, m, p} 200 {f, c, a, b, m} 300 {f, b} 400 {c, b, p} 500 {f, c, a, m, p} c:1 b:1 p:1 6/30/2019 61 By:Tekendra Nath Yogi
  • 62. Contd… root Item frequency f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 f:4 c:3 a:3 m:2 p:2 b:1 m:1 b:1 TID freq. Items bought 100 {f, c, a, m, p} 200 {f, c, a, b, m} 300 {f, b} 400 {c, b, p} 500 {f, c, a, m, p} c:1 b:1 p:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3 6/30/2019 62 By:Tekendra Nath Yogi
  • 63. Contd… • Mining frequent patterns using the FP-tree: start with the last item in the order (i.e., p). • Follow the node pointers and traverse only the paths containing p. • Accumulate all transformed prefix paths of that item to form its conditional pattern base. Conditional pattern base for p: fcam:2, cb:1 f:4 c:3 a:3 m:2 p:2 c:1 b:1 p:1 p Construct a new FP-tree from this base by merging the paths and keeping only nodes that appear at least min_sup times. This leaves only one branch, c:3. Thus we derive only one frequent pattern containing p: {c, p}:3. 6/30/2019 63 By:Tekendra Nath Yogi
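A conditional pattern base can equivalently be read off the ordered transactions: for each transaction containing the item, take the prefix that precedes it, with a count. A sketch over Example 2's five ordered transactions:

```python
from collections import Counter

# Ordered transactions of Example 2 (min_sup = 3; item order f, c, a, b, m, p).
ordered = [["f","c","a","m","p"], ["f","c","a","b","m"], ["f","b"],
           ["c","b","p"], ["f","c","a","m","p"]]

def conditional_pattern_base(item, transactions):
    """Prefix paths co-occurring with `item`, with their counts."""
    base = Counter()
    for t in transactions:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:
                base[prefix] += 1
    return base

base_p = conditional_pattern_base("p", ordered)   # fcam:2, cb:1
base_m = conditional_pattern_base("m", ordered)   # fca:2, fcab:1
```

This reproduces the bases shown on the slide: p has fcam:2 and cb:1; m has fca:2 and fcab:1.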
  • 64. Contd… • Move to the next-least-frequent item in the order, i.e., m. • Follow the node pointers and traverse only the paths containing m. • Accumulate all transformed prefix paths of that item to form its conditional pattern base: f:4 c:3 a:3 m:2 m m:1 b:1 m-conditional pattern base: fca:2, fcab:1 {} f:3 c:3 a:3 m-conditional FP-tree (contains only the path fca:3) All frequent patterns that include m: m, fm, cm, am, fcm, fam, cam, fcam 6/30/2019 64 By:Tekendra Nath Yogi
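Because m's conditional FP-tree is a single path, every frequent pattern containing m is just m combined with some subset of that path; a sketch:

```python
from itertools import combinations

# m's conditional FP-tree is the single path f:3 -> c:3 -> a:3 (min_sup = 3).
path = ["f", "c", "a"]

# m combined with every subset of the path gives all 2^3 = 8 patterns:
# m, fm, cm, am, fcm, fam, cam, fcam.
patterns = {frozenset(c) | {"m"} for r in range(len(path) + 1)
            for c in combinations(path, r)}
```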
  • 65. Contd… Summary of conditional pattern bases and conditional FP-trees: Item | Conditional pattern-base | Conditional FP-tree: p | {(fcam:2), (cb:1)} | {(c:3)}|p — m | {(fca:2), (fcab:1)} | {(f:3, c:3, a:3)}|m — b | {(fca:1), (f:1), (c:1)} | Empty — a | {(fc:3)} | {(f:3, c:3)}|a — c | {(f:3)} | {(f:3)}|c — f | Empty | Empty. 6/30/2019 65 By:Tekendra Nath Yogi
  • 66. Why Is Frequent Pattern Growth Fast? • Performance studies show: – FP-growth is faster than Apriori. • Reasoning: – No candidate generation, no candidate test – Uses a compact data structure – Eliminates repeated database scans – The basic operations are counting and FP-tree building 6/30/2019 66 By:Tekendra Nath Yogi
  • 67. 6/30/2019 By:Tekendra Nath Yogi 67 Handling Categorical Attributes • So far, we have used only transaction data for mining association rules. • The data can be in transaction form or table form Transaction form: t1: a, b t2: a, c, d, e t3: a, d, f Table form: • Table data need to be converted to transaction form for association rule mining Attr1 Attr2 Attr3 a b d b c e
  • 68. 6/30/2019 By:Tekendra Nath Yogi 68 Contd… • To convert a table data set to a transaction data set simply change each value to an attribute–value pair. • For example:
  • 69. 6/30/2019 By:Tekendra Nath Yogi 69 Contd… • Each attribute–value pair is considered an item. • Using values alone is not sufficient in the transaction form, because different attributes may have the same values. • For example, without the attribute names, the value a in Attr1 and the value a in Attr2 would be indistinguishable. • After the conversion, Figure (B) can be used in mining.
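The attribute–value conversion can be sketched as follows (the row data below mirrors the table-form example from the earlier slide):

```python
# Tag each value with its attribute name so that equal values in different
# columns remain distinct items in the resulting transactions.
rows = [{"Attr1": "a", "Attr2": "b", "Attr3": "d"},
        {"Attr1": "b", "Attr2": "c", "Attr3": "e"}]

transactions = [{f"{attr}={val}" for attr, val in row.items()} for row in rows]
```

The transactions can then be fed to any itemset miner; "Attr1=b" and "Attr2=b" are now different items.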
  • 70. 6/30/2019 By:Tekendra Nath Yogi 70 Homework • What is the aim of association rule mining? Why is this aim important in some applications? • Define the concepts of support and confidence for an association rule. • Show how the Apriori algorithm works on an example dataset. • What is the basis of the Apriori algorithm? Describe the algorithm briefly. Which step of the algorithm can become a bottleneck? • A database has five transactions. Let min_sup = 60% and min_conf = 80%. Find all frequent itemsets using the Apriori algorithm. List all the strong association rules.
  • 71. 6/30/2019 By:Tekendra Nath Yogi 71 Contd… • Show using an example how FP-tree algorithm solves the association rule mining (ARM) problem. • Perform ARM using FP-growth on the following data set with minimum support = 50% and confidence = 75% Transaction ID Items 1 Bread, cheese, Eggs, Juice 2 Bread, Cheese, Juice 3 Bread, Milk, Yogurt 4 Bread, Juice, Milk 5 Cheese, Juice, Milk
  • 72. Thank You ! 72By: Tekendra Nath Yogi6/30/2019