Association Rule Learning Part 1: Frequent Itemset Generation

Akshansh Jain
Software Consultant
Knoldus Inc.

Agenda
● What is Association Rule Learning?
● Applications of Association Rule Learning
● Terminology related to Association Rule Learning
● Issues with Association Rule Learning
● FP-Tree
● FP-Tree Construction
● The FP-Growth algorithm

What is Association Rule Learning?
Figure: Market Basket Transactions

Association Analysis - A methodology for discovering interesting
relationships hidden in large data sets. The uncovered relationships
can be presented in the form of association rules, i.e. implication
expressions of the form X → Y, where X and Y are disjoint itemsets.

Applications of Association Rule Learning
● Market basket analysis
● Bioinformatics
● Medical Diagnosis
● Web Mining
● Scientific Data Analysis

Terminology Related to Association Rule Learning
● Itemset - In association rule learning, a collection of zero or
more items is known as an itemset. An itemset that contains k items
is called a k-itemset. Examples of itemsets from our example dataset
are {Bread, Milk} (a 2-itemset) and {Bread, Diapers, Beer, Eggs}
(a 4-itemset).

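● Support - Support measures how frequently an itemset occurs in a
given dataset. It is calculated by dividing the number of
transactions that contain the itemset (its support count, σ) by the
total number of transactions, N:
$s(X) = \frac{\sigma(X)}{N}$
In the example, the itemset occurs in 2 of the 5 transactions, so
$s = \frac{2}{5} = 0.4$.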

● Confidence - For a rule X → Y, confidence measures how frequently
Y appears in transactions that contain X. It is calculated as:
$c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$
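A minimal Python sketch of the support and confidence formulas. The five transactions below are hypothetical, assumed only for illustration; `support_count`, `support` and `confidence` are illustrative helper names, not part of any library. `Fraction` keeps the ratios exact.

```python
from fractions import Fraction

# Hypothetical market basket transactions (an assumption for illustration).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(itemset, transactions):
    """s(X) = sigma(X) / N."""
    return Fraction(support_count(itemset, transactions), len(transactions))


def confidence(antecedent, consequent, transactions):
    """c(X -> Y) = sigma(X u Y) / sigma(X); raises ZeroDivisionError if X never occurs."""
    both = set(antecedent) | set(consequent)
    return Fraction(support_count(both, transactions),
                    support_count(antecedent, transactions))


print(support({"Milk", "Diapers", "Beer"}, TRANSACTIONS))       # 2/5
print(confidence({"Milk", "Diapers"}, {"Beer"}, TRANSACTIONS))  # 2/3
```

Here {Milk, Diapers, Beer} occurs in 2 of the 5 assumed transactions, while {Milk, Diapers} occurs in 3, which gives the rule {Milk, Diapers} → {Beer} a confidence of 2/3.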

Issues with Association Rule Learning
● Discovering patterns from a large transaction data set can be
computationally expensive, because the number of candidate itemsets
grows exponentially with the number of distinct items.
● Some of the discovered patterns may be spurious, because they can
occur simply by chance.
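To make the computational cost concrete: with d distinct items there are 2^d - 1 non-empty candidate itemsets, and a standard counting argument gives 3^d - 2^(d+1) + 1 possible rules X → Y with X and Y non-empty and disjoint. The sketch below checks that closed form against a brute-force enumeration; the function names are illustrative.

```python
from itertools import combinations


def count_itemsets(d):
    """Non-empty candidate itemsets over d items."""
    return 2 ** d - 1


def count_rules_closed_form(d):
    """Rules X -> Y with X, Y non-empty and disjoint, over d items."""
    return 3 ** d - 2 ** (d + 1) + 1


def count_rules_brute_force(d):
    """Enumerate every itemset Z with |Z| >= 2 and every split X -> Z - X."""
    count = 0
    for k in range(2, d + 1):
        for z in combinations(range(d), k):
            for i in range(1, k):  # i is the size of the antecedent X
                count += sum(1 for _ in combinations(z, i))
    return count


for d in range(1, 9):
    assert count_rules_brute_force(d) == count_rules_closed_form(d)
    print(d, count_itemsets(d), count_rules_closed_form(d))
```

For d = 6 this prints 63 candidate itemsets and 602 possible rules; the rule count grows roughly as 3^d, which is why exhaustive enumeration quickly becomes impractical.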

A common strategy is to decompose the association rule mining
problem into two major subtasks:
● Frequent Itemset Generation - finds all itemsets whose support is
at least the minimum support threshold (minsup). These itemsets are
called frequent itemsets.
● Rule Generation - extracts all high-confidence rules, i.e. rules
whose confidence is at least the minimum confidence threshold
(minconf), from the frequent itemsets found in the previous step.
These rules are called strong rules.
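The two subtasks can be sketched as a brute-force pipeline in Python. This is deliberately naive (it tests every possible itemset) and is not the FP-growth method covered later; the transactions, the minimum support count of 3 and minconf = 0.8 are all hypothetical values chosen for illustration.

```python
from itertools import combinations

# Hypothetical market basket transactions (an assumption for illustration).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, minsup_count):
    """Subtask 1 (brute force): every itemset with support count >= minsup_count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = support_count(itemset, transactions)
            if count >= minsup_count:
                result[itemset] = count
    return result


def strong_rules(frequent, minconf):
    """Subtask 2: split each frequent itemset Z into X -> Z - X and keep conf >= minconf."""
    rules = []
    for z, z_count in frequent.items():
        for k in range(1, len(z)):
            for combo in combinations(sorted(z), k):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                conf = z_count / frequent[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, z - antecedent, conf))
    return rules


freq = frequent_itemsets(TRANSACTIONS, minsup_count=3)
rules = strong_rules(freq, minconf=0.8)
print(len(freq))  # 8
for x, y, c in rules:
    print(sorted(x), "->", sorted(y), c)  # ['Beer'] -> ['Diapers'] 1.0
```

On these assumed transactions, four single items and four pairs are frequent, and only {Beer} → {Diapers} clears the 0.8 confidence threshold.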

FP-Tree Construction
Figure: FP-tree with its header table (minimum support count = 3)
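A minimal sketch of the usual two-pass FP-tree construction with a header table, in Python. The five transactions are hypothetical, chosen so that with a minimum support count of 3 the frequent items are f:4, c:4, a:3, b:3, m:3 and p:3 (the counts that appear in this deck's result list). Ties in support are broken alphabetically, which is an arbitrary choice; `Node` and `build_fp_tree` are illustrative names.

```python
from collections import Counter, defaultdict

# Hypothetical transactions (an assumption for illustration).
TRANSACTIONS = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
MIN_SUPPORT = 3


class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Pass 1: count every item and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Global order: descending support, ties broken alphabetically.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = Node(None, None)
    header = defaultdict(list)  # header table: item -> every tree node for that item
    # Pass 2: insert each transaction's frequent items in the global order,
    # sharing prefixes with earlier transactions.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order


def show(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


root, header, order = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
print([(i, sum(n.count for n in header[i])) for i in order])
# [('c', 4), ('f', 4), ('a', 3), ('b', 3), ('m', 3), ('p', 3)]
show(root)
```

The header table links every node that carries a given item, so summing those node counts recovers each item's support without rescanning the transactions.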

The FP-Growth algorithm
● FP-growth is an algorithm that generates frequent
itemsets from an FP-tree by exploring the tree in a
bottom-up fashion.
● FP-growth finds all the frequent itemsets ending with
a particular suffix by employing a divide-and-conquer
strategy to split the problem into smaller subproblems.
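A compact recursive sketch of FP-growth in Python, assuming hypothetical transactions and a minimum support count of 3 (illustrative data, not taken from the deck). Each call builds an FP-tree, walks the header table from the least frequent item upward, and recurses on each item's conditional pattern base, which is the divide-and-conquer step described above. `build_tree` and `fp_growth` are illustrative names.

```python
from collections import defaultdict

# Hypothetical transactions (an assumption for illustration).
TRANSACTIONS = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
MIN_SUPPORT = 3


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return its header table and item ranks."""
    counts = defaultdict(int)
    for items, n in weighted_transactions:
        for item in items:
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Global order: descending support, ties broken alphabetically.
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered)}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_transactions:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, rank


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Yield (itemset, support count) for every frequent itemset ending in `suffix`."""
    header, rank = build_tree(weighted_transactions, min_support)
    # Bottom-up: start from the least frequent item in the header table.
    for item in sorted(rank, key=rank.get, reverse=True):
        new_suffix = (item,) + suffix
        yield frozenset(new_suffix), sum(node.count for node in header[item])
        # Conditional pattern base: each node's prefix path, weighted by the node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Divide and conquer: mine the conditional FP-tree for this suffix.
        yield from fp_growth(base, min_support, new_suffix)


result = dict(fp_growth([(t, 1) for t in TRANSACTIONS], MIN_SUPPORT))
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print(len(result))  # 18
```

On these assumed transactions the sketch yields the same 18 itemsets and support counts that this deck lists as the final result.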

Figure: Conditional FP-tree for node p
Only c remains frequent in p's conditional FP-tree, so {c, p} becomes
a frequent itemset.
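Why only c survives for node p can be checked by hand. Assuming hypothetical transactions that have already been reduced to their frequent items and sorted in the order c, f, a, b, m, p, the prefix paths of p in the FP-tree carry the same counts as the prefixes that precede p in each transaction, because the tree paths were built by inserting exactly those ordered transactions. The sketch computes p's conditional pattern base that way; `conditional_pattern_base` is an illustrative name.

```python
from collections import Counter

# Hypothetical transactions (assumed), already filtered to frequent items and
# sorted in the global order c, f, a, b, m, p.
ORDERED = [
    ["c", "f", "a", "m", "p"],
    ["c", "f", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["c", "f", "a", "m", "p"],
]
MIN_SUPPORT = 3


def conditional_pattern_base(ordered, item):
    """Prefix paths that precede `item`, with how many transactions share each one."""
    return dict(Counter(tuple(t[: t.index(item)]) for t in ordered if item in t))


base = conditional_pattern_base(ORDERED, "p")
print(base)  # {('c', 'f', 'a', 'm'): 2, ('c', 'b'): 1}

# Count each item across the base, weighting every prefix path by its count.
counts = Counter()
for path, n in base.items():
    for item in path:
        counts[item] += n
print(dict(counts))  # {'c': 3, 'f': 2, 'a': 2, 'm': 2, 'b': 1}
print({i for i, c in counts.items() if c >= MIN_SUPPORT})  # {'c'}
```

Only c reaches the minimum support count of 3 inside p's conditional pattern base, so the conditional FP-tree for p holds the single node c:3 and contributes the itemset {c, p} with support 3.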

Continuing in this manner for the remaining items, we obtain the
following frequent itemsets (with their support counts):
{p}: 3, {c, p}: 3
{m}: 3, {f, m}: 3, {c, f, m}: 3, {c, m}: 3, {a, m}: 3, {f, a, m}: 3,
{c, f, a, m}: 3, {c, a, m}: 3
{b}: 3
{a}: 3, {f, a}: 3, {c, f, a}: 3, {c, a}: 3
{f}: 4, {c, f}: 3
{c}: 4

