Frequent pattern mining is an analytical technique that is used by businesses and is accessible in some self-serve business intelligence solutions. The FP Growth analytical technique finds frequent patterns, associations, or causal structures in data sets held in various kinds of databases, such as relational databases, transactional databases, and other forms of data repositories.
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining to Analyze Data?
1. Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
4. INTRODUCTION
Association rule mining is a procedure which
finds frequent patterns, associations, or causal
structures from data sets found in various kinds
of databases such as relational databases,
transactional databases, and other forms of
data repositories
Given a set of transactions, association rule
mining aims to find the rules which enable us
to predict the occurrence of a specific item
based on the occurrences of the other items in
the transaction
5. BASIC TERMINOLOGIES
• ANTECEDENT:
• Left hand side of the rule is called Antecedent
• For instance, for the rule milk->bread, milk is antecedent
• CONSEQUENT:
• Right hand side of the rule is called Consequent
• For instance, for the rule milk->bread, bread is consequent
• SUPPORT :
• The support of a rule x -> y (where x and y are each items, events, etc.) is defined as the proportion of transactions in the data set which contain both x and y
• Thus, Support (x -> y) = no. of transactions which contain both x and y / total no. of transactions
6. BASIC TERMINOLOGIES
• CONFIDENCE :
• The confidence of a rule x -> y is defined as:
• Support (x -> y) / Support (x)
• Thus it is the ratio of the number of transactions that include all items in the consequent (y in this case) as well as the antecedent (x in this case) to the number of transactions that include all items in the antecedent (x in this case)
• LIFT :
• The lift of a rule x -> y is defined as:
• Support (x -> y) / [Support (x) * Support (y)]
7. Example
TID  Milk  Bread  Butter  Beer
1    1     0      1       1
2    1     1      1       0
3    0     1      1       0
4    1     0      0       1
5    1     1      1       1
Here, support (Milk -> Bread):
= number of transactions containing milk & bread / total transactions
= 2/5 = 0.4
Confidence (Milk -> Bread):
= support (milk -> bread) / support (milk)
= 0.4 / [4/5]
= 0.4 / 0.8
= 0.5
Lift (Milk -> Bread):
= support (milk -> bread) / [support (milk) * support (bread)]
= 0.4 / [(4/5) * (3/5)]
= 0.4 / [0.8 * 0.6] = 0.4 / 0.48
= 0.83
Support (milk -> bread) = 0.4 means milk & bread together occur in 40% of all transactions
Confidence (milk -> bread) = 0.5 means that, out of every 100 transactions containing milk, 50 also contain bread
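The arithmetic of this example can be reproduced with a short Python sketch. The encoding of each transaction as a set and the helper names `support`, `confidence` and `lift` are illustrative choices, not part of the original slides.

```python
# The five transactions from the example table (TID 1 to 5),
# each written as the set of items whose column holds a 1.
transactions = [
    {"Milk", "Butter", "Beer"},           # TID 1
    {"Milk", "Bread", "Butter"},          # TID 2
    {"Bread", "Butter"},                  # TID 3
    {"Milk", "Beer"},                     # TID 4
    {"Milk", "Bread", "Butter", "Beer"},  # TID 5
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Support of x and y together, divided by the support of x."""
    return support(set(x) | set(y)) / support(x)

def lift(x, y):
    """Support of x and y together, divided by support(x) * support(y)."""
    return support(set(x) | set(y)) / (support(x) * support(y))

print(round(support({"Milk", "Bread"}), 2))       # 0.4
print(round(confidence({"Milk"}, {"Bread"}), 2))  # 0.5
print(round(lift({"Milk"}, {"Bread"}), 2))        # 0.83
```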
8. Example
Similarly, the support, confidence and lift values of all item combinations are computed, and the rules that meet the user-defined thresholds for support and confidence are displayed in the final output.
For instance, for minimum support = 0.3 and minimum confidence = 0.3, sample rules generated would be as shown below, depicting the frequent item sets, i.e. the best performing combinations of items
Rule             Support  Confidence  Lift
Bread -> Butter  0.5      0.6         0.86
Milk -> Bread    0.4      0.5         0.83
Milk -> Butter   0.3      0.5         0.78
Bread -> Beer    0.3      0.4         0.68
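As a rough illustration of the thresholding step, the sketch below enumerates every single-item rule x -> y over the five-transaction Milk/Bread/Butter/Beer table and keeps those that meet a minimum support and a minimum confidence. It is a brute-force enumeration for clarity, not FP Growth itself, and the function name `mine_pair_rules` is hypothetical. The values it produces come from that small table, so they need not match the sample figures in the rule table above.

```python
from itertools import permutations

# Five-transaction example table, one set of items per transaction.
transactions = [
    {"Milk", "Butter", "Beer"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Beer"},
    {"Milk", "Bread", "Butter", "Beer"},
]
items = sorted(set().union(*transactions))

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_pair_rules(min_support, min_confidence):
    """Return (x, y, support, confidence, lift) for each rule x -> y that passes both thresholds."""
    rules = []
    for x, y in permutations(items, 2):
        s = support({x, y})
        if s < min_support:
            continue
        c = s / support({x})
        if c < min_confidence:
            continue
        rules.append((x, y, s, c, s / (support({x}) * support({y}))))
    # Highest support first, ties broken by confidence.
    return sorted(rules, key=lambda r: (-r[2], -r[3]))

for x, y, s, c, l in mine_pair_rules(min_support=0.3, min_confidence=0.3):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Real data sets have far too many item combinations for this kind of exhaustive loop, which is why dedicated algorithms such as FP Growth are used in practice.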
9. INTERPRETATION :
Example
Rule             Support  Confidence  Lift
Bread -> Butter  0.5      0.6         0.86
Milk -> Bread    0.4      0.5         0.83
Milk -> Butter   0.3      0.5         0.78
Bread -> Beer    0.3      0.4         0.68
• In this case, the items in the Bread -> Butter rule have the highest propensity to be bought together, as the rule has the highest support as well as the highest confidence, followed by Milk -> Bread, Milk -> Butter and Bread -> Beer
• As support (Bread -> Butter) = 0.5, 50% of all transactions contain both bread and butter
• As confidence (Bread -> Butter) = 0.6, out of every 100 transactions containing bread, 60 also contain butter
• A lift larger than 1.0 implies that the antecedent and the consequent occur together more often than would be expected if the two were independent. The larger the lift, the stronger the association. A lift below 1.0 implies that they occur together less often than independence would predict
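The comparison with independence can be made concrete. If two items were independent, the expected share of transactions containing both would be support(x) * support(y), and lift is the observed share divided by that expected share. The sketch below applies this to the five-transaction Milk/Bread/Butter/Beer table; the helper names are illustrative.

```python
# Five-transaction example table, one set of items per transaction.
transactions = [
    {"Milk", "Butter", "Beer"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Beer"},
    {"Milk", "Bread", "Butter", "Beer"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift_of(x, y):
    """Observed joint support divided by the joint support expected under independence."""
    observed = support({x, y})
    expected = support({x}) * support({y})
    return observed / expected

for x, y in [("Milk", "Bread"), ("Milk", "Beer")]:
    value = lift_of(x, y)
    verdict = "more" if value > 1 else "less"
    print(f"{x} and {y}: lift={value:.2f}, together {verdict} often than independence predicts")
```

In this small table, Milk and Bread have a lift below 1 while Milk and Beer have a lift above 1, even though both pairs involve the most frequent item.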
11. Standard Output 1 : Model Summary
Rules that have both high confidence and high support are called strong rules
Even if confidence reaches high values, the rule is not useful unless the support value is high as well
In this case, the items in the Bread -> Butter rule have the highest propensity to be bought together, as the rule has the highest support as well as the highest confidence, followed by Milk -> Bread, Milk -> Butter and Bread -> Beer
50% of all transactions contain bread and butter, and out of every 100 transactions containing bread, 60 also contain butter
INTERPRETATION :
Rule                  Support  Confidence  Lift
Shampoo -> Soap       0.5      0.5         0.86
Cold drink -> Snacks  0.4      0.4         0.83
Fruit -> Vegetables   0.3      0.35        0.78
Milk -> Egg           0.3      0.30        0.68
12. Sample Output 2 : Plot : Confidence By Rules
INTERPRETATION :
The confidence value of each rule is shown in the plot
Because confidence depends on the direction of the rule (which item is the antecedent and which is the consequent), whereas support and lift are the same in both directions, this plot is built on confidence values instead of support or lift
The product combinations shown in dark green in the plot have the highest likelihood that the consequent is bought when the antecedent is bought
The darker the color, the higher the likelihood of the products being bought together in that direction
13. Sample Output 2 : Plot : Support & confidence by rule
• Ideally, both support and confidence should be taken into account to come up with the best rules, because support only indicates how frequently both items are sold together, whereas confidence also reflects the direction of the rule
• Hence, alternatively, a bubble scatter plot of support against confidence can be used to focus on rules with high support as well as high confidence
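The same idea can be sketched without a plot: keep only the rules that clear a support threshold and a confidence threshold at the same time. The rule values below are the sample figures from the Bread/Milk/Butter/Beer rule table earlier in the deck; the `Rule` type, the `strong_rules` helper and the two threshold values are hypothetical choices for illustration.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    name: str
    support: float
    confidence: float
    lift: float

# Sample rule values copied from the example rule table.
rules = [
    Rule("Bread -> Butter", 0.5, 0.6, 0.86),
    Rule("Milk -> Bread", 0.4, 0.5, 0.83),
    Rule("Milk -> Butter", 0.3, 0.5, 0.78),
    Rule("Bread -> Beer", 0.3, 0.4, 0.68),
]

def strong_rules(candidates, min_support, min_confidence):
    """Keep rules that meet both thresholds, i.e. the upper-right region of a support/confidence scatter."""
    return [
        r for r in candidates
        if r.support >= min_support and r.confidence >= min_confidence
    ]

# Hypothetical thresholds chosen only for this illustration.
selected = strong_rules(rules, min_support=0.4, min_confidence=0.5)
for r in selected:
    print(r.name, r.support, r.confidence)
```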
14. LIMITATIONS :
Processing time for this algorithm is relatively high compared to other algorithms, because the input typically consists of millions of transaction-level records
The user must possess a certain amount of expertise in order to find the right settings for support and confidence and so obtain the best association rules
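One way to build intuition for these settings is to sweep a few threshold values and watch how many rules survive. The sketch below does this for single-item rules over the five-transaction Milk/Bread/Butter/Beer table; the function name `count_rules` and the threshold grid are hypothetical, and the brute-force loop stands in for a real mining algorithm.

```python
from itertools import permutations

# Five-transaction example table, one set of items per transaction.
transactions = [
    {"Milk", "Butter", "Beer"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Beer"},
    {"Milk", "Bread", "Butter", "Beer"},
]
items = sorted(set().union(*transactions))

def count_of(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def count_rules(min_support, min_confidence):
    """Count single-item rules x -> y that pass both thresholds."""
    total = len(transactions)
    kept = 0
    for x, y in permutations(items, 2):
        both = count_of({x, y})
        if both / total >= min_support and both / count_of({x}) >= min_confidence:
            kept += 1
    return kept

# Hypothetical grid of settings; stricter thresholds leave fewer rules.
for min_support in (0.3, 0.5, 0.7):
    for min_confidence in (0.6, 0.9):
        n = count_rules(min_support, min_confidence)
        print(f"min_support={min_support} min_confidence={min_confidence}: {n} rules")
```

Thresholds that are too loose flood the output with weak rules, while thresholds that are too strict can leave no rules at all.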
15. GENERAL APPLICATIONS
• BASKET DATA ANALYSIS
• To analyze the association of purchased items in a single basket or single purchase
• CROSS MARKETING/SELLING
• To work with businesses that complement your own rather than compete with it
• For example, vehicle dealerships and manufacturers run cross marketing campaigns with oil and gas companies for obvious reasons
• CATALOG DESIGN
• The selection of items in a business' catalog is often designed so that the items complement each other and buying one item leads to buying another. So these items are often complements or closely related
• MEDICAL TREATMENTS
• Each patient is represented as a transaction containing the ordered set of diseases, so that diseases likely to occur simultaneously or sequentially can be predicted
16. USE CASE 1
Business problem :
• A retail store manager wants to conduct Market Basket analysis to come up with a better strategy for product placement and product bundling
Business benefit:
• Based on the association rules generated, the store manager will be able to strategically place the products together or in sequence, leading to growth in sales and, in turn, in the revenue of the store
• Offers such as “Buy this and get this free” or “Buy this and get % off on this” can be designed based on the association rules generated
17. Use case 1 : Sample Input Dataset
Transaction ID  Product 1   Product 2          Product 3
1039153         Milk        Bread              Jam
1069697         Shampoo     Soap               Tooth paste
1068120         Cold drink  Snacks             Ear ring
563175          Hand wash   Antiseptic liquid  Hand sanitizer
562842          Fruit       Vegetables         Ketchup
562681          Cold drink  Snacks             Ear ring
562404          Shampoo     Soap               -
700159          Bread       Jam                -
696484          Milk        Fruit              Vegetables
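Before rules can be mined, an input table like this one has to be turned into one basket of items per transaction. The sketch below is one possible way to do that in Python, treating "-" as an empty slot and then counting how often each pair of products appears together. The variable names are illustrative, and the rows are copied from the sample input dataset.

```python
from collections import Counter
from itertools import combinations

# Rows from the sample input dataset: (transaction ID, product 1, product 2, product 3).
rows = [
    (1039153, "Milk", "Bread", "Jam"),
    (1069697, "Shampoo", "Soap", "Tooth paste"),
    (1068120, "Cold drink", "Snacks", "Ear ring"),
    (563175, "Hand wash", "Antiseptic liquid", "Hand sanitizer"),
    (562842, "Fruit", "Vegetables", "Ketchup"),
    (562681, "Cold drink", "Snacks", "Ear ring"),
    (562404, "Shampoo", "Soap", "-"),
    (700159, "Bread", "Jam", "-"),
    (696484, "Milk", "Fruit", "Vegetables"),
]

# One set of products per transaction, with "-" placeholders dropped.
baskets = {tid: {p for p in products if p != "-"} for tid, *products in rows}

# Count every unordered product pair across all baskets.
pair_counts = Counter(
    pair
    for basket in baskets.values()
    for pair in combinations(sorted(basket), 2)
)

for (a, b), count in pair_counts.most_common():
    if count > 1:
        print(f"{a} & {b}: {count} of {len(baskets)} transactions")
```

Dividing each pair count by the number of transactions gives the support of that pair, which is the starting point for the confidence and lift calculations.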
18. Use Case 1 : Sample Output 1 : Association Rules
Output : Based on the threshold set by the user or automatically selected by the algorithm, the best product combinations will show up in the form of association rules, along with their support, confidence and lift values, as shown below :
Rule                  Support  Confidence  Lift
Shampoo -> Soap       0.5      0.6         0.86
Cold drink -> Snacks  0.4      0.5         0.83
Fruit -> Vegetables   0.3      0.5         0.78
Bread -> Jam          0.3      0.3         0.67
19. Use Case 1 : Sample Output 2 : Plot : Confidence By Rule
• Based on the association rules generated, a heat map plot can be shown, indicating rules with high confidence or support in a darker shade of a particular color and those with lower support or confidence in a lighter shade of the same color
• The end user can simply focus on the rules shown in a dark color to come up with better product bundling or placement in order to increase cross selling
20. Use case 2
Business problem :
• A bank marketing manager may want to analyze which products are frequently and sequentially bought together
• Each customer is represented as a transaction containing the ordered set of products, so that products likely to be purchased simultaneously or sequentially can be predicted
Business benefit:
• Based on the rules generated, banking products can be cross sold to each existing or prospective customer to drive sales and bank revenue
• For instance, if a savings account, a personal loan and a credit card are frequently bought in sequence, then a new savings account customer can be cross sold a personal loan and a credit card
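A minimal sketch of how mined rules could drive such cross-selling is given below. The three rules and their confidence values are invented purely for illustration (they are not results from any data set), and the `recommend` helper is hypothetical: it fires every rule whose antecedent the customer already holds and suggests the consequents the customer does not hold yet.

```python
# Hypothetical rules: (antecedent products, suggested product, confidence).
# The confidence values are made up for this illustration.
rules = [
    ({"Savings account"}, "Personal loan", 0.6),
    ({"Savings account"}, "Credit card", 0.5),
    ({"Personal loan"}, "Credit card", 0.4),
]

def recommend(owned, min_confidence=0.0):
    """Suggest products not yet owned, ordered by the best confidence of any rule that fires."""
    best = {}
    for antecedent, consequent, conf in rules:
        fires = antecedent <= owned  # customer holds every antecedent product
        if fires and consequent not in owned and conf >= min_confidence:
            best[consequent] = max(conf, best.get(consequent, 0.0))
    return sorted(best, key=best.get, reverse=True)

# A new customer who has just opened a savings account.
print(recommend({"Savings account"}))
```

The same pattern would apply to other domains by swapping in rules mined from the relevant transactions.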
21. Use case 3
Business problem :
• A telecom marketing manager may want to analyze which value added services are frequently and sequentially bought together
• Each customer is represented as a transaction containing the ordered set of value added services, so that services likely to be purchased simultaneously or sequentially can be predicted
Business benefit:
• Based on the rules generated, value added services can be cross sold to each existing or prospective customer to drive sales and revenue of a telecom service provider
• For instance, if a group calling SIM, a 1 GB internet plan and a 500 minutes plan are generally opted for together by most customers, then whenever a new prospective customer comes in for a group calling SIM card, he or she can be targeted with the 1 GB internet plan and the 500 minutes plan
22. Want to Learn More?
Get in touch with us at support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018