Market Basket Analysis of bakery Shop

MARKET BASKET ANALYSIS OF BAKERY SHOP
BY
VARUN SAHDEV

CONTENTS
• Introduction
• Association rules
o Support
o Confidence
o Lift
o Conviction
• Benefits of Market Basket Analysis
• Application of Market Basket Analysis
• Loading Data
• Variable Details
• Data Analysis
• Apriori algorithm
o Choice of support and confidence
o Execution
o Visualize association rules
o Another execution
• Conclusion

Introduction
Market basket analysis is an unsupervised learning technique for analyzing
transactional data, and it can be a powerful way to uncover the purchasing
patterns of consumers. In this tutorial, we will examine the concept behind
market basket analysis, introduce the Apriori algorithm, and conduct our own
market basket analysis using R.
First, it’s important to define the Apriori algorithm, including some statistical
concepts (support, confidence, lift and conviction) used to select interesting
rules. Then we are going to use a data set containing more than 6,000
transactions from a bakery to apply the algorithm and find combinations of
products that are bought together.
Association rules
The Apriori algorithm generates association rules for a given data set. An
association rule implies that if an item A occurs, then item B also occurs with a
certain probability. For example:
Transaction Items
t1 {T-shirt, Trousers, Belt}
t2 {T-shirt, Jacket}
t3 {Jacket, Gloves}
t4 {T-shirt, Trousers, Jacket}
t5 {T-shirt, Trousers, Sneakers, Jacket, Belt}
t6 {Trousers, Sneakers, Belt}
t7 {Trousers, Belt, Sneakers}
In the table above, we can see seven transactions from a clothing store. Each
transaction shows items bought in that transaction. We can represent our items as
an item set as follows:
$$I = \{i_1, i_2, \ldots, i_k\}$$
In our case it corresponds to:
$$I = \{\text{T-shirt}, \text{Trousers}, \text{Belt}, \text{Jacket}, \text{Gloves}, \text{Sneakers}\}$$
The set of transactions is represented by the following expression:
$$T = \{t_1, t_2, \ldots, t_n\}$$
For example,
$$t_1 = \{\text{T-shirt}, \text{Trousers}, \text{Belt}\}$$
Then, an association rule is defined as an implication of the form:
$$X \Rightarrow Y, \text{ where } X \subset I,\ Y \subset I \text{ and } X \cap Y = \emptyset$$
For example,
$$\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}$$
In the following sections we are going to define four metrics to measure the
strength of a rule.
➢Support
Support is an indication of how frequently the item set appears in the data set.
$$supp(X \Rightarrow Y) = \dfrac{|X \cup Y|}{n}$$
In other words, it’s the number of transactions containing both $X$ and $Y$
divided by the total number of transactions $n$. Rules with low support values
are rarely useful, because they describe only a few transactions.
Let’s see different examples using the clothing store transactions from the
previous table.
• $supp(\text{T-shirt} \Rightarrow \text{Trousers}) = \dfrac{3}{7} \approx 43\%$
• $supp(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{4}{7} \approx 57\%$
• $supp(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{2}{7} \approx 29\%$
• $supp(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{2}{7} \approx 29\%$
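The support values above can be double-checked with a few lines of base R. This is only a sketch: the `transactions` list re-types the clothing table by hand, and `supp()` is a hypothetical helper name chosen for this example, not a function from any package.

```r
# Clothing-store transactions, re-typed from the table above
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# supp(): hypothetical helper, the share of transactions that contain
# every item in `items`
supp <- function(items) {
  mean(vapply(transactions, function(tr) all(items %in% tr), logical(1)))
}

supp(c("T-shirt", "Trousers"))          # 3/7
supp(c("Trousers", "Belt"))             # 4/7
supp(c("T-shirt", "Belt"))              # 2/7
supp(c("T-shirt", "Trousers", "Belt"))  # 2/7
```

Because support only depends on the union of both sides, the rule's direction does not matter here: the same call serves T-shirt ⇒ Trousers and Trousers ⇒ T-shirt.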
➢Confidence
For a rule $X \Rightarrow Y$, confidence is the percentage of transactions
containing $X$ in which $Y$ is also bought. It’s an indication of how often the
rule has been found to be true.
$$conf(X \Rightarrow Y) = \dfrac{supp(X \cup Y)}{supp(X)}$$
For example, the rule $\text{T-shirt} \Rightarrow \text{Trousers}$ has a confidence of
3/4, which means that for 75% of the transactions containing a t-shirt the rule is
correct (75% of the times a customer buys a t-shirt, trousers are bought as well).
Three more examples:
• $conf(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{4/7}{5/7} = 80\%$
• $conf(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{2/7}{4/7} = 50\%$
• $conf(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{2/7}{3/7} \approx 67\%$
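Confidence can be recomputed in the same spirit. The sketch below is base R only; `supp()` and `conf()` are hypothetical helper names, and the transaction list is again the clothing table typed in by hand.

```r
# Clothing-store transactions, re-typed from the table in the text
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Share of transactions containing all of `items` (hypothetical helper)
supp <- function(items) {
  mean(vapply(transactions, function(tr) all(items %in% tr), logical(1)))
}

# conf(X => Y) = supp(X u Y) / supp(X) (hypothetical helper)
conf <- function(lhs, rhs) supp(c(lhs, rhs)) / supp(lhs)

conf("T-shirt", "Trousers")             # 3/4
conf("Trousers", "Belt")                # 4/5
conf("T-shirt", "Belt")                 # 2/4
conf(c("T-shirt", "Trousers"), "Belt")  # 2/3
```

Unlike support, confidence is directional: swapping `lhs` and `rhs` changes the denominator and usually the result.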
➢Lift
The lift of a rule is the ratio of the observed support to that expected if $X$
and $Y$ were independent, and is defined as
$$lift(X \Rightarrow Y) = \dfrac{supp(X \cup Y)}{supp(X)\,supp(Y)}$$
Greater lift values indicate stronger associations, while a lift of 1 corresponds
to independence. Let’s see some examples:
• $lift(\text{T-shirt} \Rightarrow \text{Trousers}) = \dfrac{3/7}{(4/7)(5/7)} = 1.05$
• $lift(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{4/7}{(5/7)(4/7)} = 1.4$
• $lift(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{2/7}{(4/7)(4/7)} = 0.875$
• $lift(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{2/7}{(3/7)(4/7)} \approx 1.17$
➢Conviction
The conviction of a rule is defined as
$$conv(X \Rightarrow Y) = \dfrac{1 - supp(Y)}{1 - conf(X \Rightarrow Y)}$$
It can be interpreted as the ratio of the expected frequency that $X$ occurs
without $Y$ if $X$ and $Y$ were independent to the observed frequency of
incorrect predictions. A high value means that the consequent depends strongly
on the antecedent. The following examples illustrate this:
• $conv(\text{T-shirt} \Rightarrow \text{Trousers}) = \dfrac{1 - 5/7}{1 - 3/4} \approx 1.14$
• $conv(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{1 - 4/7}{1 - 4/5} \approx 2.14$
• $conv(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{1 - 4/7}{1 - 1/2} \approx 0.86$
• $conv(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{1 - 4/7}{1 - 2/3} \approx 1.29$
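Lift and conviction for the four example rules can be tabulated together. This is a base-R sketch under the same assumptions as the definitions in this section; `supp()`, `conf()`, `lift()` and `conv()` are hypothetical helper names, and the data is the clothing table typed in by hand.

```r
# Clothing-store transactions, re-typed from the table in the text
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Hypothetical helpers that mirror the formulas in this section
supp <- function(items) {
  mean(vapply(transactions, function(tr) all(items %in% tr), logical(1)))
}
conf <- function(lhs, rhs) supp(c(lhs, rhs)) / supp(lhs)
lift <- function(lhs, rhs) supp(c(lhs, rhs)) / (supp(lhs) * supp(rhs))
conv <- function(lhs, rhs) (1 - supp(rhs)) / (1 - conf(lhs, rhs))

# The four rules used as examples in the text
rules <- list(
  list(lhs = "T-shirt", rhs = "Trousers"),
  list(lhs = "Trousers", rhs = "Belt"),
  list(lhs = "T-shirt", rhs = "Belt"),
  list(lhs = c("T-shirt", "Trousers"), rhs = "Belt")
)

# One row per rule with rounded lift and conviction
metrics <- do.call(rbind, lapply(rules, function(r) {
  data.frame(
    rule = paste0("{", paste(r$lhs, collapse = ", "), "} => {", r$rhs, "}"),
    lift = round(lift(r$lhs, r$rhs), 3),
    conviction = round(conv(r$lhs, r$rhs), 2)
  )
}))
print(metrics)
```

Note that `conv()` divides by zero for a rule with confidence 1; R then returns `Inf`, which matches the usual reading that such a rule is never wrong in the data.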

Benefits Of Market Basket Analysis
The following are the main benefits of market basket analysis:
• Store Layout
We can organize the store according to the results of market basket analysis in
order to increase revenue. Once we know which products appear together in the
market basket, we can place them near each other so that customers notice them
and decide to buy them. Market basket analysis acts as a guide for organizing
the store to get the best revenue.
• Marketing Messages
Market basket analysis increases the efficiency of marketing messages, whether
they are delivered by phone, email, social media, etc. Using market basket
analysis data, we can suggest the next best option to customers and give them
relevant suggestions instead of irritating them with irrelevant marketing offers.
• Maintain Inventory
With the help of market basket analysis, we can anticipate which products our
customers are likely to buy in the future and maintain our inventory
accordingly. We can predict customers’ future purchases over a period of time,
use initial sales data to plan stock, and foresee shortages of high-demand items
so that we can arrange our inventory in advance.
• Content Placement
Content placement is very important in an e-commerce business. Conversion rates
increase when products are displayed in the right order. Online retailers use
market basket analysis to display the content that customers are likely to read
next, which helps to engage them on the website. In this way, market basket
analysis helps to increase traffic and to get better conversion rates.
• Recommendation Engines
Market basket analysis is the basis for creating recommendation engines. A
recommendation engine is software that analyzes users’ behavior, identifies
content they are interested in, and recommends it to them. It is an important
part of many applications and software products: it collects information about
people’s habits and then recommends content to them.

Application Of Market Basket Analysis
Market basket analysis is applied in various fields of the retail sector in order to
boost sales and generate revenue by identifying the needs of customers and
making purchase suggestions to them.
• Cross Selling
Cross-selling is a sales technique in which the seller suggests a related
product to a customer after a purchase, influencing the customer to spend more
on products related to the one already bought. For instance, if someone buys
milk from a store, the seller suggests buying coffee or tea as well. In other
words, the seller suggests a product that is complementary to the one the
customer has already purchased. Market basket analysis helps the retailer to
understand consumer behavior and then go for cross-selling.
• Product Placement
This refers to placing complementary goods (pen and paper) and substitute goods
(tea and coffee) together so that the customer notices them and buys both. If a
seller places these kinds of goods together, it becomes more likely that a
customer will purchase them together. Market basket analysis helps the
retailer to identify the goods that a customer is likely to purchase together.
• Affinity Promotion
Affinity promotion is a method of promotion that designs promotional events
around associated products. Market basket analysis for affinity promotion is
also a useful way to prepare and analyze questionnaire data.
• Fraud Detection
Market basket analysis is also applied to fraud detection. When the data
contains credit card usage, it may be possible to identify purchase behavior
that is associated with fraud.
• Customer Behavior
Market basket analysis provides insight into customer behavior under different
conditions. It allows the retailer to identify the relationship between two
products that people tend to buy together and hence to understand customer
behavior towards a product or service.
Hence, market basket analysis helps retailers to gain insight into customer
behavior and to understand the relationship between two or more goods, so that
they can make purchase suggestions that lead customers to buy more from their
stores and thereby increase revenue.

Loading Data
First we need to load some libraries and import our data. We can use the function
read.transactions() from the arules package to create a transactions object.
With the help of descriptive analysis, we can understand the basic attributes,
features and variables in our data set. The following outputs give a basic
understanding of the data set.
➢ Transaction object
# Transaction object
## transactions in sparse format with
## 6614 transactions (rows) and
## 104 items (columns)
➢ Summary
# Summary
## transactions as itemMatrix in sparse format with
## 6614 rows (elements/itemsets/transactions) and
## 104 columns (items) and a density of 0.02008705
##
## most frequent items:
## Coffee Bread Tea Cake Pastry (Other)
## 3188 2146 941 694 576 6272
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 2556 2154 1078 546 187 67 18 3 2 3
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.089 3.000 10.000
##
## includes extended item information - examples:
## labels
## 1 Adjustment
## 2 Afternoon with the baker
## 3 Alfajores
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 10
## 3 1000

➢ Structure
# Structure
## Formal class 'transactions' [package "arules"] with 3 slots
## ..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
## ..@ itemInfo :'data.frame': 104 obs. of 1 variable:
## .. ..$ labels: chr [1:104] "Adjustment" "Afternoon with the baker" "Alfajores" "Argentina Night" ...
## ..@ itemsetInfo:'data.frame': 6614 obs. of 1 variable:
## .. ..$ transactionID: Factor w/ 6614 levels "1","10","1000",..: 1 2 3 4 5 6 7 8 9 10 ...
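The three printouts above are what printing the object, summary() and str() return for a transactions object. The loading step itself could look like the sketch below. The path of the bakery CSV is not given in the text, so the example writes a tiny made-up stand-in file with the same four columns (Date, Time, Transaction, Item) and reads that instead; the rows and the `demo_file` name are assumptions for illustration only.

```r
library(arules)

# Made-up stand-in for the bakery CSV (one purchased item per row)
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Time,Transaction,Item",
  "2016-10-30,09:58:11,1,Bread",
  "2016-10-30,10:05:34,2,Coffee",
  "2016-10-30,10:05:34,2,Cake",
  "2016-10-30,10:07:57,3,Coffee",
  "2016-10-30,10:07:57,3,Tea",
  "2016-10-30,10:07:57,3,Pastry"
), demo_file)

# format = "single": each row is a (transaction id, item) pair;
# cols names the columns holding the transaction id and the item
trans <- read.transactions(
  demo_file,
  format = "single",
  header = TRUE,
  sep = ",",
  cols = c("Transaction", "Item"),
  rm.duplicates = TRUE   # drop repeated items within a transaction
)

trans           # transaction object
summary(trans)  # density, most frequent items, basket-size distribution
str(trans)      # internal structure (sparse matrix plus item labels)
```

For the real data set, only the `demo_file` argument would change to the location of the bakery CSV.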
Variable Details
The data set contains 15,010 observations and the following columns:
• Date. Categorical variable that tells us the date of the transactions
(YYYY-MM-DD format). The column includes dates from 30/10/2016 to 09/04/2017.
• Time. Categorical variable that tells us the time of the transactions
(HH:MM:SS format).
• Transaction. Quantitative variable that allows us to differentiate the
transactions. Rows that share the same value in this field belong to the
same transaction, which is why the data set has fewer transactions than
observations.
• Item. Categorical variable with the products.
Data Analysis
Before applying the Apriori algorithm to the data set, we draw some basic
plots. These visualizations help us learn more about the transactions. For
example, we can use itemFrequencyPlot() to create an item frequency bar
plot and view the distribution of products.

itemFrequencyPlot() creates an item frequency bar plot for inspecting the item
frequency distribution of objects based on itemMatrix (such as transactions). It
can show either absolute or relative values. With type "absolute" it plots the
number of transactions that contain each item; with type "relative" it plots the
proportion of transactions that contain each item.
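A minimal sketch of both plot types follows. Since the bakery file is not available here, it uses `Groceries`, an example transactions data set that ships with arules, purely as a stand-in; the `topN` value and titles are arbitrary choices.

```r
library(arules)

# Example data bundled with arules, used only as a stand-in for the bakery data
data("Groceries")

# Absolute frequency: number of transactions containing each item
itemFrequencyPlot(Groceries, topN = 10, type = "absolute",
                  main = "Absolute item frequency")

# Relative frequency: share of transactions containing each item
itemFrequencyPlot(Groceries, topN = 10, type = "relative",
                  main = "Relative item frequency")

# The same numbers without a plot
abs_freq <- itemFrequency(Groceries, type = "absolute")
rel_freq <- itemFrequency(Groceries, type = "relative")
head(sort(abs_freq, decreasing = TRUE), 5)
```

The two views differ only by a constant factor (the number of transactions), so the ranking of items is identical in both plots.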

From the plots we can say that coffee is the best-selling product by far,
followed by bread and tea. To better understand the behaviors and patterns in
the transactions, we use some visualizations describing the time distribution,
built with the ggplot() function.
• Transactions per month
• Transactions per weekday
• Transactions per hour
The data set includes dates from 30/10/2016 to 09/04/2017, which is why we have
so few transactions in October and April.

As we can see, Saturday is the busiest day in the bakery. Conversely, Wednesday
is the day with the fewest transactions.
There’s not much to discuss about the hourly visualization. The results are
logical and expected.
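The time-distribution plots described above can be sketched with ggplot2 as follows. The `raw` data frame is a made-up miniature of the bakery data with the columns listed under Variable Details; the key step is reducing it to one row per transaction before counting, so that multi-item baskets are not counted several times.

```r
library(ggplot2)

# Made-up miniature of the raw data (one row per purchased item)
raw <- data.frame(
  Date = as.Date(c("2016-10-30", "2016-10-30", "2016-10-31",
                   "2016-11-05", "2016-11-05", "2016-11-05")),
  Time = c("09:58:11", "09:58:11", "10:05:34",
           "11:20:00", "11:45:10", "11:45:10"),
  Transaction = c(1, 1, 2, 3, 4, 4),
  Item = c("Bread", "Coffee", "Tea", "Coffee", "Cake", "Coffee")
)

# Keep one row per transaction
per_trans <- unique(raw[, c("Transaction", "Date", "Time")])
per_trans$Weekday <- weekdays(per_trans$Date)  # labels depend on the locale
per_trans$Hour <- as.integer(substr(per_trans$Time, 1, 2))

# Transactions per weekday
p_weekday <- ggplot(per_trans, aes(x = Weekday)) +
  geom_bar() +
  labs(title = "Transactions per weekday", y = "Transactions")

# Transactions per hour
p_hour <- ggplot(per_trans, aes(x = factor(Hour))) +
  geom_bar() +
  labs(title = "Transactions per hour", x = "Hour", y = "Transactions")

print(p_weekday)
print(p_hour)
```

A per-month plot follows the same pattern with `format(per_trans$Date, "%Y-%m")` as the grouping variable.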

Apriori algorithm
➢Choice of support and confidence
The first step in order to create a set of association rules is to determine the
optimal thresholds for support and confidence. If we set these values too low,
then the algorithm will take longer to execute and we will get a lot of rules (most
of them will not be useful). Then, what values do we choose? We can try
different values of support and confidence and see graphically how many rules
are generated for each combination.
In the following plots we can see the number of rules generated with a support
level of 10%, 5%, 1% and 0.5%.

We can join the four lines to improve the visualization.
Analysis of Results
• Support level of 10%. We only identify a few rules with very low
confidence levels. This means that there are no relatively frequent
associations in our data set. We can’t choose this value because the resulting
rules are unrepresentative.
• Support level of 5%. We only identify one rule with a confidence of at least
50%. It seems that we have to look for support levels below 5% to obtain a
greater number of rules with a reasonable confidence.
• Support level of 1%. We start to get dozens of rules, of which 13 have a
confidence of at least 50%.
• Support level of 0.5%. Too many rules to analyze!
Based on the above analysis, we will use a support level of 1% and a confidence
level of 50% for further analysis.
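The threshold search described in this section can be sketched as a small grid of apriori() runs. The example below uses the `Groceries` data bundled with arules as a stand-in for the bakery transactions, so its counts will differ from the ones discussed above. The helper name `count_rules()`, the confidence grid, and `minlen = 2` (which skips rules with an empty left-hand side) are assumptions of this sketch, not settings taken from the original analysis.

```r
library(arules)

# Stand-in data; replace with the bakery transactions object
data("Groceries")

support_levels <- c(0.10, 0.05, 0.01, 0.005)
confidence_levels <- seq(0.1, 0.9, by = 0.1)

# Number of rules found for one (support, confidence) pair
count_rules <- function(supp, conf) {
  rules <- apriori(
    Groceries,
    parameter = list(support = supp, confidence = conf,
                     minlen = 2, target = "rules"),
    control = list(verbose = FALSE)
  )
  length(rules)
}

# Every combination of the two thresholds
grid <- expand.grid(support = support_levels, confidence = confidence_levels)
grid$n_rules <- mapply(count_rules, grid$support, grid$confidence)

# Rows: confidence levels, columns: support levels
counts <- tapply(grid$n_rules, list(grid$confidence, grid$support), sum)

# One line per support level
matplot(confidence_levels, counts, type = "b", pch = 1,
        xlab = "Confidence level", ylab = "Number of rules found")
legend("topright", legend = paste("support", colnames(counts)),
       col = seq_len(ncol(counts)), lty = seq_len(ncol(counts)))
```

For a fixed support level, raising the confidence threshold can only remove rules, so each line in the plot is non-increasing from left to right.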

➢Execution
We can execute the Apriori algorithm with the values obtained in the
previous section, with the help of the apriori() function.
To describe and view the generated association rules, we can use the
inspect() function.
# Association rules
## lhs rhs support confidence lift count
## [1] {Tiffin} => {Coffee} 0.01058361 0.5468750 1.134577 70
## [2] {Spanish Brunch} => {Coffee} 0.01406108 0.6326531 1.312537 93
## [3] {Scone} => {Coffee} 0.01844572 0.5422222 1.124924 122
## [4] {Toast} => {Coffee} 0.02570305 0.7296137 1.513697 170
## [5] {Alfajores} => {Coffee} 0.02237678 0.5522388 1.145705 148
## [6] {Juice} => {Coffee} 0.02131842 0.5300752 1.099723 141
## [7] {Hot chocolate} => {Coffee} 0.02721500 0.5263158 1.091924 180
## [8] {Medialuna} => {Coffee} 0.03296039 0.5751979 1.193337 218
## [9] {Cookies} => {Coffee} 0.02978530 0.5267380 1.092800 197
## [10] {NONE} => {Coffee} 0.04172966 0.5810526 1.205484 276
## [11] {Sandwich} => {Coffee} 0.04233444 0.5679513 1.178303 280
## [12] {Pastry} => {Coffee} 0.04868461 0.5590278 1.159790 322
## [13] {Cake} => {Coffee} 0.05654672 0.5389049 1.118042 374
We can also create an HTML table widget using the inspectDT() function
from the arulesViz package. Rules can be interactively filtered and
sorted.
Interpretation of these rules:
• 53% of the customers who bought a hot chocolate also bought a
coffee.
• 63% of the customers who bought a Spanish brunch also bought a
coffee.
• 73% of the customers who bought a toast also bought a coffee.
And so on. It seems that in this bakery there are many coffee lovers.
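The execution step can be sketched as below, with the thresholds chosen in the previous section (support 1%, confidence 50%). The bakery file is not available here, so `Groceries` from arules stands in for it and the resulting rules will not be the coffee rules listed above.

```r
library(arules)

# Stand-in data; replace with the bakery transactions object
data("Groceries")

# Thresholds from the previous section: support 1%, confidence 50%
rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.5, target = "rules"),
  control = list(verbose = FALSE)
)

# Strongest rules first
rules_sorted <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(head(rules_sorted, 5))

# Read the confidence column as a percentage, as in the bullets above
top <- head(rules_sorted, 3)
cat(sprintf(
  "%s holds in %.0f%% of the transactions that contain its left-hand side\n",
  labels(top), 100 * quality(top)$confidence
), sep = "")
```

The `quality()` data frame holds the same support, confidence and lift columns that inspect() prints, which makes it convenient for further filtering.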

➢Visualize association rules
We will use the arulesViz package to create the visualizations. First we create a
simple scatter plot with different measures of interestingness on the axes (lift and
support) and a third measure (confidence) represented by the color of the points.
The following visualization represents the rules as a graph with items as labeled
vertices, and rules represented as vertices connected to items using arrows.

We can also change the graph layout.

We can represent the rules as a grouped matrix-based visualization. The support
and lift measures are represented by the size and color of the balloons,
respectively. In this case it’s not a very useful visualization, since we only have
coffee on the right-hand side of the rules.
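A minimal sketch of the three visualizations discussed in this section, using the plot() method that arulesViz provides for rules. `Groceries` again stands in for the bakery data, and default engines and layouts are used, so the appearance will differ from the original figures and may vary between arulesViz versions.

```r
library(arules)
library(arulesViz)

# Stand-in data and the same thresholds as in the execution step
data("Groceries")
rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.5, target = "rules"),
  control = list(verbose = FALSE)
)

# Scatter plot: support and lift on the axes, confidence as color
plot(rules, measure = c("support", "lift"), shading = "confidence")

# Graph: items and rules as vertices connected by arrows
plot(rules, method = "graph")

# Grouped matrix: balloons sized and colored by the rule measures
plot(rules, method = "grouped")
```

With only a handful of rules these plots stay readable; the same calls on a rule set produced with very low thresholds quickly become cluttered.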
➢Another execution
We have executed the Apriori algorithm with the appropriate support and
confidence values. What happens if we execute it with low values? How do the
visualizations change? Let’s try with a support level of 0.5% and a confidence
level of 10%.
It’s impossible to analyze these visualizations! For larger rule sets visual analysis
becomes difficult. Furthermore, most of the rules are useless. That’s why we
have to carefully select the right values of support and confidence.

Figures: graph, parallel coordinates plot, grouped matrix plot and scatter plot
of the rules obtained with these low thresholds.

Conclusion
Market basket analysis is an unsupervised machine learning technique that can be
useful for finding patterns in transactional data. It can be a very powerful tool for
analyzing the purchasing patterns of consumers. The main algorithm used in
market basket analysis is the Apriori algorithm, one of the most frequently used
algorithms in data mining. The main statistical measures in market basket
analysis are support, confidence, and lift (complemented here by conviction).
Support measures how frequently an item set appears in a given transactional
data set, confidence measures how often a rule holds when its antecedent is
present, and lift measures how much more likely an item is to be purchased
relative to its typical purchase rate. In our example, we examined the
transactional patterns of bakery purchases and discovered both obvious and
not-so-obvious patterns in certain transactions.
