MARKET BASKET ANALYSIS OF BAKERY SHOP
BY
VARUN SAHDEV
MARKET BASKET ANALYSIS
CONTENTS
• Introduction
• Association rules
o Support
o Confidence
o Lift
o Conviction
• Benefits of Market Basket Analysis
• Application of Market Basket Analysis
• Loading Data
• Variable Details
• Data Analysis
• Apriori algorithm
o Choice of support and confidence
o Execution
o Visualize association rules
o Another execution
• Conclusion
Introduction
Market basket analysis is an unsupervised learning technique that can be useful
for analyzing transactional data. It can be a powerful technique in analyzing the
purchasing patterns of consumers. In this tutorial, we will examine the concept
behind market basket analysis, introduce the Apriori algorithm, and conduct
our own market basket analysis using R.
First, it’s important to define the Apriori algorithm, including some statistical
concepts (support, confidence, lift and conviction) to select interesting rules.
Then we are going to use a data set containing more than 6,000 transactions from
a bakery to apply the algorithm and find combinations of products that are
bought together.
Association rules
The Apriori algorithm generates association rules for a given data set. An
association rule implies that if an item A occurs, then item B also occurs with a
certain probability. For example, consider the following transactions:
Transaction Items
t1 {T-shirt, Trousers, Belt}
t2 {T-shirt, Jacket}
t3 {Jacket, Gloves}
t4 {T-shirt, Trousers, Jacket}
t5 {T-shirt, Trousers, Sneakers, Jacket, Belt}
t6 {Trousers, Sneakers, Belt}
t7 {Trousers, Belt, Sneakers}
In the table above, we can see seven transactions from a clothing store. Each
transaction shows items bought in that transaction. We can represent our items as
an item set as follows:
$$I = \{i_1, i_2, \ldots, i_k\}$$
In our case it corresponds to:
$$I=\{T\text{-}shirt, Trousers, Belt, Jacket, Gloves, Sneakers\}$$
The set of transactions is represented by the following expression:
$$T = \{t_1, t_2, \ldots, t_n\}$$
For example,
$$t_1=\{T\text{-}shirt, Trousers, Belt\}$$
Then, an association rule is defined as an implication of the form:
$$X \Rightarrow Y, \text{ where } X \subset I,\ Y \subset I \text{ and } X \cap Y = \emptyset$$
For example,
$$\{T\text{-}shirt, Trousers\} \Rightarrow \{Belt\}$$
In the following sections we are going to define four metrics to measure the
strength of a rule.
➢Support
Support is an indication of how frequently the item set appears in the data set.
$$supp(X \Rightarrow Y) = \dfrac{|X \cup Y|}{n}$$
In other words, it’s the number of transactions with both 𝑋 and 𝑌 divided by the
total number of transactions. The rules are not useful for low support values.
Let’s see different examples using the clothing store transactions from the
previous table.
• $supp(T\text{-}shirt \Rightarrow Trousers)=\dfrac{3}{7}=43\%$
• $supp(Trousers \Rightarrow Belt)=\dfrac{4}{7}=57\%$
• $supp(T\text{-}shirt \Rightarrow Belt)=\dfrac{2}{7}=29\%$
• $supp(\{T\text{-}shirt, Trousers\} \Rightarrow \{Belt\})=\dfrac{2}{7}=29\%$
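The support values above can be checked with a few lines of base R. This is a minimal sketch: the helper name supp() is our own choice for illustration and is not a function of the arules package.

```r
# Toy clothing-store transactions from the table above
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# supp(): share of transactions that contain every item in `items`
# (hypothetical helper, defined here only for illustration)
supp <- function(items, trans = transactions) {
  hits <- vapply(trans, function(t) all(items %in% t), logical(1))
  mean(hits)
}

supp(c("T-shirt", "Trousers"))          # 3/7
supp(c("Trousers", "Belt"))             # 4/7
supp(c("T-shirt", "Belt"))              # 2/7
supp(c("T-shirt", "Trousers", "Belt"))  # 2/7
```

Note that the support of a rule only depends on the items involved, not on which side of the arrow they appear.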
➢Confidence
For a rule 𝑋 ⇒ 𝑌, confidence shows the percentage in which 𝑌 is bought with 𝑋.
It’s an indication of how often the rule has been found to be true.
$$conf(X \Rightarrow Y) = \dfrac{supp(X \cup Y)}{supp(X)}$$
For example, the rule $T\text{-}shirt \Rightarrow Trousers$ has a confidence of
3/4, which means that for 75% of the transactions containing a t-shirt the rule is
correct (75% of the times a customer buys a t-shirt, trousers are bought as well).
Three more examples:
• $conf(Trousers \Rightarrow Belt)=\dfrac{4/7}{5/7}=80\%$
• $conf(T\text{-}shirt \Rightarrow Belt)=\dfrac{2/7}{4/7}=50\%$
• $conf(\{T\text{-}shirt, Trousers\} \Rightarrow \{Belt\})=\dfrac{2/7}{3/7}=67\%$
➢Lift
The lift of a rule is the ratio of the observed support to that expected if 𝑋 and 𝑌
were independent, and is defined as
$$lift(X \Rightarrow Y) = \dfrac{supp(X \cup Y)}{supp(X)\,supp(Y)}$$
Greater lift values indicate stronger associations. Let’s see some examples:
• $lift(T\text{-}shirt \Rightarrow Trousers)=\dfrac{3/7}{(4/7)(5/7)}=1.05$
• $lift(Trousers \Rightarrow Belt)=\dfrac{4/7}{(5/7)(4/7)}=1.4$
• $lift(T\text{-}shirt \Rightarrow Belt)=\dfrac{2/7}{(4/7)(4/7)}=0.875$
• $lift(\{T\text{-}shirt, Trousers\} \Rightarrow \{Belt\})=\dfrac{2/7}{(3/7)(4/7)}=1.17$
➢Conviction
The conviction of a rule is defined as
$$conv(X \Rightarrow Y) = \dfrac{1 - supp(Y)}{1 - conf(X \Rightarrow Y)}$$
It can be interpreted as the ratio of the expected frequency that $X$ occurs without
$Y$ if $X$ and $Y$ were independent divided by the observed frequency of incorrect
predictions. A high value means that the consequent depends strongly on the
antecedent. The following examples illustrate this:
• $conv(T\text{-}shirt \Rightarrow Trousers)=\dfrac{1-5/7}{1-3/4}=1.14$
• $conv(Trousers \Rightarrow Belt)=\dfrac{1-4/7}{1-4/5}=2.14$
• $conv(T\text{-}shirt \Rightarrow Belt)=\dfrac{1-4/7}{1-1/2}=0.86$
• $conv(\{T\text{-}shirt, Trousers\} \Rightarrow \{Belt\})=\dfrac{1-4/7}{1-2/3}=1.29$
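To see how the four metrics relate to each other, the sketch below computes them together in base R for the clothing store transactions. The helper names supp() and rule_metrics() are hypothetical and exist only for this illustration.

```r
# Toy clothing-store transactions from the Association rules section
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Share of transactions containing all of `items` (hypothetical helper)
supp <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

# All four metrics for the rule lhs => rhs, following the formulas above
rule_metrics <- function(lhs, rhs) {
  s_xy <- supp(c(lhs, rhs))
  conf <- s_xy / supp(lhs)
  c(support    = s_xy,
    confidence = conf,
    lift       = s_xy / (supp(lhs) * supp(rhs)),
    # conviction is Inf when the confidence is exactly 1
    conviction = (1 - supp(rhs)) / (1 - conf))
}

# support 0.57, confidence 0.80, lift 1.40, conviction 2.14
round(rule_metrics("Trousers", "Belt"), 2)

# support 0.29, confidence 0.67, lift 1.17, conviction 1.29
round(rule_metrics(c("T-shirt", "Trousers"), "Belt"), 2)
```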
Benefits of Market Basket Analysis
The following are the main benefits of market basket analysis:
• Store Layout
We can organize or set up the store according to market basket analysis in order
to increase revenue. Once we know which products appear together in the market
basket, we can place those products near each other so that customers notice
them and decide to buy them. Market basket analysis acts as a guide to
organizing the store to get the best revenue.
• Marketing Messages
Market basket analysis increases the efficiency of marketing messages,
whether they are delivered by phone, email, social media, etc. We can suggest
the next best option to customers by using market basket analysis data. With the
help of this data, we can give relevant suggestions to our customers instead of
bothering them with irritating marketing offers.
• Maintain Inventory
With the help of market basket analysis, we can estimate which products our
customers are likely to buy in the future and maintain our inventory
accordingly. We can also predict customers’ future purchases over a period of
time on the basis of market basket analysis data, and use initial sales data to
maintain our inventory. Finally, we can predict shortages of useful or
high-demand items in our store and arrange our stock accordingly.
• Content Placement
Content placement is very important in an e-commerce business. Our conversion
rates will increase when our products are displayed or arranged in the right
order. Market basket analysis is used by online retailers to display the content
that customers are likely to read next. It helps to engage customers on our
website, increase traffic and get better conversion rates.
• Recommendation Engines
Market basket analysis is the basis for creating recommendation engines. A
recommendation engine is software that analyzes, identifies and recommends
content in which users are interested. A recommendation engine is an important
part of many applications and software products. It collects information about
people’s habits and then recommends content to them.
Application of Market Basket Analysis
Market basket analysis is applied to various fields of the retail sector in order to
boost sales and generate revenue by identifying the needs of customers and
making purchase suggestions to them.
• Cross Selling
Cross-selling is a sales technique in which the seller suggests a related
product to a customer after the customer buys a product. The seller encourages
the customer to spend more by purchasing products related to the one that has
already been purchased. For instance, if someone buys milk from a store, the
seller suggests buying coffee or tea as well. In other words, the seller
suggests a complementary product alongside the product that the customer has
already purchased. Market basket analysis helps the retailer to understand
consumer behavior and then go for cross-selling.
• Product Placement
It refers to placing complementary goods (pen and paper) and substitute goods
(tea and coffee) together so that the customer notices the goods and buys both
of them. If a seller places these kinds of goods together, there is a
probability that a customer will purchase them together. Market basket analysis
helps the retailer to identify the goods that a customer is likely to purchase
together.
• Affinity Promotion
Affinity promotion is a method of promotion that designs promotional events
based on associated products. In market basket analysis, affinity promotion is
also a useful way to prepare and analyze questionnaire data.
• Fraud Detection
Market basket analysis is also applied to fraud detection. Using market basket
analysis data that contains credit card usage, it may be possible to identify
purchase behavior that is associated with fraud.
• Customer Behavior
Market basket analysis helps to understand customer behavior under different
conditions. It allows the retailer to identify the relationship between products
that people tend to buy together and hence helps to understand customer
behavior towards a product or service.
Hence, market basket analysis helps retailers to gain insight into customer
behavior and to understand the relationship between two or more goods, so that
they can make purchase suggestions to their customers, who will then buy more
from their stores and generate more revenue.
Loading Data
First we need to load some libraries and import our data. We can use the function
read.transactions() from the arules package to create a transactions object.
With the help of descriptive analysis, we can understand the basic attributes,
features and variables in our data set.
➢ Transaction object
# Transaction object
## transactions in sparse format with
## 6614 transactions (rows) and
## 104 items (columns)
➢ Summary
# Summary
## transactions as itemMatrix in sparse format with
## 6614 rows (elements/itemsets/transactions) and
## 104 columns (items) and a density of 0.02008705
##
## most frequent items:
## Coffee Bread Tea Cake Pastry (Other)
## 3188 2146 941 694 576 6272
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 2556 2154 1078 546 187 67 18 3 2 3
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.089 3.000 10.000
##
## includes extended item information - examples:
## labels
## 1 Adjustment
## 2 Afternoon with the baker
## 3 Alfajores
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 10
## 3 1000
➢ Structure
# Structure
## Formal class 'transactions' [package "arules"] with 3 slots
## ..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
## ..@ itemInfo :'data.frame': 104 obs. of 1 variable:
## .. ..$ labels: chr [1:104] "Adjustment" "Afternoon with the baker" "Alfajores" "Argentina Night" ...
## ..@ itemsetInfo:'data.frame': 6614 obs. of 1 variable:
## .. ..$ transactionID: Factor w/ 6614 levels "1","10","1000",..: 1 2 3 4 5 6 7 8 9 10 ...
Variable Details
The data set contains 15,010 observations and the following columns:
• Date. Categorical variable that tells us the date of the transactions
(YYYY-MM-DD format). The column includes dates from 30/10/2016 to 09/04/2017.
• Time. Categorical variable that tells us the time of the transactions
(HH:MM:SS format).
• Transaction. Quantitative variable that allows us to differentiate the
transactions. The rows that share the same value in this field belong to the
same transaction, that’s why the data set has fewer transactions than
observations.
• Item. Categorical variable with the products.
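Given this column layout, the transactions object could be created roughly as follows. This is only a sketch: the sample rows and the temporary file are made up for illustration, and the real file name and exact arguments used for the bakery data are not shown in this report.

```r
library(arules)

# Hypothetical sample in the layout described above
# (Date, Time, Transaction, Item); rows are invented for illustration
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Time,Transaction,Item",
  "2016-10-30,09:58:11,1,Bread",
  "2016-10-30,10:05:34,2,Coffee",
  "2016-10-30,10:05:34,2,Pastry",
  "2016-10-30,10:07:57,3,Coffee",
  "2016-10-30,10:07:57,3,Tea",
  "2016-10-30,10:07:57,3,Cake"
), sample_csv)

# format = "single": one row per (transaction, item) pair.
# cols names the transaction-id and item columns, which requires header = TRUE.
trans <- read.transactions(sample_csv, format = "single", header = TRUE,
                           sep = ",", cols = c("Transaction", "Item"))

trans           # 3 transactions (rows) and 5 items (columns)
summary(trans)  # most frequent items, transaction length distribution
```

Because rows sharing a Transaction value are merged into one basket, the number of transactions is smaller than the number of rows in the file.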
Data Analysis
Before applying the Apriori algorithm to the data set, we draw some basic
plots. These visualizations can help us learn more about the transactions. For
example, we can use itemFrequencyPlot() to create an item frequency bar plot
and view the distribution of products.
itemFrequencyPlot() creates an item frequency bar plot for inspecting the item
frequency distribution of objects based on itemMatrix (such as transactions).
It can show absolute or relative values. If absolute, it plots the number of
transactions in which each item appears. If relative, it plots the proportion
of transactions that contain each item.
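As a small illustration of the two plot types, the sketch below builds a transactions object from the toy clothing store data of the Association rules section (not the bakery data) and plots absolute and relative item frequencies.

```r
library(arules)

# Toy clothing-store transactions, coerced to an arules transactions object
trans <- as(list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
), "transactions")

# Numbers behind the plots: counts and shares per item
abs_freq <- itemFrequency(trans, type = "absolute")
rel_freq <- itemFrequency(trans, type = "relative")

# Bar plots of the three most frequent items
itemFrequencyPlot(trans, topN = 3, type = "absolute")
itemFrequencyPlot(trans, topN = 3, type = "relative")
```

The relative frequency of a single item is the same quantity as its support.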
Looking at the plots, we can say that coffee is the best-selling product by far,
followed by bread and tea. To better understand the behaviors and patterns in
the transactions, we use some visualizations describing the time distribution,
created with the ggplot() function.
• Transactions per month
The data set includes dates from 30/10/2016 to 09/04/2017, that’s why we have
so few transactions in October and April.
• Transactions per weekday
As we can see, Saturday is the busiest day in the bakery. Conversely, Wednesday
is the day with the fewest transactions.
• Transactions per hour
There’s not much to discuss with this visualization. The results are logical and
expected.
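A time-distribution plot such as the transactions per weekday chart could be produced along the following lines. This is a sketch under assumptions: the data frame obs and its rows are invented, and only the Date and Transaction columns described in Variable Details are used.

```r
library(ggplot2)

# Hypothetical observations: one row per purchased item
obs <- data.frame(
  Date = as.Date(c("2016-10-30", "2016-10-30", "2016-10-30", "2016-10-31",
                   "2016-11-05", "2016-11-05", "2016-11-05")),
  Transaction = c(1, 1, 2, 3, 4, 5, 5)
)

# Keep one row per transaction so multi-item baskets are counted once
tx <- unique(obs[, c("Date", "Transaction")])

# "%u" gives the ISO weekday number (1 = Monday), independent of locale
tx$Weekday <- factor(format(tx$Date, "%u"),
                     levels = as.character(1:7),
                     labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Count transactions per weekday (columns: Weekday, Freq)
per_weekday <- as.data.frame(table(Weekday = tx$Weekday))

p <- ggplot(per_weekday, aes(x = Weekday, y = Freq)) +
  geom_col() +
  labs(title = "Transactions per weekday", x = "Weekday", y = "Transactions")
print(p)
```

The same pattern applies to the per-month and per-hour plots by changing the format string used to derive the grouping variable.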
Apriori algorithm
➢Choice of support and confidence
The first step in order to create a set of association rules is to determine the
optimal thresholds for support and confidence. If we set these values too low,
then the algorithm will take longer to execute and we will get a lot of rules (most
of them will not be useful). Then, what values do we choose? We can try
different values of support and confidence and see graphically how many rules
are generated for each combination.
In the following plots we can see the number of rules generated with a support
level of 10%, 5%, 1% and 0.5%.
We can join the four lines to improve the visualization.
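The threshold search described above can be sketched as a small grid over support and confidence levels, counting the rules generated for each combination. The example runs on the toy clothing store transactions, so the threshold values are illustrative only; for the bakery data the support levels would be 10%, 5%, 1% and 0.5%.

```r
library(arules)

# Toy clothing-store transactions (not the bakery data)
trans <- as(list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
), "transactions")

# Illustrative thresholds, ordered from strict to permissive
support_levels    <- c(0.5, 0.4, 0.25)
confidence_levels <- c(0.9, 0.7, 0.5, 0.3, 0.1)

# Rows: confidence levels, columns: support levels
n_rules <- sapply(support_levels, function(s) {
  sapply(confidence_levels, function(cf) {
    rules <- apriori(trans,
                     parameter = list(support = s, confidence = cf,
                                      minlen = 2, target = "rules"),
                     control = list(verbose = FALSE))
    length(rules)  # number of rules for this combination
  })
})
dimnames(n_rules) <- list(paste0("conf=", confidence_levels),
                          paste0("supp=", support_levels))
n_rules
```

Lowering either threshold can only add rules, which is why the counts grow towards the bottom-right corner of the matrix.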
Analysis of Results
• Support level of 10%. We only identify a few rules with very low
confidence levels. This means that there are no relatively frequent
associations in our data set. We can’t choose this value, because the resulting
rules are unrepresentative.
• Support level of 5%. We only identify one rule with a confidence of at least
50%. It seems that we have to look for support levels below 5% to obtain a
greater number of rules with a reasonable confidence.
• Support level of 1%. We start to get dozens of rules, of which 13 have a
confidence of at least 50%.
• Support level of 0.5%. Too many rules to analyze!
Based on the analysis above, we will use a support level of 1% and a confidence
level of 50% for further analysis.
➢Execution
We can execute the Apriori algorithm with the values obtained in the
previous section, with the help of the apriori() function.
Once the association rules are generated, we can use the inspect() function to
describe and view them.
# Association rules
## lhs rhs support confidence lift count
## [1] {Tiffin} => {Coffee} 0.01058361 0.5468750 1.134577 70
## [2] {Spanish Brunch} => {Coffee} 0.01406108 0.6326531 1.312537 93
## [3] {Scone} => {Coffee} 0.01844572 0.5422222 1.124924 122
## [4] {Toast} => {Coffee} 0.02570305 0.7296137 1.513697 170
## [5] {Alfajores} => {Coffee} 0.02237678 0.5522388 1.145705 148
## [6] {Juice} => {Coffee} 0.02131842 0.5300752 1.099723 141
## [7] {Hot chocolate} => {Coffee} 0.02721500 0.5263158 1.091924 180
## [8] {Medialuna} => {Coffee} 0.03296039 0.5751979 1.193337 218
## [9] {Cookies} => {Coffee} 0.02978530 0.5267380 1.092800 197
## [10] {NONE} => {Coffee} 0.04172966 0.5810526 1.205484 276
## [11] {Sandwich} => {Coffee} 0.04233444 0.5679513 1.178303 280
## [12] {Pastry} => {Coffee} 0.04868461 0.5590278 1.159790 322
## [13] {Cake} => {Coffee} 0.05654672 0.5389049 1.118042 374
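A minimal sketch of the execution step is shown below. It runs on the toy clothing store transactions so that it is self-contained; the thresholds are therefore illustrative, whereas the bakery rules above were generated with a support of 0.01 and a confidence of 0.5.

```r
library(arules)

# Toy clothing-store transactions (not the bakery data)
trans <- as(list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
), "transactions")

# minlen = 2 excludes rules with an empty left-hand side
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.7,
                                  minlen = 2, target = "rules"),
                 control = list(verbose = FALSE))

# Print lhs, rhs and quality measures, strongest rules first
inspect(sort(rules, by = "confidence"))

# Pick out a single rule and read its quality measures
belt_rule <- subset(rules, lhs %in% "Trousers" & rhs %in% "Belt")
quality(belt_rule)
```

For the rule {Trousers} => {Belt}, the confidence of 0.8 and lift of 1.4 reported by arules agree with the hand calculations in the Association rules section.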
We can also create an HTML table widget using the inspectDT() function
from the arulesViz package. Rules can be interactively filtered and
sorted.
Interpretation of these rules:
• 53% of the customers who bought a hot chocolate also bought a
coffee.
• 63% of the customers who bought a Spanish brunch also bought a
coffee.
• 73% of the customers who bought a toast also bought a coffee.
And so on. It seems that in this bakery there are many coffee lovers.
➢Visualize association rules
We will use the arulesViz package to create the visualizations. First we create a
simple scatter plot with different measures of interestingness on the axes (lift and
support) and a third measure (confidence) represented by the color of the points.
The following visualization represents the rules as a graph with items as labeled
vertices, and rules represented as vertices connected to items using arrows.
We can also change the graph layout.
We can represent the rules as a grouped matrix-based visualization. The support
and lift measures are represented by the size and color of the balloons,
respectively. In this case it’s not a very useful visualization, since we only have
coffee on the right-hand side of the rules.
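The scatter plot and graph described above could be drawn roughly as follows. The sketch uses rules mined from the toy clothing store transactions with illustrative thresholds, so the pictures will differ from the bakery plots.

```r
library(arules)
library(arulesViz)

# Toy clothing-store transactions (not the bakery data)
trans <- as(list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
), "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.25, confidence = 0.5,
                                  minlen = 2, target = "rules"),
                 control = list(verbose = FALSE))

# Scatter plot: support and lift on the axes, confidence as the color
plot(rules, measure = c("support", "lift"), shading = "confidence")

# Graph: items and rules as vertices connected by arrows
plot(rules, method = "graph")
```

The other plot types mentioned in this report are selected the same way, with method = "grouped" for the grouped matrix and method = "paracoord" for the parallel coordinates plot.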
➢Another execution
We have executed the Apriori algorithm with the appropriate support and
confidence values. What happens if we execute it with low values? How do the
visualizations change? Let’s try with a support level of 0.5% and a confidence
level of 10%.
It’s impossible to analyze these visualizations! For larger rule sets visual analysis
becomes difficult. Furthermore, most of the rules are useless. That’s why we
have to carefully select the right values of support and confidence.
Plots produced for this execution: graph, parallel coordinates plot, grouped
matrix plot and scatter plot.
Conclusion
Market basket analysis is an unsupervised machine learning technique that can be
useful for finding patterns in transactional data. It can be a very powerful tool for
analyzing the purchasing patterns of consumers. The main algorithm used in
market basket analysis is the Apriori algorithm, one of the most frequently used
algorithms in data mining. The three main statistical measures in
market basket analysis are support, confidence, and lift. Support measures how
frequently an item set appears in a given transactional data set, confidence measures
a rule’s predictive power or accuracy, and lift measures how much more
likely an item is to be purchased relative to its typical purchase rate. In our example,
we examined the transactional patterns of bakery purchases and discovered
both obvious and not-so-obvious patterns in certain transactions.

Walaa Eldin Moustafa
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

Market Basket Analysis of bakery Shop

  • 1. MARKET BASKET ANALYSIS OF BAKERY SHOP
BY VARUN SAHDEV
  • 2. CONTENTS
• Introduction
• Association rules
o Support
o Confidence
o Lift
o Conviction
• Benefits of Market Basket Analysis
• Application of Market Basket Analysis
• Loading Data
• Variable Details
• Data Analysis
• Apriori algorithm
o Choice of support and confidence
o Execution
o Visualize association rules
o Another execution
• Conclusion
  • 3. Introduction
Market basket analysis is an unsupervised learning technique for analyzing transactional data, and it can be a powerful way to study the purchasing patterns of consumers. In this tutorial, we will examine the concept behind market basket analysis, introduce the Apriori algorithm, and conduct our own market basket analysis using R.
First, we define the Apriori algorithm, including the statistical concepts (support, confidence, lift and conviction) used to select interesting rules. Then we use a data set containing more than 6,000 transactions from a bakery to apply the algorithm and find combinations of products that are bought together.
Association rules
The Apriori algorithm generates association rules for a given data set. An association rule implies that if an item A occurs, then item B also occurs with a certain probability. For example:
Transaction Items
t1 {T-shirt, Trousers, Belt}
t2 {T-shirt, Jacket}
t3 {Jacket, Gloves}
t4 {T-shirt, Trousers, Jacket}
t5 {T-shirt, Trousers, Sneakers, Jacket, Belt}
t6 {Trousers, Sneakers, Belt}
t7 {Trousers, Belt, Sneakers}
The table above shows seven transactions from a clothing store, each listing the items bought in that transaction. We can represent our items as an item set:
$$I = \{i_1, i_2, \ldots, i_k\}$$
In our case it corresponds to:
$$I = \{\text{T-shirt}, \text{Trousers}, \text{Belt}, \text{Jacket}, \text{Gloves}, \text{Sneakers}\}$$
The set of transactions is represented by the following expression:
$$T = \{t_1, t_2, \ldots, t_n\}$$
  • 4. For example,
$$t_1 = \{\text{T-shirt}, \text{Trousers}, \text{Belt}\}$$
Then, an association rule is defined as an implication of the form:
$$X \Rightarrow Y, \text{ where } X \subset I, \; Y \subset I \text{ and } X \cap Y = \emptyset$$
For example,
$$\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}$$
In the following sections we define four metrics to measure the precision of a rule.
➢ Support
Support is an indication of how frequently the item set appears in the data set.
$$supp(X \Rightarrow Y) = \dfrac{|X \cup Y|}{n}$$
Here $|X \cup Y|$ is the number of transactions that contain every item of both $X$ and $Y$, and $n$ is the total number of transactions. In other words, support is the number of transactions with both $X$ and $Y$ divided by the total number of transactions. Rules with low support values are not useful. Let's see different examples using the clothing store transactions from the previous table.
• $supp(\text{T-shirt} \Rightarrow \text{Trousers}) = \dfrac{3}{7} \approx 43\%$
• $supp(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{4}{7} \approx 57\%$
• $supp(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{2}{7} \approx 29\%$
• $supp(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{2}{7} \approx 29\%$
➢ Confidence
For a rule $X \Rightarrow Y$, confidence shows the percentage of transactions containing $X$ in which $Y$ is also bought. It's an indication of how often the rule has been found to be true.
$$conf(X \Rightarrow Y) = \dfrac{supp(X \cup Y)}{supp(X)}$$
For example, the rule $\text{T-shirt} \Rightarrow \text{Trousers}$ has a confidence of 3/4, which means that the rule is correct for 75% of the transactions containing a T-shirt (75% of the times a customer buys a T-shirt, trousers are bought as well). Three more examples:
  • 5. • $conf(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{4/7}{5/7} = 80\%$
• $conf(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{2/7}{4/7} = 50\%$
• $conf(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{2/7}{3/7} \approx 67\%$
➢ Lift
The lift of a rule is the ratio of the observed support to that expected if $X$ and $Y$ were independent, and is defined as
$$lift(X \Rightarrow Y) = \dfrac{supp(X \cup Y)}{supp(X)\,supp(Y)}$$
Greater lift values indicate stronger associations; a lift of 1 means that $X$ and $Y$ occur together exactly as often as independence would predict. Let's see some examples:
• $lift(\text{T-shirt} \Rightarrow \text{Trousers}) = \dfrac{3/7}{(4/7)(5/7)} = 1.05$
• $lift(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{4/7}{(5/7)(4/7)} = 1.4$
• $lift(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{2/7}{(4/7)(4/7)} = 0.875$
• $lift(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{2/7}{(3/7)(4/7)} \approx 1.17$
➢ Conviction
The conviction of a rule is defined as
$$conv(X \Rightarrow Y) = \dfrac{1 - supp(Y)}{1 - conf(X \Rightarrow Y)}$$
It can be interpreted as the ratio of the expected frequency that $X$ occurs without $Y$ if $X$ and $Y$ were independent, divided by the observed frequency of incorrect predictions. A high value means that the consequent depends strongly on the antecedent. The following examples illustrate this:
• $conv(\text{T-shirt} \Rightarrow \text{Trousers}) = \dfrac{1 - 5/7}{1 - 3/4} \approx 1.14$
• $conv(\text{Trousers} \Rightarrow \text{Belt}) = \dfrac{1 - 4/7}{1 - 4/5} \approx 2.14$
• $conv(\text{T-shirt} \Rightarrow \text{Belt}) = \dfrac{1 - 4/7}{1 - 1/2} \approx 0.86$
• $conv(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}) = \dfrac{1 - 4/7}{1 - 2/3} \approx 1.29$
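The four metrics can be checked by hand on the clothing-store table. The following is a minimal base R sketch, not code from the original slides: the helper names `supp`, `conf`, `lift` and `conv` are hypothetical and simply mirror the formulas for support, confidence, lift and conviction.

```r
# Clothing-store transactions from the example table
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Hypothetical helpers (names are illustrative, not from any package).
# supp: fraction of transactions that contain every item in `items`
supp <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}
# conf: share of transactions with x that also contain y
conf <- function(x, y) supp(c(x, y)) / supp(x)
# lift: observed joint support relative to independence
lift <- function(x, y) supp(c(x, y)) / (supp(x) * supp(y))
# conv: expected vs. observed rate of x occurring without y
conv <- function(x, y) (1 - supp(y)) / (1 - conf(x, y))

# Metrics for the rule {T-shirt, Trousers} => {Belt}
x <- c("T-shirt", "Trousers")
y <- "Belt"
metrics <- c(
  support    = supp(c(x, y)),
  confidence = conf(x, y),
  lift       = lift(x, y),
  conviction = conv(x, y)
)
print(round(metrics, 2))
```

Calling the helpers with other item vectors, for example `conf("Trousers", "Belt")`, reproduces the remaining worked examples. Note that `conv` returns `Inf` for a rule whose confidence is exactly 1.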
  • 6. Benefits of Market Basket Analysis
The following are the main benefits of market basket analysis:
• Store Layout
We can organize the store according to market basket analysis in order to increase revenue. Once we know which products share a market basket, we can place them near each other so that the customer notices them and decides to buy them. Market basket analysis acts as a guide for organizing the store to get the best revenue.
• Marketing Messages
Market basket analysis increases the efficiency of marketing messages, whether they are delivered by phone, email or social media. Using market basket analysis data, we can suggest the next best option to customers and give them relevant suggestions instead of irritating marketing offers.
• Maintain Inventory
With the help of market basket analysis, we can anticipate which products our customers are going to buy in the future and maintain our inventory accordingly. We can predict customers' future purchases over a period of time, use initial sales data to plan inventory, and anticipate shortages of popular items so that stock can be arranged in advance.
• Content Placement
Content placement is very important in e-commerce. Conversion rates increase when products are displayed in the right order. Online retailers use market basket analysis to display the content that customers are likely to read next, which helps to engage customers, increase traffic on the website and improve conversion rates.
• Recommendation Engines
Market basket analysis is the basis for creating recommendation engines. A recommendation engine is software that analyzes, identifies and recommends content that users are interested in. It is an important part of many applications and software products: it collects information about people's habits and then recommends content to them.
  • 7. Application of Market Basket Analysis
Market basket analysis is applied to various fields of the retail sector in order to boost sales and generate revenue by identifying the needs of customers and making purchase suggestions to them.
• Cross Selling
Cross-selling is a sales technique in which the seller suggests a related product to a customer after the customer buys a product. The seller influences the customer to spend more by purchasing products related to the one already bought. For instance, if someone buys milk from a store, the seller suggests buying coffee or tea as well. In other words, the seller suggests a product that complements the one already purchased. Market basket analysis helps the retailer to understand consumer behavior and then go for cross-selling.
• Product Placement
This refers to placing complementary goods (pen and paper) and substitute goods (tea and coffee) together so that the customer notices both and buys them together. If a seller places these kinds of goods together, there is a greater probability that a customer will purchase them together. Market basket analysis helps the retailer to identify the goods that a customer is likely to purchase together.
• Affinity Promotion
Affinity promotion is a method of promotion that designs promotional events based on associated products. In this setting, market basket analysis is also a useful way to prepare and analyze questionnaire data.
• Fraud Detection
Market basket analysis is also applied to fraud detection. Using market basket analysis data that contain credit card usage, it may be possible to identify purchase behavior that is associated with fraud.
• Customer Behavior
Market basket analysis provides insight into customer behavior under different conditions. It allows the retailer to identify the relationship between two or more products that people tend to buy together, and hence to understand customer behavior towards a product or service. With this insight, retailers can make purchase suggestions to their customers so that they buy more from the store and revenue increases.
  • 8. Loading Data
First we need to load some libraries and import our data. We can use the function read.transactions() from the arules package to create a transactions object. With the help of descriptive analysis, we can understand the basic attributes, features and variables in our data set.
Basic understanding of the data set
➢ Transaction object
# Transaction object
## transactions in sparse format with
##  6614 transactions (rows) and
##  104 items (columns)
➢ Summary
# Summary
## transactions as itemMatrix in sparse format with
##  6614 rows (elements/itemsets/transactions) and
##  104 columns (items) and a density of 0.02008705
##
## most frequent items:
##  Coffee   Bread     Tea    Cake  Pastry (Other)
##    3188    2146     941     694     576    6272
##
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10
## 2556 2154 1078  546  187   67   18    3    2    3
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.000   2.000   2.089   3.000  10.000
##
## includes extended item information - examples:
##                     labels
## 1               Adjustment
## 2 Afternoon with the baker
## 3                Alfajores
##
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2            10
## 3          1000
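The loading step might look roughly like the sketch below. It assumes a long-format CSV with the four columns described in the variable details (Date, Time, Transaction, Item); the slides do not show the actual file name or call, so the temporary file and its three made-up transactions are stand-ins used only to keep the example self-contained.

```r
library(arules)

# Stand-in for the bakery file: made-up rows, one line per purchased item
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Time,Transaction,Item",
  "2016-10-30,09:58:11,1,Bread",
  "2016-10-30,10:05:34,2,Coffee",
  "2016-10-30,10:05:34,2,Pastry",
  "2016-10-30,10:07:57,3,Coffee",
  "2016-10-30,10:07:57,3,Tea",
  "2016-10-30,10:07:57,3,Cake"
), csv_path)

# format = "single": each row holds one (transaction id, item) pair;
# cols names the id and item columns, so header = TRUE is required
trans <- read.transactions(
  csv_path,
  format = "single",
  header = TRUE,
  sep = ",",
  cols = c("Transaction", "Item"),
  rm.duplicates = TRUE  # drop repeated items within one transaction
)

print(trans)     # number of transactions and items
summary(trans)   # density, most frequent items, length distribution
```

Rows that share a Transaction value are merged into a single basket, which is why the resulting object has fewer transactions than the file has rows.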
  • 9. ➢ Structure
# Structure
## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   ..@ itemInfo   :'data.frame': 104 obs. of 1 variable:
##   .. ..$ labels: chr [1:104] "Adjustment" "Afternoon with the baker" "Alfajores" "Argentina Night" ...
##   ..@ itemsetInfo:'data.frame': 6614 obs. of 1 variable:
##   .. ..$ transactionID: Factor w/ 6614 levels "1","10","1000",..: 1 2 3 4 5 6 7 8 9 10 ...
Variable Details
The data set contains 15,010 observations and the following columns:
• Date. Categorical variable that tells us the date of the transactions (YYYY-MM-DD format). The column includes dates from 30/10/2016 to 09/04/2017.
• Time. Categorical variable that tells us the time of the transactions (HH:MM:SS format).
• Transaction. Quantitative variable that allows us to differentiate the transactions. The rows that share the same value in this field belong to the same transaction, which is why the data set has fewer transactions than observations.
• Item. Categorical variable with the products.
Data Analysis
Before applying the Apriori algorithm to the data set, we draw some basic plots. These visualizations help us learn more about the transactions. For example, we can use itemFrequencyPlot() to create an item frequency bar plot and view the distribution of products.
  • 10. The itemFrequencyPlot() method creates an item frequency bar plot for inspecting the item frequency distribution of objects based on an item matrix, such as transactions. It can show either absolute or relative values. With absolute values it plots the raw count of each item; with relative values it plots the share of transactions in which each item appears, as the plots show.
  • 11. From the plots we can see that coffee is the best-selling product by far, followed by bread and tea.
To better understand the behaviors and patterns in the transactions, we use some visualizations describing the time distribution, built with the ggplot() function.
• Transactions per month
• Transactions per weekday
• Transactions per hour
The data set includes dates from 30/10/2016 to 09/04/2017, which is why we have so few transactions in October and April.
  • 12. As we can see, Saturday is the busiest day in the bakery. Conversely, Wednesday is the day with the fewest transactions.
There's not much to discuss with this visualization. The results are logical and expected.
  • 13. Apriori algorithm
➢ Choice of support and confidence
The first step in creating a set of association rules is to determine suitable thresholds for support and confidence. If we set these values too low, the algorithm will take longer to execute and we will get a lot of rules, most of which will not be useful. So what values do we choose? We can try different values of support and confidence and see graphically how many rules are generated for each combination.
In the following plots we can see the number of rules generated with a support level of 10%, 5%, 1% and 0.5%.
  • 14. We can join the four lines to improve the visualization.
Analysis of Results
• Support level of 10%. We only identify a few rules with very low confidence levels. This means that there are no relatively frequent associations in our data set. We can't choose this value, because the resulting rules are unrepresentative.
• Support level of 5%. We identify only one rule with a confidence of at least 50%. It seems that we have to look for support levels below 5% to obtain a greater number of rules with a reasonable confidence.
• Support level of 1%. We start to get dozens of rules, of which 13 have a confidence of at least 50%.
• Support level of 0.5%. Too many rules to analyze!
Based on the above analysis, we will use a support level of 1% and a confidence level of 50% for the rest of the analysis.
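The threshold search described above amounts to counting the rules produced for each pair of support and confidence values. The sketch below illustrates the idea on the seven clothing-store transactions instead of the bakery data, so the support levels are deliberately much higher than the 10%, 5%, 1% and 0.5% used in the slides; the helper `count_rules` is a hypothetical name, and `minlen = 2` is an assumption that excludes rules with an empty left-hand side.

```r
library(arules)

# Toy data: the clothing-store transactions from the association rules section
trans <- as(list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
), "transactions")

# Illustrative grids (chosen for a 7-transaction data set)
support_levels    <- c(0.5, 0.3, 0.1)
confidence_levels <- seq(0.1, 0.9, by = 0.2)

# Hypothetical helper: number of rules found for one threshold pair
count_rules <- function(s, cf) {
  rules <- apriori(
    trans,
    parameter = list(support = s, confidence = cf, minlen = 2, target = "rules"),
    control = list(verbose = FALSE)
  )
  length(rules)
}

# Rows: confidence levels, columns: support levels
n_rules <- sapply(support_levels, function(s) {
  sapply(confidence_levels, function(cf) count_rules(s, cf))
})
dimnames(n_rules) <- list(confidence = confidence_levels, support = support_levels)
print(n_rules)
```

Each column of `n_rules` corresponds to one line in a rules-versus-confidence plot; the counts can only stay equal or fall as either threshold is raised.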
  • 15. ➢ Execution
We can now execute the Apriori algorithm with the values obtained in the previous section, using the apriori() function. To describe and view the generated association rules, we can use the inspect() function.
# Association rules
##      lhs                 rhs      support    confidence lift     count
## [1]  {Tiffin}         => {Coffee} 0.01058361 0.5468750  1.134577  70
## [2]  {Spanish Brunch} => {Coffee} 0.01406108 0.6326531  1.312537  93
## [3]  {Scone}          => {Coffee} 0.01844572 0.5422222  1.124924 122
## [4]  {Toast}          => {Coffee} 0.02570305 0.7296137  1.513697 170
## [5]  {Alfajores}      => {Coffee} 0.02237678 0.5522388  1.145705 148
## [6]  {Juice}          => {Coffee} 0.02131842 0.5300752  1.099723 141
## [7]  {Hot chocolate}  => {Coffee} 0.02721500 0.5263158  1.091924 180
## [8]  {Medialuna}      => {Coffee} 0.03296039 0.5751979  1.193337 218
## [9]  {Cookies}        => {Coffee} 0.02978530 0.5267380  1.092800 197
## [10] {NONE}           => {Coffee} 0.04172966 0.5810526  1.205484 276
## [11] {Sandwich}       => {Coffee} 0.04233444 0.5679513  1.178303 280
## [12] {Pastry}         => {Coffee} 0.04868461 0.5590278  1.159790 322
## [13] {Cake}           => {Coffee} 0.05654672 0.5389049  1.118042 374
We can also create an HTML table widget using the inspectDT() function from the arulesViz package. Rules can be interactively filtered and sorted.
Interpretation of these rules:
• 53% of the customers who bought a hot chocolate also bought a coffee.
• 63% of the customers who bought a Spanish brunch also bought a coffee.
• 73% of the customers who bought a toast also bought a coffee.
And so on. It seems that in this bakery there are many coffee lovers.
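As a rough illustration of the execution step, the sketch below calls apriori() and inspect() on the small clothing-store example so that it runs without the bakery file. The thresholds of 0.4 and 0.7 are chosen only for this seven-transaction toy set; for the bakery data the slides use a support of 0.01 and a confidence of 0.5.

```r
library(arules)

# Toy data: the clothing-store transactions from the association rules section
trans <- as(list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
), "transactions")

# Mine rules; minlen = 2 (an assumption) skips rules with an empty left-hand side
rules <- apriori(
  trans,
  parameter = list(support = 0.4, confidence = 0.7, minlen = 2, target = "rules"),
  control = list(verbose = FALSE)
)

# Show the rules with their quality measures, strongest lift first
inspect(sort(rules, by = "lift"))
```

Sorting by lift before inspecting puts the rules that deviate most from independence at the top, which is often a convenient first view of the result.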
  • 16. ➢ Visualize association rules
We will use the arulesViz package to create the visualizations. First we create a simple scatter plot with two measures of interestingness on the axes (lift and support) and a third measure (confidence) represented by the color of the points.
The following visualization represents the rules as a graph with items as labeled vertices, and rules represented as vertices connected to items using arrows.
  • 17. We can also change the graph layout.
  • 18. We can represent the rules as a grouped matrix-based visualization. The support and lift measures are represented by the size and color of the balloons, respectively. In this case it's not a very useful visualization, since we only have coffee on the right-hand side of the rules.
➢ Another execution
We have executed the Apriori algorithm with the appropriate support and confidence values. What happens if we execute it with low values? How do the visualizations change? Let's try a support level of 0.5% and a confidence level of 10%.
It's impossible to analyze these visualizations! For larger rule sets, visual analysis becomes difficult. Furthermore, most of the rules are useless. That's why we have to carefully select the right values of support and confidence.
  • 21. Conclusion
Market basket analysis is an unsupervised machine learning technique that can be useful for finding patterns in transactional data. It can be a very powerful tool for analyzing the purchasing patterns of consumers. The main algorithm used in market basket analysis is the Apriori algorithm, one of the most frequently used algorithms in data mining. The three main statistical measures in market basket analysis are support, confidence and lift. Support measures how frequently an item set appears in a given transactional data set, confidence measures a rule's predictive power or accuracy, and lift measures how much more likely an item is to be purchased relative to its typical purchase rate. In our example, we examined the transactional patterns of bakery purchases and discovered both obvious and not-so-obvious patterns in certain transactions.