Market Basket Analysis
using Apriori algorithm
on “Groceries” dataset
Submitted By:
MadhuKiran P C20-085
Sai Vinod P C20-131
Sesha Sai Harsha C20-142
Contents
Overview
Apriori algorithm
The data
Transformed data to dummy flag variables
Program flow
Top 12 most frequent items
Results: Top 12 rules by “support”
Results: Top 12 rules by “confidence”
Results: Top 12 rules by “lift”
Web
Discussion
References
Overview:
• Identifies frequently purchased groceries from the given transactional data
• Uses the SPSS Modeler A-priori modelling node to calculate support, confidence and lift for
association rules
• Lists the 12 most frequently bought items and the top 10 combinations by support, confidence and lift
Apriori algorithm:
• The Apriori algorithm employs a simple a priori belief as a guideline for reducing the association rule
search space: all subsets of a frequent item-set must also be frequent
• The support of an item-set or rule measures how frequently it occurs in the data
• A rule's confidence is a measurement of its predictive power or accuracy. For a rule X → Y, it is
defined as the support of the item-set containing both X and Y divided by the support of the
item-set containing only X
• Lift is a measure of how much more likely one item is to be purchased relative to its typical
purchase rate, given that you know another item has been purchased
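The three measures above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the SPSS Modeler implementation; the five toy transactions and the function names are assumptions made for the example.

```python
# Toy transactions (hypothetical, for illustration only).
transactions = [
    {"whole milk", "butter", "yogurt"},
    {"whole milk", "butter"},
    {"whole milk", "yogurt"},
    {"butter"},
    {"rolls/buns"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """support(X and Y together) / support(X) for the rule X -> Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y); values above 1 indicate a positive association."""
    return confidence(x, y, transactions) / support(y, transactions)

if __name__ == "__main__":
    x, y = {"butter"}, {"whole milk"}
    print("support   :", support(x | y, transactions))
    print("confidence:", confidence(x, y, transactions))
    print("lift      :", lift(x, y, transactions))
```

With real data, each transaction would be one basket from the point-of-sale file rather than a hand-written set.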
The data:
citrus fruit, semi-finished bread, margarine, ready soups
tropical fruit, yogurt, coffee
whole milk
pip fruit, yogurt, cream cheese, meat spreads
other vegetables, whole milk, condensed milk, long life bakery product
whole milk, butter, yogurt, rice, abrasive cleaner
rolls/buns
other vegetables, UHT milk, rolls/buns, bottled beer, liquor (appetizer)
potted plants
whole milk, cereals
tropical fruit, other vegetables, white bread, bottled water, chocolate
citrus fruit, tropical fruit, whole milk, butter, curd
beef
frankfurter, rolls/buns, soda
• The dataset was created by researchers at the Department of Information Systems and
Operations, Wirtschaftsuniversität Wien, Austria
• The “Groceries” data set contains 1 month (30 days) of real-world point-of-sale transaction data
from a typical local grocery outlet. The data set contains 9835 transactions, and the items are
aggregated into 169 categories
• Item categories have been used instead of brands, for simplicity. So “milk” can refer to any
brand of milk.
Transformed data to dummy flag variables:
Transaction  citrus fruit  tropical fruit  whole milk  pip fruit  other vegetables  rolls/buns  potted plants  beef
1            1             0               0           0          0                 0           0              0
2            0             1               0           0          0                 0           0              0
3            0             0               1           0          0                 0           0              0
4            0             0               0           1          0                 0           0              0
5            0             0               1           0          1                 0           0              0
6            0             0               1           0          0                 0           0              0
7            0             0               0           0          0                 1           0              0
8            0             0               0           0          1                 1           0              0
9            0             0               0           0          0                 0           1              0
10           0             0               1           0          0                 0           0              0
11           0             1               0           0          1                 0           0              0
12           1             1               1           0          0                 0           0              0
13           0             0               0           0          0                 0           0              1
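The transformation from item lists to dummy flag variables can be sketched as follows. This is a minimal pure-Python illustration, assuming each transaction is available as a list of item names; the function name `to_flags` is hypothetical and the report itself does not say which tool was used for this step.

```python
# First three transactions from the sample shown in "The data".
transactions = [
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups"],
    ["tropical fruit", "yogurt", "coffee"],
    ["whole milk"],
]

def to_flags(transactions):
    """Return (items, rows): one 0/1 flag per item for every transaction."""
    # One column per distinct item, in alphabetical order.
    items = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        rows.append([1 if item in present else 0 for item in items])
    return items, rows

if __name__ == "__main__":
    items, rows = to_flags(transactions)
    print(items)
    for number, row in enumerate(rows, start=1):
        print(number, row)
```

On the full data set the same idea would produce the 9835 x 169 flag matrix described in the program flow.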
Program flow:
• Convert the dataset to dummy flag variables
• Load the dataset into the SPSS Modeler environment
• Use the data audit node to confirm that the matrix has 169 columns (corresponding to 169 item
categories) and 9835 rows (corresponding to 9835 transactions)
• Apply the A-priori modelling node with a minimum support of 5% and a minimum confidence of 30% to
generate association rules together with their lift values
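For readers without SPSS Modeler, the rule-generation step can be approximated by a small level-wise Apriori sketch. It is an assumption-laden illustration rather than the modelling node's actual procedure: the thresholds are applied to item-set support, only single-item consequents are produced, and the toy transactions and function names are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every item-set meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def sup(candidate):
        return sum(1 for b in baskets if candidate <= b) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = sup(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-item-sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = sup(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence, lift) tuples."""
    out = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = s / frequent[antecedent]
            if conf >= min_confidence:
                out.append((antecedent, consequent, s, conf,
                            conf / frequent[frozenset([consequent])]))
    return out

if __name__ == "__main__":
    toy = [["whole milk", "butter", "yogurt"], ["whole milk", "butter"],
           ["whole milk", "yogurt"], ["butter"], ["rolls/buns"]]
    for rule in rules(apriori(toy, min_support=0.4), min_confidence=0.6):
        print(rule)
```

The prune step is where the a priori belief pays off: a candidate is counted against the data only if all of its smaller subsets are already known to be frequent.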
Top 12 most frequent items:
[Bar chart: Top 12 most frequent items; bar heights 2513, 1903, 1809, 1715, 1372, 1087, 1072, 1032, 969, 924, 875, 814; vertical axis from 0 to 3000]
Results: Top 12 rules by “support”:
Consequent        Antecedent       Support %  Confidence %  Lift
other vegetables  whole milk       25.310     30.300        1.568
whole milk        other vegetables 19.318     39.698        1.568
whole milk        rolls/buns       18.443     31.542        1.246
other vegetables  yogurt           14.011     32.570        1.686
whole milk        yogurt           14.011     39.646        1.566
whole milk        bottled water    11.270     30.789        1.216
other vegetables  root vegetables  10.832     44.280        2.292
whole milk        root vegetables  10.832     45.087        1.781
other vegetables  tropical fruit   10.395     33.801        1.750
whole milk        tropical fruit   10.395     39.130        1.546
Results: Top 12 rules by “confidence”:
Consequent        Antecedent             Support %  Confidence %  Lift
whole milk        butter                 5.701      49.616        1.960
whole milk        curd                   5.642      48.320        1.909
whole milk        domestic eggs          6.459      47.856        1.891
whole milk        root vegetables        10.832     45.087        1.781
other vegetables  root vegetables        10.832     44.280        2.292
whole milk        whipped/sour cream     7.333      43.936        1.736
other vegetables  yogurt and whole milk  5.555      43.045        2.228
whole milk        beef                   5.351      40.872        1.615
whole milk        margarine              6.211      40.845        1.614
other vegetables  whipped/sour cream     7.333      40.755        2.110
Results: Top 12 rules by “lift”:
Consequent        Antecedent                       Support %  Confidence %  Lift
root vegetables   beef                             5.351      33.243        3.069
root vegetables   other vegetables and whole milk  7.669      31.749        2.931
yogurt            curd                             5.642      34.884        2.490
other vegetables  root vegetables                  10.832     44.280        2.292
other vegetables  yogurt and whole milk            5.555      43.045        2.228
yogurt            other vegetables and whole milk  7.669      31.179        2.225
other vegetables  whipped/sour cream               7.333      40.755        2.110
other vegetables  pork                             5.846      38.155        1.975
other vegetables  beef                             5.351      38.147        1.975
whole milk        butter                           5.701      49.616        1.960
Web:
➔ We can observe that customers who buy pastry, citrus fruit and sausage stand out as a group
➔ This means (here, for example) that a customer who buys one of these three products is more
likely to buy any of the other two
Discussion:
• The top rules sorted by “support” and “confidence” are dominated by “whole milk” and
“other vegetables”, which are the two most frequently bought items overall
• However, when rules are sorted by “lift”, we also get rules whose consequent is neither “whole milk”
nor “other vegetables”, such as beef → root vegetables and curd → yogurt. A lift value greater than 1
implies that the LHS and RHS item-sets occur together more often than would be expected purely by chance
• Although such market basket analysis may yield many rules, not all of them are useful.
Some are trivial, some inexplicable, and only a very few are actionable. Further
analysis, extra domain knowledge and common sense are often required to subjectively
judge the real-world usefulness of the rules
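The relationship between confidence and lift can be checked against the reported figures. In the result tables, rules that share an antecedent also share the same Support % value, which suggests that this column is the support of the antecedent; under that assumption, the support of a consequent can be read from any row where the same item appears as the antecedent, and lift should equal confidence divided by that value. The sketch below uses only numbers copied from the tables; the variable and function names are hypothetical.

```python
# Support % of items, read from rows where the item is the antecedent.
item_support = {
    "whole milk": 25.310,
    "other vegetables": 19.318,
    "root vegetables": 10.832,
    "yogurt": 14.011,
}

# (consequent, antecedent, confidence %, lift as reported in the tables)
reported = [
    ("whole milk", "butter", 49.616, 1.960),
    ("whole milk", "other vegetables", 39.698, 1.568),
    ("other vegetables", "root vegetables", 44.280, 2.292),
    ("root vegetables", "beef", 33.243, 3.069),
    ("yogurt", "curd", 34.884, 2.490),
]

def lift(confidence_pct, consequent_support_pct):
    """Lift as confidence divided by the consequent's overall support."""
    return confidence_pct / consequent_support_pct

if __name__ == "__main__":
    for consequent, antecedent, conf, table_lift in reported:
        computed = lift(conf, item_support[consequent])
        print(f"{antecedent} -> {consequent}: "
              f"computed {computed:.3f}, reported {table_lift:.3f}")
```

If the computed and reported values agree to within rounding, a lift above 1 can be read directly as "the consequent is bought more often in baskets containing the antecedent than in baskets overall".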
References:
• Dataset download link (via the “arules” package): http://cran.r-project.org/web/packages/arules/index.html
• “Fast algorithms for mining association rules”, in Proceedings of the 20th International
Conference on Very Large Data Bases, pp. 487–499, by R. Agrawal and R. Srikant (1994).
• “Implications of probabilistic data modelling for mining association rules”, in Studies in
Classification, Data Analysis, and Knowledge Organization: From Data and Information Analysis
to Knowledge Engineering, pp. 598–605, by M. Hahsler, K. Hornik and T. Reutterer (2006).
• “Machine Learning with R”, by Brett Lantz, Packt Publishing.

Market basket analysis using apriori algorithm on

  • 1. Market Basket Analysis using Apriori algorithm on “Groceries” dataset Submitted By: MadhuKiran P C20-085 Sai Vinod P C20-131 Sesha Sai Harsha C20-142
  • 2. Contents Overview:................................................................................................................................................3 Apriori algorithm:....................................................................................................................................3 The data: .................................................................................................................................................4 Transformed data to dummy flag variables:...........................................................................................4 Program flow: .........................................................................................................................................5 Top 12 most frequent items: ..................................................................................................................5 Results: Top 12 rules by “support”: ........................................................................................................5 Results: Top 12 rules by “confidence”:...................................................................................................6 Results: Top 12 rules by “lift”: ................................................................................................................6 Web:........................................................................................................................................................7 Discussion: ..............................................................................................................................................7 References: .............................................................................................................................................7
  • 3. Overview:
      ◦ Identifies frequently purchased groceries from the given transactional data
      ◦ Uses the SPSS Modeler Apriori modelling node to calculate support, confidence and lift for association rules
      ◦ Lists the 12 most frequently bought items and the top 10 rules by support, confidence and lift
    Apriori algorithm:
      ◦ The Apriori algorithm uses a simple a priori belief as a guideline for reducing the association rule search space: all subsets of a frequent item-set must also be frequent
      ◦ The support of an item-set or rule measures how frequently it occurs in the data
      ◦ A rule's confidence measures its predictive power or accuracy. For a rule X → Y, it is the support of the item-set containing both X and Y divided by the support of the item-set containing X
      ◦ Lift measures how much more likely an item is to be purchased, relative to its typical purchase rate, given that another item has been purchased. For a rule X → Y, it is the confidence of the rule divided by the support of Y
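The definitions above can be sketched in a few lines of Python. This is only an illustration of the level-wise Apriori idea, not the SPSS Modeler implementation used in the slides: the function names (`support`, `frequent_itemsets`, `association_rules`), the five toy baskets and the thresholds are all assumptions made for the example. Note also that the sketch reports the support of the whole rule (X and Y together), which may differ from what a given tool labels "support".

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: returns {itemset: support} for all frequent item-sets."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidate 1-item sets
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        survivors = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= min_support:
                survivors[candidate] = s
        frequent.update(survivors)
        # Join survivors into (k+1)-item candidates, then prune every candidate
        # that has an infrequent k-item subset (the Apriori property).
        k += 1
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Rules X -> y with a single-item consequent, as (X, y, support, confidence, lift)."""
    rules = []
    for itemset, s_xy in frequent.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            x = itemset - {y}
            conf = s_xy / frequent[x]               # support(X and Y) / support(X)
            lift = conf / frequent[frozenset([y])]  # confidence / support(Y)
            if conf >= min_confidence:
                rules.append((x, y, s_xy, conf, lift))
    return rules


# Hypothetical toy baskets, for illustration only.
transactions = [frozenset(t) for t in (
    {"whole milk", "butter"},
    {"whole milk", "yogurt"},
    {"whole milk", "butter", "yogurt"},
    {"rolls/buns"},
    {"butter"},
)]

frequent = frequent_itemsets(transactions, min_support=0.4)
found_rules = association_rules(frequent, min_confidence=0.6)
for x, y, s, conf, lift in sorted(found_rules, key=lambda r: -r[4]):
    print(f"{sorted(x)} -> {y}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The pruning step is what the "all subsets of a frequent item-set must also be frequent" belief buys: the three-item candidate is discarded without counting it, because one of its two-item subsets is already known to be infrequent.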
  • 4. The data (sample rows, one transaction per line):
      citrus fruit, semi-finished bread, margarine, ready soups
      tropical fruit, yogurt, coffee
      whole milk
      pip fruit, yogurt, cream cheese, meat spreads
      other vegetables, whole milk, condensed milk, long life bakery product
      whole milk, butter, yogurt, rice, abrasive cleaner
      rolls/buns
      other vegetables, UHT milk, rolls/buns, bottled beer, liquor (appetizer)
      potted plants
      whole milk, cereals
      tropical fruit, other vegetables, white bread, bottled water, chocolate
      citrus fruit, tropical fruit, whole milk, butter, curd
      beef
      frankfurter, rolls/buns, soda
      ◦ The dataset was created by researchers at the Department of Information Systems and Operations, Wirtschaftsuniversität Wien, Austria
      ◦ The “Groceries” data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions, and the items are aggregated into 169 categories
      ◦ Item categories are used instead of brands, for simplicity, so “milk” can refer to any brand of milk
    Transformed data to dummy flag variables:
      Row  citrus fruit  tropical fruit  whole milk  pip fruit  other vegetables  rolls/buns  potted plants  beef
      1    1             0               0           0          0                 0           0              0
      2    0             1               0           0          0                 0           0              0
      3    0             0               1           0          0                 0           0              0
      4    0             0               0           1          0                 0           0              0
      5    0             0               1           0          1                 0           0              0
      6    0             0               1           0          0                 0           0              0
      7    0             0               0           0          0                 1           0              0
      8    0             0               0           0          1                 1           0              0
      9    0             0               0           0          0                 0           1              0
      10   0             0               1           0          0                 0           0              0
      11   0             1               0           0          1                 0           0              0
      12   1             1               1           0          0                 0           0              0
      13   0             0               0           0          0                 0           0              1
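The transformation to dummy flag variables can be sketched as follows. The helper name `to_flags` is hypothetical, the four baskets are the first four sample transactions as read from the slide, and only four of the 169 item columns are shown; the slides do not say how the conversion was actually performed.

```python
def to_flags(transactions, columns):
    """Return one row of 0/1 flags per transaction, one flag per column item."""
    rows = []
    for basket in transactions:
        present = set(basket)
        rows.append([1 if item in present else 0 for item in columns])
    return rows


# First four sample transactions, as read from the slide (assumed row boundaries).
transactions = [
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups"],
    ["tropical fruit", "yogurt", "coffee"],
    ["whole milk"],
    ["pip fruit", "yogurt", "cream cheese", "meat spreads"],
]

# A subset of the item columns, in the order used by the flag table.
columns = ["citrus fruit", "tropical fruit", "whole milk", "pip fruit"]

flags = to_flags(transactions, columns)
for row_id, row in enumerate(flags, start=1):
    print(row_id, *row)
```

Items that are not among the chosen columns (for example “yogurt” here) simply produce no flag, which is why a full conversion needs one column per item category.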
  • 5. Program flow:
      ◦ Convert the dataset to dummy flag variables
      ◦ Load the dataset into the SPSS Modeler environment
      ◦ Using the Data Audit node, confirm that the matrix has 169 columns (corresponding to 169 item categories) and 9835 rows (corresponding to 9835 transactions)
      ◦ Apply the Apriori modelling node with 5% minimum support and 30% minimum confidence to generate association rules together with their lift values
    Top 12 most frequent items:
      [Bar chart: Top 12 most frequent items. Bar values, in descending order: 2513, 1903, 1809, 1715, 1372, 1087, 1072, 1032, 969, 924, 875, 814; axis from 0 to 3000. Item labels are not preserved in the transcript.]
    Results: Top 12 rules by “support”:
      (Support % is the support of the antecedent; rules that share an antecedent show the same value.)
      Consequent        Antecedent        Support %  Confidence %  Lift
      other vegetables  whole milk        25.310     30.300        1.568
      whole milk        other vegetables  19.318     39.698        1.568
      whole milk        rolls/buns        18.443     31.542        1.246
      other vegetables  yogurt            14.011     32.570        1.686
      whole milk        yogurt            14.011     39.646        1.566
      whole milk        bottled water     11.270     30.789        1.216
      other vegetables  root vegetables   10.832     44.280        2.292
      whole milk        root vegetables   10.832     45.087        1.781
  • 6. Results: Top 12 rules by “support” (continued):
      Consequent        Antecedent      Support %  Confidence %  Lift
      other vegetables  tropical fruit  10.395     33.801        1.750
      whole milk        tropical fruit  10.395     39.130        1.546
    Results: Top 12 rules by “confidence”:
      Consequent        Antecedent             Support %  Confidence %  Lift
      whole milk        butter                 5.701      49.616        1.960
      whole milk        curd                   5.642      48.320        1.909
      whole milk        domestic eggs          6.459      47.856        1.891
      whole milk        root vegetables        10.832     45.087        1.781
      other vegetables  root vegetables        10.832     44.280        2.292
      whole milk        whipped/sour cream     7.333      43.936        1.736
      other vegetables  yogurt and whole milk  5.555      43.045        2.228
      whole milk        beef                   5.351      40.872        1.615
      whole milk        margarine              6.211      40.845        1.614
      other vegetables  whipped/sour cream     7.333      40.755        2.110
    Results: Top 12 rules by “lift”:
      Consequent        Antecedent                       Support %  Confidence %  Lift
      root vegetables   beef                             5.351      33.243        3.069
      root vegetables   other vegetables and whole milk  7.669      31.749        2.931
      yogurt            curd                             5.642      34.884        2.490
      other vegetables  root vegetables                  10.832     44.280        2.292
      other vegetables  yogurt and whole milk            5.555      43.045        2.228
      yogurt            other vegetables and whole milk  7.669      31.179        2.225
      other vegetables  whipped/sour cream               7.333      40.755        2.110
      other vegetables  pork                             5.846      38.155        1.975
      other vegetables  beef                             5.351      38.147        1.975
      whole milk        butter                           5.701      49.616        1.960
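As a quick consistency check on the reported figures, lift can be recomputed as confidence divided by the support of the consequent. The sketch below copies six rows from the results tables and assumes that the Support % column is the support of the antecedent, so that a single item's support can be read from any row where that item is the antecedent. The names `rules`, `item_support` and `recomputed_lift` are illustrative only.

```python
# (consequent, antecedent, support %, confidence %, reported lift), copied from the tables
rules = [
    ("other vegetables", "whole milk", 25.310, 30.300, 1.568),
    ("whole milk", "other vegetables", 19.318, 39.698, 1.568),
    ("whole milk", "yogurt", 14.011, 39.646, 1.566),
    ("other vegetables", "root vegetables", 10.832, 44.280, 2.292),
    ("root vegetables", "beef", 5.351, 33.243, 3.069),
    ("yogurt", "curd", 5.642, 34.884, 2.490),
]

# Support % of each single item, taken from the rows where it is the antecedent
# (assumption: the Support % column is antecedent support).
item_support = {antecedent: support for _, antecedent, support, _, _ in rules}


def recomputed_lift(consequent, confidence_pct):
    """Lift = confidence(X -> Y) / support(Y), both given in percent."""
    return confidence_pct / item_support[consequent]


for consequent, antecedent, _, confidence_pct, reported in rules:
    lift = recomputed_lift(consequent, confidence_pct)
    print(f"{antecedent} -> {consequent}: reported {reported:.3f}, recomputed {lift:.3f}")
```

For these rows the recomputed values agree with the reported lift to three decimal places, which supports reading the Support % column as antecedent support.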
  • 7. Web:
      ➔ We can observe that customers who buy pastry, citrus fruit and sausage stand out as a group
      ➔ This means, for example, that a customer who buys one of these three products is more likely to buy the others
    Discussion:
      ◦ The top rules when sorted by “support” and “confidence” are dominated by “whole milk” and “other vegetables”, which are the two most frequently bought items overall
      ◦ However, when sorted by “lift”, some of the top rules involve neither “whole milk” nor “other vegetables”, for example beef → root vegetables and curd → yogurt. A lift value greater than 1 implies that the LHS and RHS sets are found together more often than would be expected by chance
      ◦ Although such market basket analysis may yield many rules, not all of them are useful. Some are trivial, some inexplicable, and only a few are actionable. Further analysis, domain knowledge and common sense are often required to judge the real-world usefulness of the rules
    References:
      ◦ Dataset download link (via the “arules” package): http://cran.r-project.org/web/packages/arules/index.html
      ◦ R. Agrawal and R. Srikant (1994), “Fast algorithms for mining association rules”, in Proceedings of the 20th International Conference on Very Large Databases, pp. 487-499.
      ◦ M. Hahsler, K. Hornik and T. Reutterer (2006), “Implications of probabilistic data modelling for mining association rules”, in Studies in Classification, Data Analysis, and Knowledge Organization: from Data and Information Analysis to Knowledge Engineering, pp. 598–605.
      ◦ Brett Lantz, “Machine Learning with R”, Packt Publishing.