Market Basket Analysis
using Apriori algorithm
on “Groceries” dataset
Submitted By:
MadhuKiran P C20-085
Sai Vinod P C20-131
Sesha Sai Harsha C20-142
Contents
Overview
Apriori algorithm
The data
Transformed data to dummy flag variables
Program flow
Top 12 most frequent items
Results: Top 10 rules by “support”
Results: Top 10 rules by “confidence”
Results: Top 10 rules by “lift”
Web
Discussion
References
Overview:
• Identifies frequently purchased groceries from the given transactional data
• Uses the SPSS Modeler Apriori modelling node to calculate support, confidence and lift for
association rules
• Lists the top 12 most frequently bought items and the top 10 rules by support, confidence and lift
Apriori algorithm:
• The Apriori algorithm uses a simple a priori belief as a guideline for reducing the association rule
search space: all subsets of a frequent item-set must also be frequent
• The support of an item-set or rule measures how frequently it occurs in the data
• A rule's confidence measures its predictive power or accuracy. For a rule X → Y, it is defined as the
support of the item-set containing both X and Y divided by the support of the item-set
containing only X
• Lift measures how much more likely an item is to be purchased, given that another item has been
purchased, relative to its typical purchase rate. For a rule X → Y, it is the confidence of the rule
divided by the support of Y
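The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS Modeler implementation: the five baskets are made-up toy data, and the function names `support`, `confidence` and `lift` are chosen here for readability.

```python
from typing import FrozenSet, Iterable, List

# Toy baskets for illustration only (assumption: not taken from the Groceries data).
baskets: List[FrozenSet[str]] = [
    frozenset({"whole milk", "other vegetables"}),
    frozenset({"whole milk"}),
    frozenset({"other vegetables", "rolls/buns"}),
    frozenset({"whole milk", "other vegetables", "yogurt"}),
    frozenset({"rolls/buns"}),
]


def support(itemset: Iterable[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in data if items <= basket) / len(data)


def confidence(x: Iterable[str], y: Iterable[str], data: List[FrozenSet[str]]) -> float:
    """support(X and Y) / support(X) for the rule X -> Y."""
    return support(set(x) | set(y), data) / support(x, data)


def lift(x: Iterable[str], y: Iterable[str], data: List[FrozenSet[str]]) -> float:
    """confidence(X -> Y) / support(Y); values above 1 suggest a positive association."""
    return confidence(x, y, data) / support(y, data)


x, y = {"whole milk"}, {"other vegetables"}
print(support(x | y, baskets), confidence(x, y, baskets), lift(x, y, baskets))
```

In this toy data whole milk and other vegetables each appear in 3 of 5 baskets and together in 2 of 5, so the rule has support 0.4, confidence of about 0.67 and lift of about 1.11.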
The data:
Sample transactions (one per line, items separated by commas):
1   citrus fruit, semi-finished bread, margarine, ready soups
2   tropical fruit, yogurt, coffee
3   whole milk
4   pip fruit, yogurt, cream cheese, meat spreads
5   other vegetables, whole milk, condensed milk, long life bakery product
6   whole milk, butter, yogurt, rice, abrasive cleaner
7   rolls/buns
8   other vegetables, UHT milk, rolls/buns, bottled beer, liquor (appetizer)
9   potted plants
10  whole milk, cereals
11  tropical fruit, other vegetables, white bread, bottled water, chocolate
12  citrus fruit, tropical fruit, whole milk, butter, curd
13  beef
14  frankfurter, rolls/buns, soda
• The dataset was created by researchers at the Department of Information Systems and
Operations, Wirtschaftsuniversität Wien, Austria
• The “Groceries” data set contains 1 month (30 days) of real-world point-of-sale transaction data
from a typical local grocery outlet. It contains 9835 transactions, with the items
aggregated into 169 categories
• Item categories are used instead of brands, for simplicity, so “milk” can refer to any
brand of milk
Transformed data to dummy flag variables:
Row  citrus fruit  tropical fruit  whole milk  pip fruit  other vegetables  rolls/buns  potted plants  beef
1    1             0               0           0          0                 0           0              0
2    0             1               0           0          0                 0           0              0
3    0             0               1           0          0                 0           0              0
4    0             0               0           1          0                 0           0              0
5    0             0               1           0          1                 0           0              0
6    0             0               1           0          0                 0           0              0
7    0             0               0           0          0                 1           0              0
8    0             0               0           0          1                 1           0              0
9    0             0               0           0          0                 0           1              0
10   0             0               1           0          0                 0           0              0
11   0             1               0           0          1                 0           0              0
12   1             1               1           0          0                 0           0              0
13   0             0               0           0          0                 0           0              1
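The transformation from item lists to flag columns can be sketched as follows. This is an illustrative Python version under stated assumptions: the report does not say which tool performed the conversion, and the helper name `to_flags` is hypothetical. The first five sample transactions and the eight column names are the ones shown in this report.

```python
from typing import Dict, List, Sequence

# Column order as in the flag table shown in the report.
COLUMNS: List[str] = [
    "citrus fruit", "tropical fruit", "whole milk", "pip fruit",
    "other vegetables", "rolls/buns", "potted plants", "beef",
]

# First five sample transactions from the report.
sample: List[List[str]] = [
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups"],
    ["tropical fruit", "yogurt", "coffee"],
    ["whole milk"],
    ["pip fruit", "yogurt", "cream cheese", "meat spreads"],
    ["other vegetables", "whole milk", "condensed milk", "long life bakery product"],
]


def to_flags(transactions: Sequence[Sequence[str]],
             columns: Sequence[str]) -> List[Dict[str, int]]:
    """Return one row per transaction with a 0/1 flag for each column item."""
    return [
        {item: int(item in set(basket)) for item in columns}
        for basket in transactions
    ]


flags = to_flags(sample, COLUMNS)
for row_number, row in enumerate(flags, start=1):
    print(row_number, *row.values())
```

Items outside the chosen columns (for example "margarine") simply produce no flag here; the full matrix in the report has one column for each of the 169 categories.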
Program flow:
• Convert the dataset to dummy flag variables
• Load the dataset into the SPSS Modeler environment
• Inspect the data with the Data Audit node: the matrix has 169 columns (one per item category) and
9835 rows (one per transaction)
• Apply the Apriori modelling node with a minimum support of 5% and a minimum confidence of 30% to
generate association rules, reporting support, confidence and lift for each rule
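The rule-generation step can be sketched outside SPSS Modeler as a small level-wise Apriori search. This is a simplified illustration, not the Modeler node: the function names `frequent_itemsets` and `association_rules` are hypothetical, the baskets are toy data, and the thresholds (40% support, 30% confidence) are chosen to suit five baskets rather than the 5% used in the report. The sketch thresholds the support of the whole item-set; the Modeler node may define its support threshold differently.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(baskets: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search: k-item candidates are built only from frequent (k-1)-item-sets."""
    n = len(baskets)

    def keep_frequent(candidates):
        scored = {c: sum(1 for b in baskets if c <= b) / n for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    level = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent sets whose union has exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        pruned = {c for c in joined
                  if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = keep_frequent(pruned)
    return frequent


def association_rules(frequent: Dict[Itemset, float],
                      min_confidence: float) -> List[Tuple[Itemset, str, float, float, float]]:
    """Rules with a single-item consequent: (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = supp / frequent[antecedent]  # present because subsets of frequent sets are frequent
            if conf >= min_confidence:
                rules.append((antecedent, consequent, supp, conf,
                              conf / frequent[frozenset([consequent])]))
    return rules


toy = [
    frozenset({"whole milk", "other vegetables"}),
    frozenset({"whole milk"}),
    frozenset({"other vegetables", "rolls/buns"}),
    frozenset({"whole milk", "other vegetables", "yogurt"}),
    frozenset({"rolls/buns"}),
]
freq = frequent_itemsets(toy, min_support=0.4)
found = association_rules(freq, min_confidence=0.3)
for antecedent, consequent, supp, conf, lift_value in found:
    print(sorted(antecedent), "->", consequent, round(supp, 3), round(conf, 3), round(lift_value, 3))
```

The prune step is where the "all subsets of a frequent item-set must also be frequent" belief pays off: candidates with an infrequent subset are discarded before their support is ever counted.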
Top 12 most frequent items:
[Figure: bar chart “Top 12 most frequent items”; bar values in descending order: 2513, 1903, 1809, 1715, 1372, 1087, 1072, 1032, 969, 924, 875, 814; vertical axis from 0 to 3000]
Results: Top 10 rules by “support”:
Consequent        Antecedent        Support %  Confidence %  Lift
other vegetables  whole milk        25.310     30.300        1.568
whole milk        other vegetables  19.318     39.698        1.568
whole milk        rolls/buns        18.443     31.542        1.246
other vegetables  yogurt            14.011     32.570        1.686
whole milk        yogurt            14.011     39.646        1.566
whole milk        bottled water     11.270     30.789        1.216
other vegetables  root vegetables   10.832     44.280        2.292
whole milk        root vegetables   10.832     45.087        1.781
other vegetables  tropical fruit    10.395     33.801        1.750
whole milk        tropical fruit    10.395     39.130        1.546
Results: Top 10 rules by “confidence”:
Consequent        Antecedent             Support %  Confidence %  Lift
whole milk        butter                 5.701      49.616        1.960
whole milk        curd                   5.642      48.320        1.909
whole milk        domestic eggs          6.459      47.856        1.891
whole milk        root vegetables        10.832     45.087        1.781
other vegetables  root vegetables        10.832     44.280        2.292
whole milk        whipped/sour cream     7.333      43.936        1.736
other vegetables  yogurt and whole milk  5.555      43.045        2.228
whole milk        beef                   5.351      40.872        1.615
whole milk        margarine              6.211      40.845        1.614
other vegetables  whipped/sour cream     7.333      40.755        2.110
Results: Top 10 rules by “lift”:
Consequent        Antecedent                       Support %  Confidence %  Lift
root vegetables   beef                             5.351      33.243        3.069
root vegetables   other vegetables and whole milk  7.669      31.749        2.931
yogurt            curd                             5.642      34.884        2.490
other vegetables  root vegetables                  10.832     44.280        2.292
other vegetables  yogurt and whole milk            5.555      43.045        2.228
yogurt            other vegetables and whole milk  7.669      31.179        2.225
other vegetables  whipped/sour cream               7.333      40.755        2.110
other vegetables  pork                             5.846      38.155        1.975
other vegetables  beef                             5.351      38.147        1.975
whole milk        butter                           5.701      49.616        1.960
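As a sanity check, the reported lift values can be recomputed from the other columns. The sketch below uses only numbers from the results tables and assumes that the “Support %” column is the support of the antecedent; the report does not state this, but the figures appear consistent with that reading. The variable names are illustrative.

```python
# Percentages copied from the results tables in this report.
support_whole_milk = 25.310        # "Support %" where whole milk is the antecedent
support_other_vegetables = 19.318  # "Support %" where other vegetables is the antecedent
conf_milk_to_veg = 30.300          # whole milk -> other vegetables
conf_veg_to_milk = 39.698          # other vegetables -> whole milk

# lift(X -> Y) = confidence(X -> Y) / support(Y)
lift_milk_to_veg = conf_milk_to_veg / support_other_vegetables
lift_veg_to_milk = conf_veg_to_milk / support_whole_milk

# support(X and Y) = support(X) * confidence(X -> Y), with both given in percent
joint_support = support_whole_milk * conf_milk_to_veg / 100

print(round(lift_milk_to_veg, 3), round(lift_veg_to_milk, 3), round(joint_support, 3))
# prints 1.568 1.568 7.669
```

Both directions give the reported lift of 1.568, as expected since lift is symmetric. The joint support of 7.669% matches the “Support %” listed for the antecedent “other vegetables and whole milk” in the lift table, which is what suggests the column holds antecedent support.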
Web:
➔ We can observe that customers who buy pastry, citrus fruit and sausage stand out as a group
➔ This means (here, for example) that a customer who buys one of these three products is more
likely to buy the other two
Discussion:
• The top rules when sorted by “support” and “confidence” are dominated by
“whole milk” and “other vegetables”, which are the two most frequently bought items overall
• However, when sorted by “lift”, rules that involve neither “whole milk” nor “other
vegetables” appear, such as beef → root vegetables and curd → yogurt. A lift value greater than 1
implies that the LHS and RHS item-sets occur together more often than would be expected by chance
• Although market basket analysis may yield many rules, not all of them are useful.
Some are trivial, some inexplicable, and only a few are actionable. Further
analysis, domain knowledge and common sense are often required to judge
the real-world usefulness of the rules
References:
• Dataset download link (via the “arules” package): http://cran.r-project.org/web/packages/arules/index.html
• R. Agrawal and R. Srikant (1994), “Fast algorithms for mining association rules”, in Proceedings of the
20th International Conference on Very Large Data Bases, pp. 487-499.
• M. Hahsler, K. Hornik and T. Reutterer (2006), “Implications of probabilistic data modelling for mining
association rules”, in Studies in Classification, Data Analysis, and Knowledge Organization: From Data
and Information Analysis to Knowledge Engineering, pp. 598-605.
• Brett Lantz, “Machine Learning with R”, Packt Publishing.

More Related Content

What's hot

KNN Algorithm Using R | Edureka
KNN Algorithm Using R | EdurekaKNN Algorithm Using R | Edureka
KNN Algorithm Using R | EdurekaEdureka!
 
Data mining in market basket analysis
Data mining in market basket analysisData mining in market basket analysis
Data mining in market basket analysisTanmayeeMandala
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data MiningKamal Acharya
 
Association Rule Mining in Data Mining
Association Rule Mining in Data Mining Association Rule Mining in Data Mining
Association Rule Mining in Data Mining Ayesha Ali
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5NIMMYRAJU
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningUtkarsh Sharma
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule miningDeepa Jeya
 
K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERINGsingh7599
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large DatabaseEr. Nawaraj Bhandari
 
Data Mining: Concepts and techniques: Chapter 13 trend
Data Mining: Concepts and techniques: Chapter 13 trendData Mining: Concepts and techniques: Chapter 13 trend
Data Mining: Concepts and techniques: Chapter 13 trendSalah Amean
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxnikshaikh786
 

What's hot (20)

KNN Algorithm Using R | Edureka
KNN Algorithm Using R | EdurekaKNN Algorithm Using R | Edureka
KNN Algorithm Using R | Edureka
 
Data mining in market basket analysis
Data mining in market basket analysisData mining in market basket analysis
Data mining in market basket analysis
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Decision tree
Decision treeDecision tree
Decision tree
 
Association Rule Mining in Data Mining
Association Rule Mining in Data Mining Association Rule Mining in Data Mining
Association Rule Mining in Data Mining
 
Association rules apriori algorithm
Association rules   apriori algorithmAssociation rules   apriori algorithm
Association rules apriori algorithm
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Meat Analogue
Meat AnalogueMeat Analogue
Meat Analogue
 
Association rules
Association rulesAssociation rules
Association rules
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERING
 
Clustering
ClusteringClustering
Clustering
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large Database
 
Data Mining: Concepts and techniques: Chapter 13 trend
Data Mining: Concepts and techniques: Chapter 13 trendData Mining: Concepts and techniques: Chapter 13 trend
Data Mining: Concepts and techniques: Chapter 13 trend
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 

Similar to Market basket analysis using apriori algorithm on

Association Mining
Association Mining Association Mining
Association Mining Edureka!
 
2015 06-24 precision dairy farming
2015 06-24 precision dairy farming2015 06-24 precision dairy farming
2015 06-24 precision dairy farmingHenk Hogeveen
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...Smarten Augmented Analytics
 
Fact Sheet Find a contact near you by visiting www.gew.docx
Fact Sheet Find a contact near you by visiting www.gew.docxFact Sheet Find a contact near you by visiting www.gew.docx
Fact Sheet Find a contact near you by visiting www.gew.docxmydrynan
 
Final Year Capstone Project
Final Year Capstone ProjectFinal Year Capstone Project
Final Year Capstone ProjectOmar Ziena
 
Ifc handbook agro_supplychains
Ifc handbook agro_supplychainsIfc handbook agro_supplychains
Ifc handbook agro_supplychainsDr Lendy Spires
 
The Internet of Food and Farm
The Internet of Food and FarmThe Internet of Food and Farm
The Internet of Food and FarmSjaak Wolfert
 
Organic skim milk market
Organic skim milk marketOrganic skim milk market
Organic skim milk marketCillianMurphy7
 
Nationwide Interoperability Roadmap draft version 1.0
Nationwide Interoperability Roadmap draft version 1.0Nationwide Interoperability Roadmap draft version 1.0
Nationwide Interoperability Roadmap draft version 1.0Ed Dodds
 
Health care interoperability roadmap released by HHS ONC
Health care interoperability roadmap released by HHS ONCHealth care interoperability roadmap released by HHS ONC
Health care interoperability roadmap released by HHS ONCDavid Sweigert
 
EHCI Euro Health Consumer Index 2014
EHCI Euro Health Consumer Index 2014EHCI Euro Health Consumer Index 2014
EHCI Euro Health Consumer Index 2014Emergency Live
 
How Can We Use Big Data in the Food Supply Chain
How Can We Use Big Data in the Food Supply Chain How Can We Use Big Data in the Food Supply Chain
How Can We Use Big Data in the Food Supply Chain EtQ, Inc.
 
Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...
Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...
Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...Dr Dev Kambhampati
 
Control points for good manufacturing practices on dairy farms
Control points for good manufacturing practices on dairy farmsControl points for good manufacturing practices on dairy farms
Control points for good manufacturing practices on dairy farmsEng. A.karam Al Malkawi
 
báo cáo xuất khẩu điều
báo cáo xuất khẩu điềubáo cáo xuất khẩu điều
báo cáo xuất khẩu điềujackiela
 

Similar to Market basket analysis using apriori algorithm on (20)

Association Mining
Association Mining Association Mining
Association Mining
 
2015 06-24 precision dairy farming
2015 06-24 precision dairy farming2015 06-24 precision dairy farming
2015 06-24 precision dairy farming
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
 
Fact Sheet Find a contact near you by visiting www.gew.docx
Fact Sheet Find a contact near you by visiting www.gew.docxFact Sheet Find a contact near you by visiting www.gew.docx
Fact Sheet Find a contact near you by visiting www.gew.docx
 
Final Year Capstone Project
Final Year Capstone ProjectFinal Year Capstone Project
Final Year Capstone Project
 
Ifc handbook agro_supplychains
Ifc handbook agro_supplychainsIfc handbook agro_supplychains
Ifc handbook agro_supplychains
 
The Internet of Food and Farm
The Internet of Food and FarmThe Internet of Food and Farm
The Internet of Food and Farm
 
Organic skim milk market
Organic skim milk marketOrganic skim milk market
Organic skim milk market
 
Nationwide Interoperability Roadmap draft version 1.0
Nationwide Interoperability Roadmap draft version 1.0Nationwide Interoperability Roadmap draft version 1.0
Nationwide Interoperability Roadmap draft version 1.0
 
Health care interoperability roadmap released by HHS ONC
Health care interoperability roadmap released by HHS ONCHealth care interoperability roadmap released by HHS ONC
Health care interoperability roadmap released by HHS ONC
 
Dairt report
Dairt reportDairt report
Dairt report
 
View Online Catalogs Of Dairy Equipments | NETCO | www.milkanalyser.com
View Online Catalogs Of Dairy Equipments | NETCO | www.milkanalyser.comView Online Catalogs Of Dairy Equipments | NETCO | www.milkanalyser.com
View Online Catalogs Of Dairy Equipments | NETCO | www.milkanalyser.com
 
EHCI Euro Health Consumer Index 2014
EHCI Euro Health Consumer Index 2014EHCI Euro Health Consumer Index 2014
EHCI Euro Health Consumer Index 2014
 
Dairy productioncostinpakistan2019
Dairy productioncostinpakistan2019Dairy productioncostinpakistan2019
Dairy productioncostinpakistan2019
 
BCP
BCPBCP
BCP
 
Data mining arm-2009-v0
Data mining arm-2009-v0Data mining arm-2009-v0
Data mining arm-2009-v0
 
How Can We Use Big Data in the Food Supply Chain
How Can We Use Big Data in the Food Supply Chain How Can We Use Big Data in the Food Supply Chain
How Can We Use Big Data in the Food Supply Chain
 
Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...
Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...
Dr Dev Kambhampati | USAID- Livestock Market Development- End Market Analysis...
 
Control points for good manufacturing practices on dairy farms
Control points for good manufacturing practices on dairy farmsControl points for good manufacturing practices on dairy farms
Control points for good manufacturing practices on dairy farms
 
báo cáo xuất khẩu điều
báo cáo xuất khẩu điềubáo cáo xuất khẩu điều
báo cáo xuất khẩu điều
 

Recently uploaded

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 

Market basket analysis using apriori algorithm on

  • 1. Market Basket Analysis using Apriori algorithm on “Groceries” dataset Submitted By: MadhuKiran P C20-085 Sai Vinod P C20-131 Sesha Sai Harsha C20-142
  • 2. Contents Overview:................................................................................................................................................3 Apriori algorithm:....................................................................................................................................3 The data: .................................................................................................................................................4 Transformed data to dummy flag variables:...........................................................................................4 Program flow: .........................................................................................................................................5 Top 12 most frequent items: ..................................................................................................................5 Results: Top 12 rules by “support”: ........................................................................................................5 Results: Top 12 rules by “confidence”:...................................................................................................6 Results: Top 12 rules by “lift”: ................................................................................................................6 Web:........................................................................................................................................................7 Discussion: ..............................................................................................................................................7 References: .............................................................................................................................................7
  • 3. Overview:  Identifies frequently purchased groceries from given transactional data  Implemented SPSS Modeler A-priori modelling node to calculate support, confidence and lift for association rules  Listed top 12 frequent bought items, top 10 combinations by support, confidence and lift values. Apriori algorithm:  Apriori algorithm employs a simple a priori belief as guideline for reducing the association rule search space: all subsets of a frequent item-set must also be frequent  The support of an item-set or rule measures how frequently it occurs in the data  A rule's confidence is a measurement of its predictive power or accuracy. It is defined as the support of the item-set containing both X and Y divided by the support of the item-set containing only X  Lift is a measure of how much more likely one item is to be purchased relative to its typical purchase rate, given that you know another item has been purchased
  • 4. The data: citrus fruit semi-finished bread margarine ready soups tropical fruit yogurt coffee whole milk pip fruit yogurt cream cheese meat spreads other vegetables whole milk condensed milk long life bakery product whole milk butter yogurt rice abrasive cleaner rolls/buns other vegetables UHT milk rolls/buns bottled beer liquor (appetizer) potted plants whole milk cereals tropical fruit other vegetables white bread bottled water chocolate citrus fruit tropical fruit whole milk butter curd beef frankfurter rolls/buns soda  The dataset has been created by researchers Department of Information Systems and Operations, Wirtschaftsuniversitat Wien, Austria  The “Groceries” data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories  Item categories have been used instead of brands, for simplicity. So “milk” can refer to any brand of milk. Transformed data to dummy flag variables: citrus fruit tropical fruit whole milk pip fruit other vegetables rolls/buns potted plants beef 1 1 0 0 0 0 0 0 0 2 0 1 0 0 0 0 0 0 3 0 0 1 0 0 0 0 0 4 0 0 0 1 0 0 0 0 5 0 0 1 0 1 0 0 0 6 0 0 1 0 0 0 0 0 7 0 0 0 0 0 1 0 0 8 0 0 0 0 1 1 0 0 9 0 0 0 0 0 0 1 0 10 0 0 1 0 0 0 0 0 11 0 1 0 0 1 0 0 0 12 1 1 1 0 0 0 0 0 13 0 0 0 0 0 0 0 1
  • 5. Program flow:
      - Convert the dataset to dummy flag variables
      - Load the dataset into the SPSS environment
      - Check the matrix with the data audit node: it has 169 columns (one per item category) and 9835 rows (one per transaction)
      - Apply the Apriori modelling node with thresholds of 5% support and 30% confidence to generate association rules, each reported with its lift
    Top 12 most frequent items:
      [Bar chart: Top 12 most frequent items; item counts in descending order: 2513, 1903, 1809, 1715, 1372, 1087, 1072, 1032, 969, 924, 875, 814; count axis from 0 to 3000]
    Results: Top 12 rules by “support”:
      Consequent        Antecedent        Support %  Confidence %  Lift
      other vegetables  whole milk        25.310     30.300        1.568
      whole milk        other vegetables  19.318     39.698        1.568
      whole milk        rolls/buns        18.443     31.542        1.246
      other vegetables  yogurt            14.011     32.570        1.686
      whole milk        yogurt            14.011     39.646        1.566
      whole milk        bottled water     11.270     30.789        1.216
      other vegetables  root vegetables   10.832     44.280        2.292
      whole milk        root vegetables   10.832     45.087        1.781
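For readers without SPSS Modeler, the rule-generation step of the program flow can be approximated with a small level-wise Apriori search. This is a sketch under stated assumptions, not the modelling node's actual algorithm or output: the function names, the toy baskets and the thresholds are hypothetical, the support threshold is applied to the whole item-set, and only single-item consequents are generated (as in the result tables).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent item-sets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates of size k that meet the support threshold.
        level = {}
        for c in candidates:
            s = sum(1 for b in baskets if c <= b) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-item-sets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the a priori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Build rules with a single-item consequent, sorted by lift (descending)."""
    rules = []
    for itemset, rule_support in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            consequent = frozenset([item])
            antecedent = itemset - consequent
            conf = rule_support / frequent[antecedent]
            if conf >= min_confidence:
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "support": rule_support,
                    "confidence": conf,
                    "lift": conf / frequent[consequent],
                })
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Hypothetical toy baskets and thresholds, chosen only to keep the example small.
BASKETS = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
]
FOUND = frequent_itemsets(BASKETS, min_support=0.3)
RULES = association_rules(FOUND, min_confidence=0.6)
for rule in RULES:
    print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
          round(rule["confidence"], 3), round(rule["lift"], 3))
```

On the real data the same two functions would be called with `min_support=0.05` and `min_confidence=0.30`, although the figures need not match SPSS Modeler exactly because the node may define its support threshold differently.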
  • 6. Results: Top 12 rules by “support” (continued):
      Consequent        Antecedent        Support %  Confidence %  Lift
      other vegetables  tropical fruit    10.395     33.801        1.750
      whole milk        tropical fruit    10.395     39.130        1.546
    Results: Top 12 rules by “confidence”:
      Consequent        Antecedent             Support %  Confidence %  Lift
      whole milk        butter                 5.701      49.616        1.960
      whole milk        curd                   5.642      48.320        1.909
      whole milk        domestic eggs          6.459      47.856        1.891
      whole milk        root vegetables        10.832     45.087        1.781
      other vegetables  root vegetables        10.832     44.280        2.292
      whole milk        whipped/sour cream     7.333      43.936        1.736
      other vegetables  yogurt and whole milk  5.555      43.045        2.228
      whole milk        beef                   5.351      40.872        1.615
      whole milk        margarine              6.211      40.845        1.614
      other vegetables  whipped/sour cream     7.333      40.755        2.110
    Results: Top 12 rules by “lift”:
      Consequent        Antecedent                       Support %  Confidence %  Lift
      root vegetables   beef                             5.351      33.243        3.069
      root vegetables   other vegetables and whole milk  7.669      31.749        2.931
      yogurt            curd                             5.642      34.884        2.490
      other vegetables  root vegetables                  10.832     44.280        2.292
      other vegetables  yogurt and whole milk            5.555      43.045        2.228
      yogurt            other vegetables and whole milk  7.669      31.179        2.225
      other vegetables  whipped/sour cream               7.333      40.755        2.110
      other vegetables  pork                             5.846      38.155        1.975
      other vegetables  beef                             5.351      38.147        1.975
      whole milk        butter                           5.701      49.616        1.960
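The reported lift values can be cross-checked against the other columns, since lift is the rule's confidence divided by the support of its consequent. The sketch below assumes that the "Support %" column gives the support of the antecedent alone (the slides do not say so explicitly, but the figures behave that way), so an item's own support can be read from any row where it appears as the antecedent. The variable and function names are hypothetical.

```python
# Item support in percent, read from rows where the item is the antecedent.
# Assumption: "Support %" in the tables is antecedent support.
ITEM_SUPPORT = {
    "whole milk": 25.310,
    "other vegetables": 19.318,
    "yogurt": 14.011,
    "root vegetables": 10.832,
}

# (consequent, antecedent, confidence %, reported lift), copied from the tables.
REPORTED = [
    ("other vegetables", "whole milk", 30.300, 1.568),
    ("whole milk", "other vegetables", 39.698, 1.568),
    ("other vegetables", "root vegetables", 44.280, 2.292),
    ("whole milk", "butter", 49.616, 1.960),
    ("yogurt", "curd", 34.884, 2.490),
    ("root vegetables", "beef", 33.243, 3.069),
]


def implied_lift(consequent, confidence_pct):
    """Lift recomputed as confidence / support(consequent), both in percent."""
    return confidence_pct / ITEM_SUPPORT[consequent]


# Largest gap between recomputed and reported lift over the sampled rules.
MAX_GAP = max(
    abs(implied_lift(consequent, conf) - lift)
    for consequent, _antecedent, conf, lift in REPORTED
)

for consequent, antecedent, conf, lift in REPORTED:
    print(f"{antecedent} -> {consequent}: "
          f"recomputed {implied_lift(consequent, conf):.3f}, reported {lift:.3f}")
```

For these six rows the recomputed values agree with the reported lift to within rounding, which supports reading "Support %" as antecedent support.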
  • 7. Web:
      ➔ Customers who buy pastry, citrus fruit and sausage stand out as a group
      ➔ This means that (here, for example) a customer who buys one of these three products is more likely to buy the others as well
    Discussion:
      - The top rules when sorted by “support” and “confidence” are dominated by “whole milk” and “other vegetables”, which are the two most frequently bought items overall
      - However, when sorted by “lift”, rules appear that involve neither “whole milk” nor “other vegetables”, such as beef → root vegetables and curd → yogurt. A lift value greater than 1 implies that the LHS and RHS item-sets occur together more often than would be expected purely by chance
      - Although such market basket analysis may yield many rules, not all of them are useful. Some are trivial, some inexplicable, and only a very few are actionable. Further analysis, extra domain knowledge and common sense are often required to judge the real-world usefulness of the rules
    References:
      - Dataset download link (via the “arules” package): http://cran.r-project.org/web/packages/arules/index.html
      - R. Agrawal and R. Srikant (1994), “Fast algorithms for mining association rules”, in Proceedings of the 20th International Conference on Very Large Databases, pp. 487-499.
      - M. Hahsler, K. Hornik and T. Reutterer (2006), “Implications of probabilistic data modelling for mining association rules”, in Studies in Classification, Data Analysis, and Knowledge Organization: from Data and Information Analysis to Knowledge Engineering, pp. 598-605.
      - Brett Lantz, “Machine Learning with R”, Packt Publishing.