MARKET BASKET ANALYSIS
LEARNING OBJECTIVES:
• EXPLAIN WHAT ASSOCIATION RULES AND ITEM SETS ARE.
• DESCRIBE THE BASIC PROCESS FOR MARKET BASKET ANALYSIS.
• KNOW WHEN TO APPLY MARKET BASKET ANALYSIS.
• UNDERSTAND THE STRENGTHS AND WEAKNESSES OF MARKET BASKET ANALYSIS.
By Obakeng Brian Pheelwane & Marc Berman – Group 14
ASSOCIATION RULES
• Association discovery finds items that imply the presence of other items in the same transaction.
• Association rules are in the form:
If <Left Hand Side (LHS)> then <Right Hand Side (RHS)>
• To indicate the validity and importance of a rule, each rule has two parameters:
• Support Factor
• Confidence Factor.
• RHS usually has one item; LHS has one or more items.
EXAMPLES OF ASSOCIATION RULES
In a database of transactions involving two items (X, Y) in a department store, an example association rule is:
If X then Y.
If a customer buys X, then in 70% of those cases he/she also buys Y. X and Y are bought together in
20% of all purchases made at the department store.
Therefore, the rule has a 70% confidence factor and a 20% support factor.
CONFIDENCE AND SUPPORT FACTORS
• Assume we have an association rule indicated as LHS -> RHS.
• T is the total number of cases in the database.
• X is the number of cases covered by the LHS.
• Y is the number of cases covered by the RHS.
• XY is the number of cases covered by both the LHS and the RHS,
indicated by the overlapping area in Figure 1.
Figure 1: Confidence and support factors visualised
Figure 2: Confidence and support factor formula
• The confidence factor is the number of cases covered by both the LHS
and the RHS, divided by the number of cases covered by the LHS
(XY / X).
• The support factor is the number of cases covered by both the LHS
and the RHS, divided by the total number of cases in the database
(XY / T).
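The two definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `support_and_confidence` and the sample baskets are hypothetical and do not come from the slides.

```python
def support_and_confidence(transactions, lhs, rhs):
    """Return (support, confidence) for the rule LHS -> RHS."""
    lhs, rhs = set(lhs), set(rhs)
    t = len(transactions)                                  # T: all cases
    x = sum(1 for tr in transactions if lhs <= set(tr))    # X: cases covered by the LHS
    xy = sum(1 for tr in transactions if (lhs | rhs) <= set(tr))  # XY: covered by both
    support = xy / t if t else 0.0
    confidence = xy / x if x else 0.0
    return support, confidence


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"eggs"},
]
support, confidence = support_and_confidence(baskets, {"bread"}, {"milk"})
print(support, confidence)
```

Here T = 5, X = 3 and XY = 2, so the rule "If bread then milk" has a support factor of 2/5 and a confidence factor of 2/3.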
THE BASIC PROCESS OF MARKET BASKET ANALYSIS
1. Choosing the right item set.
• The objective is to define a set of items. When association rules are formed among these items, some
of them provide a meaningful interpretation and may turn out to be useful.
• Several methods for generating the right item sets:
• Use a taxonomy to get the right level, ranging from general items to specific items (see Figure 3)
• Use virtual items (see Figures 4 & 5)
• A combination of both
• The taxonomy and virtual items (prepared by the users or a domain expert) help users choose the
right item set while exploring the data for useful rules.
Figure 3: Taxonomies start with the most general and move to increasing detail.
Figure 4: This is an example of a poor choice of virtual items since the rules are likely to be
redundant.
The problem is that the rules are just repeats of the definition.
Figure 5: This is an example of a good choice of virtual items, though one must be careful
not to totally encompass the items used for analysis, as this would create redundancy
again.
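Choosing the level of detail with a taxonomy can be sketched as a simple lookup that rolls specific items up to a more general category. The taxonomy, the item names and the `generalise` helper below are all hypothetical; the slides do not prescribe a particular data structure.

```python
# Hypothetical taxonomy: specific item -> more general category.
TAXONOMY = {
    "skim milk": "milk",
    "whole milk": "milk",
    "rye bread": "bread",
    "white bread": "bread",
}


def generalise(transaction, taxonomy, keep_specific=()):
    """Replace each item by its parent category, except items in keep_specific."""
    return {
        item if item in keep_specific else taxonomy.get(item, item)
        for item in transaction
    }


basket = {"skim milk", "rye bread", "eggs"}
general = generalise(basket, TAXONOMY)                           # everything rolled up
mixed = generalise(basket, TAXONOMY, keep_specific={"rye bread"})  # one item kept detailed
print(general, mixed)
```

Keeping a few items at the detailed level while rolling the rest up is one way to keep the item set small without losing the items of interest.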
BASIC PROCESS CONTINUED
2. Generating rules:
• The rule generation process involves generating the co-occurrence matrix, which counts how often
combinations of n items in the item set occur together.
• To generate a rule of n items of the form:
If X1, X2, …, X(n-1) then Xn
a co-occurrence matrix of n items is required.
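Counting co-occurrences can be sketched as follows. This is a minimal version that stores the counts in a dictionary keyed by item combination rather than in a dense matrix; the function name `cooccurrence_counts` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(transactions, n=2):
    """Count how many transactions contain each combination of n items."""
    counts = Counter()
    for tr in transactions:
        # Sort so that each combination has one canonical key.
        for combo in combinations(sorted(set(tr)), n):
            counts[combo] += 1
    return counts


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pair_counts = cooccurrence_counts(baskets, n=2)
print(pair_counts)
```

Each count is the XY value needed to compute the support and confidence of the corresponding rule.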
Number of items on LHS    Total number of combinations
1                         100
2                         4,950
3                         161,700
4                         3,921,225
5                         75,287,520
6                         1,192,052,400
7                         16,007,560,800
8                         186,087,894,300
Figure 6: This is a computationally expensive process, especially when a
large data set is present.
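The totals in the table above equal the number of ways to choose k items out of 100, so the table appears to assume an item set of 100 items (an inference from the figures; the slides do not state the item count). A short check:

```python
from math import comb

N_ITEMS = 100  # assumption inferred from the table; not stated in the slides

# Number of distinct LHS combinations for each LHS size k.
combinations_by_size = {k: comb(N_ITEMS, k) for k in range(1, 9)}
for k, count in combinations_by_size.items():
    print(f"{k}: {count:,}")
```

The count grows by roughly an order of magnitude with each extra item on the LHS, which is why practical rule generation limits the LHS size or prunes by support.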
BASIC PROCESS CONTINUED
3. Identifying useful rules that are unknown, valid and actionable.
• First, specify threshold values for the confidence factor and support factor so that the rule
generation algorithm automatically filters out rules that are not supported by the data.
• Second, human judgement is required to assess the interestingness, validity and actionability of the
rules that have passed the automatic filter.
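The automatic filtering step can be sketched as a simple threshold test. The rule records, the threshold values and the `filter_rules` helper are hypothetical; suitable thresholds depend on the data set.

```python
def filter_rules(rules, min_support, min_confidence):
    """Keep only the rules that meet both thresholds."""
    return [
        r for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]


# Hypothetical candidate rules, for illustration only.
candidate_rules = [
    {"lhs": ("X",), "rhs": "Y", "support": 0.20, "confidence": 0.70},
    {"lhs": ("A",), "rhs": "B", "support": 0.01, "confidence": 0.90},  # too rare
    {"lhs": ("C",), "rhs": "D", "support": 0.30, "confidence": 0.40},  # too unreliable
]
kept = filter_rules(candidate_rules, min_support=0.05, min_confidence=0.60)
print(kept)
```

Only the surviving rules then need to be reviewed by a person for interestingness and actionability.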
WHEN TO APPLY MARKET BASKET ANALYSIS
• Problems that consist of well-defined items that group together in potentially interesting ways.
• Time-series problems that can be adapted for market basket analysis by relatively simple data
transformations.
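The slides do not say which transformation is meant. One possible transformation, shown here purely as an assumption, is to group time-stamped events for the same customer into fixed-length time windows and treat each window as a basket. The event data and the `events_to_baskets` helper are hypothetical.

```python
from collections import defaultdict


def events_to_baskets(events, window):
    """Group (customer, time, item) events into baskets per customer and time window."""
    baskets = defaultdict(set)
    for customer, time, item in events:
        baskets[(customer, time // window)].add(item)
    return dict(baskets)


# Hypothetical events: (customer id, day number, item bought).
events = [
    ("c1", 1, "printer"),
    ("c1", 3, "ink"),
    ("c1", 12, "paper"),
    ("c2", 2, "printer"),
]
baskets = events_to_baskets(events, window=7)  # one basket per customer per 7 days
print(baskets)
```

The resulting baskets can then be fed to the same support and confidence calculations as ordinary transactions.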
STRENGTHS AND WEAKNESSES
Strengths:
• Clear and understandable results
• Supports undirected data mining
• Works on variable-length data
• Simple computational process
Weaknesses:
• Computation increases exponentially as the problem size grows.
• Limited support for attributes on the data.
• Difficult to determine the right number of items.
• It discounts rare items.
DISSOCIATION RULES
• Similar to association rules except that a negation “NOT” is applied to an item. An example of a
dissociation rule is:
• If X and not Y then Z.
Problems with dissociation rules:
1. Doubling the number of items significantly increases the runtime.
2. The size of transactions grows because they include inverted items.
3. They tend to produce rules in which all items are inverted, because the frequency of the inverted
items is usually much larger.
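The second and third problems can be seen in a small sketch that adds an inverted item for every item missing from a transaction. The item names and the `add_inverted_items` helper are hypothetical.

```python
def add_inverted_items(transaction, all_items):
    """Add a 'not <item>' entry for every item absent from the transaction."""
    present = set(transaction)
    return present | {f"not {item}" for item in all_items if item not in present}


# Hypothetical item set and basket, for illustration only.
ALL_ITEMS = ["X", "Y", "Z", "W"]
basket = {"X", "Z"}
extended = add_inverted_items(basket, ALL_ITEMS)
print(extended)
```

Every extended transaction now has as many entries as there are items in the item set. Because most baskets contain only a few items, most of those entries are inverted, which is why inverted items dominate the generated rules.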
WHAT WE HAVE LEARNED:
• We have learned about association and dissociation rules.
• How to generate more specialised items using taxonomy and virtual items.
• When to apply Market Basket Analysis
• Finally, the strengths and weaknesses of Market Basket Analysis
REVIEW QUESTIONS
1. Discuss the similarities and differences between a decision rule and an association rule in terms of rule structure and
how it is used.
Decision rule (Separate-and-conquer)
Decision rules are closely related to decision trees. The terminal nodes of a tree can be grouped into rules. Each rule attempts to find a
partial solution for a part of the problem, while looking for the optimal solution to the problem.
How it is used:
- One partial solution in each step
Association rule
An association rule does not have a target. Association discovery finds all rules that exist in the data. It attempts to find a full set of
solutions to a problem, while looking for the optimal solution to the problem.
How it is used:
- Multiple combinations in each step
2. Discuss the due caution one should have when applying association rules. Relate your explanation to
the definition of data mining: Data mining is a process of extracting previously unknown, valid, and
actionable information from large databases and then using the information to make crucial decisions.
REVIEW QUESTIONS
3. Compare the model selection process in predictive modelling with the similar process in market basket
analysis. Answer the following questions in your comparison:
i. What is a model?
A model in predictive modelling tasks is one built to make predictions for unseen data.
E.g. the trained model is used to make a positive or negative diagnosis about a disease for a new patient.
A model in market basket analysis takes the form of a set of rules that describe the associations between
attributes; the rules are not meant for prediction.
ii. How do the model selection processes differ?
The model selection process in predictive modelling is guided by maximising a measure determined
during the problem definition step, so this process can be carried out objectively.
The model selection process in the market basket analysis is more subjective, although a few measures
can be used to reduce the set of candidate rules.
CNv6 Instructor Chapter 6 Quality of Service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Market Basket Analysis

  • 1. MARKET BASKET ANALYSIS
    LEARNING OBJECTIVES:
    • EXPLAIN WHAT ASSOCIATION RULES AND ITEM SETS ARE.
    • DESCRIBE THE BASIC PROCESS FOR MARKET BASKET ANALYSIS.
    • KNOW WHEN TO APPLY MARKET BASKET ANALYSIS.
    • UNDERSTAND THE STRENGTHS AND WEAKNESSES OF MARKET BASKET ANALYSIS.
    By Obakeng Brian Pheelwane & Marc Berman – Group 14
  • 2. ASSOCIATION RULES
    • Association discovery finds items that imply the presence of other items in the same transaction.
    • Association rules are in the form: If <Left Hand Side (LHS)> then <Right Hand Side (RHS)>
    • To indicate the validity and importance of a rule, each rule has two parameters:
      • Support factor
      • Confidence factor
    • The RHS usually has one item; the LHS has one or more items.
  • 3. EXAMPLES OF ASSOCIATION RULES
    In a database of transactions involving two items (X, Y) in a department store, an example association rule is: If X then Y.
    In 70% of the cases studied in which a customer buys X, he/she also buys Y. X and Y are bought together in 20% of all purchases made at the department store.
    Therefore, the rule has a 70% confidence factor and a 20% support factor.
  • 4. CONFIDENCE AND SUPPORT FACTORS
    • Assume we have an association rule indicated as LHS -> RHS.
    • T is the total number of cases in the database.
    • X is the number of cases covered by the LHS.
    • Y is the number of cases covered by the RHS.
    • XY is the number of cases covered by both the LHS and the RHS, indicated by the overlapping area in Figure 1.
    Figure 1: Confidence and support factors visualised
    Figure 2: Confidence and support factor formula
    • Confidence factor = XY / X: the number of cases covered by both the left and right hand sides, divided by the number of cases covered by the left hand side.
    • Support factor = XY / T: the number of cases covered by both the left and right hand sides, divided by the total number of cases in the database.
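The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `support_confidence` and the toy transactions are assumptions, and each transaction is treated as a set of item names.

```python
def support_confidence(transactions, lhs, rhs):
    """Return (support, confidence) for the rule LHS -> RHS.

    T  = total number of transactions
    X  = transactions containing every LHS item
    XY = transactions containing every LHS and every RHS item
    """
    lhs, rhs = set(lhs), set(rhs)
    t = len(transactions)
    x = sum(1 for tx in transactions if lhs <= set(tx))
    xy = sum(1 for tx in transactions if (lhs | rhs) <= set(tx))
    support = xy / t if t else 0.0        # XY / T
    confidence = xy / x if x else 0.0     # XY / X
    return support, confidence


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

support, confidence = support_confidence(transactions, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here T = 5, X = 4 (transactions containing bread) and XY = 3 (transactions containing both bread and milk), so the rule "If bread then milk" has a support of 3/5 and a confidence of 3/4.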
  • 5. THE BASIC PROCESS OF MARKET BASKET ANALYSIS
    1. Choosing the right item set.
    • The objective is to define a set of items. When association rules are formed among these items, some of them have a meaningful interpretation and may prove useful.
    • Several methods for generating the right item set:
      • Use a taxonomy to get the right level, ranging from general items to special items (see Figure 3)
      • Use virtual items (see Figures 4 & 5)
      • A combination of both
    • The taxonomy and virtual items (to be prepared by the users or a domain expert) help users choose the right item set while exploring the data for useful rules.
  • 6. Figure 3: Taxonomies start with the most general and move to increasing detail.
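One way to picture the use of a taxonomy is as a mapping from specific items to more general ones, applied to each transaction before rules are generated. The sketch below assumes this; the taxonomy entries, the function name `generalise` and the `keep_specific` parameter are all hypothetical and do not come from the slides.

```python
# Hypothetical taxonomy: specific item -> more general category.
TAXONOMY = {
    "cola 330ml": "soft drinks",
    "lemonade 1l": "soft drinks",
    "white bread": "bread",
    "rye bread": "bread",
}


def generalise(transaction, taxonomy, keep_specific=frozenset()):
    """Replace each item by its general category, unless the analyst
    has asked to keep it at the specific level. Items that are not in
    the taxonomy are left unchanged."""
    return {
        item if item in keep_specific else taxonomy.get(item, item)
        for item in transaction
    }


basket = {"cola 330ml", "rye bread", "eggs"}
general_basket = generalise(basket, TAXONOMY)
mixed_basket = generalise(basket, TAXONOMY, keep_specific={"rye bread"})
print(general_basket)
print(mixed_basket)
```

Moving items up the taxonomy reduces the number of distinct items and raises their frequency; keeping a few items at the specific level is a way to mix levels when only some products are of detailed interest.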
  • 7. Figure 4: This is an example of a poor choice of virtual items, since the rules are likely to be redundant. The problem with this choice is that the rules are just repeats of the definition.
    Figure 5: This is an example of a good choice of virtual items, though one must be careful not to totally encompass the items used for analysis, as this would create redundancy again.
  • 8. BASIC PROCESS CONTINUED
    2. Generating rules:
    • The rule generation process involves generating the co-occurrence matrix, counting the frequencies of co-occurrence between n items in the item set.
    • To generate a rule of n items of the form: If X1, X2, …, X(n-1) then Xn, a co-occurrence matrix of n items is required.

      Number of items on LHS    Total number of combinations
      1                         100
      2                         4,950
      3                         161,700
      4                         3,921,225
      5                         75,287,520
      6                         1,192,052,400
      7                         16,007,560,800
      8                         186,087,894,300

    Figure 5: This is a computationally expensive process, especially when a large data set is present.
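The counting step can be sketched as follows. This is an illustrative assumption about how co-occurrences might be tallied, not the authors' implementation; `cooccurrence_counts` and the toy transactions are hypothetical. The final lines reproduce the figures in the table, which match the number of ways of choosing k items out of 100 (the value of 100 is inferred from the table; the slides do not state it).

```python
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_counts(transactions, n):
    """Count how often each combination of n items appears together."""
    counts = Counter()
    for tx in transactions:
        # Sort so that each combination has one canonical ordering.
        for combo in combinations(sorted(set(tx)), n):
            counts[combo] += 1
    return counts


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pair_counts = cooccurrence_counts(transactions, 2)
print(pair_counts)

# Number of k-item combinations from 100 items (assumed item count).
N_ITEMS = 100
table = {k: comb(N_ITEMS, k) for k in range(1, 9)}
for k, total in table.items():
    print(f"{k}\t{total:,}")
```

The rapid growth of `comb(100, k)` with k is why the number of items per rule, and the size of the item set, have to be limited in practice.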
  • 9. BASIC PROCESS CONTINUED
    3. Identifying useful rules that are unknown, valid and actionable.
    • First, specify threshold values for the confidence factor and the support factor, so that the rule generation algorithm automatically filters out rules that are not supported by the data.
    • Second, human judgement is required to assess the interestingness, validity and actionability of the rules that have passed the automatic filter.
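The automatic filtering step might look like the sketch below. The `Rule` type, the `filter_rules` function, the candidate rules and the threshold values are all hypothetical and chosen only to illustrate the idea; the slides do not prescribe particular thresholds.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: frozenset
    rhs: frozenset
    support: float
    confidence: float


def filter_rules(rules, min_support, min_confidence):
    """Keep only rules that meet both thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]


# Illustrative candidate rules with made-up support/confidence values.
candidates = [
    Rule(frozenset({"bread"}), frozenset({"milk"}), 0.60, 0.75),
    Rule(frozenset({"eggs"}), frozenset({"milk"}), 0.20, 1.00),
    Rule(frozenset({"milk"}), frozenset({"butter"}), 0.20, 0.25),
]

kept = filter_rules(candidates, min_support=0.3, min_confidence=0.7)
for rule in kept:
    print(set(rule.lhs), "->", set(rule.rhs))
```

Only the first rule survives: the second has high confidence but too little support, and the third fails both thresholds. The surviving rules still need the human review described in the second step.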
  • 10. WHEN TO APPLY MARKET BASKET ANALYSIS
    • Problems that consist of well-defined items that group together in potentially interesting ways.
    • Time-series problems that can be adapted for market basket analysis by relatively simple data transformations.
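The slides do not say which transformations are meant. One plausible example, offered here purely as an assumption, is to group time-stamped events into "baskets" by customer and calendar day, so that each group can be treated as a transaction. The function name `events_to_baskets` and the event data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def events_to_baskets(events):
    """Group (customer, timestamp, item) events into one basket per
    customer per calendar day."""
    baskets = defaultdict(set)
    for customer, timestamp, item in events:
        baskets[(customer, timestamp.date())].add(item)
    return dict(baskets)


# Hypothetical event log, for illustration only.
events = [
    ("c1", datetime(2024, 1, 5, 9, 0), "bread"),
    ("c1", datetime(2024, 1, 5, 17, 30), "milk"),
    ("c1", datetime(2024, 1, 6, 10, 0), "eggs"),
    ("c2", datetime(2024, 1, 5, 12, 0), "milk"),
]

baskets = events_to_baskets(events)
for key, items in baskets.items():
    print(key, sorted(items))
```

The choice of window (a day, a week, a visit) is a modelling decision: a wider window produces larger baskets and more co-occurrences, at the cost of weaker links between the items.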
  • 11. STRENGTHS AND WEAKNESSES
    Strengths:
    • Clear and understandable results
    • Supports undirected data mining
    • Works on variable-length data
    • Simple computational process
    Weaknesses:
    • Computation increases exponentially as the problem size grows.
    • Limited support for attributes on the data.
    • Difficult to determine the right number of items.
    • It discounts rare items.
  • 12. DISSOCIATION RULES
    • Similar to association rules, except that a negation ("NOT") is applied to an item. An example of a dissociation rule is:
    • If X and not Y then Z.
    Problems with dissociation rules:
    1. Doubling the number of items significantly slows down the runtime.
    2. The size of the transactions grows because they include inverted items.
    3. They tend to produce rules in which all items are inverted, because the frequency of the inverted items is usually much higher.
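The first two problems can be seen by sketching how inverted items might be added to a transaction. This is an assumed representation (a "not <item>" entry for every item absent from the basket), and the function name `add_inverted_items` and the item names are hypothetical.

```python
def add_inverted_items(transaction, all_items):
    """Return the transaction extended with a 'not <item>' entry for
    every item in the item set that the transaction lacks."""
    present = set(transaction)
    return present | {f"not {item}" for item in all_items - present}


# Hypothetical item set and basket, for illustration only.
all_items = {"W", "X", "Y", "Z"}
basket = {"X", "Z"}

extended = add_inverted_items(basket, all_items)
print(sorted(extended))
print(len(basket), "->", len(extended))
```

Under this representation every transaction ends up with exactly as many entries as there are items in the item set, and the number of distinct items doubles. Because most baskets contain only a few items, the "not" entries appear in most transactions, which is consistent with the third problem listed above.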
  • 13. WHAT WE HAVE LEARNED:
    • We have learned about association and dissociation rules.
    • How to generate more specialised items using taxonomy and virtual items.
    • When to apply market basket analysis.
    • Finally, the strengths and weaknesses of market basket analysis.
  • 14. REVIEW QUESTIONS
    1. Discuss the similarities and differences between a decision rule and an association rule in terms of rule structure and how it is used.
    Decision rule (separate-and-conquer):
    • Decision rules are closely related to decision trees. The terminal nodes of a tree can be grouped into rules.
    • Attempts to find a partial solution for a part of a problem.
    • Looking for the optimal solution to the problem.
    • How it is used: one partial solution in each step.
    Association rule:
    • An association rule does not have a target. It finds all rules that exist in the data.
    • Attempts to find a full set of solutions of a problem.
    • Looking for the optimal solution to the problem.
    • How it is used: multiple combinations in each step.
  • 15. REVIEW QUESTIONS
    2. Discuss the caution one should exercise when applying association rules. Relate your explanation to the definition of data mining: Data mining is a process of extracting previously unknown, valid, and actionable information from large databases and then using the information to make crucial decisions.
  • 16. REVIEW QUESTIONS
    3. Compare the model selection process in predictive modelling with the similar process in market basket analysis. Answer the following questions in your comparison:
    i. What is a model?
       A model in predictive modelling is one built to make predictions for unseen data, e.g. the trained model is used to make a positive or negative diagnosis of a disease for a new patient. A model in market basket analysis takes the form of a set of rules that describe the associations between attributes; these rules are not meant for prediction.
    ii. How do the model selection processes differ?
       The model selection process in predictive modelling is guided by maximising a measure determined during the problem definition step, so it can be carried out objectively. The model selection process in market basket analysis is more subjective, although a few measures can be used to reduce the set of candidate rules.