SlideShare a Scribd company logo
Presented by: Derek Kane
 Association Rules
 Basic Terminology
 Support
 Confidence
 Lift
 Apriori Algorithm
 Practical Examples
 Grocery Shopping Basket Analysis
 Voting Patterns in the House of Representatives
 A series of methodologies for discovering
interesting relationships between variables in a
database.
 The outcome of this technique, in simple terms, is
a set of rules that can be understood as “if this,
then that”.
 An example of an association rule would be:
 If a person buys Peanut Butter and Bread then
they will also purchase Jelly.
Applications
 Product recommendation
 Digital Media recommendations
 Politics
 Medical diagnosis
 Content optimisation
 Bioinformatics
 Web mining
 Scientific data analysis
 Example: The analysis of earth science data may reveal interesting connections
among the ocean, land, and atmospheric processes. This may help scientists to
better understand how these systems interact with one another.
 For example, maybe people who buy flour and casting sugar, also tend to buy
eggs (because a high proportion of them are planning on baking a cake).
 A retailer can use this information to inform:
 Store layout (put products that co-occur together close to one another, to
improve the customer shopping experience).
 Marketing (e.g. target customers who buy flour with offers on eggs, to
encourage them to spend more on their shopping basket).
 Retailers can use these type of rules to identify new opportunities for cross
selling/upselling their products to their customers.
 Asked engineers and scientists around the world
to solve what might have seemed like a simple
problem: improve Netflix's ability to predict what
movies users would like by a modest 10%.
 From $5 million revenue in 1999 reached $3.2
billion revenue in 2011 as a result of becoming
an analytics competitor.
 By analyzing customer behavior and buying
patterns created a recommendation engine
which optimizes both customer tastes and
inventory condition.
 A rule is typically written in the following format: { i1, i2 } => { ik }
 The { i1, i2 } represents the left hand side, LHS, of the rule and the { ik }
represents the right hand side, RHS.
 This statement can be read as “if a user buys an item in the item set on the left
hand side, then the user will likely buy the item on the right hand side too”.
 A more human readable example is:
{coffee, sugar} => {milk}
 If a customer buys coffee and sugar, then they are also likely to buy milk.
 Before we can begin to employ association rules, we must first understand three
important ratios; the support, confidence and lift.
 Support: The fraction of which our item set occurs in our dataset.
 Confidence: Probability that a rule is correct for a new transaction with items on the
left.
 Lift: The ratio by which by the confidence of a rule exceeds the expected confidence.
 The support of an item or item set is the fraction of transactions in our data set
that contain that item or item set.
 Ex. A grocer has 15 transactions in total. Of which, Peanut Butter => Jelly appears
6 times. The support for this rule is 6 / 15 or 0.40.
 In general, it is nice to identify rules that have a high support, as these will be
applicable to a large number of transactions.
 Support is an important measure because a rule that has a low support may occur
simply by chance. A low support rule may also be uninteresting from a business
perspective because it may not be profitable to promote items that are seldom
bought together. For these reasons, support is often used to eliminate
uninteresting rules.
 For super market retailers, this is likely to involve basic products that are popular
across an entire user base (e.g. bread, milk). A printer cartridge retailer, for
example, may not have products with a high support, because each customer only
buys cartridges that are specific to his / her own printer.
 The confidence of a rule is the likelihood that it is true for a new transaction that
contains the items on the LHS of the rule. (I.e. it is the probability that the
transaction also contains the item(s) on the RHS.) Formally:
 Confidence(X -> Y) = Support(X ꓴ Y) / Support (X)
 Confidence(Peanut Butter => Jelly) = 0.26 / 0.40 = 0.65
 This means that for 65% of the transactions that contain Peanut Butter and Jelly,
the rule is correct.
 Confidence measures the reliability of the inference made by a given rule. For a
given rule X -> Y, the higher the confidence the more likely it is for Y to be present
in transactions that contain X.
 Association analysis results should be interpreted with caution. The inference
made by an association rule does not necessarily imply causality. Instead, it
suggests a strong co-occurance relationship between the items in the antecedent
and consequent of the rule.
 The lift of a rule is the ratio of the support of the items on the LHS of the rule co-
occuring with items on the RHS divided by probability that the LHS and RHS co-
occur if the two are independent.
 Lift (X -> Y) = Support(X ꓴ Y) / ( Support (Y) * Support (X) )
 Lift (Peanut Butter => Jelly) = 0.26 / (0.46 * 0.40) = 1.4
 If lift is greater than 1, it suggests that the presence of the items on the LHS has
increased the probability that the items on the right hand side will occur on this
transaction.
 If the lift is below 1, it suggests that the presence of the items on the LHS make the
probability that the items on the RHS will be part of the transaction lower.
 If the lift, is 1 it indicates that the items on the left and right are independent.
 When we perform market basket analysis, then, we are looking for rules with a lift
of more than one and preferably with a higher level of support.
 The Apriori algorithm is perhaps the best known algorithm to mine association rules.
 Apriori Theorem: “If an itemset is frequent, then all of its subsets must be also be frequent.”
 It uses a breadth first strategy to count the support of item sets and uses a candidate
generation function which exploits the downward closure property of support (anti-
monotonicity).
 The approach follows a 2 step process:
 First, minimum support is applied to find all frequent itemsets in the database.
 Second, these frequent itemsets and the minimum confidence constraint are used to form
rules.
 Finding all frequent itemsets in a database is difficult since it involves searching for
all item combinations. The set of possible item combinations is the power set over
I and has the size of 2n-1.
 The downward-closure property of support allows for efficient search and guarantees that for
a frequent itemset, all of its subsets are also frequent. Additionally, for an infrequent itemset, all
of its supersets must also be infrequent.
 Imagine 10000 receipts sitting on your
table. Each receipt represents a transaction
with items that were purchased. The
receipt is a representation of stuff that
went into a customer’s basket – and
therefore ‘Market Basket Analysis’.
 That is exactly what the Groceries Data Set
contains: a collection of receipts with each
line representing 1 receipt and the items
purchased. Each line is called a transaction
and each column in a row represents an
item.
 For each transaction, there can be only
distinct Item(s) without repeating entries.
This allows for us to create a binary (0,1)
representation whether a particular item
was purchased under a specific
transaction.
 The dataset will need to first be flipped across the horizontal axis in to a cross tabulation.
Notice that the “Items” are now the column headings. This preparation ensures that the
dataset can be read properly into the apriori market basket algorithm.
 The market basket analysis algorithm requires setting a threshold for detecting patterns in
the dataset.
 For this example, we specified a support value of 0.001 (due to the high volume of
receipts and large product offering) and a confidence level of 0.70.
 Additionally, we set the length of the rule not to exceed three elements. This
ensures that the we will have a maximum of 2 items on the LHS and that the
assessment will produce more meaningful insights at a tertiary glance.
 Different businesses will have distribution of items by transaction that look very
different, and so very different support and confidence parameters may be
applicable.
 To determine what works best, organizations need to experiment with different
parameters: as you reduce them, the number of rules generated will increase,
which will give us more to work with.
 However, we will need to sift through the rules more carefully to identify those
that will be more impactful for your business.
 There is no stead fast rule on where to begin so experiment with loose
parameters and go from there.
 To showcase how we can leverage this insight further lets focus in on 3 specific items of
interest:
 Yogurt
 Tropical Fruit
 Bottled Beer.
 We can run the algorithm (with the same thresholds) and specify that these terms are
used as criterion for the RHS of the ruleset generation.
 Now as criterion for the LHS of the ruleset generation.
 Before we use the data to make any kind
of business decision, it is important that
we take a step back and remember
something important:
 The output of the analysis reflects how
frequently items co-occur in transactions.
This is a function both of the strength of
association between the items, and the
way the business has presented them to
the customer.
 To say that in a different way: items might
co-occur not because they are “naturally”
connected, but because we, the people in
charge of the organization, have
presented them together.
 The market basket results can be used to drive targeted marketing campaigns.
 For each patron, we pick a handful of products based on products they have bought
to date which have both a high uplift and a high margin, and send them a e.g.
personalized email or display ads etc.
 How we use the analysis has significant implications for the analysis itself: if we are
feeding the analysis into a machine-driven process for delivering recommendations,
we are much more interested in generating an expansive set of rules.
 If, however, we are experimenting with targeted marketing for the first time, it makes
much more sense to pick a handful of particularly high value rules, and action just
them, before working out whether to invest in the effort of building out that capability
to manage a much wider and more complicated rule set.
 There are a number of ways we can use the data to drive site organization:
 Large clusters of co-occuring items should probably be placed in their own category /
theme.
 Item pairs that commonly co-occur should be placed close together within broader
categories on the website. This is especially important where one item in a pair is very
popular, and the other item is very high margin.
 Long lists of rules (including ones with low support and confidence) can be used to put
recommendations at the bottom of product pages and on product cart pages. The only
thing that matters for these rules is that the lift is greater than one. (And that we pick those
rules that are applicable for each product with the high lift where the product recommended
has a high margin.)
 In the event that doing the above (3) drives significant uplift in profit, it would strengthen the
case to invest in a recommendation system, that uses a similar algorithm in an operational
context to power automatic recommendation engine on your website.
 We will apply the results of
association analysis to the voting
records of members of the United
States House of Representatives. The
data is obtained from the 1984
Congressional Voting Records
Database, which is available at the
UCI machine learning data
repository.
 Each transaction contains
information about party affiliation
along with his or her voting record
on 16 key issues.
Observations:
 A vote in favor of the Physician Pay Freeze indicates a Republican (95%
confidence), a vote against indicates a Democrat (99% confidence).
 The voting patterns are relatively clear in this example with a high degree of
support and confidence. We can see which party favors a specific law without
knowing the contents of the legislation itself.
 Reside in Wayne, Illinois
 Active Semi-Professional Classical Musician
(Bassoon).
 Married my wife on 10/10/10 and been
together for 10 years.
 Pet Yorkshire Terrier / Toy Poodle named
Brunzie.
 Pet Maine Coons’ named Maximus Power and
Nemesis Gul du Cat.
 Enjoy Cooking, Hiking, Cycling, Kayaking, and
Astronomy.
 Self proclaimed Data Nerd and Technology
Lover.
Data Science - Part VI - Market Basket and Product Recommendation Engines

More Related Content

What's hot

Decision tree
Decision treeDecision tree
Decision tree
ShraddhaPandey45
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning
Usama Fayyaz
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
Deepa Jeya
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its Methods
IJSRD
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
Krish_ver2
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Parth Khare
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
Hemant Chetwani
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Predictive Analytics - An Introduction
Predictive Analytics - An IntroductionPredictive Analytics - An Introduction
Predictive Analytics - An Introduction
Laguna State Polytechnic University
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
Salford Systems
 
Data mining PPT
Data mining PPTData mining PPT
Data mining PPTKapil Rode
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
T Kavitha
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
Mohammad Junaid Khan
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
Nihar Ranjan
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
Mahendra Gupta
 
Linear regression
Linear regressionLinear regression
Linear regression
MartinHogg9
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
SOUMIT KAR
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Salah Amean
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
Pyingkodi Maran
 

What's hot (20)

Decision tree
Decision treeDecision tree
Decision tree
 
Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning Supervised learning and Unsupervised learning
Supervised learning and Unsupervised learning
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its Methods
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Predictive Analytics - An Introduction
Predictive Analytics - An IntroductionPredictive Analytics - An Introduction
Predictive Analytics - An Introduction
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Data mining PPT
Data mining PPTData mining PPT
Data mining PPT
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 

Similar to Data Science - Part VI - Market Basket and Product Recommendation Engines

Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
SmithaRaj16
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
Geo Marian
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
hina firdaus
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
Wake Tech BAS
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
Jyoti Yadav
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
VarunSahdev2
 
What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
Kumar P
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
AmenahAbbood
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
FEG
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
Smarten Augmented Analytics
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
researchinventy
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
researchinventy
 
Recommender system
Recommender systemRecommender system
Recommender system
Nilotpal Pramanik
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
Narayan Vyas
 
Association and Classification Algorithm
Association and Classification AlgorithmAssociation and Classification Algorithm
Association and Classification Algorithm
Medicaps University
 
Products Frequently Bought Together in Stores Using classificat...
Products Frequently Bought Together in Stores               Using classificat...Products Frequently Bought Together in Stores               Using classificat...
Products Frequently Bought Together in Stores Using classificat...
hibaziyad99
 
5Association AnalysisBasic Concepts an.docx
 5Association AnalysisBasic Concepts an.docx 5Association AnalysisBasic Concepts an.docx
5Association AnalysisBasic Concepts an.docx
ShiraPrater50
 
Data Mining Apriori Algorithm Implementation using R
Data Mining Apriori Algorithm Implementation using RData Mining Apriori Algorithm Implementation using R
Data Mining Apriori Algorithm Implementation using R
IRJET Journal
 
Cluster2
Cluster2Cluster2
Cluster2work
 

Similar to Data Science - Part VI - Market Basket and Product Recommendation Engines (20)

Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
 
What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Association and Classification Algorithm
Association and Classification AlgorithmAssociation and Classification Algorithm
Association and Classification Algorithm
 
Products Frequently Bought Together in Stores Using classificat...
Products Frequently Bought Together in Stores               Using classificat...Products Frequently Bought Together in Stores               Using classificat...
Products Frequently Bought Together in Stores Using classificat...
 
5Association AnalysisBasic Concepts an.docx
 5Association AnalysisBasic Concepts an.docx 5Association AnalysisBasic Concepts an.docx
5Association AnalysisBasic Concepts an.docx
 
Data Mining Apriori Algorithm Implementation using R
Data Mining Apriori Algorithm Implementation using RData Mining Apriori Algorithm Implementation using R
Data Mining Apriori Algorithm Implementation using R
 
Cluster2
Cluster2Cluster2
Cluster2
 
Assignment #3 10.19.14
Assignment #3 10.19.14Assignment #3 10.19.14
Assignment #3 10.19.14
 

More from Derek Kane

Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
Derek Kane
 
Data Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier AnalysisData Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier Analysis
Derek Kane
 
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science -  Part XV - MARS, Logistic Regression, & Survival AnalysisData Science -  Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Derek Kane
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
Derek Kane
 
Data Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov ModelsData Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov Models
Derek Kane
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Derek Kane
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
Derek Kane
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
Derek Kane
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector Machine
Derek Kane
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
Derek Kane
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
Derek Kane
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
Derek Kane
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
Derek Kane
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
Derek Kane
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Derek Kane
 

More from Derek Kane (16)

Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
Data Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier AnalysisData Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier Analysis
 
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science -  Part XV - MARS, Logistic Regression, & Survival AnalysisData Science -  Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
 
Data Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov ModelsData Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov Models
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector Machine
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
 

Recently uploaded

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 

Recently uploaded (20)

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 

Data Science - Part VI - Market Basket and Product Recommendation Engines

  • 2.  Association Rules  Basic Terminology  Support  Confidence  Lift  Apriori Algorithm  Practical Examples  Grocery Shopping Basket Analysis  Voting Patterns in the House of Representatives
  • 3.  A series of methodologies for discovering interesting relationships between variables in a database.  The outcome of this technique, in simple terms, is a set of rules that can be understood as “if this, then that”.  An example of an association rule would be:  If a person buys Peanut Butter and Bread then they will also purchase Jelly.
  • 4. Applications  Product recommendation  Digital Media recommendations  Politics  Medical diagnosis  Content optimisation  Bioinformatics  Web mining  Scientific data analysis  Example: The analysis of earth science data may reveal interesting connections among the ocean, land, and atmospheric processes. This may help scientists to better understand how these systems interact with one another.
  • 5.  For example, maybe people who buy flour and casting sugar, also tend to buy eggs (because a high proportion of them are planning on baking a cake).  A retailer can use this information to inform:  Store layout (put products that co-occur together close to one another, to improve the customer shopping experience).  Marketing (e.g. target customers who buy flour with offers on eggs, to encourage them to spend more on their shopping basket).  Retailers can use these type of rules to identify new opportunities for cross selling/upselling their products to their customers.
  • 6.  Asked engineers and scientists around the world to solve what might have seemed like a simple problem: improve Netflix's ability to predict what movies users would like by a modest 10%.  From $5 million revenue in 1999 reached $3.2 billion revenue in 2011 as a result of becoming an analytics competitor.  By analyzing customer behavior and buying patterns created a recommendation engine which optimizes both customer tastes and inventory condition.
  • 7.  A rule is typically written in the following format: { i1, i2 } => { ik }  The { i1, i2 } represents the left hand side, LHS, of the rule and the { ik } represents the right hand side, RHS.  This statement can be read as “if a user buys an item in the item set on the left hand side, then the user will likely buy the item on the right hand side too”.  A more human readable example is: {coffee, sugar} => {milk}  If a customer buys coffee and sugar, then they are also likely to buy milk.
  • 8.  Before we can begin to employ association rules, we must first understand three important ratios; the support, confidence and lift.  Support: The fraction of which our item set occurs in our dataset.  Confidence: Probability that a rule is correct for a new transaction with items on the left.  Lift: The ratio by which by the confidence of a rule exceeds the expected confidence.
  • 9.  The support of an item or item set is the fraction of transactions in our data set that contain that item or item set.  Ex. A grocer has 15 transactions in total. Of which, Peanut Butter => Jelly appears 6 times. The support for this rule is 6 / 15 or 0.40.  In general, it is nice to identify rules that have a high support, as these will be applicable to a large number of transactions.  Support is an important measure because a rule that has a low support may occur simply by chance. A low support rule may also be uninteresting from a business perspective because it may not be profitable to promote items that are seldom bought together. For these reasons, support is often used to eliminate uninteresting rules.  For super market retailers, this is likely to involve basic products that are popular across an entire user base (e.g. bread, milk). A printer cartridge retailer, for example, may not have products with a high support, because each customer only buys cartridges that are specific to his / her own printer.
  • 10.  The confidence of a rule is the likelihood that it is true for a new transaction that contains the items on the LHS of the rule. (I.e. it is the probability that the transaction also contains the item(s) on the RHS.) Formally:  Confidence(X -> Y) = Support(X ꓴ Y) / Support (X)  Confidence(Peanut Butter => Jelly) = 0.26 / 0.40 = 0.65  This means that for 65% of the transactions that contain Peanut Butter and Jelly, the rule is correct.  Confidence measures the reliability of the inference made by a given rule. For a given rule X -> Y, the higher the confidence the more likely it is for Y to be present in transactions that contain X.  Association analysis results should be interpreted with caution. The inference made by an association rule does not necessarily imply causality. Instead, it suggests a strong co-occurance relationship between the items in the antecedent and consequent of the rule.
  • 11.  The lift of a rule is the ratio of the support of the items on the LHS of the rule co- occuring with items on the RHS divided by probability that the LHS and RHS co- occur if the two are independent.  Lift (X -> Y) = Support(X ꓴ Y) / ( Support (Y) * Support (X) )  Lift (Peanut Butter => Jelly) = 0.26 / (0.46 * 0.40) = 1.4  If lift is greater than 1, it suggests that the presence of the items on the LHS has increased the probability that the items on the right hand side will occur on this transaction.  If the lift is below 1, it suggests that the presence of the items on the LHS make the probability that the items on the RHS will be part of the transaction lower.  If the lift, is 1 it indicates that the items on the left and right are independent.  When we perform market basket analysis, then, we are looking for rules with a lift of more than one and preferably with a higher level of support.
  • 12.  The Apriori algorithm is perhaps the best known algorithm to mine association rules.  Apriori Theorem: “If an itemset is frequent, then all of its subsets must be also be frequent.”  It uses a breadth first strategy to count the support of item sets and uses a candidate generation function which exploits the downward closure property of support (anti- monotonicity).  The approach follows a 2 step process:  First, minimum support is applied to find all frequent itemsets in the database.  Second, these frequent itemsets and the minimum confidence constraint are used to form rules.
  • 13.  Finding all frequent itemsets in a database is difficult since it involves searching for all item combinations. The set of possible item combinations is the power set over I and has the size of 2n-1.
  • 14.  The downward-closure property of support allows for efficient search and guarantees that for a frequent itemset, all of its subsets are also frequent. Additionally, for an infrequent itemset, all of its supersets must also be infrequent.
  • 15.
  • 16.  Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket – and therefore ‘Market Basket Analysis’.  That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item.  For each transaction, there can be only distinct Item(s) without repeating entries. This allows for us to create a binary (0,1) representation whether a particular item was purchased under a specific transaction.
  • 17.  The dataset will need to first be flipped across the horizontal axis in to a cross tabulation. Notice that the “Items” are now the column headings. This preparation ensures that the dataset can be read properly into the apriori market basket algorithm.
  • 18.
  • 19.  The market basket analysis algorithm requires setting a threshold for detecting patterns in the dataset.  For this example, we specified a support value of 0.001 (due to the high volume of receipts and large product offering) and a confidence level of 0.70.  Additionally, we set the length of the rule not to exceed three elements. This ensures that the we will have a maximum of 2 items on the LHS and that the assessment will produce more meaningful insights at a tertiary glance.
  • 20.  Different businesses will have distribution of items by transaction that look very different, and so very different support and confidence parameters may be applicable.  To determine what works best, organizations need to experiment with different parameters: as you reduce them, the number of rules generated will increase, which will give us more to work with.  However, we will need to sift through the rules more carefully to identify those that will be more impactful for your business.  There is no stead fast rule on where to begin so experiment with loose parameters and go from there.
  • 21.  To showcase how we can leverage this insight further lets focus in on 3 specific items of interest:  Yogurt  Tropical Fruit  Bottled Beer.  We can run the algorithm (with the same thresholds) and specify that these terms are used as criterion for the RHS of the ruleset generation.
  • 22.  Now as criterion for the LHS of the ruleset generation.
  • 23.  Before we use the data to make any kind of business decision, it is important that we take a step back and remember something important:  The output of the analysis reflects how frequently items co-occur in transactions. This is a function both of the strength of association between the items, and the way the business has presented them to the customer.  To say that in a different way: items might co-occur not because they are “naturally” connected, but because we, the people in charge of the organization, have presented them together.
  • 24.  The market basket results can be used to drive targeted marketing campaigns.  For each patron, we pick a handful of products based on products they have bought to date which have both a high uplift and a high margin, and send them a e.g. personalized email or display ads etc.  How we use the analysis has significant implications for the analysis itself: if we are feeding the analysis into a machine-driven process for delivering recommendations, we are much more interested in generating an expansive set of rules.  If, however, we are experimenting with targeted marketing for the first time, it makes much more sense to pick a handful of particularly high value rules, and action just them, before working out whether to invest in the effort of building out that capability to manage a much wider and more complicated rule set.
  • 25.  There are a number of ways we can use the data to drive site organization:  Large clusters of co-occuring items should probably be placed in their own category / theme.  Item pairs that commonly co-occur should be placed close together within broader categories on the website. This is especially important where one item in a pair is very popular, and the other item is very high margin.  Long lists of rules (including ones with low support and confidence) can be used to put recommendations at the bottom of product pages and on product cart pages. The only thing that matters for these rules is that the lift is greater than one. (And that we pick those rules that are applicable for each product with the high lift where the product recommended has a high margin.)  In the event that doing the above (3) drives significant uplift in profit, it would strengthen the case to invest in a recommendation system, that uses a similar algorithm in an operational context to power automatic recommendation engine on your website.
  • 26.
  • 27.  We will apply the results of association analysis to the voting records of members of the United States House of Representatives. The data is obtained from the 1984 Congressional Voting Records Database, which is available at the UCI machine learning data repository.  Each transaction contains information about party affiliation along with his or her voting record on 16 key issues.
  • 28. Observations:  A vote in favor of the Physician Pay Freeze indicates a Republican (95% confidence), a vote against indicates a Democrat (99% confidence).  The voting patterns are relatively clear in this example with a high degree of support and confidence. We can see which party favors a specific law without knowing the contents of the legislation itself.
  • 29.  Reside in Wayne, Illinois  Active Semi-Professional Classical Musician (Bassoon).  Married my wife on 10/10/10 and been together for 10 years.  Pet Yorkshire Terrier / Toy Poodle named Brunzie.  Pet Maine Coons’ named Maximus Power and Nemesis Gul du Cat.  Enjoy Cooking, Hiking, Cycling, Kayaking, and Astronomy.  Self proclaimed Data Nerd and Technology Lover.