This lecture provides an overview of association analysis, covering topics such as market basket analysis and product recommendation engines. The first practical example centers on analyzing a supermarket retailer's product receipts, and the second touches on the use of association rules in the political arena.
5 Association Analysis: Basic Concepts and Algorithms
Many business enterprises accumulate large quantities of data from their day-to-day operations. For example, huge amounts of customer purchase data are collected daily at the checkout counters of grocery stores. Table 5.1 gives an example of such data, commonly known as market basket transactions. Each row in this table corresponds to a transaction, which contains a unique identifier labeled TID and a set of items bought by a given customer. Retailers are interested in analyzing the data to learn about the purchasing behavior of their customers. Such valuable information can be used to support a variety of business-related applications such as marketing promotions, inventory management, and customer relationship management.
This chapter presents a methodology known as association analysis, which is useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of sets of items present in many transactions, which are known as frequent itemsets,
Table 5.1. An example of market basket transactions.
TID Items
1 {Bread, Milk}
2 {Bread, Diapers, Beer, Eggs}
3 {Milk, Diapers, Beer, Cola}
4 {Bread, Milk, Diapers, Beer}
5 {Bread, Milk, Diapers, Cola}
or association rules, that represent relationships between two itemsets. For example, the following rule can be extracted from the data set shown in Table 5.1:
{Diapers} −→ {Beer}.
The rule suggests a relationship between the sale of diapers and beer because many customers who buy diapers also buy beer. Retailers can use these types of rules to help them identify new opportunities for cross-selling their products to the customers.
Besides market basket data, association analysis is also applicable to data from other application domains such as bioinformatics, medical diagnosis, web mining, and scientific data analysis. In the analysis of Earth science data, for example, association patterns may reveal interesting connections among the ocean, land, and atmospheric processes. Such information may help Earth scientists develop a better understanding of how the different elements of the Earth system interact with each other. Even though the techniques presented here are generally applicable to a wider variety of data sets, for illustrative purposes, our discussion will focus mainly on market basket data.
There are two key issues that need to be addressed when applying association analysis to market basket data. First, discovering patterns from a large transaction data set can be computationally expensive. Second, some of the discovered patterns may be spurious (happen simply by chance) and even for non-spurious patterns, some are more interesting than others. The remainder of this chapter is organized around these two issues. The first part of the chapter is devoted to explaining the basic concepts of association ...
2. Association Rules
Basic Terminology
Support
Confidence
Lift
Apriori Algorithm
Practical Examples
Grocery Shopping Basket Analysis
Voting Patterns in the House of Representatives
3. A series of methodologies for discovering
interesting relationships between variables in a
database.
The outcome of this technique, in simple terms, is
a set of rules that can be understood as “if this,
then that”.
An example of an association rule would be:
If a person buys Peanut Butter and Bread then
they will also purchase Jelly.
4. Applications
Product recommendation
Digital Media recommendations
Politics
Medical diagnosis
Content optimisation
Bioinformatics
Web mining
Scientific data analysis
Example: The analysis of earth science data may reveal interesting connections
among the ocean, land, and atmospheric processes. This may help scientists to
better understand how these systems interact with one another.
5. For example, maybe people who buy flour and caster sugar also tend to buy
eggs (because a high proportion of them are planning on baking a cake).
A retailer can use this information to inform:
Store layout (put products that co-occur together close to one another, to
improve the customer shopping experience).
Marketing (e.g. target customers who buy flour with offers on eggs, to
encourage them to spend more on their shopping basket).
Retailers can use these types of rules to identify new opportunities for
cross-selling/upselling their products to their customers.
6. Netflix asked engineers and scientists around the world
to solve what might have seemed like a simple
problem: improve Netflix's ability to predict what
movies users would like by a modest 10%.
Netflix grew from $5 million in revenue in 1999 to $3.2
billion in revenue in 2011 as a result of becoming
an analytics competitor.
By analyzing customer behavior and buying
patterns, the company created a recommendation engine
which optimizes for both customer tastes and
inventory condition.
7. A rule is typically written in the following format: { i1, i2 } => { ik }
The { i1, i2 } represents the left hand side, LHS, of the rule and the { ik }
represents the right hand side, RHS.
This statement can be read as “if a user buys an item in the item set on the left
hand side, then the user will likely buy the item on the right hand side too”.
A more human readable example is:
{coffee, sugar} => {milk}
If a customer buys coffee and sugar, then they are also likely to buy milk.
8. Before we can begin to employ association rules, we must first understand three
important ratios; the support, confidence and lift.
Support: The fraction of transactions in our dataset in which the item set occurs.
Confidence: Probability that a rule is correct for a new transaction with items on the
left.
Lift: The ratio by which the confidence of a rule exceeds the expected confidence.
9. The support of an item or item set is the fraction of transactions in our data set
that contain that item or item set.
Ex. A grocer has 15 transactions in total, of which Peanut Butter => Jelly appears
in 6. The support for this rule is 6 / 15 = 0.40.
In general, it is nice to identify rules that have a high support, as these will be
applicable to a large number of transactions.
Support is an important measure because a rule that has a low support may occur
simply by chance. A low support rule may also be uninteresting from a business
perspective because it may not be profitable to promote items that are seldom
bought together. For these reasons, support is often used to eliminate
uninteresting rules.
For supermarket retailers, this is likely to involve basic products that are popular
across an entire user base (e.g. bread, milk). A printer cartridge retailer, for
example, may not have products with a high support, because each customer only
buys cartridges that are specific to his / her own printer.
10. The confidence of a rule is the likelihood that it is true for a new transaction that
contains the items on the LHS of the rule. (I.e. it is the probability that the
transaction also contains the item(s) on the RHS.) Formally:
Confidence(X -> Y) = Support(X ∪ Y) / Support(X)
Confidence(Peanut Butter => Jelly) = 0.26 / 0.40 = 0.65
This means that for 65% of the transactions that contain Peanut Butter and Jelly,
the rule is correct.
Confidence measures the reliability of the inference made by a given rule. For a
given rule X -> Y, the higher the confidence the more likely it is for Y to be present
in transactions that contain X.
Association analysis results should be interpreted with caution. The inference
made by an association rule does not necessarily imply causality. Instead, it
suggests a strong co-occurrence relationship between the items in the antecedent
and consequent of the rule.
11. The lift of a rule is the ratio of the support of the items on the LHS of the rule
co-occurring with the items on the RHS, divided by the probability that the LHS
and RHS co-occur if the two are independent.
Lift (X -> Y) = Support(X ∪ Y) / ( Support (Y) * Support (X) )
Lift (Peanut Butter => Jelly) = 0.26 / (0.46 * 0.40) = 1.4
If lift is greater than 1, it suggests that the presence of the items on the LHS has
increased the probability that the items on the right hand side will occur on this
transaction.
If the lift is below 1, it suggests that the presence of the items on the LHS makes
the probability that the items on the RHS will be part of the transaction lower.
If the lift is 1, it indicates that the items on the left and right are independent.
When we perform market basket analysis, then, we are looking for rules with a lift
of more than one and preferably with a higher level of support.
12. The Apriori algorithm is perhaps the best known algorithm to mine association rules.
Apriori Theorem: “If an itemset is frequent, then all of its subsets must also be frequent.”
It uses a breadth-first strategy to count the support of item sets and uses a candidate
generation function which exploits the downward closure property of support
(anti-monotonicity).
The approach follows a 2 step process:
First, minimum support is applied to find all frequent itemsets in the database.
Second, these frequent itemsets and the minimum confidence constraint are used to form
rules.
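The two-step process above can be sketched in a few dozen lines of pure Python. This is an illustrative toy, not the implementation behind the slides' results (which would typically come from a dedicated package); the input is the five market basket transactions of Table 5.1.

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence):
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: find all frequent itemsets level by level. Downward closure
    # means only unions of already-frequent itemsets need to be checked.
    singletons = {frozenset([i]) for t in transactions for i in t}
    frequent = {s: sup(s) for s in singletons if sup(s) >= min_support}
    level = list(frequent)
    while level:
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates if sup(c) >= min_support]
        frequent.update({c: sup(c) for c in level})

    # Step 2: split each frequent itemset into LHS => RHS rules and keep
    # those meeting the minimum confidence constraint.
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                if s / frequent[lhs] >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), s / frequent[lhs]))
    return frequent, rules

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

frequent, rules = apriori(transactions, min_support=0.6, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(lhs, "=>", rhs, round(conf, 2))  # includes {'Diapers'} => {'Beer'} 0.75
```

Note that the sketch recovers the chapter's example rule {Diapers} −→ {Beer} with confidence 0.75 from Table 5.1.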
13. Finding all frequent itemsets in a database is difficult since it involves searching over
all item combinations. The set of possible item combinations is the power set over
I and has size 2^n − 1 (excluding the empty set).
14. The downward-closure property of support allows for efficient search and guarantees that for
a frequent itemset, all of its subsets are also frequent. Additionally, for an infrequent itemset, all
of its supersets must also be infrequent.
15.
16. Imagine 10000 receipts sitting on your
table. Each receipt represents a transaction
with items that were purchased. The
receipt is a representation of stuff that
went into a customer’s basket – and
therefore ‘Market Basket Analysis’.
That is exactly what the Groceries Data Set
contains: a collection of receipts with each
line representing 1 receipt and the items
purchased. Each line is called a transaction
and each column in a row represents an
item.
For each transaction, there can be only
distinct items, without repeated entries.
This allows for us to create a binary (0,1)
representation whether a particular item
was purchased under a specific
transaction.
17. The dataset will need to first be flipped across the horizontal axis into a cross tabulation.
Notice that the “Items” are now the column headings. This preparation ensures that the
dataset can be read properly into the apriori market basket algorithm.
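The flip into a binary (0,1) cross tabulation described above can be sketched in a few lines of pure Python (the receipts here are hypothetical):

```python
# One-transaction-per-line data, as on the receipts.
receipts = [
    ["yogurt", "tropical fruit"],
    ["bottled beer"],
    ["yogurt", "bottled beer", "tropical fruit"],
]

# The distinct items become the column headings of the cross tabulation.
items = sorted({item for receipt in receipts for item in receipt})

# Each row is a binary indicator of whether the item was purchased.
crosstab = [[1 if item in receipt else 0 for item in items] for receipt in receipts]

print(items)   # ['bottled beer', 'tropical fruit', 'yogurt']
for row in crosstab:
    print(row)
```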
18.
19. The market basket analysis algorithm requires setting a threshold for detecting patterns in
the dataset.
For this example, we specified a support value of 0.001 (due to the high volume of
receipts and large product offering) and a confidence level of 0.70.
Additionally, we set the length of the rule not to exceed three elements. This
ensures that we will have a maximum of 2 items on the LHS and that the
assessment will produce more meaningful insights at a glance.
20. Different businesses will have distributions of items by transaction that look very
different, and so very different support and confidence parameters may be
applicable.
To determine what works best, organizations need to experiment with different
parameters: as the parameters are reduced, the number of rules generated will
increase, giving us more to work with.
However, we will need to sift through the rules carefully to identify those
that will be most impactful for the business.
There is no steadfast rule on where to begin, so experiment with loose
parameters and go from there.
21. To showcase how we can leverage this insight further, let's focus on 3 specific items of
interest:
Yogurt
Tropical Fruit
Bottled Beer.
We can run the algorithm (with the same thresholds) and specify that these items are
used as criteria for the RHS of the ruleset generation.
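Constraining the consequent as described above amounts to filtering the generated ruleset on its RHS. A sketch, with hypothetical rules and confidence values:

```python
# Hypothetical (LHS, RHS, confidence) tuples, such as a rule-mining
# run might return.
rules = [
    ({"curd", "whole milk"}, {"yogurt"}, 0.72),
    ({"citrus fruit", "root vegetables"}, {"tropical fruit"}, 0.71),
    ({"soda"}, {"bottled water"}, 0.45),
]

targets = {"yogurt", "tropical fruit", "bottled beer"}

# Keep only rules whose RHS mentions one of the target items.
focused = [(lhs, rhs, conf) for lhs, rhs, conf in rules if rhs & targets]

print(len(focused))  # 2
```

Swapping `rhs` for `lhs` in the filter gives the LHS-constrained run of the next slide.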
22. Now as criteria for the LHS of the ruleset generation.
23. Before we use the data to make any kind
of business decision, it is important that
we take a step back and remember
something important:
The output of the analysis reflects how
frequently items co-occur in transactions.
This is a function both of the strength of
association between the items, and the
way the business has presented them to
the customer.
To say that in a different way: items might
co-occur not because they are “naturally”
connected, but because we, the people in
charge of the organization, have
presented them together.
24. The market basket results can be used to drive targeted marketing campaigns.
For each patron, we pick a handful of products, based on products they have bought
to date, which have both a high uplift and a high margin, and send them e.g. a
personalized email or display ads.
How we use the analysis has significant implications for the analysis itself: if we are
feeding the analysis into a machine-driven process for delivering recommendations,
we are much more interested in generating an expansive set of rules.
If, however, we are experimenting with targeted marketing for the first time, it makes
much more sense to pick a handful of particularly high value rules, and action just
them, before working out whether to invest in the effort of building out that capability
to manage a much wider and more complicated rule set.
25. There are a number of ways we can use the data to drive site organization:
Large clusters of co-occurring items should probably be placed in their own category /
theme.
Item pairs that commonly co-occur should be placed close together within broader
categories on the website. This is especially important where one item in a pair is very
popular, and the other item is very high margin.
Long lists of rules (including ones with low support and confidence) can be used to put
recommendations at the bottom of product pages and on product cart pages. The only
thing that matters for these rules is that the lift is greater than one. (And that, for each
product, we pick the applicable rules with the highest lift where the recommended
product has a high margin.)
In the event that doing the above (3) drives a significant uplift in profit, it would strengthen
the case to invest in a recommendation system that uses a similar algorithm in an
operational context to power an automatic recommendation engine on the website.
26.
27. We will apply association analysis to the voting
records of members of the United
States House of Representatives. The
data is obtained from the 1984
Congressional Voting Records
Database, which is available at the
UCI machine learning data
repository.
Each transaction contains information
about a member's party affiliation
along with his or her voting record
on 16 key issues.
28. Observations:
A vote in favor of the Physician Pay Freeze indicates a Republican (95%
confidence), a vote against indicates a Democrat (99% confidence).
The voting patterns are relatively clear in this example with a high degree of
support and confidence. We can see which party favors a specific law without
knowing the contents of the legislation itself.
29. Reside in Wayne, Illinois
Active Semi-Professional Classical Musician
(Bassoon).
Married my wife on 10/10/10 and been
together for 10 years.
Pet Yorkshire Terrier / Toy Poodle named
Brunzie.
Pet Maine Coons named Maximus Power and
Nemesis Gul du Cat.
Enjoy Cooking, Hiking, Cycling, Kayaking, and
Astronomy.
Self-proclaimed Data Nerd and Technology
Lover.