Association Mining

www.edureka.co/r-for-analytics
Know the Science Behind Product Recommendation

Objectives
At the end of this session, you will be able to explain:
• What data mining is
• What Business Analytics is
• The stages of analytics / data mining
• What R is
• An overview of Machine Learning
• What Association Rule Mining is
• A use case

Business Analytics
Why is Business Analytics getting popular these days?
• Cost of storing data
• Cost of processing data

Stages of Analytics / Data Mining
Cross-Industry Standard Process for Data Mining (CRISP-DM)

What is R?
R is a programming language
R is an environment for statistical analysis
R is data analysis software

R: Characteristics
Effective and fast data handling and storage facilities
A suite of operators for calculations on arrays, lists, vectors, etc.
A large, integrated collection of tools for data analysis and visualization
Graphical facilities for data analysis, with display either on screen or on paper
A well-implemented and effective programming language, based on the ‘S’ language
A complete range of packages to extend and enrich the functionality of R
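As a small illustration of the operators mentioned above, the following base R sketch applies arithmetic to whole vectors and matrices at once. The prices and quantities are made-up values used only for illustration.

```r
# Hypothetical example values (not from the session)
prices     <- c(2.5, 1.2, 3.0)
quantities <- c(4, 10, 2)

# Arithmetic operators work element-wise on vectors
line_totals <- prices * quantities
total       <- sum(line_totals)
print(line_totals)
print(total)

# Matrices support transposition and matrix multiplication
m       <- matrix(1:6, nrow = 2)
product <- m %*% t(m)
print(product)
```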

Who Uses R: Domains
• Telecom
• Pharmaceuticals
• Financial Services
• Life Sciences
• Education, etc.

Common Machine Learning Algorithms
Types of Learning
Supervised Learning
Algorithms
• Naïve Bayes
• Support Vector Machines
• Random Forests
• Decision Trees
Unsupervised Learning
Algorithms
• K-means
• Fuzzy Clustering
• Hierarchical Clustering
• Gaussian mixture models
• Self-organizing maps

Association Rule Mining
• Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars
• Customers who purchase maintenance agreements are very likely to purchase large appliances
• When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaner

What is Association Rule Mining?
• In data mining, Association Rule Mining is a popular and well-researched method for discovering interesting relations between variables in large databases.
• It is intended to identify strong rules discovered in databases using different measures of interestingness.
• For example, a rule found in the sales data of a supermarket might indicate that if a customer buys onions and potatoes together, he or she is also likely to buy hamburger meat.
• Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placement.

How good is an Association Rule?
Here we have 5 customers. Each customer has a basket, and their purchases are as follows:
Customer  Items Purchased
1         OJ, soda
2         Milk, OJ, window cleaner
3         OJ, detergent
4         OJ, detergent, soda
5         Window cleaner, soda
Here, customer 1 purchases OJ (orange juice) and soda.
Customer 2 purchases milk, OJ and window cleaner.
Customer 3 purchases OJ and detergent.
Customer 4 purchases OJ, detergent and soda.
Customer 5 purchases window cleaner and soda.
Now let's form a matrix to analyze the above data and draw inferences.

How good is an Association Rule?
Co-occurrence of Products
                OJ  Window cleaner  Milk  Soda  Detergent
OJ               4               1     1     2          2
Window cleaner   1               2     1     1          0
Milk             1               1     1     0          0
Soda             2               1     0     3          1
Detergent        2               0     0     1          2
Simple patterns derived from the above observations:
• OJ and soda are purchased together as often as OJ and detergent (2 baskets each), and more often than any other pair of items
• Detergent is never purchased with milk or window cleaner
• Milk is never purchased with soda or detergent
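The co-occurrence matrix can be rebuilt from the five baskets with a few lines of base R. This is a minimal sketch, not necessarily how the matrix was produced for the session; the variable names are chosen here for illustration.

```r
# The five customer baskets from the example
baskets <- list(
  c("OJ", "Soda"),
  c("Milk", "OJ", "Window cleaner"),
  c("OJ", "Detergent"),
  c("OJ", "Detergent", "Soda"),
  c("Window cleaner", "Soda")
)
items <- c("OJ", "Window cleaner", "Milk", "Soda", "Detergent")

# Binary incidence matrix: one row per customer, one column per item
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# Entry [i, j] counts the baskets that contain both item i and item j;
# the diagonal holds the number of baskets containing each single item
cooccurrence <- crossprod(incidence)
print(cooccurrence)
```

Reading along the OJ row shows that soda and detergent each appear with OJ in two baskets.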

The following three measures are the important constraints on which Association Rules are built:
Support
Supp(X) = proportion of transactions in the data set that contain the itemset X
Confidence
Conf(X => Y) = Supp(X ∪ Y) / Supp(X)
Lift
Lift(X => Y) = Supp(X ∪ Y) / (Supp(X) × Supp(Y))
Now let's calculate the support, confidence and lift for our five-customer example:
Rule             Support  Confidence  Lift
{Soda} => {OJ}   0.4      0.6667      0.8333
{OJ} => {Soda}   0.4      0.5         0.8333
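The three measures can be checked by hand on the five-customer baskets. The base R sketch below defines small helper functions (`support`, `confidence` and `lift` are names chosen here, not functions from a package) that follow the definitions above.

```r
# The five customer baskets from the example
baskets <- list(
  c("OJ", "Soda"),
  c("Milk", "OJ", "Window cleaner"),
  c("OJ", "Detergent"),
  c("OJ", "Detergent", "Soda"),
  c("Window cleaner", "Soda")
)

# Support: share of baskets that contain every item in `itemset`
support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Confidence of lhs => rhs: Supp(lhs U rhs) / Supp(lhs)
confidence <- function(lhs, rhs) {
  support(c(lhs, rhs)) / support(lhs)
}

# Lift of lhs => rhs: Supp(lhs U rhs) / (Supp(lhs) * Supp(rhs))
lift <- function(lhs, rhs) {
  support(c(lhs, rhs)) / (support(lhs) * support(rhs))
}

print(support(c("Soda", "OJ")))             # 0.4
print(round(confidence("Soda", "OJ"), 4))   # 0.6667
print(confidence("OJ", "Soda"))             # 0.5
print(round(lift("Soda", "OJ"), 4))         # 0.8333
```

A lift below 1 suggests that, in this tiny sample, soda and OJ appear together slightly less often than would be expected if they were bought independently.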

The Groceries data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions, and the items are aggregated to 169 categories.
‘arules’ provides the infrastructure for representing, manipulating and analyzing transaction data and patterns.
‘arulesViz’ provides various visualization techniques for association rules and itemsets. This package extends the ‘arules’ package.
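A quick way to get a feel for the data is to load it and look at the most frequent items. The sketch below assumes the ‘arules’ package is installed; the choice of the top five items is arbitrary.

```r
library(arules)

# Groceries ships with the arules package
data("Groceries")

# Number of transactions and number of item categories
print(dim(Groceries))

# Relative frequency of each item, most frequent first
item_freq <- sort(itemFrequency(Groceries), decreasing = TRUE)
top_items <- head(item_freq, 5)
print(top_items)
```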

Syntax - apriori(data, parameter = NULL, appearance = NULL, control = NULL)
apriori() - The apriori function is provided by the ‘arules’ package. It employs a level-wise search for frequent itemsets.
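A call to apriori() on the Groceries data might look like the sketch below. The support and confidence thresholds are illustrative assumptions, not necessarily the values used in this session, so the number of rules produced may differ from the session's.

```r
library(arules)
data("Groceries")

# Mine rules; the thresholds are assumed values for illustration only
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# How many rules were found, plus a summary of their quality measures
print(length(rules))
summary(rules)
```

Lowering the support threshold generally produces more rules, so the thresholds are usually tuned until the rule set is a manageable size.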

Going through 1098 rules manually is not an efficient option.
Let us make use of the ‘Viz’ in arulesViz and visualize the rules.

Now let's plot the data using the ‘Scatter Plot’ graph
• A scatter plot is a mathematical diagram that displays values for two variables for a set of data.
• The data is displayed as a collection of points.
• A scatter plot is used when a variable exists that is under the control of the experimenter.
Conclusion:
• It can be seen that rules with high lift have relatively low support.
• The most interesting rules reside on the support/confidence border.
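A scatter plot of this kind can be drawn with the plot() method that arulesViz adds for rule sets. This is a sketch assuming both packages are installed; the mining thresholds are again illustrative assumptions rather than the session's settings.

```r
library(arules)
library(arulesViz)
data("Groceries")

# Assumed thresholds, for illustration only
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# Support on the x-axis, confidence on the y-axis, lift as the point shading
plot(rules, measure = c("support", "confidence"), shading = "lift")
```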

After applying Association Rule Mining, the support, confidence and lift values for the Groceries data are as shown below:
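One way to list these values in R is to sort the mined rules by lift and inspect the top few. The sketch assumes the ‘arules’ package and uses illustrative thresholds, so the rules it prints need not match the ones shown in the session.

```r
library(arules)
data("Groceries")

# Assumed thresholds, for illustration only
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# Show the five rules with the highest lift, with their quality measures
top_rules <- head(sort(rules, by = "lift"), 5)
inspect(top_rules)
```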

Let us zoom into the plot to observe the significant inferences:
Conclusion:
• The most interesting rules according to ‘lift’ can be seen at the top-center.
• There are 3 rules that contain “butter” and 1 other item in the antecedent, with “whipped/sour cream” as the consequent.
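Rules like the ones described here can also be pulled out of the rule set directly instead of being read off the plot. The sketch below assumes the ‘arules’ package and illustrative thresholds, so the number of matching rules may differ from the 3 mentioned above.

```r
library(arules)
data("Groceries")

# Assumed thresholds, for illustration only
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# Keep rules with "butter" in the antecedent and
# "whipped/sour cream" as the consequent
butter_rules <- subset(
  rules,
  subset = lhs %in% "butter" & rhs %in% "whipped/sour cream"
)
inspect(sort(butter_rules, by = "lift"))
```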
