This presentation covers using R for analytics. It discusses data mining, business analytics, the CRISP-DM process, machine learning algorithms such as Naïve Bayes and random forests, and association rule mining. Association rule mining finds relationships between variables in large datasets and identifies strong rules using measures such as support, confidence and lift. The presentation applies association rule mining to grocery purchase data to discover rules like "customers who buy OJ also frequently buy soda", and uses visualization techniques in R to analyze the rules.
Slide 2 www.edureka.co/r-for-analytics
Objectives
At the end of this session, you will be able to explain:
What data mining is
What Business Analytics is
The stages of analytics / data mining
What R is
An overview of Machine Learning
What Association Rule Mining is
A use case
Slide 6
R: Characteristics
Effective and fast data handling and storage facility
A suite of operators for calculations on arrays, lists, vectors, etc.
A large, integrated collection of tools for data analysis and visualization
Graphical facilities for data analysis and display, either directly on screen or on paper
A well-developed and effective programming language; R is an implementation of the ‘S’ language
A complete range of packages to extend and enrich the functionality of R
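As a small illustration of the operators on vectors, arrays and lists mentioned above, here is a minimal base R sketch. All names and numbers in it are made up for illustration and do not come from the presentation.

```r
# Illustrative values only; none of these numbers come from the presentation.
prices   <- c(2.5, 1.2, 3.0)   # numeric vector
quantity <- c(4, 10, 2)

# Arithmetic operators work element-wise on whole vectors, no explicit loop
revenue <- prices * quantity
total   <- sum(revenue)

# The same operators apply to arrays (here a 2 x 3 matrix)
m       <- matrix(1:6, nrow = 2)
doubled <- m * 2

# Lists can hold values of different types
basket <- list(item = "OJ", price = 2.5)

print(revenue)
print(total)
print(doubled)
```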
Slide 8
Common Machine Learning Algorithms
Types of Learning: Supervised Learning and Unsupervised Learning
Supervised Learning algorithms:
Naïve Bayes
Support Vector Machines
Random Forests
Decision Trees
Unsupervised Learning algorithms:
K-means
Fuzzy Clustering
Hierarchical Clustering
Gaussian mixture models
Self-organizing maps
Slide 9
Association Rule Mining
Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars
Customers who purchase maintenance agreements are very likely to purchase large appliances
When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners
Slide 10
What is Association Rule Mining?
In data mining, Association Rule Mining is a popular and well-researched method for discovering interesting relations between variables in large databases.
It is intended to identify strong rules discovered in databases using different measures of interestingness.
For example, a rule found in the sales data of a supermarket might indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat.
Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements.
Slide 11
How good is an Association Rule?
Here we have 5 customers. Each customer is given a basket, and their purchases are as follows:
Customer | Items Purchased
1 | OJ, soda
2 | Milk, OJ, window cleaner
3 | OJ, detergent
4 | OJ, detergent, soda
5 | Window cleaner, soda
Here, customer 1 purchases OJ (orange juice) and soda.
Customer 2 purchases milk, OJ and window cleaner.
Customer 3 purchases OJ and detergent.
Customer 4 purchases OJ, detergent and soda.
Customer 5 purchases window cleaner and soda.
Now let's form a matrix to analyze the above data and draw inferences.
Slide 12
How good is an Association Rule?
Co-occurrence of Products
 | OJ | Window cleaner | Milk | Soda | Detergent
OJ | 4 | 1 | 1 | 2 | 2
Window cleaner | 1 | 2 | 1 | 1 | 0
Milk | 1 | 1 | 1 | 0 | 0
Soda | 2 | 1 | 0 | 3 | 1
Detergent | 2 | 0 | 0 | 1 | 2
Simple patterns derived from the above observation:
OJ and soda are purchased together twice, as are OJ and detergent; no other pair of items occurs together more often
Detergent is never purchased with milk or window cleaner
Milk is never purchased with soda or detergent
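The co-occurrence matrix above can be reproduced from the five baskets with a few lines of base R. This is a minimal sketch: the variable names (`baskets`, `incidence`, `cooccurrence`) are hypothetical and not taken from the slides.

```r
# The five customer baskets from the example
baskets <- list(
  c("OJ", "Soda"),
  c("Milk", "OJ", "Window cleaner"),
  c("OJ", "Detergent"),
  c("OJ", "Detergent", "Soda"),
  c("Window cleaner", "Soda")
)
items <- c("OJ", "Window cleaner", "Milk", "Soda", "Detergent")

# Binary incidence matrix: one row per customer, one column per item
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# Cell [i, j] counts the baskets that contain both item i and item j;
# the diagonal holds the number of baskets containing each single item
cooccurrence <- crossprod(incidence)
print(cooccurrence)
```

The diagonal gives the per-item counts and the off-diagonal cells give the pair counts, which is how the table is read.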
Slide 13
Association Rule Mining
The following three measures are the key constraints used to select Association Rules:
Support
Supp(X) = the proportion of transactions in the data set that contain the itemset X
Confidence
Conf(X => Y) = Supp(X ∪ Y) / Supp(X)
Lift
Lift(X => Y) = Supp(X ∪ Y) / (Supp(X) × Supp(Y))
Now let's calculate the Support, Confidence and Lift for our grocery purchase data:
Rule | Support | Confidence | Lift
{Soda} => {OJ} | 0.4 | 0.6667 | 0.8333
{OJ} => {Soda} | 0.4 | 0.5 | 0.8333
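To make the three definitions concrete, the following base R sketch computes them for the five-customer example. The helper functions `support`, `confidence` and `lift` are hypothetical names written for this illustration; they are not functions from any package.

```r
# The five customer baskets from the example
baskets <- list(
  c("OJ", "Soda"),
  c("Milk", "OJ", "Window cleaner"),
  c("OJ", "Detergent"),
  c("OJ", "Detergent", "Soda"),
  c("Window cleaner", "Soda")
)

# Support: share of baskets that contain every item in the itemset
support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Confidence of lhs => rhs: Supp(lhs U rhs) / Supp(lhs)
confidence <- function(lhs, rhs) {
  support(c(lhs, rhs)) / support(lhs)
}

# Lift of lhs => rhs: Supp(lhs U rhs) / (Supp(lhs) * Supp(rhs))
lift <- function(lhs, rhs) {
  support(c(lhs, rhs)) / (support(lhs) * support(rhs))
}

print(support(c("Soda", "OJ")))   # 2 of 5 baskets
print(confidence("Soda", "OJ"))   # 0.4 / 0.6
print(confidence("OJ", "Soda"))   # 0.4 / 0.8
print(lift("Soda", "OJ"))         # 0.4 / (0.6 * 0.8)
```

Lift is symmetric in its two sides, so both rules share the same value. A lift below 1, as here, means the two items appear together slightly less often than they would if purchases were independent.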
Slide 14
Association Rule Mining
The Groceries data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories.
‘arules’ provides the infrastructure for representing, manipulating and analyzing transaction data and patterns.
‘arulesViz’ provides various visualization techniques for association rules and itemsets. This package extends package ‘arules’.
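A minimal sketch of loading the data set, assuming the `arules` package is already installed (for example via `install.packages("arules")`). It only checks the size figures quoted above and looks at the most frequent items; the choice of `topN = 10` is arbitrary.

```r
library(arules)

# Groceries ships with arules as a 'transactions' object
data("Groceries")

# Rows are transactions, columns are item categories;
# the text above reports 9835 transactions and 169 categories
print(dim(Groceries))

# Overview: density, most frequent items, basket size distribution
summary(Groceries)

# Bar chart of the 10 most frequently purchased item categories
itemFrequencyPlot(Groceries, topN = 10)
```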
Slide 15
Association Rule Mining
Syntax: apriori(data, parameter = NULL, appearance = NULL, control = NULL)
apriori(): the apriori function is provided by the ‘arules’ package. It employs a level-wise search for frequent itemsets.
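A minimal sketch of calling `apriori()` on the Groceries data. The support and confidence thresholds below are assumptions chosen for illustration; the slides do not state which cut-offs were used, so the number of rules this produces need not match the figures quoted in the presentation.

```r
library(arules)
data("Groceries")

# Threshold values are illustrative assumptions, not taken from the slides:
# keep rules seen in at least 0.1% of transactions with confidence >= 50%
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# Printing the object reports how many rules were found
print(rules)
```

Raising either threshold shrinks the rule set, which is usually the first lever to pull when the output is too large to read.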
Slide 16
Association Rule Mining
Going through 1098 rules manually is not an efficient option.
Let us use the ‘arulesViz’ package to visualize the rules.
Slide 17
Association Rule Mining
Now let's plot the rules using a scatter plot.
A scatter plot is a mathematical diagram that displays the values of two variables for a set of data.
The data is displayed as a collection of points.
A scatter plot can be used when one variable is under the control of the experimenter.
Conclusion:
It can be seen that rules with high lift have relatively low support.
The most interesting rules reside on the support/confidence border.
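A scatter plot of this kind can be sketched with `arulesViz`, assuming both packages are installed. The mining thresholds are again illustrative assumptions rather than the values used in the slides.

```r
library(arules)
library(arulesViz)
data("Groceries")

# Illustrative thresholds (assumed, not from the slides)
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# One point per rule: support on the x-axis, confidence on the y-axis,
# with colour shading by lift
plot(rules,
     method  = "scatterplot",
     measure = c("support", "confidence"),
     shading = "lift")
```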
Slide 18
Association Rule Mining
After applying association rule mining, the Support, Confidence and Lift values for the Groceries data are as shown below:
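A listing of this kind can be produced with `inspect()` from `arules`. The sketch below mines rules with assumed thresholds and prints the five rules with the highest lift; the actual values shown in the original slide are not reproduced here.

```r
library(arules)
data("Groceries")

# Illustrative thresholds (assumed, not from the slides)
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# Sort by lift (descending by default) and keep the first five rules
top_by_lift <- head(sort(rules, by = "lift"), 5)

# Print each rule with its quality measures
inspect(top_by_lift)

# The same measures as a plain data frame
print(quality(top_by_lift)[, c("support", "confidence", "lift")])
```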
Slide 20
Association Rule Mining
Let us zoom into the plot to observe the significant inferences:
Conclusion:
The most interesting rules according to ‘lift’ can be seen at the top-center.
There are 3 rules containing “Butter” and 1 other item in the antecedent, with “whipped/sour cream” as the consequent.
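Rules like the butter and whipped/sour cream ones can also be pulled out directly with `subset()` from `arules` instead of reading them off a plot. This is a sketch under assumed thresholds, so the number of matching rules may differ from the 3 reported in the slide and can even be zero.

```r
library(arules)
data("Groceries")

# Illustrative thresholds (assumed, not from the slides)
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.5)
)

# Keep rules with "butter" in the antecedent (lhs) and
# "whipped/sour cream" in the consequent (rhs)
butter_rules <- subset(
  rules,
  lhs %in% "butter" & rhs %in% "whipped/sour cream"
)

# How many rules matched under these thresholds
print(length(butter_rules))

# Show the matches, strongest lift first, if there are any
if (length(butter_rules) > 0) {
  inspect(sort(butter_rules, by = "lift"))
}
```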