Association Rule Mining with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
Association Rule Mining with R [1]
- basic concepts of association rules
- association rule mining with R
- pruning redundant rules
- interpreting and visualizing association rules
- recommended readings
[1] Chapter 9: Association Rules, R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf
Association Rules 
Association rules are rules presenting association or correlation 
between itemsets. 
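\begin{align*}
\mathrm{support}(A \Rightarrow B) &= P(A \cup B) \\
\mathrm{confidence}(A \Rightarrow B) &= P(B \mid A) = \frac{P(A \cup B)}{P(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{\mathrm{confidence}(A \Rightarrow B)}{P(B)} = \frac{P(A \cup B)}{P(A)\,P(B)}
\end{align*}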
where P(A) is the percentage (or probability) of cases containing 
A. 
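As a rough illustration of these three measures, the sketch below computes them by hand in base R. The data frame `cases` and its columns `A` and `B` are made-up toy data, not part of the slides; the joint term in the formulas is taken as the share of cases that contain both A and B.

```r
# Toy data (hypothetical): one row per case, one logical flag per itemset.
cases <- data.frame(
  A = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  B = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

p_a  <- mean(cases$A)            # P(A): share of cases containing A
p_b  <- mean(cases$B)            # P(B): share of cases containing B
p_ab <- mean(cases$A & cases$B)  # share of cases containing both A and B

support    <- p_ab               # support(A => B)
confidence <- p_ab / p_a         # confidence(A => B) = P(B | A)
lift       <- confidence / p_b   # lift(A => B)

c(support = support, confidence = confidence, lift = lift)
```

A lift above 1 indicates that B occurs more often among cases containing A than in the data overall; a lift below 1 indicates the opposite.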
Association Rule Mining Algorithms in R 
- APRIORI
  - a level-wise, breadth-first algorithm which counts transactions to find frequent itemsets and then derives association rules from them
  - apriori() in package arules
- ECLAT
  - finds frequent itemsets with equivalence classes, depth-first search and set intersection instead of counting
  - eclat() in the same package
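The difference between "counting" and "set intersection" can be sketched on a few toy transactions in base R. This is only an illustration of the two ideas, not how arules implements either algorithm; the transactions and the helper names `count_support` and `tid_support` are hypothetical.

```r
# Toy transactions (hypothetical): one character vector of items each.
trans <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter"),
  c("bread", "butter", "milk")
)

# Counting (Apriori-style): scan all transactions and test containment.
count_support <- function(itemset, trans) {
  mean(vapply(trans, function(tr) all(itemset %in% tr), logical(1)))
}

# Vertical layout (Eclat-style): store, for each item, the IDs of the
# transactions that contain it, then intersect those ID lists.
items <- sort(unique(unlist(trans)))
tidlists <- lapply(items, function(i) {
  which(vapply(trans, function(tr) i %in% tr, logical(1)))
})
names(tidlists) <- items

tid_support <- function(itemset, tidlists, n) {
  length(Reduce(intersect, tidlists[itemset])) / n
}

count_support(c("bread", "milk"), trans)
tid_support(c("bread", "milk"), tidlists, length(trans))
```

Both calls should return the same support, since they answer the same question in two different ways.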
The Titanic Dataset 
- The Titanic dataset in the datasets package is a 4-dimensional table with summarized information on the fate of passengers on the Titanic according to social class, sex, age and survival.
- To make it suitable for association rule mining, we reconstruct the raw data as titanic.raw, where each row represents a person.
- The reconstructed raw data can also be downloaded at http://www.rdatamining.com/data/titanic.raw.rdata.
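One possible way to do this reconstruction in base R is sketched below. It is not necessarily the procedure used to build the downloadable file, and the row order may differ from it; the intermediate name `df` is introduced here for illustration.

```r
# Titanic is a 4-dimensional table in the datasets package (loaded by default).
df <- as.data.frame(Titanic)   # one row per table cell, plus a Freq column

# Repeat each cell's row Freq times, so that every row stands for one person.
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL

dim(titanic.raw)
summary(titanic.raw)
```

The result should have 2201 rows and four factor columns, with counts matching the summary shown on the next slide.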
load("./data/titanic.raw.rdata") 
## draw a sample of 5 records 
idx <- sample(1:nrow(titanic.raw), 5) 
titanic.raw[idx, ] 
## Class Sex Age Survived 
## 950 Crew Male Adult No 
## 2176 3rd Female Adult Yes 
## 1716 Crew Male Adult Yes 
## 1001 Crew Male Adult No 
## 48 3rd Female Child No 
summary(titanic.raw) 
## Class Sex Age Survived 
## 1st :325 Female: 470 Adult:2092 No :1490 
## 2nd :285 Male :1731 Child: 109 Yes: 711 
## 3rd :706 
## Crew:885 
Function apriori() 
Mine frequent itemsets, association rules or association hyperedges 
using the Apriori algorithm. The Apriori algorithm employs 
level-wise search for frequent itemsets. 
Default settings: 
- minimum support: supp=0.1
- minimum confidence: conf=0.8
- maximum length of rules: maxlen=10
library(arules) 
rules.all <- apriori(titanic.raw) 
## 
## parameter specification: 
## confidence minval smax arem aval originalSupport support 
## 0.8 0.1 1 none FALSE TRUE 0.1 
## minlen maxlen target ext 
## 1 10 rules FALSE 
## 
## algorithmic control: 
## filter tree heap memopt load sort verbose 
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE 
## 
## apriori - find association rules with the apriori algorithm 
## version 4.21 (2004.05.09) (c) 1996-2004 Christian ... 
## set item appearances ...[0 item(s)] done [0.00s]. 
## set transactions ...[10 item(s), 2201 transaction(s)] done ... 
## sorting and recoding items ... [9 item(s)] done [0.00s]. 
## creating transaction tree ... done [0.00s]. 
## checking subsets of size 1 2 3 4 done [0.00s]. 
## writing ... [27 rule(s)] done [0.00s]. 
## creating S4 object ... done [0.00s]. 
inspect(rules.all) 
## lhs rhs support confidence lift 
## 1 {} => {Age=Adult} 0.9505 0.9505 1.0000 
## 2 {Class=2nd} => {Age=Adult} 0.1186 0.9158 0.9635 
## 3 {Class=1st} => {Age=Adult} 0.1449 0.9815 1.0327 
## 4 {Sex=Female} => {Age=Adult} 0.1931 0.9043 0.9514 
## 5 {Class=3rd} => {Age=Adult} 0.2849 0.8881 0.9344 
## 6 {Survived=Yes} => {Age=Adult} 0.2971 0.9198 0.9678 
## 7 {Class=Crew} => {Sex=Male} 0.3916 0.9740 1.2385 
## 8 {Class=Crew} => {Age=Adult} 0.4021 1.0000 1.0521 
## 9 {Survived=No} => {Sex=Male} 0.6197 0.9154 1.1640 
## 10 {Survived=No} => {Age=Adult} 0.6533 0.9651 1.0154 
## 11 {Sex=Male} => {Age=Adult} 0.7574 0.9630 1.0132 
## 12 {Sex=Female, 
## Survived=Yes} => {Age=Adult} 0.1436 0.9186 0.9665 
## 13 {Class=3rd, 
## Sex=Male} => {Survived=No} 0.1917 0.8275 1.2223 
## 14 {Class=3rd, 
## Survived=No} => {Age=Adult} 0.2163 0.9015 0.9485 
## 15 {Class=3rd, 
## Sex=Male} => {Age=Adult} 0.2099 0.9059 0.9531 
## 16 {Sex=Male, 
## Survived=Yes} => {Age=Adult} 0.1536 0.9210 0.9690 
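The measures reported for a single rule can be recomputed by hand as a sanity check. The sketch below does this for rule 7, {Class=Crew} => {Sex=Male}. It assumes that titanic.raw can be rebuilt from the built-in Titanic table (the slides load it from a file instead), and the helper names `is_crew` and `is_male` are introduced here for illustration.

```r
# Rebuild one row per person from the built-in 4-dimensional table.
df <- as.data.frame(Titanic)
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq),
                  c("Class", "Sex", "Age", "Survived")]

is_crew <- titanic.raw$Class == "Crew"   # cases matching the LHS
is_male <- titanic.raw$Sex == "Male"     # cases matching the RHS

support    <- mean(is_crew & is_male)    # share of cases with LHS and RHS
confidence <- support / mean(is_crew)    # P(RHS | LHS)
lift       <- confidence / mean(is_male) # confidence relative to P(RHS)

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The rounded values should agree with the support, confidence and lift listed for rule 7 in the inspect() output.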
# rules with rhs containing "Survived" only
rules <- apriori(titanic.raw,
                 control = list(verbose=FALSE),
                 parameter = list(minlen=2, supp=0.005, conf=0.8),
                 appearance = list(rhs=c("Survived=No",
                                         "Survived=Yes"),
                                   default="lhs"))
## keep three decimal places
quality(rules) <- round(quality(rules), digits=3)
## order rules by lift
rules.sorted <- sort(rules, by="lift")
inspect(rules.sorted) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 
## 2 {Class=2nd, 
## Sex=Female, 
## Age=Child} => {Survived=Yes} 0.006 1.000 3.096 
## 3 {Class=1st, 
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 
## 4 {Class=1st, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.064 0.972 3.010 
## 5 {Class=2nd, 
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 
## 6 {Class=Crew, 
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 
## 7 {Class=Crew, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.009 0.870 2.692 
## 8 {Class=2nd, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.036 0.860 2.663 
## 9 {Class=2nd, 
Redundant Rules 
inspect(rules.sorted[1:2]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
## 2 {Class=2nd, 
## Sex=Female, 
## Age=Child} => {Survived=Yes} 0.006 1 3.096 
- Rule #2 provides no extra knowledge in addition to rule #1, since rule #1 tells us that all 2nd-class children survived.
- When a rule (such as #2) is a super rule of another rule (#1) and the former has the same or a lower lift, the former rule (#2) is considered to be redundant.
- Other redundant rules in the above result are rules #4, #7 and #8, compared respectively with #3, #6 and #5.
Remove Redundant Rules 
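## find redundant rules
## (recent arules versions return a sparse matrix from is.subset(); if the
## assignment below fails, call is.subset(rules.sorted, rules.sorted, sparse = FALSE))
subset.matrix <- is.subset(rules.sorted, rules.sorted)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
## which rules are redundant
which(redundant)
## [1] 2 4 7 8
## remove redundant rules
rules.pruned <- rules.sorted[!redundant]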
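To make the triangular-matrix trick more transparent, the same logic can be sketched in base R on the left-hand sides of the first eight rules listed earlier (all of which have {Survived=Yes} as the right-hand side). This is only an illustration without arules; the names `lhs` and `sub` are introduced here and are not part of the slides.

```r
# Left-hand sides of rules 1 to 8, in the order sorted by lift.
lhs <- list(
  c("Class=2nd", "Age=Child"),
  c("Class=2nd", "Sex=Female", "Age=Child"),
  c("Class=1st", "Sex=Female"),
  c("Class=1st", "Sex=Female", "Age=Adult"),
  c("Class=2nd", "Sex=Female"),
  c("Class=Crew", "Sex=Female"),
  c("Class=Crew", "Sex=Female", "Age=Adult"),
  c("Class=2nd", "Sex=Female", "Age=Adult")
)
n <- length(lhs)

# sub[i, j] is TRUE when the LHS of rule i is contained in the LHS of rule j.
sub <- outer(seq_len(n), seq_len(n),
             Vectorize(function(i, j) all(lhs[[i]] %in% lhs[[j]])))

# Keep only i < j: rule i ranks above rule j, so its lift is at least as high.
sub[lower.tri(sub, diag = TRUE)] <- FALSE

# Rule j is redundant if some higher-ranked rule is a sub rule of it.
redundant <- colSums(sub) >= 1
which(redundant)
```

Because the rules are sorted by decreasing lift, looking only above the diagonal is what enforces the "same or lower lift" condition.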
Remaining Rules 
inspect(rules.pruned) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 
## 2 {Class=1st, 
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 
## 3 {Class=2nd, 
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 
## 4 {Class=Crew, 
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 
## 5 {Class=2nd, 
## Sex=Male, 
## Age=Adult} => {Survived=No} 0.070 0.917 1.354 
## 6 {Class=2nd, 
## Sex=Male} => {Survived=No} 0.070 0.860 1.271 
## 7 {Class=3rd, 
## Sex=Male, 
## Age=Adult} => {Survived=No} 0.176 0.838 1.237 
## 8 {Class=3rd, 
## Sex=Male} => {Survived=No} 0.192 0.827 1.222 
inspect(rules.pruned[1]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
Did children of the 2nd class have a higher survival rate than other 
children? 
The rule states only that all children of class 2 survived, but 
provides no information at all to compare the survival rates of 
different classes.
Rules about Children 
rules <- apriori(titanic.raw, control = list(verbose=F),
                 parameter = list(minlen=3, supp=0.002, conf=0.2),
                 appearance = list(default="none", rhs=c("Survived=Yes"),
                                   lhs=c("Class=1st", "Class=2nd", "Class=3rd",
                                         "Age=Child", "Age=Adult")))
rules.sorted <- sort(rules, by="confidence")
inspect(rules.sorted)
## lhs rhs support confidence lift
## 1 {Class=2nd,
## Age=Child} => {Survived=Yes} 0.010904 1.0000 3.0956
## 2 {Class=1st,
## Age=Child} => {Survived=Yes} 0.002726 1.0000 3.0956
## 3 {Class=1st,
## Age=Adult} => {Survived=Yes} 0.089505 0.6176 1.9117
## 4 {Class=2nd,
## Age=Adult} => {Survived=Yes} 0.042708 0.3602 1.1149
## 5 {Class=3rd,
## Age=Child} => {Survived=Yes} 0.012267 0.3418 1.0580
## 6 {Class=3rd,
## Age=Adult} => {Survived=Yes} 0.068605 0.2408 0.7455
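The confidence of each of these rules is the survival rate within one class and age group, so it can be cross-checked directly against the built-in Titanic table. The sketch below is one way to do that in base R; the names `tab` and `rate` are introduced here for illustration.

```r
# Collapse the Sex dimension: counts by Class x Age x Survived.
tab <- apply(Titanic, c(1, 3, 4), sum)

# Survival rate per class and age group.
# There were no children among the crew, so that cell is 0/0 (NaN).
rate <- tab[, , "Yes"] / (tab[, , "No"] + tab[, , "Yes"])

round(rate, 4)
```

Comparing the Child column across classes answers the question asked earlier: all 1st-class and 2nd-class children survived, while the rate for 3rd-class children is much lower.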
library(arulesViz) 
plot(rules.all) 
[Figure: Scatter plot for 27 rules. x-axis: support; y-axis: confidence; color: lift.]
plot(rules.all, method = "grouped")
[Figure: Grouped matrix for 27 rules. LHS groups plotted against RHS items {Age=Adult}, {Survived=No} and {Sex=Male}; size: support; color: lift.]
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Association Rule Mining with R

  • 1. Association Rule Mining with R. Yanchang Zhao, http://www.RDataMining.com, 30 September 2014
  • 2. Outline
      - Introduction
      - Association Rule Mining
      - Removing Redundancy
      - Interpreting Rules
      - Visualizing Association Rules
      - Further Readings and Online Resources
  • 3. Association Rule Mining with R [1]
      - basic concepts of association rules
      - association rule mining with R
      - pruning redundant rules
      - interpreting and visualizing association rules
      - recommended readings
      [1] Chapter 9: Association Rules, in R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf
  • 4. Association Rules
      Association rules are rules presenting association or correlation between itemsets.
      \[ \mathrm{support}(A \Rightarrow B) = P(A \cup B) \]
      \[ \mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)} \]
      \[ \mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{P(B)} = \frac{P(A \cup B)}{P(A)\,P(B)} \]
      where P(A) is the percentage (or probability) of cases containing A.
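As a worked illustration of the three measures, the sketch below computes them by hand for the rule {Class=Crew} => {Sex=Male}. It assumes only base R and the built-in `Titanic` contingency table; the variable names are illustrative and do not come from the slides. Here P(A ∪ B) is read as the share of cases that contain every item of both A and B.

```r
# Built-in 4-way contingency table: Class x Sex x Age x Survived
tab <- datasets::Titanic

n    <- sum(tab)                      # total number of people
n_A  <- sum(tab["Crew", , , ])        # cases containing A = {Class=Crew}
n_B  <- sum(tab[, "Male", , ])        # cases containing B = {Sex=Male}
n_AB <- sum(tab["Crew", "Male", , ])  # cases containing both A and B

support    <- n_AB / n                # P(A u B)
confidence <- n_AB / n_A              # P(B | A)
lift       <- confidence / (n_B / n)  # confidence / P(B)

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The results can be compared with the row for {Class=Crew} => {Sex=Male} in the `inspect(rules.all)` output later in the deck.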
  • 7. Association Rule Mining Algorithms in R
      - APRIORI
          - a level-wise, breadth-first algorithm which counts transactions to find frequent itemsets and then derives association rules from them
          - apriori() in package arules
      - ECLAT
          - finds frequent itemsets with equivalence classes, depth-first search and set intersection instead of counting
          - eclat() in the same package
  • 13. The Titanic Dataset
      - The Titanic dataset in the datasets package is a 4-dimensional table with summarized information on the fate of passengers on the Titanic according to social class, sex, age and survival.
      - To make it suitable for association rule mining, we reconstruct the raw data as titanic.raw, where each row represents a person.
      - The reconstructed raw data can also be downloaded at http://www.rdatamining.com/data/titanic.raw.rdata.
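The slides do not show how the raw data were rebuilt. One possible reconstruction, assuming only base R, is sketched below: each cell of the summary table is expanded into as many rows as its count. This is not necessarily the author's procedure, and the row order (and hence the row numbers) may differ from the downloadable file.

```r
# Flatten the 4-way table into one row per cell, with a Freq column
df <- as.data.frame(datasets::Titanic)

# Repeat each cell's row Freq times so that every row is one person
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL  # drop the repeated-row labels

str(titanic.raw)      # four factor columns, one row per person
summary(titanic.raw)  # counts per level
```

The counts printed by `summary()` can be checked against the summary shown on the next slide.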
  • 14. load("./data/titanic.raw.rdata")
      ## draw a sample of 5 records
      idx <- sample(1:nrow(titanic.raw), 5)
      titanic.raw[idx, ]
      ##      Class    Sex   Age Survived
      ## 950   Crew   Male Adult       No
      ## 2176   3rd Female Adult      Yes
      ## 1716  Crew   Male Adult      Yes
      ## 1001  Crew   Male Adult       No
      ## 48     3rd Female Child       No
      summary(titanic.raw)
      ##   Class         Sex           Age        Survived
      ##  1st :325   Female: 470   Adult:2092   No :1490
      ##  2nd :285   Male  :1731   Child: 109   Yes: 711
      ##  3rd :706
      ##  Crew:885
  • 15. Function apriori()
      Mine frequent itemsets, association rules or association hyperedges using the Apriori algorithm. The Apriori algorithm employs level-wise search for frequent itemsets.
      Default settings:
      - minimum support: supp=0.1
      - minimum confidence: conf=0.8
      - maximum length of rules: maxlen=10
  • 17. library(arules)
      rules.all <- apriori(titanic.raw)
      ##
      ## parameter specification:
      ##  confidence minval smax arem  aval originalSupport support
      ##         0.8    0.1    1 none FALSE            TRUE     0.1
      ##  minlen maxlen target   ext
      ##       1     10  rules FALSE
      ##
      ## algorithmic control:
      ##  filter tree heap memopt load sort verbose
      ##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
      ##
      ## apriori - find association rules with the apriori algorithm
      ## version 4.21 (2004.05.09) (c) 1996-2004 Christian ...
      ## set item appearances ...[0 item(s)] done [0.00s].
      ## set transactions ...[10 item(s), 2201 transaction(s)] done ...
      ## sorting and recoding items ... [9 item(s)] done [0.00s].
      ## creating transaction tree ... done [0.00s].
      ## checking subsets of size 1 2 3 4 done [0.00s].
      ## writing ... [27 rule(s)] done [0.00s].
      ## creating S4 object ... done [0.00s].
  • 18. inspect(rules.all)
      ##    lhs                rhs           support confidence   lift
      ## 1  {}              => {Age=Adult}    0.9505     0.9505 1.0000
      ## 2  {Class=2nd}     => {Age=Adult}    0.1186     0.9158 0.9635
      ## 3  {Class=1st}     => {Age=Adult}    0.1449     0.9815 1.0327
      ## 4  {Sex=Female}    => {Age=Adult}    0.1931     0.9043 0.9514
      ## 5  {Class=3rd}     => {Age=Adult}    0.2849     0.8881 0.9344
      ## 6  {Survived=Yes}  => {Age=Adult}    0.2971     0.9198 0.9678
      ## 7  {Class=Crew}    => {Sex=Male}     0.3916     0.9740 1.2385
      ## 8  {Class=Crew}    => {Age=Adult}    0.4021     1.0000 1.0521
      ## 9  {Survived=No}   => {Sex=Male}     0.6197     0.9154 1.1640
      ## 10 {Survived=No}   => {Age=Adult}    0.6533     0.9651 1.0154
      ## 11 {Sex=Male}      => {Age=Adult}    0.7574     0.9630 1.0132
      ## 12 {Sex=Female,
      ##     Survived=Yes}  => {Age=Adult}    0.1436     0.9186 0.9665
      ## 13 {Class=3rd,
      ##     Sex=Male}      => {Survived=No}  0.1917     0.8275 1.2223
      ## 14 {Class=3rd,
      ##     Survived=No}   => {Age=Adult}    0.2163     0.9015 0.9485
      ## 15 {Class=3rd,
      ##     Sex=Male}      => {Age=Adult}    0.2099     0.9059 0.9531
      ## 16 {Sex=Male,
      ##     Survived=Yes}  => {Age=Adult}    0.1536     0.9210 0.9690
  • 19. # rules with rhs containing "Survived" only
      rules <- apriori(titanic.raw,
                       control = list(verbose=F),
                       parameter = list(minlen=2, supp=0.005, conf=0.8),
                       appearance = list(rhs=c("Survived=No", "Survived=Yes"),
                                         default="lhs"))
      ## keep three decimal places
      quality(rules) <- round(quality(rules), digits=3)
      ## order rules by lift
      rules.sorted <- sort(rules, by="lift")
  • 20. inspect(rules.sorted)
      ##   lhs              rhs             support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child}   => {Survived=Yes}    0.011      1.000 3.096
      ## 2 {Class=2nd,
      ##    Sex=Female,
      ##    Age=Child}   => {Survived=Yes}    0.006      1.000 3.096
      ## 3 {Class=1st,
      ##    Sex=Female}  => {Survived=Yes}    0.064      0.972 3.010
      ## 4 {Class=1st,
      ##    Sex=Female,
      ##    Age=Adult}   => {Survived=Yes}    0.064      0.972 3.010
      ## 5 {Class=2nd,
      ##    Sex=Female}  => {Survived=Yes}    0.042      0.877 2.716
      ## 6 {Class=Crew,
      ##    Sex=Female}  => {Survived=Yes}    0.009      0.870 2.692
      ## 7 {Class=Crew,
      ##    Sex=Female,
      ##    Age=Adult}   => {Survived=Yes}    0.009      0.870 2.692
      ## 8 {Class=2nd,
      ##    Sex=Female,
      ##    Age=Adult}   => {Survived=Yes}    0.036      0.860 2.663
      ## 9 {Class=2nd,
  • 22. Redundant Rules
      inspect(rules.sorted[1:2])
      ##   lhs              rhs             support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child}   => {Survived=Yes}    0.011          1 3.096
      ## 2 {Class=2nd,
      ##    Sex=Female,
      ##    Age=Child}   => {Survived=Yes}    0.006          1 3.096
      - Rule #2 provides no extra knowledge in addition to rule #1, since rule #1 tells us that all 2nd-class children survived.
      - When a rule (such as #2) is a super rule of another rule (#1) and the former has the same or a lower lift, the former rule (#2) is considered to be redundant.
      - Other redundant rules in the above result are rules #4, #7 and #8, compared respectively with #3, #6 and #5.
  • 23. Remove Redundant Rules
      ## find redundant rules
      subset.matrix <- is.subset(rules.sorted, rules.sorted)
      subset.matrix[lower.tri(subset.matrix, diag = T)] <- NA
      redundant <- colSums(subset.matrix, na.rm = T) >= 1
      ## which rules are redundant
      which(redundant)
      ## [1] 2 4 7 8
      ## remove redundant rules
      rules.pruned <- rules.sorted[!redundant]
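The upper-triangle logic in the code above can be traced on a toy example. The sketch below is a simplified base-R stand-in, not the arules implementation: it takes the left-hand sides of the first four rules from the sorted output (all share the same right-hand side), builds the subset matrix by hand, and applies the same masking and column-sum steps. The names `lhs` and `m` are illustrative.

```r
# LHS item sets of four rules, already sorted by decreasing lift
lhs <- list(
  r1 = c("Class=2nd", "Age=Child"),
  r2 = c("Class=2nd", "Sex=Female", "Age=Child"),
  r3 = c("Class=1st", "Sex=Female"),
  r4 = c("Class=1st", "Sex=Female", "Age=Adult")
)

# m[i, j] is TRUE when every item of rule i also appears in rule j
m <- sapply(lhs, function(y) sapply(lhs, function(x) all(x %in% y)))

# Keep only pairs (i, j) with i ranked above j: a rule can only be made
# redundant by a more general rule with the same or a higher lift
m[lower.tri(m, diag = TRUE)] <- NA

# A rule is redundant if some higher-ranked rule is a subset of it
redundant <- colSums(m, na.rm = TRUE) >= 1
which(redundant)
```

Because the rules are sorted by lift first, masking the lower triangle is what encodes the "same or lower lift" condition. As an assumption about newer arules releases rather than something stated on the slides: `is.subset()` may return a sparse matrix by default there, in which case passing `sparse = FALSE` should restore the dense matrix this code expects, and `is.redundant()` offers a built-in alternative (with its own, confidence-based default definition).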
  • 24. Remaining Rules
      inspect(rules.pruned)
      ##   lhs              rhs             support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child}   => {Survived=Yes}    0.011      1.000 3.096
      ## 2 {Class=1st,
      ##    Sex=Female}  => {Survived=Yes}    0.064      0.972 3.010
      ## 3 {Class=2nd,
      ##    Sex=Female}  => {Survived=Yes}    0.042      0.877 2.716
      ## 4 {Class=Crew,
      ##    Sex=Female}  => {Survived=Yes}    0.009      0.870 2.692
      ## 5 {Class=2nd,
      ##    Sex=Male,
      ##    Age=Adult}   => {Survived=No}     0.070      0.917 1.354
      ## 6 {Class=2nd,
      ##    Sex=Male}    => {Survived=No}     0.070      0.860 1.271
      ## 7 {Class=3rd,
      ##    Sex=Male,
      ##    Age=Adult}   => {Survived=No}     0.176      0.838 1.237
      ## 8 {Class=3rd,
      ##    Sex=Male}    => {Survived=No}     0.192      0.827 1.222
  • 27. inspect(rules.pruned[1])
      ##   lhs              rhs             support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child}   => {Survived=Yes}    0.011          1 3.096
      Did children of the 2nd class have a higher survival rate than other children?
      The rule states only that all children of class 2 survived, but provides no information at all to compare the survival rates of different classes.
  • 28. Rules about Children
      rules <- apriori(titanic.raw, control = list(verbose=F),
                       parameter = list(minlen=3, supp=0.002, conf=0.2),
                       appearance = list(default="none", rhs=c("Survived=Yes"),
                                         lhs=c("Class=1st", "Class=2nd", "Class=3rd",
                                               "Age=Child", "Age=Adult")))
      rules.sorted <- sort(rules, by="confidence")
      inspect(rules.sorted)
      ##   lhs             rhs              support confidence   lift
      ## 1 {Class=2nd,
      ##    Age=Child}  => {Survived=Yes}  0.010904     1.0000 3.0956
      ## 2 {Class=1st,
      ##    Age=Child}  => {Survived=Yes}  0.002726     1.0000 3.0956
      ## 3 {Class=1st,
      ##    Age=Adult}  => {Survived=Yes}  0.089505     0.6176 1.9117
      ## 4 {Class=2nd,
      ##    Age=Adult}  => {Survived=Yes}  0.042708     0.3602 1.1149
      ## 5 {Class=3rd,
      ##    Age=Child}  => {Survived=Yes}  0.012267     0.3418 1.0580
      ## 6 {Class=3rd,
      ##    Age=Adult}  => {Survived=Yes}  0.068605     0.2408 0.7455
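The confidence of each rule above is simply the survival rate within that class and age group, so the figures can be cross-checked without arules. The sketch below assumes only base R and the built-in `Titanic` table; `counts` and `rate` are illustrative names.

```r
# Collapse the Sex dimension: counts by Class x Age x Survived
counts <- margin.table(datasets::Titanic, c(1, 3, 4))

# Survival rate per (Class, Age) cell = confidence of
# {Class=.., Age=..} => {Survived=Yes}
rate <- counts[, , "Yes"] / (counts[, , "Yes"] + counts[, , "No"])

round(rate, 4)
```

The Crew/Child cell is empty (0/0) and therefore comes out as `NaN`, which is consistent with the crew being left out of the `lhs` list in the `apriori()` call.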
  • 30. library(arulesViz)
      plot(rules.all)
      [Figure: Scatter plot for 27 rules (axes: support and confidence; shading: lift)]
  • 31. plot(rules.all, method = "grouped")
      [Figure: Grouped matrix for 27 rules (size: support; color: lift), LHS groups plotted against the RHS items {Age=Adult}, {Survived=No} and {Sex=Male}]
  • 32. plot(rules.all, method = "graph")
      [Figure: Graph for 27 rules, with itemsets as vertices (width: support, 0.119 to 0.95; color: lift, 0.934 to 1.266)]
  • 33. plot(rules.all, method = "graph", control = list(type = "items"))
      [Figure: Graph for 27 rules, with items as vertices (size: support, 0.119 to 0.95; color: lift, 0.934 to 1.266)]
  • 34. plot(rules.all, method = "paracoord", control = list(reorder = TRUE))
      [Figure: Parallel coordinates plot for 27 rules (x-axis: position, from 3 to rhs; y-axis: items)]
  • 36. Further Readings
      - More than 20 interestingness measures, such as chi-square, conviction, gini and leverage
        Tan, P.-N., Kumar, V., and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In Proc. of KDD '02, pages 32-41, New York, NY, USA. ACM Press.
      - Post mining of association rules, such as selecting interesting association rules, visualization of association rules and using association rules for classification
        Yanchang Zhao, Chengqi Zhang and Longbing Cao (Eds.). Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, ISBN 978-1-60566-404-0, May 2009. Information Science Reference.
      - Package arulesSequences: mining sequential patterns
        http://cran.r-project.org/web/packages/arulesSequences/
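To give a feel for two of the measures named above, the sketch below computes leverage and conviction by hand for the rule {Class=Crew} => {Sex=Male}. It uses the commonly cited definitions, leverage = P(A ∪ B) − P(A)P(B) and conviction = (1 − P(B)) / (1 − confidence), and assumes only base R with the built-in `Titanic` table; it is an illustration, not the arules implementation.

```r
tab <- datasets::Titanic
n <- sum(tab)

p_A  <- sum(tab["Crew", , , ]) / n        # P(A),  A = {Class=Crew}
p_B  <- sum(tab[, "Male", , ]) / n        # P(B),  B = {Sex=Male}
p_AB <- sum(tab["Crew", "Male", , ]) / n  # P(A u B): both present

confidence <- p_AB / p_A

# Leverage: how far the joint support is from what independence predicts
leverage <- p_AB - p_A * p_B

# Conviction: ratio of the expected to the observed frequency of
# A occurring without B; values above 1 indicate a positive association
conviction <- (1 - p_B) / (1 - confidence)

round(c(leverage = leverage, conviction = conviction), 4)
```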
  • 38. Online Resources
      - Chapter 9: Association Rules, in book R and Data Mining: Examples and Case Studies
        http://www.rdatamining.com/docs/RDataMining.pdf
      - R Reference Card for Data Mining
        http://www.rdatamining.com/docs/R-refcard-data-mining.pdf
      - Free online courses and documents
        http://www.rdatamining.com/resources/
      - RDataMining Group on LinkedIn (7,000+ members)
        http://group.rdatamining.com
      - RDataMining on Twitter (1,700+ followers)
        @RDataMining
  • 39. The End. Thanks! Email: yanchang(at)rdatamining.com