Visualization of Supervised Learning with {arules} + {arulesViz}

Takashi J. OZAKI, Ph. D.
Recruit Communications Co., Ltd.
2014/4/17 1

About me
- Twitter: @TJO_datasci
- Data Scientist (Quant Analyst) in the Recruit group
- A group of companies in advertising media and human resources
- Known as a major player in big data
- Current mission: ad-hoc analysis of various marketing data
- Actually, I’m still new to the field of data science

About me
- Original background: neuroscience of the human brain (6 years’ experience as a postdoctoral researcher)
(Ozaki, PLoS One, 2011)

About me
- English version of my blog: http://tjo-en.hatenablog.com/

Tonight’s topic is:

Graphical Visualization of Supervised Learning

Advantages of this technique
More intuitive
Easy to grasp even for high-dimensional data
Even non-specialists can easily understand it
Useful for presentations

Supervised learning: lower dimension, more intuitive
- In the case of 2D data… (e.g. nonlinear SVM)
x          y          label
0.924335   -1.0665    Yes
2.109901   2.615284   No
0.988192   -0.90812   Yes
1.299749   0.944518   No
-0.60885   0.457816   Yes
-2.25484   1.615489   Yes

Supervised learning: higher dimension, less intuitive
- In the case of 7D… no way!!!
game1  game2  game3  social1  social2  app1  app2  cv
0      0      0      1        0        0     0     No
1      0      0      1        1        0     0     No
0      1      1      1        1        1     0     Yes
0      0      1      1        0        1     1     Yes
1      0      1      0        1        1     1     Yes
0      0      0      1        1        1     0     No
…      …      …      …        …        …     …     …
???

Is there any technique that can easily visualize supervised learning on higher-dimensional data?
(…for lay people?)

- {arules} + {arulesViz}

Why association rules and their visualization?
- Very roughly, association rules can be interpreted as something like a generative model
- A large set of conditional probabilities
- If they can be regarded as a set of conditional probabilities, they can also be described as something like a Bayesian network built from rules “X → Y”
- If it is like a Bayesian network, it can be visualized as a graph, e.g. with {igraph}
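$$\mathrm{supp}(X \to Y) = \frac{\sigma(X \cup Y)}{M}$$

$$\mathrm{conf}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X)}$$

$$\mathrm{lift}(X \to Y) = \frac{\mathrm{conf}(X \to Y)}{\mathrm{supp}(Y)}$$

Here σ(X ∪ Y) is the number of transactions that contain every item of both X and Y, and M is the total number of transactions.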
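As a small worked example of the three measures above, the sketch below evaluates the rule {app1} → {Yes} on only the six example rows printed on the slides (not the full data set, which is not reproduced here). The variable names (`d`, `has_x`, `supp_xy`, and so on) are illustrative assumptions, not part of the original talk.

```r
# The six example rows shown on the slides (illustration only, not the full data).
d <- data.frame(
  game1   = c(0, 1, 0, 0, 1, 0),
  game2   = c(0, 0, 1, 0, 0, 0),
  game3   = c(0, 0, 1, 1, 1, 0),
  social1 = c(1, 1, 1, 1, 0, 1),
  social2 = c(0, 1, 1, 0, 1, 1),
  app1    = c(0, 0, 1, 1, 1, 1),
  app2    = c(0, 0, 0, 1, 1, 0),
  cv      = c("No", "No", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

M     <- nrow(d)          # total number of transactions
has_x <- d$app1 == 1      # X = {app1}
has_y <- d$cv == "Yes"    # Y = {Yes}

supp_xy <- sum(has_x & has_y) / M   # supp(X -> Y) = sigma(X u Y) / M
supp_x  <- sum(has_x) / M           # supp(X)
supp_y  <- sum(has_y) / M           # supp(Y)
conf_xy <- supp_xy / supp_x         # conf(X -> Y)
lift_xy <- conf_xy / supp_y         # lift(X -> Y)

print(c(support = supp_xy, confidence = conf_xy, lift = lift_xy))
```

On these six rows, three of the four app1 users are still active, so the rule has support 3/6, confidence 3/4 and lift 1.5; a lift above 1 means app1 users convert more often than users overall.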

Further points…
- Only when all independent variables are binary (0 or 1) can they be handled as “basket transactions”
0      0      0      1        0        0     0     No
1      0      0      1        1        0     0     No
0      1      1      1        1        1     0     Yes
0      0      1      1        0        1     1     Yes
1      0      1      0        1        1     1     Yes
0      0      0      1        1        1     0     No
…      …      …      …        …        …     …     …
{social1, No}
{game1, social1, social2, No}
{game2, game3, social1, social2, app1, Yes}
{game3, social1, app1, app2, Yes}
{game1, game3, social2, app1, app2, Yes}
{social1, social2, app1, No}
…
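The row-to-basket conversion above can be sketched in base R as follows. This is only an illustration on the six rows printed on the slide; the names `d`, `items` and `transactions` are assumptions introduced for the example.

```r
# The six example rows shown on the slide (illustration only).
d <- data.frame(
  game1   = c(0, 1, 0, 0, 1, 0),
  game2   = c(0, 0, 1, 0, 0, 0),
  game3   = c(0, 0, 1, 1, 1, 0),
  social1 = c(1, 1, 1, 1, 0, 1),
  social2 = c(0, 1, 1, 0, 1, 1),
  app1    = c(0, 0, 1, 1, 1, 1),
  app2    = c(0, 0, 0, 1, 1, 0),
  cv      = c("No", "No", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

items <- setdiff(names(d), "cv")

# Each row becomes the set of items whose flag is 1, plus its class label.
transactions <- lapply(seq_len(nrow(d)), function(i) {
  flags <- unlist(d[i, items]) == 1
  c(items[flags], as.character(d$cv[i]))
})

# Print the baskets in the {a, b, c} notation used on the slide.
for (tr in transactions) {
  cat("{", paste(tr, collapse = ", "), "}\n", sep = "")
}
```

Treating the label as just another item is what lets the rule miner relate the features to “Yes” and “No” later on.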

Let’s try in R!

Sample data “d1”
game1  game2  game3  social1  social2  app1  app2  cv
0      0      0      1        0        0     0     No
1      0      0      1        1        0     0     No
0      1      1      1        1        1     0     Yes
0      0      1      1        0        1     1     Yes
1      0      1      0        1        1     1     Yes
0      0      0      1        1        1     0     No
…      …      …      …        …        …     …     …
Imagine you’re working on a certain platform for web entertainment.
It has 3 SP games, 2 SP social networking services, and 2 apps.
The data record each user’s activity on each piece of content during the month after registration, and the “cv” label indicates whether the user is still active after that month.

In the case with svm {e1071}…
> library(e1071)               # install {e1071} first; provides svm()
> d1.svm <- svm(cv ~ ., d1)    # cv must be a factor for classification
> table(d1$cv, predict(d1.svm, d1[, -8]))   # column 8 is cv
        No  Yes
  No  1402   98
  Yes   80 1420
# Good accuracy (but measured on the training data only)

In the case with randomForest {randomForest}…
> library(randomForest)        # install {randomForest} first
> tuneRF(d1[, -8], d1[, 8], doBest = TRUE)   # search for a good mtry
# (omitted)
> d1.rf <- randomForest(cv ~ ., d1, mtry = 2)
> table(d1$cv, predict(d1.rf, d1[, -8]))
        No  Yes
  No  1413   87
  Yes   92 1408
# Good accuracy
> importance(d1.rf)
        MeanDecreaseGini
game1          20.640253
game2          12.115196
game3           2.355584
social1       189.053648
social2        76.476470
app1          796.937087
app2            2.804019
# Variable importance (without any directionality)
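To put a number on “good accuracy”, the two confusion matrices printed above can be turned into accuracy figures. This is a minimal base-R sketch; the matrices are typed in by hand from the slide output, and `accuracy()` is a helper introduced here for illustration.

```r
# Accuracy = correctly classified cases (diagonal) / all cases.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Confusion matrices copied from the svm and randomForest output above
# (rows: actual cv, columns: predicted cv).
svm_cm <- matrix(c(1402,   98,
                     80, 1420),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(actual = c("No", "Yes"),
                                 predicted = c("No", "Yes")))
rf_cm  <- matrix(c(1413,   87,
                     92, 1408),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(actual = c("No", "Yes"),
                                 predicted = c("No", "Yes")))

svm_acc <- accuracy(svm_cm)   # 2822 / 3000
rf_acc  <- accuracy(rf_cm)    # 2821 / 3000
print(round(c(svm = svm_acc, randomForest = rf_acc), 4))
```

Both models classify about 94% of the 3000 users correctly. These figures are computed on the data the models were fitted to, so they describe fit rather than out-of-sample performance.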

In the case with glm {stats}…
> d1.glm <- glm(cv ~ ., d1, family = binomial)   # logistic regression
> summary(d1.glm)
Call:
glm(formula = cv ~ ., family = binomial, data = d1)
# (omitted)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37793    0.25979  -5.304 1.13e-07 ***
game1        1.05846    0.17344   6.103 1.04e-09 ***
game2       -0.54914    0.16752  -3.278  0.00105 **
game3        0.12035    0.16803   0.716  0.47386
social1     -3.00110    0.21653 -13.860  < 2e-16 ***
social2      1.53098    0.17349   8.824  < 2e-16 ***
app1         5.33547    0.19191  27.802  < 2e-16 ***
app2         0.07811    0.16725   0.467  0.64048
---
# (omitted)
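Unlike the random forest importances, the logistic regression coefficients carry a direction. One common way to read them is as odds ratios, exp(coefficient). The sketch below does this with the estimates typed in by hand from the summary above; `coefs` and `odds_ratio` are names introduced for illustration.

```r
# Coefficient estimates copied from the glm summary above (intercept omitted).
coefs <- c(game1 = 1.05846, game2 = -0.54914, game3 = 0.12035,
           social1 = -3.00110, social2 = 1.53098,
           app1 = 5.33547, app2 = 0.07811)

# exp(coefficient) is the multiplicative change in the odds of cv = Yes
# when the corresponding flag goes from 0 to 1, other flags held fixed.
odds_ratio <- exp(coefs)

print(round(sort(odds_ratio, decreasing = TRUE), 3))
```

Values above 1 (app1, social2, game1) push users toward “Yes”, and values below 1 (social1, game2) push toward “No”. This is the directional information that the graph of association rules is meant to convey visually.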

Sample data converted for transactions “d2”
game1  game2  game3  social1  social2  app1  app2  yes  no
0      0      0      1        0        0     0     0    1
1      0      0      1        1        0     0     0    1
0      1      1      1        1        1     0     1    0
0      0      1      1        0        1     1     1    0
1      0      1      0        1        1     1     1    0
0      0      0      1        1        1     0     0    1
…      …      …      …        …        …     …     …    …
The “cv” column was simply split into two binary (0 or 1) columns, “yes” and “no”
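The split of “cv” into “yes” and “no” can be sketched in base R as below. The talk does not show how d2 was actually built, so this is only one possible way, demonstrated on the six rows printed on the slides; `d1_head` and `d2_head` are hypothetical names.

```r
# The six example rows of d1 shown on the slides (illustration only).
d1_head <- data.frame(
  game1   = c(0, 1, 0, 0, 1, 0),
  game2   = c(0, 0, 1, 0, 0, 0),
  game3   = c(0, 0, 1, 1, 1, 0),
  social1 = c(1, 1, 1, 1, 0, 1),
  social2 = c(0, 1, 1, 0, 1, 1),
  app1    = c(0, 0, 1, 1, 1, 1),
  app2    = c(0, 0, 0, 1, 1, 0),
  cv      = factor(c("No", "No", "Yes", "Yes", "Yes", "No"))
)

# Keep the seven feature columns and replace cv with two indicator columns.
d2_head     <- d1_head[, setdiff(names(d1_head), "cv")]
d2_head$yes <- as.integer(d1_head$cv == "Yes")
d2_head$no  <- as.integer(d1_head$cv == "No")

print(d2_head)
```

With both outcomes present as items, rules pointing to “no” can be mined alongside rules pointing to “yes”.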

Run apriori {arules} to get association rules
> library(arules)              # install {arules} first
> d2.ap.small <- apriori(as.matrix(d2))   # default parameters
parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.8    0.1    1 none FALSE            TRUE     0.1      1     10  rules FALSE
algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[9 item(s), 3000 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [50 rule(s)] done [0.00s]. # only 50 rules…
creating S4 object ... done [0.00s].

Run apriori {arules} to get association rules (lower minimum support)
> d2.ap.large <- apriori(as.matrix(d2), parameter = list(support = 0.001))
parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.8    0.1    1 none FALSE            TRUE   0.001      1     10  rules FALSE
algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[9 item(s), 3000 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 done [0.00s].
writing ... [182 rule(s)] done [0.00s]. # as many as 182 rules
creating S4 object ... done [0.00s].
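The calls above mine rules between any items. When only rules that predict the outcome are of interest, apriori() accepts an `appearance` argument that restricts which items may appear on the right-hand side. This is not something the talk does; it is a hedged sketch of one possible variation, run on just the six d2 rows printed on the slides (the matrix `d2_head` is a stand-in for the full d2, which is not reproduced here). It assumes {arules} is installed.

```r
library(arules)

# The six example rows of d2 shown on the slides (stand-in for the full data).
d2_head <- matrix(c(
  0, 0, 0, 1, 0, 0, 0, 0, 1,
  1, 0, 0, 1, 1, 0, 0, 0, 1,
  0, 1, 1, 1, 1, 1, 0, 1, 0,
  0, 0, 1, 1, 0, 1, 1, 1, 0,
  1, 0, 1, 0, 1, 1, 1, 1, 0,
  0, 0, 0, 1, 1, 1, 0, 0, 1
), ncol = 9, byrow = TRUE,
dimnames = list(NULL, c("game1", "game2", "game3", "social1", "social2",
                        "app1", "app2", "yes", "no")))

# A logical matrix coerces cleanly to the transactions class.
trans <- as(d2_head == 1, "transactions")

# Allow only "yes" / "no" on the right-hand side; everything else on the left.
rules <- apriori(trans,
                 parameter  = list(support = 0.1, confidence = 0.8),
                 appearance = list(rhs = c("yes", "no"), default = "lhs"),
                 control    = list(verbose = FALSE))

rhs_labels <- labels(rhs(rules))

# Show up to five rules with the highest lift.
top <- sort(rules, by = "lift")
inspect(top[seq_len(min(5, length(top)))])
```

Restricting the right-hand side makes every rule read as “this combination of activities → outcome”, which is closer to the supervised-learning framing of the talk; with only six rows the resulting rules are of course not meaningful in themselves.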

OK, let’s just visualize it
> require("arulesViz")   # layout.fruchterman.reingold is provided by {igraph}
# (omitted)
> plot(d2.ap.small, method = "graph", control = list(type = "items",
layout = layout.fruchterman.reingold))
> plot(d2.ap.large, method = "graph", control = list(type = "items",
layout = layout.fruchterman.reingold))
# The Fruchterman-Reingold force-directed layout pulls connected nodes
together and pushes all nodes apart, so nodes joined by short paths
tend to end up near each other
# Nodes (items) should therefore be placed roughly according to how
closely they are connected to each other

Small set of rules visualized with {arulesViz}

Compare with a result of glm
(Same glm summary as shown earlier: app1, social2 and game1 have significant positive coefficients, social1 and game2 have significant negative ones, and game3 and app2 are not significant.)

Large set of rules visualized with {arulesViz}

Compare with a result of randomForest
(Same randomForest result as shown earlier: app1 has by far the largest MeanDecreaseGini (796.9), followed by social1 (189.1) and social2 (76.5); the importance values carry no directionality.)

See how far nodes are from yes / no

Large set of rules visualized with {arulesViz}

Advantages of this technique
More intuitive
Easy to grasp even for high-dimensional data
Even non-specialists can easily understand it
Useful for presentations

Disadvantages of this technique
Less rigorous
Not quantitative

Any questions or comments?
Don’t hesitate to ask me!
@TJO_datasci
