SlideShare a Scribd company logo
Visualization of
Supervised Learning with
{arules} + {arulesViz}
Takashi J. OZAKI, Ph. D.
Recruit Communications Co., Ltd.
2014/4/17 1
About me
 Twitter: @TJO_datasci
 Data Scientist (Quant Analyst) in Recruit group
 A group of companies in advertisement media and
human resources
 Known as a major player with big data
 Current mission: ad-hoc analysis on various
marketing data
 Actually, still I’m new to the field of data science
2014/4/17 2
About me
 Original background: neuroscience in the human
brain (6 years experience as postdoc researcher)
2014/4/17 3
(Ozaki, PLoS One, 2011)
About me
 English version of my blog
http://tjo-en.hatenablog.com/
2014/4/17 4
2014/4/17 5
Tonight’s topic is:
2014/4/17 6
Graphical Visualization of
Supervised Learning
Advantage of this technique
More intuitive
Easy to grasp even for high-
dimensional data
Even lay guys can easily understand
Useful for presentation
2014/4/17 7
Supervised learning: lower dimension, more intuitive
 In case of 2D data… (e.g. nonlinear SVM)
2014/4/17 8
x y label
0.924335 -1.0665Yes
2.109901 2.615284No
0.988192 -0.90812Yes
1.299749 0.944518No
-0.60885 0.457816Yes
-2.25484 1.615489Yes
Supervised learning: higher dimension, less intuitive
 In case of 7D… no way!!!
2014/4/17 9
game1 game2 game3 social1 social2 app1 app2 cv
0 0 0 1 0 0 0No
1 0 0 1 1 0 0No
0 1 1 1 1 1 0Yes
0 0 1 1 0 1 1Yes
1 0 1 0 1 1 1Yes
0 0 0 1 1 1 0No
… … … … … … ……
???
2014/4/17 10
Is there any technique
that can easily visualize
supervised learning with
higher dimension?
(…for lay people?)
2014/4/17 11
 {arules} + {arulesViz}
Why association rules and its visualization?
 Much roughly, association rules can be interpreted
as a kind of (likeness of) generative modeling
 A large set of conditional probability
 If it can be regarded as a set of conditional
probability, it also can be described as (likeness of)
Bayesian network
“XY”
 If it’s like a Bayesian network, it can be visualized
as graph representation, e.g. by {igraph}
2014/4/17 12
𝑠𝑢𝑝𝑝 𝑋 → 𝑌 =
𝜎(𝑋 ∪ 𝑌)
𝑀
𝑐𝑜𝑛𝑓 𝑋 → 𝑌 =
𝑠𝑢𝑝𝑝(𝑋 → 𝑌)
𝑠𝑢𝑝𝑝(𝑋)
𝑙𝑖𝑓𝑡 𝑋 → 𝑌 =
𝑐𝑜𝑛𝑓(𝑋 → 𝑌)
𝑠𝑢𝑝𝑝(𝑌)
X Y
Further points…
 Only when all of independent variables are bivariate,
they can be handled as “basket transaction”
2014/4/17 13
game1 game2 game3 social1 social2 app1 app2 cv
0 0 0 1 0 0 0No
1 0 0 1 1 0 0No
0 1 1 1 1 1 0Yes
0 0 1 1 0 1 1Yes
1 0 1 0 1 1 1Yes
0 0 0 1 1 1 0No
… … … … … … ……
{social1, No}
{game1, social1, social2, No}
{game2, game3, social1, social2, app1, Yes}
{game3, social1, app1, app2, Yes}
{game1, game3, social2, app1, app2, Yes}
{socia1, social2, app1, No}
…
2014/4/17 14
Let’s try in R!
Sample data “d1”
2014/4/17 15
game1 game2 game3 social1 social2 app1 app2 cv
0 0 0 1 0 0 0No
1 0 0 1 1 0 0No
0 1 1 1 1 1 0Yes
0 0 1 1 0 1 1Yes
1 0 1 0 1 1 1Yes
0 0 0 1 1 1 0No
… … … … … … ……
Imagine you’re working on a certain platform for web entertainment.
It has 3 SP games, 2 SP social networking, 2 apps.
The data records user’s history of any activity on each content in a
month after registration, and “cv” label describes they are still active
after a month passed.
In the case with svm {e1071}…
2014/4/17 16
> d1.svm<-svm(cv~.,d1) # install and require {e1071}
# svm {e1071}
> table(d1$cv,predict(d1.svm,d1[,-8]))
No Yes
No 1402 98
Yes 80 1420
# Good accuracy (only for training data)
In the case with randomForest {randomForest}…
2014/4/17 17
> tuneRF(d1[,-8],d1[,8],doBest=T) # install and require {randomForest}
# (omitted)
> d1.rf<-randomForest(cv~.,d1,mtry=2)
# randomForest {randomForest}
> table(d1$cv,predict(d1.rf,d1[,-8]))
No Yes
No 1413 87
Yes 92 1408
# Good accuracy
> importance(d1.rf)
MeanDecreaseGini
game1 20.640253
game2 12.115196
game3 2.355584
social1 189.053648
social2 76.476470
app1 796.937087
app2 2.804019
# Variable importance (without any directionality)
In the case with glm {stats}…
2014/4/17 18
> d1.glm<-glm(cv~.,d1,family=binomial)
> summary(d1.glm)
Call:
glm(formula = cv ~ ., family = binomial, data = d1)
# (omitted)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37793 0.25979 -5.304 1.13e-07 ***
game1 1.05846 0.17344 6.103 1.04e-09 ***
game2 -0.54914 0.16752 -3.278 0.00105 **
game3 0.12035 0.16803 0.716 0.47386
social1 -3.00110 0.21653 -13.860 < 2e-16 ***
social2 1.53098 0.17349 8.824 < 2e-16 ***
app1 5.33547 0.19191 27.802 < 2e-16 ***
app2 0.07811 0.16725 0.467 0.64048
---
# (omitted)
Sample data converted for transactions “d2”
2014/4/17 19
game1 game2 game3 social1 social2 app1 app2 yes no
0 0 0 1 0 0 0 0 1
1 0 0 1 1 0 0 0 1
0 1 1 1 1 1 0 1 0
0 0 1 1 0 1 1 1 0
1 0 1 0 1 1 1 1 0
0 0 0 1 1 1 0 0 1
… … … … … … … … …
Just “cv” column was divided into 2 columns: “yes” and “no” with
bivariate (0 or 1)
Run apriori {arules} to get association rules
2014/4/17 20
> d2.ap.small<-apriori(as.matrix(d2)) # install and require {arules}
parameter specification:
confidence minval smax arem aval originalSupport support minlen
maxlen target ext
0.8 0.1 1 none FALSE TRUE 0.1 1 10 rules FALSE
algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[9 item(s), 3000 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [50 rule(s)] done [0.00s]. # only 50 rules…
creating S4 object ... done [0.00s].
Run apriori {arules} to get association rules
2014/4/17 21
> d2.ap.large<-apriori(as.matrix(d2),parameter=list(support=0.001))
parameter specification:
confidence minval smax arem aval originalSupport support minlen
maxlen target ext
0.8 0.1 1 none FALSE TRUE 0.001 1 10 rules FALSE
algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[9 item(s), 3000 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 done [0.00s].
writing ... [182 rule(s)] done [0.00s]. # as much as 182 rules
creating S4 object ... done [0.00s].
OK, just visualize it
2014/4/17 22
> require(“arulesViz”)
# (omitted)
> plot(d2.ap.small, method=“graph”, control=list(type=“items”,
layout=layout.fruchterman.reingold,))
> plot(d2.ap.large, method=“graph”, control=list(type=“items”,
layout=layout.fruchterman.reingold,))
# Fruchterman – Reingold force-directed graph drawing algorithm can
locate nodes with distances that is proportional to “shortest path
length” between them
# Then nodes (items) should be located based on their “closeness”
between each other
Small set of rules visualized with {arulesViz}
2014/4/17 23
Compare with a result of glm
2014/4/17 24
> d1.glm<-glm(cv~.,d1,family=binomial)
> summary(d1.glm)
Call:
glm(formula = cv ~ ., family = binomial, data = d1)
# (omitted)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37793 0.25979 -5.304 1.13e-07 ***
game1 1.05846 0.17344 6.103 1.04e-09 ***
game2 -0.54914 0.16752 -3.278 0.00105 **
game3 0.12035 0.16803 0.716 0.47386
social1 -3.00110 0.21653 -13.860 < 2e-16 ***
social2 1.53098 0.17349 8.824 < 2e-16 ***
app1 5.33547 0.19191 27.802 < 2e-16 ***
app2 0.07811 0.16725 0.467 0.64048
---
# (omitted)
Large set of rules visualized with {arulesViz}
2014/4/17 25
Compare with a result of randomForest
2014/4/17 26
> tuneRF(d1[,-8],d1[,8],doBest=T) # install and require {randomForest}
# (omitted)
> d1.rf<-randomForest(cv~.,d1,mtry=2)
# randomForest {randomForest}
> table(d1$cv,predict(d1.rf,d1[,-8]))
No Yes
No 1413 87
Yes 92 1408
# Good accuracy
> importance(d1.rf)
MeanDecreaseGini
game1 20.640253
game2 12.115196
game3 2.355584
social1 189.053648
social2 76.476470
app1 796.937087
app2 2.804019
# Variable importance (without any directionality)
See how far nodes are from yes / no
2014/4/17 27
Large set of rules visualized with {arulesViz}
2014/4/17 28
Advantage of this technique
More intuitive
Easy to grasp even for high-
dimensional data
Even lay guys can easily understand
Useful for presentation
2014/4/17 29
Disadvantage of this technique
Less strict
Never quantitative
2014/4/17 30
Any questions or comments?
2014/4/17 31
Don’t hesitate to ask me!
@TJO_datasci

More Related Content

What's hot

Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Pythonpugpe
 
밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI
NAVER Engineering
 
Seistech SQL code
Seistech SQL codeSeistech SQL code
Seistech SQL codeSimon Hoyle
 
PHP 5.4
PHP 5.4PHP 5.4
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
PgDay.Seoul
 
M12 random forest-part01
M12 random forest-part01M12 random forest-part01
M12 random forest-part01
Raman Kannan
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cythonAnderson Dantas
 
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Raman Kannan
 
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
Plotly
 
Database API, your new friend
Database API, your new friendDatabase API, your new friend
Database API, your new friendkikoalonsob
 
M11 bagging loo cv
M11 bagging loo cvM11 bagging loo cv
M11 bagging loo cv
Raman Kannan
 
Session 02
Session 02Session 02
Session 02
Felix Müller
 
M09-Cross validating-naive-bayes
M09-Cross validating-naive-bayesM09-Cross validating-naive-bayes
M09-Cross validating-naive-bayes
Raman Kannan
 
第7回 大規模データを用いたデータフレーム操作実習(1)
第7回 大規模データを用いたデータフレーム操作実習(1)第7回 大規模データを用いたデータフレーム操作実習(1)
第7回 大規模データを用いたデータフレーム操作実習(1)
Wataru Shito
 
手把手教你 R 語言分析實務
手把手教你 R 語言分析實務手把手教你 R 語言分析實務
手把手教你 R 語言分析實務
Helen Chang
 
30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection
James Huang
 
Drupal 8 database api
Drupal 8 database apiDrupal 8 database api
Drupal 8 database api
Viswanath Polaki
 

What's hot (20)

Five
FiveFive
Five
 
Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Python
 
밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI
 
Seistech SQL code
Seistech SQL codeSeistech SQL code
Seistech SQL code
 
03
0303
03
 
PHP 5.4
PHP 5.4PHP 5.4
PHP 5.4
 
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
 
M12 random forest-part01
M12 random forest-part01M12 random forest-part01
M12 random forest-part01
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cython
 
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
 
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
 
Database API, your new friend
Database API, your new friendDatabase API, your new friend
Database API, your new friend
 
M11 bagging loo cv
M11 bagging loo cvM11 bagging loo cv
M11 bagging loo cv
 
Session 02
Session 02Session 02
Session 02
 
Python 1
Python 1Python 1
Python 1
 
M09-Cross validating-naive-bayes
M09-Cross validating-naive-bayesM09-Cross validating-naive-bayes
M09-Cross validating-naive-bayes
 
第7回 大規模データを用いたデータフレーム操作実習(1)
第7回 大規模データを用いたデータフレーム操作実習(1)第7回 大規模データを用いたデータフレーム操作実習(1)
第7回 大規模データを用いたデータフレーム操作実習(1)
 
手把手教你 R 語言分析實務
手把手教你 R 語言分析實務手把手教你 R 語言分析實務
手把手教你 R 語言分析實務
 
30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection
 
Drupal 8 database api
Drupal 8 database apiDrupal 8 database api
Drupal 8 database api
 

Viewers also liked

Taste of Wine vs. Data Science
Taste of Wine vs. Data ScienceTaste of Wine vs. Data Science
Taste of Wine vs. Data Science
Takashi J OZAKI
 
「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る
「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る
「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る
Takashi J OZAKI
 
Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014
Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014
Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014
Takashi J OZAKI
 
直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する
直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する
直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する
Takashi J OZAKI
 
Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」
Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」
Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」
Takashi J OZAKI
 
データ分析というお仕事のこれまでとこれから(HCMPL2014)
データ分析というお仕事のこれまでとこれから(HCMPL2014)データ分析というお仕事のこれまでとこれから(HCMPL2014)
データ分析というお仕事のこれまでとこれから(HCMPL2014)
Takashi J OZAKI
 
Trading volume mapping R in recent environment
Trading volume mapping R in recent environment Trading volume mapping R in recent environment
Trading volume mapping R in recent environment
Nagi Teramo
 
最新業界事情から見るデータサイエンティストの「実像」
最新業界事情から見るデータサイエンティストの「実像」最新業界事情から見るデータサイエンティストの「実像」
最新業界事情から見るデータサイエンティストの「実像」
Takashi J OZAKI
 
Jc 20141003 tjo
Jc 20141003 tjoJc 20141003 tjo
Jc 20141003 tjo
Takashi J OZAKI
 
『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの
『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの
『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの
Takashi J OZAKI
 
Granger因果による 時系列データの因果推定(因果フェス2015)
Granger因果による時系列データの因果推定(因果フェス2015)Granger因果による時系列データの因果推定(因果フェス2015)
Granger因果による 時系列データの因果推定(因果フェス2015)
Takashi J OZAKI
 
計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining
計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining
計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining
Takashi J OZAKI
 
21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る
21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る
21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る
Takashi J OZAKI
 
Rで計量時系列分析~CRANパッケージ総ざらい~
Rで計量時系列分析~CRANパッケージ総ざらい~ Rで計量時系列分析~CRANパッケージ総ざらい~
Rで計量時系列分析~CRANパッケージ総ざらい~
Takashi J OZAKI
 
ビジネスの現場のデータ分析における理想と現実
ビジネスの現場のデータ分析における理想と現実ビジネスの現場のデータ分析における理想と現実
ビジネスの現場のデータ分析における理想と現実
Takashi J OZAKI
 
Tech Lab Paak講演会 20150601
Tech Lab Paak講演会 20150601Tech Lab Paak講演会 20150601
Tech Lab Paak講演会 20150601
Takashi J OZAKI
 
なぜ統計学がビジネスの 意思決定において大事なのか?
なぜ統計学がビジネスの 意思決定において大事なのか?なぜ統計学がビジネスの 意思決定において大事なのか?
なぜ統計学がビジネスの 意思決定において大事なのか?
Takashi J OZAKI
 
Simple perceptron by TJO
Simple perceptron by TJOSimple perceptron by TJO
Simple perceptron by TJOTakashi J OZAKI
 
傾向スコアを使ったキャンペーン効果検証V1
傾向スコアを使ったキャンペーン効果検証V1傾向スコアを使ったキャンペーン効果検証V1
傾向スコアを使ったキャンペーン効果検証V1Kazuya Obanayama
 

Viewers also liked (20)

Taste of Wine vs. Data Science
Taste of Wine vs. Data ScienceTaste of Wine vs. Data Science
Taste of Wine vs. Data Science
 
「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る
「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る
「データサイエンティスト・ブーム」後の企業におけるデータ分析者像を探る
 
Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014
Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014
Deep Learningと他の分類器をRで比べてみよう in Japan.R 2014
 
直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する
直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する
直感的な単変量モデルでは予測できない「ワインの味」を多変量モデルで予測する
 
Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」
Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」
Rによるやさしい統計学第20章「検定力分析によるサンプルサイズの決定」
 
データ分析というお仕事のこれまでとこれから(HCMPL2014)
データ分析というお仕事のこれまでとこれから(HCMPL2014)データ分析というお仕事のこれまでとこれから(HCMPL2014)
データ分析というお仕事のこれまでとこれから(HCMPL2014)
 
Trading volume mapping R in recent environment
Trading volume mapping R in recent environment Trading volume mapping R in recent environment
Trading volume mapping R in recent environment
 
最新業界事情から見るデータサイエンティストの「実像」
最新業界事情から見るデータサイエンティストの「実像」最新業界事情から見るデータサイエンティストの「実像」
最新業界事情から見るデータサイエンティストの「実像」
 
Salmon cycle
Salmon cycleSalmon cycle
Salmon cycle
 
Jc 20141003 tjo
Jc 20141003 tjoJc 20141003 tjo
Jc 20141003 tjo
 
『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの
『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの
『手を動かしながら学ぶ ビジネスに活かすデータマイニング』で目指したもの・学んでもらいたいもの
 
Granger因果による 時系列データの因果推定(因果フェス2015)
Granger因果による時系列データの因果推定(因果フェス2015)Granger因果による時系列データの因果推定(因果フェス2015)
Granger因果による 時系列データの因果推定(因果フェス2015)
 
計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining
計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining
計量時系列分析の立場からビジネスの現場のデータを見てみよう - 30th Tokyo Webmining
 
21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る
21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る
21世紀で最もセクシーな職業!?「データサイエンティスト」の実像に迫る
 
Rで計量時系列分析~CRANパッケージ総ざらい~
Rで計量時系列分析~CRANパッケージ総ざらい~ Rで計量時系列分析~CRANパッケージ総ざらい~
Rで計量時系列分析~CRANパッケージ総ざらい~
 
ビジネスの現場のデータ分析における理想と現実
ビジネスの現場のデータ分析における理想と現実ビジネスの現場のデータ分析における理想と現実
ビジネスの現場のデータ分析における理想と現実
 
Tech Lab Paak講演会 20150601
Tech Lab Paak講演会 20150601Tech Lab Paak講演会 20150601
Tech Lab Paak講演会 20150601
 
なぜ統計学がビジネスの 意思決定において大事なのか?
なぜ統計学がビジネスの 意思決定において大事なのか?なぜ統計学がビジネスの 意思決定において大事なのか?
なぜ統計学がビジネスの 意思決定において大事なのか?
 
Simple perceptron by TJO
Simple perceptron by TJOSimple perceptron by TJO
Simple perceptron by TJO
 
傾向スコアを使ったキャンペーン効果検証V1
傾向スコアを使ったキャンペーン効果検証V1傾向スコアを使ったキャンペーン効果検証V1
傾向スコアを使ったキャンペーン効果検証V1
 

Similar to Visualization of Supervised Learning with {arules} + {arulesViz}

TAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RTAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RFayan TAO
 
Comparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptxComparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptx
PremaGanesh1
 
R studio
R studio R studio
R studio
Kinza Irshad
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Yao Yao
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?
ProCogia
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
Majid Abdollahi
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
carliotwaycave
 
UNIT V.docx
UNIT V.docxUNIT V.docx
UNIT V.docx
Revathiparamanathan
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
The Statistical and Applied Mathematical Sciences Institute
 
mobl presentation @ IHomer
mobl presentation @ IHomermobl presentation @ IHomer
mobl presentation @ IHomerzefhemel
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Vivian S. Zhang
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structure
Rai University
 
4 Descriptive Statistics with R
4 Descriptive Statistics with R4 Descriptive Statistics with R
4 Descriptive Statistics with R
Dr Nisha Arora
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structure
Rai University
 
UNIT IV -Data Structures.pdf
UNIT IV -Data Structures.pdfUNIT IV -Data Structures.pdf
UNIT IV -Data Structures.pdf
KPRevathiAsstprofITD
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with N
OllieShoresna
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statistics
Krishna Dhakal
 
Mca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structureMca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structure
Rai University
 
Get started with R lang
Get started with R langGet started with R lang
Get started with R lang
senthil0809
 

Similar to Visualization of Supervised Learning with {arules} + {arulesViz} (20)

TAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RTAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with R
 
Comparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptxComparing EDA with classical and Bayesian analysis.pptx
Comparing EDA with classical and Bayesian analysis.pptx
 
R studio
R studio R studio
R studio
 
ML .pptx
ML .pptxML .pptx
ML .pptx
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
 
UNIT V.docx
UNIT V.docxUNIT V.docx
UNIT V.docx
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
 
mobl presentation @ IHomer
mobl presentation @ IHomermobl presentation @ IHomer
mobl presentation @ IHomer
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structure
 
4 Descriptive Statistics with R
4 Descriptive Statistics with R4 Descriptive Statistics with R
4 Descriptive Statistics with R
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structure
 
UNIT IV -Data Structures.pdf
UNIT IV -Data Structures.pdfUNIT IV -Data Structures.pdf
UNIT IV -Data Structures.pdf
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with N
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statistics
 
Mca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structureMca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structure
 
Get started with R lang
Get started with R langGet started with R lang
Get started with R lang
 

Recently uploaded

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 

Recently uploaded (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 

Visualization of Supervised Learning with {arules} + {arulesViz}

  • 1. Visualization of Supervised Learning with {arules} + {arulesViz} Takashi J. OZAKI, Ph. D. Recruit Communications Co., Ltd. 2014/4/17 1
  • 2. About me  Twitter: @TJO_datasci  Data Scientist (Quant Analyst) in Recruit group  A group of companies in advertisement media and human resources  Known as a major player with big data  Current mission: ad-hoc analysis on various marketing data  Actually, still I’m new to the field of data science 2014/4/17 2
  • 3. About me  Original background: neuroscience in the human brain (6 years experience as postdoc researcher) 2014/4/17 3 (Ozaki, PLoS One, 2011)
  • 4. About me  English version of my blog http://tjo-en.hatenablog.com/ 2014/4/17 4
  • 6. 2014/4/17 6 Graphical Visualization of Supervised Learning
  • 7. Advantage of this technique More intuitive Easy to grasp even for high- dimensional data Even lay guys can easily understand Useful for presentation 2014/4/17 7
  • 8. Supervised learning: lower dimension, more intuitive  In case of 2D data… (e.g. nonlinear SVM) 2014/4/17 8 x y label 0.924335 -1.0665Yes 2.109901 2.615284No 0.988192 -0.90812Yes 1.299749 0.944518No -0.60885 0.457816Yes -2.25484 1.615489Yes
  • 9. Supervised learning: higher dimension, less intuitive  In case of 7D… no way!!! 2014/4/17 9 game1 game2 game3 social1 social2 app1 app2 cv 0 0 0 1 0 0 0No 1 0 0 1 1 0 0No 0 1 1 1 1 1 0Yes 0 0 1 1 0 1 1Yes 1 0 1 0 1 1 1Yes 0 0 0 1 1 1 0No … … … … … … …… ???
  • 10. 2014/4/17 10 Is there any technique that can easily visualize supervised learning with higher dimension? (…for lay people?)
  • 11. 2014/4/17 11  {arules} + {arulesViz}
  • 12. Why association rules and its visualization?  Much roughly, association rules can be interpreted as a kind of (likeness of) generative modeling  A large set of conditional probability  If it can be regarded as a set of conditional probability, it also can be described as (likeness of) Bayesian network “XY”  If it’s like a Bayesian network, it can be visualized as graph representation, e.g. by {igraph} 2014/4/17 12 𝑠𝑢𝑝𝑝 𝑋 → 𝑌 = 𝜎(𝑋 ∪ 𝑌) 𝑀 𝑐𝑜𝑛𝑓 𝑋 → 𝑌 = 𝑠𝑢𝑝𝑝(𝑋 → 𝑌) 𝑠𝑢𝑝𝑝(𝑋) 𝑙𝑖𝑓𝑡 𝑋 → 𝑌 = 𝑐𝑜𝑛𝑓(𝑋 → 𝑌) 𝑠𝑢𝑝𝑝(𝑌) X Y
  • 13. Further points…  Only when all of independent variables are bivariate, they can be handled as “basket transaction” 2014/4/17 13 game1 game2 game3 social1 social2 app1 app2 cv 0 0 0 1 0 0 0No 1 0 0 1 1 0 0No 0 1 1 1 1 1 0Yes 0 0 1 1 0 1 1Yes 1 0 1 0 1 1 1Yes 0 0 0 1 1 1 0No … … … … … … …… {social1, No} {game1, social1, social2, No} {game2, game3, social1, social2, app1, Yes} {game3, social1, app1, app2, Yes} {game1, game3, social2, app1, app2, Yes} {socia1, social2, app1, No} …
  • 15. Sample data “d1” 2014/4/17 15 game1 game2 game3 social1 social2 app1 app2 cv 0 0 0 1 0 0 0No 1 0 0 1 1 0 0No 0 1 1 1 1 1 0Yes 0 0 1 1 0 1 1Yes 1 0 1 0 1 1 1Yes 0 0 0 1 1 1 0No … … … … … … …… Imagine you’re working on a certain platform for web entertainment. It has 3 SP games, 2 SP social networking, 2 apps. The data records user’s history of any activity on each content in a month after registration, and “cv” label describes they are still active after a month passed.
  • 16. In the case with svm {e1071}… 2014/4/17 16 > d1.svm<-svm(cv~.,d1) # install and require {e1071} # svm {e1071} > table(d1$cv,predict(d1.svm,d1[,-8])) No Yes No 1402 98 Yes 80 1420 # Good accuracy (only for training data)
  • 17. In the case with randomForest {randomForest}… 2014/4/17 17 > tuneRF(d1[,-8],d1[,8],doBest=T) # install and require {randomForest} # (omitted) > d1.rf<-randomForest(cv~.,d1,mtry=2) # randomForest {randomForest} > table(d1$cv,predict(d1.rf,d1[,-8])) No Yes No 1413 87 Yes 92 1408 # Good accuracy > importance(d1.rf) MeanDecreaseGini game1 20.640253 game2 12.115196 game3 2.355584 social1 189.053648 social2 76.476470 app1 796.937087 app2 2.804019 # Variable importance (without any directionality)
  • 18. In the case with glm {stats}… 2014/4/17 18 > d1.glm<-glm(cv~.,d1,family=binomial) > summary(d1.glm) Call: glm(formula = cv ~ ., family = binomial, data = d1) # (omitted) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.37793 0.25979 -5.304 1.13e-07 *** game1 1.05846 0.17344 6.103 1.04e-09 *** game2 -0.54914 0.16752 -3.278 0.00105 ** game3 0.12035 0.16803 0.716 0.47386 social1 -3.00110 0.21653 -13.860 < 2e-16 *** social2 1.53098 0.17349 8.824 < 2e-16 *** app1 5.33547 0.19191 27.802 < 2e-16 *** app2 0.07811 0.16725 0.467 0.64048 --- # (omitted)
  • 19. Sample data converted for transactions “d2” 2014/4/17 19 game1 game2 game3 social1 social2 app1 app2 yes no 0 0 0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 1 1 1 1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 1 1 1 0 0 1 … … … … … … … … … Just “cv” column was divided into 2 columns: “yes” and “no” with bivariate (0 or 1)
  • 20. Run apriori {arules} to get association rules 2014/4/17 20 > d2.ap.small<-apriori(as.matrix(d2)) # install and require {arules} parameter specification: confidence minval smax arem aval originalSupport support minlen maxlen target ext 0.8 0.1 1 none FALSE TRUE 0.1 1 10 rules FALSE algorithmic control: filter tree heap memopt load sort verbose 0.1 TRUE TRUE FALSE TRUE 2 TRUE apriori - find association rules with the apriori algorithm version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt set item appearances ...[0 item(s)] done [0.00s]. set transactions ...[9 item(s), 3000 transaction(s)] done [0.00s]. sorting and recoding items ... [9 item(s)] done [0.00s]. creating transaction tree ... done [0.00s]. checking subsets of size 1 2 3 4 5 done [0.00s]. writing ... [50 rule(s)] done [0.00s]. # only 50 rules… creating S4 object ... done [0.00s].
  • 21. Run apriori {arules} to get association rules 2014/4/17 21 > d2.ap.large<-apriori(as.matrix(d2),parameter=list(support=0.001)) parameter specification: confidence minval smax arem aval originalSupport support minlen maxlen target ext 0.8 0.1 1 none FALSE TRUE 0.001 1 10 rules FALSE algorithmic control: filter tree heap memopt load sort verbose 0.1 TRUE TRUE FALSE TRUE 2 TRUE apriori - find association rules with the apriori algorithm version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt set item appearances ...[0 item(s)] done [0.00s]. set transactions ...[9 item(s), 3000 transaction(s)] done [0.00s]. sorting and recoding items ... [9 item(s)] done [0.00s]. creating transaction tree ... done [0.00s]. checking subsets of size 1 2 3 4 5 6 7 8 done [0.00s]. writing ... [182 rule(s)] done [0.00s]. # as much as 182 rules creating S4 object ... done [0.00s].
  • 22. OK, just visualize it 2014/4/17 22 > require(“arulesViz”) # (omitted) > plot(d2.ap.small, method=“graph”, control=list(type=“items”, layout=layout.fruchterman.reingold,)) > plot(d2.ap.large, method=“graph”, control=list(type=“items”, layout=layout.fruchterman.reingold,)) # Fruchterman – Reingold force-directed graph drawing algorithm can locate nodes with distances that is proportional to “shortest path length” between them # Then nodes (items) should be located based on their “closeness” between each other
  • 23. Small set of rules visualized with {arulesViz} 2014/4/17 23
  • 24. Compare with a result of glm 2014/4/17 24 > d1.glm<-glm(cv~.,d1,family=binomial) > summary(d1.glm) Call: glm(formula = cv ~ ., family = binomial, data = d1) # (omitted) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.37793 0.25979 -5.304 1.13e-07 *** game1 1.05846 0.17344 6.103 1.04e-09 *** game2 -0.54914 0.16752 -3.278 0.00105 ** game3 0.12035 0.16803 0.716 0.47386 social1 -3.00110 0.21653 -13.860 < 2e-16 *** social2 1.53098 0.17349 8.824 < 2e-16 *** app1 5.33547 0.19191 27.802 < 2e-16 *** app2 0.07811 0.16725 0.467 0.64048 --- # (omitted)
  • 25. Large set of rules visualized with {arulesViz} 2014/4/17 25
  • 26. Compare with a result of randomForest 2014/4/17 26 > tuneRF(d1[,-8],d1[,8],doBest=T) # install and require {randomForest} # (omitted) > d1.rf<-randomForest(cv~.,d1,mtry=2) # randomForest {randomForest} > table(d1$cv,predict(d1.rf,d1[,-8])) No Yes No 1413 87 Yes 92 1408 # Good accuracy > importance(d1.rf) MeanDecreaseGini game1 20.640253 game2 12.115196 game3 2.355584 social1 189.053648 social2 76.476470 app1 796.937087 app2 2.804019 # Variable importance (without any directionality)
  • 27. See how far nodes are from yes / no 2014/4/17 27
  • 28. Large set of rules visualized with {arulesViz} 2014/4/17 28
  • 29. Advantage of this technique More intuitive Easy to grasp even for high- dimensional data Even lay guys can easily understand Useful for presentation 2014/4/17 29
  • 30. Disadvantage of this technique Less strict Never quantitative 2014/4/17 30
  • 31. Any questions or comments? 2014/4/17 31 Don’t hesitate to ask me! @TJO_datasci