Tokyo.R #22 Association Rules

Association Analysis
@bob3bob3
Tokyo.R #22
2012/04/28

Actually
• Association analysis was already covered at Tokyo.R #05!
• http://www.slideshare.net/hamadakoichi/r-r-4219052

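What is association analysis?
• An exploratory data analysis technique that searches POS data or e-commerce transaction data for combinations of products that tend to be bought together.
  • It can analyze not only product-to-product combinations but also combinations with customer attributes, time of purchase, and so on.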

• Famous for the "diapers and beer" example.

• A technique for finding rules such as "customers who buy product A are also likely to buy product B."
  • Of course, if no such rule exists in the data, nothing will come out.

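• One of the representative techniques of so-called data mining; some people say simply "data mining" when they mean association analysis.
  • Perhaps because it best fits the image of "mining": digging buried gold out of a vein of ore.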

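• Also known as:
  • Market basket analysis, co-purchase analysis (併売分析), association rules, and the Japanese renderings 連関規則 and 連想規則.
  • In Japanese it is sometimes called 相関ルール ("correlation rules"), but that is a mistranslation.
    • 相関 means "correlation."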

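• Association
  • 1 (organized for a common purpose) an organization, society, company; union; ((often A-)) an association
    the student body association: the student council.
  • 2 [U] (with ...) relationship, acquaintance, involvement, connection, cooperation, partnership ((with ...))
    in association with ...: in connection with ...; jointly with ...
    He denied any association with the plane maker.
    (He said he had nothing to do with that aircraft maker.)
  • 3 [U] association of ideas; [C] something associated, a connotation
    the association of ideas: (psychology) association of ideas
    my associations from the poem: what the poem brings to my mind.

• Source: Progressive English-Japanese Dictionary (プログレッシブ英和中辞典)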

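• Example applications
  • Input for designing store layouts and shelf allocation.
    • Place products that tend to be bought together near each other.
    • Aim to improve customer convenience and to raise sales through cross-selling.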

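• In practice
  • Running association analysis on data at the raw product-management (SKU) level rarely yields useful results because there are too many items, so some preparation is needed.
    • Group items by category.
    • Use ABC analysis to focus the analysis on the main products.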
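The "group items by category" idea can be sketched with the arules package and its Groceries data set, both of which appear later in these slides. This is only an illustration under the assumption that the item hierarchy stored in `itemInfo(Groceries)` (the `level2` column) is a suitable category definition; the variable name `groceries_l2` is made up for this example.

```r
library(arules)
data(Groceries)

# itemInfo() holds the item hierarchy shipped with the data set
head(itemInfo(Groceries))

# Merge the individual items into their level2 category
groceries_l2 <- aggregate(Groceries, itemInfo(Groceries)[["level2"]])

# Same number of transactions, far fewer item columns
dim(Groceries)
dim(groceries_l2)
```

The aggregated object can then be passed to the same analysis functions as the original transactions.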

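Association rules
• X ⇒ Y
  • "If X, then Y."
  • X: the condition part, rule head, premise.
  • Y: the conclusion part, rule body.
• Shampoo ⇒ rinse.
• Beer ⇒ edamame.
• End of the month ⇒ overtime
• Friday night & Yamanote Line ⇒ drunk passengers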

Evaluation metrics
• Antecedent probability
• Support (joint probability)
• Confidence (conditional probability)
• Lift (improvement ratio)

Example data (○ = bought, × = not bought):

                     Rinse (Y)
                     ○     ×     Total
Shampoo (X)   ○      7     1       8
              ×      1     1       2
Total                8     2      10

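• Antecedent probability
  • The proportion of all transactions that contain X.
  • A rule with a high antecedent probability is a good rule, because the rule has many chances to apply.
  • 8 ÷ 10 = 0.8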

• Support (joint probability)
  • The proportion of all transactions that contain both X and Y.
  • A rule with high support is a good rule.
  • 7 ÷ 10 = 0.7

• Confidence (conditional probability)
  • Among the transactions that contain X, the proportion that also contain Y. A rule with high confidence is a good rule.
  • 7 ÷ 8 = 0.875

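• Lift (improvement ratio)
  • Confidence divided by the prior probability of Y (the proportion of all transactions that contain Y).
  • How many times more likely Y is to be bought when X is bought than Y is to be bought in general.
  • Whether lift exceeds 1 is one criterion for judging whether a rule is useful.
  • (7 ÷ 8) ÷ (8 ÷ 10) = 0.875 ÷ 0.8 = 1.09375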
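The four metrics can be recomputed in base R from the shampoo/rinse counts (7, 1, 1, 1) used in the slides. This is a minimal sketch; the variable names are chosen for this example only.

```r
# 2x2 table from the slides: rows = shampoo (X), columns = rinse (Y)
tab <- matrix(c(7, 1,
                1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(shampoo = c("yes", "no"),
                              rinse   = c("yes", "no")))
n <- sum(tab)                    # 10 transactions in total

p_x  <- sum(tab["yes", ]) / n    # antecedent probability P(X) = 8/10
p_y  <- sum(tab[, "yes"]) / n    # prior probability of Y, P(Y) = 8/10
supp <- tab["yes", "yes"] / n    # support P(X and Y) = 7/10
conf <- supp / p_x               # confidence P(Y | X) = 7/8
lift <- conf / p_y               # lift = confidence / P(Y)

metrics <- c(antecedent = p_x, support = supp, confidence = conf, lift = lift)
print(metrics)
```

In this table P(X) and P(Y) happen to be equal (both 8/10), so it is worth keeping in mind that lift divides by P(Y), not by P(X).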

Association analysis in R
• Use the {arules} package
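Before turning to a real data set, the shampoo/rinse example can be fed to arules as a quick sanity check. The sketch below assumes ten baskets that reproduce the counts 7, 1, 1, 1; the filler item "soap" for the basket containing neither product and the thresholds passed to `apriori()` are made up for this illustration.

```r
library(arules)

# Ten baskets: 7 with both items, 1 shampoo only, 1 rinse only,
# and 1 with neither ("soap" is a hypothetical filler item)
baskets <- c(
  rep(list(c("shampoo", "rinse")), 7),
  list("shampoo"),
  list("rinse"),
  list("soap")
)
trans <- as(baskets, "transactions")

# minlen = 2 skips rules with an empty left-hand side
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))
inspect(rules)

# Metrics as a plain data frame
q <- quality(rules)
```

Both directions of the rule (shampoo ⇒ rinse and rinse ⇒ shampoo) are returned, each with the support, confidence and lift computed by hand for this table.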

library(arules)

data(Groceries)
# A data set bundled with the arules package.
# 30 days of POS data collected at a grocery store.
# 9,835 purchase records covering 169 items such as milk.

Groceries
# transactions in sparse format with
# 9835 transactions (rows) and
# 169 items (columns)

summary(Groceries)
# transactions as itemMatrix in sparse format with
# 9835 rows (elements/itemsets/transactions) and
# 169 columns (items) and a density of 0.02609146

# most frequent items:
# whole milk other vegetables rolls/buns soda
# 2513 1903 1809 1715
# yogurt (Other)
# 1372 34055

# element (itemset/transaction) length distribution:
# sizes
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
# 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55
# 16 17 18 19 20 21 22 23 24 26 27 28 29 32
# 46 29 14 14 9 11 4 6 1 1 1 1 3 1

# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1.000 2.000 3.000 4.409 6.000 32.000

# includes extended item information - examples:
# labels level2 level1
# 1 frankfurter sausage meet and sausage
# 2 sausage sausage meet and sausage
# 3 liver loaf sausage meet and sausage

inspect(Groceries)
# 8283 {frankfurter,
# onions,
# liquor (appetizer),
# napkins}
# 8284 {butter}
# 8285 {organic sausage,
# tropical fruit,
# packaged fruit/vegetables,
# whole milk,
# curd,
# yogurt,
# soft cheese,
# curd cheese,
# frozen vegetables,
# domestic eggs,
# rolls/buns,
# pastry,
# margarine,
# bottled water,
# cooking chocolate,
# hygiene articles,
# shopping bags}

# Check the absolute count of each item in the original data
head(sort(itemFrequency(Groceries, type="absolute"), decreasing=TRUE))
# whole milk other vegetables rolls/buns soda yogurt bottled water
# 2513 1903 1809 1715 1372 1087

# Plot the frequency of each item
itemFrequencyPlot(Groceries)

# Run the apriori algorithm
# By default only rules with confidence >= 0.8 and support >= 0.1 are extracted
grule1 <- apriori(Groceries)

# parameter specification:
# confidence minval smax arem aval originalSupport support minlen maxlen target ext
# 0.8 0.1 1 none FALSE TRUE 0.1 1 10 rules FALSE

# algorithmic control:
# filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE FALSE TRUE 2 TRUE

# apriori - find association rules with the apriori algorithm
# version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
# sorting and recoding items ... [8 item(s)] done [0.00s].
# creating transaction tree ... done [0.00s].
# checking subsets of size 1 2 done [0.00s].
# writing ... [0 rule(s)] done [0.00s].
# creating S4 object ... done [0.00s].

## The "writing ..." line shows how many rules were extracted (here, 0).
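Why the defaults return nothing can be checked from the single-item frequencies. The sketch below counts how many items clear each minimum support; the variable names are made up for this example, and the expected counts are the ones the apriori log reports on its "sorting and recoding items" line.

```r
library(arules)
data(Groceries)

# Relative frequency (support) of each single item
freq <- itemFrequency(Groceries)

n_at_10pct <- sum(freq >= 0.1)   # items that survive support = 0.1
n_at_1pct  <- sum(freq >= 0.01)  # items that survive support = 0.01

n_at_10pct
n_at_1pct
```

With only a handful of items above 10% support, and no single item anywhere near 80% frequency, the default thresholds leave no rule to report, which is why the thresholds need to be lowered for this data set.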

# Specify parameters. ext=TRUE adds the antecedent probability (lhs.support) to the output.
# Confidence 0.5, support 0.01
grule2 <- apriori(Groceries, parameter=list(support=0.01, confidence=0.5, maxlen=4, ext=TRUE))

# parameter specification:
# confidence minval smax arem aval originalSupport support minlen maxlen target ext
# 0.5 0.1 1 none FALSE TRUE 0.01 1 4 rules TRUE

# algorithmic control:
# filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE FALSE TRUE 2 TRUE

# apriori - find association rules with the apriori algorithm
# version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
# sorting and recoding items ... [88 item(s)] done [0.00s].
# creating transaction tree ... done [0.01s].
# checking subsets of size 1 2 3 4 done [0.00s].
# writing ... [15 rule(s)] done [0.00s].
# creating S4 object ... done [0.00s].

## 15 rules were found!

# Display the resulting rules
inspect(grule2)
# lhs rhs support confidence lhs.support lift
# 1 {curd,
# yogurt} => {whole milk} 0.01006609 0.5823529 0.01728521 2.279125
# 2 {other vegetables,
# butter} => {whole milk} 0.01148958 0.5736041 0.02003050 2.244885
# domestic eggs} => {whole milk} 0.01230300 0.5525114 0.02226741 2.162336
# 4 {yogurt,
# whipped/sour cream} => {whole milk} 0.01087951 0.5245098 0.02074225 2.052747
# 6 {pip fruit,
# other vegetables} => {whole milk} 0.01352313 0.5175097 0.02613116 2.025351
# 7 {citrus fruit,
# root vegetables} => {other vegetables} 0.01037112 0.5862069 0.01769192 3.029608
# 8 {tropical fruit,
# root vegetables} => {whole milk} 0.01199797 0.5700483 0.02104728 2.230969
# ……

# Display the rules sorted by any metric; here, by lift.
inspect(sort(grule2, by="lift"))
# lhs rhs support confidence lhs.support lift
# 1 {citrus fruit,
# 3 {root vegetables,
# rolls/buns} => {other vegetables} 0.01220132 0.5020921 0.02430097 2.594890
# yogurt} => {other vegetables} 0.01291307 0.5000000 0.02582613 2.584078
# 5 {curd,
# butter} => {whole milk} 0.01148958 0.5736041 0.02003050 2.244885
# root vegetables} => {whole milk} 0.01199797 0.5700483 0.02104728 2.230969
# domestic eggs} => {whole milk} 0.01230300 0.5525114 0.02226741 2.162336
# 10 {yogurt,
# ……

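# To extract rules with a given antecedent or consequent: here, only rules whose antecedent is whole milk.
grule3 <- apriori(Groceries, parameter=list(support=0.001, confidence=0.1),
                  appearance=list(lhs="whole milk", default="rhs"))
# inspect() prints every rule it is given, so wrapping it in head() does not shorten the listing
inspect(sort(grule3, by="lift"))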
# lhs rhs support confidence lift
# 1 {whole milk} => {butter} 0.02755465 0.1078392 1.9460530
# 2 {whole milk} => {curd} 0.02613116 0.1022682 1.9194805
# 3 {whole milk} => {domestic eggs} 0.02999492 0.1173896 1.8502027
# 4 {whole milk} => {whipped/sour cream} 0.03223183 0.1261441 1.7597542
# 5 {whole milk} => {root vegetables} 0.04890696 0.1914047 1.7560310
# 6 {whole milk} => {tropical fruit} 0.04229792 0.1655392 1.5775950
# 7 {whole milk} => {yogurt} 0.05602440 0.2192598 1.5717351
# 8 {whole milk} => {pip fruit} 0.03009659 0.1177875 1.5570432
# 9 {whole milk} => {other vegetables} 0.07483477 0.2928770 1.5136341
# 10 {whole milk} => {pastry} 0.03324860 0.1301234 1.4625865
# 11 {whole milk} => {citrus fruit} 0.03050330 0.1193792 1.4423768
# 12 {whole milk} => {fruit/vegetable juice} 0.02663955 0.1042579 1.4421604
# 13 {whole milk} => {newspapers} 0.02735130 0.1070434 1.3411103
# 14 {whole milk} => {sausage} 0.02989324 0.1169916 1.2452520
# 15 {whole milk} => {bottled water} 0.03436706 0.1345006 1.2169396
# 16 {whole milk} => {rolls/buns} 0.05663447 0.2216474 1.2050318
# ……

# Of those, keep only the rules with a lift of 1.5 or more
grule4 <- subset(grule3, subset=(lift>=1.5))
inspect(grule4)
# lhs rhs support confidence lift
# 1 {whole milk} => {curd} 0.02613116 0.1022682 1.919481
# 2 {whole milk} => {butter} 0.02755465 0.1078392 1.946053
# 3 {whole milk} => {domestic eggs} 0.02999492 0.1173896 1.850203
# 4 {whole milk} => {whipped/sour cream} 0.03223183 0.1261441 1.759754
# 5 {whole milk} => {pip fruit} 0.03009659 0.1177875 1.557043
# 6 {whole milk} => {tropical fruit} 0.04229792 0.1655392 1.577595
# 7 {whole milk} => {root vegetables} 0.04890696 0.1914047 1.756031
# 8 {whole milk} => {yogurt} 0.05602440 0.2192598 1.571735
# 9 {whole milk} => {other vegetables} 0.07483477 0.2928770 1.513634
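The figures printed for {whole milk} => {other vegetables} can be cross-checked by hand in base R, using only numbers already shown in these slides (the item counts 2513 and 1903, the 9,835 transactions, and the rule's support, confidence and lift). The variable names are chosen for this example.

```r
n_trans      <- 9835   # transactions in Groceries
n_whole_milk <- 2513   # absolute count of whole milk
n_other_veg  <- 1903   # absolute count of other vegetables

# Values printed by inspect() for {whole milk} => {other vegetables}
support    <- 0.07483477
confidence <- 0.2928770
lift_shown <- 1.513634

p_lhs <- n_whole_milk / n_trans   # antecedent probability P(X)
p_rhs <- n_other_veg  / n_trans   # prior probability of Y, P(Y)

conf_by_hand <- support / p_lhs       # confidence = support / P(X)
lift_by_hand <- confidence / p_rhs    # lift = confidence / P(Y)

round(c(confidence = conf_by_hand, lift = lift_by_hand), 6)
```

Both hand-computed values agree with the printed ones up to rounding, which confirms that the lift column follows the definition given earlier.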

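# Visualize the rules as a graph with the arulesViz package
library("arulesViz")
gruleX <- apriori(Groceries, parameter=list(support=0.03, confidence=0.05, ext=TRUE))
# Keep only rules with a lift of 1.5 or more
gruleX2 <- subset(gruleX, subset=(lift>=1.5))
gruleX2
plot(gruleX2, method="graph")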
