5. Example
user | A | B | C | D | E
ㄎㄎ | V | V |   |   | V
遙遙 | V | V | V |   | V
叮叮 | V | V |   |   |
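(Venn diagram: circles A and B; their overlap is A ∩ B)
Support P(A ∩ B): the probability that A and B occur together; the larger, the better.
Confidence P(B|A): the probability that B occurs given that A has occurred; the larger, the better.
Lift P(B|A)/P(B): compares the confidence above with the rate at which B occurs on its own. A value greater than 1 means the rule is effective (A raises the chance of B); the larger the value, the stronger the effect.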
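A minimal Python sketch of the three metrics on this example's baskets, assuming each user's purchases form one transaction stored as a set. `support`, `confidence`, and `lift` are hypothetical helper names for illustration, not a library API. Because A and B appear in every basket, rules with either as the consequent have lift 1.0; the rule C → E is the one with lift above 1.

```python
# Illustrative helpers (hypothetical names, not a library API).
# Each transaction is one user's basket, taken from the example table.
TRANSACTIONS = [
    {"A", "B", "E"},       # ㄎㄎ
    {"A", "B", "C", "E"},  # 遙遙
    {"A", "B"},            # 叮叮
]


def support(itemset, transactions=TRANSACTIONS):
    """P(itemset): fraction of transactions containing every item in it."""
    wanted = set(itemset)
    return sum(wanted <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """P(B|A) = support(A ∪ B) / support(A)."""
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions=TRANSACTIONS):
    """P(B|A) / P(B); above 1 means A and B co-occur more than if independent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


print(support({"A", "B"}))        # 1.0
print(confidence({"E"}, {"B"}))   # 1.0
print(confidence({"B"}, {"E"}))   # 0.6666666666666666
print(lift({"E"}, {"B"}))         # 1.0
print(lift({"C"}, {"E"}))         # about 1.5
```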
user | item
ㄎㄎ | A
ㄎㄎ | B
ㄎㄎ | E
遙遙 | A
叮叮 | B

transactions:
[ A, B, E ]
[ A, B, C, E ]
[ A, B ]
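The long `user item` rows and the transaction lists hold the same data, grouped differently. A small sketch of that grouping step, assuming the full set of (user, item) rows is the one implied by the V-marked table (the `user item` listing shows only some rows); `to_transactions` is a hypothetical helper name.

```python
from collections import defaultdict

# (user, item) rows rebuilt from the V-marked user x item table (assumption).
rows = [
    ("ㄎㄎ", "A"), ("ㄎㄎ", "B"), ("ㄎㄎ", "E"),
    ("遙遙", "A"), ("遙遙", "B"), ("遙遙", "C"), ("遙遙", "E"),
    ("叮叮", "A"), ("叮叮", "B"),
]


def to_transactions(pairs):
    """Group (user, item) pairs into one sorted basket per user, in first-seen user order."""
    baskets = defaultdict(set)
    for user, item in pairs:
        baskets[user].add(item)
    return [sorted(items) for items in baskets.values()]


transactions = to_transactions(rows)
print(transactions)  # [['A', 'B', 'E'], ['A', 'B', 'C', 'E'], ['A', 'B']]
```

A set per user also removes duplicate purchases of the same item, which association-rule mining ignores anyway.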
+---------+----+
|    items|freq|
+---------+----+
|      [A]|   3|
|      [B]|   3|
|   [B, A]|   3|
|      [E]|   2|
|   [E, B]|   2|
|[E, B, A]|   2|
|   [E, A]|   2|
+---------+----+
+----------+----------+------------+
|antecedent|consequent|  confidence|
+----------+----------+------------+
|       [E]|       [B]|         1.0|
|       [E]|       [A]|         1.0|
|    [B, A]|       [E]|0.6666666666|
|    [E, B]|       [A]|         1.0|
|       [A]|       [B]|         1.0|
|       [A]|       [E]|0.6666666666|
|    [E, A]|       [B]|         1.0|
|       [B]|       [A]|         1.0|
|       [B]|       [E]|0.6666666666|
+----------+----------+------------+
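The rule table can be derived from the frequent item set table alone: for each item set X with at least two items, move one item c to the consequent, and confidence = freq(X) / freq(X minus c). A minimal sketch, assuming single-item consequents as in the table above (it lists no others) and no minimum-confidence filter; `single_consequent_rules` is a hypothetical name.

```python
# Frequent item sets and counts, copied from the freq table above.
FREQ = {
    frozenset({"A"}): 3,
    frozenset({"B"}): 3,
    frozenset({"A", "B"}): 3,
    frozenset({"E"}): 2,
    frozenset({"B", "E"}): 2,
    frozenset({"A", "B", "E"}): 2,
    frozenset({"A", "E"}): 2,
}


def single_consequent_rules(freq):
    """Yield (antecedent, consequent, confidence) for every rule X - {c} -> {c}."""
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for item in sorted(itemset):
            antecedent = itemset - {item}
            # Every subset of a frequent item set is frequent, so freq[antecedent] exists.
            yield tuple(sorted(antecedent)), (item,), count / freq[antecedent]


rules = list(single_consequent_rules(FREQ))
for antecedent, consequent, conf in rules:
    print(list(antecedent), list(consequent), round(conf, 10))
```

This yields the same nine rules as the table, in a different order.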
6. Apriori Algorithm
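• Among all item sets, if an item set has a low frequency, every superset containing it is equally or less frequent, since a superset can never appear in more transactions than its subsets.
• e.g. In purchase records, if only a few transactions include 林X營 fresh milk, every item-set combination containing that product is excluded up front.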
transactions:
[ A, B, E ]
[ A, B, C, E ]
[ A, B ]
n = 4 (items A, B, C, E)
item sets = 2^4 - 1 = 15
4: {A}, {B}, {C}, {E}
6: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
4: {A,B,C}, {A,B,E}, {A,C,E}, {B,C,E}
1: {A,B,C,E}
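The count 2^n - 1 is the number of non-empty subsets of n items. A brute-force sketch that enumerates all 15 candidates with `itertools.combinations`, counts each against the transactions, and keeps support > 0.5; scanning every candidate like this is exactly the cost Apriori is designed to avoid.

```python
from itertools import combinations

transactions = [{"A", "B", "E"}, {"A", "B", "C", "E"}, {"A", "B"}]
items = sorted(set().union(*transactions))  # ['A', 'B', 'C', 'E'], so n = 4

# Every non-empty subset: C(4,1) + C(4,2) + C(4,3) + C(4,4) = 4 + 6 + 4 + 1 = 15.
candidates = [frozenset(c) for k in range(1, len(items) + 1)
              for c in combinations(items, k)]
print(len(candidates))  # 15

# Count every candidate, then keep those with support > 0.5.
counts = {c: sum(c <= t for t in transactions) for c in candidates}
frequent = {c: n for c, n in counts.items() if n / len(transactions) > 0.5}
print(len(frequent))  # 7
```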
frequent item sets (keep support > 0.5, i.e. at least 2 of the 3 transactions; this filters out C):
{A} 3
{B} 3
{A,B} 3
{E} 2
{A,E} 2
{B,E} 2
{A,B,E} 2
Agrawal, Rakesh; and Srikant, Ramakrishnan; Fast algorithms for mining association rules in large databases, in Bocca, Jorge B.; Jarke, Matthias; and Zaniolo, Carlo; editors, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994, pages 487–499.
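A level-wise sketch of the pruning idea on this slide, written for illustration rather than copied from the cited paper's pseudocode: size-k candidates are joined from frequent (k-1)-sets and dropped unless every (k-1)-subset is frequent. The strict `> min_support` comparison is an assumption chosen to match "support > 0.5"; `apriori` here is a hypothetical helper, not the mlxtend function of the same name.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch: returns {frozenset: count} with support > min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_of(candidates):
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k for c, k in counts.items() if k / n > min_support}

    level = frequent_of({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-sets that have exactly k items.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent_of(candidates)
        result.update(level)
        k += 1
    return result


frequent = apriori([{"A", "B", "E"}, {"A", "B", "C", "E"}, {"A", "B"}], 0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because {C} fails at the first level, C never appears in any candidate of size 2 or more, so only 3 + 1 larger candidates are counted instead of 11.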
7. FP-Growth Algorithm
• Same principle as Apriori, but uses an FP-tree to speed up the search.
transactions:
[ A, B, E ]
[ A, B, C, E ]
[ A, B ]

header table:
A 3
B 3
C 1
E 2

FP-tree (items inserted in header-table order A, B, C, E):
ϕ
└ A:3
  └ B:3
    ├ E:1
    └ C:1
      └ E:1

prefix path:
E {A,B}, {A,B,C}
B {A}
A { }

conditional tree on E: ϕ → A:2 → B:2
{E} 2
{A,E} 2
{B,E} 2
{A,B,E} 2

conditional tree on B: ϕ → A:3
{B} 3
{A,B} 3

conditional tree on A: ϕ
{A} 3
Han, Jiawei; Pei, Jian; and Yin, Yiwen; Mining Frequent Patterns Without Candidate Generation, Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00), pages 1–12.
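A compact FP-growth sketch on the same transactions, assuming a minimum count of 2 (support > 0.5 with 3 transactions). One deliberate difference from the tree on the slide: following the usual FP-tree construction, this sketch drops infrequent items (here C) before inserting transactions and orders items by descending count, so its tree is the single path ϕ → A:3 → B:3 → E:2 and the prefix path of E is {A,B} with count 2. The mined item sets are the same seven. `Node`, `build_tree`, and `fp_growth` are hypothetical names.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: item, count, parent link, and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item ranks)."""
    totals = defaultdict(int)
    for items, count in weighted_paths:
        for item in items:
            totals[item] += count
    # Keep frequent items only, ranked by descending count (ties alphabetical).
    frequent_items = [i for i, c in totals.items() if c >= min_count]
    ranked = sorted(frequent_items, key=lambda i: (-totals[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every node holding that item
    for items, count in weighted_paths:
        node = root
        for item in sorted((i for i in items if i in order), key=order.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, order


def fp_growth(weighted_paths, min_count, suffix=()):
    """Recursively mine frequent item sets; returns {frozenset: count}."""
    header, order = build_tree(weighted_paths, min_count)
    result = {}
    # Walk the header table from the least to the most frequent item.
    for item in sorted(header, key=order.get, reverse=True):
        new_suffix = (item,) + suffix
        result[frozenset(new_suffix)] = sum(n.count for n in header[item])
        # Conditional pattern base: the prefix path above every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, new_suffix))
    return result


transactions = [["A", "B", "E"], ["A", "B", "C", "E"], ["A", "B"]]
frequent = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

No candidate sets are generated: each recursive call only builds a smaller conditional tree from the prefix paths of one item.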
8. packages
• Python
  • Apriori
    from mlxtend.frequent_patterns import apriori
  • FP-growth
    from orangecontrib.associate import fpgrowth
• R
  • Apriori
    https://cran.r-project.org/web/packages/arules/arules.pdf
  • FP-growth
    https://cran.r-project.org/web/packages/rCBA/rCBA.pdf
20. Trata DMP Recap
• Location Data from SDK
• Geography Information System (GIS) from Google API
• Home Country Tags & Cross-Border Travelers -> passport holders
• Reports
  • Asia Report since 2016
  • Collaboration with GfK: Greater China Travelers in Japan
  • Taipei City and Kaohsiung City Departments of Information and Tourism (台北市觀傳局、高雄市觀傳局)
  • O2O in ITF
  • Osaka Report for Workshop@Osaka
  • etc.