More Related Content More from Hiroshi Ono (20) 1004201. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
3M1-1
é »åºãã¿ãŒã³çºèŠã¢ã«ãŽãªãºã å
¥é ïŒ ã¢ã€ãã éåããã°ã©ããŸã§ ïŒ
An Introduction to Frequent Pattern Mining â Itemsets to Graphs -
å®é æ¯
æ*1 ææ åçŽ*2
Takeaki Uno Hiroki Arimura
*1 *2
åœç«æ
å ±åŠç 究æã»ç·åç 究倧åŠé¢å€§åŠ åæµ·é倧åŠå€§åŠé¢æ
å ±ç§åŠç 究ç§
National Institute of Informatics, Hokkaido University, Graduate School for
The Graduate University for Advanced Studies Information Science and Technology
This paper provides a brief survey of efficient algorithms for the frequent itemset mining (FIM) problem, which is one of the
most popular and basic problems in data mining. Firstly, we give the descriptions of two well-known algorithms for the FIM
problem, called Apriori and Backtrack, which are based on breadth-first search and depth-first search over the space of
frequent itemsets, respectively. Then, we discuss advanced techniques to speed-up the algorithms for the FIM problem,
namely, database reduction, frequent itemset trees, down-project, occurrence-deliver, recursive database reduction, which
improve the key operations of FIM computation such as database access and frequency computation. We also explain how to
extend Backtrack algorithm for semi-structured data including trees and graphs.
1. ã¯ããã« A: 1,2,5,6,7,9 ïŒã€ä»¥äžã«å«ãŸãã
ããŒã¿ãã€ãã³ã°ã¯ïŒå€§éã®ããŒã¿ããæçšãªèŠåæ§ããã¿ãŒ B: 2,3,4,5 ã¢ã€ãã éå
ã³ãèŠã€ããããã®å¹çããææ³ã«é¢ããç 究ã§ããïŒããŒã¿
C: 1,2,7,8,9 {1} {2} {7} {9}
ãã€ãã³ã°ç 究ã¯ïŒ1994 幎ã®Agrawalãã®çµåèŠåçºèŠã«é¢ã
ãç 究ãå¥æ©ãšããŠïŒæ¥éã«çºå±ããïŒãã®çµåèŠåçºèŠã®éµ D: 1,7,9 {1,7} {1,9}
ãšãªãåºæ¬çãªæè¡ãé »åºã¢ã€ãã éåçºèŠã§ããïŒ ãã® 10 E: 2,3,7,9 {2,7} {2,9} {7,9}
幎éã§ïŒçµåèŠåçºèŠã«ãšã©ãŸããïŒæ·±ãåªå
æ¢çŽ¢æ³ã®ææ¡ãïŒ
飜åéåçºèŠïŒæ§é ããŒã¿ãžã®æ¡åŒµãªã©ïŒããŸããŸãªåœ¢ã§çã
ãªç 究ãç¶ããŠããïŒããŒã¿ãã€ãã³ã°ã®äž»èŠãªãããã¯ã®äžã€
å³ 1 ãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ãšé »åºéå
ãšãªã£ãŠããïŒ
æ¬ã¬ã¯ãã£ãŒã§ã¯ïŒé »åºéåçºèŠã¢ã«ãŽãªãºã ã®æè¿ã®é²å± ã§ã³(transaction)ã§ããããŒã¿ããŒã¹ã®ããšã§ããïŒãã©ã³ã¶ã¯ã·ã§
ã«ã€ããŠè§£èª¬ããïŒç¹ã«ïŒã¢ã«ãŽãªãºã ç 究ã®ç«å ŽããïŒãã®åº ã³ãšã¯ã¢ã€ãã (item)ã®éåã§ããïŒèšãæããã°ïŒã¢ã€ãã ã®
æ¬çãªæ§é ãšïŒå®éã®ããŒã¿ã®ããã®é«éãªå®è£
ã«ã€ããŠè§£ éåâ={1,âŠ,n}ã«å¯ŸããŠïŒãã©ã³ã¶ã¯ã·ã§ã³ ã¯âã®éšåéåã§
説ããïŒæ¬è§£èª¬ã¯ïŒçè
ãã«ããå®è£
ã³ã³ãã¹ãFIMI'04 ã§ã®åª ããïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ã¯ãã©ã³ã¶ã¯ã·ã§ã³ã®éåã§ã
åããã°ã©ã LCMã®éçºçµéšã«åºã¥ãããã®ã§ããïŒçŸåšïŒ ãïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ã«ã¯ïŒåäžã®ãã©ã³ã¶ã¯ã·ã§ã³ã
LCMãå«ãïŒåçš®ã®é »åºéåçºèŠã¢ã«ãŽãªãºã ãå
¬éããïŒå© è€æ°å«ãŸããŠãè¯ãïŒâã®éšåéåãããã§ã¯ ã¢ã€ãã éå
çšãããŠããïŒæ¬è§£èª¬ããããã®ã¢ã«ãŽãªãºã ãå®éã®åé¡ã« (itemset)ãšãã¶ïŒ
é©çšããå Žåã«åèã«ãªãã°å¹žãã§ããïŒ å³ 1 ã®å·Šã«ãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ã®äŸãããïŒããã«ïŒ
æ¬çš¿ã®æ§æã¯æ¬¡ã®ãšããã§ããïŒ2 ç¯ã§åé¡ã®å®çŸ©ãäžãïŒ ã¢ã€ãã ã®éåã¯{1,2,3,4,5,6,7,8,9}ã§ããïŒãã©ã³ã¶ã¯ã·ã§ã³ã®é
3 ç¯ã§ã¢ããªãªãªæ³ãšæ·±ãåªå
æ¢çŽ¢æ³ã®äºã€ã®åºæ¬çææ³ãäž åã¯{A, B, C, D, E, F}ã§ããïŒäŸãã°ïŒã¹ãŒããŒçã®å£²ãäžã
ããïŒ4 ç¯ã§ã¯é«éåæè¡ã«ã€ããŠè¿°ã¹ïŒ5 ç¯ã§ã¯å®ããŒã¿äž ããŒã¿ã¯ïŒåã¬ã³ãŒããã1 人ã®ã客ãããã©ã®ååãè²·ã£ã
ã§ã®æ§èœæ¯èŒïŒ6 ç¯ã§ã¯å¿çšã«ã€ããŠïŒ7 ç¯ã§ã¯æšæ§é ããŒã¿ ãããšããèšé²ã«ãªã£ãŠããã®ã§ïŒã¹ãŒããŒã®å
šååã®éåãã¢
ããã®é »åºãã¿ãŒã³çºèŠãžã®æ¡åŒµã«ã€ããŠè¿°ã¹ïŒ8 ç¯ã§ãŸãšã ã€ãã éåãšæãã°ïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ã«ãªã£ãŠããïŒ
ãïŒãªãïŒæ¬çš¿ã¯ïŒäººå·¥ç¥èœåŠäŒèªã®ã¬ã¯ãã£ãŒã·ãªãŒãºãç¥èœ ãã®ã»ãïŒã¢ã³ã±ãŒãçµæã®ããŒã¿ãïŒïŒã€ã®ã¬ã³ãŒããïŒãã
ã³ã³ãã¥ãŒãã£ã³ã°ãšãã®åšèŸºã[è¥¿ç° 07]ã®ç¬¬ 2 åã¬ã¯ãã£ãŒ[å® äººããã§ãã¯ãå
¥ããé
ç®ã®éåã«ãªã£ãŠãããšèããã°ãã
é 07]ã®äžéšåãåºã«å çä¿®æ£ãããã®ã§ããïŒä»ã®åã®ã¬ã¯ ãïŒããããã®é
ç®ã«ã€ããŠïŒïŒããïŒã®ç¹æ°ãäžãããããªãã®
ãã£ãŒãšãåãããŠã芧ããã ããã°å¹žãã§ããïŒ ã§ããã°ïŒã質åãšåçã®çµãïŒQ1 ã« 5 ãã€ããŠããïŒçïŒãã¢ã€
ãã ã§ãããšèããããšã§ïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ãšã¿ãª
ããïŒãã®ã»ãïŒå®éšããŒã¿çããã®ç¯çã«å
¥ãïŒéåžžã«å€ãã®
2. é »åºãã¿ãŒã³ ããŒã¿ããŒã¹ããã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ã§ãããšã¿ãªããïŒ
2.1 ãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ 2.2 é »åºã¢ã€ãã éåã䜿ã£ãããŒã¿ããŒã¹åæ
ä» ïŒ æ å
ã« ã 㩠㳠㶠㯠㷠㧠㳠ã ㌠㿠ã ㌠㹠(transaction ããŠïŒãã®ããŒã¿ã解æããããšãããšïŒã©ã®ãããªèŠç¹ããã
database)ããããšããïŒããã¯ïŒåã¬ã³ãŒã(record)ããã©ã³ã¶ã¯ã· ã ãããïŒèŠç¹ãšããŠïŒæãå€ãå«ãŸããã¢ã€ãã ã¯äœãïŒãã©ã³ã¶
ã¯ã·ã§ã³ã®æ°ã¯ããã€ãïŒãã©ã³ã¶ã¯ã·ã§ã³ã®å¹³åçãªå€§ããã¯ã©
é£çµ¡å
ïŒ ææåçŽïŒåæµ·éå€§åŠ å€§åŠé¢æ
å ±ç§åŠç ç©¶ç§ ã®çšåºŠãïŒãããã¯å€§ããã®åæ£ã¯ã©ã®çšåºŠãïŒãšãã£ããã®ã
ã060-0814 æå¹åžååºå 14 æ¡è¥¿ 9 äžç®
TEL/FAX: 011-706-7680, E-mail: arim@ist.hokudai.ac.jp
-1-
2. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
æãã€ãã ããïŒãããã¯ïŒããã°ïŒããŒã¿ãå
šäœçã«æããŠç¹ åºãããïŒÎžãå°ããã«èšå®ããããšãæãŸããïŒããã«ã¯ãã¬
城ãèŠåºãæ段ã ãšèšããïŒ ãŒããªããããïŒÎžãäžæã«èšå®ããïŒãããã¯é »åºåºŠã®é«ã
ã§ã¯éã«ïŒããŒã¿ã®å±æçãªç¹åŸŽãæããããšã¯ã§ããªãã ãã®ãã k åé »åºéåãèŠã€ããããšã§ïŒè§£ã®æ°ãççºããã
ãããïŒå±æçãšãããšïŒïŒã€ïŒã€ã®ã¬ã³ãŒããèŠãæ¹æ³ãèãã ã«æçšãªç¥èãåãåºãããšãéèŠã§ããïŒ
ãããïŒããã§ã¯åã¬ã³ãŒãã®åæ§ã«è§£æãåŒãããããŠããŸãïŒ
ãŸãïŒãã©ã³ã¶ã¯ã·ã§ã³ã®å€§ããïŒã©ã®ã¢ã€ãã ãå«ããïŒãšãã£ã 2.4 æ§é ããŒã¿
ç¹åŸŽã§ã¯ïŒå±æçãªåæ§ãèªãã«ã¯åŒ±ãããïŒãã®æå³ã§ã¯ïŒ XMLããŒã¿ïŒæç³»åããŒã¿ïŒæååããŒã¿çïŒããŒã¿ããŒã¹
ãã©ã®ãããªã¢ã€ãã éåãå€ãã®ãã©ã³ã¶ã¯ã·ã§ã³ã«å«ãŸãããã ãæ§é ãæã€å ŽåïŒã¢ã€ãã ã®éåã ãã§ãªãïŒæ§é ãä»å ã
ãšããç¹ã«æ³šç®ããããšã¯ïŒäž¡æ¹ã®èŠæ±ãæºãããŠããïŒéœåã ããããªãã¿ãŒã³ãèããããšãã§ããïŒå€ãã®ã¬ã³ãŒãã«å«ãŸã
è¯ãïŒãŸãïŒå€ãã®ã¬ã³ãŒãã«å«ãŸããçµåãã«æ³šç®ããããšã§ïŒ ããã¿ãŒã³ã¯é »åºãã¿ãŒã³(frequent pattern)ãšãã°ãïŒé »åºéå
å±æçã ãå
šäœã«ããçšåºŠå
±éããæ§é ãèŠã€ããããšãã§ã ãšåãããã«ã¢ãã«åããããšãã§ããïŒããŒã¿ããŒã¹ã«ãã£ãŠã¯ïŒ
ãïŒããŒã¿ã®äžããïŒç¹æ®ã ãé¡èãªäŸãèŠã€ããã®ã«ã¯é©ã æ§é ãæã€æå³ããšãŠã倧ãããã®ãããïŒãã®ãããªå Žåã«ã¯ïŒ
ãŠããã ããïŒ æ§é ãæã€ãã¿ãŒã³ãèŠã€ããããšã«å€§ããªæ矩ãããïŒé »åº
ãã®ããã«ïŒããŒã¿ããŒã¹ã®äžã«è¯ãçŸããã¢ã€ãã éåãé » ãã¿ãŒã³ãèŠã€ããåé¡ã¯é »åºãã¿ãŒã³çºèŠïŒfrequent pattern
åºã¢ã€ãã éå (frequent itemset)ïŒãããã¯åã«é »åºéåãšã miningïŒãšãã°ãïŒå€ãã®ç 究ãããïŒ7 ç¯ã§ïŒé åºæšãã¿ãŒã³ã®
ã¶ïŒæ£ç¢ºã«ã¯ïŒæå°ãµããŒããšãã°ããéŸå€ÎžãçšããŠïŒÎžå æã«å¯Ÿããé »åºãã¿ãŒã³çºèŠã«ã€ããŠè¿°ã¹ãïŒ
以äžã®ãã©ã³ã¶ã¯ã·ã§ã³ã«å«ãŸãããããªã¢ã€ãã éåãšããŠå®çŸ©
ãããïŒå³ 1 ã§ã¯ïŒå³åŽã«ç€ºããéåã¯ã©ããå°ãªããšãïŒã€ä»¥
äžã®ãã©ã³ã¶ã¯ã·ã§ã³ã«å«ãŸããããïŒæå°ãµããŒãïŒã«å¯Ÿããé » 3. é »åºéååæã¢ã«ãŽãªãºã
åºéåãšãªã£ãŠããïŒã¢ã€ãã éåXã«å¯ŸããŠïŒãããå«ããã©ã³
ã¶ã¯ã·ã§ã³ãåºçŸ(occurrence)ãšãã¶ïŒãŸãïŒåºçŸã®éåãåºçŸ 3.1 èšç®æéã®è©äŸ¡
éå(occurrence set)ïŒåºçŸã®æ°ãé »åºåºŠ(frequency)ïŒããã㯠å®çšäžïŒé »åºéåçºèŠãçšãããããªå ŽåïŒå
¥åããããŒã¿
ãµããŒã(support)ãšããïŒããããOcc(X) ãšfrq(X) ãšè¡šèšããïŒ ããŒã¹ã¯å·šå€§ã§ããããšãå€ãïŒãŸãåºåãã解ã®æ°ã倧ããïŒ
å³ 1 ã®äŸã§ã¯ïŒã¢ã€ãã éå{2,5}ã®åºçŸã¯ 1 è¡ç®ãš 2 è¡ç®ã®ã ãã®ããïŒèšç®æéãççºçã«å¢å ãã解æ³ã¯çŸå®çã«ã¯äœ¿
ã©ã³ã¶ã¯ã·ã§ã³ã§ããïŒ çšã§ããªãïŒçŸå®çãªæéå
ã«æ±è§£ãçµäºãããããã«ã¯ïŒã¢
é »åºéåã®çºèŠã¯ïŒ1993 幎㫠Agrawal, Imielinski ,Srikant ã«ãŽãªãºã çãªæè¡ãæ¬ ãããªãïŒ
ãã«ãã£ãŠïŒçµåèŠåã®çºèŠã«é¢é£ããŠææ¡ãããã®ãæå㧠äžè¬ã«ïŒé »åºéååæåé¡ã®ãããªïŒå€ãã®è§£ãåæããå
ãã [AIS 93]ïŒ æã¢ã«ãŽãªãºã (enumeration algorithm)ã®èšç®æéã¯ïŒåé¡ã
å
¥åããŠåæåããéšåãšïŒå®éã«æ±è§£ãè¡ã£ãŠè§£ãåºåãã
éšåãšã«åãããïŒå
¥åã»åæåã«é¢ããŠã¯ã©ã®å®è£
ã»ã¢ã«ãŽãª
2.3 é »åºã¢ã€ãã éåçºèŠåé¡ ãºã ã§ãã»ãŒç·åœ¢æéã§ããïŒããã«ããã¯ã¯ãã¡ã€ã«ã®èªã¿èŸŒ
é »åºéåã¯ïŒäžè¿°ã®ããã«ããŒã¿ããŒã¹ã®å±æçã»ç¹ç°çã ã¿éšåã§ããïŒã¢ã«ãŽãªãºã ã®æè¡ã¯ããŸãé¢ä¿ããªãïŒ
ãå
šäœã«å
±éããæ§é ãèŠãããã®ã§ïŒç¥èçºèŠã®åéã§é èšç®æéã«å€åãåºãã®ã¯ïŒè§£ã®æ°ãå€åãããšãã§ããïŒ
å®ãããŠããïŒ äžè¬çã«ã¢ã«ãŽãªãºã ã®æ±è§£æéã¯å
¥åã®å€§ããã«å¯ŸããŠã®ã¿
äŸãã°ïŒé »åºéåã«å«ãŸããã¢ã€ãã å士ã¯ïŒå
±èµ·ãã確ç è©äŸ¡ããããïŒãã®ããã«è§£ãåæããã¢ã«ãŽãªãºã ã®å ŽåïŒåº
ãé«ããšèããããããïŒã¢ã€ãã éã®é¢ä¿ãèŠãããšãã§ãã åã®å€§ããïŒããªãã¡ïŒãã®å Žåã¯è§£ã®æ°ã«å¯Ÿããè©äŸ¡ãéèŠ
ãïŒããã©ã³ã¶ã¯ã·ã§ã³ãâãšâ³ãå«ããªãïŒâ¡ãå«ãããšãå€ãã ãšãªãïŒãã®ããïŒèšç®æéã[å
¥åã®å€§ããïŒåºåã®å€§ãã]ã®
ãšãã£ãã«ãŒã«ãçºèŠããå Žé¢ã§çšããããïŒ å€é
åŒæéãšãªã£ãŠããã¢ã«ãŽãªãºã ã¯åºåå€é
åŒæé(output
ãŸãïŒæ£äŸãšè² äŸãããªãããŒã¿ããããšãã«ïŒæ£äŸã«å€ãå« polynomial)ãšãã°ãïŒèšç®å¹çã®ïŒã€ã®ææšãšãªã£ãŠããïŒãŸãïŒ
ãŸããã¢ã€ãã éåãèŠã€ããããšã§ïŒæ£äŸã®å€ããå
±éããŠæ ãã匷ãæ¡ä»¶ãšããŠïŒïŒã€ã®è§£ãåºåããŠãã次ã®è§£ãåºåãã
ã¡ïŒè² äŸã®ããŒã¿ã«ã¯èŠãããªãæ§è³ªãæããããïŒãšããã¢ã ãŸã§ã®æéãïŒå
¥åãšãããŸã§ã®åºåã®å€§ããã®å€é
åŒã«ãªã£
ããŒããããïŒãã㯠6.5 ç¯ã®éã¿ã€ããã©ã³ã¶ã¯ã·ã§ã³ã§æ±ãïŒ ãŠãããšãïŒé次å€é
åŒ(incremental polynomial)ãšãã°ãïŒç¹ã«
ããã«ïŒé »åºéåã¯å€ãã®ã¬ã³ãŒãã«å
±éããŠå«ãŸããããïŒ ãããåºåã«äŸåããïŒå
¥åã®ã¿ã®å€é
åŒæéãšãªã£ãŠãããš
ãšã©ãŒãç°åžžå€çã«æ¯ãåãããããšãªãïŒå®å®ããŠããïŒãã® ãå€é
åŒé
延(polynomial delay)ãšãã°ããïŒ
ããïŒå
šãŠã®é »åºéåãæ±ãïŒãããããŒã¿ããŒã¹ã®çµ±èšéãš ã¢ã«ãŽãªãºã ã®å€é
åŒæ§ã¯ïŒåé¡ãæ§é ãæã€ãã©ããã®é
ã¿ãªãããšã§ïŒããŒã¿ããŒã¹éã®æ¯èŒãè¡ãããšãã§ãã[ACP èŠãªææšã§ããããšããïŒåæåã®ã¢ã«ãŽãªãºã ã®å€é
åŒæ§ãã
06]ïŒ ããã«è°è«ãããŠããïŒã°ã©ãã®ãã¹ïŒå
šåŒµæšïŒãããã³ã°ïŒã¯ãª
éåžžïŒé »åºéåã¯ããçš®ã®ç¥èãã«ãŒã«ã®åè£ã§ããããïŒ ãŒã¯ãšãã£ãæ§é ïŒã·ãŒã¯ãšã³ã¹ïŒæšãªã©ã®ãã¿ãŒã³ã¯å€é
åŒæ
ïŒã€ã ãèŠã€ããŠãããŸãæå³ããªãïŒãŸãïŒçµ±èšéãšããŠçšã éã§åæã§ããããšãç¥ãããŠããïŒéã«ïŒå€é¢äœã®é ç¹ãïŒæ©
ãå Žåã¯å
šãŠãèŠã€ããå¿
èŠãããïŒãã®ããåæçãªã¢ãã 械åŠç¿ã®åéã§èåãªæ¥µå°ãããã£ã³ã°ã»ããåæïŒãããã¯ã
ãŒããå¿
é ã§ããïŒäžãããããã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ãš ã€ããŒã°ã©ãå察åãšãã£ãåé¡ïŒé »åºããã°ã©ããã¿ãŒã³ãçº
éŸå€Îžã«å¯ŸããŠé »åºéåãå
šãŠèŠã€ããåé¡ã¯é »åºéåçºèŠ èŠããåé¡çã¯ïŒåºåå€é
åŒæéã§è§£ãããã©ãããç¥ãããŠ
å é¡ (frequent itemset mining) ïŒ ã ã ã ã¯ é » åº é å ã® å æ ããªãïŒ
(enumeration)ãšãã°ããïŒé »åºåºŠã¯ãµããŒã(support)ãšããã°ã å®çšçã«ã¯ïŒåæã¢ã«ãŽãªãºã ã®å€é
åŒæ§ã¯å¹çè¯ãã®å¿
ãããïŒéŸå€Îžã¯æå°ãµããŒã(minimum support)ãšãã°ããïŒ èŠååæ¡ä»¶ã ãïŒããã ãã§ã¯ããŸãæå³ããªãïŒåæåé¡ã¯
é »åºéåã¯Îžã®å€§ããã«ãããã®æ°ãå€åããïŒèšç®ã®ç«å Ž éåžžã«å€ãã®è§£ãæã€ããïŒåºåã®ïŒä¹ãïŒä¹ã®æéãããã
ããã¯ïŒÎžã倧ãããã°è§£ãšãªãé »åºéåã®æ°ã¯å°ããïŒæå© ã¢ã«ãŽãªãºã ã¯ïŒå°åºåããªãã®ã§ããïŒå€é
åŒé
延ã§ããã°ïŒ
ã§ãããïŒã¢ãã«ã®ç«å Žããã¯ïŒæçšãªç¥èãããããèŠã€ã 解ã®æ°ã«å¯ŸããŠç·åœ¢ã«ããèšç®æéãå¢ããïŒå®çšçã§ãã
-2-
3. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
1,2,3, D1= { {1}, {2}, {7}, {9} }
D2 = { {1,2}, {1,7},{1,9},{2,7},{2,9}, {7,9} }
1,2, 1,2, 1,3, 2,3,
{ {1,7},{1,9},{2,7},{2,9}, {7,9} }
1,2,5,6,7,9
D3= { {1,2,7}, {1,2,9},{1,7,9},{2,7,9} }
1,2 1,3 1,4 2,3 2,4 3,4
2,3,4,5
{ {1,7,9} {2,7,9}
1 2 3 4 1,2,7,8,9
D4 = { {1,2,7,9} }
1,7,9
Ï 2,3,7,9
å³ïŒ ã¢ã€ãã éåãäœãæãšé »åºéåã®é å å³ïŒ Ξ=3 ã§ã®ã¢ããªãªãªã®å®è¡äŸ
ãïŒå
¥åã«å¯Ÿããèšç®æéããã倧ãããšïŒå·šå€§ãªããŒã¿ããŒ
ã¹ã«å¯ŸããŠã¯åããªããªãïŒãŸãïŒå®çšçã«ã¯ïŒå
šäœã®èšç®æé ãã®ã¢ã«ãŽãªãºã ã§ã¯ïŒkåç®ã®ã«ãŒããå§ãŸããšãïŒDk ã¯å€§
ãçããã°è§£ã®éã®èšç®æéã¯é·ããŠãè¯ãïŒãããã£ãæå³ ããã k ã§ããé »åºéåã®éåã«ãªã£ãŠããïŒãããŠïŒã¹ããã
ã§ã¯ïŒè§£ïŒã€ãããã®èšç®æéãå
¥åã®äœã次æ°ã®ãªãŒããŒã« 3 ã§Dkã®éåïŒã€ãçµåãããŠã§ãã倧ããk+1 ã®ã¢ã€ãã éå
ãªã£ãŠããããšãïŒå®çšçã«å¹çè¯ãã¢ã«ãŽãªãºã ã®æ¡ä»¶ãšãªãïŒ ãå
šãŠDk+1 ã«æ¿å
¥ããïŒé »åºéåã®å調æ§ããïŒå€§ãããk+1
èå³æ·±ãããšã«ïŒä»ãŸã§ã«ç¥ãããŠããå€é
åŒæéã®åæ㢠ã®ä»»æã®é »åºéåã¯å¿
ãDk+1ã«å«ãŸããïŒãã®ããïŒã¹ããã
ã«ãŽãªãºã ã¯ïŒã»ãŒå
šãŠãã®æ¡ä»¶ãæºããïŒå ããŠïŒåŸã«è¿°ã¹ã 4 ã§Dk+1 ããé »åºã§ãªãã¢ã€ãã éåãåãé€ãããšã§ïŒå€§ãã
æ«åºããæ§ãèæ
®ããå®è£
ãè¡ãããšã§ïŒè§£ïŒã€ãããã®èšç®æ k+1 ã®é »åºéåã®éåãåŸãããïŒ ã¢ããªãªãªã¯ 1994 幎ã®
éãå®æ°ã«è¿ããªãïŒã€ãŸãïŒåºåå€é
åŒæéã®è§£æ³ãååšã AgrawalãšSrikant ãã®è«æ [AS 94]ã§æåã«äžããããïŒãããš
ãã°ïŒäžå·¥å€«å ããããšã§å¹çè¯ãå®è£
ãåŸããããšèããŠã ç¬ç«ã«ïŒMannila ãš Toivonen ããåãã¢ã«ãŽãªãºã ãäžããŠãã
ãïŒ [AMST 96]ïŒ
ã¡ã¢ãªã®äœ¿çšéãïŒåæã¢ã«ãŽãªãºã ã§ã¯å€§ããªåé¡ã§ããïŒ ã¢ããªãªãªã®ã¹ããã 4 㧠Dk+1 ã®åã¢ã€ãã éåã®é »åºåºŠ
解ãå€ãããïŒçºèŠãã解ãå
šãŠã¡ã¢ãªã«èããã¢ã«ãŽãªãºã ã¯ïŒ ãèšç®ããéã«ïŒæ¬¡ã®ã¢ã«ãŽãªãºã ã䜿ããšå¹ççã§ããïŒ
å€å€§ãªã¡ã¢ãªãæ¶è²»ããããšãšãªãïŒçºèŠãã解ãã¡ã¢ãªã«ä¿åã
ãå¿
èŠãããããªããïŒãšããç¹ãïŒã¡ã¢ãªã®ç¹ã§ã®èšç®å¹çã« 1. for each SâDk+1 do frq(S) := 0
é¢ããéèŠãªãã€ã³ãã§ããïŒ 2. for each transaction tâT do
3. for each SâDk+1 do
3.2 ã¢ããªãªãªæ³. if Sât then frq(S) := frq(S)+1
4. end for
ãããã©ã³ã¶ã¯ã·ã§ã³tãé »åºéåXãå«ããªãã°ïŒtã¯Xã®ä»»æ
ã®éšåéåããå«ãïŒãã®ããšã¯ïŒXã®éšåéåã®åºçŸéå㯠ãŸãã¹ããã 1 ã§å
šãŠã®ã¢ã€ãã éå S ã«ã€ã㊠frq(S)ã 0
Xã®åºçŸéåãå«ã¿ïŒé »åºåºŠã¯Xããåãã倧ãããªããšããããš ã«èšå®ãïŒã¹ããã 2 ã§ïŒã€ãã€ãã©ã³ã¶ã¯ã·ã§ã³ãéžã³ïŒããã
ã瀺ãïŒãã®ããïŒé »åºéåã®æã¯å調æ§ãæºããïŒããã¯ïŒ å«ãã¢ã€ãã éå S ã«å¯Ÿã㊠frq(S)ãïŒå¢å ãããïŒå
šãŠã®ãã©
å³ 2 ã§ç€ºããããªã¢ã€ãã éåã®æ(éšåéåéã®å
å«é¢ä¿ã ã³ã¶ã¯ã·ã§ã³ã«å¯ŸããŠãã®åŠçããããšïŒfrq(S)㯠S ã®é »åºåºŠãšãª
è¡šããã°ã©ã)ã®äžã§ïŒé »åºéåã¯äžéšã«é£çµã«ååšããŠãããš ãïŒ
ããããšã§ããïŒ ã¢ããªãªãªã®èšç®æéã¯ïŒ(åè£ã®ç·æ°)Ã(ããŒã¿ããŒã¹ã®å€§
ãã®ããïŒç©ºéåããåºçºãïŒã¢ã€ãã ãïŒã€ãã€è¿œå ããŠã ãã)ã«æ¯äŸãïŒåè£ã®ç·æ°ã¯é »åºã¢ã€ãã éåã®æ°ã® n å(n
ãããšã§ä»»æã®é »åºéåãåŸãããšãã§ããïŒãã®æäœã網çŸ
ç ã¯ã¢ã€ãã ã®æ°)ãè¶
ããªãããšãèãããšïŒé »åºéåïŒã€ãããïŒ
ã«è¡ãããšã§ã¢ã€ãã æäžã®æ¢çŽ¢ãè¡ãã°ïŒå
šãŠã®é »åºéåã ææªã§ã(ããŒã¿ããŒã¹ã®å€§ãã)Ãn ã«æ¯äŸããæéãšãªãããš
èŠã€ããïŒãã®æ¢çŽ¢ãå¹
åªå
çã«è¡ããã®ãã¢ããªãªãª(apriori)ïŒ ããããïŒæ¬çš¿ã§ã¯ïŒããŒã¿ããŒã¹ã®å€§ãããšã¯ããŒã¿ããŒã¹ã®
æ·±ãåªå
çã«è¡ããã®ãããã¯ãã©ãã¯æ³(backtracking)ã§ããïŒ åãã©ã³ã¶ã¯ã·ã§ã³ã®å€§ãããåèšãããã®ãšããïŒïŒé »åºéåã
ã¢ããªãªãªã¯ãšãŠãæåãªææ³ã§ããïŒè³ã«ãããããšã®ããèª å
šãŠã¡ã¢ãªã«ä¿æããå¿
èŠãããããïŒã¡ã¢ãªäœ¿çšéã¯èŠã€ãã
è
ãå€ãããšãšæãïŒç©ºéåããåºçºãïŒå€§ãããïŒã®é »åºéåã é »åºéåã®æ°ã«æ¯äŸããïŒã¢ããªãªãªã¯ïŒåè£ãçæããæé
å
šãŠèŠã€ãïŒãããã«ã¢ã€ãã ã 1 ã€è¿œå ããããšã§å€§ãã 2 ã® ãå
¥åã®å€é
åŒæéãšãªããªãããïŒå€é
åŒé
延ãšã¯ãªããªãïŒ
é »åºéåãèŠã€ããïŒãšããããã«ïŒé »åºéåã倧ããé ã§ïŒå¹
ã¢ããªãªãªã®ç¹åŸŽã¯ïŒããŒã¿ããŒã¹ãžã®ã¢ã¯ã»ã¹æ°ã®å°ãªã
åªå
çã«èŠã€ãåºãã¢ã«ãŽãªãºã ã§ããïŒã¢ããªãªãªã®ã¢ã«ãŽãª ã§ããïŒäžèšã®é »åºåºŠèšç®æ¹æ³ã¯ïŒãã©ã³ã¶ã¯ã·ã§ã³ãäžåã¹ã
ãºã ã以äžã«ç€ºãïŒ ã£ã³ããããšã§å
šãŠã®åè£ã®é »åºåºŠãèšç®ããããšãã§ããïŒã
ã¢ããªãªãª ã®ããïŒããããŒã¿ããŒã¹ã巚倧ã§ã¡ã¢ãªã«å
¥ããªããããªå Žå
1. D1=倧ãã 1 ã®é »åºéåã®éå, k:=1 ã§ãïŒããŒã¿ããŒã¹ã(é »åºéåã®å€§ããã®æ倧)åã ãããã¹ã
2. while Dkâ Ï ã£ã³ããªãïŒ
3. 倧ãããk+1 ã§ããïŒDk ã® 2 ã€ã®ã¢ã€ãã éåã®åéåãš å³ 3 ã§ã¢ããªãªãªã®å®è¡äŸãèŠãŠã¿ããïŒãã®å³ã¯ïŒã¢ããªãª
ãªã£ãŠãããã®ãå
šãŠDk+1 ã«æ¿å
¥ãã ãªã®å®è¡ã«äŒŽãïŒåDk ãã©ãèšç®ããããã瀺ããŠããïŒã¢ããª
4. frq(S)<Ξ ãšãªãSãå
šãŠDk+1ããåãé€ã ãªãªã¯ãŸã倧ãã 1 ã®é »åºéåãå
šãŠæ±ãïŒãã®éåãD1ãšã
5. k := k+1 ãïŒããã 1 è¡ç®ã®éåã§ããïŒæ¬¡ã«D1ã®ïŒã€ã®ã¢ã€ãã éåã®
6. end while åéåã§å€§ããã 2 ã®ãã®ãå
šãŠD2ã«æ¿å
¥ããïŒããã 2 è¡ç®
-3-
4. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
é »åºéå 1,2,5,6,7,9 1,2,7,9
1,2,7,9 Ã2
2,3,4,5 2
Ï {1} {2} {7} 7 9 2
1,2,7,8,9 1,2,7,9
1,7,9
{9} {1,7} {1,9} 1 9 1,7,9 1,7,9
2,7,9 Ã2
{2,7} {2,9} 2,3,7,9 2,7,9
2 7 9 2,7,9 2,7,9
{7,9} {1,7,9} Ï
7 9
{2,7,9} å³5 ããŒã¿ããŒã¹çž®çŽ
9 9
å³4 é »åºãã¿ãŒã³æš ãã°ïŒã¢ã€ãã éå{1,3,5}ã®æ«å°Ÿã¯ 5 ã§ããïŒãããŠïŒåã¢ã€ã
ã éåXãçæããéã«ã¯ïŒtail(X)ãæåŸã«è¿œå ããïŒãšããç
æã«ãŒã«ãå®ãããïŒã€ãŸãXã¯ïŒæ«å°Ÿãå«ãŸãªãéåX â
ã®éåã§ããïŒããããé »åºåºŠã 3 æªæºã®ãã®ãå
šãŠåãé€ããšïŒ tail(X)ããã®ã¿çæããïŒãšããããšã§ããïŒãã®ããã«ããã°ïŒX
3 è¡ç®ã®éåãåŸãããïŒä»¥äžåæ§ã«ããŠïŒD3ïŒD4ãšæ±ããïŒD4 ãè€æ°ã®ã¢ã€ãã éåããçæãããããšããªããªãïŒX â tail(X)
ã¯é »åºéåãïŒã€ãå«ãŸãªãããïŒç©ºéåãšãªãïŒããã§ã«ãŒã ã 1 床ããçæãããªãããïŒXã 1 床ããçæãããªãïŒãã®
ãçµäºããŠïŒã¢ã«ãŽãªãºã ã¯çµäºããïŒ ã ã ã« ïŒ X ã X - tail(X) ã ã ç æ ã ã æ¹ æ³ ã æ« å°Ÿ æ¡ åŒµ (tail
extension)ãšãã¶ïŒ
3.3 ãã«ãŒãã³ã°
äŸãšããŠïŒ{1,3,5,6}ã¯{1,3,5}ã®æ«å°Ÿæ¡åŒµã ãïŒ{1,2,3,5}ã¯ã
ã¢ããªãªãªã®é«éåããæ¹æ³ãšããŠïŒãã«ãŒãã³ã°(pruning)ã ãã§ãªãïŒã¢ã€ãã éå X ããæ«å°Ÿæ¡åŒµãããŠã¢ã€ãã éå Y
ããïŒXãDk+1 ã®ã¢ã€ãã éåãšããïŒããXã®å€§ããkã®éšåé ãåŸããããšãïŒtail(X)<tail(Y)ãæãç«ã€ïŒã€ãŸãïŒY ã¯å¿
ã
åã§Dkã«å«ãŸããªããã®ãããã°ïŒXã¯é »åºéåã§ãªãã¢ã€ã tail(X)ãã倧ããªã¢ã€ãã ãè¿œå ããŠåŸãããïŒãŸãïŒtail(X)ãã
ã éåãå«ããšããããšããããïŒãã£ãŠïŒXã¯é »åºéåãšãªãã 倧ããªã¢ã€ãã ãè¿œå ãããã®ã¯å¿
ã X ã®æ«å°Ÿæ¡åŒµã«ãªã£ãŠã
ãšã¯ãªãïŒãã®æ€æ»ãDk+1 ã®å
šãŠã®ã¢ã€ãã éåã«å¯ŸããŠè¡ã ãïŒ
ã°ïŒé »åºåºŠã®æ€æ»å¯Ÿè±¡ãå°ãªãããããšãã§ããïŒäŸãã°ïŒå³ïŒ 以äžããïŒæ¬¡ã®ã¢ã«ãŽãªãºã ããã¯ãã©ãã¯ãåŸãããïŒããã¯ã
ã§D3ã«æåã«å«ãŸããŠãã{1,2,7}ã¯ïŒD2ã«å«ãŸããªãã¢ã€ãã ã©ãã¯(Ï)ãåŒã³åºãããšã§ïŒå
šãŠã®é »åºéåãåæãããïŒ
éå{1,2}ãå«ãã§ããïŒãã®ããšããïŒ{1,2,7}ã¯é »åºéåã§ã¯
ãªããšïŒé »åºåºŠã®èšç®ãããåã«çµè«ä»ããããã®ã§ïŒãã ã¡ ããã¯ãã©ãã¯(X)
ã«D3ããåãé€ãããšãã§ããïŒ 1. output X
3.4 é »åºãã¿ãŒã³æš 2. for each i>tail(X) do
3. if frq(Xâª{i})â§Îž then call ããã¯ãã©ãã¯(Xâª{i})
ã¢ããªãªãªã¯ïŒåDkãã¡ã¢ãªã«ä¿æããå¿
èŠãããïŒãããïŒã
ã©ã€ (trieãããã¯prefix tree)ã®ãããªæšæ§é ã§ä¿æããïŒæ¬æ¥ã
ããã¯ãã©ãã¯æ³ã®é·æã¯ïŒæ¢ã«çºèŠãã解ãã¡ã¢ãªã«ä¿åããŠ
ã©ã€ã¯æååãæ ŒçŽãããã®ã§ããã®ã§ïŒåé »åºéåãæ·»ãå
ããå¿
èŠããªãããïŒåºæ¬çã«å
¥åããããŒã¿ããŒã¹ãä¿æã
é ã«ãœãŒãããåãæååãšã¿ãªãïŒæ ŒçŽããïŒä»»æã®é »åºéå
ãã ãã®ã¡ã¢ãªãããã°åãç¹ã§ããïŒèšç®éã¯ã¢ããªãªãªãšåã
ããïŒã€ã¢ã€ãã ãåãé€ãããã®ã¯é »åºéåã§ããïŒãšããå
ã§ïŒïŒã€ããã(ããŒã¿ããŒã¹ã®å€§ãã)Ãn ã«æ¯äŸããæéãã
調æ§ããïŒã¡ã¢ãªã®äœ¿çšéã¯é »åºéåã®æ°ã«æ¯äŸããéã«ãªãïŒ
ããïŒãŸãïŒåå埩ã§è§£ãå¿
ãïŒã€åºåããããïŒå€é
åŒé
延ãš
ãã®ããã«ïŒé »åºéåãä¿æããããã®ãã©ã€åã®æšæ§é ãé »åº
ãªãïŒãããïŒããããŒã¿ããŒã¹ãã¡ã¢ãªã«å
¥ããããªããããªå ŽåïŒ
ãã¿ãŒã³æš(frequent pattern treeïŒãããã¯FP-tree)ãšãã¶ïŒ
åå埩ã§ããŒã¿ããŒã¹ãã¹ãã£ã³ããããšã«ãªãããïŒå°ãªããšã
å³ 4 ã«äŸã瀺ããïŒæ ¹ããå§ãŸãïŒä»»æã®é ç¹ã§çµäºããã
é »åºéåã®æ°ã ãããŒã¿ããŒã¹ã®ã¹ãã£ã³ãè¡ãããšãšãªãïŒå¹
ã¹ã«å¯ŸããŠïŒãã®ãã¹ãæ§æããé ç¹ã®éåãé »åºéåã«ãªã£
çãèœã¡ãïŒ
ãŠããïŒåŸè¿°ããããã«ïŒãã©ã€ã¯å
¥åããããŒã¿ããŒã¹ãå¹ç
å³ 5 ã¯ïŒå³ 1 ã®é »åºéåãããã¯ãã©ãã¯æ³ã§åæããå³ã§ã
è¯ãæ ŒçŽããã®ã«ã䜿ãããŠããïŒFP-growth [HPY 00] ã¯ïŒæ·±
ãïŒããã¯ãã©ãã¯æ³ã¯ïŒç©ºéåããåºçºãïŒã¢ã€ãã ã 1 ããé
ãåªå
åã®ã¢ã«ãŽãªãºã ã«ãããŠïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹
ã«è¿œå ããŠããïŒãŸã 1 ãè¿œå ããéã«ïŒfrq({1})â§3 ãæãç«
ãïŒçµãè¡šãèŠçŽ åã®ãã©ã€ãšããŠè¡šçŸãããã®ã§ããïŒ ZBDD-
ã€ã®ã§ååž°åŒã³åºããè¡ãïŒ{1}ãåŒæ°ãšããŠåŒã³åºãããå埩
growth [MA 07] ã¯ïŒãã®ã¢ã€ãã£ã¢ãæ¡åŒµããŠïŒããŒã¿ããŒã¹ã
ã§ã¯ïŒtail({1})=1 ãã倧ããªã¢ã€ãã ãè¿œå ããïŒ2,âŠ,6 ãŸã§ã¯ïŒ
ZBDDïŒè«çé¢æ°ãè¡šãçž®çŽããã DAGïŒã§è¡šããŠããïŒ
è¿œå ããŠãé »åºéåãšã¯ãªããªãããïŒååž°åŒã³åºãã¯è¡ãã
3.5 ããã¯ãã©ãã¯æ³ ãïŒ7 ãè¿œå ããéã« frq({1,7})â§3 ãæãç«ã€ããïŒååž°åŒã³
åºããè¡ãããïŒä»¥åŸåæ§ã«åŠçãé²ããããïŒ{1,7}ã«é¢ãã
ã¢ããªãªãªãå¹
åªå
çãªæ¢çŽ¢ãè¡ãã®ã«å¯ŸãïŒããã¯ãã©ãã¯æ³
ååž°åŒã³åºããçµäºããåŸïŒ{1}ã«é¢ããå埩ã§ã¯ïŒæ¬¡ã« 8 ã
ã¯æ·±ãåªå
çãªæ¢çŽ¢ãè¡ãïŒã¢ã€ãã ãïŒã€ãã€è¿œå ããïŒãšã
è¿œå ããïŒããã¯é »åºã§ã¯ãªãã®ã§ïŒååž°åŒã³åºããè¡ããïŒ
ãæ¹æ³ã§ç¶²çŸ
çã«é »åºéåãçæãããšïŒå€§ããkã®ã¢ã€ãã é
次㫠9 ãè¿œå ããïŒããã¯é »åºãšãªãããïŒååž°åŒã³åºããè¡
åã 2kåçæããŠããŸãïŒéåžžïŒã°ã©ãæ¢çŽ¢ã«ãããŠã¯ïŒéè€ã
ãããïŒ
é¿ããããã«äžåºŠèšªããé ç¹ã«ããŒã¯ãã€ããïŒé »åºéååæ
ããã¯ãã©ãã¯æ³ã«åºã¥ãã¢ã«ãŽãªãºã ã¯ïŒ1998 幎ãã 2000 幎
ã®å Žåã«ã¯ïŒãã®äœæ¥ã¯èŠã€ããé »åºéåãã¡ã¢ãªã«ä¿åããŠ
é ã«ãããŠè€æ°ã®èè
ã«ãã£ãŠææ¡ãããïŒ[BJ 98] ã¯ïŒæ¥µå€§
ããããšã«çžåœããïŒãããïŒãããããšãç°¡åãªæ¹æ³ã§éè€ã¯
éåçºèŠã®ããã«æ«å°Ÿæ¡åŒµãçšããã¢ã€ãã£ã¢ãææ¡ããŠããïŒ
åé¿ã§ããïŒ
[HPY 00ïŒMN 99, MS 00, ZPOL 97]ã¯æ«å°Ÿæ¡åŒµãçšããé »åº
ã¢ã€ãã éåXã«å¯ŸããŠïŒãã®äžã®æ倧ã®ã¢ã€ãã ã æ«å°Ÿ
éåçºèŠã«ã€ããŠè¿°ã¹ãŠããïŒ
(tail)ãšãã³ïŒtail(X)ãšè¡šèšããïŒãã ãtail(Ï)ã¯ïŒâãšããïŒäŸ
-4-
5. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
A: 1,2,5,6,7,9 1: A, C, D
* 1 2 5 6 7 9 A A: 1,2,5,6,7,9 2: A,B,C,E,F,
B: 2,3,4,5 3: B, E
7 8 9 B: 2,3,4,5
C 4: B
C: 1,2,7,8,9
7 9 D C: 1,2,7,8,9 5: A, B
D: 1,7,9 6: A
2 3 4 5 B D: 1,7,9 7: A,C,D,E,F
E: 2,3,7,9 8: C
7 9 E E: 2,3,7,9
9: A,C,D,E,F
7 9 F
å³6 ãã©ã€ãçšããããŒã¿ããŒã¹ã®å§çž® å³7 ããŒã¿ã®æ°Žå¹³é
眮ãšåçŽé
眮
4. é«éåã®æè¡
åè¿°ã®ã¢ããªãªãªïŒããã¯ãã©ãã¯ãšãã«ïŒèšç®éã®äžã§ã¯åºå 4.3 ããŠã³ãããžã§ã¯ã
æ°äŸåã®ã¢ã«ãŽãªãºã ã§ããïŒåºåæ°å€é
åŒæéãéæããŠã ããã¯ãã©ãã¯æ³ãšçµã¿åãããŠå¹æãçºæ®ããã®ãïŒæ¬¡ã«è¿°
ãïŒãããïŒå®éã«ã¯ããŒã¿ããŒã¹å
šäœãã¹ãã£ã³ãããšããäœ ã¹ãããŠã³ãããžã§ã¯ã(down project)ã§ããïŒ
æ¥ã¯éåžžã«æéããããïŒçŽ çŽãªå®è£
ã§ã¯ãšãŠãçŸå®çãªæé ä»ïŒããã¢ã€ãã éå X ã«ã¢ã€ãã i ãè¿œå ã㊠Y ãäœãïŒã
å
ã«æ±è§£ã§ããªãïŒãã®ããïŒããã€ãã®é«éåææ³ãææ¡ã ã®é »åºåºŠãèšç®ããããšããïŒãã®ãšãïŒY ãå«ããã©ã³ã¶ã¯ã·ã§
ããŠããïŒ ã³ã¯å¿
ã X ãå«ãã®ã§ïŒY ã®åºçŸéå㯠X ã®åºçŸéåã«å«ãŸ
ããããšããããïŒãã£ãŠïŒY ã®é »åºåºŠã¯ïŒX ã®åºçŸéåã®ã¿ã
4.1 ããŒã¿ããŒã¹çž®çŽ
ãæ±ãŸãïŒããã«ïŒY ã®åºçŸéå Occ(Y)ã¯ïŒX ã®åºçŸéå
æåã®ææ³ã¯ïŒããŒã¿ããŒã¹çž®çŽãšãã°ãããã®ã§ããïŒãã Occ(X)ãšã¢ã€ãã éå{i}ã®åºçŸéå Occ({i})ã®å
±ééšåã§ã
ã¯ïŒå
¥åããããŒã¿ããŒã¹ããïŒãããããäžèŠãªéšåãé€å» ãïŒã€ãŸãïŒä»»æã® Y = Xâª{i}ã«å¯ŸããŠïŒæ¬¡ã®çåŒãæç«ããïŒ
ããããšã§ããŒã¿ããŒã¹ãå°ãããïŒç©ºéã³ã¹ããäžãããšãšãã«ïŒ Occ(Y) = Occ(X) â©Occ({i}).
ã¹ãã£ã³ã«ãããæéãççž®ãããã®ã§ããïŒ ãã®ïŒå
±ééšåããšãäœæ¥ã¯ïŒããŒã¿ããŒã¹å
šäœãã¹ãã£ã³ã
ã¢ã€ãã i ãå«ããã©ã³ã¶ã¯ã·ã§ã³ã®æ°ãΞæªæºã§ãããšãïŒi ãããã¯ããã«çæéã§ã§ããïŒãã®ææ³ãããŠã³ãããžã§ã¯ããš
ãå«ãä»»æã®ã¢ã€ãã éåã®é »åºåºŠã¯Îžæªæºã«ãªãïŒé »åºé ãã¶ïŒ
åãšãªããªãïŒãã£ãŠ i ã¯è¿œå ããã¢ã€ãã ã®åè£ããã¯ãã㊠äŸãã°å³ 5 ã§ïŒ{1}ã« 5 ãè¿œå ããéã«ã¯Occ({1})={A, C,
ããïŒããã«ã¯ïŒããŒã¿ããŒã¹ããã¢ã€ãã i ãå
šãŠé€å»ããŠãèš D}, Occ({5})={A, B}ã§ããã®ã§ïŒOcc({1})â©Occ({5})={A}ãš
ç®çµæã«å€åã¯ãªãïŒãŸãïŒå
šãŠã®ãã©ã³ã¶ã¯ã·ã§ã³ã«å«ãŸãã ãªãïŒããŠã³ãããžã§ã¯ããè¡ãéã«ã¯ïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ã
ã¢ã€ãã ã¯ïŒä»»æã®ã¢ã€ãã éåã«è¿œå ããŠãé »åºåºŠãå€åã ãŒã¹ãïŒåiã«å¯ŸããOcc({i})ïŒã€ãŸããåã¢ã€ãã ãå«ããã©ã³ã¶
ããªãããïŒåæ§ã«äžèŠã§ããããšããããïŒããããåãé€ãã ã¯ã·ã§ã³ã®éåããšããŠä¿æãããšéœåãè¯ãïŒãã®ããŒã¿ã®ä¿
ãšã§ïŒããŒã¿ããŒã¹ãçž®å°ã§ããïŒããã«ïŒããåäžã®ãã©ã³ã¶ã¯ æã®ä»æ¹ãåçŽé
眮(vertical layout)ãšãã¶ïŒããã«å¯ŸããŠïŒåã
ã·ã§ã³ã k åããã°ïŒãããã¯ïŒã€ã«äœµåãïŒäœµåãããæ° k ã ã©ã³ã¶ã¯ã·ã§ã³ã«å¯ŸããŠïŒããã«å«ãŸããã¢ã€ãã ãä¿æããæ¹
éã¿ã®ãããªåœ¢ã§ä¿åããããšãã§ããïŒãã®ïŒã€ã®æäœã«ããïŒ æ³ãããŒã¿ã®æ°Žå¹³é
眮 (horizontal layout)ãšãã¶ïŒå³ 7 ã«äŸã瀺
çŸå®çãªããŒã¿ã¯ïŒç¹ã«æå°ãµããŒãã倧ããå ŽåïŒå€§å¹
ã«çž® ããïŒ
å°ããããšãã§ããïŒ ããŠã³ãããžã§ã¯ããšããŒã¿ã®æ°Žå¹³é
眮ã¯ïŒ1990 幎代çµããé
å³ïŒã«æå°ãµããŒãã 3 ãšããŠããŒã¿ãçž®çŽããäŸã瀺ããïŒ ã«ïŒããã¯ãã©ãã¯åã®ã¢ã«ãŽãªãºã [MN 99, MS 00, ZPOL 97]ãš
巊端ã®ããŒã¿ããŒã¹ãã 3 ã€æªæºã®ãã©ã³ã¶ã¯ã·ã§ã³ã«ããå«ãŸ å
±ã«å°å
¥ãããïŒãŸãïŒFP-growth [HPY 00]ãããŠã³ãããžã§ã¯ã
ããªããã®ãé€ãããã®ãçãäžã®ããŒã¿ããŒã¹ã§ããïŒããã« ã®å€çš®ãšãããïŒãã®ããã¯ãã©ãã¯åã¢ã«ãŽãªãºã ãšããŒã¿ã®æ°Ž
åäžã®ãã©ã³ã¶ã¯ã·ã§ã³ã䜵åãããã®ãå³ã§ããïŒ å¹³é
眮ãåãããŠïŒ ãã¿ãŒã³æé·æ³(pattern growth method)
[PHPCDH01] ãšåŒã¶ããšãããïŒ
4.2 ãã©ã€ãçšããããŒã¿ããŒã¹ã®å§çž®
å
¥åããããŒã¿ããŒã¹ãå°ããããã«ã¯ïŒããŒã¿ããŒã¹çž®çŽ 4.4. ããããããè¡šçŸ
ã®ã»ãã«ãïŒé »åºãã¿ãŒã³æšãšåæ§ã«ããŠïŒå³ 6 ã®ããã«ããŒã¿ ããŒã¿ããŒã¹ãå¯ã§ããïŒãã©ã³ã¶ã¯ã·ã§ã³ãå¹³åçã«ã¢ã€ã
ããŒã¹ããã©ã€ãçšããŠæšæ§é ãšããŠä¿æããæ¹æ³ãããïŒ ã éåã®äœå²ãã®ã¢ã€ãã ãå«ãå ŽåïŒãããæŒç®ã䜿ãããšã§
é »åºåºŠã®å°ããã¢ã€ãã ãããŒã¿ããŒã¹ããåãé€ãç¹ã¯å ããŠã³ãããžã§ã¯ããå¹çåã§ããïŒåOcc({i})ïŒããã³Occ(X)ã
ãã ãïŒåãé€ããŠåŸãããããŒã¿ããŒã¹ãïŒãã©ã€ã§æ ŒçŽãããš ãããåã§ä¿æãããïŒãããšïŒOcc({i})â©Occ(X)ã¯åãªãããã
ãããç°ãªãïŒãã©ã€ã§æ ŒçŽããããšã§ïŒèªåçã«åäžã®ãã©ã³ã¶ æŒç®ã®ïŒ¡ïŒ®ïŒ€ã§ãã©ã³ã¶ã¯ã·ã§ã³ 32 åïŒããã㯠64 åïŒãã€äžåºŠ
ã¯ã·ã§ã³ã¯ïŒã€ã«çž®çŽãããïŒé »åºãã¿ãŒã³æšãšåæ§ïŒåãã©ã³ã¶ ã«èšç®ã§ããïŒãã®ããïŒéåžžã«é«éåãããïŒãã®æ¹æ³ã¯ãã
ã¯ã·ã§ã³ã¯æ·»ãåé ã«ãœãŒãããŠæ ŒçŽããïŒãã©ã€ã«ããå§çž®å¹ç ãããã(bitmap)ãšãã°ããïŒ
ã¯ã¢ã€ãã ã«å¯Ÿããæ·»ãåã®ã€ãæ¹ã«ãããïŒäžè¬ã«é »åºåºŠã®
倧ããã¢ã€ãã ã«å°ããªæ·»ãåãã€ããããšã§ïŒãã©ã³ã¶ã¯ã·ã§ã³
ã®å€ãã®æ¥é èŸãå
±æããïŒå§çž®å¹çãé«ããªãïŒ
-5-
6. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
ããŒã¿
ã®å¯åºŠ pumsb, pumsb*,
connect, chess
1,2,5,6,7,9 2,7,9 Accidents
å¯
2,3,4,5 2
1,2,7,8,9 2,7,9 2 Kosarak BMS-WebView1,2
1,7,9 2,7,9 2,7,9 ÃïŒ T40I10D100K BMS-POS, webdoc,
retail, mushroom,
2,3,5,7,9 2,7,9 T10I4D100K
ç
2,7,9
é »åºé£œåéåæ°
Ã·é »åºéåæ°
å³ 8 ååž°çããŒã¿ããŒã¹çž®çŽïŒ ã¢ã€ãã éå
{2}ã«å¯Ÿããæ¡ä»¶ä»ããŒã¿ããŒã¹ å³9 FIMI ããŒã¿ã»ããã®åé¡
ããã¯ãã©ãã¯æ³ã®åå埩ã§ã¯ïŒå¿
ãtail(X)ãã倧ããªã¢ã€ãã
4.5. æ¯ãåã ã®ã¿ãè¿œå ãããïŒãã®ããïŒååž°çã«åŒã³åºãããå埩ã®äž
æ¯ãåã(occurrence-deliver)ã¯ããŠã³ãããžã§ã¯ãããŸãšããŠè¡ã ã§ãïŒè¿œå ãããã¢ã€ãã ã¯tail(X)ãã倧ããïŒã€ãŸãïŒXãåã
ããšã§ããã«é«éåããææ³ã§ãã [UAUA04, UKA 04]ïŒãã åãå埩ïŒããã³ããã§åŒã³åºãããå埩ã§ã¯ïŒtail(X)ããå°ã
ã¯ïŒç°ãªãã¢ã€ãã ã®çš®é¡ãéåžžã«å€ãå Žåãªã©ïŒçãªããŒã¿ ãã¢ã€ãã ã¯äžèŠã§ããïŒXã®ååºçŸããtail(X)ããå°ããªã¢ã€
ããŒã¹ã«ãããããã¯ãã©ãã¯æ³ã®é«éåã«ãããŠæçšãªææ³ ãã ãå
šãŠé€å»ãããšïŒã¹ãã£ã³ãã¹ãéšåã®ã¿ãæã€ããŒã¿
ã§ããïŒ ã ㌠㹠ã 㧠ã ã ïŒ ã ã ã æ¡ ä»¶ ä» ã ㌠㿠ã ㌠㹠(conditional
ä»ïŒã¢ã€ãã éå X ã«ã¢ã€ãã 1, âŠ, n åã
ãå ã㊠X ããïŒ database)ãšãã¶ïŒ
ã€å€§ããªã¢ã€ãã éåãäœæãïŒãã®é »åºåºŠãå
šãŠèšç®ããïŒãš æ¡ä»¶ä»ããŒã¿ããŒã¹ã®äžã§ã¯ïŒåã¢ã€ãã éåã®é »åºåºŠã¯ã
ããæäœãèããïŒãã®äœæ¥ã¯ïŒããŠã³ãããžã§ã¯ãã䜿ããšïŒã ãšããšã®ããŒã¿ããŒã¹ã§ã®é »åºåºŠãšåãããŸãã¯å°ãããªãïŒã
ãŒã¿ããŒã¹å
šäœã 1 åã¹ãã£ã³ããã®ãšåãæéããããïŒ ã®ããïŒæ¡ä»¶ä»ããŒã¿ããŒã¹ãããã«ããŒã¿ããŒã¹çž®çŽãããšïŒ
t ã X ã®åºçŸãšããïŒt ãã¢ã€ãã i ãå«ããªãïŒãŸããã®ãšã ããå°ããããŒã¿ããŒã¹ãåŸãããïŒæ¡ä»¶ä»ããŒã¿ããŒã¹ãäœ
ã«éãïŒXâª{i}㯠t ã«å«ãŸããïŒããã§ïŒtX ã«å«ãŸããã¢ã€ã æãïŒããŒã¿ããŒã¹çž®çŽãè¡ãïŒãšããäœæ¥ãåå埩ã§è¡ããšïŒå
ã å
šãŠã«ã€ããŠïŒt ã«å«ãŸããïŒãšããããŒã¯ãã€ãããïŒãã®äœ åž°çã«ã¹ãã£ã³ãã¹ãããŒã¿ããŒã¹ã¯å°ãããªãïŒãããååž°ç
æ¥ãïŒX ã®åºçŸå
šãŠã«å¯ŸããŠè¡ããšïŒã¢ã€ãã i ã«ã€ããããŒã¯ ããŒã¿ããŒã¹çž®çŽ(recursive database reduction)ãšãã¶ïŒå³ 8 ã«
ã®éåã¯ïŒXâª{i}ã®åºçŸéåã«ãªãïŒ æå°ãµããŒãã 3 ã§ãããšãïŒã¢ã€ãã éå{2}ã«å¯Ÿããæ¡ä»¶ä»
ãããæ¯ãåãã§ããïŒæ¬¡ã®ã¢ã«ãŽãªãºã ãšãªãïŒ ããŒã¿ããŒã¹ãšïŒãããååž°çã«çž®çŽãããã®ã瀺ããïŒ
ããã¯ãã©ãã¯ã®ãããªåæ圢ã®ã¢ã«ãŽãªãºã ã¯ïŒäžè¬ã«åå埩
æ¯ãåã(X) ã§ååž°åŒã³åºããè€æ°åè¡ãïŒãã®ããïŒååž°åŒã³åºããäœã
1. for each tâOcc(X) do èšç®æšã®æ§é ã¯ïŒæ ¹ã®éšåãå°ããïŒèã®æ¹ãžé²ãã»ã©é ç¹æ°
2. for each iât, i>tail(X) do ãå€ããªãæ«åºããçãªæ§é ãæã€ïŒã€ãŸãïŒæ·±ãã¬ãã«ã®å埩
3. i ã« t ã®ããŒã¯ãã€ãã ã§ã®èšç®æéãå
šäœã®èšç®æéã®ã»ãšãã©ã®éšåãå ãããã
4. end for ã«ãªãïŒãã®ããïŒäžèšã®ææ³ã®ããã«ïŒå埩ãæ·±ããªãã°æ·±ã
ãªãã»ã©èšç®æéãçž®å°ããããããªå·¥å€«ãå ããããšã§ïŒå€§ã
æ¯ãåãã¯ïŒæ°Žå¹³é
眮ã§èšè¿°ãããããŒã¿ããŒã¹ããïŒãã® ãªèšç®æéã®æ¹åãè¡ããã®ã§ããïŒãã®æ§è³ªã¯å€ãã®ãã¿ãŒ
åçŽé
眮ãæ±ããããšã«çžåœããïŒãã®ããïŒæ¯ãåãã®èšç® ã³ãã€ãã³ã°ã¢ã«ãŽãªãºã ãå§ãïŒåæã¢ã«ãŽãªãºã å
šè¬ã«æã
æé㯠X ã®åºçŸéåãäœãããŒã¿ããŒã¹ã®å€§ããã«æ¯äŸããïŒ ç«ã€æ§è³ªã§ããïŒ
ãã㯠X ã®é »åºåºŠãå°ãããªãã°å°ãããªãã»ã©ïŒããŠã³ãããž å®éšçã«ã¯ïŒãã®ãããªæ¹è¯ãè¡ããªãå ŽåïŒè§£ã®å¢å ãšãšã
ã§ã¯ãã«æ¯ã¹ãŠé«éåããããšããããšãæå³ããïŒããã¯ãã©ã㯠ã«èšç®æéã¯å¢å ããŠãããïŒååž°çã«èšç®æéãæžå°ããã
æ³ã®å ŽåïŒi>tail(X)ãæºãã i ã®ã¿ã«ã€ããŠé »åºåºŠã®èšç®ãã ãšïŒããçšåºŠè§£æ°ãå¢ãããŸã§ã¯åæåã®éšåãããã«ããã¯ãš
ãããïŒååºçŸã®ïŒtail(X)ãã倧ããªã¢ã€ãã ã®ã¿ã«ã€ããŠïŒã ãªãïŒèšç®æéã®å¢å ã¯ã¿ãããïŒè§£ãååã«å€§ãããªã£ããšã
ãŒã¯ãä»å ããïŒåãã©ã³ã¶ã¯ã·ã§ã³ã®ã¢ã€ãã ãæ·»ãåé ã§ãœãŒ ãŠãïŒè§£ã®æ°ã«æ¯äŸã¯ãããã®ã®ïŒããããã«èšç®æéãå¢å€§
ãããŠããããšã§ïŒå¹çè¯ãå®è¡ã§ããïŒ ããŠããïŒãã®ããïŒåå解ã®æ°ã倧ãããªããšïŒè§£ã®æ°ã®ã¿ã«
ããŒã¿ããŒã¹ãçã§ãããšïŒå€ãã®ã¢ã€ãã i ã«å¯Ÿã㊠Xâª{i} èšç®æéãäŸåããããã«ãªãïŒçŸå®çã«éåžžã«åªãã解æ³ãš
ã®é »åºåºŠã 0 ãšãªãïŒãã®ããïŒããŠã³ãããžã§ã¯ãã¯å€ãã®é »åº ãªãïŒ
床èšç®ãè¡ã£ãŠãå°ãªãé »åºéåããåŸãããªããªãïŒè§£ 1 〠ååž°çããŒã¿ããŒã¹çž®çŽã¯ïŒ2000 幎代ã«å
¥ã£ãŠ PrefixSpan
ãããã®èšç®æéã¯å€§ãããªãïŒæ¯ãåãã®å Žåã¯ïŒããŒã¯ã〠[PHPCDH01]ã LCM [UAUA 04, UKA 04]çã®ã¢ã«ãŽãªãºã ã§
ããã¢ã€ãã ã®ã¿ã«é¢ããŠã®ã¿é »åºåºŠãèŠãã°ããïŒãé »åºåºŠã çšããããïŒ
0 ãšãªãã¢ã€ãã ã¯ãããã調ã¹ãªãããšããããšãã§ããïŒ
5. å®è£
ã®æ¯èŒ
4.6. ååž°çããŒã¿ããŒã¹çž®çŽãšæ«åºããæ§
ããŠïŒãããã®ã¢ã«ãŽãªãºã ãšæè¡ã®çµåãã§ïŒå®éã®ããŒã¿
ããã¯ãã©ãã¯æ³ã®ããã®ããäžã€ã®æå¹ãªé«éåææ³ãïŒå ã«å¯ŸããŠæçšãªãã®ã¯ã©ãã§ããããïŒãã®çåã解決ãã¹ãïŒ
åž°çããŒã¿ããŒã¹çž®çŽã§ããïŒ FIMI(frequent itemset mining implementations)ã¯ãŒã¯ã·ã§ãã
-6-
7. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
[FIMI HP]ã«ãŠããã€ãã®å®è£
ã®æ¯èŒãè¡ãããŠããã®ã§ïŒã é »åºéååæã¯ïŒçŸå®çã«ã¯ã»ãŒæé©ãªæéã§è¡ãããšèã
ã®çµæã玹ä»ãããïŒ ãŠè¯ãã ããïŒ
5.1 ããŒã¿ã»ããã®ç¹æ§
FIMIã¯ãŒã¯ã·ã§ããã§çšããããããŒã¿ã»ãã(FIMIããŒã¿ã»ã 6. ä»ã®åé¡ãžã®é©çš
ããšåŒã¶)ã¯ïŒçŸå®åé¡ããåŸããããã®ãäžå¿ã§ããïŒPOSã é »åºéååæã¯ããèªäœæçšãªããŒã«ã§ãããïŒä»ã®åé¡
ãŒã¿ïŒwebã®ã¯ãªãã¯ããŒã¿çãããïŒå€ãããã©ã³ã¶ã¯ã·ã§ã³ã®å¹³ ãåž°çããŠïŒå¹
åºãåé¡ã解ãããã«ã䜿ããïŒæè¿ã§ã¯ïŒå€ã
åãµã€ãºã 10 æªæºãšçã§ãããïŒãã¯ãŒåãæºããïŒå€§ããªãã© ã®é »åºéååæåé¡ã®ããã°ã©ã å®è£
ãå
¬éãããŠããã®ã§
ã³ã¶ã¯ã·ã§ã³ãããã€ãå«ãŸããŠããïŒãŸãïŒå¯ãªããŒã¿ã»ããã FIMI HP, å®é HP]ïŒãã®çš®ã®ã¢ãããŒããæçšãšæãããïŒãã
ããïŒç¹ã«æ©æ¢°åŠç¿ã®ãã³ãããŒã¯ããæ¥ãåé¡ã¯å¯ã§ããïŒ ã§ã¯ïŒãã®ãããªé »åºéååæã®äœ¿ãæ¹ã解説ãããïŒ
å³ 9 ã«ããŒã¿ã®å¯åºŠãšæ§é ã«åºã¥ããããŒã¿ã»ããã®åé¡ã
瀺ããïŒçžŠè»žãå¯åºŠïŒã€ãŸãå
šã¢ã€ãã æ°ã«å¯Ÿããåãã©ã³ã¶ã¯ 6.1 ãã«ãã»ããã®çºèŠ
ã·ã§ã³ã®å¹³åçãªå€§ããã§ããïŒçãªãã®ã¯ã¢ã€ãã æ° 10000 ã« é »åºéååæã§ã¯ïŒãã©ã³ã¶ã¯ã·ã§ã³ã«åäžã®ã¢ã€ãã ãè€æ°
察ããŠãã©ã³ã¶ã¯ã·ã§ã³ã®å¹³åçãªå€§ããã 5-6 çšåºŠã§ããïŒå¯ãª åå«ãŸããããšã¯ãªããšããŠããïŒãããïŒå®éã®ããŒã¿ã§ã¯è€
ãã®ã¯ïŒå€ãã®ã¢ã€ãã ãåå以äžã®ãã©ã³ã¶ã¯ã·ã§ã³ã«å«ãŸããïŒ æ°åã®ã¢ã€ãã ãå«ãŸããå ŽåãããïŒã€ãŸããã©ã³ã¶ã¯ã·ã§ã³ã
暪軞ã¯ïŒè§£æ°ãååã«å€§ãããããªïŒããŒã¿ããŒã¹ã®å€§ãããšå éåã§ãªããã«ãã»ããïŒå€ééåïŒãšããŠæ±ãããïŒãã®ãããª
çšåºŠïŒæå°ãµããŒãã§ã®ïŒé »åºé£œåéåæ°Ã·é »åºéåæ°ã§ããïŒ ããŒã¿ããïŒåäžã®ã¢ã€ãã ãè€æ°åå«ãããšãããããé »åºé
ãã®å€ãå°ãããšïŒããŒã¿ã¯ã©ã³ãã æ§ãæã¡ïŒå€§ãããšïŒããŒã¿ åãèŠã€ããåé¡ããã«ãã»ããçºèŠåé¡ãšãã¶ïŒ
ã¯æ§é ãæã€ãšèããŠããïŒ ãã«ãã»ããçºèŠåé¡ãé »åºéååæã«åž°çããéã«ã¯ïŒã¢
ãŸãïŒå
šäœçãªåŸåãšããŠïŒé »åºéåæ°ãããŒã¿ããŒã¹ã®å€§ ã€ãã iããã©ã³ã¶ã¯ã·ã§ã³tã«kåçŸããŠãããšãïŒå
šãŠã®iãåãé€
ããã«å¯ŸããŠååå°ããå ŽåïŒèšç®ã®ããã«ããã¯ã¯åé¡ã®å
¥ ãïŒãããã«i1,âŠ,ikãæ¿å
¥ããïŒãšããå€æãè¡ãïŒijã¯ãjåç®ã®
åãšåæåã«ãªãïŒãã®ããïŒãã©ã€çãçšããªãïŒã·ã³ãã«ãª ã¢ã€ãã iããšããæå³ã§ããïŒãããšïŒiãjåå
¥ã£ããã«ãã»ãã
ã¢ã«ãŽãªãºã ãé«éãšãªãïŒããã解æ°ã®å¢å ãšãšãã«äž¡è
ã¯ã ã¯ã¢ã€ãã éå{i1,âŠ,ij}ã«å¯Ÿå¿ãïŒäž¡è
ã®é »åºåºŠã¯çããïŒã
ãã«é転ãïŒå
¥åãšåºåãã»ãŒçãã倧ããã«ãªãããã«ã¯äž¡è
ã®ããïŒå€æããããŒã¿ããŒã¹ã§é »åºéåãèŠã€ããããšã§ïŒ
ã®å·®ã¯éãïŒ å
ã®ããŒã¿ããŒã¹ã®é »åºãã«ãã»ãããèŠã€ããããšãã§ããïŒ
ãã ãïŒ{i3,i4}ã®ããã«ïŒãã«ãã»ããã«å¯Ÿå¿ããªãé »åºéåã
5.2 ã¢ããªãªãªæ³ vs. ããã¯ãã©ããã³ã°æ³
èŠã€ããããïŒå¹çã¯èœã¡ãïŒãããïŒãã®ãããªäžèŠãªé »åºé
ã¢ããªãªãªã¯ïŒã¹ãã£ã³ã®åæ°ãå°ãªãããããŒã¿ããŒã¹ãã¡ åã¯å¿
ã飜åéåã«ãªããªãããïŒé »åºé£œåéåãåæãã
ã¢ãªã«å
¥ããªãå Žåã§ãé«éã§ããããšïŒã°ã©ããã€ãã³ã°çã®ïŒ ããšã§ïŒãããã®äžèŠãªé »åºéåãèªåçã«åãé€ãããšãã§ã
å
å«é¢ä¿ã®å€å®ã«æéããããå Žåã§ãïŒæ¢çºèŠè§£ãçšã㊠ãïŒ
å
å«é¢ä¿ã®å€å®ãå°ãªãã§ããããšãå©ç¹ã§ããïŒãããïŒãã®ã¯
ãŒã¯ã·ã§ããã§çšããããããŒã¿ã¯ååã¡ã¢ãªã«åãŸã倧ãã㧠6.2 é »åºéšåæ§é ãã¿ãŒã³åé¡ã®åž°ç
ããïŒãã€å
å«é¢ä¿ã®å€å®ã¯çæéã§ã§ããïŒãã®ããïŒã¢ã㪠äžè¬ã«ïŒä»»æã®é »åºãã¿ãŒã³çºèŠåé¡ã¯ïŒé »åºéåçºèŠå
ãªãªåã®ã¢ã«ãŽãªãºã ã¯ã©ããæªãæ瞟ãåããŠããïŒç¹ã«æå° é¡ã«åž°çããŠè§£ãããšãã§ããïŒã¬ã³ãŒã t ã«å«ãŸãããã¿ãŒã³å
š
ãµããŒãã倧ããïŒãã€è§£æ°ã倧ããå Žå 100 åãè¶
ããå·®ã ãŠãããªãéåãäœãïŒã¬ã³ãŒã t ããã®éåã§çœ®ãæããïŒäŸã
åºãããšãããïŒ ã°ïŒåã¬ã³ãŒããæšã§ããããŒã¿ããŒã¹ããé »åºããæšãèŠã€
ãããå Žåã¯ïŒåã¬ã³ãŒã t ã«å¯ŸããŠïŒt ã«å«ãŸããéšåæšãå
š
5.3 é »åºŠèšç®ãšããŒã¿ããŒã¹ã®æ ŒçŽæ¹æ³
ãŠåæãïŒãã®éåãæ°ããªã¬ã³ãŒããšã㊠t ã眮ãæããïŒå
ããŠã³ãããžã§ã¯ããšæ¯ãåãã¯ïŒæ¯ãåãã®ã»ããé«éã§ãã ãã¿ãŒã³ãã¢ã€ãã ã§ãããšã¿ãªããšïŒçœ®ãæããããšã® t ã¯ãã©
ããã ïŒç¹ã«çãªããŒã¿ã§ã¯ãã®å·®ã倧ããïŒããŒã¿ããŒã¹çž® ã³ã¶ã¯ã·ã§ã³ã§ãããšã¿ãªããïŒãã£ãŠïŒå
šãŠã®ã¬ã³ãŒãã眮ãæ
çŽã¯ïŒæå°ãµããŒãã倧ãããã€è§£æ°ãå°ããå Žåã«æå¹ã§ã ããŠåŸãããããŒã¿ããŒã¹ã®é »åºéåãåæããããšã§ïŒå
ã®
ããïŒçãªããŒã¿çïŒæå°ãµããŒãã 10-20 ãšéåžžã«å°ããå Ž ããŒã¿ããŒã¹ã®å
šãŠã®é »åºãã¿ãŒã³ãèŠã€ããããšãã§ããïŒ
åã«ã¯å¹æããªãïŒå³ 9 ã§äžã®ã»ãã«äœçœ®ããããŒã¿ã«æå¹ã§ ãã®æ¹æ³ã§ãïŒãã«ãã»ããã®ãã€ãã³ã°ã®ããã«äžèŠãªé »åº
ããïŒ éåãå€ãèŠã€ããïŒé£œåéåãæ±ããããšã§å€ãã¯èªåçã«
ååž°çãªããŒã¿ããŒã¹çž®çŽãåãã§ãããïŒç¹ã«å¯ãªããŒã¿ åé¿ã§ãããïŒäŸãã°ïŒãã飜åéå X ã®åºçŸéåã«å¯Ÿå¿ãã
ã§è§£æ°ã倧ãããªã£ããšãã®èšç®æéã®æ¹åçã¯åçã§ããïŒ ã¬ã³ãŒããïŒå
å«é¢ä¿ã«ãªããã¿ãŒã³ãããã€ãå
±éããŠå«ãå Ž
䜿çšããŠããªãå®è£
ã§ã¯èšç®ãçµäºããªããããªåé¡ã§ã解ã åïŒããããå
šãŠé£œåéåã«å«ãŸããŠããŸãïŒãã®ããïŒå飜å
ããšãã§ããïŒ éåã¯ïŒåé·ãªæ
å ±ãæã€ããšãšãªãïŒ
ããããããã¯ïŒããŒã¿ããŒã¹ãå¯ã§ããå Žåã«å€å°æå¹ã§ ãŸãïŒã¬ã³ãŒãã倧ãããšïŒåé¡ã®å€ææã®ãã¿ãŒã³åæã§èš
ãããïŒããŒã¿ããŒã¹çž®çŽãšçµåãããšããããããã®åæ§ç¯ã« 倧ãªæéããããïŒãã®ããïŒå
šãŠã®ã¬ã³ãŒããããŸã倧ãããª
æéããããããïŒè§£æ°ãå€ãå Žåã®èšç®å¹çãæªãïŒãã® ãïŒãšããç¶æ³ã§äœ¿çšããããšãæãŸããïŒ
ããïŒå¹æã¯éå®çã§ããïŒ
å
¥åããã©ã€ã§æ ŒçŽããæ¹æ³ã¯ïŒçãªããŒã¿ã§ã¯å¹æãå°ã 6.3 倧ããªå
±ééšåãæã€ãã©ã³ã¶ã¯ã·ã§ã³éå
ãïŒããã£ãŠäœãããªãã»ããéãããšãå€ãïŒãããå¯ãªããŒã¿ ããç¥ãããé¢ä¿ãšããŠïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ã¯ 2 éš
ã§ã¯å¹æãããïŒéåžžã«å¯ãªããŒã¿ïŒå³ 9 ã§äžéšã«äœçœ®ããã ã°ã©ããšç䟡ã§ããïŒã€ãŸãïŒã¢ã€ãã ã®éåãšïŒãã©ã³ã¶ã¯ã·ã§ã³
ãŒã¿ã§ã¯ 5-6 åã®é«éåãã§ããŠããïŒ ã®éåãé ç¹éåãšãïŒããã¢ã€ãã X ããã©ã³ã¶ã¯ã·ã§ã³ t ã«å«
é«éãªå®è£
ã§ã¯ïŒèšç®ã®ããã«ããã¯ã¯ïŒã»ãŒã©ã®åé¡ã§ã ãŸãããšãïŒX ãš t ã®éã«èŸºãååšãããã㪠2 éšã°ã©ãã§ããïŒ
解ã®åºåéšåã§ããïŒãã®æå³ã§ã¯ïŒåå解ã®æ°ã倧ãããã°ïŒ
-7-
8. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
ãã®ãšãïŒã¢ã€ãã ãšãã©ã³ã¶ã¯ã·ã§ã³ã¯å¯Ÿç§°ãªé¢ä¿ã«ããïŒäž¡ ããŒã¿ããŒã¹ ãã¿ãŒã³
è
ã®ç«å Žãå
¥ãæ¿ããããŒã¿ã¯ãã¯ããã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ã
ãŒã¹ãšãªãïŒãã®ããŒã¿ããŒã¹ã«å¯ŸããŠé »åºéåãçºèŠãããšïŒ A
ããã¯ããäžå®æ°ä»¥äžã®å€§ããã®å
±ééšåãæã€ãã©ã³ã¶ã¯ã·ã§ C A C B
ã³ã®éåã«ãªãïŒäŸãã°ïŒãã©ã³ã¶ã¯ã·ã§ã³ããŒã¿ããŒã¹ããïŒå
±
ééšåã®å€§ãããã©ã³ã¶ã¯ã·ã§ã³ã®çµãèŠã€ãããå Žåã¯ïŒãã® B A C A B
å€æãè¡ã£ãŠå€§ããã 2 ã®é »åºéåãèŠã€ããããšã§ïŒå
šãŠã® C B B
ãã®ãããªçµãèŠã€ããããïŒ ç
§åé¢æ°
6.4 top-K é »åºéåã®çºèŠ
é »åºéåçºèŠã®ã¢ãã«ã®é¢ã§ã®åŒ±ç¹ã«ïŒé©åãªéã®è§£ãåŸ å³ 10 é åºæšãã¿ãŒã³
ãããšãããšïŒæå°ãµããŒãã調æŽããŠäœåºŠã解ããªãããªããã°
ãªããªãïŒãšããç¹ãããïŒãã®ããïŒéã«è§£ã®æ°Kãæå®ãïŒé »
é »åºéšåéåçºèŠåé¡ã§ã¯ïŒå³ 10 ã«ç€ºããããã«ïŒããŒã¿ãš
åºéåã®æ°ãKã«ãªããããªæå°ãµããŒããæ±ãããïŒãšããtop-
ãã¿ãŒã³ã¯ãšãã«é åºæšã§è¡šãããïŒé ç¹ã©ãã«ã®éåΣã«å¯Ÿ
Ké »åºéåçºèŠãç 究ãããŠããïŒ
ããŠïŒÎ£äžã®ã©ãã«ä»ãé åºæš(labeled ordered tree)ãšã¯ïŒå³ 10
ã¢ã«ãŽãªãºã ã®ã¢ã€ãã£ã¢ã¯ç°¡åã§ïŒæåæå°ãµããŒãã 1 ã«
ã«ç€ºããããã«ïŒåé ç¹ãΣã®èŠçŽ ã§ã©ãã«ä»ããããæ ¹ä»ã
ããŠé »åºéåçºèŠã¢ã«ãŽãªãºã ãèµ°ããïŒK åã®è§£ãèŠã€ãã
æš T ã§ããïŒTã«ãããŠåé ç¹ã®åã®é åºã¯æå³ããã€ïŒããª
æç¹ã§ïŒè§£éåãšããŠä¿æãïŒæå°ãµããŒãããã®äžã§æãå°ã
ãã¡ïŒåã®é åºãå
¥ãæ¿ããŠåŸãããæšã¯å¥ã®é åºæšã§ãããš
ãªé »åºåºŠã«èšå®ããïŒä»¥åŸã¯ïŒæ°ãã«æå°ãµããŒãããé »åºåºŠ
èããïŒ
ã®å€§ããªè§£ãèŠã€ãããã³ã«ïŒè§£éåããé »åºåºŠãæãå°ããª
ã°ã©ãããŒã¿ããŒã¹ã¯ïŒåã¬ã³ãŒããã©ãã«ä»ãé åºæšã§ããã
解ãåãé€ãïŒæå°ãµããŒããæŽæ°ããïŒãšããäœæ¥ãç¹°ãè¿ãïŒ
ããªããŒã¿ããŒã¹ D = {D1, âŠ, Dm} ã§ããïŒãã¿ãŒã³ã¯ä»»æã®ã©
ã¢ã«ãŽãªãºã ãçµäºãããšãã«ïŒè§£éå㯠K çªç®ãŸã§ã«é »åºåºŠ
ãã«ä»ãé åºæšã§ããïŒãã¿ãŒã³ã§ããé åºæš S ãïŒã¬ã³ãŒãT
ã倧ããªã¢ã€ãã éåã®éåã«ãªã£ãŠããïŒå®ããŒã¿ã§ã®ãã
ã«åºçŸãããšã¯ïŒå³ 10 ã®ããã«ïŒé åºæšSã®é ç¹ã©ãã«ãšèŸºãã
ã©ãŒãã³ã¹ã¯ããã»ã©æªããªãïŒçŽæ¥ïŒè§£ã K åãšãªãæå°ãµã
ãŸãä¿åãããŸãŸïŒSãé åºæšTã®äžéšåã«ããŸãåã蟌ããããª
ãŒããäžããŠæ±è§£ããã®ã«æ¯ã¹ïŒ2-3 åããæéãããããªãïŒ
ç
§åé¢æ° f ãååšããããšãšå®çŸ©ããïŒãã®ãšãïŒSâŠTãšæžãïŒS
6.5 éã¿ä»ããã©ã³ã¶ã¯ã·ã§ã³ãšæé©éåçºèŠ ã¯Tã®éšåæšã§ãããšãããïŒããŒã¿ããŒã¹ã«ããããã¿ãŒã³Sã®
é »åºéåçºèŠã§ã¯ïŒéåžžãã©ã³ã¶ã¯ã·ã§ã³ã«ã¯éã¿ããªãïŒã© åºçŸã¯ïŒç
§åé¢æ°ã®å
šäœTOcc(S) = { f : ç
§åé¢æ° f ã«ãã£ãŠS
ã®ãã©ã³ã¶ã¯ã·ã§ã³ã«å«ãŸããŠããŠãéèŠæ§ã¯ç䟡ã§ãããšã¿ãª ã¯ãã Di ã«åºçŸãã } ãïŒææžåºçŸDOcc(S) = { i : S ã¯ãã
ãããïŒãããçŸå®ã®åé¡ã§ã¯ïŒéã¿ãä»ããããŒã¿ãæ±ãå¿
èŠ Di ã«åºçŸãã }ïŒæ ¹åºçŸ ROcc(S) = { x : S ã®æ ¹ã¯ããã¬ã³ãŒ
ãçããïŒãŸãïŒãããããã«æ¡åŒµããŠæ£è² ã®éã¿ãä»ããã㌠ãDiã®é ç¹ x ã«åºçŸãã }ãªã©ããŸããŸãªå®çŸ©ãèããããïŒã
ã¿ãèããããšãïŒããŒã¿ã®åé¡ã§ã¯æçšã§ããïŒäŸãã°ïŒæ£äŸ ãã§ã¯ïŒããŸããŸãªè¯ãæ§è³ªããïŒæ ¹åºçŸã«ããšã¥ãåºçŸã®å®çŸ©
ãšè² äŸãåé¡ããå Žåãªã©ïŒãå«ãŸããŠã»ãããªããã©ã³ã¶ã¯ã·ã§ Occ(S) = ROcc(S)ãæ¡çšããïŒ
ã³ããååšãããšãã¯ïŒè² ã®éã¿ãèæ
®ããã»ããïŒéœåãè¯ã ãã®ãšãïŒãã¿ãŒã³Sã®é »åºŠã¯frq(S) = |ROcc(S)|ã§ããïŒéŸå€
[MS 00]ïŒãã®ããã«ïŒé »åºéåãããŒã¿ã®åé¡ã«çšããŠåé¡ç²Ÿ Ξâ§1 ã«å¯ŸããŠïŒããfrq(ãªãã°Îžãªãã°ïŒSãDäžã®é »åºãã¿ãŒ
床ããã®ä»ã®ç®æšé¢æ°ãæ倧ã®é »åºéåãæ±ããæé©ã¢ã€ã ã³(frequent pattern)ãšããïŒäžããããã°ã©ãããŒã¿ããŒã¹DãšéŸ
ã éåçºèŠã¯ïŒçµéšãªã¹ã¯æå°åãïŒããŒã¹ãã£ã³ã°ãªã©ïŒåçš® å€Îžã«å¯ŸããŠïŒDäžã®é »åºé åºæšãå
šãŠèŠã€ããåé¡ãé »åºé
ã®æ©æ¢°åŠç¿åé¡ã§ãã°ãã°çŸãã [KMM 04, TK 06]ïŒ åºæšçºèŠåé¡(frequent tree mining) ãšããïŒ
æ£è² ã®éã¿ãããããå Žåã«ã¯ïŒãã¯ãé »åºéåã¯å調㧠7.2 çŽ æŽãªæ¹æ³ã®åé¡ç¹
ãªãããïŒåºåæ°ã«äŸåããã¢ã«ãŽãªãºã ã®æ§ç¯ã¯é£ãããªãïŒ
æ ¹åºçŸãŸãã¯ææžåºçŸã«ããšã¥ãé »åºŠãæ¡çšãããšãïŒé åº
ãŸãïŒçè«çã«ã¯åºãç¯å²ã®è©äŸ¡é¢æ°ã«å¯ŸããŠïŒçµéšèª€å·®ã
æšãã¿ãŒã³ã®æã¯ïŒé »åºŠã«é¢ããå調æ§ãæºããïŒããªãã¡ïŒ
çµéšãªã¹ã¯ã®æå°ååé¡ã¯ïŒã¢ã€ãã éåãå€ãã®ãã¿ãŒã³ã¯ã©
éšåæšé¢ä¿âŠã«é¢ããŠïŒããSâŠTãªãã°frq(S)â§frq(T)ãæç«
ã¹ã«å¯ŸããŠïŒè¿äŒŒçã«ããèšç®å°é£ã§ããããšãç¥ãããŠãã
ããïŒããã¯ïŒéšåæšé¢ä¿âŠã«é¢ããŠé åºæšå
šäœãã€ããææ§
[Mo 98, SAA 00]ïŒ
é ã§ïŒå°ããªæšã¯äžæ¹ã«ïŒå€§ããªæšã¯äžæ¹ã«ãããšãããšïŒé »åº
ãããïŒé »åºéååæã®ã¢ã«ãŽãªãºã ãããŒã¹ã«ããŠïŒæ£éã¿
ãã¿ãŒã³ã¯æã®äžååã«é£çµããŠååšããããšãæå³ããïŒã¢
ã«å¯Ÿããæå°ãµããŒã[KMM 04]ãïŒç®æšé¢æ°ã®åžæ§ãçšãã
ã€ãã éåå Žåãšåæ§ã«ïŒãµã€ãºããŒãã®ç©ºæšããåºçºããŠïŒé
æåã[MS 00, TK 06]çãå°å
¥ããããšã«ããïŒããçšåºŠå¹çã®
ç¹ãäžã€ãã€è¿œå ããŠããããšã§ïŒä»»æã®é »åºé åºæšãåŸãããš
è¯ãã¢ã«ãŽãªãºã ãæ§æããããšãã§ããïŒãã®å ŽåïŒåºåæ°ã«
ãã§ããïŒ
察ããèšç®æéã¯éåžžã«é·ããªããïŒäœ¿çšã«èããªãã»ã©ã§ã¯
ãããïŒäžè¬ã«äžèšã®ããæ¹ã§åçŽã«ç¶²çŸ
çãªæ¢çŽ¢ãè¡ããšïŒ
ãªãïŒããŒã¿è§£æã®ææ³ãšããŠïŒïŒã€ã®è¯ãéå
·ãšãªãã ããïŒ
ãµã€ãºkã®é åºæšã 2kå以äžéè€ããŠçæããŠããŸãïŒã¢ã€ãã
7. é »åºé åºæšãã¿ãŒã³çºèŠ éåã®å Žåãšåæ§ã«ïŒãã§ã«èŠã€ããé »åºé åºæšã ã¡ã¢ãªã«ä¿
åããŠïŒå¹
åªå
æ¢çŽ¢ãè¡ãããšã§ãã®åé¡ã¯è§£æ±ºã§ããïŒãããïŒ
7.1 ã©ãã«ä»ãé åºæšãšé »åºãã¿ãŒã³çºèŠ 次ã®ããã«æ³šææ·±ãæ¢çŽ¢ãè¡ãããšã§ïŒã¡ã¢ãªãžã®ä¿åãåé¿ããŠïŒ
ããã¯ãã©ãã¯æ³ãçšããæ·±ãåªå
æ¢çŽ¢ãè¡ãããšãå¯èœã§ããïŒ
æ§é ããŒã¿ããã®é »åºãã¿ãŒã³çºèŠã«å¯ŸããŠïŒãããŸã§ã«è¿°
ã¹ãŠããé »åºéååæã®æè¡ãé©çšã§ãã [æµ
äº 04, WM03]ïŒ 7.3 é åºæšã®ç³»åè¡šçŸ
ããã§ã¯ïŒäŸãšããŠïŒé åºæšããã®é »åºéšåæ§é ã®çºèŠã¢ã«ãŽãª
éµãšãªãã¢ã€ãã£ã¢ã¯ïŒé åºæšãé ç¹ãšãã®æ·±ãã®å¯Ÿãããªã
ãºã ã説æããïŒ
ç³»åãšããŠè¡šçŸããããšã§ããïŒãŸãïŒé åºæšã®åé ç¹vãïŒãã®
-8-
9. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
0 1 2 3 ... æšã®ãµã€ãº 7.5 é »åºŠèšç®ã®é«éå
ε å®çšçãªå®è£
ã®ããã«ã¯ïŒçæãããæšSã®é »åºŠfrq(S)ãé«
éã«èšç®ããããšãéèŠã§ããïŒãã®ããã«ïŒåé åºæšãã¿ãŒ
ã³Sã«å¯ŸããŠïŒæå³åºçŸãšãã¶äžéçãªããŒã¿ãä¿æãïŒããã
æå³æ¡åŒµãšå
±ã«æŒžå¢çã«æŽæ°ããããšã§ïŒå¹çããé »åºŠèšç®ã
è¡ããïŒæå³åºçŸ(rigihtmost occurrence)ãšã¯ïŒãã®ããŒã¿ããŒ
ã¹äžã§Sãç
§åé¢æ°ã«ãã£ãŠåã蟌ãŸããŠããå
šãŠã®åã蟌ã¿
ã«ãããSã®æå³èïŒããªãã¡æ«å°Ÿé ç¹ïŒã®åºçŸäœçœ®ããªã¹ãã«
ä¿æãããã®ã§ããïŒ
æ ¹åºçŸã®ä»£ããã«æå³åºçŸãçšããããšã§ïŒ4 ç¯ã§è°è«ãã
ããŠã³ãããžã§ã¯ããæ¯ãåãçã®ã¢ã€ãã éåçºèŠã®é«éåæ
æ³ããã®ãŸãŸæ¡åŒµã§ããïŒãŸãïŒæå³åºçŸããæ ¹åºçŸã«ããšã¥ã
å³ 11 æå³æ¡åŒµã«ããé åºæšã®åæ é »åºŠãäœãã³ã¹ãã§èšç®ã§ããïŒè©³çŽ°ã¯[AAKASA 02, æµ
äº 04]
ãåç
§ããããïŒ
æ·±ãïŒæ ¹ããã®ãã¹ã®é·ãïŒãšã©ãã«ã®å¯Ÿv =ãdepïŒvïŒ,labïŒvïŒã㧠7.6 ã°ã©ããã€ãã³ã°
è¡šãïŒãµã€ãºkã®é åºæšSã®æ·±ãã©ãã«å(depth-label sequence)
æå³æ¡åŒµã¯ïŒç¡é åºæšãäžè¬ã®ã°ã©ããªã©ïŒé åºæšä»¥å€ã®
ã¯ïŒSã®å
šé ç¹ã®æ·±ããšã©ãã«ã®å¯ŸãïŒSäžã®å眮é å·¡å(pre-
æ§é ããŒã¿ã«å¯ŸããŠæ£èŠåœ¢è¡šçŸ(canonical representation)çã®
order traversal)ã®é ã«äžŠã¹ãç³»åcode(S)= s = (v1, âŠ, vk) =
ææ³ãšçµã¿åãããŠïŒããã¯ãã©ãã¯åã®é »åºãã¿ãŒã³çºèŠã¢ã«
ïŒãdep(v1), lab(v1) ã, âŠ, ãdep(vk), lab(vk) ãïŒã§ããïŒã¢ã€ãã é
ãŽãªãºã ãå®çŸããããã«çšããããŠãã[æµ
äº 04, YH 02]ïŒ
åã®å Žåãšåæ§ã«ïŒé åºæšã®æ·±ãã©ãã«åsã«å¯ŸããŠïŒæåŸã®
PrefixSpan ãåæ§ã®ææ³ã§ããïŒ AGM çã®å¹
åªå
åã®ã°ã©ã
察vkãæ«å°Ÿãšãã³ïŒtail(s)ãšè¡šèšããïŒãã®ãšãïŒå
é ã®å¯Ÿv1ã¯S
ãã€ãã³ã°ã¢ã«ãŽãªãºã ã«ãããé£æ¥è¡åã®æ£èŠåœ¢è¡šçŸã®å©çš
ã®æ ¹ã«å¯Ÿå¿ãïŒæ«å°Ÿvk ã¯Sã®æãå³åŽã«ããèã«å¯Ÿå¿ããïŒäŸ
ãïŒããäžè¬çãªã¹ããŒããšã¿ãªãããšãã§ããïŒ
ãã°ïŒå³ 10 ã®ãã¿ãŒã³S1ã¯ç³»ås1= ïŒã0, Aã, ã1, Cã, ã1, BãïŒã§
泚æç¹ãšããŠïŒæå³æ¡åŒµã«ããããã¯ãã©ãã¯åã®ã¢ã«ãŽãªãºã
è¡šãããïŒãã®æ«å°Ÿã¯tail(s1)= ã1, Bãã§ããïŒç°¡åã®ããïŒä»¥åŸ
ã¯ïŒé åºæšãã¿ãŒã³ã«ãããŠã¯ããããŠå¹çãè¯ããïŒããäž
ã¯é åºæšãšæ·±ãã©ãã«åãåºå¥ããªãããšã«ããïŒ
è¬ã®ã°ã©ãæã«å¯Ÿãããã€ãã³ã°ã«ãããŠã¯ïŒåç¬ã§ã¯æ§èœã
7.4 æå³æ¡åŒµãçšããããã¯ãã©ãã¯æ³ åºãªãããšãããïŒäŸãã°ïŒäžè¬ã®ã°ã©ãã«å¯Ÿããæå³æ¡åŒµã®
é »åºé åºæšã®åæã§ã¯ïŒèŠªãšãªããµã€ãºkïŒ1 ã®é åºæšSã«æ° æ§æã¯é£ããïŒã°ã©ãã®ååæ§å€å®ã«ããæ€æ»ãšçµã¿åããã
ããé ç¹vãäžã€è¿œå ããŠïŒåãšãªããµã€ãºkã®é åºæš T ãçæ å¿
èŠããã[YH 02]ïŒãŸãæšããäžè¬çãªã°ã©ãæã«å¯ŸããŠã¯ïŒ
ããïŒãã®éã«ïŒæ°ããè¿œå ããé ç¹vãïŒTã®æ·±ãã©ãã«åäžã§ æå³æ¡åŒµãšããŠã³ãããžã§ã¯ãã®çµã¿åããã¯ïŒæ¢çŽ¢ã®æåãã
v = tail(T)ãæºããããã«ãããšããçæã«ãŒã«ãçšããïŒãã㯠ååã«å©ããªãå Žåãããã®ã§ïŒApriori ã®ãããªå¹
åªå
æ¢çŽ¢
é ç¹ã®è¿œå ã«éããŠïŒ(1)芪ã®æšã®æå³æ(rightmost branch)ã® [IWM 03]ãæ¡çšãããïŒã°ã©ãã®æã®æ§è³ªãèæ
®ããæåãæ
çŽäžã§ïŒ(2)å
åŒäžã§æ«åŒïŒæãå³ã®é ç¹ïŒãšãªãå Žæã«ã®ã¿è¿œ æ³ãïŒããã¯ãã©ãã¯åã«çµã¿èŸŒãçã®å·¥å€«ãå¿
èŠã«ãªãïŒ
å ãããšããå¶éãå ããããšã«ãªãïŒãããæå³æ¡åŒµ(rightmost
expansion)ãšãã [AAKASA 02, Nakano 02, Zaki 02]ïŒæ·±ãã©ã
8. ãŸãšã
ã«åã®èšèã§ããã°ïŒSã®æå³èããªãã¡æ«å°Ÿ tail(S)ã®æ·±ãã
dep(tail(S))ãšãããšïŒä»»æã®æ·±ã 1âŠdepâŠdep(tail(S))+1 ãšä»»æ æ¬çš¿ã§ã¯ïŒé »åºã¢ã€ãã éåçºèŠåé¡ãïŒã¢ã«ãŽãªãºã ãäž
ã®ã©ãã«labãããªã察ãdep, labããïŒåSã®æ«ç«¯ã«è¿œå ããããš å¿ã«ããŠè§£èª¬ããïŒé£œåéåãã€ãã³ã°ãäžè¬ã®ã°ã©ããã€ãã³
ã«å¯Ÿå¿ããïŒ ã°çïŒæ¬çš¿ã§è§Šããããªãã£ã話é¡ã«é¢ããŠã¯ïŒ [å®é 07,æµ
äº
以äžããŸãšãããšæ¬¡ã®ã¢ã«ãŽãªãºã æå³æ¡åŒµããã¯ãã©ãã¯ã 04, WM 03,鷲尟 07]çã®è§£èª¬ãåç
§ããããïŒ
åŸãããïŒãµã€ãº 0 ã®ç©ºæšÎµïŒããªãã¡ç©ºã®æ·±ãã©ãã«åïŒã«å¯Ÿ é »åºéåã¯ããŒã¿ãã€ãã³ã°ã®åºç€ã§ããïŒå®éã«ããŒã«ãš
ããŠïŒæå³æ¡åŒµããã¯ãã©ãã¯(ε)ãåŒã³åºãããšã§ïŒå
šãŠã®é »åº ããŠäœ¿ãæ©äŒãå€ãïŒæ¬çš¿ã§çŽ¹ä»ããã¢ã«ãŽãªãºã ã®å€ãã¯
éåãåæãããïŒ FIMI ã®ãµã€ãã«å®è£
ãšè§£èª¬è«æãããïŒãŸããã³ãããŒã¯åé¡
ãæã«å
¥ãïŒ ãŸãïŒèè
ã®ããŒã ããŒãž[å®é HP]ã§ã¯ïŒLCM
æå³æ¡åŒµããã¯ãã©ãã¯(S) ã®ä»ïŒããã€ãã®é »åºãã¿ãŒã³çºèŠã¢ã«ãŽãªãºã ã®å®è£
ãšåé¡
1. output S å€æçåçš®ã®ããŒã«ãå
¥æã§ããïŒäŸãã°ïŒ3.5 ç¯ã®ããã¯ãã©
2. for each 1âŠdepâŠdep(tail(S))+1 do ãã¯æ³ã«ããé »åºéåçºèŠã¢ã«ãŽãªãºã ã«ïŒ4 ç¯ã®è¿°ã¹ãåçš®ã®
3. for each label lab do é«éåææ³ãå ããé«éçå®è£
ã«ïŒé£œåéåçºèŠãè¿œå ãã
4. if frqïŒSã»ãdep, labãïŒâ§Îž then ãã®ã¯ LCM ver.4 ãšããŠïŒ6.6 ç¯ã®éã¿ä»ããã©ã³ã¶ã¯ã·ã§ã³å¯Ÿå¿
5. call æå³æ¡åŒµããã¯ãã©ãã¯ïŒSã»ãdep, labãïŒ çã®å®è£
㯠LCM ver.5 ãšããŠïŒãŸãïŒ7 ç¯ã®é »åºæšãã€ãã³ã°ã¯
Freqt ver.4 ãš Unot ãšããŠïŒåããŒã ããŒãžã§å©çšå¯èœã§ããïŒ
ãã ãïŒç©ºæšÎµã«ã¯ç¡æ¡ä»¶ã«äžã€ã®é ç¹ãè¿œå å¯èœãšããïŒ æ¬çš¿ãšå®è£
ãããŒã¿ãã€ãã³ã°ç 究æšé²ã®å©åãšãªãã°å¹ž
å³ïŒïŒã«æå³æ¡åŒµã«ããé åºæšã®åæã®äŸã瀺ãïŒããã¯ãã©ã ãã§ããïŒ
ã¯ã¢ã«ãŽãªãºã ãïŒãµã€ãº 0 ã®ç©ºæšããåºçºããŠïŒæå³ã®èïŒåœ±
ã§ç€ºãããŠããïŒãè¿œå ããªããïŒãµã€ãºã®å°ããªé ã«å
šãŠã®é »
åºé åºæšããããªãïŒãã€éè€ãªãåæããããšããããïŒ
-9-
10. The 22nd Annual Conference of the Japanese Society for Artificial Intelligence, 2008
[MS 00] S. Morishita, J. Sese, Traversing Itemset Lattice with
åèæç® Statistical Metric Pruning, Proc. PODS 2000, pp. 226-236
[AIS 93] R. Agrawal, T. Imielinski, A. N. Swami: Mining (2000)
Association Rules between Sets of Items in Large Databases. [è¥¿ç° 07] 西ç°è±æ(ç·š), ã¬ã¯ãã£ãŒã·ãªãŒãºãç¥èœã³ã³ãã¥ãŒã
Proc. SIGMOD Conference 1993, pp. 207-216 (1993) ã£ã³ã°ãšãã®åšèŸºã, 人工ç¥èœåŠäŒèª, 2007 幎 3 æå·ïœ
[AMST 96] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A. (2007)
I. Verkamo: Fast Discovery of Association Rules. Advances [SAA 00] S. Shimozono, H. Arimura, S. Arikawa, Efficient
in Knowledge Discovery and Data Mining, pp. 307-328 Discovery of Optimal Word-Association Patterns in Large
(1996) Text Databases, New Generation Computing, 18(1), pp. 49-
[AS 94] R. Agrawal, R. Srikant: Fast Algorithms for Mining 60 (2000)
Association Rules in Large Databases. Proc. VLDB 1994, pp. [TK 06] K. Tsuda, T. Kudo: Clustering graphs by weighted
487-499 (1994) substructure mining, Proc. ICML 2006, pp. 953-960 (2006)
[AAKASA 02] T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. [å®é HP] å®é æ¯
æã®ããŒã ããŒãž,
Sakamoto, S. Arikawa, Efficient Substructure Discovery http://research.nii.ac.jp/~uno/index-j.html
from Large Semi-structured Data, Proc. 2nd SIAM Data [UKA 04] T. Uno, M. Kiyomi, H. Arimura, LCM ver. 2:
Mining (SDM'02), pp. 158-174 (2002) Efficient Mining Algorithms for Frequent/Closed/Maximal
[æµ
äº 04] æµ
äºéå, ææåçŽ: åæ§é ããŒã¿ãã€ãã³ã°ã«ãã Itemsets, Proc. FIMI'04 (2004)
ããã¿ãŒã³çºèŠææ³, ããŒã¿å·¥åŠè«æç¹éïŒé»åæ
å ±éä¿¡ [UAUA 04] T. Uno, T. Asai, Y. Uchida, H. Arimura: An
åŠäŒè«æèª, Vol.J87-D-I, No.2, pp. 79-96 (2004) Efficient Algorithm for Enumerating Closed Patterns in
[BJ 98] R. J. Bayardo Jr.: Efficiently Mining Long Patterns from Transaction Databases, Proc. Discovery Science 2004, LNAI
Databases, Proc. SIGMOD Conference 1998: pp. 85-93 3245, pp. 16-31 (2004)
(1998) [å®é 07] å®éæ¯
æïŒææåçŽ: ããŒã¿ã€ã³ãã³ã·ãã³ã³ãã¥ãŒ
[EM 02] T. Eiter, K. Makino: On Computing all Abductive ãã£ã³ã° ãã® 2 é »åºã¢ã€ãã éåçºèŠã¢ã«ãŽãªãºã , ã¬ã¯ã
Explanations, Proc. AAAI-2002, pp.62 (2002) ã£ãŒã·ãªãŒãºãç¥èœã³ã³ãã¥ãŒãã£ã³ã°ãšãã®åšèŸºã, (ç·š)西ç°
[FIMI HP] B. Goethals, the FIMI repository, è±æ, 第 2 åïŒäººå·¥ç¥èœåŠäŒèª, 2007 幎 5 æå· (2007)
http://fimi.cs.helsinki.fi/ (2003) [WM 03] T. Washio, H. Motoda: State of the art of graph-based
[HPY 00] J. Han, J. Pei, Y. Yin: Mining Frequent Patterns data mining. SIGKDD Explorations, 5(1): pp. 59-68 (2003)
without Candidate Generation. Proc. SIGMOD Conference [鷲尟 07] 鷲尟é: ããŒã¿ã€ã³ãã³ã·ãã³ã³ãã¥ãŒãã£ã³ã° ãã®ïŒ
2000, pp. 1-12 (2000) é¢æ£æ§é ãã€ãã³ã°, ã¬ã¯ãã£ãŒã·ãªãŒãºãç¥èœã³ã³ãã¥ãŒã
[IWM 03] A. Inokuchi, T. Washio, H. Motoda, Complete Mining ã£ã³ã°ãšãã®åšèŸºã, (ç·š)西ç°è±æ, 第 1 åïŒäººå·¥ç¥èœåŠäŒ
of Frequent Patterns from Graphs: Mining Graph Data, èª, 2007 幎 3 æå· (2007)
Machine Learning, 50(3), pp. 321-354 (2003) [YH 02] X. Yan, J. Han, gSpan: Graph-Based Substructure
[KMM 04] T. Kudo, E. Maeda, Y. Matsumoto: An Application Pattern Mining, Proc. IEEE ICDM 2002, pp. 721-724 (2002)
of Boosting to Graph Classification, Proc. NIPS 2004 (2004) [Zaki 02] M. J. Zaki, Efficiently mining frequent trees in a forest,
[MA 07] S. Minato, H. Arimura: Frequent Pattern Mining and Proc. ACM KDD'00, pp. 71-80 (2002)
Knowledge Indexing Based on Zero-suppressed BDDs, Post- [ZPOL 97] M. J. Zaki, S. Parthasarathy, M. Ogihara, W. Li: New
Proc. of the 5th International Workshop on Knowledge Algorithms for Fast Discovery of Association Rules, Proc.
Discovery in Inductive Databases, LNAI, (2007) KDD 1997, pp. 283-286 (1997)
[Mo 98] S. Morishita, On Classification and Regression, Proc.
Discovery Science (DS'98), pp. 40-57 (1998)
[MN 99] S. Morishita, A. Nakaya: Parallel Branch-and-Bound
Graph Search for Correlated Association Rules, Large-Scale
Parallel Data Mining: pp. 127-144 (1999); S. Morishita: On
Classification and Regression, Proc. Discovery Science 1998,
LNAI 1532, pp. 40-57 (1998)
[MS 00] S. Morishita, J. Sese: Traversing Itemset Lattice with
Statistical Metric Pruning, Proc. PODS 2000, pp. 226-236
(2000)
[MTV 94] H. Mannila, H. Toivonen, A. I. Verkamo: Efficient
Algorithms for Discovering Association Rules, Proc. KDD
Workshop 1994: pp. 181-192 (1994)
[Nakano 02] S. Nakano, Efficient generation of plane trees, Inf.
Process. Lett., 84(3), pp. 167-172 (2002)
[PHPCDH 01] J. Pei, J. Han, B. Pinto, Q. Chen, U. Dayal, M.
Hsu: PrefixSpan: Mining Sequential Patterns by Prefix-
Projected Growth, Proc. IEEE ICDE 2001, pp. 215-224
(2001)
- 10 -