13. Aoe et al.
1989 Jun-ichi Aoe:
“An Efficient Digital Search Algorithm by Using a Double-Array Structure”
IEEE Transactions on Software Engineering, Volume 15, Issue 9,
September 1989, pp. 1066-1077
check[base[s] + c] = s
base[s] + c = t
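The two relations above can be sketched in code as follows. This is only a minimal illustration, not the construction from the paper: the naive first-fit builder, the names `build_double_array` and `contains`, the `END` marker, and the character-to-integer `code` table are all assumptions made for this example. It shows that a transition from state s on character code c lands on t = base[s] + c, and is accepted only when check[t] == s.

```python
from collections import deque

END = "\0"  # hypothetical end-of-word marker (assumption for this sketch)


def build_double_array(words):
    """Naive first-fit builder returning (base, check, code)."""
    alphabet = sorted({ch for w in words for ch in w})
    code = {END: 1}  # codes start at 1, so a child never lands on slot 0
    code.update({ch: i + 2 for i, ch in enumerate(alphabet)})

    # Build an ordinary nested-dict trie first.
    trie = {}
    for w in words:
        node = trie
        for ch in w + END:
            node = node.setdefault(ch, {})

    base, check = [0], [0]  # slot 0 is the root; -1 in check marks a free slot

    def grow(n):
        while len(check) < n:
            base.append(0)
            check.append(-1)

    queue = deque([(0, trie)])
    while queue:
        s, node = queue.popleft()
        if not node:
            continue  # leaf state: no outgoing transitions
        codes = [code[ch] for ch in node]
        b = 1
        while True:  # first b for which every child slot is free
            grow(b + max(codes) + 1)
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[s] = b
        for ch, child in node.items():
            t = b + code[ch]  # base[s] + c = t
            check[t] = s      # check[base[s] + c] = s
            queue.append((t, child))
    return base, check, code


def contains(base, check, code, word):
    """Follow t = base[s] + c, accepting each step only if check[t] == s."""
    s = 0
    for ch in word + END:
        c = code.get(ch)
        if c is None:
            return False
        t = base[s] + c
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return True


base, check, code = build_double_array(["ab", "abc", "b"])
print(contains(base, check, code, "abc"), contains(base, check, code, "a"))
```

The first-fit search for `base[s]` is quadratic in the worst case and is chosen here only for readability; production builders use free-slot lists or similar techniques to place children faster.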
18. Aoe et al.
1989 Jun-ichi Aoe:
“An Efficient Digital Search Algorithm by Using a Double-Array Structure”
IEEE Transactions on Software Engineering, Volume 15, Issue 9,
September 1989, pp. 1066-1077
2016 Masao Fuketa, Kazuhiro Morita, and Jun-ichi Aoe:
“Comparisons of Efficient Implementations for DAWG”
International Journal of Computer Theory and Engineering, Vol. 8, No. 1,
February 2016
This assumes you already have a DAWG (orz)
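24. Size and speed: the numbers you are wondering about
3.5 million Chinese words → Linked List DAWG: 137 sec.
SparseMatrix generation: 58 sec.
SparseMatrix + Character List: 52 MB (larger than the Double-Array)
3.5 million Chinese words (UTF-8): 44 MB
ZIP file of the 3.5 million Chinese words (UTF-8): 18 MB
SparseMatrix load, first time: 2.1 sec.
After caching: 0.19 sec.
Dictionary-matching speed on 80,000 Weibo sentences
(without writing output)
55,212 sentences/sec/cpu (median)
(As fast as expected!)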
25. References
1. Comparisons of Efficient Implementations for DAWG: Masao Fuketa, Kazuhiro Morita, and Jun-ichi Aoe, International Journal of Computer Theory and Engineering, Vol. 8, No. 1, February 2016
2. A Retrieval Method for Double Array Structures by Using Byte N-Gram: Masao Fuketa, Kazuhiro Morita, and Jun-ichi Aoe, International Journal of Computer Theory and Engineering, Vol. 6, No. 2, April 2014
3. Importance of Aho-Corasick String Matching Algorithm in Real World Applications: Saima Hasib, Mahak Motwani, Amit Saxena, International Journal of Computer Science and Information Technologies, Vol. 4 (3), 2013, 467-469
4. Compressing dictionaries with a DAWG: Steve Hanov’s Blog, http://stevehanov.ca/blog/index.php?id=115