5. (Figure: keyword map) Stream computing (IBM), CEP, Continual query, Relational stream, XML stream, Algorithm trading, DSMS, Norikra, Online learning, FPGA, GPU, Esper, STREAM, Malware detection, Complex event processing, Jubatus, Incremental computation, Borealis, System S, SPRING (DTW), BRIMOS, MIC, High-performance H/W, Handshake join, Window aggregate, SASE, Cayuga, Fraud detection, Window, STORM, S4, Incr. LOCI, Window join, CPD, Online LDA, Real-time DSMS, Tuple-stream, Tapestry, AQP, MV
7. Continual query, window
• Continual query
– DSMS: Queries are persistent, data are volatile
– DBMS: Data are persistent, queries are volatile
– CQ: a concept introduced by Tapestry
• “Continuous Queries over Append-Only Databases”, Terry et al., SIGMOD’92.
• Window
– Converts unbounded data into a finite-length input
• Type: ROWS or TIME
• Operators
– Aggregate -> window aggregate
– Join -> window join
• S. Babu and J. Widom. Continuous Queries over
Data Streams, SIGMOD Record, Sep. 2001
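The two window types and the window aggregate can be sketched as follows. This is a minimal illustration, not taken from any of the systems named in the slides: the function names `rows_window`, `time_window`, and `window_aggregate` are hypothetical, and the TIME window assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


def rows_window(stream, size):
    """ROWS window: after each arrival, yield the last `size` values."""
    buf = deque(maxlen=size)  # the deque drops the oldest value by itself
    for value in stream:
        buf.append(value)
        yield list(buf)


def time_window(stream, span):
    """TIME window: after an arrival at time t, yield values with
    timestamp in (t - span, t]. Assumes non-decreasing timestamps."""
    buf = deque()
    for ts, value in stream:
        buf.append((ts, value))
        # Expire tuples that fell out of the time range.
        while buf[0][0] <= ts - span:
            buf.popleft()
        yield [v for _, v in buf]


def window_aggregate(windows, agg):
    """Apply an ordinary aggregate to each finite window."""
    for window in windows:
        yield agg(window)


rows_sums = list(window_aggregate(rows_window([1, 2, 3, 4], size=3), sum))
time_sums = list(
    window_aggregate(time_window([(0, 10), (5, 20), (12, 30)], span=10), sum)
)
print(rows_sums)  # [1, 3, 6, 9]
print(time_sums)  # [10, 30, 50]
```

The aggregate itself is unchanged from the batch case; only its input is cut down to a finite window, which is the point of the "Aggregate -> window aggregate" bullet.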
9. Relational Stream, DSMS
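• Relational data processing recast as continual queries
– Narrow sense
• Selection, projection, …
• Join, aggregate, and set operations require a window
– Broad sense
• Various mining tasks
• Anything goes as long as it satisfies relational completeness
• DSMS (data stream management system)
– A system that manages continuous queries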
• Relational
– STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP
– Telegraph (UCB) -> CISCO
– Aurora -> Borealis -> StreamBase
• Non relational
– STORM
– S4 (Oyamada (小山田) at NEC knows this well)
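To make the "join requires a window" point concrete, here is a minimal sketch of a symmetric window join over two streams, assuming a ROWS window on each side and an equality join on a key. The function `window_join` and the event layout `(side, key, payload)` are assumptions made for illustration, not the API of any DSMS listed above.

```python
from collections import deque


def window_join(events, size):
    """Symmetric equi-join of streams 'L' and 'R' with a ROWS window per side.

    `events` is an interleaved iterable of (side, key, payload).
    Each arriving tuple first probes the opposite window, then enters
    its own window, so only tuples still inside a window can match.
    """
    windows = {"L": deque(maxlen=size), "R": deque(maxlen=size)}
    for side, key, payload in events:
        other = "R" if side == "L" else "L"
        for other_key, other_payload in windows[other]:
            if other_key == key:
                if side == "L":
                    yield (key, payload, other_payload)
                else:
                    yield (key, other_payload, payload)
        windows[side].append((key, payload))


events = [
    ("L", 1, "a"),
    ("R", 1, "x"),  # matches L:a
    ("L", 2, "b"),
    ("L", 3, "c"),  # pushes L:a out of the size-2 window
    ("R", 1, "y"),  # L:a has expired, so no match
    ("R", 3, "z"),  # matches L:c
]
results = list(window_join(events, size=2))
print(results)  # [(1, 'a', 'x'), (3, 'c', 'z')]
```

Without the window, each side would have to be kept forever, because a matching tuple could still arrive at any later time.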
11. Incremental computation
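• Differential computation
– A computation scheme that processes only the part that changed since the previous result
• A large number of techniques have been proposed
– Aggregate (MIN, MAX, AVG, SUM)
– Similarity search (dynamic time warping (SPRING), Hausdorff distance)
– Handshake join
– Incremental LOCI (local correlation integral)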
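As a small example of the idea for aggregates, the sketch below maintains SUM, AVG, and MAX over a sliding window of the last `size` values without rescanning the window. The class name `IncrementalWindowStats` is hypothetical, and the monotonic-deque technique used for MAX is a standard one chosen for this sketch, not something taken from the papers in this deck.

```python
from collections import deque


class IncrementalWindowStats:
    """SUM, AVG, and MAX over the last `size` values, updated per arrival."""

    def __init__(self, size):
        self.size = size
        self.values = deque()          # current window contents
        self.total = 0                 # running SUM
        self.max_candidates = deque()  # non-increasing; front is the MAX

    def push(self, value):
        # Apply only the delta caused by the new value.
        self.values.append(value)
        self.total += value
        # Smaller values that arrived earlier can never be the MAX again.
        while self.max_candidates and self.max_candidates[-1] < value:
            self.max_candidates.pop()
        self.max_candidates.append(value)

        # Apply only the delta caused by the expired value.
        if len(self.values) > self.size:
            expired = self.values.popleft()
            self.total -= expired
            if expired == self.max_candidates[0]:
                self.max_candidates.popleft()

        return self.total, self.total / len(self.values), self.max_candidates[0]


stats = IncrementalWindowStats(size=3)
results = [stats.push(v) for v in [4, 2, 5, 1]]
print(results[-1][0], results[-1][2])  # 8 5  (window is [2, 5, 1])
```

SUM and AVG need one addition and one subtraction per arrival. MAX is harder because an expired value cannot simply be subtracted, which is why a list of candidates is kept.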
12. Incremental computation (references)
• “Incremental outlier detection in data streams using local correlation integral”, Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang, SAC 2009
• “Incoop: MapReduce for Incremental Computations”, Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini (MPI-SWS), SOCC 2011
• “Fast Incremental and Personalized PageRank”, Bahman Bahmani (Stanford University), Abdur Chowdhury (Twitter Inc.), Ashish Goel (Stanford University, Twitter Inc.), VLDB 2011
• “Incremental Graph Pattern Matching”, Wenfei Fan (University of Edinburgh), Jianzhong Li (Harbin Institute of Technology), Jizhou Luo (Harbin Institute of Technology), Zijing Tan, Xin Wang (University of Edinburgh), Yinghui Wu (University of Edinburgh), SIGMOD 2011
• “iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs”, Yun Chi (NEC Laboratories America), Hyun Moon (NEC Labs America), Hakan Hacigumus (NEC Labs America), VLDB 2011
• “An Incremental Hausdorff Distance Calculation Algorithm”, Sarana Nutanong (University of Maryland), Edwin Jacox (University of Maryland), Hanan Samet (University of Maryland), VLDB 2011
• “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, Daniel Peng and Frank Dabek (Google, Inc.), OSDI 2010
14. Online-LDA
• Limin Yao et al., Efficient Methods for Topic Model Inference on Streaming Document Collections, KDD’09
– Speeds up Gibbs sampling by roughly 20x over the standard method
– 2x faster than the existing fast sampler (below)
• I. Porteous et al., Fast collapsed Gibbs sampling for latent Dirichlet allocation. In SIGKDD, 2008.
• Matthew D. Hoffman et al., Online Learning for Latent Dirichlet Allocation, NIPS’10
– Online variational Bayes
– No comparison with Gibbs sampling
– Claims better performance than variational Bayes
– The paper describes it as adding only a few lines to Blei’03
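The "few added lines" of online variational Bayes amount to a stochastic update of the topic parameters after each mini-batch. The sketch below is written from memory of the notation in Hoffman et al. and is meant only as orientation. The symbols are not in the slide: \lambda is the topic-word variational parameter, \eta the topic prior, D the corpus size, n_{tw} the count of word w in document t, \phi_{twk} the per-word topic responsibility, and \tau_0, \kappa the learning-rate parameters.

```latex
\begin{align*}
\rho_t &= (\tau_0 + t)^{-\kappa}, \qquad \kappa \in (0.5, 1] \\
\tilde{\lambda}_{kw} &= \eta + D \, n_{tw} \, \phi_{twk} \\
\lambda &\leftarrow (1 - \rho_t)\,\lambda + \rho_t\,\tilde{\lambda}
\end{align*}
```

The estimate \tilde{\lambda} is what \lambda would be if the whole corpus looked like the current document. It is blended into \lambda with a decaying weight \rho_t, so each document is processed once as it arrives.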
18. XML Stream
• Originated with XFilter
– Mehmet Altinel and Michael J. Franklin. “Efficient Filtering of XML Documents for Selective Dissemination of Information”. VLDB '00.
• Academically important?
– High-Performance Complex Event Processing over XML Streams
• Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA)
• SIGMOD’12, Best paper award
22. Adaptive Query Processing
• Dynamically changes the structure of the operator tree
– “Eddies: continuously adaptive query processing”,
Ron Avnur and Joseph M. Hellerstein, SIGMOD’00.
(Figure: left-deep trees over inputs A, B, C, D)
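The flavor of adaptive operator ordering can be sketched with filters only. The class below is a deliberately simplified illustration: it routes each tuple to the pending operator with the lowest observed pass rate, a greedy rule substituted here for the routing policy of the Eddies paper. The names `Eddy`, `pass_rate`, and `route` are hypothetical.

```python
class Eddy:
    """Route each tuple through filter operators in an order that adapts
    to the pass rates observed so far (most selective operator first)."""

    def __init__(self, operators):
        self.operators = operators  # name -> predicate
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def pass_rate(self, name):
        # Optimistic estimate of 1.0 until the operator has seen a tuple.
        if self.seen[name] == 0:
            return 1.0
        return self.passed[name] / self.seen[name]

    def route(self, tup):
        """Return (accepted, order), where order lists the operators applied."""
        pending = set(self.operators)
        order = []
        while pending:
            # Ties are broken by operator name to keep the sketch deterministic.
            name = min(sorted(pending), key=self.pass_rate)
            pending.remove(name)
            order.append(name)
            self.seen[name] += 1
            if not self.operators[name](tup):
                return False, order  # dropped early, later operators skipped
            self.passed[name] += 1
        return True, order


eddy = Eddy({
    "a": lambda x: x < 90,       # passes most tuples
    "b": lambda x: x % 10 == 0,  # passes few tuples
})
routed = [eddy.route(x) for x in range(100)]
accepted = [x for x, (ok, _) in zip(range(100), routed) if ok]
print(accepted)      # [0, 10, 20, 30, 40, 50, 60, 70, 80]
print(routed[0][1])  # ['a', 'b']  before any statistics exist
print(routed[2][1])  # ['b']       after 'b' is seen to be more selective
```

The accepted set is the same for any operator order. What changes is how much work is spent per tuple, which is why the order can be revised while the query is running.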
24. Data processing paradigms
• Batch processing
– Data are persistent
– RDB: Oracle, Vertica, Greenplum
– NoSQL: Hadoop, MongoDB
• Real-time processing
– Queries are persistent; window functions
– DSMS: Hitachi, Sybase, IBM
– NoSQL: Storm, S4
(Figure: one diagram per paradigm, each labeled with stream, database, analysis processing, memory (low latency), and disk (high latency))
DSMS: Data Stream Management System
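The contrast between the two paradigms can be sketched in a few lines. Both classes below are toy stand-ins written for this illustration. `Table` and `StreamEngine` and their methods are hypothetical names, not the interface of any product listed above.

```python
class Table:
    """Batch style: data are persistent, a query runs once over stored rows."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def query(self, predicate):
        return [row for row in self.rows if predicate(row)]


class StreamEngine:
    """Streaming style: queries are persistent, tuples are processed on
    arrival and then discarded."""

    def __init__(self):
        self.queries = {}

    def register(self, name, predicate, callback):
        self.queries[name] = (predicate, callback)

    def push(self, row):
        for predicate, callback in self.queries.values():
            if predicate(row):
                callback(row)


readings = [18, 31, 25, 40]

table = Table()
for r in readings:
    table.insert(r)
batch_result = table.query(lambda r: r > 30)  # asked after the data is stored

engine = StreamEngine()
alerts = []
engine.register("hot", lambda r: r > 30, alerts.append)  # asked before data arrives
for r in readings:
    engine.push(r)

late = []
engine.register("late", lambda r: r > 30, late.append)  # sees no past tuples

print(batch_result, alerts, late)  # [31, 40] [31, 40] []
```

The last query shows the cost of the streaming side: a query registered after the data has passed sees nothing, because the data was never stored.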
27. Summary
• Classification of keywords
• Key concepts
– Continual query
• Window
• Incremental computation
• Open problems
– Adaptive query processing
– Join of stream and relation (Oyamada (小山田) at NEC)