Stream Data Processing from an Academic Perspective (Personal View)

  1. Stream Data Processing from an Academic Perspective
     June 28, 2013
     Hideyuki Kawashima, Lecturer, University of Tsukuba
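  2. Disclaimer
     • This talk presents my personal view of stream processing from an academic perspective.
     • Of functionality, performance, reliability, safety, and trustworthiness, it covers only part (functionality and performance).
     • The content may contain errors.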
  3. Overview
     • Keyword classification
     • Key concepts
       – Continual query
  4. Keywords: STORM, Norikra, Jubatus, CEP, DSMS, SPE, Relational-stream, XML-stream, S4, STREAM, System S, Algorithmic trading, Borealis (MIT/Brandeis), Stream computing, Complex event processing, Online learning, Incremental computation, Continual query, Spring (DTW), CPD (Change Point Detection), Window-aggregate, Window-join, FPGA, GPU, SASE, Fraud detection, Malware detection, AQP (Adaptive Query Proc.), Esper, BRIMOS, Handshake-join, Incr. LOCI, Online LDA, Window, Real-time, Tuple-stream, Materialized view, Tapestry
  5. Keyword map: Stream computing (IBM), CEP, Continual query, Relational stream, XML stream, Algorithmic trading, DSMS, Norikra, Online learning, FPGA, GPU, Esper, STREAM, Malware detection, Complex event processing, Jubatus, Incremental computation, Borealis, System S, Spring (DTW), BRIMOS, MIC, High-performance H/W, Handshake join, Window aggregate, SASE, Cayuga, Fraud detection, Window, STORM, S4, Incr. LOCI, Window join, CPD, Online LDA, Real-time, DSMS, Tuple-stream, Tapestry, AQP, MV (materialized view)
  7. Continual query, window
     • Continual query
       – DSMS: queries are persistent, data are volatile
       – DBMS: data are persistent, queries are volatile
       – CQ: a concept introduced in Tapestry
         • "Continuous Queries over Append-Only Databases", Terry et al., SIGMOD '92.
     • Window
       – Converts unbounded data into bounded data
         • Type: ROWS or TIME
       – Operators
         • Aggregate -> window aggregate
         • Join -> window join
     • S. Babu and J. Widom. "Continuous Queries over Data Streams", SIGMOD Record, Sep. 2001.
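The two window types on slide 7 can be sketched as Python generators that emit the current window contents after every arriving tuple; the generator stays "registered" while data flows past, which is the continual-query view. This is a minimal sketch, not the syntax or semantics of any particular DSMS: the function names are hypothetical, the TIME window is assumed to cover the interval (ts - span, ts], and tuples are assumed to arrive in timestamp order. The aggregate is recomputed from the window contents each time.

```python
from collections import deque

def rows_window(stream, n):
    """ROWS window: after each arriving tuple, yield the last n tuples."""
    win = deque(maxlen=n)  # deque evicts the oldest tuple automatically
    for tup in stream:
        win.append(tup)
        yield list(win)

def time_window(stream, span):
    """TIME window: after a tuple arrives at time ts, yield the tuples whose
    timestamps lie in (ts - span, ts]. Assumes (timestamp, value) pairs in timestamp order."""
    win = deque()
    for ts, value in stream:
        win.append((ts, value))
        while win[0][0] <= ts - span:  # expire tuples that fell out of the window
            win.popleft()
        yield list(win)

# A continual SUM: one result per arriving tuple, recomputed over the window contents.
print([sum(w) for w in rows_window([3, 1, 4, 1, 5], n=3)])  # [3, 4, 8, 6, 10]
print([sum(v for _, v in w)
       for w in time_window([(0, 3), (1, 1), (2, 4), (5, 1), (6, 5)], span=3)])  # [3, 4, 8, 1, 6]
```

With the TIME window, the tuple arriving at time 5 expires everything stamped 2 or earlier, which is why the fourth sum drops to 1.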
  9. Relational stream, DSMS
     • Casting relational data processing as CQs
       – Narrow sense
         • Selection, projection, …
         • Join, aggregate, and set operations require a window
       – Broad sense
         • Various mining tasks
         • Anything goes as long as relational completeness is satisfied
     • DSMS (data stream management system)
       – A system that manages continual queries
       – Relational
         • STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP
         • Telegraph (UCB) -> Cisco
         • Aurora -> Borealis -> StreamBase
       – Non-relational
         • STORM
         • S4 (Mr. Oyamada at NEC knows it well)
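Slide 9's point that join needs a window follows from state: an unwindowed join over unbounded streams would have to keep every past tuple. One common way to realize a window join is a symmetric hash join, in which each arrival first probes the other stream's window and is then added to its own. The sketch below assumes a single TIME window of `span` for both inputs, tuples arriving in global timestamp order, and a hypothetical `WindowJoin` class; it is not how STREAM, Telegraph, or Borealis implement joins.

```python
from collections import defaultdict, deque

class WindowJoin:
    """Symmetric hash equi-join of streams L and R over a TIME window of `span`."""

    def __init__(self, span):
        self.span = span
        self.win = {"L": deque(), "R": deque()}  # arrival order, used for expiry
        self.index = {"L": defaultdict(list), "R": defaultdict(list)}  # key -> live tuples

    def _expire(self, now):
        # Drop tuples whose timestamp is no longer in (now - span, now].
        for side in ("L", "R"):
            w = self.win[side]
            while w and w[0][0] <= now - self.span:
                tup = w.popleft()
                self.index[side][tup[1]].remove(tup)

    def push(self, side, ts, key, payload):
        """Process one arrival; return the (left, right) payload pairs it produces."""
        self._expire(ts)
        other = "R" if side == "L" else "L"
        # 1) probe the opposite window on the join key
        matches = [(payload, p) if side == "L" else (p, payload)
                   for _, _, p in self.index[other][key]]
        # 2) insert into this side's window
        tup = (ts, key, payload)
        self.win[side].append(tup)
        self.index[side][key].append(tup)
        return matches

j = WindowJoin(span=5)
events = [("L", 0, "a", "l0"), ("R", 1, "a", "r1"), ("R", 2, "b", "r2"),
          ("L", 3, "b", "l3"), ("L", 7, "a", "l7")]
print([m for e in events for m in j.push(*e)])  # [('l0', 'r1'), ('l3', 'r2')]
```

The last arrival, `l7`, finds no partner because `r1` expired from the window at time 7, which is exactly the boundedness the window buys.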
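  11. Incremental computation
     • Differential computation
       – A computation style that processes only the parts that changed since the previous step
     • A great many methods have been proposed
       – Aggregate (MIN, MAX, AVG, SUM)
       – Similarity search (dynamic time warping (SPRING), Hausdorff distance)
       – Handshake join
       – Incremental LOCI (LOCI: local correlation integral, an outlier-detection method)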
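For the aggregates listed on slide 11, the delta idea can be sketched over a ROWS window: SUM and AVG update by adding the arriving value and subtracting the expiring one, while MIN and MAX use a monotonic deque (my choice of technique here, not one named on the slide) so that each tuple is pushed and popped at most once. The class name and the dictionary output are hypothetical.

```python
from collections import deque

class IncrementalWindowAgg:
    """ROWS-window SUM/AVG/MIN/MAX, updated per tuple from the change only."""

    def __init__(self, size):
        self.size = size
        self.items = deque()  # (seq, value) currently in the window
        self.total = 0
        self.mins = deque()   # (seq, value), values increasing front to back
        self.maxs = deque()   # (seq, value), values decreasing front to back
        self.seq = 0

    def push(self, value):
        seq, self.seq = self.seq, self.seq + 1
        self.items.append((seq, value))
        self.total += value                         # SUM: add the arriving tuple
        while self.mins and self.mins[-1][1] >= value:
            self.mins.pop()                         # older, larger values can never be the MIN again
        self.mins.append((seq, value))
        while self.maxs and self.maxs[-1][1] <= value:
            self.maxs.pop()                         # older, smaller values can never be the MAX again
        self.maxs.append((seq, value))
        if len(self.items) > self.size:            # expire the oldest tuple
            old_seq, old_val = self.items.popleft()
            self.total -= old_val                   # SUM: subtract the expired tuple
            if self.mins[0][0] == old_seq:
                self.mins.popleft()
            if self.maxs[0][0] == old_seq:
                self.maxs.popleft()
        n = len(self.items)
        return {"sum": self.total, "avg": self.total / n,
                "min": self.mins[0][1], "max": self.maxs[0][1]}

agg = IncrementalWindowAgg(size=3)
for v in [5, 1, 4, 2, 8]:
    print(agg.push(v))
```

Each `push` does constant amortized work regardless of window size, whereas recomputing MIN or MAX from scratch costs time proportional to the window.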
  12. References
     • Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang. "Incremental Outlier Detection in Data Streams Using Local Correlation Integral". SAC 2009.
     • Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini. "Incoop: MapReduce for Incremental Computations". SoCC 2011.
     • Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. "Fast Incremental and Personalized PageRank". VLDB 2011.
     • Wenfei Fan, Jianzhong Li, Jizhou Luo, Zijing Tan, Xin Wang, and Yinghui Wu. "Incremental Graph Pattern Matching". SIGMOD 2011.
     • Yun Chi, Hyun Moon, and Hakan Hacigumus. "iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs". VLDB 2011.
     • Sarana Nutanong, Edwin Jacox, and Hanan Samet. "An Incremental Hausdorff Distance Calculation Algorithm". VLDB 2011.
     • Daniel Peng and Frank Dabek. "Large-scale Incremental Processing Using Distributed Transactions and Notifications". OSDI 2010.
  14. Online LDA
     • Limin Yao et al., "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD '09
       – Speeds up Gibbs sampling by roughly 20x over the standard sampler
       – About 2x faster than the existing fast sampler below
         • I. Porteous et al., "Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation", SIGKDD 2008.
     • Matthew D. Hoffman et al., "Online Learning for Latent Dirichlet Allocation", NIPS '10
       – Online variational Bayes
       – No comparison with Gibbs sampling
       – Claims better performance than (batch) variational Bayes
       – The paper itself says it adds only a few lines to Blei '03
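The core of the online variational Bayes update in Hoffman et al. (NIPS '10) can be summarized as below, as I understand the paper: after fitting the per-document variational parameters for the t-th mini-batch, the algorithm forms an estimate λ̃_t of the topic-word parameters as if that mini-batch were the whole corpus, then blends it into the running λ with a decaying step size ρ_t controlled by τ_0 and κ. This is a summary of the published update rather than a derivation; the computation of λ̃_t is left to the paper.

```latex
\begin{aligned}
\rho_t &= (\tau_0 + t)^{-\kappa}, \qquad \tau_0 \ge 0,\ \kappa \in (0.5, 1] \\
\lambda &\leftarrow (1 - \rho_t)\,\lambda + \rho_t\,\tilde{\lambda}_t
\end{aligned}
```

The range of κ makes the step sizes satisfy the usual stochastic-approximation conditions: the sum of ρ_t diverges while the sum of ρ_t² converges.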
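  16. BRIMOS, Malware
     • Xrosscloud® bridge monitoring solution (BRIMOS®)
       – A bridge monitoring system that uses various sensors installed on a bridge to monitor its condition continuously and in real time
     • Malware
       – NICTER
         • Darknet traffic for roughly 210,000 hosts
         • Available to participants in MWS '13 (NON STOP)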
  18. XML stream
     • Originated with XFilter
       – Mehmet Altinel and Michael J. Franklin. "Efficient Filtering of XML Documents for Selective Dissemination of Information". VLDB '00.
     • Academically important?
       – "High-Performance Complex Event Processing over XML Streams"
         • Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA)
         • SIGMOD '12, Best Paper Award
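XFilter matches many subscribed XPath profiles against incoming documents as parse events arrive. XFilter itself indexes queries by element name and runs each as a finite-state machine; the sketch below is a much simpler stand-in, assuming linear paths with only child (`/`), descendant (`//`), and `*` steps, that keeps a stack of open elements and checks every query at each start tag using Python's standard `xml.sax` parser. `PathFilter`, `parse_path`, and `matches` are hypothetical names.

```python
import xml.sax

def parse_path(xpath):
    """Split a linear path such as '/a//b/c' into (axis, name) steps."""
    steps, i = [], 0
    while i < len(xpath):
        if xpath.startswith("//", i):
            axis, i = "//", i + 2  # descendant step
        else:
            axis, i = "/", i + 1   # child step
        j = xpath.find("/", i)
        j = len(xpath) if j == -1 else j
        steps.append((axis, xpath[i:j]))
        i = j
    return steps

def matches(steps, path):
    """True if the element at the end of `path` (root-to-element tags) satisfies `steps`."""
    def rec(si, pi):
        if si == len(steps):
            return pi == len(path)  # the last step must land on the current element
        axis, name = steps[si]
        # a child step must match the next element; a descendant step may skip elements
        candidates = [pi] if axis == "/" else range(pi, len(path))
        return any(k < len(path) and name in ("*", path[k]) and rec(si + 1, k + 1)
                   for k in candidates)
    return rec(0, 0)

class PathFilter(xml.sax.ContentHandler):
    """Reports which subscribed queries match each element as the document streams past."""

    def __init__(self, queries):
        super().__init__()
        self.queries = {q: parse_path(q) for q in queries}
        self.stack = []  # tags from the root to the current element
        self.hits = []   # (query, matched element path) in document order

    def startElement(self, name, attrs):
        self.stack.append(name)
        for q, steps in self.queries.items():
            if matches(steps, self.stack):
                self.hits.append((q, "/" + "/".join(self.stack)))

    def endElement(self, name):
        self.stack.pop()

doc = "<news><item><title>A</title></item><ad><title>B</title></ad></news>"
f = PathFilter(["/news/item/title", "//title"])
xml.sax.parseString(doc.encode(), f)
print(f.hits)
```

Checking every query at every tag costs time proportional to the number of subscriptions, which is precisely the cost XFilter's query index is designed to avoid.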
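  20. High-performance H/W
     • FPGA
       – Application circuits can be built
       – Protocol stacks can also be implemented
       – E-trees (http://e-trees.jp/)
     • GPU
       – Parallelism on the order of 1,500
     • MIC
       – Used in Tianhe-2 (No. 1 on the TOP500 supercomputer list)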
  22. Adaptive Query Processing
     • Dynamically change the structure of the operator tree
       – "Eddies: Continuously Adaptive Query Processing", Ron Avnur and Joseph M. Hellerstein, SIGMOD '00.
     [Figure: several operator trees over inputs A, B, C, and D, one labeled "Left Deep Tree"]
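Eddies route each tuple through operators based on behavior observed at run time (the paper uses a lottery-based routing policy). The sketch below captures only the simplest form of that adaptivity, assuming a conjunction of independent filter predicates: it keeps per-predicate pass-rate estimates and evaluates the most selective predicate first, re-ranking before every tuple. This is a swapped-in simplification, not the Eddies routing policy, and `AdaptiveFilter` is a hypothetical name.

```python
class AdaptiveFilter:
    """Conjunction of predicates whose evaluation order adapts to observed pass rates."""

    def __init__(self, predicates):
        self.preds = list(predicates)  # [(name, fn), ...]
        self.fns = dict(self.preds)
        self.seen = {name: 0 for name, _ in self.preds}
        self.passed = {name: 0 for name, _ in self.preds}

    def pass_rate(self, name):
        # Laplace-smoothed estimate of P(predicate is true); starts at 0.5
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def order(self):
        # Lowest pass rate first, so most failing tuples are dropped after one test.
        return sorted(self.fns, key=self.pass_rate)

    def process(self, tup):
        for name in self.order():
            self.seen[name] += 1
            if not self.fns[name](tup):
                return False
            self.passed[name] += 1
        return True

af = AdaptiveFilter([("is_even", lambda x: x % 2 == 0), ("lt_10", lambda x: x < 10)])
print([x for x in range(100) if af.process(x)])
print(af.order())
```

The output set never depends on the order (it is a conjunction); only the number of predicate evaluations does, which is the quantity adaptive processing tries to reduce when selectivities are unknown or drifting.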
  24. Data processing paradigms
     • Batch processing
       – Data are persistent
       – RDB: Oracle, Vertica, Greenplum
       – NoSQL: Hadoop, MongoDB
     • Real-time processing
       – Queries are persistent; window functions
       – DSMS: Hitachi, Sybase, IBM
       – NoSQL: Storm, S4
     [Figure: both paradigms drawn as a database (memory: low latency; disk: high latency) alongside stream analysis processing]
     DSMS: Data Stream Management System
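The persistence inversion on slide 24 can be shown in a few lines: in the batch style the table is stored and a query is evaluated once, while in the DSMS style the queries are stored and each tuple is evaluated against them on arrival and then dropped. `batch_query` and `ContinuousQueries` are hypothetical names used only for illustration; a real DSMS adds windows, scheduling, and much more.

```python
def batch_query(table, pred):
    """Batch style: the data are stored; the query runs once and is then discarded."""
    return [row for row in table if pred(row)]

class ContinuousQueries:
    """Real-time style: queries are registered and persist; each arriving tuple is
    evaluated against them once and is not stored afterwards."""

    def __init__(self):
        self.queries = []  # [(predicate, callback), ...]

    def register(self, pred, callback):
        self.queries.append((pred, callback))

    def push(self, row):
        for pred, callback in self.queries:
            if pred(row):
                callback(row)  # results are emitted as data arrive

table = [{"sym": "A", "px": 10}, {"sym": "B", "px": 20}, {"sym": "A", "px": 12}]
print(batch_query(table, lambda r: r["sym"] == "A"))

alerts = []
cq = ContinuousQueries()
cq.register(lambda r: r["px"] > 11, alerts.append)
for row in table:  # stands in for an unbounded feed
    cq.push(row)
print(alerts)
```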
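  26. Issues in stream processing: applications
     • Operations
       – Window-join
       – Window-aggregate
       – Window-something
     • What events do we want to know about instantly, right now?
       – Algorithmic trading
       – Malware detection
       – Fraud in RTB, credit cards, etc.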
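As a toy example of a "window-something" rule for the fraud case on slide 26, the sketch below flags a card whenever it has more than `limit` transactions within the last `span` time units, keeping one TIME window per card. The thresholds, the `(ts, card)` input format, in-order timestamps, and the function name are hypothetical assumptions; real fraud detection uses far richer features and models.

```python
from collections import defaultdict, deque

def flag_bursts(transactions, span, limit):
    """Yield (card, ts) whenever a card has more than `limit` transactions with
    timestamps in (ts - span, ts]. Assumes (ts, card) pairs arrive in timestamp order."""
    recent = defaultdict(deque)  # card -> timestamps in that card's current window
    for ts, card in transactions:
        win = recent[card]
        win.append(ts)
        while win[0] <= ts - span:  # expire this card's old transactions
            win.popleft()
        if len(win) > limit:
            yield card, ts

txs = [(0, "c1"), (1, "c2"), (2, "c1"), (3, "c1"), (20, "c1"), (21, "c2")]
print(list(flag_bursts(txs, span=10, limit=2)))  # [('c1', 3)]
```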
  27. Summary
     • Keyword classification
     • Key concepts
       – Continual query
         • Window
         • Incremental computation
     • Open areas
       – Adaptive query processing
       – Join of stream and relation (Mr. Oyamada, NEC)
