Stream Data Processing from an Academic Perspective (Personal View)
Stream Data Processing from an Academic Perspective (Personal View): Presentation Transcript

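  • Stream Data Processing from an Academic Perspective. June 28, 2013. Hideyuki Kawashima, Lecturer, University of Tsukuba
  • Disclaimer
    • These are my personal views on stream processing from an academic perspective.
    • Of functionality, performance, reliability, safety, and credibility, I cover only functionality and performance.
    • The content may contain errors.
  • Outline
    • Keyword classification
    • Key concepts
      – Continual query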
  • Keywords: STORM, Norikra, Jubatus, CEP, DSMS, SPE, Relational-stream, XML-stream, S4, STREAM, System S, Algorithmic trading, Borealis (MIT/Brandeis), Stream computing, Complex event processing, Online learning, Incremental computation, Continual query, Spring (DTW), CPD (Change Point Detection), Window-aggregate, Window-join, FPGA, GPU, SASE, Fraud detection, Malware detection, AQP (Adaptive Query Proc.), Esper, BRIMOS, Handshake-join, Incr. LOCI, Online LDA, Window, Real-time, Tuple-stream, Materialized view, Tapestry
  • Keyword classification: Stream computing (IBM), CEP, Continual query, Relational stream, XML stream, Algorithmic trading, DSMS, Norikra, Online learning, FPGA, GPU, Esper, STREAM, Malware detection, Complex event processing, Jubatus, Incremental computation, Borealis, System S, Spring (DTW), BRIMOS, MIC, High-performance H/W, Handshake join, Window aggregate, SASE, Cayuga, Fraud detection, Window, STORM, S4, Incr. LOCI, Window join, CPD, Online LDA, Real-time, Tuple-stream, Tapestry, AQP, MV
  • Continual query, window
    • Continual query
      – DSMS: queries are persistent, data are volatile
      – DBMS: data are persistent, queries are volatile
      – CQ: a concept introduced by Tapestry
        • D. Terry et al., "Continuous Queries over Append-Only Databases", SIGMOD'92.
    • Window
      – Converts an unbounded stream into a finite sequence
        • Type: ROWS or TIME
      – Operators
        • Aggregate -> window aggregate
        • Join -> window join
    • S. Babu and J. Widom, "Continuous Queries over Data Streams", SIGMOD Record, Sep. 2001.
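The two window types and the idea of a persistent query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `RowsWindow` and `TimeWindow`, the function `continual_avg`, and the sample stream are hypothetical and do not come from Tapestry, STREAM, or any other system named in the slides. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RowsWindow:
    """ROWS window: keeps the most recent `size` tuples."""

    def __init__(self, size):
        self.items = deque(maxlen=size)

    def insert(self, timestamp, value):
        # deque(maxlen=...) evicts the oldest tuple automatically.
        self.items.append((timestamp, value))

    def values(self):
        return [v for _, v in self.items]


class TimeWindow:
    """TIME window: keeps tuples newer than (latest timestamp - span)."""

    def __init__(self, span):
        self.span = span
        self.items = deque()

    def insert(self, timestamp, value):
        self.items.append((timestamp, value))
        # Expire old tuples; assumes non-decreasing timestamps.
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()

    def values(self):
        return [v for _, v in self.items]


def continual_avg(stream, window):
    """A standing query: emits the window average on every arrival."""
    for timestamp, value in stream:
        window.insert(timestamp, value)
        vals = window.values()
        yield timestamp, sum(vals) / len(vals)


# Hypothetical (timestamp, value) stream.
stream = [(1, 10.0), (2, 20.0), (3, 30.0), (7, 40.0)]
rows_result = list(continual_avg(stream, RowsWindow(size=2)))
time_result = list(continual_avg(stream, TimeWindow(span=3)))
```

With this input the ROWS window always averages the last two tuples, while the TIME window drops everything older than three time units once the tuple at timestamp 7 arrives. The query itself stays registered and the data flow past it, which is the DSMS side of the persistent/volatile contrast above.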
  • Relational stream, DSMS
    • Relational data processing turned into continual queries
      – Narrow sense
        • Selection, projection, ...
        • Join, aggregate, and set operations require a window
      – Broad sense
        • Various mining operations
        • Anything goes as long as relational completeness is satisfied
    • DSMS (data stream management system)
      – A system that manages continual queries
        • Relational
          – STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP
          – Telegraph (UCB) -> Cisco
          – Aurora -> Borealis -> StreamBase
        • Non-relational
          – STORM
          – S4 (Oyamada-san at NEC knows it well)
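Why a join needs a window can be seen in a small sketch: without a bound, each side would have to remember every tuple it has ever seen. The code below is a minimal symmetric join over two ROWS windows, written for illustration only. The function name `window_join`, the stream labels "L" and "R", and the sample events are assumptions, not the operator of any DSMS listed above.

```python
from collections import deque


def window_join(events, size):
    """Equi-join of streams 'L' and 'R', each bounded by a ROWS window.

    `events` is an iterable of (side, key, payload) in arrival order.
    Yields (key, left_payload, right_payload) for each match.
    """
    windows = {"L": deque(maxlen=size), "R": deque(maxlen=size)}
    for side, key, payload in events:
        other = "R" if side == "L" else "L"
        # Probe the opposite window with the newly arrived tuple.
        for other_key, other_payload in windows[other]:
            if other_key == key:
                if side == "L":
                    yield key, payload, other_payload
                else:
                    yield key, other_payload, payload
        # Insert into the tuple's own window; the oldest entry is evicted.
        windows[side].append((key, payload))


# Hypothetical interleaved arrivals from the two streams.
events = [
    ("L", "a", "l1"),
    ("R", "a", "r1"),
    ("R", "b", "r2"),
    ("L", "b", "l2"),
    ("R", "c", "r3"),
    ("L", "a", "l3"),
]
joined = list(window_join(events, size=2))
```

Here `joined` contains the pairs (l1, r1) and (l2, r2). The last tuple l3 finds no partner because r1 has already left the right-hand window, which is exactly the effect of bounding the join state.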
  • Incremental computation
    • Delta computation
      – Processes only the part that changed since the previous result
    • Many techniques have been proposed
      – Aggregate (MIN, MAX, AVG, SUM)
      – Similarity search (dynamic time warping (SPRING), Hausdorff distance)
      – Handshake join
      – Incremental LOCI (local correlation integral)
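For the aggregate case, delta computation can be sketched as follows. SUM and AVG are updated by adding the new value and subtracting the expired one; MAX is maintained with a monotonic deque, a standard sliding-window technique that is swapped in here for illustration and is not taken from any of the cited papers. The class name `IncrementalWindowStats` and the sample values are assumptions.

```python
from collections import deque


class IncrementalWindowStats:
    """Sliding ROWS window that maintains SUM, AVG and MAX incrementally."""

    def __init__(self, size):
        self.size = size
        self.window = deque()          # (index, value) pairs in the window
        self.total = 0.0               # running SUM
        self.max_candidates = deque()  # decreasing values; front is the MAX
        self.count = 0                 # number of tuples seen so far

    def insert(self, value):
        index = self.count
        self.count += 1
        # Delta for SUM: add the new value, subtract the expired one.
        self.window.append((index, value))
        self.total += value
        if len(self.window) > self.size:
            _, expired_value = self.window.popleft()
            self.total -= expired_value
        # Delta for MAX: drop candidates that the new value dominates,
        # then drop the front candidate if it has left the window.
        while self.max_candidates and self.max_candidates[-1][1] <= value:
            self.max_candidates.pop()
        self.max_candidates.append((index, value))
        if self.max_candidates[0][0] <= index - self.size:
            self.max_candidates.popleft()
        average = self.total / len(self.window)
        return self.total, average, self.max_candidates[0][1]


stats = IncrementalWindowStats(size=3)
results = [stats.insert(v) for v in [5.0, 1.0, 4.0, 2.0, 3.0]]
```

Each call touches only the arriving and the expiring tuple instead of rescanning the whole window, so the per-tuple cost is amortized constant rather than proportional to the window size.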
  • Incremental computation: references
    • Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang, "Incremental outlier detection in data streams using local correlation integral", SAC 2009.
    • Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini (MPI-SWS), "Incoop: MapReduce for Incremental Computations", SoCC 2011.
    • Bahman Bahmani (Stanford University), Abdur Chowdhury (Twitter Inc.), and Ashish Goel (Stanford University, Twitter Inc.), "Fast Incremental and Personalized PageRank", VLDB 2011.
    • Wenfei Fan (University of Edinburgh), Jianzhong Li (Harbin Institute of Technology), Jizhou Luo (Harbin Institute of Technology), Zijing Tan, Xin Wang (University of Edinburgh), and Yinghui Wu (University of Edinburgh), "Incremental Graph Pattern Matching", SIGMOD 2011.
    • Yun Chi, Hyun Moon, and Hakan Hacigumus (NEC Laboratories America), "iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs", VLDB 2011.
    • Sarana Nutanong, Edwin Jacox, and Hanan Samet (University of Maryland), "An Incremental Hausdorff Distance Calculation Algorithm", VLDB 2011.
    • Daniel Peng and Frank Dabek (Google, Inc.), "Large-scale Incremental Processing Using Distributed Transactions and Notifications", OSDI 2010.
  • Online LDA
    • Limin Yao et al., "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD'09
      – Speeds up Gibbs sampling by roughly 20x over the standard method
      – 2x faster than the existing fast sampler (below)
        • I. Porteous et al., "Fast collapsed Gibbs sampling for latent Dirichlet allocation", SIGKDD 2008.
    • Matthew D. Hoffman et al., "Online Learning for Latent Dirichlet Allocation", NIPS'10
      – Online variational Bayes
      – No comparison against Gibbs sampling
      – Claims better performance than batch variational Bayes
      – The paper states that it only adds a few lines to Blei'03
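The "few extra lines" of online variational Bayes amount to a stochastic update of the topic parameters after each document (or mini-batch). As a sketch in the notation of Hoffman et al., to the best of my reading of that paper: D is the corpus size, n_{tw} the count of word w in the t-th document, \phi_{twk} the variational topic assignment from the E step, \eta the Dirichlet prior on topics, and \tau_0 and \kappa the step-size parameters.

```latex
\begin{align*}
\rho_t &= (\tau_0 + t)^{-\kappa}, \qquad \kappa \in (0.5, 1] \\
\tilde{\lambda}_{kw} &= \eta + D \, n_{tw} \, \phi_{twk} \\
\lambda &\leftarrow (1 - \rho_t)\,\lambda + \rho_t\,\tilde{\lambda}
\end{align*}
```

The intermediate estimate \tilde{\lambda} is what the topics would be if the whole corpus looked like the current document; blending it in with a decaying weight \rho_t is what makes the computation incremental.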
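  • BRIMOS, malware
    • Xrosscloud® bridge monitoring solution (BRIMOS®)
      – A bridge monitoring system that uses sensors installed on a bridge to monitor its condition continuously and in real time
    • Malware
      – NICTER
        • Darknet traffic for roughly 210,000 hosts
        • Available to MWS'13 participants (NON STOP)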
  • XML stream
    • Originated with XFilter
      – Mehmet Altinel and Michael J. Franklin, "Efficient Filtering of XML Documents for Selective Dissemination of Information", VLDB'00.
    • Academically important?
      – "High-Performance Complex Event Processing over XML Streams"
        • Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA)
        • SIGMOD'12, Best Paper Award
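  • High-performance H/W
    • FPGA
      – Application-specific circuits can be built
      – A protocol stack can also be implemented
      – E-trees (http://e-trees.jp/)
    • GPU
      – Roughly 1,500-way parallelism
    • MIC
      – Used in Tianhe-2 (No. 1 on the TOP500 supercomputer list)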
  • Adaptive query processing
    • Dynamically changes the structure of the operator tree
      – Ron Avnur and Joseph M. Hellerstein, "Eddies: Continuously Adaptive Query Processing", SIGMOD'00.
    • [Figure: four operator trees over inputs A, B, C, D; label: Left Deep Tree]
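The idea of reordering operators at run time can be sketched with a toy router. This is a deliberate simplification and not the Eddies algorithm itself: the paper uses lottery scheduling, whereas the sketch below simply sorts filters by their observed pass rate. The class name `Eddy`, the two predicates, and the input range are all assumptions made for illustration.

```python
class Eddy:
    """Routes each tuple through all filters, most selective first.

    Simplification (not the paper's lottery scheduling): filters are
    ordered by the pass rate observed so far, lowest first.
    """

    def __init__(self, filters):
        self.filters = filters                       # name -> predicate
        self.seen = {name: 0 for name in filters}    # tuples routed to filter
        self.passed = {name: 0 for name in filters}  # tuples it let through
        self.evaluations = 0                         # total predicate calls

    def pass_rate(self, name):
        # Unseen filters get rate 0.0 so that each one is tried early.
        if self.seen[name] == 0:
            return 0.0
        return self.passed[name] / self.seen[name]

    def route(self, item):
        order = sorted(self.filters, key=self.pass_rate)
        for name in order:
            self.seen[name] += 1
            self.evaluations += 1
            if not self.filters[name](item):
                return False  # dropped; remaining filters are skipped
            self.passed[name] += 1
        return True


filters = {
    "is_even": lambda x: x % 2 == 0,              # passes about half
    "is_multiple_of_10": lambda x: x % 10 == 0,   # passes about a tenth
}
eddy = Eddy(filters)
output = [x for x in range(1, 101) if eddy.route(x)]
```

The output is the same whatever order is chosen; only the work differs. A fixed plan that always runs `is_even` first would make 150 predicate calls on this input, while the adaptive router quickly learns to run the more selective filter first and needs fewer.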
  • Data processing paradigms
    • Batch processing
      – Data are persistent
      – RDB: Oracle, Vertica, Greenplum
      – NoSQL: Hadoop, MongoDB
    • Real-time processing
      – Queries are persistent; window functions
      – DSMS: Hitachi, Sybase, IBM
      – NoSQL: Storm, S4
    • [Figure, shown twice: stream, analysis, processing, Database; memory (low latency), disk (high latency)]
    • DSMS: Data Stream Management System
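  • Problems with stream processing: applications
    • Operators
      – Window-join
      – Window-aggregate
      – Window-something
    • Which events do people need to know about "right now"?
      – Algorithmic trading
      – Malware detection
      – Fraud in RTB, credit cards, and so on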
  • Summary
    • Keyword classification
    • Key concepts
      – Continual query
        • Window
        • Incremental computation
    • Open areas
      – Adaptive query processing
      – Join of stream and relation (Oyamada-san at NEC)