Stream Data Processing
from an Academic Perspective
June 28, 2013
Lecturer, University of Tsukuba
Hideyuki Kawashima
Disclaimer
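• This talk presents my personal view of stream processing from an academic perspective.
• Of functionality, performance, reliability, safety, and credibility, it covers only part (functionality
and performance).
• The content may contain errors.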
Overview
• Keyword classification
• Key concepts
– Continual query
[Figure: keyword map. STORM, Norikra, Jubatus, CEP, DSMS, SPE, Relational-stream, XML-stream, S4, STREAM, System S, Algorithmic trading, Borealis (MIT/Brandeis), Stream computing, Complex event processing, Online learning, Incremental computation, Continual query, SPRING (DTW), CPD (Change Point Detection), Window-aggregate, Window-join, FPGA, GPU, SASE, Fraud detection, Malware detection, AQP (Adaptive Query Proc.), Esper, BRIMOS, Handshake-join, Incr. LOCI, Online LDA, Window, Real-time, Tuple-stream, Materialized view, Tapestry]
[Figure: the same keywords arranged into groups, now also including Stream computing (IBM), MIC, high-performance H/W, Cayuga, and MV (materialized view)]
Continual query, window
• Continual query
– DSMS: Queries are persistent, data are volatile
– DBMS: Data are persistent, queries are volatile
– CQ: a concept introduced by Tapestry
• “Continuous queries over append-only databases”, Terry
et al., SIGMOD’92.
• Window
– Turns unbounded data into finite chunks
• Type: ROWS or TIME
• Operators
– Aggregate -> window aggregate
– Join -> window join
• S. Babu and J. Widom. Continuous Queries over
Data Streams, SIGMOD Record, Sep. 2001
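Here is a minimal Python sketch of the ROWS/TIME distinction. It is not taken from any of the cited systems. The function names are hypothetical, and the TIME variant assumes events arrive in timestamp order. Each generator acts like a continual query: it is set up once, it emits a fresh answer every time a tuple arrives, and its window keeps the state finite.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rows_window_avg(values: Iterable[float], size: int) -> Iterator[float]:
    """ROWS window: average of the last `size` tuples, re-emitted on every arrival."""
    window: Deque[float] = deque()
    for v in values:
        window.append(v)
        if len(window) > size:
            window.popleft()  # evict the oldest tuple
        yield sum(window) / len(window)


def time_window_count(events: Iterable[Tuple[float, str]], span: float) -> Iterator[int]:
    """TIME window: number of events with timestamp in (t - span, t], t being the newest."""
    window: Deque[float] = deque()
    for ts, _payload in events:
        window.append(ts)
        while window[0] <= ts - span:
            window.popleft()  # evict events that have fallen out of the time range
        yield len(window)


print(list(rows_window_avg([1, 2, 3, 4], size=2)))  # [1.0, 1.5, 2.5, 3.5]
print(list(time_window_count([(0, "a"), (1, "b"), (2.5, "c"), (3, "d")], span=2)))  # [1, 2, 2, 2]
```

For simplicity, the ROWS version re-sums the whole window on every arrival. A real DSMS would update the aggregate incrementally instead.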
Relational Stream, DSMS
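• Turning relational data processing into continual queries
– Narrow sense
• Selection, projection, …
• Join, aggregate, and set operations require windows
– Broad sense
• Various data mining operations
• Anything goes as long as relational completeness is satisfied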
• DSMS (data stream management system)
– A system that manages continual queries
• Relational
– STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP
– Telegraph (UCB) -> Cisco
– Aurora -> Borealis -> StreamBase
• Non-relational
– STORM
– S4 (Oyamada-san at NEC knows it well)
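A small Python sketch of the narrow-sense case follows. The operator and stream names are hypothetical. Selection and projection look at one tuple at a time, so they can run directly over an unbounded stream. By contrast, an unwindowed aggregate such as summing `temp` over `sensor_readings()` would never return, which is why join, aggregate, and set operations need a window.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def select(stream: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection: forward each tuple that satisfies the predicate (no state kept)."""
    for row in stream:
        if pred(row):
            yield row


def project(stream: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Projection: keep only the listed attributes of each tuple (no state kept)."""
    for row in stream:
        yield {a: row[a] for a in attrs}


def sensor_readings() -> Iterator[Row]:
    """Hypothetical unbounded stream of sensor tuples; it never ends."""
    for i in itertools.count():
        yield {"id": i, "sensor": f"s{i % 3}", "temp": 20 + i % 7}


# Continual query, roughly: SELECT id, temp FROM readings WHERE temp >= 25
hot = project(select(sensor_readings(), lambda r: r["temp"] >= 25), ["id", "temp"])
print(list(itertools.islice(hot, 3)))
# [{'id': 5, 'temp': 25}, {'id': 6, 'temp': 26}, {'id': 12, 'temp': 25}]
```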
Incremental computation
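• Incremental (delta) computation
– A computation scheme that processes only what has changed since the previous result
• A very large number of methods have been proposed
– Aggregate (MIN, MAX, AVG, SUM)
– Similarity search (dynamic time warping (SPRING), Hausdorff distance)
– Handshake join
– Incremental LOCI (local correlation integral)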
• “Incremental outlier detection in data streams using local correlation integral”.
Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang. SAC
2009.
• “Incoop: MapReduce for Incremental Computations”. Pramod Bhatotia, Alexander
Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini (MPI-SWS), SoCC
2011
• “Fast Incremental and Personalized PageRank”, Bahman Bahmani (Stanford
University), Abdur Chowdhury (Twitter Inc.), Ashish Goel (Stanford University,
Twitter Inc.), VLDB 2011
• “Incremental Graph Pattern Matching”, Wenfei Fan, University of Edinburgh;
Jianzhong Li, Harbin Institute of Technology; Jizhou Luo, Harbin Institute of
Technology; Zijing Tan; Xin Wang, University of Edinburgh; Yinghui Wu, University
of Edinburgh, SIGMOD 2011
• “iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs”, Yun Chi
(NEC Laboratories America), Hyun Moon (NEC Labs America), Hakan Hacigumus
(NEC Labs America), VLDB 2011
• “An Incremental Hausdorff Distance Calculation Algorithm”, Sarana Nutanong
(University of Maryland), Edwin Jacox (University of Maryland), Hanan Samet
(University of Maryland), VLDB 2011
• “Large-scale Incremental Processing Using Distributed Transactions and
Notifications”, Daniel Peng and Frank Dabek, Google, Inc., OSDI 2011
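The following Python sketch illustrates the "Aggregate (MIN, MAX, AVG, SUM)" item by maintaining a ROWS window incrementally. It uses a standard technique: a running sum plus monotonic deques. It is not the method of any paper listed above, and the class name `SlidingAggregates` is hypothetical. Each arrival updates the answer from the delta (the new value and the evicted one) rather than rescanning the window.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingAggregates:
    """SUM, AVG, MIN, and MAX over the last `size` values, maintained incrementally.

    SUM/AVG: add the arriving value and subtract the evicted one (O(1)).
    MIN/MAX: monotonic deques, so each value is pushed and popped at most once
    (amortized O(1)) instead of rescanning the whole window.
    """

    def __init__(self, size: int) -> None:
        assert size >= 1
        self.size = size
        self.window: Deque[Tuple[int, float]] = deque()  # (sequence no., value) in arrival order
        self.min_q: Deque[Tuple[int, float]] = deque()   # values strictly increasing front to back
        self.max_q: Deque[Tuple[int, float]] = deque()   # values strictly decreasing front to back
        self.total = 0.0
        self.seq = 0

    def push(self, v: float) -> None:
        item = (self.seq, v)
        self.seq += 1
        self.window.append(item)
        self.total += v
        # Older values >= v can never be the minimum again while v is in the window.
        while self.min_q and self.min_q[-1][1] >= v:
            self.min_q.pop()
        self.min_q.append(item)
        # Older values <= v can never be the maximum again while v is in the window.
        while self.max_q and self.max_q[-1][1] <= v:
            self.max_q.pop()
        self.max_q.append(item)
        if len(self.window) > self.size:
            old_seq, old_v = self.window.popleft()
            self.total -= old_v  # delta update instead of re-summing the window
            if self.min_q[0][0] == old_seq:
                self.min_q.popleft()
            if self.max_q[0][0] == old_seq:
                self.max_q.popleft()

    def snapshot(self) -> Tuple[float, float, float, float]:
        """Return (SUM, AVG, MIN, MAX) for the current window."""
        n = len(self.window)
        return self.total, self.total / n, self.min_q[0][1], self.max_q[0][1]


agg = SlidingAggregates(size=3)
for v in [5, 1, 4, 2, 8]:
    agg.push(v)
print(agg.snapshot())  # window [4, 2, 8] -> (14.0, 4.666666666666667, 2, 8)
```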
Online-LDA
• Limin Yao et al., Efficient Methods for Topic Model Inference
on Streaming Document Collections, KDD’09
– Speeds up Gibbs sampling by roughly 20x over the standard sampler
– About 2x faster than an earlier fast sampler (below)
• I. Porteous, Fast collapsed Gibbs sampling for latent Dirichlet
allocation. In SIGKDD, 2008.
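• Matthew D. Hoffman et al., Online Learning for Latent
Dirichlet Allocation, NIPS’10
– Online variational Bayes
– No comparison with Gibbs sampling
– Claims better performance than batch variational Bayes
– According to the paper, it adds only a few lines to Blei’03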
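The update at the core of Hoffman et al.'s online variational Bayes is summarized below from the paper's Algorithm 1. It is a summary only; check the paper for the per-document E-step. Notation:
- D is the corpus size (assumed known or estimated).
- n_{tw} is the count of word w in the current document t.
- φ_{twk} are the per-word topic responsibilities computed in the E-step.
- η is the Dirichlet prior on topics.
- τ_0 ≥ 0 and κ are learning-rate parameters.

```latex
\begin{aligned}
\rho_t &= (\tau_0 + t)^{-\kappa}, \qquad \kappa \in (0.5, 1] \\
\tilde{\lambda}_{kw} &= \eta + D\, n_{tw}\, \phi_{twk} \\
\lambda &\leftarrow (1 - \rho_t)\,\lambda + \rho_t\,\tilde{\lambda}
\end{aligned}
```

With κ ∈ (0.5, 1], the step sizes satisfy Σ_t ρ_t = ∞ and Σ_t ρ_t² < ∞, the usual condition for stochastic approximation to converge. Each document affects the model only through λ̃, so it can be discarded after its update. That property is what makes the method fit a stream.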
BRIMOS, Malware
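• Xrosscloud® bridge monitoring solution
(BRIMOS®)
– A bridge monitoring system that uses sensors installed on a bridge
to monitor its condition continuously and in real time
• Malware
– NICTER
• Darknet traffic for roughly 210,000 hosts
• Available to MWS’13 participants (NON STOP)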
XML Stream
• Originated with XFilter
– Mehmet Altinel and Michael J. Franklin. “Efficient
Filtering of XML Documents for Selective
Dissemination of Information”. VLDB '00.
• Academically important?
– High-Performance Complex Event Processing over
XML Streams
• Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA)
• SIGMOD’12, Best paper award
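XFilter addresses a filtering problem: matching many subscriber path profiles against a stream of XML documents. Below is a deliberately naive Python sketch of that problem, using the standard library's `xml.etree.ElementTree.iterparse`. It has several limitations:
- It supports only absolute child-axis paths such as /catalog/book/title.
- It assumes no XML namespaces.
- It checks every profile at every element.

XFilter itself compiles each XPath query into a finite state machine and indexes the machine states by element name, which this sketch does not do. `matching_profiles` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def matching_profiles(xml_bytes: bytes, profiles: Dict[str, List[str]]) -> Set[str]:
    """Return the ids of profiles whose absolute path occurs in the document.

    A profile such as ["catalog", "book", "title"] stands for /catalog/book/title.
    The document is read as a stream of start/end events; only the current
    root-to-element path is kept, not the whole tree.
    """
    matched: Set[str] = set()
    path: List[str] = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            for pid, steps in profiles.items():
                if pid not in matched and path == steps:
                    matched.add(pid)
        else:
            path.pop()
            elem.clear()  # discard the finished element's contents to bound memory
    return matched


doc = b"<catalog><book><title>A</title></book><book><title>B</title><price>5</price></book></catalog>"
profiles = {"q1": ["catalog", "book", "title"], "q2": ["catalog", "book", "price"], "q3": ["catalog", "title"]}
print(sorted(matching_profiles(doc, profiles)))  # ['q1', 'q2']
```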
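High-performance H/W
• FPGA
– Application-specific circuits can be built
– Protocol stacks can also be implemented
– E-trees (http://e-trees.jp/)
• GPU
– Parallelism on the order of 1,500
• MIC
– Used in Tianhe-2 (No. 1 on the TOP500 supercomputer list)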
Adaptive Query Processing
• Dynamically changes the structure of the operator tree
– “Eddies: continuously adaptive query processing”,
Ron Avnur and Joseph M. Hellerstein, SIGMOD’00.
[Figure: alternative join-tree shapes over inputs A, B, C, and D, including a left-deep tree]
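Eddies replace a fixed operator tree with a router. The router sends each tuple through the operators in an order chosen at run time from observed behavior. The Python sketch below captures only the flavor of that idea, for a conjunction of filters. It counts how often each predicate passes tuples and periodically moves the most selective one to the front. It is a simplified stand-in, not the Eddy algorithm, which routes tuple by tuple using lottery scheduling and also handles joins. `AdaptiveFilter` and `reorder_every` are hypothetical names, and all predicates are assumed to cost the same.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Pred = Callable[[Row], bool]


class AdaptiveFilter:
    """Evaluates a conjunction of predicates, reordering them from run-time statistics."""

    def __init__(self, preds: List[Pred], reorder_every: int = 100) -> None:
        self.preds = preds
        self.order = list(range(len(preds)))  # current evaluation order (indices into preds)
        self.seen = [0] * len(preds)          # tuples each predicate has evaluated
        self.passed = [0] * len(preds)        # tuples each predicate has let through
        self.reorder_every = reorder_every
        self.count = 0

    def _pass_rate(self, i: int) -> float:
        # Add-one (Laplace) smoothing so a rarely evaluated predicate is not scored 0 or 1.
        return (self.passed[i] + 1) / (self.seen[i] + 2)

    def run(self, stream: Iterable[Row]) -> Iterator[Row]:
        for row in stream:
            ok = True
            for i in self.order:
                self.seen[i] += 1
                if self.preds[i](row):
                    self.passed[i] += 1
                else:
                    ok = False
                    break  # short-circuit: later predicates never see this tuple
            if ok:
                yield row
            self.count += 1
            if self.count % self.reorder_every == 0:
                # Lowest pass rate (most selective) first, so most tuples are dropped early.
                self.order.sort(key=self._pass_rate)


preds: List[Pred] = [lambda r: True, lambda r: r["x"] % 10 == 0]
f = AdaptiveFilter(preds, reorder_every=10)
print([r["x"] for r in f.run({"x": i} for i in range(100))])  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print(f.order)  # [1, 0]: the selective predicate has moved to the front
```

This simplification has a caveat. A predicate placed later sees only tuples that passed the earlier ones, so its pass rate is conditional and can be biased when predicates are correlated.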
Data processing paradigms
• Batch processing
– Data are persistent
– RDB: Oracle, Vertica, Greenplum
– NoSQL: Hadoop, MongoDB
• Real-time processing
– Queries are persistent; window functions
– DSMS: Hitachi, Sybase, IBM
– NoSQL: Storm, S4
[Figure: batch vs. real-time processing architectures, each drawn from the same components: database, memory (low latency), disk (high latency), stream, processing, and analysis]
DSMS: Data Stream Management System
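Issues in stream processing: applications
• Operators
– Window-join
– Window-aggregate
– Window-something
• Which events do people want to know about “right now”?
– Algorithmic trading
– Malware detection
– Fraud in RTB, credit cards, etc.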
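The window-join operator listed above can be sketched as a symmetric hash join over TIME windows. Each arriving tuple goes through three steps:
1. Expire stale tuples.
2. Probe the other stream's window.
3. Insert itself into its own window.

This single-threaded Python version is a textbook formulation offered for illustration. It is not the handshake join or any cited system. The `(timestamp, side, key, payload)` event layout and the name `window_join` are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Any, DefaultDict, Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str, Any, Any]  # (timestamp, side "L" or "R", join key, payload)


def window_join(events: Iterable[Event], span: float) -> Iterator[Tuple[Any, Any, Any]]:
    """Symmetric hash join of two streams over a TIME window of length `span`.

    Emits (key, left_payload, right_payload) for every pair with equal keys whose
    timestamps differ by less than `span`. Events must arrive in timestamp order.
    """
    windows: Dict[str, Deque[Tuple[float, Any]]] = {"L": deque(), "R": deque()}
    index: Dict[str, DefaultDict[Any, Deque[Any]]] = {"L": defaultdict(deque), "R": defaultdict(deque)}
    for ts, side, key, payload in events:
        other = "R" if side == "L" else "L"
        # 1. Expire tuples too old to join with anything arriving from now on.
        for s in ("L", "R"):
            while windows[s] and windows[s][0][0] <= ts - span:
                _, old_key = windows[s].popleft()
                bucket = index[s][old_key]
                bucket.popleft()  # the oldest tuple with this key sits at the front
                if not bucket:
                    del index[s][old_key]
        # 2. Probe the opposite stream's window for matching keys.
        for match in index[other].get(key, ()):
            yield (key, payload, match) if side == "L" else (key, match, payload)
        # 3. Insert the new tuple into its own window.
        windows[side].append((ts, key))
        index[side][key].append(payload)


stream = [(0, "L", "a", "l0"), (1, "R", "a", "r1"), (2, "R", "b", "r2"),
          (3, "L", "b", "l3"), (5, "L", "a", "l5"), (6, "R", "a", "r6")]
print(list(window_join(stream, span=3)))
# [('a', 'l0', 'r1'), ('b', 'l3', 'r2'), ('a', 'l5', 'r6')]
```

The result excludes l0 and r6, and r1 and l5, because their timestamps are at least `span` apart.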
Summary
• Keyword classification
• Key concepts
– Continual query
• Window
• Incremental computation
• Unsolved areas
– Adaptive query processing
– Join of stream and relation (Oyamada-san at
NEC)

More Related Content

Viewers also liked

kibayos-ID/Locator-081031
kibayos-ID/Locator-081031kibayos-ID/Locator-081031
kibayos-ID/Locator-081031Mikio Yoshida
 
kibayos beaker-070829
kibayos beaker-070829kibayos beaker-070829
kibayos beaker-070829Mikio Yoshida
 
トランザクションの設計と進化
トランザクションの設計と進化トランザクションの設計と進化
トランザクションの設計と進化Kumazaki Hiroki
 
Baloncesto
BaloncestoBaloncesto
Baloncestoaha100
 
Etiquetas de cd
Etiquetas de cdEtiquetas de cd
Etiquetas de cdcastelbi
 
Ramazan hilal-2011
Ramazan hilal-2011Ramazan hilal-2011
Ramazan hilal-2011HamidAslan
 
Carta descriptiva wiki
Carta descriptiva wikiCarta descriptiva wiki
Carta descriptiva wikiairamm87
 
Tumanyan prezentation10000
Tumanyan prezentation10000 Tumanyan prezentation10000
Tumanyan prezentation10000 Armine
 
Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...futureagricultures
 
Un bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivasUn bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivasJackson Dj
 
Wfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayneWfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayneweathervision
 
Englekirk News
Englekirk NewsEnglekirk News
Englekirk Newskimtanouye
 
Mysql+handlersocket=nosql
Mysql+handlersocket=nosqlMysql+handlersocket=nosql
Mysql+handlersocket=nosqlSergey Xek
 
National day of romania ziua nationala a romaniei
National day of romania ziua nationala a romanieiNational day of romania ziua nationala a romaniei
National day of romania ziua nationala a romanieibalada65
 

Viewers also liked (20)

kibayos-ID/Locator-081031
kibayos-ID/Locator-081031kibayos-ID/Locator-081031
kibayos-ID/Locator-081031
 
kibayos beaker-070829
kibayos beaker-070829kibayos beaker-070829
kibayos beaker-070829
 
トランザクションの設計と進化
トランザクションの設計と進化トランザクションの設計と進化
トランザクションの設計と進化
 
Prueba power
Prueba powerPrueba power
Prueba power
 
Agra tour iii article
Agra tour iii articleAgra tour iii article
Agra tour iii article
 
Cruzcerna
CruzcernaCruzcerna
Cruzcerna
 
Issue 7 March 2011
Issue 7 March 2011Issue 7 March 2011
Issue 7 March 2011
 
Baloncesto
BaloncestoBaloncesto
Baloncesto
 
Etiquetas de cd
Etiquetas de cdEtiquetas de cd
Etiquetas de cd
 
Ramazan hilal-2011
Ramazan hilal-2011Ramazan hilal-2011
Ramazan hilal-2011
 
Carta descriptiva wiki
Carta descriptiva wikiCarta descriptiva wiki
Carta descriptiva wiki
 
Tumanyan prezentation10000
Tumanyan prezentation10000 Tumanyan prezentation10000
Tumanyan prezentation10000
 
Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...
 
Formato planeacion
Formato planeacionFormato planeacion
Formato planeacion
 
Un bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivasUn bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivas
 
Wfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayneWfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayne
 
Magento
MagentoMagento
Magento
 
Englekirk News
Englekirk NewsEnglekirk News
Englekirk News
 
Mysql+handlersocket=nosql
Mysql+handlersocket=nosqlMysql+handlersocket=nosql
Mysql+handlersocket=nosql
 
National day of romania ziua nationala a romaniei
National day of romania ziua nationala a romanieiNational day of romania ziua nationala a romaniei
National day of romania ziua nationala a romaniei
 

Similar to 学術的に見たストリームデータ処理(私見)

A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...Wenjing Chu
 
Smart camera monitoring system
Smart camera monitoring systemSmart camera monitoring system
Smart camera monitoring systemArvind Krishnaa
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Jeffrey Sica
 
Computing Outside The Box
Computing Outside The BoxComputing Outside The Box
Computing Outside The BoxIan Foster
 
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...Wright State University, Dayton, OH, USA
 
Issues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon ImplementationIssues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon ImplementationTieu Luu
 
Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10Mario Cho
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Implementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architectureImplementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architectureHarry Yao
 
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Seattle DAML meetup
 
Network and IT Operations
Network and IT OperationsNetwork and IT Operations
Network and IT OperationsNeo4j
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...Alan Quayle
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Sri Ambati
 
Poster jsoe research expo 2008
Poster   jsoe research expo 2008Poster   jsoe research expo 2008
Poster jsoe research expo 2008bdemchak
 

Similar to 学術的に見たストリームデータ処理(私見) (20)

A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...
 
USENIX OSDI2010 Report
USENIX OSDI2010 ReportUSENIX OSDI2010 Report
USENIX OSDI2010 Report
 
Smart camera monitoring system
Smart camera monitoring systemSmart camera monitoring system
Smart camera monitoring system
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
Computing Outside The Box
Computing Outside The BoxComputing Outside The Box
Computing Outside The Box
 
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
 
Issues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon ImplementationIssues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon Implementation
 
Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Implementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architectureImplementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architecture
 
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016
 
Network and IT Operations
Network and IT OperationsNetwork and IT Operations
Network and IT Operations
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
 
Venkata brundavanam 2020
Venkata brundavanam 2020Venkata brundavanam 2020
Venkata brundavanam 2020
 
Venkata brundavanam 2020
Venkata brundavanam 2020Venkata brundavanam 2020
Venkata brundavanam 2020
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
 
Introduction to Storm
Introduction to StormIntroduction to Storm
Introduction to Storm
 
Poster jsoe research expo 2008
Poster   jsoe research expo 2008Poster   jsoe research expo 2008
Poster jsoe research expo 2008
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

学術的に見たストリームデータ処理(私見)

  • 4. STORM Norikra Jubatus CEP DSMS SPE Relational-stream XML-stream S4 STREAM System S Algorithm trading Borealis(MIT/Brandeis) Stream computing Complex event processing Online learning Incremental computation Continual query Spring (DTW) CPD (Change Point Detection) Window-aggregate Window-join FPGA GPU SASE Fraud detection Malware detection AQP (Adaptive Query Proc.) Esper BRIMOS Handshake-join Incr. LOCI Online LDA Window Real-time Tuple-stream Materialized view Tapestry
  • 5. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time DSMS Tuple- stream Tapestry AQP MV
  • 6. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time DSMS Tuple- stream Tapestry AQP MV
  • 7. Continual query, window • Continual query – DSMS: Queries are persistent, data are volatile – DBMS: Data are persistent, queries are volatile – CQ: Tapestryで導入された概念 • “Continuous queries over append only databases”, Terry, et.al, SIGMOD’92. • Window – 無限長のデータを有限長に変換 • Type: ROWS or TIME • Operators – Aggregate -> window aggregate – Join -> window join • S. Babu and J. Widom. Continuous Queries over Data Streams, SIGMOD Record, Sep. 2001
  • 8. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time DSMS Tuple- stream Tapestry AQP MV
  • 9. Relational Stream, DSMS • リレーショナルデータ処理をCQ化 – 狭義 • Selection, projection, … • Join, aggregate, set operationsは窓が必須 – 広義 • 各種のマイニング処理 • Relational completenessを満たせば何でもOK • DSMS (data stream management system) – 連続的問合せを管理するシステム • Relational – STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP – Telegraph (UCB) -> CISCO – Aurora -> Borealis -> StreamBase • Non relational – STORM – S4(小山田さん@NECが詳しい)
  • 10. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time Tuple- stream Tapestry AQP MV
  • 11. Incremental computation
    • Differential computation
      – A computation style that processes only the parts that changed since the previous step
    • A great many techniques have been proposed
      – Aggregate (MIN, MAX, AVG, SUM)
      – Similarity search (dynamic time warping (SPRING), Hausdorff distance)
      – Handshake join
      – Incremental LOCI (local correlation integral)
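To illustrate incremental aggregates, here is a minimal sketch (our own, not from any cited paper) that maintains SUM and MAX over a ROWS window of size `n`. SUM is updated by adding the new value and subtracting the evicted one. MAX uses a monotonic deque: a candidate that is smaller than a newer value can never become the maximum again, so it is discarded. Each tuple therefore costs amortized O(1) instead of a full rescan of the window. The function name is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_sum_max(stream: Iterable[float], n: int) -> Iterator[Tuple[float, float]]:
    """Emit (SUM, MAX) of the last n tuples after each arrival, updated incrementally."""
    values: Deque[float] = deque()
    cand: Deque[Tuple[int, float]] = deque()  # (index, value) with strictly decreasing values
    total = 0.0
    for i, x in enumerate(stream):
        values.append(x)
        total += x
        if len(values) > n:
            total -= values.popleft()  # delta update: subtract only the expired tuple
        while cand and cand[-1][1] <= x:  # dominated candidates can never be MAX again
            cand.pop()
        cand.append((i, x))
        while cand[0][0] <= i - n:  # front candidate has left the window
            cand.popleft()
        yield total, cand[0][1]
```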
  • 12.
    • "Incremental outlier detection in data streams using local correlation integral", Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang, SAC 2009.
    • "Incoop: MapReduce for Incremental Computations", Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini, SoCC 2011.
    • "Fast Incremental and Personalized PageRank", Bahman Bahmani, Abdur Chowdhury, and Ashish Goel, VLDB 2011.
    • "Incremental Graph Pattern Matching", Wenfei Fan, Jianzhong Li, Jizhou Luo, Zijing Tan, Xin Wang, and Yinghui Wu, SIGMOD 2011.
    • "iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs", Yun Chi, Hyun Moon, and Hakan Hacigumus, VLDB 2011.
    • "An Incremental Hausdorff Distance Calculation Algorithm", Sarana Nutanong, Edwin Jacox, and Hanan Samet, VLDB 2011.
    • "Large-scale Incremental Processing Using Distributed Transactions and Notifications", Daniel Peng and Frank Dabek, OSDI 2010.
  • 14. Online-LDA
    • Limin Yao, "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD '09
      – Speeds up Gibbs sampling by roughly 20× over the standard sampler
      – About 2× faster than the existing fast sampler below
        • I. Porteous, "Fast collapsed Gibbs sampling for latent Dirichlet allocation", SIGKDD 2008.
    • Matthew D. Hoffman, "Online Learning for Latent Dirichlet Allocation", NIPS '10
      – Online variational Bayes
      – No comparison with Gibbs sampling
      – Claims better performance than (batch) variational Bayes
      – The paper states that it adds only a few lines to Blei '03
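The core of Hoffman et al.'s online variational Bayes, as we understand it, is a stochastic step that blends the current topic-word parameters λ with an estimate λ̃ computed from a single document t as if it were the entire corpus of D documents. Here n_{tw} is the count of word w in document t, φ_{twk} is the variational probability that those occurrences come from topic k, η is the Dirichlet prior on topics, and τ₀, κ control the learning rate. The local E-step that produces φ is omitted; consult the paper for the exact algorithm.

```latex
\begin{aligned}
\rho_t &= (\tau_0 + t)^{-\kappa}, \qquad \tau_0 \ge 0,\ \kappa \in (0.5, 1] \\
\tilde{\lambda}_{kw} &= \eta + D\, n_{tw}\, \phi_{twk} \\
\lambda &\leftarrow (1 - \rho_t)\, \lambda + \rho_t\, \tilde{\lambda}
\end{aligned}
```

The range of κ makes the step sizes ρ_t sum to infinity while their squares sum to a finite value, which are the standard conditions for stochastic approximation to converge. This blending step is why the method needs only a few extra lines on top of batch variational Bayes.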
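  • 16. BRIMOS, Malware
    • Xrosscloud® bridge monitoring solution (BRIMOS®)
      – A bridge monitoring system that continuously watches a bridge's condition in real time using various sensors installed on the bridge
    • Malware
      – NICTER
        • Darknet traffic covering roughly 210,000 hosts
        • Available to participants of MWS '13 (NON STOP)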
  • 18. XML Stream
    • Originated with XFilter
      – Mehmet Altinel and Michael J. Franklin. "Efficient Filtering of XML Documents for Selective Dissemination of Information". VLDB '00.
    • Academically important?
      – "High-Performance Complex Event Processing over XML Streams"
        • Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA)
        • SIGMOD '12, Best Paper Award
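A minimal sketch of XML stream filtering in the selective-dissemination setting: a document is parsed as a stream of SAX events, without building a DOM, and we report which subscriber queries it matches. This is deliberately simpler than XFilter, which compiles each path into a finite-state machine and indexes the queries so that an element event only touches relevant states. Here we compare the current element path with every query, and only absolute child-axis paths such as `/a/b/c` (no `//` or `*`) are supported. The class, function, and query IDs are hypothetical.

```python
import xml.sax
from typing import Dict, List, Set


class PathFilter(xml.sax.ContentHandler):
    """Track the current element path and record which linear path queries match."""

    def __init__(self, queries: Dict[str, str]) -> None:
        super().__init__()
        # '/news/sports/item' -> ['news', 'sports', 'item']
        self.queries: Dict[str, List[str]] = {
            qid: q.strip("/").split("/") for qid, q in queries.items()
        }
        self.stack: List[str] = []
        self.matched: Set[str] = set()

    def startElement(self, name, attrs):
        self.stack.append(name)
        for qid, steps in self.queries.items():
            if self.stack == steps:  # the current path equals the query path exactly
                self.matched.add(qid)

    def endElement(self, name):
        self.stack.pop()


def filter_document(xml_text: str, queries: Dict[str, str]) -> Set[str]:
    """Return the IDs of the queries that the document satisfies."""
    handler = PathFilter(queries)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched
```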
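  • 20. High-performance H/W
    • FPGA
      – Application logic can be built directly as circuits
      – Protocol stacks can also be implemented
      – E-trees (http://e-trees.jp/)
    • GPU
      – Parallelism on the order of 1,500
    • MIC
      – Used in Tianhe-2 (No. 1 on the TOP500 supercomputer list)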
  • 22. Adaptive Query Processing
    • Dynamically change the structure of the operator tree
      – "Eddies: continuously adaptive query processing", Ron Avnur and Joseph M. Hellerstein, SIGMOD '00.
    [Figure: alternative operator trees over inputs A, B, C, D, including a left-deep tree]
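A minimal sketch of the adaptive idea: instead of fixing the order of selection operators when the query is compiled, each tuple is routed through the filters in an order chosen from statistics observed so far, so the plan changes as the data change. This is a simplified stand-in for Eddies, whose routing policy is based on lottery scheduling. Here we greedily try first the filter that has dropped the largest fraction of tuples. The class name and the +1 smoothing are our own assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class AdaptiveFilterChain:
    """Route each tuple through all predicates, most selective (observed) first."""

    def __init__(self, predicates: List[Callable[[Row], bool]]) -> None:
        self.predicates = predicates
        self.seen = [1] * len(predicates)  # +1 smoothing avoids division by zero
        self.dropped = [0] * len(predicates)

    def order(self) -> List[int]:
        """Current routing order: highest observed drop rate first."""
        return sorted(range(len(self.predicates)),
                      key=lambda i: self.dropped[i] / self.seen[i], reverse=True)

    def process(self, stream: Iterable[Row]) -> Iterator[Row]:
        for row in stream:
            for i in self.order():  # the order is recomputed for every tuple
                self.seen[i] += 1
                if not self.predicates[i](row):
                    self.dropped[i] += 1
                    break  # stop routing: the tuple is filtered out
            else:
                yield row  # passed every predicate
```

Because the predicates form a conjunction, the output is the same for any routing order; only the work done per tuple changes.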
  • 24. Data processing paradigms
    • Batch processing
      – Data are persistent
      – RDB: Oracle, Vertica, Greenplum
      – NoSQL: Hadoop, MongoDB
    • Real-time processing
      – Queries are persistent; window functions
      – DSMS: Hitachi, Sybase, IBM
      – NoSQL: Storm, S4
    [Figure, one panel per paradigm: stream, analysis/processing, Database with memory (low latency) and disk (high latency)]
    DSMS: Data Stream Management System
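  • 26. Issues in stream processing: applications
    • Operations
      – Window-join
      – Window-aggregate
      – Window-something
    • Which events do we want to know about instantly, right now?
      – Algorithmic trading
      – Malware detection
      – Fraud in RTB, credit cards, etc.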
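A minimal sketch of a time-based window join. It assumes the two input streams have already been merged into one sequence ordered by timestamp, with each event represented as a hypothetical `(timestamp, side, key, payload)` tuple where `side` is `"L"` or `"R"`. It uses the straightforward symmetric approach: expire old tuples, probe the opposite window, then insert into the tuple's own window. It is not the handshake join listed on slide 11. In a card-fraud setting, for instance, `L` and `R` could be authorizations from two channels keyed by card number.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str, str, Any]  # (timestamp, side 'L' or 'R', join key, payload)


def window_join(events: Iterable[Event], span: float) -> Iterator[Tuple[Any, Any]]:
    """Emit (left payload, right payload) for equal keys whose timestamps are
    less than `span` apart. Assumes events arrive in non-decreasing timestamp order."""
    windows: Dict[str, Deque[Tuple[float, str, Any]]] = {"L": deque(), "R": deque()}
    for ts, side, key, payload in events:
        other = "R" if side == "L" else "L"
        for w in windows.values():  # expire tuples older than the window
            while w and w[0][0] <= ts - span:
                w.popleft()
        for _, okey, opayload in windows[other]:  # probe the opposite window
            if okey == key:
                yield (payload, opayload) if side == "L" else (opayload, payload)
        windows[side].append((ts, key, payload))  # then insert into our own window
```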
  • 27. Summary
    • Keyword classification
    • Key concepts
      – Continual query
        • Window
        • Incremental computation
    • Open areas
      – Adaptive query processing
      – Join of stream and relation (Oyamada-san, NEC)