Taming
Big Data Streams
Hideyuki KAWASHIMA
Center for Computational Sciences
University of Tsukuba, Japan
STORM
Norikra
Jubatus
Relational-stream
XML-stream
S4
Puma
System S
MillWheel
Complex event processing
Machine learning
In...
Privacy PreservationCryptDB GPGPU Intel MICFPGA TileraEncryption
Privacy Accelerator
ML&DM SQL NoSQL
Relational
stream
Nor...
Which Analytics Style ?
SQL style NoSQL style
Embedded operators
Filter, join, aggregation
User defined operators
Python, ...
Relational
stream
Norikra
Online
Data Mining &
Machine Learning
EsperPuma
Complex
event
processing
Jubatus
BorealisSystem ...
3 Research Topics
• Falcon
– In-DSMS analytics system
– Multiple query optimization for CPD
• Window Operators
– Join oper...
A Multiple Query Optimization Scheme
for Change Point Detection on Falcon
Joint work with Masahiro OKE
Presented at BIRTE’...
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
DSMS
Relation
eth0
・Destination IP
・Source IP
・Destination Port
・Sou...
• SQL is translated to operator tree.
• On arrival of data, tree is evaluated.
• Operators are based on relational databas...
A Target Application: Malware Detection
• Real datasets
– “Anti Malware engineering WorkShop 2013 (MWS
2013)”
– Extracted ...
Relational data processing
Attack Detection
Discussion
?• Aggregates are good
CPD(AR)/ LOF / LDA/FIM
Yet Another DSMS: Fal...
Example Query on Falcon (1/2)
• #Access for each port ? [1]
• Group by aggregates
SELECT dst_port,
COUNT(dst_port)
FROM pk...
Example Query on Falcon (2/2)
• Access on each port ? [2]
• Outlier score for each port/sec
select dst_port,
cpd(dst_port)...
Dividing CPD into 4 operators
Compute outiler score and
Moving average score
(omitting shwoing outlier score)
1st stage le...
Problem of CPD: Parameter setting
• CPD requires 6 parameters (𝛼 𝑅, 𝛼 𝐾, 𝛼 𝑇, 𝛽 𝑅, 𝛽 𝐾 , 𝛽 𝑇)
• Appropriate parameter sett...
Parameterset
2
A simple way for parameter tuning:
---Multiple CPDs with different parameter sets---
Input packet
Compute o...
The 4 sharing patterns
-- Only branch cases, not merge --
Compute
outiler
score
1st stage
learning
Compute
outiler
score
2...
Experiment
Measuring execution time when sharing ONLY
1st stage learning
– Implement CPD by C++ and eigen library (for
ma...
Exec. Time with Sharing 1st Stage Learning
ID
Parameters
Execution Time
(second)
Performance
Gain (times)
𝛼 𝑅 𝛼 𝐾 𝛼 𝑇 𝛽 𝑅 ...
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators acclerated by FPGA
– Join operator over data stre...
A Novel Architecture of
Merging Network
for Handshake Join
Joint work with
Yasin OGE, Takefumi MIYOSHI, Tsutomu YOSHINAGA
...
Falcon
- UDP-RX
- Window join (64-cores)
Performance
Monitor
22
FPGA instead of Data Center
Demo System at SSDBM’13
Example|Simple Continuous Query
S1.key = S2.key
S1 [Rows 100], S2 [Rows 100]
*SELECT
FROM
WHERE
23
key
value
key
value
tup...
S1.key = S2.keyWHERE
Window Operators
*SELECT
]]S1 [Rows 100], S2 [Rows 100]FROM
24
key
value
key
value
window
w
w
S1 [Rows 100], S2 [Rows 100]FROM
Join Operator
*SELECT
S1.key = S2.keyWHERE
25
w
w
key
value
key
value
join
*SELECT
S1 [Rows 100], S2 [Rows 100]FROM
S1.key = S2.keyWHERE
Overall Query Plan
26
w
w
key
value
key
value
HANDSHAKE JOIN
J. Teubner and R. Müller, SIGMOD’11
Handshake Join
basic idea
advantage
streams flow in opposite direction
highly parallel evaluation
input
stream 1
input
str...
Handshake Join
29
window of stream 2
old
tuple
new
tuple
new
tuple
old
tuple
window of stream 1
Handshake Join|Parallelization
30
Processor1 Processor2
divided into two sub-windows
Handshake Join|Parallelization
31
Processor1 Processor2 Processor3
divided into three sub-windows
Naïve
IMPLEMENTATION
Baseline Implementation
Join Core Join Core Join Core Join Core
33
dedicated processor
for join operation
Baseline Implementation
Join Core Join Core Join Core Join Core
buffer
34
results are stored in these buffers
Baseline Implementation
Merger
MergerMerger
Join Core Join Core Join Core Join Core
35
buffer
MergingNetwork
Output Data Flow
Merger
MergerMerger
Join Core Join Core Join Core Join Core
36
3
2
1
Merger
MergerMerger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
37
Join Core
3
2
1
Merger
MergerMerger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
38
Join Core
Merger
MergerMerger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
39
Join Core
Merger
MergerMerger
Join Core Join Core Join Core
Issue|Inefficient Buffer Utilization
40
Join Core
in case of 64cores
onl...
PROPOSED
IMPLEMENTATION
Adaptive Merging Network
Proposed Implementation
Join Core Join Core Join Core Join Core
42
Proposed Implementation
Join Core Join Core Join Core Join Core
43
Here, no buffering is required!
NO
buffer
Proposed Implementation
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Core
44
Proposed Implementation
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Core
45
buffer
Now, res...
Proposed Implementation
Merger
MergerMerger
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Cor...
Output Data Flow
47
Merger
MergerMerger
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Core
1
48
Merger
MergerMerger
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Core
2
Merger
MergerMerger
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Core 49
21 3 4
Merger
MergerMerger
Ring
Node
Ring
Node
Ring
Node
Ring
Node
Join Core Join Core Join Core Join Core 50
21 3 4up to 100%
ut...
Performance|Proposed (64 join cores)
0
0.5
1
1.5
2
2.5
3
3.5
10 20 30 40 50 60 70 80 90 100
スループット[100万タプル/秒]
Match rate [...
Falcon
- UDP-RX
- Window join (64-cores)
Performance
Monitor
52
Basic: 6.7 millions of tuples per second
Proposal: 14.6 mi...
Wire-Speed Implementation of
Sliding-Window Aggregate Operator
over Out-of-Order Data Streams
Int’l Symposium on Embedded ...
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
• CryptStream
–...
A Security aware Stream Data
Processing Scheme on the Cloud
Joint work with
Katsuhiro TOMIYAMA
Introduced at
CloudDB’11
Related work:CryptDB
• CryptDB [R. A. Popa, et al. SOSP’11]
– Realizes relational operators over encrypted data
on traditi...
Ciphers of CryptDB/CryptStream
• DET (Deterministic)
– Be able to check the equality of two encrypted values.
– 𝑥 = 𝑦 ⇔ 𝐷𝐷...
Scheme of CryptStream
Encryption
module
Encryption
module
Encryption
module
Trusted area Trusted area Trusted area
Public ...
• The side effect of encryption:Increase Tuple Size
– Generating three kinds of cipher
• Already Implemented onto Falcon
•...
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
– Aggregate ope...
Summary
Relational
stream
Norikra
Online
Data Mining &
Machine Learning
EsperPuma
Complex
event
processing
Jubatus
MillWheelSystem...
Frontiers for Accelerators
SQL
Types of relational operator
are limited.
New techs are
created everyday !
63
NoSQL, Privac...
Upcoming SlideShare
Loading in...5
×

MCSoC'13 Keynote Talk "Taming Big Data Streams"

251

Published on

Published in: Technology, Education
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
251
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
3
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

MCSoC'13 Keynote Talk "Taming Big Data Streams"

  1. 1. Taming Big Data Streams Hideyuki KAWASHIMA Center for Computational Sciences University of Tsukuba, Japan
  2. 2. STORM Norikra Jubatus Relational-stream XML-stream S4 Puma System S MillWheel Complex event processing Machine learning Incremental computation Continual query Spring (DTW) CPD (Change Point Detection) Window-aggregate Window-join FPGA GPU SASE Esper Handshake-join Incr. LOCI Online LDA Window Tuple-stream A Variety of Data Processing Techniques Cayuga Privacy Preservation CryptDB Data mining Kafka MLBase 2
  3. 3. Privacy PreservationCryptDB GPGPU Intel MICFPGA TileraEncryption Privacy Accelerator ML&DM SQL NoSQL Relational stream Norikra Online Data Mining & Machine Learning EsperPuma Complex event processing Jubatus BorealisSystem S Window aggregate SASE Cayuga Continual query & Window S4 Window join CPD Online LDA Tuple stream Kafka STORM MLBase Incr. LOCI Spring (DTW) IBM Facebook Line Twitter NTT & PFI UCB
  4. 4. Which Analytics Style ? SQL style NoSQL style Embedded operators Filter, join, aggregation User defined operators Python, java, C++, … Machine learning Data mining Machine learning Data mining 4 Poor operators High performance Rich operators Low performance In-DB style Embedded operators Filter, join, aggregation Machine learning Data mining Rich operators High performance Oracle-R MADLib
  5. 5. Relational stream Norikra Online Data Mining & Machine Learning EsperPuma Complex event processing Jubatus BorealisSystem S Window aggregate SASE Cayuga Continual query & Window S4 Window join CPD Online LDA Tuple stream Privacy PreservationCryptDB GPGPU Intel MICFPGA TileraEncryption Privacy Accelerator ML&DM SQL NoSQL Kafka STORM MLBase Incr. LOCI Spring (DTW) Falcon 5 MADLib @UCB Bismarck @Stanford Oracle-R
  6. 6. 3 Research Topics • Falcon – In-DSMS analytics system – Multiple query optimization for CPD • Window Operators – Join operator over data streams • Crypt stream – Privacy preserving stream data processing 6
  7. 7. A Multiple Query Optimization Scheme for Change Point Detection on Falcon Joint work with Masahiro OKE Presented at BIRTE’13
  8. 8. SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80 DSMS Relation eth0 ・Destination IP ・Source IP ・Destination Port ・Source Port ・Interface (e.g. eth0) ・Length ・Version (e.g. IPV4 ) ・Payload Relational schema 20 Quick Review Data Stream Management System (DSMS) Q1 8 How many packets are arrived for port 80 in a minute ?
  9. 9. • SQL is translated to operator tree. • On arrival of data, tree is evaluated. • Operators are based on relational database – w(Window): Cutting off relations from a stream – σ (Selection): Filter – α (Aggregation): such as AVG, MIN, MAX Query Result Users/Apps. w σ αInput adapter Output adapter DSMS Data SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80 9
  10. 10. A Target Application: Malware Detection • Real datasets – “Anti Malware engineering WorkShop 2013 (MWS 2013)” – Extracted by NEGI proposed by Dr. Shinichi Isida. • NICTER – Keeps about 160,000 unused ip addresses (DARK NET) • Packets to dark net are considered as attacks. – Uses CPD (Change Point Detection) [1]) to detect attacks such as DoS (denial of services). [1] Daisuke Inoue, K. Yoshioka, M. Eto, Masaya Yamagata, Eisuke Nishino, Jun-ichi Takeuchi, Kazuya Ohkouchi, Koji Nakao: An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques. ICONIP (1) 2008: 579-586 [2] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE TKDE, pp.482-492, 2006. 10
  11. 11. Relational data processing Attack Detection Discussion ?• Aggregates are good CPD(AR)/ LOF / LDA/FIM Yet Another DSMS: Falcon 11
  12. 12. Example Query on Falcon (1/2) • #Access for each port ? [1] • Group by aggregates SELECT dst_port, COUNT(dst_port) FROM pkt[1 sec] GROUP BY dst_port g-pkt src_ip dst_ip src_port dst_port seq_no packet_size timestamp protocol ack fin syn urg push reset content 22: 2 80: 2 15: 1 22 N I C 80 15 80 22 1 second [1] “Enabling Real Time Data Analysis”, Divesh Srivastava (AT&T Labs), et, al. Keynote talk, VLDB 2010. (a similar query is found in pp.15 of talk slide) 12
  13. 13. Example Query on Falcon (2/2) • Access on each port ? [2] • Outlier score for each port/sec select dst_port, cpd(dst_port) from pkt[1 sec] group by dst_port g-cpd-pkt src_ip dst_ip src_port dst_port seq_no packet_size timestamp protocol ack fin syn urg push reset content 22: 1.33 80: 2.44 15: 1.22 22 N I C 80 15 80 22 1 second [2] “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques”, Daisuke Inoue (NICT), et, al. ICONIP (1) 2008: 579- 586 13
  14. 14. Dividing CPD into 4 operators Compute outiler score and Moving average score (omitting shwoing outlier score) 1st stage learning Compute outiler score and Moving average score Input tx 2nd stage learning Outlier scoreMoving average score Probability provided by 2nd stage learning Compute outiler score and Moving average ascore Input time series data Probability provided by 1st stage learning 14
  15. 15. Problem of CPD: Parameter setting • CPD requires 6 parameters (𝛼 𝑅, 𝛼 𝐾, 𝛼 𝑇, 𝛽 𝑅, 𝛽 𝐾 , 𝛽 𝑇) • Appropriate parameter setting is necessary … but it is difficult – Blue: # accesses, Red: CPD score Using appropriate parameter set Using inappropriate parameter set 15
  16. 16. Parameterset 2 A simple way for parameter tuning: ---Multiple CPDs with different parameter sets--- Input packet Compute outiler score 1st stage learning Compute outiler score 2nd stage learning Compute outiler score 1st stage learning Compute outiler score 2nd stage learning Result aggregation (e.g. majority voting) Parameterset 3 Parameterset 4 Parameterset 0k Issue: How to accelerate multiple CPD executions ? Approach: Multiple query optimization 16
  17. 17. The 4 sharing patterns -- Only branch cases, not merge -- Compute outiler score 1st stage learning Compute outiler score 2nd stage learning Compute outiler score Compute outiler score 2nd stage learning Compute outiler score Compute outiler score 2nd stage learning Compute outiler score 1st stage learning 2nd stage learning Compute outiler score Compute outiler score 2nd stage learning Compute outiler score 2nd stage learning Compute outiler score 1st stage learning Compute outiler score Compute outiler score 2nd stage learning Compute outiler score 1st stage learning Compute outiler score Compute outiler score 2nd stage learning NOTE: “1st stage learning” and “3rd stage learning” can be divided to sub operators, and a part of sub operators can also be shared. The sharing patterns are described in the paper. Pattern 1: Sharing CPD-1 if α_R and α_K are the same. Pattern 2: Sharing CPD-1, 2 if α_R, α_K and α_T are the same. Pattern 3: Sharing CPD-1, 2, 3 if α_R, α_K, α_T, β_R and β_K are the same. Pattern 4: Sharing CPD-1, 2, 3, 4 if α_R, α_K, α_T, β_R, β_K and β_T are the same. Pattern 1 Pattern 2 Pattern 3 Pattern 4 17
  18. 18. Experiment Measuring execution time when sharing ONLY 1st stage learning – Implement CPD by C++ and eigen library (for matrix manipulation). – Measured execution time using the CPD. 18
  19. 19. Exec. Time with Sharing 1st Stage Learning ID Parameters Execution Time (second) Performance Gain (times) 𝛼 𝑅 𝛼 𝐾 𝛼 𝑇 𝛽 𝑅 𝛽 𝐾 𝛽 𝑇 Naive Shared CPD-1 Shared CPD-1 1 .02 2 5 .02 3 5 2.92 1.77 1.65 2 .02 4 5 .02 3 5 3.65 1.77 2.06 3 .02 2 5 .02 4 5 3.29 2.17 1.52 4 .005 2 5 .02 4 5 2.91 1.76 1.65 5 .02 2 5 .005 3 5 2.89 1.77 1.64 6 .02 2 7 .02 7 5 3.00 1.87 1.60 7 .02 1 5 .02 1 5 1.96 1.11 1.78 8 .02 10 10 .02 10 10 11.2 5.84 1.92 9 .02 1 10 .02 10 10 6.66 5.80 1.15 10 .02 10 10 .02 1 10 6.68 1.34 5.00 19 Parallel execution with accelerator is required.
  20. 20. 3 Research Topics • Falcon – In-DSMS analytics system • Window Operators acclerated by FPGA – Join operator over data streams • Crypt stream – Privacy preserving stream data processing 20
  21. 21. A Novel Architecture of Merging Network for Handshake Join Joint work with Yasin OGE, Takefumi MIYOSHI, Tsutomu YOSHINAGA Presented at ICNC’11, FPL’11, MCSoC’12, SSDBM’13.
  22. 22. Falcon - UDP-RX - Window join (64-cores) Performance Monitor 22 FPGA instead of Data Center Demo System at SSDBM’13
  23. 23. Example|Simple Continuous Query S1.key = S2.key S1 [Rows 100], S2 [Rows 100] *SELECT FROM WHERE 23 key value key value tuple
  24. 24. S1.key = S2.keyWHERE Window Operators *SELECT ]]S1 [Rows 100], S2 [Rows 100]FROM 24 key value key value window w w
  25. 25. S1 [Rows 100], S2 [Rows 100]FROM Join Operator *SELECT S1.key = S2.keyWHERE 25 w w key value key value join
  26. 26. *SELECT S1 [Rows 100], S2 [Rows 100]FROM S1.key = S2.keyWHERE Overall Query Plan 26 w w key value key value
  27. 27. HANDSHAKE JOIN J. Teubner and R. Müller, SIGMOD’11
  28. 28. Handshake Join basic idea advantage streams flow in opposite direction highly parallel evaluation input stream 1 input stream 2 window 28
  29. 29. Handshake Join 29 window of stream 2 old tuple new tuple new tuple old tuple window of stream 1
  30. 30. Handshake Join|Parallelization 30 Processor1 Processor2 divided into two sub-windows
  31. 31. Handshake Join|Parallelization 31 Processor1 Processor2 Processor3 divided into three sub-windows
  32. 32. Naïve IMPLEMENTATION
  33. 33. Baseline Implementation Join Core Join Core Join Core Join Core 33 dedicated processor for join operation
  34. 34. Baseline Implementation Join Core Join Core Join Core Join Core buffer 34 results are stored in these buffers
  35. 35. Baseline Implementation Merger MergerMerger Join Core Join Core Join Core Join Core 35 buffer MergingNetwork
  36. 36. Output Data Flow Merger MergerMerger Join Core Join Core Join Core Join Core 36 3 2 1
  37. 37. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 37 Join Core 3 2 1
  38. 38. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 38 Join Core
  39. 39. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 39 Join Core
  40. 40. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 40 Join Core in case of 64cores only 5% of usage
  41. 41. PROPOSED IMPLEMENTATION Adaptive Merging Network
  42. 42. Proposed Implementation Join Core Join Core Join Core Join Core 42
  43. 43. Proposed Implementation Join Core Join Core Join Core Join Core 43 Here, no buffering is required! NO buffer
  44. 44. Proposed Implementation Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 44
  45. 45. Proposed Implementation Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 45 buffer Now, results are stored in these buffers!
  46. 46. Proposed Implementation Merger MergerMerger Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 46 NO buffer
  47. 47. Output Data Flow 47 Merger MergerMerger Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 1
  48. 48. 48 Merger MergerMerger Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 2
  49. 49. Merger MergerMerger Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 49 21 3 4
  50. 50. Merger MergerMerger Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 50 21 3 4up to 100% utilization
  51. 51. Performance|Proposed (64 join cores) 0 0.5 1 1.5 2 2.5 3 3.5 10 20 30 40 50 60 70 80 90 100 スループット[100万タプル/秒] Match rate [%] Throughput[1Mtuples/sec] 51 nested loop baseline proposed
  52. 52. Falcon - UDP-RX - Window join (64-cores) Performance Monitor 52 Basic: 6.7 millions of tuples per second Proposal: 14.6 millions of tuples per second
  53. 53. Wire-Speed Implementation of Sliding-Window Aggregate Operator over Out-of-Order Data Streams Int’l Symposium on Embedded Multicore/Many-core SoCs Sept. 26– 28, 2013, Tokyo Yasin Oge∗, Masato Yoshimi ∗, Takefumi Miyoshi†, Hideyuki Kawashima‡, Hidetsugu Irie ∗, and Tsutomu Yoshinaga∗ ∗Univ. of Electro-Communications †e-trees.Japan, Inc. ‡Univ. of Tsukuba
  54. 54. 3 Research Topics • Falcon – In-DSMS analytics system • Window Operators – Join operator over data streams • CryptStream – Privacy preserving stream data processing 54
  55. 55. A Security aware Stream Data Processing Scheme on the Cloud Joint work with Katsuhiro TOMIYAMA Introduced at CloudDB’11
  56. 56. Related work:CryptDB • CryptDB [R. A. Popa, et al. SOSP’11] – Realizes relational operators over encrypted data on traditional relational RDBMS #Our research goal is to achieve it on an SPE – Encrypts each value of the data stored in the DBMS • Uses more than one type of encryption having different characteristics ⇒Three kinds of cipher are generated for each value of one plain value Trusted area Untrusted area val 80 val-DET val-OPE val-HOM DET(80) OPE(80) HOM(80) Encrypted value Plain value Database proxy UDF DECRYPT_RND DECRYPT_DET … Client DBMS 56
  57. 57. Ciphers of CryptDB/CryptStream • DET (Deterministic) – Be able to check the equality of two encrypted values. – 𝑥 = 𝑦 ⇔ 𝐷𝐷𝐷 𝐾 𝑥 = 𝐷𝐷𝐷 𝐾 𝑦 • OPE (Order-preserving) – Be able to check the inequality of two encrypted values. – 𝑥 < 𝑦 ⇔ 𝑂𝑂𝑂 𝐾 𝑥 < 𝑂𝑂𝑂 𝐾 𝑦 • HOM (Homomorphic) – Be able to execute addition operation over two encrypted values. – 𝐻𝐻𝐻 𝐾 𝑥 ∙ 𝐻𝐻𝐻 𝐾 𝑦 = 𝐻𝐻𝐻 𝐾(𝑥 + 𝑦) 57
  58. 58. Scheme of CryptStream Encryption module Encryption module Encryption module Trusted area Trusted area Trusted area Public cloud (Untrusted area) Trusted area Decryption module ResultResult Stream processing engine 2012/10/29 Client
  59. 59. • The side effect of encryption:Increase Tuple Size – Generating three kinds of cipher • Already Implemented onto Falcon • Performance improvement is left on future work – Partially resolved by our work at CloudDB’11. id temp 1 32 id-DET id-OPE id-HOM DET(1) OPE(1) HOM(1) temp-DET temp-OPE temp-HOM DET(32) OPE(32) HOM(32) Encrypted stream data processing scheme 59
  60. 60. 3 Research Topics • Falcon – In-DSMS analytics system • Window Operators – Join operator over data streams – Aggregate operator over data streams • Crypt stream – Privacy preserving stream data processing 60 These 3 researches are in progress...
  61. 61. Summary
  62. 62. Relational stream Norikra Online Data Mining & Machine Learning EsperPuma Complex event processing Jubatus MillWheelSystem S Window aggregate SASE Cayuga Continual query & Window S4 Window join CPD Online LDA Tuple stream Privacy PreservationCryptDB GPGPU Intel MICFPGA TileraEncryption Privacy Accelerator ML&DM SQL NoSQL Kafka STORM MLBase Incr. LOCI Spring (DTW) 62
  63. 63. Frontiers for Accelerators SQL Types of relational operator are limited. New techs are created everyday ! 63 NoSQL, Privacy, DM&ML Blue ocean, many chances !Red ocean, but big return ! An FPGA memcached appliance S. Chalamalasetti, et al, FPGA'13 Deep Learning with COTS HPC A. Coates, et al, ICML’13 Netezza Exadata
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×