
Talk at International Workshop on Streaming Media Delivery and Management Systems

Published in: Technology, Education


  1. 1. A HIGH THROUGHPUT COMPLEX EVENT DETECTION TECHNIQUE WITH BULK EVALUATION Naotaka Nishimura (University of Tsukuba) Hideyuki Kawashima (University of Tsukuba) Hiroyuki Kitagawa (University of Tsukuba)
  2. 2. Outline Background Related work (SASE) State of the art Chance for further improvement Proposal: Bulk evaluation Extension of SASE Evaluation Conclusions and Future work
  3. 3. Big Data Streams: Volume, Velocity, Variety, Veracity, Value. Social network: Facebook, 600 TB in a day (VLDB’13 Keynote). Monitoring system: Cisco CRS-3, 322 Tbps. Science: LHC, 15 PB / year; LSST, 20 TB / day.
  4. 4. Quick Review: Data Stream Management System (DSMS). Example question: how many packets arrived for port 80 in the last minute? Query Q1: SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80. (Figure: Q1 is registered in the DSMS, which outputs the result 20.) Relational schema of relation eth0: Destination IP, Source IP, Destination Port, Source Port, Interface (e.g. eth0), Length, Version (e.g. IPv4), Payload.
  5. 5. SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80. (Figure: Data → Input adapter → operator tree w → σ → α inside the DSMS → Output adapter → Result → Users/Apps.) The SQL query is translated into an operator tree, and the tree is evaluated whenever data arrives. Operators are based on those of relational databases. w (Window): cuts a finite relation out of a stream. σ (Selection): filters tuples. α (Aggregation): e.g. AVG, MIN, MAX. CEP (complex event processing).
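The operator tree on this slide (window, then selection, then aggregation) can be sketched in a few lines of Python. This is only an illustration of the idea, not the DSMS from the talk: the function name `make_count_query`, the dict-shaped tuples, and the assumption that timestamps arrive in non-decreasing order are all hypothetical choices made for the example.

```python
from collections import deque


def make_count_query(window_seconds, predicate):
    """Build a continuous COUNT(*) query over a time-based sliding window.

    Assumption: timestamps are in seconds and arrive in non-decreasing order.
    """
    window = deque()  # (timestamp, tuple) pairs currently inside the window

    def on_arrival(timestamp, tup):
        # w (Window): add the new tuple, then evict tuples that have expired.
        window.append((timestamp, tup))
        while window[0][0] <= timestamp - window_seconds:
            window.popleft()
        # sigma (Selection): keep only tuples that satisfy the predicate.
        selected = [t for _, t in window if predicate(t)]
        # alpha (Aggregation): COUNT(*) over the selected tuples.
        return len(selected)

    return on_arrival


# Roughly: SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80
query = make_count_query(60, lambda t: t["port"] == 80)
for ts, port in [(0, 80), (10, 22), (30, 80), (65, 80)]:
    print(ts, query(ts, {"port": port}))
```

The whole tree is re-evaluated on every arrival, which mirrors the "on arrival of data, tree is evaluated" statement; a real engine would maintain the count incrementally.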
  6. 6. Complex Event Processing (CEP) Detect a certain pattern from input stream data Stream Data A1 A2 B3 C4 E5 D6 D7 … Query Pattern:A→B→D Pattern occurrences (sequences of events specified by user) A1→B3→D6 A2→B3→D6 A1→B3→D7 A2→B3→D7 …
  7. 7. Complex Event Processing (CEP) A case for order management in a restaurant. Detect a guest who passed entrance and took a seat. Pattern: Entrance→Seat RFID Place Seat2 Seat3 Floor Entrance Seat6 Seat5 9:54:11 xx Toilet Seat4 Entrance 10:10:01 xx 10:10:31 yy Floor Seat1 TagID Entrance Toilet Time 10:10:31 yy Seat5 10:11:11 yy A pattern occurrence is constructed by 2
  8. 8. Outline Background Related work (SASE) Proposal Evaluation Conclusions and Future work
  9. 9. SASE [1] Overview (1/2). SASE detects specified patterns using an NFA (nondeterministic finite automaton). NFA (quick review): a finite automaton that can be in multiple states at the same time. A finite automaton moves from its current state to the next state according to the input symbol. It consists of an initial state, accepting states, a state set, input symbols, and a transition function. Ex) an NFA that detects A→B→D. Self-transition: a self-loop that is taken on every event. [1]: High-Performance Complex Event Processing over Streams, ACM SIGMOD 2006
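The NFA review above can be illustrated with a small simulation that tracks the set of active states. This is a sketch under assumptions, not SASE itself: the function name `nfa_accepting_events` is hypothetical, event types are taken from the first character of the event name, and every non-accepting state (including state 0) is given a self-loop so that it stays active.

```python
def nfa_accepting_events(stream, pattern):
    """Simulate an NFA for a sequence pattern and report accepting events.

    State i means "the first i symbols of the pattern have been matched".
    Returns the events on which the accepting state is reached.
    """
    accept = len(pattern)
    active = {0}  # the NFA may be in several states at once
    hits = []
    for event in stream:
        event_type = event[0]  # assumption: first character is the type
        next_active = set(active)  # self-loops: every active state stays active
        for state in active:
            if pattern[state] == event_type:
                if state + 1 == accept:
                    hits.append(event)  # accepting state reached
                else:
                    next_active.add(state + 1)
        active = next_active
    return hits


print(nfa_accepting_events(["A1", "A2", "B3", "C4", "E5", "D6", "D7"], "ABD"))
```

The simulation reports only that the pattern completed at D6 and again at D7; it does not remember which A and B events led there.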
  10. 10. SASE Overview (2/2). Problem of the NFA: an NFA can detect specified patterns, but it does not produce pattern occurrences (the sequences of input events that reached the accepting state). SASE uses a stack structure, the AIS (Active Instance Stack), to output pattern occurrences. One AIS is prepared for each state. (Figure: NFA with states 0 to 3, transitions A, B, D, self-loops marked *, and an AIS attached to each of the three non-initial states.)
  11. 11. Behavior of SASE (1/3). Translate the query pattern into an NFA. (Figure: pattern A→B→D becomes an NFA with states 0 to 3, transitions A, B, D, and self-loops marked *.)
  12. 12. Behavior of SASE (2/3). Prepare an AIS for each state of the NFA. Create a link when an event is pushed. Event arrival sequence over time t: a1 c2 b3 a4 d5. (Figure: the AIS of state 1 holds a1 and a4, the AIS of state 2 holds b3, and the AIS of state 3 holds d5.)
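  13. 13. Behavior of SASE (3/3). When the accepting state is reached, create a pattern occurrence by following the link information. (Figure: the AIS of state 1 holds a1 and a4, the AIS of state 2 holds b3, and the AIS of state 3 holds d5.) Result: a1 b3 d5.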
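The three "Behavior of SASE" slides can be sketched as follows. This is a simplified reading of the slides, not the implementation from the SASE paper: the class name `Sase` is hypothetical, the link is stored as the number of instances in the previous stack at push time, event types come from the first character of the event name, the pattern is assumed to have at least two symbols, and windows and predicates are left out.

```python
class Sase:
    """Sequence detection with one Active Instance Stack (AIS) per state."""

    def __init__(self, pattern):
        self.pattern = pattern
        # stacks[i] is the AIS of the state reached after matching pattern[i].
        # Each entry is (event, link), where link is the size of the previous
        # stack when the event was pushed.
        self.stacks = [[] for _ in pattern]

    def on_event(self, event):
        results = []
        last = len(self.pattern) - 1
        # Walk states from last to first so an event never links to itself.
        for i in reversed(range(len(self.pattern))):
            if self.pattern[i] != event[0]:
                continue
            if i == 0:
                self.stacks[0].append((event, 0))
                continue
            link = len(self.stacks[i - 1])
            if link == 0:
                continue  # no earlier instance to link to
            if i == last:
                # Accepting state: build occurrences right away.
                # The accepting event itself is not kept.
                results.extend(self._construct(i - 1, link, (event,)))
            else:
                self.stacks[i].append((event, link))
        return results

    def _construct(self, stack_index, limit, suffix):
        # Follow links backwards, extending the sequence on the left.
        for event, link in self.stacks[stack_index][:limit]:
            sequence = (event,) + suffix
            if stack_index == 0:
                yield sequence
            else:
                yield from self._construct(stack_index - 1, link, sequence)


sase = Sase("abd")
for e in ["a1", "c2", "b3", "a4", "d5"]:
    for occurrence in sase.on_event(e):
        print(occurrence)
```

On the event sequence from the slides, a4 is pushed after b3, so b3 links only to a1 and the single result is a1 b3 d5.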
  14. 14. Problem of SASE we found: duplicate generation (e.g. b3 → a1). Example: after b6, d7, a8, d9 arrive, result generation for d7 yields (a1 b3 d7), (a1 b6 d7), (a4 b6 d7), and result generation for d9 yields (a1 b3 d9), (a1 b6 d9), (a4 b6 d9), so the same prefixes are constructed twice. IDEA: if we can evaluate d7 and d9 in a lump, the cost of constructing a1-b3 should be reduced (2 to 1).
  15. 15. Outline Background Related work (SASE) Proposal: Bulk evaluation Extension of SASE Evaluation Conclusions and Future work
  16. 16. Concept: Bulk Evaluation. Event sequence over time t: a1 c2 b3 a4 d5 b6 d7 a8 d9 b10 d11 d12 b13. [SASE] Generate Result is invoked five times along this sequence. [Proposal] Generate Result is invoked only twice.
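  17. 17. Behavior of Proposal (1/3). Create a link when an event is pushed to an AIS. Unlike SASE, D events are kept. Event sequence over time t: a1 c2 b3 a4 d5 b6 d7 a8 d9. (Figure: the AIS of state 1 holds a1, a4, a8; the AIS of state 2 holds b3, b6; the AIS of state 3 holds d5, d7, d9.)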
  18. 18. Behavior of Proposal (2/3). Create clusters on the final AIS. (Figure: the AIS contents before and after clustering the D events d5, d7, d9 held in the AIS of state 3.)
  19. 19. Behavior of Proposal (3/3). Create pattern occurrences in bulk. The result for d9 is made from the result for d7. Results: (a1 b3 d5); (a1 b3 d7), (a1 b6 d7), (a4 b6 d7); (a1 b3 d9), (a1 b6 d9), (a4 b6 d9).
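One plausible reading of the three "Behavior of Proposal" slides is sketched below. The slides do not spell out how clusters are formed or when bulk generation is triggered, so two points here are assumptions: D events are clustered when they carry the same link into the previous AIS, and results are produced by an explicit `flush()` call. The class name `BulkSase` is hypothetical, event types come from the first character of the event name, and windows are left out.

```python
class BulkSase:
    """AIS-based sequence detection that defers result generation."""

    def __init__(self, pattern):
        self.pattern = pattern
        # One AIS per non-initial state; entries are (event, link).
        self.stacks = [[] for _ in pattern]

    def on_event(self, event):
        for i in reversed(range(len(self.pattern))):
            if self.pattern[i] != event[0]:
                continue
            link = len(self.stacks[i - 1]) if i > 0 else 0
            if i > 0 and link == 0:
                continue  # no earlier instance to link to
            # Unlike the per-event scheme, final (D) events are kept as well.
            self.stacks[i].append((event, link))

    def flush(self):
        """Generate all pending pattern occurrences in bulk."""
        last = len(self.pattern) - 1
        # Assumption: final events with the same link form one cluster.
        clusters = {}
        for event, link in self.stacks[last]:
            clusters.setdefault(link, []).append(event)
        results = []
        for link, finals in clusters.items():
            # Prefixes are constructed once per cluster and then reused.
            prefixes = list(self._prefixes(last - 1, link))
            for final in finals:
                for prefix in prefixes:
                    results.append(prefix + (final,))
        self.stacks[last].clear()
        return results

    def _prefixes(self, stack_index, limit):
        for event, link in self.stacks[stack_index][:limit]:
            if stack_index == 0:
                yield (event,)
            else:
                for head in self._prefixes(stack_index - 1, link):
                    yield head + (event,)


bulk = BulkSase("abd")
for e in ["a1", "c2", "b3", "a4", "d5", "b6", "d7", "a8", "d9"]:
    bulk.on_event(e)
for occurrence in bulk.flush():
    print(occurrence)
```

Here d7 and d9 end up in the same cluster, so the prefixes a1 b3, a1 b6, and a4 b6 are built once and shared by both, which corresponds to the "2 to 1" saving described earlier in the talk.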
  20. 20. Outline Background Related work Proposal Evaluation Conclusions and Future work
  21. 21. Environment for Experiment. OS: Windows XP. RAM: 3 GB. CPU: Intel Core 2 Duo E8400 (3 GHz). Language: Java (JRE 1.7.0_4).
  22. 22. Result of Experiment. Pattern: A→B→D. Throughput improved by a factor of up to 5.24.
  23. 23. Outline Background Related work Proposal Evaluation Conclusions and Future work
  24. 24. Conclusions and Future Work. Conclusions: SASE left room for further improvement in throughput. The bulk evaluation scheme improved throughput, by a factor of 5.24 in the best case. Future work: implementing the proposal in Falcon.
  25. 25. (Figure: map of related topics and systems; labels as extracted.) CryptDB Privacy Preservation Encryption FPGA Privacy ML&DM Jubatus Online LDA CPD SQL Norikra System S Borealis Puma MADLib @UCB Spring (DTW) Data Mining & Machine Learning Esper Kafka SASE STORM Cayuga Window join Bismarck Online @Stanford Intel MIC NoSQL MLBase Oracle-R Incr. LOCI Tilera Accelerator Falcon GPGPU Window aggregate Relational stream Continual query & Window Complex event processing Tuple stream S4
  26. 26. Falcon Performance Monitor: UDP-RX, window join (64 cores). Basic: 6.7 million tuples per second. Proposal: 14.6 million tuples per second.