Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Efficiently Handling Streams from Millions of Sesors (@KIT - 2nd BMBF Big Data All Hands Meeting and 2nd Smart Data Innovation Conference)

148 views

Published on

The 2nd Big Data All Hands Meeting (BDAHM) is held in conjunction with the 2nd Smart Data Innovation Conference (SDIC).

This talk about "Efficiently Handling Streams from Millions of Sensors" provides an introduction to three recent research works dealing with massive sensor data streams in the age of the Internet of Things.

We present three publications and show in a big picture how they enable dealing with potentially billions of IoT sensors.

1)Real-time sensor data enables diverse applications such as smart metering, traffic monitoring, and sport analysis. In the Internet of Things, billions of sensor nodes form a sensor cloud and offer data streams to analysis systems. However, it is impossible to transfer all available data with maximal frequencies to all applications. Therefore, we need to tailor data streams to the demand of applications. We contribute a technique that optimizes communication costs while maintaining the desired accuracy. Our technique schedules reads across huge amounts of sensors based on the data-demands of a huge amount of concurrent queries. We introduce user-defined sampling functions that define the data-demand of queries and facilitate various adaptive sampling techniques, which decrease the amount of transferred data. Moreover, we share sensor reads and data transfers among queries. Our experiments with real-world data show that our approach saves up to 87% in data transmissions.

2) We present Cutty, an innovative technique for the efficient aggregation of user-defined windows over data streams. While the aggregation of periodic sliding and tumbling windows was extensively studied in the past, little to no work was done on optimizing the aggregation of common, non-periodic windows. Typical examples of non-periodic windows are punctuation windows and sessions which can implement complex business logic. Cutty performs aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. Cutty outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs.

3) We present I², an interactive development environment for real-time analysis pipelines, which is based on Apache Flink and Apache Zeppelin. The sheer amount of available streaming data frequently makes it impossible to visualize all data points at the same time. I² coordinates running cluster applications and corresponding visualizations such that only the currently depicted data points are processed in Flink and transferred towards the front end. We show how Flink jobs can adapt to changed visualization properties at runtime to allow interactive data exploration on high bandwidth data streams. Moreover, we present a data reduction technique which minimizes data transfer while providing loss free time-series plots.

Published in: Data & Analytics
  • Be the first to comment

Efficiently Handling Streams from Millions of Sesors (@KIT - 2nd BMBF Big Data All Hands Meeting and 2nd Smart Data Innovation Conference)

  1. 1. 2nd BMBF Big Data All Hands Meeting and 2nd Smart Data Innovation Conference Karlsruhe , October 11.-12., 2017 Presenting at Efficiently Handling Streams from Millions of Sensors Jonas Traub – TU Berlin / DFKI 1
  2. 2. The Growth of the Internet of Things Gartner says 6.4 billion connected "Things" will be in use in 2016 and more than 20 billion in 2020. Year # Devices (in billions) Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 2
  3. 3. Goal Provide real-time insights based on IoT data. Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 3
  4. 4. Problem • Billions of devices provide real-time data • Result: Vast amount of data streams Heavy Network Utilization Scalability Challenges Increasing Latencies Financial Costs Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 4
  5. 5. Solution Produce and process data streams based on the data demand of applications. Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 5
  6. 6. State of the Art Approach Data Stream Production with Periodic Sampling Major Challenges: • Oversampling • Missing Adaptivity Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 6
  7. 7. Solution On-Demand Data Streaming from Sensor Nodes Optimized On-Demand Data Streaming from Sensor Nodes Jonas Traub – TU Berlin / DFKI – Efficently Handling Streams from Millions of Sensors 7
  8. 8. State of the Art Approach Provide all Data to Front-End Applications Optimized On-Demand Data Streaming from Sensor Nodes Major Challenge: • Front End Overload Jonas Traub – TU Berlin / DFKI – Efficently Handling Streams from Millions of Sensors 8
  9. 9. Solution Adaptive Data Reduction with Streaming Engines Optimized On-Demand Data Streaming from Sensor Nodes I²: Interactive Real-Time Visualization for Streaming Data Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 9
  10. 10. Solution Adaptive Data Reduction with Streaming Engines Optimized On-Demand Data Streaming from Sensor Nodes I²: Interactive Real-Time Visualization for Streaming Data Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 10
  11. 11. Solution Efficient Processing of user-defined Windows Optimized On-Demand Data Streaming from Sensor Nodes I²: Interactive Real-Time Visualization for Streaming Data Cutty: Aggregate Sharing for User-Defined Windows Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 11
  12. 12. Publications Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors 12
  13. 13. Optimized On-Demand Data Streaming from Sensor Nodes Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl Santa Clara, California, September 25-27, 2017 13
  14. 14. Architecture Overview 14 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  15. 15. Architecture Overview 14 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  16. 16. Architecture Overview 14 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  17. 17. Architecture Overview 14 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  18. 18. Architecture Overview 14 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  19. 19. User-Defined Sampling Functions 19 • Provide an abstraction to define the data demand of applications. • Upon a sensor read, request the next sensor read. Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  20. 20. User-Defined Sampling Functions 20 • Provide an abstraction to define the data demand of applications. • Upon a sensor read, request the next sensor read. • Make read time tolerances explicit. Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  21. 21. User-Defined Sampling Functions 21 Enable adaptive sampling techniques to reduce data transmission e.g., Adam [Trihinas ‘15], FAST [Fan ‘14], L-SIP [Gaura ’13] Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  22. 22. Sensor Read Fusion 22 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  23. 23. Sensor Read Fusion 23 1) Minimize Sensor Reads and Data Transfer: Latest possible read time 2) Optimize Sensor Read Times: ● Check the paper for all details on the read time optimizer! Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  24. 24. 24 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  25. 25. Local Filtering 25 Jonas Traub et al. – Optimized On-Demand Data Streaming from Sensor Nodes – ACM SoCC 2017
  26. 26. Optimized On-Demand Data Streaming from Sensor Nodes Wrap-Up: Tailor Data Streams to the Demand of Applications • Define data demand: User-Defined Sampling Functions • Schedule sensor reads and data transfer on-demand • Optimize read times globally - for all users and queries Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl 26
  27. 27. Cutty: Aggregate Sharing for User-Defined Windows Paris Cabone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl 27
  28. 28. Streaming Window Aggragation Paris Carbone et al. – Cutty: Aggregate Sharing for User-Defined Windows – CIKM 2017 28
  29. 29. Stream Slicing Paris Carbone et al. – Cutty: Aggregate Sharing for User-Defined Windows – CIKM 2017 29
  30. 30. Applicability of Stream Slicing Paris Carbone et al. – Cutty: Aggregate Sharing for User-Defined Windows – CIKM 2017 30
  31. 31. Yes, we can do better! Paris Carbone et al. – Cutty: Aggregate Sharing for User-Defined Windows – CIKM 2017 31
  32. 32. Cutty Overview Paris Carbone et al. – Cutty: Aggregate Sharing for User-Defined Windows – CIKM 2017 32
  33. 33. Cutty: Aggregate Sharing for User-Defined Windows Wrap-Up: Enable Stream Slicing beyond Simple Tumbling and Sliding Windows • Cutty enables Stream Slicing for a broad class of windows • Cutty combines Stream Slicing, On-the-fly Aggregation, Aggregate Sharing, and Aggregate Trees Paris Cabone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl 33
  34. 34. I²: Interactive Real-Time Visualization for Streaming Data Jonas Traub, Nikolaas Steenbergen, Philipp Grulich, Tilmann Rabl, Volker Markl 34
  35. 35. Architecture Overview Jonas Traub et al. – I²: Interactive Real-Time Visualization for Streaming Data – EDBT 2017 35
  36. 36. Check out our Flink Forward Talk youtube.com/watch?v=JNbq239JkK4 36
  37. 37. The Big Picture Optimized On-Demand Data Streaming from Sensor Nodes Traub et al.; ACM SoCC’17 I²: Interactive Real-Time Visualization for Streaming Data Traub et al.; EDBT’17 Cutty: Aggregate Sharing for User-Defined Windows Carbone et al.; CIKM’16 Jonas Traub – TU Berlin / DFKI – Efficiently Handling Streams from Millions of Sensors

×