On-Demand Data Streaming
from Sensor Nodes
(ACM SoCC 2017)
and
A quick overview of Apache Flink
Presentation at Sep. 30, 2017
University of California, Santa Barbara
About me
• Researcher and PhD candidate at
– Technische Universität Berlin (DIMA)
– German Research Center for Artificial Intelligence (DFKI) / (IAM)
• Working with Volker Markl
• Before
– Master’s degree in Computer Science (KTH Stockholm and TU Berlin)
– Bachelor’s degree in Applied Computer Science (DHBW Stuttgart)
– Four years at IBM in Germany and the USA
Jonas Traub
jon@s-traub.com
Jonas.traub@tu-berlin.de
Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Optimized On-Demand Data
Streaming from Sensor Nodes
Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
Extended Talk for . .
Santa Clara, California,
September 25-27, 2017
The Sensor Cloud
Real-time insights
Billions of sensor nodes form a sensor cloud
and provide data streams to analysis systems.
The Sensor Cloud – Problems
• Streaming all data from billions of sensors to all applications
with maximal frequencies is impossible.
• Increasing data rates require expensive system scale-out.
The Sensor Cloud – Solutions
Tailor Data Streams to the Demand of Applications
• Provide an abstraction to define the data demand of applications:
User-Defined Sampling Functions (UDSFs)
• Optimize communication costs while maintaining the result accuracy:
Read-Time Optimization
• Share sensor reads and data transfer among users and queries:
Multi-Query / Multi-User Optimization
A Motivating Example
Different Data Demands:
• Query 1 adaptively increases sampling rates when accelerating or braking.
• Query 2 requires a sample at least every 20 meters.
• Query 3 requires a sample at least every 0.3s.
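The demands of Query 2 and Query 3 can be compared as read deadlines. The following is a minimal sketch, assuming the car's current speed is known and stays roughly constant until the next read. The object and function names are hypothetical and not taken from the paper.

```scala
object DemandSketch {
  // Query 2: a sample at least every 20 meters.
  // At speed v (in m/s) the car covers 20 m in 20 / v seconds.
  def distanceDeadline(now: Double, speedMps: Double, maxMeters: Double = 20.0): Double =
    if (speedMps <= 0.0) Double.PositiveInfinity // standing still: no distance-based deadline
    else now + maxMeters / speedMps

  // Query 3: a sample at least every 0.3 s, independent of speed.
  def timeDeadline(now: Double, maxSeconds: Double = 0.3): Double =
    now + maxSeconds

  // One sensor read serves both queries if it happens before the earlier deadline.
  def sharedDeadline(now: Double, speedMps: Double): Double =
    math.min(distanceDeadline(now, speedMps), timeDeadline(now))
}
```

At 50 m/s the car needs 0.4 s for 20 m, so the 0.3 s demand of Query 3 is the tighter one. At 100 m/s it needs only 0.2 s, so Query 2 determines the next read.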
A Motivating Example - Evaluation
[Chart annotations: -57%; -72%; "1/3 because 3 values per tuple"]
Architecture Overview
Sensor Read Scheduling
User-Defined Sampling Functions
Input:
Sensor read time and value
Output:
Next Sensor Read Request
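The input/output contract above can be sketched as a small interface. This is an illustration under assumptions: the shape of `ReadRequest` (a desired time plus a latest acceptable time) and all names and parameters are hypothetical, and the adaptive rule is a simple stand-in rather than one of the published algorithms.

```scala
object UdsfSketch {
  // Hypothetical request shape: a desired read time plus the latest acceptable time.
  final case class ReadRequest(desired: Double, latest: Double)

  // A UDSF maps the most recent sensor read (time, value) to the next read request.
  trait SamplingFunction {
    def next(readTime: Double, value: Double): ReadRequest
  }

  // Adaptive example: request reads at a shorter interval while the value changes quickly.
  final class AdaptiveSampling(slow: Double, fast: Double, threshold: Double, tolerance: Double)
      extends SamplingFunction {
    private var last: Option[(Double, Double)] = None // previous (time, value)

    def next(readTime: Double, value: Double): ReadRequest = {
      // Rate of change since the previous read; 0.0 for the very first read.
      val rate = last match {
        case Some((t, v)) if readTime > t => math.abs(value - v) / (readTime - t)
        case _                            => 0.0
      }
      last = Some((readTime, value))
      val interval = if (rate > threshold) fast else slow
      ReadRequest(readTime + interval, readTime + interval + tolerance)
    }
  }
}
```

Keeping the function stateful lets it compare consecutive reads, which is what a rule such as "sample faster when accelerating or braking" needs.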
Enable adaptive sampling techniques to reduce data transmission,
e.g., Adam [Trihinas ’15], FAST [Fan ’14], L-SIP [Gaura ’13]
User-Defined Sampling Functions - Examples
Sensor Read Fusion
1) Minimize Sensor Reads and Data Transfer:
Latest possible read time
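One way to read "latest possible read time" is the classic greedy rule for covering intervals with as few points as possible: process requests by deadline and read at the deadline of the first request that is not yet served. The sketch below follows that reading. It assumes each request accepts any read inside an interval `[earliest, latest]`. The names are hypothetical, and this is not claimed to be the scheduler from the paper.

```scala
object ReadFusionSketch {
  // Hypothetical request shape: any read within [earliest, latest] satisfies the request.
  final case class Request(earliest: Double, latest: Double)

  // Returns the scheduled read times in ascending order.
  def fuse(requests: Seq[Request]): List[Double] = {
    val byDeadline = requests.sortWith(_.latest < _.latest)
    byDeadline.foldLeft(List.empty[Double]) { (reads, r) =>
      reads.headOption match {
        // The most recent read is not later than r.latest (requests are sorted by deadline),
        // so it serves r whenever it is not earlier than r.earliest.
        case Some(t) if r.earliest <= t => reads
        // Otherwise schedule a new read as late as r allows.
        case _                          => r.latest :: reads
      }
    }.reverse
  }
}
```

For the requests `(0.0, 1.0)`, `(0.5, 1.2)`, `(0.9, 2.0)`, `(1.5, 2.5)` and `(3.0, 3.5)`, the first three share one read at 1.0, and the remaining two need reads at 2.5 and 3.5.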
Read Time Optimization
2) Optimize Sensor Read Times:
● Minimize the penalty while executing only the minimum number of sensor reads
● Challenge: assign read requests to sensor reads
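Once requests are assigned to a read, picking the read time is a small optimization problem. The sketch below takes the assignment as given and does not address the assignment challenge itself. It assumes a penalty equal to the summed absolute deviation from each request's desired time, which is an assumption made for illustration; the slides do not state the penalty function. Under that assumption the best time is the median of the desired times, clamped to the interval in which all requests overlap.

```scala
object ReadTimeSketch {
  // Hypothetical request shape: desired time inside an allowed interval [earliest, latest].
  final case class Request(earliest: Double, desired: Double, latest: Double)

  // Assumed penalty: summed absolute deviation from the desired read times.
  def penalty(t: Double, group: Seq[Request]): Double =
    group.map(r => math.abs(t - r.desired)).sum

  // Best single read time for a group, or None if no single read can serve all requests.
  def bestReadTime(group: Seq[Request]): Option[Double] =
    if (group.isEmpty) None
    else {
      val lo = group.map(_.earliest).reduce((a, b) => math.max(a, b))
      val hi = group.map(_.latest).reduce((a, b) => math.min(a, b))
      if (lo > hi) None
      else {
        val desired = group.map(_.desired).sortWith(_ < _)
        val median = desired((desired.size - 1) / 2) // lower median
        Some(math.max(lo, math.min(hi, median)))     // clamp into the shared interval
      }
    }
}
```

For three requests with desired times 0.8, 0.9 and 1.6 and a shared interval of `[0.9, 1.0]`, the sketch picks 0.9 with a penalty of 0.8, while reading at the latest possible time 1.0 would cost 0.9. The number of reads stays the same; only the read time moves.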
Assigning Read Requests to Sensor Reads
Postpone
Assign to next Read
Local Filtering
● Enable adaptive filtering in combination with adaptive sampling
● Enable model-driven data acquisition
Evaluation
● Replay sensor data
- from a football match [DEBS Grand Challenge ’13]
- Formula 1 telemetry data
● Random UDSFs:
- Reads follow a Poisson process (also simulates load peaks)
- On average 1 read per query per second
- Exponentially distributed read time tolerance
  - high probability for small tolerances
  - small probability for large tolerances
- On average 0.04s read time tolerance
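A workload with these properties can be generated as follows. This is a minimal sketch, not the authors' generator: it draws exponentially distributed gaps between reads (which yields a Poisson process) and exponentially distributed tolerances. All names, the seed parameter, and the `Request` shape are hypothetical.

```scala
import scala.util.Random

object WorkloadSketch {
  // Hypothetical request shape: desired read time and its read time tolerance, in seconds.
  final case class Request(desired: Double, tolerance: Double)

  // Inverse-transform sampling of an exponential distribution with the given mean.
  def exponential(rng: Random, mean: Double): Double =
    -mean * math.log(1.0 - rng.nextDouble())

  // n requests of one query: Poisson arrivals and exponentially distributed tolerances.
  def generate(n: Int, readsPerSecond: Double, meanTolerance: Double, seed: Long): Vector[Request] = {
    val rng = new Random(seed)
    val gaps = Vector.fill(n)(exponential(rng, 1.0 / readsPerSecond))
    val times = gaps.scanLeft(0.0)(_ + _).tail // cumulative sums give the arrival times
    times.map(t => Request(t, exponential(rng, meanTolerance)))
  }
}
```

Calling `WorkloadSketch.generate(10000, 1.0, 0.04, 42L)` gives one query's requests with about one read per second and a mean tolerance of about 0.04 s. Most tolerances are small and a few are large, which matches the exponential shape described above.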
Increasing the number of concurrent queries
• On-Demand scheduling reduces sensor reads and data transfer by up to 87%.
• The # of reads and transfers increases sub-linearly with the # of queries.
• Our read-time optimizer reduces the deviation from desired read times
by up to 69% (preserving the min. # of reads and transfers).
Increasing read time tolerances
Query Prioritization
Slack Robustness of Adaptive Sampling
Wrap-Up:
Tailor Data Streams to the Demand of Applications
• Define data demand: User-Defined Sampling Functions
• Schedule sensor reads and data transfer on-demand
• Optimize read times globally - for all users and queries
A quick overview of Apache Flink
- research summary -
Jonas Traub visiting September 30, 2017
Outline
Apache Flink Primer
• Stratosphere – The origin of Apache Flink
• What is Apache Flink? – Basic System Internals
• The Flink Community
An Apache Flink Research Summary
© Volker Markl
Stratosphere: General Purpose Programming + Database Execution
Draws on Database Technology:
• Relational Algebra
• Declarativity
• Query Optimization
• Robust Out-of-core
Draws on MapReduce Technology:
• Scalability
• User-defined Functions
• Complex Data Types
• Schema on Read
Adds:
• Iterations
• Advanced Dataflows
• General APIs
• Native Streaming
What is Apache Flink?
Apache Flink is an open source platform for scalable batch and stream
data processing.
http://flink.apache.org
A distributed system that you can use to process data
Like a DBMS but not exactly a DBMS
What kind of data?
Data that comes in the form of streams
What kind of processing?
Quite flexible. You can use Java/Scala APIs
similar to programming with Java
collections, the new SQL API, etc.
Distributed: runs on many (1000s of) machines
and hides this complexity from the user
Basic application architecture
• Sources of data (e.g., sensors, logs, …)
• Event log: a replayable log of events with pub/sub functionality
• Processing of events (app state)
• Storage and query systems (query service)
By courtesy of Kostas Tzoumas
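The interplay of a replayable event log and application state can be illustrated with a toy model. This sketch uses plain in-memory Scala and no Flink or messaging API. `EventLog`, `readFrom`, and `process` are hypothetical names chosen for the example.

```scala
object EventLogSketch {
  // Append-only log; a consumer can read from any offset, which makes the log replayable.
  final class EventLog[E] {
    private var events = Vector.empty[E]
    def append(e: E): Long = { events = events :+ e; events.size - 1L } // returns the offset
    def readFrom(offset: Long): Vector[E] = events.drop(offset.toInt)
  }

  // Processing step: fold events into application state (here: a count per key).
  def process(state: Map[String, Long], events: Seq[String]): Map[String, Long] =
    events.foldLeft(state)((s, key) => s.updated(key, s.getOrElse(key, 0L) + 1L))
}
```

Because the log can be re-read from an offset, the application state can be rebuilt either from the beginning or from a saved state plus the events that follow it. Both paths lead to the same result.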
© 2013 Berlin Big Data Center • All Rights Reserved
Technology inside Flink
Program:
    import org.apache.flink.api.scala._

    case class Path(from: Long, to: Long)

    // edges: DataSet[Path], the input graph (defined elsewhere).
    // Transitive closure: extend all known paths by one edge, at most 10 times.
    val tc = edges.iterate(10) { paths: DataSet[Path] =>
      val next = paths
        .join(edges)
        .where("to")
        .equalTo("from") { (path, edge) =>
          Path(path.from, edge.to)
        }
        .union(paths)
        .distinct()
      next
    }
Pre-flight (Client): type extraction stack, cost-based optimizer
Dataflow Graph: DataSource (orders.tbl), Filter, Map, DataSource (lineitem.tbl),
Join (Hybrid Hash: build HT, probe; hash-part [0] on both inputs), GroupRed (sort, forward)
Master: task scheduling, recovery metadata
Workers
deploy operators; track intermediate results
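The transitive-closure program from the "Technology inside Flink" slide can be mirrored on local Scala collections, which shows what each iteration computes without needing a cluster. This is a sketch for illustration only: it uses `Set` instead of `DataSet`, so the `distinct()` step is implicit, and the object and function names are hypothetical.

```scala
object TransitiveClosureSketch {
  final case class Path(from: Long, to: Long)

  // Bounded fixed-point iteration: each round extends every known path by one edge.
  def closure(edges: Set[Path], maxIterations: Int): Set[Path] =
    (1 to maxIterations).foldLeft(edges) { (paths, _) =>
      val extended = for {
        p <- paths
        e <- edges
        if p.to == e.from // the join condition: paths.to == edges.from
      } yield Path(p.from, e.to)
      extended union paths // keep the old paths; Set semantics remove duplicates
    }
}
```

For the chain 1 → 2 → 3 → 4, the first round adds the paths 1 → 3 and 2 → 4, the second adds 1 → 4, and further rounds change nothing.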
Flink community
[Charts: Number of Contributors, Stars on GitHub, Forks on GitHub; x-axis: Feb 15, Dec 15, Dec 16]
By courtesy of Kostas Tzoumas
Project Statistics (Updated: Sep 29, 2017)
Companies Using Flink
Apache Flink - Related Publications
System Paper 2015:
System Paper 2014:
State Management (VLDB 2017)
Iterative Processing (VLDB 2012, SIGMOD 2013)
Fault Tolerance
Streaming Window Aggregation
Visualization of Streaming Data
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
 
Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
 
Lessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfLessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdf
 
Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...
Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...
Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...
 
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
 
Rapidoform for Modern Form Building and Insights
Rapidoform for Modern Form Building and InsightsRapidoform for Modern Form Building and Insights
Rapidoform for Modern Form Building and Insights
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
 
[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)
 

JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

  • 1. On-Demand Data Streaming from Sensor Nodes (ACM SoCC 2017) and A quick overview of Apache Flink. Presentation on Sep. 30, 2017, at the University of California, Santa Barbara.
  • 2. About me • Researcher and PhD candidate at – Technische Universität Berlin (DIMA) – German Research Center for Artificial Intelligence (DFKI) / (IAM) • Working with Volker Markl • Before – Master’s degree in Computer Science (KTH Stockholm and TU Berlin) – Bachelor’s degree in Applied Computer Science (DHBW Stuttgart) – Four years at IBM in Germany and the USA. Jonas Traub, jon@s-traub.com, Jonas.traub@tu-berlin.de
  • 3. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 Optimized On-Demand Data Streaming from Sensor Nodes Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl Extended Talk for . . Santa Clara, California, September 25-27, 2017
  • 4-8. The Sensor Cloud. Real-time insights. Billions of sensor nodes form a sensor cloud and provide data streams to analysis systems.
  • 9-11. The Sensor Cloud – Problems. Real-time insights. Billions of sensor nodes form a sensor cloud and provide data streams to analysis systems. Streaming all data from billions of sensors to all applications at maximum frequency is impossible. Increasing data rates require expensive system scale-out.
  • 12-15. The Sensor Cloud – Solutions. Tailor Data Streams to the Demand of Applications: • Provide an abstraction to define the data demand of applications: User-Defined Sampling Functions (UDSFs). • Optimize communication costs while maintaining the result accuracy: Read-Time Optimization. • Share sensor reads and data transfer among users and queries: Multi-Query / Multi-User Optimization.
  • 16-24. A Motivating Example. Different Data Demands: • Query 1 adaptively increases sampling rates when accelerating or braking. • Query 2 requires a sample at least every 20 meters. • Query 3 requires a sample at least every 0.3 s.
  • 25-28. A Motivating Example - Evaluation. [Figure annotations: -57%; -72%; 1/3 because 3 values per tuple]
  • 29-33. Architecture Overview
  • 34. Sensor Read Scheduling
  • 35-37. User-Defined Sampling Functions. Input: sensor read time and value. Output: next sensor read request. UDSFs enable adaptive sampling techniques to reduce data transmission, e.g., Adam [Trihinas ’15], FAST [Fan ’14], L-SIP [Gaura ’13].
  • 38-40. User-Defined Sampling Functions - Examples
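The slides describe a UDSF as a function that takes the latest sensor read (time and value) and returns the next sensor read request. The following is a minimal sketch of that idea, not the paper's API: the names `SensorRead`, `ReadRequest`, `Udsf`, and the three example functions are hypothetical, as is the choice to model a request as an earliest, a desired, and a latest read time. The examples loosely follow the three queries of the motivating example and assume the sensor value is an acceleration (adaptive case) or a speed in m/s (distance case).

```scala
// Hypothetical types: the slides only state the input (read time and value)
// and the output (next sensor read request). Times are in seconds.
final case class SensorRead(time: Double, value: Double)
final case class ReadRequest(earliest: Double, desired: Double, latest: Double)

object Udsf {
  // A UDSF maps the most recent sensor read to the next read request.
  type SamplingFunction = SensorRead => ReadRequest

  // Time-based demand, e.g. "a sample at least every 0.3 s".
  def everySeconds(period: Double): SamplingFunction =
    read => ReadRequest(read.time, read.time + period, read.time + period)

  // Distance-based demand, e.g. "a sample at least every 20 meters".
  // Assumes the value is the current speed in m/s and stays roughly
  // constant until the next read; minSpeed avoids a division by zero.
  def everyMeters(distance: Double, minSpeed: Double = 0.1): SamplingFunction =
    read => {
      val speed = math.max(math.abs(read.value), minSpeed)
      val deadline = read.time + distance / speed
      ReadRequest(read.time, deadline, deadline)
    }

  // Adaptive demand: sample faster while the magnitude of the value
  // (assumed to be an acceleration) exceeds a threshold.
  def adaptive(basePeriod: Double, fastPeriod: Double, threshold: Double): SamplingFunction =
    read => {
      val period = if (math.abs(read.value) > threshold) fastPeriod else basePeriod
      ReadRequest(read.time, read.time + period, read.time + period)
    }
}
```

For instance, `Udsf.everyMeters(20.0)` applied to a read taken at time 0 with a speed of 40 m/s asks for the next read no later than 0.5 s afterwards.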
  • 41-42. Sensor Read Fusion. 1) Minimize Sensor Reads and Data Transfer: latest possible read time.
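One way to read "latest possible read time" is as the classic greedy strategy for covering intervals with as few points as possible: delay each read until the earliest deadline among the pending requests, then let that single read serve every request whose tolerance window contains it. The sketch below works offline on a known set of requests and uses hypothetical names (`ReadRequest`, `ReadFusion.fuse`); the scheduler in the talk operates on a live stream and may differ in detail.

```scala
// Hypothetical request type: a tolerance window plus a desired read time (s).
final case class ReadRequest(earliest: Double, desired: Double, latest: Double)

object ReadFusion {
  // Returns read times such that every request window contains at least one
  // read, using the minimum possible number of reads.
  def fuse(requests: Seq[ReadRequest]): List[Double] =
    requests
      .sortBy(_.latest) // handle the most urgent deadline first
      .foldLeft(List.empty[Double]) { (reads, req) =>
        reads.headOption match {
          // The most recent read is not after req.latest (sort order), so it
          // serves this request if it is also not before req.earliest.
          case Some(last) if req.earliest <= last => reads
          // Otherwise schedule a new read as late as this request allows.
          case _ => req.latest :: reads
        }
      }
      .reverse
}
```

Reading as late as possible keeps each read available to the largest number of later requests, which is why this greedy choice yields the minimum number of reads.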
  • 43. Read Time Optimization. 2) Optimize Sensor Read Times: ● Minimize the penalty while executing only the minimum number of sensor reads. ● Challenge: assign read requests to sensor reads.
  • 44-47. Assigning Read Requests to Sensor Reads. Postpone / Assign to next Read.
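Once a group of read requests has been assigned to one sensor read, the remaining question is when to execute that read. The sketch below assumes the penalty is the sum of absolute deviations from the desired read times; this is an assumption for illustration, since the slides do not define the penalty function. Under that assumption the best time is the median of the desired times, clamped to the intersection of the tolerance windows. `ReadRequest` and `ReadTimeOptimizer` are hypothetical names.

```scala
// Hypothetical request type: a tolerance window plus a desired read time (s).
final case class ReadRequest(earliest: Double, desired: Double, latest: Double)

object ReadTimeOptimizer {
  // Assumed penalty: total absolute deviation from the desired read times.
  def penalty(t: Double, group: Seq[ReadRequest]): Double =
    group.map(r => math.abs(t - r.desired)).sum

  // Best time for one read that serves the whole group, or None if the
  // group is empty or the tolerance windows do not overlap.
  def bestReadTime(group: Seq[ReadRequest]): Option[Double] =
    if (group.isEmpty) None
    else {
      val lo = group.map(_.earliest).max // window intersection, lower bound
      val hi = group.map(_.latest).min   // window intersection, upper bound
      if (lo > hi) None
      else {
        val sorted = group.map(_.desired).sorted
        // A median minimizes the sum of absolute deviations; because that
        // sum is convex in t, clamping to [lo, hi] keeps it optimal.
        val median = sorted((sorted.size - 1) / 2)
        Some(math.min(math.max(median, lo), hi))
      }
    }
}
```

This only optimizes the time of a single read for a fixed assignment; deciding which requests to postpone and which to assign to the next read is the harder part named on the slides and is not covered here.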
  • 48-50. Local Filtering. ● Enable adaptive filtering in combination with adaptive sampling. ● Enable model-driven data acquisition.
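The slides do not say which filter is applied on the sensor node. As one simple illustration, a dead-band filter transmits a value only when it differs from the last transmitted value by more than a threshold; in effect the "model" is that the value has not changed. `LocalFilter.deadBand` and its parameters are hypothetical.

```scala
object LocalFilter {
  // Keeps the first value and every value that deviates from the last
  // transmitted value by more than `threshold`; drops everything else.
  def deadBand(values: Seq[Double], threshold: Double): List[Double] =
    values
      .foldLeft(List.empty[Double]) { (sent, v) =>
        sent.headOption match {
          case Some(last) if math.abs(v - last) <= threshold => sent // suppress
          case _ => v :: sent                                        // transmit
        }
      }
      .reverse
}
```

With a threshold of 1.0, the sequence 10.0, 10.25, 10.5, 12.0, 12.25, 9.0 is reduced to 10.0, 12.0, 9.0.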
  • 51. Evaluation ● Replay sensor data: - from a football match [DEBS Grand Challenge ’13] - Formula 1 telemetry data ● Random UDSFs: - Reads follow a Poisson process (which also simulates load peaks) - On average 1 read per query per second - Exponentially distributed read time tolerance: high probability for small tolerances, small probability for large tolerances - On average 0.04 s read time tolerance
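A workload like the random UDSFs described above could be generated as follows. This is a sketch under stated assumptions, not the authors' benchmark code: a Poisson process is simulated by drawing exponentially distributed gaps between reads (mean 1 s), and each request gets an exponentially distributed tolerance (mean 0.04 s). `RandomRequest` and `RandomUdsfWorkload` are hypothetical names.

```scala
import scala.util.Random

// Hypothetical request type: desired read time and tolerance, in seconds.
final case class RandomRequest(desired: Double, tolerance: Double)

object RandomUdsfWorkload {
  // Inverse-transform sampling of an exponential distribution with the
  // given mean; nextDouble() is in [0, 1), so the logarithm is defined.
  def exponential(mean: Double, rng: Random): Double =
    -mean * math.log(1.0 - rng.nextDouble())

  // Requests of one query over `durationSeconds`: exponential gaps between
  // desired read times form a Poisson process with rate 1 / meanGap.
  def generate(durationSeconds: Double,
               rng: Random,
               meanGap: Double = 1.0,
               meanTolerance: Double = 0.04): List[RandomRequest] =
    Iterator
      .iterate(exponential(meanGap, rng))(t => t + exponential(meanGap, rng))
      .takeWhile(_ <= durationSeconds)
      .map(t => RandomRequest(t, exponential(meanTolerance, rng)))
      .toList
}
```

Because the gaps are exponentially distributed, several reads occasionally fall close together, which matches the slide's remark that the process also simulates load peaks.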
  • 52. Increasing the number of concurrent queries: • On-demand scheduling reduces sensor reads and data transfer by up to 87%. • The number of reads and transfers increases sub-linearly with the number of queries.
  • 53. Increasing the number of concurrent queries: • Our read-time optimizer reduces the deviation from desired read times by up to 69% (preserving the minimum number of reads and transfers).
  • 54-55. Increasing read time tolerances
  • 56. Query Prioritization (1/2)
  • 57. Query Prioritization (2/2)
  • 58. Slack Robustness of Adaptive Sampling
  • 59. Optimized On-Demand Data Streaming from Sensor Nodes. Wrap-Up: Tailor Data Streams to the Demand of Applications • Define data demand: User-Defined Sampling Functions • Schedule sensor reads and data transfer on demand • Optimize read times globally, for all users and queries. Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
  • 60. A quick overview of Apache Flink - research summary - Jonas Traub visiting September 30, 2017
  • 61. Outline. Apache Flink Primer: • Stratosphere – The origin of Apache Flink • What is Apache Flink? – Basic System Internals • The Flink Community. An Apache Flink Research Summary.
  • 62-63. Stratosphere: General Purpose Programming + Database Execution. Draws on Database Technology: • Relational Algebra • Declarativity • Query Optimization • Robust Out-of-core. Draws on MapReduce Technology: • Scalability • User-defined Functions • Complex Data Types • Schema on Read. Adds: • Iterations • Advanced Dataflows • General APIs • Native Streaming. © Volker Markl
  • 64. What is Apache Flink? Apache Flink is an open source platform for scalable batch and stream data processing (http://flink.apache.org). It is a distributed system that you can use to process data: like a DBMS, but not exactly a DBMS. What kind of data? Data that comes in the form of streams. What kind of processing? Quite flexible: you can use Java/Scala APIs similar to programming with Java collections, the new SQL API, etc. Distributed: it runs on many (1000s of) machines and hides this complexity from the user.
  • 65. Basic application architecture. Sources of data (e.g., sensors, logs, …); event log: a replayable log of events with pub/sub functionality; apps with app state: processing of events; query service: storage and query systems. By courtesy of Kostas Tzoumas.
  • 66-69. Technology inside Flink. © 2013 Berlin Big Data Center • All Rights Reserved. Program:
        // Transitive closure with the Flink DataSet API (Scala).
        // Assumes `edges: DataSet[Path]` is defined elsewhere.
        import org.apache.flink.api.scala._

        case class Path(from: Long, to: Long)

        val tc = edges.iterate(10) { paths: DataSet[Path] =>
          val next = paths
            .join(edges)
            .where("to")
            .equalTo("from") { (path, edge) =>
              Path(path.from, edge.to)
            }
            .union(paths)
            .distinct()
          next
        }
    Pre-flight (Client): cost-based optimizer, type extraction stack. Dataflow Graph (figure): DataSource orders.tbl, Filter, Map, DataSource lineitem.tbl, Join (Hybrid Hash; build HT, probe; hash-part [0] on both inputs), GroupRed (sort), forward. Master / Workers: task scheduling, recovery metadata; deploy operators; track intermediate results.
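To make the semantics of the program on these slides concrete, the same bulk iteration can be written against plain Scala collections. This is only an in-memory illustration and does not use Flink: `TransitiveClosure.compute` is a hypothetical helper, the for-comprehension stands in for `join(...).where("to").equalTo("from")`, and `Set` semantics stand in for `distinct()`.

```scala
final case class Path(from: Long, to: Long)

object TransitiveClosure {
  // In each of `maxIterations` rounds, extend every known path by one edge
  // and keep the paths found so far (like `.union(paths).distinct()`).
  def compute(edges: Set[Path], maxIterations: Int): Set[Path] =
    (1 to maxIterations).foldLeft(edges) { (paths, _) =>
      val extended = for {
        path <- paths
        edge <- edges
        if path.to == edge.from // join condition: paths.to == edges.from
      } yield Path(path.from, edge.to)
      extended union paths
    }
}
```

For the edges 1→2, 2→3, 3→4, the result after ten rounds also contains 1→3, 2→4, and 1→4. The fixed round count mirrors `iterate(10)` on the slide; paths longer than that bound would need more rounds.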
  • 70. Flink community. Project Statistics (Updated: Sep 29, 2017). [Figure: three charts over Feb 15, Dec 15, and Dec 16: Number of Contributors, Stars on GitHub, Forks on GitHub.] By courtesy of Kostas Tzoumas.
  • 71. Companies Using Flink
  • 72-74. Apache Flink - Related Publications. System Paper 2015: System Paper 2014: