Data Streaming in
Big Data Analysis
Docent lecture
Vincenzo Gulisano, Ph.D.
Vincenzo Gulisano Data streaming in Big Data analysis 1
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Why data streaming?
... back in the year 2000
• Continuous processing of data streams
• Real-time fashion
... Store-then-process is not feasible
Financial applications
Sensor networks
ISPs
~ 2010
2017
Advanced Metering Infrastructures
Vehicular Networks
1. Billions of readings per day cannot be transferred continuously
2. The latency incurred while transferring data might undermine the utility of the analysis
3. It is not secure to concentrate all the data in a single place
4. Privacy can be leaked when giving away fine-grained data
What do we need then?
• Efficient one-pass analysis
• In memory
• Bounded resources
How does it work?
DBMS vs. DSMS
[Figure: in a DBMS, (1) data is stored (main memory / disk), (2) a query is issued, and (3) query processing returns the query results. In a DSMS, a continuous query resides in main memory; data flows through query processing, which produces query results.]
data stream: unbounded sequence of tuples sharing the same schema
Example: vehicles’ speed and position reports
Field           Type
vehicle id      text
time (secs)     text
speed (Km/h)    double
X coordinate    double
Y coordinate    double
Tuples over time:
A 8:00 55.5 X1 Y1
A 8:03 70.3 X2 Y2
A 8:07 34.3 X3 Y3
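The definition above can be sketched in Python as a generator that never terminates on its own. This is only an illustration: the class name `PositionReport`, the function `position_reports`, and the generated values are hypothetical, and the field types follow the schema in the example.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterator


@dataclass(frozen=True)
class PositionReport:
    """One tuple of the stream; every tuple shares this schema."""
    vehicle_id: str
    time: str      # typed as text in the example schema
    speed: float   # Km/h
    x: float
    y: float


def position_reports() -> Iterator[PositionReport]:
    """Hypothetical unbounded source with made-up values."""
    minute = 0
    while True:
        yield PositionReport("A", f"8:{minute % 60:02d}", 50.0 + minute % 7,
                             float(minute), float(minute))
        minute += 1


# A consumer can only ever observe a finite prefix of an unbounded stream.
first_three = list(islice(position_reports(), 3))
```

Because the sequence is unbounded, any analysis has to work on the tuples seen so far rather than on the "whole" data set.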
continuous query: Directed Acyclic Graph (DAG) of streams and operators
[Figure: operators (OP) connected by streams]
• source op: 1+ out streams
• op: 1+ in, 1+ out streams
• sink op: 1+ in streams
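A minimal sketch of a continuous query as a DAG, assuming Python 3.9+ for the standard `graphlib` module. The operator names and the shape of the query are hypothetical; the point is only that sources have no input streams, sinks have no output streams, and the graph must be acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical query: each key is an operator, each value is the set of
# operators whose output streams it consumes.
inputs = {
    "source": set(),
    "filter": {"source"},
    "aggregate": {"filter"},
    "join": {"filter", "aggregate"},
    "sink": {"join"},
}

# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(inputs).static_order())

consumed = set().union(*inputs.values())
sources = [op for op, ins in inputs.items() if not ins]   # no input streams
sinks = [op for op in inputs if op not in consumed]       # no output streams
```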
data streaming operators
• Stateless operators
• do not maintain any state
• one-by-one processing
• Stateful operators
• maintain a state that evolves with the tuples being processed
• produce output tuples that depend on multiple input tuples
stateless operators
Filter: filter / route tuples based on one (or more) conditions
Map: transform each tuple
Union: merge multiple streams (with the same schema) into one
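The three stateless operators can be sketched as Python generators. The function names and sample tuples are hypothetical, and the Union shown here additionally assumes that each input stream is already sorted by timestamp, which the slides do not state.

```python
import heapq
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def filter_op(stream: Iterable[T], condition: Callable[[T], bool]) -> Iterator[T]:
    """Forward only the tuples that satisfy the condition."""
    for t in stream:
        if condition(t):
            yield t


def map_op(stream: Iterable[T], transform: Callable[[T], U]) -> Iterator[U]:
    """Transform each tuple independently of all the others."""
    for t in stream:
        yield transform(t)


def union_op(*streams: Iterable[T], key: Callable[[T], object]) -> Iterator[T]:
    """Merge streams with the same schema into one.

    Assumption: each input is already sorted by `key` (e.g. a timestamp),
    so the merged stream is sorted as well.
    """
    return heapq.merge(*streams, key=key)


# Tuples are (vehicle id, time in seconds, speed in Km/h); values are made up.
s1 = [("A", 0, 55.5), ("A", 180, 70.3)]
s2 = [("B", 60, 90.0), ("B", 420, 34.3)]

merged = union_op(s1, s2, key=lambda t: t[1])
fast = filter_op(merged, lambda t: t[2] > 50.0)
result = list(map_op(fast, lambda t: (t[0], t[1], round(t[2] / 3.6, 2))))  # Km/h to m/s
```

None of the three functions keeps anything between tuples, which is what makes them stateless.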
stateful operators
Aggregate: aggregate information from multiple tuples
(e.g., compute average speed of the tuples in the last hour)
Join: compare tuples coming from 2 streams given a certain predicate
(e.g., given the last 5 tuples from each stream, join every pair reporting the same position)
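The join example above (last 5 tuples from each stream, same position) could look roughly like this. It is a sketch, not a reference implementation: the tuple layout, the "L"/"R" stream tags, and the function name `window_join` are assumptions made for the illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Report = Tuple[str, int, float, float]   # (vehicle id, time, x, y), hypothetical schema


def window_join(
    arrivals: Iterable[Tuple[str, Report]],   # ("L" or "R", tuple) in arrival order
    size: int = 5,
) -> Iterator[Tuple[Report, Report]]:
    """Join each new tuple with the last `size` tuples of the other stream."""
    windows = {"L": deque(maxlen=size), "R": deque(maxlen=size)}
    for side, t in arrivals:
        other = "R" if side == "L" else "L"
        for o in windows[other]:
            if (t[2], t[3]) == (o[2], o[3]):       # predicate: same position
                yield (t, o) if side == "L" else (o, t)
        windows[side].append(t)                    # deque evicts the oldest tuple itself


arrivals = [
    ("L", ("A", 0, 1.0, 1.0)),
    ("R", ("B", 1, 1.0, 1.0)),
    ("R", ("C", 2, 2.0, 2.0)),
    ("L", ("D", 3, 2.0, 2.0)),
]
pairs = list(window_join(arrivals))
```

The operator's state is the two bounded windows, so memory use does not grow with the length of the streams.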
Since streams are unbounded, windows (over time or tuples)
are defined to bound the portion of tuples to aggregate or join
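A time-based window for the earlier aggregate example (average speed in the last hour) can be sketched as follows. The function name and the sample readings are hypothetical, and the sketch assumes tuples arrive in timestamp order and that the window covers the interval (time - window_size, time].

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def windowed_average(
    stream: Iterable[Tuple[int, float]],   # (time in seconds, speed in Km/h)
    window_size: int = 3600,               # "the last hour"
) -> Iterator[Tuple[int, float]]:
    """Emit, for every input tuple, the average speed over the window.

    Assumption: tuples arrive in timestamp order.
    """
    window: Deque[Tuple[int, float]] = deque()
    total = 0.0
    for time, speed in stream:
        window.append((time, speed))
        total += speed
        # Evict tuples that fell out of the window, so the state stays bounded.
        while window[0][0] <= time - window_size:
            _, old_speed = window.popleft()
            total -= old_speed
        yield time, total / len(window)


readings = [(0, 60.0), (1800, 80.0), (3600, 100.0)]
averages = list(windowed_average(readings))
```

Keeping a running total means each tuple is added once and removed once, which fits the one-pass, bounded-resources requirement.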
sample query
For each vehicle, raise an alert if the speed of the latest report is more
than 2 times higher than its average speed in the last 30 days.
Input tuples over time:
A 8:00 55.5 X1 Y1
A 8:03 70.3 X2 Y2
A 8:07 34.3 X3 Y3
sample query
Input stream (fields): vehicle id, time (secs), speed (Km/h), X coordinate, Y coordinate
1. Aggregate: compute average speed for each vehicle during the last 30 days
   Output fields: vehicle id, time (secs), avg speed (Km/h)
2. Join: join on vehicle id with the reports (vehicle id, time (secs), speed (Km/h))
   Output fields: vehicle id, time (secs), avg speed (Km/h), speed (Km/h)
3. Filter: check condition
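The sample query can be sketched end to end in Python. For brevity the three steps (Aggregate, Join, Filter) are fused into one hypothetical function, `speed_alerts`, whereas a streaming engine would run them as separate operators connected by streams. Two further assumptions are not taken from the slides: reports arrive in timestamp order, and the 30-day average includes the latest report.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

THIRTY_DAYS = 30 * 24 * 3600      # window size in seconds

Report = Tuple[str, int, float]   # (vehicle id, time in seconds, speed in Km/h)


def speed_alerts(
    reports: Iterable[Report], factor: float = 2.0
) -> Iterator[Tuple[str, int, float, float]]:
    """Yield (vehicle id, time, avg speed, speed) for every report that raises an alert."""
    windows: Dict[str, Deque[Tuple[int, float]]] = defaultdict(deque)
    totals: Dict[str, float] = defaultdict(float)
    for vehicle, time, speed in reports:
        # Aggregate: per-vehicle average speed over the last 30 days.
        window = windows[vehicle]
        window.append((time, speed))
        totals[vehicle] += speed
        while window[0][0] <= time - THIRTY_DAYS:
            totals[vehicle] -= window.popleft()[1]
        avg = totals[vehicle] / len(window)
        # Join on vehicle id: pair the latest report with its average.
        joined = (vehicle, time, avg, speed)
        # Filter: keep only the tuples that satisfy the alert condition.
        if speed > factor * avg:
            yield joined


reports = [("A", 0, 50.0), ("B", 10, 90.0), ("A", 60, 50.0),
           ("A", 120, 50.0), ("A", 180, 400.0)]
alerts = list(speed_alerts(reports))
```

With these made-up reports, only the last report of vehicle A exceeds twice its average speed.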
Past, current and future research
Fault tolerance
Elasticity
Load balancing
Determinism
Parallel execution of streaming operators
Parallel execution of streaming applications
[Figure: the query OP1 → OP2 shown as equivalent (=) to several parallel instances of OP1 and OP2]
1) How to route tuples?
2) Where to route tuples?
3) How to merge tuples?
4) How many instances to deploy per operator?
...
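One common way to approach questions 1 to 3 is key-based (hash) partitioning followed by a timestamp-ordered merge. The sketch below illustrates that general idea only; it is not claimed to be the technique used in the work presented here, and the function names, tuple layout, and sample data are hypothetical.

```python
import heapq
import zlib
from typing import Iterable, List, Tuple

Report = Tuple[str, int, float]   # (vehicle id, time in seconds, speed in Km/h)


def route(reports: Iterable[Report], instances: int) -> List[List[Report]]:
    """Key-based routing: all tuples of one vehicle go to the same instance."""
    partitions: List[List[Report]] = [[] for _ in range(instances)]
    for r in reports:
        # crc32 is stable across runs, unlike Python's built-in str hash.
        partitions[zlib.crc32(r[0].encode()) % instances].append(r)
    return partitions


def merge(partitions: List[List[Report]]) -> List[Report]:
    """Merge per-instance outputs by (time, vehicle id) for a deterministic order."""
    return list(heapq.merge(*partitions, key=lambda r: (r[1], r[0])))


reports = [("A", 0, 55.5), ("B", 1, 90.0), ("C", 2, 40.0),
           ("A", 3, 70.3), ("B", 4, 85.0), ("A", 5, 34.3)]
partitions = route(reports, instances=3)
merged = merge(partitions)
```

Routing by key keeps per-vehicle state on a single instance, and merging by timestamp makes the output order independent of how the instances interleave. Question 4 (how many instances to deploy) is not addressed by this sketch.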
[Figure: research map, built up over several slides. Labels added to the topics listed above:]
• Synchronization / Data structures
• Many-core systems / FPGAs
• Parallel joins
• Parallel aggregates
• Joins modeling
• Distributed execution of streaming applications
Applications:
• DDoS detection and mitigation
• Intrusion detection
• Data validation
• Differentially private aggregation
• Vehicular networks analysis
• Urban mobility analysis
Domains:
• Security and privacy
• IoT
• Transportation sustainability
first the hardware, then the query
first the query, then the hardware
Conclusions
Millions of sensors
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
• “Hard-wired” routines
• Real-time decisions
• High-throughput / low-latency
Do I really need to try surströmming? (... NO)
Danger!!! Run!!! (surströmming can opened)
What traffic congestion patterns can I observe frequently?
Don’t overtake, car in opposite lane!
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
• “Hard-wired” routines
• Real-time decisions
• High-throughput / low-latency
Millions of
sensors
+ + +
Bibliography
1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced
metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642.
2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures."
Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016.
3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering
infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014.
4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced
metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International
Publishing, 2014.
5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and
perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006.
6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002.
7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem
transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing.
ACM, 2014.
8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big
Data), 2014 IEEE International Conference on. IEEE, 2014.
9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international
conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent
Transportation Systems 16.2 (2015): 865-873.
11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid
Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012.
12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded
sensing systems for energy-efficiency in building. ACM, 2010.
13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010
IEEE 30th International Conference on. IEEE, 2010.
14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD
Record 34.4 (2005): 42-47.
15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC
workshop on Mobile cloud computing. ACM, 2012.
16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica,
2012.
17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th
ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the
2016 IEEE Intelligent Vehicles Symposium (IV16).
19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed
Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.
20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and
Distributed Systems 23.12 (2012): 2351-2365.
21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003.
Proceedings. 19th International Conference on. IEEE, 2003.
22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of
the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014.
23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of
the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015.
24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators."
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015
IEEE International Conference on. IEEE, 2015.
26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings
of the 7th ACM international conference on Distributed event-based systems. ACM, 2013.
27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient
elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming. ACM, 2016.
28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on
Database Systems (TODS) 33.1 (2008): 3.
29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state
management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of
Computation. Springer Berlin Heidelberg, 2008.
31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on
Theory of computing. ACM, 2010.
32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the
sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013.

More Related Content

What's hot

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
Krish_ver2
 
Hadoop
HadoopHadoop
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Haluan Irsad
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing
Luay AL-Assadi
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
Slideshare
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
Leandro Totino Pereira
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Simplilearn
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
tutorialvillage
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
Mohit Saini
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
VNIT-ACM Student Chapter
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
ankur bhalla
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
Krish_ver2
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretization
Krish_ver2
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
Ashraf Uddin
 

What's hot (20)

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretization
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 

Similar to Data Streaming in Big Data Analysis

Data Streaming in IoT and Big Data Analytics
Data Streaming in  IoT and Big Data AnalyticsData Streaming in  IoT and Big Data Analytics
Data Streaming in IoT and Big Data Analytics
Vincenzo Gulisano
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
Vincenzo Gulisano
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
Vincenzo Gulisano
 
CINET: A Cyber-Infrastructure for Network Science Overview
CINET: A Cyber-Infrastructure for Network Science OverviewCINET: A Cyber-Infrastructure for Network Science Overview
CINET: A Cyber-Infrastructure for Network Science Overview
Biocomplexity Institute of Virginia Tech
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
FogGuru MSCA Project
 
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Demetris Trihinas
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
PayamBarnaghi
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
Guido Schmutz
 
Smart Cities: How are they different?
Smart Cities: How are they different? Smart Cities: How are they different?
Smart Cities: How are they different?
PayamBarnaghi
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh Bellur
STS FORUM 2016
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thessaloniki
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Stavros Kontopoulos
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in Highways
York University
 
Dynamic Data Center concept
Dynamic Data Center concept  Dynamic Data Center concept
Dynamic Data Center concept
Miha Ahronovitz
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
Amit Sheth
 
Semantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream DataSemantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream Data
Oscar Corcho
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
Raja Chiky
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
Biniam Asnake
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017
Boris Adryan
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City Applications
PayamBarnaghi
 

Similar to Data Streaming in Big Data Analysis (20)

Data Streaming in IoT and Big Data Analytics
Data Streaming in  IoT and Big Data AnalyticsData Streaming in  IoT and Big Data Analytics
Data Streaming in IoT and Big Data Analytics
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
CINET: A Cyber-Infrastructure for Network Science Overview
CINET: A Cyber-Infrastructure for Network Science OverviewCINET: A Cyber-Infrastructure for Network Science Overview
CINET: A Cyber-Infrastructure for Network Science Overview
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
 
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Smart Cities: How are they different?
Smart Cities: How are they different? Smart Cities: How are they different?
Smart Cities: How are they different?
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh Bellur
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in Highways
 
Dynamic Data Center concept
Dynamic Data Center concept  Dynamic Data Center concept
Dynamic Data Center concept
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
 
Semantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream DataSemantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream Data
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City Applications
 

More from Vincenzo Gulisano

Tutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data StreamingTutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data Streaming
Vincenzo Gulisano
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
Vincenzo Gulisano
 
Strel streaming
Strel streamingStrel streaming
Strel streaming
Vincenzo Gulisano
 
Performance Modeling of Stream Joins
Performance Modeling of Stream JoinsPerformance Modeling of Stream Joins
Performance Modeling of Stream Joins
Vincenzo Gulisano
 
The data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesThe data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architectures
Vincenzo Gulisano
 
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
Vincenzo Gulisano
 
The benefits of fine-grained synchronization in deterministic and efficient ...
The benefits of fine-grained synchronization in  deterministic and efficient ...The benefits of fine-grained synchronization in  deterministic and efficient ...
The benefits of fine-grained synchronization in deterministic and efficient ...
Vincenzo Gulisano
 

More from Vincenzo Gulisano (7)

Tutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data StreamingTutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data Streaming
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
Strel streaming
Strel streamingStrel streaming
Strel streaming
 
Performance Modeling of Stream Joins
Performance Modeling of Stream JoinsPerformance Modeling of Stream Joins
Performance Modeling of Stream Joins
 
The data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesThe data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architectures
 
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
 
The benefits of fine-grained synchronization in deterministic and efficient ...
The benefits of fine-grained synchronization in  deterministic and efficient ...The benefits of fine-grained synchronization in  deterministic and efficient ...
The benefits of fine-grained synchronization in deterministic and efficient ...
 


Data Streaming in Big Data Analysis

  • 1. Data Streaming in Big Data Analysis. Docent lecture. Vincenzo Gulisano, Ph.D.
  • 3. Agenda
      • Why data streaming?
      • How does it work?
      • Past, current and future research
      • Conclusions
      • Bibliography
  • 4. Agenda
      • Why data streaming?
      • How does it work?
      • Past, current and future research
      • Conclusions
      • Bibliography
  • 5. ... back in the year 2000
      • Financial applications, sensor networks, ISPs
      • Continuous processing of data streams
      • Real-time fashion
      • ... Store-then-process is not feasible
  • 6. ~ 2010
  • 7. 2017: Advanced Metering Infrastructures, Vehicular Networks
      1. Billions of readings per day cannot be transferred continuously
      2. The latency incurred while transferring data might undermine the utility of the analysis
      3. It is not secure to concentrate all the data in a single place
      4. Privacy can be leaked when giving away fine-grained data
      What do we need then?
      • Efficient one-pass analysis
      • In memory
      • Bounded resources
  • 8. Agenda
      • Why data streaming?
      • How does it work?
      • Past, current and future research
      • Conclusions
      • Bibliography
  • 9. DBMS vs. DSMS
      • DBMS (main memory and disk): 1. Data, 2. Query, 3. Query results, produced by query processing
      • DSMS (main memory only): a continuous query processes incoming data and emits query results
  • 10. data stream: unbounded sequence of tuples sharing the same schema
      Example: vehicles’ speed and position reports
      Field          Type
      vehicle id     text
      time (secs)    text
      speed (Km/h)   double
      X coordinate   double
      Y coordinate   double
      Sample tuples over time: <A, 8:00, 55.5, X1, Y1>, <A, 8:03, 70.3, X2, Y2>, <A, 8:07, 34.3, X3, Y3>
  • 11. continuous query: Directed Acyclic Graph (DAG) of streams and operators
      • source op (1+ out streams)
      • stream op (1+ in, 1+ out streams)
      • sink op (1+ in streams)
  • 12. data streaming operators
      • Stateless operators
          • do not maintain any state
          • one-by-one processing
      • Stateful operators
          • maintain a state that evolves with the tuples being processed
          • produce output tuples that depend on multiple input tuples
  • 13. stateless operators
      • Filter: filter / route tuples based on one (or more) conditions
      • Map: transform each tuple
      • Union: merge multiple streams (with the same schema) into one
  • 14. stateful operators
      • Aggregate: aggregate information from multiple tuples (e.g., compute average speed of the tuples in the last hour)
      • Join: compare tuples coming from 2 streams given a certain predicate (e.g., given the last 5 tuples from each stream, join every pair reporting the same position)
      Since streams are unbounded, windows (over time or tuples) are defined to bound the portion of tuples to aggregate or join
  • 15. sample query
      For each vehicle, raise an alert if the speed of the latest report is more than 2 times higher than its average speed in the last 30 days.
      Sample tuples over time: <A, 8:00, 55.5, X1, Y1>, <A, 8:03, 70.3, X2, Y2>, <A, 8:07, 34.3, X3, Y3>
  • 16. sample query
      • Input stream fields: vehicle id, time (secs), speed (Km/h), X coordinate, Y coordinate
      • Aggregate (compute average speed for each vehicle during the last 30 days); output fields: vehicle id, time (secs), avg speed (Km/h)
      • Join (join on vehicle id) of the Aggregate output with a stream of fields vehicle id, time (secs), speed (Km/h); output fields: vehicle id, time (secs), avg speed (Km/h), speed (Km/h)
      • Filter (check condition)
  • 18. Agenda
      • Why data streaming?
      • How does it work?
      • Past, current and future research
      • Conclusions
      • Bibliography
  • 19. Parallel execution of streaming applications
      • Fault tolerance, elasticity, load balancing, determinism, parallel execution of streaming operators
      • Running several instances of OP1 and OP2 in place of a single OP1 and OP2:
          1) How to route tuples?
          2) Where to route tuples?
          3) How to merge tuples?
          4) How many instances to deploy per operator?
          ...
  • 20. Parallel execution of streaming applications
      • Fault tolerance, elasticity, load balancing, determinism, parallel execution of streaming operators
      • DDoS detection and mitigation, intrusion detection, data validation, differentially private aggregation, vehicular networks analysis, urban mobility analysis
      • Security and privacy, IoT, transportation sustainability
  • 21. Parallel execution of streaming applications
      • Synchronization / data structures; many-core systems / FPGAs
      • Fault tolerance, elasticity, load balancing, determinism, parallel execution of streaming operators
      • Parallel joins, parallel aggregates, joins modeling
      • DDoS detection and mitigation, intrusion detection, data validation, differentially private aggregation, vehicular networks analysis, urban mobility analysis
      • Security and privacy, IoT, transportation sustainability
  • 22. Parallel execution of streaming applications (first the hardware, then the query) and distributed execution of streaming applications (first the query, then the hardware)
      • Synchronization / data structures; many-core systems / FPGAs
      • Fault tolerance, elasticity, load balancing, determinism, parallel execution of streaming operators
      • DDoS detection and mitigation, intrusion detection, data validation, differentially private aggregation, vehicular networks analysis, urban mobility analysis
      • Security and privacy, transportation sustainability
  • 23. Agenda
      • Why data streaming?
      • How does it work?
      • Past, current and future research
      • Conclusions
      • Bibliography
  • 24. Millions of sensors
      • Store information; iterate multiple times over data; think, do not rush through decisions. Example: Do I really need to try surströmming? (... NO)
      • "Hard-wired" routines; real-time decisions; high-throughput / low-latency. Example: Danger!!! Run!!! (surströmming can opened)
  • 25. Millions of sensors
      • Store information; iterate multiple times over data; think, do not rush through decisions. Example: What traffic congestion patterns can I observe frequently?
      • "Hard-wired" routines; real-time decisions; high-throughput / low-latency. Example: Don’t take over, car in opposite lane!
  • 26. Agenda
      • Why data streaming?
      • How does it work?
      • Past, current and future research
      • Conclusions
      • Bibliography
  • 27. Bibliography
      1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642.
      2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures." Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016.
      3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014.
      4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International Publishing, 2014.
      5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006.
      6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002.
      7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing. ACM, 2014.
      8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.
  • 28. Bibliography
      9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
      10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent Transportation Systems 16.2 (2015): 865-873.
      11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012.
      12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded sensing systems for energy-efficiency in building. ACM, 2010.
      13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on. IEEE, 2010.
      14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD Record 34.4 (2005): 42-47.
      15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012.
  • 29. Bibliography
      16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica, 2012.
      17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
      18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV16).
      19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.
      20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365.
      21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003. Proceedings. 19th International Conference on. IEEE, 2003.
  • 30. Bibliography
      22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014.
      23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015.
      24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
      25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015.
      26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings of the 7th ACM international conference on Distributed event-based systems. ACM, 2013.
      27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 2016.
      28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on Database Systems (TODS) 33.1 (2008): 3.
      29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
  • 31. Bibliography
      30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of Computation. Springer Berlin Heidelberg, 2008.
      31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010.
      32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013.
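
The sample query from the transcript (for each vehicle, raise an alert if the speed of the latest report is more than 2 times its average speed in the last 30 days) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the names `Report`, `Alert`, `speeding_alerts`, `WINDOW_SECS` and `THRESHOLD` are hypothetical; timestamps are assumed to be integer seconds (the slide lists the time field as text); reports of the same vehicle are assumed to arrive in timestamp order; and the average is assumed to cover the earlier reports in the window, excluding the latest one. The Aggregate, Join and Filter steps are fused into a single one-pass, in-memory loop whose state per vehicle is bounded by the window.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, NamedTuple, Tuple

WINDOW_SECS = 30 * 24 * 3600  # 30-day time window, as in the sample query
THRESHOLD = 2.0               # alert when speed > 2 x windowed average


@dataclass(frozen=True)
class Report:
    """One input tuple: vehicle id, time (secs), speed (km/h), X, Y."""
    vehicle_id: str
    time: int
    speed: float
    x: float
    y: float


class Alert(NamedTuple):
    """One output tuple: vehicle id, time (secs), avg speed, speed."""
    vehicle_id: str
    time: int
    avg_speed: float
    speed: float


def speeding_alerts(reports: Iterable[Report]) -> Iterator[Alert]:
    """Evaluate the sample query in one pass over the input stream."""
    # Stateful part: per-vehicle sliding window of (time, speed) pairs,
    # plus a running sum so the average is not recomputed from scratch.
    windows: Dict[str, Deque[Tuple[int, float]]] = defaultdict(deque)
    sums: Dict[str, float] = defaultdict(float)

    for r in reports:
        win = windows[r.vehicle_id]

        # Aggregate: drop reports that fell out of the 30-day window.
        while win and win[0][0] <= r.time - WINDOW_SECS:
            _, old_speed = win.popleft()
            sums[r.vehicle_id] -= old_speed

        # Join on vehicle id + Filter: compare the latest speed with the
        # average of the earlier reports still inside the window.
        if win:
            avg = sums[r.vehicle_id] / len(win)
            if r.speed > THRESHOLD * avg:
                yield Alert(r.vehicle_id, r.time, avg, r.speed)

        # The latest report becomes part of the window for later tuples.
        win.append((r.time, r.speed))
        sums[r.vehicle_id] += r.speed
```

Because `speeding_alerts` is a generator, it can consume an unbounded iterator of reports and emit alerts as soon as they are detected. A production engine would additionally have to handle out-of-order tuples and numerical drift in the running sum, which this sketch ignores.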

Editor's Notes

  1. Before we start... questions and please notice
  2. making peace with DBs...