Hannaneh Najdataei
Parallel Data Streaming Analytics in the
Context of Internet of Things
Licentiate seminar
.: May 2019 :.
Introduction Continuous clustering Elasticity in stream processing Conclusions 2
Internet of Things (IoT)
Cloud Computing
IoT Analytics
20 GB per car per hour
Edge Devices
Edge Computing
Fog Computing
Cloud Computing
IoT Analytics
3-tier IoT Architecture
Cloud Tier: Data Centers
Fog Tier: Nodes
Edge Tier: Devices
Scope of the Thesis
• Design and implement analytics
The challenges
• Unbounded data
• Unpredictable data rate
• Various platforms
• Time requirements
The objectives
• Continuous analysis
• Adaptive reconfiguration
• Hardware independent
• Efficient processing
[Figure: computational power of the tiers: High, Medium, Low]
Conventional Data Analytics (Batch processing)
[Figure: Data, Database, Analysis, Results]
Continuous Processing
[Figure: Data, Analysis, Results]
Stream Processing
[Figure: Data, Analysis, Results]
Stream Processing Operators
• Stateless
• E.g. filter
• Stateful
• E.g. aggregate
State is the memory of the operator
tuple <ts,x>
[Figure, filter: <3,1> <2,4> <1,3> <4,3>]
[Figure, aggregate: window <3,1> <2,4> <1,3>; tuples <1,3> <4,3>; result <3,8>]
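The two operator kinds can be sketched in Python as follows. This is a minimal illustration rather than code from the talk: the function names, the filter predicate (`x == 3`) and the choice of a sum over a tumbling window of three tuples are assumptions, chosen so that the outputs agree with the tuples shown on the slides.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

StreamTuple = Tuple[int, int]  # (ts, x), as in "tuple <ts,x>"


def filter_op(stream: Iterable[StreamTuple],
              predicate: Callable[[StreamTuple], bool]) -> Iterator[StreamTuple]:
    """Stateless: each tuple is handled on its own, nothing is remembered."""
    for tup in stream:
        if predicate(tup):
            yield tup


def aggregate_op(stream: Iterable[StreamTuple],
                 window_size: int) -> Iterator[StreamTuple]:
    """Stateful: the window is the operator's memory between tuples."""
    window: List[StreamTuple] = []
    for tup in stream:
        window.append(tup)
        if len(window) == window_size:
            last_ts = window[-1][0]
            # Assumed aggregate: sum of x over the window.
            yield (last_ts, sum(x for _, x in window))
            window.clear()  # tumbling window: start over


stream = [(1, 3), (2, 4), (3, 1), (4, 3)]
filtered = list(filter_op(stream, lambda t: t[1] == 3))
aggregated = list(aggregate_op(stream, window_size=3))
print(filtered)    # [(1, 3), (4, 3)]
print(aggregated)  # [(3, 8)]
```

The filter can be restarted or replicated freely because it keeps nothing between tuples, whereas the aggregate's window has to be preserved or moved whenever the operator is reconfigured.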
Outline
1. Introduction
• Motivation
• Thesis objectives
o Continuous analysis
o Adaptive reconfiguration
o Hardware independent
o Efficient processing
2. Continuous clustering
3. Elasticity in stream processing
4. Conclusions
LiDAR Point Cloud Clustering
[Figure: side view and top view of the LiDAR, distance 𝑑]
[Figure: raw LiDAR data points and clustered data points]
Batch Clustering
1. Collect data points for one rotation
2. Store the points in a search-optimized data structure
3. Apply the clustering
Euclidean clustering
Parameters: 𝑚𝑖𝑛𝑃𝑡𝑠, 𝜖
* [Ester et al., Density-based 1996] [Rusu et al., Semantic3D 2010] [Rusu et al., pcl 2011] [Patwary et al., DBSCAN 2012]
Velodyne HDL-64E
• ~8 rotations per second
• Up to ~2.2 million points per second
Challenge?
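The three batch steps can be sketched as below. This is an illustration under stated assumptions, not the implementation evaluated in the talk: a uniform grid stands in for the search-optimized data structure (PCL uses a kd-tree), `min_pts` is taken to be the minimum number of points a cluster must contain, and all names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product
from math import dist, floor
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def euclidean_clusters(points: Sequence[Point], eps: float,
                       min_pts: int) -> List[List[int]]:
    """Cluster one full rotation; returns lists of point indices."""

    def cell(p: Point) -> Cell:
        return (floor(p[0] / eps), floor(p[1] / eps), floor(p[2] / eps))

    # Step 2: grid with cell side eps, so every eps-neighbor of a point
    # lies in the point's own cell or one of the 26 adjacent cells.
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell(p)].append(i)

    def neighbors(i: int) -> Iterator[int]:
        cx, cy, cz = cell(points[i])
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j != i and dist(points[i], points[j]) <= eps:
                    yield j

    # Step 3: grow each cluster breadth-first from an unvisited point.
    visited = [False] * len(points)
    clusters: List[List[int]] = []
    for start in range(len(points)):
        if visited[start]:
            continue
        visited[start] = True
        members, frontier = [start], deque([start])
        while frontier:
            for j in neighbors(frontier.popleft()):
                if not visited[j]:
                    visited[j] = True
                    members.append(j)
                    frontier.append(j)
        if len(members) >= min_pts:  # smaller groups are treated as noise
            clusters.append(sorted(members))
    return clusters


# Step 1 is assumed done: these points stand for one collected rotation.
rotation = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.4, 0.0, 0.0),
            (5.0, 5.0, 0.0), (5.2, 5.0, 0.0), (9.0, 9.0, 9.0)]
print(euclidean_clusters(rotation, eps=0.3, min_pts=2))  # [[0, 1, 2], [3, 4]]
```

With the figures quoted for the Velodyne HDL-64E, one rotation holds up to roughly 2.2 million / 8 ≈ 275,000 points and lasts about 125 ms, and in the batch scheme steps 2 and 3 cannot start before the rotation is complete.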
Continuous Clustering
➢ H. Najdataei, Y. Nikolakopoulos, V. Gulisano, M. Papatriantafilou. “Continuous and Parallel LiDAR Point-cloud Clustering.” The 38th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2018.
1. Collect data points for one rotation
2. Store the points in a search-optimized data structure
3. Apply the clustering
Lisco: continuous clustering while the data is being collected
Lisco
[Figure: side view and top view of the LiDAR, with lasers l1 to l5 and steps S1 to S7]
2D view (L lasers, S steps): rows l1 to l5, columns S1 to S7, entries d1 to d12, point 𝑝
𝜀-neighbor mask of point 𝑝
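One way to bound a neighbor mask in the 2D view is sketched below. The slides do not give the formula, so this is an assumption-laden illustration: lasers and steps are taken to be uniformly spaced in angle, steps are taken to wrap around a full rotation, and the function and parameter names are hypothetical. The bound uses the fact that a ball of radius ε at range r is seen from the sensor under a half-angle of asin(ε / r).

```python
from math import asin, ceil, cos, pi, radians
from typing import List, Tuple


def mask_extent(distance: float, elevation_deg: float, eps: float,
                step_deg: float, laser_deg: float) -> Tuple[int, int]:
    """Steps and lasers to inspect on each side of a point at `distance`."""
    horizontal_range = distance * cos(radians(elevation_deg))
    # If the sensor itself lies within eps, every direction may qualify.
    az_half = pi if eps >= horizontal_range else asin(eps / horizontal_range)
    el_half = pi if eps >= distance else asin(eps / distance)
    return ceil(az_half / radians(step_deg)), ceil(el_half / radians(laser_deg))


def neighbor_mask(laser: int, step: int, n_lasers: int, n_steps: int,
                  d_steps: int, d_lasers: int) -> List[Tuple[int, int]]:
    """Grid cells (laser, step) that may hold eps-neighbors of the point."""
    cells = {
        (l, (step + ds) % n_steps)  # assumed: steps wrap around the rotation
        for l in range(max(0, laser - d_lasers),
                       min(n_lasers - 1, laser + d_lasers) + 1)
        for ds in range(-d_steps, d_steps + 1)
    }
    cells.discard((laser, step))
    return sorted(cells)


# A point 10 m away at zero elevation, eps = 0.5 m, with hypothetical
# angular resolutions of 0.2 degrees per step and 0.4 degrees per laser.
print(mask_extent(10.0, 0.0, 0.5, 0.2, 0.4))  # (15, 8)
print(len(neighbor_mask(laser=2, step=0, n_lasers=5, n_steps=7,
                        d_steps=1, d_lasers=1)))  # 8
```

Only the points stored in the returned cells need an exact distance check, which is what makes the grid usable in place of a separately built search structure.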
Continuous Clustering Challenges
Partial view of neighbor mask
[Figure: points 𝑝 and 𝑝′ near the LiDAR's last read; full neighbor mask of 𝑝, actual neighbor mask of 𝑝, neighbor mask of 𝑝′]
Continuous cluster management
[Figure: clusters C1, C2 and C, with H1 and H2]
Lisco
1. Find the neighbor mask and compute distances
2. Link the clusters
[Figure: point 𝑝; clusters C1 and C2, with H1]
P-Lisco
1. Find the neighbor mask and compute distances (Scouting)
2. Link the clusters (Linking)
[Figure: Thread 1, Thread 2, Thread 3; clusters C1 and C2, with H1]
[Figure: Scouts read the data points (grid of steps S1 to S7 and lasers l1 to l6) and produce entries with the fields flag, point and 𝜀-neighbors; the Linker modifies the clusters]
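The split into scouting and linking can be sketched with Python threads as below. This is a simplified illustration, not P-Lisco itself: the scouts search all points by brute force instead of using a neighbor mask, the flag field is omitted, a queue and a union-find forest are assumed as the hand-over and cluster structures, and all names are hypothetical.

```python
import queue
import threading
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
DONE = None  # sentinel a scout sends once its share is finished


def scout(points: Sequence[Point], indices: range, eps: float,
          out: "queue.Queue") -> None:
    """Read-only work: report each assigned point with its eps-neighbors."""
    for i in indices:
        neighbors = [j for j in range(len(points))
                     if j != i and dist(points[i], points[j]) <= eps]
        out.put((i, neighbors))
    out.put(DONE)


def linker(n_points: int, n_scouts: int, inbox: "queue.Queue") -> List[int]:
    """Single writer: merges clusters in a union-find forest."""
    parent = list(range(n_points))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    finished = 0
    while finished < n_scouts:
        item = inbox.get()
        if item is DONE:
            finished += 1
            continue
        i, neighbors = item
        for j in neighbors:
            ri, rj = find(i), find(j)
            if ri != rj:
                # The smaller root wins, so labels do not depend on
                # the order in which the scouts deliver their entries.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n_points)]


def parallel_cluster(points: Sequence[Point], eps: float,
                     n_scouts: int = 3) -> List[int]:
    inbox: "queue.Queue" = queue.Queue()
    scouts = [threading.Thread(target=scout,
                               args=(points, range(k, len(points), n_scouts),
                                     eps, inbox))
              for k in range(n_scouts)]
    for t in scouts:
        t.start()
    labels = linker(len(points), n_scouts, inbox)
    for t in scouts:
        t.join()
    return labels


pts = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (5.0, 5.0), (5.2, 5.0), (9.0, 9.0)]
print(parallel_cluster(pts, eps=0.3))  # [0, 0, 0, 3, 3, 5]
```

Keeping the scouts read-only and funnelling every cluster modification through one linker avoids locks on the cluster structure. In CPython the interpreter lock limits the actual speed-up, so the sketch shows the division of work rather than the performance.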
Performance Evaluation (Real dataset)
[Figure: execution time (s and ms) at 0.3, 0.4 and 0.7 m for PCL, Lisco, P-Lisco1, P-Lisco2, P-Lisco4, P-Lisco8 and P-Lisco16; panels: Intel Xeon E5-2695 and ODROID-XU3]
Use case (1-Vehicle 1-Day)
GPS data over time t: ⟨𝑥1, 𝑦1⟩ ⟨𝑥2, 𝑦2⟩ ⟨𝑥3, 𝑦3⟩ ⟨𝑥4, 𝑦4⟩ ⟨𝑥5, 𝑦5⟩ ⟨𝑥6, 𝑦6⟩ ⟨𝑥7, 𝑦7⟩
[Figure: trajectory segments labeled "Heavy traffic" and "Exceeding speed limit"]
System Model
➢ B. Havers, R. Duvignau, H. Najdataei, V. Gulisano, A. Chaitanya Koppisetty, M. Papatriantafilou. “DRIVEN: a framework for efficient Data Retrieval and clustering in Vehicular Networks.” The 35th International Conference on Data Engineering (ICDE). IEEE, 2019.
• Continuous bounded error approximation
• Compress volumes of data
• Utilize communication bandwidth
• Generalized form of Lisco
• Leverage the inherent ordering of spatial and temporal data
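The idea of a continuous bounded-error approximation can be illustrated with a greedy piecewise-constant compressor. The slides do not show DRIVEN's actual approximation method, so this is a stand-in technique chosen for brevity: a segment is extended while all its values fit within a band of width 2 × `max_error`, and is then represented by the midpoint of that band. The names are hypothetical, and a GPS stream would apply this to each coordinate separately.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, float]  # (first index, last index, value)


def compress(values: Sequence[float], max_error: float) -> List[Segment]:
    """Greedy piecewise-constant approximation with a per-value error bound."""
    segments: List[Segment] = []
    if not values:
        return segments
    start, low, high = 0, values[0], values[0]
    for i in range(1, len(values)):
        new_low, new_high = min(low, values[i]), max(high, values[i])
        if new_high - new_low > 2 * max_error:
            # The new value does not fit: close the segment at its midrange.
            segments.append((start, i - 1, (low + high) / 2))
            start, low, high = i, values[i], values[i]
        else:
            low, high = new_low, new_high
    segments.append((start, len(values) - 1, (low + high) / 2))
    return segments


def decompress(segments: Sequence[Segment]) -> List[float]:
    out: List[float] = []
    for first, last, value in segments:
        out.extend([value] * (last - first + 1))
    return out


readings = [10.0, 10.2, 9.9, 10.1, 14.0, 14.3, 14.1, 20.0]
segments = compress(readings, max_error=0.25)
print(len(readings), "values ->", len(segments), "segments")
```

Because a segment is closed as soon as a value would break the bound, the compressor works on the stream as it arrives and never needs the whole day of data at once.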
Outline
1. Introduction
2. Continuous clustering
• Lisco
• P-Lisco
3. Elasticity in stream processing
4. Conclusions
Stream Processing
Stream Processing Performance
• Throughput
Number of tuples processed per time unit
• Latency
Time difference between receiving a tuple and producing the corresponding results
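Both metrics follow directly from these definitions, as the short sketch below shows. The event times are made-up values for illustration, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Event = Tuple[int, int]  # (receive time, result time), in milliseconds


def throughput_per_second(n_tuples: int, duration_ms: int) -> float:
    """Number of tuples processed per second over an observation period."""
    return n_tuples / (duration_ms / 1000)


def latencies_ms(events: List[Event]) -> List[int]:
    """Per-tuple delay between receiving it and producing its result."""
    return [produced - received for received, produced in events]


events = [(0, 20), (100, 130), (200, 210), (300, 350)]
lat = latencies_ms(events)
print(throughput_per_second(len(events), duration_ms=500))  # 8.0
print(mean(lat), max(lat))  # 27.5 50
```

The two metrics are independent: a system can sustain a high throughput while individual tuples wait a long time in buffers, so both are normally reported.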
Stream Processing Parallelism
• Task parallelism
• Data parallelism
Determinism: Consistent results independent of tuples’ inter-arrival times
* [Walulya et al., FGCS18] [Gulisano et al., ScaleJoin 2016]
Stream Processing Elasticity
Decommissioning
Provisioning
Scale out
* [Cardellini et al., HPCS16] [Carbone et al., VLDB17]
Stream Processing Efficiency
[Figure: Shared-nothing, Shared memory and Virtual Shared-nothing, positioned by Parallelism and Reconfiguration]
STRETCH Framework
➢ H. Najdataei, Y. Nikolakopoulos, M. Papatriantafilou, P. Tsigas, V. Gulisano. “STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism.” To appear in the 13th International Conference on Distributed and Event-Based Systems (DEBS). ACM, 2019.
Components:
• State manager
• Virtual shared-nothing parallelism
• Elastic ScaleGate (ESG)
[Figure: Virtual Shared-nothing Parallelism]
ScaleGate
[Figure: sources and readers along a time axis t; tuples that are ready to be retrieved by readers]
• Methods
• addTuple(tuple, sourceID)
• getNextReadyTuple(readerID)
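The two methods can be sketched in Python as below. This is a simplified, lock-based illustration and not the actual ScaleGate data structure. It assumes that each source adds tuples in non-decreasing timestamp order, that a tuple becomes ready once its timestamp is at most the smallest latest timestamp over all sources, and that every reader receives every ready tuple in timestamp order with ties broken by arrival. The Python method names are snake_case renderings of the slide's names.

```python
import heapq
import threading
from typing import Any, Dict, Iterable, List, Optional, Tuple

StreamTuple = Tuple[int, Any]  # (ts, payload)


class ScaleGateSketch:
    def __init__(self, source_ids: Iterable[str],
                 reader_ids: Iterable[str]) -> None:
        self._lock = threading.Lock()
        self._latest: Dict[str, float] = {s: float("-inf") for s in source_ids}
        self._cursor: Dict[str, int] = {r: 0 for r in reader_ids}
        self._pending: List[Tuple[int, int, StreamTuple]] = []  # min-heap
        self._ready: List[StreamTuple] = []  # timestamp-ordered
        self._seq = 0  # tie-breaker so payloads are never compared

    def add_tuple(self, tup: StreamTuple, source_id: str) -> None:
        with self._lock:
            if tup[0] < self._latest[source_id]:
                raise ValueError("each source must be timestamp-sorted")
            self._latest[source_id] = tup[0]
            heapq.heappush(self._pending, (tup[0], self._seq, tup))
            self._seq += 1
            # No source can still deliver anything older than this bound.
            bound = min(self._latest.values())
            while self._pending and self._pending[0][0] <= bound:
                self._ready.append(heapq.heappop(self._pending)[2])

    def get_next_ready_tuple(self, reader_id: str) -> Optional[StreamTuple]:
        with self._lock:
            i = self._cursor[reader_id]
            if i == len(self._ready):
                return None  # nothing ready for this reader yet
            self._cursor[reader_id] = i + 1
            return self._ready[i]


gate = ScaleGateSketch(source_ids=["a", "b"], reader_ids=["r1", "r2"])
for tup, src in [((1, "x"), "a"), ((2, "y"), "b"), ((5, "x"), "a"),
                 ((6, "y"), "b"), ((9, "x"), "a"), ((8, "y"), "b")]:
    gate.add_tuple(tup, src)

seen = []
while (t := gate.get_next_ready_tuple("r1")) is not None:
    seen.append(t[0])
print(seen)  # [1, 2, 5, 6, 8]
```

The tuple with timestamp 9 stays pending because source b has only reached 8. Since readers see the same merged order regardless of when the sources happened to deliver, the results downstream do not depend on inter-arrival times.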
Elastic ScaleGate
[Figure: sources and readers along a time axis t; tuples that are ready to be retrieved by readers]
• Methods
• addTuple(tuple, sourceID)
• getNextReadyTuple(readerID)
• Additional methods
• announceReaders(List reader_IDs, rID)
• removeReaders(List reader_IDs)
• announceSources(List source_IDs, min_ts)
• removeSources(List source_IDs)
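A possible reading of the additional methods is sketched below. The slides list only the signatures, so the semantics here are assumptions: `announce_readers` is taken to start new readers at the position of the existing reader `r_id`, and `announce_sources` is taken to promise that the new sources will not add tuples older than `min_ts`. The class is single-threaded for brevity and all names are hypothetical snake_case renderings.

```python
import heapq
from typing import Any, Dict, Iterable, List, Optional, Tuple

StreamTuple = Tuple[int, Any]  # (ts, payload)


class ElasticGateSketch:
    def __init__(self, source_ids: Iterable[str],
                 reader_ids: Iterable[str]) -> None:
        self._latest: Dict[str, float] = {s: float("-inf") for s in source_ids}
        self._cursor: Dict[str, int] = {r: 0 for r in reader_ids}
        self._pending: List[Tuple[int, int, StreamTuple]] = []
        self._ready: List[StreamTuple] = []
        self._seq = 0

    def _release(self) -> None:
        # With no sources left, everything pending is safe to release.
        bound = min(self._latest.values(), default=float("inf"))
        while self._pending and self._pending[0][0] <= bound:
            self._ready.append(heapq.heappop(self._pending)[2])

    def add_tuple(self, tup: StreamTuple, source_id: str) -> None:
        self._latest[source_id] = tup[0]
        heapq.heappush(self._pending, (tup[0], self._seq, tup))
        self._seq += 1
        self._release()

    def get_next_ready_tuple(self, reader_id: str) -> Optional[StreamTuple]:
        i = self._cursor[reader_id]
        if i == len(self._ready):
            return None
        self._cursor[reader_id] = i + 1
        return self._ready[i]

    def announce_readers(self, reader_ids: Iterable[str], r_id: str) -> None:
        for r in reader_ids:  # assumed: start where reader r_id currently is
            self._cursor[r] = self._cursor[r_id]

    def remove_readers(self, reader_ids: Iterable[str]) -> None:
        for r in reader_ids:
            del self._cursor[r]

    def announce_sources(self, source_ids: Iterable[str], min_ts: int) -> None:
        if self._ready and min_ts < self._ready[-1][0]:
            raise ValueError("min_ts is older than tuples already released")
        for s in source_ids:  # assumed: new sources start no earlier than min_ts
            self._latest[s] = min_ts

    def remove_sources(self, source_ids: Iterable[str]) -> None:
        for s in source_ids:
            del self._latest[s]
        self._release()  # the bound may advance once a slow source is gone


gate = ElasticGateSketch(["a", "b"], ["r1"])
gate.add_tuple((1, "a1"), "a")
gate.add_tuple((2, "b2"), "b")
print(gate.get_next_ready_tuple("r1"))  # (1, 'a1')
gate.announce_readers(["r2"], "r1")     # r2 joins after the first tuple
gate.add_tuple((5, "a5"), "a")
gate.remove_sources(["b"])              # the bound advances to 5
print(gate.get_next_ready_tuple("r2"))  # (2, 'b2')
```

The point of the sketch is that readers and sources can join or leave without stopping the stream, provided the readiness bound is recomputed each time the set of sources changes.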
STRETCH Framework
[Figure: animation of tuples with timestamps ts=1, 2, 3, 5, 6, 8 and 9 moving through the framework]
Performance Evaluation
• Use case: ScaleJoin
• Setup: Intel Xeon E5-2695
[Figure: input rate (t/s), throughput (c/s) and latency (ms) for Single thread, STRETCH and ScaleJoin. Panel labels: scalability (x-axis: # threads, hyper-threading marked), Intra-epoch, provisioning (18 -> 31 PTs), decommissioning (18 -> 7 PTs) (x-axis: time (sec))]
Outline
1. Introduction
2. Continuous clustering
3. Elasticity in stream processing
• Virtual shared-nothing parallelism
• Elastic ScaleGate
• STRETCH framework
4. Conclusions
Conclusions
• Continuous clustering
o Efficient data structure to leverage parallelism
o High throughput and low latency
o Architecture independent
• Elasticity in stream processing
o Virtual shared-nothing parallelism
o Adaptive reconfiguration of processing units
o Intra-node resource utilization
o Deterministic execution
➢ Scale up/scale out
➢ Automatic control unit
➢ IoT applications
➢ Data quality improvement

More Related Content

Similar to Lic may17

Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Carlos Reaño González
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
Vincenzo Gulisano
 
rerngvit_phd_seminar
rerngvit_phd_seminarrerngvit_phd_seminar
rerngvit_phd_seminar
rerngvit yanggratoke
 
Personal Research Overview presented at the KU-NAIST Research Meeting
Personal Research Overview presented at the KU-NAIST Research MeetingPersonal Research Overview presented at the KU-NAIST Research Meeting
Personal Research Overview presented at the KU-NAIST Research Meeting
Chawanat Nakasan
 
Data Streaming in IoT and Big Data Analytics
Data Streaming in  IoT and Big Data AnalyticsData Streaming in  IoT and Big Data Analytics
Data Streaming in IoT and Big Data Analytics
Vincenzo Gulisano
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
RAHUL BHOJWANI
 
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Pradeeban Kathiravelu, Ph.D.
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
Nesreen K. Ahmed
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
NECST Lab @ Politecnico di Milano
 
Major Report on ADIAN
Major Report on ADIANMajor Report on ADIAN
Major Report on ADIAN
smittal121
 
dc09ttp-2011-thesis
dc09ttp-2011-thesisdc09ttp-2011-thesis
dc09ttp-2011-thesis
Theofilos Papapanagiotou
 
Distributed Mobility in Dynamic Environments
Distributed Mobility in Dynamic EnvironmentsDistributed Mobility in Dynamic Environments
Distributed Mobility in Dynamic Environments
Jonathan Carvalho
 
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIASlides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA
Dheryta Jaisinghani
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
Pradeeban Kathiravelu, Ph.D.
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
Pradeeban Kathiravelu, Ph.D.
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
Ian Foster
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
Vincenzo Gulisano
 
Clustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow DetectionClustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow Detection
APNIC
 
Middleware para IoT basado en analítica de datos
Middleware para IoT basado en analítica de datosMiddleware para IoT basado en analítica de datos
Middleware para IoT basado en analítica de datos
Facultad de Informática UCM
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Duy-Hieu Bui
 

Similar to Lic may17 (20)

Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
rerngvit_phd_seminar
rerngvit_phd_seminarrerngvit_phd_seminar
rerngvit_phd_seminar
 
Personal Research Overview presented at the KU-NAIST Research Meeting
Personal Research Overview presented at the KU-NAIST Research MeetingPersonal Research Overview presented at the KU-NAIST Research Meeting
Personal Research Overview presented at the KU-NAIST Research Meeting
 
Data Streaming in IoT and Big Data Analytics
Data Streaming in  IoT and Big Data AnalyticsData Streaming in  IoT and Big Data Analytics
Data Streaming in IoT and Big Data Analytics
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 
Major Report on ADIAN
Major Report on ADIANMajor Report on ADIAN
Major Report on ADIAN
 
dc09ttp-2011-thesis
dc09ttp-2011-thesisdc09ttp-2011-thesis
dc09ttp-2011-thesis
 
Distributed Mobility in Dynamic Environments
Distributed Mobility in Dynamic EnvironmentsDistributed Mobility in Dynamic Environments
Distributed Mobility in Dynamic Environments
 
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIASlides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA
Slides for Ph.D. Thesis Defense of Dheryta Jaisinghani at IIIT-Delhi, INDIA
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Clustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow DetectionClustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow Detection
 
Middleware para IoT basado en analítica de datos
Middleware para IoT basado en analítica de datosMiddleware para IoT basado en analítica de datos
Middleware para IoT basado en analítica de datos
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
 

Recently uploaded

20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Leonel Morgado
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Areesha Ahmad
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
Carl Bergstrom
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
Sciences of Europe
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 

Recently uploaded (20)

20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 

Lic may17

  • 1. Hannaneh Najdataei. Parallel Data Streaming Analytics in the Context of Internet of Things. Licentiate seminar, May 2019.
  • 2. Internet of Things (IoT). Section bar: Introduction | Continuous clustering | Elasticity in stream processing | Conclusions
  • 3. IoT Analytics: Edge Devices, Cloud Computing. 20 GB per car per hour.
  • 4. IoT Analytics: Edge Computing, Fog Computing, Cloud Computing
  • 5. IoT Analytics: Edge Computing, Fog Computing, Cloud Computing
  • 6. 3-tier IoT Architecture: Cloud Tier (Data Centers), Fog Tier (Nodes), Edge Tier (Devices)
  • 7. Scope of the Thesis: Design and implement analytics. The challenges: • Unbounded data • Unpredictable data rate • Various platforms • Time requirements. Computational power: High, Medium, Low
  • 8. Scope of the Thesis: Design and implement analytics. The challenges: • Unbounded data • Unpredictable data rate • Various platforms • Time requirements. The objectives: • Continuous analysis • Adaptive reconfiguration • Hardware independent • Efficient processing
  • 9. Conventional Data Analytics (Batch processing): Data, Analysis, Results, Database
  • 10. Continuous Processing: Data, Analysis, Results
  • 11. Stream Processing: Data, Analysis, Results
  • 12. Stream Processing Operators: • Stateless • Stateful. State is the memory of the operator.
  • 13. Stream Processing Operators: • Stateless (e.g. filter) • Stateful. State is the memory of the operator. Tuple <ts,x>; figure tuples: <3,1> <2,4> <1,3> <4,3>
  • 14. Stream Processing Operators: • Stateless (e.g. filter) • Stateful (e.g. aggregate). State is the memory of the operator. Tuple <ts,x>; figure labels: window, <1,3> <4,3> <3,1> <2,4> <1,3>, <3,8>
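A minimal Python sketch of the two operator kinds on slides 13 and 14. The slides do not state the filter predicate or the window definition, so two assumptions are made here: the filter keeps tuples with x == 3, and the aggregate sums x over tumbling windows of 3 time units starting at ts = 1. Under those assumptions the tuples shown on the slides reproduce the outputs <1,3> <4,3> (filter) and <3,8> (aggregate). The function names are illustrative only.

```python
def stateless_filter(stream, predicate):
    """Stateless: each tuple is judged on its own, nothing is remembered."""
    for ts, x in stream:
        if predicate(x):
            yield (ts, x)


def tumbling_sum(stream, size, start):
    """Stateful: keeps a running sum (the operator's state) per window.

    Windows are [start, start + size), [start + size, start + 2 * size), ...
    A window's result is emitted when the first tuple of a later window
    arrives, stamped with the timestamp of the last tuple it contained.
    """
    window_end = start + size
    total, last_ts = 0, None  # operator state
    for ts, x in stream:
        while ts >= window_end:
            if last_ts is not None:
                yield (last_ts, total)
            total, last_ts = 0, None
            window_end += size
        total += x
        last_ts = ts


# Tuples <ts,x> from the slides, in timestamp order.
tuples = [(1, 3), (2, 4), (3, 1), (4, 3)]

filtered = list(stateless_filter(tuples, lambda x: x == 3))
aggregated = list(tumbling_sum(tuples, size=3, start=1))
```

The filter needs no memory between tuples, while the aggregate must hold the partial sum until the window closes; the window containing <4,3> stays open because no later tuple has arrived yet.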
  • 15. Outline: 1. Introduction (Motivation; Thesis objectives: Continuous analysis, Adaptive reconfiguration, Hardware independent, Efficient processing) 2. Continuous clustering 3. Elasticity in stream processing 4. Conclusions
  • 16. LiDAR Point Cloud Clustering: Raw LiDAR data points (figure: side view, top view, 𝑑)
  • 17. LiDAR Point Cloud Clustering: Raw LiDAR data points, Clustered data points
  • 18. Batch Clustering: 1. Collect data points for one rotation 2. Store the points in a search-optimized data structure 3. Apply the clustering. Euclidean clustering, parameters: 𝑚𝑖𝑛𝑃𝑡𝑠, 𝜖. * [Ester et al., Density-based 1996] [Rusu et al., Semantic3D 2010] [Rusu et al., pcl 2011] [Patwary et al., DBSCAN 2012]
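The three batch steps on slide 18 can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the cited works: it replaces the search-optimized data structure with a brute-force neighbor scan for brevity, treats two points as neighbors when their distance is at most eps, and discards clusters with fewer than min_pts points. The function name is hypothetical.

```python
import math
from collections import deque


def euclidean_clusters(points, eps, min_pts):
    """Group points whose chains of neighbors lie within eps of each other.

    points: list of coordinate tuples collected for one full rotation.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search (assumption: a k-d tree or
            # similar structure would normally replace this scan).
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_pts:
            clusters.append(sorted(cluster))
    return sorted(clusters)


rotation = [(0, 0), (0.3, 0), (0.6, 0), (5, 5), (5.2, 5), (10, 10)]
result = euclidean_clusters(rotation, eps=0.4, min_pts=2)
```

Nothing in this function can start before the whole rotation has been collected, which is the property the continuous approach on the later slides sets out to remove.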
  • 19. Batch Clustering: 1. Collect data points for one rotation 2. Store the points in a search-optimized data structure 3. Apply the clustering. Velodyne HDL-64E: • ~8 rotations per second • Up to ~2.2 million points per second. Challenge?
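A back-of-the-envelope calculation makes the challenge on slide 19 concrete. It treats the approximate sensor figures from the slide as exact, which is an assumption made only for illustration.

```python
ROTATIONS_PER_SECOND = 8        # "~8 rotations per second" on the slide
POINTS_PER_SECOND = 2_200_000   # "up to ~2.2 million points per second"

# Upper bound on the points a batch algorithm must buffer per rotation.
points_per_rotation = POINTS_PER_SECOND / ROTATIONS_PER_SECOND

# Time spent only collecting data before batch clustering can begin.
collection_time_ms = 1000 / ROTATIONS_PER_SECOND
```

Under these figures a batch approach waits about 125 ms and buffers up to about 275,000 points before step 3 can start, and it must finish clustering them before the next rotation completes to keep up.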
  • 20. Continuous Clustering: 1. Collect data points for one rotation 2. Store the points in a search-optimized data structure 3. Apply the clustering. Lisco: continuous clustering while the data is being collected. Reference: H. Najdataei, Y. Nikolakopoulos, V. Gulisano, M. Papatriantafilou. “Continuous and Parallel LiDAR Point-cloud Clustering”. The 38th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2018.
  • 21. Lisco: Side view, Top view, 2D view (figure labels: steps S1 to S7, lasers l1 to l5, entries d1 to d12)
  • 22. Lisco: Side view, Top view, 2D view with L lasers and S steps (figure labels: steps S1 to S7, lasers l1 to l5). Neighbor mask of point 𝑝 (𝜀)
  • 23. Continuous Clustering Challenges: • Partial view of neighbor mask (figure labels: full neighbor mask of 𝑝, actual neighbor mask of 𝑝, neighbor mask of 𝑝′, LiDAR’s last read) • Continuous cluster management (figure labels: 𝐶1, 𝐶2, 𝐶, 𝐻1, 𝐻2)
  • 24. Lisco: 1. Find the neighbor mask and compute distances 2. Link the clusters (figure labels: 𝑝, 𝐶1, 𝐶2, 𝐻1)
  • 25. P-Lisco: 1. Find the neighbor mask and compute distances 2. Link the clusters. Phases: Scouting, Linking (figure labels: 𝑝, 𝐶1, 𝐶2, 𝐻1)
  • 26. P-Lisco: 1. Find the neighbor mask and compute distances 2. Link the clusters. Phases: Scouting, Linking. Thread 1, Thread 2, Thread 3 (figure labels: 𝐶1, 𝐶2, 𝐻1)
  • 27. P-Lisco: Thread 1, Thread 2, Thread 3. Scouts, Linker. Read the data points; Modify the clusters (figure labels: flag, point, 𝜀-neighbors, steps S1 to S7, lasers l1 to l6)
  • 28. Performance Evaluation (Real dataset). Figure: execution time of PCL, Lisco and P-Lisco (series P-Lisco1, P-Lisco2, P-Lisco4, P-Lisco8, P-Lisco16) at 0.3, 0.4 and 0.7 m; platforms: Intel Xeon E5-2695 and ODROID-XU3.
  • 29. Use case (1-Vehicle 1-Day): GPS data ⟨𝑥1, 𝑦1⟩ ⟨𝑥2, 𝑦2⟩ ⟨𝑥3, 𝑦3⟩ ⟨𝑥4, 𝑦4⟩ ⟨𝑥5, 𝑦5⟩ ⟨𝑥6, 𝑦6⟩ ⟨𝑥7, 𝑦7⟩ over time t. Heavy traffic; Exceeding speed limit
  • 30. System Model: • Continuous bounded error approximation • Compress volumes of data • Utilize communication bandwidth • Generalized form of Lisco • Leverage the inherent ordering of spatial and temporal data. Reference: B. Havers, R. Duvignau, H. Najdataei, V. Gulisano, A. Chaitanya Koppisetty, M. Papatriantafilou. “DRIVEN: a framework for efficient Data Retrieval and clustering in Vehicular Networks”. The 35th International Conference on Data Engineering (ICDE). IEEE, 2019.
  • 31. Outline: 1. Introduction 2. Continuous clustering (Lisco; P-Lisco) 3. Elasticity in stream processing 4. Conclusions
  • 32. Stream Processing
  • 33. Stream Processing Performance: • Throughput: number of tuples processed per time unit
  • 34. Stream Processing Performance: • Throughput • Latency: time difference between receiving a tuple and producing the corresponding results
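The two metrics defined on slides 33 and 34 can be computed from per-tuple timestamps. The sketch below assumes a hypothetical measurement log of (received_at, emitted_at) pairs in seconds, one pair per input tuple and its corresponding result; the log format and function names are not from the slides.

```python
def throughput(records):
    """Tuples processed per second over the whole measured interval."""
    first_received = min(received for received, _ in records)
    last_emitted = max(emitted for _, emitted in records)
    return len(records) / (last_emitted - first_received)


def latencies(records):
    """Per-tuple delay between receiving a tuple and emitting its result."""
    return [emitted - received for received, emitted in records]


# Hypothetical log: (received_at, emitted_at) in seconds.
log = [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5), (3.0, 4.0)]

tuples_per_second = throughput(log)
per_tuple_latency = latencies(log)
mean_latency = sum(per_tuple_latency) / len(per_tuple_latency)
```

The two metrics are independent: the last tuple in this log doubles the worst-case latency without changing how many tuples were processed overall.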
  • 35. Stream Processing Parallelism: • Task parallelism
  • 36. Stream Processing Parallelism: • Task parallelism • Data parallelism. Determinism: consistent results independent of tuples’ inter-arrival times. * [Walulya et al., FGCS18] [Gulisano et al., ScaleJoin 2016]
  • 37. Stream Processing Elasticity: Provisioning, Decommissioning
  • 38. Stream Processing Elasticity: Scale out. * [Cardellini et al., HPCS16] [Carbone et al., VLDB17]
  • 39. Stream Processing Efficiency: Shared-nothing, Shared memory, Virtual Shared-nothing; Parallelism, Reconfiguration
  • 40. STRETCH Framework: Components: • State manager • Virtual shared-nothing parallelism. Reference: H. Najdataei, Y. Nikolakopoulos, M. Papatriantafilou, P. Tsigas, V. Gulisano. “STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism”. To appear in the 13th International Conference on Distributed and Event-Based Systems (DEBS). ACM, 2019.
  • 41. Virtual Shared-nothing Parallelism
  • 42. STRETCH Framework: Components: • State manager • Virtual shared-nothing parallelism • Elastic ScaleGate (ESG)
  • 43. ScaleGate: Methods: • addTuple(tuple, sourceID) • getNextReadyTuple(readerID). Figure: sources, readers, tuples t; tuples that are ready to be retrieved by readers
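The two ScaleGate methods on slide 43 can be illustrated with a sequential Python sketch. The slides give only the method names, so the semantics below are assumptions: each source adds tuples in non-decreasing timestamp order, a tuple counts as ready once every source has added a tuple with an equal or larger timestamp, and every reader receives all ready tuples in the same timestamp order. The class name and the snake_case method names are hypothetical stand-ins for addTuple and getNextReadyTuple, and no concurrency control is modeled.

```python
import bisect


class ScaleGateSketch:
    """Single-threaded illustration of a timestamp-ordered merge buffer."""

    def __init__(self, source_ids, reader_ids):
        self._latest = {s: None for s in source_ids}  # last ts per source
        self._tuples = []                             # sorted (ts, seq, payload)
        self._cursor = {r: 0 for r in reader_ids}     # next index per reader
        self._seq = 0                                 # tie-breaker for equal ts

    def add_tuple(self, tup, source_id):
        ts, payload = tup
        bisect.insort(self._tuples, (ts, self._seq, payload))
        self._seq += 1
        self._latest[source_id] = ts

    def get_next_ready_tuple(self, reader_id):
        # Nothing is ready until every source has delivered at least once.
        if any(ts is None for ts in self._latest.values()):
            return None
        ready_up_to = min(self._latest.values())
        i = self._cursor[reader_id]
        if i < len(self._tuples) and self._tuples[i][0] <= ready_up_to:
            self._cursor[reader_id] = i + 1
            ts, _, payload = self._tuples[i]
            return (ts, payload)
        return None


gate = ScaleGateSketch(source_ids=["a", "b"], reader_ids=["r1", "r2"])
gate.add_tuple((1, "x"), "a")
before_b = gate.get_next_ready_tuple("r1")   # source "b" is still silent
gate.add_tuple((2, "y"), "b")
first_r1 = gate.get_next_ready_tuple("r1")
second_r1 = gate.get_next_ready_tuple("r1")  # ts=2 not yet covered by "a"
first_r2 = gate.get_next_ready_tuple("r2")
```

Because readiness depends only on timestamps and not on arrival order, every reader sees the same sequence regardless of how the sources interleave, which is the determinism property named on slide 36.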
  • 44. Elastic ScaleGate: Methods: • addTuple(tuple, sourceID) • getNextReadyTuple(readerID). Additional methods: • announceReaders(List reader_IDs, rID) • removeReaders(List reader_IDs) • announceSources(List source_IDs, min_ts) • removeSources(List source_IDs). Figure: sources, readers, tuples t; tuples that are ready to be retrieved by readers
  • 45. STRETCH Framework (figure tuples: ts=3, ts=3, ts=2, ts=1, ts=5, ts=9, ts=6, ts=8, ts=1, ts=2)
  • 46. STRETCH Framework (figure tuples: ts=5, ts=9, ts=8, ts=5, ts=6, ts=6)
  • 47. STRETCH Framework (figure tuples: ts=5, ts=9, ts=8, ts=5, ts=6, ts=6, ts=6, ts=8)
  • 48. STRETCH Framework (figure tuples: ts=6, ts=8)
  • 49. STRETCH Framework
  • 50. Performance Evaluation: • Use case: ScaleJoin • Setup: Intel Xeon E5-2695. Figure panels: scalability (with hyper-threading region), provisioning (18 -> 31 PTs), decommissioning (18 -> 7 PTs), Intra-epoch; axes: input rate (t/s), throughput (c/s), latency (ms), # threads, time (sec); series: Single thread, STRETCH, ScaleJoin.
  • 51. Outline: 1. Introduction 2. Continuous clustering 3. Elasticity in stream processing (Virtual shared-nothing parallelism; Elastic ScaleGate; STRETCH framework) 4. Conclusions
  • 52. Conclusions: • Continuous clustering: Efficient data structure to leverage parallelism; High throughput and low latency; Architecture independent • Elasticity in stream processing: Virtual shared-nothing parallelism; Adaptive reconfiguration of processing units; Intra-node resource utilization; Deterministic execution. Additional points: Scale up/scale out; Automatic control unit; IoT applications; Data quality improvement