SlideShare a Scribd company logo
ONLINE
PERFORMANCE
ANALYSIS
OF DISTRIBUTED DATAFLOW SYSTEMS
Vasia Kalavri
kalavriv@inf.ethz.ch
Support:
O’Reilly Velocity London
20 October 2017
ABOUT ME
▸ Postdoc at ETH Zürich
▸ Systems Group: https://www.systems.ethz.ch/
▸ PMC member of Apache Flink
▸ Research interests
▸ Large-scale graph processing
▸ Streaming dataflow engines
▸ Current project
▸ Predictive datacenter analytics and
management
2
@vkalavri
33
traces, configuration,
topology updates, …
Datacenter
Strymon
queries, complex analytics,
simulations, …
policy enforcement,
what-if scenarios, …
STRYMON: ONLINE DATACENTER ANALYTICS AND MANAGEMENT
Datacenter
model
event streams
strymon.systems.ethz.ch
STRYMON IS BUILT ON TIMELY
4
▸ A steaming framework for data-parallel computations
▸ Arbitrary cyclic dataflows
▸ Logical timestamps (epochs)
▸ Asynchronous execution
▸ Low latency
D. Murray, F. McSherry, M. Isard, R. Isaacs, P. Barham, M. Abadi.
Naiad: A Timely Dataflow System. In SOSP, 2013.
4
https://github.com/frankmcsherry/timely-dataflow
What-if scenarios
Real-time
datacenter analytics
Incremental
network routing
Online critical
path analysis
Explaining
simulations
Strymon
5
Apache Flink
SnailTrail: Strymon’s
component for online
performance analysis of
distributed dataflows
6
What-if scenarios
Real-time
datacenter analytics
Incremental
network routing
Online critical
path analysis
Explaining
simulations
Strymon
DATAFLOW PROGRAMMING
7
sources
sinks
operators
data exchange
UNDERSTANDING THE PERFORMANCE OF DISTRIBUTED DATAFLOWS
Hard to troubleshoot
▸ long-running, dynamic
workloads
▸ many tasks, activities,
operators, dependencies
▸ bottleneck causes are
usually not isolated but
span multiple processes
8
client
W1
W1
scheduler
PERFORMANCE METRICS
9
PERFORMANCE METRICS
9
Dataflow graph
PERFORMANCE METRICS
9
Duration
Dataflow graph
PERFORMANCE METRICS
9
Duration
Aggregate data exchange
Dataflow graph
PERFORMANCE METRICS
9
Duration
Aggregate data exchange
Custom
aggregate metrics
Dataflow graph
W1
W2
W3
serialization processing
waiting
deserialization
10
processing
message
A PARALLEL EXECUTION
W1
W2
W3
serialization processing
waiting
deserialization
10
processing
message
A PARALLEL EXECUTION
W1
W2
W3
serialization processing
waiting
deserialization
10
processing
messageProcessing is the most time-consuming activity
A PARALLEL EXECUTION
W1
W2
W3
serialization processing
waiting
deserialization
11
processing
What if we optimize it?
A PARALLEL EXECUTION
W1
W2
W3
serialization
waiting
deserialization
12
A PARALLEL EXECUTION
13
W1
W2
W3
serialization
waiting
deserialization
A PARALLEL EXECUTION
13
W1
W2
W3
serialization
waiting
deserialization
A PARALLEL EXECUTION
13
W1
W2
W3
serialization
waiting
deserialization
No performance benefit
for the parallel execution!
A PARALLEL EXECUTION
CONVENTIONAL
PROFILING CAN
BE MISLEADING
CRITICAL PATH ANALYSIS
THE PROGRAM ACTIVITY GRAPH (PAG)
16
W1
W2
W3
a b
c d
t=k t=k+1
x u v z
THE PROGRAM ACTIVITY GRAPH (PAG)
16
W1
W2
W3
a b
c d
t=k t=k+1
x u v z
Nodes are timestamped events:
start or end of a worker activity
THE PROGRAM ACTIVITY GRAPH (PAG)
16
W1
W2
W3
a b
c d
t=k t=k+1
x u v z
Nodes are timestamped events:
start or end of a worker activity
u = {

timestamp: k+1,

worker: 2
}
THE PROGRAM ACTIVITY GRAPH (PAG)
17
W1
W2
W3
a b
c d
t=k t=k+1
x u v z
Edges represent activities
annotated with a type and duration
THE PROGRAM ACTIVITY GRAPH (PAG)
17
W1
W2
W3
a b
c d
t=k t=k+1
x u v z
Edges represent activities
annotated with a type and duration
(u, v) = {

type: serialization

duration: 1
}
W1
W2
W3
The Program Activity
Graph captures
computational
dependencies among
parallel workers
▸ Which activities delay the overall execution?
▸ i.e. which activities lie on the critical path of execution?
18
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
CRITICAL PATH
19
W1
W2
W3
a b
c d
The longest path in the execution history
(not considering waiting activities)
20
W1
W2
W3
a b
c d
CRITICAL PATH
CRITICAL PATH
21
W1
W2
W3
a b
c d
CRITICAL PATH
21
W1
W2
W3
a b
c d
Reduced execution time
POST-MORTEM CRITICAL PATH ANALYSIS IS EASY
1. Collect traces during execution
job start job end
profiler
22
POST-MORTEM CRITICAL PATH ANALYSIS IS EASY
1. Collect traces during execution
job start job end
profiler
22
2. Analyze traces offline
analyzer
performance
summaries
How to compute the critical path for
continuously running,
distributed streaming applications,
with potentially unbounded input?
23
How to compute the critical path for
continuously running,
distributed streaming applications,
with potentially unbounded input?
▸ There might be no “job end”
▸ The PAG and critical path are continuously evolving
▸ Stale profiling information is not useful
23
ONLINE CRITICAL PATH ANALYSIS
ONLINE ANALYSIS OF TRACE SNAPSHOTS
25
input stream output stream
ONLINE ANALYSIS OF TRACE SNAPSHOTS
25
input stream output stream
periodic
snapshot
ONLINE ANALYSIS OF TRACE SNAPSHOTS
25
input stream output stream
periodic
snapshot
trace snapshot
stream
analyzer
performance
summaries
stream
PROGRAM ACTIVITY GRAPH SNAPSHOT
26
W1
W2
W3
a b
c d
t=k t=k+1
x u v z
ts te
W1
W2
W3
a b
c d
x u v z
ts te
27
▸ All paths have the same length: te - ts
W1
W2
W3
a b
c d
x u v z
ts te
27
W1
W2
W3
a b
c d
x u v z
ts te
▸ All paths have the same length: te - ts
28
W1
W2
W3
a b
c d
x u v z
ts te
▸ All paths have the same length: te - ts
29
W1
W2
W3
a b
c d
x u v z
ts te
▸ All paths have the same length: te - ts
▸ Choosing a random path might miss critical activities
30
W1
W2
W3
a b
c d
x u v z
ts te
▸ All paths have the same length: te - ts
▸ Choosing a random path might miss critical activities
31
W1
W2
W3
a b
c d
x u v z
ts te
▸ All paths have the same length: te - ts
▸ Choosing a random path might miss critical activities
31
▸ All paths have the same length: te - ts
▸ Choosing a random path might miss critical activities
▸ Enumerating all paths is impractical
W1
W2
W3
a b
c d
x u v z
ts te
32
W1
W2
W3
a b
c d
x u v z
ts te
How to rank activities with regard to criticality?
All paths are
potentially part
of the evolving
critical path
33
W1
W2
W3
a b
c d
x u v z
ts te
How to rank activities with regard to criticality?
All paths are
potentially part
of the evolving
critical path
Intuition: the more paths an activity appears on
the more probable it is that this activity is critical
33
1
2
3
4
5
6
7
8
9
W1
W2
W3
a b
c d
x u v z
ts te
34
1
2
3
4
5
6
7
8
9
W1
W2
W3
a b
c d
x u v z
ts te
9
0
0
6 6
35
36
CRITICAL PARTICIPATION (CP METRIC)
An estimation of the activity’s participation in the critical path
total number of paths
in the snapshot
activity duration: edge weight
centrality: the number of
paths this activity appears on
tion 8. Transient Path Centrality: Let P = {~p1, ~p2, ...~pN}
set of N transient paths of snapshot G[ts,te]. The tran-
path centrality of an edge e 2 G[ts,te] is defined as
c(e) =
NX
i=1
ci(e), where ci(e) =
8
>><
>>:
0 : e < ~pi
1 : e 2 ~pi
e following holds:
CPa =
TPC(a) · aw
N(te ts)
(3)
Sp
di↵er
ysis:
graph
whos
ers (t
graph
all wo
tions
1 We p
4
36
CRITICAL PARTICIPATION (CP METRIC)
An estimation of the activity’s participation in the critical path
total number of paths
in the snapshot
activity duration: edge weight
centrality: the number of
paths this activity appears on
Can be computed
without path
enumeration!
tion 8. Transient Path Centrality: Let P = {~p1, ~p2, ...~pN}
set of N transient paths of snapshot G[ts,te]. The tran-
path centrality of an edge e 2 G[ts,te] is defined as
c(e) =
NX
i=1
ci(e), where ci(e) =
8
>><
>>:
0 : e < ~pi
1 : e 2 ~pi
e following holds:
CPa =
TPC(a) · aw
N(te ts)
(3)
Sp
di↵er
ysis:
graph
whos
ers (t
graph
all wo
tions
1 We p
4
ONLINE PERFORMANCE ANALYSIS WITH
SNAILTRAIL
38
reference application SnailTrail
Timely
Trace ingestion
CP-based
performance
summaries
PAG construction
CP computation and
activity ranking
trace streams
Profiling
Trace generation
Apache Flink,
Apache Spark,
TensorFlow,
Heron,
Timely Dataflow, ...
SNAILTRAIL IN ACTION
EXAMPLE: TASK SCHEDULING IN APACHE SPARK
39
DRIVER
W1
W2
W3
Venkataraman, Shivaram, et al. "Drizzle: Fast and adaptable stream processing at scale." Spark Summit (2016).
SCHEDULING BOTTLENECK IN APACHE SPARK
40
0 5 10 15
Snapshot
0.0
0.2
0.4
0.6
0.8
CP
0 5 10 15
Snapshot
%weight
Processing Scheduling
Conventional ProfilingSnailTrail Profiling
Apache Spark: Yahoo! Streaming Benchmark, 16 workers, 8s snapshots
SNAILTRAIL CP-BASED SUMMARIES
▸ Activity Summary
▸ which activity type is a bottleneck?
41
0 5 10 15
Snapshot
0.0
0.2
0.4
0.6
0.8
1.0
CP
DataMessage
Unknown
Buffer
Deserialization
Serialization
Processing
Activity Summary
Apache Flink: Dhalion WordCount Benchmark, 4 workers, 1s snapshots
42
0 5 10 15
Snapshot
0.0
0.2
0.4
0.6
0.8
1.0
CP
DataMessage
Unknown
Buffer
Deserialization
Serialization
Processing
Activity Summary
Optimize
serialization!
Apache Flink: Dhalion WordCount Benchmark, 4 workers, 1s snapshots
42
SNAILTRAIL CP-BASED SUMMARIES
▸ Activity Summary
▸ which activity type is a bottleneck?
▸ Straggler Summary
▸ which worker is a bottleneck?
43
0 5 10 15
Snapshot
0.00
0.05
0.10
0.15
CP
W1
W2
W3
W4
Straggler Summary
Apache Flink: Dhalion WordCount Benchmark, 4 workers, 1s snapshots
44
0 5 10 15
Snapshot
0.00
0.05
0.10
0.15
CP
W1
W2
W3
W4
Straggler Summary
Steal work from W1
Apache Flink: Dhalion WordCount Benchmark, 4 workers, 1s snapshots
44
SNAILTRAIL CP-BASED SUMMARIES
▸ Activity Summary
▸ which activity type is a bottleneck?
▸ Straggler Summary
▸ which worker is a bottleneck?
▸ Operator Summary
▸ which operator is a bottleneck?
45
Operator Summary
Apache Flink: Dhalion WordCount Benchmark, 10 workers, 1s snapshots
46
Operator Summary
Increase
flatmap’s parallelism!
Apache Flink: Dhalion WordCount Benchmark, 10 workers, 1s snapshots
46
SNAILTRAIL CP-BASED SUMMARIES
▸ Activity Summary
▸ which activity type is a bottleneck?
▸ Straggler Summary
▸ which worker is a bottleneck?
▸ Operator Summary
▸ which operator is a bottleneck?
▸ Communication Summary
▸ which communication channels are bottlenecks?
47
Communication Criticality
Communication Summary
48
0 1 2 3 4 5 6 7 8 9 10 11 12
Worker
0
1
2
3
4
5
6
7
8
9
10
11
12
Worker
1 32 54 86 7 109 1211
Worker
Worker
1
2
3
4
5
6
7
8
9
10
11
12
Communication Criticality
Communication Summary
48
0 1 2 3 4 5 6 7 8 9 10 11 12
Worker
0
1
2
3
4
5
6
7
8
9
10
11
12
Worker
1 32 54 86 7 109 1211
Worker
Worker
1
2
3
4
5
6
7
8
9
10
11
12
Collocate worker 5
with 2, 9, 10, 11
SNAILTRAIL PERFORMANCE
▸ Low instrumentation overhead
▸ < 10% for all reference systems
▸ High throughput
▸ 1.2 million events per second
▸ Always online
▸ 1s of traces in 6ms, 256s of traces in < 25s
49
Traces from Apache Flink Sessionization, 48 workers, 1s-256s snapshots
SnailTrail on Intel Xeon E5-4640, 2.40GHz, 32 cores, 512GB RAM
RECAP
50
RECAP
50
Strymon: online datacenter analytics
and management
RECAP
50
Strymon: online datacenter analytics
and management
0 5 10 15
Snapshot
0.0
0.2
0.4
0.6
0.8
CP
0 5 10 15
Snapshot
%weight
Processing Scheduling
Conventional profiling is misleading
RECAP
50
Strymon: online datacenter analytics
and management
0 5 10 15
Snapshot
0.0
0.2
0.4
0.6
0.8
CP
0 5 10 15
Snapshot
%weight
Processing Scheduling
Conventional profiling is misleading
CP-metric: online critical path analysis
RECAP
50
Strymon: online datacenter analytics
and management
0 5 10 15
Snapshot
0.0
0.2
0.4
0.6
0.8
CP
0 5 10 15
Snapshot
%weight
Processing Scheduling
Conventional profiling is misleading
CP-metric: online critical path analysis SnailTrail: online CP-based summaries
THE STRYMON TEAM & FRIENDS
51
Vasiliki Kalavri
Zaheer Chothia
Andrea Lattuada
Prof. Timothy Roscoe
Sebastian Wicki
Moritz Hoffmann
Desislava Dimitrova
John Liagouris
Frank McSherry
THE STRYMON TEAM & FRIENDS
51
Vasiliki Kalavri
Zaheer Chothia
Andrea Lattuada
Prof. Timothy Roscoe
Sebastian Wicki
Moritz Hoffmann
Desislava Dimitrova
John Liagouris
? ?
Frank McSherry
THE STRYMON TEAM & FRIENDS
51
Vasiliki Kalavri
Zaheer Chothia
Andrea Lattuada
Prof. Timothy Roscoe
Sebastian Wicki
Moritz Hoffmann
Desislava Dimitrova
John Liagouris
? ?
Frank McSherry
IT COULD BE YOU!
strymon.systems.ethz.ch
www.systems.ethz.ch/positions
ONLINE
PERFORMANCE
ANALYSIS
OF DISTRIBUTED DATAFLOW SYSTEMS
Vasia Kalavri
kalavriv@inf.ethz.ch
Support:
O’Reilly Velocity London
20 October 2017

More Related Content

Similar to Online performance analysis of distributed dataflow systems (O'Reilly Velocity 2017)

Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Flink Forward
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
Two Sigma
 
A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic Assignment
Kelly Taylor
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
Carlos Castillo (ChaTo)
 
Actors for Behavioural Simulation
Actors for Behavioural SimulationActors for Behavioural Simulation
Actors for Behavioural Simulation
ClarkTony
 
RedisDay London 2018 - CRDTs and Redis From sequential to concurrent executions
RedisDay London 2018 - CRDTs and Redis From sequential to concurrent executionsRedisDay London 2018 - CRDTs and Redis From sequential to concurrent executions
RedisDay London 2018 - CRDTs and Redis From sequential to concurrent executions
Redis Labs
 
Digital Communication - Stochastic Process
Digital Communication - Stochastic ProcessDigital Communication - Stochastic Process
Digital Communication - Stochastic Process
International Institute of Information Technology (I²IT)
 
Fluid dynamics
Fluid dynamicsFluid dynamics
Fluid dynamicsCik Minn
 
Finding Dense Subgraphs
Finding Dense SubgraphsFinding Dense Subgraphs
Finding Dense Subgraphs
Carlos Castillo (ChaTo)
 
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack FilteringReal-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Kohei Hayashi
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
helzerpatrina
 
Mit15 082 jf10_lec01
Mit15 082 jf10_lec01Mit15 082 jf10_lec01
Mit15 082 jf10_lec01Saad Liaqat
 
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptxASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
sunitha1792
 
Compiler Construction | Lecture 11 | Monotone Frameworks
Compiler Construction | Lecture 11 | Monotone FrameworksCompiler Construction | Lecture 11 | Monotone Frameworks
Compiler Construction | Lecture 11 | Monotone Frameworks
Eelco Visser
 
FluidDynamics.ppt
FluidDynamics.pptFluidDynamics.ppt
FluidDynamics.ppt
MuddassirMudassar
 
Graph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph miningGraph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph mining
tuxette
 
Introduction to Graph Theory
Introduction to Graph TheoryIntroduction to Graph Theory
Introduction to Graph Theory
Yosuke Mizutani
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic Environments
Eirini Ntoutsi
 

Similar to Online performance analysis of distributed dataflow systems (O'Reilly Velocity 2017) (20)

Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
 
A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic Assignment
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
 
Actors for Behavioural Simulation
Actors for Behavioural SimulationActors for Behavioural Simulation
Actors for Behavioural Simulation
 
RedisDay London 2018 - CRDTs and Redis From sequential to concurrent executions
RedisDay London 2018 - CRDTs and Redis From sequential to concurrent executionsRedisDay London 2018 - CRDTs and Redis From sequential to concurrent executions
RedisDay London 2018 - CRDTs and Redis From sequential to concurrent executions
 
Digital Communication - Stochastic Process
Digital Communication - Stochastic ProcessDigital Communication - Stochastic Process
Digital Communication - Stochastic Process
 
Fluid dynamics
Fluid dynamicsFluid dynamics
Fluid dynamics
 
Finding Dense Subgraphs
Finding Dense SubgraphsFinding Dense Subgraphs
Finding Dense Subgraphs
 
Learning End to End
Learning End to EndLearning End to End
Learning End to End
 
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack FilteringReal-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
 
Mit15 082 jf10_lec01
Mit15 082 jf10_lec01Mit15 082 jf10_lec01
Mit15 082 jf10_lec01
 
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptxASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
 
Compiler Construction | Lecture 11 | Monotone Frameworks
Compiler Construction | Lecture 11 | Monotone FrameworksCompiler Construction | Lecture 11 | Monotone Frameworks
Compiler Construction | Lecture 11 | Monotone Frameworks
 
FluidDynamics.ppt
FluidDynamics.pptFluidDynamics.ppt
FluidDynamics.ppt
 
Graph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph miningGraph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph mining
 
Introduction to Graph Theory
Introduction to Graph TheoryIntroduction to Graph Theory
Introduction to Graph Theory
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic Environments
 
Project Management Techniques
Project Management TechniquesProject Management Techniques
Project Management Techniques
 

More from Vasia Kalavri

From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyond
Vasia Kalavri
 
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with StrymonPredictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
Vasia Kalavri
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
Vasia Kalavri
 
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming EraGraphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
Vasia Kalavri
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
Vasia Kalavri
 
Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
Vasia Kalavri
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
Vasia Kalavri
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
Vasia Kalavri
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
Vasia Kalavri
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Vasia Kalavri
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
Vasia Kalavri
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
Vasia Kalavri
 
A Skype case study (2011)
A Skype case study (2011)A Skype case study (2011)
A Skype case study (2011)
Vasia Kalavri
 
Gelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupGelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area Meetup
Vasia Kalavri
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
Vasia Kalavri
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Vasia Kalavri
 

More from Vasia Kalavri (17)

From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyond
 
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with StrymonPredictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming EraGraphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
 
Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
 
A Skype case study (2011)
A Skype case study (2011)A Skype case study (2011)
A Skype case study (2011)
 
Gelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupGelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area Meetup
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
 

Recently uploaded

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Online performance analysis of distributed dataflow systems (O'Reilly Velocity 2017)