Christian Kreutzfeldt – Static vs Dynamic Stream Processing

STATIC VS DYNAMIC STREAM PROCESSING
Christian Kreutzfeldt
@mnxfst

1. Introduction
2. Stream Processing - First Encounter
3. Increasing number of Use Cases
4. Arising Implementation Issues
5. Requirements for Stream Processing Framework
6. Way to SPQR (+ short demo)
7. Way to Apache Flink (extension points + short demo)
8. Future (hope to come)
9. Q&A

Christian Kreutzfeldt (@mnxfst)
Senior Software Developer & Architect at
Otto Group Business Intelligence Department
Tech Lead “Real-Time Stream Processing”
Computer Science at University of Luebeck

World’s Second-Largest Online Retailer in End-Consumer Business
Europe’s Largest Online Retailer in End-Consumer Fashion & Lifestyle Business
● Multichannel Retail: w/ catalogue business, e-commerce and over-the-counter retail
● Services: covering the entire portfolio of retail services across the value-added chain
● Financial Services: providing retail-related financial services across the value-added chain

● BI Strategy: definition of business intelligence strategy
● Consulting: talent recruitment & training, networking & consulting
● Business Development: evaluation & implementation of data-driven business models
● Data Pool: maintaining & providing data pools
● SaaS Products: software-as-a-service solutions
driven by data, inspired by our customers

dedicated to open source
● SPQR: stream processing framework
● Schedoscope: scheduling framework for painfree agile development of your datahub
● Palladium: framework for developing real-world machine learning solutions
follow us on github.com/ottogroup

Stream Processing
first steps w/ unified tracking
Unified Tracking

Stream Processing
prevent quality problems
Unified Tracking
Tagging Template (×4)

Stream Processing
prevent quality problems
Unified Tracking
Tagging Template (×4)
Event Stream
Event Validator (akka-based, real stream processing)
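
The validation idea on this slide can be pictured with a minimal sketch. It assumes a tagging template is nothing more than a set of required field names and an event is a flat key/value map; the talk does not show the real validator, so `TaggingTemplate`, `validate` and the field names are hypothetical, and the Akka wiring is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the talk or from SPQR.
class EventValidatorSketch {

    // A "tagging template" is modelled as the set of fields an event must carry.
    static final class TaggingTemplate {
        final String name;
        final Set<String> requiredFields;

        TaggingTemplate(String name, String... requiredFields) {
            this.name = name;
            this.requiredFields = new LinkedHashSet<>(Arrays.asList(requiredFields));
        }
    }

    // Returns the required fields that are missing or blank in the event.
    // An empty list means the event conforms to the template.
    static List<String> validate(TaggingTemplate template, Map<String, String> event) {
        List<String> missing = new ArrayList<>();
        for (String field : template.requiredFields) {
            String value = event.get(field);
            if (value == null || value.trim().isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed template and event, for illustration only.
        TaggingTemplate pageView = new TaggingTemplate("pageView", "sessionId", "url", "timestamp");

        Map<String, String> event = new HashMap<>();
        event.put("sessionId", "abc-123");
        event.put("url", "/shop/shoes");

        System.out.println("missing fields: " + validate(pageView, event));
    }
}
```

Checking each event as it arrives, instead of in a later batch job, is what lets quality problems surface while the faulty tagging is still fresh.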

Stream Processing
developing project ideas
● customer sessions
● search sessions
● user-agent identification
● dynamic profile selection
● dynamic stream queries

Umberto Salvagnin https://www.flickr.com/photos/kaibara/4688161016 (cc by 2.0)
Stream Processing
software development issues
● resource intensive use-case implementation
● required ops support for topology deployment and monitoring
● rather static implementations than highly flexible ones
● highly time consuming
Static Topologies (Queries)
Dynamic Data
Highly Flexible Context

Stream Processing
requirements to ease the pain
● unified runtime environment
● operations support
● support for multiple sources and sinks
● real stream processing
● easy-to-extend
● steep learning curve

Stream Processing
working w/ data the business way
● no-code topology definition (the SQL way)
● self dependent, immediate deployments
● consistent monitoring (behavior / result retrieval)
● adjustment through re-deployments
Dynamic Topologies (Queries)
Dynamic Data
Highly Flexible Context

Stream Processing
framework decision
● unified runtime environment
● operations support
● support for multiple sources and sinks
● real stream processing
● easy-to-extend
● steep learning curve
SPQR (spooker)
● no-code topology definition
● self dependent deployments
● short feedback circuit

SPQR
concepts
● library deployment: independent library deployments into node repositories for later use
● zero-code topologies: configuration based pipeline descriptions
● ad hoc queries: support for ad hoc queries, immediate adjustments and short feedback circuits
https://github.com/ottogroup/spqr
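
The two ideas above, components deployed into a repository ahead of time and pipelines described purely by configuration, can be sketched as follows. This is not SPQR's API or configuration format: `ComponentRepository`, `deploy`, `lookup` and `build` are hypothetical names, components are reduced to `String -> String` functions, and the "configuration" is just an ordered list of component names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of configuration-driven pipelines; not SPQR's real API.
class ConfiguredPipelineSketch {

    // Stand-in for a node-local component repository: components are
    // deployed once and looked up by name when a pipeline is built.
    static final class ComponentRepository {
        private final Map<String, Function<String, String>> components = new HashMap<>();

        void deploy(String name, Function<String, String> component) {
            components.put(name, component);
        }

        Function<String, String> lookup(String name) {
            Function<String, String> component = components.get(name);
            if (component == null) {
                throw new IllegalArgumentException("unknown component: " + name);
            }
            return component;
        }
    }

    // Builds a pipeline from a declarative description: the ordered list of
    // component names. The pipeline author writes no code, only this list.
    static Function<String, String> build(ComponentRepository repository, List<String> description) {
        Function<String, String> pipeline = Function.identity();
        for (String name : description) {
            pipeline = pipeline.andThen(repository.lookup(name));
        }
        return pipeline;
    }

    public static void main(String[] args) {
        ComponentRepository repository = new ComponentRepository();
        repository.deploy("trim", String::trim);
        repository.deploy("lowercase", s -> s.toLowerCase());

        // In a real setup this list would be parsed from a configuration file.
        List<String> description = Arrays.asList("trim", "lowercase");

        Function<String, String> pipeline = build(repository, description);
        System.out.println(pipeline.apply("  Search: SHOES "));
    }
}
```

Because the description only references names, changing a pipeline means submitting a new list, which is what keeps the feedback circuit short.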

Dynamic Stream Processing
importance for (business) acceptance
● no-code topology definition
● self dependent deployments
● short feedback circuit

● steep learning curve, focus on functionality instead of implementation, better representation
● no or less ops support, shorter time-to-execution, independence from tech teams, easier to use
● short feedback circuit, easier to adjust
● support people to try out new ideas, get more people to work with data streams
● choose representation defined by topology author as foundation for monitoring to have common understanding (topology author, ops team)

from spqr to apache flink - it’s all there
Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
akka

variety of ways to interact with apache flink
variety of message types (request/response) available to interact with job manager / cluster:
● RequestNumberRegisteredTaskManager
● RequestTotalNumberOfSlots
● SubmitJob
● CancelJob
● RequestPartitionState
● RequestJobStatus
● RequestRunningJobs
● RequestRunningJobsStatus
● RequestJob
● RequestRegisteredTaskManagers
● RequestStackTrace
● RequestJobManagerStatus
● AccumulatorMessage (RequestAccumulatorResultsStringified,...)
● ...

Apache Flink
short feedback circuit & consistent monitoring (impl)
akka
● FlinkMetricsCollector spawns RunningJobsManager
● RunningJobsManager queries JobManager
● RunningJobsManager spawns a JobMetricsCollector for each job
● JobMetricsCollector queries JobManager

Apache Flink
short feedback circuit & consistent monitoring (impl)
akka
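JobMetricsCollector (receives AccumulatorResultsFound)
    public void preStart() throws Exception {
        // every 5 seconds, starting immediately, ask the remote job manager
        // for the accumulator results of this job; replies go to this actor
        context().system().scheduler().schedule(
            FiniteDuration.Zero(),                      // initial delay
            FiniteDuration.apply(5, TimeUnit.SECONDS),  // interval
            this.remoteJobManagerRef,                   // receiver
            new RequestAccumulatorResults(this.jobId),  // message
            context().dispatcher(),                     // execution context
            getSelf()                                   // sender
        );
    }

RunningJobsManager
    public void preStart() throws Exception {
        // every 5 seconds, starting immediately, ask the remote job manager
        // for the status of all running jobs; replies go to this actor
        context().system().scheduler().schedule(
            FiniteDuration.Zero(),                              // initial delay
            FiniteDuration.apply(5, TimeUnit.SECONDS),          // interval
            this.remoteJobManagerRef,                           // receiver
            JobManagerMessages.getRequestRunningJobsStatus(),   // message
            context().dispatcher(),                             // execution context
            getSelf()                                           // sender
        );
    }
    ● receive RunningJobsStatus
    ● extract job identifier
    ● start job metrics collector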
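
What the manager does with each status response can be sketched without Akka or Flink on the classpath. This is a simplified model, not the talk's implementation: `RunningJobsTrackerSketch`, `JobCollector` and `onRunningJobs` are hypothetical names, job identifiers are plain strings, and stopping collectors for jobs that are no longer reported is an assumption that the slides do not cover.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "receive status, extract job ids, start collectors".
class RunningJobsTrackerSketch {

    // Stand-in for a per-job metrics collector actor.
    static final class JobCollector {
        final String jobId;
        boolean running = true;

        JobCollector(String jobId) {
            this.jobId = jobId;
        }

        void stop() {
            running = false;
        }
    }

    private final Map<String, JobCollector> collectors = new HashMap<>();

    // Called for every running-jobs status response. Starts one collector per
    // job that is not tracked yet and returns the ids of the newly tracked jobs.
    Set<String> onRunningJobs(Set<String> runningJobIds) {
        Set<String> started = new HashSet<>();
        for (String jobId : runningJobIds) {
            if (!collectors.containsKey(jobId)) {
                collectors.put(jobId, new JobCollector(jobId));
                started.add(jobId);
            }
        }
        // Assumption: collectors of jobs that disappeared are stopped and dropped.
        collectors.entrySet().removeIf(entry -> {
            if (runningJobIds.contains(entry.getKey())) {
                return false;
            }
            entry.getValue().stop();
            return true;
        });
        return started;
    }

    Set<String> trackedJobs() {
        return new HashSet<>(collectors.keySet());
    }

    public static void main(String[] args) {
        RunningJobsTrackerSketch tracker = new RunningJobsTrackerSketch();
        // Two consecutive polls, five seconds apart in the Akka version.
        System.out.println("started: " + tracker.onRunningJobs(new HashSet<>(Arrays.asList("job-a", "job-b"))));
        System.out.println("started: " + tracker.onRunningJobs(new HashSet<>(Arrays.asList("job-b", "job-c"))));
        System.out.println("tracked: " + tracker.trackedJobs());
    }
}
```

Keeping the bookkeeping idempotent matters here because the same job shows up in every poll for as long as it runs.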

Apache Flink
metrics retrieval through accumulators
D E M O

https://nifi.apache.org/
Apache Flink
how to move on
deploy metrics
under
construction

Apache Flink
topology definition & deployments (integration points)
akka
● no-code topology definition
● self dependent deployments
● expects code
● requires far too many framework modifications
● the place to be

Apache Flink
relevance
metrics, deploy
● Static Data, Static Queries
● Static Data, Dynamic Queries
● Dynamic Data, Static Queries
● Dynamic Data, Dynamic Queries
SQL

Apache Flink
apache zeppelin points the right direction
metrics, deploy
● Static Data, Static Queries
● Static Data, Dynamic Queries
● Dynamic Data, Static Queries
● Dynamic Data, Dynamic Queries
SQL
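
The "dynamic data, dynamic queries" quadrant can be illustrated with a small sketch in which the query is swapped while events keep arriving. It is a toy model under stated assumptions, not Flink or Zeppelin code: events are plain strings, a query is a `Predicate<String>`, and `DynamicQuerySketch`, `updateQuery` and `onEvent` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Hypothetical sketch: a stream consumer whose query can change at runtime.
class DynamicQuerySketch {

    // The currently active query; replaced atomically so the caller that
    // updates it may live on a different thread than the event handler.
    private final AtomicReference<Predicate<String>> query;
    private final List<String> matches = new ArrayList<>();

    DynamicQuerySketch(Predicate<String> initialQuery) {
        this.query = new AtomicReference<>(initialQuery);
    }

    // Swaps the query without stopping or redeploying the consumer.
    void updateQuery(Predicate<String> newQuery) {
        query.set(newQuery);
    }

    // Evaluates the query that is active at the moment the event arrives.
    void onEvent(String event) {
        if (query.get().test(event)) {
            matches.add(event);
        }
    }

    List<String> matches() {
        return new ArrayList<>(matches);
    }

    public static void main(String[] args) {
        DynamicQuerySketch consumer = new DynamicQuerySketch(e -> e.startsWith("search:"));
        consumer.onEvent("search:shoes");
        consumer.onEvent("view:/cart");

        // The query changes while the stream keeps flowing.
        consumer.updateQuery(e -> e.startsWith("view:"));
        consumer.onEvent("search:hats");
        consumer.onEvent("view:/checkout");

        System.out.println(consumer.matches());
    }
}
```

With a static query the predicate would be fixed at deployment time, and every change would mean redeploying the topology.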

http://www.ottogroup.com/en/karriere/
We are hiring!
