Till Rohrmann
till@data-artisans.com
@stsffap
Apache Flink® Meets
Apache Mesos® and DC/OS
Jörg Schad
joerg@mesosphere.io
@joerg_schad
MapReduce is
crunching Data
We need to turn faster!
Evolution of Data Analytics
Batch, Micro-Batch, Event Processing
Days Hours Minutes Seconds Microseconds
Solves problems using predictive and
prescriptive analytics
Reports what has happened using descriptive
analytics
Predictive User Interface
Real-time Pricing and Routing
Real-time Advertising
Billing, Chargeback
Product recommendations
FMACK Stack
EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)
INGEST: Apache Kafka. Ingest millions of events per second
STORE: Apache Cassandra. Distributed & highly scalable database
ANALYZE: Apache Flink. Real-time and batch data processing
ACT: Akka. Visualize data & build data-driven apps
Mesos / DC/OS
Datacenter
Naive Approach
Typical Datacenter: siloed, over-provisioned servers, low utilization
Industry Average: 12-15% utilization
mySQL
microservice
Cassandra
Flink
Kafka
© 2017 Mesosphere, Inc. All Rights Reserved.
Apache Mesos
Typical Datacenter: siloed, over-provisioned servers, low utilization
Industry Average: 12-15% utilization
mySQL
microservice
Cassandra
Flink
Kafka
Mesos: automated schedulers, workload multiplexing onto the same machines
Apache Mesos
Why Mesos?
▪ 2-level scheduling
▪ Fault-tolerant, battle-tested
▪ Scalable to 10,000+ nodes
▪ Created by Mesosphere founder @ UC Berkeley; used in production by 100+ web-scale companies [1]
[1] http://mesos.apache.org/documentation/latest/powered-by-mesos/
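The "2-level scheduling" bullet can be illustrated with a toy model: the master decides which framework is offered which resources, and each framework decides which of its own tasks to place on an offer. This is a minimal sketch, not the Mesos API; the class names, the round-robin offer policy, and the resource figures are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Second level: the framework picks which of its tasks use an offer."""
    name: str
    pending: list  # (task_name, cpus, mem_mb) tuples
    launched: list = field(default_factory=list)

    def on_offer(self, offer):
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(self.pending):
            task_name, need_cpus, need_mem = task
            if need_cpus <= cpus and need_mem <= mem:
                cpus -= need_cpus
                mem -= need_mem
                self.pending.remove(task)
                self.launched.append((task_name, offer.agent))


class Master:
    """First level: the master picks which framework receives each offer."""

    def __init__(self, agents):
        self.agents = agents  # agent name -> (cpus, mem_mb)

    def offer_round(self, frameworks):
        # Hypothetical policy: one offer per agent, handed out round-robin.
        for i, (agent, (cpus, mem)) in enumerate(sorted(self.agents.items())):
            frameworks[i % len(frameworks)].on_offer(Offer(agent, cpus, mem))


flink = Framework("flink", [("taskmanager-1", 2.0, 4096), ("taskmanager-2", 2.0, 4096)])
kafka = Framework("kafka", [("broker-1", 1.0, 2048)])
Master({"agent-a": (4.0, 8192), "agent-b": (2.0, 4096)}).offer_round([flink, kafka])
print(flink.launched, kafka.launched)
```

The sketch runs a single offer round and does not track what remains on an agent afterwards; a real master keeps that accounting and re-offers unused resources.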
Apache Flink & Apache Mesos
Why Apache Mesos?
▪ Mesos offers full functionality to implement fault-tolerant and elastic distributed applications
▪ 30% of survey respondents were running Flink on Mesos (prior to proper Mesos support, September 2016)
Flink’s Mesos Integration
Components: Apache Flink Framework, Mesos Master, Mesos App Master (Flink Mesos ResourceManager, JobManager), Mesos Tasks (TaskManagers)
Diagram labels: Allocate resources; Launch Mesos tasks; Register; Execute job
Resource Manager Components
Connection Monitor
▪ Monitors connection to Mesos
Launch Coordinator
▪ Resource offer processing and task scheduling
▪ Gathers offers and matches them to tasks using Fenzo
Task Monitor
▪ Monitors Mesos tasks
▪ Triggers reconciliation
▪ Makes sure tasks are properly killed
Reconciliation Coordinator
▪ Reconciles tasks view between ResourceManager and Mesos Master
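Reconciling the "tasks view" comes down to comparing two maps of task states and deciding what to do about each difference. The function below is a sketch of that comparison only; the state names, the action categories, and the `reconcile` helper itself are hypothetical and do not reproduce Flink's or Mesos's actual code.

```python
def reconcile(framework_view, master_view):
    """Compare task states known to the framework with those the master reports.

    Both arguments map task id -> state string (hypothetical representation).
    """
    actions = {"update": {}, "lost": [], "kill": []}
    for task_id, state in framework_view.items():
        reported = master_view.get(task_id)
        if reported is None:
            actions["lost"].append(task_id)        # master does not know this task
        elif reported != state:
            actions["update"][task_id] = reported  # adopt the master's state
    for task_id in master_view:
        if task_id not in framework_view:
            actions["kill"].append(task_id)        # orphaned task: make sure it is killed
    return actions


result = reconcile(
    {"tm-1": "RUNNING", "tm-2": "STAGING", "tm-3": "RUNNING"},
    {"tm-1": "RUNNING", "tm-2": "RUNNING", "tm-4": "RUNNING"},
)
print(result)
```

Treating tasks unknown to the framework as candidates for killing mirrors the "makes sure tasks are properly killed" bullet; whether the real components act this way in every case is not stated on the slide.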
Component Interplay
Components: ResourceManager (Connection Monitor, Launch Coordinator, Task Monitor, Reconciliation Coordinator), Mesos Master, Mesos Task
Diagram labels: Resource offers; Launch tasks; Monitor tasks; Status messages; Trigger reconciliation; Reconcile tasks; Start TaskManagers; Recover tasks; Kill task
Fenzo
▪ Generic task scheduler for Mesos frameworks
▪ Developed by Netflix
▪ Matching between tasks and resource offers
▪ Pluggable fitness evaluator
Diagram (Mesos, Launch Coordinator, Fenzo):
▪ Mesos sends periodic resource offers to the Launch Coordinator
▪ Launch Coordinator tells Fenzo the offered resources & tasks
▪ Fenzo returns resource-task matchings
▪ Tasks to launch
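Matching tasks to offers with a pluggable fitness evaluator can be sketched as a greedy loop that scores every offer a task fits on and picks the best one. This is not Fenzo's API (Fenzo is a Java library); the `match` function, the dictionary layout, and the CPU-based bin-packing score are assumptions chosen for illustration.

```python
def bin_packing_fitness(task, offer):
    """Hypothetical fitness in (0, 1]: higher when the task fills more of the offer's CPUs."""
    return task["cpus"] / offer["cpus"]


def match(tasks, offers, fitness=bin_packing_fitness):
    """Greedily assign each task to the fitting offer with the highest fitness.

    Assumes every task requests a positive number of CPUs.
    """
    remaining = {o["id"]: dict(o) for o in offers}  # copy so callers keep their offers
    assignments = {}
    for task in tasks:
        candidates = [
            o for o in remaining.values()
            if o["cpus"] >= task["cpus"] and o["mem"] >= task["mem"]
        ]
        if not candidates:
            continue  # task stays pending until a later round of offers
        best = max(candidates, key=lambda o: fitness(task, o))
        best["cpus"] -= task["cpus"]
        best["mem"] -= task["mem"]
        assignments[task["id"]] = best["id"]
    return assignments


offers = [
    {"id": "o1", "cpus": 8, "mem": 16384},
    {"id": "o2", "cpus": 2, "mem": 4096},
]
tasks = [
    {"id": "tm-1", "cpus": 2, "mem": 4096},
    {"id": "tm-2", "cpus": 2, "mem": 4096},
]
assignments = match(tasks, offers)
print(assignments)
```

Passing a different `fitness` callable (for example one that prefers the emptiest offer) changes the placement policy without touching the matching loop, which is the point of making the evaluator pluggable.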
New Distributed Architecture
Components: Client, Marathon, Flink Mesos Dispatcher, Flink Master Process (Flink Mesos ResourceManager, JobManager), Mesos Master, Mesos Cluster, Mesos Tasks (TaskManagers)
(1) Start and monitor dispatcher (Marathon)
(2) HTTP POST JobGraph/Jars (Client)
(3) Allocate container for Flink master
(4) Start process (and supervise)
(5) Request slots
(6) Allocate containers for TaskManagers
(7) Register
(8) Deploy tasks
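The numbered steps (1) to (8) of the new architecture can be traced with a small simulation in which containers are only allocated once a job asks for slots. Everything here is a stand-in: the `Cluster`, `Dispatcher`, and `FlinkMaster` classes, the one-slot-per-TaskManager rule, and the job dictionary are assumptions, not Flink or Mesos interfaces.

```python
class Cluster:
    """Stand-in for Mesos: hands out containers on request (hypothetical)."""

    def __init__(self):
        self.containers = []

    def allocate(self, role):
        container_id = f"{role}-{len(self.containers)}"
        self.containers.append(container_id)
        return container_id


class FlinkMaster:
    """Stand-in for the per-job master process (ResourceManager + JobManager)."""

    def __init__(self, cluster, job):
        self.cluster, self.job = cluster, job
        self.task_managers = []
        self.deployed = {}

    def run(self, log):
        log.append("(5) request slots")
        # Assumption: one slot per TaskManager, so parallelism == container count.
        for _ in range(self.job["parallelism"]):
            log.append("(6) allocate container for TaskManager")
            self.task_managers.append(self.cluster.allocate("taskmanager"))
            log.append("(7) register")
        for i, tm in enumerate(self.task_managers):
            self.deployed[f"subtask-{i}"] = tm
        log.append("(8) deploy tasks")


class Dispatcher:
    """Stand-in for the dispatcher that receives job submissions."""

    def __init__(self, cluster):
        self.cluster = cluster

    def submit(self, job, log):
        log.append("(2) HTTP POST JobGraph/Jars")
        log.append("(3) allocate container for Flink master")
        self.cluster.allocate("flink-master")
        log.append("(4) start process (and supervise)")
        master = FlinkMaster(self.cluster, job)
        master.run(log)
        return master


log = ["(1) Marathon starts and monitors dispatcher"]
cluster = Cluster()
master = Dispatcher(cluster).submit({"name": "demo", "parallelism": 2}, log)
print("\n".join(log))
```

Because TaskManager containers are requested per job rather than started up front, the number of containers follows the job's needs, which is the sense in which the architecture supports dynamic resource allocation.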
DC/OS
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Big Data + Analytics Engines
Microservices (in containers)
Streaming
Batch
Machine Learning
Analytics
Functions &
Logic
Search
Time Series
SQL / NoSQL
Databases
Modern App Components
Any Infrastructure (Physical, Virtual, Cloud)
Demo Time
Generator
▪ Financial data generated by generator
▪ Written to Kafka topics
▪ Kafka topics consumed by Flink
▪ Flink pipeline operates on Kafka data
▪ Results written back into Kafka
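The shape of the demo (generator, input topic, pipeline, output topic) can be mimicked in memory without Kafka or Flink. In this sketch plain Python lists stand in for Kafka topics and a generator function stands in for the Flink job; the record fields and the running-total computation are invented for illustration, since the slide does not say what the pipeline computes.

```python
import random
from collections import defaultdict


def generate(n, seed=42):
    """Hypothetical generator of financial records."""
    rng = random.Random(seed)
    for i in range(n):
        yield {"account": f"acct-{i % 3}", "amount": rng.randint(1, 100)}


def pipeline(records):
    """Stand-in for the streaming job: emits a running total per account."""
    totals = defaultdict(int)
    for record in records:
        totals[record["account"]] += record["amount"]
        yield {"account": record["account"], "running_total": totals[record["account"]]}


# Lists stand in for topics; real topics would be partitioned, durable logs.
topics = {"transactions": [], "results": []}
topics["transactions"].extend(generate(6))                  # generator writes to the input topic
topics["results"].extend(pipeline(topics["transactions"]))  # job reads it and writes results back

for result in topics["results"]:
    print(result)
```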
Conclusion
TL;DL
▪ Apache Flink runs on Mesos using Fenzo
▪ New distributed architecture supports dynamic resource allocation
▪ DC/OS offers an easy-to-use Flink package
Thank you!
@joerg_schad
@stsffap
@ApacheFlink
@dataArtisans
@dcos
We are hiring!
data-artisans.com/careers

Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apache Mesos and DC/OS
