© 2019 Ververica
David Anderson | @alpinegizmo | Training Coordinator
Getting Started with
Apache Flink® on Kubernetes
About Ververica
Creators of
Apache Flink®
Real Time
Stream Processing
for the Enterprise
Outline
1. Introduction
2. Detailed Example
3. Debugging Tips
4. Future Plans
Why Containers?
• Containers provide isolation at low cost
– Require fewer resources than VMs
– Smaller, boot faster
• Simpler to manage
– Each container does one thing
– Consistent packaging
• Enables flexible and dynamic resource allocation
– Scalable
– Composable
Container Orchestration with Kubernetes
• Declarative configuration:
– You tell K8s the desired state, and a background process makes it happen
• 3 replicas of this container should be kept running
• A load balancer should exist, listening on port 443, backed by container with this label
• Core resource types:
– Pod: a group of one or more containers
– Job: keeps pod(s) running until finished
– Deployment: keeps n pods running indefinitely
– Service: a REST object backed by a set of pods
– Persistent Volume Claim: storage whose lifetime is not coupled to any of the pods
Vision: Flink as a Library
• Makes deployments simpler
– Focus is on deploying/running an application
– You build one, complete job-specific Docker image that includes:
• Your application code
• Flink libraries
• Other dependencies
• Configuration files
Flink’s Runtime Building Blocks
ResourceManager
• Cluster framework-specific
• Manages available TaskManagers
• Acquires / releases resources
TaskManager
• Registers with ResourceManager
• Provides “task slots”
• Assigned tasks by one or more JobManagers
JobManager
• One per job
• Schedules job in terms of “task slots”
• Monitors task execution
• Coordinates checkpointing
Dispatcher
• Touch-point for job submissions
• Spawns JobManagers
Runtime Building Blocks (on Yarn)
[Diagram: App/Client, Dispatcher, JobManager, ResourceManager, TaskManager]
(1) Submit Job
(2) Start JobManager
(3) Request slots
(4) Start TaskManager
(5) Register
(6) Offer slots
(7) Deploy Tasks
But we’re not quite there yet with K8s
Flink on K8s: current status
• Still using the legacy standalone resource manager
• Deployment establishes a static execution environment
• You will have a k8s manifest that effectively says
– there should be n taskmanagers that look like this
Flink is not aware of Kubernetes
Flink job cluster on K8s
[Diagram: one Master Container (ResourceManager, JobManager, Mini Dispatcher) and three Worker Containers (one TaskManager each)]
(0) One image is built that can be either a Master or Worker
(1) Container framework starts Master & Worker Containers
(2) Run & Start
(3) Register
(4) Deploy Tasks
2. EXAMPLE
https://github.com/alpinegizmo/flink-containers-example
Very Simple Streaming Job
[Diagram: data generator → keyBy → RichFlatMap (# events per user) → print]
Desired Runtime Landscape for K8s
Steps
1. Build the docker image
2. Set up job cluster (k8s job) & task managers (k8s deployment)
3. Set up job cluster service
4. Add minio for checkpoints
1: Build a docker image
Dockerfile
ADD $flink_dist $FLINK_INSTALL_PATH
ADD $job_jar $FLINK_INSTALL_PATH/job.jar
. . .
COPY docker/flink/flink-conf.yaml $FLINK_HOME/conf
COPY docker/flink/log4j-console.properties $FLINK_HOME/conf
COPY docker/flink/docker-entrypoint.sh /
. . .
ENTRYPOINT ["/docker-entrypoint.sh"]
docker-entrypoint.sh
. . .
JOB_CLUSTER="job-cluster"
TASK_MANAGER="task-manager"
# The first argument selects the mode; the remaining arguments are passed on to Flink
CMD="$1"
shift;
if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then
    if [ "${CMD}" == "${TASK_MANAGER}" ]; then
        exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
    else
        exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
    fi
fi
# Any other command is executed as given
exec "$@"
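To see how the container arguments select a process, the dispatch logic above can be exercised as a dry run. This is only an illustrative sketch, not the script from the example repository: `resolve_command` is a hypothetical helper that prints the command line the entrypoint would `exec`, and `/opt/flink` is an assumed value for `FLINK_HOME`.

```shell
#!/bin/sh
# Dry-run version of the entrypoint dispatch: prints the command instead of exec-ing it.
# FLINK_HOME defaults to /opt/flink here purely for illustration (an assumption).
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# resolve_command is a hypothetical helper, not part of the example repository.
resolve_command() {
    case "$1" in
        task-manager)
            shift
            echo "$FLINK_HOME/bin/taskmanager.sh start-foreground $*" ;;
        job-cluster)
            shift
            echo "$FLINK_HOME/bin/standalone-job.sh start-foreground $*" ;;
        *)
            # Anything else is passed through unchanged, like the final exec "$@".
            echo "$*" ;;
    esac
}

resolve_command task-manager -Djobmanager.rpc.address=flink-job-cluster
resolve_command job-cluster -Djobmanager.rpc.address=flink-job-cluster
```

The same image therefore serves as master or worker; only the first container argument differs.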
2: K8s manifests
task-manager-deployment.yaml.template
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: flink-task-manager
spec:
  replicas: ${FLINK_NUM_OF_TASKMANAGERS}
  template:
    metadata:
      labels:
        app: flink
        component: task-manager
    spec:
      containers:
      - name: flink-task-manager
        image: ${FLINK_IMAGE_NAME}
        imagePullPolicy: Never
        args: ["task-manager",
               "-Djobmanager.rpc.address=flink-job-cluster"]
job-cluster-job.yaml.template
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-job-cluster
spec:
  template:
    metadata:
      labels:
        app: flink
        component: job-cluster
    spec:
      restartPolicy: OnFailure
      containers:
      - name: flink-job-cluster
        image: ${FLINK_IMAGE_NAME}
        imagePullPolicy: Never
        args: ["job-cluster",
               "-Djobmanager.rpc.address=flink-job-cluster",
               "-Dblob.server.port=6124",
               "-Dqueryable-state.server.ports=6125"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob
        - containerPort: 6125
          name: query
        - containerPort: 8081
          name: ui
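Both manifests are templates: `${FLINK_IMAGE_NAME}` and `${FLINK_NUM_OF_TASKMANAGERS}` have to be filled in before Kubernetes can use them. A minimal sketch of that substitution with `sed` follows; `render_template` is a hypothetical helper and `flink-job:latest` is a made-up image name. A tool such as `envsubst` from GNU gettext would likely serve the same purpose.

```shell
#!/bin/sh
# Replace the two placeholders used by the templates with concrete values.
# Reads a template on stdin and writes the rendered manifest to stdout.
# Arguments: image name, number of TaskManagers.
# Limitation of this sketch: the values must not contain '|' or '&'.
render_template() {
    sed -e 's|[$]{FLINK_IMAGE_NAME}|'"$1"'|g' \
        -e 's|[$]{FLINK_NUM_OF_TASKMANAGERS}|'"$2"'|g'
}

# Two sample template lines, rendered with illustrative values.
printf '%s\n' \
    'replicas: ${FLINK_NUM_OF_TASKMANAGERS}' \
    'image: ${FLINK_IMAGE_NAME}' \
    | render_template flink-job:latest 2
```

The rendered output could then be piped to `kubectl apply -f -`, which reads a manifest from standard input.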
3: Expose job cluster as a service
job-cluster-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-job-cluster
  labels:
    app: flink
    component: job-cluster
spec:
  ports:
  - name: rpc
    port: 6123
  - name: blob
    port: 6124
  - name: query
    port: 6125         # internal port
    nodePort: 30025    # external port
  - name: ui
    port: 8081         # internal port
    nodePort: 30081    # external port
  type: NodePort
  selector:
    app: flink
    component: job-cluster
4: Set up minio for checkpoints & savepoints
• S3-compatible storage service
• Apache License v2.0
• Lightweight, easy to set up
minio-standalone-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
minio-standalone-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: minio
spec:
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: minio
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: minio-pv-claim
      containers:
      - name: minio
        volumeMounts:
        - name: data
          mountPath: "/data"
        image: minio/minio:RELEASE.2019-03-13T21-59-47Z
        args:
        - server
        - /data
        env:
        - name: MINIO_ACCESS_KEY
          value: "minio"
        - name: MINIO_SECRET_KEY
          value: "minio123"
        ports:
        - containerPort: 9000
        livenessProbe:
          httpGet:
            path: /minio/health/live
            port: 9000
          initialDelaySeconds: 120
          periodSeconds: 20
minio-standalone-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  type: NodePort
  ports:
  - port: 9000
    nodePort: 30090
  selector:
    app: minio
flink-conf.yaml
s3.path-style-access: true
s3.endpoint: http://minio-service:9000
minio setup job
/bin/sh -c "
sleep 10;
/usr/bin/mc config host add myminio http://minio-service:9000 minio minio123;
/usr/bin/mc mb myminio/state;
exit 0;
"
flink-conf.yaml
state.checkpoints.dir: s3://state/checkpoints
state.savepoints.dir: s3://state/savepoints
s3.access-key: minio
s3.secret-key: minio123
A Note on Bucket Addresses
• Two ways to specify buckets:
– virtual-hosted style: state.minio-service:9000
– path-style: minio-service:9000/state
• Path-style addresses are easier to get working. Either
– set s3.path-style-access: true (requires Flink 1.8+), or
– specify the endpoint by its IP address rather than its hostname
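The difference between the two styles is only where the bucket name goes. In the virtual-hosted form the bucket becomes part of the hostname, so that hostname has to resolve, which is presumably why path-style is the easier option inside a cluster where only `minio-service` is a known service name. The sketch below builds both forms; the two function names are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Build both address styles for a bucket on an S3-compatible endpoint.
# Arguments: endpoint host[:port], bucket name.
# Both function names are hypothetical helpers for this illustration.
virtual_hosted_address() {
    # The bucket is prepended to the hostname.
    echo "$2.$1"
}

path_style_address() {
    # The bucket is the first path segment.
    echo "$1/$2"
}

virtual_hosted_address minio-service:9000 state   # prints state.minio-service:9000
path_style_address minio-service:9000 state       # prints minio-service:9000/state
```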
Rescaling
$ kubectl scale deployment -l component=task-manager --replicas=2
deployment.extensions "flink-task-manager" scaled
$ flink modify 00000000000000000000000000000000 -p 8 -m localhost:30081
Modify job 00000000000000000000000000000000.
Rescaled job 00000000000000000000000000000000. Its new parallelism is 8.
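The numbers in this rescaling example fit together if each TaskManager offers 4 task slots, which is the `taskmanager.numberOfTaskSlots` value in the configuration logged in the debugging section: 2 TaskManagers with 4 slots each give 8 slots, enough for a parallelism of 8 under default slot sharing. The arithmetic, as a small sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Total task slots = number of TaskManagers x slots per TaskManager.
# available_slots is a hypothetical helper for illustration.
# Arguments: number of TaskManagers, slots per TaskManager.
available_slots() {
    echo $(( $1 * $2 ))
}

available_slots 2 4   # prints 8
```

Asking for a higher parallelism than the available slots would leave the job without enough resources, so the Deployment is scaled first and the job second.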
3. DEBUGGING
docker-entrypoint.sh
. . .
JOB_CLUSTER="job-cluster"
TASK_MANAGER="task-manager"
if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then
    echo "Starting the ${CMD}"
    # Print the effective configuration: skip comment lines and empty lines
    echo "config file: " && grep '^[^#]' $FLINK_HOME/conf/flink-conf.yaml
    if [ "${CMD}" == "${TASK_MANAGER}" ]; then
        exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
    else
        exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
    fi
fi
exec "$@"
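The `grep` in the entrypoint is meant to print only the effective settings, skipping comments and blank lines. A small sketch of such a filter, run against made-up sample lines rather than the real flink-conf.yaml: the pattern `^[^#]` keeps every line whose first character is not `#`, which also drops empty lines because they have no first character. `effective_config` is a hypothetical helper name.

```shell
#!/bin/sh
# Print only the non-comment, non-empty lines read from stdin.
# effective_config is a hypothetical helper for illustration.
effective_config() {
    grep '^[^#]'
}

# Made-up sample content, for illustration only.
printf '%s\n' \
    '# JobManager settings' \
    'jobmanager.rpc.port: 6123' \
    '' \
    'rest.port: 8081' \
    | effective_config
```

Run as written, this prints the two settings and nothing else.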
logs
Starting the job-cluster
config file:
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 4
parallelism.default: 1
high-availability: zookeeper
high-availability.jobmanager.port: 6123
high-availability.storageDir: s3://highavailability/storage
high-availability.zookeeper.quorum: zoo1:2181
state.backend: filesystem
state.checkpoints.dir: s3://state/checkpoints
state.savepoints.dir: s3://state/savepoints
rest.port: 8081
zookeeper.sasl.disable: true
s3.access-key: minio
s3.secret-key: minio123
s3.path-style-access: true
s3.endpoint: http://minio-service:9000
4. FUTURE PLANS
Tighter Integration with K8s
• Active mode
– Flink is aware of the cluster manager it is running on and interacts with it
– Examples exist, e.g., FLIP-6 YARN
• Reactive mode
– Flink is oblivious to its environment
– Flink may react to resource changes by rescaling the job
Active k8s Integration
[Diagram: Client, K8s deployment controller, ApplicationMaster, K8sResourceManager, JobManager, TaskManagers]
(1) Submit AM deployment
(2) Start AM pod
(3) Submit job
(4) Start JM
(5) Request slots
(6) Submit TM deployment
(7) Start TM pod
(8) Register
(9) Request slots
(10) Offer slots
FLINK-9953: Active Kubernetes integration
The ResourceManager can talk to Kubernetes to launch new pods
Reactive Container Mode
• Relies on an external system to start/release TaskManagers, e.g.,
– Kubernetes Horizontal Pod Autoscaler
– GCP Autoscaling
– AWS Auto Scaling Group
• Re-scales the job as resources are added/removed (takes a savepoint and resumes the job with the new parallelism automatically)
• By definition works with all cluster managers
[Diagram: an ASG monitors metrics of the Flink cluster (JM, TM, TM), e.g., CPU%, and starts a new TM if CPU% > threshold; the new TM registers and offers slots. Chart: event rate over time.]
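In reactive mode the scaling decision is made outside Flink. As a rough illustration of what such an external system computes, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that rule with integer arithmetic and ignores tolerances, stabilization windows, and min/max bounds; `desired_replicas` is a hypothetical helper and the numbers are made up.

```shell
#!/bin/sh
# desired = ceil(current_replicas * current_metric / target_metric)
# Arguments: current replicas, current CPU%, target CPU% (all positive integers).
# desired_replicas is a hypothetical helper; the integer ceiling is (a + b - 1) / b.
desired_replicas() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

desired_replicas 2 90 60   # prints 3
```

With 2 TaskManagers at 90% CPU and a 60% target, one more TaskManager would be started, and Flink would then rescale the job onto the new slots.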
FLINK-10407: Reactive container mode
Re-scale job as resources are added/removed
Summary
• Flink currently supports job and session clusters on K8s
• Example
– https://github.com/alpinegizmo/flink-containers-example
• Active K8s integration is in progress
• Reactive container mode has been designed/planned
• Call to action:
– Umbrella tickets: FLINK-9953, FLINK-10407
– Join discussions on dev@flink.apache.org
Thank you!
Questions?
www.ververica.com | @VervericaData | david@ververica.com

Deploying Flink on Kubernetes - David Anderson

  • 1. © 2019 Ververica David Anderson | @alpinegizmo | Training Coordinator Getting Started with Apache Flink® on Kubernetes
  • 2. About Ververica: Creators of Apache Flink®. Real Time Stream Processing for the Enterprise.
  • 3. Outline
      1. Introduction
      2. Detailed Example
      3. Debugging Tips
      4. Future Plans
  • 4. Why Containers?
      • Containers provide isolation at low cost
        – Require fewer resources than VMs
        – Smaller, boot faster
      • Simpler to manage
        – Each container does one thing
        – Consistent packaging
      • Enables flexible and dynamic resource allocation
        – Scalable
        – Composable
  • 5. Container Orchestration with Kubernetes
      • Declarative configuration:
        – You tell K8s the desired state, and a background process makes it happen
          • 3 replicas of this container should be kept running
          • A load balancer should exist, listening on port 443, backed by container with this label
      • Core resource types:
        – Pod: a group of one or more containers
        – Job: keeps pod(s) running until finished
        – Deployment: keeps n pods running indefinitely
        – Service: a REST object backed by a set of pods
        – Persistent Volume Claim: storage whose lifetime is not coupled to any of the pods
  • 6. Vision: Flink as a Library
      • Makes deployments simpler
        – Focus is on deploying/running an application
        – You build one, complete job-specific Docker image that includes:
          • Your application code
          • Flink libraries
          • Other dependencies
          • Configuration files
  • 7. Flink’s Runtime Building Blocks
      ResourceManager
        • Cluster framework-specific
        • Manages available TaskManagers
        • Acquires / releases resources
      TaskManager
        • Registers with ResourceManager
        • Provides “task slots”
        • Assigned tasks by one or more JobManagers
      JobManager
        • One per job
        • Schedules job in terms of "task slots"
        • Monitors task execution
        • Coordinates checkpointing
      Dispatcher
        • Touch-point for job submissions
        • Spawns JobManagers
  • 8. Flink’s Runtime Building Blocks
      ResourceManager
        • Cluster framework-specific
        • Manages available TaskManagers
        • Acquires / releases resources
      TaskManager
        • Registers with ResourceManager
        • Provides “task slots”
        • Assigned tasks by JobManager(s)
      JobManager
        • One per job
        • Schedules job in terms of "task slots"
        • Monitors task execution
        • Coordinates checkpointing
      Dispatcher
        • Touch-point for job submissions
        • Spawns JobManagers
  • 9. Runtime Building Blocks (on Yarn)
      Components: App/Client, Dispatcher, JobManager, ResourceManager, TaskManager
      (1) Submit Job
      (2) Start JobManager
      (3) Request slots
      (4) Start TaskManager
      (5) Register
      (6) Offer slots
      (7) Deploy Tasks
  • 10. But we’re not quite there yet with K8s
  • 11. Flink on K8s: current status
      • Still using the legacy standalone resource manager
      • Deployment establishes a static execution environment
      • You will have a k8s manifest that effectively says
        – there should be n taskmanagers that look like this
      Flink is not aware of Kubernetes
  • 12. Flink job cluster on K8s
      Master Container: ResourceManager, JobManager, Mini Dispatcher
      Worker Containers (three shown): TaskManager
      (0) One image is built that can be either a Master or Worker
      (1) Container framework starts Master & Worker Containers
      (2) Run & Start
      (3) Register
      (4) Deploy Tasks
  • 13. 2. EXAMPLE
      https://github.com/alpinegizmo/flink-containers-example
  • 14. Very Simple Streaming Job
      https://github.com/alpinegizmo/flink-containers-example
      data generator → keyBy → RichFlatMap (# events per user) → print
  • 15.
  • 16. Desired Runtime Landscape for K8s
  • 17. Steps
      1. Build the docker image
      2. Set up job cluster (k8s job) & task managers (k8s deployment)
      3. Set up job cluster service
      4. Add minio for checkpoints
  • 18. 1: Build a docker image
      Dockerfile:
        ADD $flink_dist $FLINK_INSTALL_PATH
        ADD $job_jar $FLINK_INSTALL_PATH/job.jar
        . . .
        COPY docker/flink/flink-conf.yaml $FLINK_HOME/conf
        COPY docker/flink/log4j-console.properties $FLINK_HOME/conf
        COPY docker/flink/docker-entrypoint.sh /
        . . .
        ENTRYPOINT ["/docker-entrypoint.sh"]
  • 19. docker-entrypoint.sh:
        . . .
        JOB_CLUSTER="job-cluster"
        TASK_MANAGER="task-manager"
        CMD="$1"
        shift;
        if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then
            if [ "${CMD}" == "${TASK_MANAGER}" ]; then
                exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
            else
                exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
            fi
        fi
        exec "$@"
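The entrypoint maps its first argument to one of two Flink start scripts and passes everything else through. A minimal dry-run sketch of that dispatch, which prints the command instead of exec'ing it, could look like this. The function name `resolve_command` and the `/opt/flink` default for `FLINK_HOME` are assumptions made for illustration; they are not taken from the slides.

```shell
#!/bin/sh
# Dry-run sketch of the dispatch in docker-entrypoint.sh.
# Assumption: FLINK_HOME defaults to /opt/flink when it is not set.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# resolve_command is a hypothetical helper: it prints the command line
# that the entrypoint would exec for the given arguments.
resolve_command() {
  cmd="$1"
  shift
  case "$cmd" in
    task-manager)
      echo "$FLINK_HOME/bin/taskmanager.sh start-foreground $*" ;;
    job-cluster)
      echo "$FLINK_HOME/bin/standalone-job.sh start-foreground $*" ;;
    *)
      # Anything else is run as given, like the final exec "$@".
      set -- "$cmd" "$@"
      echo "$*" ;;
  esac
}

resolve_command task-manager -Djobmanager.rpc.address=flink-job-cluster
resolve_command job-cluster -Djobmanager.rpc.address=flink-job-cluster
```

Because the same image serves both roles, the only difference between a master and a worker container is the first argument in the manifest's `args` list.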
  • 20. 2: K8s manifests
      task-manager-deployment.yaml.template:
        apiVersion: extensions/v1beta1
        kind: Deployment
        metadata:
          name: flink-task-manager
        spec:
          replicas: ${FLINK_NUM_OF_TASKMANAGERS}
          template:
            metadata:
              labels:
                app: flink
                component: task-manager
            spec:
              containers:
              - name: flink-task-manager
                image: ${FLINK_IMAGE_NAME}
                imagePullPolicy: Never
                args: ["task-manager", "-Djobmanager.rpc.address=flink-job-cluster"]
      job-cluster-job.yaml.template:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: flink-job-cluster
        spec:
          template:
            metadata:
              labels:
                app: flink
                component: job-cluster
            spec:
              restartPolicy: OnFailure
              containers:
              - name: flink-job-cluster
                image: ${FLINK_IMAGE_NAME}
                imagePullPolicy: Never
                args: ["job-cluster", "-Djobmanager.rpc.address=flink-job-cluster", "-Dblob.server.port=6124", "-Dqueryable-state.server.ports=6125"]
                ports:
                - containerPort: 6123
                  name: rpc
                - containerPort: 6124
                  name: blob
                - containerPort: 6125
                  name: query
                - containerPort: 8081
                  name: ui
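The Deployment manifests on these slides use `extensions/v1beta1`, which Kubernetes stopped serving for Deployments in 1.16. On a newer cluster the task manager Deployment would need `apps/v1`, which also requires an explicit `spec.selector`. The sketch below prints such a manifest from a shell function; the function name and the example image name `flink-job:latest` are hypothetical, and the original repository may render its templates differently.

```shell
#!/bin/sh
# Hypothetical helper: prints a task manager Deployment for the apps/v1 API.
# Arguments: image name, number of task manager replicas.
render_task_manager_deployment() {
  image="$1"
  replicas="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-task-manager
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: flink
      component: task-manager
  template:
    metadata:
      labels:
        app: flink
        component: task-manager
    spec:
      containers:
      - name: flink-task-manager
        image: ${image}
        imagePullPolicy: Never
        args: ["task-manager", "-Djobmanager.rpc.address=flink-job-cluster"]
EOF
}

# Example image name; an assumption, not taken from the slides.
render_task_manager_deployment flink-job:latest 2
```

The selector labels have to match the pod template labels, otherwise the API server rejects the Deployment.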
  • 23.
  • 24. 3: Expose job cluster as a service
      job-cluster-service.yaml:
        apiVersion: v1
        kind: Service
        metadata:
          name: flink-job-cluster
          labels:
            app: flink
            component: job-cluster
        spec:
          ports:
          - name: rpc
            port: 6123
          - name: blob
            port: 6124
          - name: query
            port: 6125
            nodePort: 30025
          - name: ui
            port: 8081
            nodePort: 30081
          type: NodePort
          selector:
            app: flink
            component: job-cluster
      Slide annotations: the port values are the internal ports; the nodePort values are the external ports
  • 25.
  • 26. 4: Set up minio for checkpoints & savepoints
      • S3-compatible storage service
      • Apache License v2.0
      • Lightweight, easy to set up
  • 27. minio-standalone-pvc.yaml:
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: minio-pv-claim
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
  • 28. minio-standalone-deployment.yaml:
        apiVersion: extensions/v1beta1
        kind: Deployment
        metadata:
          name: minio
        spec:
          strategy:
            type: Recreate
          template:
            metadata:
              labels:
                app: minio
            spec:
              volumes:
              - name: data
                persistentVolumeClaim:
                  claimName: minio-pv-claim
              containers:
              - name: minio
                volumeMounts:
                - name: data
                  mountPath: "/data"
                image: minio/minio:RELEASE.2019-03-13T21-59-47Z
                args:
                - server
                - /data
                env:
                - name: MINIO_ACCESS_KEY
                  value: "minio"
                - name: MINIO_SECRET_KEY
                  value: "minio123"
                ports:
                - containerPort: 9000
                livenessProbe:
                  httpGet:
                    path: /minio/health/live
                    port: 9000
                  initialDelaySeconds: 120
                  periodSeconds: 20
  • 29. minio-standalone-service.yaml:
        apiVersion: v1
        kind: Service
        metadata:
          name: minio-service
        spec:
          type: NodePort
          ports:
          - port: 9000
            nodePort: 30090
          selector:
            app: minio
      flink-conf.yaml:
        s3.path-style-access: true
        s3.endpoint: http://minio-service:9000
  • 30. minio setup job:
        /bin/sh -c "
          sleep 10;
          /usr/bin/mc config host add myminio http://minio-service:9000 minio minio123;
          /usr/bin/mc mb myminio/state;
          exit 0;
        "
      flink-conf.yaml:
        state.checkpoints.dir: s3://state/checkpoints
        state.savepoints.dir: s3://state/savepoints
        s3.access-key: minio
        s3.secret-key: minio123
  • 31. A Note on Bucket Addresses
      • Two ways to specify buckets:
        – virtual-hosted style: state.minio-service:9000
        – path-style: minio-service:9000/state
      • It’s easier to get path-style addresses working, either by
        – setting s3.path-style-access: true (requires Flink 1.8+), or by
        – specifying the endpoint with its IP address rather than its hostname
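The two address styles differ only in where the bucket name goes. A small sketch, using the endpoint and bucket from the slides (the function names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helpers. Arguments: endpoint as host:port, bucket name.

# Virtual-hosted style puts the bucket in front of the host name.
virtual_hosted_address() {
  echo "$2.$1"
}

# Path-style keeps the host unchanged and appends the bucket as a path.
path_style_address() {
  echo "$1/$2"
}

virtual_hosted_address minio-service:9000 state
path_style_address minio-service:9000 state
```

With the virtual-hosted form the client has to resolve `state.minio-service` as a hostname of its own, which is presumably why the path-style form is the easier one to get working inside a cluster.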
  • 32.
  • 33.
  • 34. Rescaling
        $ kubectl scale deployment -l component=task-manager --replicas=2
        deployment.extensions "flink-task-manager" scaled
        $ flink modify 00000000000000000000000000000000 -p 8 -m localhost:30081
        Modify job 00000000000000000000000000000000.
        Rescaled job 00000000000000000000000000000000. Its new parallelism is 8.
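In this static setup the job can only be rescaled to a parallelism for which enough slots are registered. Assuming each task manager offers a fixed number of slots and the job needs one slot per parallel instance (the default with slot sharing), the number of task manager replicas can be estimated as below. The function name is hypothetical; the 4 slots per task manager match the `taskmanager.numberOfTaskSlots: 4` setting shown in the logged configuration.

```shell
#!/bin/sh
# Hypothetical helper: smallest number of task managers that provides
# at least as many slots as the requested parallelism (rounds up).
# Arguments: target parallelism, slots per task manager.
required_task_managers() {
  parallelism="$1"
  slots_per_tm="$2"
  echo $(( (parallelism + slots_per_tm - 1) / slots_per_tm ))
}

# Parallelism 8 with 4 slots per task manager.
required_task_managers 8 4
```

For parallelism 8 and 4 slots per task manager this gives 2, which is the replica count used in the `kubectl scale` command on the slide.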
  • 35. 3. DEBUGGING
  • 36. docker-entrypoint.sh:
        . . .
        JOB_CLUSTER="job-cluster"
        TASK_MANAGER="task-manager"
        if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then
            echo "Starting the ${CMD}"
            echo "config file: " && grep '^[^n#]' $FLINK_HOME/conf/flink-conf.yaml
            if [ "${CMD}" == "${TASK_MANAGER}" ]; then
                exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
            else
                exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
            fi
        fi
        exec "$@"
  • 37. logs:
        Starting the job-cluster
        config file:
        jobmanager.rpc.address: localhost
        jobmanager.rpc.port: 6123
        jobmanager.heap.size: 1024m
        taskmanager.heap.size: 1024m
        taskmanager.numberOfTaskSlots: 4
        parallelism.default: 1
        high-availability: zookeeper
        high-availability.jobmanager.port: 6123
        high-availability.storageDir: s3://highavailability/storage
        high-availability.zookeeper.quorum: zoo1:2181
        state.backend: filesystem
        state.checkpoints.dir: s3://state/checkpoints
        state.savepoints.dir: s3://state/savepoints
        rest.port: 8081
        zookeeper.sasl.disable: true
        s3.access-key: minio
        s3.secret-key: minio123
        s3.path-style-access: true
        s3.endpoint: http://minio-service:9000
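As transcribed, the pattern `'^[^n#]'` in the debugging entrypoint skips comment lines and blank lines, but it also skips any line that begins with `n`. A stricter filter that drops only blank and comment lines might look like the following sketch; `print_effective_config` and the `note.example` key are hypothetical names used to show the difference.

```shell
#!/bin/sh
# Hypothetical helper: prints the lines of a config file that are
# neither blank nor comments (including indented comments).
print_effective_config() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Small demonstration with a temporary file.
conf="$(mktemp)"
printf '%s\n' \
  '# a comment' \
  '' \
  'rest.port: 8081' \
  'note.example: kept' > "$conf"   # note.example is a made-up key
print_effective_config "$conf"
rm -f "$conf"
```

Printing the effective configuration at startup makes it easy to check from the pod logs which settings the container actually picked up.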
  • 38.
  • 39. 4. FUTURE PLANS
  • 40. Tighter Integration with K8s
      • Active mode
        – Flink is aware of the cluster manager that it is running on, and interacts with it
        – Examples exist, e.g., FLIP-6 YARN
      • Reactive mode
        – Flink is oblivious to its environment
        – Flink may react to resource changes by scaling the job
  • 41. Active k8s Integration
      Components: Client, K8s deployment controller, ApplicationMaster, K8sResourceManager, JobManager, TaskManager (two shown)
      (1) Submit AM deployment
      (2) Start AM pod
      (3) Submit job
      (4) Start JM
      (5) Request slots
      (6) Submit TM deployment
      (7) Start TM pod
      (8) Register
      (9) Request slots
      (10) Offer slots
  • 42. FLINK-9953: Active Kubernetes integration
      The ResourceManager can talk to Kubernetes to launch new pods
  • 43. Reactive Container Mode
      • Relies on external system to start/release TaskManagers, e.g.,
        – Kubernetes Horizontal Pod Autoscaler
        – GCP Autoscaling
        – AWS Auto Scaling Group
      • Re-scale job as resources are added/removed (take savepoint and resume job with new parallelism automatically)
      • By definition works with all cluster managers
      Diagram: a Flink cluster (JM, TM, TM) and an ASG. The ASG monitors metrics, e.g., CPU%, and starts a new TM if CPU% > threshold; the new TM registers and offers slots. Chart: event rate over time.
  • 44. FLINK-10407: Reactive container mode
      Re-scale job as resources are added/removed
  • 45. Summary
      • Flink currently supports job and session clusters on K8s
      • Example
        – https://github.com/alpinegizmo/flink-containers-example
      • Active K8s integration is in progress
      • Reactive container mode has been designed/planned
      • Call to action:
        – Umbrella tickets: FLINK-9953, FLINK-10407
        – Join discussions on dev@flink.apache.org
  • 46. Thank you!
  • 47. Questions?
  • 48. www.ververica.com | @VervericaData | david@ververica.com
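The reactive mode diagram describes a simple control loop: an external autoscaler adds a task manager when a metric crosses a threshold, and the job is rescaled to use the new slots. A rough sketch of that decision, assuming one added task manager per evaluation and a parallelism equal to the total number of slots (both simplifying assumptions, with hypothetical function names):

```shell
#!/bin/sh
# Hypothetical helper: the autoscaler's decision.
# Arguments: current task manager count, CPU percentage, threshold.
desired_task_managers() {
  current="$1"
  cpu_percent="$2"
  threshold="$3"
  if [ "$cpu_percent" -gt "$threshold" ]; then
    echo $(( current + 1 ))   # start one more task manager
  else
    echo "$current"           # leave the cluster as it is
  fi
}

# Hypothetical helper: parallelism after the job is rescaled to use all slots.
# Arguments: task manager count, slots per task manager.
rescaled_parallelism() {
  echo $(( $1 * $2 ))
}

tms="$(desired_task_managers 2 85 80)"
rescaled_parallelism "$tms" 4
```

In this picture Flink itself never asks for resources; it only reacts to task managers that register or disappear, which is why the slide says the mode works with any cluster manager.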