How to build a tool for
operating Flink on Kubernetes
Andrea Medeghini
Software Engineer / Contractor
Which products are available?
Ververica dA Platform:
● Automated deployments
● Easy management of jobs
● Monitoring
● Logging (ELK)
● Trial version available
Are there free alternatives?
There are a few projects on GitHub, however…
● They mainly focus on deployment
● They do not provide a complete solution
● They might not work for all use cases
Can we use Helm?
It’s good, but it doesn’t help with…
● Jobs and Savepoints
● Monitoring / Alerting
● Automatic Scaling
# helm install --name my-flink-cluster charts/flink
Wait for next Flink release?
Better integration with Kubernetes is coming:
● Reactive container mode
https://issues.apache.org/jira/browse/FLINK-10407
● Active Kubernetes integration
https://issues.apache.org/jira/browse/FLINK-9953
Shall we build our own tool?
It’s going to be challenging! Because…
● Flink is a distributed engine
● Flink is a stateful engine
● Jobs need to be packaged and uploaded
● Jobs need to be monitored to detect failures
● Resources need to be adjusted according to workload
Do we need an open source tool?
Everybody likes open source tools…
… how do we build one?
Overview of a Flink Cluster
● One or more JobManagers
(typically one)
● One or more TaskManagers
(typically many)
● One or more jobs packaged as
JAR files
● Storage for savepoints
Exploiting Kubernetes API
It’s all REST!
There are client libraries…
… for many languages, not only Go!
See Kubernetes Documentation:
https://kubernetes.io/docs/reference/
Control Kubernetes Programmatically
val jobmanagerStatefulSet = V1StatefulSet()
    .metadata(jobmanagerMetadata)
    .spec(
        V1StatefulSetSpec()
            .replicas(1)
            .template(
                V1PodTemplateSpec().spec(jobmanagerPodSpec).metadata(jobmanagerMetadata)
            )
            .updateStrategy(updateStrategy)
            .serviceName("jobmanager")
            .selector(jobmanagerSelector)
            .addVolumeClaimTemplatesItem(persistentVolumeClaim)
    )
api.createNamespacedStatefulSet(namespace, jobmanagerStatefulSet, null, null, null)
What resources do we need?
● StatefulSet for JobManager (1 replica)
● StatefulSet for TaskManager (N replicas)
● Services for JobManager (headless, NodePort, ...)
● PersistentVolumeClaims
● …
What configuration do we need?
● Set JOB_MANAGER_RPC_ADDRESS to JobManager service
● Set TASK_MANAGER_NUMBER_OF_TASK_SLOTS to 1
● Set memory limits of container higher than max heap
● Set CPU limits to sensible value
● Configure pod affinity to spread workload
● Expose relevant ports (usually only internally)
● Add sensible labels to identify resources
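The first three settings above can be sketched as plain functions that derive the container environment and memory limit. This is only an illustration, not the toolbox's actual code: `TaskManagerSettings`, `taskManagerEnv`, `containerMemoryLimitMb` and the 25% headroom are assumptions; only the two environment variable names come from the slide.

```kotlin
// Hypothetical settings holder; the field names are illustrative only.
data class TaskManagerSettings(
    val jobManagerService: String, // DNS name of the JobManager Service
    val taskSlots: Int = 1,        // one slot per TaskManager, as suggested above
    val maxHeapMb: Int             // JVM max heap of the TaskManager process
)

// Environment variables named on the slide.
fun taskManagerEnv(settings: TaskManagerSettings): Map<String, String> = mapOf(
    "JOB_MANAGER_RPC_ADDRESS" to settings.jobManagerService,
    "TASK_MANAGER_NUMBER_OF_TASK_SLOTS" to settings.taskSlots.toString()
)

// The container memory limit must be higher than the max heap.
// The default 25% headroom is an assumed value, not a recommendation from the talk.
fun containerMemoryLimitMb(settings: TaskManagerSettings, headroom: Double = 0.25): Int =
    (settings.maxHeapMb * (1.0 + headroom)).toInt()

fun main() {
    // "flink-jobmanager-test" is a made-up Service name.
    val settings = TaskManagerSettings(jobManagerService = "flink-jobmanager-test", maxHeapMb = 1024)
    println(taskManagerEnv(settings))
    println("memory limit: ${containerMemoryLimitMb(settings)}Mi")
}
```

Keeping the derivation in pure functions makes it easy to unit test before the values are copied into a pod specification.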
Run exec against the Job Manager
How does it work?
● Kubernetes Client for
managing clusters
● Exec for executing
commands in the
containers
Easy to implement but...
● It depends on commands installed in the container
● It seems too expensive in terms of resources (we need to
run a process inside the container for each operation)
● It doesn’t enforce any protocol (stdin/stdout)
Flink Monitoring API to the rescue!
Flink has a pretty useful REST API:
● Endpoints for managing jobs
● Endpoints for managing savepoints
● Endpoints for monitoring the cluster
Is there a client library? I am afraid not…
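Even without a client library, the REST API can be called with any HTTP client. The sketch below uses the JDK's `java.net.http` client to query the `/jobs/overview` endpoint. The function names and the `localhost:8081` address are assumptions, and the response is returned as raw JSON rather than parsed.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Builds a GET request for the Flink jobs overview endpoint.
fun jobsOverviewRequest(baseUrl: String): HttpRequest =
    HttpRequest.newBuilder(URI.create("${baseUrl.trimEnd('/')}/jobs/overview"))
        .header("Accept", "application/json")
        .GET()
        .build()

// Sends the request and returns the raw JSON body; parsing is left out of this sketch.
fun fetchJobsOverview(client: HttpClient, baseUrl: String): String =
    client.send(jobsOverviewRequest(baseUrl), HttpResponse.BodyHandlers.ofString()).body()

fun main() {
    // Assumed address: a JobManager REST port reachable on localhost:8081.
    val request = jobsOverviewRequest("http://localhost:8081/")
    println("${request.method()} ${request.uri()}")
}
```

Hand-written calls like this become tedious as the number of endpoints grows, which is the motivation for generating a client instead.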
Create client using OpenAPI
I manually crafted an
OpenAPI specification file…
… It’s tedious but the
generated client works fine!
See Swagger Documentation:
https://swagger.io/docs/specification/about/
Swagger Editor and Code generator
/v1/jobs:
  get:
    operationId: getJobs
    summary: Returns an overview over all jobs and their current state
    responses:
      '200':
        description: |-
          200 response
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/JobIdsWithStatusOverview'
...
See full specification on GitHub:
https://github.com/nextbreakpoint/flink-client/blob/master/flink-openapi.yaml
Combine all in one application
We can combine the APIs:
● Kubernetes Client for
managing clusters
● Flink Client for
managing jobs
What are the limitations?
● Where does the client live?
● Still no monitoring or automatic scaling
● NodePort or Port Forward required for each
JobManager (for each Flink Cluster)
● Port Forward doesn’t work well with file
upload (there is a problem with timeout in
the Kubernetes Client for Java)
Run controller inside Kubernetes
What are the benefits?
● It can easily access
internal resources
● It runs with its own service
account
● It can monitor the clusters
● It can rescale the clusters
Better than before but...
● One port forward is still required
● Authorization is required for API
● It doesn’t follow best practices!
We need a Kubernetes Operator!
Everybody thinks we need Go, but…
… an Operator is like a pattern…
… and we can use any programming language!
Operator SDK for Go:
https://github.com/operator-framework/operator-sdk
Operator Pattern:
https://coreos.com/blog/introducing-operators.html
Custom Resource Definition
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: flinkclusters.beta.nextbreakpoint.com
spec:
  group: beta.nextbreakpoint.com
  versions:
    - name: v1
      served: true
      storage: true
  scope: Namespaced
  names:
    plural: flinkclusters
    singular: flinkcluster
    kind: FlinkCluster
    shortNames:
      - fc
/apis/beta.nextbreakpoint.com/v1/namespaces/*/flinkclusters
# kubectl create -f flink-crd.yaml
# kubectl get crd
NAME AGE
flinkclusters.beta.nextbreakpoint.com 1d
…
Custom Objects
apiVersion: "beta.nextbreakpoint.com/v1"
kind: FlinkCluster
metadata:
  name: test
spec:
  clusterName: test
  environment: test
  pullSecrets: regcred
  pullPolicy: Always
  flinkImage: nextbreakpoint/flink:1.7.2-1
  sidecarImage: flink-workshop-jobs:2
  sidecarServiceAccount: flink-operator
  sidecarClassName: com.nextbreakpoint.flink.jobs.TestJob
  sidecarJarPath: /com.nextbreakpoint.flinkworkshop-1.0.0.jar
  sidecarParallelism: 1
  sidecarArguments:
    - --BUCKET_BASE_PATH
    - file:///var/tmp
# kubectl create -f cluster.yaml
# kubectl get flinkclusters
NAME AGE
test 4s
The Operator Loop
1. Receive updates of Custom Objects
2. Receive updates of StatefulSets,
Services, PVCs, …
3. Compare desired state to actual
state
4. Adjust current state to match
desired state
5. Repeat from 1
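Steps 3 and 4 of the loop can be sketched as a pure function that compares desired and actual state and returns the actions needed to converge. This is a simplified illustration, not the operator's real code: `ClusterState`, `Action` and `reconcile` are hypothetical names, and a cluster is reduced to a name and a TaskManager replica count.

```kotlin
// Hypothetical, simplified view of the world: cluster name -> TaskManager replicas.
typealias ClusterState = Map<String, Int>

sealed class Action {
    data class Create(val name: String, val replicas: Int) : Action()
    data class Scale(val name: String, val replicas: Int) : Action()
    data class Delete(val name: String) : Action()
}

// Compares desired state to actual state and returns the actions that reconcile them.
fun reconcile(desired: ClusterState, actual: ClusterState): List<Action> {
    val actions = mutableListOf<Action>()
    for ((name, replicas) in desired) {
        val current = actual[name]
        when {
            current == null -> actions += Action.Create(name, replicas)  // missing cluster
            current != replicas -> actions += Action.Scale(name, replicas) // wrong size
        }
    }
    // Anything running that is no longer desired gets removed.
    for (name in actual.keys - desired.keys) {
        actions += Action.Delete(name)
    }
    return actions
}

fun main() {
    val desired = mapOf("test" to 2, "prod" to 4)
    val actual = mapOf("test" to 1, "old" to 3)
    reconcile(desired, actual).forEach(::println)
}
```

In a real operator the two maps would be fed by watches on the custom objects and on the StatefulSets, Services and PVCs, and each action would translate into Kubernetes API calls.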
Run a Flink Operator
What are the benefits?
● It follows Kubernetes
best practices
● It runs with its own
service account
● We only need to create
cluster objects
Operator meets Controller
They can operate together:
● Use operator with CD pipeline
● Use controller for manual ops
● Use controller for monitoring
● Use controller for alerting
● Use controller for scaling
Time for a demo!
A preview of Flink K8S Toolbox:
● Easy installation
● Easy deployments
● Jobs management
● Cluster metrics
● Cluster scaling
Monitoring and Scaling
We can use Flink API for:
● Watching jobs status and
alerting when something is
broken
● Observing cluster metrics
and scaling cluster when
required
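The two ideas above can be sketched as small decision functions. This is a hypothetical illustration, not the toolbox's strategy: `FAILED`, `FAILING` and `RESTARTING` are Flink job states, but alerting on exactly these three is an assumption, as are the function names and the sizing rule (enough TaskManagers to cover the job parallelism).

```kotlin
// Job states treated as unhealthy in this sketch (an assumed alerting policy).
val unhealthyStates = setOf("FAILED", "FAILING", "RESTARTING")

// Returns the ids of the jobs that should raise an alert.
// Input: job id -> job state, e.g. as read from the jobs overview endpoint.
fun jobsToAlertOn(jobStates: Map<String, String>): List<String> =
    jobStates.filterValues { it in unhealthyStates }.keys.toList()

// Returns how many TaskManagers are needed to provide `jobParallelism` slots
// (integer ceiling division).
fun requiredTaskManagers(jobParallelism: Int, slotsPerTaskManager: Int = 1): Int {
    require(slotsPerTaskManager > 0) { "slotsPerTaskManager must be positive" }
    return (jobParallelism + slotsPerTaskManager - 1) / slotsPerTaskManager
}

fun main() {
    println(jobsToAlertOn(mapOf("job-a" to "RUNNING", "job-b" to "FAILED")))
    println(requiredTaskManagers(jobParallelism = 5, slotsPerTaskManager = 2))
}
```

A controller could evaluate functions like these periodically and then notify someone or patch the TaskManager replica count.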
Checkpoints/Savepoints
We can use Flink API for:
● Monitoring checkpoints
● Managing savepoints
● Retrieving last savepoint
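As an example of managing savepoints, the sketch below builds the request that asks Flink to trigger a savepoint for a running job. It assumes the `POST /jobs/:jobid/savepoints` endpoint with a `target-directory` and `cancel-job` body, as described in the Flink REST API documentation; the function name, job id and address are made up for illustration.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Builds the request that triggers a savepoint for the given job.
fun triggerSavepointRequest(baseUrl: String, jobId: String, targetDirectory: String): HttpRequest {
    // Minimal JSON escaping for the directory string (backslashes and double quotes only).
    val escaped = targetDirectory.replace("\\", "\\\\").replace("\"", "\\\"")
    val body = """{"target-directory": "$escaped", "cancel-job": false}"""
    return HttpRequest.newBuilder(URI.create("${baseUrl.trimEnd('/')}/jobs/$jobId/savepoints"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
}

fun main() {
    // Hypothetical job id and JobManager address.
    val request = triggerSavepointRequest("http://localhost:8081", "abc123", "file:///var/tmp")
    println("${request.method()} ${request.uri()}")
}
```

The call is asynchronous: the response carries a request id that has to be polled until the savepoint completes, which is omitted here.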
Continuous Delivery
We can use tools like Flux:
● Push changes into Git repo
● Changes are automatically
applied to resources
Flux (I haven’t actually tried it)
https://github.com/weaveworks/flux
Nice features to have...
● Pluggable alerting strategy
● Pluggable scaling strategy
● Web console
● Secure access
● Support for HA mode
● …
It’s all free!
Flink Kubernetes Toolbox:
https://github.com/nextbreakpoint/flink-k8s-toolbox
Related projects:
https://github.com/nextbreakpoint/flink-client
https://github.com/nextbreakpoint/flink-workshop
https://github.com/nextbreakpoint/kubernetes-playground
The end.
Where to follow:
@AndreaMedeghini
nextbreakpoint.com

More Related Content

What's hot

Brief intro to K8s controller and operator
Brief intro to K8s controller and operator Brief intro to K8s controller and operator
Brief intro to K8s controller and operator Shang Xiang Fan
 
Ci with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgiumCi with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgiumChris Adkin
 
Eclipse 2011 Hot Topics
Eclipse 2011 Hot TopicsEclipse 2011 Hot Topics
Eclipse 2011 Hot TopicsLars Vogel
 
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 PipelineDelivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 PipelineSlawa Giterman
 
Import golang; struct microservice
Import golang; struct microserviceImport golang; struct microservice
Import golang; struct microserviceGiulio De Donato
 
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware AnalysisZongXian Shen
 
Jenkins, pipeline and docker
Jenkins, pipeline and docker Jenkins, pipeline and docker
Jenkins, pipeline and docker AgileDenver
 
Eclipse Che : ParisJUG
Eclipse Che : ParisJUGEclipse Che : ParisJUG
Eclipse Che : ParisJUGFlorent BENOIT
 
(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines(Declarative) Jenkins Pipelines
(Declarative) Jenkins PipelinesSteffen Gebert
 
Using Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and JenkinsUsing Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and JenkinsMicael Gallego
 
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins PipelinesAn Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins PipelinesSteffen Gebert
 
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDEAn Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDEKubeAcademy
 
Introduction to the Android NDK
Introduction to the Android NDKIntroduction to the Android NDK
Introduction to the Android NDKSebastian Mauer
 
Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2Michal Ziarnik
 
Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016
Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016
Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016Florent BENOIT
 
Extending Eclipse Che to build custom cloud IDEs
Extending Eclipse Che to build custom cloud IDEsExtending Eclipse Che to build custom cloud IDEs
Extending Eclipse Che to build custom cloud IDEsFlorent BENOIT
 
Code in the cloud with Eclipse Che and Docker
Code in the cloud with Eclipse Che and DockerCode in the cloud with Eclipse Che and Docker
Code in the cloud with Eclipse Che and DockerFlorent BENOIT
 
Flink on Kubernetes operator
Flink on Kubernetes operatorFlink on Kubernetes operator
Flink on Kubernetes operatorEui Heo
 
Pipeline based deployments on Jenkins
Pipeline based deployments  on JenkinsPipeline based deployments  on Jenkins
Pipeline based deployments on JenkinsKnoldus Inc.
 

What's hot (20)

Brief intro to K8s controller and operator
Brief intro to K8s controller and operator Brief intro to K8s controller and operator
Brief intro to K8s controller and operator
 
Ci with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgiumCi with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgium
 
Eclipse 2011 Hot Topics
Eclipse 2011 Hot TopicsEclipse 2011 Hot Topics
Eclipse 2011 Hot Topics
 
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 PipelineDelivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
 
Import golang; struct microservice
Import golang; struct microserviceImport golang; struct microservice
Import golang; struct microservice
 
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
 
Jenkins, pipeline and docker
Jenkins, pipeline and docker Jenkins, pipeline and docker
Jenkins, pipeline and docker
 
Eclipse Che : ParisJUG
Eclipse Che : ParisJUGEclipse Che : ParisJUG
Eclipse Che : ParisJUG
 
Ci for-android-apps
Ci for-android-appsCi for-android-apps
Ci for-android-apps
 
(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines
 
Using Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and JenkinsUsing Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and Jenkins
 
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins PipelinesAn Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
 
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDEAn Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
 
Introduction to the Android NDK
Introduction to the Android NDKIntroduction to the Android NDK
Introduction to the Android NDK
 
Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2
 
Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016
Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016
Eclipse Che: The Next-Gen Eclipse IDE - Bordeaux jug 2016
 
Extending Eclipse Che to build custom cloud IDEs
Extending Eclipse Che to build custom cloud IDEsExtending Eclipse Che to build custom cloud IDEs
Extending Eclipse Che to build custom cloud IDEs
 
Code in the cloud with Eclipse Che and Docker
Code in the cloud with Eclipse Che and DockerCode in the cloud with Eclipse Che and Docker
Code in the cloud with Eclipse Che and Docker
 
Flink on Kubernetes operator
Flink on Kubernetes operatorFlink on Kubernetes operator
Flink on Kubernetes operator
 
Pipeline based deployments on Jenkins
Pipeline based deployments  on JenkinsPipeline based deployments  on Jenkins
Pipeline based deployments on Jenkins
 

Similar to How to build a tool for operating Flink on Kubernetes

Odo improving the developer experience on OpenShift - hack & sangria
Odo   improving the developer experience on OpenShift - hack & sangriaOdo   improving the developer experience on OpenShift - hack & sangria
Odo improving the developer experience on OpenShift - hack & sangriaJorge Morales
 
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang WangVirtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang WangFlink Forward
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformBob Killen
 
Rejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform GainRejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform GainŁukasz Piątkowski
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftYaniv cohen
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'acorehard_by
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsAmbassador Labs
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesGabriel Carro
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward
 
Introducing Koki Short
Introducing Koki ShortIntroducing Koki Short
Introducing Koki ShortSidhartha Mani
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
CI/CD Across Multiple Environments
CI/CD Across Multiple EnvironmentsCI/CD Across Multiple Environments
CI/CD Across Multiple EnvironmentsKarl Isenberg
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetesWilliam Stewart
 
Kubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetupKubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetupMist.io
 
When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?Niklas Heidloff
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopBob Killen
 
The State of the Veil Framework
The State of the Veil FrameworkThe State of the Veil Framework
The State of the Veil FrameworkVeilFramework
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 

Similar to How to build a tool for operating Flink on Kubernetes (20)

Odo improving the developer experience on OpenShift - hack & sangria
Odo   improving the developer experience on OpenShift - hack & sangriaOdo   improving the developer experience on OpenShift - hack & sangria
Odo improving the developer experience on OpenShift - hack & sangria
 
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang WangVirtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research Platform
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Introduction to Tekton
Introduction to TektonIntroduction to Tekton
Introduction to Tekton
 
Rejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform GainRejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform Gain
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShift
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
 
Introducing Koki Short
Introducing Koki ShortIntroducing Koki Short
Introducing Koki Short
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
CI/CD Across Multiple Environments
CI/CD Across Multiple EnvironmentsCI/CD Across Multiple Environments
CI/CD Across Multiple Environments
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
 
Kubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetupKubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetup
 
When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes Workshop
 
The State of the Veil Framework
The State of the Veil FrameworkThe State of the Veil Framework
The State of the Veil Framework
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 

Recently uploaded

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 

Recently uploaded (20)

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
How to build a tool for operating Flink on Kubernetes

  • 1. How to build a tool for operating Flink on Kubernetes
    Andrea Medeghini
    Software Engineer / Contractor
  • 2. Which products are available?
    Ververica dA Platform:
    ● Automated deployments
    ● Easy management of jobs
    ● Monitoring
    ● Logging (ELK)
    ● Trial version available
  • 3. Are there free alternatives?
    There are a few projects on GitHub, however…
    ● They mainly focus on deployment
    ● They do not provide a complete solution
    ● They might not work for all use cases
  • 4. Can we use Helm?
    It’s good, but it doesn’t help with…
    ● Jobs and Savepoints
    ● Monitoring / Alerting
    ● Automatic Scaling
    # helm install --name my-flink-cluster charts/flink
  • 5. Wait for the next Flink release?
    Better integration with Kubernetes is coming:
    ● Reactive container mode
      https://issues.apache.org/jira/browse/FLINK-10407
    ● Active Kubernetes integration
      https://issues.apache.org/jira/browse/FLINK-9953
  • 6. Shall we build our own tool?
    It’s going to be challenging! Because…
    ● Flink is a distributed engine
    ● Flink is a stateful engine
    ● Jobs need to be packaged and uploaded
    ● Jobs need to be monitored to detect failures
    ● Resources need to be adjusted according to workload
  • 7. Do we need an open source tool?
    Everybody likes open source tools…
    … how do we build one?
  • 8. Overview of a Flink Cluster
    ● One or more JobManagers (typically one)
    ● One or more TaskManagers (typically many)
    ● One or more jobs packaged as JAR files
    ● Storage for savepoints
  • 9. Exploiting Kubernetes API
    It’s all REST!
    There are client libraries…
    … for many languages, not only Go!
    See Kubernetes Documentation:
    https://kubernetes.io/docs/reference/
  • 10. Control Kubernetes Programmatically
    val jobmanagerStatefulSet = V1StatefulSet()
        .metadata(jobmanagerMetadata)
        .spec(
            V1StatefulSetSpec()
                .replicas(1)
                .template(
                    V1PodTemplateSpec().spec(jobmanagerPodSpec).metadata(jobmanagerMetadata)
                )
                .updateStrategy(updateStrategy)
                .serviceName("jobmanager")
                .selector(jobmanagerSelector)
                .addVolumeClaimTemplatesItem(persistentVolumeClaim)
        )
    api.createNamespacedStatefulSet(namespace, jobmanagerStatefulSet, null, null, null)
  • 11. What resources do we need?
    ● StatefulSet for JobManager (1 replica)
    ● StatefulSet for TaskManager (N replicas)
    ● Services for JobManager (headless, NodePort, ...)
    ● PersistentVolumeClaims
    ● …
  • 12. What configuration do we need?
    ● Set JOB_MANAGER_RPC_ADDRESS to JobManager service
    ● Set TASK_MANAGER_NUMBER_OF_TASK_SLOTS to 1
    ● Set memory limits of container higher than max heap
    ● Set CPU limits to sensible value
    ● Configure pod affinity to spread workload
    ● Expose relevant ports (usually only internally)
    ● Add sensible labels to identify resources
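The first three configuration items can be sketched as plain Kotlin values before they are handed to the Kubernetes client. This is only an illustration: `taskManagerEnvironment` and `containerMemoryLimitMiB` are hypothetical helpers, the service name `flink-jobmanager` is an assumed example, and the 25% memory headroom is an assumption rather than a value from the talk.

```kotlin
// Hypothetical helper: environment variables for a TaskManager container.
// The variable names come from the slide; the service name is an assumed example.
fun taskManagerEnvironment(jobManagerService: String, taskSlots: Int = 1): Map<String, String> =
    mapOf(
        "JOB_MANAGER_RPC_ADDRESS" to jobManagerService,
        "TASK_MANAGER_NUMBER_OF_TASK_SLOTS" to taskSlots.toString()
    )

// Hypothetical helper: keep the container memory limit above the JVM max heap.
// The 25% headroom is an assumption for illustration; tune it for the workload.
fun containerMemoryLimitMiB(maxHeapMiB: Int, headroomPercent: Int = 25): Int =
    maxHeapMiB + (maxHeapMiB * headroomPercent + 99) / 100 // headroom rounded up

fun main() {
    println(taskManagerEnvironment("flink-jobmanager"))
    println(containerMemoryLimitMiB(1024))
}
```

Keeping these values in small functions makes it easier to apply the same settings to every cluster the tool creates.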
  • 13. Run exec against the Job Manager
    How does it work?
    ● Kubernetes Client for managing clusters
    ● Exec for executing commands in the containers
  • 14. Easy to implement but...
    ● It depends on commands installed in the container
    ● It seems too expensive in terms of resources (we need to run a process inside the container for each operation)
    ● It doesn’t enforce any protocol (stdin/stdout)
  • 15. Flink Monitoring API to the rescue!
    Flink has a pretty useful REST API:
    ● Endpoints for managing jobs
    ● Endpoints for managing savepoints
    ● Endpoints for monitoring the cluster
    Is there a client library? I am afraid not…
  • 16. Create client using OpenAPI
    I manually crafted an OpenAPI specification file…
    … It’s tedious but the generated client works fine!
    See Swagger Documentation:
    https://swagger.io/docs/specification/about/
  • 17. Swagger Editor and Code generator
    /v1/jobs:
      get:
        operationId: getJobs
        summary: Returns an overview over all jobs and their current state
        responses:
          '200':
            description: |-
              200 response
            content:
              application/json:
                schema:
                  $ref: '#/components/schemas/JobIdsWithStatusOverview'
    ...
    See full specification on GitHub:
    https://github.com/nextbreakpoint/flink-client/blob/master/flink-openapi.yaml
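To show what the `/v1/jobs` operation in the specification corresponds to on the wire, here is a minimal Kotlin sketch that calls the endpoint with the JDK's built-in HTTP client (Java 11 or later). It is not the generated client from the talk. The helper names `jobsRequest` and `fetchJobs` are hypothetical, and the base URL is an assumption: `jobmanager` stands in for whatever service exposes the JobManager, and 8081 is Flink's default REST port.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Build a GET request for the jobs endpoint described by the specification.
// The base URL is supplied by the caller, e.g. "http://jobmanager:8081" (assumed).
fun jobsRequest(baseUrl: String): HttpRequest =
    HttpRequest.newBuilder(URI.create("$baseUrl/v1/jobs"))
        .header("Accept", "application/json")
        .GET()
        .build()

// Send the request and return the raw JSON body (no parsing in this sketch).
fun fetchJobs(baseUrl: String): String {
    val client = HttpClient.newHttpClient()
    val response = client.send(jobsRequest(baseUrl), HttpResponse.BodyHandlers.ofString())
    check(response.statusCode() == 200) { "Unexpected status ${response.statusCode()}" }
    return response.body()
}
```

A generated client wraps calls like this in typed methods and model classes, which is what makes the manual work on the specification worthwhile.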
  • 18. Combine all in one application
    We can combine the APIs:
    ● Kubernetes Client for managing clusters
    ● Flink Client for managing jobs
  • 19. What are the limitations?
    ● Where does the client live?
    ● Still no monitoring or automatic scaling
    ● NodePort or Port Forward required for each JobManager (for each Flink Cluster)
    ● Port Forward doesn’t work well with file upload (there is a problem with timeout in the Kubernetes Client for Java)
  • 20. Run controller inside Kubernetes
    What are the benefits?
    ● It can easily access internal resources
    ● It runs with its own service account
    ● It can monitor the clusters
    ● It can rescale the clusters
  • 21. Better than before but...
    ● One port forward is still required
    ● Authorization is required for API
    ● It doesn’t follow best practices!
  • 22. We need a Kubernetes Operator!
    Everybody thinks we need Go, but…
    … an Operator is like a pattern…
    … and we can use any programming language!
    Operator SDK for Go:
    https://github.com/operator-framework/operator-sdk
    Operator Pattern:
    https://coreos.com/blog/introducing-operators.html
  • 23. Custom Resource Definition
    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
      name: flinkclusters.beta.nextbreakpoint.com
    spec:
      group: beta.nextbreakpoint.com
      versions:
        - name: v1
          served: true
          storage: true
      scope: Namespaced
      names:
        plural: flinkclusters
        singular: flinkcluster
        kind: FlinkCluster
        shortNames:
          - fc
    /apis/beta.nextbreakpoint.com/v1/namespaces/*/flinkclusters
    # kubectl create -f flink-crd.yaml
    # kubectl get crd
    NAME                                    AGE
    flinkclusters.beta.nextbreakpoint.com   1d
    …
  • 24. Custom Objects
    apiVersion: "beta.nextbreakpoint.com/v1"
    kind: FlinkCluster
    metadata:
      name: test
    spec:
      clusterName: test
      environment: test
      pullSecrets: regcred
      pullPolicy: Always
      flinkImage: nextbreakpoint/flink:1.7.2-1
      sidecarImage: flink-workshop-jobs:2
      sidecarServiceAccount: flink-operator
      sidecarClassName: com.nextbreakpoint.flink.jobs.TestJob
      sidecarJarPath: /com.nextbreakpoint.flinkworkshop-1.0.0.jar
      sidecarParallelism: 1
      sidecarArguments:
        - --BUCKET_BASE_PATH
        - file:///var/tmp
    # kubectl create -f cluster.yaml
    # kubectl get flinkclusters
    NAME   AGE
    test   4s
  • 25. The Operator Loop
    1. Receive updates of Custom Objects
    2. Receive updates of StatefulSets, Services, PVCs, …
    3. Compare desired state to actual state
    4. Adjust current state to match desired state
    5. Repeat from 1
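Steps 3 and 4 of the loop can be sketched in a few lines of Kotlin. This is a simplified illustration, not the code of the Flink K8S Toolbox: `ClusterState`, `Action` and `reconcile` are hypothetical names, and a cluster is reduced to a name plus a TaskManager count, whereas a real operator would compare StatefulSets, Services, PVCs and more.

```kotlin
// Hypothetical, simplified model: a cluster has a name and a TaskManager count.
data class ClusterState(val name: String, val taskManagers: Int)

// The adjustments the operator could decide to make.
sealed class Action {
    data class Create(val cluster: ClusterState) : Action()
    data class Scale(val name: String, val from: Int, val to: Int) : Action()
    data class Delete(val name: String) : Action()
}

// Compare desired state (custom objects) with actual state (existing resources)
// and list the actions needed to make the actual state match the desired one.
fun reconcile(desired: List<ClusterState>, actual: List<ClusterState>): List<Action> {
    val actualByName = actual.associateBy { it.name }
    val desiredNames = desired.map { it.name }.toSet()
    val actions = mutableListOf<Action>()
    for (want in desired) {
        val have = actualByName[want.name]
        if (have == null) {
            actions.add(Action.Create(want))
        } else if (have.taskManagers != want.taskManagers) {
            actions.add(Action.Scale(want.name, have.taskManagers, want.taskManagers))
        }
    }
    for (have in actual) {
        // Anything running without a matching custom object is removed.
        if (have.name !in desiredNames) actions.add(Action.Delete(have.name))
    }
    return actions
}

fun main() {
    val desired = listOf(ClusterState("test", 2), ClusterState("new", 1))
    val actual = listOf(ClusterState("test", 1), ClusterState("old", 3))
    reconcile(desired, actual).forEach(::println)
}
```

Because `reconcile` returns an empty list when both states already match, running it repeatedly (step 5) is safe.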
  • 26. Run a Flink Operator
    What are the benefits?
    ● It follows Kubernetes best practices
    ● It runs with its own service account
    ● We only need to create cluster objects
  • 27. Operator meets Controller
    They can operate together:
    ● Use operator with CD pipeline
    ● Use controller for manual ops
    ● Use controller for monitoring
    ● Use controller for alerting
    ● Use controller for scaling
  • 28. Time for a demo!
    A preview of Flink K8S Toolbox:
    ● Easy installation
    ● Easy deployments
    ● Jobs management
    ● Cluster metrics
    ● Cluster scaling
  • 29. Monitoring and Scaling
    We can use Flink API for:
    ● Watching jobs status and alerting when something is broken
    ● Observing cluster metrics and scaling cluster when required
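The two uses above can be sketched as small decision functions in Kotlin. Both are hypothetical helpers, not part of the toolbox: which job states count as broken is an assumption made for this example, and the scaling rule simply derives the TaskManager count from the job parallelism and the slots per TaskManager, ignoring other cluster metrics.

```kotlin
// Flink job states that this sketch treats as unhealthy.
// Which states should trigger an alert is an assumption, not taken from the talk.
val unhealthyStates = setOf("FAILING", "FAILED", "RESTARTING", "SUSPENDED")

// Hypothetical helper: given job id -> status (as reported by the Flink REST API),
// return the ids of the jobs that should raise an alert, in sorted order.
fun jobsToAlert(statusByJobId: Map<String, String>): List<String> =
    statusByJobId.filterValues { it in unhealthyStates }.keys.sorted()

// Hypothetical scaling rule: enough TaskManagers to provide one slot per
// parallel task, rounding up when the slots do not divide evenly.
fun requiredTaskManagers(parallelism: Int, slotsPerTaskManager: Int = 1): Int {
    require(parallelism > 0 && slotsPerTaskManager > 0) { "values must be positive" }
    return (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager
}

fun main() {
    val statuses = mapOf("job-a" to "RUNNING", "job-b" to "FAILED")
    println(jobsToAlert(statuses))
    println(requiredTaskManagers(parallelism = 5, slotsPerTaskManager = 2))
}
```

Keeping the decisions separate from the API calls leaves room for swapping in different alerting or scaling strategies later.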
  • 30. Checkpoints/Savepoints
    We can use Flink API for:
    ● Monitoring checkpoints
    ● Managing savepoints
    ● Retrieving last savepoint
  • 31. Continuous Delivery
    We can use tools like Flux:
    ● Push changes into Git repo
    ● Changes are automatically applied to resources
    Flux (I haven’t actually tried it)
    https://github.com/weaveworks/flux
  • 32. Nice features to have...
    ● Pluggable alerting strategy
    ● Pluggable scaling strategy
    ● Web console
    ● Secure access
    ● Support for HA mode
    ● …
  • 33. It’s all free!
    Flink Kubernetes Toolbox:
    https://github.com/nextbreakpoint/flink-k8s-toolbox
    Related projects:
    https://github.com/nextbreakpoint/flink-client
    https://github.com/nextbreakpoint/flink-workshop
    https://github.com/nextbreakpoint/kubernetes-playground