SlideShare a Scribd company logo
Apache Flink
Kubernetes Operator
THIS IS NOT A CONTRIBUTION
In a nutshell
Manage Flink Application and Session clusters/jobs

Deploy, Upgrade, Savepoint, Monitor
Intro
Why another Flink operator
•Existing OSS solutions

•Fragmentation, unreliable maintenance

•Contribution barrier

•FLIP-212: Introduce Flink Kubernetes Operator
Supported Deployment Modes
•Flink deployments in application or session mode

•Flink application managed through FlinkDeployment

•Empty Flink session managed through FlinkDeployment +
jobs managed through FlinkSessionJob

•Job submissions against a session cluster

•Foundation for the common workload types and languages
(Java, SQL, Python)
Project Status
•4 releases

•Active community with contributors from multiple
organizations



•Production ready
Acknowledgement
• Gyula Fora (gyfora)

• Matyas Orhidi (morhidi)

• Aitozi

• Marton Balassi (mbalassi)

• Yang Wang (wangyang0918)

• Nicholas Jiang (SteNicholas)

• Biao Geng (bgeng777)

• Thomas Weise (tweise)

• Hao Xin (haoxins)

• Usamah Jassat (usamj)

• James Busche (jbusche)

• Junfan Zhang (zuston)

• zeus1ammon
27 Contributors
The basics
Control Loop
Deployment Flow
1. Submit FlinkDeployment (CR)

2. Operator creates JobManager

3. JobManager creates task manager pods

4. JobManager submits job
FlinkDeployment Status
Status captures everything the operator knows about the application

What’s in it?

• JobManager Deployment Status

• Job status

• Basic Job Details

• Savepoint Info

• Reconciliation Status

• Last reconciled spec

• Success / Error information
Observing FlinkDeployment status
The Observer component is responsible for determining the status

Observe
fl
ow

1. Observe JobManager Deployment Status (exists, errors, Flink
ports)

2. Observe Job Status (Rest API accessible, job status)

3. Observe Savepoint Status (pending savepoints)

It takes a few reconcile loops to reach steady state after deployment…
Once the job is running…
• Check the Flink UI -> Metrics, Flame Graphs, Memory Utilisation

• Check the logs

• Operator log -> Reconcile / deployment errors

• JM/TM logs -> Job errors, warnings etc.

• Trigger Savepoints

• Perform upgrades

• Suspend / Resume processing
Lifecycle management
Resource Lifecycle
Upgrade / Suspend Applications
Jobs can be upgraded by simply submitting a new spec.

What happens then?

1. Suspend running job (keep state)

2. Restore using the new spec (using state from last run)

If job.state is set to SUSPENDED the job will be paused.
Application Upgrade Modes
Controls how the streaming job will be suspended and restarted on
spec changes

Available modes

• Stateless 

• Last State

• Savepoint
Application Upgrade Modes
Stateless Last State Savepoint
Con
fi
g Requirement None
• Checkpointing Enabled

• Kubernetes HA Enabled
• Savepoint directory
de
fi
ned
Job Status Requirement None* None*
Deployment healthy

Job Running
Suspend Mechanism Cancel / Delete
Delete Flink deployment
(keep HA data)
Cancel with savepoint
Restore Mechanism Deploy from empty state
Deploy job -> recover state
from HA data
Restore From savepoint
When to use? Stateless jobs, prototyping Most stateful jobs Job Migration / Forking
* No savepoint in progress
Manual Savepoints
•
Allows you to keep “backups” of your application state

Trigger by changing job.savepointTriggerNonce


•
Use job.initialSavepointPath to start from a speci
fi
c savepoint
on new deployments

•
Savepoints are cleaned up automatically
Automatic Savepoint Management
•Periodic savepoints

•Con
fi
g: kubernetes.operator.periodic.savepoint.interval

•Savepoints triggered as part of regular reconcile loop

•Savepoint history

•Count and age based

•Disposal of savepoints
Configuration
Zero Downtime Changes
Initial con
fi
guration through helm chart

•How to apply changes?

Dynamic changes without control plane interruption

•Clusters with many concurrent reconciliations

•Reload operator con
fi
guration from con
fi
g map

Namespaces to watch

•List of namespaces + dynamic con
fi
g change
Operator (System) Level
•General operator wide con
fi
guration

•Cannot be overridden on a per-resource basis

Examples:

•Timeout for the observer to wait the Flink REST client to return

•Interval for the controller to reschedule the reconcile process

•Maximum number of threads running the reconciliation loop

https://nightlies.apache.org/
fl
ink/
fl
ink-kubernetes-operator-docs-main/docs/operations/con
fi
guration/#system-con
fi
guration
Resource (User) Level
•Settings that a
ff
ect a single deployment

•“Extend” the CR (but don’t require CRD changes!)

Examples:

•Enable recovery of missing/deleted jobmanager deployments

•Timeout for deployments to become ready/stable before being rolled
back

•Interval before a savepoint trigger attempt is marked as unsuccessful

https://nightlies.apache.org/
fl
ink/
fl
ink-kubernetes-operator-docs-main/docs/operations/con
fi
guration/#resourceuser-con
fi
guration
Flink Pod Templates
CR with limited direct settings (like memory and cpu resource)

Maximum
fl
exibility through
fl
inkCon
fi
guration and pod templates (init, sidecar, storage etc.)



Common template with job/task manager override

https://nightlies.apache.org/
fl
ink/
fl
ink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/
Observability
Metrics
Flink metric reporters (default reporters shipped with image)

•Prometheus (example)



Metric scopes

•Operator: JVM, k8s client metrics

•Watched Namespace: Deployment count, Status count

•Resource: Reconciliation count, .. (JOSDK metrics)
Logging
Con
fi
gured via Con
fi
gMap (default in helm chart)

Kubernetes Events
Important changes (and errors) recorded as events

kubectl describe flinkdeployment basic-example




Events can be forwarded to infrastructure speci
fi
c collectors
When things go wrong
Error Scenarios
Typical causes
•Operator service account access problem

•Invalid Flink deployment con
fi
guration

•Operator failure / bug

Where to look
•Operator log

•Service accounts / roles / role bindings

FlinkDeployment Not Created
Error Scenarios
Typical causes
•Flink service account access problem

•Flink image pull error

•Pod template / other Kubernetes issues

Where to look
•FlinkDeployment (CR) events

•Describe JobManager replicaset
JobManager pod not created
Error Scenarios
Typical causes
•Flink service account access problem

•TM pod template problems

•Insu
ffi
cient resources

Where to look
•JobManager pod logs

•Describe pending task manager pod
TaskManager pods not ready
Roadmap
Roadmap
•Version 1.1

•Dynamic change of watched namespaces

•Pluggable Status and Event reporters (integration point for control planes)

•Improved savepoint management & periodic triggering

•Experimental auto-scaling using Horizontal Pod AutoScaler

•Version 1.2

•Standalone deployment mode support (FLIP-225, support for older Flink versions)

•Improved scaling and autoscaling support

•Improved rollback mechanism

•Roadmap documentation page
Q&A

More Related Content

What's hot

Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
Konstantin Knauf
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
Flink Forward
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Databricks
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
Ververica
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Flink Forward
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
confluent
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
Flink Forward
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022CDC Stream Processing With Apache Flink With Timo Walther | Current 2022
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022
HostedbyConfluent
 

What's hot (20)

Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022CDC Stream Processing With Apache Flink With Timo Walther | Current 2022
CDC Stream Processing With Apache Flink With Timo Walther | Current 2022
 

Similar to Introducing the Apache Flink Kubernetes Operator

Enhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptxEnhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptx
Rohit Radhakrishnan
 
python_development.pptx
python_development.pptxpython_development.pptx
python_development.pptx
LemonReddy1
 
SynapseIndia drupal presentation on drupal info
SynapseIndia drupal  presentation on drupal infoSynapseIndia drupal  presentation on drupal info
SynapseIndia drupal presentation on drupal info
Synapseindiappsdevelopment
 
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
Alluxio, Inc.
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Amazon Web Services
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio, Inc.
 
Using The Right Tool For The Job
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The Job
Chris Baldock
 
Index Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessIndex Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for Success
Dean Willson
 
Life In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPagesLife In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPages
Ulrich Krause
 
Serverless design with Fn project
Serverless design with Fn projectServerless design with Fn project
Serverless design with Fn project
Siva Rama Krishna Chunduru
 
La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!
Ulrich Krause
 
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
garrett honeycutt
 
Anatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business SuiteAnatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business Suite
vasuballa
 
Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501
Tjarda Peelen
 
Apex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEXApex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEX
Sergei Martens
 
APEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdfAPEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdf
Richard Martens
 
Actions rules and workflow in alfresco
Actions rules and workflow in alfrescoActions rules and workflow in alfresco
Actions rules and workflow in alfresco
Alfresco Software
 
Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016
Phil Estes
 
Oracle EBS Production Support - Recommendations
Oracle EBS Production Support - RecommendationsOracle EBS Production Support - Recommendations
Oracle EBS Production Support - Recommendations
Vigilant Technologies
 
Tuenti Release Workflow
Tuenti Release WorkflowTuenti Release Workflow
Tuenti Release Workflow
Tuenti
 

Similar to Introducing the Apache Flink Kubernetes Operator (20)

Enhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptxEnhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptx
 
python_development.pptx
python_development.pptxpython_development.pptx
python_development.pptx
 
SynapseIndia drupal presentation on drupal info
SynapseIndia drupal  presentation on drupal infoSynapseIndia drupal  presentation on drupal info
SynapseIndia drupal presentation on drupal info
 
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Using The Right Tool For The Job
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The Job
 
Index Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessIndex Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for Success
 
Life In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPagesLife In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPages
 
Serverless design with Fn project
Serverless design with Fn projectServerless design with Fn project
Serverless design with Fn project
 
La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!
 
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
 
Anatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business SuiteAnatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business Suite
 
Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501
 
Apex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEXApex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEX
 
APEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdfAPEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdf
 
Actions rules and workflow in alfresco
Actions rules and workflow in alfrescoActions rules and workflow in alfresco
Actions rules and workflow in alfresco
 
Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016
 
Oracle EBS Production Support - Recommendations
Oracle EBS Production Support - RecommendationsOracle EBS Production Support - Recommendations
Oracle EBS Production Support - Recommendations
 
Tuenti Release Workflow
Tuenti Release WorkflowTuenti Release Workflow
Tuenti Release Workflow
 

More from Flink Forward

Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
Flink Forward
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
Flink Forward
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
Flink Forward
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
Flink Forward
 

More from Flink Forward (15)

Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 

Recently uploaded

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 

Recently uploaded (20)

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 

Introducing the Apache Flink Kubernetes Operator

  • 2. In a nutshell Manage Flink Application and Session clusters/jobs Deploy, Upgrade, Savepoint, Monitor
  • 4. Why another Flink operator •Existing OSS solutions •Fragmentation, unreliable maintenance •Contribution barrier •FLIP-212: Introduce Flink Kubernetes Operator
  • 5. Supported Deployment Modes •Flink deployments in application or session mode •Flink application managed through FlinkDeployment •Empty Flink session managed through FlinkDeployment + jobs managed through FlinkSessionJob •Job submissions against a session cluster •Foundation for the common workload types and languages (Java, SQL, Python)
  • 6. Project Status •4 releases •Active community with contributors from multiple organizations
 •Production ready
  • 7. Acknowledgement • Gyula Fora (gyfora) • Matyas Orhidi (morhidi) • Aitozi • Marton Balassi (mbalassi) • Yang Wang (wangyang0918) • Nicholas Jiang (SteNicholas) • Biao Geng (bgeng777) • Thomas Weise (tweise) • Hao Xin (haoxins) • Usamah Jassat (usamj) • James Busche (jbusche) • Junfan Zhang (zuston) • zeus1ammon 27 Contributors
  • 10. Deployment Flow 1. Submit FlinkDeployment (CR) 2. Operator creates JobManager 3. JobManager creates task manager pods 4. JobManager submits job
  • 11. FlinkDeployment Status Status captures everything the operator knows about the application What’s in it? • JobManager Deployment Status • Job status • Basic Job Details • Savepoint Info • Reconciliation Status • Last reconciled spec • Success / Error information
  • 12. Observing FlinkDeployment status The Observer component is responsible for determining the status Observe fl ow 1. Observe JobManager Deployment Status (exists, errors, Flink ports) 2. Observe Job Status (Rest API accessible, job status) 3. Observe Savepoint Status (pending savepoints) It takes a few reconcile loops to reach steady state after deployment…
  • 13. Once the job is running… • Check the Flink UI -> Metrics, Flame Graphs, Memory Utilisation • Check the logs • Operator log -> Reconcile / deployment errors • JM/TM logs -> Job errors, warnings etc. • Trigger Savepoints • Perform upgrades • Suspend / Resume processing
  • 14.
  • 17. Upgrade / Suspend Applications Jobs can be upgraded by simply submitting a new spec. What happens then? 1. Suspend running job (keep state) 2. Restore using the new spec (using state from last run) If job.state is set to SUSPENDED the job will be paused.
  • 18. Application Upgrade Modes Controls how the streaming job will be suspended and restarted on spec changes Available modes • Stateless • Last State • Savepoint
  • 19. Application Upgrade Modes Stateless Last State Savepoint Con fi g Requirement None • Checkpointing Enabled • Kubernetes HA Enabled • Savepoint directory de fi ned Job Status Requirement None* None* Deployment healthy Job Running Suspend Mechanism Cancel / Delete Delete Flink deployment (keep HA data) Cancel with savepoint Restore Mechanism Deploy from empty state Deploy job -> recover state from HA data Restore From savepoint When to use? Stateless jobs, prototyping Most stateful jobs Job Migration / Forking * No savepoint in progress
  • 20. Manual Savepoints • Allows you to keep “backups” of your application state Trigger by changing job.savepointTriggerNonce 
 • Use job.initialSavepointPath to start from a speci fi c savepoint on new deployments
 • Savepoints are cleaned up automatically
  • 21. Automatic Savepoint Management •Periodic savepoints •Con fi g: kubernetes.operator.periodic.savepoint.interval •Savepoints triggered as part of regular reconcile loop •Savepoint history •Count and age based •Disposal of savepoints
  • 23. Zero Downtime Changes Initial con fi guration through helm chart •How to apply changes? Dynamic changes without control plane interruption •Clusters with many concurrent reconciliations •Reload operator con fi guration from con fi g map Namespaces to watch •List of namespaces + dynamic con fi g change
  • 24. Operator (System) Level •General operator wide con fi guration •Cannot be overridden on a per-resource basis Examples: •Timeout for the observer to wait the Flink REST client to return •Interval for the controller to reschedule the reconcile process •Maximum number of threads running the reconciliation loop https://nightlies.apache.org/ fl ink/ fl ink-kubernetes-operator-docs-main/docs/operations/con fi guration/#system-con fi guration
  • 25. Resource (User) Level •Settings that a ff ect a single deployment •“Extend” the CR (but don’t require CRD changes!) Examples: •Enable recovery of missing/deleted jobmanager deployments •Timeout for deployments to become ready/stable before being rolled back •Interval before a savepoint trigger attempt is marked as unsuccessful https://nightlies.apache.org/ fl ink/ fl ink-kubernetes-operator-docs-main/docs/operations/con fi guration/#resourceuser-con fi guration
  • 26. Flink Pod Templates CR with limited direct settings (like memory and cpu resource) Maximum fl exibility through fl inkCon fi guration and pod templates (init, sidecar, storage etc.) Common template with job/task manager override https://nightlies.apache.org/ fl ink/ fl ink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/
  • 28. Metrics Flink metric reporters (default reporters shipped with image) •Prometheus (example)
 Metric scopes •Operator: JVM, k8s client metrics •Watched Namespace: Deployment count, Status count •Resource: Reconciliation count, .. (JOSDK metrics)
  • 29. Logging Con fi gured via Con fi gMap (default in helm chart)

  • 30. Kubernetes Events Important changes (and errors) recorded as events kubectl describe flinkdeployment basic-example Events can be forwarded to infrastructure speci fi c collectors
  • 31. When things go wrong
  • 32. Error Scenarios Typical causes •Operator service account access problem •Invalid Flink deployment con fi guration •Operator failure / bug Where to look •Operator log •Service accounts / roles / role bindings FlinkDeployment Not Created
  • 33. Error Scenarios Typical causes •Flink service account access problem •Flink image pull error •Pod template / other Kubernetes issues Where to look •FlinkDeployment (CR) events •Describe JobManager replicaset JobManager pod not created
  • 34. Error Scenarios Typical causes •Flink service account access problem •TM pod template problems •Insu ffi cient resources Where to look •JobManager pod logs •Describe pending task manager pod TaskManager pods not ready
  • 36. Roadmap •Version 1.1 •Dynamic change of watched namespaces •Pluggable Status and Event reporters (integration point for control planes) •Improved savepoint management & periodic triggering •Experimental auto-scaling using Horizontal Pod AutoScaler •Version 1.2 •Standalone deployment mode support (FLIP-225, support for older Flink versions) •Improved scaling and autoscaling support •Improved rollback mechanism •Roadmap documentation page
  • 37. Q&A