SlideShare a Scribd company logo
Flink SQL
Workshop
Author: Albert Lewandowski
Overview
Apache Flink
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Data Streaming vs. Batch
Events
1 2 3 4 5 6
Batch
Stream
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Use cases
User activity
Fraud detection
Logistics
Industrial IoT
Location data
Recommendations
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
One tool, multiple languages
Java 8 or 11
Python
SQL
Scala 2.11 or 2.12
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Where should I install?
Standalone
Kubernetes
YARN cluster
● CICD process
● Service Discovery - monitoring with Prometheus
● Scalability
● Managing resources
● A/B Testing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
High Availability of Flink
JobManager level
Storage level
● ZooKeeper
● Kubernetes (beta)
● High Availability of
storage to/from
which Flink
writes/reads
savepoints and
checkpoints
● Performance of
storage
Job Strategy
● Data reprocessing
policy
● How to deploy new
job?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Flink K8S Operator
Kubernetes Operator for
Apache Flink
Ververica Platform
Native Kubernetes -
Apache Flink
CRDs Yes Yes No No
CICD Kubernetes API Kubernetes API REST API or Web UI Kubernetes API
Installation
Helm chart or raw
Kubernetes manifests
Helm chart or raw
Kubernetes manifests
Helm chart or raw
Kubernetes manifests
No need to install any
component
SQL Editor No No Yes No
Dependencies No No
Persistence volume for
database
Object storage for
artifactory
No
Status beta beta production beta
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Why Flink on Kubernetes?
Simpler deployment process Flexible jobs management
Simple Service Discovery -
Prometheus
Flexible testing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Installation & Configuration
Helm
A package manager
for Kubernetes
CICD tool
Example: Gitlab CI
Kubernetes API
Flink jobs
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Testing
Incubating Mode
A/B Testing
Blue Green
Deployment
Production
data
Production
job
Job
Incubating
mode
Separated
output
Standard
output
Dedicated TaskManagers
Dedicated TaskManagers
savepoi
nt
Proxy
Flink Job
#1
Flink Job
#2
Result #1
Result #2
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Job Cluster & Session Cluster
Job Cluster Session Cluster
Full set of Flink cluster for
each individual job
Standalone Flink cluster on
Kubernetes
Short running tasks Ad-hoc queries
Long running tasks
Separate images for
different jobs
Overview
Ververica Platform
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Ververica Platform components
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Ververica Platform components
● AppManager
It sits as an orchestrator between a user’s requests,
Kubernetes, and Apache Flink
● Gateway
Container hosting the SQL service and processing any
SQL-related task.
● UI
App responsible for the Web User Interface
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Ververica Platform external resources (optional)
● Blob storage (like AWS S3)
It can be used to store artifacts if required
● IAM permissions
We can define which Kubernetes ServiceAccount with
associated IAM role we can use.
● Logging & Metrics
We can set up links to external components in which we
can read logs and metrics from Flink jobs.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Ververica Platform dictionary
● Deployment Target
DeploymentTargets correspond to different Kubernetes
namespaces in the same Kubernetes cluster.
● Deployment
A Deployment specifies the desired state of a Flink job and its
configuration. Ververica Platform tracks and reports it.
● Namespaces (in Ververica, not Kubernetes!)
Namespaces are the primary means to isolate resources between
different groups of users and grant access to resources, allowing for
multi-tenancy.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Ververica Platform dictionary
● Session Cluster
Your Deployment will be executed in a Flink Session Cluster that
may be shared with other Deployments. The lifecycle of the Flink
cluster is independent of the Deployment lifecycle.
● Application Cluster
Your Deployment will be executed in a separate Flink cluster. The
lifecycle of the Flink cluster is tied to the lifecycle of the Deployment.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Lifecycle Management
● Upgrade or suspend job
When a Deployment is suspended or a stateful upgrade is triggered,
Ververica Platform will submit a Stop-with-Savepoint command to
Apache Flink. This command will atomically trigger a Savepoint and
stop the job.
● Drain the pipeline
Draining emits the maximum watermark before stopping the job.
When the watermark is emitted, all event time timers will fire,
allowing you to process events that depend on this timer.
This is useful when you want to fully shut down your job without
leaving any unhandled events or state.
Quick start
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Repo with examples
git clone
https://github.com/getindata/ververica-platform-flink-workshop
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Prerequisites
● Container Runtime
● Installed Kubectl
● Installed Helm
Instruction can be found on Confluence.
Observability
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Observability
Observability is about measuring how well internal states of the
system can be inferred from knowledge of its external outputs
(according to the control theory).
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Part One: Metrics
Get metrics from environment and application - but how?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Prometheus - Kubernetes-native solution
open-source systems
monitoring and alerting toolkit
joined the Cloud Native Computing
Foundation in 2016 as the second
hosted project, after Kubernetes
a lot of exporters
you can write your own easily
mature ecosystem
PushGateway, Blackbox, AlertManager, etc.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Pull vs. push-based monitoring
Pull Push
Collector takes metrics Agents push metrics
Workload on central poller increases with the number of
devices polled.
Polling task fully distributed among agents, resulting in
linear scalability.
Polling protocol can potentially open up system to
remote access and denial of service attacks.
Push agents are inherently secure against remote
attacks since they do not listen for network connections.
Flexible: poller can ask for any metric at any time.
Relatively inflexible: pre-determined, fixed set of
measurements are periodically exported.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Prometheus - Stories
service discovery
simple on k8s
limited security
archived data
how old data is required?
monitor monitoring
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Part Two: Logs analytics
1. Get logs from app or environment.
2. Save logs.
3. Query them.
4. Make your system self-healing and discover what’s
happening inside your platform.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Logs analytics - which tool should I choose?
Logs Analytics for Developers Logs Analytics for Business
Loki ElasticSearch
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
ELK vs. Loki
ELK Loki + Promtail/Fluentd
Indexing Keys and content of each key Only labels
Query language Query DSL or Lucene QL LogQL
Tool for data visualisation Kibana Grafana
Query performances Faster due to indexed all the data Slower due to indexing only labels
Resource requirements Higher due to the need of indexing Lower due to index only labels
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
What about alerts?
Alerts signify that
a human needs to take action
immediately
in response to something that is
either happening or about to
happen, in order to improve the
situation.
Usage
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
How to deploy job?
Web UI REST API
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Local Artifactory
● Ververica provides its own artifactory
● JARs can be also downloaded from the available source
like Nexus
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Flink job & AWS
● We can connect Flink job with our IAM user or IAM role
● Flink supports AWS IAM credentials by default
● We can run Flink via EMR or Kinesis
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Flink job & GCP
● We can connect Flink job with our IAM user or IAM role
● Flink supports AWS IAM credentials by default
● We can run Flink via Dataproc or Dataflow (as Beam
supports Flink jobs)
Useful
Commands
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Check Flink jobs status
● Check status of TaskManagers and JobManager
kubectl -n flow-ververica-jobs get po
● Check any issues with deployment
TaskManagers
kubectl -n flow-ververica-jobs get deployment
kubectl -n flow-ververica-jobs describe deployment
DEPLOYMENT_NAME - check Events tab
JobManagers
kubectl -n flow-ververica-jobs get job
kubectl -n flow-ververica-jobs describe job
DEPLOYMENT_NAME - check Events tab
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Check Flink jobs status
● Check volume claim used by Ververica
kubectl -n flow-ververica-jobs get pvc
● Check volume used by Ververica
kubectl get pv
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Get logs from application
● Ververica compontents
kubectl -n flow-ververica logs $(kubectl -n flow-ververica
get po | awk 'FNR == 2 {print $1}') CONTAINER_NAME
Here you can pass: appmanager, gateway or ui
● Flink job
kubectl -n flow-ververica-jobs get po
kubectl -n flow-ververica-jobs logs
POD_NAME_FROM_1st_COMMAND
To follow all newly appeared logs, add flag -f just after logs, example:
kubectl -n flow-ververica-jobs logs -f POD
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Get logs from application
● Access Exception from Flink Web User Interface
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Q&A
https://linkedin.com/in/albert-lewandowski
albert.lewandowski@getindata.com
Thank you for your
attention!

More Related Content

Similar to Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewandowski, GetInData

Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
OpenStack
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping ground
Puppet
 
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStackAutomated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
NTT Communications Technology Development
 
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav TulachJDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
PROIDEA
 
Is 12 Factor App Right About Logging
Is 12 Factor App Right About LoggingIs 12 Factor App Right About Logging
Is 12 Factor App Right About Logging
Phil Wilkins
 
Pluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and DockerPluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and Docker
Bob Killen
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill Monkman
Ambassador Labs
 
Monitoring system for OpenStack,using a OSS products
Monitoring system for OpenStack,using a OSS productsMonitoring system for OpenStack,using a OSS products
Monitoring system for OpenStack,using a OSS products
satsuki fukazu
 
Pivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First LookPivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First Look
VMware Tanzu
 
Programmable infrastructure with FlyScript
Programmable infrastructure with FlyScriptProgrammable infrastructure with FlyScript
Programmable infrastructure with FlyScript
Riverbed Technology
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless Solution
Ryan ZhangCheng
 
Puppet devops wdec
Puppet devops wdecPuppet devops wdec
Puppet devops wdec
Wojciech Dec
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
VMware Tanzu
 
Toronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptx
Toronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptxToronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptx
Toronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptx
Anurag Dwivedi
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
Brian Brazil
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdfPrometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Knoldus Inc.
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
Datacratic
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
NETWAYS
 
Node summit workshop
Node summit workshopNode summit workshop
Node summit workshop
Shubhra Kar
 

Similar to Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewandowski, GetInData (20)

Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping ground
 
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStackAutomated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
 
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav TulachJDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
 
Is 12 Factor App Right About Logging
Is 12 Factor App Right About LoggingIs 12 Factor App Right About Logging
Is 12 Factor App Right About Logging
 
Pluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and DockerPluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and Docker
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill Monkman
 
Monitoring system for OpenStack,using a OSS products
Monitoring system for OpenStack,using a OSS productsMonitoring system for OpenStack,using a OSS products
Monitoring system for OpenStack,using a OSS products
 
Pivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First LookPivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First Look
 
Programmable infrastructure with FlyScript
Programmable infrastructure with FlyScriptProgrammable infrastructure with FlyScript
Programmable infrastructure with FlyScript
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless Solution
 
Puppet devops wdec
Puppet devops wdecPuppet devops wdec
Puppet devops wdec
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Toronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptx
Toronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptxToronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptx
Toronto MuleSoft_Meetup_Run Time Fabric - Self Managed Kubernetes.pptx
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdfPrometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
 
Node summit workshop
Node summit workshopNode summit workshop
Node summit workshop
 

More from GetInData

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
GetInData
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
GetInData
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
GetInData
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
GetInData
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
GetInData
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
GetInData
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
GetInData
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
GetInData
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
GetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
GetInData
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
GetInData
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
GetInData
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
GetInData
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
GetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
GetInData
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
GetInData
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
GetInData
 

More from GetInData (20)

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 

Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewandowski, GetInData

  • 3. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Data Streaming vs. Batch Events 1 2 3 4 5 6 Batch Stream
  • 4. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Use cases User activity Fraud detection Logistics Industrial IoT Location data Recommendations
  • 5. © Copyright. All rights reserved. Not to be reproduced without prior written consent. One tool, multiple languages Java 8 or 11 Python SQL Scala 2.11 or 2.12
  • 6. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Where should I install? Standalone Kubernetes YARN cluster ● CICD process ● Service Discovery - monitoring with Prometheus ● Scalability ● Managing resources ● A/B Testing
  • 7. © Copyright. All rights reserved. Not to be reproduced without prior written consent. High Availability of Flink JobManager level Storage level ● ZooKeeper ● Kubernetes (beta) ● High Availability of storage to/from which Flink writes/reads savepoints and checkpoints ● Performance of storage Job Strategy ● Data reprocessing policy ● How to deploy new job?
  • 8. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Flink K8S Operator Kubernetes Operator for Apache Flink Ververica Platform Native Kubernetes - Apache Flink CRDs Yes Yes No No CICD Kubernetes API Kubernetes API REST API or Web UI Kubernetes API Installation Helm chart or raw Kubernetes manifests Helm chart or raw Kubernetes manifests Helm chart or raw Kubernetes manifests No need to install any component SQL Editor No No Yes No Dependencies No No Persistence volume for database Object storage for artifactory No Status beta beta production beta
  • 9. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Why Flink on Kubernetes? Simpler deployment process Flexible jobs management Simple Service Discovery - Prometheus Flexible testing
  • 10. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Installation & Configuration Helm A package manager for Kubernetes CICD tool Example: Gitlab CI Kubernetes API Flink jobs
  • 11. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Testing Incubating Mode A/B Testing Blue Green Deployment Production data Production job Job Incubating mode Separated output Standard output Dedicated TaskManagers Dedicated TaskManagers savepoi nt Proxy Flink Job #1 Flink Job #2 Result #1 Result #2
  • 12. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Job Cluster & Session Cluster Job Cluster Session Cluster Full set of Flink cluster for each individual job Standalone Flink cluster on Kubernetes Short running tasks Ad-hoc queries Long running tasks Separate images for different jobs
  • 14. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Ververica Platform components
  • 15. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Ververica Platform components ● AppManager It sits as an orchestrator between a user’s requests, Kubernetes, and Apache Flink ● Gateway Container hosting the SQL service and processing any SQL-related task. ● UI App responsible for the Web User Interface
  • 16. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Ververica Platform external resources (optional) ● Blob storage (like AWS S3) It can be used to store artifacts if required ● IAM permissions We can define which Kubernetes ServiceAccount with associated IAM role we can use. ● Logging & Metrics We can set up links to external components in which we can read logs and metrics from Flink jobs.
  • 17. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 18. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Ververica Platform dictionary ● Deployment Target DeploymentTargets correspond to different Kubernetes namespaces in the same Kubernetes cluster. ● Deployment A Deployment specifies the desired state of a Flink job and its configuration. Ververica Platform tracks and reports it. ● Namespaces (in Ververica, not Kubernetes!) Namespaces are the primary means to isolate resources between different groups of users and grant access to resources, allowing for multi-tenancy.
  • 19. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Ververica Platform dictionary ● Session Cluster Your Deployment will be executed in a Flink Session Cluster that may be shared with other Deployments. The lifecycle of the Flink cluster is independent of the Deployment lifecycle. ● Application Cluster Your Deployment will be executed in a separate Flink cluster. The lifecycle of the Flink cluster is tied to the lifecycle of the Deployment.
  • 20. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Lifecycle Management ● Upgrade or suspend job When a Deployment is suspended or a stateful upgrade is triggered, Ververica Platform will submit a Stop-with-Savepoint command to Apache Flink. This command will atomically trigger a Savepoint and stop the job. ● Drain the pipeline Draining emits the maximum watermark before stopping the job. When the watermark is emitted, all event time timers will fire, allowing you to process events that depend on this timer. This is useful when you want to fully shut down your job without leaving any unhandled events or state.
  • 22. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Repo with examples git clone https://github.com/getindata/ververica-platform-flink-workshop
  • 23. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Prerequisites ● Container Runtime ● Installed Kubectl ● Installed Helm Instruction can be found on Confluence.
  • 25. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Observability Observability is about measuring how well internal states of the system can be inferred from knowledge of its external outputs (according to the control theory).
  • 26. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Part One: Metrics Get metrics from environment and application - but how?
  • 27. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Prometheus - Kubernetes-native solution open-source systems monitoring and alerting toolkit joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes a lot of exporters you can write your own easily mature ecosystem PushGateway, Blackbox, AlertManager, etc.
  • 28. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Pull vs. push-based monitoring Pull Push Collector takes metrics Agents push metrics Workload on central poller increases with the number of devices polled. Polling task fully distributed among agents, resulting in linear scalability. Polling protocol can potentially open up system to remote access and denial of service attacks. Push agents are inherently secure against remote attacks since they do not listen for network connections. Flexible: poller can ask for any metric at any time. Relatively inflexible: pre-determined, fixed set of measurements are periodically exported.
  • 29. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Prometheus - Stories service discovery simple on k8s limited security archived data how old data is required? monitor monitoring
  • 30. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Part Two: Logs analytics 1. Get logs from app or environment. 2. Save logs. 3. Query them. 4. Make your system self-healing and discover what’s happening inside your platform.
  • 31. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Logs analytics - which tool should I choose? Logs Analytics for Developers Logs Analytics for Business Loki ElasticSearch
  • 32. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ELK vs. Loki ELK Loki + Promtail/Fluentd Indexing Keys and content of each key Only labels Query language Query DSL or Lucene QL LogQL Tool for data visualisation Kibana Grafana Query performances Faster due to indexed all the data Slower due to indexing only labels Resource requirements Higher due to the need of indexing Lower due to index only labels
  • 33. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What about alerts? Alerts signify that a human needs to take action immediately in response to something that is either happening or about to happen, in order to improve the situation.
  • 34. Usage
  • 35. © Copyright. All rights reserved. Not to be reproduced without prior written consent. How to deploy job? Web UI REST API
  • 36. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Local Artifactory ● Ververica provides its own artifactory ● JARs can be also downloaded from the available source like Nexus
  • 37. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Flink job & AWS ● We can connect Flink job with our IAM user or IAM role ● Flink supports AWS IAM credentials by default ● We can run Flink via EMR or Kinesis
  • 38. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Flink job & GCP ● We can connect Flink job with our IAM user or IAM role ● Flink supports AWS IAM credentials by default ● We can run Flink via Dataproc or Dataflow (as Beam supports Flink jobs)
  • 40. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Check Flink jobs status ● Check status of TaskManagers and JobManager kubectl -n flow-ververica-jobs get po ● Check any issues with deployment TaskManagers kubectl -n flow-ververica-jobs get deployment kubectl -n flow-ververica-jobs describe deployment DEPLOYMENT_NAME - check Events tab JobManagers kubectl -n flow-ververica-jobs get job kubectl -n flow-ververica-jobs describe job DEPLOYMENT_NAME - check Events tab
  • 41. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Check Flink jobs status ● Check volume claim used by Ververica kubectl -n flow-ververica-jobs get pvc ● Check volume used by Ververica kubectl get pv
  • 42. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Get logs from application ● Ververica compontents kubectl -n flow-ververica logs $(kubectl -n flow-ververica get po | awk 'FNR == 2 {print $1}') CONTAINER_NAME Here you can pass: appmanager, gateway or ui ● Flink job kubectl -n flow-ververica-jobs get po kubectl -n flow-ververica-jobs logs POD_NAME_FROM_1st_COMMAND To follow all newly appeared logs, add flag -f just after logs, example: kubectl -n flow-ververica-jobs logs -f POD
  • 43. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Get logs from application ● Access Exception from Flink Web User Interface
  • 44. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Q&A https://linkedin.com/in/albert-lewandowski albert.lewandowski@getindata.com
  • 45. Thank you for your attention!