Scheduled Scaling with Dask
and Argo Workflows
Dok Talks #111
Severin Ryberg 20/02/2022 2
INTRODUCTION
MOTIVATION
SETUP
DEMO
Q & A
Goals of this presentation:
o Understand why using Argo + Dask for automated data pipeline
scheduling made sense for us
o Provide a rough overview of our infrastructure set-up
o Describe a basic Argo Workflows scaling example
o Describe a basic Dask data pipeline example
o Showcase the set-up
INTRODUCTION
WHO AM I? WHAT DO I DO?
Background
2013
2015
Bachelors: Physics Masters: Electrical Eng.
2016
Adjunct Professor
2019
Developer
PhD researcher,
infra maintainer
Post Doctoral Researcher,
infra admin
2020
Infra Architect
Current Status
Infrastructure Architect
o Start-up founded in Mid 2020
• Closing in on 50 employees (Hiring!)
o Battery Intelligence as a Service
• Basic Monitoring
• State of Health
• Safety alerting
• Operation optimization
o USP
• Spin-off from renowned research group
• 100% software driven
• Born in the cloud
o Primary tools / languages
• Python
• AWS
• Kubernetes
o Sole infra developer for a while
• “Infra” team size now around 10
o Data Engineer
• Onboarding new customers, Data conditioning
o Developer
• Develop & maintain company-wide fundamental
tools, primarily in Python (Hint! Dask)
o Devops Engineer
• Automate my job away using GitLab CI
o Cloud Engineer
• AWS: EKS, S3, IAM, Lambda, and all that’s in between
o Kubernetes Engineer
• Schedule & scale computation pipelines reliably
(Hint! Argo)
MOTIVATION
WHAT PROBLEMS NEED SOLVING?
Operational
o Stay in Python
• Aligns with developer team’s skillset
• Spark flips between Python and the JVM
o Need to scale generic black-box functions
• Goes beyond “simple” map/reduce
o Shared development experience in testing and production environments
• Same code for sequential, parallel, and distributed contexts
o Conduct both batch-processing and shared parallel-processing
o Promote self-service for data engineers
ACCURE’S REQUIREMENTS
Infrastructure and Security
o Ultra-low latency parallelization
• Pod spin-up times greatly slow down workflow runs
• Need to set up pools of pods for highly parallelized operations
o Multi-tenant environment
• Dedicated namespace per customer
• Operators can access each namespace
• Workflow service account scoped to the namespace
o Cost efficient computations
• Elastic compute infrastructure should automatically
scale up and down according to load
ACCURE’S REQUIREMENTS
o Deployment automation and version control
o High data throughput
• Avoid database bottlenecks for data-in and data-out
o Secure access
• Only robots access production environment
• Customer-specific credentials allow access to own data
o Dependable scheduling
o Exporting of logs to ELK
o Archiving of workflow execution history
Why not use a service?
o Apache Airflow
• Steep learning curve for pipeline developers
• Poor Kubernetes support
- Note! Prior to Airflow 2.0
• Still need to maintain the Kubernetes cluster yourself
o Prefect
• First tried option
• Early-stage start-up going through its own growing pains
- Note! This was in Jan – March 2021
• A change in its cost model would have drastically changed our price point
o AWS Batch, AWS Glue, AWS Data Pipeline, etc…
• We used Batch when we had trouble with other solutions. It is okay, but not very flexible
• We prefer to stay cloud-agnostic as much as possible
SETUP
KUBERNETES, ARGO, DASK, JOY!
Stated Requirements:
o Python
o Black-box generic
o Shared development experience
o Batch- and shared parallel-processing
o Self-service
o Ultra-low latency
o Multi-tenant
o Cost efficient
o Secure
o Scheduling
o Logs
o Archiving
o Data throughput
o Automated and versioned
The stack
Kubernetes
K8S Autoscaler
Argo Workflows
Argo Workflows Overview
Argo Workflows is an open source container-native workflow engine
for orchestrating parallel jobs on Kubernetes. Argo Workflows is
implemented as a Kubernetes CRD.
o Define workflows where each step in the workflow is a container.
o Model multi-step workflows as a sequence of tasks or capture the
dependencies between tasks using a graph (DAG).
- Argo Workflows homepage
INTERMISSION
https://github.com/mcgrawia/argocon-21-demo
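To make the DAG model quoted above concrete, here is a minimal sketch that builds a diamond-shaped Workflow manifest as a plain Python dictionary. The template name `echo`, the `alpine` image, and the task names A to D are illustrative assumptions, not taken from the talk.

```python
import json


def task(name, dependencies=()):
    """One DAG task that calls the shared `echo` template."""
    entry = {
        "name": name,
        "template": "echo",
        "arguments": {"parameters": [{"name": "message", "value": name}]},
    }
    if dependencies:
        # A task starts only after every listed task has finished.
        entry["dependencies"] = list(dependencies)
    return entry


workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "dag-diamond-"},
    "spec": {
        "entrypoint": "diamond",
        "templates": [
            {
                # Each step in the workflow is a container.
                "name": "echo",
                "inputs": {"parameters": [{"name": "message"}]},
                "container": {
                    "image": "alpine:3.18",  # assumed image
                    "command": ["echo", "{{inputs.parameters.message}}"],
                },
            },
            {
                # A runs first, B and C run in parallel, D waits for both.
                "name": "diamond",
                "dag": {
                    "tasks": [
                        task("A"),
                        task("B", ["A"]),
                        task("C", ["A"]),
                        task("D", ["B", "C"]),
                    ]
                },
            },
        ],
    },
}

manifest = json.dumps(workflow, indent=2)
print(manifest)
```

JSON is a subset of YAML, so the printed manifest should be usable wherever a YAML manifest is expected.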
ELK
Prometheus
PostgreSQL
S3
Pulumi
GitLab
Dask
Dask Overview
o Dask: high-throughput data pipelines in Python
o Comes out of the box with
• Multi-domain execution
• Dask-dataframes & futures interface
o Supporting resources
• Dask development support from the community, and as a service
• Cluster-provisioning services (e.g. Coiled)
o ACCURE had to implement
• Work-avoidance
• Task artifacting (on S3)
• Logging to ELK
INTERMISSION II
https://docs.dask.org/en/latest/
https://docs.dask.org/en/latest/graphs.html
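The talk does not show how ACCURE implemented work-avoidance and task artifacting, so the following is only a sketch of one common approach: derive a key from the task's name and arguments, and skip the computation when an artifact with that key already exists. The `ArtifactStore` class is a hypothetical in-memory stand-in for an S3 bucket, and `run_once` and `task_key` are illustrative names.

```python
import hashlib
import json


class ArtifactStore:
    """Hypothetical in-memory stand-in for an S3 bucket."""

    def __init__(self):
        self._objects = {}

    def exists(self, key):
        return key in self._objects

    def put(self, key, value):
        self._objects[key] = json.dumps(value)

    def get(self, key):
        return json.loads(self._objects[key])


def task_key(func, *args, **kwargs):
    """Deterministic key built from the function name and its arguments."""
    payload = json.dumps([func.__name__, args, kwargs], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_once(store, func, *args, **kwargs):
    """Return the stored artifact if present, otherwise compute and store it."""
    key = task_key(func, *args, **kwargs)
    if store.exists(key):
        return store.get(key)  # work avoided
    result = func(*args, **kwargs)
    store.put(key, result)  # artifact the result for later runs
    return result


calls = []


def mean_wind(values):
    calls.append(values)  # record real executions for the demonstration
    return sum(values) / len(values)


store = ArtifactStore()
first = run_once(store, mean_wind, [3.0, 5.0])
second = run_once(store, mean_wind, [3.0, 5.0])  # served from the store
print(first, second, len(calls))
```

A real implementation would also have to decide when a stored artifact is stale, for example by including a code or data version in the key.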
Coiled
ACCURE Utilities
ACCURE Data Pipelines
Combining Dask and Argo Workflows
Pipeline configuration (Workflow Template)
Standard dask scaler (Cluster Workflow Template)
Initiate workflow
Dask Scheduler (daemon)
Dask Worker Deployment (daemon)
Worker tear down (on exit)
Primary pipeline
Image repo, pod resources (workers, scheduler, pipeline), pipeline settings
Image tag, worker count, pipeline arguments
“Product-specific Dask image”
python:3.8.8-buster
● Pip-install dask, distributed, bokeh, and whatever else you need
● Activate environment (e.g. conda)
● Ensure “dask-scheduler” and “dask-worker” are on path
● Inject Python script
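The slide names the building blocks but not their manifests, so the sketch below is one possible way to wire them together rather than the configuration used in the talk. It builds a ClusterWorkflowTemplate as a Python dictionary: a daemon step for the scheduler, a resource step that creates a worker Deployment, the pipeline step, and an exit handler that deletes the workers. The image name, the script name `pipeline.py`, the template names, and the choice to bake the image and worker count in at build time are all assumptions.

```python
import json

SCHEDULER_PORT = 8786  # default dask-scheduler port


def scheduler_address(ip):
    return f"tcp://{ip}:{SCHEDULER_PORT}"


def worker_deployment(image, workers):
    """Deployment manifest for the Dask workers (assumed layout)."""
    name = "dask-worker-{{workflow.name}}"  # Argo substitutes the workflow name
    address = scheduler_address("{{inputs.parameters.scheduler-ip}}")
    container = {"name": "worker", "image": image, "command": ["dask-worker", address]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": workers,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


def dask_scaler(image, workers):
    """Scheduler (daemon) -> workers -> pipeline, with teardown on exit."""
    deployment = worker_deployment(image, workers)
    identity = {k: deployment[k] for k in ("apiVersion", "kind", "metadata")}
    ip_input = {"parameters": [{"name": "scheduler-ip"}]}
    ip_arg = {"parameters": [
        {"name": "scheduler-ip", "value": "{{steps.dask-scheduler.ip}}"},
    ]}
    env = [{"name": "DASK_SCHEDULER_ADDRESS",
            "value": scheduler_address("{{inputs.parameters.scheduler-ip}}")}]
    templates = [
        {"name": "main", "steps": [
            [{"name": "dask-scheduler", "template": "scheduler"}],
            [{"name": "dask-workers", "template": "workers", "arguments": ip_arg}],
            [{"name": "pipeline", "template": "pipeline", "arguments": ip_arg}],
        ]},
        # daemon: keeps running in the background while later steps execute
        {"name": "scheduler", "daemon": True,
         "container": {"image": image, "command": ["dask-scheduler"]}},
        {"name": "workers", "inputs": ip_input,
         "resource": {"action": "create", "manifest": json.dumps(deployment)}},
        {"name": "pipeline", "inputs": ip_input,
         "container": {"image": image,
                       "command": ["python", "pipeline.py"],  # hypothetical script
                       "env": env}},
        {"name": "teardown",
         "resource": {"action": "delete", "manifest": json.dumps(identity)}},
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ClusterWorkflowTemplate",
        "metadata": {"name": "dask-scaler"},
        "spec": {"entrypoint": "main", "onExit": "teardown", "templates": templates},
    }


scaler = dask_scaler(image="registry.example.com/product-dask:1.0", workers=4)
print(json.dumps(scaler, indent=2))
```

In this sketch the pipeline script is expected to read `DASK_SCHEDULER_ADDRESS` and connect a Dask client to it. A production version would likely add readiness checks and resource requests, and would pass the image tag and worker count as workflow parameters.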
Example Dask Pipeline - Today’s example
o Timeseries weather data in Spain
• Madrid, Valencia, Barcelona, Seville, Bilbao
o Which is the windiest city!?
Pipeline graph: Get all timestamps (several parallel tasks) → Identify windiest city at timestamp → Count windiest city observations → Report
Legend: each task is computed in either the pipeline or a dask worker
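The steps of this example pipeline can be sketched in a few lines of Python. This is not the demo code: the wind speeds below are made-up values, and the standard library's `ThreadPoolExecutor` stands in for a Dask cluster so that the sketch runs anywhere.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Made-up wind speeds (m/s) per city and timestamp, for illustration only.
WIND = {
    "Madrid":    {"t0": 3.1, "t1": 4.0, "t2": 2.2},
    "Valencia":  {"t0": 5.2, "t1": 3.9, "t2": 2.0},
    "Barcelona": {"t0": 4.8, "t1": 4.5, "t2": 6.1},
    "Seville":   {"t0": 2.0, "t1": 1.5, "t2": 3.3},
    "Bilbao":    {"t0": 5.0, "t1": 6.3, "t2": 6.4},
}


def get_timestamps(city):
    """Get all timestamps available for one city."""
    return set(WIND[city])


def windiest_city_at(timestamp):
    """Identify the windiest city at one timestamp."""
    return max(WIND, key=lambda city: WIND[city][timestamp])


def run_pipeline(executor):
    """Fan out per city, then per timestamp, then count the winners."""
    per_city = list(executor.map(get_timestamps, WIND))
    timestamps = sorted(set.intersection(*per_city))
    winners = list(executor.map(windiest_city_at, timestamps))
    return Counter(winners)


with ThreadPoolExecutor(max_workers=4) as pool:
    counts = run_pipeline(pool)

report = counts.most_common(1)[0]
print(f"Windiest city: {report[0]} ({report[1]} of {sum(counts.values())} timestamps)")
```

Because `run_pipeline` only relies on the executor's `map` method, the same function could in principle be handed an executor obtained from a Dask `Client` via `get_executor()`, which is one way to keep the code identical across local and distributed runs.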
DEMO
THE PROOF IS IN THE PUDDING
Git Repo
o Code is available at: https://github.com/pipekit/argocon-demos
• Created with the help of the Pipekit team! (https://pipekit.io/)
o Contains:
• Sample Argo workflows installation
- Workload is too heavy for Docker Desktop, Minikube, etc. - use cloud k8s
• Cluster workflow template containing Dask pipeline
• Workflow template that invokes the Dask pipeline
• Dockerized Python pipeline that schedules tasks on Dask
• Sample weather data for major cities in Spain
The demo stack
Kubernetes
Argo Workflows
Dask
Simple Data Pipeline
Kubernetes Cluster Internals - Demo
o Any “substantial” Kubernetes cluster should work
• Locally-hosted K3S, minikube, or Docker Desktop don’t work
o Cluster wide Argo-Workflows installation
• Provision a namespace for each customer
• Operators can create, update, and delete
Argo Workflows and Workflow Templates in namespaces
o Customer-specific namespaces
• Workflow templates differentiated by namespace,
configured for specific customer context (i.e. IAM secret)
• Cron workflows trigger workflow templates
• Customer-specific credentials and other secrets isolated
to namespace
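As a rough sketch of the per-customer layout above, the snippet below generates one CronWorkflow per customer namespace, each triggering a Workflow Template in that same namespace. The namespace names, the template name `data-pipeline`, the nightly schedule, and the `Forbid` concurrency policy are illustrative assumptions.

```python
import json


def cron_workflow(namespace, template_name, schedule):
    """CronWorkflow that triggers a Workflow Template in its own namespace."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": f"{template_name}-cron", "namespace": namespace},
        "spec": {
            "schedule": schedule,  # standard cron syntax
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still active
            "workflowSpec": {"workflowTemplateRef": {"name": template_name}},
        },
    }


# Hypothetical customer namespaces.
customers = ["customer-a", "customer-b"]
manifests = [cron_workflow(ns, "data-pipeline", "0 2 * * *") for ns in customers]

for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Since each manifest carries its own namespace, credentials and other secrets referenced by the Workflow Template stay isolated per customer.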
Diagram labels: Kubernetes cluster; Cluster scope; Customer namespaces; Workflow Template; CRON Wf Template; Customer-specific config; Argo server; Controller config-map; Argo controller; Cluster Wf Template; RoleBinding; ClusterRole
We’re Live
Closing remarks
o When to use Argo Workflows & Dask vs when not to:
1. Scheduled vs. “ad-hoc” computing
- Scheduled / automated workflows? Argo and Dask are great!
- Are humans in the loop? If so, a Dask-as-a-service offering may be better (e.g. Coiled)
▪ Provides for fast development iteration
▪ Empowers self-service of data engineers
2. Dask is not a silver bullet
- Some instabilities for long-running tasks (hours, yes - days, no)
- Can do batch processing, but not the main focus
- Dask is an active project. Community involvement / interest can help improve this!
o Dask Alternatives (no significant personal experience)
• Couler – Develop Argo workflows directly in Python
• Ray – Low-level parallelization engine in Python
Q & A
r/RoastMe
s.ryberg@accure.net
Github: sevberg
LinkedIn: david-severin-ryberg

More Related Content

What's hot

Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023HyunJoon Jung
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
Intro to Azure OpenAI Service L100 (Thai Ver).pdf
Intro to Azure OpenAI Service L100 (Thai Ver).pdfIntro to Azure OpenAI Service L100 (Thai Ver).pdf
Intro to Azure OpenAI Service L100 (Thai Ver).pdfKorkrid Akepanidtaworn
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Naoki (Neo) SATO
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentationIlias Okacha
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup Omid Vahdaty
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and DeltaDatabricks
 
OPA: The Cloud Native Policy Engine
OPA: The Cloud Native Policy EngineOPA: The Cloud Native Policy Engine
OPA: The Cloud Native Policy EngineTorin Sandall
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural searchDmitry Kan
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 

What's hot (20)

Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Fluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at ScaleFluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at Scale
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Intro to Azure OpenAI Service L100 (Thai Ver).pdf
Intro to Azure OpenAI Service L100 (Thai Ver).pdfIntro to Azure OpenAI Service L100 (Thai Ver).pdf
Intro to Azure OpenAI Service L100 (Thai Ver).pdf
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
OPA: The Cloud Native Policy Engine
OPA: The Cloud Native Policy EngineOPA: The Cloud Native Policy Engine
OPA: The Cloud Native Policy Engine
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 

Similar to Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows

AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerAWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerScality
 
A Unified Process for Code and Configuration in Kubernetes
A Unified Process for Code and Configuration in KubernetesA Unified Process for Code and Configuration in Kubernetes
A Unified Process for Code and Configuration in KubernetesOmerKahani
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...QAware GmbH
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupKaxil Naik
 
Distributed application usecase on docker
Distributed application usecase on dockerDistributed application usecase on docker
Distributed application usecase on dockerHiroshi Miura
 
Scalable Clusters On Demand
Scalable Clusters On DemandScalable Clusters On Demand
Scalable Clusters On DemandBogdan Kyryliuk
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkDatabricks
 
Use GitLab with Chaos Engineering to Harden your Applications + OpenEBS 1.3 ...
 Use GitLab with Chaos Engineering to Harden your Applications + OpenEBS 1.3 ... Use GitLab with Chaos Engineering to Harden your Applications + OpenEBS 1.3 ...
Use GitLab with Chaos Engineering to Harden your Applications + OpenEBS 1.3 ...MayaData Inc
 
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for UnknownsTectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for UnknownsCoreOS
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko GlobalLogic Ukraine
 
PoC Requirements and Use Cases
PoC Requirements and Use CasesPoC Requirements and Use Cases
PoC Requirements and Use Casesjennimenni
 
ODSA - PoC Requirements and Use Cases
ODSA - PoC Requirements and Use CasesODSA - PoC Requirements and Use Cases
ODSA - PoC Requirements and Use CasesODSA Workgroup
 
Container Attached Storage (CAS) with OpenEBS - SDC 2018
Container Attached Storage (CAS) with OpenEBS -  SDC 2018Container Attached Storage (CAS) with OpenEBS -  SDC 2018
Container Attached Storage (CAS) with OpenEBS - SDC 2018OpenEBS
 
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
 Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e... Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...VMware Tanzu
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsIgor Sfiligoi
 

Similar to Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows (20)

OpenDataPlane Project
OpenDataPlane ProjectOpenDataPlane Project
OpenDataPlane Project
 
56k.cloud training
56k.cloud training56k.cloud training
56k.cloud training
 
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerAWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
 
2 万林涛
2 万林涛2 万林涛
2 万林涛
 
A Unified Process for Code and Configuration in Kubernetes
A Unified Process for Code and Configuration in KubernetesA Unified Process for Code and Configuration in Kubernetes
A Unified Process for Code and Configuration in Kubernetes
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
Distributed application usecase on docker
Distributed application usecase on dockerDistributed application usecase on docker
Distributed application usecase on docker
 
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows

  • 1. Scheduled Scaling with Dask and Argo Workflows Dok Talks #111
  • 2. Severin Ryberg 20/02/2022 2 INTRODUCTION MOTIVATION SETUP DEMO Q & A Goals of this presentation: o Understand why using Argo+Dask for automated data pipeline scheduling made sense for us o Provide a rough overview of our infrastructure set-up o Describe basic Argo Workflows scaling example o Describe basic Dask data pipeline example o Showcase set-up
  • 3. INTRODUCTION WHO AM I? WHAT DO I DO? 20/02/2022 Severin Ryberg 3 INTRODUCTION MOTIVATION SETUP DEMO Q & A
  • 4. Background 20/02/2022 Severin Ryberg 4 2013 2015 Bachelors: Physics Masters: Electrical Eng. 2016 Adjunct Professor 2019 Developer PhD researcher, infra maintainer Post Doctoral Researcher, infra admin 2020 Infra Architect
  • 5. Current Status 20/02/2022 Severin Ryberg 5 Infrastructure Architect o Start-up founded in mid-2020 • Closing in on 50 employees (Hiring!) o Battery Intelligence as a Service • Basic Monitoring • State of Health • Safety alerting • Operation optimization o USP • Spin-off from renowned research group • 100% software driven • Born in the cloud o Primary tools / languages • Python • AWS • Kubernetes o Sole infra developer for a while • “Infra” team size now around 10 o Data Engineer • Onboarding new customers, Data conditioning o Developer • Develop & maintain company-wide fundamental tools. Primarily in Python. (Hint! Dask) o DevOps Engineer • Automate my job away using GitLab CI o Cloud Engineer • AWS: EKS, S3, IAM, Lambda, and all that’s in between o Kubernetes Engineer • Computation pipelines schedule & scale reliably (Hint! Argo)
  • 6. MOTIVATION WHAT PROBLEMS NEED SOLVING? 20/02/2022 Severin Ryberg 6 INTRODUCTION MOTIVATION SETUP DEMO Q & A
  • 7. Operational o Stay in Python • Aligns with developer team’s skillset • Spark flips between Python and the JVM o Need to scale generic black-box functions • Goes beyond “simple” map/reduce o Shared development experience in testing and production environments • Same code for sequential, parallel, and distributed contexts o Conduct both batch-processing and shared parallel-processing o Promote self-service for data engineers ACCURE’S REQUIREMENTS 20/02/2022 Severin Ryberg 7
  • 8. Infrastructure and Security o Ultra-low latency parallelization • Pod spin up times greatly slow down workflow runs • Need to set up pools of pods for highly parallelized operations o Multi-tenant environment • Dedicated namespace per customer • Operators can access each namespace • Workflow service account scoped to the namespace o Cost efficient computations • Elastic compute infrastructure should automatically scale up and down according to load ACCURE’S REQUIREMENTS 20/02/2022 Severin Ryberg 8 o Deployment automation and version control o High data throughput • Avoid database bottlenecks for data-in and data-out o Secure access • Only robots access production environment • Customer-specific credentials allow access to own data o Dependable scheduling o Exporting of logs to ELK o Archiving of workflow execution history
  • 9. Why not use a service? o Apache Airflow • Steep learning curve for pipeline developers • Poor Kubernetes support - Note! Prior to Airflow 2.0 • Still need to maintain the Kubernetes cluster yourself o Prefect • First option we tried • Early-stage start-up going through its own growing pains - Note! This was in Jan – March 2021 • A change in its cost model would drastically change our price point o AWS Batch, AWS Glue, AWS Data Pipeline, etc… • Batch was used when we had trouble with other solutions. It is okay, but not very flexible • We have a preference to stay cloud-agnostic as much as possible 20/02/2022 Severin Ryberg 9
  • 10. Why not use a service? o Apache Airflow • Steep learning curve for pipeline developers • Poor Kubernetes support - Note! Prior to Airflow 2.0 • Still need to maintain the Kubernetes cluster yourself o Prefect • First option we tried • Early-stage start-up going through its own growing pains - Note! This was in Jan – March 2021 • A change in its cost model would drastically change our price point o AWS Batch, AWS Glue, AWS Data Pipeline, etc… • Batch was used when we had trouble with other solutions. It is okay, but not very flexible • We have a preference to stay cloud-agnostic as much as possible 20/02/2022 Severin Ryberg 10
  • 11. SETUP KUBERNETES, ARGO, DASK, JOY! 20/02/2022 Severin Ryberg 11 INTRODUCTION MOTIVATION SETUP DEMO Q & A
  • 12. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 12
  • 13. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 13 Kubernetes
  • 14. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 14 Kubernetes K8S Autoscaler
  • 15. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 15 Kubernetes K8S Autoscaler Argo Workflows
  • 16. Argo Workflows Overview Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD. o Define workflows where each step in the workflow is a container. o Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a graph (DAG). - Argo Workflows homepage INTERMISSION 20/02/2022 Severin Ryberg 16 https://github.com/mcgrawia/argocon-21-demo
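To make the "each step is a container" and DAG ideas from the overview concrete, here is a minimal sketch that builds a two-task Argo Workflow manifest as a plain Python dictionary. The task names, the `make_dag_workflow` helper, and the container commands are illustrative assumptions and do not come from the talk. Only the field names (`entrypoint`, `templates`, `dag`, `tasks`, `dependencies`, `container`) follow the Argo Workflows manifest format.

```python
import json


def make_dag_workflow(image):
    """Build a minimal Argo Workflow manifest with two dependent DAG tasks."""

    def container_template(name, command):
        # Each workflow step runs as its own container.
        return {"name": name, "container": {"image": image, "command": command}}

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "dag-example-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    "dag": {
                        "tasks": [
                            {"name": "extract", "template": "extract"},
                            # "report" only starts after "extract" has finished.
                            {
                                "name": "report",
                                "template": "report",
                                "dependencies": ["extract"],
                            },
                        ]
                    },
                },
                container_template("extract", ["python", "-c", "print('extract')"]),
                container_template("report", ["python", "-c", "print('report')"]),
            ],
        },
    }


manifest = make_dag_workflow("python:3.8.8-buster")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and handed to the cluster, since Kubernetes tooling generally accepts JSON manifests as well as YAML.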
  • 17. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 17 Kubernetes K8S Autoscaler Argo Workflows
  • 18. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 18 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus
  • 19. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 19 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL
  • 20. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 20 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3
  • 21. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 21 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3 Pulumi GitLab
  • 22. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 22 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3 Pulumi GitLab Dask
  • 23. Dask Overview o Dask: high-throughput data pipelines in Python o Comes out of the box with • Multi-domain execution • Dask-dataframes & futures interface o Supporting resources • Dask development support from the community, and as a service • Cluster-provisioning services (e.g. Coiled) o ACCURE had to implement • Work-avoidance • Task artifacting (on S3) • Logging to ELK INTERMISSION II 20/02/2022 Severin Ryberg 23 https://docs.dask.org/en/latest/ https://docs.dask.org/en/latest/graphs.html
  • 24. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 24 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3 Pulumi GitLab Dask
  • 25. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 25 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3 Pulumi GitLab Dask Coiled
  • 26. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 26 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3 Pulumi GitLab Dask Coiled ACCURE Utilities
  • 27. Stated Requirements: o Python o Black-box generic o Shared development experience o Batch- and shared parallel-processing o Self-service o Ultra-low latency o Multi-tenant o Cost efficient o Secure o Scheduling o Logs o Archiving o Data throughput o Automated and versioned The stack 20/02/2022 Severin Ryberg 27 Kubernetes K8S Autoscaler Argo Workflows ELK Prometheus PostgreSQL S3 Pulumi GitLab Dask Coiled ACCURE Utilities ACCURE Data Pipelines
  • 28-33. Combining Dask and Argo Workflows
    Diagram components:
    o Pipeline configuration (Workflow Template)
    o Standard Dask scaler (Cluster Workflow Template)
    o Initiate workflow
    o Dask Scheduler (daemon)
    o Dask Worker Deployment (daemon)
    o Worker tear down (on exit)
    o Primary pipeline
    Annotations added on slides 29-31:
    o Image repo, pod resources (workers, scheduler, pipeline), pipeline settings
    o Image tag, worker count, pipeline arguments
    “Product-specific Dask image” (python:3.8.8-buster), added on slides 32-33:
    o Pip-install dask, distributed, bokeh, and whatever else you need
    o Activate environment (e.g. conda)
    o Ensure “dask-scheduler” and “dask-worker” are on the path
    o Inject Python script
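The lifecycle on these slides (scheduler daemon, worker Deployment, primary pipeline, tear down on exit) could be sketched roughly as follows. This is a minimal illustration, not the speaker's actual templates: it flattens everything into a single Argo Workflow manifest built as a Python dict, whereas the talk splits it into a Workflow Template and a Cluster Workflow Template. The script name `pipeline.py`, its `--scheduler` flag, the template names, and the `dask-pipeline-` prefix are all hypothetical.

```python
import json
from typing import Dict, List

SCHEDULER_PORT = 8786  # default port that dask-scheduler listens on


def build_workflow(image: str, n_workers: int, pipeline_args: List[str]) -> Dict:
    """Build an Argo Workflow manifest (as a plain dict) for one pipeline run."""
    deployment_name = "{{workflow.name}}-workers"
    # Address of the daemon scheduler step; Argo resolves the tag at run time.
    scheduler_param = {
        "name": "scheduler",
        "value": "tcp://{{steps.scheduler.ip}}:%d" % SCHEDULER_PORT,
    }
    # Kubernetes Deployment that holds the Dask workers.
    worker_deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": deployment_name},
        "spec": {
            "replicas": n_workers,
            "selector": {"matchLabels": {"app": deployment_name}},
            "template": {
                "metadata": {"labels": {"app": deployment_name}},
                "spec": {"containers": [{
                    "name": "worker",
                    "image": image,
                    "command": ["dask-worker", "{{inputs.parameters.scheduler}}"],
                }]},
            },
        },
    }
    templates = [
        # Steps run in order: scheduler, then workers, then the pipeline itself.
        {"name": "main", "steps": [
            [{"name": "scheduler", "template": "scheduler"}],
            [{"name": "workers", "template": "workers",
              "arguments": {"parameters": [scheduler_param]}}],
            [{"name": "pipeline", "template": "pipeline",
              "arguments": {"parameters": [scheduler_param]}}],
        ]},
        # daemon=True keeps the scheduler alive while the later steps run.
        {"name": "scheduler", "daemon": True,
         "container": {"image": image, "command": ["dask-scheduler"]}},
        {"name": "workers",
         "inputs": {"parameters": [{"name": "scheduler"}]},
         "resource": {"action": "create",
                      "manifest": json.dumps(worker_deployment)}},
        # pipeline.py and its --scheduler flag are hypothetical.
        {"name": "pipeline",
         "inputs": {"parameters": [{"name": "scheduler"}]},
         "container": {"image": image,
                       "command": ["python", "pipeline.py"],
                       "args": ["--scheduler", "{{inputs.parameters.scheduler}}",
                                *pipeline_args]}},
        # Exit handler: delete the worker Deployment whether the run passed or failed.
        {"name": "teardown",
         "resource": {"action": "delete", "manifest": json.dumps({
             "apiVersion": "apps/v1", "kind": "Deployment",
             "metadata": {"name": deployment_name}})}},
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "dask-pipeline-"},
        "spec": {"entrypoint": "main", "onExit": "teardown", "templates": templates},
    }
```

The three function arguments mirror the "image tag, worker count, pipeline arguments" annotation; dumping the result with `json.dumps` gives a manifest that could be submitted to a cluster, assuming the image provides `dask-scheduler` and `dask-worker` on the path.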
  • 34. Example Dask Pipeline - Today’s example
    o Time-series weather data in Spain
      • Madrid, Valencia, Barcelona, Seville, Bilbao
    o Which is the windiest city!?
    Diagram: Get all timestamps (×4) → Identify windiest city at timestamp → Count windiest city observations → Report
    Legend (“Computed in…”): pipeline, dask worker
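The stages in this diagram can be sketched in a few lines of Python. To keep the sketch runnable without a cluster, it uses the standard library's `ThreadPoolExecutor` as a stand-in for a Dask client (`dask.distributed.Client` offers a similar `submit` / `map` interface); the wind speeds are synthetic values made up for illustration, not the demo's data, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Synthetic wind speeds (m/s) per city and timestamp; illustrative values only.
WIND = {
    "Madrid":    {"2022-02-20T00": 3.1, "2022-02-20T01": 4.0, "2022-02-20T02": 2.2},
    "Valencia":  {"2022-02-20T00": 5.0, "2022-02-20T01": 3.5, "2022-02-20T02": 2.9},
    "Barcelona": {"2022-02-20T00": 4.2, "2022-02-20T01": 6.1, "2022-02-20T02": 3.0},
    "Seville":   {"2022-02-20T00": 2.0, "2022-02-20T01": 2.5, "2022-02-20T02": 1.9},
    "Bilbao":    {"2022-02-20T00": 5.3, "2022-02-20T01": 5.8, "2022-02-20T02": 7.4},
}


def load_city(city):
    """Stage 1: return (city, {timestamp: wind speed}) for one city."""
    return city, WIND[city]


def windiest_at(timestamp, series):
    """Stage 2: name of the city with the highest wind speed at one timestamp."""
    return max(series, key=lambda city: series[city][timestamp])


def run_pipeline(max_workers=4):
    """Stages 1-3: load cities in parallel, compare per timestamp, count the wins."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        series = dict(pool.map(load_city, WIND))
        # Only compare timestamps that every city reports.
        timestamps = sorted(set.intersection(*(set(s) for s in series.values())))
        futures = [pool.submit(windiest_at, ts, series) for ts in timestamps]
        return Counter(f.result() for f in futures)


if __name__ == "__main__":
    # Stage 4: report, windiest city first.
    for city, wins in run_pipeline().most_common():
        print(city, wins)
```

With these synthetic values Bilbao is windiest at two of the three timestamps and Barcelona at one.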
  • 35. DEMO: THE PROOF IS IN THE PUDDING
  • 36. Git Repo
    o Code is available at: https://github.com/pipekit/argocon-demos
      • Created with the help of the Pipekit team! (https://pipekit.io/)
    o Contains:
      • Sample Argo Workflows installation
        - Workload is too heavy for Docker Desktop, Minikube, etc.; use a cloud Kubernetes cluster
      • Cluster workflow template containing the Dask pipeline
      • Workflow template that invokes the Dask pipeline
      • Dockerized Python pipeline that schedules tasks on Dask
      • Sample weather data for major cities in Spain
  • 37. The demo stack
    Stated requirements:
    o Python
    o Black-box generic
    o Shared development experience
    o Batch- and shared parallel-processing
    o Self-service
    o Ultra-low latency
    o Multi-tenant
    o Cost efficient
    o Secure
    o Scheduling
    o Logs
    o Archiving
    o Data throughput
    o Automated and versioned
    Demo stack components: Kubernetes, Argo Workflows, Dask, Simple Data Pipeline
  • 38. Kubernetes Cluster Internals - Demo
    o Any “substantial” Kubernetes cluster should work
      • Locally hosted K3s, minikube, or Docker Desktop don’t work
    o Cluster-wide Argo Workflows installation
      • Provision a namespace for each customer
      • Operators can create, update, and delete Argo Workflows and Workflow Templates in namespaces
    o Customer-specific namespaces
      • Workflow templates are differentiated by namespace and configured for the specific customer context (e.g. IAM secret)
      • Cron workflows trigger workflow templates
      • Customer-specific credentials and other secrets are isolated to the namespace
    Diagram labels: Kubernetes cluster; Cluster scope; Customer namespaces; Workflow Template; CRON Wf Template; Customer specific config; Argo server; Controller config-map; Argo controller; Cluster Wf Template; RoleBinding; ClusterRole
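The "cron workflows trigger workflow templates" arrangement could look roughly like the sketch below, which generates one Argo CronWorkflow manifest per customer namespace. It is an illustration under assumptions, not the setup shown in the talk: the `customer-` namespace prefix, the `-cron` name suffix, and the `Forbid` concurrency policy are all choices made for this example.

```python
from typing import Dict, List


def cron_workflow(namespace: str, template: str, schedule: str) -> Dict:
    """Build a CronWorkflow manifest that runs a WorkflowTemplate in its namespace."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": template + "-cron", "namespace": namespace},
        "spec": {
            "schedule": schedule,  # standard cron expression
            # Assumption: skip a run if the previous one is still active.
            "concurrencyPolicy": "Forbid",
            # The referenced WorkflowTemplate lives in the same namespace.
            "workflowSpec": {"workflowTemplateRef": {"name": template}},
        },
    }


def cron_workflows_for(customers: List[str], template: str, schedule: str) -> List[Dict]:
    """One CronWorkflow per customer namespace (namespace naming is hypothetical)."""
    return [cron_workflow("customer-" + name, template, schedule) for name in customers]


if __name__ == "__main__":
    import json
    for manifest in cron_workflows_for(["alpha", "beta"], "daily-pipeline", "0 2 * * *"):
        print(json.dumps(manifest, indent=2))
```

Because each manifest references the template by name only, the same template name can resolve to a differently configured WorkflowTemplate in each customer namespace, which is the isolation the slide describes.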
  • 40. Closing remarks
    o When to use Argo Workflows & Dask, and when not to:
      1. Scheduled vs. “ad-hoc” computing
         - Scheduled / automated workflows? Argo and Dask are great!
         - Are humans in the loop? If so, a Dask-as-a-service offering may be better (e.g. Coiled)
           • Provides for fast development iteration
           • Empowers self-service of data engineers
      2. Dask is not a silver bullet
         - Some instabilities for long-running tasks (hours, yes; days, no)
         - Can do batch processing, but that is not its main focus
         - Dask is an active project. Community involvement / interest can help improve this!
    o Dask alternatives (no significant personal experience)
      • Couler: develop Argo workflows directly in Python
      • Ray: low-level parallelization engine in Python
  • 41. Q & A
    r/RoastMe
    o Email: s.ryberg@accure.net
    o GitHub: sevberg
    o LinkedIn: david-severin-ryberg