GKE, Helm, Terraform and Airflow Integration

© 2023 GitLab Inc.
Ved Prakash
Staff Data Engineer GitLab
Intro to GKE setup of
Airflow with helm and
Terraform

© 2023 GitLab Inc.
GitLab company
Our mission: Everyone Can Contribute

© 2023 GitLab Inc.
GitLab CREDIT values
Collaboration Results Iteration Transparency
Efficiency
inclusion & belonging
Diversity,

© 2023 GitLab Inc.
Plan
Create
Verify
Package
Secure
Deploy
Monitor
Govern
What GitLab is doing?
The DevSecOps
Platform
delivered as a
single application
to help you
iterate faster and
innovate together

© 2023 GitLab Inc.
Introduction to Google Kubernetes
Engine(GKE),Apache Airflow,Helm
and Terraform.
How all Components fit together.
Agenda
Best Practices and Considerations
Wins of Data platform Team.
Installation within Gitlab Data
platform.

© 2023 GitLab Inc.
Introduction to Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) is a
managed Kubernetes service that
simplifies containerized application
deployment, scaling, and
management on Google Cloud,
offering a robust and efficient
platform for container orchestration.
What is GKE? Key Features
● Managed Kubernetes
● Automatic Scaling
● Security and Compliance
● Integrations
Use Cases:
● Microservices Deployment
● Continuous Integration/Continuous Deployment (CI/CD)
● Scalable Applications

© 2023 GitLab Inc.
Overview of Apache Airflow
Apache Airflow is an open-source
platform designed to
programmatically author, schedule,
and monitor workflows.
What is Apache Airflow? Key Features
● Directed Acyclic Graphs (DAGs)
● Extensibility
● Dynamic Workflow Execution
● Rich UI and Logging
● Scalability
Use Cases:
● Data pipeline orchestration
● ETL (Extract, Transform, Load) processes
● Workflow automation in diverse industries

© 2023 GitLab Inc.
Terraform: Infrastructure as Code for GKE
Terraform is an open-source
Infrastructure as Code (IaC) tool that
enables users to define and provision
infrastructure using a declarative
configuration language.
What is Terraform? Key Features
● Infrastructure as Code (IaC)
● Multi-Cloud Provisioning
● Declarative Configuration
Language
● Plan and Apply Workflow
● State Management
Use Cases:
● Provisioning Servers
● Network Infrastructure
● Application Deployments

© 2023 GitLab Inc.
Helm: A Package Manager for Kubernetes
Helm is a package manager for
Kubernetes applications, simplifying
the deployment and management of
containerized applications.
What is Helm? Key Features
● Standardized Packaging
● Simplified Configuration
● Version Control and Rollbacks
● Dependency Management
Use Case
● Microservices Deployment

© 2023 GitLab Inc.
Integration of
Airflow with
GKE using Helm
and Terraform
Why Use GKE for Airflow?
Why Helm Charts for Airflow?
Terraform Modules for GKE

© 2023 GitLab Inc.
Infrastructure Provisioning with Terraform
Connecting the
Dots: GKE + Helm +
Terraform + Airflow
Terraform Sets the Foundation
Kubernetes Cluster Orchestration with GKE
GKE Ensures Seamless Operations
Package Management with Helm
Helm Charts Define Airflow Configurations:
Helm Charts Deployed on GKE
Smooth Deployment on GKE
Integrated Workflows
Orchestration with Airflow

© 2023 GitLab Inc.
GKE cluster provisioned through
Terraform
Installation
within Gitlab
Data Platform
Team
● Two Namespace (Prod and testing)
● Seven Nodepools (Different Machine type for
each type load)
● Remote state file for any changes required for
GKE cluster.
● Gitlab CI/CD pipeline to validate the Changes
done to terraform script.(This ensure the
changes will not break the terraform apply)
Airflow Installed using helm chart
● Airflow Version 2.5.3 using Helm Chart for
Apache Airflow which will bootstrap an
Airflow deployment on a Kubernetes cluster
using the Helm package manager.
● Overridden with Cloud SQL Postgres
instance.
● Git sync with the analytics repository.
● Modified Web server secret key and fernet
key

© 2023 GitLab Inc.
● 88 Active Airflow Dags
● 1200+ Task run every 24 hours
● Empowering Workflows:Task Dynamism with Airflow
● K8s Pods operator to schedule dynamic workload
● Cost effective solution
● On-Demand Node Provisioning with Terraform
● Minimal downtime , typically under 45 min, in the event of a
disaster recovery scenario.
How this benefits the data platform team
managing data pipeline?

© 2023 GitLab Inc.
Security Best Practices
● Private Cluster Configuration
● VPC Peering
● Identity and Access Management (IAM) Controls
● Node Pool Isolation
● Securing Secrets
Best Practices
and
Considerations Scalability Considerations
● Horizontal Pod Autoscaling (HPA)
● Database Scaling
● Task Parallelism
● Resource Requests and Limits
● Persistent Storage Considerations
● GKE Node Pools
Monitoring and Logging Strategies
● Leverage K8s-native monitoring solutions
Prometheus and Grafana
● Alerting and Notification Channels.
● Airflow Metrics.

© 2023 GitLab Inc.
Additional Resources
● Gitlab Handbook for information about nodepool and
namespace.
● Airflow Infrastructure
● Gitlab Data Analytics or our Dag bags

GKE, Helm, Terraform and Airflow Integration

Recommended

Recommended

More Related Content

Similar to GKE, Helm, Terraform and Airflow Integration

Similar to GKE, Helm, Terraform and Airflow Integration (20)

More from DataScienceConferenc1

More from DataScienceConferenc1 (20)

Recently uploaded

Recently uploaded (20)

GKE, Helm, Terraform and Airflow Integration

Editor's Notes