Applying DevOps to Databricks can be a daunting task. In this talk, it will be broken down into bite-size chunks. Common DevOps subject areas will be covered, including CI/CD (Continuous Integration/Continuous Deployment), IAC (Infrastructure as Code) and Build Agents.
We will explore how to apply DevOps to Databricks (in Azure), primarily using Azure DevOps tooling. As many Spark/Databricks users are Python users, we will focus on using the Databricks REST API (from Python) to perform our tasks.
2. Agenda
§ What is DevOps
§ CI/CD (Continuous Integration/Continuous Deployment)
§ IAC (Infrastructure as Code)
§ Build Agents
§ Databricks REST API
§ Real World Example
§ Other Tooling Examples
4. DevOps
• BI Developer: “I want to get my dashboard published on the website”
• Data Scientist: “I want to productionize my models and have them automatically update”
• Software Engineer: “I want to update the website with the latest dashboard”
• Data Engineer: “I want to push the latest ETL pipelines to production”
13. What is a Build Agent?
• It is the compute under your DevOps Pipelines
• Agents available “out of the box” in Azure DevOps
• Custom VM Agent
• Custom Docker Container Agent
Pipeline (yml) triggered → Agent found/located → Pipeline executed on Agent
14. Why a Custom Build Agent?
• You decide exactly what your code runs on (which Linux/Windows version or Docker image)
• Make sure all the tools you need are installed on your Agent
• Keep state between pipeline runs
• Run within a VNet (Virtual Network)
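One way to act on "make sure all the tools you need are installed" is a small pre-flight check run as the first pipeline step on the agent. This is a minimal sketch: the tool list is hypothetical and should match whatever your own scripts call.

```python
import shutil

# Hypothetical set of tools a Databricks deployment pipeline might rely on;
# adjust to whatever your own scripts actually call.
REQUIRED_TOOLS = ("python3", "git", "az")


def missing_tools(tools):
    """Return the tools that cannot be found on this agent's PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_agent(tools=REQUIRED_TOOLS):
    """Print a report and return a process exit code (0 = all tools present)."""
    missing = missing_tools(tools)
    if missing:
        print("Missing tools on build agent: " + ", ".join(missing))
        return 1
    print("All required tools are installed.")
    return 0
```

Calling `sys.exit(check_agent())` from a script step makes the step fail with a non-zero exit code when something is missing, so a misconfigured custom agent is caught before any deployment work starts.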
16. Why the Databricks REST API?
• Reuse your existing knowledge of REST
• Call it from our language of choice (Python)
• Cross-platform
https://docs.databricks.com/dev-tools/api/latest/index.html
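As a minimal sketch of calling the API from Python, the helper below uses only the standard library (`urllib`), so it runs on any agent that has Python installed. It assumes authentication with a personal access token sent as a Bearer header. The workspace URL is a placeholder, and "POST when there is a JSON body, otherwise GET" is a simplification that suits the endpoints used in this talk rather than a general rule of the API.

```python
import json
import urllib.request


def build_request(host, token, endpoint, payload=None):
    """Build (but do not send) a Databricks REST API request.

    host:     workspace URL (placeholder, e.g. "https://<your-workspace>")
    token:    personal access token, sent as a Bearer token
    endpoint: API path such as "/api/2.0/clusters/list"
    payload:  optional dict; when given, the request is sent as a JSON POST
    """
    url = host.rstrip("/") + endpoint
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from `call` means the request can be inspected (or unit tested) without a live workspace.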
18. What are we going to do?
• Use Python Scripts and the Databricks REST API to:
  ◦ Create a Databricks Cluster
  ◦ Check Cluster Status
  ◦ Upload Notebooks to Databricks Workspace
  ◦ Run some tests against our Python code
  ◦ Build and upload a Python Wheel to Databricks
  ◦ Install/uninstall/update Python Wheel in Databricks
• Use Azure DevOps to run our scripts
  ◦ YML Pipelines
  ◦ Custom DevOps Agent
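The request bodies for the cluster, notebook and wheel steps above can be sketched as plain dictionaries, to be sent with whatever HTTP helper the pipeline uses. Endpoint paths and field names follow the Databricks REST API 2.0 reference; the function names, the 30-minute auto-termination value and the example paths are hypothetical. `wait_for_cluster` takes a callable returning the cluster's current `state` (as reported by `GET /api/2.0/clusters/get`), which keeps the polling logic testable without a live workspace. Building the wheel itself (for example with `python setup.py bdist_wheel`) is not shown.

```python
import base64
import time

# Endpoint paths from the Databricks REST API 2.0 reference.
CLUSTERS_CREATE = "/api/2.0/clusters/create"
WORKSPACE_IMPORT = "/api/2.0/workspace/import"
DBFS_PUT = "/api/2.0/dbfs/put"
LIBRARIES_INSTALL = "/api/2.0/libraries/install"
LIBRARIES_UNINSTALL = "/api/2.0/libraries/uninstall"


def create_cluster_payload(name, spark_version, node_type_id, num_workers=1):
    # spark_version and node_type_id are workspace-specific; valid values can be listed
    # via /api/2.0/clusters/spark-versions and /api/2.0/clusters/list-node-types.
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "autotermination_minutes": 30,  # hypothetical value: stop idle clusters to save cost
    }


def wait_for_cluster(get_state, timeout_s=1200, poll_s=30):
    """Poll get_state() until the cluster is RUNNING; fail fast on a terminal state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state == "RUNNING":
            return state
        if state in ("TERMINATED", "ERROR"):
            raise RuntimeError(f"Cluster entered state {state}")
        time.sleep(poll_s)
    raise TimeoutError("Cluster did not reach RUNNING in time")


def notebook_import_payload(workspace_path, source_code):
    # The workspace import endpoint expects base64-encoded content.
    return {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }


def wheel_upload_payload(dbfs_path, wheel_bytes):
    # dbfs/put takes an absolute DBFS path such as "/FileStore/wheels/x.whl".
    # Inline contents are limited to 1 MB; larger files need the streaming
    # dbfs/create, dbfs/add-block and dbfs/close endpoints.
    return {
        "path": dbfs_path,
        "contents": base64.b64encode(wheel_bytes).decode("ascii"),
        "overwrite": True,
    }


def wheel_library_payload(cluster_id, dbfs_uri):
    # Same body shape for libraries/install and libraries/uninstall;
    # here the wheel is referenced by URI, e.g. "dbfs:/FileStore/wheels/x.whl".
    return {"cluster_id": cluster_id, "libraries": [{"whl": dbfs_uri}]}
```

To update a wheel, one approach is to uninstall the old version, install the new one and restart the cluster, since library uninstalls only take effect after a restart.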
31. What is Terraform?
Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files.
Write → Plan → Apply
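Terraform also accepts configuration in JSON syntax (`*.tf.json` files), which makes the "Write" step easy to script from Python. This is a minimal sketch, assuming the `azurerm` provider, an existing resource group and hypothetical names; provider authentication (for example via an Azure CLI login) is not shown. Running `terraform init`, `terraform plan` and `terraform apply` in the same directory then covers Plan and Apply.

```python
import json


def databricks_workspace_config(name, resource_group, location, sku="standard"):
    """Return a Terraform JSON-syntax config declaring an Azure Databricks workspace."""
    return {
        # The azurerm provider requires a (possibly empty) features block.
        "provider": {"azurerm": {"features": {}}},
        "resource": {
            "azurerm_databricks_workspace": {
                "this": {
                    "name": name,
                    "resource_group_name": resource_group,  # assumed to exist already
                    "location": location,
                    "sku": sku,  # "standard", "premium" or "trial"
                }
            }
        },
    }


def write_config(config, path="main.tf.json"):
    """Write the declarative config where Terraform will pick it up."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

Because the output is still a declarative file, `terraform plan` can show the difference between it and what is deployed before anything changes.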
34. What is Pulumi?
Cloud Engineering for Everyone
Build, deploy, and manage modern cloud applications and infrastructure using familiar languages, tools, and engineering practices.
https://github.com/pulumi/pulumi-azure
https://www.pulumi.com/docs/reference/pkg/azure/databricks/
36. What is Bicep?
• Project Bicep – Next Generation ARM Templates
• ARM Templates can get complex
• Bicep is a cleaner, more readable language that compiles into ARM Templates for deployment (a language layered over ARM)
Write and compile Bicep → ARM Templates → Apply via Azure Resource Manager → Deployed Solution
38. Summary
• DevOps is for Everyone
• CI/CD keeps your code in check and gets the latest features/changes into production as soon as possible
• IAC is the blueprint of your solution
• Lots of tooling options
• The Databricks REST API can be used in conjunction with Python and Azure DevOps to create effective, fault-tolerant pipelines