DevOps for Databricks

Applying DevOps to Databricks can be a daunting task. This talk breaks it down into bite-sized chunks, covering common DevOps subject areas including CI/CD (Continuous Integration/Continuous Deployment), IAC (Infrastructure as Code) and Build Agents.

We will explore how to apply DevOps to Databricks (in Azure), primarily using Azure DevOps tooling. As a lot of Spark/Databricks users are Python users, we will focus on the Databricks REST API (using Python) to perform our tasks.
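
To make the "REST API from Python" idea concrete, here is a minimal sketch of creating a cluster and checking its status. It is not the code from the talk's live demos: it uses only the standard library (`urllib`), and the helper names (`build_request`, `create_cluster_request`, `cluster_status_request`, `send`, `wait_until_running`), the 30-minute auto-termination value, and the polling defaults are all assumptions chosen for illustration. The endpoints `/api/2.0/clusters/create` and `/api/2.0/clusters/get` and the bearer-token header follow the public Databricks REST API documentation.

```python
import json
import time
import urllib.parse
import urllib.request


def build_request(host, token, method, path, body=None):
    """Build (but do not send) an authenticated Databricks REST API request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
    )


def create_cluster_request(host, token, name, spark_version, node_type_id,
                           num_workers=1):
    """Request for POST /api/2.0/clusters/create; all values are caller-supplied."""
    body = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "autotermination_minutes": 30,  # illustrative default, an assumption
    }
    return build_request(host, token, "POST", "/api/2.0/clusters/create", body)


def cluster_status_request(host, token, cluster_id):
    """Request for GET /api/2.0/clusters/get; the response has a 'state' field."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    return build_request(host, token, "GET", f"/api/2.0/clusters/get?{query}")


def send(req):
    """Send a prepared request and decode the JSON response (network call)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_running(get_state, timeout_s=600, poll_s=15, sleep=time.sleep):
    """Poll get_state() until it returns RUNNING.

    Raises RuntimeError if the cluster stops or fails, TimeoutError otherwise.
    """
    waited = 0
    while waited <= timeout_s:
        state = get_state()
        if state == "RUNNING":
            return
        if state in ("TERMINATING", "TERMINATED", "ERROR", "UNKNOWN"):
            raise RuntimeError(f"cluster entered state {state}")
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"cluster not RUNNING after {timeout_s}s")
```

In a pipeline, the host and token would typically come from secret pipeline variables rather than source code. A status check could then be wired up as `wait_until_running(lambda: send(cluster_status_request(host, token, cluster_id))["state"])`. Keeping request construction separate from `send` means the payloads can be unit tested without a workspace.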

DevOps for Databricks

  1. DevOps for Databricks. Anna-Maria Wykes, Data Engineering Consultant
  2. Agenda • What is DevOps • CI/CD (Continuous Integration/Continuous Deployment) • IAC (Infrastructure as Code) • Build Agents • Databricks REST API • Real World Example • Other Tooling Examples
  3. What is DevOps?
  4. DevOps • BI Developer: “I want to get my dashboard published on the website” • Data Scientist: “I want to productionize my models and have them automatically update” • Software Engineer: “I want to update the website with the latest dashboard” • Data Engineer: “I want to push the latest ETL pipelines to production”
  5. DevOps
  6. DevOps Pipelines • Development • Test • Production
  7. DevOps Tools • Continuous Integration/Continuous Deployment (CI/CD) • Infrastructure as Code (IAC) • ARM (Azure Resource Manager) Templates • Azure Bicep
  8. Continuous Integration & Continuous Deployment (CI/CD)
  9. CI/CD • Continuous improvements • Feature releases • Fast bug fixes • Ability to quickly roll back • Testing • Unit/Integration/End-to-End testing • Linting (check code is formatted correctly)
  10. Infrastructure as Code (IAC)
  11. IAC: The Blueprint of your Solution
  12. Build Agents
  13. What is a Build Agent? • It is the compute under your DevOps Pipelines • “Out of the box” Agents available in Azure DevOps • Custom VM Agent • Custom Docker Container Agent • Flow: Pipeline (yml) triggered → Agent found/located → Pipeline executed on Agent
  14. Why a Custom Build Agent? • You can decide specifically what you want your code to run on (which Linux/Windows version or Docker image) • Make sure all the tools you need are installed on your Agent • Keep state • Run within a VNet (Virtual Network)
  15. Databricks REST API
  16. Why the Databricks REST API? • Can use your existing knowledge of REST • Can incorporate into our language of choice (Python) • Cross-platform • https://docs.databricks.com/dev-tools/api/latest/index.html
  17. Real World Example
  18. What are we going to do? • Use Python scripts and the Databricks REST API to: • Create a Databricks Cluster • Check Cluster Status • Upload Notebooks to Databricks Workspace • Run some tests against our Python code • Build and upload a Python Wheel to Databricks • Install/uninstall/update Python Wheel in Databricks • Use Azure DevOps to run our scripts • YML Pipelines • Custom DevOps Agent
  19. Create a Databricks Cluster • Live demo using VSCode and Databricks in Azure
  20. Check Cluster Status • Live demo using VSCode and Databricks in Azure
  21. Upload Notebooks to Databricks Workspace • Live demo using VSCode and Databricks in Azure
  22. Run some tests against our Python code • Live demo using VSCode and Databricks in Azure
  23. Build and upload a Python Wheel to Databricks • Live demo using VSCode and Databricks in Azure
  24. Install/uninstall/update Python Wheel in Databricks • Live demo using VSCode and Databricks in Azure
  25. Introduction to Azure DevOps • How to Create a Pipeline • How to Run a Pipeline • How to use a Custom Agent
  26. How to Create and Run a DevOps Pipeline • Live demo using Azure DevOps
  27. Adding our Python Scripts to a Pipeline • Live demo using Azure DevOps and Databricks in Azure
  28. Using a Custom DevOps Agent • Live demo using Azure DevOps and Databricks in Azure
  29. Examples of other DevOps IAC tools
  30. Other IAC Tools • Azure ARM templates • Azure Bicep
  31. What is Terraform? Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files. Workflow: Write, Plan, Apply
  32. Terraform for Databricks
  33. Terraform for Databricks • https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs
  34. What is Pulumi? Cloud Engineering for Everyone. Build, deploy, and manage modern cloud applications and infrastructure using familiar languages, tools, and engineering practices. • https://github.com/pulumi/pulumi-azure • https://www.pulumi.com/docs/reference/pkg/azure/databricks/
  35. Pulumi azure.databricks Module • Based on the azurerm Terraform Provider • Creating a Workspace Resource using Pulumi
  36. What is Bicep? • Project Bicep: next-generation ARM Templates • ARM Templates can get complex • Bicep is a cleaner, more readable language that gets compiled into ARM to deploy (a language around ARM) • Diagram labels: Write and Compile, Bicep Language, ARM Templates, Apply, Azure Resource Manager, Deployed Solution
  37. Summary
  38. Summary • DevOps is for Everyone • CI/CD keeps your code in check and gets the latest features/changes into production as soon as possible • IAC is the blueprint of your solution • Lots of tooling options • The Databricks REST API can be used in conjunction with Python and Azure DevOps to create effective, fault-tolerant pipelines
  39. Feedback • Your feedback is important to us. Don’t forget to rate and review the sessions.
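
The notebook upload and wheel install/uninstall steps listed on slide 18 were shown as live demos, so the transcript contains no code for them. As a hedged illustration only, the sketch below builds the JSON bodies those steps would send. The function names, workspace path, and DBFS wheel path are hypothetical; the field names follow the public Databricks REST API 2.0 documentation for `POST /api/2.0/workspace/import`, `POST /api/2.0/libraries/install`, and `POST /api/2.0/libraries/uninstall`.

```python
import base64


def notebook_import_payload(workspace_path, source_code, overwrite=True):
    """Body for POST /api/2.0/workspace/import, uploading a Python notebook.

    The API expects the notebook source as a base64-encoded string.
    """
    encoded = base64.b64encode(source_code.encode("utf-8")).decode("ascii")
    return {
        "path": workspace_path,   # e.g. "/Shared/demo" (hypothetical path)
        "format": "SOURCE",
        "language": "PYTHON",
        "content": encoded,
        "overwrite": overwrite,
    }


def wheel_library_payload(cluster_id, wheel_dbfs_path):
    """Body shared by POST /api/2.0/libraries/install and .../uninstall.

    The wheel must already have been uploaded to the given DBFS path.
    """
    return {
        "cluster_id": cluster_id,
        "libraries": [{"whl": wheel_dbfs_path}],
    }
```

Each dictionary would be serialized with `json.dumps` and posted with a bearer token to the workspace URL. Updating a wheel is usually an uninstall followed by an install of the new version; as far as the Libraries API documentation describes it, an uninstall only takes effect after the cluster restarts, so a pipeline may need a restart step between the two calls.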