Bring your own container
Using Docker images in production
Harin Sanghirun & Max Cantor
Machine Learning Engineering at Condé Nast
Introduction
Condé Nast
• Global leader in media featuring many iconic brands:
• The New Yorker, WIRED, Vanity Fair, Epicurious, and many more.
• In addition to content production, Condé Nast also invests in companion
software products to enhance our audience experiences.
• Examples: Content Recommendations, Audience Segmentation
Spire
Audience Segmentation
• Spire is a platform for user segmentation and targeted advertising
• Analyzes over one hundred million users
• See our Spark+AI Summit 2020 talk for more background on Spire
Presentation Overview
• 1000-Foot View of Spire’s Architecture
• What Spire looked like before Docker
• How Docker helped streamline Spire’s deployment
• Containerization of Spire on Databricks
• Walkthrough of Spire’s containerization strategy in production
• Learning from Experience: Pros & Cons
• What we’ve learned from containerizing Spire on Databricks
How Docker streamlined Spire’s deployment
The 1000-Foot View
Ad Targeting
Content Recommendations
Condé Nast’s Platform
Dataset source(s) (1st, 2nd, and 3rd party)
Components of Spire
▪ Model interface standardization
▪ Model serialization
▪ Model versioning & tracking
▪ Hyperparameter tuning
• Database Management
• Scheduling and Orchestration
• Spire Core Library
• Kalos
• Utility Methods
• Data Science Common Library
Data Science Common
spire-2.0.0-py3-none-any.whl
spire-3.0.0-py3-none-any.whl
spire-3.1.0-py3-none-any.whl
kalos-1.0.0-py3-none-any.whl
kalos-1.2.0-py3-none-any.whl
kalos-1.2.1-py3-none-any.whl
datasci-1.0.0-py3-none-any.whl
CI/CD
CI/CD
CI/CD
Without Container
DBFS
Data Science Common
ghcr.io/condenast/spire:3.0.0
ghcr.io/condenast/spire:3.1.0
ghcr.io/condenast/spire:3-stable
Dockerfile
CI/CD
With Container
Recap
• Pre-packaged all dependencies
• Each image represents a tested combination of packages
• Linked to a specific Databricks Runtime and Spire
• Fewer pipelines to manage
• Explicit and upfront control over dependency versions
Custom Containers on Databricks, 101
The Basics
• Step 1 - Choosing a base image
• Step 2 - Adding your dependency
• Step 3 - Push to a Docker Registry
• Step 4 - Launching a cluster
Step 1 - Choosing a Base Image
See their content at:
https://github.com/databricks/containers
• Available Images:
• Standard
• Minimal
• Python
• R
• DBFS FUSE
• SSH
• GPU
Step 2 - Adding Dependencies and Building
• Standard Docker workflow:
• “Do what you want”
• Things to watch out for:
• Make sure package versions match those listed for your target Databricks Runtime version
# select base image
FROM databricksruntime/standard:latest
# install pip libraries
RUN pip install pandas urllib3
# install binaries (refresh the package index first; -y skips the prompt)
RUN apt-get update && apt-get install -y git
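One way to act on the version-matching advice above is to compare the packages pinned in the image against the versions shipped with the target Databricks Runtime before building. This is only a sketch: the helper name `find_version_mismatches` and the version numbers in the example dictionaries are hypothetical placeholders, not values taken from the Databricks release notes.

```python
def find_version_mismatches(image_pins, runtime_versions):
    """Return {package: (pinned, runtime)} for packages whose versions differ."""
    mismatches = {}
    for package, pinned in image_pins.items():
        runtime = runtime_versions.get(package)
        # Packages that the runtime list does not mention are not checked here.
        if runtime is not None and runtime != pinned:
            mismatches[package] = (pinned, runtime)
    return mismatches


# Hypothetical versions, for illustration only.
image_pins = {"pandas": "1.0.1", "urllib3": "1.25.8"}
runtime_versions = {"pandas": "1.0.1", "urllib3": "1.25.7"}

print(find_version_mismatches(image_pins, runtime_versions))
```

Running a check like this in CI, before the image is built, would surface a drifted pin earlier than a failed cluster start.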
Step 3 - Pushing to a Docker Registry
● The recommended way:
○ AWS ECR
○ Azure Container Registry
● Also supports any registry with basic authentication
● What we do:
○ GitHub Container Registry + Basic Authentication
Step 4 - Launching Your Cluster
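As a rough illustration of this step, the sketch below builds the kind of cluster specification that points Databricks at a custom image. The `docker_image` block with `url` and `basic_auth` follows the Databricks Container Services documentation as we understand it, so check the current Clusters API reference before relying on it. The cluster name, the environment variable names, the node type, and the runtime string are example values, not the settings used for Spire.

```python
import os


def build_cluster_spec(image_url, spark_version, node_type_id, num_workers):
    """Build a cluster spec dict that asks Databricks to start from a custom image."""
    return {
        "cluster_name": "spire-container-cluster",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "docker_image": {
            "url": image_url,
            # Read registry credentials from the environment rather than
            # hard-coding them (variable names are hypothetical).
            "basic_auth": {
                "username": os.environ.get("REGISTRY_USERNAME", ""),
                "password": os.environ.get("REGISTRY_PASSWORD", ""),
            },
        },
    }


spec = build_cluster_spec(
    "ghcr.io/condenast/spire:3.1.0",  # image tag shown earlier in the deck
    "7.3.x-scala2.12",                # example runtime version string
    "i3.xlarge",                      # example node type
    2,
)
print(spec["docker_image"]["url"])
```

The resulting dictionary could be serialized with `json.dumps` and sent to the cluster-creation endpoint, or the same fields can be entered in the cluster UI.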
Containerization of Spire on Databricks
Containerization of Spire on Databricks: Docker Image
Goal: Produce a Docker image w/ DBR + Spire
• The image is built from databricks-minimal
• Ubuntu 18.04
• Custom DBR 7.x functionality
• Spire package and sub-package install
• Dependency management
Containerization of Spire on Databricks: Hosting
Goal: Host image on GitHub Packages (ghcr.io)
• Manage production and development packages
• Can host multiple images per package
• Image Tagging:
• GitHub release tag
• Spire package version (version.py)
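The tagging scheme above can be sketched as a small helper that reads the package version and derives image references from it. Two assumptions are made here: that `version.py` contains a line of the form `__version__ = "3.1.0"`, and that the `3-stable` tag shown earlier tracks the major version. Neither is confirmed by the slides, and the function names are hypothetical.

```python
import re


def read_version(version_py_text):
    """Extract the version string from the text of a version.py file."""
    match = re.search(r"__version__\s*=\s*['\"]([^'\"]+)['\"]", version_py_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def image_references(repository, version):
    """Return an exact-version reference and a major-version 'stable' reference."""
    major = version.split(".")[0]
    return [f"{repository}:{version}", f"{repository}:{major}-stable"]


version = read_version('__version__ = "3.1.0"\n')
print(image_references("ghcr.io/condenast/spire", version))
```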
Containerization of Spire on Databricks: CI/CD
Goal: Integrate image creation with CI/CD pipeline
• GitHub Actions CI/CD Integration
• ubuntu-latest and macOS-10.15
• Automated pytest testing on push
• Automated build and deployment of Docker image on release tag
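The last bullet, building and deploying the image on a release tag, might look roughly like the following when expressed as the commands a release job runs. This is a sketch under assumptions: release tags are assumed to look like `v3.1.0`, and the helper name `release_commands` is hypothetical. Only the standard `docker build --tag` and `docker push` commands are used.

```python
import shlex


def release_commands(repository, release_tag, context="."):
    """Return the docker commands a release job would run, as argument lists."""
    # Assumed convention: strip a leading "v" from the release tag.
    version = release_tag[1:] if release_tag.startswith("v") else release_tag
    reference = f"{repository}:{version}"
    return [
        ["docker", "build", "--tag", reference, context],
        ["docker", "push", reference],
    ]


for command in release_commands("ghcr.io/condenast/spire", "v3.1.0"):
    print(shlex.join(command))
```

In a real pipeline each argument list could be passed to `subprocess.run(command, check=True)` after logging in to the registry.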
What We Learned: Pros & Cons
What We Learned: Pro 1
• Library customization
• Automatic and simplified control of the Spire module
• Integration of module dependencies
What We Learned: Pro 2
• Fluid integration with existing deployment pipeline
• GitHub Actions CI/CD
• Test database integration
• Pytest integration
• Multiple OS support (Ubuntu and macOS)
• Release tagging
• Databricks Jobs
What We Learned: Pro 3
• Ease of debugging
• Open a shell in the container via docker run -it <package> bash
• pdb.set_trace() support
• Test database Docker volume for pytest integration testing
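To illustrate the debugging workflow, here is a minimal sketch of a breakpoint that only fires when explicitly enabled, so the same code can run unattended in a job and interactively inside a `docker run -it` session. The environment variable `SPIRE_DEBUG` and both function names are hypothetical.

```python
import os
import pdb


def debug_checkpoint():
    """Drop into pdb only when SPIRE_DEBUG=1 is set (hypothetical variable)."""
    if os.environ.get("SPIRE_DEBUG") == "1":
        pdb.set_trace()


def score(values):
    """Toy computation standing in for real model code."""
    debug_checkpoint()  # pauses here in an interactive container session
    return sum(values) / len(values)


print(score([1, 2, 3]))
```

Because pdb needs a terminal, the breakpoint is useful only when the container was started with `-it`.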
What We Learned: Con 1
• DBR Version Incompatibility
• DBR 8.x not currently supported, only DBR 6.x and 7.x
• Need to create custom base image
What We Learned: Con 2
• Pip Package Management
• Must match target runtime specifications
What We Learned: Con 3
• Large image size
• Base images are large compared to Docker images used for deployment
$ docker images
REPOSITORY                   TAG      IMAGE ID       CREATED         SIZE
databricksruntime/standard   latest   fd383efa2fcc   19 months ago   1.84GB
What We Learned: Con 4
• Requires prior Docker experience
• Local storage requirements for local builds grow quickly
• Need to continually prune these resources
• Deployment pipeline is slower
• More data is uploaded
Delete all Docker images
docker rmi $(docker images -q)
Delete all exited Docker containers
docker rm $(docker ps --filter status=exited -q)
Pruning
docker image prune -f
docker container prune -f
docker builder prune -f
What We Learned: Con 5
● Credentials for Basic Auth are stored as plaintext
What We Learned: Con 6
● Each usage requires a pull of the container
○ Image pulls grow quickly
Summary
● What Databricks’ Containers offer:
● Streamlining of dependency management
● Streamlining of CI/CD
● More predictable runtime behavior
● Things to consider:
● Need careful synchronization of dependencies
○ Between your image and your target runtime
● Security concern for Basic Auth
● Additional overhead of Docker
● Each usage requires a pull of the container
THE END

More Related Content

What's hot

Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
confluent
 
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Janusz Nowak
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
Amr Alaa Yassen
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
 
How to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized OptimizationsHow to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized Optimizations
Databricks
 
Welcome to Azure Devops
Welcome to Azure DevopsWelcome to Azure Devops
Welcome to Azure Devops
Alessandro Scardova
 
Neo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesNeo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesPeter Neubauer
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
Spark Summit
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
Jonas Bonér
 
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyA Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
HostedbyConfluent
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
AWS CDK in Practice
AWS CDK in PracticeAWS CDK in Practice
AWS CDK in Practice
Chulwoo Choi
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Confluent Tech Talk Korea
Confluent Tech Talk KoreaConfluent Tech Talk Korea
Confluent Tech Talk Korea
confluent
 
DevOps-as-a-Service: Towards Automating the Automation
DevOps-as-a-Service: Towards Automating the AutomationDevOps-as-a-Service: Towards Automating the Automation
DevOps-as-a-Service: Towards Automating the Automation
Keith Pleas
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
confluent
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOps
Jeff Bramwell
 

What's hot (20)

Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
How to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized OptimizationsHow to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized Optimizations
 
Welcome to Azure Devops
Welcome to Azure DevopsWelcome to Azure Devops
Welcome to Azure Devops
 
Neo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesNeo4j - 5 cool graph examples
Neo4j - 5 cool graph examples
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyA Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
AWS CDK in Practice
AWS CDK in PracticeAWS CDK in Practice
AWS CDK in Practice
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Confluent Tech Talk Korea
Confluent Tech Talk KoreaConfluent Tech Talk Korea
Confluent Tech Talk Korea
 
DevOps-as-a-Service: Towards Automating the Automation
DevOps-as-a-Service: Towards Automating the AutomationDevOps-as-a-Service: Towards Automating the Automation
DevOps-as-a-Service: Towards Automating the Automation
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOps
 

Similar to Bring Your Own Container: Using Docker Images In Production

Preparing your dockerised application for production deployment
Preparing your dockerised application for production deploymentPreparing your dockerised application for production deployment
Preparing your dockerised application for production deployment
Dave Ward
 
Docker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and toolsDocker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and tools
Ramit Surana
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
BlueData, Inc.
 
A new model for Docker image distribution
A new model for Docker image distributionA new model for Docker image distribution
A new model for Docker image distributionDocker, Inc.
 
DockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker imageDockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker imageDocker, Inc.
 
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and PipelinesDocker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Matt Bentley
 
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker ImageDockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
Docker, Inc.
 
Build pipelines with bitbucket for Magento
Build pipelines with bitbucket for MagentoBuild pipelines with bitbucket for Magento
Build pipelines with bitbucket for Magento
Rrap Software Pvt Ltd
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
Araf Karsh Hamid
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to dockerJohn Willis
 
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Indrajit Poddar
 
Containers 101
Containers 101Containers 101
Containers 101
Black Duck by Synopsys
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
Derek Jacoby
 
DockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image DistributionDockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image Distribution
Docker, Inc.
 
Docker Registry V2
Docker Registry V2Docker Registry V2
Docker Registry V2
Docker, Inc.
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
Mandi Walls
 
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on OpenstackScaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
Bobby DeVeaux, DevOps Consultant
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
Puneet Kumar Bhatia (MBA, ITIL V3 Certified)
 
Getting Started with Docker
Getting Started with DockerGetting Started with Docker
Getting Started with Docker
Geeta Vinnakota
 

Similar to Bring Your Own Container: Using Docker Images In Production (20)

Preparing your dockerised application for production deployment
Preparing your dockerised application for production deploymentPreparing your dockerised application for production deployment
Preparing your dockerised application for production deployment
 
Docker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and toolsDocker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and tools
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
A new model for Docker image distribution
A new model for Docker image distributionA new model for Docker image distribution
A new model for Docker image distribution
 
DockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker imageDockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker image
 
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and PipelinesDocker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
 
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker ImageDockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
 
Build pipelines with bitbucket for Magento
Build pipelines with bitbucket for MagentoBuild pipelines with bitbucket for Magento
Build pipelines with bitbucket for Magento
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to docker
 
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
 
Containers 101
Containers 101Containers 101
Containers 101
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
DockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image DistributionDockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image Distribution
 
Docker Registry V2
Docker Registry V2Docker Registry V2
Docker Registry V2
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
 
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on OpenstackScaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Getting Started with Docker
Getting Started with DockerGetting Started with Docker
Getting Started with Docker
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Bring Your Own Container: Using Docker Images In Production

  • 1. Bring your own container Using Docker images in production Harin Sanghirun & Max Cantor Machine Learning Engineering at Condé Nast
  • 3. Condé Nast • Global leader in media featuring many iconic brands: • The New Yorker, WIRED, Vanity Fair, Epicurious, and many more. • In addition to content production, Condé Nast also invests in companion software products to enhance our audience experiences. • Examples: Content Recommendations, Audience Segmentation
  • 4. Spire Audience Segmentation • Spire is a platform for user segmentation and targeted advertising • Analyzes over one-hundred million users • See our Spark+AI Summit 2020 talk for more background on Spire
  • 5. Presentation Overview • 1000-Foot View of Spire’s Architecture • What Spire looked like before Docker • How Docker helped streamline Spire’s deployment • Containerization of Spire on Databricks • Walkthrough of Spire’s containerization strategy in production • Learning from Experience: Pros & Cons • What we’ve learned from containerizing Spire on Databricks
  • 6. How Docker streamlined Spire’s deployment
  • 7. The 1000-Foot View Ad Targeting Content Recommendations Condé Nast’s Platform Dataset source(s) (1st, 2nd, and 3rd party)
  • 8. Components of Spire ▪ Model interface standardization ▪ Model serialization ▪ Model versioning & tracking ▪ Hyperparameter tuning • Database Management • Scheduling and Orchestration • Spire Core Library • Kalos • Utility Methods • Data Science Common Library
  • 11. Recap • Pre-packaged all dependencies • Each image represents a tested combination of packages • Linked to a specific Databricks Runtime and Spire • Fewer pipelines to manage • Explicit and upfront control over dependency versions
  • 12. Custom Containers on Databricks, 101
  • 13. The Basics • Step 1 - Choosing a base image • Step 2 - Adding your dependency • Step 3 - Push to a Docker Registry • Step 4 - Launching a cluster
  • 14. Step 1 - Choosing a Base Image See their content at: https://github.com/databricks/containers • Available Images: • Standard • Minimal • Python • R • DBFS FUSE • SSH • GPU
  • 15. Step 2 - Adding Dependencies and Building • Standard Docker workflow: • “Do what you want” • Things to watch out for: • Make sure package versions match those listed for your target Databricks Runtime version
        # select base image
        FROM databricksruntime/standard:latest
        # install pip libraries
        RUN pip install pandas urllib3
        # install binaries (refresh the package index first; -y skips the prompt)
        RUN apt-get update && apt-get install -y git
  • 16. Step 3 - Pushing to a Docker Registry ● The recommended way: ○ AWS ECR ○ Azure Container Registry ● Also supports any registry with basic authentication ● What we do: ○ GitHub Container Registry + Basic Authentication
  • 17. Step 4 - Launching Your Cluster
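Step 4 has no detail in this transcript. As a minimal sketch, assuming the cluster is created through the Databricks Clusters REST API rather than the UI, the request body points at the custom image through a `docker_image` field with a `url` and `basic_auth` credentials. The cluster name, runtime version, node type, worker count, and the environment variable names `REGISTRY_USER` and `REGISTRY_TOKEN` are placeholders chosen for illustration, not values from the talk.

```python
import json
import os


def build_cluster_spec(image_url, username, password,
                       spark_version="7.3.x-scala2.12",
                       node_type_id="i3.xlarge",
                       num_workers=2):
    """Return a cluster-create request body that points at a custom image.

    spark_version, node_type_id and num_workers are illustrative defaults.
    """
    return {
        "cluster_name": "spire-custom-container",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "docker_image": {
            "url": image_url,
            # Basic authentication against the registry (see Step 3).
            "basic_auth": {"username": username, "password": password},
        },
    }


if __name__ == "__main__":
    spec = build_cluster_spec(
        image_url="ghcr.io/condenast/spire:3.1.0",
        # Read credentials from the environment instead of hard-coding them.
        username=os.environ.get("REGISTRY_USER", "<registry-user>"),
        password=os.environ.get("REGISTRY_TOKEN", "<registry-token>"),
    )
    print(json.dumps(spec, indent=2))
```

The sketch only builds and prints the payload; sending it to a workspace is left out on purpose.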
  • 18. Containerization of Spire on Databricks
  • 19. Containerization of Spire on Databricks: Docker Image Goal: Produce a Docker image w/ DBR + Spire • The image is built from databricks-minimal • Ubuntu 18.04 • Custom DBR 7.x functionality • Spire package and sub-package install • Dependency management
  • 20. Containerization of Spire on Databricks: Hosting Goal: Host image on GitHub Packages (ghcr.io) • Manage production and development packages • Can host multiple images per package • Image Tagging: • GitHub release tag • Spire package version (version.py)
  • 21. Containerization of Spire on Databricks: CI/CD Goal: Integrate image creation with CI/CD pipeline • GitHub Actions CI/CD Integration • ubuntu-latest and macOS-10.15 • Automated pytest testing on push • Automated build and deployment of Docker image on release tag
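The tagging scheme from slide 20, as applied by the release build on slide 21, could be sketched roughly as follows. This assumes the Spire version is a plain MAJOR.MINOR.PATCH string and that a release may also move a floating `<major>-stable` tag, in the style of the `ghcr.io/condenast/spire:3.1.0` and `ghcr.io/condenast/spire:3-stable` examples shown earlier in the deck. The function name `image_tags` and the `stable` flag are hypothetical; how the team actually decides which release is stable is not stated.

```python
import re

REGISTRY_IMAGE = "ghcr.io/condenast/spire"


def image_tags(version, stable=False):
    """Map a release version (optionally prefixed with 'v') to image tags."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = match.groups()
    # Every release gets an immutable, fully qualified tag.
    tags = [f"{REGISTRY_IMAGE}:{major}.{minor}.{patch}"]
    if stable:
        # Assumption: a floating tag per major version marks the stable image.
        tags.append(f"{REGISTRY_IMAGE}:{major}-stable")
    return tags


if __name__ == "__main__":
    for tag in image_tags("v3.1.0", stable=True):
        print(tag)
```

Pinning jobs to the fully qualified tag keeps runs reproducible, while a floating tag trades that for automatic upgrades.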
  • 22. What We Learned: Pros & Cons
  • 23. What We Learned: Pro 1 • Library customization • Automatic and simplified control of the Spire module • Integration of module dependencies
  • 24. What We Learned: Pro 2 • Fluid integration with existing deployment pipeline • GitHub Actions CI/CD • Test database integration • Pytest integration • Multiple OS support (Ubuntu and macOS) • Release tagging • Databricks Jobs
  • 25. What We Learned: Pro 3 • Ease of debugging • Shell into the container via docker run -it <package> bash • pdb.set_trace() support • Test database Docker volume for pytest integration testing
  • 26. What We Learned: Con 1 • DBR Version Incompatibility • DBR 8.x not currently supported, only DBR 6.x and 7.x • Need to create custom base image
  • 27. What We Learned: Con 2 • Pip Package Management • Must match target runtime specifications
  • 28. What We Learned: Con 3 • Large image size • Base images are large compared to Docker images used for deployment
        $ docker images
        REPOSITORY                  TAG     IMAGE ID      CREATED        SIZE
        databricksruntime/standard  latest  fd383efa2fcc  19 months ago  1.84GB
  • 29. What We Learned: Con 4 • Requires prior Docker experience • Local disk space requirements for local builds grow quickly • Need to continually prune these resources • Deployment pipeline is slower • More data is uploaded
        # Delete all Docker images
        docker rmi $(docker images -q)
        # Delete all exited Docker containers
        docker rm $(docker ps --filter status=exited -q)
        # Pruning
        docker image prune -f
        docker container prune -f
        docker builder prune -f
  • 30. What We Learned: Con 5 ● Credentials for Basic Auth are stored as plain text
  • 31. What We Learned: Con 6 ● Each usage requires a pull of the container ○ Image pulls grow quickly
  • 32. Summary ● What Databricks’ Containers offer: ● Streamlining of dependency management ● Streamlining of CI/CD ● More predictable runtime behavior ● Things to consider: ● Need careful synchronization of dependencies ○ Between your image and your target runtime ● Security concern for Basic Auth ● Additional overhead of Docker ● Each usage requires a pull of the container
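The "careful synchronization of dependencies" point in the summary can be turned into an automated check. The sketch below assumes the image's pip dependencies are pinned with `==` in a requirements-style file and that the package versions for the target runtime have been copied by hand from its release notes into a dictionary. The helper names and every version number shown are made up for illustration.

```python
def parse_requirements(text):
    """Collect 'name==version' pins, ignoring comments and unpinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_version_mismatches(image_pins, runtime_versions):
    """Return {package: (image_version, runtime_version)} for each conflict."""
    mismatches = {}
    for package, pinned in image_pins.items():
        expected = runtime_versions.get(package)
        # Packages the runtime does not list are not considered conflicts.
        if expected is not None and pinned != expected:
            mismatches[package] = (pinned, expected)
    return mismatches


if __name__ == "__main__":
    # Illustrative versions only, not taken from any real runtime.
    requirements = "pandas==1.0.1\nurllib3==1.25.8\n"
    runtime = {"pandas": "1.0.1", "urllib3": "1.25.7"}
    conflicts = find_version_mismatches(parse_requirements(requirements), runtime)
    for package, (image_version, runtime_version) in conflicts.items():
        print(f"{package}: image has {image_version}, runtime lists {runtime_version}")
```

Run as a step before the image build, a check like this would fail early instead of surfacing as unpredictable behavior on the cluster.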