SlideShare a Scribd company logo
1 of 63
Download to read offline
Bring your own container
Using Docker images in production
Harin Sanghirun & Max Cantor
Machine Learning Engineering at Condé Nast
Introduction
Condé Nast
• Global leader in media featuring many iconic brands:
• The New Yorker, WIRED, Vanity Fair, Epicurious, and many more.
• In addition to content production, Condé Nast also invests in companion
software products to enhance our audience experiences.
• Examples: Content Recommendations, Audience Segmentation
Spire
Audience Segmentation
• Spire is a platform for user segmentation and targeted advertising
• Analyzes over one-hundred million users
• See our Spark+AI Summit 2020 talk for more background on Spire
Presentation Overview
• 1000-Foot View of Spire’s Architecture
• What Spire looked like before Docker
• How Docker helped streamline Spire’s deployment
• Containerization of Spire on Databricks
• Walkthrough of Spire’s containerization strategy in production
• Learning from Experience: Pros & Cons
• What we’ve learned from containerizing Spire on Databricks
How Docker streamlined Spire’s deployment
The 1000-Foot View
Ad Targeting
Content
Recommendations
Conde Nast’s Platform
Dataset
source(s)
(1st, 2nd,
and 3rd
party)
Components of Spire
▪ Model interface
standardization
▪ Model serialization
▪ Model versioning & tracking
▪ Hyperparameter tuning
• Database Management
• Scheduling and
Orchestration
• Spire Core Library
• Kalos
• Utility Methods
• Data Science Common Library
Data Science
Common
spire-2.0.0-py3-none-any.whl
spire-3.0.0-py3-none-any.whl
spire-3.1.0-py3-none-any.whl
kalos-1.0.0-py3-none-any.whl
kalos-1.2.0-py3-none-any.whl
kalos-1.2.1-py3-none-any.whl
datasci-1.0.0-py3-none-any.w
hl
CI/CD
CI/CD
CI/CD
Without Container
DBFS
Data Science
Common
ghcr.io/condenast/spire:3.0.0
ghcr.io/condenast/spire:3.1.0
ghcr.io/condenast/spire:3-stable
Dockerfile
CI/CD
With Container
Recap
• Pre-packaged all dependencies
• Each image represents a tested combination of packages
• Linked to a specific Databricks Runtime and Spire
• Fewer pipelines to manage
• Explicit and upfront control over dependency versions
Custom Containers on Databricks, 101
The Basics
• Step 1 - Choosing a base image
• Step 2 - Adding your dependency
• Step 3 - Push to a Docker Registry
• Step 4 - Launching a cluster
Step 1 - Choosing a Base Image
See their content at:
https://github.com/databricks/containers
• Available Images:
• Standard
• Minimal
• Python
• R
• DBFS FUSE
• SSH
• GPU
Step 2 - Adding Dependency and build
• Standard Docker workflow:
• “Do what you want”
• Things to watch out for:
• Make sure to match package
version listed in your target
Runtime Version
# select base image
FROM databricksruntime/standard:latest
# install pip libraries
RUN pip install pandas urllib3
# installing binaries
RUN apt-get install git
Step 3 - Pushing to a Docker Registry
● The recommended way:
○ AWS ECR
○ Azure Container Registry
● Also support any registry with basic authentication
● What we do:
○ GitHub Container Registry + Basic Authentication
Step 4 - Launching Your Cluster
Containerization of Spire on Databricks
Containerization of Spire on Databricks: Docker Image
Goal: Produce a Docker image w/ DBR + Spire
• The image is built from databricks-minimal
• Ubuntu 18.04
• Custom DBR 7.x functionality
• Spire package and sub-package install
• Dependency management
Containerization of Spire on Databricks: Hosting
Goal: Host image on GitHub Packages (ghcr.io)
• Manage production and development packages
• Can host multiple images per package
• Image Tagging:
• GitHub release tag
• Spire package version (version.py)
Containerization of Spire on Databricks: CI/CD
Goal: Integrate images creation with CI/CD pipeline
• GitHub Actions CI/CD Integration
• ubuntu-latest and macOS-10.15
• Automated pytest testing on push
• Automated build and deployment of Docker image on release tag
What We Learned: Pros & Cons
What We Learned: Pro 1
• Library customization
• Automatic and simplified control of the Spire module
• Integration of module dependencies
What We Learned: Pro 2
• Fluid integration with existing deployment pipeline
• GitHub Actions CI/CD
• Test database integration
• Pytest integration
• Multiple OS support (Ubuntu and macOS)
• Release tagging
• Databricks Jobs
What We Learned: Pro 3
• Ease of debugging
• ssh into container via docker run -it <package> bash
• pdb set trace support
• Test database Docker volume for pytest integrations testing
What We Learned: Con 1
• DBR Version Incompatibility
• DBR 8.x not currently supported, only DBR 6.x and 7.x
• Need to create custom base image
What We Learned: Con 2
• Pip Package Management
• Must match target runtime specifications
What We Learned: Con 3
• Large image size
• Base images are large compared to Docker images used for deployment
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
databricksruntime/standard latest fd383efa2fcc 19 months ago 1.84GB
What We Learned: Con 4
• Requires prior Docker experience
• Local memory space requirement for local
builds grow quickly
• Need to continually prune these resources
• Deployment pipeline is slower
• More data is uploaded
Delete all docker images
docker rmi $(docker images -q)
Delete all docker containers
docker rm $(docker ps --filter status=exited -q)
Pruning
docker image prune -f
docker container prune -f
docker builder prune -f
● Credentials for Basic
Auth are stored of
plaintext
basic auth stored as plain text
What We Learned: Con 5
● Each usage requires a pull of the container
○ Image pulls grow quickly
What We Learned: Con 6
Summary
● What Databricks’ Containers offer:
● Streamlining of dependency management
● Streamlining of CI/CD
● More predictable runtime behavior
● Things to consider:
● Need careful synchronization of dependencies
○ Between your image and your target runtime
● Security concern for Basic Auth
● Additional overhead of Docker
● Each usage requires a pull of the container
THE END
Fonts
Agenda
▪ First Presenter
▪ Topic Lorem ipsum
dolor sit amet
consectetur.
▪ Second Presenter
▪ Topic Lorem ipsum
dolor sit amet.
▪ Third Presenter
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
Colors
Color Palette
Primary
Colors
Content Slides
Basic Slide
• Bullet 1
• Sub-bullet
• Sub-bullet
• Bullet 2
• Sub-bullet
• Sub-bullet
Reduce Long Titles
• Bullet 1
• Sub-bullet
• Sub-bullet
• Bullet 2
• Sub-bullet
• Sub-bullet
By splitting them into a short title, and a more detailed subtitle using this slide format that includes a
subtitle area
Two Columns
▪ Bulleted list format
▪ Bulleted list format
▪ Bulleted list format
▪ Bulleted list format
▪ Bulleted list format
▪ Bulleted list format
▪ Bulleted list format
▪ Bulleted list format
• Headline Format
Headline Format
Two Box
▪ Bulleted list
▪ Bulleted list
▪ Bulleted list
▪ Bulleted list
• Category
• Category
Three Box
▪ Bulleted list
▪ Bulleted list
• Bulleted list
• Bulleted list
• Category
• Category
• Bulleted list
• Bulleted list
• Category
Four Box
▪ Bulleted list
▪ Bulleted list
▪ Bulleted list
▪ Bulleted list
• Category
• Category
▪ Bulleted list
▪ Bulleted list
• Category
▪ Bulleted list
▪ Bulleted list
• Category
Shapes
Shapes
Rounded corner rectangle Double corner
rectangle
Double corner
rectangle
Tables and Charts
Table
Column Column Column
Row Value Value Value
Row Value Value Value
Row Value Value Value
Row Value Value Value
Row Value Value Value
Row Value Value Value
Row Value Value Value
Bar chart
Line chart
Pie Chart
Quotes and Text Callouts
Attribution Format
Second line of attribution
This is a template for a quote
slide. This is where the quote
goes. Attribute the source
below.
Logos
Data + AI Summit Logos
Databricks Logos
Open Source Logos

More Related Content

What's hot

Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastDatabricks
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applicationsLars Albertsson
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoDimko Zhluktenko
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta LakeKnoldus Inc.
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Databricks
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceSamanthaBerlant
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptxBRIJESH KUMAR
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 

What's hot (20)

PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applications
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 

Similar to Bring Your Own Container: Using Docker Images In Production

Preparing your dockerised application for production deployment
Preparing your dockerised application for production deploymentPreparing your dockerised application for production deployment
Preparing your dockerised application for production deploymentDave Ward
 
Docker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and toolsDocker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and toolsRamit Surana
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
A new model for Docker image distribution
A new model for Docker image distributionA new model for Docker image distribution
A new model for Docker image distributionDocker, Inc.
 
DockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker imageDockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker imageDocker, Inc.
 
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and PipelinesDocker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and PipelinesMatt Bentley
 
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker ImageDockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker ImageDocker, Inc.
 
Build pipelines with bitbucket for Magento
Build pipelines with bitbucket for MagentoBuild pipelines with bitbucket for Magento
Build pipelines with bitbucket for MagentoRrap Software Pvt Ltd
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to dockerJohn Willis
 
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERIndrajit Poddar
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Derek Jacoby
 
DockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image DistributionDockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image DistributionDocker, Inc.
 
Docker Registry V2
Docker Registry V2Docker Registry V2
Docker Registry V2Docker, Inc.
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker budMandi Walls
 
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on OpenstackScaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on OpenstackBobby DeVeaux, DevOps Consultant
 
Getting Started with Docker
Getting Started with DockerGetting Started with Docker
Getting Started with DockerGeeta Vinnakota
 

Similar to Bring Your Own Container: Using Docker Images In Production (20)

Preparing your dockerised application for production deployment
Preparing your dockerised application for production deploymentPreparing your dockerised application for production deployment
Preparing your dockerised application for production deployment
 
Docker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and toolsDocker for the new Era: Introducing Docker,its components and tools
Docker for the new Era: Introducing Docker,its components and tools
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
A new model for Docker image distribution
A new model for Docker image distributionA new model for Docker image distribution
A new model for Docker image distribution
 
DockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker imageDockerCon SF 2015: Maintaining the official node.js docker image
DockerCon SF 2015: Maintaining the official node.js docker image
 
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and PipelinesDocker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
Docker Indy Meetup - An Opinionated View of Building Docker Images and Pipelines
 
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker ImageDockerCon SF 2015: Maintaining the Official Node.js Docker Image
DockerCon SF 2015: Maintaining the Official Node.js Docker Image
 
Build pipelines with bitbucket for Magento
Build pipelines with bitbucket for MagentoBuild pipelines with bitbucket for Magento
Build pipelines with bitbucket for Magento
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to docker
 
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
 
Containers 101
Containers 101Containers 101
Containers 101
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
DockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image DistributionDockerCon SF 2015: A New Model for Image Distribution
DockerCon SF 2015: A New Model for Image Distribution
 
Docker Registry V2
Docker Registry V2Docker Registry V2
Docker Registry V2
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
 
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on OpenstackScaling Your App With Docker Swarm using Terraform, Packer on Openstack
Scaling Your App With Docker Swarm using Terraform, Packer on Openstack
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Getting Started with Docker
Getting Started with DockerGetting Started with Docker
Getting Started with Docker
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 

Recently uploaded (20)

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 

Bring Your Own Container: Using Docker Images In Production

  • 1. Bring your own container Using Docker images in production Harin Sanghirun & Max Cantor Machine Learning Engineering at Condé Nast
  • 3. Condé Nast • Global leader in media featuring many iconic brands: • The New Yorker, WIRED, Vanity Fair, Epicurious, and many more. • In addition to content production, Condé Nast also invests in companion software products to enhance our audience experiences. • Examples: Content Recommendations, Audience Segmentation
  • 4. Spire Audience Segmentation • Spire is a platform for user segmentation and targeted advertising • Analyzes over one-hundred million users • See our Spark+AI Summit 2020 talk for more background on Spire
  • 5. Presentation Overview • 1000-Foot View of Spire’s Architecture • What Spire looked like before Docker • How Docker helped streamline Spire’s deployment • Containerization of Spire on Databricks • Walkthrough of Spire’s containerization strategy in production • Learning from Experience: Pros & Cons • What we’ve learned from containerizing Spire on Databricks
  • 6. How Docker streamlined Spire’s deployment
  • 7. The 1000-Foot View Ad Targeting Content Recommendations Conde Nast’s Platform Dataset source(s) (1st, 2nd, and 3rd party)
  • 8. Components of Spire ▪ Model interface standardization ▪ Model serialization ▪ Model versioning & tracking ▪ Hyperparameter tuning • Database Management • Scheduling and Orchestration • Spire Core Library • Kalos • Utility Methods • Data Science Common Library
  • 11. Recap • Pre-packaged all dependencies • Each image represents a tested combination of packages • Linked to a specific Databricks Runtime and Spire • Fewer pipelines to manage • Explicit and upfront control over dependency versions
  • 12. Custom Containers on Databricks, 101
  • 13. The Basics • Step 1 - Choosing a base image • Step 2 - Adding your dependency • Step 3 - Push to a Docker Registry • Step 4 - Launching a cluster
  • 14. Step 1 - Choosing a Base Image See their content at: https://github.com/databricks/containers • Available Images: • Standard • Minimal • Python • R • DBFS FUSE • SSH • GPU
  • 15. Step 2 - Adding Dependency and build • Standard Docker workflow: • “Do what you want” • Things to watch out for: • Make sure to match package version listed in your target Runtime Version # select base image FROM databricksruntime/standard:latest # install pip libraries RUN pip install pandas urllib3 # installing binaries RUN apt-get install git
  • 16. Step 3 - Pushing to a Docker Registry ● The recommended way: ○ AWS ECR ○ Azure Container Registry ● Also support any registry with basic authentication ● What we do: ○ GitHub Container Registry + Basic Authentication
  • 17. Step 4 - Launching Your Cluster
  • 18. Containerization of Spire on Databricks
  • 19. Containerization of Spire on Databricks: Docker Image Goal: Produce a Docker image w/ DBR + Spire • The image is built from databricks-minimal • Ubuntu 18.04 • Custom DBR 7.x functionality • Spire package and sub-package install • Dependency management
  • 20. Containerization of Spire on Databricks: Hosting Goal: Host image on GitHub Packages (ghcr.io) • Manage production and development packages • Can host multiple images per package • Image Tagging: • GitHub release tag • Spire package version (version.py)
  • 21. Containerization of Spire on Databricks: CI/CD Goal: Integrate images creation with CI/CD pipeline • GitHub Actions CI/CD Integration • ubuntu-latest and macOS-10.15 • Automated pytest testing on push • Automated build and deployment of Docker image on release tag
  • 22. What We Learned: Pros & Cons
  • 23. What We Learned: Pro 1 • Library customization • Automatic and simplified control of the Spire module • Integration of module dependencies
  • 24. What We Learned: Pro 2 • Fluid integration with existing deployment pipeline • GitHub Actions CI/CD • Test database integration • Pytest integration • Multiple OS support (Ubuntu and macOS) • Release tagging • Databricks Jobs
  • 25. What We Learned: Pro 3 • Ease of debugging • ssh into container via docker run -it <package> bash • pdb set trace support • Test database Docker volume for pytest integrations testing
  • 26. What We Learned: Con 1 • DBR Version Incompatibility • DBR 8.x not currently supported, only DBR 6.x and 7.x • Need to create custom base image
  • 27. What We Learned: Con 2 • Pip Package Management • Must match target runtime specifications
  • 28. What We Learned: Con 3 • Large image size • Base images are large compared to Docker images used for deployment $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE databricksruntime/standard latest fd383efa2fcc 19 months ago 1.84GB
  • 29. What We Learned: Con 4 • Requires prior Docker experience • Local memory space requirement for local builds grow quickly • Need to continually prune these resources • Deployment pipeline is slower • More data is uploaded Delete all docker images docker rmi $(docker images -q) Delete all docker containers docker rm $(docker ps --filter status=exited -q) Pruning docker image prune -f docker container prune -f docker builder prune -f
  • 30. ● Credentials for Basic Auth are stored of plaintext basic auth stored as plain text What We Learned: Con 5
  • 31. ● Each usage requires a pull of the container ○ Image pulls grow quickly What We Learned: Con 6
  • 32. Summary ● What Databricks’ Containers offer: ● Streamlining of dependency management ● Streamlining of CI/CD ● More predictable runtime behavior ● Things to consider: ● Need careful synchronization of dependencies ○ Between your image and your target runtime ● Security concern for Basic Auth ● Additional overhead of Docker ● Each usage requires a pull of the container
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39. Fonts
  • 40. Agenda ▪ First Presenter ▪ Topic Lorem ipsum dolor sit amet consectetur. ▪ Second Presenter ▪ Topic Lorem ipsum dolor sit amet. ▪ Third Presenter
  • 41. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
  • 45. Basic Slide • Bullet 1 • Sub-bullet • Sub-bullet • Bullet 2 • Sub-bullet • Sub-bullet
  • 46. Reduce Long Titles • Bullet 1 • Sub-bullet • Sub-bullet • Bullet 2 • Sub-bullet • Sub-bullet By splitting them into a short title, and a more detailed subtitle using this slide format that includes a subtitle area
  • 47. Two Columns ▪ Bulleted list format ▪ Bulleted list format ▪ Bulleted list format ▪ Bulleted list format ▪ Bulleted list format ▪ Bulleted list format ▪ Bulleted list format ▪ Bulleted list format • Headline Format Headline Format
  • 48. Two Box ▪ Bulleted list ▪ Bulleted list ▪ Bulleted list ▪ Bulleted list • Category • Category
  • 49. Three Box ▪ Bulleted list ▪ Bulleted list • Bulleted list • Bulleted list • Category • Category • Bulleted list • Bulleted list • Category
  • 50. Four Box ▪ Bulleted list ▪ Bulleted list ▪ Bulleted list ▪ Bulleted list • Category • Category ▪ Bulleted list ▪ Bulleted list • Category ▪ Bulleted list ▪ Bulleted list • Category
  • 52. Shapes Rounded corner rectangle Double corner rectangle Double corner rectangle
  • 54. Table Column Column Column Row Value Value Value Row Value Value Value Row Value Value Value Row Value Value Value Row Value Value Value Row Value Value Value Row Value Value Value
  • 58. Quotes and Text Callouts
  • 59. Attribution Format Second line of attribution This is a template for a quote slide. This is where the quote goes. Attribute the source below.
  • 60. Logos
  • 61. Data + AI Summit Logos