SlideShare a Scribd company logo
1 of 23
Download to read offline
The hardest thing
in computer science
Hard things
Docker Caching
Dependency versions
Install dependencies
[ 20 minutes or so ]
Only here copy all sources
Intended behaviour
● No change:
docker is not rebuilt - LIGHTNING FAST!!!!
● Sources change/dependencies not:
only sources are added - QUITE FAST !!!
● Dependencies change:
dependencies installed, sources - LITTLE SLOWER !!
Actual behaviour
same machine - local checkout
● Local docker registry
● Repeated build: 1:06m
● Only sources: 1:30m
● Dependencies: 11m
● Whole build: ~ 30m
CI case
● Always fresh machine
○ no code
○ no registry
● Git clone/checkout
● Build
● Wipeout
Docker registry to the rescue!
Build cache:
● Docker build
● Docker push airflow/airflow:latest
Use cache:
● Docker pull airflow/airflow:latest
● docker build --cache-from ariflow/airflow:latest
Actual behaviour
Docker Hub automated build
● DockerHub docker registry as cache
● Repeated build: 11m
● Only sources: 11m <- Still OK
● Dependencies: ~1h
● Whole build: ~ 2h
Using the cache in Travis CI
● Docker Hub builds are slow
● Travis or Cloud Build use earlier image
with --cache-from
● But only sources change most of the
time
Actual BAD behaviour
Travis CI automated build
● Build on Travis with cache from DockerHub
● Repeated build: 11m
● Only sources: 1 h <-
● Dependencies: 1h
● Whole build: ~ 2h
Problem no 1
Git & permissions
● git clone file creation:
○ local user
○ default user’s group
● file/dir permissions (rwxs)
○ preserves user, group and other rx permissions files & dirs
○ does not store w and by default uses umask when cloning by default
○ core.sharedRepository git-config
■ one of: group(true), all, umask(false), 0xxx
● Umask WTF:
○ file: 644 (DockeHub) vs. 664 (Travis CI)
○ dir: 755 (DockerHub) vs. 775 (Travis CI)
Solution to problem 1
Fix group permissions
Problem no 2
Generated files
● not only .gitignore
● generated files
○ autoapi - documentation
○ build artifacts
○ npm cache
○ .pyc files
○ files created accidentally (wget in source folder anyone?)
● COPY .
● Context calculated based on ALL files
● .dockerignore != .gitignore
● slightly different syntax
Solution to problem 2
Set .dockerignore ** by default
Problem no. 3
● Download & compile ALL dependencies takes time!
Partial solution to problem 3
Find the weakest link
Solution to problem 3
a) build image with wheels
Solution to problem 3
b) Copy directory via multi-stage
Docker builds
Solution 3
c) install using wheels
Thank You!
You can add some info where to follow you,
or add information about
polidea.com/blog

More Related Content

What's hot

Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Gosuke Miyashita
 

What's hot (20)

Introduction to Docker Compose
Introduction to Docker ComposeIntroduction to Docker Compose
Introduction to Docker Compose
 
Docker Athens: Docker Engine Evolution & Containerd Use Cases
Docker Athens: Docker Engine Evolution & Containerd Use CasesDocker Athens: Docker Engine Evolution & Containerd Use Cases
Docker Athens: Docker Engine Evolution & Containerd Use Cases
 
RancherOS - The perfect place to run Docker
RancherOS - The perfect place to run DockerRancherOS - The perfect place to run Docker
RancherOS - The perfect place to run Docker
 
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQDocker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
Docker for Developers: Dev, Test, Deploy @ BucksCo Devops at MeetMe HQ
 
Ansible as a better shell script
Ansible as a better shell scriptAnsible as a better shell script
Ansible as a better shell script
 
[Szjug] Docker. Does it matter for java developer?
[Szjug] Docker. Does it matter for java developer?[Szjug] Docker. Does it matter for java developer?
[Szjug] Docker. Does it matter for java developer?
 
Let's Count Bytes! Launching Ruby in 32K of RAM
Let's Count Bytes! Launching Ruby in 32K of RAMLet's Count Bytes! Launching Ruby in 32K of RAM
Let's Count Bytes! Launching Ruby in 32K of RAM
 
Containers: What are they, Really?
Containers: What are they, Really?Containers: What are they, Really?
Containers: What are they, Really?
 
CRI Runtimes Deep-Dive: Who's Running My Pod!?
CRI Runtimes Deep-Dive: Who's Running My Pod!?CRI Runtimes Deep-Dive: Who's Running My Pod!?
CRI Runtimes Deep-Dive: Who's Running My Pod!?
 
Clustering Docker with Docker Swarm on openSUSE
Clustering Docker with Docker Swarm on openSUSEClustering Docker with Docker Swarm on openSUSE
Clustering Docker with Docker Swarm on openSUSE
 
EC2 Storage for Docker 150526b
EC2 Storage for Docker   150526bEC2 Storage for Docker   150526b
EC2 Storage for Docker 150526b
 
CoreOS + Kubernetes @ All Things Open 2015
CoreOS + Kubernetes @ All Things Open 2015CoreOS + Kubernetes @ All Things Open 2015
CoreOS + Kubernetes @ All Things Open 2015
 
It's 2018. Are My Containers Secure Yet!?
It's 2018. Are My Containers Secure Yet!?It's 2018. Are My Containers Secure Yet!?
It's 2018. Are My Containers Secure Yet!?
 
Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019
 
Docker. Micro services for lazy developers
Docker. Micro services for lazy developersDocker. Micro services for lazy developers
Docker. Micro services for lazy developers
 
Datacenter Airlift - "Docker and the world of “containerized" environments"
Datacenter Airlift - "Docker and the world of “containerized" environments"Datacenter Airlift - "Docker and the world of “containerized" environments"
Datacenter Airlift - "Docker and the world of “containerized" environments"
 
2 docker engine_hands_on
2 docker engine_hands_on2 docker engine_hands_on
2 docker engine_hands_on
 
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
 
Docker tutorial2
Docker tutorial2Docker tutorial2
Docker tutorial2
 
Dockerin10mins
Dockerin10minsDockerin10mins
Dockerin10mins
 

Similar to Caching in Docker - the hardest thing in computer science

A Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and ContainersA Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and Containers
Docker, Inc.
 

Similar to Caching in Docker - the hardest thing in computer science (20)

Docker presentation
Docker presentationDocker presentation
Docker presentation
 
Introduction to containers
Introduction to containersIntroduction to containers
Introduction to containers
 
Docker4Drupal 2.1 for Development
Docker4Drupal 2.1 for DevelopmentDocker4Drupal 2.1 for Development
Docker4Drupal 2.1 for Development
 
Docker n co
Docker n coDocker n co
Docker n co
 
Powercoders · Docker · Fall 2021.pptx
Powercoders · Docker · Fall 2021.pptxPowercoders · Docker · Fall 2021.pptx
Powercoders · Docker · Fall 2021.pptx
 
Build optimization mechanisms in GitLab and Docker
Build optimization mechanisms in GitLab and DockerBuild optimization mechanisms in GitLab and Docker
Build optimization mechanisms in GitLab and Docker
 
DockerCon EU 2015: Breaking the RPiDocker Challenge
DockerCon EU 2015: Breaking the RPiDocker Challenge DockerCon EU 2015: Breaking the RPiDocker Challenge
DockerCon EU 2015: Breaking the RPiDocker Challenge
 
Docker 0.11 at MaxCDN meetup in Los Angeles
Docker 0.11 at MaxCDN meetup in Los AngelesDocker 0.11 at MaxCDN meetup in Los Angeles
Docker 0.11 at MaxCDN meetup in Los Angeles
 
Talk on PHP Day Uruguay about Docker
Talk on PHP Day Uruguay about DockerTalk on PHP Day Uruguay about Docker
Talk on PHP Day Uruguay about Docker
 
Docker SQL Continuous Integration Flow
Docker SQL Continuous Integration FlowDocker SQL Continuous Integration Flow
Docker SQL Continuous Integration Flow
 
Docker 1 0 1 0 1: a Docker introduction, actualized for the stable release of...
Docker 1 0 1 0 1: a Docker introduction, actualized for the stable release of...Docker 1 0 1 0 1: a Docker introduction, actualized for the stable release of...
Docker 1 0 1 0 1: a Docker introduction, actualized for the stable release of...
 
Perspectives on Docker
Perspectives on DockerPerspectives on Docker
Perspectives on Docker
 
Super powered Drupal development with docker
Super powered Drupal development with dockerSuper powered Drupal development with docker
Super powered Drupal development with docker
 
A Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and ContainersA Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and Containers
 
Magento Docker Setup.pdf
Magento Docker Setup.pdfMagento Docker Setup.pdf
Magento Docker Setup.pdf
 
Infrastructure = Code
Infrastructure = CodeInfrastructure = Code
Infrastructure = Code
 
Docker 102 - Immutable Infrastructure
Docker 102 - Immutable InfrastructureDocker 102 - Immutable Infrastructure
Docker 102 - Immutable Infrastructure
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
 
Nagios Conference 2014 - Spenser Reinhardt - Detecting Security Breaches With...
Nagios Conference 2014 - Spenser Reinhardt - Detecting Security Breaches With...Nagios Conference 2014 - Spenser Reinhardt - Detecting Security Breaches With...
Nagios Conference 2014 - Spenser Reinhardt - Detecting Security Breaches With...
 
Introduction to Docker at Glidewell Laboratories in Orange County
Introduction to Docker at Glidewell Laboratories in Orange CountyIntroduction to Docker at Glidewell Laboratories in Orange County
Introduction to Docker at Glidewell Laboratories in Orange County
 

More from Jarek Potiuk

Manageable Data Pipelines With Airflow (and kubernetes) - GDG DevFest
Manageable Data Pipelines With Airflow (and kubernetes) - GDG DevFestManageable Data Pipelines With Airflow (and kubernetes) - GDG DevFest
Manageable Data Pipelines With Airflow (and kubernetes) - GDG DevFest
Jarek Potiuk
 
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Jarek Potiuk
 

More from Jarek Potiuk (11)

What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019
What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019
What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019
 
Subtle Differences between Python versions
Subtle Differences between Python versionsSubtle Differences between Python versions
Subtle Differences between Python versions
 
Manageable Data Pipelines With Airflow (and kubernetes) - GDG DevFest
Manageable Data Pipelines With Airflow (and kubernetes) - GDG DevFestManageable Data Pipelines With Airflow (and kubernetes) - GDG DevFest
Manageable Data Pipelines With Airflow (and kubernetes) - GDG DevFest
 
Off time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social mediaOff time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social media
 
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
 
Berlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow WorkshopsBerlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow Workshops
 
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
 
Ci for android OS
Ci for android OSCi for android OS
Ci for android OS
 
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
 
It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)
 
React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

Caching in Docker - the hardest thing in computer science

  • 1. The hardest thing in computer science
  • 3. Docker Caching Dependency versions Install dependencies [ 20 minutes or so ] Only here copy all sources
  • 4. Intended behaviour ● No change: docker is not rebuilt - LIGHTNING FAST!!!! ● Sources change/dependencies not: only sources are added - QUITE FAST !!! ● Dependencies change: dependencies installed, sources - LITTLE SLOWER !!
  • 5. Actual behaviour same machine - local checkout ● Local docker registry ● Repeated build: 1:06m ● Only sources: 1:30m ● Dependencies: 11m ● Whole build: ~ 30m
  • 6. CI case ● Always fresh machine ○ no code ○ no registry ● Git clone/checkout ● Build ● Wipeout
  • 7. Docker registry to the rescue! Build cache: ● Docker build ● Docker push airflow/airflow:latest Use cache: ● Docker pull airflow/airflow:latest ● docker build --cache-from ariflow/airflow:latest
  • 8. Actual behaviour Docker Hub automated build ● DockerHub docker registry as cache ● Repeated build: 11m ● Only sources: 11m <- Still OK ● Dependencies: ~1h ● Whole build: ~ 2h
  • 9. Using the cache in Travis CI ● Docker Hub builds are slow ● Travis or Cloud Build use earlier image with --cache-from ● But only sources change most of the time
  • 10.
  • 11. Actual BAD behaviour Travis CI automated build ● Build on Travis with cache from DockerHub ● Repeated build: 11m ● Only sources: 1 h <- ● Dependencies: 1h ● Whole build: ~ 2h
  • 12.
  • 13. Problem no 1 Git & permissions ● git clone file creation: ○ local user ○ default user’s group ● file/dir permissions (rwxs) ○ preserves user, group and other rx permissions files & dirs ○ does not store w and by default uses umask when cloning by default ○ core.sharedRepository git-config ■ one of: group(true), all, umask(false), 0xxx ● Umask WTF: ○ file: 644 (DockeHub) vs. 664 (Travis CI) ○ dir: 755 (DockerHub) vs. 775 (Travis CI)
  • 14. Solution to problem 1 Fix group permissions
  • 15. Problem no 2 Generated files ● not only .gitignore ● generated files ○ autoapi - documentation ○ build artifacts ○ npm cache ○ .pyc files ○ files created accidentally (wget in source folder anyone?) ● COPY . ● Context calculated based on ALL files ● .dockerignore != .gitignore ● slightly different syntax
  • 16. Solution to problem 2 Set .dockerignore ** by default
  • 17. Problem no. 3 ● Download & compile ALL dependencies takes time!
  • 18. Partial solution to problem 3 Find the weakest link
  • 19. Solution to problem 3 a) build image with wheels
  • 20. Solution to problem 3 b) Copy directory via multi-stage Docker builds
  • 21. Solution 3 c) install using wheels
  • 22.
  • 23. Thank You! You can add some info where to follow you, or add information about polidea.com/blog