Caching in Docker: the hardest thing in computer science
Hard things
Docker Caching
● Copy dependency versions
● Install dependencies [ 20 minutes or so ]
● Only then copy all sources
Intended behaviour
● No change:
no layer is rebuilt - LIGHTNING FAST!!!!
● Sources change, dependencies do not:
only sources are added - QUITE FAST!!!
● Dependencies change:
dependencies are reinstalled, then sources are added - A LITTLE SLOWER!!
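The intended behaviour follows from how the layer cache chains: once one layer changes, every layer after it is rebuilt. Below is a toy model in Python. It assumes a layer's cache key is simply a hash of the parent layer's key, the instruction and the content it copies. Docker's real rules are more involved, and the `requirements.txt` file and package versions are hypothetical:

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per layer.

    Toy model, not Docker's real algorithm: each key hashes the parent
    layer's key, the instruction text and the content the step copies,
    so a change invalidates that layer and every layer after it.
    """
    keys = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256()
        for part in (parent, instruction, content):
            digest.update(part.encode() + b"\0")  # separator avoids ambiguity
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(old_steps, new_steps):
    """Instructions whose layer misses the cache after the change."""
    old_keys = layer_keys(old_steps)
    new_keys = layer_keys(new_steps)
    return [step[0] for step, old, new in zip(new_steps, old_keys, new_keys)
            if old != new]


def image(requirements, sources):
    # Layer order from the slide: dependency versions, install, sources.
    return [
        ("COPY requirements.txt .", requirements),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", sources),
    ]


base = image("flask==1.0.2", "print('v1')")
print(rebuilt_layers(base, image("flask==1.0.2", "print('v1')")))  # []
print(rebuilt_layers(base, image("flask==1.0.2", "print('v2')")))  # ['COPY . .']
print(rebuilt_layers(base, image("flask==1.1.0", "print('v1')")))  # all three
```

Changing a source file only misses the last layer. Changing a dependency version also invalidates the install step, which is why the slow install sits before the final copy of the sources.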
Actual behaviour
Same machine, local checkout
● Local Docker registry
● Repeated build: 1m 06s
● Only sources: 1m 30s
● Dependencies: 11m
● Whole build: ~30m
CI case
● Always fresh machine
○ no code
○ no local registry (no cached images)
● Git clone/checkout
● Build
● Wipeout
Docker registry to the rescue!
Build cache:
● docker build
● docker push airflow/airflow:latest
Use cache:
● docker pull airflow/airflow:latest
● docker build --cache-from airflow/airflow:latest
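The pull/build/push cycle above can be sketched as a small Python helper for a CI job. The image name comes from the slides. The `-t` tag, the `.` build context and the `dry_run` switch are assumptions of this sketch. With `dry_run=True` the helper only prints the commands.

```python
import subprocess

IMAGE = "airflow/airflow:latest"  # image name used on the slides


def cache_build_commands(image, context="."):
    """Commands for a CI machine that starts without any local images."""
    return [
        # 1. Fetch the last pushed image so its layers can serve as cache.
        ["docker", "pull", image],
        # 2. Build, reusing matching layers from the pulled image.
        ["docker", "build", "--cache-from", image, "-t", image, context],
        # 3. Publish the result as the cache for the next build.
        ["docker", "push", image],
    ]


def run(commands, dry_run=True):
    for argv in commands:
        print("$", " ".join(argv))
        if dry_run:
            continue
        # The very first pull fails because nothing has been pushed yet;
        # that is not fatal, the build then simply starts without a cache.
        subprocess.run(argv, check=argv[1] != "pull")


run(cache_build_commands(IMAGE))
```

If BuildKit is enabled, the pushed image generally has to carry inline cache metadata before `--cache-from` can reuse it. You can add that metadata, for example, by building with `--build-arg BUILDKIT_INLINE_CACHE=1`. Check the documentation for your Docker version.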
Actual behaviour
Docker Hub automated build
● Docker Hub registry as cache
● Repeated build: 11m
● Only sources: 11m <- still OK
● Dependencies: ~1h
● Whole build: ~2h
Using the cache in Travis CI
● Docker Hub builds are slow
● Travis CI or Cloud Build use the earlier image with --cache-from
● Most of the time only sources change, so this should be fast
Actual BAD behaviour
Travis CI automated build
● Build on Travis CI with cache from Docker Hub
● Repeated build: 11m
● Only sources: 1h <- as slow as a dependency change!
● Dependencies: 1h
● Whole build: ~2h
Problem no. 1
Git & permissions
● Files created by git clone belong to:
○ the local user
○ the user's default group
● File/dir permissions (rwxs)
○ git records only the executable (x) bit of files; directories are not tracked at all
○ write (w) bits are not stored: by default, cloning applies the current umask
○ core.sharedRepository git-config option
■ one of: group (true), all, umask (false), 0xxx
● Umask WTF (022 on Docker Hub vs. 002 on Travis CI):
○ file: 644 (Docker Hub) vs. 664 (Travis CI)
○ dir: 755 (Docker Hub) vs. 775 (Travis CI)
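The two sets of modes are exactly what umask 022 and 002 produce. Below is a small Python check, assuming a POSIX system. Like git checkout, it requests 0666 for files and 0777 for directories, and the process umask removes bits from those requests. The helper name is hypothetical:

```python
import os
import stat
import tempfile


def modes_under_umask(mask):
    """Create a file and a directory under `mask`; return their octal modes."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "file.txt")
            # Same request git makes for a non-executable file: 0o666.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
            os.close(fd)
            subdir = os.path.join(tmp, "subdir")
            os.mkdir(subdir, 0o777)  # directories are requested as 0o777
            return (oct(stat.S_IMODE(os.stat(path).st_mode)),
                    oct(stat.S_IMODE(os.stat(subdir).st_mode)))
    finally:
        os.umask(old)  # restore the caller's umask


print(modes_under_umask(0o022))  # ('0o644', '0o755'), as on Docker Hub
print(modes_under_umask(0o002))  # ('0o664', '0o775'), as on Travis CI
```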
Solution to problem 1
Fix group permissions
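One way to fix group permissions is to clear the group-write bit on the whole checkout before `docker build`. Then 664/775 become 644/755 regardless of the CI machine's umask. The Python sketch below assumes that group write is the only difference between machines, as in the umask comparison. The directory names in `demo()` are hypothetical. On the command line, `chmod -R g-w .` has a roughly similar effect.

```python
import os
import stat
import tempfile


def remove_group_write(root):
    """Clear the group-write bit on every file and directory below `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would change the link target instead
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            os.chmod(path, mode & ~stat.S_IWGRP)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "pkg")  # hypothetical source directory
        os.mkdir(pkg)
        os.chmod(pkg, 0o775)  # as cloned with umask 002
        module = os.path.join(pkg, "module.py")
        open(module, "w").close()
        os.chmod(module, 0o664)
        remove_group_write(tmp)
        return (oct(stat.S_IMODE(os.stat(pkg).st_mode)),
                oct(stat.S_IMODE(os.stat(module).st_mode)))


print(demo())  # ('0o755', '0o644')
```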
Problem no. 2
Generated files
● Docker does not read .gitignore
● Generated files
○ autoapi documentation
○ build artifacts
○ npm cache
○ .pyc files
○ files created accidentally (wget in the source folder, anyone?)
● COPY . copies all of them
● The context is calculated based on ALL files
● .dockerignore != .gitignore
○ slightly different syntax
Solution to problem 2
Start .dockerignore with ** (ignore everything by default), then whitelist what the image needs with !
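To see which files such a whitelist sends to the daemon, here is a simplified Python re-implementation of the matching rules. The last matching pattern wins, `!` re-includes a path, and a pattern that matches a directory also covers its contents. This is an approximation written for illustration, not Docker's code. It does not handle character classes or escapes, and the whitelisted paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a .dockerignore pattern into a regex (simplified).

    '**' matches any number of directories, including none; '*' matches
    anything except '/'; '?' matches one character except '/'.
    """
    regex, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            i += 2
            if pattern.startswith("/", i):
                i += 1  # '**/' behaves like '**'
            regex += ".*" if i == len(pattern) else "(.*/)?"
        elif pattern[i] == "*":
            regex += "[^/]*"
            i += 1
        elif pattern[i] == "?":
            regex += "[^/]"
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.compile(regex)


def is_ignored(path, lines):
    """True if `path` (relative, '/'-separated) is left out of the context."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    ignored = False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        regex = pattern_to_regex(line.lstrip("!").strip("/"))
        if any(regex.fullmatch(prefix) for prefix in prefixes):
            ignored = not negate  # last matching pattern wins
    return ignored


DOCKERIGNORE = """
# Ignore everything by default...
**
# ...then whitelist what the image needs (hypothetical paths)
!airflow
!setup.py
# generated files inside whitelisted directories: note the **/
**/*.pyc
"""

rules = DOCKERIGNORE.splitlines()
for path in ["setup.py", "airflow/models.py", "airflow/models.pyc",
             "docs/_build/index.html", "wget-log"]:
    print(f"{path:25} {'ignored' if is_ignored(path, rules) else 'sent'}")
```

This also shows the syntax difference. In .gitignore, `*.pyc` matches at any depth. In .dockerignore, it only matches files at the root of the build context, so nested files need `**/*.pyc`.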
Problem no. 3
● Downloading & compiling ALL dependencies takes time!
Partial solution to problem 3
Find the weakest link
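One way to find the weakest link assumes Python dependencies installed with pip: build each requirement as a wheel on its own and time it. `slowest_first` and `wheel_command` are hypothetical helpers. The package list in the usage comment is only an example.

```python
import subprocess
import sys
import time


def wheel_command(requirement, wheel_dir):
    """pip command that builds one wheel and skips its dependencies,
    so the measured time belongs to this requirement alone."""
    return [sys.executable, "-m", "pip", "wheel", "--no-deps",
            "--wheel-dir", wheel_dir, requirement]


def slowest_first(items, command_for):
    """Run command_for(item) for every item; return (seconds, item) pairs,
    slowest first."""
    timings = []
    for item in items:
        start = time.perf_counter()
        subprocess.run(command_for(item), check=True,
                       stdout=subprocess.DEVNULL)
        timings.append((time.perf_counter() - start, item))
    return sorted(timings, reverse=True)


# Example usage (needs network access to PyPI; list is hypothetical):
# requirements = ["psycopg2==2.7.5", "pandas==0.23.4", "flask==1.0.2"]
# for seconds, req in slowest_first(
#         requirements, lambda r: wheel_command(r, "/tmp/wheels")):
#     print(f"{seconds:8.1f}s  {req}")
```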
Solution to problem 3
a) build image with wheels
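An image that only holds compiled wheels could look like the Dockerfile below. A small Python helper renders it so that the base image and paths can be swapped. The base image, `requirements.txt` and `/wheels` are assumptions. The base image also needs whatever compilers your dependencies require.

```python
def wheels_image_dockerfile(base_image, requirements="requirements.txt",
                            wheel_dir="/wheels"):
    """Render a Dockerfile for an image that only holds compiled wheels.

    Only the requirements file is copied in, so the slow `pip wheel` layer
    is rebuilt when dependency versions change, never for source edits.
    """
    return "\n".join([
        f"FROM {base_image}",
        "WORKDIR /build",
        f"COPY {requirements} .",
        f"RUN pip wheel --wheel-dir {wheel_dir} -r {requirements}",
    ])


print(wheels_image_dockerfile("python:3.6"))
```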
Solution to problem 3
b) Copy the wheel directory via multi-stage Docker builds
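A multi-stage variant could look like the rendered Dockerfile below, assuming the same hypothetical paths. The `wheels` stage compiles the dependencies. The final stage copies only the wheel directory out of that stage with `COPY --from`, so other files created while compiling stay behind. The sources are still copied last, so a source-only change reuses every dependency layer.

```python
def multi_stage_dockerfile(base_image, wheel_dir="/wheels",
                           requirements="requirements.txt"):
    """Render a two-stage Dockerfile (hypothetical layout)."""
    return "\n".join([
        # Stage 1: compile every dependency into a wheel.
        f"FROM {base_image} AS wheels",
        "WORKDIR /build",
        f"COPY {requirements} .",
        f"RUN pip wheel --wheel-dir {wheel_dir} -r {requirements}",
        "",
        # Stage 2: take only the wheels, install them, then add sources.
        f"FROM {base_image}",
        "WORKDIR /app",
        f"COPY --from=wheels {wheel_dir} {wheel_dir}",
        f"COPY {requirements} .",
        f"RUN pip install --no-index --find-links={wheel_dir} -r {requirements}",
        "COPY . .",
    ])


print(multi_stage_dockerfile("python:3.6"))
```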
Solution to problem 3
c) Install using wheels
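Installing from wheels comes down to one pip invocation. The sketch below builds it with `--no-index` and `--find-links`, assuming the wheels already sit in a local directory such as the hypothetical `/wheels`. Installation then never reaches out to PyPI.

```python
import subprocess
import sys


def install_from_wheels_command(wheel_dir, requirements="requirements.txt"):
    """pip command that installs dependencies from pre-built wheels only.

    --no-index     ignore PyPI completely
    --find-links   look for distributions in wheel_dir instead
    A requirement with no matching distribution in wheel_dir makes pip
    fail, instead of quietly downloading and compiling it again.
    """
    return [sys.executable, "-m", "pip", "install", "--no-index",
            "--find-links", wheel_dir, "-r", requirements]


def install_from_wheels(wheel_dir, requirements="requirements.txt"):
    # check=True: stop the build if any wheel is missing.
    subprocess.run(install_from_wheels_command(wheel_dir, requirements),
                   check=True)


print(" ".join(install_from_wheels_command("/wheels")[1:]))
# -m pip install --no-index --find-links /wheels -r requirements.txt
```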
Thank You!
More at polidea.com/blog
