Caching in Docker - the hardest thing in computer science
Jarek Potiuk
Hard things
Docker caching
● Dependency versions
● Install dependencies [20 minutes or so]
● Only then copy all sources
Intended behaviour
● No change:
the image is not rebuilt - LIGHTNING FAST!!!!
● Sources change, dependencies do not:
only sources are added - QUITE FAST!!!
● Dependencies change:
dependencies are reinstalled, then sources added - A LITTLE SLOWER!!
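The intended behaviour relies on Docker's layer cache being chained: a layer is reused only if its parent was reused and its own inputs are unchanged, so the first cache miss rebuilds every layer after it. The Python sketch below is a simplified model of that rule, not Docker's real implementation; the step list, the file names (requirements.txt, airflow/models.py) and the SHA-256 chaining are illustrative assumptions.

```python
import hashlib
from typing import Dict, List, Tuple

# A build step: (instruction text, paths of the files it copies into the image).
Step = Tuple[str, List[str]]

# Hypothetical Dockerfile order from the slide: dependency versions first,
# then the slow dependency install, and only then all sources.
STEPS: List[Step] = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "airflow/models.py"]),
]


def layer_keys(steps: List[Step], files: Dict[str, str]) -> List[str]:
    """One cache key per step, each chained on the key of the previous layer."""
    keys: List[str] = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        # COPY-like steps also depend on the content of every copied file;
        # RUN steps depend only on their command text and their parent.
        for path in sorted(copied):
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: List[Step], old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Instructions whose cache key changed, i.e. layers that must be rebuilt."""
    before, after = layer_keys(steps, old), layer_keys(steps, new)
    return [step[0] for step, a, b in zip(steps, before, after) if a != b]


if __name__ == "__main__":
    base = {"requirements.txt": "somepkg==1.0\n", "airflow/models.py": "x = 1\n"}
    print(rebuilt_steps(STEPS, base, base))
    print(rebuilt_steps(STEPS, base, {**base, "airflow/models.py": "x = 2\n"}))
    print(rebuilt_steps(STEPS, base, {**base, "requirements.txt": "somepkg==2.0\n"}))
```

In this model an unchanged tree rebuilds nothing, a source edit rebuilds only the final COPY, and a change to the pinned versions rebuilds all three steps, mirroring the three intended cases.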
Actual behaviour
Same machine, local checkout
● Local Docker registry
● Repeated build: 1:06m
● Only sources: 1:30m
● Dependencies: 11m
● Whole build: ~30m
CI case
● Always a fresh machine
○ no code
○ no registry
● Git clone/checkout
● Build
● Wipe out
Docker registry to the rescue!
Build cache:
● docker build -t airflow/airflow:latest .
● docker push airflow/airflow:latest
Use cache:
● docker pull airflow/airflow:latest
● docker build --cache-from airflow/airflow:latest -t airflow/airflow:latest .
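The two halves of the registry trick can be sketched as a small Python helper that shells out to the docker CLI. The image name comes from the slide; the build context "." and the choice to tolerate a failed pull (no cache image exists yet on the very first run) are assumptions. With BuildKit enabled, the pushed image must also carry inline cache metadata (for example by building with --build-arg BUILDKIT_INLINE_CACHE=1) before --cache-from can reuse it; this sketch does not add that.

```python
import subprocess
from typing import List

IMAGE = "airflow/airflow:latest"  # cache image name taken from the slide


def publish_cache_commands(image: str = IMAGE, context: str = ".") -> List[List[str]]:
    """Build and push an image so later builds can use it as a cache source."""
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


def use_cache_commands(image: str = IMAGE, context: str = ".") -> List[List[str]]:
    """Pull the previously pushed image and reuse its layers via --cache-from."""
    return [
        ["docker", "pull", image],
        ["docker", "build", "--cache-from", image, "-t", image, context],
    ]


def run(commands: List[List[str]]) -> None:
    for cmd in commands:
        # A missing cache image (first build) should not fail the whole job,
        # so only the pull is allowed to fail.
        subprocess.run(cmd, check=cmd[1] != "pull")


if __name__ == "__main__":
    run(use_cache_commands())
```

On a CI machine the use_cache_commands sequence replaces a plain build; a separate job (or the same one, on the main branch) runs publish_cache_commands to refresh the cache image.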
Actual behaviour
Docker Hub automated build
● Docker Hub registry used as cache
● Repeated build: 11m
● Only sources: 11m <- still OK
● Dependencies: ~1h
● Whole build: ~2h
Using the cache in Travis CI
● Docker Hub builds are slow
● Travis CI or Cloud Build use the earlier image with --cache-from
● Most of the time only the sources change
Actual BAD behaviour
Travis CI automated build
● Build on Travis CI with cache from Docker Hub
● Repeated build: 11m
● Only sources: 1h <-
● Dependencies: 1h
● Whole build: ~2h
Problem no. 1
Git & permissions
● git clone creates files owned by:
○ the local user
○ the user's default group
● file/dir permissions (rwxs):
○ Git tracks only whether a file is executable (100644 vs. 100755); directory permissions are not stored
○ all other bits, including group/other write, come from the umask at clone time
○ core.sharedRepository git-config
■ one of: group (true), all, umask (false), 0xxx
● Umask WTF:
○ file: 644 (Docker Hub) vs. 664 (Travis CI)
○ dir: 755 (Docker Hub) vs. 775 (Travis CI)
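The numbers on the slide fit a simple rule: Git requests mode 666 for regular files and 777 for directories (and executable files), and the umask then clears bits from that. The Python sketch below reproduces the arithmetic; reading Docker Hub as umask 022 and Travis CI as umask 002 is an inference from the slide's numbers, not something the talk states. Because the talk found these mode differences enough to change the COPY checksum, the same commit yields different cache keys on the two machines.

```python
def checkout_mode(umask: int, directory: bool = False, executable: bool = False) -> int:
    """Permission bits of a freshly cloned file or directory under a given umask.

    Git asks for 0o666 on regular files (0o777 on files tracked as executable)
    and 0o777 on directories; the kernel then clears the bits set in the umask.
    """
    requested = 0o777 if (directory or executable) else 0o666
    return requested & ~umask


if __name__ == "__main__":
    for name, umask in (("umask 022", 0o022), ("umask 002", 0o002)):
        print(name, oct(checkout_mode(umask)), oct(checkout_mode(umask, directory=True)))
```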
Solution to problem 1
Fix group permissions (normalize them before building, so every machine copies the same modes)
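One way to fix group permissions is to normalize the checkout before docker build runs, so that every machine presents the same modes as a umask-022 clone. The Python sketch below clears group and other write bits across the tree; the function name fix_group_permissions is hypothetical, and it assumes nothing in the build context legitimately needs group write access.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWGRP | stat.S_IWOTH  # group and other write permission


def _strip_write_bits(path: str) -> None:
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def fix_group_permissions(root: str) -> None:
    """Clear group/other write bits on every file and directory under root.

    Afterwards a clone made under umask 002 has the same permission bits as
    one made under umask 022. Symlinks are skipped because os.chmod would
    follow them to their target.
    """
    _strip_write_bits(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                _strip_write_bits(path)


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    source = os.path.join(demo, "setup.py")
    open(source, "w").close()
    os.chmod(source, 0o664)  # what a umask-002 clone would produce
    fix_group_permissions(demo)
    print(oct(stat.S_IMODE(os.stat(source).st_mode)))
```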
Problem no. 2
Generated files
● .gitignore is not enough
● generated files:
○ autoapi - documentation
○ build artifacts
○ npm cache
○ .pyc files
○ files created accidentally (wget in the source folder, anyone?)
● COPY . copies the whole build context
● the context (and the COPY checksum) is calculated from ALL files
● .dockerignore != .gitignore
○ slightly different syntax
Solution to problem 2
Ignore everything by default: start .dockerignore with ** and re-include the needed paths with !
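The ignore-everything approach turns .dockerignore into an allow-list: ** excludes the whole context, each !pattern re-includes a path, and the last matching line wins. The Python sketch below models those rules, including one of the syntax differences: .dockerignore patterns are anchored at the context root, so where .gitignore's *.pyc matches at any depth, .dockerignore needs **/*.pyc. It is a simplified model, not Docker's own matcher, and the allow-listed paths (airflow, setup.py) are illustrative.

```python
from __future__ import annotations

import re
from typing import List, Tuple

# Hypothetical allow-list style .dockerignore, one entry per line.
DOCKERIGNORE = ["# Ignore everything", "**", "!airflow", "!setup.py", "**/*.pyc"]


def _compile(pattern: str) -> re.Pattern:
    """Translate a .dockerignore-style glob into a regex (simplified)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def _parse(lines: List[str]) -> List[Tuple[bool, re.Pattern]]:
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        negate = line.startswith("!")
        rules.append((negate, _compile(line[1:] if negate else line)))
    return rules


def is_ignored(path: str, lines: List[str]) -> bool:
    """Last matching rule wins; a rule matching a parent directory applies too."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:k]) for k in range(1, len(parts) + 1)]
    ignored = False
    for negate, regex in _parse(lines):
        if any(regex.fullmatch(p) for p in prefixes):
            ignored = not negate
    return ignored


if __name__ == "__main__":
    for p in ["setup.py", "airflow/models.py", "airflow/__pycache__/m.pyc", "docs/_api/index.rst"]:
        print(p, "ignored" if is_ignored(p, DOCKERIGNORE) else "sent")
```

With these rules only setup.py and the airflow package (minus bytecode) reach the context, so regenerated docs or an accidentally downloaded file no longer change the COPY checksum.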
Problem no. 3
● Downloading & compiling ALL dependencies takes time!
Partial solution to problem 3
Find the weakest link
Solution to problem 3
a) Build an image with wheels
Solution to problem 3
b) Copy the wheels directory via multi-stage Docker builds
Solution to problem 3
c) Install using wheels
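The three steps fit together in one multi-stage build: a builder stage with compilers turns every dependency into a wheel once, and the final stage copies only the wheel directory and installs from it without contacting a package index. The Python sketch below renders such a Dockerfile together with the pip commands it uses; the /wheels directory, requirements.txt and the python:3.7 / python:3.7-slim base images are assumptions, not the talk's actual Airflow setup.

```python
from typing import List

WHEEL_DIR = "/wheels"  # assumed location of the wheel cache inside the images


def build_wheels_cmd(requirements: str = "requirements.txt") -> List[str]:
    """a) Download and compile every dependency once, into wheel files."""
    return ["pip", "wheel", "--wheel-dir", WHEEL_DIR, "-r", requirements]


def install_from_wheels_cmd(requirements: str = "requirements.txt") -> List[str]:
    """c) Install only from the prebuilt wheels, never contacting an index."""
    return ["pip", "install", "--no-index", "--find-links", WHEEL_DIR, "-r", requirements]


def render_dockerfile(builder: str = "python:3.7", runtime: str = "python:3.7-slim") -> str:
    """b) Multi-stage Dockerfile: the final stage copies just the wheel directory."""
    return "\n".join([
        f"FROM {builder} AS wheels",  # full image: has compilers for sdists
        "COPY requirements.txt .",
        f"RUN {' '.join(build_wheels_cmd())}",
        "",
        f"FROM {runtime}",
        f"COPY --from=wheels {WHEEL_DIR} {WHEEL_DIR}",
        "COPY requirements.txt .",
        f"RUN {' '.join(install_from_wheels_cmd())}",
        "COPY . .",  # sources last, so source edits only rebuild this layer
    ])


if __name__ == "__main__":
    print(render_dockerfile())
```

Because the wheels stage depends only on requirements.txt, it stays cached across source changes, and the slim runtime image never carries the compilers used to build the wheels.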
Thank You!
polidea.com/blog
