Using Docker in CI / Alexander Akbashev (HERE Technologies)

RIT++ 2017, Root Conf
Beijing + Shanghai hall, June 6, 17:00

Abstract:
http://rootconf.ru/2017/abstracts/2504.html

In this talk I explain why we decided to use Docker for Continuous Integration: to speed up tests, improve stability, and gain better control over the environment and the libraries in use.

The talk also covers in detail many of the difficulties we ran into while migrating to Docker: fighting the growing number and size of images, uncontrolled image updates, unstable behavior, and others.

At the end of the talk I show how exactly we monitor Docker's stability in our infrastructure, and how stable Docker is at large scale (more than 100k builds per day).

Published in: Engineering

  1. Alexander Akbashev
      RootConf | June 06, 2017
      Docker in Continuous Integration
  2. Agenda
      • Context
      • Very naive time
      • Start Project Docker
      • Something went wrong
      • Chaos
      • Still not perfect
      • New day - new challenges
      • Monitoring
      • Morale
  3. Context
      What a CI system means for us
  4. Context
      • Self-hosted Jenkins
      • Cloud based + target hardware in the labs
      • Tons of configured projects
      • All changes go through pre-commit validation pipelines
      • Different platforms and different products
      • Our users are our colleagues
  5. Day 0
      Very naive time
      Everything was so simple… to break
  6. Mutable host
      - Yes, I really want to change /etc/hosts for my integration test
      - …
  7. One agent - one package
      You don’t want to mix some stuff on one host
      • one version of python
      • one version of system library
      • one version of everything
  8. New package - new pain
      - Oops, I didn’t know that libXYZ-1.2 comes with a new API compared to libXYZ-1.1
  9. Painful verification process
      To test a new package you need:
      • new node
      • new label
      • cloned job (multiple jobs?)
      • … but it’s used in 100+ projects…
  10. Bad utilization
      Some nodes are needed only in rare cases
      I want to test only on CentOS 5! It’s my favourite production OS!
  11. Download to build
      Java, Python, Ruby, and Node.js tend to download stuff on-the-fly
  12. External dependency
      It’s not safe to query the Internet in pre-commit
      > Could not resolve commons-io:commons-io:2.4.
      > Could not get resource https://jcenter.bintray.com/commons-io/commons-io/2.4/commons-io-2.4.pom
      > Received status code 500 from server: Internal Server Error
  13. Day 0
      Start Project Docker!
  14. Docker is so awesome!
      • We can control docker content
      • CI builds are reproducible locally
      • Tests do not affect each other
      • We can cache stuff in docker
  15. Docker
      Small intro
  16. Definition
      Docker provides isolated user space
  17. Dockerfile
      FROM ubuntu:16.04
      RUN apt-get update
      RUN apt-get -y install gcc ccache cppcheck
  18. docker build
      Sending build context to Docker daemon 2.048kB
      Step 1/3 : FROM ubuntu:16.04
      ---> f49eec89601e
      Step 2/3 : RUN apt-get update
      ---> Running in c469408dd82f
      ---> 3964096123fa
      Removing intermediate container c469408dd82f
      Step 3/3 : RUN apt-get -y install g++ ccache cppcheck
      ---> Running in dc0e107be645
      Removing intermediate container dc0e107be645
      Successfully built 06f880788e38
  20. docker image history
      IMAGE          CREATED (ago)   CREATED BY                    SIZE
      06f880788e38   6 minutes       apt-get -y install gcc…       153MB
      3964096123fa   7 minutes       apt-get update                40MB
      <missing>      4 months        mkdir -p /run/...             7B
      <missing>      4 months        sed -i 's/^...                1.9kB
      <missing>      4 months        set -xe &...                  745B
      <missing>      4 months        (nop) ADD file 68f83d96c…     129MB
  24. docker image history
      IMAGE          CREATED (ago)   CREATED BY                    SIZE
      17534b008d4e   10 seconds      apt-get update && apt-ge…     153MB
      <missing>      4 months        mkdir -p /run/...             7B
      <missing>      4 months        sed -i 's/^...                1.9kB
      <missing>      4 months        set -xe &...                  745B
      <missing>      4 months        (nop) ADD file 68f83d96c…     129MB
  25. docker push
      uploads the image to central storage:
      • DockerHub
      • Artifactory
      • AWS ECR
      • default
  26. docker pull
      verifies that the image is up to date
      downloads layers
      extracts layers
  27. docker run
      pulls if the image doesn’t exist
      executes the command in a container
  28. Day 1
      Something went wrong
      Our expectations didn’t meet reality
  29. New image - new pain
      docker pull my_product:latest
      docker pull test:latest
      sha256:12d30ce421ad530494d588f87b2328ddc3ca
      Status: Downloaded newer image for test:latest
  30. New image - new pain
      docker pull my_product:latest
      docker pull test:latest
      sha256:12d30ce421ad530494d588f87b2328ddc3ca
      Status: Downloaded newer image for test:latest
      docker pull test:latest
      sha256:01a21daf124543213d1a0514523612345198
      Status: Downloaded newer image for test:latest
  32. Testing new images in pre-commit
      • tag as a version number
      • versioning is mandatory (no “latest” anymore!)
      • overrides are not allowed
      • actual version is defined in config file (pre-submit testable now)
  33. Timeouts
      “docker pull” times out
      docker pull my_image:1.0
      b6f892: Downloading [===========> ] XX MB/YY MB
      55010f: Downloading [============> ] XX MB/YY MB
      2955fb: Downloading [=============> ] XX MB/YY MB
  34. Timeouts
      New feature in Timeout Plugin -> Step with timeout
      All images are baked into the AMI itself
  35. Docker gets stuck
      --rm doesn’t guarantee much
      docker run --rm my_image:1.0 do_work.sh
  36. Docker gets stuck: trap for docker!
      Add trap for $BUILD_TAG
      trap "{
      docker ps -aq --filter name=$BUILD_TAG |
      xargs --no-run-if-empty docker rm -f --volumes || true;
      } &> /dev/null" EXIT
      docker run --rm --name=${BUILD_TAG} my_image:1.0 do_work.sh
  38. Lightweight docker image?!
      Docker images tend to become bigger and bigger
      from 500 MB… up to 3.2 GB
  39. Let’s share common stuff
      Base images
      • configs
      • user
      • packages
  40. Let’s share common stuff
      FROM base:1.0
      RUN apt-get install gcc-4.9 python

      FROM base:1.0
      RUN apt-get install gcc-4.9 nodejs

      Base images
  41. Let’s share common stuff
      Base images
      FROM ubuntu:16.04
      RUN apt-get install gcc-4.9

      FROM base:1.0
      RUN apt-get install python

      FROM base:1.0
      RUN apt-get install nodejs
  42. Day 2
      Chaos
      Duplicated code is not the worst duplication problem
  43. Docker image should do one thing only
      Need something? Just put it into the base image and enjoy!
  44. Docker image should do one thing only
      Split the base image into build and test images
      - base image for building
      - base image for testing (no -dev packages)
      - do not mix different tests
  45. Mandatory reviews
      Too many images -> too easy to copy/paste
  46. Mandatory reviews
      Restrict permissions to the repository
  47. Too many projects
      Hard to review:
      • Explain the same things multiple times
      • Argue
  48. Simplify review process
      Static analysis:
      • versions
      • number of layers
      • hardcoded values
      • etc.
  49. Day 3
      Still not perfect
      But already much better!
  50. Images are still big
      Hard to explain best practices each time:
      • --no-install-recommends
      • rm -rf /var/lib/apt/lists
      • apt-get clean
  51. etc/apt/apt.conf.d/docker-no-cache
      Dpkg {
          # Don't keep copies of packages after download
          Cache "";
          Cache::archives "";
          # Always delete list of packages
          Post-Invoke {"rm -rf /var/lib/apt/lists";};
      }
      APT {
          Install-Recommends "false";
      }
      DSELECT::Clean "always";
  55. External dependency
      It’s not safe to query the Internet in pre-commit
      Still.
  56. Restrict external resources
      --net=none
      --net=container:$BUILD_TAG
      are the only network modes allowed in pre-submit tests
      And a little bit more in builds
  57. Restrict resources
      All tests must be equal
      • thread starvation
      • oom-killer
      • prevent regressions
  58. Restrict external resources
      standard profiles and recommended values:
      --cpus
      --memory
  59. Docker registry returns 500
      Docker registry is down
      • everything is blocked
      • nothing is really needed from the registry
  60. Docker registry returns 500
      Don’t do `docker pull` if it’s not needed
      • check existing images
      • exclude “:latest”
      • not our registry
  61. Day X
      New day - new challenges
  62. Monitoring
      We monitor:
      - uptime for `docker run`
      - parameters
      - infra issues
  63. Build Failure Analyzer Plugin
      • docker: Error response from daemon: linux runtime spec devices: .+
      • docker: Error response from daemon: rpc error: code = 2 desc = "containerd: container did not start before the specified timeout"
      • docker: Error response from daemon: Cannot start container [0-9a-f]+: lstat .+
      • docker: Error response from daemon: shim error: context deadline exceeded.+
  64. Groovy Event Listener Plugin
      def bfa = run.getAction(FailureCauseBuildAction.class)
      // Safe navigation: bfa is null when no failure cause was recorded
      def causes = bfa?.getFailureCauseDisplayData()?.getFoundFailureCauses()
      if (causes) {
          for (def cause : causes) {
              final Map<String, Object> data = new HashMap<>()
              data.put("name", run.getParent().getFullName())
              data.put("cause", cause.getName())
              data.put("categories", cause.getCategories().join(','))
              data.put("timestamp", run.timestamp.timeInMillis)
              data.put("node", run.getExecutor().getOwner().getNode().getNodeName())
              logger.log("influx.bfa", data)
          }
      }
  69. FluentD
      <match influx.bfa>
        <store>
          @type influxdb
          host influxdb.internal
          port 8086
          dbname bfs
          tag_keys ["name","node","cause","categories"]
          timestamp_tag timestamp
          time_precision ms
        </store>
      </match>
  71. Docker issues per week
  72. Morale
  73. Morale
      • There is no silver bullet
      • Treat Dockerfiles as source code
      • Build monitoring for your CI
      • Docker is under development (still)
      • Docker really helps to stabilize CI pipelines
  74. Thank you
      Contact
      Alexander Akbashev
      HERE
      Invalidenstraße 116
      10115 Berlin
      GitHub: Jimilian
      alexander.akbashev@here.com
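
The static checks named on slide 48 (versions, number of layers) could be automated roughly as follows. This is a minimal Python sketch and not the tooling used in the talk: the function names, the default layer limit, and the choice to count only RUN, COPY and ADD as layer-creating instructions are assumptions made for illustration.

```python
# Hypothetical Dockerfile linter: checks pinned base versions and layer count.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}  # instructions that add a layer


def image_is_pinned(image):
    """True if the image reference carries a digest or a tag other than 'latest'."""
    if image == "scratch" or "@" in image:
        return True
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in last_segment:
        return False  # no tag means an implicit 'latest'
    return last_segment.rsplit(":", 1)[1] != "latest"


def lint_dockerfile(text, max_layers=10):
    """Return a list of findings for the given Dockerfile text."""
    findings = []
    layers = 0
    continued = False  # previous line ended with a backslash
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        was_continued = continued
        continued = line.endswith("\\")
        if was_continued or not line or line.startswith("#"):
            continue
        parts = line.split()
        instruction = parts[0].upper()
        if instruction == "FROM":
            # Skip options such as --platform=...; the next token is the image.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if args and not image_is_pinned(args[0]):
                findings.append(
                    f"line {number}: image '{args[0]}' has no fixed version tag"
                )
        elif instruction in LAYER_INSTRUCTIONS:
            layers += 1
    if layers > max_layers:
        findings.append(
            f"{layers} layer-creating instructions exceed the limit of {max_layers}"
        )
    return findings
```

Run in pre-submit against every changed Dockerfile, a check like this answers the repetitive review comments before a human reviewer has to.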

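Slide 60 recommends skipping `docker pull` when it is not needed. One possible reading of its three bullets is sketched below in Python: the pull is skipped only when the image is already present locally, has a fixed tag, and comes from our own registry. The function name `needs_pull` and the registry host `registry.example.internal` are hypothetical, and the interpretation of the bullets is an assumption rather than the author's confirmed logic.

```python
def needs_pull(image, local_images, our_registry):
    """Decide whether `docker pull` is required before `docker run`.

    local_images is a set of "repository:tag" strings, for example collected
    from `docker images --format '{{.Repository}}:{{.Tag}}'`.
    """
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    tag = last_segment.rsplit(":", 1)[1] if ":" in last_segment else "latest"
    if tag == "latest":
        return True  # mutable tag: the local copy may be stale
    if not image.startswith(our_registry + "/"):
        return True  # assumption: we do not control tags in foreign registries
    return image not in local_images  # versioned image of ours: pull only if absent


# Usage with a hypothetical registry host
registry = "registry.example.internal"
present = {registry + "/build:1.0"}
print(needs_pull(registry + "/build:1.0", present, registry))
print(needs_pull(registry + "/test:2.0", present, registry))
```

With versioned, never-overridden tags (slide 32), a registry outage then blocks only the builds that genuinely need a new image.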