Alexander Akbashev
RootConf | June 06, 2017
Docker in Continuous Integration
Agenda
• Context
• Very naive times
• Start Project Docker
• Something went wrong
• Chaos
• Still not perfect
• New day - new challenges
• Monitoring
• Moral
Context
What a CI system means for us
Context
• Self-hosted Jenkins
• Cloud-based + target hardware in the labs
• Tons of configured projects
• All changes go through pre-commit validation pipelines
• Different platforms and different products
• Our users are our colleagues
Day 0
Very naive times
Everything was so simple… to break
Mutable host
- Yes, I really want to change /etc/hosts for my integration test
- …
One agent - one package
You don’t want to mix stuff on one host:
• one version of python
• one version of system library
• one version of everything
New package - new pain
- Oops, I didn’t know that libXYZ-1.2 comes with a new API compared to libXYZ-1.1
Painful verification process
To test a new package you need:
• a new node
• a new label
• a cloned job (multiple jobs?)
• … but it’s used in 100+ projects…
Bad utilization
Some nodes are needed only in rare cases.
“I want to test only on CentOS 5! It’s my favourite production OS!”
Download to build
Java, Python, Ruby, and Node.js tend to download stuff on the fly
External dependency
It’s not safe to query the Internet in pre-commit:
> Could not resolve commons-io:commons-io:2.4.
> Could not get resource https://jcenter.bintray.com/commons-io/commons-io/2.4/commons-io-2.4.pom
> Received status code 500 from server: Internal Server Error
Day 0
Start Project Docker!
Docker is so awesome!
• We can control the image content
• CI builds are reproducible locally
• Tests do not affect each other
• We can cache stuff in docker layers
Docker
Small intro
Definition
Docker provides an isolated user space
Dockerfile
FROM ubuntu:16.04
RUN apt-get update
RUN apt-get -y install \
    gcc ccache cppcheck
docker build
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM ubuntu:16.04
---> f49eec89601e
Step 2/3 : RUN apt-get update
---> Running in c469408dd82f
---> 3964096123fa
Removing intermediate container c469408dd82f
Step 3/3 : RUN apt-get -y install g++ ccache cppcheck
---> Running in dc0e107be645
Removing intermediate container dc0e107be645
Successfully built 06f880788e38
docker image history
IMAGE CREATED (ago) CREATED BY SIZE
06f880788e38 6 minutes apt-get -y install gcc… 153MB
3964096123fa 7 minutes apt-get update 40MB
<missing> 4 months mkdir -p /run/... 7B
<missing> 4 months sed -i ’s/^... 1.9kB
<missing> 4 months set -xe &... 745B
<missing> 4 months (nop) ADD file 68f83d96c… 129MB
docker image history: after merging update and install into a single RUN
IMAGE CREATED (ago) CREATED BY SIZE
17534b008d4e 10 seconds apt-get update && apt-ge… 153MB
<missing> 4 months mkdir -p /run/... 7B
<missing> 4 months sed -i ’s/^... 1.9kB
<missing> 4 months set -xe &... 745B
<missing> 4 months (nop) ADD file 68f83d96c… 129MB
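
The single install layer above comes from merging both steps into one RUN; a reconstruction of the Dockerfile that would produce it:

FROM ubuntu:16.04
RUN apt-get update && apt-get -y install \
    g++ ccache cppcheck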
docker push
uploads an image to central storage (example below):
• DockerHub
• Artifactory
• AWS ECR
• default
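
For example (the registry hostname is a placeholder):

docker tag 06f880788e38 registry.internal/ci/base:1.0
docker push registry.internal/ci/base:1.0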
docker pull
• verifies that the image is up to date
• downloads layers
• extracts layers
docker run
• pulls the image if it doesn’t exist locally
• executes the command in a container
Day 1
Something went wrong
Our expectations didn’t match reality
New image - new pain
docker pull my_product:latest
docker pull test:latest
sha256:12d30ce421ad530494d588f87b2328ddc3ca
Status: Downloaded newer image for test:latest
docker pull test:latest
sha256:01a21daf124543213d1a0514523612345198
Status: Downloaded newer image for test:latest
Testing new images in pre-commit
• tag is a version number
• versioning is mandatory (no “latest” anymore!)
• overrides are not allowed
• the actual version is defined in a config file (pre-submit testable now; see the sketch below)
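
A minimal sketch of the idea, assuming the version is pinned in a reviewed file (the path and names are hypothetical):

# ci/image.version contains a single line, e.g. "1.7"
IMAGE_VERSION=$(cat ci/image.version)
docker run --rm "my_image:${IMAGE_VERSION}" do_work.sh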
Timeouts
“docker pull” times out
docker pull my_image:1.0
b6f892: Downloading [===========> ] XX MB/YY MB
55010f: Downloading [============> ] XX MB/YY MB
2955fb: Downloading [=============> ] XX MB/YY MB
Timeouts
New feature in the Timeout Plugin -> step with timeout
All images are baked into the AMI itself
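
A hedged shell-level sketch of the same idea: bound the pull so a stuck download fails fast instead of hanging the build (the 600s limit is an assumption):

timeout 600 docker pull my_image:1.0 || { echo "docker pull timed out" >&2; exit 1; }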
Docker gets stuck
--rm doesn’t guarantee much:
docker run --rm my_image:1.0 do_work.sh
Docker gets stuck: trap for docker!
Add a trap keyed on $BUILD_TAG:

trap "{
  docker ps -aq --filter name=$BUILD_TAG |
    xargs --no-run-if-empty docker rm -f --volumes ||
    true;
} &> /dev/null" EXIT

docker run --rm --name=${BUILD_TAG} my_image:1.0 do_work.sh
Lightweight docker image?!
Docker images tend to become bigger and bigger:
from 500 MB… up to 3.2 GB
Let’s share common stuff
Base images
• configs
• user
• packages
Let’s share common stuff
Base images

FROM base:1.0
RUN apt-get install gcc-4.9 python

FROM base:1.0
RUN apt-get install gcc-4.9 nodejs
Let’s share common stuff
Base images

FROM ubuntu:16.04
RUN apt-get install gcc-4.9

FROM base:1.0
RUN apt-get install python

FROM base:1.0
RUN apt-get install nodejs
Day 2
Chaos
Duplicated code is not the worst duplication problem
Docker image should do one thing only
“Need something? Just put it into the base image and enjoy!”
Docker image should do one thing only
Split the base image into build and test images (a sketch follows):
- a base image for building
- a base image for testing (no -dev packages)
- do not mix different tests
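
A hedged illustration of the split; the package names are placeholders:

# build base: toolchain and -dev packages
FROM base:1.0
RUN apt-get install -y g++ libfoo-dev

# test base: runtime packages only, no -dev
FROM base:1.0
RUN apt-get install -y libfoo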
Mandatory reviews
Too many images -> too easy to copy/paste
Restrict permissions on the repository
Too many projects
Hard to review:
• explaining the same things multiple times
• arguing
Simplify the review process
Static analysis checks (a sketch follows):
• versions
• number of layers
• hardcoded values
• etc.
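
A minimal sketch of one such check, assuming base images must be pinned to explicit versions (the regex is illustrative):

# fail the review bot on unpinned or ":latest" base images
if grep -nE '^FROM +[^: ]+( |$)|:latest' Dockerfile; then
  echo "ERROR: base image must be pinned to an explicit version" >&2
  exit 1
fi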
Day 3
Still not perfect
But already much better!
Images are still big
Hard to explain best practices each time:
• --no-install-recommends
• rm -rf /var/lib/apt/lists
• apt-get clean
/etc/apt/apt.conf.d/docker-no-cache

Dpkg {
  # Don't keep copies of packages after download
  Cache "";
  Cache::archives "";
  # Always delete the list of packages
  Post-Invoke {"rm -rf /var/lib/apt/lists";};
}
APT {
  Install-Recommends "false";
}
DSELECT::Clean "always";
External dependency
It’s not safe to query the Internet in pre-commit.
Still.
Restrict external resources
Only --net=none and --net=container:$BUILD_TAG are allowed in pre-submit tests
(and a little bit more in builds); see the sketch below
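
A hedged sketch of both modes (image and script names are placeholders):

# unit tests: no network at all
docker run --rm --net=none my_image:1.0 run_unit_tests.sh

# integration tests: share the network namespace of the main container
docker run -d --name="${BUILD_TAG}" my_image:1.0 sleep infinity
docker run --rm --net=container:"${BUILD_TAG}" test_image:1.0 run_integration_tests.sh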
Restrict resources
All tests must be equal:
• avoid thread starvation
• avoid the oom-killer
• prevent regressions
Restrict resources
Standard profiles and recommended values, for example:
--cpus
--memory
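
A minimal example of such a profile (the exact limits are assumptions):

docker run --rm --cpus=2 --memory=4g my_image:1.0 run_tests.sh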
Docker registry returns 500
Docker registry is down:
• everything is blocked
• yet nothing is really needed from the registry
Docker registry returns 500
Don’t do `docker pull` if it’s not needed (a sketch follows):
• check locally existing images first
• exclude “:latest”
• skip images that are not from our registry
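
A hedged sketch of the pull guard (the image name is a placeholder):

# pull only when the pinned tag is missing locally
if ! docker image inspect my_image:1.7 > /dev/null 2>&1; then
  docker pull my_image:1.7
fi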
Day X
New day - new challenges
Monitoring
We monitor:
- uptime for `docker run`
- parameters
- infra issues
Build Failure Analyzer Plugin
• docker: Error response from daemon: linux runtime spec devices: .+
• docker: Error response from daemon: rpc error: code = 2 desc = "containerd: container did not start before the specified timeout"
• docker: Error response from daemon: Cannot start container [0-9a-f]+: lstat .+
• docker: Error response from daemon: shim error: context deadline exceeded.+
Groovy Event Listener Plugin

def bfa = run.getAction(FailureCauseBuildAction.class)
if (bfa != null) {
    def causes = bfa.getFailureCauseDisplayData().getFoundFailureCauses()
    for (def cause : causes) {
        // one measurement per detected failure cause
        final Map<String, Object> data = new HashMap<>()
        data.put("name", run.getParent().getFullName())
        data.put("cause", cause.getName())
        data.put("categories", cause.getCategories().join(','))
        data.put("timestamp", run.timestamp.timeInMillis)
        data.put("node", run.getExecutor().getOwner().getNode().getNodeName())
        logger.log("influx.bfa", data)
    }
}
FluentD
<match influx.bfa>
<store>
@type influxdb
host influxdb.internal
port 8086
dbname bfs
tag_keys ["name","node","cause","categories"]
timestamp_tag timestamp
time_precision ms
</store>
</match>
Docker issues per week
Moral
• There is no silver bullet
• Treat a Dockerfile as source code
• Build monitoring for your CI
• Docker is (still) under active development
• Docker really helps to stabilize CI pipelines
Thank you
Contact
Alexander Akbashev
HERE
Invalidenstraße 116
10115 Berlin
GitHub: Jimilian
alexander.akbashev@here.com
