https://qconnewyork.com/ny2016/presentation/reaching-production-faster-with-containers-in-testing
Spotify adopted container technology early on and built its own OSS framework for container orchestration called Helios. Not only do containers run many critical systems at Spotify, they also improve and accelerate development. We run containerized integration tests close to 400 times a day.
This talk covers how our Helios testing framework drives integration tests and spins up entirely self-contained environments during test runs. Developers can test services locally in an environment closely resembling the production stack; spin up dependent services like Cassandra, memcached, or even other containerized Spotify services; and even test their deployment and service discovery configurations.
Learn how this style of integration testing has increased our code quality and successful deployments.
2. Spotify’s Scale
• 150+ people in infrastructure
• Thousands of hosts,
2000+ running containers
• 1500+ deploys in past month,
majority were containers
3. About David Xia
• Work on deployment infrastructure
• Work on open-source Docker orchestration tool Helios
4. Prerequisites for This Talk
• You’re familiar with containers
• You like tests
• That’s it!
5. Prerequisites for This Talk
You don’t need to use containers in
production for this talk to be useful!
6. Three Problems, Three Ways to Solve with Containers
How can I enable developers to:
• catch container misconfiguration in tests?
• easily install and start non-trivial test dependencies?
• make their tests isolated and reproducible?
7. “Why did my service pass
integration tests but fail when
I deployed it as a container?”
- Sad Developer 1
Problem 1: Container Misconfiguration
10. “I want to run my project's
integration tests locally. The tests
need a local Cassandra/other
DB. How do I set everything up?”
- Sad Developer 2
Problem 2: Non-trivial dependencies
16. Key Takeaways - Using Containers in Tests Can Help
You:
• Test more of the stack in an env
resembling production
• Easily start real dependencies
• Ensure tests are reproducible
and isolated
24. “Testing with a
Cassandra container is
the closest I can get to
testing against Cassandra
in reality.”
Successes
25. “I especially like the fact that I
can test my image in its final
state and be confident that it
will work in production.”
Successes
26. “Using helios-testing to run datastores in
containers has made the tests portable
and setup free (by setup I mean no
manual installation of the datastore on
the test machine or locally).”
Successes
27. Make sure your testing framework and
infrastructure are fast and reliable.
Lessons Learned
30. Key Takeaways - Using Containers in Tests Can Help
You:
• Test more of the stack in an env
resembling production
• Easily start real dependencies
• Ensure tests are reproducible
and isolated
31. When Not to Use Containers in
Tests
• Don’t test functionality unrelated
to containers that you can easily
test separately
• When your container-based tests
overlap a lot with regular
integration tests
early days, tech infrastructure was one small team
then Spotify grew [next]
today: 100+M users, 30+M subscribers, 55+ countries
needed more robust infra, tools, and shared services
Infra team grew and split into different teams with different responsibilities
[next] Has over 150 people today
[next] Help backend devs maintain thousands of hosts
We optimize for speed in iterating on product
several years ago, we realized we needed to implement and improve CI/CD practices
[next] today: over 1500 deploys last month, most were containers
There are very few prereqs for this talk
bc concepts are broadly applicable to variety of tools and technologies
Poll audience
Raise your hand if you’re familiar with containers or Docker.
Raise hand if you like tests
[familiar with containers]
[like tests]
#3…[that’s it!] There are no more prereqs
[drawing] The tests I’m talking about are high level and test the overall behavior of a service
can be considered more high level than ITs
Raise hand if you use containers in
dev
testing
prod
very important
[You don’t even need to be using containers in prod for this talk to be useful]
this talk is from POV of devops team
talk addresses “How can I create tools to make other devs’ jobs easier?”
[Why did my tests pass…?] [sad dev 1]
poll audience, who’s had this problem?
bc of mistake in how container was configured to run
Bad entrypoint or command
Missing file mount
Port not exposed
weird edge case: service jar inside container couldn’t start bc the Maven Shade plugin left out a dependency due to `minimizeJar` being set to true
in this case, dev forgot to expose the port for the service in the container config
[explain drawing]
solution was to deploy service + container during IT exactly as it would be deployed for prod
We start the container itself exactly as it will be started at deployment time. Fewer surprises
note that tests are running from outside the container not inside it
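The "port not exposed" mistake is the kind of thing a check from outside the container can catch: try to open a TCP connection to the published port. A minimal sketch using only the JDK; `PortProbe` and `isReachable` are hypothetical names for illustration, not part of helios-testing.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks from outside the container whether a TCP port accepts connections. */
final class PortProbe {
  private PortProbe() {}

  /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
  static boolean isReachable(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Connection refused or timed out, e.g. the port was never exposed.
      return false;
    }
  }
}
```

An integration test could call `PortProbe.isReachable(host, port, 1000)` after the container starts and fail fast with a clear message if it returns false.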
imagine a service that talks to cassandra, you want to write an integration test
cassandra in this case is a test dependency
you’ll inevitably run into this question:
[How do I install cassandra?]
poll audience
not that easy to do especially for someone who has never set it up before
googling, trial and error, banging head against the wall
just docker run!
[next]
Docker makes installing and starting Cassandra easy
Platform independent
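To make "just docker run" concrete, here is a sketch that assembles the command line in Java. `DockerRunCommand` is a hypothetical helper, and the image name `cassandra` and port 9042 (Cassandra's default CQL port) are assumptions for illustration; the talk does not say which image or ports were used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles a `docker run` command line for a test dependency. */
final class DockerRunCommand {
  private DockerRunCommand() {}

  /** Builds: docker run -d --name NAME -p HOSTPORT:CONTAINERPORT IMAGE */
  static List<String> build(String name, String image, int hostPort, int containerPort) {
    List<String> cmd = new ArrayList<>();
    cmd.add("docker");
    cmd.add("run");
    cmd.add("-d");                            // run detached, in the background
    cmd.add("--name");
    cmd.add(name);                            // name used later to stop/remove it
    cmd.add("-p");
    cmd.add(hostPort + ":" + containerPort);  // publish the container port on the host
    cmd.add(image);
    return cmd;
  }
}
```

Passing `DockerRunCommand.build("it-cassandra", "cassandra", 9042, 9042)` to `new ProcessBuilder(...)` would launch the container, assuming Docker is installed on the machine.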
poll audience
just stop and start a new instance of the container. Easy!
[Test more of the stack in an env resembling production]
if your service is running inside a container in prod, you are testing the container part works as well
Ensure configuration is correct
[Use containers to easily start real dependencies]
Datastores like cassandra and web servers like nginx are a hassle to install and start
Containers cut out the work of people figuring out how to install on different platforms
no more snowflake testing machines that no one remembers how to set up
Don’t have to run `docker` commands.
Can provide testing framework that lets users write code to start and stop dependencies
[reproducible and isolated]
to start with a fresh dependency, just drop the container and start a new one
special case: Can have mock data if you bake it into the image. It’s easy to reset it to the original mock data
I’ll walk through high level design of Helios
A helios job is a docker image with configuration for how to run it like ports, volumes, etc
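One way to picture "an image plus run configuration" is as a small immutable value. This is only a sketch: `JobSpec` and its field names are hypothetical and do not reflect Helios's actual job schema.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical job description; field names are illustrative, not Helios's actual schema. */
final class JobSpec {
  final String image;
  final Map<String, Integer> ports;   // port name -> container port
  final Map<String, String> volumes;  // container path -> host path

  JobSpec(String image, Map<String, Integer> ports, Map<String, String> volumes) {
    this.image = image;
    // Defensive copies so the description cannot change after creation.
    this.ports = Collections.unmodifiableMap(new LinkedHashMap<>(ports));
    this.volumes = Collections.unmodifiableMap(new LinkedHashMap<>(volumes));
  }

  /** True if the job declares a port under the given name. */
  boolean exposes(String portName) {
    return ports.containsKey(portName);
  }
}
```

With the configuration held as data, a test can assert on it directly, e.g. that the job `exposes("http")`, before any container is started.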
[next]
(talk about it)
helios-testing resulted from what Spotify needed at various times
In the beginning, we were running one instance of a service per host. Wasteful but conscious choice since Spotify optimized for speed over costs.
[multitenancy] We wanted to be more efficient and save money
[docker] We wanted isolation between service instances. Saw Docker as a path to this goal.
[container orchestration] We needed this and wrote our own since open-source tools didn’t exist yet
Spotify optimizes for speed instead of cost-savings so multi tenancy goal never became top priority.
What did become a top priority was faster, more frequent, and automated releases.
As a result, we needed great test coverage that built confidence for frequent releases
Using Docker helped achieve the CI/CD goal because it gave us immutable artifacts. Wanted hosts to be cattle not pets.
[container testing framework] Our devs needed a way to test their helios jobs
[lets you write code to start and stop containers]
Take the example of our three problems
problem 1 was where port wasn’t exposed and this wasn’t caught in ITs bc we didn’t test the container
[helios testing starts the service in a container and then runs the test]
problem 2 was where it was hard to start a test dependency like cassandra on OS X
[helios-testing allows you to easily start the cassandra container, then the service (which doesn’t have to be in container), then run the test]
problem 3 was where it was hard to restore test dependency to clean state for next test
[just stop the cassandra container, then start a new instance, then run next test]
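The "stop it, start a new one, run the next test" pattern can be sketched as a container that lives for exactly one test. This is not the helios-testing API: `ContainerRuntime`, `TemporaryContainer`, and `RecordingRuntime` are hypothetical names, and the runtime here is an in-memory stand-in so the sketch runs without Docker.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over whatever starts and stops containers. */
interface ContainerRuntime {
  String start(String image);      // returns a container id
  void stop(String containerId);
}

/** A container that lives for one test: started on creation, stopped on close(). */
final class TemporaryContainer implements AutoCloseable {
  private final ContainerRuntime runtime;
  private final String id;

  TemporaryContainer(ContainerRuntime runtime, String image) {
    this.runtime = runtime;
    this.id = runtime.start(image);
  }

  String id() {
    return id;
  }

  @Override
  public void close() {
    runtime.stop(id);
  }
}

/** In-memory stand-in that records calls instead of talking to Docker. */
final class RecordingRuntime implements ContainerRuntime {
  final List<String> events = new ArrayList<>();
  private int next = 0;

  @Override
  public String start(String image) {
    String id = image + "-" + (++next);
    events.add("start " + id);
    return id;
  }

  @Override
  public void stop(String containerId) {
    events.add("stop " + containerId);
  }
}
```

Wrapping each test body in `try (TemporaryContainer c = new TemporaryContainer(runtime, "cassandra")) { ... }` gives every test its own instance and stops it even when the test throws.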
helios-testing is java bc most of company writes java
integrated with JUnit because we mostly use JUnit as test framework
These are real quotes from Spotify backend devs
[quote]
local testing much harder with complex deps. it just won't happen. helios-testing enabled devs to run tests on laptops before committing and submitting a PR that’d take up a slot in build queue
faster feedback, fewer resources used up
no prompting or asking leading questions
their off-the-top-of-their-head answers are really close to the key takeaways I spent days thinking how to phrase
[fast and reliable]
Networking between remote CI cluster and office networks or docker registry was sometimes flaky and slow
this frustrated our users by slowing them down
[interface simple as possible]
helios-testing is complex and we spent lots of time on support when users threw their hands in the air bc of complexity
Make it easy to troubleshoot
otherwise, you’ll spend a lot of time debugging people’s tests
Have clear and concise error messages
Easy to find logs
Relevant logs are those from Docker daemon and Docker containers
One idea for making them easily accessible: test framework collects relevant parts of those logs and organizes them in an obvious place
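A simple reading of "relevant parts" is the last few lines of each log. The sketch below keeps only that tail; `LogTail` and `lastLines` are hypothetical names, and where the lines come from and where the result is written are left to the framework.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: keeps only the last `max` lines of a log. */
final class LogTail {
  private LogTail() {}

  static List<String> lastLines(Iterable<String> lines, int max) {
    Deque<String> window = new ArrayDeque<>();
    if (max <= 0) {
      return new ArrayList<>(window);  // nothing requested
    }
    for (String line : lines) {
      window.addLast(line);
      if (window.size() > max) {
        window.removeFirst();          // drop the oldest line once the window is full
      }
    }
    return new ArrayList<>(window);
  }
}
```

A framework could apply this to each container's log output after a failed test and save the result next to the test report.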
[provide great test examples]
We had a template that included a very simple test for them, but they never added more
Provide images for commonly used dependencies
Don’t do these and you’ll incentivize people to delete their tests which is bad
[just read the bullet points to drill home the takeaways]
Disadvantages of container-based tests
tests take longer to run
More infrastructure has to be available, e.g. image registry
More layers that can potentially break; and then devs have to debug these layers (more cognitive load)
What’s the division between non-container tests and container tests?
Container tests have more overhead
[Don’t test functionality unrelated to containers that you can easily test separately]
use them to test things that you couldn’t test with regular integration tests (e.g. realish deps)
Test that the container comes up, is configured correctly, and can respond to a request that’s representative of its main purpose
[When your container-based tests overlap a lot with regular integration tests]
Minimize redundancy between container tests and what could simply be covered by regular integration tests
Sangeeta Narayanan (a QCon organizer and Netflix engineering director)
Wesley Reisz (QCon Product Manager && Community Advocate)
Harry Brumleve (QCon committee member)