QCon NYC 2016: Reach Production Faster with Containers in Testing



Spotify adopted container technology early on and built its own OSS framework for container orchestration called Helios. Not only do containers run many critical systems at Spotify, they also improve and accelerate development. We run containerized integration tests close to 400 times a day.

This talk covers how our Helios testing framework drives integration tests and spins up entirely self-contained environments during test runs. Developers can test services locally in an environment closely resembling the production stack; spin up dependent services like Cassandra, memcached, or even other containerized Spotify services; and even test their deployment and service discovery configurations.

Learn how this style of integration testing has increased our code quality and successful deployments.

Published in: Technology
  • early days, tech infrastructure was one small team
    then Spotify grew [next]
    today: 100+M users, 30+M subscribers, 55+ countries
    needed more robust infra, tools, and shared services
    Infra team grew and split into different teams with different responsibilities
    [next] Has over 150 people today
    [next] Help backend devs maintain thousands of hosts
    We optimize for speed in iterating on product
    several years ago, we realized we needed to implement and improve CI/CD practices
    [next] today: over 1500 deploys last month, most were containers
  • There are very few prereqs for this talk
    bc concepts are broadly applicable to variety of tools and technologies
    Poll audience
    Raise your hand if you’re familiar with containers or Docker.
    Raise hand if you like tests

    [familiar with containers]
    [like tests]
    #3…[that’s it!] There are no more prereqs
    [drawing] Tests I’m talking about are high level and test overall behavior of a service
    can be considered higher level than ITs (integration tests)
  • Raise hand if you use containers in prod
    very important
    [You don’t even need to be using containers in prod for this talk to be useful]
  • this talk is from POV of devops team
    talk addresses “How can I create tools to make other devs’ jobs easier?”
  • [Why did my tests pass…?] [sad dev 1]
    poll audience, who’s had this problem?
    bc of mistake in how container was configured to run
    Bad entrypoint or command
    Missing file mount
    Port not exposed
    weird edge case: Service jar inside container couldn’t start bc maven shade plugin left out a dependency due to setting minimizeJar to true
  • in this case, dev forgot to expose the port for the service in the container config
    [explain drawing]
  • solution was to deploy service + container during IT exactly as it would be deployed for prod
    We start the container itself exactly as it will be started at deployment time. Fewer surprises
    note that tests are running from outside the container not inside it
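
The point about testing from outside the container can be sketched with a small probe. This is a minimal Java illustration, not helios-testing code: `PortProbe` and `isReachable` are hypothetical names, and a local listener stands in for a container with a published port. A check like this fails when the container config forgets to expose the service port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not part of Helios or helios-testing.
final class PortProbe {

  // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
  static boolean isReachable(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Refused or timed out: seen from outside, the service is not there.
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a container with a published port: a local listener on a free port.
    try (ServerSocket listener = new ServerSocket(0)) {
      System.out.println(isReachable("127.0.0.1", listener.getLocalPort(), 1000));
    }
  }
}
```

In a real test the host and port would be those the deployed container is expected to publish.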
  • imagine a service that talks to cassandra, you want to write an integration test
    cassandra in this case is a test dependency
    you’ll inevitably run into this question:
    [How do I install cassandra?]
    poll audience
  • not that easy to do especially for someone who has never set it up before
    googling, trial and error, banging head against the wall
  • just docker run!
    Docker makes installing and starting Cassandra easy
    Platform independent
  • poll audience
  • just stop and start a new instance of the container. Easy!
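
The start and reset steps can be written down as docker CLI argument lists. This is a minimal sketch with hypothetical names (`DockerCommands`, `runDetached`, `remove`, `reset`); it only builds the commands and does not run them. One assumption beyond the slides: a stopped container keeps its name, so the sketch uses `docker rm -f` before starting a new instance under the same name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that only builds docker CLI argument lists; it does not execute them.
final class DockerCommands {

  // Equivalent to: docker run --name <name> -d <image>
  static List<String> runDetached(String name, String image) {
    return Arrays.asList("docker", "run", "--name", name, "-d", image);
  }

  // Equivalent to: docker rm -f <name>
  // Force-removes the container (running or stopped) so its name can be reused.
  static List<String> remove(String name) {
    return Arrays.asList("docker", "rm", "-f", name);
  }

  // Commands that replace a dirty dependency with a fresh instance of the same image.
  static List<List<String>> reset(String name, String image) {
    return Arrays.asList(remove(name), runDetached(name, image));
  }

  public static void main(String[] args) {
    for (List<String> command : reset("foo", "cassandra")) {
      System.out.println(String.join(" ", command));
    }
  }
}
```

On a machine with Docker installed, each list could be passed to `new ProcessBuilder(command).inheritIO().start()`.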
  • [Test more of the stack in an env resembling production]
    if your service is running inside a container in prod, you are testing the container part works as well
    Ensure configuration is correct
    [Use containers to easily start real dependencies]
    Datastores like cassandra and web servers like nginx are a hassle to install and start
    Containers cut out the work of people figuring out how to install on different platforms
    no more snowflake testing machines that no one remembers how to set up
    Don’t have to run `docker` commands.
    Can provide testing framework that lets users write code to start and stop dependencies
    [reproducible and isolated]
    to start with a fresh dependency, just drop the container and start a new one
    special case: Can have mock data if you bake it into the image. It’s easy to reset it to the original mock data
  • I’ll walk through high level design of Helios
    A helios job is a docker image with configuration for how to run it like ports, volumes, etc
    (talk about it)
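
To make "image plus run configuration" concrete, here is a minimal sketch. `JobSpec` is a hypothetical class and not Helios's actual job schema; the port and volume values are example values. It renders the configuration as `docker run` arguments using the `-p` and `-v` flags.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "image plus run configuration"; not Helios's actual job schema.
final class JobSpec {
  private final String image;
  private final Map<Integer, Integer> ports = new LinkedHashMap<>();  // host port -> container port
  private final Map<String, String> volumes = new LinkedHashMap<>();  // host path -> container path

  JobSpec(String image) {
    this.image = image;
  }

  JobSpec port(int hostPort, int containerPort) {
    ports.put(hostPort, containerPort);
    return this;
  }

  JobSpec volume(String hostPath, String containerPath) {
    volumes.put(hostPath, containerPath);
    return this;
  }

  // Renders the spec as arguments for the docker CLI.
  List<String> toDockerRun() {
    List<String> args = new ArrayList<>();
    args.add("docker");
    args.add("run");
    args.add("-d");
    for (Map.Entry<Integer, Integer> port : ports.entrySet()) {
      args.add("-p");
      args.add(port.getKey() + ":" + port.getValue());
    }
    for (Map.Entry<String, String> volume : volumes.entrySet()) {
      args.add("-v");
      args.add(volume.getKey() + ":" + volume.getValue());
    }
    args.add(image);
    return args;
  }

  public static void main(String[] args) {
    JobSpec spec = new JobSpec("cassandra")
        .port(9042, 9042)
        .volume("/tmp/cassandra-data", "/var/lib/cassandra");
    System.out.println(String.join(" ", spec.toDockerRun()));
  }
}
```

If the `port(...)` call is left out, no `-p` flag is emitted. That is the kind of mistake behind Problem 1, and it only shows up when tests run against the container itself.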
  • helios-testing resulted from what Spotify needed at various times
    In the beginning, we were running one instance of a service per host. Wasteful but conscious choice since Spotify optimized for speed over costs.
    [multitenancy] We wanted to be more efficient and save money
    [docker] We wanted isolation between service instances. Saw Docker as a path to this goal.
    [container orchestration] We needed this and wrote our own since open-source tools didn’t exist yet
    Spotify optimizes for speed instead of cost-savings so multi tenancy goal never became top priority.
    What did become a top priority was faster, more frequent, and automated releases.
    As a result, we needed great test coverage that built confidence for frequent releases
    Using Docker helped achieve the CI/CD goal because it provides immutable artifacts. Wanted hosts to be cattle not pets.
    [container testing framework] Our devs needed a way to test their helios jobs
  • [lets you write code to start and stop containers]
    Take the example of our three problems
    problem 1 was where port wasn’t exposed and this wasn’t caught in ITs bc we didn’t test the container
    [helios testing starts the service in a container and then runs the test]
  • problem 2 was where it was hard to start a test dependency like cassandra on OS X
    [helios-testing allows you to easily start the cassandra container, then the service (which doesn’t have to be in container), then run the test]
  • problem 3 was where it was hard to restore test dependency to clean state for next test
    [just stop the cassandra container, then start a new instance, then run next test]
    helios-testing is java bc most of company writes java
    integrated with JUnit because we mostly use JUnit as test framework
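
The start, test, stop sequence in the three problems can be sketched as a small fixture. This is not helios-testing's API, which these notes do not show: `TemporaryContainer` is a hypothetical class, and the command runner is injected so the sketch can be dry-run without Docker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical fixture illustrating the start/test/stop lifecycle; not helios-testing's API.
final class TemporaryContainer implements AutoCloseable {
  private final String name;
  private final Consumer<List<String>> runner;

  // Starts the container immediately. The runner decides how commands are executed,
  // for example via ProcessBuilder in real use, or by recording them in a dry run.
  TemporaryContainer(String name, String image, Consumer<List<String>> runner) {
    this.name = name;
    this.runner = runner;
    runner.accept(Arrays.asList("docker", "run", "--name", name, "-d", image));
  }

  // Removes the container so the next test starts from a clean instance.
  @Override
  public void close() {
    runner.accept(Arrays.asList("docker", "rm", "-f", name));
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    Consumer<List<String>> dryRun = command -> log.add(String.join(" ", command));

    try (TemporaryContainer cassandra = new TemporaryContainer("foo", "cassandra", dryRun)) {
      log.add("run test against cassandra");
    }
    for (String line : log) {
      System.out.println(line);
    }
  }
}
```

In JUnit 4 the same before/after pairing is usually expressed as a rule, for example by extending `ExternalResource`, so test authors do not have to write the try block themselves.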
  • These are real quotes from Spotify backend devs
  • [quote]
    local testing much harder with complex deps. it just won't happen. helios-testing enabled devs to run tests on laptops before committing and submitting a PR that’d take up a slot in build queue
    faster feedback, less resources used up
    no prompting or asking leading questions
    their off-the-top-of-their-head answers are really close to the key takeaways I spent days thinking how to phrase
  • [fast and reliable]
    Networking between remote CI cluster and office networks or docker registry was sometimes flakey and slow
    this frustrated our users by slowing them down
  • [interface simple as possible]
    helios-testing is complex and we spent lots of time on support when users threw their hands in the air bc of complexity
    Make it easy to troubleshoot
    otherwise, you’ll spend a lot of time debugging people’s tests
    Have clear and concise error messages
    Easy to find logs
    Relevant logs are those from Docker daemon and Docker containers
    One idea for making them easily accessible: test framework collects relevant parts of those logs and organizes them in an obvious place
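
The log-collection idea could look roughly like this. It is a sketch under assumptions: `ContainerLogs` is a hypothetical class, the directory layout (one `<container>.log` file per container) is an arbitrary choice, and `collect` needs Docker on the machine because it shells out to `docker logs`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the file layout is an assumption, not something helios-testing prescribes.
final class ContainerLogs {

  // One predictable file per container: <dir>/<container>.log
  static Path logFile(Path dir, String container) {
    return dir.resolve(container + ".log");
  }

  // Writes the output of `docker logs <container>` (stdout and stderr merged) to the log file.
  // Requires Docker to be installed; returns the path of the file that was written.
  static Path collect(Path dir, String container) throws IOException, InterruptedException {
    Files.createDirectories(dir);
    Path file = logFile(dir, container);
    new ProcessBuilder("docker", "logs", container)
        .redirectErrorStream(true)
        .redirectOutput(file.toFile())
        .start()
        .waitFor();
    return file;
  }

  public static void main(String[] args) {
    // Shows only where the logs would land; does not call Docker.
    System.out.println(logFile(Paths.get("target", "it-logs"), "foo"));
  }
}
```

Printing the resulting path in the test failure message is one way to make the logs easy to find.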
  • [provide great test examples]
    We had a template that included a very simple test for them, but they never added more
    Provide images for commonly used dependencies
    Don’t do these and you’ll incentivize people to delete their tests which is bad
  • [just read the bullet points to drill home the takeaways]
  • Disadvantages of container-based tests
    tests take longer to run
    More infrastructure has to be available, e.g. image registry
    More layers that can potentially break; and then devs have to debug these layers (more cognitive load)
    What’s the division between non-container tests and container tests?
    Container tests have more overhead
    [Don’t test functionality unrelated to containers that you can easily test separately]
    use them to test things that you couldn’t test with regular integration tests (e.g. realish deps)
    Test that the container comes up, is configured correctly, and can respond to a request that’s representative of its main purpose
    [When your container-based tests overlap a lot with regular integration tests]
    Minimize redundancy between container tests and what could simply be covered by regular integration tests
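
A container test in this spirit can be as small as "wait until the service answers one representative request". The following is a minimal sketch: `SmokeCheck` and `awaitStatus` are hypothetical names, and the URL in `main` is a placeholder for whatever endpoint represents the service's main purpose.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for a "does the container come up and answer" check.
final class SmokeCheck {

  // Polls the URL until it answers or the attempts run out.
  // Returns the HTTP status code, or -1 if the service never answered.
  static int awaitStatus(String url, int attempts, long delayMillis) throws InterruptedException {
    for (int i = 0; i < attempts; i++) {
      try {
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setConnectTimeout(1000);
        connection.setReadTimeout(1000);
        try {
          return connection.getResponseCode();
        } finally {
          connection.disconnect();
        }
      } catch (IOException e) {
        // Not up yet (or not reachable): wait and retry.
        Thread.sleep(delayMillis);
      }
    }
    return -1;
  }

  public static void main(String[] args) throws InterruptedException {
    // Placeholder URL; pass the real endpoint as the first argument.
    String url = args.length > 0 ? args[0] : "http://localhost:8080/ping";
    System.out.println(awaitStatus(url, 10, 500));
  }
}
```

Detailed behavior of the endpoint is better covered by regular integration tests; the container test only needs to show that the deployed image starts and serves.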
  • Sangeeta Narayanan (a QCon organizer and Netflix engineering director)
    Wesley Reisz (QCon Product Manager && Community Advocate)
    Harry Brumleve (QCon committee member)
  • QCon NYC 2016: Reach Production Faster with Containers in Testing

    1. Reach Production Faster with Containers in Testing David Xia
    2. Spotify’s Scale • 150+ people in infrastructure • Thousands of hosts, 2000+ running containers • 1500+ deploys in past month, majority were containers
    3. About David Xia • Work on deployment infrastructure • Work on open-source Docker orchestration tool Helios
    4. Prerequisites for This Talk • You’re familiar with containers • You like tests • That’s it!
    5. Prerequisites for This Talk You don’t need to use containers in production for this talk to be useful!
    6. Three Problems, Three Ways to Solve with Containers How can I enable developers to: • catch container misconfiguration in tests? • easily install and start non-trivial test dependencies? • make their tests isolated and reproducible?
    7. “Why did my service pass integration tests but fail when I deployed it as a container?” - Sad Developer 1 Problem 1: Container Misconfiguration
    8. Problem 1: Container Misconfiguration
    9. Solution 1: Container Misconfiguration
    10. “I want to run my project's integration tests locally. The tests need a local Cassandra/other DB. How do I set everything up?” - Sad Developer 2 Problem 2: Non-trivial dependencies
    11. Problem 2: Non-trivial dependencies
    12. docker run --name foo -d cassandra - Happy Developer 2 Solution 2: Non-trivial dependencies
    13. Solution 2: Non-trivial dependencies
    14. “How can I easily restore my test dependencies to a clean state?” - Sad Developer 3 Problem 3: Reproducible Tests
    15. docker stop <container ID> docker run --name foo -d cassandra - Happy Developer 3 Solution 3: Reproducible Tests
    16. Key Takeaways - Using Containers in Tests Can Help You: • Test more of the stack in an env resembling production • Easily start real dependencies • Ensure tests are reproducible and isolated
    17. Helios in a Nutshell
    18. How helios-testing was born • multitenancy • docker • container orchestration (helios) • container testing framework (helios-testing)
    19. What does helios-testing do? Lets you write code to start and stop containers.
    20. What does helios-testing do? Lets you write code to start and stop containers.
    21. What does helios-testing do? Lets you write code to start and stop containers.
    22. Demo! Solution 1: Container Configuration https://www.youtube.com/watch?v=iWtTFI9zDfk
    23. Demo! Solution 2: Isolated Tests https://www.youtube.com/watch?v=GInAJSMd9cs
    24. “Testing with a Cassandra container is the closest I can get to testing against Cassandra in reality.” Successes
    25. “I especially like the fact that I can test my image in its final state and be confident that it will work in production.” Successes
    26. “Using helios-testing to run datastores in containers has made the tests portable and setup free (by setup I mean no manual installation of the datastore on the test machine or locally).” Successes
    27. Make sure your testing framework and infrastructure are fast and reliable. Lessons Learned
    28. Make framework’s interface and implementation as simple as possible. Lessons Learned
    29. Provide great test examples. Lessons Learned
    30. Key Takeaways - Using Containers in Tests Can Help You: • Test more of the stack in an env resembling production • Easily start real dependencies • Ensure tests are reproducible and isolated
    31. When Not to Use Containers in Tests • Don’t test functionality unrelated to containers that you can easily test separately • When your container-based tests overlap a lot with regular integration tests
    32. Acknowledgements Rohan Singh Matt Brown Staffan Gimåker Mats Linander Nic Cope
    33. Q&A @davidxia_ github.com/davidxia github.com/spotify/helios github.com/davidxia/qcon-demo Demo videos: example 1 and 2 helios-testing