Continuous testing at scale

You’ve built a new feature in your app that is ready to ship. Or is it? How can you be sure you’ve not introduced regressions in cases you forgot to test? What if your code crashes only on certain devices? Could the feature freeze up for a few users?

Shipping frequently with few or no functional, UX or performance regressions is not easy, but it is a problem that has been solved before. Things get a lot more interesting when you have to keep the same quality bar with hundreds of pull requests going in every day and tens or hundreds of developers working on the same project. How do you test that your app still works at this kind of scale?

In this talk, you’ll learn about the different approaches we combined into a system used by hundreds of mobile engineers at Uber to test our native iOS and Android apps during development, at release, and in production. We’ll talk about what did and what did not work for us on our journey of iterating frequently and continuously raising the quality bar.

Published in: Technology

  1. 1. Continuous Testing at Scale Gergely Orosz, Engineering Manager @GergelyOrosz 8 May 2018
  2. 2. ● Engineering manager @Uber, in Amsterdam ● 10+ years of software development (Skyscanner, Skype, JP Morgan alumni) ● Full-stack, iOS, Android, (Windows Phone) Introduction
  3. 3. War stories Trading systems Oil rig monitoring Xbox One launch Uber apps rewrites Payment systems
  4. 4. A war story: Uber app rewrite
  5. 5. Testing is hard
  6. 6. Iterating is part of the journey
  7. 7. Why do we test? We test to ship no bugs.
  8. 8. Bug-free code of substance But at what cost?
  9. 9. Why do we test? To minimize the business impact of mistakes while maintaining good execution speed.
  10. 10. We will cover testing of mobile apps. Still, a lot of the concepts apply across the stack.
  11. 11. Crashes Functional Bugs UI Bugs We test, so we can avoid:
  12. 12. … at Scale … at Uber … Tools & a Framework Continuous testing...
  13. 13. … at Scale … at Uber … Tools & a Framework Continuous testing...
  14. 14. Initial Team
  15. 15. Team Growth
  16. 16. ● 600+ cities, 65+ countries, 6 continents ● 10 engineering offices (4x US, Amsterdam, Denmark, 2x India, Sofia, Vilnius) ● 18,000+ people, of which 2,500+ engineers & 400+ mobile engineers Some Uber facts
  17. 17. Hundreds of mobile engineers? Request a ride Fare split Cash Uber for Business Credit card rewards points Promotions Promotions Safety Over 10 ways to pay Scheduled rides Drive with Uber Uber Eats, Freight, Bike, Rental... Experimentation 65+ countries, 600+ cities Performance Cash Instant payments Maps & navigation uberPOOL Driver incentives App health Developer tools Networking Feed cards Driver experience Driver recognition Airport pickup Uber Family Beacon Campaigns Fraud EATS app Driver app Freight app Restaurants app Other apps Fleet app
  18. 18. What can “at scale” mean? ● More functionality ● More users & regions, locales ● More code ● More engineers ● More engineering offices & locations ● More automated testing ● More apps
  19. 19. ● More functionality ● More users & regions, locales ● More code ● More engineers ● More engineering offices & locations ● More automated testing ● More apps What does “at scale” mean? ● More bugs ● Smaller/local bugs have bigger impact ● Longer build times ● Communication overhead ● Developer systems need to work 24/7 ● Longer time to run tests ● The same problems repeating Problems
  20. 20. What does “at scale” mean?
  21. 21. … at Scale … at Uber … Framework Continuous testing...
  22. 22. A few things I found different @Uber compared to my previous experience: ● No formal QA role, testing teams or dedicated DevOps team ● Dedicated team(s) owning testing infrastructure & developer tooling ● More formal planning process ● No staging systems: test tenancies instead ● Blameless postmortem culture Engineering culture
  23. 23. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users
  24. 24. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users
  25. 25. Continuous Integration arc diff Phabricator diff Local validations Code reviewers ● Commit message validation (e.g. test plan, revert plan) ● Linting Herald rules Rules like: ● “If certain files are touched, add {certain people} as reviewers” ● “If the files added contain a certain phrase, add a comment to the diff” Build results Do a build with: ● Linting ● Unit tests ● Static code analysis Create a pull request
  26. 26. Herald rules
  27. 27. Herald rule example
  28. 28. ● Our lint rules are extensive and have evolved since the early years ● NEAL: our language-agnostic linting platform (open sourced) Linting: a first class citizen
  29. 29. Continuous Integration arc diff Phabricator diff Local validations Lint, Build, Test Update the diff arc land “Merge to master” Code Repo Submit Queue Do a “full” build with: ● Linting ● Unit tests ● Static code analysis ● UI tests Build Result Validation pass
  30. 30. Build speeds matter (even) more, as the team grows
  31. 31. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users
  32. 32. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users ● Local checks (linting) ● Continuous Integration (linting, unit tests, static analysis) ● Code review ● Safe merging to master (UI tests, SubmitQueue)
  33. 33. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users
  34. 34. Ready for production release. Merge code to master Release candidate ? master Build cut Automated tests Manual tests
  35. 35. Manual testing (sanity)
  36. 36. Manual testing (sanity)
  37. 37. Test tenancy Staging Production code (master) Test accounts Production accounts Production accounts Test accounts Test tenancy Production tenancy Staged rollout code (master) Staging & production systems Production system with test tenancy
  38. 38. Ready for production release. Merge code to master Release candidate master Build cut Automated tests Manual tests Dogfooding bugreports
  39. 39. Dogfooding
  40. 40. Dogfooding: sending bug reports Bug reporter tool Phabricator ticket Take screenshot Teams triage
  41. 41. Ready for production release. Merge code to master Release candidate master Build cut Automated tests Manual tests Dogfooding bugreports Crash reports
  42. 42. Ready for production release. Merge code to master Release candidate master Build cut Automated tests Manual tests Dogfooding bugreports Crash reports Localization ... Fix Hotfix
  43. 43. Build Train
  44. 44. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users ● Manual testing (sanity) ● Dogfooding ● Crash reports ● Build train
  45. 45. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users
  46. 46. Facts ● Bugs will be introduced that none of the previous tests catch ● With native apps ○ New builds can take days to ship due to the app store approval process ○ Users might not update their apps for a while. Conclusion ● Every change should be revertable, remotely. ● Let’s use backend-controlled feature flags Rolling out to production on mobile
  47. 47. Remote Bugfixing: Feature Flags
  48. 48. Rollout can be risky if the population is large & there is no monitoring. Staged rollout ● Control user exposure in early stages via a feature flag ● Monitor the impact on key business metrics at each stage Rolling out to production (not just) on mobile
  49. 49. Ready for production release. Staged rollout Monitor Rolled out Rolling out a new feature
  50. 50. Staged rollout monitoring for business impact: statistically significant differences
  51. 51. Monitoring: business events
  52. 52. Monitoring: performance
  53. 53. Continuous testing process @Uber Write code & land to master Pre-release testing Ship to users ● Staged rollout ● Monitoring & alerting ○ Crash reports ○ Business events ○ Performance
  54. 54. The mobile testing lifecycle Write code & land to master Pre-release testing Ship to users In production Build cut Release Staged rollout & monitoring Code & functional quality checks Functional & UX quality checks, hotfixes Are we done testing? Rolled out
  55. 55. Things will catch fire
  56. 56. The mobile testing lifecycle Write code & land to master Pre-release testing Ship to users In production Build cut Release Staged rollout & monitoring Code & functional quality checks Functional & UX quality checks, hotfixes Uh-oh... Monitor & triage issues/alerts
  57. 57. The mobile testing lifecycle Write code & land to master Pre-release testing Ship to users In production Build cut Release Staged rollout & monitoring Code & functional quality checks Functional & UX quality checks Outages Uh-oh... Monitor & triage issues/alerts
  58. 58. How can we make sure this does not happen again?
  59. 59. Blameless postmortems
  60. 60. The goal of a postmortem Understand the root cause in order to take action to prevent the same issue from impacting customers again.
  61. 61. The 5 whys
  62. 62. The mobile testing lifecycle Write code & land to master Pre-release testing Ship to users In production Requirements & planning Product & engineering spec, with testing plan Outages & postmortems Uh-oh... “We did not do proper planning.” “We did not test this edge case.” “We did not have a test plan.”
  63. 63. The mobile testing lifecycle @Uber Write code & land to master Pre-release testing Ship to users In production Requirements & planning Staged rollout & monitoring Code level quality checks Functional & UX quality checks Outages & postmortems Monitor & triage issues/alerts Spec & testing plan Build cut Release Rolled out
  64. 64. … at Scale … at Uber … Tools & a Framework Continuous testing...
  65. 65. What worked for us will not (exactly) work for you.
  66. 66. Why do we test? To minimize the business impact of mistakes while maintaining good execution speed.
  67. 67. Continuous testing: tools Crashes Functional Bugs UI Bugs We test, so we can avoid: A few tools to detect / avoid:
  68. 68. Continuous testing toolset Crashes Functional Bugs UI Bugs ● Crash reports ● Crash report alerting ● Code reviews ● Unit testing ● UI testing ● Manual testing ● Dogfooding ● Staged rollout ● Manual testing ● Dogfooding ● Screenshot testing A few tools to detect / avoid:
  69. 69. Continuous testing toolset Crashes Functional Bugs UI Bugs A few tools to detect / avoid: ● Crash reports ● Crash report alerting ● Code reviews ● Unit testing ● UI testing ● Manual testing ● Dogfooding ● Staged rollout ● Manual testing ● Dogfooding ● Screenshot testing Other things impacting the business ● Business monitoring & alerting ● Performance testing / monitoring ● (Tools that might work for you)
  70. 70. Continuous testing toolset Crashes Functional Bugs UI Bugs A few off the shelf tools to detect / avoid: ● Crash reporting: Crashlytics ● Code reviews ○ GitHub ○ In-house: Phab ● CI ○ Travis CI / Bitrise ○ In-house: Jenkins ● Manual testing: crowdsourced platforms ● Screenshot testing ● UI testing ○ XCTest ○ Espresso Other things impacting the business ● Analytics: GA, Mixpanel ● In-house analytics: Kafka, Elasticsearch & Grafana + ML ● Performance testing ○ Xcode & Android Studio profilers
  71. 71. A framework to think about testing
  72. 72. The Continuous Testing Pyramid Manual tests UI tests Unit Tests Dog fooding Blameless postmortems Code reviews Continuous integration Monitor Alert Triage Things going wrong for customers Team owning testing infrastructure To make all of this scale: Improve processes & systems All engineers All engineers All engineers All teams All employees All teams
  73. 73. Continuous testing at Scale Why do we test? To minimize the business impact of mistakes while maintaining good execution speed. As you scale, iterate on the tools you use, your team structure & processes to keep doing this.
  74. 74. Gergely Orosz Engineering Manager, Uber Amsterdam Thank you Open sourced tools for more efficient testing ● uber.github.io ● Language-agnostic linting platform: NEAL ● Android ○ Nanoscope (tracing tool) ○ NullAway (static checks to avoid NullPointer exceptions) ○ OkBuck: use the Buck build system on a Gradle project @GergelyOrosz eng.uber.com
  75. 75. Proprietary and confidential © 2018 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.
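
Slides 46 to 49 recommend backend-controlled feature flags and a staged rollout that limits user exposure in the early stages. The Python sketch below is one possible way to illustrate that idea, not the implementation described in the talk. It assumes a hash-based bucketing scheme that the slides do not mention, and the names `bucket`, `is_enabled` and the flag `new_checkout` are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one flag.

    Hashing the flag name together with the user id is an assumption of
    this sketch: it keeps a user's bucket stable across sessions and makes
    buckets independent between flags.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage.

    In a real system rollout_percent would come from the backend, so the
    feature can be widened or reverted remotely without shipping a new build.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return bucket(user_id, flag_name) < rollout_percent


if __name__ == "__main__":
    # Hypothetical user ids and flag name, for illustration only.
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (0, 1, 10, 50, 100):
        exposed = sum(is_enabled(u, "new_checkout", percent) for u in users)
        print(f"{percent:>3}% rollout -> {exposed} of {len(users)} users exposed")
```

Because a user's bucket never changes, raising the percentage only adds users and never removes anyone already exposed, and setting it back to 0 turns the feature off for everyone. Monitoring of business metrics at each stage, as on slide 48, would sit outside this sketch.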
