Software Testing
April 30, 2021 - Anyscale
Andrew Wang
andrew@umbrant.com
Twitter: @umbrant
About Me
Airtable: Tech Lead, Storage and Caching
Scale AI: First staff-level engineering hire
Cloudera: Tech Lead, HDFS and ML platform
Berkeley: CS PhD track, distributed systems
UVa: B.S. Computer Science
War Stories
Therac-25, from Wikipedia
Mars Climate Orbiter, 1998
Personal experiences
● Incorrectly parsing an old version of a file format, producing an
erroneous empty result
● Under-calculating how much data to flush to disk
● Full site outage caused by a rogue query, followed by broad database
corruption from bad database restart procedure
● A “save” button that would almost always throw a 500
How to prevent software defects?
● Typechecker
● Static analysis
● Unit tests
● Integration tests
● System tests
● UI tests
● Manual tests
● Performance tests
● Canary tests
● ...and more!
Time to fix a...
A. Compiler error?       1 second
B. Unit test failure?    1 minute
C. Manual QA issue?      2 hours
D. Customer issue?       5-10 hours
Write your answers in chat!
Test Pyramid
https://martinfowler.com/articles/practical-test-pyramid.html
Test Ice Cream Cone
https://www.james-willett.com/the-evolution-of-the-testing-pyramid/
Test Trophy
https://kentcdodds.com/blog/write-tests
General principles
● Most of your test coverage should be fast and easy to run
● Write automated tests
● Write tests with different granularity
Unit tests
● Most granular type of testing
● Testing a single function, class, or component
● Narrow scope makes it easy to identify and isolate bugs
● Run fast (1 second)
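A minimal sketch of what a unit test can look like, using Python's built-in unittest module. The function bytes_to_flush and its rounding behavior are hypothetical, made up here only to have one small function to test; they are not taken from any real codebase.

```python
import unittest


def bytes_to_flush(buffered_bytes: int, block_size: int) -> int:
    """Round the buffered byte count up to a whole number of blocks.

    Hypothetical function, used only to have something small to test.
    """
    if buffered_bytes < 0 or block_size <= 0:
        raise ValueError("buffered_bytes must be >= 0 and block_size > 0")
    full_blocks = -(-buffered_bytes // block_size)  # ceiling division
    return full_blocks * block_size


class BytesToFlushTest(unittest.TestCase):
    def test_exact_multiple_is_unchanged(self):
        self.assertEqual(bytes_to_flush(8192, 4096), 8192)

    def test_partial_block_rounds_up(self):
        self.assertEqual(bytes_to_flush(4097, 4096), 8192)

    def test_empty_buffer_flushes_nothing(self):
        self.assertEqual(bytes_to_flush(0, 4096), 0)

    def test_invalid_block_size_is_rejected(self):
        with self.assertRaises(ValueError):
            bytes_to_flush(10, 0)
```

Saved as a file, this can be run with python -m unittest. Each test touches one function and no I/O, which is what keeps it fast and makes a failure point straight at the bug.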
Integration tests
● Tests multiple components together
● Multiple threads, processes, DBs, filesystem, etc
● Run in 10-100 seconds
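As a rough illustration, an integration test might exercise a component together with a real database and the real filesystem. The NoteStore class below is a hypothetical component invented for this sketch; SQLite and a temporary directory stand in for whatever storage a real system would use.

```python
import os
import sqlite3
import tempfile
import unittest


class NoteStore:
    """Hypothetical component under test: persists notes in a SQLite file."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
        )
        self._conn.commit()

    def add(self, body: str) -> int:
        cur = self._conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        self._conn.commit()
        return cur.lastrowid

    def get(self, note_id: int):
        row = self._conn.execute(
            "SELECT body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self._conn.close()


class NoteStoreIntegrationTest(unittest.TestCase):
    def setUp(self):
        # A fresh temporary directory per test keeps state from leaking.
        self._tmp = tempfile.TemporaryDirectory()
        self.db_path = os.path.join(self._tmp.name, "notes.db")

    def tearDown(self):
        self._tmp.cleanup()

    def test_note_survives_reopen(self):
        store = NoteStore(self.db_path)
        note_id = store.add("hello")
        store.close()

        # Reopen from disk: exercises the real database and filesystem.
        reopened = NoteStore(self.db_path)
        try:
            self.assertEqual(reopened.get(note_id), "hello")
        finally:
            reopened.close()
```

Compared with a unit test, a failure here could come from the component, the schema, or the filesystem, so it is slower to run and slower to diagnose.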
UI tests
● Golden age of frontend development
● React is pretty testable
● Cypress is awesome
System tests
● Testing multiple services in a realistic environment
● Full end-to-end customer workflows
○ Create a resource
○ Use the resource
○ Delete it
● Tests things that are expensive or limited
○ Uses something that you only have one of
○ Calling external services
○ Expensive operations
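The create / use / delete workflow could be sketched roughly as follows. The client API here (create, append_item, list_items, delete, exists) is entirely hypothetical, and FakeResourceClient is an in-memory stand-in so the sketch runs on its own. A real system test would point the same workflow at services deployed in a realistic environment.

```python
import uuid


class FakeResourceClient:
    """In-memory stand-in so the sketch runs; a real system test would
    use a client for the deployed service instead (hypothetical API)."""

    def __init__(self):
        self._resources = {}

    def create(self, name: str) -> str:
        resource_id = str(uuid.uuid4())
        self._resources[resource_id] = {"name": name, "items": []}
        return resource_id

    def append_item(self, resource_id: str, item: str) -> None:
        self._resources[resource_id]["items"].append(item)

    def list_items(self, resource_id: str) -> list:
        return list(self._resources[resource_id]["items"])

    def delete(self, resource_id: str) -> None:
        del self._resources[resource_id]

    def exists(self, resource_id: str) -> bool:
        return resource_id in self._resources


def run_workflow(client) -> None:
    """Create a resource, use it, and delete it, cleaning up on failure."""
    resource_id = client.create("system-test-resource")
    try:
        client.append_item(resource_id, "row-1")
        assert client.list_items(resource_id) == ["row-1"]
    finally:
        # Always delete, so a failed run does not leak expensive resources.
        client.delete(resource_id)
    assert not client.exists(resource_id)
```

The try/finally matters most when the resource is expensive or limited: a test that fails halfway should still release what it created.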
Manual tests
● Most flexible but also most expensive and slowest
● Less necessary these days, because of great testing libraries
● Generally want to avoid if possible
● Exceptions
○ During development
○ When there’s a site incident
○ The functionality is rarely used
○ The setup overhead is just too high (for now)
Continuous Integration
● Test every change
● Run different tests at different times, based on cost/speed
● Detect and identify bugs as early as possible
Continuous Integration
Stage         Additional Tests Run
Pre-commit    Unit + integration tests
Post-commit   UI tests
Nightly       System tests
Staging       Manual tests
Canary        Live user testing
Continuous Integration
● The faster the test suite, the more often you can run it
● My rule of thumb: getting a cup of coffee ☕
● Run tests in parallel, distributed across machines
○ https://www.umbrant.com/2016/08/25/distributed-testing/
○ 60x improvement for Hadoop’s test suite, 8.5 hours -> 8 minutes
● Testing can be 💰💰💰, but is generally worth it
○ $100s/mo per developer
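To make the parallelism point concrete, here is a sketch of splitting a test suite across workers. It uses a simple greedy heuristic (longest test first, onto the least-loaded worker); this is an illustrative assumption, not a description of how the linked distributed-testing setup works, and the test names and durations are invented.

```python
import heapq


def shard_tests(durations: dict, num_workers: int) -> list:
    """Assign tests to workers, longest first, always to the least-loaded worker."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Heap entries: (total seconds assigned, worker index).
    heap = [(0.0, i) for i in range(num_workers)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_workers)]
    for test, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        shards[worker].append(test)
        heapq.heappush(heap, (load + seconds, worker))
    return shards


def wall_clock(durations: dict, shards: list) -> float:
    """Wall-clock time is set by the slowest shard."""
    return max((sum(durations[t] for t in shard) for shard in shards), default=0.0)


# Hypothetical per-test durations in seconds.
durations = {
    "test_a": 120, "test_b": 90, "test_c": 60,
    "test_d": 30, "test_e": 30, "test_f": 30,
}
shards = shard_tests(durations, num_workers=3)
serial = sum(durations.values())
parallel = wall_clock(durations, shards)
print(f"serial={serial}s parallel={parallel}s speedup={serial / parallel:.1f}x")
```

The speedup is capped by the single longest test, so very slow tests are worth splitting up before adding more workers.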
Flaky tests
● Tests that spuriously fail x% of the time
● Can waste a lot of time triaging failures and retrying builds
● Kills trust in the test suite!
● Strategies
○ Temporarily disable flaky tests and fix with urgency
○ Make a dashboard of flaky rate per test
○ Track test flakiness over time to help bisect the suspect commit
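The data behind a flaky-rate dashboard can be quite simple. A sketch, assuming CI results are available as (test name, passed) pairs; the history below is invented, and a real dashboard would pull it from the CI system.

```python
from collections import defaultdict


def flake_rates(runs: list) -> dict:
    """Compute failures / total runs for each test.

    `runs` is a list of (test_name, passed) tuples, e.g. collected from
    CI results on commits that are believed to be good.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for test_name, passed in runs:
        totals[test_name] += 1
        if not passed:
            failures[test_name] += 1
    return {name: failures[name] / totals[name] for name in totals}


def flaky_tests(runs: list, threshold: float = 0.0) -> list:
    """Tests that fail sometimes but not always, worst first."""
    rates = flake_rates(runs)
    flaky = [(name, rate) for name, rate in rates.items() if threshold < rate < 1.0]
    return sorted(flaky, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical CI history.
history = [
    ("test_upload", True), ("test_upload", False),
    ("test_upload", True), ("test_upload", True),
    ("test_login", True), ("test_login", True),
    ("test_export", False), ("test_export", False),
]
for name, rate in flaky_tests(history):
    print(f"{name}: fails {rate:.0%} of runs")
```

A test that fails every time is treated as broken rather than flaky here. Computing the same rate per day or per commit range is one way to see when the flakiness started.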
Why do tests flake?
● Timing dependencies in multi-threaded applications
○ time.sleep() is a code smell
○ Use barriers/locks/condition variables instead
○ Use a FakeTicker class to advance system time
● Calling external services
○ Just don’t!
○ Spy your HTTP/RPC libraries to detect errant network calls
● Leaked global state
○ Run tests individually in isolation
○ Run tests in a deterministic random order
○ Don’t use statics
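A sketch of the FakeTicker idea: the code under test asks an injected clock for the time, and the test advances that clock by hand instead of calling time.sleep(). The interface shown (now, advance) and the ExpiringCache class are assumptions made for this example, not an existing library API.

```python
import time


class SystemClock:
    """Production clock: reads real monotonic time."""

    def now(self) -> float:
        return time.monotonic()


class FakeTicker:
    """Test clock: time only moves when the test calls advance()."""

    def __init__(self, start: float = 0.0):
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


class ExpiringCache:
    """Hypothetical code under test: entries expire after ttl seconds."""

    def __init__(self, ttl: float, clock):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def put(self, key, value) -> None:
        self._entries[key] = (value, self._clock.now() + self._ttl)

    def get(self, key):
        value, expires_at = self._entries.get(key, (None, 0.0))
        if self._clock.now() >= expires_at:
            self._entries.pop(key, None)
            return None
        return value


def test_entry_expires_without_sleeping():
    ticker = FakeTicker()
    cache = ExpiringCache(ttl=30.0, clock=ticker)
    cache.put("k", "v")
    ticker.advance(29.0)
    assert cache.get("k") == "v"
    ticker.advance(1.0)  # exactly at the TTL boundary
    assert cache.get("k") is None
```

The test covers a 30-second expiry in microseconds and gives the same result on a loaded CI machine, since nothing depends on how long real time takes to pass.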
Why tests are a developer’s best friend
● Fast Develop -> Test -> Debug loop
● Demonstrates that the code works
● Acts as a contract for the behavior of the code
○ Prevents other people from breaking your code
● Lets you fearlessly refactor the codebase
○ Prevents you from breaking other people’s code
What we didn’t cover
● Code review
● Design review
● Deploy process
● Monitoring and alerting
● Feature flags
Takeaway
● Write tests
● Write automated tests
● Write different kinds of tests
● Run your tests often
● Make the test suite fast
Resources
● Martin Fowler’s site: https://martinfowler.com/testing/
● JUnit docs: https://junit.org/junit5/docs/current/user-guide/#writing-tests
● Google Testing Blog: https://testing.googleblog.com/
● Uber: Keeping master green at scale
https://eng.uber.com/research/keeping-master-green-at-scale/
● Cindy Sridharan: Testing in Production, the safe way
https://copyconstruct.medium.com/testing-in-production-the-safe-way-18ca102d0ef1
● Automating safe, hands-off deployments (AWS):
https://aws.amazon.com/builders-library/automating-safe-hands-off-deployments/
