Agile development has become the norm. Though it fosters faster product development cycles, it often results in a higher number of functional and/or performance regressions. In an SOA setting such as Twitter's, such regressions may cascade from one service to others. Detecting these regressions manually is not practically feasible in light of the hundreds of services and the tens of thousands of metrics each service collects. To this end, we developed a novel tool called Diffy to automatically detect such regressions.
The key highlights of the talk are the following:
A simple yet effective approach for detecting functional regressions. False positives are minimized via statistical analysis of metrics obtained from a <primary, secondary, candidate> tuple of nodes, where the same traffic is sent to each node.
An ensemble approach for detecting performance regressions. The need for an ensemble of classifiers stems from the multifaceted characteristics of the performance data. To minimize the impact of hardware performance variability across nodes, we used two clusters, corresponding to the release candidate and the production code, instead of a tuple of nodes. The approach is robust against the presence of anomalies in the performance data.
The proposed techniques work well with per-minute data. Diffy has been in production use by multiple services at Twitter, and has been baked into the continuous build process so as to proactively detect functional and/or performance regressions.
We shall take the audience through how the techniques are being used at Twitter with REAL data.
4. “I just refactored a critical part of my service. How do I know I didn’t break anything?”
- Every Service Developer @ Twitter
5. “They just refactored a critical part of their service. How do I know they didn’t break anything?”
- Every Site Reliability Engineer @ Twitter
6. Tier #0 - Unit Tests
Cost
Writing good tests takes ~1.5x of development time
Limited Scope
Testing classes/methods in isolation
High Coverage % per Test
e.g. A method has 5 independent code paths
=> 1 test yields 20% coverage
7. Tier #1 - Component Tests
Testing a service in isolation with a fully mocked environment.
Cost of a single test
Same as unit tests
Low Coverage % per test
Cyclomatic complexity is O(k^n) - impractical to target 100%
Handpicked test cases
e.g. A request path has 6 methods with 5 paths per method
=> 1 test = 0.03% coverage
9. Tier #2 - Integration Tests
Testing a service and its downstream dependencies in a real (staging) environment
Cost
Same as unit tests + amortized cost of a staging environment
Negligible coverage per test
Much less than component tests
e.g. A request path has 4 services, 6 methods/service, 5 paths/method
12. Diffy Approach
Free test inputs
Sample production traffic or whatever traffic source you prefer
Free assertions
Use “known good” versions of your code to generate assertions
13. What about the noise?
Server generated timestamps
Random number generators
Downstream non-determinism
Race conditions
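The noise sources above are the reason Diffy runs two known-good instances side by side. A minimal Python sketch of that noise-canceling idea, assuming flat-dict responses; the function names and response format are illustrative, not Diffy's actual (Scala) API:

```python
# Illustrative sketch of noise-canceling diffs (not Diffy's actual API).
# Each sampled request is multicast to three instances:
#   primary, secondary -> both run the known-good (production) code
#   candidate          -> runs the release candidate
# Fields that differ between primary and secondary are nondeterministic
# (timestamps, random numbers, downstream races), so primary-vs-candidate
# differences on those fields are filtered out as noise.

def diff_fields(a: dict, b: dict) -> set:
    """Return the set of top-level fields whose values differ."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}

def regressions(primary: dict, secondary: dict, candidate: dict) -> set:
    noise = diff_fields(primary, secondary)   # nondeterministic fields
    raw = diff_fields(primary, candidate)     # all observed differences
    return raw - noise                        # differences blamed on new code

primary   = {"user": "ada", "ts": 1001, "score": 7}
secondary = {"user": "ada", "ts": 1002, "score": 7}   # ts is noisy
candidate = {"user": "ada", "ts": 1003, "score": 9}   # score changed

print(regressions(primary, secondary, candidate))  # {'score'}
```

The "free assertions" from slide 12 fall out of this: the primary/secondary pair generates both the expected values and the list of fields to ignore.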
16. Automation
Compare latest in master against last deploy to production
Automatically deploy master as candidate
Automatically deploy prod tag as primary and secondary
19. Performance Regression
Why is it challenging?
Software
New release
Hardware performance
Uncontrolled parameter
Makes robust analysis challenging
Large variability across nodes
20. Performance Regression: Diffy Approach
Observation
All target service instances see identical load
Key Idea
Discover all performance metrics (thousands of time series)
Compare reference instances to test instances
Report metrics with significant deviations
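One way to sketch that per-metric comparison step in Python; the metric names, the MAD-based deviation rule, and the threshold are illustrative assumptions, not Diffy's actual classifier:

```python
# Hypothetical sketch: for each metric, compare samples from the reference
# cluster (production code) against the test cluster (release candidate)
# and flag metrics whose median shifts far beyond the reference spread.
import statistics

def mad(xs):
    """Median absolute deviation: a robust measure of spread."""
    m = statistics.median(xs)
    return statistics.median(abs(x - m) for x in xs)

def deviating_metrics(reference, test, k=3.0):
    """reference/test: {metric_name: [samples]}. Flag large median shifts."""
    flagged = []
    for name, ref in reference.items():
        shift = abs(statistics.median(test[name]) - statistics.median(ref))
        spread = mad(ref) or 1e-9          # guard against zero spread
        if shift / spread > k:
            flagged.append(name)
    return flagged

reference = {"p99_latency_ms": [50, 52, 51, 49, 50], "qps": [1000, 1005, 995]}
test      = {"p99_latency_ms": [80, 82, 79, 81, 80], "qps": [1001, 998, 1003]}
print(deviating_metrics(reference, test))  # ['p99_latency_ms']
```

Using medians and MAD rather than means and standard deviations is one way to get the robustness to anomalies that the abstract mentions.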
23. Common Statistical Methods (contd.)
F-Test
H0: Means of a set of populations are equal
Two groups
F = t^2, where t is Student’s statistic
Assumptions
Normally distributed populations [1]
Equal variance (Homoscedastic)
Independent samples
[1] M. L. Tiku, “Power Function of the F-Test Under Non-Normal Situations”, Journal of the American Statistical Association, Vol. 66, No. 336 (Dec. 1971), pp. 913-916.
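The F = t^2 identity for two groups can be checked numerically. A stdlib-only sketch with made-up samples; the helper functions are illustrative, not from the talk:

```python
# Numerical check of the slide's claim F = t^2 for two groups, using only
# the standard library: pooled-variance Student's t and one-way ANOVA F.
import statistics

def t_statistic(a, b):
    """Two-sample Student's t with pooled variance (equal-variance case)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def f_statistic(a, b):
    """One-way ANOVA F for two groups: between-group over within-group variance."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    grand = statistics.mean(a + b)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2   # df = 1
    ss_within = (sum((x - ma) ** 2 for x in a) +
                 sum((x - mb) ** 2 for x in b))                    # df = na+nb-2
    return ss_between / (ss_within / (na + nb - 2))

a = [50.1, 49.8, 50.4, 50.0, 49.9]   # made-up latency samples, reference
b = [51.2, 51.0, 50.8, 51.3, 51.1]   # made-up latency samples, test
assert abs(f_statistic(a, b) - t_statistic(a, b) ** 2) < 1e-9
```

The normality and equal-variance assumptions listed above are exactly what makes the plain F-test fragile on raw production metrics.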
24. Other Previous Work
Similarity based
Match count
Longest subsequence based
Clustering
k-Means, phased k-Means
EM
Dynamic clustering
k-Medoids
Single linkage clustering
PCA, SVM
26. Classifiers
Sample count
Minimum number of samples
Relative Threshold
Variance within reference vs. distance between reference and test
Absolute Threshold
Distance between reference and test vs. median of reference
28. Classifiers (contd.)
Ensemble of Composable Classifiers
val classifier = {
  SampleCountClassifier(40) and (
    RelativeThresholdClassifier(50, 0.1) or
    AbsoluteThresholdClassifier(50, 0.1) or
    MadClassifier
  )
}
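The Scala snippet above composes classifiers with and/or. A hypothetical Python translation of the same idea; the class names mirror the slide, but the internals, parameters, and the &/| operator spelling are illustrative assumptions, not Diffy's actual code:

```python
# Illustrative sketch of composable classifiers combined with & and |.
import statistics

class Classifier:
    """A classifier votes True when the metric looks like a regression."""
    def __call__(self, reference, test):
        raise NotImplementedError
    def __and__(self, other):
        return _Combined(lambda r, t: self(r, t) and other(r, t))
    def __or__(self, other):
        return _Combined(lambda r, t: self(r, t) or other(r, t))

class _Combined(Classifier):
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, reference, test):
        return self.fn(reference, test)

class SampleCountClassifier(Classifier):
    """Only vote when both sides have enough samples to be meaningful."""
    def __init__(self, min_samples):
        self.min_samples = min_samples
    def __call__(self, reference, test):
        return len(reference) >= self.min_samples and len(test) >= self.min_samples

class AbsoluteThresholdClassifier(Classifier):
    """Flag when the median shift exceeds a percentage of the reference median."""
    def __init__(self, pct):
        self.pct = pct / 100.0
    def __call__(self, reference, test):
        ref_med = statistics.median(reference)
        return abs(statistics.median(test) - ref_med) > self.pct * abs(ref_med)

# Mirrors the shape of the Scala ensemble: a gate AND a deviation vote.
classifier = SampleCountClassifier(3) & AbsoluteThresholdClassifier(10)
print(classifier([100, 101, 99], [150, 152, 149]))  # True: median up ~50%
print(classifier([100, 101, 99], [101, 100, 102]))  # False: within 10%
```

The sample-count gate is ANDed in front so that noisy, thinly-sampled metrics never reach the threshold votes, which are ORed so any one deviation signal suffices.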