OSMC 2015 | Testing in Production by Devdas Bhagat
1. I have a test network. You may know it as
production.
– Andreas Thienemann
Testing
2. Asking the right questions
● Information is not a scarce resource, attention
is
– Herb Simon
3. Asking the right questions
● Information is not a scarce resource, attention
is
– Herb Simon
● What do you know?
4. Asking the right questions
● Information is not a scarce resource, attention
is
– Herb Simon
● What do you know?
● What do you not know?
5. Asking the right questions
● Information is not a scarce resource, attention
is
– Herb Simon
● What do you know?
● What do you not know?
● What do you not know that you do not know?
9. Fast Feedback Loops
● Test driven business
– It's like Test Driven
Design
● Enable IT to speak
the language of
business
10. Fast Feedback Loops
● Test driven business
– It's like Test Driven
Design
● Enable IT to speak
the language of
business
– Show me the data!
11. Fast Feedback Loops
● Test driven business
– It's like Test Driven
Design
● Enable IT to speak
the language of
business
– Show me the data!
– HIPPO
26. Testing vs Reality
● Stable environment
● No humans
● Low latency
● Highly unstable
● Humans
● Potentially high
latency
27. Testing vs Reality
● Stable environment
● No humans
● Low latency
● Not always the same
size of dataset
● Highly unstable
● Humans
● Potentially high
latency
● Large, ever changing
datasets
30. Users
● Humans do strange things
● Or sometimes make mistakes
● They come up with different requirements
31. Users
● Humans do strange things
● Or sometimes make mistakes
● They come up with different requirements
● They change the world your software works in
32. Risk management
● Approach 1:
– Scope your problems well
– Test a lot
– Release stable code
– Avoid changing a working system
33. Risk management
● Approach 2:
– Accept that you have an ill-defined problem
– Iterate rapidly
– Make a large number of small changes
– Build software to be able to isolate these changes
– Test them in the real world
– Keep only what works
34. What if When it breaks?
● Fix fast (maybe)
– “Do it right the first time” does not apply
● Business process for handling failure
– Hardware will eventually fail, software will
eventually work
36. The lifetime of code
● How long does your code live?
– Hours or days?
● This should be most code out there
37. The lifetime of code
● How long does your code live?
– Hours or days?
● This should be most code out there
– Months?
● A little code, often “libraries” with a single application
38. The lifetime of code
● How long does your code live?
– Hours or days?
● This should be most code out there
– Months?
● A little code, often “libraries” with a single application
– Years?
● Very little. Just core libraries
39. The lifetime of code
● How long does your code live?
– Hours or days?
● This should be most code out there
– Months?
● A little code, often “libraries” with a single application
– Years?
● Very little. Just core libraries
● It is safe to delete code, if you are using version
control
40. Event processing
● Generate information about software use and
changes in realtime
● For more information and tooling:
– https://www.quora.com/Are-there-any-open-source-
CEP-tools?share=1
– https://en.wikipedia.org/wiki/Complex_event_proces
sing
41. Event processing
● Generate information about software use and
changes in realtime
● For more information and tooling:
– https://www.quora.com/Are-there-any-open-source-
CEP-tools?share=1
– https://en.wikipedia.org/wiki/Complex_event_proces
sing
44. Monitoring/Alerting
● Process events to generate graphs
● Riemann is an excellent tool for generating
alerts from event streams
● Generate graphs as close to realtime as
possible
– Developers doing rollouts know that something else
is changing
45. Monitoring/Alerting
● Process events to generate graphs
● Riemann is an excellent tool for generating
alerts from event streams
● Generate graphs as close to realtime as
possible
– Developers doing rollouts know that something else
is changing
– Any major problems will be caught really quickly
46. Monitoring/Alerting
● Process events to generate graphs
● Riemann is an excellent tool for generating alerts
from event streams
● Generate graphs as close to realtime as possible
– Developers doing rollouts know that something else is
changing
– Any major problems will be caught really quickly
● Isolation of changes means that you can track
longer term effects of each change