3. @a_bangser
Why learn about observability?
Groundwork for many tools & techniques:
● Sustainable on-call
● Chaos engineering
● Testing in production
● Progressive rollouts
7. @a_bangser
Grounding the definition in capabilities
* https://lightstep.com/observability/
Observability helps you “understand your entire system and how it fits together, and then use that information to discover what specifically you should care about when it’s most important.”*
Observability is access to telemetry (data) that is both relevant and explorable.
8. @a_bangser
Our journey today
✓ Define observability
✓ Techniques that rely on observability
➔ Observability in testing today and the future
➔ Current data structures and pitfalls
➔ Where to focus our investment now
14. @a_bangser
Persona-based test charters are speculative
https://cdn.pixabay.com/photo/2012/04/28/17/11/people-43575__340.png
15. @a_bangser
So we extend into data-driven testing
SELECT MIN(column_name)
FROM table_name
WHERE condition;
https://stackoverflow.com/a/50507519/2035223
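Here is a minimal sketch of the data-driven idea. It assumes production telemetry has already landed in a SQL table. The `requests` table and its `endpoint` and `basket_size` columns are hypothetical, invented for illustration, and Python's built-in `sqlite3` stands in for whatever store you actually query. The query follows the `SELECT MIN(...) ... WHERE ...` shape above and turns observed boundaries into test inputs.

```python
import sqlite3

# Hypothetical telemetry table: one row per request observed in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (endpoint TEXT, basket_size INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("/checkout", 1), ("/checkout", 3), ("/checkout", 42), ("/search", 0)],
)

# Same shape as the slide's query: aggregate a column under a condition.
min_basket, max_basket = conn.execute(
    "SELECT MIN(basket_size), MAX(basket_size) FROM requests WHERE endpoint = ?",
    ("/checkout",),
).fetchone()

# Use the boundaries real users actually hit as test inputs,
# rather than guessing them from a persona.
test_inputs = sorted({min_basket, max_basket})
print(test_inputs)  # [1, 42]
```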
17. @a_bangser
Our journey today
✓ Define observability
✓ Techniques that rely on observability
✓ Observability in testing today and the future
➔ Current data structures and pitfalls
➔ Where to focus our investment now
18. @a_bangser
Revisiting our definition of Observability
* https://lightstep.com/observability/
Observability helps you “understand your entire system and how it fits together, and then use that information to discover what specifically you should care about when it’s most important.”*
Observability is access to telemetry (data) that is both relevant and explorable.
19. @a_bangser
These are sometimes referred to as the “3 pillars of observability”
https://images.contentstack.io/v3/assets/bltefdd0b53724fa2ce/bltf85be52d51892228/5c98d45f8e3cc6505f19f678/three-pillars-of-observability-logs-metrics-tracs-apm.png
20. @a_bangser
Quick recap on logs
Logs
23. @a_bangser
Log recap
Strengths:
+ Very detailed insights
+ Provides a clear order of operation
Weaknesses:
‐ No built-in relationship to a user’s goals
‐ Relies on a schema, so adding new data can be difficult
‐ Expensive to store
‐ Privacy risks for certain data
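The schema weakness is easiest to see in a small example. This sketch assumes JSON-formatted logs written with Python's standard `logging` module. The `JsonFormatter` class and the `order_id` field are hypothetical. The lines come out detailed and in order, but recording a new field (a user ID, say) means changing the formatter and every consumer that parses these lines.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Hypothetical fixed schema: one JSON object per log line."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Only fields named here make it into the log; new data needs a schema change.
            "order_id": getattr(record, "order_id", None),
        })

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("payment authorised", extra={"order_id": "A123"})
logger.info("order dispatched", extra={"order_id": "A123"})

# Lines preserve the order of operations.
lines = [json.loads(line) for line in stream.getvalue().splitlines()]
print([line["message"] for line in lines])  # ['payment authorised', 'order dispatched']
```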
24. @a_bangser
Deep dive on metrics
Metrics
25. @a_bangser
Metrics provide a story over time
Metrics
https://medium.com/@srpillai/deploying-prometheus-in-kubernetes-with-persistent-disk-or-configmap-1f47e1a34a2e
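This sketch shows how a counter tells a story over time, using hypothetical `(timestamp, count)` samples. It computes the increase and the per-second rate across a window, and it treats any drop in the counter as a reset. This is roughly the idea behind PromQL's `rate()`, but simplified: Prometheus also extrapolates to the window edges, and this sketch does not.

```python
# Hypothetical samples of a request counter: (unix_seconds, cumulative_count).
# The counter resets (e.g. a process restart) between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 175), (45, 10), (60, 40)]

def increase(samples):
    """Total increase over the window, treating any drop as a counter reset."""
    total = 0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so its new value is the increase.
        total += curr - prev if curr >= prev else curr
    return total

def per_second_rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed

print(increase(samples))  # 115
print(per_second_rate(samples))
```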
27. @a_bangser
How histograms are stored in a time series DB
http_requests_duration_microseconds
Buckets: le=1k | le=250k | le=500k | le=1M | le=5M | le=+inf
* `le` stands for “less than or equal to”
Metrics
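This sketch makes the bucket layout concrete. It renders cumulative bucket counts roughly in the shape of the Prometheus text exposition format: one `_bucket` series per `le` bound, plus `_sum` and `_count`. The counts and the sum are hypothetical values chosen for illustration.

```python
# Upper bounds from the slide (1k, 250k, 500k, 1M, 5M microseconds) plus +Inf.
BOUNDS = [1_000, 250_000, 500_000, 1_000_000, 5_000_000, float("inf")]
# Hypothetical cumulative counts: each bucket counts every request <= its bound.
COUNTS = [0, 3, 7, 9, 10, 10]

def exposition_lines(name, bounds, counts, total_sum):
    """One time series per bucket, plus the running sum and count."""
    lines = []
    for bound, count in zip(bounds, counts):
        le = "+Inf" if bound == float("inf") else str(bound)
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {total_sum}")
    lines.append(f"{name}_count {counts[-1]}")  # +Inf bucket == total count
    return lines

lines = exposition_lines("http_requests_duration_microseconds", BOUNDS, COUNTS, 2_900_000)
for line in lines:
    print(line)
```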
28. @a_bangser
How histograms get generated in a time series DB
http_requests_duration_microseconds
Request: www.website.com in 0.25 seconds
Buckets: le=1k | le=250k | le=500k | le=1M | le=5M | le=+inf
* `le` stands for “less than or equal to”
Metrics
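This minimal sketch shows what happens when the 0.25 second request is recorded, assuming cumulative buckets as above. The `Histogram` class is a hypothetical stand-in, not a real client library. Since 0.25 s is 250,000 µs, every bucket from `le=250k` upward is incremented.

```python
BOUNDS = [1_000, 250_000, 500_000, 1_000_000, 5_000_000, float("inf")]

class Histogram:
    """Minimal cumulative histogram sketch (hypothetical, not a client library)."""
    def __init__(self, bounds):
        self.bounds = bounds
        self.buckets = [0] * len(bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Cumulative buckets: increment every bucket whose upper bound is >= value.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1
        self.sum += value
        self.count += 1

h = Histogram(BOUNDS)
h.observe(0.25 * 1_000_000)  # the request to www.website.com in 0.25 s, in microseconds
print(h.buckets)  # [0, 1, 1, 1, 1, 1]
```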
37. @a_bangser
Metrics Recap
Metrics
Strengths:
+ Cheap to gather & store
+ High level view over long periods of time
+ Discrete numbers make for easy math
Weaknesses:
‐ Requires additional tools to debug
‐ Aggregated data
‐ Requires pre-determined questions
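A percentile estimate shows the "easy math" strength and the "aggregated data" weakness side by side. This sketch assumes hypothetical cumulative bucket counts. It interpolates linearly inside the bucket where the requested rank falls, which is similar in spirit to Prometheus's `histogram_quantile()`. The result is only an estimate, because the individual request durations are no longer available.

```python
# Hypothetical cumulative counts for the slide's buckets (microseconds).
BOUNDS = [1_000, 250_000, 500_000, 1_000_000, 5_000_000, float("inf")]
COUNTS = [0, 3, 7, 9, 10, 10]

def estimate_quantile(q, bounds, cumulative):
    """Estimate the q-quantile (0 < q < 1) from cumulative bucket counts."""
    rank = q * cumulative[-1]  # how many observations should sit at or below the answer
    lower_bound, lower_count = 0, 0
    for bound, count in zip(bounds, cumulative):
        if count >= rank:
            if bound == float("inf"):
                # No upper edge to interpolate towards: report the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly across this bucket.
            return lower_bound + (bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = bound, count
    return lower_bound

print(estimate_quantile(0.5, BOUNDS, COUNTS))  # 375000.0
```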
38. @a_bangser
Quick recap on traces
Traces (APM)
39. @a_bangser
Tracing answers “where”-based questions
https://monzo.com/blog/we-built-network-isolation-for-1-500-services
Each dot is one of 1,500 services
Each line is one possible network call
Each colour is a different team
40. @a_bangser
Tracing is a call stack for a distributed system
https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941
41. @a_bangser
What services, in what order, and for how long
https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941
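This sketch shows the "call stack" view. It assumes each span records its parent span ID, a start offset and a duration. The `Span` type and the service and operation names are hypothetical. Rebuilding the parent/child tree answers which services were called, in what order, and for how long.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the request
    service: str
    operation: str
    start_ms: int  # offset from the start of the request
    duration_ms: int

# Hypothetical spans recorded for a single request.
spans = [
    Span("a", None, "frontend", "GET /dispatch", 0, 700),
    Span("b", "a", "customer", "GET /customer", 5, 300),
    Span("c", "b", "mysql", "SQL SELECT", 10, 290),
    Span("d", "a", "driver", "FindNearest", 310, 200),
]

def render(spans):
    """Indent each span under its parent, ordering siblings by start time."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service}: {span.operation} "
                         f"(+{span.start_ms}ms, {span.duration_ms}ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines

for line in render(spans):
    print(line)
```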
43. @a_bangser
Recap on the current “3 pillar” approach
Traces (APM)
Metrics
Logs
Strengths:
+ Can support long term tracking and in the moment debugging
Weaknesses:
‐ Stores data in 3 different ways ($$$)
‐ Requires 3 different query languages
‐ Depends on knowing our questions upfront
44. @a_bangser
The 3 pillars are better suited to monitoring
Monitoring
46. @a_bangser
Our journey today
✓ Define observability
✓ Techniques that rely on observability
✓ Observability in testing and quality today
✓ What observability in testing and quality can be
✓ Current data structures and pitfalls
➔ Where to focus our investment now
47. @a_bangser
So let’s talk about the future
Events
59. @a_bangser
Explorability with the data in one place
● Key numbers like latency and count
● Key variables like IDs
● Correlation between services
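This minimal sketch emits one wide, structured event per unit of work. It keeps the key numbers, key variables and a correlation ID side by side in a single record. The handler, the field names and the `checkout` service are hypothetical, and printing JSON stands in for sending the event to an event store.

```python
import json
import time
import uuid

def handle_request(user_id, trace_id, cart_items):
    """Hypothetical handler that emits one wide event per request."""
    start = time.perf_counter()
    # ... the real work of the request would happen here ...
    event = {
        "service": "checkout",    # hypothetical service name
        "trace_id": trace_id,     # correlation between services
        "user_id": user_id,       # key variable: an ID
        "cart_items": cart_items, # key number: a count
        "duration_ms": round((time.perf_counter() - start) * 1000, 3),  # key number: latency
    }
    print(json.dumps(event))
    return event

event = handle_request("user-42", str(uuid.uuid4()), 3)
```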
60. @a_bangser
Explorability with the data in one place
Strengths:
+ All visualisations can be derived, including timelines and the big picture view
+ Full context of a user’s request
Weaknesses:
‐ Requires investment from engineers who know the app!
‐ Expensive to store
‐ Privacy risks for certain data
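Once events sit in one place, several different views can be derived from them. This sketch builds three: a per-service count, a latency figure and a timeline for a single request. The events and their fields are hypothetical and are held in memory for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical wide events, one per unit of work, collected in one place.
events = [
    {"trace_id": "t1", "service": "frontend", "ts": 0,  "duration_ms": 120, "user_id": "u1"},
    {"trace_id": "t1", "service": "checkout", "ts": 10, "duration_ms": 90,  "user_id": "u1"},
    {"trace_id": "t2", "service": "frontend", "ts": 5,  "duration_ms": 40,  "user_id": "u2"},
    {"trace_id": "t2", "service": "search",   "ts": 12, "duration_ms": 25,  "user_id": "u2"},
]

# Big-picture view: a metric-like count per service.
per_service = Counter(e["service"] for e in events)

# Latency view for one service.
frontend_median = median(e["duration_ms"] for e in events if e["service"] == "frontend")

# Timeline view: one request's events in order, like a trace.
timeline = [e["service"] for e in sorted(events, key=lambda e: e["ts"]) if e["trace_id"] == "t1"]

print(per_service, frontend_median, timeline)
```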
63. @a_bangser
There is work to do away from code too!
➔ Drive the virtuous cycle of deep domain knowledge supporting better data collection.
➔ Encourage the use of observability as a part of feature validation.
➔ Keep asking high-value questions and don’t settle until tools & data can answer them!