Why observability matters!
Jeroen Tjepkema
Product Director @ MeasureWorks
Observability:
Observability is a management
strategy focused on keeping the
most relevant, important and core
issues at or near the top of an
operations process flow.
What is performance?
Is this performance?
Uncertain waits...
Unfair waits...
http://www.nytimes.com/2012/08/19/opinion/sunday/why-waiting-in-line-is-torture.html?pagewanted=all&_r=0
Unexplained waits...
“Both offline and online we associate
bad performance with
poor customer service”
4 pillars of web performance
Reachability: Can I get there?
Availability: Is it working?
Performance: Fast or slow?
Reliability: Working consistently?
Time neutralisation:
When the delivery of quality is
noticeable but does not
influence the user’s preference of
one service over another.
This gives you a position to win
from your competitors.
The reality is, things fail…
Performance Budgets Incident Alerts Service Impact
Setting thresholds?
Dashboards galore!
…did we miss something?
[Chart: Typical downtime pattern. Y-axis: # pageviews (0 to 60); x-axis: Min/Hour (0 to 60). Phases marked: Regular operations, Alerts?, Downtime]
2 examples
Q: Can we
predict downtime?
[Chart: Example downtime pattern. Y-axis: # pageviews (0 to 60); x-axis: Min/Hour (0 to 60). Phases marked: Regular operations, Alerts?, Downtime]
This is where we want to be!
News
c sites
for 4 months:
Behaviour patterns are a
leading indicator to alerts
[Chart: Example downtime pattern. Y-axis: # pageviews (0 to 60); x-axis: Min/Hour (0 to 60)]
Time period in which we can detect a persistent
change in pattern, per type of monitoring:
Social alerting
Change in click behavior
Performance alerting
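Detecting a "persistent change in pattern" in pageviews per minute could be sketched as follows. This is a minimal illustration, not the method used in the talk: the function name, the `drop_ratio` of 0.7 and the `min_minutes` of 3 are all assumptions chosen for the example.

```python
from typing import Optional, Sequence


def first_persistent_drop(
    pageviews: Sequence[float],
    baseline: float,
    drop_ratio: float = 0.7,   # assumed: "low" means below 70% of baseline
    min_minutes: int = 3,      # assumed: "persistent" means 3 minutes in a row
) -> Optional[int]:
    """Return the minute at which a persistent drop starts, or None."""
    run_start: Optional[int] = None
    for minute, value in enumerate(pageviews):
        if value < baseline * drop_ratio:
            if run_start is None:
                run_start = minute  # a new low run begins here
            if minute - run_start + 1 >= min_minutes:
                return run_start
        else:
            run_start = None  # a single normal minute resets the run
    return None


# Hypothetical pageviews per minute: one isolated dip, then a lasting decline.
traffic = [50, 52, 49, 30, 51, 33, 31, 29, 10, 0]
start = first_persistent_drop(traffic, baseline=50)
```

Requiring several consecutive low minutes is what separates a change in behaviour from a single noisy sample. In this sketch the isolated dip is ignored, while the decline is flagged well before pageviews reach zero.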
Remember:
85-90% of all errors are caused by
degradation in user experience…
Another way to look at it
[Diagram labels: Alarm clock, Traffic, Being late]
Not understanding what normal behavior
looks like makes us do stupid things...
[Diagram labels: Alarm clock, Traffic, Power, Disaster, Weather, Bills, Health, GPS, Being late]
Measuring the behaviour of a
system turns silos into noise and
turns causation into likelihood
Predict performance?
Understand what normal is
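One way to make "normal" concrete is a band around the typical value of a metric, derived from its own history. The sketch below is an assumption for illustration, not something the slides prescribe: it uses the median and the median absolute deviation (MAD), and the function names and the band width of 3 are hypothetical choices.

```python
from statistics import median
from typing import Sequence, Tuple


def normal_band(history: Sequence[float], width: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) bounds of 'normal' as median +/- width * MAD."""
    center = median(history)
    # MAD: the median distance of the samples from their own median.
    mad = median([abs(x - center) for x in history])
    return center - width * mad, center + width * mad


def is_normal(value: float, band: Tuple[float, float]) -> bool:
    """True when the value falls inside the band."""
    low, high = band
    return low <= value <= high


# Hypothetical pageviews for the same minute on five earlier days.
history = [48, 50, 52, 49, 51]
band = normal_band(history)
```

The median and MAD are used here instead of the mean and standard deviation because a single past outage in the history would otherwise stretch the band and hide the next one.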
Enter: System Thinking
Making sense of the complexity of your
environment by looking at it as a
whole rather than by splitting it
into silos
[Diagram: Application Delivery Chain, built up over several slides]
Base chain: Digital Touchpoint, Internet, Load balancer, Web server, App server, DB, Datacenter
Also shown: Multi-vendor Management, (Multi) Cloud providers
Added with Delivery Quality: External data providers, CDN
Added with Marketing Efforts: 3rd parties
Added with Traffic: Performance, Marketing campaigns, Autonomous growth
Span of control?
At the same time we
want to speed up
cycle time…
Observability?
Observability:
Observability is a management
strategy focused on keeping the
most relevant, important and core
issues at or near the top of an
operations process flow.
System Topology
Example: Application Topology
System Telemetry
User Experience, Infrastructure metrics
Quality Gates
Automated Performance Test, staging environment
Applied AI
Error tracking
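A quality gate on an automated performance test could look like the sketch below: the build passes only if a chosen percentile of the measured response times stays within a budget. The function names, the 95th percentile and the sample numbers are assumptions for illustration, not values from the talk.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples (pct between 0 and 100)."""
    if not samples:
        raise ValueError("no samples to evaluate")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def passes_gate(response_times_ms: Sequence[float], budget_ms: float, pct: float = 95.0) -> bool:
    """True when the chosen percentile is within the performance budget."""
    return percentile(response_times_ms, pct) <= budget_ms


# Hypothetical response times (ms) from a test run on a staging environment.
run = [120, 130, 110, 150, 900, 140, 125, 135, 145, 115]
gate_ok = passes_gate(run, budget_ms=500)
```

Gating on a high percentile rather than the average means one very slow request in ten is enough to fail the run, which is closer to what users actually experience.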
Implementing observability will become
your predictive superpower...
Shift left, Shift Right?
Work smarter, faster…
Thanks! More questions?
M: jtjepkema@measureworks.nl
T: @jeroentjepkema
W: www.measureworks.nl
MeasureWorks - Performance Labs - Why Observability Matters!