There you are… an agile company building cloud-native microservices: everything is automated and you deploy multiple times a day. You think you've ticked all the DevOps boxes, but when does the money start flowing in?
Maybe your users don't like the way your software works; maybe you don't actually have any users. How do you know? Traditional monitoring tools fall short here: they can't give you the insight you need to tell whether your distributed system is still working.
"Observability-driven development" is a way of building observable systems, so you can see whether your system is actually working and delivering value to your customers.
Once you can observe your production environment, you can learn from it, experiment on it (in production!) and ultimately improve it. In this session we'll discuss what it means to create observable systems and how you can start adding observability to your own systems, helping everyone from developers to product owners make better decisions about adding value to their software.
5. SINCE WE ALL LOVE BEER
I BROUGHT SOME DUTCH BEERS!!
FIND THIS LOGO DURING MY
PRESENTATION, TAKE A PICTURE,
TWEET IT
MENTION/FOLLOW @GEERTVDC
AND WIN BEER!
23. BUT I USE STAGING?
DOES STAGING HAVE REAL DATA?
DOES STAGING HAVE REAL USERS?
DOES STAGING REPRESENT PRODUCTION ENOUGH?
HOW MUCH TIME DO YOU SPEND ON STAGING?
24. WHAT IS KEY TO TESTING ON PROD?
OBSERVABILITY
@GEERTVDC
25. OBSERVABILITY
“OBSERVABILITY IS A MEASURE OF HOW
WELL INTERNAL STATES OF A SYSTEM CAN
BE INFERRED FROM KNOWLEDGE OF ITS
EXTERNAL OUTPUTS”
CONTROL THEORY
26. WHAT IS THE DIFFERENCE
WITH MONITORING?
55. METRICS
EXAMPLE: REQUEST DURATION
A 50 MILLISECOND REQUEST IS 15 MILLISECONDS
HIGHER THAN AVERAGE
IN MECHELEN
ON FRIDAYS
FOR PEOPLE WHO LIKE BEER
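The slide's point is that a single average hides who is affected: you need the raw events with their dimensions (city, day, user traits) to slice the data. A minimal sketch of that idea in Python (the numbers and field names are invented for illustration, not from the talk; the deck's own code is C#):

```python
import statistics

# Each request is recorded as a structured event: the duration plus
# every dimension we might want to slice by later.
events = [
    {"duration_ms": 35, "city": "Mechelen", "day": "Friday", "likes_beer": True},
    {"duration_ms": 50, "city": "Mechelen", "day": "Friday", "likes_beer": True},
    {"duration_ms": 30, "city": "Antwerp",  "day": "Monday", "likes_beer": False},
    {"duration_ms": 25, "city": "Mechelen", "day": "Monday", "likes_beer": False},
]

overall_avg = statistics.mean(e["duration_ms"] for e in events)

# Slice the same data by dimensions: Friday beer lovers in Mechelen.
segment = [e["duration_ms"] for e in events
           if e["city"] == "Mechelen" and e["day"] == "Friday" and e["likes_beer"]]
segment_avg = statistics.mean(segment)

print(f"overall {overall_avg:.1f} ms, segment {segment_avg:.1f} ms, "
      f"delta {segment_avg - overall_avg:+.1f} ms")
```

With pre-aggregated metrics you would only ever see the overall average; keeping the dimensions on each event is what makes the "in Mechelen, on Fridays, for people who like beer" question answerable at all.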
69. EXPERIMENT IN PRODUCTION
public bool CanAccess(IUser user)
{
    return Scientist.Science<bool>("widget-permissions", experiment =>
    {
        experiment.Use(() => IsCollaborator(user)); // old way
        experiment.Try(() => HasAccess(user)); // new way
    }); // returns the control value
}
SCIENTIST.NET
https://github.com/scientistproject/Scientist.net
76. PLAN DESIGN DEVELOP TEST DEPLOY OPERATE
OBSERVABILITY DRIVEN DEVELOPMENT
TDD
WHAT IS THE USER IMPACT?
IS THE FEATURE BEHAVING
LIKE WE EXPECTED?
DEPLOYMENT FEEDBACK
77. KNOWING HOW OUR SYSTEM
OPERATES SHOULD BE PART OF
OUR JOB AS DEVELOPERS
WHAT IS NORMAL?
RELEASE GATES TO NEXT STAGE?
84. SLI SERVICE LEVEL INDICATOR
QUANTITATIVE MEASURE FOR YOUR SERVICE
AVAILABILITY
ERROR RATE
DURATION
LATENCY
FRESHNESS
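The SLIs listed above are all ratios or percentiles computed from raw request data. A small Python sketch of how availability, error rate and a latency percentile might be derived from a request log (the log itself and the thresholds are invented for illustration):

```python
import math

# Hypothetical request log: (status_code, duration_ms) per request.
requests = [(200, 40), (200, 55), (500, 120), (200, 38), (200, 45),
            (200, 62), (503, 200), (200, 41), (200, 47), (200, 50)]

total = len(requests)
errors = sum(1 for status, _ in requests if status >= 500)

availability = (total - errors) / total  # SLI: share of successful requests
error_rate = errors / total              # SLI: share of failed requests

# SLI: latency, here the 90th-percentile duration (nearest-rank method).
durations = sorted(d for _, d in requests)
p90 = durations[math.ceil(0.9 * len(durations)) - 1]

print(f"availability {availability:.0%}, error rate {error_rate:.0%}, p90 {p90} ms")
```

Each indicator is a plain number over a window, which is exactly what makes it usable as a target in the SLO on the next slide.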
85. SLO SERVICE LEVEL OBJECTIVE
TARGET MEASURE FOR A SERVICE
MEASURED BY SLIS
AVAILABILITY OF 99.9% FOR LAST 30 DAYS
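An SLO like "99.9% availability over 30 days" implies an error budget: the amount of unavailability you can afford before breaking the target. The arithmetic, sketched in Python:

```python
# A 99.9% availability SLO over a 30-day window leaves an error budget:
# the total time the service may be unavailable without missing the SLO.
slo = 0.999
window_minutes = 30 * 24 * 60  # 43,200 minutes in a 30-day window

error_budget_minutes = (1 - slo) * window_minutes
print(f"error budget: {error_budget_minutes:.1f} minutes per 30 days")
```

That works out to roughly 43 minutes of allowed downtime per month, which is the concrete number teams can spend on risky deployments or experiments.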
86. SLA SERVICE LEVEL AGREEMENT
CONTRACT WITH USERS WITH
CONSEQUENCES FOR
MISSING YOUR SLO
10% DISCOUNT FOR EACH 0.1%
BELOW AVAILABILITY SLO
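The penalty clause on this slide is simple step arithmetic. A sketch of how such a clause could be computed (the function and cap are my illustration of the slide's example, not a real contract):

```python
# Hypothetical penalty clause: 10% discount for every full 0.1 percentage
# point the measured availability falls below the SLO, capped at 100%.
slo_pct = 99.9

def discount(measured_pct: float) -> float:
    shortfall = max(0.0, slo_pct - measured_pct)
    steps = int(round(shortfall * 10))  # full 0.1-point steps below the SLO
    return min(steps * 10, 100)         # 10% per step, never more than 100%

print(discount(99.95))  # met the SLO: no discount
print(discount(99.6))   # 0.3 points short of the SLO
```

The rounding guards against floating-point artifacts (99.9 - 99.6 is not exactly 0.3 in binary floating point).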
88. HOW TO DO THIS IN PRACTICE?
DEFINE AN SLO
BUILD INDICATORS BY LOGGING / METRICS
BUILD A DASHBOARD – START MEASURING
MAKE CHOICES BASED ON SERVICE LEVEL
LEAVE THE SLA PART TO THE SALES PEOPLE
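"Make choices based on service level" connects back to the release gates mentioned earlier: for example, only promote a deployment to the next ring while enough error budget remains. A sketch of such a gate in Python (an assumed design for illustration, not a specific product's API):

```python
# Release gate: promote to the next stage only while enough of the
# window's error budget is still unspent.
def gate_open(slo: float, good_events: int, total_events: int,
              min_budget_left: float = 0.2) -> bool:
    """True if at least `min_budget_left` of the error budget remains."""
    if total_events == 0:
        return True  # no traffic yet: nothing to judge
    allowed_bad = (1 - slo) * total_events
    actual_bad = total_events - good_events
    budget_left = 1 - (actual_bad / allowed_bad) if allowed_bad else 0.0
    return budget_left >= min_budget_left

# 99.9% SLO, 100,000 requests: ~100 failures allowed, 40 seen -> plenty left
print(gate_open(0.999, good_events=99_960, total_events=100_000))  # True
# 95 failures seen -> only ~5% of the budget left: hold the release
print(gate_open(0.999, good_events=99_905, total_events=100_000))  # False
```

This is the "release gates to next stage" idea made mechanical: the service level, not a calendar, decides whether the next ring gets the release.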
93. MAKE IT VISIBLE
SLO: AVAILABILITY 99.9954%
RING 0: 98% · RING 1: 99.91% · RING 2: 100%
USER SIGN UP FLOW – 100%
CHECKOUT – 99.91%
SEARCH – 98%
CLIENT A – USER SIGN UP FLOW – 100%
CLIENT A – CHECKOUT – 99.91%
CLIENT A – SEARCH – 90%
94. TAKEAWAYS
START SMALL AT KEY AREAS OF YOUR APP
EXPLORE TOOLS
EMBRACE TESTING ON PROD!
FOCUS ON CUSTOMERS
TAKE OWNERSHIP OF CODE