Observability is the buzz these days. Everybody wants their code to explain to them what the issue is and how to fix it. But how do you get it right for your architecture, team structure and product maturity? I’ll share the path to observability at Snyk.io from day one to a team of 20 today.
2. Snyk is an open-source security company
We’re 3 years old, raised $32M total, engineering team of 26
SaaS offering on a NodeJS & Python microservices stack
Some context
3. My take on observability
Think about operating your service
Care for all the lemmings
Care for the individual lemming
4. With proper observability you have
Speed-of-light troubleshooting
Single source of truth for what happened in the system
Scientific approach to changes
5. How do we get there
Not cost-effective to start in a new code-base
Not cost-effective to start in a mature code-base
So… forever locked outside?
6. Logs to the rescue
Those write-once-when-debugging-then-forget strings
“Not sure what the problem is, added some logs, let’s see”
- Every developer, sometime in their professional lives
7. Step 0 - talk to your team
Is observability important to the team?
Does it fit your team’s methodology?
Definitely a team effort to get it right!
Our take - included in training, code reviews and oncall
8. Step 1 - where to keep your logs
Buy it if you can, build it if you must
Needs to serve end goal
Our angle - happy logz.io customers, pushing 15GB daily
9. Step 2 - start shipping your logs
11th of the 12 factors - don’t manage, just output
Choose a logging library
Adjust to indexing service
Our angle - fluentd daemonsets on a k8s cluster; `bunyan` logging library with single-line JSONs
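To make "don't manage, just output" concrete, here is a minimal sketch of a logger that writes one JSON object per line to stdout and leaves collection to whatever ships the container output. It is not bunyan's implementation; `formatLogLine`, `createLogger` and the field names `level`, `msg` and `time` are assumptions made for illustration.

```javascript
// Hypothetical minimal logger: one JSON object per line on stdout.
// Field names (level, msg, time) are illustrative assumptions.
function formatLogLine(level, context, label, now = new Date()) {
  // Fixed fields come last so a context key cannot overwrite them.
  const record = { ...context, level, msg: label, time: now.toISOString() };
  return JSON.stringify(record); // no indentation, so the record stays on one line
}

function createLogger(write = (line) => process.stdout.write(line + '\n')) {
  const log = (level) => (context, label) =>
    write(formatLogLine(level, context, label));
  return { info: log('info'), warn: log('warn'), error: log('error') };
}

// The process only writes to stdout; shipping is someone else's job.
const logger = createLogger();
logger.info({ durationMs: 12 }, 'Completed temperature measurement');
```

Keeping each record on a single line means a line-oriented collector can treat every line as one event without multi-line parsing rules.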
10. Step 3 - structure your logs
Decide on a few rules to make your logs behave
Use a context object for varying parameters
Add a constant label to identify the logged action
Use logging level as part of context
Special treatment for errors
11. Step 3 - structure your logs
logger.info({
  temperature: measurement.temperature,
  duration: Date.now() - startTime,
  params: request.params,
}, 'Completed temperature measurement');
12. Our take -
Standard logged keys match common objects
Logging at specific checkpoints and on response
Logging level matches HTTP status code (2xx, 4xx, 5xx)
Reverse lookup from log to line of code using log label
Error message is the failure, log label is the action
Step 3 - structure your logs
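The rules on this slide could be sketched roughly as follows. The deck only says the level matches the status class; the specific mapping to `info`, `warn` and `error`, the helper names `levelForStatus` and `buildResponseLog`, and the record keys are assumptions for illustration.

```javascript
// Assumed mapping from HTTP status class to log level.
function levelForStatus(status) {
  if (status >= 500) return 'error'; // 5xx: the service failed
  if (status >= 400) return 'warn';  // 4xx: the request was rejected
  return 'info';                     // everything else, including 2xx
}

// Hypothetical helper: the label names the action, the error carries the failure.
function buildResponseLog(status, label, context, err) {
  const record = { ...context, status, level: levelForStatus(status), msg: label };
  if (err) {
    // Errors get their own key so they can be searched separately from the label.
    record.error = { name: err.name, message: err.message, stack: err.stack };
  }
  return record;
}

const record = buildResponseLog(
  500,
  'Completed temperature measurement',
  { params: { sensor: 'a1' } },
  new Error('Sensor timed out')
);
console.log(JSON.stringify(record));
```

Because the label stays constant for a given call site, searching the code-base for that string leads back to the line that logged it, while the error message varies with the failure.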
13. Keep sensitive data out of your logs - it will leak!
Protect from size overflow
Your log library will become standard in your code-bases
Our angle - sanitising auth tokens and emails (:wave: GDPR); huge logged objects halted our services with IO
Step 4 - protect your logs
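One way to apply both protections before a record is written is a single sanitising pass over the context object. This is a sketch under assumptions: the key list, the size limits and the `sanitise` function name are all hypothetical, and a real list of sensitive keys depends on your own data.

```javascript
// Assumed limits and key names; tune them to your own payloads.
const SENSITIVE_KEYS = new Set(['authorization', 'token', 'password', 'email']);
const MAX_STRING_LENGTH = 200;
const MAX_ARRAY_ITEMS = 20;
const MAX_DEPTH = 4;

function sanitise(value, depth = 0) {
  if (typeof value === 'string') {
    // Cap long strings so one field cannot blow up the record size.
    return value.length > MAX_STRING_LENGTH
      ? value.slice(0, MAX_STRING_LENGTH) + '...[truncated]'
      : value;
  }
  if (value === null || typeof value !== 'object') return value;
  // The depth cap also stops runaway recursion on circular structures.
  if (depth >= MAX_DEPTH) return '[too deep]';
  if (Array.isArray(value)) {
    return value.slice(0, MAX_ARRAY_ITEMS).map((item) => sanitise(item, depth + 1));
  }
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
      ? '[redacted]'
      : sanitise(inner, depth + 1);
  }
  return out;
}

console.log(JSON.stringify(sanitise({
  email: 'someone@example.com',
  headers: { Authorization: 'Bearer abc' },
  note: 'x'.repeat(500),
})));
```

Putting this inside the shared logging library, rather than at each call site, fits the point that the library becomes the standard across code-bases.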
14. 1 log per request
Collect ‘breadcrumbs’ during request handling
Log upon response with all collected context
Our angle - see https://github.com/snyk/koa2-bunyan-server
Step 5 - make logging easy
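The breadcrumb idea can be sketched without any framework: handlers add fields as they go, and a single record is emitted when the response is sent. This is not the API of koa2-bunyan-server; `createRequestLog`, `add`, `finish` and the `durationMs` key are hypothetical names used only to show the pattern.

```javascript
// Hypothetical per-request collector: gather context, emit once on response.
function createRequestLog(emit, now = Date.now) {
  const startTime = now();
  const breadcrumbs = {};
  return {
    // Called anywhere during request handling to attach context.
    add(fields) {
      Object.assign(breadcrumbs, fields);
    },
    // Called once, when the response is sent.
    finish(status, label) {
      emit({ ...breadcrumbs, status, durationMs: now() - startTime, msg: label });
    },
  };
}

// Usage: the emitter here just collects records in memory.
const emitted = [];
const requestLog = createRequestLog((record) => emitted.push(record));
requestLog.add({ userId: 'u-1' });
requestLog.add({ cacheHit: false });
requestLog.finish(200, 'Handled measurement request');
console.log(JSON.stringify(emitted[0]));
```

In a web framework the collector would typically be created in middleware and attached to the request context, so handlers only ever call `add`.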
16. Skip logs when they carry little value
Sample logs with higher weight to errors
Constantly invest in team training and reviews
Share the joy with Customer Success and Sales Engineering
Our angle - training inside and outside of Engineering
Step 6 - watch out for scale
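Sampling with a higher weight for errors could look like the sketch below. The rates are made-up numbers and `shouldLog` is a hypothetical helper; the random source is injectable so the decision can be tested deterministically.

```javascript
// Assumed sample rates per level: keep every error, a fraction of the rest.
const SAMPLE_RATES = new Map([
  ['error', 1],
  ['warn', 0.5],
  ['info', 0.1],
]);

// Returns true when a record at this level should be written.
function shouldLog(level, random = Math.random) {
  // Unknown levels are always kept, to stay on the safe side.
  const rate = SAMPLE_RATES.has(level) ? SAMPLE_RATES.get(level) : 1;
  return random() < rate;
}

if (shouldLog('info')) {
  console.log(JSON.stringify({ level: 'info', msg: 'Completed temperature measurement' }));
}
```

If counts matter downstream, one option is to store the applied rate on each record so totals can be scaled back up when querying.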
17. Align your team
Push logs to an external service
*Structure your logs* and sanitise them
Embed logging into your boilerplates
Reap the reward in how your team operates its software
Practical observability
Introduction
Practical approach - giving you tools to get on the observability train
Lemming analogy - each stage is a different service in your stack, each lemming is an incoming request
You want to ‘win’ - get over X% safely to exit / response
Troubleshooting - get to the culprit of any relevant event in the system
Single source of truth - if it happened, we've seen it
Scientific approach - before pushing a change, ask how you will observe its impact
In a new code-base the value is low
In a mature code-base the investment is high
So how do we get to the promised land?
Logs have existed forever, so how can they help in today’s complex world?
Let’s re-think logging - no longer the duct-tape of programming