Observability, the practical approach - Anton Drukh - DevOpsDays Tel Aviv 2018

Observability is the buzz these days. Everybody wants their code to explain to them what the issue is and how to fix it. But how do you get it right for your architecture, team structure and product maturity? I’ll share the path to observability at Snyk.io from day one to a team of 20 today.

Observability
The Practical Approach
Anton Drukh
VP Engineering, Snyk
DevOpsDays Tel Aviv 2018

Some context
Snyk is an open-source security company
We’re 3 years old, raised $32M total, engineering team of 26
SaaS offering on a NodeJS & Python microservices stack

My take on observability
Think about operating your service
Care for all the lemmings
Care for the individual lemming

With proper observability you have
Speed-of-light troubleshooting
Single source of truth for what happened in the system
Scientific approach to changes

How do we get there
Not cost-effective to start in a new code-base
Not cost-effective to start in a mature code-base
So… forever locked outside?

Logs to the rescue
Those write-once-when-debugging-then-forget strings
“Not sure what the problem is, added some logs, let’s see”
- Every developer, sometime in their professional lives

Step 0 - talk to your team
Is observability important to the team?
Does it fit your team’s methodology?
Definitely a team effort to get it right!
Our take - included in training, code reviews and oncall

Step 1 - where to keep your logs
Buy it if you can, build it if you must
Needs to serve end goal
Our angle - happy logz.io customers, pushing 15GB daily

Step 2 - start shipping your logs
11th of the 12 factors - don’t manage, just output
Choose a logging library
Adjust to indexing service
Our angle - fluentd daemonsets on a k8s cluster; `bunyan` logging library with single-line JSONs
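
As a rough illustration of "don’t manage, just output" with single-line JSON, here is a minimal sketch. It is a hypothetical stand-in for a JSON logging library, not bunyan's actual implementation; the names `formatLogLine` and `logInfo` and the field names are assumptions made for this example.

```javascript
// Hypothetical stand-in for a JSON logging library such as bunyan;
// it only illustrates the one-JSON-object-per-line output format.
function formatLogLine(level, context, label) {
  // Context keys come first so the fixed fields cannot be overwritten by them.
  const record = { ...context, level, msg: label, time: new Date().toISOString() };
  // Without an indent argument, JSON.stringify escapes newlines inside strings,
  // so the result is always a single line.
  return JSON.stringify(record);
}

function logInfo(context, label) {
  // Write to stdout only; collecting and shipping is left to the platform.
  process.stdout.write(formatLogLine('info', context, label) + '\n');
}

logInfo({ service: 'example', port: 8080 }, 'Server started');
```

One object per line lets a shipper such as fluentd treat every line as a complete record without multi-line parsing.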

Step 3 - structure your logs
Decide on a few rules to make your logs behave
Use a context object for varying parameters
Add a constant label to identify the logged action
Use logging level as part of context
Special treatment for errors

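Step 3 - structure your logs
// First argument: context object holding the varying parameters.
// Second argument: constant label identifying the logged action.
logger.info({
  temperature: measurement.temperature,
  duration: Date.now() - startTime,
  params: request.params,
}, 'Completed temperature measurement');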

Step 3 - structure your logs
Our take -
Standard logged keys match common objects
Logging at specific checkpoints and on response
Logging level matches HTTP status code (2xx, 4xx, 5xx)
Reverse lookup from log to line of code using log label
Error message is the failure, log label is the action
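
The rule that the logging level follows the HTTP status code could be sketched as below. Both helpers are hypothetical; the slides do not say how 3xx responses are treated, so logging them as `info` is an assumption.

```javascript
// Hypothetical helper: choose a log level name from an HTTP status code.
function levelForStatus(status) {
  if (status >= 500) return 'error'; // server-side failure
  if (status >= 400) return 'warn';  // client-side problem
  return 'info';                     // 2xx, and (by assumption) 3xx
}

// Hypothetical helper: attach an error to the logged context.
// The log label names the action; the error message describes the failure.
function errorContext(error, context) {
  return { ...context, error: { message: error.message, stack: error.stack } };
}

console.log(levelForStatus(200), levelForStatus(404), levelForStatus(503));
```

Keeping the label constant (for example 'Reply sent') and putting the failure in `error.message` means a search for the label finds both successful and failed runs of the same action.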

Step 4 - protect your logs
Prevent logging sensitive data - it will leak!
Protect from size overflow
Your log library will become standard in your code-bases
Our angle - sanitising auth tokens and emails (:wave: GDPR); huge logged objects halted our services with IO
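
A minimal sketch of both protections, redacting sensitive keys and capping size, might look like this. The key list and all limits are assumptions chosen for the example, not values from the talk.

```javascript
// Assumed list of keys to redact and assumed size limits.
const SENSITIVE_KEYS = ['authorization', 'token', 'password', 'email'];
const MAX_STRING_LENGTH = 200;
const MAX_ARRAY_ITEMS = 20;
const MAX_DEPTH = 4;

// Returns a copy of `value` that is safer to log; the input is not modified.
function sanitise(value, depth = 0) {
  if (typeof value === 'string') {
    return value.length > MAX_STRING_LENGTH
      ? value.slice(0, MAX_STRING_LENGTH) + '...[truncated]'
      : value;
  }
  if (value === null || typeof value !== 'object') return value;
  if (depth >= MAX_DEPTH) return '[max depth reached]';
  if (Array.isArray(value)) {
    return value.slice(0, MAX_ARRAY_ITEMS).map((item) => sanitise(item, depth + 1));
  }
  const result = {};
  for (const [key, inner] of Object.entries(value)) {
    result[key] = SENSITIVE_KEYS.includes(key.toLowerCase())
      ? '[redacted]'
      : sanitise(inner, depth + 1);
  }
  return result;
}

console.log(sanitise({ headers: { Authorization: 'Bearer abc' }, user: 'anna' }));
```

This sketch only walks plain objects and arrays; types such as `Error` or `Date` have no enumerable fields and would need their own handling.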

Step 5 - make logging easy
1 log per request
Collect ‘breadcrumbs’ during request handling
Log upon response with all collected context
Our angle - see https://github.com/snyk/koa2-bunyan-server
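
The breadcrumb idea can be sketched without any framework. The function names and record shape below are hypothetical and are not taken from koa2-bunyan-server; `emit` is an assumed callback standing in for a real logger method.

```javascript
// Hypothetical per-request context: handlers add breadcrumbs while working,
// and a single log record is emitted when the response is sent.
function createLogContext(requestId) {
  return { requestId, breadcrumbs: [] };
}

function addBreadcrumb(logContext, name, details = {}) {
  // `name` is spread last so details cannot overwrite it.
  logContext.breadcrumbs.push({ ...details, name });
}

function finishRequest(logContext, status, startTime, emit) {
  const record = { ...logContext, status, duration: Date.now() - startTime };
  emit(record, 'Reply sent'); // the one log line for this request
  return record;
}

const start = Date.now();
const ctx = createLogContext('req-1');
addBreadcrumb(ctx, 'cache-miss', { key: 'user:42' });
addBreadcrumb(ctx, 'db-query', { rows: 3 });
finishRequest(ctx, 200, start, (record, label) => {
  console.log(JSON.stringify({ ...record, msg: label }));
});
```

With one record per request, everything known about that request sits in a single searchable entry instead of being spread over several lines.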

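Step 5 - make logging easy
// Pick the level by name: a logger method called detached from `log` can lose its `this`.
let level = 'info';
const start = Date.now();
try {
  await next();
} catch (error) {
  // Errors with a code below 500 are logged as warnings, all others as errors.
  level = error.code < 500 ? 'warn' : 'error';
  req.logContext.error = error;
} finally {
  req.logContext.duration = Date.now() - start;
  log[level](req.logContext, 'Reply sent');
}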

Step 6 - watch out for scale
Skip logs when they carry little value
Sample logs with higher weight to errors
Constantly invest in team training and reviews
Share the joy with Customer Success and Sales Engineering
Our angle - training inside and outside of Engineering
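
Sampling with a higher weight for errors could be sketched as follows. The keep-rates are assumptions for illustration; the talk does not give concrete numbers.

```javascript
// Assumed keep-rates per level; errors and warnings are always kept.
const SAMPLE_RATES = { error: 1, warn: 1, info: 0.1, debug: 0.01 };

// `random` is injectable so the decision can be tested deterministically.
function shouldLog(level, random = Math.random) {
  const rate = SAMPLE_RATES[level] ?? 1; // levels without a rate are kept
  // Math.random() returns a value in [0, 1), so a rate of 1 always passes.
  return random() < rate;
}

if (shouldLog('info')) {
  console.log(JSON.stringify({ level: 'info', msg: 'Reply sent', sampleRate: SAMPLE_RATES.info }));
}
```

Recording the sample rate in each kept record, as in the usage lines above, allows counts to be scaled back up when analysing the logs.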

Practical observability
Align your team
Push logs to an external service
*Structure your logs* and sanitise them
Embed logging into your boilerplates
Reap the reward in how your team operates its software

Time for some live demos

Thank you!

Editor's Notes

Introduction: a practical approach, giving you tools to get on the observability train.
Lemming analogy: each stage is a different service in your stack, and each lemming is an incoming request. You want to ‘win’ by getting over X% safely to the exit, that is, to a response.
Troubleshooting: get to the culprit of any relevant event in the system. Single source of truth: if it happened, we have seen it. Scientific approach: before pushing a change, ask how you will observe its impact.
In a new code-base the value is low. In a mature code-base the investment is high. So how do we get to the promised land?
Logs have existed forever, so how can they help in today’s complex world? Let’s rethink logging: it is no longer the duct tape of programming.
