Observability and more architecture next 2020

Observability & More
Alon Fliess
Chief Architect
alonf@codevalue.net
@alon_fliess
http://alonfliess.me
http://codevalue.net

Cloudflare blames ‘bad software’ deployment for today’s outage

About Me
• Alon Fliess:
  • Chief Software Architect & Co-Founder at OzCode & CodeValue
  • More than 30 years of hands-on experience
  • Microsoft Regional Director & Microsoft Azure MVP
  • Spend most of my time in project analysis, architecture, design
  • Code at night

Azure Israel
• https://www.meetup.com/AzureIsrael

Agenda
• DevOps, the true story
• Microservice Architecture, the complexity shift
• Ops & Monitoring
• Site Reliability Managers
• Developers & Observability
• Business (marketing, sales, management) and observability
• Application Performance Monitoring
  • How does it work?
  • Distributed Tracing
• Production problem solving

The Essence of DevOps
• Better Software, Faster! When Development and Operations Synergize
• Covers the *entire* Application Lifecycle

Microservice Architecture == Complexity Shift

Ops ⇔ Vital Signs: Heartbeat, Blood Pressure, Temperature

What Do Site Reliability Managers (SRE) Want?

What Do Marketing & Sales Teams Want?

What is Observability? (Twitter 2013)

Gartner: Critical Capabilities for APM (May 2019)
Use Cases: IT Operations, DevOps Release, Application Support, Application Development, Application Owner
Other chart labels: Business Analysis, Anomaly Detection

APM Players
Dynatrace
AppDynamics (Cisco)
Datadog
Splunk
Broadcom (CA Technologies)
New Relic
Riverbed
IBM
Instana
Oracle
Tingyun
SolarWinds
ManageEngine
Micro Focus

How Does Monitoring & Tracing Work?
• Operating Systems
  • APM system tracking agent installed on the machine
  • CPU, Memory, I/O, Network
• Code Tracing
  • Instrumentation: Manual, Auto
  • Runtime data collection

Instrumentation – Original Pseudo Code
Function AddToBasket(var productId, var quantity)
    if (quantity < 0)
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    return true

Instrumentation – Add Logging on Errors
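Function AddToBasket(var productId, var quantity)
    if (quantity < 0)
        Log("Error: Negative quantity value")    // new: log the rejected input
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    return true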

Instrumentation – Add Metrics of Usage and Errors
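Function AddToBasket(var productId, var quantity)
    metrics.Count("AddToBasket", 1)               // new: count every call
    if (quantity < 0)
        metrics.Count("AddToBasketFailure", 1)    // new: count every failure
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    return true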
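
The usage and failure counters above can be sketched in Python. This is a minimal in-memory stand-in, not the API of any real metrics library: the `Metrics` class, `add_to_basket`, `catalog`, and `basket` are hypothetical names introduced only for illustration.

```python
from collections import Counter


class Metrics:
    """Hypothetical in-memory metrics sink; a real system would ship these to an APM backend."""

    def __init__(self):
        self.counters = Counter()

    def count(self, name, value=1):
        self.counters[name] += value


metrics = Metrics()
catalog = {42: "keyboard"}  # assumed stand-in for Dal.GetProductById
basket = []                 # assumed stand-in for BasketService


def add_to_basket(product_id, quantity):
    metrics.count("AddToBasket")             # every call counts as usage
    if quantity < 0:
        metrics.count("AddToBasketFailure")  # rejected calls count as failures
        return False
    basket.append((catalog[product_id], quantity))
    return True
```

Dividing the failure counter by the usage counter gives an error rate, which is the kind of number an Ops dashboard would chart.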

Instrumentation – Measure Latency
Function AddToBasket(var productId, var quantity)
    start = time()                                    // new: start the clock
    if (quantity < 0)
        return false
    var product = Dal.GetProductById(productId)
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)    // new: record total latency
    return true

Instrumentation – Measure Latency Everywhere
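Function AddToBasket(var productId, var quantity)
    start = time()
    if (quantity < 0)
        return false
    var product = Dal.GetProductById(productId)
    metrics.Measure("AddToBasket_GetProductById", time() - start)    // new: latency up to the end of the data-access call
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true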
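
In the pseudo code, the early `return false` skips the latency measurement. One way to measure every path is a timing block, sketched here in Python. The `measure` helper, the `measurements` dictionary, and the simulated `get_product_by_id` are assumptions made for this example, not part of the original slides.

```python
import time
from contextlib import contextmanager

measurements = {}  # hypothetical sink: metric name -> list of durations in seconds


@contextmanager
def measure(name):
    """Record how long the wrapped block took, even if it returns early or raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        measurements.setdefault(name, []).append(time.perf_counter() - start)


def get_product_by_id(product_id):
    time.sleep(0.01)  # simulated data-access latency
    return {"id": product_id}


def add_to_basket(product_id, quantity):
    with measure("AddToBasket"):
        if quantity < 0:
            return False  # still measured, thanks to the finally clause
        with measure("AddToBasket_GetProductById"):
            product = get_product_by_id(product_id)
        return product is not None
```

Nesting the blocks also separates the data-access latency from the total, instead of measuring both from the same start time.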

Instrumentation – Add Debugging Information
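Function AddToBasket(var productId, var quantity)
    debug.AddParameters("AddToBasket", [["ProductId", productId], ["quantity", quantity]])    // new
    start = time()
    if (quantity < 0)
        debug.AddError("AddToBasket", GetErrorData())    // new
        return false
    var product = Dal.GetProductById(productId)
    debug.AddValue("AddToBasket", [["product", product]])    // new
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true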

Instrumentation – Original vs. Instrumented Code
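Function AddToBasket(var productId, var quantity)
    debug.AddParameters("AddToBasket", [["ProductId", productId], ["quantity", quantity]])
    start = time()
    if (quantity < 0)
        debug.AddError("AddToBasket", GetErrorData())
        return false
    var product = Dal.GetProductById(productId)
    debug.AddValue("AddToBasket", [["product", product]])
    BasketService.Add(product, quantity)
    metrics.Measure("AddToBasket", time() - start)
    return true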

Instrumentation and Tracing Automation
• Aspect Oriented Approach
  • Communication level instrumentation
  • Pipeline interception – technology dependent
  • Resource performance counters – DB statistics, for example
• Code Instrumentation
  • Manual – deploy a package and call it
  • Automatic – bytecode instrumentation libraries and tools
• Distributed Tracing
  • Passing call context between services
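
The aspect-oriented idea can be sketched in Python with a decorator that wraps a function without touching its body. This is only an analogue of what bytecode instrumentation tools do automatically. The `instrumented` decorator and the `calls` list are hypothetical names for this example.

```python
import functools
import time

calls = []  # hypothetical sink for collected records


def instrumented(func):
    """Wrap func so each call records its name, arguments, outcome, and duration."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["result"] = func(*args, **kwargs)
            return record["result"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs for both normal returns and exceptions.
            record["seconds"] = time.perf_counter() - start
            calls.append(record)

    return wrapper


@instrumented
def add_to_basket(product_id, quantity):
    # The business logic stays free of logging, metrics, and timing code.
    if quantity < 0:
        return False
    return True
```

The business function reads like the original pseudo code again, and the cross-cutting concerns live in one place.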

Distributed Tracing
[Diagram: a call with Id:123 flows through the Application, Service A, and Service B; each hop is recorded as a Span]
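
Passing the call id between services can be sketched in Python as follows. The sketch borrows the `traceparent` header layout from the W3C Trace Context specification (version, 32-hex trace id, 16-hex parent span id, flags). The call chain Application → Service A → Service B is an assumption about the diagram, and the function names and in-process "calls" are stand-ins for real HTTP requests.

```python
import secrets


def new_trace_id():
    return secrets.token_hex(16)  # 32 hex characters


def new_span_id():
    return secrets.token_hex(8)  # 16 hex characters


def inject(headers, trace_id, span_id):
    """Write the caller's trace and span ids into outgoing request headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-01"
    return headers


def extract(headers):
    """Read (trace_id, parent_span_id) from incoming request headers."""
    _version, trace_id, parent_span_id, _flags = headers["traceparent"].split("-")
    return trace_id, parent_span_id


def service_b(headers, spans):
    trace_id, parent_id = extract(headers)
    spans.append({"service": "B", "trace": trace_id, "span": new_span_id(), "parent": parent_id})


def service_a(headers, spans):
    trace_id, parent_id = extract(headers)
    span_id = new_span_id()
    spans.append({"service": "A", "trace": trace_id, "span": span_id, "parent": parent_id})
    service_b(inject({}, trace_id, span_id), spans)  # stands in for an HTTP call


def application(spans):
    trace_id, span_id = new_trace_id(), new_span_id()
    spans.append({"service": "Application", "trace": trace_id, "span": span_id, "parent": None})
    service_a(inject({}, trace_id, span_id), spans)
```

All three spans share one trace id, and each span's parent points at the caller's span, which is what lets a tracing backend rebuild the call tree.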

Instrumentation – Call Context
Function AddToBasket(var productId, var quantity, var context)
    debug.AddParameters(context, "AddToBasket", [["ProductId", productId], ["quantity", quantity]])
    metrics.Count(context, "AddToBasket", 1)
    start = time()
    if (quantity < 0)
        Log(context, "Error: Negative quantity value")
        metrics.Count(context, "AddToBasketFailure", 1)
        debug.AddError(context, "AddToBasket", GetErrorData())
        return false
    var product = Dal.GetProductById(context, productId)
    metrics.Measure(context, "AddToBasket_GetProductById", time() - start)
    debug.AddValue(context, "AddToBasket", [["product", product]])
    BasketService.Add(context, product, quantity)
    metrics.Measure(context, "AddToBasket", time() - start)
    return true

Context: Call Id, URL, HTTP Method, DB Host, User Info, Timing Info
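
Threading a `context` parameter through every call is noisy. In Python, one alternative is an ambient context held in the standard library's `contextvars` module, sketched below. The `call_context` variable, the `log` helper, and `handle_request` are hypothetical names for this example.

```python
import contextvars

# Hypothetical ambient call context; None means "no request in progress".
call_context = contextvars.ContextVar("call_context", default=None)

log_lines = []  # hypothetical log sink


def log(message):
    # Every log line picks up the current call id without an explicit parameter.
    ctx = call_context.get() or {}
    log_lines.append(f"[call_id={ctx.get('call_id')}] {message}")


def add_to_basket(product_id, quantity):
    if quantity < 0:
        log("Error: Negative quantity value")
        return False
    log(f"Added product {product_id} x{quantity}")
    return True


def handle_request(call_id, product_id, quantity):
    """Set the context once at the entry point, then restore the previous value."""
    token = call_context.set({"call_id": call_id})
    try:
        return add_to_basket(product_id, quantity)
    finally:
        call_context.reset(token)
```

The context is set once where the request enters the service, and the business function keeps its original two-parameter signature.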

Instrumentation – Using Span
Function AddToBasket(var productId, var quantity, var context)
    span = trace.BeginSpan(context, {"AddToBasket", productId, quantity})
    if (quantity < 0)
        span.Error("Negative quantity value")
        span.End()    // end the span on the error path too (assumes Error() does not end it)
        return false
    var product = Dal.GetProductById(context, productId)
    span.AddValue("product", product)
    BasketService.Add(context, product, quantity)
    span.End()
    return true

Span: Call Id, URL, HTTP Method, DB Host, User Info, Timing Info
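
A span has to be ended on every exit path, including early returns and exceptions. A Python sketch of that idea uses a context manager so the end step cannot be forgotten. The `Span` class and the `finished` list are hypothetical and do not mirror any particular tracing library.

```python
import time

finished = []  # hypothetical sink for completed spans


class Span:
    """Hypothetical span: a named, timed unit of work with attached values."""

    def __init__(self, name, **values):
        self.name = name
        self.values = dict(values)
        self.error = None
        self.seconds = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit, early return, and exceptions alike.
        self.seconds = time.perf_counter() - self._start
        if exc is not None:
            self.error = repr(exc)
        finished.append(self)
        return False  # never swallow exceptions


def add_to_basket(product_id, quantity):
    with Span("AddToBasket", product_id=product_id, quantity=quantity) as span:
        if quantity < 0:
            span.error = "Negative quantity value"
            return False
        span.values["product"] = {"id": product_id}  # stand-in for Dal.GetProductById
        return True
```

Both the success path and the rejected call produce a finished span with timing, so the span replaces the separate logging, metrics, and debug calls.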

What Do SREs & Developers Want – From Each Other?

APM Error Analysis – Not Enough Information
Error Rate
Request information
Stack trace
• APM systems can assist in health monitoring and fault first aid

Production Problem Solving Challenges
Each challenge is drawn as a 10kg weight:
• Can’t mess with data
• No debugging tools
• Code is optimized
• Older source code version
• Can’t impact performance
• Data must stay in a secure environment
• Data is private and contains PII
• Very hard to reproduce the bug

Production Problem Solving Platforms
• OzCode
• OverOps
• Rookout
• Application Insights

Problem Solving With a Production Debugger
