Architectures That Scale Deep - Regaining Control in Deep Systems

Ben Sigelman (@el_bhs, bhs@lightstep.com)
Co-founder & CEO: LightStep
Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch
Architectures that Scale Deep:
Regaining Control in Deep Systems
QCon SF, November 2019

InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
properties-deep-systems/

Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com

Part I
Scaling, and Deep Systems

How does this look
for software?

How do real-world
systems look?

Microservices at scale aren’t
just wide systems, they’re
deep systems

Deep Systems
Architectures with ≥ 4 layers of
independently operated services
(including external/cloud dependencies)
Deep Systems
Architectures with ≥ 4 layers of
independently operated services
(including external/cloud dependencies)

What do deep systems sound like?

“Don’t deploy on Fridays”

“Where’s Chris?! I’m dealing with
a P0 and they’re the only one
who knows how to debug this.”

“It can’t be our fault, our
dashboard says we’re healthy”

“Kafka is on ﬁre”

“I need 100% availability
from your team.
One hundred percent.”

“I didn’t know I depended on
that region”

“That was on a dashboard but I
can’t ﬁnd it”

Lots of challenges:
- People-management
- Security
- Multi-tenancy
- “Big-customer” success
- Performance
- Observability

Part II
Control Theory: TL;DR Edition

Why do we care so much
about observability, anyway?

A System
Inputs Outputs
… and its state vector,

Inputs
A System
Outputs
Observability
How well can you infer
internal state using only
the outputs?

Outputs
A System
Inputs
Controllability
How well can you
control internal state
using only the inputs?

Controllability
is the dual of
Observability

Part III
What Deep Systems
Mean for Observability

# of services
developersperservice
Architectural
evolution
Deep
Systems
Pure Monoliths

Stress (n): responsibility without controlStress
what you can control
what you are
responsible for

Observability:
Shrink This Gap

Managing Deep Systems
Services must have SLOs
(“Service Level Objectives”: latency, errors, etc)
For eﬀective service management, only
three things matter:
0. Releasing service functionality
1. Gradually improving SLOs
2. Rapidly restoring SLOs
In a deep system, we must control the
entire “triangle” to maintain our SLOs

Controllability == ObservabilityControllability == Observability
There’s that word again…

Observability: “The Conventional Wisdom”
Observing microservices is hard
Google and Facebook solved this (right???)
They used Metrics, Logging, and Distributed Tracing…
… So we should, too.

3 Pillars, 3 Experiences
Metrics
Logs
Traces

Three Pillars?Three Pillars? Two giant pipes…
Logs
Metrics
Without Traces:
Cognitive Load ≈ O(depth2
)

Three Pillars?Three Pillars? Two giant pipes…
Logs
Metrics

Two giant pipes…
Logs
Metrics
Without Traces:
Cognitive Load ≈ O(depth2
)

Traces provide Context
And context rules out
invalid hypotheses

Two giant pipes and a ﬁlter
Logs
Metrics
Context
(from traces)

Context
(from traces)
Context reduces cognitive load
With Traces:
Cognitive Load ≈ O(depth)
Relevant Metrics
Relevant Logs

Microservices don’t just scale wide,
they scale deep
Recognize deep systems

“Controllability” (of SLOs)
depends on observability

… and traces are not sprinkles
“The Three Pillars of Observability”
is a lousy metaphor

Tracing can reduce
cognitive load from
O(depth2
) to O(depth)

Tracing is the backbone of
simple observability
in deep systems

Thank You
Feedback always
welcome:
twitter → @el_bhs
the emails → bhs@lightstep.com
Play with LightStep,
for free, anytime:
(no email address required!)
lightstep.com/play

Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
properties-deep-systems/

Architectures That Scale Deep - Regaining Control in Deep Systems

More Related Content

What's hot

Similar to Architectures That Scale Deep - Regaining Control in Deep Systems

More from C4Media

Recently uploaded

Architectures That Scale Deep - Regaining Control in Deep Systems