Ben Sigelman (@el_bhs, bhs@lightstep.com)
Co-founder & CEO: LightStep
Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch
Architectures that Scale Deep:
Regaining Control in Deep Systems
QCon SF, November 2019
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
properties-deep-systems/
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
Part I
Scaling, and Deep Systems
What is scale, anyway?
Scaling wide
Scaling wide
Scaling wide
Scaling wide
Scaling wide
Scaling deep
Scaling deep
Scaling deep
Scaling deep
Scaling deep
How does this look
for software?
Software: Scaling wide
Software: Scaling deep
How do real-world
systems look?
Microservices at scale aren’t
just wide systems, they’re
deep systems
Deep Systems
Architectures with ≥ 4 layers of
independently operated services
(including external/cloud dependencies)
Deep Systems
Architectures with ≥ 4 layers of
independently operated services
(including external/cloud dependencies)
What do deep systems sound like?
“Don’t deploy on Fridays”
What do deep systems sound like?
“Where’s Chris?! I’m dealing with
a P0 and they’re the only one
who knows how to debug this.”
What do deep systems sound like?
“It can’t be our fault, our
dashboard says we’re healthy”
What do deep systems sound like?
“Kafka is on fire”
What do deep systems sound like?
“I need 100% availability
from your team.
One hundred percent.”
What do deep systems sound like?
“I didn’t know I depended on
that region”
What do deep systems sound like?
“That was on a dashboard but I
can’t find it”
What do deep systems sound like?
Lots of challenges:
- People-management
- Security
- Multi-tenancy
- “Big-customer” success
- Performance
- Observability
What do deep systems sound like?
Part II
Control Theory: TL;DR Edition
Why do we care so much
about observability, anyway?
A System
Inputs Outputs
… and its state vector,
Inputs
A System
Outputs
… and its state vector,
Observability
How well can you infer
internal state using only
the outputs?
Outputs
A System
Inputs
… and its state vector,
Controllability
How well can you
control internal state
using only the inputs?
Controllability
is the dual of
Observability
Controllability
is the dual of
Observability
Part III
What Deep Systems
Mean for Observability
# of services
developersperservice
Architectural
evolution
Deep
Systems
Pure Monoliths
Stress (n): responsibility without controlStress
what you can control
what you are
responsible for
Observability:
Shrink This Gap
Mental models
A System
Managing Deep Systems
Services must have SLOs
(“Service Level Objectives”: latency, errors, etc)
For effective service management, only
three things matter:
0. Releasing service functionality
1. Gradually improving SLOs
2. Rapidly restoring SLOs
In a deep system, we must control the
entire “triangle” to maintain our SLOs
Controllability == ObservabilityControllability == Observability
There’s that word again…
Observability: “The Conventional Wisdom”
Observing microservices is hard
Google and Facebook solved this (right???)
They used Metrics, Logging, and Distributed Tracing…
… So we should, too.
3 Pillars, 3 Experiences
Metrics
Logs
Traces
Three Pillars?Three Pillars? Two giant pipes…
Logs
Metrics
Without Traces:
Cognitive Load ≈ O(depth2
)
Three Pillars?Three Pillars? Two giant pipes…
Logs
Metrics
Two giant pipes…
Logs
Metrics
Without Traces:
Cognitive Load ≈ O(depth2
)
Traces
Traces provide Context
Traces provide Context
And context rules out
invalid hypotheses
Two giant pipes and a filter
Logs
Metrics
Context
(from traces)
Context
(from traces)
Context reduces cognitive load
With Traces:
Cognitive Load ≈ O(depth)
Relevant Metrics
Relevant Logs
Observability:
Shrink This Gap
Let’s Review
Microservices don’t just scale wide,
they scale deep
Recognize deep systems
Stress (n): responsibility without controlStress
what you can control
what you are
responsible for
“Controllability” (of SLOs)
depends on observability
… and traces are not sprinkles
“The Three Pillars of Observability”
is a lousy metaphor
Tracing can reduce
cognitive load from
O(depth2
) to O(depth)
Tracing is the backbone of
simple observability
in deep systems
Thank You
Feedback always
welcome:
twitter → @el_bhs
the emails → bhs@lightstep.com
Play with LightStep,
for free, anytime:
(no email address required!)
lightstep.com/play
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
properties-deep-systems/

Architectures That Scale Deep - Regaining Control in Deep Systems