Microservices Application Tracing Standards and Simulators - Adrians at OSCON

Microservices Application Tracing Standards
and Simulators
From Zipkin to Greater Tracing: Involving a wider group of people in
distributed tracing
@adrianfcole
@adrianco
#oscon

Introduction
introduction
opening zipkin
beyond zipkin
simulation
@adrianco @adrianfcole

@adrianfcole
• spring cloud at pivotal
• focused on distributed tracing
• helped open zipkin

Opening Zipkin
@adrianfcole

Distributed Tracing commoditizes knowledge
Distributed tracing systems collect end-to-end latency graphs
(traces) in near real-time.
You can compare traces to understand why certain requests
take longer than others.

Zipkin is like Chrome DevTools’ network panel!
http://zipkin.io/
• .. except you see your whole architecture

It started with community focus

commit 92c941890c2009a401b777093342dc4f28955640
Author: Johan Oskarsson <johan@oskarsson.nu>
Date: Tue Nov 15 10:09:47 2011 -0800
[split] Enable B3 tracing for TFE. Filter out finagle-http headers from incoming requests
BigBrotherBird is silently born

Zipkin is less silently born
commit 2b7acead044e71c744f39804abe564383eb5f846
Author: Johan Oskarsson <johan@oskarsson.nu>
Date: Wed Jun 6 11:28:34 2012 -0700
Initial commit

zipkin says “we are a community”

So what happened?
Zipkin development at Twitter was in short bursts, centered on other work
Many experienced Zipkin engineers don’t work at Twitter (or in the Bay Area)
Platform diversity is a reality for many
Having the same goals was our opportunity

How’s OpenZipkin doing now?
Zipkin's now releasable (maybe too releasable)
We’re working on understandability and usability
We’re making the community easier to find
We hit bumps, and sometimes reverse changes

Beyond Zipkin
@adrianfcole

The “greater” tracing
Many groups are solving similar problems
Some focus on stacks, others on instrumentation
By collaborating more, we can make tracing greater

Instrumentation portability
Interop through shared trace pipelines.
Practical matters, like categorization and tactical designs
Moving R&D to implementation
Simulation and system testing
distributed-tracing google group
Distributed Tracing Workgroup

OpenTracing is an effort to clean up and de-risk
distributed tracing instrumentation
OpenTracing Interfaces decouple instrumentation from
vendor-specific dependencies and terminology. This
allows applications to switch products with less effort.
http://opentracing.io/
OpenTracing: Go, Python, Java, JavaScript..

A single configuration change to bind a Tracer
implementation in main() or similar
import opentracing "github.com/opentracing/opentracing-go"
import "github.com/tracer_x/tracerimpl"

func main() {
    // Bind tracerimpl to the opentracing system
    opentracing.InitGlobalTracer(
        tracerimpl.New(kTracerImplAccessToken))
    // ... normal main() stuff ...
}
How does it work?
Clean, vendor-neutral instrumentation code that
naturally tells the story of a distributed operation
import opentracing "github.com/opentracing/opentracing-go"

func AddContact(c *Contact) {
    sp := opentracing.StartSpan("AddContact")
    defer sp.Finish()
    sp.LogEventWithPayload("Added contact: ", *c)
    subRoutine(sp, ...)
    ...
}

func subRoutine(parentSpan opentracing.Span, ...) {
    ...
    sp := opentracing.StartChildSpan(parentSpan, "subRoutine")
    defer sp.Finish()
    sp.Info("deferred work to subroutine")
    ...
}
Thanks, @el_bhs for the slide!

Pivot Tracing is applied research from Brown University
(the one that brought us X-Trace).
Pivot tracing allows you to dynamically query systems at
runtime, grouping on “Baggage” which propagates across
service boundaries.
pivottracing.io
Pivot Tracing

Start writing queries including the fancy
happened-before join
From incr In DataNodeMetrics.incrBytesRead
Join cl In First(ClientProtocols) On cl -> incr
GroupBy cl.procName
Select cl.procName, SUM(incr.delta)
How does it work?
Services need to be in Java and be able to talk to
a provided PubSub broker.
// Add a library
<dependency>
    <groupId>edu.brown.cs.systems</groupId>
    <artifactId>pivottracing-agent</artifactId>
    <version>4.0</version>
</dependency>
// Initialize it on bootstrap
PivotTracing.initialize();
@brownsys_jmace made this!

Simulation
@adrianfcole

What does @adrianco do?
@adrianco
• Technology Due Diligence on Deals
• Presentations at Conferences
• Presentations at Companies
• Technical Advice for Portfolio Companies
• Program Committee for Conferences
• Networking with Interesting People
• Tinkering with Technologies
• Maintain Relationship with Cloud Vendors
Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics

@adrianco
Testing Flow Monitors
Monitoring tools often “explode on impact” with
real world use cases at scale
Interestingly large complex environments are
expensive to create or hard to get access to
Free, open source tools don’t have a budget…

OSS Microservice Simulator
Model and visualize microservices
Simulate interesting architectures
Generate large scale configurations
Stress test real tools like Zipkin
Code: github.com/adrianco/spigo
Simulate Protocol Interactions in Go
Visualize with D3, Neo4j or Guesstimate
See for yourself: http://simianviz.surge.sh
Follow @simianviz for updates
Diagram labels:
• ELB: Load Balancer
• Zuul: API Proxy
• Karyon: Business Logic
• Staash: Data Access Layer
• Priam: Cassandra Datastore
• Denominator: DNS Endpoint
• Three Availability Zones

POST Spigo flows to zipkin
# collect flows, duration 2 seconds, architecture lamp
$ ./spigo -c -d 2 -a lamp
--snip--
# clean out zipkin database and post newly created data
$ misc/zipkin.sh lamp

Spigo Nanoservice Structure
func Start(listener chan gotocol.Message) {
    ...
    for {
        select {
        case msg := <-listener:
            flow.Instrument(msg, name, hist) // instrument incoming requests
            switch msg.Imposition {
            case gotocol.Hello: // get named by parent
                ...
            case gotocol.NameDrop: // someone new to talk to
                ...
            case gotocol.Put: // upstream request handler
                ...
                outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(),
                    msg.Ctx.NewParent(), msg.Intention} // update trace context
                flow.AnnotateSend(outmsg, name) // instrument outgoing requests
                outmsg.GoSend(replicas)
            }
        case <-eurekaTicker.C: // poll the service registry
            ...
        }
    }
}
Skeleton code for sideways replicating a Put message
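The trace-context update in the skeleton can be pictured as follows. This is a minimal sketch under assumptions, not Spigo's actual implementation: the TraceContext type, its fields, and the counter-based span allocation are hypothetical. Only the NewParent name and the transaction/parent/span idea come from the slides.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// TraceContext is a hypothetical stand-in for the context carried in a message.
type TraceContext struct {
	Trace, Parent, Span uint32
}

// lastSpan is an assumed process-wide counter used to hand out span IDs.
var lastSpan uint32

// NewParent derives the context for an outgoing message: the trace stays the
// same, the current span becomes the parent, and a fresh span ID is allocated.
func (c TraceContext) NewParent() TraceContext {
	return TraceContext{
		Trace:  c.Trace,
		Parent: c.Span,
		Span:   atomic.AddUint32(&lastSpan, 1),
	}
}

// String renders the context in the t<trace>p<parent>s<span> style.
func (c TraceContext) String() string {
	return fmt.Sprintf("t%dp%ds%d", c.Trace, c.Parent, c.Span)
}

func main() {
	// Pretend 896 spans have already been allocated in this run.
	atomic.StoreUint32(&lastSpan, 896)
	put := TraceContext{Trace: 98, Parent: 895, Span: 896}
	// Each replica message gets its own span under the Put.
	for i := 0; i < 3; i++ {
		fmt.Println("Replicate", put.NewParent())
	}
}
```

An atomic counter is used here because many goroutines would be creating spans at once; any other source of unique IDs would serve the same purpose.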

Flow Trace Records
Diagram: staash (us-east-1, zoneC) sends Put s896 to riak2 (us-east-1, zoneC). riak2 sends Replicate to riak3 (us-east-1, zoneA) as s908, riak4 (us-east-1, zoneB) as s909 and riak9 (us-west-2, zoneA) as s910. riak9 sends Replicate to riak10 (us-west-2, zoneB) as s912 and riak8 (us-west-2, zoneC) as s913.
us-east-1.zoneC.riak2 t98p895s896 Put
us-east-1.zoneA.riak3 t98p896s908 Replicate
us-east-1.zoneB.riak4 t98p896s909 Replicate
us-west-2.zoneA.riak9 t98p896s910 Replicate
us-west-2.zoneB.riak10 t98p910s912 Replicate
us-west-2.zoneC.riak8 t98p910s913 Replicate
context: transaction (t), parent (p), span (s)
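To make the record layout concrete, here is a minimal Go sketch that splits a context token and groups spans under their parents. It assumes the token always has the form t<transaction>p<parent>s<span>, as in the records above; the Context type and both helper functions are hypothetical names, not part of Spigo.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Context holds the three IDs packed into a token such as "t98p895s896".
type Context struct {
	Trace, Parent, Span int
}

// parseContext splits a token of the assumed form t<trace>p<parent>s<span>.
func parseContext(tok string) (Context, error) {
	p := strings.IndexByte(tok, 'p')
	s := strings.IndexByte(tok, 's')
	if !strings.HasPrefix(tok, "t") || p < 2 || s < p+2 {
		return Context{}, fmt.Errorf("malformed context %q", tok)
	}
	var c Context
	var err error
	if c.Trace, err = strconv.Atoi(tok[1:p]); err != nil {
		return Context{}, err
	}
	if c.Parent, err = strconv.Atoi(tok[p+1 : s]); err != nil {
		return Context{}, err
	}
	if c.Span, err = strconv.Atoi(tok[s+1:]); err != nil {
		return Context{}, err
	}
	return c, nil
}

// childrenByParent groups span IDs under the span that caused them.
func childrenByParent(tokens []string) (map[int][]int, error) {
	kids := make(map[int][]int)
	for _, tok := range tokens {
		c, err := parseContext(tok)
		if err != nil {
			return nil, err
		}
		kids[c.Parent] = append(kids[c.Parent], c.Span)
	}
	return kids, nil
}

func main() {
	// Context tokens copied from the flow records above.
	tokens := []string{"t98p895s896", "t98p896s908", "t98p896s909",
		"t98p896s910", "t98p910s912", "t98p910s913"}
	kids, err := childrenByParent(tokens)
	if err != nil {
		panic(err)
	}
	fmt.Println("replicas of Put s896:", kids[896])
	fmt.Println("forwarded by s910:", kids[910])
}
```

Grouping by parent is enough to rebuild the fan-out: the Put span has three direct replicas, and one of those forwards to two more nodes in the other region.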

Definition of an architecture
{
"arch": "lamp",
"description":"Simple LAMP stack",
"version": "arch-0.0",
"victim": "webserver",
"services": [
{ "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] },
{ "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] },
{ "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] },
{ "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] },
{ "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] }
]
}
Annotations:
• Header includes chaos monkey victim
• New tier name
• Tier package
• Node count
• Regions: 0 = non-regional
• List of tier dependencies
See for yourself: http://simianviz.surge.sh/lamp
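A definition like the LAMP one above can be read with Go's standard encoding/json package. The sketch below is illustrative only: the Service and Architecture type names and both helper functions are assumptions made for this example, not Spigo's own types; the field tags simply mirror the JSON keys on the slide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Service mirrors one entry of the "services" array.
type Service struct {
	Name         string   `json:"name"`
	Package      string   `json:"package"`
	Count        int      `json:"count"`
	Regions      int      `json:"regions"`
	Dependencies []string `json:"dependencies"`
}

// Architecture mirrors the top-level object.
type Architecture struct {
	Arch        string    `json:"arch"`
	Description string    `json:"description"`
	Version     string    `json:"version"`
	Victim      string    `json:"victim"`
	Services    []Service `json:"services"`
}

// lampJSON is the definition copied from the slide.
const lampJSON = `{
  "arch": "lamp",
  "description": "Simple LAMP stack",
  "version": "arch-0.0",
  "victim": "webserver",
  "services": [
    {"name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": []},
    {"name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": []},
    {"name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"]},
    {"name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"]},
    {"name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"]}
  ]
}`

// loadArchitecture decodes a definition into the structs above.
func loadArchitecture(data []byte) (Architecture, error) {
	var a Architecture
	err := json.Unmarshal(data, &a)
	return a, err
}

// totalNodes sums the "count" fields exactly as written in the file.
func totalNodes(a Architecture) int {
	total := 0
	for _, s := range a.Services {
		total += s.Count
	}
	return total
}

func main() {
	a, err := loadArchitecture([]byte(lampJSON))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d tiers, %d counted nodes, victim %q\n",
		a.Arch, len(a.Services), totalNodes(a), a.Victim)
}
```

The sum treats a count of 0 literally, so it only reflects the numbers in the file, not however many nodes the simulator actually starts for those tiers.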

Migrating to Microservices
See for yourself: http://simianviz.surge.sh/migration
Diagram: Endpoint, ELB, PHP, MySQL, MySQL
Callouts: "Next step"; "Controls node placement distance"; "Select models"

Running Spigo
$ ./spigo -a lamp -d 2 -j -c
2016/05/16 18:46:37 Loading architecture from json_arch/lamp_arch.json
2016/05/16 18:46:37 lamp.edda: starting
2016/05/16 18:46:37 HTTP metrics now available at localhost:8123/debug/vars
2016/05/16 18:46:37 Architecture: lamp Simple LAMP stack
2016/05/16 18:46:37 architecture: scaling to 100%
2016/05/16 18:46:37 Starting: {rds-mysql store 1 2 []}
2016/05/16 18:46:37 lamp.us-east-1.zoneB..eureka01...eureka.eureka: starting
2016/05/16 18:46:37 lamp.us-east-1.zoneC..eureka02...eureka.eureka: starting
2016/05/16 18:46:37 lamp.us-east-1.zoneA..eureka00...eureka.eureka: starting
2016/05/16 18:46:37 Starting: {memcache store 1 1 []}
2016/05/16 18:46:37 Starting: {webserver monolith 1 18 [memcache rds-mysql]}
2016/05/16 18:46:37 Starting: {webserver-elb elb 1 0 [webserver]}
2016/05/16 18:46:37 Starting: {www denominator 0 0 [webserver-elb]}
2016/05/16 18:46:37 lamp.*.*..www00...www.denominator activity rate 10ms
2016/05/16 18:46:38 chaosmonkey delete: lamp.us-east-1.zoneA..webserver09...webserver.monolith
2016/05/16 18:46:39 asgard: Shutdown
2016/05/16 18:46:39 Saving 30 histograms for Guesstimate
2016/05/16 18:46:39 lamp.us-east-1.zoneA..eureka00...eureka.eureka: closing
2016/05/16 18:46:39 lamp.us-east-1.zoneC..eureka02...eureka.eureka: closing
2016/05/16 18:46:39 lamp.us-east-1.zoneB..eureka01...eureka.eureka: closing
2016/05/16 18:46:39 spigo: complete
2016/05/16 18:46:39 lamp.edda: closing
2016/05/16 18:46:39 Flushing flows to json_metrics/lamp_flow.json
-a architecture lamp
-d run for 2 seconds
-j graph json/lamp.json
-c flows json_metrics/lamp_flow.json

Riak IoT Architecture
{
"arch": "riak",
"description":"Riak IoT ingestion example for the RICON 2015 presentation",
"version": "arch-0.0",
"victim": "",
"services": [
{ "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]},
{ "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]},
{ "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]},
{ "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]},
{ "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]},
{ "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]},
{ "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]},
{ "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]},
{ "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]},
{ "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]},
{ "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]},
{ "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]},
{ "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]},
{ "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]},
{ "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]}
]
}
Annotations:
• New tier name
• Tier package
• Node count
• Regions: 0 = non-regional
• List of tier dependencies
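The dependency lists in the Riak IoT definition form a graph that can be walked from any entry point. Here is a minimal Go sketch, assuming the lists are copied by hand into a map; the deps variable and the reachable helper are hypothetical names for this example, and the walk is a plain breadth-first search rather than anything Spigo itself does.

```go
package main

import "fmt"

// deps copies the "dependencies" lists from the Riak IoT definition above.
var deps = map[string][]string{
	"riakTS":        {"riakTS", "eureka"},
	"ingester":      {"riakTS"},
	"ingestMQ":      {"ingester"},
	"riakKV":        {"riakKV"},
	"enricher":      {"riakKV", "ingestMQ"},
	"enrichMQ":      {"enricher"},
	"analytics":     {"ingester"},
	"analytics-elb": {"analytics"},
	"analytics-api": {"analytics-elb"},
	"normalization": {"enrichMQ"},
	"iot-elb":       {"normalization"},
	"iot-api":       {"iot-elb"},
	"stream":        {"ingestMQ"},
	"stream-elb":    {"stream"},
	"stream-api":    {"stream-elb"},
}

// reachable lists every name a request entering at start can touch, in
// breadth-first order. Self-references such as riakTS -> riakTS are
// visited only once.
func reachable(start string) []string {
	seen := map[string]bool{start: true}
	order := []string{start}
	for i := 0; i < len(order); i++ {
		for _, next := range deps[order[i]] {
			if !seen[next] {
				seen[next] = true
				order = append(order, next)
			}
		}
	}
	return order
}

func main() {
	for _, entry := range []string{"iot-api", "stream-api", "analytics-api"} {
		fmt.Println(entry, "->", reachable(entry))
	}
}
```

Note that "eureka" appears in a dependency list without being declared as a tier in this file, so the walk reports it as a leaf.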

Single Region Riak IoT
See for yourself: http://simianviz.surge.sh/riak

Diagram build-up, one layer added per slide:
• Endpoints: IoT Ingestion Endpoint, Stream Endpoint, Analytics Endpoint
• A Load Balancer behind each endpoint
• Normalization Services, Stream Service, Analytics Service
• Enrich Message Queue, Riak KV, Enricher Services
• Ingest Message Queue
• Riak TS, Ingester Service

Two Region Riak IoT
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics

Spigo with Neo4j
$ ./spigo -a netflix -d 2 -n -c -kv chat:200ms
2016/05/18 12:07:08 Graph will be written to Neo4j via NEO4JURL=localhost:7474
2016/05/18 12:07:08 Loading architecture from json_arch/netflix_arch.json
2016/05/18 12:07:08 HTTP metrics now available at localhost:8123/debug/vars
2016/05/18 12:07:08 netflix.edda: starting
2016/05/18 12:07:08 Architecture: netflix A simplified Netflix service. See http://netflix.github.io/ to decode the package names
2016/05/18 12:07:08 architecture: scaling to 100%
2016/05/18 12:07:08 Starting: {cassSubscriber priamCassandra 1 6 [cassSubscriber eureka]}
2016/05/18 12:07:08 netflix.us-east-1.zoneA..eureka00...eureka.eureka: starting
2016/05/18 12:07:08 netflix.us-east-1.zoneB..eureka01...eureka.eureka: starting
2016/05/18 12:07:08 netflix.us-east-1.zoneC..eureka02...eureka.eureka: starting
2016/05/18 12:07:08 Starting: {evcacheSubscriber store 1 3 []}
2016/05/18 12:07:08 Starting: {subscriber staash 1 3 [cassSubscriber evcacheSubscriber]}
2016/05/18 12:07:08 Starting: {cassPersonalization priamCassandra 1 6 [cassPersonalization eureka]}
2016/05/18 12:07:08 Starting: {personalizationData staash 1 3 [cassPersonalization]}
2016/05/18 12:07:08 Starting: {cassHistory priamCassandra 1 6 [cassHistory eureka]}
2016/05/18 12:07:08 Starting: {historyData staash 1 3 [cassHistory]}
2016/05/18 12:07:08 Starting: {contentMetadataS3 store 1 1 []}
2016/05/18 12:07:08 Starting: {personalize karyon 1 9 [contentMetadataS3 subscriber historyData personalizationData]}
2016/05/18 12:07:08 Starting: {login karyon 1 6 [subscriber]}
2016/05/18 12:07:08 Starting: {home karyon 1 9 [contentMetadataS3 subscriber personalize]}
2016/05/18 12:07:08 Starting: {play karyon 1 9 [contentMetadataS3 historyData subscriber]}
2016/05/18 12:07:08 Starting: {loginpage karyon 1 6 [login]}
2016/05/18 12:07:08 Starting: {homepage karyon 1 9 [home]}
2016/05/18 12:07:08 Starting: {playpage karyon 1 9 [play]}
2016/05/18 12:07:08 Starting: {wwwproxy zuul 1 3 [loginpage homepage playpage]}
2016/05/18 12:07:08 Starting: {apiproxy zuul 1 3 [login home play]}
2016/05/18 12:07:08 Starting: {www-elb elb 1 0 [wwwproxy]}
2016/05/18 12:07:08 Starting: {api-elb elb 1 0 [apiproxy]}
2016/05/18 12:07:08 Starting: {www denominator 0 0 [www-elb]}
2016/05/18 12:07:08 Starting: {api denominator 0 0 [api-elb]}
2016/05/18 12:07:08 netflix.*.*..api00...api.denominator activity rate 200ms
2016/05/18 12:07:09 chaosmonkey delete: netflix.us-east-1.zoneA..homepage03...homepage.karyon
2016/05/18 12:07:10 asgard: Shutdown
2016/05/18 12:07:10 netflix.us-east-1.zoneC..eureka02...eureka.eureka: closing
2016/05/18 12:07:10 netflix.us-east-1.zoneA..eureka00...eureka.eureka: closing
2016/05/18 12:07:10 netflix.us-east-1.zoneB..eureka01...eureka.eureka: closing
2016/05/18 12:07:10 spigo: complete
2016/05/18 12:07:11 netflix.edda: closing
-a architecture netflix
-d run for 2 seconds
-n graph and flows written to Neo4j
-c flows json_metrics/netflix_flow.json
-kv chat:200ms start flows at 5/sec

@adrianco
Conclusion
Monitoring tools can be stressed at scale by
simulating their inputs with metrics, dependency
graphs and flows
Spigo can be extended to produce any format
output at very large scale from a laptop

Ask Adrians
@adrianco @adrianfcole
distributed-tracing google group
opentracing.io zipkin.io
github.com/adrianco/spigo
@opentracing @simianviz @zipkinproject
pivottracing.io
