Microxchg Analyzing Response Time Distributions for Microservices

Adrian Cockcroft
Adrian CockcroftTechnology Fellow at Battery Ventures
Analyzing Response Time
Distributions for Microservices
Adrian Cockcroft @adrianco
Technology Fellow - Battery Ventures
February 2016
What does @adrianco do?
@adrianco
Technology Due
Diligence on Deals
Presentations at
Conferences
Presentations at
Companies
Technical
Advice for Portfolio
Companies
Program
Committee for
Conferences
Networking with
Interesting PeopleTinkering with
Technologies
Maintain
Relationship with
Cloud Vendors
Challenges for
Microservice
Platforms
Managing Scale
A Possible Hierarchy
Continents
Regions
Zones
Services
Versions
Containers
Instances
How Many?
3 to 5
2-4 per Continent
1-5 per Region
100’s per Zone
Many per Service
1000’s per Version
10,000’s
It’s much more challenging
than just a large number of
machines
Flow
Some tools can show
the request flow
across a few services
Interesting
architectures have a
lot of microservices!
Flow visualization is
a big challenge.
See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
Simulated Microservices
Model and visualize microservices
Simulate interesting architectures
Generate large scale configurations
Eventually stress test real tools
See github.com/adrianco/spigo
Simulate Protocol Interactions in Go
Visualize with D3
ELB Load Balancer
Zuul API Proxy
Karyon
Business
Logic
Staash
Data
Access
Layer
Priam Cassandra
Datastore
Three
Availability
Zones
Spigo Nanoservice Structure
func Start(listener chan gotocol.Message) {
...
for {
select {
case msg := <-listener:
flow.Instrument(msg, name, hist)
switch msg.Imposition {
case gotocol.Hello: // get named by parent
...
case gotocol.NameDrop: // someone new to talk to
...
case gotocol.Put: // upstream request handler
...
outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(),
msg.Ctx.NewParent(), msg.Intention}
flow.AnnotateSend(outmsg, name)
outmsg.GoSend(replicas)
}
case <-eurekaTicker.C: // poll the service registry
...
}
}
}
Nanoservice simulation total about 200 lines of Go
Flow Trace Recording
riak2
us-east-1
zoneC
riak9
us-west-2
zoneA
Put s896
Replicate
riak3
us-east-1
zoneA
riak8
us-west-2
zoneC
riak4
us-east-1
zoneB
riak10
us-west-2
zoneB
us-east-1.zoneC.riak2 t98p895s896 Put
us-east-1.zoneA.riak3 t98p896s908 Replicate
us-east-1.zoneB.riak4 t98p896s909 Replicate
us-west-2.zoneA.riak9 t98p896s910 Replicate
us-west-2.zoneB.riak10 t98p910s912 Replicate
us-west-2.zoneC.riak8 t98p910s913 Replicate
staash
us-east-1
zoneC
s910 s908s913
s909s912
Open Zipkin
A common format for trace annotations
A Java tool for visualizing traces
Standardization effort to fold in other formats
Driven by Adrian Cole (currently at Pivotal)
Extended to load Spigo generated trace files
Zipkin Trace Dependencies
Zipkin Trace Dependencies
Trace for one Spigo Flow
Definition of an architecture
{
"arch": "lamp",
"description":"Simple LAMP stack",
"version": "arch-0.0",
"victim": "webserver",
"services": [
{ "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] },
{ "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] },
{ "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] },
{ "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] },
{ "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] }
]
}
Header includes
chaos monkey victim
New tier
name
Tier
package
0 = non
Regional
Node
count
List of tier
dependencies
Running Spigo
$ ./spigo -a lamp -j -d 2
2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json
2016/01/26 23:04:05 lamp.edda: starting
2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack
2016/01/26 23:04:05 architecture: scaling to 100%
2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting
2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting
2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting
2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []}
2016/01/26 23:04:05 Starting: {memcache store 1 1 []}
2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]}
2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]}
2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]}
2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms
2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith
2016/01/26 23:04:07 asgard: Shutdown
2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing
2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing
2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing
2016/01/26 23:04:07 spigo: complete
2016/01/26 23:04:07 lamp.edda: closing
-a architecture lamp
-j graph json/lamp.json
-d run for 2 seconds
Riak IoT Architecture
{
"arch": "riak",
"description":"Riak IoT ingestion example for the RICON 2015 presentation",
"version": "arch-0.0",
"victim": "",
"services": [
{ "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]},
{ "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]},
{ "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]},
{ "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]},
{ "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]},
{ "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]},
{ "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]},
{ "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]},
{ "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]},
{ "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]},
{ "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]},
{ "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]},
{ "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]},
{ "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]},
{ "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]}
]
}
New tier
name
Tier
package
Node
count
List of tier
dependencies
0 = non
Regional
Single Region Riak IoT
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Load Balancer
Load Balancer
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Load Balancer
Load Balancer
Stream Service
Analytics Service
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Load Balancer
Load Balancer
Stream Service
Analytics Service
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Ingest Message Queue
Load Balancer
Load Balancer
Stream Service
Analytics Service
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Ingest Message Queue
Load Balancer
Load Balancer
Stream Service Riak TS
Analytics Service
Ingester Service
Two Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics
Two Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics
What’s the response
time of the stream
endpoint?
Response Times
What’s the response time of a simple service?
memcached
rds-msql
rds-msqlwebservers
elb
www
What’s the response time of an even simpler storage
backed web service?
memcached
mysql
disk volume
web
service
load
generator
See http://www.getguesstimate.com/models/1307
https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
See http://www.getguesstimate.com/models/1307
https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
See http://www.getguesstimate.com/models/1307
https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
See http://www.getguesstimate.com/models/1307
https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
Hit rates: memcached 40% mysql 70%
memcached hit %
memcached response mysql response
service cpu time
memcached hit mode
mysql cache hit mode
mysql disk access mode
Hit rates: memcached 40% mysql 70%
Hit rates: memcached 60% mysql 70%
memcached hit %
memcached response mysql response
service cpu time
memcached hit mode
mysql cache hit mode
mysql disk access mode
Hit rates: memcached 60% mysql 70%
Hit rates: memcached 20% mysql 90%
memcached hit %
memcached response mysql response
service cpu time
memcached hit mode
mysql cache hit mode
mysql disk access mode
Hit rates: memcached 20% mysql 90%
Measuring
Response Time With
Histograms
Changes made to codahale/hdrhistogram
Changes made to go-kit/kit/metrics (today!)
Implementation in adrianco/spigo/collect
What to measure?
Client Server
GetRequest
GetResponse
Client
Time
Client Send CS
Server Receive SR
Server Send SS
Client Receive CR
Server
Time
What to measure?
Client Server
GetRequest
GetResponse
Client
Time
Client Send CS
Server Receive SR
Server Send SS
Client Receive CR
Response
CR-CS
Service
SS-SR
Network
SR-CS
Network
CR-SS
Net Round Trip
(SR-CS) + (CR-SS)
(CR-CS) - (SS-SR)
Server
Time
Spigo Histogram Collection
func Start(listener chan gotocol.Message) {
...
for {
select {
case msg := <-listener:
flow.Instrument(msg, name, nethist)
switch msg.Imposition {
...
case gotocol.GetResponse:
// return path from a request, terminate and log response time in histograms
flow.End(msg, resphist, servhist, rthist)
case gotocol.Goodbye:
collect.SaveHist(nethist, name, "_net")
collect.SaveHist(resphist, name, "_resp")
collect.SaveHist(servhist, name, "_serv")
collect.SaveHist(rthist, name, “_rt")
collect.SaveAllGuesses(name)
gotocol.Message{gotocol.Goodbye, nil, time.Now(), gotocol.NilContext, name}.GoSend(parent)
return
}
case <-chatTicker.C:
...
sm = gotocol.Message{gotocol.GetRequest, listener, now, ctx, "Why"}
flow.AnnotateSend(sm, name)
sm.GoSend(microindex[m]) // send to a randomly chosen dependency
}
}
}
Go-Kit Histogram Collection
const (
maxHistObservable = 1000000
sampleCount = 500
)
func NewHist(name string) metrics.Histogram {
var h metrics.Histogram
if name != "" && archaius.Conf.Collect {
h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...)
if sampleMap == nil {
sampleMap = make(map[metrics.Histogram][]int64)
}
sampleMap[h] = make([]int64, 0, sampleCount)
return h
}
return nil
}
func Measure(h metrics.Histogram, d time.Duration) {
if h != nil && archaius.Conf.Collect {
if d > maxHistObservable {
h.Observe(int64(maxHistObservable))
} else {
h.Observe(int64(d))
}
s := sampleMap[h]
if s != nil && len(s) < sampleCount {
sampleMap[h] = append(s, int64(d))
}
}
}
Nanoseconds!
Median and 99%ile
Slice for first 500
values as samples for
export to Guesstimate
Spigo Histogram Results
name: storage.*.*.load00....load.denominator_resp
count: 1978
gauges: map[50:126975 99:278527]
From, To, Count, Prob, Bar
28672, 29695, 1, 0.0005, :
31744, 32767, 1, 0.0005, :
34816, 36863, 2, 0.0010, :#
36864, 38911, 8, 0.0040, |######
38912, 40959, 13, 0.0066, |##########
40960, 43007, 18, 0.0091, |##############
43008, 45055, 12, 0.0061, |#########
45056, 47103, 26, 0.0131, |####################
47104, 49151, 24, 0.0121, |##################
49152, 51199, 33, 0.0167, |#########################
51200, 53247, 29, 0.0147, |######################
53248, 55295, 35, 0.0177, |###########################
55296, 57343, 39, 0.0197, |##############################
57344, 59391, 35, 0.0177, |###########################
59392, 61439, 43, 0.0217, |#################################
61440, 63487, 31, 0.0157, |########################
63488, 65535, 39, 0.0197, |##############################
65536, 69631, 74, 0.0374, |#########################################################
69632, 73727, 65, 0.0329, |##################################################
73728, 77823, 57, 0.0288, |############################################
77824, 81919, 37, 0.0187, |############################
81920, 86015, 37, 0.0187, |############################
86016, 90111, 30, 0.0152, |#######################
90112, 94207, 39, 0.0197, |##############################
94208, 98303, 28, 0.0142, |#####################
98304, 102399, 30, 0.0152, |#######################
102400, 106495, 31, 0.0157, |########################
106496, 110591, 20, 0.0101, |###############
110592, 114687, 26, 0.0131, |####################
114688, 118783, 44, 0.0222, |##################################
118784, 122879, 41, 0.0207, |###############################
122880, 126975, 54, 0.0273, |##########################################
126976, 131071, 51, 0.0258, |#######################################
131072, 139263, 114, 0.0576, |########################################################################################
139264, 147455, 123, 0.0622, |###############################################################################################
147456, 155647, 127, 0.0642, |###################################################################################################
155648, 163839, 102, 0.0516, |###############################################################################
163840, 172031, 90, 0.0455, |######################################################################
172032, 180223, 65, 0.0329, |##################################################
180224, 188415, 43, 0.0217, |#################################
188416, 196607, 60, 0.0303, |##############################################
196608, 204799, 54, 0.0273, |##########################################
204800, 212991, 29, 0.0147, |######################
212992, 221183, 21, 0.0106, |################
221184, 229375, 25, 0.0126, |###################
229376, 237567, 18, 0.0091, |##############
237568, 245759, 15, 0.0076, |###########
245760, 253951, 9, 0.0046, |#######
253952, 262143, 8, 0.0040, |######
262144, 278527, 10, 0.0051, |#######
278528, 294911, 6, 0.0030, |####
294912, 311295, 2, 0.0010, |#
327680, 344063, 2, 0.0010, :#
344064, 360447, 1, 0.0005, |
376832, 393215, 1, 0.0005, :
name: storage.*.*.load00....load.denominator_resp
count: 1978
gauges: map[50:126975 99:278527]
From, To, Count, Prob, Bar
28672, 29695, 1, 0.0005, :
31744, 32767, 1, 0.0005, :
34816, 36863, 2, 0.0010, :#
36864, 38911, 8, 0.0040, |######
38912, 40959, 13, 0.0066, |##########
Normalized probability
Response time distribution
measured in nanoseconds
using High Dynamic
Range Histogram
:# Zero counts skipped
|# Contiguous buckets
Total count, median and
99th percentile values
Go Guesstimate Export
https://github.com/adrianco/goguesstimate
{
"space": {
"name": "gotest",
"description": "Testing",
"is_private": "true",
"graph": {
"metrics": [
{"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}},
{"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column":
3}},
{"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}},
{"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}}
],
"guesstimates": [
{"metric": "AB", "input": null, "guesstimateType": "DATA", "data":
[119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706,
5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9
991]},
{"metric": "AC", "input": "40", "guesstimateType": "POINT"},
{"metric": "AD", "input": "[1000,4000]", "guesstimateType": "NORMAL"},
{"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"}
]
}
}
}
See http://www.getguesstimate.com
See http://www.getguesstimate.com
Response time distributions
exported directly from Spigo
as 500 samples to
json_metrics/storage.guess
then posted to guesstimate.
Conference driven
development not quite
complete, go-kit PR in
place to provide full
names of histograms
Relationship between
services will also be
exported soon.
What’s Next?
Trends to watch for 2016:
Serverless Architectures - AWS Lambda
Teraservices - using terabytes of memory
Teraservices
Terabyte Memory Directions
Engulf dataset in memory for analytics
Balanced config for memory intensive workloads
Replace high end systems at commodity cost point
Explore non-volatile memory implications
Terabyte Memory Options
Now: Diablo DDR4 DIMM containing flash 64/128/256GB
Migrates pages to/from companion DRAM DIMM
Shipping now as volatile memory, future non-volatile
Announced but not shipped for 2016
AWS X1 Instance Type - over 2TB RAM
Easy availability should drive innovation
Diablo Memory1: Flash DIMM
NO CHANGES to CPU or Server
NO CHANGES to Operating System
NO CHANGES to Applications
✓ UP TO 256GB DDR4 MEMORY PER MODULE
✓ UP TO 4TB MEMORY IN 2 SOCKET SYSTEM
TM
Q&A
Adrian Cockcroft @adrianco
http://slideshare.com/adriancockcroft
Technology Fellow - Battery Ventures
See www.battery.com for a list of portfolio investments
Security
Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested.
Palo Alto Networks
Enterprise IT
Operations &
Management
Big DataCompute
Networking
Storage
1 of 58

More Related Content

What's hot(20)

Why Microservice Why Microservice
Why Microservice
Kelvin Yeung496 views
Implementing Domain Events with KafkaImplementing Domain Events with Kafka
Implementing Domain Events with Kafka
Andrei Rugina997 views
Architecture: MicroservicesArchitecture: Microservices
Architecture: Microservices
Amazon Web Services23.1K views
MicroservicesMicroservices
Microservices
Stephan Lindauer797 views
Introduction to microservicesIntroduction to microservices
Introduction to microservices
Anil Allewar1.1K views
Introduction to microservicesIntroduction to microservices
Introduction to microservices
Paulo Gandra de Sousa7.4K views
Cloud Migration: A How-To GuideCloud Migration: A How-To Guide
Cloud Migration: A How-To Guide
Amazon Web Services10.1K views
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
Amazon Web Services20.2K views
From Monolithic to Microservices From Monolithic to Microservices
From Monolithic to Microservices
Amazon Web Services6.3K views
Microservice Architecture 101Microservice Architecture 101
Microservice Architecture 101
Kochih Wu4.8K views
MicroservicesMicroservices
Microservices
SmartBear1.4K views
A Pattern Language for MicroservicesA Pattern Language for Microservices
A Pattern Language for Microservices
Chris Richardson2.5K views
Introduction To MicroservicesIntroduction To Microservices
Introduction To Microservices
Lalit Kale1.1K views
Design patterns for microservice architectureDesign patterns for microservice architecture
Design patterns for microservice architecture
The Software House21.9K views
Patterns of resiliencePatterns of resilience
Patterns of resilience
Uwe Friedrichsen49.1K views

Viewers also liked(20)

Similar to Microxchg Analyzing Response Time Distributions for Microservices(20)

IoTMyth ProposalIoTMyth Proposal
IoTMyth Proposal
Alfian Firmansyah43 views
Aplicaciones distribuidas con DaprAplicaciones distribuidas con Dapr
Aplicaciones distribuidas con Dapr
César Jesús Angulo Gasco424 views
Day 4 - Cloud Migration - But How?Day 4 - Cloud Migration - But How?
Day 4 - Cloud Migration - But How?
Amazon Web Services3.5K views
IOOF IT System ModernisationIOOF IT System Modernisation
IOOF IT System Modernisation
MongoDB356 views
Elastic{ON} 2017 RecapElastic{ON} 2017 Recap
Elastic{ON} 2017 Recap
Matias Cascallares1.3K views
Cloud applicationsCloud applications
Cloud applications
jamiehannaford582 views
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
Shubhra Kar2.3K views

More from Adrian Cockcroft(10)

Recently uploaded(20)

Java Platform Approach 1.0 - Picnic MeetupJava Platform Approach 1.0 - Picnic Meetup
Java Platform Approach 1.0 - Picnic Meetup
Rick Ossendrijver24 views
ThroughputThroughput
Throughput
Moisés Armani Ramírez31 views
The Research Portal of Catalonia: Growing more (information) & more (services)The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)
CSUC - Consorci de Serveis Universitaris de Catalunya59 views
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet49 views

Microxchg Analyzing Response Time Distributions for Microservices

  • 1. Analyzing Response Time Distributions for Microservices Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures February 2016
  • 2. What does @adrianco do? @adrianco Technology Due Diligence on Deals Presentations at Conferences Presentations at Companies Technical Advice for Portfolio Companies Program Committee for Conferences Networking with Interesting PeopleTinkering with Technologies Maintain Relationship with Cloud Vendors
  • 5. A Possible Hierarchy Continents Regions Zones Services Versions Containers Instances How Many? 3 to 5 2-4 per Continent 1-5 per Region 100’s per Zone Many per Service 1000’s per Version 10,000’s It’s much more challenging than just a large number of machines
  • 7. Some tools can show the request flow across a few services
  • 8. Interesting architectures have a lot of microservices! Flow visualization is a big challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  • 9. Simulated Microservices Model and visualize microservices Simulate interesting architectures Generate large scale configurations Eventually stress test real tools See github.com/adrianco/spigo Simulate Protocol Interactions in Go Visualize with D3 ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Three Availability Zones
  • 10. Spigo Nanoservice Structure func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, hist) switch msg.Imposition { case gotocol.Hello: // get named by parent ... case gotocol.NameDrop: // someone new to talk to ... case gotocol.Put: // upstream request handler ... outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(), msg.Ctx.NewParent(), msg.Intention} flow.AnnotateSend(outmsg, name) outmsg.GoSend(replicas) } case <-eurekaTicker.C: // poll the service registry ... } } } Nanoservice simulation total about 200 lines of Go
  • 11. Flow Trace Recording riak2 us-east-1 zoneC riak9 us-west-2 zoneA Put s896 Replicate riak3 us-east-1 zoneA riak8 us-west-2 zoneC riak4 us-east-1 zoneB riak10 us-west-2 zoneB us-east-1.zoneC.riak2 t98p895s896 Put us-east-1.zoneA.riak3 t98p896s908 Replicate us-east-1.zoneB.riak4 t98p896s909 Replicate us-west-2.zoneA.riak9 t98p896s910 Replicate us-west-2.zoneB.riak10 t98p910s912 Replicate us-west-2.zoneC.riak8 t98p910s913 Replicate staash us-east-1 zoneC s910 s908s913 s909s912
  • 12. Open Zipkin A common format for trace annotations A Java tool for visualizing traces Standardization effort to fold in other formats Driven by Adrian Cole (currently at Pivotal) Extended to load Spigo generated trace files
  • 15. Trace for one Spigo Flow
  • 16. Definition of an architecture { "arch": "lamp", "description":"Simple LAMP stack", "version": "arch-0.0", "victim": "webserver", "services": [ { "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] }, { "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] }, { "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] }, { "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] }, { "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] } ] } Header includes chaos monkey victim New tier name Tier package 0 = non Regional Node count List of tier dependencies
  • 17. Running Spigo $ ./spigo -a lamp -j -d 2 2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json 2016/01/26 23:04:05 lamp.edda: starting 2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack 2016/01/26 23:04:05 architecture: scaling to 100% 2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting 2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []} 2016/01/26 23:04:05 Starting: {memcache store 1 1 []} 2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]} 2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]} 2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]} 2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms 2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith 2016/01/26 23:04:07 asgard: Shutdown 2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing 2016/01/26 23:04:07 spigo: complete 2016/01/26 23:04:07 lamp.edda: closing -a architecture lamp -j graph json/lamp.json -d run for 2 seconds
  • 18. Riak IoT Architecture { "arch": "riak", "description":"Riak IoT ingestion example for the RICON 2015 presentation", "version": "arch-0.0", "victim": "", "services": [ { "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]}, { "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]}, { "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]}, { "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]}, { "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]}, { "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]}, { "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]}, { "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]}, { "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]}, { "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]}, { "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]}, { "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]}, { "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]}, { "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]}, { "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]} ] } New tier name Tier package Node count List of tier dependencies 0 = non Regional
  • 20. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint
  • 21. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Load Balancer Load Balancer
  • 22. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Load Balancer Load Balancer Stream Service Analytics Service
  • 23. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Load Balancer Load Balancer Stream Service Analytics Service
  • 24. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Analytics Service
  • 25. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Riak TS Analytics Service Ingester Service
  • 26. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics
  • 27. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics What’s the response time of the stream endpoint?
  • 29. What’s the response time of a simple service? memcached rds-msql rds-msqlwebservers elb www
  • 30. What’s the response time of an even simpler storage backed web service? memcached mysql disk volume web service load generator
  • 35. Hit rates: memcached 40% mysql 70%
  • 36. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 40% mysql 70%
  • 37. Hit rates: memcached 60% mysql 70%
  • 38. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 60% mysql 70%
  • 39. Hit rates: memcached 20% mysql 90%
  • 40. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 20% mysql 90%
  • 42. Changes made to codahale/hdrhistogram Changes made to go-kit/kit/metrics (today!) Implementation in adrianco/spigo/collect
  • 43. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Server Time
  • 44. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Response CR-CS Service SS-SR Network SR-CS Network CR-SS Net Round Trip (SR-CS) + (CR-SS) (CR-CS) - (SS-SR) Server Time
  • 45. Spigo Histogram Collection func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, nethist) switch msg.Imposition { ... case gotocol.GetResponse: // return path from a request, terminate and log response time in histograms flow.End(msg, resphist, servhist, rthist) case gotocol.Goodbye: collect.SaveHist(nethist, name, "_net") collect.SaveHist(resphist, name, "_resp") collect.SaveHist(servhist, name, "_serv") collect.SaveHist(rthist, name, “_rt") collect.SaveAllGuesses(name) gotocol.Message{gotocol.Goodbye, nil, time.Now(), gotocol.NilContext, name}.GoSend(parent) return } case <-chatTicker.C: ... sm = gotocol.Message{gotocol.GetRequest, listener, now, ctx, "Why"} flow.AnnotateSend(sm, name) sm.GoSend(microindex[m]) // send to a randomly chosen dependency } } }
  • 46. Go-Kit Histogram Collection const ( maxHistObservable = 1000000 sampleCount = 500 ) func NewHist(name string) metrics.Histogram { var h metrics.Histogram if name != "" && archaius.Conf.Collect { h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...) if sampleMap == nil { sampleMap = make(map[metrics.Histogram][]int64) } sampleMap[h] = make([]int64, 0, sampleCount) return h } return nil } func Measure(h metrics.Histogram, d time.Duration) { if h != nil && archaius.Conf.Collect { if d > maxHistObservable { h.Observe(int64(maxHistObservable)) } else { h.Observe(int64(d)) } s := sampleMap[h] if s != nil && len(s) < sampleCount { sampleMap[h] = append(s, int64(d)) } } } Nanoseconds! Median and 99%ile Slice for first 500 values as samples for export to Guesstimate
  • 47. Spigo Histogram Results name: storage.*.*.load00....load.denominator_resp count: 1978 gauges: map[50:126975 99:278527] From, To, Count, Prob, Bar 28672, 29695, 1, 0.0005, : 31744, 32767, 1, 0.0005, : 34816, 36863, 2, 0.0010, :# 36864, 38911, 8, 0.0040, |###### 38912, 40959, 13, 0.0066, |########## 40960, 43007, 18, 0.0091, |############## 43008, 45055, 12, 0.0061, |######### 45056, 47103, 26, 0.0131, |#################### 47104, 49151, 24, 0.0121, |################## 49152, 51199, 33, 0.0167, |######################### 51200, 53247, 29, 0.0147, |###################### 53248, 55295, 35, 0.0177, |########################### 55296, 57343, 39, 0.0197, |############################## 57344, 59391, 35, 0.0177, |########################### 59392, 61439, 43, 0.0217, |################################# 61440, 63487, 31, 0.0157, |######################## 63488, 65535, 39, 0.0197, |############################## 65536, 69631, 74, 0.0374, |######################################################### 69632, 73727, 65, 0.0329, |################################################## 73728, 77823, 57, 0.0288, |############################################ 77824, 81919, 37, 0.0187, |############################ 81920, 86015, 37, 0.0187, |############################ 86016, 90111, 30, 0.0152, |####################### 90112, 94207, 39, 0.0197, |############################## 94208, 98303, 28, 0.0142, |##################### 98304, 102399, 30, 0.0152, |####################### 102400, 106495, 31, 0.0157, |######################## 106496, 110591, 20, 0.0101, |############### 110592, 114687, 26, 0.0131, |#################### 114688, 118783, 44, 0.0222, |################################## 118784, 122879, 41, 0.0207, |############################### 122880, 126975, 54, 0.0273, |########################################## 126976, 131071, 51, 0.0258, |####################################### 131072, 139263, 114, 0.0576, |######################################################################################## 139264, 147455, 123, 0.0622, |############################################################################################### 147456, 155647, 127, 0.0642, |################################################################################################### 155648, 163839, 102, 0.0516, |############################################################################### 163840, 172031, 90, 0.0455, |###################################################################### 172032, 180223, 65, 0.0329, |################################################## 180224, 188415, 43, 0.0217, |################################# 188416, 196607, 60, 0.0303, |############################################## 196608, 204799, 54, 0.0273, |########################################## 204800, 212991, 29, 0.0147, |###################### 212992, 221183, 21, 0.0106, |################ 221184, 229375, 25, 0.0126, |################### 229376, 237567, 18, 0.0091, |############## 237568, 245759, 15, 0.0076, |########### 245760, 253951, 9, 0.0046, |####### 253952, 262143, 8, 0.0040, |###### 262144, 278527, 10, 0.0051, |####### 278528, 294911, 6, 0.0030, |#### 294912, 311295, 2, 0.0010, |# 327680, 344063, 2, 0.0010, :# 344064, 360447, 1, 0.0005, | 376832, 393215, 1, 0.0005, : name: storage.*.*.load00....load.denominator_resp count: 1978 gauges: map[50:126975 99:278527] From, To, Count, Prob, Bar 28672, 29695, 1, 0.0005, : 31744, 32767, 1, 0.0005, : 34816, 36863, 2, 0.0010, :# 36864, 38911, 8, 0.0040, |###### 38912, 40959, 13, 0.0066, |########## Normalized probability Response time distribution measured in nanoseconds using High Dynamic Range Histogram :# Zero counts skipped |# Contiguous buckets Total count, median and 99th percentile values
  • 48. Go Guesstimate Export https://github.com/adrianco/goguesstimate { "space": { "name": "gotest", "description": "Testing", "is_private": "true", "graph": { "metrics": [ {"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}}, {"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column": 3}}, {"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}}, {"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}} ], "guesstimates": [ {"metric": "AB", "input": null, "guesstimateType": "DATA", "data": [119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706, 5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9 991]}, {"metric": "AC", "input": "40", "guesstimateType": "POINT"}, {"metric": "AD", "input": "[1000,4000]", "guesstimateType": "NORMAL"}, {"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"} ] } } }
  • 50. See http://www.getguesstimate.com Response time distributions exported directly from Spigo as 500 samples to json_metrics/storage.guess then posted to guesstimate. Conference driven development not quite complete, go-kit PR in place to provide full names of histograms Relationship between services will also be exported soon.
  • 52. Trends to watch for 2016: Serverless Architectures - AWS Lambda Teraservices - using terabytes of memory
  • 54. Terabyte Memory Directions Engulf dataset in memory for analytics Balanced config for memory intensive workloads Replace high end systems at commodity cost point Explore non-volatile memory implications
  • 55. Terabyte Memory Options Now: Diablo DDR4 DIMM containing flash 64/128/256GB Migrates pages to/from companion DRAM DIMM Shipping now as volatile memory, future non-volatile Announced but not shipped for 2016 AWS X1 Instance Type - over 2TB RAM Easy availability should drive innovation
  • 56. Diablo Memory1: Flash DIMM NO CHANGES to CPU or Server NO CHANGES to Operating System NO CHANGES to Applications ✓ UP TO 256GB DDR4 MEMORY PER MODULE ✓ UP TO 4TB MEMORY IN 2 SOCKET SYSTEM TM
  • 57. Q&A Adrian Cockcroft @adrianco http://slideshare.com/adriancockcroft Technology Fellow - Battery Ventures See www.battery.com for a list of portfolio investments
  • 58. Security Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested. Palo Alto Networks Enterprise IT Operations & Management Big DataCompute Networking Storage