Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Microxchg Analyzing Response Time Distributions for Microservices

7,637 views

Published on

Research oriented presentation @Microxchg Berlin Feb 5th 2016. New code to collect histograms of response time and export them to monte-carlo simulation spreadsheet via getguesstimate.com

Published in: Technology
  • Be the first to comment

Microxchg Analyzing Response Time Distributions for Microservices

  1. 1. Analyzing Response Time Distributions for Microservices Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures February 2016
  2. 2. What does @adrianco do? @adrianco Technology Due Diligence on Deals Presentations at Conferences Presentations at Companies Technical Advice for Portfolio Companies Program Committee for Conferences Networking with Interesting PeopleTinkering with Technologies Maintain Relationship with Cloud Vendors
  3. 3. Challenges for Microservice Platforms
  4. 4. Managing Scale
  5. 5. A Possible Hierarchy Continents Regions Zones Services Versions Containers Instances How Many? 3 to 5 2-4 per Continent 1-5 per Region 100’s per Zone Many per Service 1000’s per Version 10,000’s It’s much more challenging than just a large number of machines
  6. 6. Flow
  7. 7. Some tools can show the request flow across a few services
  8. 8. Interesting architectures have a lot of microservices! Flow visualization is a big challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  9. 9. Simulated Microservices Model and visualize microservices Simulate interesting architectures Generate large scale configurations Eventually stress test real tools See github.com/adrianco/spigo Simulate Protocol Interactions in Go Visualize with D3 ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Three Availability Zones
  10. 10. Spigo Nanoservice Structure func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, hist) switch msg.Imposition { case gotocol.Hello: // get named by parent ... case gotocol.NameDrop: // someone new to talk to ... case gotocol.Put: // upstream request handler ... outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(), msg.Ctx.NewParent(), msg.Intention} flow.AnnotateSend(outmsg, name) outmsg.GoSend(replicas) } case <-eurekaTicker.C: // poll the service registry ... } } } Nanoservice simulation total about 200 lines of Go
  11. 11. Flow Trace Recording riak2 us-east-1 zoneC riak9 us-west-2 zoneA Put s896 Replicate riak3 us-east-1 zoneA riak8 us-west-2 zoneC riak4 us-east-1 zoneB riak10 us-west-2 zoneB us-east-1.zoneC.riak2 t98p895s896 Put us-east-1.zoneA.riak3 t98p896s908 Replicate us-east-1.zoneB.riak4 t98p896s909 Replicate us-west-2.zoneA.riak9 t98p896s910 Replicate us-west-2.zoneB.riak10 t98p910s912 Replicate us-west-2.zoneC.riak8 t98p910s913 Replicate staash us-east-1 zoneC s910 s908s913 s909s912
  12. 12. Open Zipkin A common format for trace annotations A Java tool for visualizing traces Standardization effort to fold in other formats Driven by Adrian Cole (currently at Pivotal) Extended to load Spigo generated trace files
  13. 13. Zipkin Trace Dependencies
  14. 14. Zipkin Trace Dependencies
  15. 15. Trace for one Spigo Flow
  16. 16. Definition of an architecture { "arch": "lamp", "description":"Simple LAMP stack", "version": "arch-0.0", "victim": "webserver", "services": [ { "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] }, { "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] }, { "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] }, { "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] }, { "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] } ] } Header includes chaos monkey victim New tier name Tier package 0 = non Regional Node count List of tier dependencies
  17. 17. Running Spigo $ ./spigo -a lamp -j -d 2 2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json 2016/01/26 23:04:05 lamp.edda: starting 2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack 2016/01/26 23:04:05 architecture: scaling to 100% 2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting 2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []} 2016/01/26 23:04:05 Starting: {memcache store 1 1 []} 2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]} 2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]} 2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]} 2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms 2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith 2016/01/26 23:04:07 asgard: Shutdown 2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing 2016/01/26 23:04:07 spigo: complete 2016/01/26 23:04:07 lamp.edda: closing -a architecture lamp -j graph json/lamp.json -d run for 2 seconds
  18. 18. Riak IoT Architecture { "arch": "riak", "description":"Riak IoT ingestion example for the RICON 2015 presentation", "version": "arch-0.0", "victim": "", "services": [ { "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]}, { "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]}, { "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]}, { "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]}, { "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]}, { "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]}, { "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]}, { "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]}, { "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]}, { "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]}, { "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]}, { "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]}, { "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]}, { "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]}, { "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]} ] } New tier name Tier package Node count List of tier dependencies 0 = non Regional
  19. 19. Single Region Riak IoT
  20. 20. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint
  21. 21. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Load Balancer Load Balancer
  22. 22. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Load Balancer Load Balancer Stream Service Analytics Service
  23. 23. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Load Balancer Load Balancer Stream Service Analytics Service
  24. 24. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Analytics Service
  25. 25. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Riak TS Analytics Service Ingester Service
  26. 26. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics
  27. 27. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics What’s the response time of the stream endpoint?
  28. 28. Response Times
  29. 29. What’s the response time of a simple service? memcached rds-msql rds-msqlwebservers elb www
  30. 30. What’s the response time of an even simpler storage backed web service? memcached mysql disk volume web service load generator
  31. 31. See http://www.getguesstimate.com/models/1307 https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
  32. 32. See http://www.getguesstimate.com/models/1307 https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
  33. 33. See http://www.getguesstimate.com/models/1307 https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
  34. 34. See http://www.getguesstimate.com/models/1307 https://github.com/getguesstimate/guesstimate-app by Ozzie Gooen
  35. 35. Hit rates: memcached 40% mysql 70%
  36. 36. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 40% mysql 70%
  37. 37. Hit rates: memcached 60% mysql 70%
  38. 38. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 60% mysql 70%
  39. 39. Hit rates: memcached 20% mysql 90%
  40. 40. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 20% mysql 90%
  41. 41. Measuring Response Time With Histograms
  42. 42. Changes made to codahale/hdrhistogram Changes made to go-kit/kit/metrics (today!) Implementation in adrianco/spigo/collect
  43. 43. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Server Time
  44. 44. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Response CR-CS Service SS-SR Network SR-CS Network CR-SS Net Round Trip (SR-CS) + (CR-SS) (CR-CS) - (SS-SR) Server Time
  45. 45. Spigo Histogram Collection func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, nethist) switch msg.Imposition { ... case gotocol.GetResponse: // return path from a request, terminate and log response time in histograms flow.End(msg, resphist, servhist, rthist) case gotocol.Goodbye: collect.SaveHist(nethist, name, "_net") collect.SaveHist(resphist, name, "_resp") collect.SaveHist(servhist, name, "_serv") collect.SaveHist(rthist, name, “_rt") collect.SaveAllGuesses(name) gotocol.Message{gotocol.Goodbye, nil, time.Now(), gotocol.NilContext, name}.GoSend(parent) return } case <-chatTicker.C: ... sm = gotocol.Message{gotocol.GetRequest, listener, now, ctx, "Why"} flow.AnnotateSend(sm, name) sm.GoSend(microindex[m]) // send to a randomly chosen dependency } } }
  46. 46. Go-Kit Histogram Collection const ( maxHistObservable = 1000000 sampleCount = 500 ) func NewHist(name string) metrics.Histogram { var h metrics.Histogram if name != "" && archaius.Conf.Collect { h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...) if sampleMap == nil { sampleMap = make(map[metrics.Histogram][]int64) } sampleMap[h] = make([]int64, 0, sampleCount) return h } return nil } func Measure(h metrics.Histogram, d time.Duration) { if h != nil && archaius.Conf.Collect { if d > maxHistObservable { h.Observe(int64(maxHistObservable)) } else { h.Observe(int64(d)) } s := sampleMap[h] if s != nil && len(s) < sampleCount { sampleMap[h] = append(s, int64(d)) } } } Nanoseconds! Median and 99%ile Slice for first 500 values as samples for export to Guesstimate
  47. 47. Spigo Histogram Results name: storage.*.*.load00....load.denominator_resp count: 1978 gauges: map[50:126975 99:278527] From, To, Count, Prob, Bar 28672, 29695, 1, 0.0005, : 31744, 32767, 1, 0.0005, : 34816, 36863, 2, 0.0010, :# 36864, 38911, 8, 0.0040, |###### 38912, 40959, 13, 0.0066, |########## 40960, 43007, 18, 0.0091, |############## 43008, 45055, 12, 0.0061, |######### 45056, 47103, 26, 0.0131, |#################### 47104, 49151, 24, 0.0121, |################## 49152, 51199, 33, 0.0167, |######################### 51200, 53247, 29, 0.0147, |###################### 53248, 55295, 35, 0.0177, |########################### 55296, 57343, 39, 0.0197, |############################## 57344, 59391, 35, 0.0177, |########################### 59392, 61439, 43, 0.0217, |################################# 61440, 63487, 31, 0.0157, |######################## 63488, 65535, 39, 0.0197, |############################## 65536, 69631, 74, 0.0374, |######################################################### 69632, 73727, 65, 0.0329, |################################################## 73728, 77823, 57, 0.0288, |############################################ 77824, 81919, 37, 0.0187, |############################ 81920, 86015, 37, 0.0187, |############################ 86016, 90111, 30, 0.0152, |####################### 90112, 94207, 39, 0.0197, |############################## 94208, 98303, 28, 0.0142, |##################### 98304, 102399, 30, 0.0152, |####################### 102400, 106495, 31, 0.0157, |######################## 106496, 110591, 20, 0.0101, |############### 110592, 114687, 26, 0.0131, |#################### 114688, 118783, 44, 0.0222, |################################## 118784, 122879, 41, 0.0207, |############################### 122880, 126975, 54, 0.0273, |########################################## 126976, 131071, 51, 0.0258, |####################################### 131072, 139263, 114, 0.0576, |######################################################################################## 139264, 147455, 123, 0.0622, |############################################################################################### 147456, 155647, 127, 0.0642, |################################################################################################### 155648, 163839, 102, 0.0516, |############################################################################### 163840, 172031, 90, 0.0455, |###################################################################### 172032, 180223, 65, 0.0329, |################################################## 180224, 188415, 43, 0.0217, |################################# 188416, 196607, 60, 0.0303, |############################################## 196608, 204799, 54, 0.0273, |########################################## 204800, 212991, 29, 0.0147, |###################### 212992, 221183, 21, 0.0106, |################ 221184, 229375, 25, 0.0126, |################### 229376, 237567, 18, 0.0091, |############## 237568, 245759, 15, 0.0076, |########### 245760, 253951, 9, 0.0046, |####### 253952, 262143, 8, 0.0040, |###### 262144, 278527, 10, 0.0051, |####### 278528, 294911, 6, 0.0030, |#### 294912, 311295, 2, 0.0010, |# 327680, 344063, 2, 0.0010, :# 344064, 360447, 1, 0.0005, | 376832, 393215, 1, 0.0005, : name: storage.*.*.load00....load.denominator_resp count: 1978 gauges: map[50:126975 99:278527] From, To, Count, Prob, Bar 28672, 29695, 1, 0.0005, : 31744, 32767, 1, 0.0005, : 34816, 36863, 2, 0.0010, :# 36864, 38911, 8, 0.0040, |###### 38912, 40959, 13, 0.0066, |########## Normalized probability Response time distribution measured in nanoseconds using High Dynamic Range Histogram :# Zero counts skipped |# Contiguous buckets Total count, median and 99th percentile values
  48. 48. Go Guesstimate Export https://github.com/adrianco/goguesstimate { "space": { "name": "gotest", "description": "Testing", "is_private": "true", "graph": { "metrics": [ {"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}}, {"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column": 3}}, {"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}}, {"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}} ], "guesstimates": [ {"metric": "AB", "input": null, "guesstimateType": "DATA", "data": [119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706, 5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9 991]}, {"metric": "AC", "input": "40", "guesstimateType": "POINT"}, {"metric": "AD", "input": "[1000,4000]", "guesstimateType": "NORMAL"}, {"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"} ] } } }
  49. 49. See http://www.getguesstimate.com
  50. 50. See http://www.getguesstimate.com Response time distributions exported directly from Spigo as 500 samples to json_metrics/storage.guess then posted to guesstimate. Conference driven development not quite complete, go-kit PR in place to provide full names of histograms Relationship between services will also be exported soon.
  51. 51. What’s Next?
  52. 52. Trends to watch for 2016: Serverless Architectures - AWS Lambda Teraservices - using terabytes of memory
  53. 53. Teraservices
  54. 54. Terabyte Memory Directions Engulf dataset in memory for analytics Balanced config for memory intensive workloads Replace high end systems at commodity cost point Explore non-volatile memory implications
  55. 55. Terabyte Memory Options Now: Diablo DDR4 DIMM containing flash 64/128/256GB Migrates pages to/from companion DRAM DIMM Shipping now as volatile memory, future non-volatile Announced but not shipped for 2016 AWS X1 Instance Type - over 2TB RAM Easy availability should drive innovation
  56. 56. Diablo Memory1: Flash DIMM NO CHANGES to CPU or Server NO CHANGES to Operating System NO CHANGES to Applications ✓ UP TO 256GB DDR4 MEMORY PER MODULE ✓ UP TO 4TB MEMORY IN 2 SOCKET SYSTEM TM
  57. 57. Q&A Adrian Cockcroft @adrianco http://slideshare.com/adriancockcroft Technology Fellow - Battery Ventures See www.battery.com for a list of portfolio investments
  58. 58. Security Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested. Palo Alto Networks Enterprise IT Operations & Management Big DataCompute Networking Storage

×