SVCC: Code Shaming and Antipatterns

Presentation from Silicon Valley Code Camp 2014, on subtle anti-patterns that show up in cloud services under load.

Published in: Software

  1. ... Web Role Azure Cloud Service
  2. Service Bus Web Role Worker Azure Cloud Service
  3. Service Bus Queue Message Batch Process Messages Process Message Process Message ..
  4. Service Bus Queue Message Batch Process Messages Process Message .. Process Message
  5. [Chart: Variation in Message Processing. Avg, Min, and Max processing time per message type (Message Type 1 through Message Type 8); time axis from 00:00.0 to 00:30.2.]
  6. http://channel9.msdn.com/Series/PerfView-Tutorial/Tutorial-12-Wall-Clock-Time-Investigation-Basics
  7. Cloud Service Boundary Load Balancer Web Servers Database App Servers Azure Queue(s)
  8. ... Web Role Azure Cloud Service 500 databases
  9. Azure Load Balancer DB1 DB2 DB3
     SrcIp   | SrcPort | DestIp  | DestPort
     A.B.C.D | 1       | E.F.G.H | 1433
     A.B.C.D | 2       | E.F.G.H | 1433
  10. ... Web Role Azure Cloud Service 500 databases Content moderation service
  11. [Chart: Web Request Response Latency. Avg Latency and Response Latency plotted over seconds 1 through 29; vertical axis from 0 to 450.]
  12. ... Azure Cloud Service Web Role Worker Blob Queue Azure Storage Account
  13. Query | Throughput | Latency | Reach
      Every 30 seconds, each device publishes a status update (location, health, etc.) | 4k – 100k msgs/sec | 2000 – 5000 ms | Single device
      Every 10 minutes, a batch job retrieves all of the status updates delivered in the past 10 minutes | 2M msgs / 10 minutes | 2 minutes | All devices
      On an ad-hoc basis, a user may request the current status and recent history of all of their devices | 15 requests / second | 500 ms | Limited device set
      On an ad-hoc basis, a user may request a historical time range of all of their devices | 5 requests / second | 750 ms | Limited device set
  14. STB Readiness
      • This isn’t a relational workload: per-device insert and lookup, periodic batch transfer
      • Per-device lookup: natural fit for table storage; Device ID = Pk; Data type = Rk
      • Periodic batch transfer: natural fit for blob storage; Instance + Timestamp = blob id; buffer and write into blocks; roll over on time interval (10 min)
      • Table Storage: Pk={Device;Day}, Rk={Timestamp}, Payload={fields}
      • Blob Storage (time/space buffer): Uri={Minute;Instance}, Payload={JSON Data}
      • Querying by device: by time, direct { PkRk } lookup; by day, direct { Pk } lookup, max of 2880 records per partition
      • Batch transfer by time frame: parallel download of all blobs matching timeframe pattern
      • Adding scale capacity: 20k operations per storage account,
  15. Where are the scalability bottlenecks? Where are the availability and failure points? Where are the key insight and instrumentation points?
      [Diagram: Cloud Service with Front End Web Role (Instance × 4), Caching Role (Instance × 2), Worker Role (Instance × 1); Databases (DB × 4); Storage (Storage Account × 2).]
  16. http://channel9.msdn.com/Series/FailSafe
      http://code.msdn.microsoft.com/windowsazure/ContosoSocial-in-Windows-8dd9052c
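
The key scheme on slide 14 (Pk={Device;Day}, Rk={Timestamp} for table storage, Uri={Minute;Instance} for blobs rolled over every 10 minutes) could be sketched roughly as below. This is a minimal illustration, not the presenters' code: the function names, the `;` separator, and the timestamp encodings are assumptions made for the example. Only the key shapes, the 30-second publish interval, and the 10-minute rollover come from the slides.

```python
from datetime import datetime, timezone
from typing import Tuple

STATUS_INTERVAL_SECONDS = 30  # slides: each device publishes every 30 seconds
ROLLOVER_MINUTES = 10         # slides: blobs roll over on a 10-minute interval


def table_keys(device_id: str, ts: datetime) -> Tuple[str, str]:
    """Return (partition key, row key) shaped like Pk={Device;Day}, Rk={Timestamp}.

    The string encodings are hypothetical; ts must be timezone-aware.
    """
    ts = ts.astimezone(timezone.utc)
    partition_key = f"{device_id};{ts:%Y%m%d}"
    row_key = f"{ts:%Y%m%dT%H%M%SZ}"
    return partition_key, row_key


def records_per_partition(interval_seconds: int = STATUS_INTERVAL_SECONDS) -> int:
    """Upper bound on rows in one {Device;Day} partition at a fixed publish interval."""
    seconds_per_day = 24 * 60 * 60
    return seconds_per_day // interval_seconds


def blob_name(instance_id: str, ts: datetime) -> str:
    """Return a blob name shaped like Uri={Minute;Instance}.

    The timestamp is floored to the start of its 10-minute rollover window,
    so every write from one instance in that window lands in the same blob.
    """
    ts = ts.astimezone(timezone.utc)
    floored_minute = ts.minute - ts.minute % ROLLOVER_MINUTES
    window_start = ts.replace(minute=floored_minute, second=0, microsecond=0)
    return f"{window_start:%Y%m%d%H%M};{instance_id}"


if __name__ == "__main__":
    sample = datetime(2014, 10, 11, 9, 47, 30, tzinfo=timezone.utc)
    print(table_keys("dev42", sample))
    print(records_per_partition())
    print(blob_name("worker_0", sample))
```

With one update every 30 seconds, a day holds 86400 / 30 = 2880 updates, which matches the "max of 2880 records per partition" figure on the slide. Flooring the blob timestamp to the window start is one way to let a batch job list all blobs for a time frame by prefix and download them in parallel.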