Reaching 5 Million Messaging Connections: Our Journey with Kubernetes


Learn how we built a high-scale messaging service and state store on Kubernetes. The solution supports millions of persistent, concurrent connections; enables tens of thousands of messages per second; is globally addressable; stores millions of states; and responds with minimal latency (<250ms).

To evaluate build approaches, the team split into Makers & Breakers. Makers developed the solution stack while Breakers focused on repurposing Locust, a high-scale load testing framework, to simulate behavior. Leveraging the flexibility of Kubernetes, we were able to scale the stack and solve blockers on the path to a viable solution. Blockers included ingress, file descriptors, service discovery and resource limits. The experience was deeply educational, generating key learnings for developers tasked with building a scaled solution on top of Kubernetes.

Published in: Technology

  1. Reaching 5 Million Messaging Connections: Our Journey with Kubernetes • Dylan O’Mahony, Cloud Architecture Manager, Bose • Dave Doyle, Software Engineering Manager, Connected • Dec 12, 2018
  2. So near, yet so far [diagram labels: Cloud, Home]
  6. Where I’m coming from. [image © Notting Hill]
  7. The Team • Four people. Two teams. Makers and breakers.

  8. The Stack
  9. Infrastructure: “Galapagos” • Kubernetes on AWS (not EKS) • Each team member had a full rollout of the stack
  10. Solution Model [diagram labels: WebSocket Connection, MQTT, MQTT, MQTT]
  11. Solution Components
  12. Ingress & Load Balancing: HAProxy • De facto standard for proxy and load balancing • TCP for WebSockets • Less confusing than most ingress options
  13. Message Broker: VerneMQ • Clustering • Bridging (future considerations) • MQTTv5 shared subscriptions • Fault tolerance • Well-defined netsplit behaviour • Time order integrity
  14. The Glue: Listening Service • Written in Golang • Subscribes to VerneMQ with a shared subscription • Writes shadow states to Cassandra • Lightweight and performant
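
The shared subscription mentioned on this slide can be illustrated with a small sketch. In MQTT v5, a shared subscription uses a topic filter of the form `$share/<ShareName>/<filter>`, and the broker delivers each matching message to one subscriber in the group rather than to all of them. The group name `listeners` and the topic `devices/+/state` are hypothetical; the talk does not give the real values, and the actual Listening Service is written in Go, not Python.

```python
def shared_filter(share_name: str, topic_filter: str) -> str:
    """Build an MQTT v5 shared-subscription topic filter."""
    # The MQTT v5 specification forbids '/', '+' and '#' in the ShareName.
    if not share_name or any(ch in share_name for ch in "/+#"):
        raise ValueError(f"invalid share name: {share_name!r}")
    if not topic_filter:
        raise ValueError("topic filter must not be empty")
    return f"$share/{share_name}/{topic_filter}"


# Hypothetical group and topic names, for illustration only.
print(shared_filter("listeners", "devices/+/state"))
```

Every Listening Service replica that subscribes with the same share name joins the same group, so adding replicas spreads the message load instead of duplicating it.
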
  15. Shadow Store: Cassandra • Performant • Fault tolerant • Massively scalable • Stable
  16. Setup: Kubernetes • All images built on Alpine • StatefulSet: VerneMQ, Cassandra • DaemonSet: HAProxy (ingress nodes) • Deployment: Listening Service, Prometheus, Grafana
  17. Solution Components
  18. Test Rig: Locust • Explored MZBench and JMeter • Wanted more flexibility • Decided on Locust
  19. High-Level Architecture [diagram labels: Cass, LS, VMQ, HAProxy, EC2, Master]
  20. Testing
  21. Target: 5,000,000 Persistent Concurrent Connections
  22. Result: 340 Connections. Blocker: Python file descriptor limits • Paho MQTT client • Python and select() • select() cannot watch file descriptors numbered 1024 or higher
  23. Workaround • Replaced select() call • Tried asyncio library • Did not work [terminal: % make python]
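
The slides say the `select()` call was replaced but do not say with what. As one possible illustration (an assumption, not the team's actual change), Python's standard `selectors` module picks the best readiness mechanism the platform offers, such as epoll on Linux or kqueue on BSD and macOS, and those mechanisms do not share `select()`'s fixed descriptor ceiling. The helper name `wait_readable` is hypothetical.

```python
import selectors
import socket


def wait_readable(socks, timeout=1.0):
    """Return the sockets in `socks` that become readable within `timeout` seconds."""
    # DefaultSelector resolves to epoll/kqueue where available and only
    # falls back to select() on platforms that have nothing better.
    sel = selectors.DefaultSelector()
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()


# Demo: only `b` has data waiting, so only `b` is reported as readable.
a, b = socket.socketpair()
try:
    a.sendall(b"ping")
    ready = wait_readable([a, b])
    print(ready == [b])
finally:
    a.close()
    b.close()
```
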
  24. Result: 700k Connections. Blocker: Configuration defaults and NAT • HAProxy port exhaustion • VerneMQ default connection config limits • Service abstraction NAT
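
Port exhaustion on a proxy follows from how TCP identifies connections: each one is a unique (source IP, source port, destination IP, destination port) tuple, so one source IP can open at most one connection per ephemeral port to any single destination address and port. The sketch below works through that arithmetic. It assumes the Linux default ephemeral range of 32768 to 60999, which the slides do not state, and the function name is hypothetical.

```python
def max_outbound_connections(source_ips: int, ports_per_source_ip: int,
                             destinations: int) -> int:
    """Upper bound on concurrent outbound TCP connections from a proxy.

    `destinations` counts distinct (IP, port) pairs the proxy connects to.
    """
    return source_ips * ports_per_source_ip * destinations


# Assumed Linux default for net.ipv4.ip_local_port_range: 32768..60999.
DEFAULT_EPHEMERAL_PORTS = 60999 - 32768 + 1

one_listener = max_outbound_connections(1, DEFAULT_EPHEMERAL_PORTS, 1)
four_listeners = max_outbound_connections(1, DEFAULT_EPHEMERAL_PORTS, 4)
print(one_listener, four_listeners)  # 28232 112928
```

Under this model, a single Service address in front of the brokers counts as one destination, while talking to broker pods directly, adding listeners, or widening the source port range each raises the ceiling.
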
  25. Workaround: Reconfigure Everything • VerneMQ: fix max connection setting, add 3 more listeners • Bypass Kubernetes Service • HAProxy: round-robin VerneMQ nodes; increase source ports; vertically scale ingress nodes for more IOPS/bandwidth • Created app to query Kubernetes API, returned templated config
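
A minimal sketch of the "templated config" idea follows. It assumes the broker pod IPs have already been fetched from the Kubernetes API (the hard-coded list stands in for that query) and renders an HAProxy TCP backend that round-robins across them. The backend name, server names, IP addresses, and port 1883 are all hypothetical; the talk does not show the real template.

```python
def render_backend(name: str, pod_ips: list[str], port: int = 1883) -> str:
    """Render an HAProxy backend that round-robins over the given pod IPs."""
    lines = [
        f"backend {name}",
        "    mode tcp",             # plain TCP pass-through
        "    balance roundrobin",   # spread connections evenly across brokers
    ]
    # Sort so the rendered config is stable between API queries.
    for index, ip in enumerate(sorted(pod_ips)):
        lines.append(f"    server vmq{index} {ip}:{port} check")
    return "\n".join(lines) + "\n"


# Stand-in for the result of a Kubernetes API query (hypothetical IPs).
print(render_backend("vernemq", ["10.0.1.12", "10.0.1.11"]))
```

Pointing HAProxy at pod IPs directly is what lets it bypass the Kubernetes Service and its NAT layer.
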
  26. Service Mesh?
  27. Result: 1.1 Million Connections • Subscriptions were failing • VerneMQ nodes were being terminated • Kubernetes brought them back up • Blocker: ?
  28. Diagnosing the Problem • Scaled VerneMQ incrementally from 10 to 80 nodes • Conclusion: resize/reallocation issue
  29. Workaround: Exponential Backoff • Modified clients to add custom behavior • Delayed subscriptions to begin at decaying rate • VerneMQ recovered
  30. Result: 1.5 Million Connections. Blocker: Resources, Erlang/OTP Scheduler • Erlang schedulers went to 100% utilization • Increasing resources didn’t help
  31. Workaround: Reconfigure due to cgroups • Erlang/OTP is not cgroup-aware • Directly configure vCPUs in Erlang for the scheduler
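
One way to derive the scheduler count from the container's CPU limit is sketched below. It parses the text of a cgroup v2 `cpu.max` file ("<quota> <period>", or "max <period>" when unlimited) and produces the Erlang emulator flag `+S Schedulers:SchedulersOnline`. The use of cgroup v2 is an assumption (clusters at the time of the talk most likely exposed the cgroup v1 quota files instead), and the helper names are hypothetical; the talk does not show how the value was computed.

```python
import math


def schedulers_from_cpu_max(cpu_max_text: str, host_cpus: int) -> int:
    """Number of Erlang schedulers to run for a given cgroup v2 cpu.max value."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        # No CPU limit set on the container: use every host CPU.
        return host_cpus
    limit = math.ceil(int(quota) / int(period))
    # Never fewer than one scheduler, never more than the host has CPUs.
    return max(1, min(host_cpus, limit))


def erlang_scheduler_flag(schedulers: int) -> str:
    """Emulator flag that sets both total and online scheduler counts."""
    return f"+S {schedulers}:{schedulers}"


# A container limited to 2 CPUs on a 16-core node.
print(erlang_scheduler_flag(schedulers_from_cpu_max("200000 100000", 16)))  # +S 2:2
```

Without such a setting, a VM that is not cgroup-aware starts one scheduler per host core and those schedulers then compete for the much smaller CPU quota the container actually has.
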
  32. Result: 4.85 Million Connections. Blocker: Resources, Resources, Resources
  33. Active WebSocket connections: 5,000,001 • Average latency for published message to reach subscriber: 69 ms • Average throughput of publishes per second: 9,779
  34. Success!
  35. Key Learnings
  36. Key learning 1: Mind your dependencies
  37. Key learning 2: Experiment with resource limits
  38. Key learning 3: Layers complicate troubleshooting
  39. Key learning 4: Starting at scale is different than organic growth
  40. Key learning 5: Our solution was a lot cheaper [chart: cost per device per annum; axes: cost per device, time]
  41. Conclusion
  42. Thank you. Dec 12, 2018 @bose
  43. Credits • Peter Chow-Wah, Software Engineer, Connected • Scott Wallace, Software Engineer, Connected • Eric Ko, Software Engineer, Connected • Thomas Aston, Lead Project Manager, Connected • Cameron Rowshanbin, Software Engineer, Connected • Kitty Chio, Project Manager, Connected • Josh West, Principal Cloud Engineer & Team Lead, Bose • Myles Steinhauser, Senior Cloud Engineer, Bose • Yiwei Chen, Cloud Engineer, Bose • Kevin Bralten, Solutions Engineer, Connected • Special Thanks: @bose @connectedio