Building and Scaling a WebSockets Pubsub System
Kapil Reddy @ Helpshift
About me - Kapil
Staff Engineer @ Helpshift
Clojure
Distributed Systems
Games
Music
Books/Comics
Football
Helpshift is a mobile CRM SaaS product. We help app developers connect with their customers, since everything is now on mobile.
Scale
• ~2 TB data broadcast / day
• Outgoing - 75 k msg/sec
• Incoming - 1.5 k msg/sec
• Concurrency - 3.5k
Here are some scale numbers for the platform we have built.
PubSub Platform
We built a generic publish and subscribe platform. Subscribers are JavaScript clients listening on WebSocket connections, and publishers are any backend servers that use ZMQ to publish messages.
A simplified version of the platform’s architecture. Browsers (subscribers) connect to Dirigent using WebSockets, and backend servers (publishers) connect to Dirigent using ZMQ.
Zooming in a bit, the architecture has two different types of services. They talk to each other internally using ZMQ as well. Zookeeper is used for coordination between Dirigent services.
We also have multiple clusters, and they can talk to each other. Each cluster has its own set of subscribers. Publishers can come from another cluster.
Evolution
In v1 of the platform we used different transport mechanisms: HTTP streaming to deliver messages to browsers, and HTTP to deliver messages to the Dirigent servers. The HTTP mechanism was a problem because it coupled the platform to the backend servers. Whenever the Dirigent platform went down under load, the HTTP connections timed out and created a cascading failure in the backend servers. We switched to ZMQ there.
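One way to picture the decoupling is a publisher that hands a message to a bounded local buffer and carries on, instead of waiting for an HTTP response from a server that may be overloaded. The sketch below uses only the Python standard library, not ZMQ; the class name, the buffer limit and the drop-when-full policy are assumptions for illustration, since the slides do not say which ZMQ socket types or limits were used.

```python
import queue


class FireAndForgetPublisher:
    """Hypothetical publisher that buffers messages and never blocks the caller."""

    def __init__(self, high_water_mark):
        # Bounded buffer: an assumption standing in for a transport-level limit.
        self._buffer = queue.Queue(maxsize=high_water_mark)
        self.dropped = 0

    def publish(self, message):
        """Queue a message; if the buffer is full, drop it instead of waiting."""
        try:
            self._buffer.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Return everything buffered so far, in order (stands in for sending)."""
        sent = []
        while True:
            try:
                sent.append(self._buffer.get_nowait())
            except queue.Empty:
                return sent


publisher = FireAndForgetPublisher(high_water_mark=2)
results = [publisher.publish(m) for m in ("a", "b", "c")]
```

The caller returns immediately in every case, so a slow or unavailable consumer costs dropped messages rather than timed-out requests in the backend.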
Problems with HTTP streaming
A browser client needs only a subset of the data, but unsubscribing and subscribing to new topics was not possible over HTTP streaming since it’s a unidirectional channel. The only option was to push everything to all clients for a specific subdomain. Initially it sounded like a good idea, but once we hit scale we were running out of network bandwidth per machine. We switched to WebSockets, where the client can ask for specific information based on UI actions.
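The difference between pushing everything and letting clients choose topics can be sketched as a small subscription registry. This is a minimal illustration, not the platform's code: the class, the client ids and the topic names are all hypothetical.

```python
from collections import defaultdict


class TopicRegistry:
    """Hypothetical registry mapping each topic to the clients subscribed to it."""

    def __init__(self):
        self._subscribers = defaultdict(set)

    def subscribe(self, client_id, topic):
        self._subscribers[topic].add(client_id)

    def unsubscribe(self, client_id, topic):
        # discard() is a no-op if the client was never subscribed.
        self._subscribers[topic].discard(client_id)

    def recipients(self, topic):
        """Clients that should receive a message published on this topic."""
        return sorted(self._subscribers.get(topic, ()))


registry = TopicRegistry()
registry.subscribe("browser-1", "issues/open")
registry.subscribe("browser-1", "issues/closed")
registry.subscribe("browser-2", "issues/closed")
# The user navigates away from the open-issues view, so the client unsubscribes.
registry.unsubscribe("browser-1", "issues/open")
```

With a bidirectional channel the client can send the subscribe and unsubscribe requests itself, so a published message only goes to the clients returned by `recipients`.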
Under the hood
• Clojure (JVM)
• http-kit (NIO-based WebSocket server)
• ZMQ
• Zookeeper
Monitoring
All the messages we publish are important data and need to be rendered in time. The nature of this data is ephemeral. We don’t store it anywhere, so auditing is hard. That made monitoring crucial for us.
Under the hood
• StatsD protocol
• Graphite - Storage
• Grafana - Frontend
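For readers unfamiliar with StatsD, a metric is a short text line of the form `<name>:<value>|<type>`, usually sent over UDP. The sketch below formats such lines and sends one; the metric names are hypothetical, and the host and port are assumptions (8125 is the conventional StatsD port).

```python
import socket


def statsd_line(name, value, metric_type):
    """Format one metric as <name>:<value>|<type> (c=counter, ms=timer, g=gauge)."""
    if metric_type not in ("c", "ms", "g"):
        raise ValueError("unsupported metric type: " + metric_type)
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP; the sender gets no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names for a counter and a timer.
counter = statsd_line("pubsub.websocket.published", 1, "c")
timer = statsd_line("pubsub.websocket.publish_time", 12, "ms")
```

Calling `send_metric(counter)` would emit the counter to a local StatsD daemon, if one is listening.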
*example of monitoring: comparison of different stages*
Since auditing this kind of data is hard, we compare metrics of the data at different stages of the platform. But since the numbers are big, it’s hard to spot an anomaly. What we are looking for is variance.
Message variance is easy to parse visually. If variance is low, some stage of the platform is dropping data. We have also set up alerts on this same query.
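The slides do not define how this variance is computed. One possible reading, sketched below, is a ratio of each stage's message count to the count of the stage before it, so that a value falling below a threshold points at the stage that is losing data. The stage names, counts and the 0.99 threshold are hypothetical, and the sketch assumes the counts at successive stages are directly comparable.

```python
def stage_ratios(counts):
    """Return (stage, count / previous stage's count) for every stage after the first."""
    ratios = []
    for (_, upstream), (stage, downstream) in zip(counts, counts[1:]):
        ratio = downstream / upstream if upstream else 0.0
        ratios.append((stage, ratio))
    return ratios


def dropping_stages(counts, threshold=0.99):
    """Stages whose ratio to the previous stage falls below the threshold."""
    return [stage for stage, ratio in stage_ratios(counts) if ratio < threshold]


# Hypothetical per-minute counts at three stages of the pipeline.
counts = [("received", 1000), ("routed", 1000), ("sent_to_websocket", 900)]
suspects = dropping_stages(counts)
```

A ratio is easier to read on a dashboard than two large raw counters, and the same expression can back an alert.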
Another important metric is the time taken to publish a message to a WebSocket connection. Since the near-real-time SLA is so important, we look at p99s for anomalies. We have set up alerts on these as well.
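To show why a p99 is watched rather than an average, here is a minimal nearest-rank percentile over a list of latency samples. The sample values are made up, and a metrics backend may use a different percentile method than this one.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical publish latencies in milliseconds: mostly fast, two slow outliers.
latencies_ms = [5] * 98 + [40, 900]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

Here the median stays at the fast value while the p99 moves to the slow tail, which is the part of the distribution a near-real-time SLA cares about.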
Cost saving
Costs are always a concern for us! Two important factors add up to the cost: outgoing bandwidth usage and the number of machines.
Compression
First we started using gzip compression for WebSockets. It’s a standard compression mechanism supported by browsers, but as always with browsers there are quirks here.
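As a rough illustration of why compression helps with bandwidth, the sketch below compresses a repetitive JSON payload with Python's `zlib`, which implements DEFLATE, the algorithm behind gzip and behind the permessage-deflate WebSocket extension that browsers negotiate. The payload shape is hypothetical, and the slides do not say how compression was configured on the platform.

```python
import json
import zlib


def compressed_size(payload: bytes) -> int:
    """Size in bytes of the payload after DEFLATE compression."""
    return len(zlib.compress(payload))


# Hypothetical broadcast message: a list of similar-looking records.
records = [{"issue_id": i, "status": "open", "assignee": None} for i in range(200)]
message = json.dumps(records).encode("utf-8")

raw_bytes = len(message)
deflated_bytes = compressed_size(message)
```

Messages with repeated keys and values, as JSON updates tend to have, compress well; small or already-compressed payloads gain much less.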
Re-visiting features
The biggest change you can make to save costs is to revisit the features/business logic itself and try to optimise there. This reduced the bandwidth usage by a significant amount.
Auto scaling
To save on the number of machines used, we started investigating how to do auto scaling. Auto scaling was not straightforward since all the connections are long running and can stay alive for as long as 8 hours.
HAProxy with least conn
We went with the obvious choice of least connection with HAProxy doing the load balancing.
Least load connection works. Sometimes
The problem with least load connection is the assumption that the number of connections a server is handling is directly proportional to the amount of work it’s doing. This was a wrong assumption, and it just led us to uneven distribution, server crashes and bad sleepless nights.
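A small made-up example shows how connection count and actual work can disagree. The server names and numbers below are hypothetical; the selection function simply mimics the idea of choosing the server with the fewest connections.

```python
def pick_least_connections(servers):
    """Choose the server that currently holds the fewest connections."""
    return min(servers, key=lambda s: s["connections"])["name"]


# Hypothetical snapshot: server-1 has fewer connections, but they are much busier.
servers = [
    {"name": "server-1", "connections": 100, "msgs_per_sec": 40000},
    {"name": "server-2", "connections": 120, "msgs_per_sec": 5000},
]
chosen = pick_least_connections(servers)
busiest = max(servers, key=lambda s: s["msgs_per_sec"])["name"]
```

In this snapshot the new connection lands on the server that is already doing the most work, which is the uneven distribution described above.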
Feedback load balancing
Feedback load balancing is something we started to do with Herald, an internal tool we built at Helpshift. It helps HAProxy decide which server to choose when routing a new connection. All the servers can expose the current load they are under to Herald, which in turn tells HAProxy which server to choose. If all servers are loaded we scale out. If all servers are underloaded we scale in.
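The decision logic described here can be sketched in a few lines. This is not Herald's implementation: the load values are assumed to be fractions between 0 and 1 reported by each server, and the function names and the 0.8 and 0.3 thresholds are hypothetical.

```python
def pick_least_loaded(loads):
    """Choose the server reporting the lowest load for the next connection."""
    if not loads:
        raise ValueError("no servers reporting load")
    return min(loads, key=loads.get)


def scaling_decision(loads, high=0.8, low=0.3):
    """Scale out if every server is loaded, scale in if every server is underloaded."""
    if not loads:
        raise ValueError("no servers reporting load")
    if all(load >= high for load in loads.values()):
        return "scale_out"
    if all(load <= low for load in loads.values()):
        return "scale_in"
    return "hold"


# Hypothetical load reports keyed by server name.
loads = {"server-1": 0.9, "server-2": 0.4}
target = pick_least_loaded(loads)
decision = scaling_decision(loads)
```

Because the servers report their own load, routing follows the work each server is doing and not the number of connections it holds, and the same reports drive the scale-out and scale-in decisions.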
Summary
• Building a WebSockets infrastructure on EC2 is possible but it has quirks
• Use feedback load balancing for WebSockets / long running connection traffic
• ZMQ, JVM are solid building blocks for building a realtime pubsub platform
• Instrumentation in multiple stages of the platform is a good way to keep track of a real time system
