1
Scaling to Millions of Simultaneous Connections
Rick Reed
WhatsApp
Erlang Factory SF
March 30, 2012
2
About ...
Joined WhatsApp in 2011
New to Erlang
Background in performance of C-based systems on FreeBSD and Linux
Prior work at Yahoo!, SGI
3
Overview
The “good problem to have”
Performance Goals
Tools and Techniques
Results
General Findings
Specific Scalability Fixes
4
The Problem
A good problem, but a problem nonetheless
Growth, Earthquakes, and Soccer!
[Chart: message rates for the past four weeks, annotated with the Mexican earthquake and with soccer-match goals, HT, and FT]
5
The Problem
Initial server loading: ~200k connections
Discouraging prognosis for growth
Cluster brittle in the face of failures/overloads
6
Performance Goals
1 Million connections per server … !
Resilience against disruptions under load
Software failures
Hardware failures (servers, network gear)
World events (sports, earthquakes, etc.)
7
Performance Goals
Our standard configuration
Dual Westmere Hex-core (24 logical CPUs)
100GB RAM, SSD
Dual NIC (user-facing, back-end/distribution)
FreeBSD 8.3
OTP R14B03
8
Tools and Techniques
System activity monitoring (wsar)
OS-level
BEAM
9
Tools and Techniques
Processor hardware perf counters (pmcstat)
dtrace, kernel lock-counting, gprof
10
Tools and Techniques
fprof (w/ and w/o cpu_timestamp)
BEAM lock-counting (invaluable!!!)
11
Tools and Techniques
Synthetic workload
Good for subsystems with simple interfaces
Limited value for user-facing systems
12
Tools and Techniques
Tee'd workload
Where side-effects can be contained
Extremely useful for tuning
13
Tools and Techniques
Diverted workload
Add additional production load to server
DNS via extra IP aliases
TTL issues
IPFW forwarding
Ran into a few kernel panics at high conn counts
14
Results
Initial bottlenecks appeared around 425k
First round of fixes got us to 1M conns
Fruit was hanging pretty low
15
Results
Continued attacking similar bottlenecks
Achieved 2M conns about a month later
Put further optimizations on back burner
16
Results
Began optimizing app code after New Year's
Unintentional record attempt in Feb
Peaked at 2.8M conns before we intervened
571k pkts/sec, >200k dist msgs/sec
17
Results
Still trying to obtain elusive 3M conns
St. Patrick's Day wasn't as lucky as hoped
18
General Findings
Erlang has awesome SMP scalability
>85% cpu utilization across 24 logical cpus
FreeBSD shines as well
19
General Findings
[Chart: CPU% vs. number of connections]
20
General Findings
Contention, contention, contention
From 200k to 2M were all contention fixes
Some issues are internal to BEAM
Some addressable with app changes
Most required BEAM patches
Some required app changes
Especially: partitioning workload correctly
Some common Erlang idioms come at a price
21
Specific Scalability Fixes
FreeBSD
Backported TSC-based kernel timecounter
gettimeofday(2) calls much less expensive
Backported igb network driver
Had issues with MSI-X queue stalls
sysctl tuning
Obvious limits (e.g., kern.ipc.maxsockets)
net.inet.tcp.tcphashsize=524288
22
Specific Scalability Fixes
BEAM metrics
Scheduler (%util, csw, waits, sleeps, …)
statistics(message_queues)
Msgs queued, #non-empty queues, longest queue
process_info(message_queue_stats)
Enq/deq/send count & rates (1s, 10s, 100s)
statistics(message_counts)
Aggregation of message_queue_stats
Enable fprof cpu_timestamp for FreeBSD
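The statistics(message_queues) and process_info(message_queue_stats) calls listed above come from BEAM patches and are not part of stock OTP. As a rough sketch only, the same three figures (messages queued, non-empty queues, longest queue) can be approximated on an unpatched VM by scanning every process with erlang:process_info/2. The module name mq_stats and the function snapshot/0 are hypothetical, and a full scan like this costs far more than counters kept inside the VM.

```erlang
-module(mq_stats).
-export([snapshot/0]).

%% Hypothetical helper, not the patched BIF from the talk.
%% Returns {TotalQueued, NonEmptyQueues, LongestQueue} for all local processes.
snapshot() ->
    lists:foldl(fun acc/2, {0, 0, 0}, erlang:processes()).

acc(Pid, {Total, NonEmpty, Longest} = Acc) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, 0} ->
            Acc;
        {message_queue_len, Len} ->
            {Total + Len, NonEmpty + 1, erlang:max(Len, Longest)};
        undefined ->
            %% The process exited between processes/0 and process_info/2.
            Acc
    end.
```

Calling mq_stats:snapshot() periodically from a monitoring process gives a coarse view of queue build-up; the result is a point-in-time sample, not a rate.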
23
Specific Scalability Fixes
BEAM metrics (cont.)
Make lock-counting work for larger async thread counts (e.g., +A 1024)
Add suspend, location, and port_locks options to erts_debug:lock_counters
Enable/disable process/port lock counting at runtime
Fix missing accounting for outbound dist bytes
24
Specific Scalability Fixes
BEAM tuning
+swt low
Avoid scheduler perma-sleep
+Mummc/mmmbc/mmsbc 99999
Prefer mseg over malloc
+Mut 24
Want allocator instance per scheduler
25
Specific Scalability Fixes
BEAM tuning
+Mulmbcs 32767 +Mumbcgs 1
+Musmbcs 2047
Want large 2M-aligned mseg allocations to maximize superpage promotions
Run with real-time scheduling priority
+ssct 1 (via patch; scheduler spin count)
26
Specific Scalability Fixes
BEAM contention
timeofday lock (esp., timeofday delivery)
Reduced slot traversals on timer wheel
Widened bif timer hash table
Ended up moving bif timers to receive timeouts
Improved check_io allocation scalability
Added prim_file:write_file/3 & /4 (port reuse)
Disable mseg max check
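The move from bif timers to receive timeouts can be pictured with a small sketch. This is not the code from the talk: the module idle_timeout, its message shapes, and the notify-the-parent behaviour are assumptions made for illustration. The point is that the timeout lives in the process's own receive clause, so no erlang:send_after/3 or erlang:start_timer/3 call is needed.

```erlang
-module(idle_timeout).
-export([start/2]).

%% Hypothetical example: spawn a process that answers pings and tells
%% Parent when it has been idle for TimeoutMs milliseconds.
start(Parent, TimeoutMs) ->
    spawn(fun() -> loop(Parent, TimeoutMs) end).

loop(Parent, TimeoutMs) ->
    receive
        {ping, From} ->
            From ! pong,
            %% Re-entering the receive restarts the timeout.
            loop(Parent, TimeoutMs);
        stop ->
            ok
    after TimeoutMs ->
        %% No message arrived within TimeoutMs; no bif timer was used.
        Parent ! {idle, self()}
    end.
```

Note the semantic difference: a receive timeout restarts every time the receive is entered, so it behaves as an idle timeout rather than an absolute deadline. Code that needs a fixed deadline has to compute the remaining time itself.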
27
Specific Scalability Fixes
BEAM contention (cont.)
Reduce setopts calls in prim_inet:accept and in inet:tcp_controlling_process
28
Specific Scalability Fixes
OTP throughput
Add gc throttling when message queue is long
Increase default dist receive buffer from 4k to 256k (and make configurable)
Patch mnesia_tm to dispatch async_dirty txns to separate per-table procs for concurrency
Add pg2 denormalized group member lists to improve lookup throughput
Increase max configurable mseg cache size
29
Specific Scalability Fixes
Erlang usage
Prefer os:timestamp to erlang:now
Implement cross-node gen_server calls without using monitors (reduces dist traffic and proc link lock contention)
Partition ets and mnesia tables and localize access to smaller number of processes
Small mnesia clusters
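Two of the usage items above can be sketched in a few lines. The module part_ets and all of its function names are hypothetical, and the talk does not say how WhatsApp partitioned its tables; hashing the key with erlang:phash2/2 is simply one common way to spread keys over several ETS tables. The last function times a call with os:timestamp/0 and timer:now_diff/2 rather than erlang:now/0.

```erlang
-module(part_ets).
-export([new/1, store/3, fetch/2, elapsed_us/1]).

%% Create N separate ETS tables, held in a tuple for constant-time selection.
%% Assumption: public tables; the talk also localizes access to fewer processes.
new(N) when is_integer(N), N > 0 ->
    Opts = [set, public, {write_concurrency, true}],
    list_to_tuple([ets:new(part, Opts) || _ <- lists:seq(1, N)]).

%% phash2/2 returns a value in 0..Range-1, so add 1 for element/2.
table(Parts, Key) ->
    element(erlang:phash2(Key, tuple_size(Parts)) + 1, Parts).

store(Parts, Key, Value) ->
    ets:insert(table(Parts, Key), {Key, Value}).

fetch(Parts, Key) ->
    case ets:lookup(table(Parts, Key), Key) of
        [{_, Value}] -> {ok, Value};
        [] -> not_found
    end.

%% Run Fun and return {ElapsedMicroseconds, Result} using os:timestamp/0.
elapsed_us(Fun) ->
    T0 = os:timestamp(),
    Result = Fun(),
    {timer:now_diff(os:timestamp(), T0), Result}.
```

For example, Parts = part_ets:new(8) followed by part_ets:store(Parts, user42, online) touches only one of the eight tables, so writers working on different partitions do not compete for the same table lock. os:timestamp/0 reads the OS clock directly and does not guarantee unique or strictly increasing values, so it suits measurement rather than ID generation.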
30
Specific Scalability Fixes
Operability fixes
Added [prepend] option to erlang:send
Added process_flag(flush_message_queue)
31
Questions? Comments?
rr@whatsapp.com

Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxThe Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
 
fundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionfundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projection
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
IT-601 Lecture Notes-UNIT-2.pdf Data Analysis
IT-601 Lecture Notes-UNIT-2.pdf Data AnalysisIT-601 Lecture Notes-UNIT-2.pdf Data Analysis
IT-601 Lecture Notes-UNIT-2.pdf Data Analysis
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
Pharmacy management system project report..pdf
Pharmacy management system project report..pdfPharmacy management system project report..pdf
Pharmacy management system project report..pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answer
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdf
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptx
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
 

Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp

  • 1. Scaling to Millions of Simultaneous Connections. Rick Reed, WhatsApp. Erlang Factory SF, March 30, 2012.
  • 2. About ...: Joined WhatsApp in 2011. New to Erlang. Background in performance of C-based systems on FreeBSD and Linux. Prior work at Yahoo!, SGI.
  • 3. Overview: The “good problem to have”; Performance Goals; Tools and Techniques; Results; General Findings; Specific Scalability Fixes.
  • 4. The Problem: A good problem, but a problem nonetheless. Growth, Earthquakes, and Soccer! [Chart: msg rates for past four weeks; annotations: Mexican earthquake, goals, HT, FT]
  • 5. The Problem: Initial server loading: ~200k connections. Discouraging prognosis for growth. Cluster brittle in the face of failures/overloads.
  • 6. Performance Goals: 1 Million connections per server … ! Resilience against disruptions under load: software failures; hardware failures (servers, network gear); world events (sports, earthquakes, etc.).
  • 7. Performance Goals: Our standard configuration: dual Westmere hex-core (24 logical CPUs); 100GB RAM, SSD; dual NIC (user-facing, back-end/distribution); FreeBSD 8.3; OTP R14B03.
  • 8. Tools and Techniques: System activity monitoring (wsar): OS-level; BEAM.
  • 9. Tools and Techniques: Processor hardware perf counters (pmcstat). dtrace, kernel lock-counting, gprof.
  • 10. Tools and Techniques: fprof (w/ and w/o cpu_timestamp). BEAM lock-counting (invaluable!!!).
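For readers who have not used fprof, here is a minimal sketch of a stock fprof session. It is not taken from the talk: the module name `fprof_sketch`, the `work/1` workload, and the output file name are hypothetical, and it uses wall-clock timestamps only. The stock `cpu_time` trace option (likely related to the slide's "cpu_timestamp") is not supported on every platform, so it is left out.

```erlang
-module(fprof_sketch).
-export([run/0, work/1]).

%% Hypothetical workload to profile: sum of squares of 1..N.
work(N) ->
    lists:sum([X * X || X <- lists:seq(1, N)]).

run() ->
    %% Trace a single call; the raw trace goes to the default file "fprof.trace".
    Result = fprof:apply(?MODULE, work, [1000]),
    %% Turn the raw trace into profiling data held by the fprof server.
    ok = fprof:profile(),
    %% Write the call-graph report to a file instead of the shell.
    ok = fprof:analyse([{dest, "fprof_sketch.analysis"}]),
    %% Shut the fprof server down again.
    fprof:stop(),
    Result.
```

fprof tracing slows the profiled code considerably, so a session like this is better suited to an isolated function than to a loaded production node.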
  • 11. Tools and Techniques: Synthetic workload. Good for subsystems with simple interfaces. Limited value for user-facing systems.
  • 12. Tools and Techniques: Tee'd workload. Where side-effects can be contained. Extremely useful for tuning.
  • 13. Tools and Techniques: Diverted workload. Add additional production load to server. DNS via extra IP aliases. TTL issues. IPFW forwarding. Ran into a few kernel panics at high conn counts.
  • 14. Results: Initial bottlenecks appeared around 425k. First round of fixes got us to 1M conns. Fruit was hanging pretty low.
  • 15. Results: Continued attacking similar bottlenecks. Achieved 2M conns about a month later. Put further optimizations on back burner.
  • 16. Results: Began optimizing app code after New Year's. Unintentional record attempt in Feb. Peaked at 2.8M conns before we intervened. 571k pkts/sec, >200k dist msgs/sec.
  • 17. Results: Still trying to obtain elusive 3M conns. St. Patrick's Day wasn't as lucky as hoped.
  • 18. General Findings: Erlang has awesome SMP scalability. >85% cpu utilization across 24 logical cpus. FreeBSD shines as well.
  • 20. General Findings: Contention, contention, contention. From 200k to 2M were all contention fixes. Some issues are internal to BEAM: some addressable with app changes; most required BEAM patches. Some required app changes, especially partitioning workload correctly. Some common Erlang idioms come at a price.
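Slide 20 singles out "partitioning workload correctly" as an application-level fix. The talk does not show how WhatsApp partitioned its work, so the following is only a generic sketch of the idea: instead of one process that every caller contends on, keys are hashed across a fixed set of workers. The module name, message shapes, and the counter workload are hypothetical, and maps need a much newer OTP release than the R14B03 named in the talk.

```erlang
-module(partition_sketch).
-export([start/1, route/3, count/2, stop/1]).

%% Start N counter workers; keep them in a tuple for constant-time lookup.
start(N) when is_integer(N), N > 0 ->
    list_to_tuple([spawn(fun() -> loop(#{}) end) || _ <- lists:seq(1, N)]).

%% Pick the worker that owns Key. erlang:phash2(Key, N) returns 0..N-1.
owner(Workers, Key) ->
    element(erlang:phash2(Key, tuple_size(Workers)) + 1, Workers).

%% Asynchronously add Incr to the counter for Key on its owning worker.
route(Workers, Key, Incr) ->
    owner(Workers, Key) ! {incr, Key, Incr},
    ok.

%% Synchronously read the counter for Key from its owning worker.
count(Workers, Key) ->
    Pid = owner(Workers, Key),
    Ref = make_ref(),
    Pid ! {get, Key, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 5000 ->
        exit(timeout)
    end.

stop(Workers) ->
    lists:foreach(fun(Pid) -> Pid ! stop end, tuple_to_list(Workers)).

%% Each worker owns a private map of Key => Count.
loop(State) ->
    receive
        {incr, Key, Incr} ->
            loop(maps:update_with(Key, fun(V) -> V + Incr end, Incr, State));
        {get, Key, From, Ref} ->
            From ! {Ref, maps:get(Key, State, 0)},
            loop(State);
        stop ->
            ok
    end.
```

Because a given key always hashes to the same worker, updates for that key stay ordered, while unrelated keys no longer queue behind one mailbox.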
  • 21. Specific Scalability Fixes: FreeBSD. Backported TSC-based kernel timecounter; gettimeofday(2) calls much less expensive. Backported igb network driver; had issues with MSI-X queue stalls. sysctl tuning: obvious limits (e.g., kern.ipc.maxsockets); net.inet.tcp.tcphashsize=524288.
  • 22. Specific Scalability Fixes: BEAM metrics. Scheduler (%util, csw, waits, sleeps, …). statistics(message_queues): msgs queued, #non-empty queues, longest queue. process_info(message_queue_stats): enq/deq/send count & rates (1s, 10s, 100s). statistics(message_counts): aggregation of message_queue_stats. Enable fprof cpu_timestamp for FreeBSD.
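The `statistics(message_queues)` and `process_info(message_queue_stats)` items on slide 22 appear among the talk's BEAM changes and, as far as I know, are not in stock OTP. As a rough stand-in, the same three figures (messages queued, non-empty queues, longest queue) can be approximated on an unpatched node with `process_info/2`. The module name `mq_sketch` is hypothetical.

```erlang
-module(mq_sketch).
-export([summary/0]).

%% Returns {TotalQueued, NonEmptyQueues, LongestQueue} over all local processes.
summary() ->
    lists:foldl(
      fun(Pid, {Total, NonEmpty, Longest} = Acc) ->
              case erlang:process_info(Pid, message_queue_len) of
                  {message_queue_len, 0} ->
                      Acc;
                  {message_queue_len, Len} ->
                      {Total + Len, NonEmpty + 1, max(Len, Longest)};
                  undefined ->
                      %% The process exited while we were scanning.
                      Acc
              end
      end,
      {0, 0, 0},
      erlang:processes()).
```

This scan visits every process, so it is a snapshot and gets expensive on a node with millions of processes; it is meant for occasional inspection, not continuous monitoring.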
  • 23. Specific Scalability Fixes: BEAM metrics (cont.). Make lock-counting work for larger async thread counts (e.g., +A 1024). Add suspend, location, and port_locks options to erts_debug:lock_counters. Enable/disable process/port lock counting at runtime. Fix missing accounting for outbound dist bytes.
  • 24. Specific Scalability Fixes: BEAM tuning. +swt low: avoid scheduler perma-sleep. +Mummc/mmmbc/mmsbc 99999: prefer mseg over malloc. +Mut 24: want allocator instance per scheduler.
  • 25. Specific Scalability Fixes: BEAM tuning. +Mulmbcs 32767 +Mumbcgs 1 +Musmbcs 2047: want large 2M-aligned mseg allocations to maximize superpage promotions. Run with real-time scheduling priority. +ssct 1 (via patch; scheduler spin count).
  • 26. Specific Scalability Fixes: BEAM contention. timeofday lock (esp., timeofday delivery). Reduced slot traversals on timer wheel. Widened bif timer hash table. Ended up moving bif timers to receive timeouts. Improved check_io allocation scalability. Added prim_file:write_file/3 & /4 (port reuse). Disable mseg max check.
  • 27. Specific Scalability Fixes: BEAM contention (cont.). Reduce setopts calls in prim_inet:accept and in inet:tcp_controlling_process.
  • 28. Specific Scalability Fixes: OTP throughput. Add gc throttling when message queue is long. Increase default dist receive buffer from 4k to 256k (and make configurable). Patch mnesia_tm to dispatch async_dirty txns to separate per-table procs for concurrency. Add pg2 denormalized group member lists to improve lookup throughput. Increase max configurable mseg cache size.
  • 29. Specific Scalability Fixes: Erlang usage. Prefer os:timestamp to erlang:now. Implement cross-node gen_server calls without using monitors (reduces dist traffic and proc link lock contention). Partition ets and mnesia tables and localize access to smaller number of processes. Small mnesia clusters.
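Two of the usage tips on slide 29 are easy to illustrate. The sketch below is an assumption-laden example, not the talk's code: it splits one logical ets table into N shards chosen by `erlang:phash2/2`, and it times a call with `os:timestamp/0` instead of `erlang:now/0`. The module and function names are hypothetical, and the shards are `public` tables for brevity, whereas the slide suggests localizing access to a small number of processes.

```erlang
-module(ets_shard_sketch).
-export([new/1, store/3, fetch/2, timed/1]).

%% Create N unnamed set tables owned by the calling process.
%% They disappear when that process exits.
new(N) when is_integer(N), N > 0 ->
    list_to_tuple([ets:new(shard, [set, public]) || _ <- lists:seq(1, N)]).

%% Map Key to its shard. erlang:phash2(Key, N) returns 0..N-1.
shard(Shards, Key) ->
    element(erlang:phash2(Key, tuple_size(Shards)) + 1, Shards).

store(Shards, Key, Value) ->
    true = ets:insert(shard(Shards, Key), {Key, Value}),
    ok.

fetch(Shards, Key) ->
    case ets:lookup(shard(Shards, Key), Key) of
        [{_, Value}] -> {ok, Value};
        [] -> error
    end.

%% Time Fun with os:timestamp/0; returns {Microseconds, Result}.
timed(Fun) ->
    T0 = os:timestamp(),
    Result = Fun(),
    {timer:now_diff(os:timestamp(), T0), Result}.
```

`os:timestamp/0` reads the OS clock without the uniqueness guarantee that made `erlang:now/0` a serialization point. On current OTP releases `erlang:monotonic_time/1` is the usual choice for measuring elapsed time.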
  • 30. Specific Scalability Fixes: Operability fixes. Added [prepend] option to erlang:send. Added process_flag(flush_message_queue).