
Keynote: Scaling Sensu Go


For over eight years, the Sensu community has been using Sensu to monitor their applications and infrastructure at scale. Sensu Go became generally available at the beginning of this year, and was designed to be more portable, easier and faster to deploy, and most importantly: more scalable than ever before! In this talk, Sensu CTO Sean Porter will share Sensu Go scaling patterns, best practices, and case studies. He’ll also explain our design and architectural choices and talk about our plan to take things even further.



  1. 1. Scaling Sensu Go By Sean Porter, Co-founder & CTO.
  2. 2. Who am I? ● Creator of Sensu ● Co-founder ● CTO ● PorterTech 2
  3. 3. Overview 1. How we 10X’d performance in 6 months 2. Deployment architectures 3. Hardware recommendations 4. Summary 5. Questions 3
  4. 4. Goals for Sensu Go
  5. 5. 5
  6. 6. 6
  7. 7. Scale, in terms of: ● Performance ● Organization
  8. 8. GA: December 5th, 2018
  9. 9. Scaling Sensu Core (1.X) ● Steep learning curve ● Requires RabbitMQ and Redis expertise ● Capable of scaling*
  10. 10. Scaling Sensu Core (1.X) 10
  11. 11. Scaling Sensu Core (1.X) 11
  12. 12. 12
  13. 13. Step 1 - Instrument 13
  14. 14. Step 2 - Test environment ● Used AWS EC2 ● M5.2xlarge to i3.metal ● Agent session load tool ● Disappointing results (~5k) ● Inconsistent
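
To give a concrete picture of what an agent session load tool does, here is a minimal sketch that opens many long-lived WebSocket sessions against a backend and holds them open. The URL, flags, and handshake are assumptions for illustration; the actual tool lives in the sensu-perf repository referenced later in this deck.

```go
// loadtool.go: a sketch of an agent-session load tool. It dials N WebSocket
// connections to a backend and keeps them open so you can observe how many
// concurrent sessions the backend sustains. The backend URL and the absence
// of a real agent handshake are simplifying assumptions, not Sensu's protocol.
package main

import (
	"flag"
	"log"
	"sync"

	"github.com/gorilla/websocket"
)

func main() {
	backend := flag.String("backend", "ws://127.0.0.1:8081/", "backend WebSocket URL (assumed)")
	sessions := flag.Int("sessions", 5000, "number of concurrent agent-style sessions")
	flag.Parse()

	var wg sync.WaitGroup
	for i := 0; i < *sessions; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			conn, _, err := websocket.DefaultDialer.Dial(*backend, nil)
			if err != nil {
				log.Printf("session %d: dial failed: %v", id, err)
				return
			}
			defer conn.Close()
			// Block on reads to hold the session open; a real load tool
			// would also send keepalives and check results on an interval.
			for {
				if _, _, err := conn.ReadMessage(); err != nil {
					log.Printf("session %d: closed: %v", id, err)
					return
				}
			}
		}(i)
	}
	wg.Wait()
}
```
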
  15. 15. Step 3 - Get serious 15
  16. 16. Spent $10k on gaming hardware.
  17. 17. 17
  18. 18. Why bear bare metal? ● Control ● Consistency ● Capacity
  19. 19. 19
  20. 20. Backend hardware ● AMD Threadripper 2920X (12 Cores, 3.5GHz) ● Gigabyte X399 AORUS PRO ● 16GB DDR4 2666MHz CL16 (2x 8GB) ● Two Intel 660p Series M.2 PCIe 512GB SSDs ● Intel Gigabit CT PCIe Network Card
  21. 21. Agents hardware ● AMD Threadripper 2990WX (32 Cores, 3.0GHz) ● Gigabyte X399 AORUS PRO ● 32GB DDR4 2666MHz CL16 (4x 8GB) ● Intel 660p Series M.2 PCIe 512GB SSD
  22. 22. Network hardware ● Two Ubiquiti UniFi 8 Port 60W Switches ● Separate load tool and data planes
  23. 23. 23
  24. 24. The first results ● Consistently delivered disappointing results! Agents: 4,000; Checks: 8 at 5s interval; Events/s: 6,400 ● Produced data!
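
The expected rate follows directly from these numbers: each agent runs every check once per interval. A quick back-of-the-envelope check (a sketch, not Sensu code):

```go
// Expected event throughput: every agent runs every check once per interval,
// so events/s = agents * checks / interval.
package main

import "fmt"

func main() {
	agents, checks, intervalSeconds := 4000.0, 8.0, 5.0
	fmt.Println(agents * checks / intervalSeconds) // 6400 events/s
}
```
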
  25. 25. The first results ● Identified several possible bottlenecks ● Identified bugs while under load! ● Began experimentation...
  26. 26. The primary offender ● Sensu Events! ● ~95% of etcd write operations ● Disabled Event persistence - 11,200 Events/s ● etcd max database size (10GB*) ● Needed to move the workload
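
To illustrate why Events were the primary offender: every processed check result became a write against the key-value store, so at thousands of Events per second the event keyspace dominates etcd traffic. A minimal sketch of that per-event write using the etcd clientv3 client; the key layout and payload here are assumptions, not Sensu's actual schema:

```go
// Per-event PUTs against etcd. Multiplied by thousands of events per second,
// this write pattern is what was later moved to PostgreSQL.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// storeEvent writes one event; the key layout is illustrative only.
func storeEvent(ctx context.Context, cli *clientv3.Client, entity, check string, payload []byte) error {
	key := "/sensu.io/events/default/" + entity + "/" + check
	_, err := cli.Put(ctx, key, string(payload))
	return err
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	if err := storeEvent(context.Background(), cli, "agent-1", "check-cpu", []byte(`{"status":0}`)); err != nil {
		log.Fatal(err)
	}
}
```
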
  27. 27. 27
  28. 28. 28
  29. 29. PostgreSQL hardware ● AMD Threadripper 2920X (12 Cores, 3.5GHz) ● Gigabyte X399 AORUS PRO ● 16GB DDR4 2666MHz CL16 (2x 8GB) ● Two Intel 660p Series M.2 PCIe 512GB SSDs ● Three Intel Gigabit CT PCIe Network Cards
  30. 30. New results with PostgreSQL - Agents: 4,000; Checks: 14 at 5s interval; Events/s: 11,200. Not good enough!
  31. 31. PostgreSQL tuning ● Multi-Version Concurrency Control ● Many updates - need aggressive auto-vacuuming! vacuum_cost_delay = 10ms; vacuum_cost_limit = 10000; autovacuum_naptime = 10s; autovacuum_vacuum_scale_factor = 0.05; autovacuum_analyze_scale_factor = 0.025
  32. 32. PostgreSQL tuning ● Tune write-ahead logging ● Reduce the number of disk writes wal_sync_method = fdatasync; wal_writer_delay = 5000ms; max_wal_size = 5GB; min_wal_size = 1GB
  33. 33. A huge bug! ● Burying Check TTL switch set on every Event! ● Additional etcd PUT and DELETE operations
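
The fix amounts to only touching the TTL switch when a check actually defines a TTL, rather than on every Event. A hedged sketch of that guard; the types and names are illustrative, not Sensu's real code:

```go
package main

import "fmt"

// Check and Event mirror just the fields needed for this sketch; they are
// illustrative, not Sensu's real types.
type Check struct {
	Name string
	TTL  int64 // seconds; 0 means no TTL configured
}

type Event struct {
	Entity string
	Check  Check
}

// bumpTTLSwitch stands in for the store-backed "switch" that tracks whether
// results keep arriving before the TTL expires.
func bumpTTLSwitch(key string, ttl int64) {
	fmt.Printf("PUT switch %s (ttl=%ds)\n", key, ttl)
}

// handleEvent only touches the TTL switch when the check defines a TTL.
// The bug updated (and later deleted) a switch for every event, adding an
// extra etcd PUT and DELETE per result even when no TTL was set.
func handleEvent(e Event) {
	if e.Check.TTL <= 0 {
		return // nothing to do for checks without a TTL
	}
	bumpTTLSwitch(e.Entity+"/"+e.Check.Name, e.Check.TTL)
}

func main() {
	handleEvent(Event{Entity: "agent-1", Check: Check{Name: "check-cpu", TTL: 0}})  // skipped
	handleEvent(Event{Entity: "agent-1", Check: Check{Name: "check-mem", TTL: 90}}) // switch updated
}
```
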
  34. 34. New results with bug fix - Agents: 4,000; Checks: 40 at 5s interval; Events/s: 32,000. Much better! Still not good enough.
  35. 35. Entity and silenced caches ● Several etcd range (read) requests per Event ● Caching reduced etcd range requests by 50% ● No improvement to Event throughput :(
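
Caching here means answering repeated entity and silenced-entry lookups from process memory, so each Event no longer triggers its own etcd range request. A minimal sketch of a read-through cache with a staleness window; illustrative only, not Sensu's implementation:

```go
// Package cache sketches a read-through cache: lookups are served from memory
// when fresh, and only fall back to the backing store (etcd range requests,
// in the backend's case) on a miss or once an entry goes stale.
package cache

import (
	"sync"
	"time"
)

type entry struct {
	value    any
	cachedAt time.Time
}

// ReadThrough wraps a backing-store lookup with an in-memory cache.
type ReadThrough struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
	fetch   func(key string) (any, error) // backing store lookup
}

func New(ttl time.Duration, fetch func(string) (any, error)) *ReadThrough {
	return &ReadThrough{ttl: ttl, entries: map[string]entry{}, fetch: fetch}
}

// Get returns a cached value while it is still fresh, otherwise fetches and
// caches it. Every cache hit is one fewer range request against the store.
func (c *ReadThrough) Get(key string) (any, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && time.Since(e.cachedAt) < c.ttl {
		return e.value, nil
	}
	v, err := c.fetch(key)
	if err != nil {
		return nil, err
	}
	c.entries[key] = entry{value: v, cachedAt: time.Now()}
	return v, nil
}
```
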
  36. 36. Serialization ● Every object is serialized for transport and storage ● Changed from JSON to Protobuf ○ Applied to Agent transport and etcd store ○ Reduced serialized object size! ○ Less CPU time
  37. 37. Internal queues and workers ● Increased Backend internal queue lengths ○ From 100 to 1000 (made configurable) ● Increased Backend internal worker counts ○ From 100 to 1000 (made configurable) ● Increases concurrency and absorbs latency spikes
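
These internal queues are essentially bounded buffers feeding a pool of worker goroutines: a longer buffer absorbs short latency spikes without blocking producers, and more workers raise concurrency. A minimal sketch of the pattern; the sizes and names are illustrative, not Sensu's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// startWorkers drains a bounded queue with a fixed pool of goroutines.
func startWorkers(queue <-chan string, workers int, handle func(string)) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range queue {
				handle(item)
			}
		}()
	}
	return &wg
}

func main() {
	// Queue length and worker count were raised from 100 to 1000 (and made
	// configurable) in the backend; the values here are just placeholders.
	queue := make(chan string, 1000)
	wg := startWorkers(queue, 1000, func(event string) {
		fmt.Println("processed", event)
	})

	for i := 0; i < 5; i++ {
		queue <- fmt.Sprintf("event-%d", i)
	}
	close(queue)
	wg.Wait()
}
```
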
  38. 38. New results - Agents: 36,000; Checks: 38 at 10s interval (4 subscriptions); Events/s: 34,200. Almost there!!!
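
The arithmetic behind this figure, assuming the 38 checks are spread evenly across the four subscriptions so each agent runs roughly a quarter of them:

```go
// events/s = agents * checks / subscriptions / interval, assuming an even
// split of checks across subscriptions (an assumption, not stated on the slide).
package main

import "fmt"

func main() {
	agents, checks, subscriptions, interval := 36000.0, 38.0, 4.0, 10.0
	fmt.Println(agents * checks / subscriptions / interval) // 34200 events/s
}
```
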
  39. 39. 39
  40. 40. New results - Agents: 40,000; Checks: 38 at 10s interval (4 subscriptions); Events/s: 38,000.
  41. 41. 41
  42. 42. The performance project ● https://github.com/sensu/sensu-perf ● Performance tests are reproducible ● Users can test their own deployments! ● Now part of release QA!
  43. 43. What’s next for scaling Sensu?
  44. 44. Multi-site Federation ● 40,000 Agents per cluster ● Run multiple/distributed Sensu Go clusters ● Centralized RBAC policy management ● Centralized visibility via the WebUI 44
  45. 45. Deployment architectures
  46. 46. 46
  47. 47. 47
  48. 48. 48
  49. 49. 49
  50. 50. 50
  51. 51. 51
  52. 52. 52
  53. 53. Hardware recommendations*
  54. 54. Backend requirements ● 16 vCPU ● 16GB memory ● Attached NVMe SSD ○ >50MB/s and >5k sustained random IOPS ● Gigabit ethernet (low latency)
  55. 55. PostgreSQL requirements ● 16 vCPU ● 16GB memory ● Attached NVMe SSD ○ >300MB/s and >5k sustained random IOPS ● 10 gigabit ethernet (low latency)
  56. 56. Summary
  57. 57. 57
  58. 58. Questions?
