Bedrock migrated its video delivery to the cloud. Very quickly, we needed to add a cache layer to absorb the load. AWS load balancers do not support advanced load-balancing algorithms such as consistent hashing, so we decided to use HAProxy. Thanks to the HAProxy Runtime API, we developed a tool in Python that synchronizes the AWS AutoScalingGroup with HAProxy. In the end, we scale our load balancers as well as our applications, while optimizing the cache. We also improved our resilience, using, among other things, HAProxy's retry and redispatch to work around the network limitations of AWS EC2 instances. During this presentation, I will describe the different versions deployed since the beginning of this cloud migration and their evolution until today. I will detail the significance of each element of our configuration, which allows Bedrock to deliver video to more than 45 million people, explaining the gains in performance or resilience.
HAProxyConf22: Scaling Bedrock Video Delivery to 50 Million Users with HAProxy
1. Scaling Bedrock video delivery to 50 million users with HAProxy
HAProxyConf 2022 Paris, France
Vincent GALLISSOT
Bedrock Streaming
Lead Cloud Architect
@vgallissot
29. HSDO client: Using the runtime API
# Excerpt from the HSDO client; requires "import socket".
def sendHaproxyCommand(self, command):
    # Runtime API commands are newline-terminated
    command = command + "\n"
    haproxySock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        haproxySock.connect(self.socketPath)
        haproxySock.send(command.encode("utf-8"))
        # completed here so the excerpt runs: read the reply, then close
        return haproxySock.recv(8192).decode("utf-8")
    finally:
        haproxySock.close()
https://github.com/BedrockStreaming/hsdo
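The Runtime API can also be driven by hand for debugging; a minimal sketch, assuming the socket is exposed at /var/run/haproxy.sock by a "stats socket" line in the HAProxy configuration:

    echo "show info" | socat stdio /var/run/haproxy.sock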
30. HSDO client: Using the runtime API
https://github.com/BedrockStreaming/hsdo
commands.append(
    "set server %s/%s addr %s port %s"
    % (
        backendName,
        serverName,
        server.IPAddress,
        self.backendServerPort,
    )
)
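Rendered, such a command looks like the following (the backend and server names here are illustrative, not Bedrock's production names):

    set server video_cache/cache1 addr 10.0.1.23 port 80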
31. HSDO client: Using the runtime API
https://github.com/BedrockStreaming/hsdo
commands.append(
    "set server %s/%s weight %s"
    % (
        backendName,
        serverName,
        server.weight,
    )
)
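Weights work the same way, typically followed by an enable once the slot is ready (same illustrative names):

    set server video_cache/cache1 weight 90
    enable server video_cache/cache1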
32. HSDO client: Makes sure the running configuration respects the desired state
https://github.com/BedrockStreaming/hsdo
def checkBackendConf(self):
    stat = self.sendHaproxyCommand("show stat")
    check = self.backendConfReady(stat)
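"show stat" returns a CSV export whose fields include pxname, svname, status and weight; the client parses it to check that each server slot matches the desired state.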
61. Running a Load Balancer on EC2 Spot instances
● Use at least 12 different instance types
● Mix instance families (C, M, R, etc.)
● Prefer many small servers over a few large ones
● Pay attention to the "up to" vs. "baseline" bandwidth figures
Talks about Kubernetes and the cloud, load-balancing algorithms & streaming
Bedrock is a tech company, basically B2B2C
50 million users
Ad-supported, subscription, or hybrid VOD
We face several challenges
Sometimes we help an existing platform
Video cache layers will start empty
Customers want to publish a lot of content all at once
Remove technical barriers
Respect our customers' partnerships
Improve the user experience
Static file
CDN cache
Packaging gives control
First download a manifest
6-second chunk duration
The manifest follows the playlist
Adapt the video quality to the user's bandwidth
Store the entire file
Manifest stored alongside the video
Compute when needed
A CDN to cache the result
Advantages of both caching and doing things dynamically
Worked well for 10 years
Follows the Bedrock business model
Customer A, Customer B
COVID
People consumed a lot of streaming
Stability issues
Use managed services as much as possible
TLS endpoint
Scaling CPU bandwidth
Skeleton
Load tests before migration
Goal: handle 20 times more than on-prem
An S3 bucket scales after some time
We must handle the load while it scales
[7 min]
The player asks for a chunk
Metadata is shared for the same video
Cache metadata locally
Between USP and the S3 bucket
TCP keepalive
Nginx sends one request upstream and queues the rest
Increase performance
Increase cache efficiency
Advanced load balancing
Round Robin
Least Outstanding Requests
No cache awareness
Video requests hit cache1, cache2, cache1, cache2, ...
Each server becomes a duplicate of the cache
Each cache server is less likely to have the answer in its cache and will proxy the request to S3 instead
An ineffective cache means bad performance, higher costs, and possibly S3 rate limiting
We pay more for a worse service
With consistent hashing, the traffic for a given video is always sent to the same cache
The metadata of a video is specific to a single cache server
The more servers there are, the more specific and optimized each cache is
Scaling becomes more efficient
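In HAProxy configuration terms, this maps to a hash-based balancing algorithm with consistent hashing; a minimal sketch with illustrative names, not Bedrock's exact production settings:

    backend video_cache
        # hash the request URI so every request for a given chunk
        # lands on the same cache server
        balance uri
        # consistent hashing: adding or removing a server only
        # remaps a small share of the keys
        hash-type consistent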
NLB = single TLS endpoint
Abstracts HAProxy scaling
Health checks
NLB = single TLS endpoint
Servers in the same order
Populate the backend server list
No file management
The HSDO server holds all backend server info: IDs, weights, etc.
The Runtime API is reached through a Unix socket
Then we send the command
Modify servers
Enable/disable
Weight
The client keeps track of the desired state from DynamoDB
Compares it to the running config
HSDO controls the backend configuration
server-template
As many backend server slots as needed
No create/delete of servers, only modify
Disabled by default: no traffic
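A minimal sketch of that pattern (slot count, names and port are assumptions):

    backend video_cache
        balance uri
        hash-type consistent
        # pre-allocate 100 server slots that HSDO fills via the Runtime API;
        # "disabled" means a slot receives no traffic until explicitly enabled
        server-template cache 100 0.0.0.0:80 check disabled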
Good CHR (cache-hit ratio): 60% fewer calls
Good performance, effective costs
Scale-up = a sudden cache-hit-ratio drop
-> time needed for a new server to populate its cache
We limit the impact of cache-hit-ratio drops
HSDO server to ensure consistency
Increase the weight every X seconds
Keep good performance when we scale up
Scaling by design
HAProxy slowstart
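A sketch of the slowstart piece (the 5-minute ramp is an assumed value, not Bedrock's):

    backend video_cache
        # a server that comes (back) up ramps from minimal to full weight
        # over 5 minutes, giving it time to populate its cache
        default-server slowstart 5m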
[19 min]
Not all content is equally popular
A few loaded servers
Most servers doing nothing
Averages mean no scaling if only one server is loaded
Andrew from Vimeo
Gave us the idea to do the same
Too many requests
Overflow traffic
Harmonize the load
Overflow
We still optimize the cache, even in case of overflow
40% more connections than the average
Load is smoothed across all servers
We can scale up
& use consistent hashing
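HAProxy exposes consistent hashing with bounded loads through hash-balance-factor; a sketch matching the 40%-above-average bound quoted above:

    backend video_cache
        balance uri
        hash-type consistent
        # no server may take more than 140% of the average load;
        # the excess "overflow" traffic spills to the next server on the ring
        hash-balance-factor 140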
[22 min]
Network limits on AWS
Bandwidth is throttled
Limit on packets per second
Queued or dropped
Very short duration
Increases latency
Reduces overall bandwidth
Impacts performance
A new TCP connection needs < 1 ms
If a server is in bad shape, fail quickly
S3 or USP can answer badly
No retry on 500
Redispatch respects consistent hashing
Like bounded loads → By design
At a very low rate
No impact on performance
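A sketch of the fail-fast and redispatch idea (the timeout and retry values are illustrative):

    backend video_cache
        # an in-VPC handshake normally completes in well under 1 ms, so a
        # slow connection usually means a throttled instance: give up fast...
        timeout connect 5ms
        retries 3
        # ...and send the retry to another server
        option redispatch 1
        # retry only connection failures, never application errors like 500s
        retry-on conn-failure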
[25 min]
The most expensive item is inter-AZ traffic
EC2-Other: inter-AZ traffic
HAProxy uses all AZs by design
Splitting traffic forced us to duplicate EC2 servers → to keep resiliency
Easy to do
Duplicate all AutoScalingGroups
Dedicate each of them to a single AZ
The HSDO client sticks to its own AZ
Keep the cache and good performance
Deployed V3 in November
Still migrating on-prem traffic to the cloud
All bars are increasing
EC2-Other drops to almost zero
We kept a fallback backend, multi-AZ, that we can use when we don't have enough ready servers in the primary backend.
It's populated by HSDO, so this secondary backend is exactly the same for all HAProxy servers in every AZ, to optimize as much as possible even in a degraded state.
Any available server in the fallback backend
Pay more to provide the service at any cost
Multi-AZ fallback
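A sketch of the per-AZ split with its multi-AZ fallback (backend names and the readiness threshold are assumptions):

    frontend video_in
        bind :80   # TLS is already terminated on the NLB
        # stay in the local AZ while enough cache servers are ready there
        acl primary_ready nbsrv(cache_local_az) ge 3
        use_backend cache_local_az if primary_ready
        default_backend cache_fallback

    backend cache_local_az
        balance uri
        hash-type consistent
        server-template cache 100 0.0.0.0:80 check disabled

    backend cache_fallback
        # multi-AZ and identical on every HAProxy instance, so hashing
        # still optimizes the cache even in a degraded state
        balance uri
        hash-type consistent
        server-template fb 100 0.0.0.0:80 check disabled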
Compute = 15% of costs
Spot instances
A reclaim notification = launch a new server
The NLB is more expensive than all of the compute
Avoid mass reclaims
The "up to" bandwidth in the official documentation
Burst is not guaranteed, not even for 1 second
Baseline capacity is guaranteed
24 times less
[33 min]
A single TLS endpoint for dozens of HAProxy servers
Cost optimization
-> won't increase performance or resiliency
No rush for V4
Easy to run HAProxy on AWS -> APIs
Provide VOD to 50 million users