The document discusses BlaBlaCar's infrastructure, which is powered entirely by containers. It describes how MariaDB is used as the transactional database in a highly available cluster configuration. Key aspects discussed include service discovery using Nerve and Synapse, database scaling via containers, automated weight-based rollouts of services, and integration with Prometheus for monitoring.
M|18 How DBAs at TradingScreen Make Life Easier With Automation (MariaDB plc)
This document discusses how TradingScreen automates tasks related to managing their MariaDB databases across multiple environments. Some key points:
- TradingScreen has over 100 database servers across different regions to support their financial services clients.
- They developed tools like DBABot, RosBot, and various scripts to automate backups, replication monitoring, user access removal, and schema deployments across their environments in order to reduce errors and make processes more efficient.
- The tools leverage technologies like Percona Toolkit, XtraBackup, Git, and APIs to perform tasks like backups, replication monitoring, schema changes, and more. This allows a smaller team of DBAs to manage a large, globally distributed database infrastructure.
MyRocks deployment at Facebook and Roadmaps
This document discusses Facebook's deployment of MyRocks, a MySQL storage engine that uses RocksDB. It summarizes Facebook's initial goals for MyRocks, the technical challenges of migrating to MyRocks, their production configuration, and monitoring. It also outlines Facebook's plans to help further develop MyRocks in MariaDB and Percona Server with a focus on read performance, mixed engines, better replication, and supporting larger instance sizes.
M|18 How InfoArmor Harvests Data from the Underground Economy (MariaDB plc)
This document provides an overview of InfoArmor's threat intelligence and data ingestion capabilities. It begins with a brief history of InfoArmor and its vision for the future. It then discusses how threat data is collected from the dark web and other sources through techniques like forum scraping, human operatives, and threat actor profiling. The document also discusses lessons learned from processing over 1 billion rows of data in databases like Elasticsearch and MariaDB. It cautions against issues like poor schema design, not closing database connections, importing too much data at once, and allowing malicious scripts into databases. The key takeaways are that data should be ingested and processed incrementally and that remote DBAs can help manage infrastructure challenges.
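The incremental-ingestion advice can be sketched in SQL; the table and file names below are hypothetical stand-ins, not taken from the talk.

```sql
-- Hypothetical sketch: load threat data in bounded batches rather than one
-- giant import, committing after each chunk so a failure is cheap to retry.
SET autocommit = 0;

LOAD DATA LOCAL INFILE '/data/batch_0001.csv'
INTO TABLE threat_records
FIELDS TERMINATED BY ','
IGNORE 1 LINES;
COMMIT;

-- Repeat per batch file; adding secondary indexes *after* the bulk load is
-- usually cheaper than maintaining them row by row during ingestion.
ALTER TABLE threat_records ADD INDEX idx_source (source_id);
```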
M|18 How Copart Switched to MariaDB and Reduced Costs During Growth (MariaDB plc)
Copart is a global vehicle auction company that has been using MariaDB to power its databases and applications. It migrated from its legacy AS400 database to MariaDB due to its scalability, lower costs, and support for advanced features like encryption. Copart has implemented MariaDB in various configurations including Galera clusters, master-slave, and multi-instance setups to power its production and non-production environments. It continues to explore new MariaDB capabilities and configurations to improve performance, reduce costs, and meet its expanding international operations.
How to migrate from Oracle Database with ease (MariaDB plc)
MariaDB introduced Oracle Database compatibility last May with support for Oracle Database data types, sequences, stored procedures (PL/SQL) and more, making it easier than ever to migrate to MariaDB. In this session, MariaDB's Alexander Bienemann and Wagner Bianchi share best practices and lessons learned from our experiences helping customers migrate from Oracle Database. They explain how MariaDB approaches migrations, what’s needed to complete a successful migration and the tools used to determine the level of effort required.
Writing powerful stored procedures in PL/SQL (MariaDB plc)
Oracle Database compatibility in MariaDB Server lets developers choose between ANSI SQL and PL/SQL when writing stored procedures. In this session, Senior Solutions Engineer Alton Dinsmore focuses on how to write powerful stored procedures and functions with PL/SQL, whether you are migrating from Oracle Database or not.
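To give a flavor of what the session covers: MariaDB's Oracle compatibility is enabled per session with sql_mode=ORACLE, after which PL/SQL syntax is accepted. The table, columns, and procedure below are hypothetical examples, not from the talk.

```sql
SET SESSION sql_mode = ORACLE;

DELIMITER //
-- PL/SQL-style procedure: AS/BEGIN/END block, NUMBER type, IN parameters.
CREATE OR REPLACE PROCEDURE raise_salary(p_emp_id IN NUMBER, p_pct IN NUMBER) AS
  v_salary NUMBER;
BEGIN
  SELECT salary INTO v_salary FROM employees WHERE emp_id = p_emp_id;
  UPDATE employees
     SET salary = v_salary * (1 + p_pct / 100)
   WHERE emp_id = p_emp_id;
END;
//
DELIMITER ;
```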
Auto Europe's ongoing journey with MariaDB and open source (MariaDB plc)
Tom Girsch, Lead System Architect at Auto Europe, covers the use case that initially brought Auto Europe to MariaDB, as well as additional planned and ongoing projects. He goes on to discuss Auto Europe’s implementation of MariaDB using clustering, traditional replication and MaxScale. Next, he covers some of the problems and pitfalls encountered along the way, as well as some suggestions to further improve the product.
M|18 PolarDB: Extending Shared-storage to MyRocks (MariaDB plc)
This document discusses extending shared storage to MyRocks in PolarDB by implementing RocksDB write-ahead logging (WAL) replication. It describes converting system tables to RocksDB, replicating the WAL, DDL operations, caches, and index statistics. Challenges with DDL replication during primary/replica crashes are addressed. Multi-version concurrency control (MVCC) is implemented based on RocksDB snapshots to maintain consistency between primary and replicas.
Webinar: MariaDB Enterprise and MariaDB Enterprise Cluster (MariaDB Corporation)
This document provides information about MariaDB Enterprise and MariaDB Enterprise Cluster from Ralf Gebhardt, including:
- An agenda covering MariaDB, MariaDB Enterprise, MariaDB Enterprise Cluster, services, and more info.
- Background on MariaDB, the MariaDB Foundation, MariaDB.com, and SkySQL.
- A timeline of MariaDB releases from 5.1 to the current 10.0 and Galera Cluster 10.
- An overview of key features and optimizations in MariaDB 10 like multi-source replication and improved query optimization.
- Mention of Fusion-IO page compression providing a 30% performance increase with atomic writes.
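Multi-source replication, one of the MariaDB 10 features mentioned above, works through named replication connections; the host names and credentials below are placeholders.

```sql
-- Each upstream server gets its own named replication connection.
CHANGE MASTER 'master_eu' TO
  MASTER_HOST = 'eu-db.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;

CHANGE MASTER 'master_us' TO
  MASTER_HOST = 'us-db.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;

-- Start and inspect all connections at once.
START ALL SLAVES;
SHOW ALL SLAVES STATUS;
```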
MariaDB Platform for hybrid transactional/analytical workloads (MariaDB plc)
OpenWorks 2019 Session
In order to provide data-driven customers with more historical data and real-time analytics, MariaDB Platform can be configured for hybrid transactional/analytical workloads by leveraging row storage for current data transactions and columnar storage for historical data and analytics. In this session Shane Johnson, Senior Director of Product Marketing at MariaDB, shows how change-data-capture and query routing, both available out of the box, can be used to bring scalable analytics to customer-facing applications without changing their code – and without depending on a separate data warehouse.
Configuring workload-based storage and topologies (MariaDB plc)
This document discusses configuring workload-based storage and topologies in MariaDB. It introduces several MariaDB storage engines including InnoDB, MyRocks, Aria, Spider, and ColumnStore. For each engine, it provides an overview of use cases, key configuration parameters, and recommendations on when to use each engine. It also provides an example of using different engines like MyRocks, InnoDB and Spider across multiple microservices databases based on the workload. The document aims to help users choose the right storage engine for their specific workload needs.
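The engine-per-workload idea can be illustrated with a minimal hypothetical schema; the tables and columns are invented, but the ENGINE clauses show how the choice is expressed per table.

```sql
-- General-purpose OLTP: transactional, row-based.
CREATE TABLE orders (
  id    BIGINT PRIMARY KEY,
  total DECIMAL(10,2)
) ENGINE = InnoDB;

-- Write-heavy, space-sensitive workload: MyRocks (engine name ROCKSDB).
CREATE TABLE click_events (
  id      BIGINT PRIMARY KEY,
  payload TEXT
) ENGINE = ROCKSDB;

-- Columnar analytics over historical data.
CREATE TABLE sales_history (
  sold_at DATE,
  amount  DECIMAL(10,2)
) ENGINE = ColumnStore;
```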
Migrating from InnoDB and HBase to MyRocks at Facebook (MariaDB plc)
Migrating large databases at Facebook from InnoDB to MyRocks and HBase to MyRocks resulted in significant space savings of 2-4x and improved write performance by up to 10x. Various techniques were used for the migrations such as creating new MyRocks instances without downtime, loading data efficiently, testing on shadow instances, and promoting MyRocks instances as masters. Ongoing work involves optimizations like direct I/O, dictionary compression, parallel compaction, and dynamic configuration changes to further improve performance and efficiency.
How THINQ runs both transactions and analytics at scale (MariaDB plc)
THINQ provides a cloud-based Communications-Platform-as-a-Service (CPaaS) that routes tens of millions of phone calls per day for customers in enterprise and telecommunications industries. In this session Sasha Vaniachine, Senior Database Administrator at THINQ, explains how he combined MariaDB Server and MariaDB ColumnStore to support both high-performance transaction processing and scalable analytics. In addition, he shares some of THINQ's best practices and lessons learned from supporting an ever-increasing database workload that currently exceeds 10,000 transactions per second.
This document summarizes a company's migration from their old MySQL database to MariaDB ColumnStore to address scalability issues. They evaluated alternative databases and chose MariaDB ColumnStore for its performance and support. They saw a 71% reduction in query times after migrating, and refactored their ETL processes and application to take advantage of the new database. While deployment had some challenges around storage capacity, they were able to automate cluster creation and define a migration process to successfully move to the new database.
M|18 Writing Stored Procedures in the Real World (MariaDB plc)
This document discusses using MariaDB stored procedures and parallel processing to optimize the "Wordament" word game. It presents solutions to run the game using:
1) A single thread on one node.
2) Multiple threads on one node using MariaDB events.
3) Multiple threads across multiple nodes using MariaDB replication.
It concludes that MariaDB supports parallelism through events and replication, but could benefit from a thread API to more easily develop multithreaded stored procedure solutions.
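The events-based approach in option 2 can be sketched as one-shot events, each running on its own thread inside the server; the procedure name and arguments are invented for illustration.

```sql
-- Each one-shot event executes on its own connection/thread, so N events
-- process N slices of the board concurrently.
CREATE EVENT wordament_worker_1
  ON SCHEDULE AT CURRENT_TIMESTAMP
  ON COMPLETION NOT PRESERVE
  DO CALL solve_board_slice(1, 4);   -- hypothetical: slice 1 of 4

CREATE EVENT wordament_worker_2
  ON SCHEDULE AT CURRENT_TIMESTAMP
  ON COMPLETION NOT PRESERVE
  DO CALL solve_board_slice(2, 4);

-- The event scheduler must be running for the workers to fire.
SET GLOBAL event_scheduler = ON;
```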
How QBerg scaled to store data longer, query it faster (MariaDB plc)
The continuous growth in the services QBerg delivers, and in the countries it delivers them to, demands an ever-increasing amount of resources. During the last year QBerg reached a critical point, storing so much transactional data that standard relational databases were unable to meet the SLAs, or support the features, required by customers. As an example, they had to cap web analytics at a maximum of four months of history. The introduction of MariaDB ColumnStore, flanked by existing MariaDB Server databases, not only allows them to store multiple years’ worth of historical data for analytics – it decreased overall processing time by one order of magnitude right off the bat. The move to a unified platform was incremental, using MariaDB MaxScale as both a router and a replicator. QBerg is now able to replicate full InnoDB schemas to MariaDB ColumnStore and incrementally update big tables without impacting the performance of ongoing transactions.
Global Data Replication with Galera for Ansell Guardian® (MariaDB plc)
Ansell Guardian® faced challenges with their previous database replication solution as their data and usage grew globally. They evaluated MariaDB/Galera and implemented it to replace their legacy solution. The implementation was smooth using automation scripts. MariaDB/Galera provided increased performance, faster deployment times, and more reliable data synchronization across their 3 data centers compared to their previous solution. It helped resolve a critical data divergence issue and improved the user experience. They plan to further enhance their database infrastructure using MaxScale in the future.
M|18 Creating a Reference Architecture for High Availability at Nokia (MariaDB plc)
This document proposes a reference architecture for providing high availability across multiple data centers using MariaDB and related open source tools. It summarizes:
- The need for a geo-redundant highly available database architecture at Nokia to support multiple product units.
- An evaluation of alternatives including Galera clusters and master-master replication between data centers.
- A proposed architecture using MaxScale for local master-slave replication within each data center and cross-data center replication between masters for redundancy.
- Testing and development of MaxScale plugins and scripts to support automatic failover and recovery after failures within or between data centers.
- Plans for containerized deployment of the database clusters and MaxScale using Kubernetes with additional
NewSQL overview:
- History of RDBMSs
- The reasons the NoSQL concept appeared
- Why NoSQL was not enough, and the necessity of NewSQL
- Characteristics of NewSQL
- Seven databases that belong to NewSQL
- An overview table with main properties
Extending MariaDB with user-defined functions (MariaDB plc)
The document discusses user-defined functions (UDFs) in MariaDB. It provides background on UDFs, including their history and pros/cons. It then covers how to install, view, and call UDFs. The bulk of the document explains how to define a UDF in C, including the required API calls for initialization, execution, aggregation, and cleanup. It recommends a book for further reading on developing UDFs and other MariaDB plugins. Towards the end, it briefly discusses deploying a live UDF to solve the problem of matching hotel names from different sources.
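The install-view-call workflow the document walks through looks like this in SQL; the shared-library name, function name, and hotel tables are hypothetical stand-ins for the name-matching UDF mentioned at the end.

```sql
-- Register a UDF compiled into a shared library placed in the server's
-- plugin directory.
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'udf_levenshtein.so';

-- View installed UDFs.
SELECT * FROM mysql.func;

-- Call it like any built-in: fuzzy-match hotel names across two sources.
SELECT a.name, b.name
  FROM hotels_src_a a
  JOIN hotels_src_b b
    ON levenshtein(a.name, b.name) <= 2;

-- Unregister when no longer needed.
DROP FUNCTION levenshtein;
```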
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go (ScyllaDB)
How do you handle the continuous transformation and refinement of billions of entities with some sort of reliability and performance? In this talk, Henrik will describe how Scylla enabled him and his team to create a pipelined solution using a series of microservices written in Go communicating with each other using Nats. You’ll hear about the mistakes and learnings they had along the way as they built the services that led to the great performance and stability they are experiencing today.
M|18 Analyzing Data with the MariaDB AX Platform (MariaDB plc)
The document summarizes new features in MariaDB AX, an open-source analytics platform. Key updates include: improved high availability and disaster recovery with GlusterFS support and parallel backup/restore; enhanced analytics capabilities like user-defined aggregate and window functions; and streamlined data ingestion with streaming and bulk data adapters for loading data from sources like Kafka and applications in real-time or batch. The platform provides scalable analytics on MariaDB ColumnStore through features like distributed storage, parallel queries, and automatic partitioning.
Deploying MariaDB databases with containers at Nokia Networks (MariaDB plc)
Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports product reuse by multiple applications within Nokia, including MariaDB. Their focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses this journey of these containerized solutions from development to customer trials, including problems encountered and solutions.
One of the benefits of using MySQL is the fact you can choose a storage engine fitting your requirements. Even though InnoDB is prevalent, it is helpful to know the available choices. Lately, LSM-tree based storage engines and engines using ideas of the technique, are seeing a rise in popularity and use. MyRocks brings LSM-tree benefits and tradeoffs to MySQL. In this talk, we'll walk through RocksDB technology and look into areas where MyRocks is a good fit by comparison to other engines such as InnoDB. We will go over internals, benchmarks, and tuning of MyRocks engine. We also aim to explore the benefits of using MyRocks within the MySQL ecosystem. Attendees will be able to conclude with the latest development of tools and integration within MySQL.
Using a Fast Operational Database to Build Real-time Streaming Aggregations (VoltDB)
Simplicity, accuracy, speed; these are three things everyone wants from their data architecture. Join this webinar presented by Behzad Pirvali, Performance Architect at MaxCDN, and Peter Vescuso, CMO at VoltDB, to learn how MaxCDN used VoltDB, the world’s fastest operational DB with a fast data pipeline, to reduce the number of managed environments by two-thirds while using one-tenth of the CPU cycles required by alternative solutions. All while achieving 100% billing accuracy on 32 TB of daily web server data. The full recording of this webinar is also available here: http://learn.voltdb.com/WRMaxCDN.html
Webseminar: MariaDB Enterprise und MariaDB Enterprise ClusterMariaDB Corporation
This document provides information about MariaDB Enterprise and MariaDB Enterprise Cluster from Ralf Gebhardt, including:
- An agenda covering MariaDB, MariaDB Enterprise, MariaDB Enterprise Cluster, services, and more info.
- Background on MariaDB, the MariaDB Foundation, MariaDB.com, and SkySQL.
- A timeline of MariaDB releases from 5.1 to the current 10.0 and Galera Cluster 10.
- An overview of key features and optimizations in MariaDB 10 like multi-source replication and improved query optimization.
- Mention of Fusion-IO page compression providing a 30% performance increase with atomic writes.
MariaDB Platform for hybrid transactional/analytical workloadsMariaDB plc
OpenWorks 2019 Session
In order to provide data-driven customers with more historical data and real-time analytics, MariaDB Platform can be configured for hybrid transactional/analytical workloads by leveraging row storage for current data transactions and columnar storage for historical data and analytics. In this session Shane Johnson, Senior Director of Product Marketing at MariaDB, shows how change-data-capture and query routing, both available out of the box, can be used to bring scalable analytics to customer-facing applications without changing their code – and without depending on a separate data warehouse.
Configuring workload-based storage and topologiesMariaDB plc
This document discusses configuring workload-based storage and topologies in MariaDB. It introduces several MariaDB storage engines including InnoDB, MyRocks, Aria, Spider, and ColumnStore. For each engine, it provides an overview of use cases, key configuration parameters, and recommendations on when to use each engine. It also provides an example of using different engines like MyRocks, InnoDB and Spider across multiple microservices databases based on the workload. The document aims to help users choose the right storage engine for their specific workload needs.
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
Migrating large databases at Facebook from InnoDB to MyRocks and HBase to MyRocks resulted in significant space savings of 2-4x and improved write performance by up to 10x. Various techniques were used for the migrations such as creating new MyRocks instances without downtime, loading data efficiently, testing on shadow instances, and promoting MyRocks instances as masters. Ongoing work involves optimizations like direct I/O, dictionary compression, parallel compaction, and dynamic configuration changes to further improve performance and efficiency.
How THINQ runs both transactions and analytics at scaleMariaDB plc
THINQ provides a cloud-based Communications-Platform-as-a-Service (CPaaS) that routes tens of millions of phone calls per day for customers in enterprise and telecommunications industries. In this session Sasha Vaniachine, Senior Database Administrator at THINQ, explains how he combined MariaDB Server and MariaDB ColumnStore to support both high-performance transaction processing and scalable analytics. In addition, he shares some of THINQ's best practices and lessons learned from supporting an ever-increasing database workload that currently exceeds 10,000 transactions per second.
This document summarizes a company's migration from their old MySQL database to MariaDB Columnstore to address scalability issues. They identified alternative databases, chose MariaDB Columnstore for its performance and support. They saw a 71% reduction in query times after migrating and refactored their ETL processes and application to take advantage of the new database. While deployment had some challenges around storage capacity, they were able to automate cluster creation and define a migration process to successfully move to the new database.
M|18 Writing Stored Procedures in the Real WorldMariaDB plc
This document discusses using MariaDB stored procedures and parallel processing to optimize the "Wordament" word game. It presents solutions to run the game using:
1) A single thread on one node.
2) Multiple threads on one node using MariaDB events.
3) Multiple threads across multiple nodes using MariaDB replication.
It concludes that MariaDB supports parallelism through events and replication, but could benefit from a thread API to more easily develop multithreaded stored procedure solutions.
How QBerg scaled to store data longer, query it fasterMariaDB plc
The continuous increase in terms of services and countries to which QBerg delivers its services requires an ever-increasing load of resources. During the last year QBerg has reached a critical point, storing so much transactional data that standard relational databases were unable to meet the SLAs, or support the features, required by customers. As an example, they had to cap web analytics to running on a maximum of four months of history. The introduction of MariaDB ColumnStore, flanked by existing MariaDB Server databases, not only will allow them to store multiple years’ worth of historical data for analytics – it decreased overall processing time by one order of magnitude right off the bat. The move to a unified platform was incremental, using MariaDB MaxScale as both a router and a replicator. QBerg is now able to replicate full InnoDB schemas to MariaDB ColumnStore and incrementally update big tables without impacting the performance of ongoing transactions.
Global Data Replication with Galera for Ansell Guardian®MariaDB plc
Ansell Guardian® faced challenges with their previous database replication solution as their data and usage grew globally. They evaluated MariaDB/Galera and implemented it to replace their legacy solution. The implementation was smooth using automation scripts. MariaDB/Galera provided increased performance, faster deployment times, and more reliable data synchronization across their 3 data centers compared to their previous solution. It helped resolve a critical data divergence issue and improved the user experience. They plan to further enhance their database infrastructure using MaxScale in the future.
M|18 Creating a Reference Architecture for High Availability at NokiaMariaDB plc
This document proposes a reference architecture for providing high availability across multiple data centers using MariaDB and related open source tools. It summarizes:
- The need for a geo-redundant highly available database architecture at Nokia to support multiple product units.
- An evaluation of alternatives including Galera clusters and master-master replication between data centers.
- A proposed architecture using MaxScale for local master-slave replication within each data center and cross-data center replication between masters for redundancy.
- Testing and development of MaxScale plugins and scripts to support automatic failover and recovery after failures within or between data centers.
- Plans for containerized deployment of the database clusters and MaxScale using Kubernetes with additional
NewSQL overview:
- History of RDBMs
- The reasons why NoSQL concept appeared
- Why NoSQL was not enough, the necessity of NewSQL
- Characteristics of NewSQL
- 7 DBs that belongs to NewSQL
- Overview Table with main properties
Extending MariaDB with user-defined functionsMariaDB plc
The document discusses user-defined functions (UDFs) in MariaDB. It provides background on UDFs, including their history and pros/cons. It then covers how to install, view, and call UDFs. The bulk of the document explains how to define a UDF in C, including the required API calls for initialization, execution, aggregation, and cleanup. It recommends a book for further reading on developing UDFs and other MariaDB plugins. Towards the end, it briefly discusses deploying a live UDF to solve the problem of matching hotel names from different sources.
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScyllaDB
How do you handle the continuous transformation and refinement of billions of entities with some sort of reliability and performance? In this talk, Henrik will describe how Scylla enabled him and his team to create a pipelined solution using a series of microservices written in Go communicating with each other using Nats. You’ll hear about the mistakes and learnings they had along the way as they built the services that led to the great performance and stability they are experiencing today.
M|18 Analyzing Data with the MariaDB AX PlatformMariaDB plc
The document summarizes new features in MariaDB AX, an open-source analytics platform. Key updates include: improved high availability and disaster recovery with GlusterFS support and parallel backup/restore; enhanced analytics capabilities like user-defined aggregate and window functions; and streamlined data ingestion with streaming and bulk data adapters for loading data from sources like Kafka and applications in real-time or batch. The platform provides scalable analytics on MariaDB ColumnStore through features like distributed storage, parallel queries, and automatic partitioning.
Deploying MariaDB databases with containers at Nokia NetworksMariaDB plc
Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports product reuse by multiple applications within Nokia, including MariaDB. Their focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses this journey of these containerized solutions from development to customer trials, including problems encountered and solutions.
In this talk, we'll walk through RocksDB technology and look into areas where MyRocks is a good fit by comparison to other engines such as InnoDB. We will go over internals, benchmarks, and tuning of MyRocks engine. We also aim to explore the benefits of using MyRocks within the MySQL ecosystem. Attendees will be able to conclude with the latest development of tools and integration within MySQL.
One of the benefits of using MySQL is the fact you can choose a storage engine fitting your requirements. Even though InnoDB is prevalent, it is helpful to know the available choices. Lately, LSM-tree based storage engines and engines using ideas of the technique, are seeing a rise in popularity and use. MyRocks brings LSM-tree benefits and tradeoffs to MySQL. In this talk, we'll walk through RocksDB technology and look into areas where MyRocks is a good fit by comparison to other engines such as InnoDB. We will go over internals, benchmarks, and tuning of MyRocks engine. We also aim to explore the benefits of using MyRocks within the MySQL ecosystem. Attendees will be able to conclude with the latest development of tools and integration within MySQL.
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
Simplicity, accuracy, speed; these are three things everyone wants from their data architecture. Join this webinar presented by Behzad Pirvali, Performance Architect at MaxCDN, and Peter Vescuso, CMO at VoltDB, to learn how MaxCDN used VoltDB, the world’s fastest operational DB with a fast data pipeline, to reduce the number of managed environments by two-thirds with one-tenth of the CPU cycles required by alternative solutions, all while achieving 100% billing accuracy on 32 TB of daily web server data. The full recording of this webinar is also available here: http://learn.voltdb.com/WRMaxCDN.html
For the Docker users out there, Sematext's DevOps Evangelist, Stefan Thies, goes through a number of different Docker monitoring options, points out their pros and cons, and offers solutions for Docker monitoring. Webinar contains actionable content, diagrams and how-to steps.
This document discusses various profiling tools that can be used to analyze MySQL performance, including Oprofile, perf, pt-pmp, and the MySQL Performance Schema. It provides examples of how these tools have been used to identify and resolve specific MySQL performance bugs. While the Performance Schema is useful, it does not always provide sufficient detail and other system-wide profilers like Oprofile and perf are still needed in some cases to pinpoint performance issues.
Presentation by Andrew Nagy at Code4Lib 2007 in Athens, GA.
Villanova University’s Falvey Memorial Library has longed for a beautiful pig; however, we determined in early 2006 that pigs were only good at searching for truffles, so we decided to build our own OPAC.
After developing our own custom Digital Library from a Native XML Database, we quickly appreciated the ease of development with XQuery and XSLT. We then launched full speed ahead into the development of a new OPAC from scratch using XML technologies and MARCXML.
This presentation will describe the process of choosing an NXDB and optimizing it for large data set performance. Developing searches that take about 2 minutes to process and optimizing them down to about 2 seconds. I will also describe the development processes of the OPAC interface including the AJAX features we have implemented. I will share our success stories and our failures.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
Once you start working with Big Data systems, you discover a whole bunch of problems you won’t find in monolithic systems. Monitoring all of the components becomes a big data problem itself. In the talk, we’ll mention all of the aspects that you should take into consideration when monitoring a distributed system using tools like Web Services, Spark, Cassandra, MongoDB, AWS. Not only the tools, what should you monitor about the actual data that flows in the system? We’ll cover the simplest solution with your day to day open source tools, the surprising thing, that it comes not from an Ops Guy.
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden, monitoring all of the components becomes a big data problem in itself.
In the talk we’ll mention all of the aspects that you should take into consideration when monitoring a distributed system once you’re using tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Not only the tools: what should you monitor about the actual data that flows in the system?
And we’ll cover the simplest solution with your day-to-day open source tools; the surprising thing is that it comes not from an Ops guy.
Troubleshooting Tips and Tricks for Database 19c - EMEA Tour Oct 2019Sandesh Rao
This session will focus on 19 troubleshooting tips and tricks for DBA's covering tools from the Oracle Autonomous Health Framework (AHF) like Trace file Analyzer (TFA) to collect , organize and analyze log data , Exachk and orachk to perform mass best practices analysis and automation , Cluster Health Advisor to debug node evictions and calibrate the framework , OSWatcher and its analysis engine , oratop for pinpointing performance issues and many others to make one feel like a rockstar DBA
Introduction to Industrial Control Systems : Pentesting PLCs 101 (BlackHat Eu...arnaudsoullie
This document provides an overview of industrial control systems (ICS) and Programmable Logic Controllers (PLCs). It discusses the components of ICS, including sensors, actuators, HMIs, PLCs, and more. It also covers the MODBUS protocol commonly used in ICS, providing details on its master/slave architecture and function codes. The document concludes by discussing tools used in the workshop, such as Kali Linux, MBTGET, PLCSCAN, and Metasploit modules, to analyze MODBUS communications, perform reconnaissance on PLCs, and attack standard services and protocols.
The document discusses diagnosing and mitigating MySQL performance issues. It describes using various operating system monitoring tools like vmstat, iostat, and top to analyze CPU, memory, disk, and network utilization. It also discusses using MySQL-specific tools like the MySQL command line, mysqladmin, mysqlbinlog, and external tools to diagnose issues like high load, I/O wait, or slow queries by examining metrics like queries, connections, storage engine statistics, and InnoDB logs and data written. The agenda covers identifying system and MySQL-specific bottlenecks by verifying OS metrics and running diagnostics on the database, storage engines, configuration, and queries.
How to measure everything - a million metrics per second with minimal develop...Jos Boumans
Krux is an infrastructure provider for many of the websites you use online today, like NYTimes.com, WSJ.com, Wikia and NBCU. For every request on those properties, Krux will get one or more as well. We grew from zero traffic to several billion requests per day in the span of 2 years, and we did so exclusively in AWS.
To make the right decisions in such a volatile environment, we knew that data is everything; without it, you can't possibly make informed decisions. However, collecting it efficiently, at scale, at minimal cost and without burdening developers is a tremendous challenge.
Join me in this session to learn how we overcame this challenge at Krux; I will share with you the details of how we set up our global infrastructure, entirely managed by Puppet, to capture over a million data points every second on virtually every part of the system, including inside the web server, user apps and Puppet itself, for under $2000/month using off the shelf Open Source software and some code we've released as Open Source ourselves. In addition, I’ll show you how you can take (a subset of) these metrics and send them to advanced analytics and alerting tools like Circonus or Zabbix.
This content will be applicable for anyone collecting or desiring to collect vast amounts of metrics in a cloud or datacenter setting and making sense of them.
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
Why Managed Service Providers Should Embrace Container TechnologySagi Brody
This talk will demonstrate the importance and value for Managed Service Providers (MSPs) and cloud providers of building their business models around the management of containers. It will also explore the various container technologies being used today and why one might be utilized over another. The object is not to give a technical discussion on the subject, but rather to cover the benefits of Linux containers and how their use can be incorporated into strategies for future business planning and development.
Network Automation with Salt and NAPALM: a self-resilient networkCloudflare
This document discusses using Salt and NAPALM for network automation. Salt is used as the automation framework due to its scalability, concurrency, configurability and other features. NAPALM is used to provide vendor-agnostic network drivers and modules. Together, Salt and NAPALM allow for automating tasks like deploying new network sites, monitoring links and devices, maintaining consistent configurations, and improving recovery times from outages or equipment replacements. Examples shown include using Salt to schedule regular configuration checks, deploy probes to monitor transit providers, retrieve probe results, and set up alerts. Contributions to the open source Salt and NAPALM projects are encouraged to advance the goal of self-resilient
SolarWinds Scalability for the EnterpriseSolarWinds
Listen to the SolarWinds product management team as they show you how SolarWinds affordable, powerful, and easy-to-use solutions can scale your monitoring and management capabilities to hundreds of thousands of network devices, servers, or applications.
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Nane Kratzke
Elastic container platforms (like Kubernetes, Docker Swarm, Apache Mesos) fit very well with existing cloud-native application architecture approaches. So it is more than astonishing, that these already existing and open source available elastic platforms are not considered more consequently for multi-cloud approaches. Elastic container platforms provide inherent multi-cloud support that can be easily accessed. We present a solution proposal of a control process which is able to scale (and migrate as a side effect) elastic container platforms across different public and private cloud-service providers. This control loop can be used in an execution phase of self-adaptive auto-scaling MAPE loops (monitoring, analysis, planning, execution). Additionally, we present several lessons learned from our prototype implementation which might be of general interest for researchers and practitioners. For instance, to describe only the intended state of an elastic platform and let a single control process take care to reach this intended state is far less complex than to define plenty of specific and necessary multi-cloud aware workflows to deploy, migrate, terminate, scale up and scale down elastic platforms or applications.
This document provides an overview and agenda for the Splunk App for Stream, including:
- The architecture of the Stream Forwarder for capturing wire data and routing it to Splunk.
- The architecture of the App for Stream for analyzing wire data in Splunk.
- Examples of deployment architectures for ingesting wire data.
- A customer use case where wire data from the network helped provide visibility that log data could not due to access restrictions.
Similar to M|18 Scalability via Expendable Resources: Containers at BlaBlaCar (20)
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB plc
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
MariaDB Paris Workshop 2023 - NewpharmaMariaDB plc
This document summarizes Newpharma's transition from a standalone database server to an enterprise MariaDB Galera cluster configuration between 2018-2023. It discusses the business needs that drove the change, including increased traffic and access to multiple data sources. Key benefits of the Galera cluster are highlighted like synchronous replication, read/write access from any node, and automatic node joining. Challenges of migrating like converting table types and splitting large transactions are also outlined. The transition has supported Newpharma's growth to over 100 million euro in turnover.
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB plc
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB plc
MariaDB is an open-source database that is highly tunable and modular. It allows for various storage engines, plugins, and configurations to optimize performance depending on usage. Key aspects that impact performance include memory allocation, disk access, query optimization, and architecture choices like replication, sharding, or using ColumnStore for analytics. Solutions like MyRocks, Spider, MaxScale can improve performance for transactional or large scale workloads by optimizing resources, adding high availability, and distributing load.
MariaDB Paris Workshop 2023 - MaxScale MariaDB plc
The document outlines requirements and criteria for a database solution involving two buildings 30km apart with a WAN link. The chosen solution was MariaDB with Galera cluster for high availability and synchronous replication across sites, along with Maxscale for read/write splitting and failover. Maxscale instances on each site allow for zero downtime database patching and upgrades per site, while the Galera cluster provides structure-independent synchronous replication between sites.
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB plc
MariaDB Enterprise Server 10.6 includes the following key features:
- New JSON functions and data types like UUID and INET4.
- Improved Oracle compatibility with function parameters.
- Enhanced partitioning capabilities like converting partitions.
- Optimistic ALTER TABLE for replicas to reduce downtime.
- Online schema changes without locking tables for improved performance.
- Security enhancements including password policies and privilege changes.
MariaDB SkySQL is a cloud database service that provides autonomous scaling, observability, and cloud backup capabilities. It offers multi-cloud and hybrid operations across AWS, Google Cloud, and on-premises databases. The service includes features like the Remote Observability Service (ROS) for monitoring across environments, and a Cloud Backup Service. It aims to provide a simple yet advanced service for scaling databases from small to extreme sizes with tools for automation, self-service, and unified operations.
The document discusses high availability solutions for MariaDB databases. It begins by defining high availability and concepts like Recovery Time Objective (RTO) and Recovery Point Objective (RPO). It then presents different MariaDB and MaxScale architectures that provide high availability, including single node, primary-replica, Galera cluster, and SkySQL solutions. Key aspects covered are automatic failover, load balancing, data filtering, and service level agreements.
Die Neuheiten in MariaDB Enterprise ServerMariaDB plc
This document summarizes new features in MariaDB Enterprise Server. Key points include:
- MariaDB Enterprise Server is geared toward enterprise customers and focuses on stability, robustness, and predictability.
- It has a longer release cycle than Community Server, with new versions every 2 years and long maintenance cycles. New features from Community Server are backported.
- Recent additions include analytics functions, JSON support, bi-temporal modeling, schema changes, database compatibility features, and security enhancements.
- The upcoming 23.x release will include new JSON functions, data types like UUID and INET4, Oracle compatibility features, partitioning improvements, and Galera enhancements.
SkySQL is the first and only database-as-a-service (DBaaS) to perform workload analysis with advanced deep learning models, identifying and classifying discrete workload patterns so DBAs can better understand database workloads, identify anomalies and predict changes.
In this session, we’ll explain the concepts behind workload analysis and show how it can be used in the real world (and with sample real-world data) to improve database performance and efficiency by identifying key metrics and changes to cyclical patterns.
SkySQL uses best-of-breed software, and when it comes to metrics and monitoring that means Prometheus and Grafana. SkySQL Monitor is built on both, and provides customers with interactive dashboards for both real-time and historic metrics monitoring. In addition, it meets the same high availability and security requirements as other SkySQL components, ensuring metrics are always available and always secure.
In this session, we’ll explain how SkySQL Monitor works, walk through its dashboards and show how to monitor key metrics for performance and replication.
Introducing the R2DBC async Java connectorMariaDB plc
Not too long ago, a reactive variant of the JDBC driver was released, known as Reactive Relational Database Connectivity (R2DBC for short). While R2DBC started as an experiment to enable integration of SQL databases into systems that use reactive programming models, it now specifies a full-fledged service-provider interface that can be used to retrieve data from a target data source.
In this session, we’ll take a look at the new MariaDB R2DBC connector and examine the advantages of fully reactive, non-blocking development with MariaDB. And, of course, we’ll dive in and get a first-hand look at what it’s like to use the new connector with some live coding!
The capabilities and features of MariaDB Platform continue to expand, resulting in larger and more sophisticated production deployments – and the need for better tools. To provide DBAs with comprehensive, consolidating tooling, we created MariaDB Enterprise Tools: an easy-to-use, modular command-line interface for interacting with any part of MariaDB Platform.
In this session, we will provide a preview of the MariaDB Enterprise Client, walk through current and planned modules and discuss future plans for MariaDB Enterprise Tools – including SkySQL modules and the ability to create custom modules.
Faster, better, stronger: The new InnoDBMariaDB plc
For MariaDB Enterprise Server 10.5, the default transactional storage engine, InnoDB, has been significantly rewritten to improve the performance of writes and backups. Next, we removed a number of parameters to reduce unnecessary complexity, not only in terms of configuration but of the code itself. And finally, we improved crash recovery thanks to better consistency checks and we reduced memory consumption and file I/O thanks to an all new log record format.
In this session, we’ll walk through all of the improvements to InnoDB, and dive deep into the implementation to explain how these improvements help everything from configuration and performance to reliability and recovery.
SkySQL implements a groundbreaking, state-of-the-art architecture based on Kubernetes and ServiceNow, and with a strong emphasis on cloud security – using compartmentalization and indirect access to secure and protect customer databases.
In this session, we’ll walk through the architecture of SkySQL and discuss how MariaDB leverages an advanced Kubernetes operator and powerful ServiceNow configuration/workflow management to deploy and manage databases on cloud infrastructure.
What to expect from MariaDB Platform X5, part 1MariaDB plc
MariaDB Platform X5 will be based on MariaDB Enterprise Server 10.5. This release includes Xpand, a fully distributed storage engine for scaling out, as well as many new features and improvements for DBAs and developers alike, including enhancements to temporal tables, additional JSON functions, a new performance schema, non-blocking schema changes with clustering and a Hashicorp Vault plugin for key management.
In this session, we’ll walk through all of the new features and enhancements available in MariaDB Enterprise Server 10.5. In addition, we will highlight those being backported to maintenance releases of MariaDB Enterprise Server 10.2, 10.3 and 10.4.
4. Today’s agenda
BlaBlaCar - Facts & Figures
Infrastructure Ecosystem - 100% containers powered carpooling
Backend High Availability Pillars - MariaDB as an example
Database as a Service - Building a frictionless infrastructure
What’s next?
6. Facts and Figures
Founded in 2006
60 million members
5 million monthly travellers
30 million mobile app downloads (iPhone and Android)
1 million tonnes less CO2 in the past year
Currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.
9. Infrastructure Ecosystem
Hardware: bare-metal servers, 1 type of hardware, 3 disk profiles.
CoreOS fleet cluster: fleet + etcd, a "distributed init system".
Container tooling: dgr builds images from the Service Codebase and stores them in the Container Registry; ggn creates the services; rkt runs the PODs on each host.
Example hosts: mysql-main1 runs mysqld, monitoring and nerve; front1 runs php, nginx, nerve, monitoring and synapse.
Service Discovery: zookeeper.
10. Service Discovery
On the backend pod, go-nerve does health checks and reports to Zookeeper in service keys (e.g. node1 under /database).
On the client pod, go-synapse watches the Zookeeper service keys and reloads HAProxy if changes are detected.
Applications hit their local HAProxy to access backends.
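For illustration, this is roughly the kind of HAProxy backend section a go-synapse-style watcher ends up rendering from the Zookeeper entries. The backend name, server names, addresses and check settings here are made up, not taken from BlaBlaCar's configuration:

```haproxy
# Hypothetical HAProxy backend as synapse-style tooling might render it;
# names, IPs and check options are illustrative only.
backend mysql-main
    mode tcp
    server mysql-main1 192.168.1.1:3306 weight 255 check
    server mysql-main2 192.168.1.2:3306 weight 255 check
    server mysql-main3 192.168.1.3:3306 weight 255 check
```

When a node disappears from Zookeeper, synapse rewrites this section and reloads HAProxy, so the application only ever talks to 127.0.0.1.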
13. Asynchronous vs. Synchronous
Asynchronous replication: one Master and several Slaves.
Synchronous replication: MariaDB Cluster, with wsrep running on every node.
MariaDB Cluster means:
No Single Point of Failure
No Replication Lag
Auto State Transfers
As fast as the slowest node
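The "Synced" state that the cluster converges to can be checked directly on any node with the standard Galera status variables (these variable names are standard wsrep ones, not BlaBlaCar-specific; credentials omitted):

```sql
-- Standard Galera/wsrep status checks:
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; -- 'Synced' when the node is in sync
SHOW GLOBAL STATUS LIKE 'wsrep_local_state';         -- 4 means Synced
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';        -- nodes currently in the cluster
```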
14. MySQL at BlaBlaCar?
MariaDB Cluster, with wsrep on every node.
Our prerequisites are:
Containers
Writes go to one node
Reads are balanced on the others
19. If enableCheckStableCommand is set, the command is run at each weight increase; if it returns != 0, the current weight restarts from 1.
When the target weight value is reached, the service is fully in production.
Flow (go-nerve, Zookeeper, go-synapse, HAProxy):
call the API on /enable or /weight/:weight
store the current weight in Zookeeper
update the weight on HAProxy via its admin socket: set weight <backend>/<server> <weight>
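The ramp-up described above (increase the weight step by step, restart from 1 whenever the stability check fails) can be sketched in shell. This is an illustration only: the slide does not show go-nerve's actual increment schedule, so the doubling step and the helper names check_stable and set_weight are assumptions.

```shell
# Sketch of a nerve-style weight ramp-up. Illustration only: the doubling
# schedule and helper names are assumptions, not go-nerve's actual code.
MAX_WEIGHT=255

check_stable() {
    # stands in for enableCheckStableCommand; a real check would query the
    # service (e.g. /report_slow_queries.sh) and return non-zero on trouble
    return 0
}

set_weight() {
    # in production this would go to HAProxy via its admin socket
    echo "weight=$1"
}

ramp_up() {
    w=1
    while [ "$w" -lt "$MAX_WEIGHT" ]; do
        w=$((w * 2))
        [ "$w" -gt "$MAX_WEIGHT" ] && w=$MAX_WEIGHT
        if ! check_stable; then
            w=1   # per the slide: a failing check restarts the ramp from 1
        fi
        set_weight "$w"
    done
}

ramp_up
```

With an always-passing check the weight climbs to 255 and the service is fully in production; with a flapping check the weight keeps falling back to 1, which is exactly the "come gently into prod" behaviour.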
20. # cat /report_slow_queries.sh
#!/dgr/bin/busybox sh
. /dgr/bin/functions.sh
isLevelEnabled "debug" && set -x
slwq=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' AND LOWER(command) <> 'sleep' AND time > 1" -BN)
if [ $? -eq 0 ] && [ $slwq -eq 0 ]; then
return 0
else
return 1
fi
MySQL’s warm up in nerve
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
nerve:
services:
- name: "mysql-main"
port: 3306
reporters:
- {type: zookeeper, path: /services/mysql/main}
checks:
- type: sql
driver: mysql
datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
enableCheckStableCommand: ["/report_slow_queries.sh"]
23. Gracefully Disabling Pipeline
Call /disable on Nerve’s API: the weight is set to 0, so no more new sessions will go into the service.
If disableGracefullyDoneCommand is set, this command is run in a loop until it returns 0.
When the API call to /disable returns, the service can be shut down without risk.
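The disable side can be sketched the same way. Again this is an illustration under assumptions, not go-nerve's code: the session counter is simulated, and report_remaining_processes stands in for the disableGracefullyDoneCommand check.

```shell
# Minimal sketch of the graceful-disable pipeline; names and the simulated
# session counter are illustrative assumptions.
remaining=3   # pretend three client sessions are still running

report_remaining_processes() {
    [ "$remaining" -eq 0 ]   # succeeds only once every session is gone
}

drain() {
    echo "weight=0"   # step 1: weight 0, no new sessions are routed here
    until report_remaining_processes; do
        remaining=$((remaining - 1))   # step 2: sessions finish over time
    done
    echo "safe to shut down"   # step 3: /disable returns, shutdown is safe
}

drain
```

The key property is that /disable only returns once the done-command succeeds, so orchestration can chain "disable, stop, restart, enable" without dropping a single session.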
24. # cat /report_remaining_processes.sh
#!/dgr/bin/busybox sh
. /dgr/bin/functions.sh
isLevelEnabled "debug" && set -x
procs=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' OR user LIKE '%wr'" -BN)
if [ $? -eq 0 ] && [ $procs -eq 0 ]; then
return 0
else
return 1
fi
MySQL’s graceful shutdown in nerve
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
nerve:
services:
- name: "mysql-main"
port: 3306
reporters:
- {type: zookeeper, path: /services/mysql/main}
checks:
- type: sql
driver: mysql
datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
enableCheckStableCommand: ["/report_slow_queries.sh"]
disableGracefullyDoneCommand: ["/root/report_remaining_processes.sh"]
25. Backend High Availability Pillars
Be Quiet! Come gently into prod: graceful restart, service discovery (nerve/synapse), weight system, slow query tracking.
Abolish Slavery! Every node is the same: no master/slave, automatic state transfers, service discovery (nerve/synapse).
Die in Peace... Get out when you are ready: graceful restart, service discovery (nerve/synapse), weight system.
26. Database as a Service
Building a frictionless infrastructure
27. Easy deployment
Pull Request on a services
repository
No technical parameters to
override
The services are auto-initialized
32. Prometheus with Nerve integration
$ cat pod-mysql/pod-manifest.yml
name: aci.blbl.cr/pod-mysql:10.1-33
pod:
apps:
- dependencies:
- aci.blbl.cr/aci-mariadb:10.1-29
app:
mountPoints:
- {name: mysql-data, path: /var/lib/mysql}
- {name: mysql-log, path: /var/log/mysql}
- name: aci-nerve
dependencies:
- aci.blbl.cr/aci-go-nerve:21-23
- aci.blbl.cr/aci-mariadb:10.1-29
- dependencies:
- aci.blbl.cr/aci-prometheus-mysql-exporter:0.10.0-1
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
nerve:
services:
- name: "{{.hostname}}"
port: 3306
reporters:
- {type: zookeeper, path: /services/mysql/main}
checks:
- type: sql
driver: mysql
datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
- name: "{{.hostname}}_prometheus"
port: 9104
reporters:
- {type: zookeeper, path: /monitoring/mysql/main}
# curl mysql-main1.prod.dc1.com:9104/metrics | head
# HELP mysql_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from MySQL.
# TYPE mysql_exporter_last_scrape_duration_seconds gauge
mysql_exporter_last_scrape_duration_seconds 0.056807316
# HELP mysql_exporter_last_scrape_error Whether the last scrape of metrics from MySQL resulted in an error (1 for error, 0 for success).
# TYPE mysql_exporter_last_scrape_error gauge
mysql_exporter_last_scrape_error 0
[...]
# cat env/prod-dc1/services/prometheus/attributes/prometheus.yml
[...]
ranged_targets:
- type: zk
job_name: discovery_prod-dc1
scrape_interval: 20s
metrics_path: /metrics
zk:
hosts: '{{ toJson .zk.hosts }}'
zkpaths:
- /monitoring
[...]
33. Prometheus relabeling
# [zk: localhost:2181(CONNECTED) 1] get /monitoring/mysql/main/mysql-main1_prometheus_192.168.1.2_ba0f1f8b
{"available":true,"host":"192.168.1.2","port":9104,"name":"mysql-main1","weight":255,"labels":{"host":"r11-srv1"}}
We push services info with Nerve into Zookeeper
And Prometheus does the magic
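With stock Prometheus, that "magic" is relabeling. As a rough illustration only (BlaBlaCar used a Zookeeper-aware discovery, and the __meta_service_* label names below are assumptions, not real built-in Prometheus meta labels), a fragment like this would copy discovered metadata onto the scraped targets:

```yaml
# Illustrative relabel_configs; the source label names are assumptions based
# on the service entry shown above, not BlaBlaCar's actual configuration.
scrape_configs:
  - job_name: discovery_prod-dc1
    scrape_interval: 20s
    metrics_path: /metrics
    relabel_configs:
      # keep the service name reported by nerve as the "name" label
      - source_labels: [__meta_service_name]
        target_label: name
      # expose the physical host (e.g. r11-srv1) for per-rack dashboards
      - source_labels: [__meta_service_label_host]
        target_label: host
```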
35. $ cat prometheus-rules/alert.mysql.rules
# Alert: Galera node state is not synced.
ALERT MySQLGaleraStateIsNotSynced
  IF (mysql_global_status_wsrep_local_state != 4 AND mysql_global_variables_wsrep_desync == 0)
  FOR 2m
  LABELS {
    severity = "warning", team = "data_infrastructure"
  }
  ANNOTATIONS {
    summary = "Galera node {{ $labels.name }} state is not “Synced” (state={{ $value }}).",
    dashboard = "https://promgrafana.blabla.com/dashboard/db/mysql-cluster-view?var-cluster={{ $labels.service }}&var-ds=prom-dc1&from=now-1h&to=now",
    runbook = "https://ops-run-book.blabla.com/mysql/operational-tasks#MySQLGaleraOutOfSync",
  }
Alerting
- PromQL to find out unhealthy services
- Labeling for routing to Slack & PagerDuty
- Annotations with templating to have clear descriptions, plus URLs to dashboards and ops runbooks
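The IF expression reads: the node is not in wsrep state 4 ("Synced") and has not been deliberately desynced (e.g. for a backup). A small sketch of that predicate, assuming Galera's documented state numbering (1=Joining, 2=Donor/Desynced, 3=Joined, 4=Synced):

```python
def galera_out_of_sync(wsrep_local_state: int, wsrep_desync: int) -> bool:
    """Mirror of the alert condition: not Synced (state 4) and not intentionally desynced."""
    return wsrep_local_state != 4 and wsrep_desync == 0

print(galera_out_of_sync(2, 0))  # Donor/Desynced without intent -> True (alert)
print(galera_out_of_sync(4, 0))  # Synced -> False
print(galera_out_of_sync(2, 1))  # desynced on purpose (wsrep_desync=ON) -> False
```

The `wsrep_desync` guard is what keeps maintenance operations such as streaming backups from paging anyone.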
37. A set of bash scripts
- Easy troubleshooting with the “bbc” command
- Do the basic health checks quickly
- Manage all backends the same way
- Can be used by non-specialists
- Plugged into the service discovery
- Designed for our needs
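The slides don't show bbc's internals; purely as an illustration, the PORT part of the "PING, PORT, Synced" checks printed below could be as small as this (hypothetical helper, not BlaBlaCar's code):

```python
import socket

def check_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """PORT-style health check: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. probe a backend before printing its status line:
print(check_port("127.0.0.1", 3306))
```

The PING and Synced checks would follow the same pattern: ICMP (or a trivial TCP probe) for liveness, and a `wsrep_local_state` query for cluster membership.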
38. # bbc mysql list
pp-dc2 mysql-main
pp-dc2 mysql-user
pp-dc2 mysql-trip
pp-dc2 mysql-payment
prod-dc1 mysql-main
prod-dc1 mysql-user
prod-dc1 mysql-trip
prod-dc1 mysql-payment
[...]
bbc command examples
# bbc mysql overview prod-dc1 mysql-main
=== Service Overview 'prod-dc1 mysql-main' ===
mysql-main1 (192.168.1.1) PING, PORT, Synced
---
mysql-main1 (3306) - enabled - weight = 255/255
mysql-main1_prometheus (9104) - enabled - weight = 255/255
mysql-main2 (192.168.1.2) PING, PORT, Synced
---
mysql-main2 (3306) - enabled - weight = 255/255
mysql-main2_prometheus (9104) - enabled - weight = 255/255
mysql-main3 (192.168.1.3) PING, PORT, Synced
---
mysql-main3 (3306) - enabled - weight = 255/255
mysql-main3_prometheus (9104) - enabled - weight = 255/255

# bbc mysql connect prod-dc1 mysql-main
env: prod-dc1
service: mysql-main
host: mysql-main1
ip: 192.168.1.1
Enter the username [ENTER]: team_data
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2887129
Server version: 10.1.28-MariaDB-1~jessie mariadb.org binary distribution
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
# bbc mysql monitor prod-dc1 mysql-main mysql-main1
Weight: 255/255 Processes: 88 Slow: 0
Weight: 255/255 Processes: 75 Slow: 0
Weight: 255/255 Processes: 89 Slow: 0
Weight: 255/255 Processes: 99 Slow: 0
Weight: 255/255 Processes: 79 Slow: 0
Weight: 255/255 Processes: 65 Slow: 0
Weight: 255/255 Processes: 86 Slow: 0
Weight: 255/255 Processes: 93 Slow: 0
Weight: 255/255 Processes: 88 Slow: 0
Weight: 255/255 Processes: 96 Slow: 0
Weight: 255/255 Processes: 77 Slow: 0
Weight: 255/255 Processes: 73 Slow: 0
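The slides don't say what each `monitor` column samples; a plausible reading (our assumption) is `Threads_connected` for Processes and a per-interval delta of the `Slow_queries` counter for Slow:

```python
def monitor_line(weight: int, max_weight: int, processes: int,
                 slow_prev: int, slow_now: int) -> str:
    """Format one sample like the `bbc mysql monitor` output above."""
    return f"Weight: {weight}/{max_weight} Processes: {processes} Slow: {slow_now - slow_prev}"

# e.g. two successive reads of SHOW GLOBAL STATUS with no new slow queries:
print(monitor_line(255, 255, 88, 1042, 1042))  # -> Weight: 255/255 Processes: 88 Slow: 0
```

Taking a delta matters because `Slow_queries` is a monotonically increasing counter, not a gauge.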
39. # bbc postgresql overview prod-dc1 postgresql-corridoring
Service Overview 'prod-dc1 postgresql-corridoring'
-- USING BDR --
postgresql-corridoring1 (192.168.1.10) PING, PORT
postgresql-corridoring2 (192.168.1.11) PING, PORT
postgresql-corridoring3 (192.168.1.12) PING, PORT
postgresql-corridoring4 (192.168.1.13) PING, PORT
postgresql-corridoring5 (192.168.1.14) PING, PORT
# bbc postgresql list
pp-dc2 postgresql-airflow
pp-dc2 postgresql-corridoring
pp-dc2 postgresql-redash
pp-dc2 postgresql-trip-pricing
prod-dc1 postgresql-corridoring
prod-dc1 postgresql-redash
bbc command examples
# bbc postgresql connect prod-dc1 postgresql-corridoring
env: prod-dc1
service: postgresql-corridoring - database : corridoring
host: postgresql-corridoring1
ip: 192.168.1.10
Enter the username [ENTER]: team_data
Password for user team_data:
psql (9.6.6, server 9.4.12)
Type "help" for help.
corridoring=#
# bbc redis overview prod-dc1 redis-main
=== Service 'prod-dc1' 'redis-main' ===
Redis elector master: redis-main1.prod.dc-1.blabla.com
redis-main1 (192.168.1.20): PING, PORT, role:master, clients:255
redis-main2 (192.168.1.21): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
redis-main3 (192.168.1.22): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
# bbc redis list
pp-dc2 redis-main
pp-dc2 redis-quota
pp-dc2 redis-translation
pp-dc2 redis-user
prod-dc1 redis-main
prod-dc1 redis-quota
# bbc redis connect prod-dc1 redis-main
env: prod-dc1
service: redis-main
host: redis-main1
ip: 192.168.1.20
role: slave
192.168.1.20:6379>
40. # bbc cassandra ping prod-dc1 cassandra-user
cassandra-user1 (192.168.1.30) PING, CQL, JMX
---
cassandra-user2 (192.168.1.31) PING, CQL, JMX
---
cassandra-user3 (192.168.1.32) PING, CQL, JMX
---
bbc command examples
# bbc cassandra overview prod-dc1 cassandra-user
=== Service 'prod-dc1 cassandra-user' ===
Datacenter: prod-dc1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.30 6.01 GB 256 33.3% bef39dd5-d4e5-4733-93e5-75904b6d556a r10
UN 192.168.1.31 5.89 GB 256 33.3% 23b77937-2177-4638-b860-e73e4bb913d2 r10
UN 192.168.1.32 5.12 GB 256 33.3% de0f4ed1-1241-499d-9485-e73e4bb913d2 r10
Datacenter: prod-dc2
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.2.10 15.69 GB 256 100.0% 3ca1e862-f3e2-4fbf-a6c1-4d7d5a3e70ec r14
UN 192.168.2.11 14.99 GB 256 100.0% de0f4ed1-1241-499d-9485-2f8196aa7425 r13
UN 192.168.2.12 16.1 GB 256 100.0% 7e5fee00-052f-4546-973d-befaebbe604b r15
Today, 32 subcommands are available in bbc...
42. What does the future look like?
Moving to Kubernetes
From a simple “distributed init system” to the standard for container orchestration.
Fleet is deprecated
Fleet is no longer developed and maintained by CoreOS.
43. Ownership
Move backend ownership to the developer teams.
Moving to the cloud?
Extend this idea of “expendable” services to hardware resources.
Docker?
Kubernetes + rkt (rktnetes, rktlet) has seen poor adoption.