
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar


Published in: Data & Analytics

  1. 1. The Expendables
  2. 2. Maxime Fouilleul, Lead Database Engineer
  3. 3. Scalability via Expendable Resources: Containers at BlaBlaCar (M|18, Feb 27, 2018)
  4. 4. Today’s agenda: BlaBlaCar - Facts & Figures; Infrastructure Ecosystem - 100% containers powered carpooling; Backend High Availability Pillars - MariaDB as an example; Database as a Service - Building a frictionless infrastructure; What’s next?
  5. 5. BlaBlaCar Facts & Figures
  6. 6. Facts and Figures: 60 million members (founded in 2006); 1 million tonnes less CO2 (in the past year); 30 million mobile app downloads (iPhone and Android); 5 million monthly travellers; currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.
  7. 7. Our prod data ecosystem: MariaDB (transactional): 20 clusters, 55 nodes, 40K reads/s; Cassandra (distributed): 6 clusters, 32 nodes, 3K reads/s; Redis (volatile): 17 clusters, 51 nodes, 40K reads/s; ElasticSearch (search): 11 clusters, 65 nodes, 1K searches/s; PostgreSQL (spatial): 4 clusters, 14 nodes, 3K reads/s.
  8. 8. Infrastructure Ecosystem 100% containers powered carpooling
  9. 9. Infrastructure Ecosystem. Hardware: bare-metal servers, 1 type of hardware, 3 disk profiles. Fleet cluster: CoreOS with fleet and etcd (“Distributed init system”). Tooling: service codebase, dgr, ggn, rkt and a container registry (stages: build, run, store, host, create). Example PODs: mysql-main1 (mysqld, monitoring, nerve) and front1 (php, nginx, nerve, monitoring, synapse). Service Discovery: synapse, nerve, zookeeper.
  10. 10. Service Discovery. Backend pod: go-nerve does health checks and reports to Zookeeper in service keys (for example /database/node1 under /database). Client pod: go-synapse watches Zookeeper service keys and reloads HAProxy if changes are detected. Applications hit their local HAProxy to access backends.
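The watch-and-reload idea on this slide can be sketched in a few lines of Python. This is only an illustration, not go-synapse's actual code: the function names `haproxy_server_lines` and `needs_reload` are hypothetical, and the JSON payload shape is borrowed from the Zookeeper entries shown later in the deck.

```python
import json


def haproxy_server_lines(znodes):
    """Turn nerve-style JSON payloads into HAProxy 'server' lines.

    Hypothetical helper: go-synapse's real rendering logic is not shown
    in the slides.
    """
    lines = []
    for raw in znodes:
        node = json.loads(raw)
        if not node.get("available", False):
            continue  # nodes reported as unavailable are left out
        lines.append(
            "server {name} {host}:{port} weight {weight}".format(
                name=node["name"],
                host=node["host"],
                port=node["port"],
                weight=node.get("weight", 255),
            )
        )
    return lines


def needs_reload(previous_lines, current_lines):
    """Reload HAProxy only when the rendered backend list changed."""
    return previous_lines != current_lines
```

Comparing rendered output rather than raw Zookeeper events keeps reloads limited to changes that actually affect the backend list.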
  11. 11. Backend High Availability Pillars MariaDB as an example
  12. 12. Abolish Slavery Everyone's the same
  13. 13. Asynchronous vs. Synchronous. Asynchronous replication: one master with slaves. Synchronous replication: MariaDB Cluster, with wsrep on every node. MariaDB Cluster means: no single point of failure, no replication lag, automatic state transfers, and a cluster that is as fast as its slowest node.
  14. 14. MySQL at BlaBlaCar? MariaDB Cluster, with wsrep on every node. Our prerequisites are: containers; writes go on one node; reads are balanced on the others.
  15. 15. Nerve - Track and report service status
      # zookeepercli -c lsr /services/mysql/main
      mysql-main1_192.168.1.2_ba0f1f8b
      mysql-main2_192.168.1.3_734d63da
      mysql-main3_192.168.1.4_dde45787
      # zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b
      {
        "available":true,
        "host":"192.168.1.2",
        "port":3306,
        "name":"mysql-main1",
        "weight":255,
        "labels":{ "host":"r10-srv4" }
      }
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "mysql-main"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
  16. 16. Synapse - Service discovery router
      # cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
      ---
      override:
        tripsearch:
          database:
            read:
              host: localhaproxy
              database: tripsearch
              user: tripsearch_rd
              port: 3307
            write:
              host: localhaproxy
              database: tripsearch
              user: tripsearch_wr
              port: 3308
      # cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
      ---
      override:
        synapse:
          services:
            - name: mysql-main_read
              path: /services/mysql/main
              port: 3307
              serverCorrelation:
                type: excludeServer
                otherServiceName: mysql-main_write
                scope: first
            - name: mysql-main_write
              path: /services/mysql/main
              port: 3308
              serverOptions: backup
              serverSort: date
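One way to read the synapse configuration above is sketched below. It rests on assumptions the slides do not spell out: that `serverSort: date` orders nodes by registration time, that the first node takes writes while the rest act as backups, and that `excludeServer` with `scope: first` removes that first write node from the read pool. The function `split_read_write` is hypothetical.

```python
def split_read_write(servers):
    """Split cluster nodes into one writer and a pool of readers.

    servers: list of (name, registered_at) tuples. Assumed semantics:
    the oldest registration becomes the single write node, the others
    are write backups and also serve the balanced reads.
    """
    ordered = sorted(servers, key=lambda item: item[1])
    if not ordered:
        return None, []
    writer = ordered[0][0]
    readers = [name for name, _ in ordered[1:]]
    return writer, readers
```

Because every client sorts the same Zookeeper entries the same way, all applications would agree on the write node without any extra coordination.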
  17. 17. Be Quiet! Come gently into prod
  18. 18. Service Discovery weight system. Nerve’s checks are OK: the service is reported with a current weight of 1/255. Warmup is triggered: the current weight is increased following a weighted Fibonacci sequence.
  19. 19. If enableCheckStableCommand is set: the command is run at each increase and, if it returns a non-zero status, the current weight restarts from 1. Weight value is reached: the service is fully in production. Flow: go-nerve (call API on /enable or /weight/:weight), Zookeeper (stores the current weight), go-synapse (updates the weight on HAProxy via socket: set weight <backend>/<server> <weight>), HAProxy.
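The warm-up loop can be simulated with a short sketch. It uses a plain Fibonacci progression capped at the target weight, which is an assumption: the slides describe a "weighted" variant whose exact formula is not given. The function `simulate_warmup` is hypothetical and is not go-nerve's code.

```python
def simulate_warmup(check_results, target=255):
    """Return the weight reported after each warm-up step.

    check_results: iterable of booleans, one per step, True when the
    stability command returned 0. A failed check restarts the ramp
    from 1, as described on the slide.
    """
    weights = [1]
    prev, cur = 1, 1
    for stable in check_results:
        if cur >= target:
            break  # target reached: the service is fully in production
        if stable:
            prev, cur = cur, min(prev + cur, target)
        else:
            prev, cur = 1, 1
        weights.append(cur)
    return weights
```

The ramp is slow at first, which gives caches time to fill, and a single slow-query signal sends the node back to a weight of 1.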
  20. 20. MySQL’s warm up in nerve
      # cat /report_slow_queries.sh
      #!/dgr/bin/busybox sh
      . /dgr/bin/functions.sh
      isLevelEnabled "debug" && set -x
      slwq=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' AND LOWER(command) <> 'sleep' AND time > 1" -BN)
      if [ $? -eq 0 ] && [ $slwq -eq 0 ]; then
        return 0
      else
        return 1
      fi
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "mysql-main"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
              enableCheckStableCommand: ["/report_slow_queries.sh"]
  21. 21. MySQL’s warm up in nerve
      bbc mysql prod-dc1 mysql-main mysql-main1 monitor
      #1 Weight: 1/255 Processes: 0 Slow: 0
      #2 Weight: 2/255 Processes: 0 Slow: 0
      #3 Weight: 3/255 Processes: 3 Slow: 0
      #4 Weight: 5/255 Processes: 7 Slow: 0
      #5 Weight: 5/255 Processes: 10 Slow: 0
      #6 Weight: 8/255 Processes: 12 Slow: 0
      #7 Weight: 13/255 Processes: 20 Slow: 1 <- SLOW !
      #8 Weight: 1/255 Processes: 20 Slow: 1
      #9 Weight: 2/255 Processes: 12 Slow: 0
      #10 Weight: 3/255 Processes: 4 Slow: 0
      #11 Weight: 5/255 Processes: 7 Slow: 0
      #12 Weight: 8/255 Processes: 10 Slow: 0
      #13 Weight: 13/255 Processes: 12 Slow: 0
      #14 Weight: 15/255 Processes: 20 Slow: 0
      #15 Weight: 23/255 Processes: 35 Slow: 0
      #16 Weight: 38/255 Processes: 40 Slow: 0
      #17 Weight: 38/255 Processes: 35 Slow: 0
      #18 Weight: 61/255 Processes: 36 Slow: 0
      #19 Weight: 61/255 Processes: 47 Slow: 0
      #20 Weight: 98/255 Processes: 44 Slow: 0
      #21 Weight: 98/255 Processes: 41 Slow: 0
      #22 Weight: 158/255 Processes: 38 Slow: 0
      #23 Weight: 158/255 Processes: 50 Slow: 0
      #24 Weight: 255/255 Processes: 46 Slow: 0 <- FULL POWER !
      #25 Weight: 255/255 Processes: 46 Slow: 0
  22. 22. Die in Peace... Get out when you are ready
  23. 23. Gracefully Disabling Pipeline. Call /disable on Nerve’s API. Set weight to 0: no more new sessions will go into the service. If disableGracefullyDoneCommand is set: this command is run in a loop until it returns 0. API call /disable returns: the service can be shut down without risk.
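The disabling pipeline amounts to "stop new traffic, then wait for the drain check to pass". A minimal Python sketch, assuming the weight setter and the drain command are passed in as callables, looks like this. The function `disable_gracefully` and its parameters are hypothetical and do not reflect go-nerve's API.

```python
import time


def disable_gracefully(set_weight, done_command=None, poll_seconds=1.0,
                       sleep=time.sleep):
    """Set the weight to 0, then poll until the drain command returns 0.

    set_weight: callable taking the new weight (hypothetical hook).
    done_command: callable returning an exit status, or None when no
    disableGracefullyDoneCommand is configured.
    Returns the number of polls that reported remaining work.
    """
    set_weight(0)  # no new sessions are routed to this node
    waits = 0
    if done_command is not None:
        while done_command() != 0:
            waits += 1
            sleep(poll_seconds)
    return waits  # once this returns, the node can be shut down
```

The loop is unbounded, matching the slide; a production version would likely add a timeout, which is left out here because the slides do not describe one.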
  24. 24. MySQL’s graceful shutdown in nerve
      # cat /report_remaining_processes.sh
      #!/dgr/bin/busybox sh
      . /dgr/bin/functions.sh
      isLevelEnabled "debug" && set -x
      procs=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' OR user LIKE '%wr'" -BN)
      if [ $? -eq 0 ] && [ $procs -eq 0 ]; then
        return 0
      else
        return 1
      fi
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "mysql-main"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
              enableCheckStableCommand: ["/report_slow_queries.sh"]
              disableGracefullyDoneCommand: ["/root/report_remaining_processes.sh"]
  25. 25. Backend High Availability Pillars. Be Quiet! (come gently into prod): graceful restart, service discovery (nerve/synapse), weight system, slow query tracking. Abolish Slavery (every node is the same): no master/slave, automatic state transfers, service discovery (nerve/synapse). Die in Peace... (get out when you are ready): graceful restart, service discovery (nerve/synapse), weight system.
  26. 26. Database as a Service Building a frictionless infrastructure
  27. 27. Easy deployment: pull request on a services repository; no technical parameters to override; the services are auto-initialized.
  28. 28. Easy deployment with GGN
      $ tree env/prod-dc1/services/mysql-main
      env/prod-dc1/services/mysql-main
      ├── attributes
      │   ├── galera.yml
      │   ├── innodb.yml
      │   └── nerve.yml
      ├── service-manifest.yml
      └── unit.tmpl
      1 directory, 5 files
      $ cat env/prod-dc1/services/mysql-main/service-manifest.yml
      containers:
        - aci.blbl.cr/pod-mysql:10.1-32
      nodes:
        - hostname: "mysql-main1"
          ip: "192.168.1.1"
          fleet:
            - MachineMetadata=name=r11-srv1
        - hostname: "mysql-mysql-main2"
          ip: "192.168.1.2"
          fleet:
            - MachineMetadata=name=r12-srv2
        - hostname: "mysql-mysql-main3"
          ip: "192.168.1.3"
          fleet:
            - MachineMetadata=name=r13-srv3
      $ cat env/prod-dc1/services/mysql-main/attributes/galera.yml
      ---
      override:
        mariadb:
          galera:
            wsrep_cluster_name: "prod-dc1_main"
      $ cat env/prod-dc1/services/mysql-main/attributes/innodb.yml
      ---
      override:
        mariadb:
          innodb:
            innodb_log_file_size: "1G"
            innodb_buffer_pool_size: "4G"
  29. 29. Easy deployment
      $ cat env/prod-dc1/services/mysql-main/unit.tmpl
      [Unit]
      Description=pod-mysql {{.hostname}}
      [Service]
      {{- template "env-fleet" .}}
      {{ template "rkt-pre-start" . -}}
      {{ template "rkt-post-stop" . }}
      ExecStartPre=/usr/bin/mkdir -p /mnt/sdb1/{{.hostname}}/log
      {{ template "rkt-run-options" . -}}
        --volume=mysql-data,kind=host,source=/mnt/sdb1/{{.hostname}}
        --volume=mysql-log,kind=host,source=/mnt/sdb1/{{.hostname}}/log
        {{.acis}}
      {{- template "x-fleet" . }}
      # ggn prod-dc1 mysql-main update -y
      Deploy the service with GGN (github.com/blablacar/ggn): it generates systemd units from templates and sends them to the environment using fleet.
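The "one template, one unit per node" idea can be illustrated with Python's standard `string.Template`. This is only a sketch of the concept: GGN itself uses Go templates as shown on the slide, and the `UNIT` text, the `render_units` function and the `fleet_metadata` key below are hypothetical simplifications.

```python
from string import Template

# Simplified stand-in for unit.tmpl (hypothetical, far shorter than the real one).
UNIT = Template(
    "[Unit]\n"
    "Description=pod-mysql $hostname\n"
    "\n"
    "[X-Fleet]\n"
    "MachineMetadata=$metadata\n"
)


def render_units(nodes):
    """Render one unit file per node of a service manifest.

    nodes: list of dicts with 'hostname' and 'fleet_metadata' keys,
    mirroring the 'nodes' section of service-manifest.yml.
    """
    return {
        node["hostname"]: UNIT.substitute(
            hostname=node["hostname"], metadata=node["fleet_metadata"]
        )
        for node in nodes
    }
```

Adding a node then only requires a new entry in the manifest; the unit content follows from the shared template.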
  30. 30. Easy Monitoring & Alerting: service-oriented monitoring. The monitoring platform is plugged into the service discovery.
  31. 31. Easy Monitoring & Alerting: Nerve (service discovery), Prometheus (smart monitoring), Grafana (beautiful visualizations), PagerDuty (incident manager).
  32. 32. Prometheus with Nerve integration
      $ cat pod-mysql/pod-manifest.yml
      name: aci.blbl.cr/pod-mysql:10.1-33
      pod:
        apps:
          - dependencies:
              - aci.blbl.cr/aci-mariadb:10.1-29
            app:
              mountPoints:
                - {name: mysql-data, path: /var/lib/mysql}
                - {name: mysql-log, path: /var/log/mysql}
          - name: aci-nerve
            dependencies:
              - aci.blbl.cr/aci-go-nerve:21-23
              - aci.blbl.cr/aci-mariadb:10.1-29
          - dependencies:
              - aci.blbl.cr/aci-prometheus-mysql-exporter:0.10.0-1
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "{{.hostname}}"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
            - name: "{{.hostname}}_prometheus"
              port: 9104
              reporters:
                - {type: zookeeper, path: /monitoring/mysql/main}
      # curl mysql-main1.prod.dc1.com:9104/metrics | head
      # HELP mysql_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from MySQL.
      # TYPE mysql_exporter_last_scrape_duration_seconds gauge
      mysql_exporter_last_scrape_duration_seconds 0.056807316
      # HELP mysql_exporter_last_scrape_error Whether the last scrape of metrics from MySQL resulted in an error (1 for error, 0 for success).
      # TYPE mysql_exporter_last_scrape_error gauge
      mysql_exporter_last_scrape_error 0
      [...]
      # cat env/prod-dc1/services/prometheus/attributes/prometheus.yml
      [...]
      ranged_targets:
        - type: zk
          job_name: discovery_prod-dc1
          scrape_interval: 20s
          metrics_path: /metrics
          zk:
            hosts: '{{ toJson .zk.hosts }}'
            zkpaths:
              - /monitoring
      [...]
  33. 33. Prometheus relabeling
      # [zk: localhost:2181(CONNECTED) 1] get /monitoring/mysql/main/mysql-main1_prometheus_192.168.1.2_ba0f1f8b
      {"available":true,"host":"192.168.1.2","port":9104,"name":"mysql-main1","weight":255,"labels":{"host":"r11-srv1"}}
      We push service information into Zookeeper with Nerve, and Prometheus does the magic.
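What the "magic" might amount to is sketched below: a Zookeeper entry is turned into a scrape target with labels. The output shape follows Prometheus's file-based service discovery format (`targets` plus `labels`); deriving a `service` label from the Zookeeper path is an assumption, and the function `to_prometheus_target` is hypothetical.

```python
import json


def to_prometheus_target(zk_path, payload):
    """Map a nerve entry under /monitoring to a Prometheus target group.

    Assumption: the path looks like /monitoring/<type>/<cluster>/<node>,
    and the service label is '<type>-<cluster>' (for example mysql-main).
    """
    node = json.loads(payload)
    labels = {"name": node["name"]}
    labels.update(node.get("labels", {}))
    parts = zk_path.strip("/").split("/")
    if len(parts) >= 3:
        labels["service"] = "-".join(parts[1:3])
    target = "{}:{}".format(node["host"], node["port"])
    return {"targets": [target], "labels": labels}
```

With labels attached at discovery time, alert rules and dashboards can refer to the service and node names instead of raw IP addresses.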
  34. 34. Alerting
      $ cat prometheus-rules/alert.mysql.rules
      # Alert: Galera node state is not synced.
      ALERT MySQLGaleraStateIsNotSynced
        IF (mysql_global_status_wsrep_local_state != 4 AND mysql_global_variables_wsrep_desync == 0)
        FOR 2m
        LABELS { severity = "warning", team="data_infrastructure" }
        ANNOTATIONS {
          summary = "Galera node {{ $labels.name }} state is not in “Synced” (state={{$value}}).",
          dashboard = "https://promgrafana.blabla.com/dashboard/db/mysql-cluster-view?var-cluster={{$labels.service}}&var-ds=prom-dc1&from=now-1h&to=now",
          runbook="https://ops-run-book.blabla.com/mysql/operational-tasks#MySQLGaleraOutOfSync",
        }
      PromQL to find unhealthy services. Labeling for routing to Slack & PagerDuty. Annotations with templating to provide clear descriptions and URLs to dashboards and ops runbooks.
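The rule's condition and its `FOR 2m` clause can be mimicked in plain Python to show when the alert would fire. This is a simplified model, not Prometheus's evaluation engine: the functions `galera_not_synced` and `should_fire` are hypothetical, and samples are assumed to arrive in ascending time order.

```python
def galera_not_synced(local_state, desync):
    """Condition from the rule: state is not 4 (Synced) and desync is off."""
    return local_state != 4 and desync == 0


def should_fire(samples, hold_seconds=120):
    """Fire only if the condition held continuously for hold_seconds.

    samples: list of (timestamp, wsrep_local_state, wsrep_desync)
    tuples in ascending timestamp order.
    """
    pending_since = None
    for timestamp, state, desync in samples:
        if galera_not_synced(state, desync):
            if pending_since is None:
                pending_since = timestamp
        else:
            pending_since = None  # condition cleared, start over
    if pending_since is None:
        return False
    return samples[-1][0] - pending_since >= hold_seconds
```

The desync test keeps the alert quiet for nodes that were deliberately desynchronized, and the hold period filters out short state transitions.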
  35. 35. Easy troubleshooting: do the basic health checks quickly, in real time, avoiding human mistakes/errors.
  36. 36. Easy troubleshooting with “bbc” command: a set of bash scripts; do the basic health checks quickly; manage all backends the same way; can be used by non-specialists; plugged into the service discovery; designed for our needs.
  37. 37. bbc command examples
      # bbc mysql list
      pp-dc2 mysql-main
      pp-dc2 mysql-user
      pp-dc2 mysql-trip
      pp-dc2 mysql-payment
      prod-dc1 mysql-main
      prod-dc1 mysql-user
      prod-dc1 mysql-trip
      prod-dc1 mysql-payment
      [...]
      # bbc mysql overview prod-dc1 mysql-main
      === Service Overview 'prod-dc1 mysql-main' ===
      mysql-main1 (192.168.1.1) PING, PORT, Synced
      ---
      mysql-main1 (3306) - enabled - weight = 255/255
      mysql-main1_prometheus (9104) - enabled - weight = 255/255
      mysql-main2 (192.168.1.2) PING, PORT, Synced
      ---
      mysql-main2 (3306) - enabled - weight = 255/255
      mysql-main2_prometheus (9104) - enabled - weight = 255/255
      mysql-main3 (192.168.1.3) PING, PORT, Synced
      ---
      mysql-main3 (3306) - enabled - weight = 255/255
      mysql-main3_prometheus (9104) - enabled - weight = 255/255
      # bbc mysql connect prod-dc1 mysql-main
      env: prod-dc1
      service: mysql-main
      host: mysql-main1
      ip: 192.168.1.1
      Enter the username [ENTER]: team_data
      Enter password:
      Welcome to the MariaDB monitor. Commands end with ; or \g.
      Your MariaDB connection id is 2887129
      Server version: 10.1.28-MariaDB-1~jessie mariadb.org binary distribution
      Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
      MariaDB [(none)]>
      # bbc mysql monitor prod-dc1 mysql-main mysql-main1
      Weight: 255/255 Processes: 88 Slow: 0
      Weight: 255/255 Processes: 75 Slow: 0
      Weight: 255/255 Processes: 89 Slow: 0
      Weight: 255/255 Processes: 99 Slow: 0
      Weight: 255/255 Processes: 79 Slow: 0
      Weight: 255/255 Processes: 65 Slow: 0
      Weight: 255/255 Processes: 86 Slow: 0
      Weight: 255/255 Processes: 93 Slow: 0
      Weight: 255/255 Processes: 88 Slow: 0
      Weight: 255/255 Processes: 96 Slow: 0
      Weight: 255/255 Processes: 77 Slow: 0
      Weight: 255/255 Processes: 73 Slow: 0
  38. 38. bbc command examples
      # bbc postgresql list
      pp-dc2 postgresql-airflow
      pp-dc2 postgresql-corridoring
      pp-dc2 postgresql-redash
      pp-dc2 postgresql-trip-pricing
      prod-dc1 postgresql-corridoring
      prod-dc1 postgresql-redash
      # bbc postgresql overview prod-dc1 postgresql-corridoring
      Service Overview 'prod-dc1 postgresql-corridoring'
      -- USING BDR --
      postgresql-corridoring1 (192.168.1.10) PING, PORT
      postgresql-corridoring2 (192.168.1.11) PING, PORT
      postgresql-corridoring3 (192.168.1.12) PING, PORT
      postgresql-corridoring4 (192.168.1.13) PING, PORT
      postgresql-corridoring5 (192.168.1.14) PING, PORT
      # bbc postgresql connect prod-dc1 postgresql-corridoring
      env: prod-dc1
      service: postgresql-corridoring - database : corridoring
      host: postgresql-corridoring1
      ip: 192.168.1.10
      Enter the username [ENTER]: team_data
      Password for user team_arch:
      psql (9.6.6, server 9.4.12)
      Type "help" for help.
      corridoring=#
      # bbc redis list
      pp-dc2 redis-main
      pp-dc2 redis-quota
      pp-dc2 redis-translation
      pp-dc2 redis-user
      prod-dc1 redis-main
      prod-dc1 redis-quota
      # bbc redis overview prod-dc1 redis-main
      === Service 'prod-dc1' 'redis-main' ===
      Redis elector master: redis-main1.prod.dc-1.blabla.com
      redis-main1 (192.168.1.20): PING, PORT, role:master, clients:255
      redis-main2 (192.168.1.21): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
      redis-main3 (192.168.1.22): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
      # bbc redis connect prod-dc1 redis-main
      env: prod-dc1
      service: redis-main
      host: redis-main1
      ip: 192.168.1.20
      role: slave
      192.168.1.20:6379>
  39. 39. bbc command examples
      # bbc cassandra ping prod-dc1 cassandra-user
      cassandra-user1 (192.168.1.30) PING, CQL, JMX
      ---
      cassandra-user2 (192.168.1.31) PING, CQL, JMX
      ---
      cassandra-user3 (192.168.1.32) PING, CQL, JMX
      ---
      # bbc cassandra overview prod-dc1 cassandra-user
      === Service 'prod-dc1 cassandra-user' ===
      Datacenter: prod-dc1
      ====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address       Load      Tokens  Owns    Host ID                               Rack
      UN  192.168.1.30  6.01 GB   256     33.3%   bef39dd5-d4e5-4733-93e5-75904b6d556a  r10
      UN  192.168.1.31  5.89 GB   256     33.3%   23b77937-2177-4638-b860-e73e4bb913d2  r10
      UN  192.168.1.32  5.12 GB   256     33.3%   de0f4ed1-1241-499d-9485-e73e4bb913d2  r10
      Datacenter: prod-dc2
      ====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address       Load      Tokens  Owns    Host ID                               Rack
      UN  192.168.2.10  15.69 GB  256     100.0%  3ca1e862-f3e2-4fbf-a6c1-4d7d5a3e70ec  r14
      UN  192.168.2.11  14.99 GB  256     100.0%  de0f4ed1-1241-499d-9485-2f8196aa7425  r13
      UN  192.168.2.12  16.1 GB   256     100.0%  7e5fee00-052f-4546-973d-befaebbe604b  r15
      Today, 32 subcommands are available on bbc...
  40. 40. What’s next?
  41. 41. What does the future look like? Moving to Kubernetes: from a simple “Distributed init system” to the standard for container orchestration. Fleet is deprecated: fleet is no longer developed or maintained by CoreOS.
  42. 42. Ownership: move backend ownership to the developer teams. Moving to the cloud? Extend this idea of “expendable” services to hardware resources. Docker? Kubernetes + rkt (rktnetes, rktlet) has poor adoption.
