Deploying MariaDB databases with containers at Nokia Networks

Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports components for reuse by multiple applications within Nokia, including MariaDB. Its focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses the journey of these containerized solutions from development to customer trials, including the problems encountered and their solutions.

Deploying MariaDB databases with containers at Nokia Networks

  1. 1. © 2018 Nokia1 Deploying MariaDB databases with containers at Nokia Deploying MariaDB solutions in containerized environments in Nokia Networks Rick Lane 27-02-2019
  2. 2. © 2018 Nokia2 Deploying MariaDB databases with containers at Nokia Deploying MariaDB solutions in containerized environments in Nokia Networks Rick Lane 27-02-2019
  3. 3. © 2018 Nokia3 © 2018 Nokia3 CMDB - MariaDB Common Software Foundation (CSF) Component MariaDB (CMDB)
  4. 4. © 2018 Nokia4 Helm/Kubernetes/Container tradeoffs Pros • Fully separates services from kernel/other services • Extremely light-weight and portable • Containers disposable o Kill and recreate pod as new o Readiness/Liveness probes automate recovery • Deploy application with multiple services in one command/click (helm umbrella charts) • Deployment time significantly faster than VM/ansible (4 minutes compared to 40 minutes) Cons • Containers disposable o Recreated with new IP (looks like a new server) o Failure root cause difficult – logs disappear with the container (mitigate via pod stdout or persistent storage) • Umbrella charts introduce other difficult problems o Can deploy a new service instance with a helm upgrade of the parent chart
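One concrete way to see the "disposable container" behaviour above is to kill a database pod and watch Kubernetes recreate it; the pod name and label below are illustrative, not taken from the deck.
    # Illustrative only: assumes the MariaDB StatefulSet labels its pods app=mariadb.
    kubectl get pods -l app=mariadb          # list the database pods
    kubectl delete pod mariadb-1             # kill one pod; the StatefulSet recreates it
    kubectl get pods -w                      # watch the replacement start and pass its readiness probe
    # The recreated pod has a new IP, and the failed container's logs may already be gone:
    kubectl logs mariadb-1 --previous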
  5. 5. © 2018 Nokia5 Nokia Container Management Service Helm/Kubernetes Deployment Model (diagram): the cmdb helm chart is deployed through the controller nodes, which schedule the database pods onto the worker nodes; external connections enter through the edge nodes.
  6. 6. © 2018 Nokia6 Security / Affinity Helm/Kubernetes Deployment Model Security • RBAC fully supported • All containers must run as a non-root user • Kubernetes RBAC ServiceAccount and Role/RoleBindings limit container privileges • Password security • All user-supplied passwords loaded into a kubernetes secret during the pre-install job • Secret used to propagate passwords to maxscale/mariadb pods • Password secret deleted on post-terminate • User must provide a secret with old/new passwords to update passwords Affinity • podAntiAffinity • hard (default) – all pods must be scheduled on separate nodes or the deployment will fail • soft – try to schedule pods on separate nodes, but deploy anyway if they cannot be separated • nodeAffinity • mariadb pods forced to deploy on worker nodes • maxscale pods deploy on edge nodes by default (can be configured to deploy on worker nodes)
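A minimal sketch of the password-update flow described above, assuming a secret of this shape is what the admin job expects; the secret and key names are illustrative, not taken from the chart.
    # Hypothetical secret carrying the old and new passwords for a password update.
    kubectl create secret generic my-cluster-password-update \
      --from-literal=old_password='CurrentPass123' \
      --from-literal=new_password='NewPass456'
    # Secrets written by the pre-install job can be inspected (values are base64 encoded):
    kubectl get secret my-cluster-passwords -o yaml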
  7. 7. © 2018 Nokia7 Containers CMDB - MariaDB cmdb/mariadb (FROM centos-7.6 os base image) MariaDB database container supporting deployment of all configurations (simplex, Galera, Master/Master, Master/Slave) • MariaDB-10.3.11 (client, server, backup, etc) • Galera • SDC/etcd client RPMs • CSF CMDB deployment, configuration and management RPMs cmdb/maxscale (FROM centos-7.6 os base image) MaxScale proxy container supporting deployment of the data center configuration • Maxscale-2.2.19 • SDC/etcd client RPMs • CSF CMDB deployment, configuration and management RPMs cmdb/admin (FROM kubectl base image) Kubernetes/Helm Job Administration container supporting all life cycle events (install, upgrade, delete, etc) • MariaDB-10.3.11-client • SDC/etcd client RPMs • Python job orchestrator and python classes to implement configuration-specific job tasks
  8. 8. © 2018 Nokia8 Helm chart (services and admin) CMDB - MariaDB ## Image Registry global: registry: "csf-docker-delivered.repo.internal.nokia.com" registry1: "registry1-docker-io.repo.internal.nokia.com" rbac_enabled: true nodeAntiAffinity: hard cluster_name: "my-cluster" ## Topology master-slave, master-master, galera, simplex cluster_type: "master-slave" ## Values on how to expose services ## ClusterIP will expose only within cluster, NodePort to expose externally services: ## MySQL service exposes the mysql database service (mariadb or maxscale) mysql: type: ClusterIP ## MariaDB Master exposes the pod that is master mariadb_master: type: NodePort ## Maxscale exposes the administrative interface of Maxscale maxscale: type: NodePort ## Maxctrl (optional) exposes the maxctrl administrative interface of Maxscale maxctrl: enabled: false type: ClusterIP port: 8989 admin: image: name: "cmdb/admin" tag: "4.5-1" pullPolicy: IfNotPresent ## A recovery flag. If changed, will trigger a heal of the database to occur #recovery: none quickInstall: "" ## If set, administrative jobs will be more verbose to stdout (kubectl logs) debug: false ## Exposes the hook-delete-policy. By default, this is set to delete the ## hooks only upon success. In helm v2.9+, this should be set to ## before-hook-creation. This can also be unset to avoid hook deletion ## for troubleshooting and debugging purposes hook_delete_policy: "hook-succeeded"
  9. 9. © 2018 Nokia9 Helm chart (mariadb and maxscale) CMDB - MariaDB mariadb: image: name: "cmdb/mariadb" tag: "4.5-1" pullPolicy: IfNotPresent ## The number of MariaDB pods to create count: 3 heuristic_recover: rollback use_tls: true ## Enable persistence using Persistent Volume Claims persistence: enabled: true accessMode: ReadWriteOnce size: 20Gi storageClass: "" resourcePolicy: delete preserve_pvc: false ## MariaDB server customized configuration mysqld_site_conf: |- [mysqld] userstat = on ## metrics metrics: enabled: false ## Grafana dashboard dashboard: enabled: false maxscale: image: name: "cmdb/maxscale" tag: "4.5-1" pullPolicy: IfNotPresent ## The number of MaxScale pods count: 2 ## MaxScale customized configuration maxscale_site_conf: |- [maxscale] threads = 2 query_retries = 2 query_retry_timeout = 10 [MariaDB-Monitor] monitor_interval = 1000 failcount = 4 ## MaxScale promotion/demotion SQL sql: ## Mariadb Node promoted to master promotion: [] ## Mariadb Node demoted to slave demotion: [] ## leader-elector elector: image: name: "googlecontainer/leader-elector" tag: 0.5 pullPolicy: IfNotPresent
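A hedged example of installing the chart with a few of the documented values overridden; the chart reference, release name and helm 2 --name syntax are assumptions, and the exact nesting of cluster_type should be checked against the chart.
    # Deploy a 5-node Galera topology with larger volumes and the optional maxctrl service (helm 2 syntax assumed).
    helm install cmdb --name my-cluster \
      --set cluster_type=galera \
      --set mariadb.count=5 \
      --set mariadb.persistence.size=50Gi \
      --set services.maxctrl.enabled=true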
  10. 10. © 2018 Nokia10 Events Life Cycle Management • Kubernetes native events o install = deploy chart and create resources o delete = terminate chart and delete all resources created by install o upgrade = make any changes to mariadb/maxscale resources (configuration, etc); special code handles heal and scale-in/out events • Nokia plugin events o heal = implemented via the kubernetes upgrade event by changing the admin.recovery value o scale-in/scale-out = implemented via the kubernetes upgrade event by changing mariadb.count or maxscale.count o backup/restore = implemented with a Backup/Restore policy
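All three plugin events map onto a helm upgrade of a single value; a sketch, assuming a release called my-cluster and a local chart reference cmdb (the recovery token is also illustrative).
    # Heal: change admin.recovery to any new value to trigger the heal job.
    helm upgrade my-cluster cmdb --reuse-values --set admin.recovery=heal-$(date +%s)
    # Scale-out: grow the database StatefulSet from 3 to 5 pods.
    helm upgrade my-cluster cmdb --reuse-values --set mariadb.count=5
    # Scale-in: shrink back to 3 pods.
    helm upgrade my-cluster cmdb --reuse-values --set mariadb.count=3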
  11. 11. © 2018 Nokia11 Kubernetes Resources Galera Cluster • Deploy mariadb-statefulset with 3+ pods (odd number). Each MariaDB pod contains: • Mariadb container o Configures mariadb in the Galera configuration automatically at deploy time based on IP advertisements o If the pod restarts, configured to always come back and join the existing cluster o Persistent Volume Claim mounted for database storage • Backup/Restore container for scheduling routine mariadb container backups • Optional mysqld_exporter container for metrics collection (if metrics enabled) • Mysql Service created to provide access to all DB nodes (all DB nodes added to the service as endpoints) • Metrics Service created to provide access to DB nodes from the Grafana dashboard (if metrics enabled)
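To check that every database pod really landed in the mysql service and claimed its volume, something like the following works; the service name prefix and pod label are assumptions.
    kubectl get endpoints my-cluster-mysql   # should list the IPs of mariadb-0, mariadb-1, mariadb-2
    kubectl get pvc -l app=mariadb           # one PersistentVolumeClaim per database pod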
  12. 12. © 2018 Nokia12 Galera Cluster (diagram): a mysql service load-balances across pods mariadb-0, mariadb-1 and mariadb-2; each pod runs mariadb, metrics and backup/restore (BR) containers and mounts its own persistent volume.
  13. 13. © 2018 Nokia13 Kubernetes Resources Master/Slave with HA MaxScale • Deploy maxscale-statefulset with 1 to 3 pods. Each MaxScale pod contains: • Maxscale container o Configures maxscale using helm values and the mariadb containers' advertised IPs (via etcd) o Monitors http://localhost:4040 for leader-elector changes (setting maxscale passive mode) • Leader-elector container for managing HA o Configured to manage a kubernetes endpoint with a lease for election of the leader in the cluster o Starts a small web server to publish the elected leader on port 4040 • Deploy mariadb-statefulset with 2+ pods (3+, odd number, preferred). Each MariaDB pod contains: • Mariadb container o Configures mariadb in the Master/Slave/Slave configuration automatically at deploy time based on IP advertisements o If the pod restarts, configured to always come back as a Slave o Persistent Volume Claim mounted for database storage • Backup/Restore container for scheduling routine mariadb container backups • Optional mysqld_exporter container for metrics collection (if metrics enabled) • Mysql Service created to provide access to all maxscale nodes (all maxscale nodes added to the service as endpoints) • Maxctrl Service created to provide REST API access to the "active" maxscale node (labeled with 'maxscale-leader')
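Since the leader-elector publishes the elected leader over HTTP on port 4040, it can be queried through a port-forward; pod and label names are assumptions based on the description above.
    # Ask one MaxScale pod's elector sidecar which instance is currently the leader.
    kubectl port-forward pod/maxscale-0 4040:4040 &
    curl -s http://localhost:4040
    # The maxctrl service selects the active pod via its leader label:
    kubectl get pods -l maxscale-leader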
  14. 14. © 2018 Nokia14 Master/Slave Cluster with HA MaxScale (diagram): the mysql service load-balances across maxscale-0 (active) and maxscale-1 (passive), each running maxscale and elector containers; the maxctrl service points at the active pod; the electors watch and manage a shared election endpoint; the active MaxScale routes to the mariadb-0 Master and the mariadb-1/mariadb-2 Slaves, each with mariadb, metrics and BR containers and a persistent volume.
  15. 15. © 2018 Nokia15 Pod IP Advertisements / Single Pod Failure Pod failures result in the re-created pod being re-deployed with a new IP address (it looks like a new cluster server). Each pod advertises its role and IP to the etcd server: cmdb/my-cluster/services/attributes/mariadb-0 = {"role": "RM", "ip": "172.16.0.35"}, cmdb/my-cluster/services/attributes/mariadb-1 = {"role": "RS", "ip": "172.16.0.104"}, cmdb/my-cluster/services/attributes/mariadb-2 = {"role": "RS", "ip": "172.16.0.97"}, cmdb/my-cluster/services/attributes/maxscale-0 = {"role": "MXS", "ip": "172.16.0.39"}, cmdb/my-cluster/services/attributes/maxscale-1 = {"role": "MXS", "ip": "172.16.0.52"}. When mariadb-2 is re-created with IP 172.16.0.201, it re-advertises (cmdb/my-cluster/services/attributes/mariadb-2 = {"role": "RS", "ip": "172.16.0.201"}) and the advertisements are audited, updating MaxScale: maxadmin alter server mariadb-2 address=172.16.0.201
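The advertisements can be audited straight from etcd; the endpoint and use of the v3 API are assumptions, while the key prefix and the maxadmin call come from the slide.
    # Dump every pod's advertised role and IP under the cluster's attribute prefix.
    ETCDCTL_API=3 etcdctl --endpoints=http://etcd-server:2379 \
      get --prefix cmdb/my-cluster/services/attributes/
    # After mariadb-2 is re-created with a new IP, MaxScale is repointed at it:
    maxadmin alter server mariadb-2 address=172.16.0.201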
  16. 16. © 2018 Nokia16 Galera Cluster Heal Admin container heal operation (helm upgrade of the admin.recovery value): (1) the admin post-upgrade job writes a wait_role action to the etcd server (cmdb/my-cluster/actions/wait_role = {"advertise": "recovery_pos"}), (2) all mariadb pods are killed, (3) the pods advertise their recovery_pos seqno values (cmdb/my-cluster/services/recovery_pos/mariadb-0 = {"seqno": "527"}, mariadb-1 = {"seqno": "528"}, mariadb-2 = {"seqno": "527"}), (4) the pod with the largest seqno is found, (5) that pod starts the cluster and the rest join (cmdb/my-cluster/mariadb-1/config/role = "--cluster=new", cmdb/my-cluster/mariadb-0/config/role and cmdb/my-cluster/mariadb-2/config/role = "--cluster=join:SST"), (6) the pods detect their role and re-deploy.
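The recovery_pos seqno each pod advertises corresponds to the last committed Galera sequence number stored on its persistent volume; a hedged way to inspect it by hand (container name and data directory are assumptions).
    # grastate.dat records each node's Galera UUID and last committed seqno.
    for p in mariadb-0 mariadb-1 mariadb-2; do
      echo "== $p =="
      kubectl exec "$p" -c mariadb -- cat /var/lib/mysql/grastate.dat
    done
    # The pod with the highest seqno is given "--cluster=new"; the others rejoin via SST.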
  17. 17. © 2018 Nokia17 Galera Cluster Scale-Out Admin container scale-out operation (helm upgrade of mariadb.count): (1) the admin pre-upgrade job sets the new pods' roles in the etcd server (cmdb/my-cluster/mariadb-3/config/role = "--cluster=join:SST", cmdb/my-cluster/mariadb-4/config/role = "--cluster=join:SST"), (2) the new pods mariadb-3 and mariadb-4 are created and join the existing mariadb-0/1/2 cluster, (3) the admin post-upgrade job notifies the existing pods of the new cluster size.
  18. 18. © 2018 Nokia18 Galera Cluster Scale-In Admin container scale-in operation (helm upgrade of mariadb.count): (1) the admin pre-upgrade job verifies the new cluster size, (2) pods mariadb-3 and mariadb-4 are deleted, (3) the admin post-upgrade job notifies the remaining pods (mariadb-0/1/2) of the new cluster size.
  19. 19. © 2018 Nokia19 MaxScale Cluster Heal MaxScale will auto-heal the MariaDB cluster when all database pods fail (diagram: a Master and two Slaves, with the Master replicating from a remote data center). Topology audit (no audit if the event lasts < 15 seconds): • After all pods restart: o the original master will be replicating from the remote DC (Slave, Running) o the original slaves will still be replicating from the old master (Running) • Expected_master = the first server replicating to the remote DC • If all other servers are replicating to the same server (the old master): for all servers (except expected_master), CHANGE MASTER TO expected_master, then run promotion.sql
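The re-pointing step of the audit is ordinary MariaDB replication SQL; hostnames, credentials and the GTID option below are placeholders, not taken from the deck.
    # Re-point one of the old slaves at the expected master, then let promotion.sql run there.
    mysql -h mariadb-1 -u repl_admin -p -e "
      STOP SLAVE;
      CHANGE MASTER TO MASTER_HOST='mariadb-0', MASTER_USER='repl',
        MASTER_PASSWORD='***', MASTER_USE_GTID=slave_pos;
      START SLAVE;"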
  20. 20. © 2018 Nokia20 MaxScale Cluster Scale-Out Admin container scale-out operation (helm upgrade of mariadb.count): (1) make sure the master (mariadb-0) exists, (2) the admin pre-upgrade job sets the new pods' roles (cmdb/my-cluster/mariadb-3/config/role = "--replicate=slave", cmdb/my-cluster/mariadb-4/config/role = "--replicate=slave") and writes a wait_role action (cmdb/my-cluster/actions/wait_role = {"advertise": "ready"}), (3) the new pods are created, (4) the Master is backed up, (5) as ready pods are detected (cmdb/my-cluster/services/ready/mariadb-3 = 'true', cmdb/my-cluster/services/ready/mariadb-4 = 'true') they are restored from the master backup and advertise their pod role, (6) the admin post-upgrade job notifies the existing pods of the new cluster size and registers the new servers with Maxscale: maxadmin create server <server> … maxadmin add server <server>
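The maxadmin calls referenced above take roughly this form in MaxScale 2.2; MariaDB-Monitor appears in the chart values, but the routing service name and the IP are illustrative.
    # Register a new database node with MaxScale, then attach it to the monitor and router.
    maxadmin create server mariadb-3 172.16.0.120 3306
    maxadmin add server mariadb-3 MariaDB-Monitor
    maxadmin add server mariadb-3 Read-Write-Service   # service name is illustrative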
  21. 21. © 2018 Nokia21 MaxScale Cluster Scale-In Admin container scale-in operation (helm upgrade of mariadb.count): (1) the admin pre-upgrade job verifies the new cluster size, (2) the Master is switched over via MaxScale if necessary, (3) pods mariadb-3 and mariadb-4 are deleted, (4) the admin post-upgrade job notifies the remaining pods of the new cluster size and removes the servers from Maxscale: maxadmin remove server <server>, maxadmin destroy server <server>
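If the master is among the pods being removed, MaxScale can switch it over before deletion; monitor and server names are assumptions apart from MariaDB-Monitor.
    # Promote mariadb-0 and demote the current master mariadb-4 before it is deleted.
    maxctrl call command mariadbmon switchover MariaDB-Monitor mariadb-0 mariadb-4
    # Then detach and remove the deleted nodes from MaxScale.
    maxadmin remove server mariadb-4 MariaDB-Monitor
    maxadmin destroy server mariadb-4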
  22. 22. © 2018 Nokia22 Future Work • Additional enhancements to prevent data loss o Supporting semi-sync replication in a Master/Slave/Slave cluster with MaxScale o Implement a preStop hook to trigger a switchover if the Master is being deleted (e.g., for migration) • Kubernetes Horizontal Pod Autoscaling (HPA)
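A minimal sketch of what the planned preStop hook might run, assuming maxctrl is reachable from the database pod and server names match pod names; this is purely illustrative since the feature is listed as future work.
    #!/bin/sh
    # Hypothetical preStop script: if this pod is the current master, switch over before stopping.
    THIS_POD="$(hostname)"
    MASTER="$(maxctrl --hosts maxscale-0:8989 list servers --tsv | awk -F'\t' '$5 ~ /Master/ {print $1}')"
    if [ "$MASTER" = "$THIS_POD" ]; then
      maxctrl --hosts maxscale-0:8989 call command mariadbmon switchover MariaDB-Monitor
    fi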
