High-level architecture
of a complete MariaDB setup
A complete
deployment
A production deployment should include:
● MariaDB itself
● High Availability / Redundancy
● Delayed replica
● Automated backups and backup testing
● Monitoring
● Automation / gitops
What is a complete production deployment?
Where should
MariaDB run?
● System:
○ Linux or *BSD
○ Have enough memory and at least 2 CPU cores
○ Prefer SSD storage
○ (Xfs or ext4 + LVM) or ZFS
● I don't recommend containers
Any cloud platform that provides bare systems will be fine
Where should MariaDB run?
Storage Engines
and Plugins
MariaDB doesn't know how to read or write data, indexes,
caches, run transactions…
All these operations are delegated to a special plugin type called
the storage engines
The default is InnoDB
What is a complete production deployment?
Other notable storage engines include:
● CONNECT
● S3
● SPIDER
● RocksDB
● ColumnStore
What is a complete production deployment?
Notable plugins:
● User statistics
● On staging: SQL error log
What is a complete production deployment?
High Availability
& Scalability
High Availability and Redundancy are achieved by Replication
Which can be of the following types:
● Synchronous or asynchronous
● Single master or multi-master
High Availability & Scalability
MariaDB replication can be:
● Asynchronous replication
● Semi-sync replication
● Galera cluster
High Availability & Scalability
Asynchronous replication:
● Writes on the master (events) are logged into the binary log
● The binary log is read and sent to the replicas
● The replicas receive it and write the relay log
● The relay log events are applies to the data
○ In parallel
● Replicas can be used to scale reads, but the lag should be
monitored
● GTID allows master failover is possible
High Availability & Scalability
Delayed replication:
● Asynchronous replication can be configured to be delayed
● Eg: 1 hour, 1 day
● If an SQL statements destroys data on the master, you can use the
delayed replica to recover data
High Availability & Scalability
Semi-sync replication:
● Similar to asynchronous replication
● But when a client makes a change to the data:
○ The master waits until at least 1 replica has logged the event
into the binary log
○ Then it returns success to the client
● This means that semisync replication makes writes slower
● And if all replicas crash, the master won't accept data writes
High Availability & Scalability
Galera is virtually synchronous replication
● Each transaction needs be certified by the majority of nodes before
it is applied to any node
● Number of nodes should be odd
● Data conflicts lead to failed transactions and potentially node
eviction
● Don't use multiple masters
● Large transactions should be avoided
● AUTO_INCREMENT id's are not necessarily in chronological order
● A node slowdown slows down the cluster
High Availability & Scalability
● Galera nodes can have replicas to scale reads without affecting
writes
● A Galera node can be the master of a node of another cluster
● In this way you can have a secondary cluster
High Availability & Scalability
Proxy layer
Proxies are needed to:
● Only send queries to working servers
● Only send writes to the master(s), send reads to replicas
● Distribute the reads more or less equally amongst replicas
Proxy layer
Why not doing this in the application?
● DBAs would need to update app configuration when the list of
servers changes
● Proxies are complex pieces of software, they know their stuff
● They also have advanced features that the application can't have,
like connections multiplexing
Proxy layer
Main existing proxies:
● ProxySQL
● MaxScale
● HAProxy
Proxy layer
● Proxies can form an independent layer
● Or each app server can run a different proxy
● …or both
● In all cases, proxies can crash. Make sure you have multiple
proxies
Proxy layer
Service discovery
● Proxies send queries to running MariaDB instances
● But proxies can fail too, like any other technology
● Application should only connect to running proxies
Service discovery
Solutions:
● Hashicorp Consul
● Kubernetes
● Cloud vendor-specific solutions
Service discovery
Backups
You need to backup:
● Configuration
● Data
If you use gitops configuration can always be recreated
We'll focus on data
Backups
For proper resilience:
● Automate backups
● Automatically test backups
● Have multiple backup strategies
● Have incremental backups
Also, archive and rotate backups
Backups
Main backup tools / methods:
● Snapshots
● Mariabackup
● mariadb-dump
● Binary log
Backups
Snapshots can be taken at several levels:
● LVM
● Filesystem (ZFS)
● VM
Make sure to only include data in your snapshots
Snapshots are not copies, they are taken instantaneously
They can be incremental
Backups
Mariabackup:
● Takes a copy of the files
● (Almost) no locks are taken
● Uses InnoDB transaction logs to bring the data to a consistent state
○ Ie: undo transactions committed after the copy started
Backups
mariadb-dump:
● Writes all the SQL statement needed to recreate the database
● Slowest backup method, slowest recovery method
● Historically single-threaded
● Uses a long transaction
● But backups can be restored on any MariaDB version
Backups
Binary log:
● Contains events: binary changes to the rows, or SQL statements
and GTIDs
Can be analysed by a human with mariadb-binlog
● It should be periodically be copied to elsewhere
● When you restore a complete backup, check the latest GTID
● Then apply the binlog starting from the next GTID
Backups
Monitoring
A good, well-configured monitoring system:
● Shows detailed metrics about MariaDB performance
● Allows easy customisations
● Sends meaningful, non-redundant warnings
● Sends actionable, critical alerts
Monitoring
● Best ready-made solution: PMM
○ Includes query analysis
Some monitoring systems have templates for MariaDB or MySQL
You can also build your own solution with Prometheus + Grafana
For query analysis: ELK stack
● You can also use a solution for monitoring an application and
DB response times
Monitoring
High-level architecture of a complete MariaDB deployment

High-level architecture of a complete MariaDB deployment

  • 1.
    High-level architecture of acomplete MariaDB setup
  • 2.
  • 3.
    A production deploymentshould include: ● MariaDB itself ● High Availability / Redundancy ● Delayed replica ● Automated backups and backup testing ● Monitoring ● Automation / gitops What is a complete production deployment?
  • 4.
  • 5.
    ● System: ○ Linuxor *BSD ○ Have enough memory and at least 2 CPU cores ○ Prefer SSD storage ○ (Xfs or ext4 + LVM) or ZFS ● I don't recommend containers Any cloud platform that provides bare systems will be fine Where should MariaDB run?
  • 6.
  • 7.
    MariaDB doesn't knowhow to read or write data, indexes, caches, run transactions… All these operations are delegated to a special plugin type called the storage engines The default is InnoDB What is a complete production deployment?
  • 8.
    Other notable storageengines include: ● CONNECT ● S3 ● SPIDER ● RocksDB ● ColumnStore What is a complete production deployment?
  • 9.
    Notable plugins: ● Userstatistics ● On staging: SQL error log What is a complete production deployment?
  • 10.
  • 11.
    High Availability andRedundancy are achieved by Replication Which can be of the following types: ● Synchronous or asynchronous ● Single master or multi-master High Availability & Scalability
  • 12.
    MariaDB replication canbe: ● Asynchronous replication ● Semi-sync replication ● Galera cluster High Availability & Scalability
  • 13.
    Asynchronous replication: ● Writeson the master (events) are logged into the binary log ● The binary log is read and sent to the replicas ● The replicas receive it and write the relay log ● The relay log events are applies to the data ○ In parallel ● Replicas can be used to scale reads, but the lag should be monitored ● GTID allows master failover is possible High Availability & Scalability
  • 14.
    Delayed replication: ● Asynchronousreplication can be configured to be delayed ● Eg: 1 hour, 1 day ● If an SQL statements destroys data on the master, you can use the delayed replica to recover data High Availability & Scalability
  • 15.
    Semi-sync replication: ● Similarto asynchronous replication ● But when a client makes a change to the data: ○ The master waits until at least 1 replica has logged the event into the binary log ○ Then it returns success to the client ● This means that semisync replication makes writes slower ● And if all replicas crash, the master won't accept data writes High Availability & Scalability
  • 16.
    Galera is virtuallysynchronous replication ● Each transaction needs be certified by the majority of nodes before it is applied to any node ● Number of nodes should be odd ● Data conflicts lead to failed transactions and potentially node eviction ● Don't use multiple masters ● Large transactions should be avoided ● AUTO_INCREMENT id's are not necessarily in chronological order ● A node slowdown slows down the cluster High Availability & Scalability
  • 17.
    ● Galera nodescan have replicas to scale reads without affecting writes ● A Galera node can be the master of a node of another cluster ● In this way you can have a secondary cluster High Availability & Scalability
  • 19.
  • 20.
    Proxies are neededto: ● Only send queries to working servers ● Only send writes to the master(s), send reads to replicas ● Distribute the reads more or less equally amongst replicas Proxy layer
  • 21.
    Why not doingthis in the application? ● DBAs would need to update app configuration when the list of servers changes ● Proxies are complex pieces of software, they know their stuff ● They also have advanced features that the application can't have, like connections multiplexing Proxy layer
  • 22.
    Main existing proxies: ●ProxySQL ● MaxScale ● HAProxy Proxy layer
  • 23.
    ● Proxies canform an independent layer ● Or each app server can run a different proxy ● …or both ● In all cases, proxies can crash. Make sure you have multiple proxies Proxy layer
  • 26.
  • 27.
    ● Proxies sendqueries to running MariaDB instances ● But proxies can fail too, like any other technology ● Application should only connect to running proxies Service discovery
  • 28.
    Solutions: ● Hashicorp Consul ●Kubernetes ● Cloud vendor-specific solutions Service discovery
  • 30.
  • 31.
    You need tobackup: ● Configuration ● Data If you use gitops configuration can always be recreated We'll focus on data Backups
  • 32.
    For proper resilience: ●Automate backups ● Automatically test backups ● Have multiple backup strategies ● Have incremental backups Also, archive and rotate backups Backups
  • 33.
    Main backup tools/ methods: ● Snapshots ● Mariabackup ● mariadb-dump ● Binary log Backups
  • 34.
    Snapshots can betaken at several levels: ● LVM ● Filesystem (ZFS) ● VM Make sure to only include data in your snapshots Snapshots are not copies, they are taken instantaneously They can be incremental Backups
  • 35.
    Mariabackup: ● Takes acopy of the files ● (Almost) no locks are taken ● Uses InnoDB transaction logs to bring the data to a consistent state ○ Ie: undo transactions committed after the copy started Backups
  • 36.
    mariadb-dump: ● Writes allthe SQL statement needed to recreate the database ● Slowest backup method, slowest recovery method ● Historically single-threaded ● Uses a long transaction ● But backups can be restored on any MariaDB version Backups
  • 37.
    Binary log: ● Containsevents: binary changes to the rows, or SQL statements and GTIDs Can be analysed by a human with mariadb-binlog ● It should be periodically be copied to elsewhere ● When you restore a complete backup, check the latest GTID ● Then apply the binlog starting from the next GTID Backups
  • 38.
  • 39.
    A good, well-configuredmonitoring system: ● Shows detailed metrics about MariaDB performance ● Allows easy customisations ● Sends meaningful, non-redundant warnings ● Sends actionable, critical alerts Monitoring
  • 40.
    ● Best ready-madesolution: PMM ○ Includes query analysis Some monitoring systems have templates for MariaDB or MySQL You can also build your own solution with Prometheus + Grafana For query analysis: ELK stack ● You can also use a solution for monitoring an application and DB response times Monitoring