© 2019 Frédéric G. MARAND - licensed under a Creative Commons Attribution 4.0 International License.
Scaling up and accelerating Drupal 8 with NoSQL
Frédéric G. MARAND fgm - irc/twitter: @osinet
<MongoDB module maintainer />
Topic?
Simple idea: “No SQL”
● Alternate storage engines: KV, structures, document, graph, columnar…
● No standard, often no fixed schema, no joins, no FKs
● → Engine-specific application design
● Drupal architecture?
Evolved idea: Not Only SQL
● For engines, add equivalent features to SQL
● For Drupal, combine SQL and NoSQL solutions
● Start from the default SQL-based architecture
● Offload services to non-SQL implementations
○ front-end caches, search engines, queue servers
○ specialized storage: cache, KV, lock, sessions…
● Often involves NoSQL as cache for SQL
NoSQL: do you need it?
● Start by observing the current state
○ Database queries → devel + webprofiler
○ Cache → heisencache (D7), webprofiler (D8)
○ Build cacheability → renderviz
● Observe behaviour
○ Core observability built-in: DBTNG logging, cache decorators, QueryInterface for KV, config, content…
○ Monitoring module (400 sites) by Karan Poddar (Google SoC) and MD Systems
○ Add your choice of time-series store (e.g. Prometheus, InfluxDB) and UI (e.g. Grafana)
○ ⇨ Use it!
● You want to see this when it happens ⟶
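Core's cache decorators and tools like webprofiler observe a cache by wrapping the backend and counting what passes through it. The Python sketch below illustrates only that decorator idea; `CacheBackend`, `DictCache` and `CountingCache` are hypothetical names, not Drupal APIs, and treating `None` as a miss is a simplification of this sketch.

```python
from typing import Any, Optional, Protocol


class CacheBackend(Protocol):
    """Minimal cache contract (hypothetical, for illustration only)."""

    def get(self, cid: str) -> Optional[Any]: ...
    def set(self, cid: str, data: Any) -> None: ...


class DictCache:
    """Plain in-memory backend standing in for the real cache storage."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}

    def get(self, cid: str) -> Optional[Any]:
        return self._items.get(cid)

    def set(self, cid: str, data: Any) -> None:
        self._items[cid] = data


class CountingCache:
    """Decorator: exposes the same interface, but records hits and misses."""

    def __init__(self, inner: CacheBackend) -> None:
        self.inner = inner
        self.hits = 0
        self.misses = 0

    def get(self, cid: str) -> Optional[Any]:
        value = self.inner.get(cid)
        if value is None:  # in this sketch, None means "not cached"
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, cid: str, data: Any) -> None:
        self.inner.set(cid, data)

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache(DictCache())
for cid in ["page:/", "page:/", "page:/about", "page:/"]:
    if cache.get(cid) is None:
        cache.set(cid, f"<html>{cid}</html>")
print(cache.hits, cache.misses, round(cache.hit_ratio(), 2))  # 2 2 0.5
```

Because the decorator has the same interface as the backend it wraps, it can be inserted or removed without changing calling code, which is what makes this kind of measurement cheap to add before deciding whether NoSQL is needed at all.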
Peter Drucker: “If you can’t measure it, you can’t improve it.”
Fixing an identified problem is cheaper than “trying things”
Fix from acquired information
● It /MAY/ involve taking queries off the main DB to a NoSQL solution
● But poorly configured NoSQL may make it worse.
“Just do it”?
● Drupal is built on SQL:
○ Views depends on it by default
○ Most sites rely on Views data model awareness
○ → Contrib often assumes SQL, injects @database
○ NoSQL support doable, rarely done
● Contrib support level is limited
○ Most NoSQL contrib not ported from D7 to D8
○ Drupal shop knowledge limited except biggest or
○ Products may die… e.g. RethinkDB
● Pro support from publishers = costs. Availability.
● Extra support needed = costs
NoSQL == added build costs
→ balance gains vs costs
Example case: RethinkDB
At DevDays Milan 2016, after lots of work, Gizra’s @RoySegall demoed a Drupal 8 ORM/ODM for RethinkDB.
Then, this happened...
“Do you really need it?”
Front caching
Caching ahead of real work
Default situation with SQL
● Browser caching, limited
● Internal / dynamic page cache in main SQL DB
● Need DB connection, a few SELECT queries
● Fetch cache from DB
● All data from main storage
● ⇨ Serve cached pages in about 20 msec
All this work makes DoS-ing comparatively cheap.
NoSQL improvements
● Add caching ahead of site itself
○ Browser
■ Optimized browser caching (Cache-Control)
■ PWA: use browser local storage
■ CDN module (2k sites)
■ Akamai module (600 sites)
■ ⇨ Serve cached pages in about 15 msec (TTFB)
■ Web-scale
○ Varnish and other reverse proxies
■ ⇨ Serve cached pages in about 10 msec (TTFB)
■ Core support
■ Varnish Purger (3k sites)
● ⇨ Most requests will mean 0 SQL queries
○ DoS-ing more costly, especially with CDN
● Move page caches off main DB: next section
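Why caching ahead of Drupal turns most requests into zero SQL queries can be shown with a toy TTL cache. This Python sketch is not how Varnish or a CDN actually works (they also honour `Cache-Control`, purges and variants); `ReverseProxyCache` is a hypothetical name, and the injectable clock only makes the demo deterministic.

```python
import time
from typing import Callable


class ReverseProxyCache:
    """Toy TTL cache placed in front of an expensive backend (e.g. a SQL-backed page build)."""

    def __init__(self, backend: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend
        self.ttl = ttl
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}  # path -> (expiry, body)
        self.backend_calls = 0

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]  # fresh copy: the backend (and its DB) is never touched
        self.backend_calls += 1
        body = self.backend(path)
        self._store[path] = (now + self.ttl, body)
        return body


fake_now = [0.0]
proxy = ReverseProxyCache(lambda p: f"page {p}", ttl=60, clock=lambda: fake_now[0])
for _ in range(1000):
    proxy.get("/")
fake_now[0] = 61.0  # TTL elapsed: next request rebuilds once
proxy.get("/")
print(proxy.backend_calls)  # 2
```

1001 requests cost two page builds; the same ratio is what makes DoS-ing a cached site more expensive for the attacker.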
Storage: the “Big 3”
The most active NoSQL suites for Drupal 8.x
Redis
● Type: Key-value (structure server)
● Module
○ redis
● DB-Engines ranking:
○ #1 Key-value store
● Usage
○ Drupal 7: 10k sites
○ Drupal 8: 10k sites
● Supported by
○ Drupal 7: Makina Corpus
○ Drupal 8: MD Systems
Memcached / Hazelcast
● Type: Key-value
● Module
○ memcache
● DB-Engines ranking:
○ #3 Key-value store
○ #5 Key-value store (Hazelcast)
● Usage (memcache_storage)
○ Drupal 7: 32k (2k) sites
○ Drupal 8: 15k (800) sites
● Supported by:
○ Acquia
○ Tag1 Consulting
MongoDB / CosmosDB
● Type: Document store
● Module
○ mongodb
● DB-Engines ranking:
○ #1 Document store (MongoDB)
○ #4 Document store (CosmosDB)
● Usage
○ Drupal 7: 300 sites
○ Drupal 8: 50 sites
● Supported by
○ OSInet
Redis
● Driver support
○ phpredis and predis both supported
● Supported Services
○ Driver adapter for custom code
○ Cache, including invalidations
○ Flood
○ Lock
○ Lock.Persistent
○ Queue
● CLI support
○ Not included
● Other modules
○ Redis Watchdog: logger + UI
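"Including invalidations" is the hard part of any shared cache backend: items tagged e.g. `node_list` must become stale when that tag is invalidated. The Python sketch below shows a counter-and-checksum scheme, roughly modeled on how Drupal core validates tagged cache items; `TagChecksums` and `TaggedCache` are hypothetical names, and this is not the redis module's code.

```python
from typing import Any, Optional


class TagChecksums:
    """Per-tag invalidation counters; a checksum is the sum of the counters of a tag set."""

    def __init__(self) -> None:
        self.counters: dict[str, int] = {}

    def invalidate(self, *tags: str) -> None:
        for tag in tags:
            self.counters[tag] = self.counters.get(tag, 0) + 1

    def checksum(self, tags: list[str]) -> int:
        # Counters only ever grow, so any invalidation changes the sum.
        return sum(self.counters.get(tag, 0) for tag in tags)


class TaggedCache:
    """Cache bin whose items are valid only while their tag checksum is unchanged."""

    def __init__(self, checksums: TagChecksums) -> None:
        self.checksums = checksums
        self.items: dict[str, tuple[Any, list[str], int]] = {}

    def set(self, cid: str, data: Any, tags: list[str]) -> None:
        self.items[cid] = (data, tags, self.checksums.checksum(tags))

    def get(self, cid: str) -> Optional[Any]:
        item = self.items.get(cid)
        if item is None:
            return None
        data, tags, stored = item
        # A tag invalidated since write time changed the checksum: treat as a miss.
        return data if self.checksums.checksum(tags) == stored else None


checksums = TagChecksums()
bin_ = TaggedCache(checksums)
bin_.set("view:frontpage", "<ul>...</ul>", ["node_list", "config:views.view.frontpage"])
bin_.set("node:1", "<article>...</article>", ["node:1"])
checksums.invalidate("node_list")  # e.g. after saving any node
print(bin_.get("view:frontpage"), bin_.get("node:1"))  # None <article>...</article>
```

Invalidation is a single counter increment, whatever the number of items carrying the tag, which is why concurrent `node_list` invalidations are frequent and why races on them matter.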
Recent events (from @Berdir)
● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release
● php-redis 5.0 broke module, fixed in latest 8.x and 7.x
● Module users: please test and report!
Performance / scalability
● Performance, single-server
○ Memory-only implementation
■ Usually among the fastest
■ Often the fastest
■ Even with concurrent access
○ Persistent
■ A bit slower even with just RDB
■ Slower with AOF
● Persistence, single instance
○ RDB:
■ compact snapshots, shippable off-site
■ data loss: since latest snapshot
○ AOF:
■ data loss: up to the last second (fsync’ed journal)
■ less compact
● Fault-tolerance: Sentinel 2
○ master/slave supervision
○ automatic failover possible
○ observability support
● Scaling
○ Cluster-based sharding
○ Master → Slaves → Slaves
○ No strong consistency
○ Recommended config: 6 servers
● Cloud-native:
○ Redis Enterprise Cloud
○ AWS ElastiCache, Azure, Google Memorystore
○ many others
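Redis Cluster sharding assigns every key to one of 16384 hash slots, computed as CRC16 (XMODEM variant) of the key modulo 16384, or of the `{...}` hash tag when the key contains a non-empty one. The Python sketch below reproduces that rule for illustration; verify it against the Redis Cluster specification before relying on it.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot for a key; only the first non-empty {hash tag} is hashed when present."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the standard CRC-16/XMODEM check value
# Same hash tag, same slot: multi-key operations on these keys stay on one node.
print(key_hash_slot("{user1000}.following") == key_hash_slot("{user1000}.followers"))  # True
```

Hash tags are the design lever here: keys that must be read or written together can be forced onto the same shard.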
Memcached
● Driver support
○ memcache extension (limited availability)
○ memcached extension
○ PHP ≥ 5.6
● Supported Services
○ Driver adapter for custom code
○ Cache, including invalidations
○ Lock
○ Lock.Persistent removed in #2995907
○ Sessions ported, then removed in 7.x
○ Monitoring UI
● CLI support
○ Not included: core commands
● Other module: memcache_storage
○ Cache with core SQL invalidations
○ No lock
○ Monitoring UI
Recent events (from @Berdir)
● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release, based on Redis fix.
● Performance, single-server
○ Memory-only implementation
■ Usually among the fastest
■ Slower than in-memory Redis
■ A bit faster than MySQL / MongoDB K/V
○ Persistence: extstore NVRAM support
■ No significant slowdown
■ Usually a bad idea (expectations)
● Fault-tolerance
○ Module support for sharded clusters
○ Consistent hashing: avoid thundering herd problem
○ Replication: with Hazelcast
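Consistent hashing avoids the thundering herd because adding or removing a server only remaps the keys that server takes over or gives up, instead of nearly all keys as with `hash(key) % n`. The generic Python sketch below (ring with virtual nodes, md5 only for spreading) illustrates the technique; it is not the memcache module's or the PHP memcached extension's code, and the server names are placeholders.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any well-distributed hash works; md5 is used here only to spread points.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hashing ring with `replicas` virtual nodes per server."""

    def __init__(self, servers: list[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._points: list[int] = []
        self._owners: dict[int, str] = {}
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = server

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


keys = [f"cache:page:{n}" for n in range(10_000)]
ring = HashRing(["mc1:11211", "mc2:11211", "mc3:11211"])
before = {k: ring.server_for(k) for k in keys}
ring.add("mc4:11211")
moved = [k for k in keys if ring.server_for(k) != before[k]]
print(len(moved) / len(keys))  # fraction of keys remapped by the new server
print(all(ring.server_for(k) == "mc4:11211" for k in moved))  # True
```

Going from three to four equal servers should remap about a quarter of the keys in expectation, all of them onto the new server; with modulo hashing about three quarters would move, and each moved key is a simultaneous cache miss on the database.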
Performance / scalability
● Scaling
○ Cluster-based sharding
○ Consistent hashing allows elastic scaling
○ Recommended config: 2 instances per cluster, 1 cluster per bin, with some exceptions: usually 10-20 instances per D8 site
○ Some bins must stay in core (form, update)
● Monitoring
○ Instant: module-provided memcache_admin
○ Evolved: phpmemcacheadmin
● Cloud-native
○ AWS ElastiCache
○ Azure Memcached Cloud
○ Google App Engine Memcache
Mainstream packages
Drupal 7 features
● Driver support:
○ mongo extension for PHP 5.x
○ mongodb extension for PHP 7.x
○ MongoDB 2.x, 3.x
● Supported Services
○ Driver adapter for custom code
○ Block
○ Cache
○ Path
○ Queue
● Unsupported services
○ Field storage
○ Lock
○ (Session)
○ Watchdog = logger + UI
● Other modules
○ Views driver: EFQ Views
Drupal 8.x-2.x features
● Driver support
○ mongodb extension for PHP ≥ 7.1
○ mongodb/mongodb php driver
○ MongoDB 3.x, 4.x
● Supported Services
○ Driver adapter for custom code
○ Key-value (e.g. State)
○ Key-value expirable (e.g. *tempstore*, form_cache)
○ Watchdog = logger + UI
● CLI support
○ Drupal Console 1.9.x
○ Drush 9.x
● Other services
○ Entity/field storage
● Other modules
○ MongoDB Indexer
Exotic packages
Drupal 8.x-1.x
● Driver support:
○ mongo extension for PHP 5.x
○ MongoDB 3.x
● Supported services
○ Complete NoSQL distribution
○ @database implementation
○ No SQL DBMS needed
○ Unpatched Drupal core
● Status
○ Sponsored by MongoDB, led by chx
○ Development halted before Drupal 8.0.0
● Performance:
○ About 4x faster than equivalent Drupal core
● Driver support
○ mongo extension for PHP ≥ 5.6
○ MongoDB ≥ 3.6
● Supported Services
○ Complete NoSQL distribution
○ @database implementation
● Source: patched Drupal core + module
● CLI support
○ Drupal Console 1.x
○ Drush 9.x
● Status
○ No issue queue
○ Active, led by daffie
Performance / scalability
Engine features
● Fault-tolerance
○ Built-in replication
○ Recommended config: 2+1 servers
● Scaling
○ Read-only replicas
○ Data-center awareness
○ Sharding
● Both supported by existing module
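The 2+1 recommendation (two data-bearing members plus an arbiter) follows from how replica sets elect a primary: a candidate needs votes from a strict majority of the voting members. This quick Python check of the arithmetic is only an illustration; it ignores priorities, hidden members and other election settings.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half of the voting members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Voting members that can be lost while the rest can still elect a primary."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Two members tolerate no failure at all, three tolerate one, and a fourth adds cost without adding tolerance, which is why odd-sized sets (or an arbiter to make the count odd) are the usual choice.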
Monitoring / Ops
● In-module: logs
● Cloud: MongoDB Atlas, free monitoring, Ops Manager
Cloud native
● Azure: CosmosDB
● MongoDB: Atlas
● mLab (née MongoLab)
Production example
Custom social network (2M users), migrated from MySQL:
MySQL slow queries: -85%, uncached content build time: -98%
NoSQL storage features
Other NoSQL support modules
NoSQL product | Module | Wrapper | Features | 7.x | 8.x | Supported?
--- | --- | --- | --- | --- | --- | ---
Neo4j | neo4j | Y | - | Y | Y | N
RethinkDB | rethinkdb | Y | ORM | N | Y | ?
CouchDB | couchdb | Y | Node export | Y | N | N
Couchbase | couchbase | Y | Logger + UI | Y | N | ?
ElasticSearch | elasticsearch_connector | Y | Logger + improved UI, Statistics, Views, SearchAPI | Y | Y |
AWS DynamoDB | dynamodb | N | Cache | Y | N | ?
AWS SimpleDB | awssdk, creeper | Y | - | Y | N | ?
Riak | riak_field_storage | Y | Field storage, map-reduce | Y | N | unsupported
Apache Cassandra | cassandra | Y | Example app | 6.x | N | unsupported
Tokyo Tyrant | node/844354 | N | Logger + UI | 6.x | N | unapproved
NoSQL sessions?
● Why the weak/removed session support, especially for memcache?
○ Memcache session support is baked into the PHP memcached extension
○ It was popular in Drupal 6.x time
○ It is popular in Symfony, even documented on
○ So?
● Experience
○ Session data
○ Instance restart → all session data on instance lost
○ Bigger session data saturating bin → evictions
○ LRU means vulnerability to DoS-ing and blocking admins via evictions
○ DB load is bigger in Drupal than most frameworks
■ Session DB load is a smaller part of load for us
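The LRU point is easy to underestimate: a memcache bin evicts whatever was used least recently, and a session is just another entry. This Python sketch uses an entry-count capacity (real memcached budgets memory per slab class, so this is a simplification) and the hypothetical name `LruBin`, to show a flood of cheap sessions logging out an idle administrator.

```python
from collections import OrderedDict
from typing import Optional


class LruBin:
    """Fixed-capacity store evicting the least recently used entry first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict[str, str] = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


sessions = LruBin(capacity=1000)
sessions.set("sess:admin", "uid=1")
# A burst of anonymous or bot sessions while the admin is idle.
for i in range(1000):
    sessions.set(f"sess:bot{i}", "uid=0")
print(sessions.get("sess:admin"))  # None: the admin has been logged out
```

Nothing here is a bug in the cache: eviction is the intended behaviour, which is exactly why it is a poor fit for data users expect to persist.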
Logs in core
The “SQL” problem
● All sites really need some sort of logging feature
● Smaller sites only have a database
○ ⇨ Database Logging default-enabled
● Code is not perfect, throws notices, errors
● Modules are verbose, log debug info
● “Drupal is too slow, please help, agency is stuck”
○ ⇨ Audit: 1500 inserts/min in watchdog table
○ ⇨ Other audits: watchdog > 99% of site size
● DBlog inserts compete with content work
● Owner disables logging
○ ⇨ now misses essential info
● Does not disable logging
○ ⇨ now can’t find essential info buried in noise
The core NoSQL module
● Core has been bundling a syslog client since 6.0
● Decouple logs from DB load
○ ⇨ No more SQL logs workload
● But where do they go?
○ ⇨ Needs OS-level configuration
● How are logs cleaned?
○ ⇨ Needs OS-level configuration
● Where is the UI?
○ ⇨ Needs extra tools
● Solutions?
○ D7 has logging hook
○ D8 has PSR-3 standard logging
○ ⇨ Contributions
NoSQL on-site logs
● mongodb_watchdog
○ Logger service
■ Standard Drupal PSR-3 logs backend
■ Pre-storage filtering
■ Uses capped collections: auto-rotation, no ops
■ Dedicated database: zero contention
■ Per-request event tracing
○ Improved logs UI
■ Based on core UI
■ Groups recurring events on single line
■ Details page for occurrences
■ Per-HTTP-request log page
○ Most common reason to deploy MongoDB on D8
● redis_watchdog
○ Logger service
○ Logs UI based on core UI
○ Usage: 1 site
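A capped collection is a fixed-size collection that keeps insertion order and overwrites its oldest documents, so logs rotate without cron jobs or ops work. The Python simulation below uses `collections.deque(maxlen=...)` as a stand-in for a capped collection to show the three logger behaviours listed above (capping, pre-storage filtering, grouping by message template). `CappedLogger` is hypothetical and this is not mongodb_watchdog's code; level names and numbers follow PSR-3 / RFC 5424.

```python
from collections import Counter, deque
from dataclasses import dataclass

# PSR-3 level names with RFC 5424 numbers: lower number = more severe.
SEVERITY = {"emergency": 0, "alert": 1, "critical": 2, "error": 3,
            "warning": 4, "notice": 5, "info": 6, "debug": 7}


@dataclass(frozen=True)
class Event:
    level: str
    template: str  # message with placeholders, e.g. "Missing file @file"
    variables: tuple = ()


class CappedLogger:
    """Keeps only the newest `max_events` entries and drops events less severe
    than `threshold` before they are stored."""

    def __init__(self, max_events: int, threshold: str = "notice") -> None:
        self.events: deque[Event] = deque(maxlen=max_events)
        self.threshold = SEVERITY[threshold]

    def log(self, level: str, template: str, *variables: str) -> None:
        if SEVERITY[level] > self.threshold:
            return  # pre-storage filtering: never written at all
        self.events.append(Event(level, template, variables))

    def grouped(self) -> Counter:
        """One line per (level, template) with an occurrence count."""
        return Counter((e.level, e.template) for e in self.events)


logger = CappedLogger(max_events=3, threshold="notice")
logger.log("debug", "Cache miss for @cid", "page:/")
for name in ("a.png", "b.png", "c.png", "d.png"):
    logger.log("error", "Missing file @file", name)
print(len(logger.events), logger.grouped())  # 3 events kept, grouped on one line
```

Grouping on the template rather than the rendered message is what lets hundreds of "Missing file" events collapse into a single line in the UI.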
Off-site logs: BELK stack
BELK stack
● Beats (typically FileBeat)
● Elastic Search
● Logstash
● Kibana
● Drupal syslog → local syslog server → local logs
● DON’T log straight from Drupal
● Filebeat pulls logs, sends to Logstash
● Logstash massages logs, sends to ES
● ES provides storage, indexing
● Kibana provides UI
● Hosted with site
● SaaS: Loggly, ...
Off-site logs: Graylog
● Dual server: ES (logs, search) + MongoDB (meta, conf)
● Includes GROK log handling
● Accept syslog or GELF input
● Designed from Splunk
● Drupal syslog → local syslog server → local logs
● DON’T log straight from Drupal via monolog_gelf
● Local syslog forwards to Graylog2
● Graylog2 massages logs, sends to ES
● ES provides storage, indexing
● Graylog2 provides UI
● Hosted with site
● SaaS: StackHero
(source: Graylog)
Off-site logs: BELK vs Graylog design
Non-SQL logs: do I need them?
● Small site, little traffic, single webmaster: just use dblog
● Any other site: upgrade to something else
○ Hosting company provides a logs dashboard (e.g. Splunk): use it
■ syslog into their stack, via local syslog then pull
○ Have an internal ops team?
■ syslog into internal BELK or Graylog
○ No ops expertise? Don’t have time to learn Kibana/Graylog? Hosting company doesn’t provide real-time logs access?
■ Want to minimize costs and/or have logs on-site?
● use mongodb_watchdog
■ Otherwise, use SaaS logs vendor
● Datadog, Scalyr, Loggly or Papertrail (SolarWinds)
Queue API services
● Core: mostly for Batch API
● General D8 use: proxy invalidation
○ Invalidation queues
● Commerce sites
○ ERP links
○ Third-party catalog/inventory
● Media sites
○ Real time news feeds ingestion
○ Deferred derived media generation
Queue modules
● Core bundled: queue.database service
○ used by all Drupal sites
● advancedqueue project
○ created for Drupal Commerce projects
○ used by Commerce 2.x
NoSQL: storage-based
● Core bundled: queue.memory service
● Redis:
○ 7.x: redis_queue project
○ 8.x: redis project
● MongoDB
○ 7.x: mongodb project
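Whatever the storage behind it, the Queue API contract is the same: items are created, claimed with a lease, then deleted on success or released for retry; an expired lease makes the item claimable again if a worker dies. The Python sketch below mirrors those semantics; method names echo Drupal's `QueueInterface` (`createItem`, `claimItem`, `deleteItem`, `releaseItem`, `numberOfItems`) in snake_case, but `MemoryQueue` is a hypothetical illustration, not a port of core's memory queue.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Item:
    item_id: int
    data: Any
    expire: float = 0.0  # 0 means "not claimed"


class MemoryQueue:
    """In-memory reliable queue: claimed items are leased, not removed, until deleted."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self.clock = clock
        self._ids = itertools.count(1)
        self._items: dict[int, Item] = {}

    def create_item(self, data: Any) -> int:
        item = Item(next(self._ids), data)
        self._items[item.item_id] = item
        return item.item_id

    def number_of_items(self) -> int:
        return len(self._items)

    def claim_item(self, lease_time: float = 3600) -> Optional[Item]:
        now = self.clock()
        for item in self._items.values():  # dicts keep insertion order: FIFO
            if item.expire == 0 or item.expire < now:  # unclaimed, or lease expired
                item.expire = now + lease_time
                return item
        return None

    def delete_item(self, item: Item) -> None:
        self._items.pop(item.item_id, None)

    def release_item(self, item: Item) -> None:
        item.expire = 0.0


now = [0.0]
q = MemoryQueue(clock=lambda: now[0])
q.create_item({"purge": "/node/1"})
q.create_item({"purge": "/node/2"})
first = q.claim_item(lease_time=30)  # worker A takes /node/1 ...
now[0] = 31.0                        # ... and crashes; its lease expires
again = q.claim_item(lease_time=30)  # worker B gets /node/1 back
q.delete_item(again)                 # processed successfully
print(again.data, q.number_of_items())  # {'purge': '/node/1'} 1
```

Storage-based backends (database, Redis, MongoDB) differ mainly in how atomically they perform the claim step under concurrent workers, which this single-threaded sketch does not address.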
NoSQL: message servers
● Beanstalkd
○ 6.x/7.x: popular, used by itself
○ 8.x complete port, but no users (?)
● RabbitMQ
○ 7.x: little used, 8.x: most popular
○ Users include public TV, major French e-tailer
○ Hardened by production at these levels
○ 7.x: some use, but no 8.x port
● Apache Kafka
○ 8.x only
○ Created for largest French retail chain
● Other queue services
○ Less used: Gearman, IronMQ, 0MQ
○ No 8.x versions
Queue API modules by usage D7/D8
NoSQL Queue: do I need it?
● Mainstream Drupal site without Varnish / CDN
○ probably not, advancedqueue is still a nice improvement though
● Content site with a lot of generated content, Varnish and/or CDN
○ consider using Redis (D8), MongoDB (D7), RabbitMQ (D8)
○ or use Kafka (D8) if you need to (e.g. corporate mandate)
● Drupal Commerce standalone
○ advancedqueue is normally enough
● Site generating lots of dynamic media (image, video, sound)… or ingesting fast feeds (> 1 item/sec)
○ need a dedicated message server
NoSQL Queue: which should I use?
● The one your ops team supports best
○ Content management has a low event rate (< 1 event/sec)
● Kafka-class is for high-throughput queues
○ Think LinkedIn, Twitter, Netflix, Spotify, Airbnb, PayPal…
● RabbitMQ is solid
○ usually well known and monitored
○ D8 driver used for years on Cyber Monday, Black Friday, Olympic Games...
● Beanstalkd is simple
○ It “just works”
○ Good first queue upgrading from DB
SQL-based search
● Search has long been the weakest core feature in Drupal
○ In spite of improvements with each version
● Relevant issues
○ Good recall, but bad precision
○ Multilingual support, but no language awareness
○ Low awareness of language inflections → preprocessing API
○ Limited ability to handle Asian (CJK) languages
○ Slow updates, cron-based pull mode
○ Indexing costs impacting site users
○ Indexed search for content only → search plugins
○ Other entity types limited to unindexed search by default
○ No support for restricted content search
● Useful complements: porterstemmer, snowball_stemmer
● SQL Alternative: Search API database search. Similar.
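The recall problem and the missing inflection handling are easier to see on a toy inverted index. The Python sketch below compares exact-word matching with a deliberately crude suffix stripper; it is neither Porter nor Snowball (which the porterstemmer and snowball_stemmer modules provide), and all function names are hypothetical.

```python
import re
from collections import defaultdict


def stem(word: str) -> str:
    """Crude suffix stripping for illustration only; real stemmers handle far more cases."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def tokens(text: str, use_stemming: bool) -> set[str]:
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) if use_stemming else w for w in words}


def build_index(docs: dict[int, str], use_stemming: bool) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokens(text, use_stemming):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, use_stemming: bool) -> set[int]:
    # AND semantics: every query term must appear in the document.
    results = [index.get(t, set()) for t in tokens(query, use_stemming)]
    return set.intersection(*results) if results else set()


docs = {1: "Caching pages in Redis", 2: "The page cache was cached", 3: "Queue workers"}
plain = build_index(docs, use_stemming=False)
stemmed = build_index(docs, use_stemming=True)
print(search(plain, "cached page", False), search(stemmed, "cached page", True))  # {2} {1, 2}
```

Stemming raises recall ("Caching pages" now matches "cached page"), but every extra match is also a chance to hurt precision; dedicated engines add ranking, per-language analyzers and CJK tokenization on top of this basic mechanism.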
NoSQL search solutions
Cloud-based / SaaS
● SaaS offerings:
○ Algolia
○ Google CSE
● Drupal Hosting offerings (alphabetic order):
○ Acquia Search SOLR
○ Pantheon SOLR
○ ElasticSearch / SOLR
On-site / near-site
● Core support: Search API (14% of D7, 16% of D8 sites)
● Standard solution:
○ Local SOLR
○ Multilingual search supported
● Alternatives:
○ Elastic Search → heart of BELK suite
○ Xunsearch: Xapian for Chinese
○ Xapian (8.x dev)
● D7 backends not on D8:
○ Elastic Search via Elastica
○ Google Search Appliance: killed by Google
○ MongoDB via MongoDB module
○ Sphinx
● Proprietary search engine publishers have custom, unpublished, non-GPL (!) Drupal modules
SQL and NoSQL search solutions by usage in D8
Non-core search: which should I use?
● Any content deserves search
○ Core for small content quantities
○ Search API DB backend used by
● SaaS
○ For entry level: Algolia/Google = 0 recurring cost, near 0 set-up cost
○ Both perform better than core, but non-free
● Drupal PaaS have managed ES/SOLR
● Others: cost equilibrium
○ ES/SOLR have setup and recurring costs of possession (server load)
○ SaaS has lower set-up costs, but recurring fees
○ Core search has the cost of lost opportunity
Best practices
Best current practice: NoSQL in general
Drupal 8 core tries hard to be SQL-agnostic
● Every use of the DB goes through @database
○ So anything able to pass for a SQL engine may be used
○ The mongodb_dbtng, mongodb 8.x-1.x, and Drumongous projects do just that
● Even Views has a query plugin. Project efq_views (7.x, 8.x) supports NoSQL engines that way
● No service except “storage” services should receive databases
○ Write a storage service for your data, defining its interface
○ Write a SQL provider implementing it, receiving @database
○ Tag the service as “backend_overridable”
○ Core mostly does it, custom code should always do it.
● References:
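In Drupal this pattern is expressed in services.yml with the `backend_overridable` tag; the Python sketch below only mirrors the dependency shape so the idea is visible outside PHP: the business service depends on a storage interface, only the SQL provider receives a connection (sqlite3 standing in for @database), and a document-style provider can be swapped in. All class names are hypothetical.

```python
import sqlite3
from abc import ABC, abstractmethod


class VoteStorageInterface(ABC):
    """What the rest of the code depends on: no SQL, no connection object."""

    @abstractmethod
    def record(self, user_id: int, entity_id: int, value: int) -> None: ...

    @abstractmethod
    def total(self, entity_id: int) -> int: ...


class SqlVoteStorage(VoteStorageInterface):
    """Default provider: the only class that receives the database connection."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS votes ("
            "user_id INTEGER, entity_id INTEGER, value INTEGER, "
            "PRIMARY KEY (user_id, entity_id))"
        )

    def record(self, user_id: int, entity_id: int, value: int) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO votes (user_id, entity_id, value) VALUES (?, ?, ?)",
            (user_id, entity_id, value),
        )

    def total(self, entity_id: int) -> int:
        row = self.db.execute(
            "SELECT COALESCE(SUM(value), 0) FROM votes WHERE entity_id = ?", (entity_id,)
        ).fetchone()
        return row[0]


class DocumentVoteStorage(VoteStorageInterface):
    """Alternative provider standing in for a NoSQL backend: one document per entity."""

    def __init__(self) -> None:
        self.documents: dict[int, dict[int, int]] = {}

    def record(self, user_id: int, entity_id: int, value: int) -> None:
        self.documents.setdefault(entity_id, {})[user_id] = value

    def total(self, entity_id: int) -> int:
        return sum(self.documents.get(entity_id, {}).values())


class VoteService:
    """Business logic: receives the storage interface, never the database."""

    def __init__(self, storage: VoteStorageInterface) -> None:
        self.storage = storage

    def upvote(self, user_id: int, entity_id: int) -> int:
        self.storage.record(user_id, entity_id, 1)
        return self.storage.total(entity_id)


for storage in (SqlVoteStorage(sqlite3.connect(":memory:")), DocumentVoteStorage()):
    service = VoteService(storage)
    service.upvote(1, 42)
    service.upvote(1, 42)  # same user again: replaces, does not add
    print(type(storage).__name__, service.upvote(2, 42))  # 2 for both backends
```

Because `VoteService` never sees SQL, replacing the provider is a container configuration change rather than a code change, which is the point of tagging storage services as overridable.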
Best current practice: MongoDB
● Connecting to MongoDB with 8.x-2.x
○ Using multiple databases ? Use @mongodb.client_factory
■ The client you get is a standard mongodb/mongodb Client instance
■ You have to handle topology
○ Using single database ? Use @mongodb.database_factory
■ The database you get is a standard mongodb/mongodb Database instance
■ Your DB topology is now configurable in settings
○ You probably don’t want to use Doctrine ODM, especially when interacting with Drupal data
● Designing a custom schema
○ Start from the queries, not from some canonicalization
○ For large scale data sets, consider:
■ Splitting live and archive data for sharding
■ Having a write DB and a read DB, and a CLI-based service between them - read about CQRS
○ Never use a monotonically increasing key for sharding
○ In most cases, joined data in lists don’t need to be as up-to-date as primary views
■ Embed “light” versions of dependent objects for lists, only use $lookup and DBRef joins on full datum view
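Why a monotonically increasing shard key hurts: with range-based sharding every new key is larger than all existing ones, so every insert lands in the last chunk and a single shard takes the whole write load. This Python simulation uses fixed, hypothetical chunk boundaries and md5 as a stand-in for a hashed shard key; it is not MongoDB's balancer or hashing function.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical range-sharded cluster: 4 chunks split at fixed boundaries on a 0..2**32 key space.
KEY_SPACE = 2**32
BOUNDARIES = [KEY_SPACE // 4, KEY_SPACE // 2, 3 * KEY_SPACE // 4]
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(shard_key: int) -> str:
    """Range sharding: the chunk whose range contains the key owns the document."""
    return SHARDS[bisect.bisect_right(BOUNDARIES, shard_key)]


def hashed(value: int) -> int:
    """Spread keys over the key space, as a hashed shard key would."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


# Monotonically increasing keys (e.g. timestamps or counters), newer than existing data.
start = 3 * KEY_SPACE // 4 + 1
new_keys = range(start, start + 10_000)

ranged = Counter(shard_for(k) for k in new_keys)
spread = Counter(shard_for(hashed(k)) for k in new_keys)
print(ranged)  # every insert hits the last chunk
print(spread)  # inserts spread across all four shards
```

Hashing trades the hotspot for scattered range queries, which is one more reason to start the schema design from the queries, as recommended above.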
“Contribution is its own reward.” (There, I said it!)
Join us for contribution opportunities
Thursday, October 31, 2019
Room: Europe Foyer 2
First Time Contributor Workshop
Room: Diamond Lounge
Room: Europe Foyer 2
What did you think?
Locate this session at the DrupalCon Amsterdam website:
Take the Survey!

Faster Drupal sites using Queue API
Faster Drupal sites using Queue APIFaster Drupal sites using Queue API
Faster Drupal sites using Queue API
Life after the hack
Life after the hackLife after the hack
Life after the hack
Delayed operations with queues for website performance
Delayed operations with queues for website performanceDelayed operations with queues for website performance
Delayed operations with queues for website performance
Drupal 8 : regards croisés
Drupal 8 : regards croisésDrupal 8 : regards croisés
Drupal 8 : regards croisés
Cache speedup with Heisencache for Drupal 7 and Drupal 8
Cache speedup with Heisencache for Drupal 7 and Drupal 8Cache speedup with Heisencache for Drupal 7 and Drupal 8
Cache speedup with Heisencache for Drupal 7 and Drupal 8
Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7
Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7
Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7
Le groupe PHP-FIG et les standards PSR
Le groupe  PHP-FIG et les standards PSRLe groupe  PHP-FIG et les standards PSR
Le groupe PHP-FIG et les standards PSR
Les blocs Drupal de à Drupal 8
Les blocs Drupal de à Drupal 8Les blocs Drupal de à Drupal 8
Les blocs Drupal de à Drupal 8
Utiliser drupal
Utiliser drupalUtiliser drupal
Utiliser drupal
Equipe drupal
Equipe drupalEquipe drupal
Equipe drupal
Pourquoi choisir un CMS Open Source ?
Pourquoi choisir un CMS Open Source ?Pourquoi choisir un CMS Open Source ?
Pourquoi choisir un CMS Open Source ?
Drupal et le NoSQL - drupagora 2011
Drupal et le NoSQL - drupagora 2011Drupal et le NoSQL - drupagora 2011
Drupal et le NoSQL - drupagora 2011
Drupal Views development
Drupal Views developmentDrupal Views development
Drupal Views development

Recently uploaded

Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
Rogerio Filho
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Output determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CCOutput determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CC
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
ER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAEER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAE
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
Gal Baras

Recently uploaded (16)

Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Output determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CCOutput determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CC
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
ER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAEER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAE
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx

Scaling up and accelerating Drupal 8 with NoSQL

  • 1. © 2019 Frédéric G. MARAND - licensed under a Creative Commons Attribution 4.0 International License. Scaling up and accelerating Drupal 8 with NoSQL Frédéric G. MARAND fgm - irc/twitter: @osinet <MongoDB module maintainer />
  • 3. Topic ? Simple idea: “No SQL” ● Alternate storage engines: key-value, data structure servers, document, graph, columnar… ● No standard, often no fixed schema, no joins, no FKs ● → Engine-specific application design ● Drupal architecture ? Evolved idea: Not Only SQL ● For engines, add SQL-equivalent features ● For Drupal, combine SQL and NoSQL solutions ● Start from the default SQL-based architecture ● Offload services to non-SQL implementations ○ front-end caches, search engines, queue servers ○ specialized storage: cache, KV, lock, sessions… ● Often involves NoSQL as a cache for SQL
  • 4. NOSQL: do you need it ? ● Start by observing the current state ○ Database queries → devel + webprofiler ○ Cache → heisencache (D7), webprofiler (D8) ○ Build cacheability → renderviz ● Observe behaviour ○ Built-in core observability: DBTNG logging, cache decorators, QueryInterface for KV, config, content… ○ Monitoring module (400 sites) by Karan Poddar (Google Summer of Code) and MD Systems ○ Add your choice of time-series store (e.g. Prometheus, InfluxDB) and UI (e.g. Grafana) ○ ⇨ Use it ! ● You want to see this when it happens ⟶
  • 5. “If you can’t measure it, you can’t improve it.” (Peter Drucker)
  • 6. Fixing an identified problem is cheaper than “trying things” Fix based on acquired information ● It /MAY/ involve taking queries off the main DB to a NoSQL solution ● But a poorly configured NoSQL solution may make things worse.
  • 7. “Just do it” ? ● Drupal is built on SQL: ○ Views depends on it by default ○ Most sites rely on Views data model awareness ○ → Contrib often assumes SQL, injects @database ○ NoSQL support is doable, but rarely done ● Contrib support level is limited ○ Most NoSQL contrib was not ported from D7 to D8 ○ Drupal shop knowledge is limited, except in the biggest or specialized shops ○ Products may die… e.g. RethinkDB ● Professional support from publishers = costs, and availability varies ● Extra support needed = costs NoSQL == added build costs → balance gains vs costs Example case: RethinkDB At DevDays Milan 2016, after lots of work, Gizra’s @RoySegall demoed a Drupal 8 ORM/ODM for RethinkDB. Then, this happened...
  • 10. Caching ahead of real work Default situation with SQL ● Browser caching, limited ● Internal / dynamic page cache in main SQL DB ● Needs a DB connection and a few SELECT queries ● Fetches cache from DB ● All data from main storage ● ⇨ Serves cached pages in about 20 msec All this work makes DoS-ing comparatively cheap. NoSQL improvements ● Add caching ahead of the site itself ○ Browser ■ Optimized browser caching (Cache-Control) ■ PWA: use browser local storage ○ CDN ■ CDN module (2k sites) ■ Akamai module (600 sites) ■ ⇨ Serves cached pages in about 15 msec (TTFB) ■ Web-scale ○ Varnish and other reverse proxies ■ ⇨ Serves cached pages in about 10 msec (TTFB) ■ Core support ■ Varnish Purger (3k sites) ● ⇨ Most requests will mean 0 SQL queries ○ DoS-ing becomes more costly, especially with a CDN ● Move page caches off main DB: next section
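The point of this slide is that a request answered before Drupal runs costs no SQL query. A minimal Python sketch of that effect follows. It assumes a single-process TTL cache as a stand-in for Varnish or a CDN edge. The `PageCache` class and its `render` callback are hypothetical and are not a Varnish, CDN or Drupal API. The sketch counts how often the origin is actually reached:

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical sketch: a TTL cache standing in for Varnish or a CDN edge.


class PageCache:
    """TTL cache in front of a page renderer.

    A fresh hit is answered without calling the renderer, so the request
    costs no SQL query at all.
    """

    def __init__(self, render: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]              # fresh copy: served at the edge
        self.origin_hits += 1            # miss or stale: Drupal and SQL run
        body = self._render(path)
        self._store[path] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    cache = PageCache(lambda p: f"<html>{p}</html>", ttl=60,
                      clock=lambda: fake_now[0])
    for _ in range(100):
        cache.get("/node/1")
    print(cache.origin_hits)             # 1
```

With a 60-second TTL, 100 requests for the same path reach the origin once. Production setups pair such caches with purging, such as the Varnish Purger module listed above, so that updates do not wait for expiry.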
  • 13. Storage: the “Big 3” The most active NoSQL suites for Drupal 8.x Redis ● Type: Key-value (structure server) ● Module ○ redis ● DB-Engines ranking: ○ #1 Key-value store ● Usage ○ Drupal 7: 10k sites ○ Drupal 8: 10k sites ● Supported by ○ Drupal 7: Makina Corpus ○ Drupal 8: MD Systems Memcached ● Type: Key-value ● Module ○ memcache ● DB-Engines ranking: ○ #3 Key-value store ○ #5 Key-value store (Hazelcast) ● Usage: memcache (memcache_storage) ○ Drupal 7: 32k (2k) sites ○ Drupal 8: 15k (800) sites ● Supported by: ○ Acquia ○ Tag1 Consulting MongoDB / CosmosDB ● Type: Document store ● Module ○ mongodb ● DB-Engines ranking: ○ #1 Document store (MongoDB) ○ #4 Document store (CosmosDB) ● Usage ○ Drupal 7: 300 sites ○ Drupal 8: 50 sites ● Supported by ○ OSInet
  • 14. Redis ● Driver support ○ phpredis and predis both supported ● Supported Services ○ Driver adapter for custom code ○ Cache, including invalidations ○ Flood ○ Lock ○ Lock.Persistent ○ Queue ● CLI support ○ Not included ● Other modules ○ Redis Watchdog: logger + UI Recent events (from @Berdir) ● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release ● phpredis 5.0 broke the module, fixed in latest 8.x and 7.x releases ● Module users: please test and report !
  • 15. Performance / scalability: Redis ● Performance, single-server ○ Memory-only implementation ■ Usually among the fastest ■ Often the fastest ■ Even with concurrent access ○ Persistent ■ A bit slower even with just RDB ■ Slower with AOF ● Persistence, single instance ○ RDB: ■ compact snapshots, shippable off-site ■ data loss: since latest snapshot ○ AOF ■ data loss: up to the last second (fsync’ed journal) ■ less compact ● Fault-tolerance: Sentinel 2 ○ master/slave supervision ○ automatic failover possible ○ observability support ● Scaling ○ Cluster-based sharding ○ Master → Slaves → Slaves ○ No strong consistency ○ Recommended config: 6 servers ● Cloud-native: ○ Redis Enterprise Cloud ○ AWS Elasticache, Azure, Google Memorystore ○ many others
  • 16. Memcache ● Driver support ○ memcache extension (limited availability) ○ memcached extension ○ PHP ≥ 5.6 ● Supported Services ○ Driver adapter for custom code ○ Cache, including invalidations ○ Lock ○ Lock.Persistent removed in #2995907 ○ Sessions ported, then removed in 7.x ○ Monitoring UI ● CLI support ○ Not included: core commands ● Other module: memcache_storage ○ Cache with core SQL invalidations ○ No lock ○ Monitoring UI Recent events (from @Berdir) ● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release, based on the Redis fix.
  • 17. Performance / scalability: Memcached ● Performance, single-server ○ Memory-only implementation ■ Usually among the fastest ■ Slower than in-memory Redis ■ A bit faster than MySQL / MongoDB K/V ○ Persistence: extstore NVRAM support ■ No significant slowdown ■ Usually a bad idea (expectations) ■ emory/ ● Fault-tolerance ○ Module support for sharded clusters ○ Consistent hashing: avoids the thundering herd problem ○ Replication: with Hazelcast ● Scaling ○ Cluster-based sharding ○ Consistent hashing allows elastic scaling ○ Recommended config: 2 instances per cluster, 1 cluster per bin, with some exceptions: usually 10-20 instances per D8 site ○ Some bins must stay in core (form, update) ● Monitoring ○ Instant: module-provided memcache_admin ○ Evolved: phpmemcacheadmin ● Cloud-native ○ AWS Elasticache ○ Azure Memcached Cloud ○ Google AppEngine Memcache
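Consistent hashing is what lets a memcache setup add or remove servers without invalidating the whole cache. The Python sketch below is a generic hash ring with virtual nodes. It is not the memcache module's algorithm or the PHP memcached extension's algorithm; libmemcached's Ketama-compatible mode uses its own point layout. `HashRing`, `ring_hash` and the replica count of 100 are illustrative assumptions:

```python
import bisect
import hashlib
from typing import Dict, List

# Hypothetical sketch of a consistent-hash ring; not the memcache module's code.


def ring_hash(key: str) -> int:
    # Stable 32-bit ring position taken from the first 4 bytes of MD5.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class HashRing:
    """Consistent-hash ring: each server owns many virtual points."""

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._owner: Dict[int, str] = {}
        self._points: List[int] = []
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self._replicas):
            point = ring_hash(f"{server}#{i}")
            if point in self._owner:      # ignore rare collisions
                continue
            self._owner[point] = server
            bisect.insort(self._points, point)

    def server_for(self, key: str) -> str:
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"render:{i}" for i in range(10_000)]
    ring = HashRing(["mc1", "mc2", "mc3", "mc4"])
    before = {k: ring.server_for(k) for k in keys}
    ring.add("mc5")
    moved = sum(before[k] != ring.server_for(k) for k in keys)
    remapped = sum(ring_hash(k) % 4 != ring_hash(k) % 5 for k in keys)
    print(f"ring: {moved} keys moved, modulo: {remapped} keys moved")
```

Adding a fifth server moves roughly a fifth of the keys, and all of them move onto the new server. Plain `hash % N` placement remaps most keys at once. That burst of simultaneous misses falling back to the database is the thundering herd mentioned on the slide.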
  • 18. Mainstream packages MongoDB Drupal 7 features ● Driver support: ○ mongo extension for PHP 5.x ○ mongodb extension for PHP 7.x ○ MongoDB 2.x, 3.x ● Supported Services ○ Driver adapter for custom code ○ Block ○ Cache ○ Path ○ Queue ● Unsupported services ○ Field storage ○ Lock ○ (Session) ○ Watchdog = logger + UI ● Other modules ○ Views driver: EFQ Views Drupal 8.x-2.x features ● Driver support ○ mongodb extension for PHP ≥ 7.1 ○ mongodb/mongodb php driver ○ MongoDB 3.x, 4.x ● Supported Services ○ Driver adapter for custom code ○ Key-value (e.g. State) ○ Key-value expirable (e.g. *tempstore*, form_cache) ○ Watchdog = logger + UI ● CLI support ○ Drupal Console 1.9.x ○ Drush 9.x ● Other services ○ Entity/field storage ● Other modules ○ MongoDB Indexer
  • 19. Exotic packages MongoDB Drupal 8.x-1.x ● Driver support: ○ mongo extension for PHP 5.x ○ MongoDB 3.x ● Supported services ○ Complete NoSQL distribution ○ @database implementation ○ No SQL DBMS needed ○ Unpatched Drupal core ● Status ○ Sponsored by MongoDB, led by chx ○ Development halted before Drupal 8.0.0 ● Performance: ○ About 4x faster than equivalent Drupal core Drumongous ● Driver support ○ mongo extension for PHP ≥ 5.6 ○ MongoDB ≥ 3.6 ● Supported Services ○ Complete NoSQL distribution ○ @database implementation ● Source: patched Drupal core + module ○ ○ ● CLI support ○ Drupal Console 1.x ○ Drush 9.x ● Status ○ ○ No issue queue ○ Active, led by daffie
  • 20. Performance / scalability: MongoDB Engine features ● Fault-tolerance ○ Built-in replication ○ Recommended config: 2+1 servers ● Scaling ○ Read-only replicas ○ Data-center awareness ○ Sharding ● Both supported by existing module Monitoring / Ops ● In-module: logs ● Cloud: MongoDB Atlas, free monitoring, OpsManager Cloud native ● Azure: CosmosDB ● MongoDB: Atlas ● Mlab (née Mongolab) Production example ● Custom social network (2M users), migrated from MySQL: MySQL slow queries: -85%, uncached content build time: -98%
  • 22. Other NoSQL support modules

| NoSQL product | Module | Wrapper | Features | 7.x | 8.x | Supported ? |
|---|---|---|---|---|---|---|
| Neo4J | neo4j | Y | - | Y | Y | N |
| RethinkDB | rethinkdb | Y | ORM | N | Y | ? |
| CouchDB | couchdb | Y | Node export | Y | N | N |
| Couchbase | couchbase | Y | Logger + UI | Y | N | ? |
| ElasticSearch | elasticsearch_connector | Y | Logger + improved UI, Statistics, Views | Y | N | Y |
| | | | SearchAPI | Y | Y | |
| AWS DynamoDB | dynamodb | N | Cache | Y | N | ? |
| AWS SimpleDB | awssdk, creeper | Y | - | Y | N | ? |
| Riak | riak_field_storage | Y | Field storage, map-reduce | Y | N | unsupported |
| Apache Cassandra | cassandra | Y | Example app | 6.x | N | unsupported |
| Tokyo Tyrant | node/844354 | N | Logger + UI | 6.x | N | unapproved |
  • 24. NoSQL Sessions ? ● Why the weak/removed session support, especially for memcache ? ○ Memcache session support is built into the PHP memcached extension ○ It was popular in Drupal 6.x times ○ It is popular in Symfony, even documented on ○ So ? ● Experience ○ Session data ○ Instance restart → all session data on that instance lost ○ Bigger session data saturating the bin → evictions ○ LRU means vulnerability to DoS-ing and to blocking admins via evictions ○ DB load is bigger in Drupal than in most frameworks ■ Session DB load is a smaller part of the load for us
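A plain LRU bin is enough to show the eviction risk. This Python sketch assumes a single fixed-capacity store with strict LRU eviction. Memcached actually manages LRU per slab class, so this is a simplification. `LruSessionStore` is hypothetical:

```python
from collections import OrderedDict
from typing import Optional

# Hypothetical sketch: strict LRU, unlike memcached's per-slab-class LRU.


class LruSessionStore:
    """Fixed-capacity session bin with strict LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._data: "OrderedDict[str, dict]" = OrderedDict()

    def write(self, sid: str, data: dict) -> None:
        self._data[sid] = data
        self._data.move_to_end(sid)              # now most recently used
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)       # evict least recently used

    def read(self, sid: str) -> Optional[dict]:
        if sid not in self._data:
            return None                          # evicted: user logged out
        self._data.move_to_end(sid)
        return self._data[sid]


if __name__ == "__main__":
    bin_ = LruSessionStore(capacity=1000)
    bin_.write("admin-sid", {"uid": 1})
    for i in range(1000):                        # anonymous flood
        bin_.write(f"bot-{i}", {"uid": 0})
    print(bin_.read("admin-sid"))                # None
```

Suppose an attacker opens as many sessions as the bin holds. The administrator's session then becomes the least recently used entry and is evicted, which logs the admin out. A SQL session table does not have this failure mode.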
  • 25. Logs
  • 26. Logs in core The “SQL” problem ● All sites really need some sort of logging feature ● Smaller sites only have a database ○ ⇨ Database Logging enabled by default ● Code is not perfect, throws notices, errors ● Modules are verbose, log debug info ● “Drupal is too slow, please help, agency is stuck” ○ ⇨ Audit: 1,500 inserts/min in watchdog table ○ ⇨ Other audits: watchdog > 99% of site size ● DBlog inserts compete with content work ● Owner disables logging ○ ⇨ now misses essential info ● Does not disable logging ○ ⇨ now can’t find essential info buried in noise The core NoSQL module ● Core has been bundling a syslog client since 6.0 ● Decouple logs from DB load ○ ⇨ No more SQL logs workload ● But where do they go ? ○ ⇨ Needs OS-level configuration ● How are logs cleaned ? ○ ⇨ Needs OS-level configuration ● Where is the UI ? ○ ⇨ Needs extra tools ● Solutions ? ○ D7 has a logging hook ○ D8 has PSR-3 standard logging ○ ⇨ Contributions
  • 27. NoSQL on-site logs (mongodb|redis)_watchdog ● mongodb_watchdog ○ Logger service ■ Standard Drupal PSR-3 logs backend ■ Pre-storage filtering ■ Uses capped collections: auto-rotation, no ops ■ Dedicated database: zero contention ■ Per-request event tracing ○ Improved logs UI ■ Based on core UI ■ Groups recurring events on a single line ■ Details page for occurrences ■ Per-HTTP-request log page ○ Most common reason to deploy MongoDB on D8 ● redis_watchdog ○ Logger service ○ Logs UI based on core UI ○ Usage: 1 site
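This slide highlights two mongodb_watchdog properties: fixed-size storage that rotates itself, and one line per recurring event. Both can be sketched in Python, with a bounded `deque` standing in for a capped collection. This is an illustration under those assumptions, not the module's code. `LogEvent`, `CappedLog` and the grouping key (channel, severity, message template) are hypothetical:

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

# Hypothetical sketch: a bounded deque stands in for a MongoDB capped collection.


@dataclass(frozen=True)
class LogEvent:
    channel: str
    severity: int                  # RFC 5424 level, e.g. 5 = notice
    template: str                  # untranslated message with placeholders
    variables: Tuple[Tuple[str, str], ...] = ()


class CappedLog:
    """Bounded log store: once full, each insert silently drops the oldest
    event, so it never needs a cleanup job."""

    def __init__(self, max_events: int) -> None:
        self._events: Deque[LogEvent] = deque(maxlen=max_events)

    def log(self, event: LogEvent) -> None:
        self._events.append(event)

    def __len__(self) -> int:
        return len(self._events)

    def grouped(self) -> List[Tuple[Tuple[str, int, str], int]]:
        """One line per recurring event, keyed on channel, severity and
        template (variables ignored), most frequent first."""
        counts = Counter((e.channel, e.severity, e.template)
                         for e in self._events)
        return counts.most_common()


if __name__ == "__main__":
    log = CappedLog(max_events=3)
    for i in range(5):
        log.log(LogEvent("php", 5, "Undefined index %index",
                         (("%index", str(i)),)))
    log.log(LogEvent("user", 6, "Session opened for %name.",
                     (("%name", "admin"),)))
    for (channel, severity, template), count in log.grouped():
        print(count, channel, template)
```

Grouping uses the message template rather than the rendered text, so notices that differ only by `%index` collapse into one line. The store never grows past `max_events`, so no cron cleanup is needed.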
  • 28. Off-site logs: BELK stack BELK stack ● Beats (typically FileBeat) ● Elastic Search ● Logstash ● Kibana Operation ● Drupal syslog → local syslog server → local logs ● DON’T log straight from Drupal ● Filebeat pulls logs, sends to Logstash ● Logstash massages logs, sends to ES ● ES provides storage, indexing ● Kibana provides UI Deployment ● Hosted with site ● SaaS: Loggly, ...
  • 29. Off-site logs: Graylog Graylog ● Dual server: ES (logs, search) + MongoDB (meta, conf) ● Includes GROK log handling ● Accepts syslog or GELF input ● Designed after Splunk Operation ● Drupal syslog → local syslog server → local logs ● DON’T log straight from Drupal via monolog_gelf ● Local syslog forwards to Graylog2 ● Graylog2 massages logs, sends to ES ● ES provides storage, indexing ● Graylog2 provides UI Deployment ● Hosted with site ● SaaS: StackHero
  • 30. Off-site logs: BELK vs Graylog design (source: Graylog)
  • 31. Non-SQL Logs: do I need them ? ● Small site, little traffic, single webmaster: just use dblog ● Any other site: upgrade to something else ○ Hosting company provides a logs dashboard (e.g. Splunk): use it ■ syslog into their stack, via local syslog then pull ○ Have an internal ops team ? ■ syslog into internal BELK or Graylog ○ No ops expertise ? No time to learn Kibana/Graylog ? Hosting company doesn’t provide real-time logs access ? ■ Want to minimize costs and/or keep logs on-site ? ● use mongodb_watchdog ■ Otherwise, use a SaaS logs vendor ● Datadog, Scalyr, Loggly or Papertrail (SolarWinds)
  • 33. Queue API services ● Core: mostly for Batch API ● General D8 use: proxy invalidation ○ Invalidation queues ● Commerce sites ○ ERP links ○ Third-party catalog/inventory ● Media sites ○ Real time news feeds ingestion ○ Deferred derived media generation
  • 34. Queue modules SQL and NoSQL SQL ● Core bundled: queue.database service ○ used by all Drupal sites ● advancedqueue project ○ created for Drupal Commerce projects ○ used by Commerce 2.x NoSQL: storage-based ● Core bundled: queue.memory service ● Redis: ○ 7.x: redis_queue project ○ 8.x: redis project ● MongoDB ○ 7.x: mongodb project NoSQL: message servers ● Beanstalkd ○ 6.x/7.x: popular, used by itself ○ 8.x complete port, but no users (?) ● RabbitMQ ○ 7.x: little used, 8.x: most popular ○ Users include public TV, a major French e-tailer ○ Hardened by production at these levels ● AWS SQS ○ 7.x: some use, but no 8.x port ● Apache Kafka ○ 8.x only ○ Created for the largest French retail chain ● Other queue services ○ Less used: Gearman, IronMQ, 0MQ ○ No 8.x versions
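Whatever the backend, the Drupal Queue API contract is the same. A worker creates an item, claims it with a lease, then deletes it once processed or releases it on failure. The Python sketch below mirrors the method names of Drupal's `QueueInterface` (`createItem`, `claimItem`, `deleteItem`, `releaseItem`, `numberOfItems`) in snake_case. It is an in-process illustration of the lease semantics only, not a port of any module listed above, and `InMemoryQueue` is hypothetical:

```python
import itertools
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical sketch modeled on Drupal's QueueInterface method names.


@dataclass
class QueueItem:
    item_id: int
    data: Any
    expire: float = 0.0                  # 0.0 means "not claimed"


class InMemoryQueue:
    """Lease-based queue: a claimed item stays hidden from other workers
    until it is deleted, released, or its lease expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[int, QueueItem] = {}
        self._ids = itertools.count(1)

    def create_item(self, data: Any) -> int:
        item = QueueItem(next(self._ids), data)
        self._items[item.item_id] = item
        return item.item_id

    def number_of_items(self) -> int:
        return len(self._items)

    def claim_item(self, lease_time: float = 3600) -> Optional[QueueItem]:
        now = self._clock()
        for item in self._items.values():        # dicts keep FIFO order
            if item.expire <= now:               # unclaimed or lease expired
                item.expire = now + lease_time
                return item
        return None

    def delete_item(self, item: QueueItem) -> None:
        self._items.pop(item.item_id, None)      # processed successfully

    def release_item(self, item: QueueItem) -> None:
        item.expire = 0.0                        # give it back immediately
```

If a worker crashes after claiming an item, its lease expires and the item becomes claimable again. Backends implement this differently: the core database queue stores an expiry timestamp, while message servers typically rely on acknowledgements. Code written against the interface does not need to know which backend is configured.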
  • 35. Queue API modules by usage D7/D8
  • 36. NoSQL Queue: do I need it ? ● Mainstream Drupal site without Varnish / CDN ○ probably not; advancedqueue is still a nice improvement, though ● Content site with a lot of generated content, Varnish and/or CDN ○ consider using Redis (D8), MongoDB (D7), RabbitMQ (D8) ○ or use Kafka (D8) if you need to (e.g. corporate mandate) ● Drupal Commerce standalone ○ advancedqueue is normally enough ● Site generating lots of dynamic media (image, video, sound) ...or ingesting fast feeds (> 1 item/sec) ○ needs a dedicated message server
  • 37. NoSQL Queue: which should I use ? ● The one your ops team supports best ○ Content management has a low event rate (< 1 event/sec) ● Kafka-class is for high-throughput queues ○ Think LinkedIn, Twitter, Netflix, Spotify, Airbnb, Paypal… ● RabbitMQ is solid ○ usually well known and monitored ○ D8 driver used for years on Cyber Monday, Black Friday, Olympic games... ● Beanstalkd is simple ○ It “just works” ○ Good first queue upgrading from DB
  • 39. SQL-based search ● Search has long been the weakest core feature in Drupal ○ In spite of improvements with each version ● Relevant issues ○ Good recall, but bad precision ○ Multilingual support, but no language awareness ○ Low awareness of language inflections → preprocessing API ○ Limited ability to handle Asian (CJK) languages ○ Slow updates, cron-based pull mode ○ Indexing costs impacting site users ○ Indexed search for content only → search plugins ○ Other entity types limited to unindexed search by default ○ No support for restricted content search ● Useful complements: porterstemmer, snowball_stemmer ● SQL alternative: Search API database search, with similar limitations.
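“Good recall, but bad precision” has an exact meaning. Here is a minimal Python sketch of both measures, using hypothetical node IDs rather than measured results:

```python
from typing import Set, Tuple

# Illustrative only: the node IDs below are made up, not audit data.


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: share of returned results that are relevant.
    Recall: share of relevant documents that were returned."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"node/1", "node/2", "node/3", "node/4"}
    # A query matching every node containing the word: all 4 relevant nodes
    # plus 16 others that merely mention it.
    retrieved = relevant | {f"node/{i}" for i in range(100, 116)}
    print(precision_recall(retrieved, relevant))   # (0.2, 1.0)
```

In this example every relevant node is found (recall 1.0), but only one result in five is relevant (precision 0.2). The user still has to sift through the list.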
  • 40. NoSQL search solutions Cloud-based / SaaS ● SaaS offerings: ○ Algolia ○ Google CSE ● Drupal Hosting offerings (alphabetic order): ○ Acquia Search SOLR ○ SOLR ○ Pantheon SOLR ○ ElasticSearch / SOLR On-site / near-site ● Core support: Search API (14% of D7, 16% of D8 sites) ● Standard solution: ○ Local SOLR ○ Multilingual search supported ● Alternatives: ○ Elastic Search → heart of BELK suite ○ Xunsearch: Xapian for Chinese ○ Xapian (8.x dev) ● D7 backends not on D8: ○ Elastic Search via Elastica ○ Google Search Appliance: killed by Google ○ MongoDB via MongoDB module ○ Sphinx ● Proprietary search engine publishers have custom, unpublished, non-GPL (!) Drupal modules
  • 41. SQL and NoSQL search solutions by usage in D8
  • 42. Non-core search: which should I use ? ● Any content deserves search ● SQL ○ Core for small content quantities ○ Search API DB backend used by ● SaaS ○ For entry level: Algolia/Google = 0 recurring cost, near 0 set-up cost ○ Both perform better than core, but non-free ● Drupal PaaS providers offer managed ES/SOLR ● Others: cost equilibrium ○ ES/SOLR have setup and recurring ownership costs (server load) ○ SaaS has lower set-up costs, but recurring fees ○ Core search has an opportunity cost
  • 44. Best current practice: NoSQL in general Drupal 8 core tries hard to be SQL-agnostic ● Every use of the DB goes through @database ○ So anything able to pass for a SQL engine may be used ○ The mongodb_dbtng, mongodb 8.x-1.x, and Drumongous projects do just that ● Even Views has a query plugin. Project efq_views (7.x, 8.x) supports NoSQL engines that way ● No service except “storage” services should receive databases ○ Write a storage service for your data, defining its interface ○ Write a SQL provider implementing it, receiving @database ○ Tag the service as “backend_overridable” ○ Core mostly does it, custom code should always do it. ● References: ○ ○
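The recommended pattern has three parts: a storage interface, a SQL provider that is the only class receiving the database, and an overridable backend. It can be sketched in Python, with `sqlite3` standing in for @database and a dict standing in for a NoSQL client. `BookmarkStorageInterface` and both providers are hypothetical. In Drupal the wiring is done in services.yml, with the SQL service tagged `backend_overridable`:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional

# Hypothetical example service: names are illustrative, not Drupal APIs.


class BookmarkStorageInterface(ABC):
    """Storage contract that business code depends on."""

    @abstractmethod
    def save(self, uid: int, url: str) -> None: ...

    @abstractmethod
    def load(self, uid: int) -> Optional[str]: ...


class SqlBookmarkStorage(BookmarkStorageInterface):
    """SQL provider: the only class that receives a database connection."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db
        self._db.execute("CREATE TABLE IF NOT EXISTS bookmark "
                         "(uid INTEGER PRIMARY KEY, url TEXT NOT NULL)")

    def save(self, uid: int, url: str) -> None:
        self._db.execute("INSERT OR REPLACE INTO bookmark (uid, url) "
                         "VALUES (?, ?)", (uid, url))

    def load(self, uid: int) -> Optional[str]:
        row = self._db.execute("SELECT url FROM bookmark WHERE uid = ?",
                               (uid,)).fetchone()
        return row[0] if row else None


class KeyValueBookmarkStorage(BookmarkStorageInterface):
    """Overriding provider: a dict stands in for a Redis or MongoDB client."""

    def __init__(self) -> None:
        self._store: Dict[int, str] = {}

    def save(self, uid: int, url: str) -> None:
        self._store[uid] = url

    def load(self, uid: int) -> Optional[str]:
        return self._store.get(uid)


def last_bookmark(storage: BookmarkStorageInterface, uid: int) -> str:
    # Consumer code: sees the interface only, never the database.
    return storage.load(uid) or "(none)"


if __name__ == "__main__":
    for backend in (SqlBookmarkStorage(sqlite3.connect(":memory:")),
                    KeyValueBookmarkStorage()):
        backend.save(1, "/node/42")
        print(type(backend).__name__, last_bookmark(backend, 1))
```

Swapping providers changes no business code. Moving this data to NoSQL then takes one new class plus a service override.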
  • 45. Best current practice: MongoDB ● Connecting to MongoDB with 8.x-2.x ○ Using multiple databases ? Use @mongodb.client_factory ■ The client you get is a standard mongodb/mongodb Client instance ■ You have to handle topology ○ Using a single database ? Use @mongodb.database_factory ■ The database you get is a standard mongodb/mongodb Database instance ■ Your DB topology is now configurable in settings ○ You probably don’t want to use Doctrine ODM, especially when interacting with Drupal data ● Designing a custom schema ○ Start from the queries, not from some canonicalization ○ For large scale data sets, consider: ■ Splitting live and archive data for sharding ■ Having a write DB and a read DB, and a CLI-based service between them: read about CQRS ○ Never use a monotonically increasing key for sharding ○ In most cases, joined data in lists don’t need to be as up-to-date as primary views ■ Embed “light” versions of dependent objects for lists, only use $lookup and DBRef joins on full datum view
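A monotonically increasing key makes a poor shard key. With range sharding, every new key is above all existing boundaries, so all inserts hit the last shard. A minimal Python sketch with hypothetical boundaries compares that with hashed placement. It ignores how MongoDB's balancer splits and migrates chunks:

```python
import bisect
import hashlib
from collections import Counter
from typing import List

# Hypothetical sketch: fixed boundaries, no chunk splitting or balancing.


def range_shard(key: int, boundaries: List[int]) -> int:
    """Range sharding: shard i holds keys in [boundaries[i-1], boundaries[i])."""
    return bisect.bisect_right(boundaries, key)


def hashed_shard(key: int, shards: int) -> int:
    """Hashed sharding: placement depends on a hash of the key, not its order."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shards


if __name__ == "__main__":
    boundaries = [250_000, 500_000, 750_000]      # 4 shards of existing data
    new_ids = range(1_000_000, 1_010_000)          # auto-increment style inserts
    print(Counter(range_shard(k, boundaries) for k in new_ids))  # Counter({3: 10000})
    print(Counter(hashed_shard(k, 4) for k in new_ids))
```

MongoDB offers hashed sharding for this case. The alternative is to pick a shard key whose values are spread over time.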
  • 46. “Contribution is its own reward.” There, I said it !
  • 47.
  • 48. Join us for contribution opportunities, Thursday, October 31, 2019 ● Mentored Contribution: 9:00-18:00, Room: Europe Foyer 2 ● First Time Contributor Workshop: 9:00-14:00, Room: Diamond Lounge ● General Contribution: 9:00-18:00, Room: Europe Foyer 2 #DrupalContributions
  • 49. What did you think? Locate this session at the DrupalCon Amsterdam website: Take the Survey!