Drupal 8 can scale well and serve pages fast to many users, especially by offloading parts of the workload from the main SQL database to NoSQL solutions.
This presentation describes the strategies and technologies usable to achieve such gains, including specific configuration, contributed modules and custom coding strategies.
3. Topic ?
Simple idea: “No SQL”
● Alternate storage engines: KV, Structures, Document, Graph, Columnar…
● No standard, often no fixed schema, no joins, no FKs
● → Engine-specific application design
● Drupal architecture ?
Evolved idea: Not Only SQL
● For engines, add equivalent features to SQL
● For Drupal, combine SQL and NoSQL solutions
● Start from the default SQL-based architecture
● Offload services to non-SQL implementations
○ front-end caches, search engines, queue servers
○ specialized storage: cache, KV, lock, sessions…
● Often involves NoSQL as cache for SQL
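The "NoSQL as cache for SQL" idea can be illustrated with a minimal cache-aside sketch. This is not Drupal code: the `CacheAsideStore` class, the `kv` table and the `make_store` helper are hypothetical names, a plain dict stands in for Redis or Memcached, and SQLite stands in for the main SQL database.

```python
import sqlite3


class CacheAsideStore:
    """Key-value cache placed in front of a SQL table (cache-aside pattern)."""

    def __init__(self, db):
        self.db = db
        self.cache = {}     # stand-in for a NoSQL key-value store
        self.sql_reads = 0  # number of reads that reached the SQL database

    def get(self, key):
        # Serve from the cache when possible: no SQL query at all.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to SQL, then populate the cache.
        self.sql_reads += 1
        row = self.db.execute(
            "SELECT value FROM kv WHERE name = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]
        return row[0]

    def set(self, key, value):
        # SQL stays the source of truth; the cached copy is invalidated.
        self.db.execute(
            "INSERT OR REPLACE INTO kv (name, value) VALUES (?, ?)", (key, value)
        )
        self.db.commit()
        self.cache.pop(key, None)


def make_store():
    """Build a store backed by an in-memory SQLite database (demo only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE kv (name TEXT PRIMARY KEY, value TEXT)")
    return CacheAsideStore(db)
```

After the first read of a key, repeated reads no longer touch SQL until the next write invalidates the cached entry.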
4. NOSQL: do you need it ?
● Start by observing the current state
○ Database queries → devel + webprofiler
○ Cache → heisencache (D7), webprofiler (D8)
○ Build cacheability → renderviz
● Observe behaviour
○ Core observability built-in: DBTNG logging, cache decorators, QueryInterface for KV, config, content…
○ Monitoring module (400 sites) by Karan Poddar (Google SoC) and MD Systems
○ Add your choice of time-series store (e.g. Prometheus, InfluxDB) and UI (e.g. Grafana)
○ ⇨ Use it !
● You want to see this when it happens ⟶
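The cache decorator approach mentioned above can be sketched as a wrapper that counts hits and misses before any tuning decision is made. This is an illustration only, not the Heisencache or webprofiler implementation; `CountingCache` and `DictBackend` are hypothetical names.

```python
class DictBackend:
    """Trivial in-memory backend exposing get/set (stand-in for a real cache)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class CountingCache:
    """Decorator around any backend with get/set that records hits and misses."""

    def __init__(self, backend):
        self.backend = backend
        self.hits = 0
        self.misses = 0

    def get(self, key):
        value = self.backend.get(key)
        # A None result is treated as a miss in this sketch.
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, key, value):
        self.backend.set(key, value)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Exporting counters like these to a time-series store is what makes a degrading hit ratio visible when it happens.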
6. Fixing an identified problem is cheaper than “trying things”
Fix from acquired information
● It /MAY/ involve taking queries off the main DB to a NoSQL solution
● But poorly configured NoSQL may make it worse.
7. “Just do it” ?
● Drupal is built on SQL:
○ Views depends on it by default
○ Most sites rely on Views data model awareness
○ → Contrib often assumes SQL, injects @database
○ NoSQL support doable, rarely done
● Contrib support level is limited
○ Most NoSQL contrib not ported from D7 to D8
○ Drupal shop knowledge limited, except at the biggest or specialized ones
○ Products may die… e.g. RethinkDB
● Pro support from publishers = costs. Availability.
● Extra support needed = costs
NoSQL == added build costs
→ balance gains vs costs
Example case: RethinkDB
At DevDays Milan 2016, after lots of work, Gizra’s @RoySegall demoed a Drupal 8 ORM/ODM for RethinkDB.
Then, this happened...
10. Caching ahead of real work
Default situation with SQL
● Browser caching, limited
● Internal / dynamic page cache in main SQL DB
● Need DB connection, a few SELECT queries
● Fetch cache from DB
● All data from main storage
● ⇨ Serve cached pages in about 20 msec
All this work makes DoS-ing comparatively cheap.
NoSQL improvements
● Add caching ahead of site itself
○ Browser
■ Optimized browser caching (Cache-Control)
■ PWA: use browser local storage
○ CDN
■ CDN module (2k sites)
■ Akamai module (600 sites)
■ ⇨ Serve cached pages in about 15 msec (TTFB)
■ Web-scale
○ Varnish and other reverse proxies
■ ⇨ Serve cached pages in about 10 msec (TTFB)
■ Core support
■ Varnish Purger (3k sites)
● ⇨ Most requests will mean 0 SQL queries
○ DoS-ing more costly, especially with CDN
● Move page caches off main DB: next section
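Whether a CDN or a reverse proxy such as Varnish can answer without reaching the site depends largely on the `Cache-Control` header the site emits. The sketch below is a simplified reading of that header for a shared cache, not the logic of any specific product; `parse_cache_control` and `shared_cache_ttl` are hypothetical helper names, and heuristic freshness is ignored.

```python
def parse_cache_control(header):
    """Split a Cache-Control header into a {directive: value-or-None} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"') or None
    return directives


def shared_cache_ttl(header):
    """Seconds a shared cache may reuse the response; 0 means go to the origin."""
    directives = parse_cache_control(header)
    # Simplification: no-cache is treated as "always revalidate with the origin".
    if any(d in directives for d in ("private", "no-store", "no-cache")):
        return 0
    # For shared caches, s-maxage takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return 0
```

For example, `shared_cache_ttl("public, max-age=60, s-maxage=600")` yields 600: browsers keep the page for a minute, while the CDN or proxy may serve it for ten minutes without any SQL query.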
13. Storage: the “Big 3”
The most active NoSQL suites for Drupal 8.x
Redis
● Type: Key-value (structure server)
● Module
○ redis
● DB-Engines ranking:
○ #1 Key-value store
● Usage
○ Drupal 7: 10k sites
○ Drupal 8: 10k sites
● Supported by
○ Drupal 7: Makina Corpus
○ Drupal 8: MD Systems
Memcached
● Type: Key-value
● Module
○ memcache
● DB-Engines ranking:
○ #3 Key-value store
○ #5 Key-value store (Hazelcast)
● Usage (memcache_storage)
○ Drupal 7: 32k (2k) sites
○ Drupal 8: 15k (800) sites
● Supported by:
○ Acquia
○ Tag1 Consulting
MongoDB / CosmosDB
● Type: Document store
● Module
○ mongodb
● DB-Engines ranking:
○ #1 Document store (MongoDB)
○ #4 Document store (CosmosDB)
● Usage
○ Drupal 7: 300 sites
○ Drupal 8: 50 sites
● Supported by
○ OSInet
14. Redis
https://www.drupal.org/project/redis
● Driver support
○ phpredis and predis both supported
● Supported Services
○ Driver adapter for custom code
○ Cache, including invalidations
○ Flood
○ Lock
○ Lock.Persistent
○ Queue
● CLI support
○ Not included
● Other modules
○ Redis Watchdog: logger + UI
Recent events (from @Berdir)
● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release
● php-redis 5.0 broke module, fixed in latest 8.x and 7.x releases
● Module users: please test and report !
15. Performance / scalability
Redis
https://www.drupal.org/project/redis
● Performance, single-server
○ Memory-only implementation
■ Usually among the fastest
■ Often the fastest
■ Even with concurrent access
○ Persistent
■ A bit slower even with just RDB
■ Slower with AOF
● Persistence, single instance
○ RDB:
■ compact snapshots, shippable off-site
■ data loss: since latest snapshot
○ AOF
■ up to last-second fsync’ed journal
■ less compact
● Fault-tolerance: Sentinel 2
○ master/slave supervision
○ automatic failover possible
○ observability support
● Scaling
○ Cluster-based sharding
○ Master → Slaves → Slaves
○ No strong consistency
○ Recommended config: 6 servers
● Cloud-native:
○ Redis Enterprise Cloud
○ AWS Elasticache, Azure, Google Memorystore
○ many others
16. Memcached
https://www.drupal.org/project/memcache
● Driver support
○ memcache extension (limited availability)
○ memcached extension
○ PHP ≥ 5.6
● Supported Services
○ Driver adapter for custom code
○ Cache, including invalidations
○ Lock
○ Lock.Persistent removed in #2995907
○ Sessions ported, then removed in 7.x
○ Monitoring UI
● CLI support
○ Not included: core commands
● Other module: memcache_storage
○ Cache with core SQL invalidations
○ No lock
○ Monitoring UI
Recent events (from @Berdir)
● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release, based on Redis fix.
17. Performance / scalability
Memcached
https://www.drupal.org/project/memcache
● Performance, single-server
○ Memory-only implementation
■ Usually among the fastest
■ Slower than in-memory Redis
■ A bit faster than MySQL / MongoDB K/V
○ Persistence: extstore NVRAM support
■ No significant slowdown
■ Usually a bad idea (expectations)
■ https://memcached.org/blog/persistent-memory/
● Fault-tolerance
○ Module support for sharded clusters
○ Consistent hashing: avoid thundering herd problem
○ Replication: with Hazelcast
● Scaling
○ Cluster-based sharding
○ Consistent hashing allows elastic scaling
○ Recommended config: 2 instances per cluster, 1 cluster per bin, with some exceptions: usually 10-20 instances per D8 site
○ Some bins must stay in core (form, update)
● Monitoring
○ Instant: module-provided memcache_admin
○ Evolved: phpmemcacheadmin
● Cloud-native
○ AWS Elasticache
○ Azure Memcached Cloud
○ Google AppEngine Memcache
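To make the "consistent hashing allows elastic scaling" point concrete, here is a minimal hash-ring sketch. It is not the algorithm used by the memcache module or the PHP extensions. The class name, the node names and the number of virtual points per node are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual points per node (illustrative)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._keys = []   # sorted positions on the ring
        self._ring = {}   # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node owns several points so load spreads evenly.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def get(self, key: str) -> str:
        # A key belongs to the first node point at or after its own position,
        # wrapping around at the end of the ring.
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]


def moved_fraction(keys, before, after):
    """Share of keys whose owning node changes between two rings."""
    moved = sum(1 for k in keys if before.get(k) != after.get(k))
    return moved / len(keys)


if __name__ == "__main__":
    cids = [f"cid:{i}" for i in range(2000)]
    three = HashRing(["mc1", "mc2", "mc3"])
    four = HashRing(["mc1", "mc2", "mc3", "mc4"])
    print(f"keys remapped after adding a node: {moved_fraction(cids, three, four):.0%}")
```

With a plain `hash(key) % node_count` scheme, adding a fourth instance would remap most keys and empty the cache in practice. On the ring, only the keys taken over by the new instance move.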
22. Other NoSQL support modules
NoSQL Product    | Module                  | Wrapper | Features                                | 7.x | 8.x | Supported ?
Neo4J            | neo4j                   | Y       | -                                       | Y   | Y   | N
RethinkDB        | rethinkdb               | Y       | ORM                                     | N   | Y   | ?
CouchDB          | couchdb                 | Y       | Node export                             | Y   | N   | N
Couchbase        | couchbase               | Y       | Logger + UI                             | Y   | N   | ?
ElasticSearch    | elasticsearch_connector | Y       | Logger + improved UI, Statistics, Views | Y   | N   | Y
                 |                         |         | SearchAPI                               | Y   | Y   |
AWS DynamoDB     | dynamodb                | N       | Cache                                   | Y   | N   | ?
AWS SimpleDB     | awssdk, creeper         | Y       | -                                       | Y   | N   | ?
Riak             | riak_field_storage      | Y       | Field storage, map-reduce               | Y   | N   | unsupported
Apache Cassandra | cassandra               | Y       | Example app                             | 6.x | N   | unsupported
Tokyo Tyrant     | node/844354             | N       | Logger + UI                             | 6.x | N   | unapproved
24. NoSQL Sessions ?
● Why the weak/removed session support, especially for memcache ?
○ Memcache session support is built into the PHP memcached extension
○ It was popular in the Drupal 6.x era
○ It is popular in Symfony, even documented on symfony.com
○ So ?
● Experience
○ Session data
○ Instance restart → all session data on that instance is lost
○ Bigger session data saturating bin → evictions
○ LRU means vulnerability to DoS-ing and blocking admins via evictions
○ DB load is bigger in Drupal than most frameworks
■ Session DB load is a smaller part of load for us
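The eviction risk can be sketched with a toy LRU store. Real memcached evicts per slab class and by memory size rather than by item count, so this is a simplification. The capacity and key names are hypothetical.

```python
from collections import OrderedDict


class LruBin:
    """Toy LRU store standing in for a memcached bin with a fixed item budget."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]


def flood(bin_, count):
    """Simulate an attacker opening many anonymous sessions."""
    for i in range(count):
        bin_.set(f"sess:anon{i}", {"uid": 0})


if __name__ == "__main__":
    sessions = LruBin(capacity=100)
    sessions.set("sess:admin", {"uid": 1})
    flood(sessions, 100)
    # The admin session was the least recently used entry and is gone.
    print(sessions.get("sess:admin"))
```

Cache entries can be rebuilt after an eviction. Sessions cannot, which is why an evicting store suits caches better than session data.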
26. Logs in core
The “SQL” problem
● All sites really need some sort of logging feature
● Smaller sites only have a database
○ ⇨ Database Logging default-enabled
● Code is not perfect, throws notices, errors
● Modules are verbose, log debug info
● “Drupal is too slow, please help, agency is stuck”
○ ⇨ Audit : 1500 inserts/min in watchdog table
○ ⇨ Other audits: watchdog > 99% of site size
● DBlog inserts compete with content work
● Owner disables logging
○ ⇨ now misses essential info
● Does not disable logging
○ ⇨ now can’t find essential info buried in noise
The core NoSQL module
● Core has been bundling a syslog client since 6.0
● Decouple logs from DB load
○ ⇨ No more SQL logs workload
● But where do they go ?
○ ⇨ Needs OS-level configuration
● How are logs cleaned ?
○ ⇨ Needs OS-level configuration
● Where is the UI ?
○ ⇨ Needs extra tools
● Solutions ?
○ D7 has logging hook
○ D8 has PSR-3 standard logging
○ ⇨ Contributions
27. NoSQL on-site logs
(mongodb|redis)_watchdog
● mongodb_watchdog
○ Logger service
■ Standard Drupal PSR-3 logs backend
■ Pre-storage filtering
■ Uses capped collections: auto-rotation, no ops
■ Dedicated database: zero contention
■ Per-request event tracing
○ Improved logs UI
■ Based on core UI
■ Groups recurring events on single line
■ Details page for occurrences
■ Per-HTTP-request log page
○ Most common reason to deploy MongoDB on D8
● redis_watchdog
○ Logger service
○ Logs UI based on core UI
○ Usage: 1 site
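Two of the mongodb_watchdog ideas, pre-storage filtering and capped storage with automatic rotation, can be imitated in memory with the Python standard library. This is an analogy only: the handler below is hypothetical and talks neither to MongoDB nor to Drupal.

```python
import logging
from collections import deque


class CappedHandler(logging.Handler):
    """Keeps only the newest events, like a capped collection (illustrative)."""

    def __init__(self, max_events: int, level=logging.WARNING):
        # The handler level is the pre-storage filter: anything below it
        # is dropped before it costs any storage.
        super().__init__(level=level)
        # A bounded deque discards the oldest entry when full: no cleanup job.
        self.events = deque(maxlen=max_events)

    def emit(self, record: logging.LogRecord) -> None:
        self.events.append((record.levelname, record.name, record.getMessage()))


def build_logger(channel: str, handler: logging.Handler) -> logging.Logger:
    logger = logging.getLogger(channel)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    handler = CappedHandler(max_events=3)
    log = build_logger("demo_channel", handler)
    log.debug("verbose module chatter")      # filtered out before storage
    for i in range(5):
        log.error("failure %d", i)           # only the newest 3 are kept
    for event in handler.events:
        print(event)
```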
28. Off-site logs: BELK stack
BELK stack
● Beats (typically FileBeat)
● Elastic Search
● Logstash
● Kibana
Operation
● Drupal syslog → local syslog server → local logs
● DON’T log straight from Drupal
● Filebeat pulls logs, sends to Logstash
● Logstash massages logs, sends to ES
● ES provides storage, indexing
● Kibana provides UI
Deployment
● Hosted with site
● SaaS: Loggly, Logz.io, ...
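The "massaging" step turns a flat log line into a structured document before indexing. The sketch below does this in Python for a pipe-delimited line. It assumes the field order of the Drupal syslog module's default format (base_url, timestamp, type, ip, request_uri, referer, uid, link, message) and that the syslog prefix has already been stripped. Check both against your own configuration. In a real BELK setup this job belongs to Logstash filters, not to custom code.

```python
import json
from datetime import datetime, timezone

# Assumed field order; adjust to the format configured on the site.
FIELDS = ["base_url", "timestamp", "type", "ip", "request_uri",
          "referer", "uid", "link", "message"]


def parse_line(line: str) -> dict:
    """Turn one pipe-delimited log line into an indexable document."""
    # Split on the first 8 pipes only: the message itself may contain "|".
    parts = line.rstrip("\n").split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected field count")
    doc = dict(zip(FIELDS, parts))
    doc["uid"] = int(doc["uid"])
    # Replace the Unix timestamp with an ISO 8601 UTC timestamp.
    doc["@timestamp"] = datetime.fromtimestamp(
        int(doc.pop("timestamp")), tz=timezone.utc).isoformat()
    return doc


if __name__ == "__main__":
    sample = ("https://example.com|1572516000|php|203.0.113.7|"
              "https://example.com/node/1||0||Notice: undefined index")
    print(json.dumps(parse_line(sample), indent=2))
```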
29. Off-site logs: Graylog
Graylog
● Dual server: ES (logs, search) + MongoDB (meta, conf)
● Includes GROK log handling
● Accept syslog or GELF input
● Designed as an alternative to Splunk
Operation
● Drupal syslog → local syslog server → local logs
● DON’T log straight from Drupal via monolog_gelf
● Local syslog forwards to Graylog2
● Graylog2 massages logs, sends to ES
● ES provides storage, indexing
● Graylog2 provides UI
Deployment
● Hosted with site
● SaaS: StackHero
31. Non-SQL Logs: do I need them ?
● Small site, little traffic, single webmaster: just use dblog
● Any other site: upgrade to something else
○ Hosting company provides a logs dashboard (e.g. Splunk): use it
■ syslog into their stack, via local syslog then pull
○ Have an internal ops team ?
■ syslog into internal BELK or Graylog
○ No ops expertise ? Don’t have time to learn Kibana/Graylog ? Hosting company doesn’t provide real time logs access ?
■ Want to minimize costs and/or have logs in-site ?
● use mongodb_watchdog
■ Otherwise, use SaaS logs vendor
● Datadog, Scalyr, Loggly or Papertrail (SolarWinds), Logz.io...
33. Queue API services
● Core: mostly for Batch API
● General D8 use: proxy invalidation
○ Invalidation queues
● Commerce sites
○ ERP links
○ Third-party catalog/inventory
● Media sites
○ Real time news feeds ingestion
○ Deferred derived media generation
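All these use cases share the same pattern: the web request only records a job, and a worker claims it later under a time-limited lease. The in-memory sketch below mimics the create / claim / delete / release cycle. It is an analogy written for illustration, not Drupal's Queue API, and the class and job names are hypothetical.

```python
import itertools
import time


class LeaseQueue:
    """In-memory queue with leased claims (illustrative, single process)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._items = {}  # item id -> [data, lease expiry]

    def create_item(self, data):
        item_id = next(self._ids)
        self._items[item_id] = [data, 0.0]
        return item_id

    def claim_item(self, lease_time=30.0, now=None):
        """Return (id, data) for the oldest unleased item, or None."""
        now = time.monotonic() if now is None else now
        for item_id, entry in self._items.items():
            if entry[1] <= now:
                # Lease the item so other workers skip it for a while.
                entry[1] = now + lease_time
                return item_id, entry[0]
        return None

    def delete_item(self, item_id):
        # Called by the worker once the job succeeded.
        self._items.pop(item_id, None)

    def release_item(self, item_id):
        # Called by the worker to give the job back immediately.
        if item_id in self._items:
            self._items[item_id][1] = 0.0


if __name__ == "__main__":
    queue = LeaseQueue()
    queue.create_item("purge /node/1")
    queue.create_item("purge /node/2")
    claimed = queue.claim_item(lease_time=30)
    print("working on", claimed)
    queue.delete_item(claimed[0])
```

If a worker crashes without deleting or releasing its item, the lease simply expires and another worker can claim the job again.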
34. Queue modules
SQL and NoSQL
SQL
● Core bundled: queue.database service
○ used by all Drupal sites
● advancedqueue project
○ created for Drupal Commerce projects
○ used by Commerce 2.x
NoSQL: storage-based
● Core bundled: queue.memory service
● Redis:
○ 7.x: redis_queue project
○ 8.x: redis project
● MongoDB
○ 7.x: mongodb project
NoSQL: message servers
● Beanstalkd
○ 6.x/7.x: popular, used by drupal.org itself
○ 8.x complete port, but no users (?)
● RabbitMQ
○ 7.x: little used, 8.x: most popular
○ Users include public TV, major French e-tailer
○ Hardened by production at these levels
● AWS SQS
○ 7.x: some use, but no 8.x port
● Apache Kafka
○ 8.x only
○ Created for largest French retail chain
● Other queue services
○ Less used: Gearman, IronMQ, 0MQ
○ No 8.x versions
36. NoSQL Queue: do I need it ?
● Mainstream Drupal site without Varnish / CDN
○ probably not, advancedqueue is still a nice improvement though
● Content site with a lot of generated content, Varnish and/or CDN
○ consider using Redis (D8), MongoDB (D7), RabbitMQ (D8)
○ or use Kafka (D8) if you need to (e.g. corporate mandate)
● Drupal Commerce standalone
○ advancedqueue is normally enough
● Site generating lots of dynamic media (image, video, sound) ...or ingesting fast feeds (> 1 item/sec)
○ need a dedicated message server
37. NoSQL Queue: which should I use ?
● The one your ops team supports best
○ Content management has a low event rate (< 1 event/sec)
● Kafka-class is for high-throughput queues
○ Think LinkedIn, Twitter, Netflix, Spotify, Airbnb, Paypal…
● RabbitMQ is solid
○ usually well known and monitored
○ D8 driver used for years on Cyber Monday, Black Friday, Olympic games...
● Beanstalkd is simple
○ It “just works”
○ Good first queue upgrading from DB
39. SQL-based search
● Search has long been the weakest core feature in Drupal
○ In spite of improvements with each version
● Relevant issues
○ Good recall, but bad precision
○ Multilingual support, but no language awareness
○ Low awareness of language inflections → preprocessing API
○ Limited ability to handle Asian (CJK) languages
○ Slow updates, cron-based pull mode
○ Indexing costs impacting site users
○ Indexed search for content only → search plugins
○ Other entity types limited to unindexed search by default
○ No support for restricted content search
● Useful complements: porterstemmer, snowball_stemmer
● SQL Alternative: Search API database search. Similar.
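"Good recall, but bad precision" can be read as: most relevant pages are in the result list, but they are buried among irrelevant ones. A small worked computation, with made-up result sets, shows the two measures.

```python
def precision_recall(returned: set, relevant: set):
    """Return (precision, recall) for one query."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical query: 10 relevant nodes exist, the engine returns 40 nodes
    # and 9 of them are relevant.
    relevant = set(range(1, 11))
    returned = set(range(2, 42))
    precision, recall = precision_recall(returned, relevant)
    print(f"precision={precision:.3f} recall={recall:.3f}")
```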
40. NoSQL search solutions
Cloud-based / SaaS
● SaaS offerings:
○ Algolia
○ Google CSE
● Drupal Hosting offerings (alphabetic order):
○ Acquia Search SOLR
○ Amazee.io SOLR
○ Pantheon SOLR
○ Platform.sh ElasticSearch / SOLR
On-site / near-site
● Core support: Search API (14% of D7, 16% of D8 sites)
● Standard solution:
○ Local SOLR
○ Multilingual search supported
● Alternatives:
○ Elastic Search → heart of BELK suite
○ Xunsearch: Xapian for Chinese
○ Xapian (8.x dev)
● D7 backends not on D8:
○ Elastic Search via Elastica
○ Google Search Appliance: killed by Google
○ MongoDB via MongoDB module
○ Sphinx
● Proprietary search engine publishers have custom, unpublished, non-GPL (!) Drupal modules
42. Non-core search: which should I use ?
● Any content deserves search
● SQL
○ Core for small content quantities
○ Search API DB backend used by drupal.org
● SaaS
○ For entry level: Algolia/Google = 0 recurring cost, near 0 set-up cost
○ Both perform better than core, but non-free
● Drupal PaaS have managed ES/SOLR
● Others: cost equilibrium
○ ES/SOLR have setup and recurring costs of ownership (server load)
○ SaaS has lower set-up costs, but recurring fees
○ Core search has an opportunity cost
44. Best current practice: NoSQL in general
Drupal 8 core tries hard to be SQL-agnostic
● Every use of the DB goes through @database
○ So anything able to pass for a SQL engine may be used
○ The mongodb_dbtng, mongodb 8.x-1.x, and Drumongous projects do just that
● Even Views has a query plugin. Project efq_views (7.x, 8.x) supports NoSQL engines that way
● No service except “storage” services should receive databases
○ Write a storage service for your data, defining its interface
○ Write a SQL provider implementing it, receiving @database
○ Tag the service as “backend_overridable”
○ Core mostly does it, custom code should always do it.
● References:
○ https://www.drupal.org/project/drupal/issues/2302617
○ https://www.drupal.org/node/2306083
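The storage-service pattern is language-independent, so here is a rough Python analogy rather than Drupal code. Business logic depends on a small storage interface, a SQL provider implements it, and an alternative backend can be swapped in without touching callers. The bookmark example and all class names are hypothetical. In Drupal the swap would be done by the service container, not by hand.

```python
import sqlite3
from typing import List, Protocol


class BookmarkStorage(Protocol):
    """The storage interface: the only thing business code knows about."""

    def save(self, uid: int, url: str) -> None: ...

    def find(self, uid: int) -> List[str]: ...


class SqlBookmarkStorage:
    """SQL provider: the only class that receives the database connection."""

    def __init__(self, connection: sqlite3.Connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS bookmark (uid INTEGER, url TEXT)")

    def save(self, uid: int, url: str) -> None:
        self._db.execute(
            "INSERT INTO bookmark (uid, url) VALUES (?, ?)", (uid, url))

    def find(self, uid: int) -> List[str]:
        rows = self._db.execute(
            "SELECT url FROM bookmark WHERE uid = ? ORDER BY rowid", (uid,))
        return [row[0] for row in rows]


class KeyValueBookmarkStorage:
    """Alternative provider standing in for a NoSQL backend."""

    def __init__(self):
        self._data = {}

    def save(self, uid: int, url: str) -> None:
        self._data.setdefault(uid, []).append(url)

    def find(self, uid: int) -> List[str]:
        return list(self._data.get(uid, []))


def recent_bookmarks(storage: BookmarkStorage, uid: int) -> List[str]:
    """Business logic: unaware of which backend is in use."""
    return storage.find(uid)[-3:]


if __name__ == "__main__":
    for storage in (SqlBookmarkStorage(sqlite3.connect(":memory:")),
                    KeyValueBookmarkStorage()):
        for i in range(5):
            storage.save(1, f"/node/{i}")
        print(type(storage).__name__, recent_bookmarks(storage, 1))
```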
45. Best current practice: MongoDB
● Connecting to MongoDB with 8.x-2.x
○ Using multiple databases ? Use @mongodb.client_factory
■ The client you get is a standard mongodb/mongodb Client instance
■ You have to handle topology
○ Using single database ? Use @mongodb.database_factory
■ The database you get is a standard mongodb/mongodb Database instance
■ Your DB topology is now configurable in settings
○ You probably don’t want to use Doctrine ODM, especially when interacting with Drupal data
● Designing a custom schema
○ Start from the queries, not from some canonicalization
○ For large scale data sets, consider:
■ Splitting live and archive data for sharding
■ Having a write DB and a read DB, and a CLI-based service between them - read about CQRS
○ Never use a monotonically increasing key for sharding
○ In most cases, joined data in lists don’t need to be as up-to-date as primary views
■ Embed “light” versions of dependent objects for lists, only use $lookup and DBRef joins on full datum view
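The warning about monotonically increasing shard keys can be checked with a small simulation. With ranged sharding every new key is larger than all existing ones, so all inserts hit the last chunk. A hashed key spreads them. The routing functions below are simplified stand-ins, not MongoDB's actual chunk logic, and the boundaries and shard count are made up.

```python
import hashlib
from collections import Counter


def range_shard(key: int, boundaries: list) -> int:
    """Ranged sharding: chunk i holds keys below boundaries[i]; the last chunk is open-ended."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)


def hashed_shard(key: int, shards: int) -> int:
    """Hashed sharding: route on a hash of the key instead of its value."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shards


def write_distribution(keys, route) -> Counter:
    """Count how many writes each shard receives."""
    return Counter(route(k) for k in keys)


if __name__ == "__main__":
    new_ids = range(10_000, 11_000)      # auto-increment style keys
    boundaries = [2500, 5000, 7500]      # 4 chunks split on older data
    print("ranged:", dict(write_distribution(
        new_ids, lambda k: range_shard(k, boundaries))))
    print("hashed:", dict(write_distribution(
        new_ids, lambda k: hashed_shard(k, 4))))
```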
46. “Contribution is its own reward”
There, I said it !
48. Join us for contribution opportunities
Thursday, October 31, 2019
#DrupalContributions
● Mentored Contribution: 9:00-18:00, Room: Europe Foyer 2
● First Time Contributor Workshop: 9:00-14:00, Room: Diamond Lounge
● General Contribution: 9:00-18:00, Room: Europe Foyer 2
49. What did you think?
Locate this session at the DrupalCon Amsterdam website:
https://drupal.kuoni-congress.info/2019/program/
Take the Survey!
https://www.surveymonkey.com/r/DrupalConAmsterdam