Enterprise Drupal Application & Hosting Infrastructure Level Monitoring

This talk shares the story of how SiteGround created an enterprise monitoring system for its Drupal VIP clients. As the person behind this SiteGround project, I'll cover the following topics in detail:
1. What an enterprise-level monitoring system for Drupal sites and the underlying hosting infrastructure is.
2. Why big enterprise Drupal sites need such a system, and what business value it brings to the customer.
3. The best way to technically implement a system that monitors and resolves issues on extremely complicated sites.
4. Why migrating from reactive monitoring to SRE best practices is the only option for such sites.
By the end of the talk, attendees will know:
- Why big enterprise Drupal sites need custom monitoring.
- Why traditional monitoring is not suitable for sites that use the latest technologies: Elasticsearch, Solr, Nginx, Redis, Docker, LXC.
- The concepts of proactive system/site management.
I'll also talk about what site reliability engineers do, how a big part of this work has been automated at SiteGround, and why this is very important.

Daniel Kanchev
Senior Site Reliability Engineer
@dvkanchev

Enterprise Drupal Hosting Characteristics
￮ Consists of multiple servers
￮ Provides high availability
￮ Offers auto scalability
￮ Requires multiple services to work as expected
￮ Really expensive
￮ Nobody wants to manage this sh*t :)

Hosting Types Complexity
￮ Shared Hosting Service
￮ Single Virtual Server
￮ Single Dedicated Server
￮ PaaS
￮ Custom Private/Public Clouds

￮ Elasticsearch/Solr
￮ Redis/Memcached
￮ GraphQL
￮ MongoDB
￮ Node.js
￮ Gearman
￮ CI systems

One Monitoring To Rule Them All
• Website Monitoring
• Hosting Infrastructure Monitoring

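Website Monitoring Architecture
[Diagram: the website is checked from three monitoring locations (London, Amsterdam, Munich); an error response labelled "503 ISE" is shown]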
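
The multi-location check can be illustrated with a minimal sketch. It assumes each location reports the HTTP status code it received (or nothing at all on a timeout); the `ProbeResult` type, the function names, and the "2xx/3xx means healthy" rule are hypothetical choices for illustration, not the system described in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one check from one monitoring location (hypothetical type)."""
    location: str
    status_code: Optional[int]  # None means no response (timeout, DNS failure, ...)


def is_healthy(result: ProbeResult) -> bool:
    # Assumption: any 2xx or 3xx response counts as "site reachable".
    return result.status_code is not None and 200 <= result.status_code < 400


def failing_locations(results: List[ProbeResult]) -> List[str]:
    """Names of the locations from which the site looks down, sorted."""
    return sorted(r.location for r in results if not is_healthy(r))


def site_state(results: List[ProbeResult]) -> str:
    """Summarise all probes into one human-readable state."""
    if not results:
        raise ValueError("no probe results to evaluate")
    failed = failing_locations(results)
    if not failed:
        return "up"
    if len(failed) == len(results):
        return "down everywhere"
    return "down from: " + ", ".join(failed)


results = [
    ProbeResult("London", 200),
    ProbeResult("Amsterdam", 503),
    ProbeResult("Munich", 200),
]
print(site_state(results))  # down from: Amsterdam
```

Keeping the per-location results separate, rather than collapsing them into a single boolean, is what makes it possible to tell a regional problem from a full outage.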

Incidents
￮ Critical Incident - website is down from all locations
￮ Major Incident - website is down from a single location; MySQL replication is broken; PHP fatal errors recorded in the logs; read-only file system issue
￮ Minor Incident - Memcached/Redis on a single server is down
￮ Notice Incident - web node X is running out of space; PHP warnings recorded in the logs
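
The four incident levels above could be encoded as a small lookup table. This is only a sketch: the event names are hypothetical labels invented for the example, while the severity assigned to each follows the list above.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    NOTICE = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


# Hypothetical event names mapped to the severities from the slide.
RULES = {
    "site_down_all_locations": Severity.CRITICAL,
    "site_down_single_location": Severity.MAJOR,
    "mysql_replication_broken": Severity.MAJOR,
    "php_fatal_error": Severity.MAJOR,
    "read_only_filesystem": Severity.MAJOR,
    "cache_down_single_server": Severity.MINOR,
    "disk_space_low": Severity.NOTICE,
    "php_warning": Severity.NOTICE,
}


def classify(event: str) -> Severity:
    """Look up the severity of one event.

    Unknown events raise KeyError so they are never silently ignored.
    """
    return RULES[event]


def worst(events: Iterable[str]) -> Severity:
    """Highest severity among several simultaneous events."""
    return max(classify(e) for e in events)


print(worst(["php_warning", "mysql_replication_broken"]).name)  # MAJOR
```

Using an ordered enum means simultaneous events can be compared directly, so the most serious one decides how the incident is handled.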

Core Principles
￮ Log all events and archive them. Write postmortem reports
￮ Check every single incident - even minor ones and notices
￮ Define performance limits and regularly check reports
￮ Beware of cascading failures
￮ Always strive to return to the pre-incident state
￮ Check one thing at a time and return “OK” or “Failure”
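
The last principle, one check per thing with a plain "OK" or "Failure" result, might look like the following sketch. Free disk space is used as the example; the 10% threshold and the function names are assumptions made for illustration. `shutil.disk_usage` is part of the Python standard library.

```python
import shutil

OK = "OK"
FAILURE = "Failure"


def evaluate_free_space(total_bytes: int, free_bytes: int, min_free_ratio: float) -> str:
    """Pure decision step: is the free share of the disk above the threshold?"""
    if total_bytes <= 0:
        return FAILURE  # cannot reason about an empty or unknown volume
    return OK if free_bytes / total_bytes >= min_free_ratio else FAILURE


def check_disk(path: str = "/", min_free_ratio: float = 0.10) -> str:
    """Check exactly one thing (free space on one path) and return OK or Failure.

    The 10% default threshold is an assumed value, not one from the talk.
    """
    usage = shutil.disk_usage(path)
    return evaluate_free_space(usage.total, usage.free, min_free_ratio)


print(check_disk("."))
```

Separating the measurement (`check_disk`) from the decision (`evaluate_free_space`) keeps the decision testable without touching a real file system.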

Examples
￮ 1 of 5 app servers goes down
￮ Its 20% share of the traffic moves to the other 4 (about 25% more load on each, if traffic is evenly balanced)
￮ Redis caches are invalidated - overload
￮ Varnish is restarted by a system administrator to apply a configuration change
￮ App servers start to return 503 errors
￮ MySQL master goes down
￮ MySQL slave 1 takes over and at this moment there is no downtime
￮ MySQL slave 2 is behind the new master
￮ The new MySQL master goes down too; the result is a broken or outdated DB
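
The app-server scenario can be put into numbers with a short sketch. It assumes a load balancer that spreads traffic evenly, so the share of a failed server is divided equally among the survivors; the function names and the 70% starting utilisation are hypothetical.

```python
def load_increase(total_servers: int, failed_servers: int) -> float:
    """Fractional load increase on each surviving server.

    Assumes traffic is spread evenly before and after the failure.
    """
    survivors = total_servers - failed_servers
    if total_servers <= 0 or failed_servers < 0 or survivors <= 0:
        raise ValueError("need at least one surviving server")
    return total_servers / survivors - 1.0


def utilisation_after_failure(current: float, total_servers: int, failed_servers: int) -> float:
    """Per-server utilisation (0.0 to 1.0 scale) once the failed servers' load moves over."""
    return current * (1.0 + load_increase(total_servers, failed_servers))


# Losing 1 of 5 servers moves its one-fifth share onto the remaining four.
print(f"{load_increase(5, 1):.0%}")  # 25%

# Hypothetical example: servers running at 70% before the failure.
print(f"{utilisation_after_failure(0.70, 5, 1):.1%}")
```

Running the same calculation for a second failure shows how quickly the remaining headroom disappears, which is the mechanism behind a cascading failure.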

KEY TAKEAWAYS
1. Embrace Failure and Design for Failure
2. Automate Recovery
3. Log all incidents and analyse them
4. Measure and graph the performance of all components
5. Regularly break things on purpose in order to test
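
"Automate Recovery" can be sketched as a loop that checks, applies a fix, re-checks, and escalates to a human only when the fix does not help. This is an illustrative pattern under assumed names (`recover`, `check`, `fix`, `max_attempts`), not the tooling used at SiteGround.

```python
from typing import Callable


def recover(check: Callable[[], bool], fix: Callable[[], None], max_attempts: int = 3) -> str:
    """Run a health check; on failure apply the fix and re-check, a bounded number of times."""
    if check():
        return "healthy"
    for attempt in range(1, max_attempts + 1):
        fix()
        if check():
            return f"recovered after {attempt} attempt(s)"
    # Bounded retries: give up and hand over to a person instead of looping forever.
    return "escalate"


# Simulated service that only comes back after the second restart.
state = {"up": False, "restarts": 0}


def fake_check() -> bool:
    return state["up"]


def fake_fix() -> None:
    state["restarts"] += 1
    state["up"] = state["restarts"] >= 2


print(recover(fake_check, fake_fix))  # recovered after 2 attempt(s)
```

Passing the check and the fix in as functions also makes it easy to break the simulated service on purpose and confirm that the recovery path works.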

RESOURCES
Injecting Failure at Netflix - goo.gl/YE1sEY
What is SRE - goo.gl/2lI8E0
SRE book - goo.gl/bfL2At
Netflix Open Source Software - https://netflix.github.io/
Etsy “Measure Everything” - goo.gl/CPVUT5

JOIN US FOR
CONTRIBUTION SPRINTS
First Time Sprinter Workshop - 9:00-12:00 - Room Wicklow2A
Mentored Core Sprint - 9:00-18:00 - Wicklow Hall 2B
General Sprints - 9:00-18:00 - Wicklow Hall 2A

Evaluate This Session
THANK YOU!
events.drupal.org/dublin2016/schedule
WHAT DID YOU THINK?



