High Availability
High Performance
Plone
Guido Stevens
guido.stevens@cosent.nl
www.cosent.nl
Social Knowledge Technology
Plone Worldwide
Resilience
Please wave, to improve my speech
Plone as usual
● Aspeli: über-buildout for a production Plone server
● Regebro: Plone-Buildout-Example
– nginx frontend
– varnish cache
– haproxy balancer
– 4x plone instance
– zeo backend
Plone as usual
– webserver :80
– caching
– balancing across Plone instances
– Plone instances
– ZEO backend
Meet the client
● High-profile internet technology NGO
● Slashdot traffic levels
– 0.4 million page views / peak day
– 4 million page views / month
– 40 million hits / month
● Mission-critical web presence
● 100% uptime previous 5 years
● Non-Plone sysadmins
● High security
No can do
SPOF
SPOF
WTF?
Architecture Goals
● Must convince “file-based 100% uptime” sysadmins
● No SPOF
– eliminate all Single Points Of Failure
● Automated failover
– no manual intervention
● Extreme performance
● Extreme resilience
– killall -9 Plone
Meet Paul Stevens
● My brother
● mod_wodan + DBmail
● Plone developer
● pjstevns on irc/github/etc
NFG Net Facilities Group
● premium hosting
● 24/7 MySQL HA
– since stone age
● www.nfg.nl
Plone as usual
3-tier
Plone as usual
Duplicate setup
Load Balancer
Load Balancer
● Client provided hardware load balancer
● Alternative: Linux Virtual Server + HAproxy
– 2x HAproxy in active/passive config
● this would be an EXTRA layer of HAproxy not shown in diagram
– use highly available “virtual” IP address
– monitor with Heartbeat or comparable
– fail over virtual IP address with arping broadcasts
● Alternative: AWS
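
The active/passive alternative above can be sketched as a small state machine. This is a toy model in Python, not how Heartbeat itself is implemented: the node names lb1 and lb2, the threshold of three missed heartbeats, and the class name are all assumptions made for illustration. Binding the virtual IP and sending the arping broadcast are reduced to a log entry.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualIPMonitor:
    """Toy model of an active/passive pair sharing one virtual IP."""
    nodes: tuple = ("lb1", "lb2")   # hypothetical node names
    max_missed: int = 3             # assumed threshold before failover
    active: str = "lb1"
    missed: int = 0
    log: list = field(default_factory=list)

    def heartbeat(self, alive: bool) -> str:
        """Record one heartbeat result for the active node; return the VIP holder."""
        if alive:
            self.missed = 0
            return self.active
        self.missed += 1
        if self.missed >= self.max_missed:
            standby = next(n for n in self.nodes if n != self.active)
            # A real setup would bind the virtual IP on the standby here and
            # announce it with gratuitous ARP; this sketch only logs the move.
            self.log.append(f"move VIP {self.active} -> {standby}")
            self.active = standby
            self.missed = 0
        return self.active
```

Requiring several consecutive misses before moving the address is a common way to avoid failing over on a single dropped packet; the right threshold depends on the heartbeat interval.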
Load Balancer
Ensure physical separation
● Ensure redundancy across physical servers
– no use to fail over on same machine
– separate machines in separate data centers
● Gotcha: moving virtuals around
– Disable HA facilities of virtualization platform
– We'll do our own HA
Full cluster
Replacing ZEO
ZEO versus Relstorage
● ZEO
– ZEO protocol
– filestorage
– object pickles
● ZRS Replication
– $$$ at the time
– later opensourced
● No hot-failover
– slave → master reconfig
● Relstorage
– no ZEO protocol: talks to the database directly
– MySQL or PostgreSQL
– object pickles: no alchemy!
● MySQL replication
– done that 24/7 since 2001
– widely used
● Hot failover
– multi-master
Relstorage on MySQL
Blobstorage
● Not shown in diagram
● Client provided Netapp Metrocluster NFS disks
– no need to care about replication and HA for those
● Alternatives:
– DRBD + NFS
– AWS Elastic Block Store
– fsniper + rsync + NFS
● Why not run database on that?
– disk replication + NFS + ZEO
– what can possibly go wrong?
Full cluster
Apache + Wodan
mod_wodan
● Caching module for Apache
– C
– Originally by ICS for nu.nl
– Now maintained by NFG
● Store response body + headers on disk
● BOFH attitude to caching policies
● Used in anger
● Alternative: stxnext.staticdeployment
Varnish ↔ Wodan
Varnish
● Proxy process
● In-memory (RAM) cache
– restart → empty cache
– expired → gone
● Plays nice
– request + response headers
– etag split-view
● purge API
– plone.app.caching
Wodan
● Apache module
● Persistent disk cache
– restart → full cache
– expired → keep fallback
● BOFH
– my way or the highway
– single cache file per page
● Cronjobs maintenance
– crawl sitemap
– delete removed pages
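
The cron maintenance step (crawl sitemap, delete removed pages) can be illustrated with a short Python sketch. It only computes which cached URLs have dropped out of the sitemap. How mod_wodan maps a URL to its cache file is not covered in these slides, so the deletion itself is left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the set of page URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def stale_entries(cached_urls, xml_text):
    """Cached URLs that no longer appear in the sitemap and may be deleted."""
    return sorted(set(cached_urls) - sitemap_urls(xml_text))
```

A cron job built on this would fetch the sitemap, request every listed URL through the cache to refresh it, and then remove the cache files for whatever stale_entries returns.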
Varnish plus Wodan
Varnish
● unload Plone
● plone.app.caching policies
– pages 1 hour
– resources longer
– purge on edit
● etag split-view
– per-user page versions
– cache authenticated
Wodan
● failsafe content delivery
● hard policy config
– pages 1 minute
– resources longer
– edit → 1-minute refresh
● Gotcha: anonymous only
– editors bypass Wodan
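
The difference between "expired → gone" and "expired → keep fallback" is what makes Wodan usable as a failsafe. Below is a minimal Python sketch of the keep-fallback behaviour. It is not mod_wodan's code (that is a C Apache module with a disk cache); the class name, the in-memory dict, and the injected fetch and clock callables are assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable, Optional


class FallbackCache:
    """Cache that keeps serving an expired entry when the backend fails."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch    # hypothetical backend call; raises on failure
        self.ttl = ttl        # seconds an entry counts as fresh
        self.clock = clock
        self.store = {}       # url -> (stored_at, body); a dict stands in for disk

    def get(self, url: str) -> Optional[str]:
        entry = self.store.get(url)
        now = self.clock()
        if entry and now - entry[0] < self.ttl:
            return entry[1]                      # fresh hit
        try:
            body = self.fetch(url)               # expired or missing: refresh
        except Exception:
            # Backend down: serve the stale copy if there is one.
            return entry[1] if entry else None
        self.store[url] = (now, body)
        return body
```

With a 60-second ttl this mirrors the "pages 1 minute" policy: edits show up within a minute while the backend is healthy, and the last good copy stays available when it is not.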
Failure Modes
Full cluster
MySQL failover
Multi Master MySQL
● multi-master
– cross replication
● each slaves the other
– any can be master
● hot failover and failback
● Gotcha: use only 1 master at a time
– Relstorage is not multi-master
– avoid replication errors
● mmm_agent server (not shown in diagram)
– monitors mysql health and replication
– manages virtual MySQL HA ip address
● think: Heartbeat for MySQL
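
The "only 1 master at a time" rule can be expressed as a small selection function. This Python sketch is not the MMM implementation. The node names and the sticky policy (stay on the current writer while it is healthy, so a recovered node does not automatically take the address back) are assumptions for illustration.

```python
from typing import Dict, Optional


def choose_writer(current: Optional[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the single node that should hold the writer virtual IP."""
    if current is not None and healthy.get(current, False):
        return current              # sticky: never switch away from a healthy writer
    for node in sorted(healthy):    # deterministic pick among the remaining nodes
        if healthy[node]:
            return node
    return None                     # no healthy master available
```

Because Relstorage always connects to the one virtual address, moving that address is the only switch needed; the other master keeps replicating and is ready to take over.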
Blade failure
Wodan only
Plone as usual
file-based content delivery
Readonly Rescue Mode
● File-based content delivery
– mod_wodan
– full cache of all pages + resources
– cached search results (Subject / tag cloud)
● AJAX-driven graceful degradation
– detect backend down via non-cached lightweight view
● @@ipaddress not a full page: minimal rendering overhead
– disable interactive elements via CSS
● search bar, personal tools → display:none
● Gotcha: anonymous only
– down for authenticated users until manual reconfig
● Gotcha: ErrorDocument
– pre-cache nice page but preserve HTTP error status code
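
The backend-down detection is done from the browser via AJAX in this setup. The same check is sketched here in Python, for example for a monitoring script. The @@ipaddress view name comes from the slide above; the function names, the 2-second timeout, and the CSS selectors are assumptions and would need to match the actual site theme.

```python
import urllib.request


def backend_is_up(base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Probe the uncached lightweight view; any error counts as 'down'."""
    url = base_url.rstrip("/") + "/@@ipaddress"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def rescue_css(up, selectors=("#searchbox", "#personal-tools")):
    """CSS that hides interactive elements while the backend is down.

    The default selectors are placeholders, not the real Plone element ids.
    """
    if up:
        return ""
    return ", ".join(selectors) + " { display: none; }"
```

The probe must bypass every cache layer. If the view were cached, the check would keep reporting a healthy backend long after it went down.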
No-downtime maintenance
Full cluster
cosent.nl/blog
