High-performance high-availability Plone

Presentation at the Plone Conference Brazil 2013.

How to create a Plone deployment that performs like crazy and survives not only a datacenter failure, but even keeps on running when all Plone heads are down.



  1. High Availability High Performance Plone · Guido Stevens · guido.stevens@cosent.nl · www.cosent.nl · Social Knowledge Technology
  2. Plone Worldwide
  3. Resilience
  4. Please wave to improve my speech
  5. Plone as usual
     ● Aspeli: über-buildout for a production Plone server
     ● Regebro: Plone-Buildout-Example
       – nginx frontend
       – varnish cache
       – haproxy balancer
       – 4x Plone instance
       – ZEO backend
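The ZEO + instances core of a stack like the one on this slide is typically wired together with zc.buildout. A minimal sketch, not the exact buildout from the talk (part names, ports, and credentials are illustrative; the nginx, varnish, and haproxy parts are omitted):

```ini
[buildout]
parts = zeo instance1 instance2 instance3 instance4

[zeo]
recipe = plone.recipe.zeoserver
zeo-address = 127.0.0.1:8100

[instance1]
recipe = plone.recipe.zope2instance
user = admin:admin
eggs = Plone
zeo-client = on
zeo-address = ${zeo:zeo-address}
http-address = 127.0.0.1:8081

# instances 2-4 reuse instance1 via buildout's macro syntax,
# overriding only the listening port
[instance2]
<= instance1
http-address = 127.0.0.1:8082

[instance3]
<= instance1
http-address = 127.0.0.1:8083

[instance4]
<= instance1
http-address = 127.0.0.1:8084
```

haproxy then balances across ports 8081-8084, varnish caches in front of haproxy, and nginx terminates port 80 in front of varnish.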
  6. Plone as usual
  7. Plone as usual: webserver :80
  8. Plone as usual: caching
  9. Plone as usual: balancing across Plone instances
  10. Plone as usual: Plone instances
  11. Plone as usual: ZEO backend
  12. Meet the client
      ● High-profile internet technology NGO
      ● Slashdot traffic levels
        – 0.4 million page views / peak day
        – 4 million page views / month
        – 40 million hits / month
      ● Mission-critical web presence
      ● 100% uptime over the previous 5 years
      ● Non-Plone sysadmins
      ● High security
  13. No can do: SPOF? SPOF? WTF?
  14. Architecture Goals
      ● Must convince “file-based 100% uptime” sysadmins
      ● No SPOF – eliminate all Single Points Of Failure
      ● Automated failover – no manual intervention
      ● Extreme performance
      ● Extreme resilience – killall -9 Plone
  15. Meet Paul Stevens
      ● My brother
      ● mod_wodan + DBmail
      ● Plone developer
      ● pjstevns on irc/github/etc
      NFG Net Facilities Group
      ● premium hosting
      ● 24/7 MySQL HA – since the stone age
      ● www.nfg.nl
  16. Plone as usual
  17. 3-tier
  18. Plone as usual
  19. Duplicate setup
  20. Load Balancer
  21. Load Balancer
      ● Client provided a hardware load balancer
      ● Alternative: Linux Virtual Server + HAProxy
        – 2x HAProxy in active/passive config
          ● this would be an EXTRA layer of HAProxy, not shown in the diagram
        – use a highly available “virtual” IP address
        – monitor with Heartbeat or comparable
        – fail over the virtual IP address with arping broadcasts
      ● Alternative: AWS
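The HAProxy alternative on this slide boils down to a small config: bind the shared virtual IP, health-check the tier below, and keep a hot standby. A sketch with illustrative addresses and ports, not the client's actual config:

```
# haproxy.cfg sketch – the active node binds the "virtual" IP;
# Heartbeat (or comparable) moves that IP to the passive node on failure
frontend www
    bind 192.0.2.10:80
    default_backend web_tier

backend web_tier
    balance roundrobin
    option httpchk GET /
    server web-a 10.0.0.11:80 check
    server web-b 10.0.0.12:80 check
```

The `check` keyword makes HAProxy poll each server with the `httpchk` request and drop dead servers from the rotation automatically, so no manual intervention is needed for failover within the tier.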
  22. Load Balancer
  23. Ensure physical separation
      ● Ensure redundancy across physical servers
        – no use failing over onto the same machine
        – separate machines in separate data centers
      ● Gotcha: virtuals being moved around
        – disable the HA facilities of the virtualization platform
        – we'll do our own HA
  24. Full cluster
  25. Replacing ZEO
  26. ZEO versus RelStorage
      ● ZEO
        – ZEO protocol
        – filestorage
        – object pickles
      ● ZRS replication
        – $$$ at the time
        – later open-sourced
      ● No hot failover
        – slave → master reconfig
      ● RelStorage
        – ZEO protocol
        – MySQL or PostgreSQL
        – object pickles: no alchemy!
      ● MySQL replication
        – done that 24/7 since 2001
        – widely used
      ● Hot failover
        – multi-master
  27. RelStorage on MySQL
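With RelStorage, the ZODB stanza in zope.conf points at MySQL instead of a ZEO server. A rough sketch (host, database name, and credentials are placeholders; the host would be the virtual MySQL HA address discussed later):

```
%import relstorage

<zodb_db main>
    mount-point /
    <relstorage>
        <mysql>
            host   10.0.0.21
            db     zodb
            user   zope
            passwd secret
        </mysql>
    </relstorage>
</zodb_db>
```

Because Plone still sees an ordinary ZODB of object pickles, no application code changes are needed: the relational database is just a pickle store, not an object-relational mapping.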
  28. Blobstorage
      ● Not shown in diagram
      ● Client provided NetApp MetroCluster NFS disks
        – no need to care about replication and HA for those
      ● Alternatives:
        – DRBD + NFS
        – AWS Elastic Block Store
        – fsniper + rsync + NFS
      ● Why not run the database on that?
        – disk replication + NFS + ZEO
        – what can possibly go wrong?
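In a setup like this, every Plone node mounts the same replicated NFS export for blobs. A sketch with illustrative server and path names:

```
# /etc/fstab on every Plone node – one shared, HA-replicated blob export
filer:/vol/plone_blobs  /srv/plone/blobstorage  nfs  rw,hard,intr  0 0
```

Each zope2instance buildout part would then point `blob-storage` at `/srv/plone/blobstorage` and set `shared-blob = on`, so instances read blobs straight from the shared filesystem instead of streaming them through the storage server.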
  29. Full cluster
  30. Apache + Wodan
  31. mod_wodan
      ● Caching module for Apache
        – C
        – originally by ICS for nu.nl
        – now maintained by NFG
      ● Stores response body + headers on disk
      ● BOFH attitude to caching policies
      ● Used in anger
      ● Alternative: stxnext.staticdeployment
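mod_wodan's own directives are not widely documented, so as a rough stand-in, here is the same idea – persistent on-disk caching of proxied responses inside Apache, with stale fallback – expressed with the stock Apache 2.4 modules. This is explicitly not mod_wodan itself, only an illustration of the technique:

```
# Illustrative stand-in using stock mod_cache_disk, not mod_wodan
LoadModule cache_module       modules/mod_cache.so
LoadModule cache_disk_module  modules/mod_cache_disk.so

CacheRoot          /var/cache/apache2
CacheEnable        disk /
CacheDefaultExpire 60       # pages: 1 minute, mirroring the Wodan policy
CacheStaleOnError  On       # serve the stale disk copy if the backend is down

ProxyPass        / http://127.0.0.1:6081/
ProxyPassReverse / http://127.0.0.1:6081/
```

`CacheStaleOnError` is what approximates Wodan's key property here: the disk cache survives restarts and keeps serving expired content when the backend has died, instead of discarding it.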
  32. Varnish ↔ Wodan
      Varnish:
      ● proxy process
      ● RAM memory cache
        – restart → empty cache
        – expired → gone
      ● plays nice
        – request + response headers
        – etag split-view
      ● purge API
        – plone.app.caching
      Wodan:
      ● Apache module
      ● persistent disk cache
        – restart → full cache
        – expired → keep fallback
      ● BOFH
        – my way or the highway
        – single cache file per page
      ● cronjob maintenance
        – crawl sitemap
        – delete removed pages
  33. Varnish plus Wodan
      Varnish:
      ● unload Plone
      ● plone.app.caching policies
        – pages: 1 hour
        – resources: longer
        – purge on edit
      ● etag split-view
        – per-user page versions
        – cache authenticated
      Wodan:
      ● failsafe content delivery
      ● hard policy config
        – pages: 1 minute
        – resources: longer
        – edit → 1-minute refresh
      ● Gotcha: anonymous only
        – editors bypass Wodan
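The Varnish half of these policies can be sketched in Varnish 3-era VCL (current at the time of the talk). TTLs follow the slide; backend address, ACL, and URL pattern are illustrative, and in practice plone.app.caching drives most of this via response headers rather than hard-coded TTLs:

```
# Varnish 3 VCL sketch of the policies above
backend plone { .host = "127.0.0.1"; .port = "8081"; }

acl purgers { "127.0.0.1"; }

sub vcl_recv {
    # plone.app.caching purges edited pages through the purge API
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) { error 405 "Not allowed"; }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") { purge; error 200 "Purged"; }
}

sub vcl_fetch {
    if (req.url ~ "\.(css|js|png|jpg|gif)$") {
        set beresp.ttl = 24h;   # resources: longer
    } else {
        set beresp.ttl = 1h;    # pages: 1 hour
    }
}
```

The etag split-view on the slide means Varnish can even cache authenticated traffic safely: plone.app.caching varies the ETag per user/role, so each class of user hits its own cached variant.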
  34. Failure Modes
  35. Full cluster
  36. MySQL failover
  37. Multi-Master MySQL
      ● multi-master
        – cross replication: each slaves the other
        – any can be master
      ● hot failover and failback
      ● Gotcha: use only 1 master at a time
        – RelStorage is not multi-master
        – avoid replication errors
      ● mmm_agent server (not shown in diagram)
        – monitors MySQL health and replication
        – manages the virtual MySQL HA IP address
        – think: Heartbeat for MySQL
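The cross-replication on this slide comes down to a few lines of my.cnf on each node. A sketch for node A (node B mirrors it with `server-id = 2` and `auto_increment_offset = 2`):

```ini
# my.cnf sketch on MySQL node A
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
log-slave-updates        = 1
# interleave auto-increment values so the two masters can never
# hand out the same id, even if both briefly accept writes
auto_increment_increment = 2
auto_increment_offset    = 1
```

Each node is then pointed at the other with `CHANGE MASTER TO`, and the MMM tooling moves the writer's virtual IP between them, which is what keeps the "only 1 master at a time" rule intact for RelStorage.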
  38. Blade failure
  39. Wodan only
  40. Plone as usual: file-based content delivery
  41. Readonly Rescue Mode
      ● File-based content delivery
        – mod_wodan
        – full cache of all pages + resources
        – cached search results (Subject / tag cloud)
      ● AJAX-driven graceful degradation
        – detect backend down via a non-cached lightweight view
          ● @@ipaddress is not a full page: minimal rendering overhead
        – disable interactive elements via CSS
          ● search bar, personal tools → display:none
      ● Gotcha: anonymous only
        – down for authenticated users until manual reconfig
      ● Gotcha: ErrorDocument
        – pre-cache a nice page, but preserve the HTTP error status code
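The ErrorDocument gotcha is worth spelling out: Apache can serve a friendly pre-rendered page while still returning the real error status, as long as the ErrorDocument is a local path. Paths here are illustrative:

```
# Serve a pre-cached "read-only mode" page when the whole backend is gone;
# a local-path ErrorDocument keeps the original 502/503 status code
ErrorDocument 502 /down.html
ErrorDocument 503 /down.html
Alias /down.html /var/www/rescue/down.html
```

Using a full URL instead of a local path would make Apache issue a redirect, turning the error into a 302 and hiding the outage from monitoring and crawlers.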
  42. No-downtime maintenance
  43. Full cluster
  44. cosent.nl/blog