LCache DrupalCon Dublin 2016

Scalable caching in Drupal is broken. Once cache access saturates a network link, the main options are Memcache sharding (which has broken coherency during and after network splits) and Redis clustering (immature in multi-master and as complex as MySQL replication in master/replica modes).

We can do better. We can have better performance, scale, and operational simplicity. We just need to take a lesson from multicore processor architectures and their use of L1/L2 caches. Drupal doesn't even need full-scale coherency management; it just needs the cache writes on an earlier request to be guaranteed readable on a later request.

Published in: Engineering

  1. Intelligent, Tiered, Scalable Caching with LCache
  2. Existing Caching Challenges
  3. Pantheon.io Traditional Web Caching
     [Diagram: four web servers send cache traffic to a single Redis or Memcache instance; a bottleneck is marked.]
  4. The Anatomy of a Bottleneck
  5. Scaling Traditional Web Caching
     [Diagram: four web servers send cache traffic to two Redis or Memcache instances.]
     ● Use replication?
       ○ Failover issues
       ○ Replication lag or slow writes
     ● Use sharding?
       ○ Consistency issues
     ● Still network-bound
  6. Proudly Designed Elsewhere: Employing Known Solutions
  7. Existing Solutions: Multi-Core Processors
  8. Existing Solutions: Pantheon’s Valhalla
     [Diagram: three application containers, each with a file mount and a cache; three file servers; labels “Writes” and “Events”.]
  9. Existing Solutions: MySQL Row Replication
     [Diagram: Application, MySQL Primary, and MySQL Replica, with labels “SQL” and “Row Changes (No SQL)”.]

         shell> mysqlbinlog -vv log_file
         ...
         # at 302
         #080828 15:03:08 server id 1 end_log_pos 356 Update_rows: table id 17 flags: STMT_END_F

         BINLOG '
         fAS3SBMBAAAALAAAAC4BAAAAABEAAAAAAAAABHRlc3QAAXQAAwMPCgIUAAQ=
         fAS3SBgBAAAANgAAAGQBAAAQABEAAAAAAAEAA////AEAAAAFYXBwbGX4AQAAAARwZWFyIbIP
         '/*!*/;
         ### UPDATE test.t
         ### WHERE
         ###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
         ###   @2='apple' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
         ###   @3=NULL /* VARSTRING(20) meta=0 nullable=1 is_null=1 */
         ### SET
         ###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
         ###   @2='pear' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
         ###   @3='2009:01:01' /* DATE meta=0 nullable=1 is_null=0 */

     Keeps Replication Simple!
  10. “Because it’s faster, of course.”
      ● Inspired by multicore processors
        ⌾ Get the working set close to the work
        ⌾ Trade some write performance and scale for massive read gains
        ⌾ Hide the coherency management
      ● Inspired by Pantheon’s Valhalla file system
        ⌾ Write-through: clients can leave at any point
        ⌾ Incremental changes freshen the local cache
        ⌾ Only as read-after-write consistent as it needs to be
      ● Inspired by MySQL row-based replication
        ⌾ Materialize complex tag deletion on the primary instance and only replicate the key-based changes
  11. Contrast: ChainedFastBackend
      ● Beginning of Request
        ⌾ LCache: Synchronizes cache writes and bin/key invalidations. One SELECT query.
        ⌾ ChainedFastBackend: Updates bin invalidation data. One SELECT query.
      ● Read Key
        ⌾ LCache: Reads local cache. If the key does not exist in the local cache, reads consistent cache. No query if hitting local cache.
        ⌾ ChainedFastBackend: Reads from local cache. No query if hitting local cache.
      ● Write or Invalidate Key
        ⌾ LCache: Writes to local and consistent caches. One INSERT query.
        ⌾ ChainedFastBackend: Writes to local and consistent caches. Invalidates entire bin in all local caches. Up to two queries per write.
      ● Invalidate Tag
        ⌾ LCache: Writes to consistent cache and generates key invalidations. Multiple queries.
        ⌾ ChainedFastBackend: Writes to consistent cache. Invalidates entire bin in local caches.
      ● End of Request
        ⌾ LCache: Garbage-collects deletions. Executes one batched DELETE query (if cache writes have occurred) after request closes.
        ⌾ ChainedFastBackend: No activity.
  12. Challenges and Solutions
  13. Unexpected Issues
      ● Sites write to caches very often
        ⌾ Seeing 10-40 cache “sets” per page
        ⌾ LCache’s “sets” are expensive (transactional database plus replication to clients)
        ⌾ Most modules assume a miss is a good reason to “set.”
        ⌾ Some cache items see more “sets” than “gets.”
      ● Using tags for bins was not fast enough
        ⌾ Relational model created too much overhead
        ⌾ Materializing the clearing of a whole bin wasn’t efficient (replicated many, many changes)
        ⌾ Moved to native bin support
  14. Write Models to Optimize the “Set” Path
      [Charts: Low Splay (each write to a random choice of 64 keys) and High Splay (each write to a random choice of 4096 keys); 10 processes ✕ 40 writes each. Annotations: “Winner here!” and “And not worse here!”]
  15. Machine Learning: Avoiding Useless “Sets”

          Loading iterator...
          Iterating...
          Array
          (
              [lcache:10.223.176.176:18341:5:cache:environment_indicator] => 5634
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:taxonomy_term] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_8] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_1] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_2] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_3] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:calendar] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_5] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:redirects] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:backlinks] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_7] => 3037
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_6] => 3036
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:frontpage] => 3036
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_4] => 3036
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:agency_search] => 3036
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:glossary] => 3036
              [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:campaign] => 3036

      LCache starts ignoring at 100 now
  16. Configuration: Assigning Bins and Keys
      ● Better with LCache
        ⌾ Frequently read
        ⌾ Rarely written
        ⌾ Large
      ● Worse (or not ideal) with LCache
        ⌾ Read once or not at all (e.g. form cache should use normal database cache backend)
        ⌾ Things handleable earlier in the stack (e.g. Varnish instead of Drupal’s page cache)
        ⌾ Keys updated often (partly mitigated with machine learning)
        ⌾ Clearing 100+ keys with a tag (because of replication)
  17. Built for Reliability
  18. Test-Driven Development
  19. Composer-Based Library
  20. Lightweight Adapters for Frameworks
      ● Stateless
      ● Composer inclusion of the LCache library
      ● Modules and extensions
        ⌾ Drupal 7 module
        ⌾ Drupal 8 module
        ⌾ WordPress drop-in
      ● Drupal 8.3+ core?
  21. Performance and Scalability
  22. Comparing Against Redis: Performance
  23. Comparing Against Redis: Concurrency
  24. Going Live: Performance
  25. Going Live: Impact on Databases
  26. Next Steps
  27. Further Performance Improvements
      ● Try mysqli with asynchronous queries for the initial synchronization.
        ⌾ Upside: No synchronous wait on obtaining events.
        ⌾ Downside: Yet another database connection.
      ● Synchronize (again) in the destructor after the request closes.
        ⌾ Upside: Potentially handles some events without users waiting.
        ⌾ Downside: Additional database queries.
      ● SQLite L1 cache
        ⌾ Upside: Persists across PHP-FPM restarts. Useful with CLI. Cache can be larger than memory.
        ⌾ Downside: Slower writes. Possible lock contention.
  28. Ambitions for Core
      ● ChainedFastBackend isn’t going to cut it.
        ⌾ Not usable for most cache bins.
        ⌾ Administrators need to carefully choose when to introduce it.
        ⌾ Degrades rapidly on cache writes.
      ● Even just the LCache L2 component is faster than Drupal’s built-in caches.
        ⌾ INSERT-only model is a big win.
        ⌾ LCache can use a Null L1 seamlessly.
      ● Relying on Composer-based libraries is widespread in Drupal 8.
      ● A default cache for most bins
  29. PSR-6 and PSR-16
      ● PSR-6
        ⌾ No concept of cache tags, an essential part of Drupal 8 caching.
        ⌾ No concept of retrieving invalidated items. (Not supported in LCache yet, but supported by Drupal 8.)
        ⌾ Interesting concept of deferred persistence.
      ● PSR-16
        ⌾ Counter interface wouldn’t be consumed by Drupal 8 (but would be by WordPress).
        ⌾ Mostly built on PSR-6.
  30. Questions?
      @DavidStrauss, david@pantheon.io
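
The slides on unexpected issues and on avoiding useless “sets” describe keys that are written far more often than they are read, and note that LCache “starts ignoring at 100.” One way such a heuristic could work is sketched below. This is a guess at the general shape, not LCache's implementation: the class `OverheadAwareCache`, the per-key counter, and the rule that a write adds one and a hit subtracts one are all assumptions made for illustration; only the threshold of 100 comes from the slide.

```python
from collections import defaultdict
from typing import Any, Dict


class OverheadAwareCache:
    """Hypothetical cache that stops storing keys whose writes outpace reads."""

    def __init__(self, threshold: int = 100) -> None:
        self.threshold = threshold
        self.store: Dict[str, Any] = {}
        # Assumed bookkeeping: +1 per write, -1 per hit, tracked per key.
        self.overhead: Dict[str, int] = defaultdict(int)
        self.skipped_sets = 0

    def set(self, key: str, value: Any) -> bool:
        """Store the value unless the key has proven write-heavy."""
        self.overhead[key] += 1
        if self.overhead[key] >= self.threshold:
            # Drop any older copy so a skipped write never leaves stale data.
            self.store.pop(key, None)
            self.skipped_sets += 1
            return False
        self.store[key] = value
        return True

    def get(self, key: str) -> Any:
        if key in self.store:
            self.overhead[key] -= 1  # a hit pays back one write
            return self.store[key]
        return None


cache = OverheadAwareCache()
for i in range(150):
    cache.set("views:frontpage", i)  # written repeatedly, never read
print(cache.skipped_sets, cache.get("views:frontpage"))
```

In this sketch an ignored key stays ignored, because it is never stored again and so can never earn a hit. A real implementation would need some way to decay or reset the counter.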
