Efficient Shared Data in Perl


  1. Efficient Shared Data in Perl
       • Perrin Harkins
  2. What’s your problem?
       • Apache is multi-process
       • Process assignment is random
       • Information wants to be shared
       • Inter-process data sharing is ad hoc
  3. Sharing is good for
       • Sessions
       • Caching
       • Usually transient data
       • Otherwise, use a RDBMS
  4. Approaches
       • Files
           • One big file
           • One file per record
       • DBM
       • Shared memory
           • Seems like the obvious choice, but…
       • RDBMS
  5. Playing well together
       • Atomic updates
           • Prevents corruption
       • Exclusive locking
           • Prevents lost updates
           • Without this, last save wins
     [Figure: lost-update illustration; surviving labels: Perl Fund, Blossom, Buttercup, $100, $105, $2100, $100]
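
The two properties on this slide can be sketched with core modules only. This is a minimal illustration, not code from the talk: the helper names `atomic_write` and `locked_add` and the side lock file `"$path.lock"` are assumptions made for the example. The atomic update writes a complete new copy and renames it into place; the exclusive lock is held across the whole read-modify-write cycle so that a second process cannot overwrite the first one's change.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Atomic update: write a full new copy next to the target, then rename() it
# into place. On POSIX filesystems a rename within one filesystem is atomic,
# so readers see either the old file or the new one, never a partial write.
sub atomic_write {
    my ($path, $data) = @_;
    my ($fh, $tmp) = tempfile(DIR => dirname($path));
    print {$fh} $data or die "write $tmp: $!";
    close($fh) or die "close $tmp: $!";
    rename($tmp, $path) or die "rename $tmp -> $path: $!";
    return;
}

# Exclusive locking: take the lock BEFORE reading and release it only after
# the new value is saved. Without it, two writers can both read the same old
# value and the last save wins.
sub locked_add {
    my ($path, $amount) = @_;
    open(my $lock, '>>', "$path.lock") or die "open $path.lock: $!";
    flock($lock, LOCK_EX) or die "flock $path.lock: $!";

    my $balance = 0;
    if (open(my $in, '<', $path)) {
        $balance = <$in> // 0;
        chomp $balance;
        close($in);
    }
    $balance += $amount;
    atomic_write($path, "$balance\n");

    close($lock);    # closing the handle releases the lock
    return $balance;
}
```

Every process that touches the file has to go through the same lock for this to help; `flock` is advisory and does nothing against writers that ignore it.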
  6. Cache::Cache
       • Consistent interface to multiple storage methods
           • File system
           • Shared memory via IPC::ShareLite
       • Many cache-related features built in
           • Expiration times
           • Size limit
           • Multiple namespaces
  7. Cache::Cache, continued
       • Atomic updates
       • Easy to install
           • No compiler needed for file-based storage
       • Benchmarks are on backend storage classes
           • Cache::FileBackend, not Cache::FileCache
  8. Cache::Mmap
       • Uses one big mmap’ed file
       • Many tuning options
           • Size of blocks
           • Size of locking regions
       • Optimization for scalar data
       • Uses locks internally
       • Requires compiler
  9. MLDBM::Sync
       • Extension of MLDBM
           • Originally developed for Apache::ASP
           • Uses lock file, tie/untie
       • Choice of DBM types
           • SDBM is fastest, but limited
       • Tied interface
       • Locks on entire database
       • Explicit locking in API
       • Can run with standard library
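
The "lock file, tie/untie" pattern named on this slide can be sketched with modules from the standard library. This is not MLDBM::Sync's implementation or API: the helper `with_locked_dbm`, the `.lock` file name, and the `sid42` key are hypothetical, and serialization is done by hand with Storable. The idea is to take a lock on a side file, tie the DBM only while the lock is held, and untie before releasing it, so no process keeps stale DBM buffers between requests.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);
use SDBM_File;
use Storable qw(freeze thaw);

# Hypothetical helper: run $code against the DBM hash while holding a lock
# on a side file. LOCK_EX opens read-write, LOCK_SH opens read-only.
sub with_locked_dbm {
    my ($file, $lock_mode, $code) = @_;
    open(my $lock, '>>', "$file.lock") or die "open $file.lock: $!";
    flock($lock, $lock_mode) or die "flock $file.lock: $!";

    my $flags = $lock_mode == LOCK_EX ? O_RDWR | O_CREAT : O_RDONLY;
    my %db;
    tie(%db, 'SDBM_File', $file, $flags, 0666) or die "tie $file: $!";

    my @result = eval { $code->(\%db) };
    my $err = $@;

    untie %db;       # flush and close the DBM before giving up the lock
    close($lock);    # closing the handle releases the lock
    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Usage: a locked read-modify-write on a complex value, run twice.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sessions";
for (1 .. 2) {
    with_locked_dbm($file, LOCK_EX, sub {
        my ($db) = @_;
        my $session = defined $db->{sid42} ? thaw($db->{sid42}) : { hits => 0 };
        $session->{hits}++;
        $db->{sid42} = freeze($session);   # SDBM limits each pair to roughly 1 KB
    });
}
my $hits = with_locked_dbm($file, LOCK_SH, sub { thaw($_[0]{sid42})->{hits} });
print "hits: $hits\n";
```

Because the lock covers the whole database, writers are serialized; that matches the slide's "locks on entire database" and is the trade-off for the simplicity of this approach.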
  10. BerkeleyDB
       • Not DB_File, BerkeleyDB.pm
       • Requires Berkeley DB library from sleepycat.com
       • Tricky to install on some systems
       • Tied or OO interface
       • No built-in support for complex data structures
       • Locks on entire database or on pages
       • Supports transactions
       • Shared memory cache
       • Tests are on BTree
  11. IPC::MM
       • Interface for Engelschall’s mm
       • Implements shared BTree and Hash in C
       • Tied interface
       • Data is not persistent
       • Only shares between related processes
  12. Tie::TextDir
       • Dirt-simple: one record per file
       • Keys must be legal file names
       • No compiler needed
       • Doesn’t handle complex data structures
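
To make the one-record-per-file idea concrete, here is a minimal sketch in pure Perl. It is not Tie::TextDir's code or interface: the package name `TextDirSketch`, its `store`/`fetch` methods, and the conservative key pattern are assumptions for illustration. It shows why keys must be legal file names (the key is used directly as the file name) and why only scalar data fits without an extra serialization layer.

```perl
use strict;
use warnings;

package TextDirSketch {    # hypothetical name, not the CPAN module

    sub new {
        my ($class, $dir) = @_;
        unless (-d $dir) {
            mkdir($dir) or die "mkdir $dir: $!";
        }
        return bless { dir => $dir }, $class;
    }

    # The key becomes the file name, so reject anything that could escape
    # the directory. This pattern is stricter than "legal file name".
    sub _check_key {
        my ($key) = @_;
        die "key '$key' is not usable as a file name\n"
            unless $key =~ /\A[A-Za-z0-9_][A-Za-z0-9_.-]*\z/;
        return $key;
    }

    sub store {
        my ($self, $key, $value) = @_;
        _check_key($key);
        # Write to a dot-prefixed temp file (never a valid key), then rename.
        my $tmp = "$self->{dir}/.tmp.$$.$key";
        open(my $fh, '>', $tmp) or die "open $tmp: $!";
        print {$fh} $value;
        close($fh) or die "close $tmp: $!";
        rename($tmp, "$self->{dir}/$key") or die "rename $tmp: $!";
        return;
    }

    sub fetch {
        my ($self, $key) = @_;
        _check_key($key);
        open(my $fh, '<', "$self->{dir}/$key") or return;
        local $/;              # slurp the whole file
        my $value = <$fh>;
        return $value;
    }
}
```

A caller would create one object per directory, for example `TextDirSketch->new("/tmp/records")`, and store plain strings; complex data would first have to be serialized to a string by the caller.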
  13. IPC::Shareable
       • Very Perlish and transparent
       • Shared memory
       • Lots going on under the hood
       • Explicit locking supported
       • Tied interface
       • Requires a compiler
  14. DBD::SQLite
       • Fast, single-file SQL engine in a DBD
       • Full transaction support!
       • Locking between processes at database level
  15. DBD::mysql
       • Adds network capabilities
       • Atomic updates or transactions
       • More work than most to set up
  16. memcached
       • Networked daemon
       • Intended for clusters
       • Non-blocking I/O
       • Clients for Perl, PHP, Java
       • Requires a Linux kernel patch, until 2.6 is out
  17. Testing Methodology
       • P4 2.53 GHz, 512MB RAM, Red Hat 9, ext3, Perl 5.8.0
       • Abstraction layer IPC::SharedHash
           • Implements new(), fetch(), store()
           • Handles serialization where necessary
           • Calls FETCH() and STORE() instead of using tied interface
       • mod_perl handler
       • ab (Apache Bench)
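
The abstraction layer described on this slide is not shown in the deck, so the following is only a guess at its shape. The package name `SharedHashSketch`, the `file` and `serialize` arguments, and the choice of SDBM_File as the backend are all assumptions. It illustrates the three listed points: a `new()`/`fetch()`/`store()` interface, serialization only where the backend needs it, and direct `FETCH()`/`STORE()` method calls on the tied object rather than going through the tied hash.

```perl
use strict;
use warnings;

package SharedHashSketch {    # hypothetical stand-in for the talk's wrapper
    use Fcntl qw(O_RDWR O_CREAT);
    use SDBM_File;
    use Storable qw(freeze thaw);

    sub new {
        my ($class, %args) = @_;
        my %hash;
        # tie() returns the underlying object; keep it and skip the hash.
        my $tied = tie(%hash, 'SDBM_File', $args{file}, O_RDWR | O_CREAT, 0666)
            or die "tie $args{file}: $!";
        return bless { tied => $tied, serialize => $args{serialize} }, $class;
    }

    sub fetch {
        my ($self, $key) = @_;
        my $raw = $self->{tied}->FETCH($key);    # direct method call
        return $raw unless defined $raw && $self->{serialize};
        return thaw($raw);
    }

    sub store {
        my ($self, $key, $value) = @_;
        # With serialize on, $value must be a reference (Storable requirement).
        $value = freeze($value) if $self->{serialize};
        $self->{tied}->STORE($key, $value);
        return;
    }
}
```

A benchmark handler could then call `$shared->store($key, $data)` and `$shared->fetch($key)` identically for every backend, with only the constructor differing. Note that this sketch does no locking, which a real multi-process test would need.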
  18. Variables
       • Number of parallel clients
       • Percentage of writes
           • Sessions can have a lot of writes
           • Caches are mostly read, by definition
       • Locality of access
       • Scalars vs. complex data
  19. Read-Only Sharing
  20. Effect of Increasing Clients
  21. Effect of Read/Write Ratio
  22. Scalars vs. Complex Data Structures
  23. Latest Results
  24. Analysis
       • Why is shared memory so slow?
           • Still has to serialize
           • Moving too much data at once
       • What about IPC::MM?
           • Moves one at a time
           • Moving parts are in C
       • Why is the file system so fast?
           • Modern VM system
           • Kernel-managed caching
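
One way to read "still has to serialize" and "moving too much data at once" is that a store keeping the whole structure in a single serialized blob must thaw and re-freeze every record to change one of them. The sketch below compares that with per-record serialization, using Storable and made-up data; it does not use any of the benchmarked modules, and the sub names are hypothetical. Timings depend on the machine, so none are quoted here.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Time::HiRes qw(time);

# 1000 small records standing in for a shared cache (made-up data).
my %cache = map { ("key$_" => { id => $_, name => "user$_" }) } 1 .. 1000;

# Strategy A: the whole hash lives in one serialized blob.
my $blob = freeze(\%cache);
sub update_in_blob {
    my ($key) = @_;
    my $all = thaw($blob);     # deserialize every record
    $all->{$key}{id}++;
    $blob = freeze($all);      # serialize every record again
    return;
}

# Strategy B: each record is serialized on its own.
my %records = map { ($_ => freeze($cache{$_})) } keys %cache;
sub update_one_record {
    my ($key) = @_;
    my $rec = thaw($records{$key});
    $rec->{id}++;
    $records{$key} = freeze($rec);
    return;
}

# Wall-clock seconds taken by $count calls of $code.
sub seconds_for {
    my ($count, $code) = @_;
    my $start = time;
    $code->() for 1 .. $count;
    return time - $start;
}

printf "whole blob: %d bytes, one record: %d bytes\n",
    length($blob), length($records{key500});
printf "200 updates via blob:   %.4f s\n",
    seconds_for(200, sub { update_in_blob('key500') });
printf "200 updates per record: %.4f s\n",
    seconds_for(200, sub { update_one_record('key500') });
```

The byte counts alone show how much more data strategy A moves per update, which is consistent with the slide's point that IPC::MM benefits from moving one item at a time.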
  25. Analysis
       • Why is Tie::TextDir faster than Cache::FileBackend?
           • Digest::SHA1
           • Splitting into multiple directories not normally necessary on modern filesystems:
           • /mu/lt/ip/ledirs
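
The extra work this slide attributes to the hashed file layout can be sketched as follows. This is not Cache::FileBackend's actual code: the function names, the two-character directory pieces (modelled on the slide's `/mu/lt/ip/ledirs` example), and the default depth of 3 are assumptions, and the core module Digest::SHA is used here in place of Digest::SHA1 since it offers the same `sha1_hex` function.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hashed layout: digest the key, peel off $depth two-character directory
# names, and use the rest of the digest as the file name.
sub hashed_path {
    my ($root, $key, $depth) = @_;
    $depth //= 3;                           # assumed default
    my $digest = sha1_hex($key);            # 40 hex characters
    my @dirs = map { substr($digest, $_ * 2, 2) } 0 .. $depth - 1;
    my $rest = substr($digest, $depth * 2);
    return join('/', $root, @dirs, $rest);
}

# Flat layout: the key itself is the file name, with no digest to compute
# and no intermediate directories to create or traverse.
sub flat_path {
    my ($root, $key) = @_;
    return "$root/$key";
}

print hashed_path('/tmp/cache', 'session_1'), "\n";
print flat_path('/tmp/cache', 'session_1'), "\n";
```

The hashed form accepts any key and spreads files across directories, at the cost of a digest computation and deeper paths on every access; the flat form skips both but requires keys that are valid file names.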
  26. Problems with this test
       • Size of values not considered
       • Size of overall hash not considered correctly
       • BerkeleyDB should be tested with fancier lock mode
       • Needs a real network test for memcached and MySQL
       • Should try harder to reduce margin of error
  27. A Word About Clustering
       • Shared filesystems
           • NFS
           • Samba/CIFS
       • RDBMS
           • Most reliable, well understood, easy integration
       • Replicated data
           • Multicast
           • Spread
  28. What about threads?
       • Apache 2/mod_perl 2/Perl 5.8 bring threads to the table
       • Still not clear how this will work with complex data structures and objects
       • Threaded performance is mostly bad in 5.8
  29. Questions to help you choose
       • Do you need to store complex data?
           • BerkeleyDB, Tie::TextDir, and IPC::MM require a wrapper for this
       • Are your keys valid filenames?
           • Tie::TextDir does not hash the keys
       • Do you need persistence?
           • IPC::MM is not persistent
       • Do you need explicit locking?
           • MLDBM::Sync, MySQL, BerkeleyDB
  30. Questions to help you choose
       • No compiler?
           • Cache::FileBackend, Tie::TextDir, MLDBM::Sync if you have Storable
       • Need clustering?
           • DBD::mysql, memcached
