Clustered PHP - DC PHP 2009

Presentation from DC PHP 2009

Published in: Technology

  1. Clustered PHP
     DC PHP 2009
     by Marcel Esser – marcel.esser@croscon.com
     Senior Developer & Systems Analyst, CROSCON
  2. what is clustering?
     “A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.”
     - Wikipedia
  3. what is clustering?
     “A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.”
     - Wikipedia
     …in other words, it’s really hard.
  4. what is clustering not?
     Clustering is not high availability.
     … but you often achieve high availability with clustering.
  5. what is clustering not?
     Clustering is not high availability.
     … but you often achieve high availability with clustering.
     Clustering is not high volume.
     … but you often achieve high volume with clustering.
  6. what is clustering not?
     Clustering is not high availability.
     … but you often achieve high availability with clustering.
     Clustering is not high volume.
     … but you often achieve high volume with clustering.
     We want both.
  7. also…
     clustering is really, really hard
  8. some do it well
  9. some … eh
     well, never mind.
  10. why cluster?
      service more users
  11. why cluster?
      service more users
      service users faster
  12. why cluster?
      service more users
      service users faster
      increase reliability
  13. why cluster?
      service more users
      service users faster
      increase reliability
      get rich
  14. objectives
      linear capacity increase
  15. 1 server - 100 lolcats per second
  16. 2 servers - 200 lolcats per second
  17. 3 servers - 300 lolcats per second
  18. objectives
      linear capacity increase
      linear cost increase
  19. 100 lolcats - $100
  20. 200 lolcats - $200
  21. 300 lolcats - $300
  22. objectives
      linear capacity increase
      linear cost increase
      exponential reliability increase
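The "exponential reliability increase" objective can be made concrete with a back-of-the-envelope model. This is only a sketch under a strong assumption the slides do not state: nodes fail independently, and the service counts as up while at least one node is up. The function name `cluster_availability` and the 99% per-node figure are illustrative, not from the talk.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The cluster is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Assumed per-node availability of 99% (illustrative figure).
    for n in (1, 2, 3):
        print(n, "node(s):", round(cluster_availability(0.99, n), 6))
```

Under this model the chance of total outage is the single-node failure rate raised to the number of nodes, which is where the "exponential" comes from. Real clusters share failure modes (network, power, bad deploys), so treat the result as an upper bound.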
  23. common topics in clustering php
      • Load Balancing
      • Database Scaling
      • Replicated Storage
      • Backups
      • Data Caches
      • Distributed Sessions
      • Staging Strategies
      • Debugging
      • Background Services
  31. load balancing
      [diagram: Client → Load Balancer → Servers]
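To make the role of the balancer between client and servers concrete, here is a minimal sketch of one possible selection policy: weighted round robin that skips nodes marked unhealthy. The `Node` and `WeightedRoundRobin` names, the weights, and the `healthy` flag are all hypothetical; real load balancers implement this (and much more) internally.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int = 1       # relative share of traffic (illustrative)
    healthy: bool = True  # a real setup would flip this from a health check


class WeightedRoundRobin:
    """Cycle through nodes in proportion to their weight, skipping bad ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Repeat each node `weight` times, then cycle over that schedule forever.
        schedule = [n for n in self.nodes for _ in range(n.weight)]
        if not schedule:
            raise ValueError("need at least one node with weight >= 1")
        self._schedule_len = len(schedule)
        self._cycle = itertools.cycle(schedule)

    def pick(self) -> Node:
        # Try each schedule slot at most once so an all-down pool cannot spin forever.
        for _ in range(self._schedule_len):
            node = next(self._cycle)
            if node.healthy:
                return node
        raise RuntimeError("no healthy nodes in the pool")


if __name__ == "__main__":
    pool = WeightedRoundRobin([Node("web1", weight=2), Node("web2", weight=1)])
    print([pool.pick().name for _ in range(6)])
    pool.nodes[0].healthy = False  # simulate web1 failing its health check
    print([pool.pick().name for _ in range(3)])
```

Expanding the schedule by weight keeps the code short; it is a poor fit for very large weights, where a smoother weighted algorithm would be preferable.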
  32. load balancing
      Your load balancer may or may not…
      • remove bad nodes from the pool
      • balance by performance
      • balance by weight
      • route by geolocation
      • support sticky sessions
      • have 1 million other features
  37. load balancing tools
      some among thousands…
      • DNS servers
      • Big IP
      • Perlbal
      • nginx
      • Varnish
  41. database scaling
      common things you can do:
      partitioning
      replication
      sharding
  42. database partitioning
      • every user is assigned to a database server
      • users don’t share data between each other (between servers)
      • when you need more capacity, add another database server
      • works for some apps, doesn’t work for others
      implementation example: invoice and timesheet management app
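The per-user assignment described above can be sketched as a small directory that remembers which server holds each user. This is an illustration under assumptions: the mapping lives in memory here (in practice it would sit in a small shared database), new users go to the least-loaded server, and the `UserDirectory` class and server names are hypothetical.

```python
class UserDirectory:
    """Remember which database server each user lives on."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one database server")
        self.servers = list(servers)
        self.assignment = {}  # user_id -> server name

    def add_server(self, server: str) -> None:
        # New capacity: existing users stay put, new users can land here.
        self.servers.append(server)

    def server_for(self, user_id: int) -> str:
        # Existing users always return to the server that holds their data.
        if user_id not in self.assignment:
            self.assignment[user_id] = self._least_loaded()
        return self.assignment[user_id]

    def _least_loaded(self) -> str:
        counts = {server: 0 for server in self.servers}
        for server in self.assignment.values():
            counts[server] += 1
        # min() returns the first server among those with the smallest count.
        return min(self.servers, key=lambda server: counts[server])


if __name__ == "__main__":
    directory = UserDirectory(["db1"])
    print(directory.server_for(1), directory.server_for(2))
    directory.add_server("db2")
    print(directory.server_for(3), directory.server_for(1))
```

Because no user's data ever spans servers, adding a server never requires moving existing rows, which is what makes this approach work for apps such as the invoicing example and fail for apps whose users share data.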
  46. database replication (mysql)
      master - master
      master - slave
      master - many slave
  47. database replication (mysql)
      master - master
      server 1 replicates (as master) to server 2 (acting as slave)
      server 2 replicates (as master) to server 1 (acting as slave)
      • works well to a point
      • complete nightmare when replication gets desynchronized
      • doesn’t actually improve write performance
      • good for basic high availability
  50. database replication (mysql)
      master - slave
      server 1 replicates (as master) to server 2 (acting as slave)
      • good first step
      • makes you re-write your application to consider slave queries
      • doesn’t increase write performance
      • de-synchronization is relatively painless
      • replication lag
  54. database replication (mysql)
      master - many slave
      server 1 replicates (as master) to many servers (acting as slaves)
      • thundering read performance
      • makes you re-write your application to consider slave queries
      • doesn’t increase write performance
      • de-synchronization is relatively painless
      • replication lag
  58. database sharding
      • data is split between multiple database servers
      • logical index is kept of what data is where (for example, a mathematical index or a lookup chart)
      • you have to grab, parse, and correlate data across servers
      • theoretically limitless scalability
      • complicated
      implementation example: digg, facebook, etc
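As a rough illustration of the "mathematical index" and the cross-server correlation mentioned above, the sketch below places each key on a shard by hashing it, and answers a cross-shard query by asking every shard and merging the results. Plain dictionaries stand in for database servers; `shard_for`, `ShardedStore`, and the sample keys are hypothetical names chosen for this example.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Mathematical index: a stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    def __init__(self, shard_count: int):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one database server.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key: str, row: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str):
        # Single-key reads touch exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def find(self, predicate) -> list:
        # Cross-shard query: ask every shard, then merge the results.
        matches = []
        for shard in self.shards:
            matches.extend(row for row in shard.values() if predicate(row))
        return matches


if __name__ == "__main__":
    store = ShardedStore(3)
    for i in range(10):
        store.put(f"user{i}", {"id": i, "even": i % 2 == 0})
    print(store.get("user7"))
    print(sorted(row["id"] for row in store.find(lambda row: row["even"])))
```

One reason sharding earns the "complicated" label: with modulo placement, changing the shard count remaps most keys, so growing the cluster means migrating data or switching to a lookup chart or consistent hashing.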
  63. replicated storage
      common things you can do:
      replicated file system
      lookup tables
      storage services
      huge NAS arrays
  64. replicated file system (glusterfs)
      • very affordable
      • various replication modes
      • nothing to keep track of in your app
      • easy to implement
      • can cause massive failures if poorly configured
  68. lookup tables
      • very affordable
      • limitless modes; entirely up to you
      • entirely dependent on your application logic
      • can cause massive failures if poorly implemented
  71. storage services
      • very expensive
      • theoretically limitless capacity
      • easy to use
      • data must be pulled back first if used locally
      • costs and bandwidth usage can be mitigated (for example, by putting a proxy in front of it)
  75. large NAS arrays
      • insanely expensive
      • insanely expensive
      • insanely expensive
      • bullet-proof fault tolerance… at a price
      • easy to use… for a price
  79. backups
      common methods:
      all-RAID (doesn’t work)
      snapshots
      copying from slaves
  80. backups
      all-RAID doesn’t work
      Why?
      RAID won’t keep your application from deleting data everywhere
  81. snapshots
      use a mechanism to take a snapshot of the partition
      e.g. LVM partitions
      • works really well
      • easy if you do it from the beginning
      • requires some planning
      • should be used with RAID drives
  84. copying from slaves
      take a slave out of rotation and copy from it
      e.g. MySQL databases
      • works really well
      • easy if you do it from the beginning
      • requires some planning
      • backups can be out of date
  87. data caches (memcached)
      PHP doesn’t have cross-request persistence, so someone added it: memcached
      • in-memory
      • fast
      • scalable
      • proven
      • use it
      Got configuration data? Small, high-TTL data sets? Use APC.
      Large, high-TTL data sets? Use files.
      Mind the race condition.
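The slide does not say which race condition it has in mind; one common one with file caches is a reader opening the file while a writer is halfway through it. A minimal sketch of the usual mitigation follows, assuming that reading: write to a temporary file in the same directory, then rename it over the target. The function names, the JSON payload layout, and the `expires_at` field are choices made for this example only.

```python
import json
import os
import tempfile
import time


def cache_write(path: str, value, ttl_seconds: float) -> None:
    """Write a cache entry so readers never observe a half-written file."""
    payload = {"expires_at": time.time() + ttl_seconds, "value": value}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Do not leave stray temp files behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def cache_read(path: str, default=None):
    """Return the cached value, or `default` if it is missing or expired."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            payload = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
    if payload["expires_at"] < time.time():
        return default
    return payload["value"]


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    cache_path = os.path.join(cache_dir, "config.json")
    cache_write(cache_path, {"theme": "dark"}, ttl_seconds=3600)
    print(cache_read(cache_path))
```

This does not address a second race, many processes regenerating the same expired entry at once; that usually needs a lock or a "refresh slightly early" policy on top.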
  92. replicated sessions
      pick your poison:
      • memcache w. redundancy
      • database
      • shared file system (don’t actually do this)
  94. staging strategy
      If you value your free time:
      [diagram: Dev → Test → Stage → Live]
  95. staging strategy (dev)
      • do use source control systems (subversion, etc)
      • do profile your code to look for obvious performance issues
      • do use phpdoc tags
      • do make your dev environment as similar to live as practical (e.g., don’t develop on Windows and run live on UNIX)
      • do document all your changes
      • do use TDD (test-driven development)
  100. staging strategy (test)
       • do make test functionally identical to live, except for data
       • do create data fixtures that are representative of real-life data
       • do create functional tests for the user interface (Selenium)
       • do not push anything to stage that did not pass unit tests
  103. staging strategy (stage)
       • do make stage identical to a live node
       • do connect to the live database
       • do have test ‘users’ to perform destructive operations against
       • do have a mechanism to automate pushing stage to live
  106. staging strategy (live)
       • do not ever make changes by hand on live
       • do automate pushing updates
       • do take nodes out of rotation when you push updates
       • do not allow ssh access to live except when really needed
  109. debugging
       • do use xdebug on dev, test, and stage
       • do prepare an automated action that can turn xdebug and profiling on/off on 1 of the live nodes. you can and will run into errors that only exist on live.
       • do write a test case to replicate the bug first and then fix the bug, whenever possible
       • do first look if bugs are explainable by platform differences between development and production systems (e.g., don’t develop on Windows and deploy on UNIX)
       • do go to my talk at ZendCon in October, “It Works on Dev”
  113. background services
       • do avoid launching background processes from the web app
       • PHP doesn’t have a native message queue, so (many) people wrote some, for example gearmand. do use a message queue.
       • do check for memory leaks in background tasks! many php libraries and also many php versions themselves still leak memory. try to write a loop in bash for a background task rather than in php. recycle the process often.
       • do plan your message format carefully
       • do persist important messages
  117. questions
       …anybody?
  118. shameless plug
       CROSCON
       Bespoke Application Development
       Consulting
       Security
       Marcel Esser
       marcel.esser@croscon.com
       (202) 730-9728
       Personal? marcel.esser@gmail.com
