MNPHP Scalable Architecture 101 - Feb 3 2011


An overview of scaling out your system, starting from a single server and covering many of the options you may face along the way.

Published in: Technology

  1. 1. Mike Willbanks Blog: Twitter : mwillbanks IRC : lubs on freenode Scalable Architectures 101 MNPHP Feb 3, 2011
  2. 2. Scalability? <ul>Your application is growing, your systems are slowing and growth is inevitable... <li>Where do we go from here? </li><ul><li>Load Balancing
  3. 3. Web Servers
  4. 4. Database Servers
  5. 5. Cache Servers </li></ul></ul><ul><ul><li>Job Servers
  6. 6. DNS Servers
  7. 7. CDN Servers
  8. 8. Front-End Performance </li></ul></ul>
  9. 9. The Beginning... <ul>Single Server Syndrome <li>One Server, Many Functions </li><ul><li>Web Server, Database Server, Cache Server, Job Server, DNS Server, Mail Server.... </li></ul><li>How we know it's time </li><ul><li>iostat, CPU load, overall degradation </li></ul></ul>
  10. 10. The Next Step... <ul>Single Separation Syndrome <li>Separation of Web and Database </li><ul><li>Fix the main disk I/O bottleneck. </li></ul><li>However, the web server still can't handle the current I/O, CPU load or number of requests. </li></ul>
  11. 11. Load Balancing
  12. 12. Load Balancing Our Environment
  13. 13. Several Options <ul><li>DNS Rotation (Little to No Cost) </li><ul><li>Not very reliable, but works on a small scale. </li></ul><li>Software Based (Commodity Server Cost) </li><ul><li>HAProxy, Pound, Varnish, Squid, Wackamole, Perlbal, Web Server Proxy... </li></ul><li>Hardware Based (High Cost Appliance) </li><ul><li>Several vendors, varying by need. </li><ul><li>A10, F5, etc. </li></ul></ul></ul>
  14. 14. Routing Types of Load Balancers <ul><li>Round Robin
  15. 15. Static
  16. 16. Least Connections
  17. 17. Source
  18. 18. IP
  19. 19. Basic Authentication </li></ul><ul><li>URI
  20. 20. URI Parameter
  21. 21. Header
  22. 22. Cookie
  23. 23. Regular Expression </li></ul>
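
As a rough illustration of two of the routing types listed above, here is a minimal Python sketch of round-robin and least-connections selection. The class names and the server labels `web1` / `web2` are hypothetical; a real balancer such as HAProxy does this internally and also handles health checks, which this sketch ignores.

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._ring = cycle(list(servers))

    def pick(self):
        return next(self._ring)


class LeastConnections:
    """Send each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        # Number of currently open connections per server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by the order the servers were given in.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round robin assumes every request costs about the same; least connections copes better when some requests are long-running.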
  24. 24. Open Source Software Options <ul><li>Of the many options, we will focus on three: </li><ul><li>HAProxy – By and large one of the most popular.
  25. 25. Pound – Said to be great for medium traffic sites.
  26. 26. Varnish – A caching solution that also does load balancing </li></ul></ul>
  27. 27. HAProxy <ul><li>Pros </li><ul><li>Extremely full featured
  28. 28. Very well known
  29. 29. Handles just about every type of routing
  30. 30. Several examples online
  31. 31. Has a web-based GUI </li></ul><li>Cons </li><ul><li>No native SSL support (use Stunnel)
  32. 32. Setup can be complex and take a lot of time </li></ul></ul>
  33. 33. Sample HAProxy Configuration global log local0 log local1 notice maxconn 4096 user haproxy group haproxy daemon defaults log global mode http option httplog option dontlognull retries 3 option redispatch maxconn 2000 contimeout 5000 clitimeout 50000 srvtimeout 50000 listen localhost option httpchk GET / balance roundrobin cookie SERVERID server serv1 check inter 2000 rise 2 fall 5 server serv2 check inter 2000 rise 2 fall 5 option httpclose stats enable stats uri /lb?stats stats realm haproxy stats auth test:test
  34. 34. Pound <ul><li>Pros </li><ul><li>chroot support
  35. 35. Native SSL support
  36. 36. Insanely simple setup
  37. 37. Supports virtually all types of routing
  38. 38. Many online tutorials </li></ul><li>Cons </li><ul><li>No native SSL support (use Stunnel)
  39. 39. Setup can be complex and take a lot of time </li></ul></ul>
  40. 40. Sample Pound Configuration User "www-data" Group "www-data" LogLevel 1 Alive 30 Control "/var/run/pound/poundctl.socket" ListenHTTP Address Port 80 xHTTP 0 Service BackEnd Address Port 8080 End BackEnd Address Port 8080 End End End
  41. 41. Varnish <ul><li>Pros </li><ul><li>Supports front-end caching
  42. 42. Fairly simple setup
  43. 43. Extremely well known
  44. 44. Many online tutorials
  45. 45. Large suite of tools (varnishstat, varnishtop, varnishlog, varnishreplay, varnishncsa) </li></ul><li>Cons </li><ul><li>No native SSL support (use Pound or Stunnel)
  46. 46. If you want a WebGUI you must PAY </li></ul></ul>
  47. 47. Sample Varnish Configuration backend default1 { .host = ""; .port = "8080"; .probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; } } backend default2 { .host = ""; .port = "8080"; .probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; } } director default round-robin { { .backend = default1; } { .backend = default2; } } sub vcl_recv { if ( ~ "^$") { set req.backend = default; } }
  48. 48. What We Need to Remember <ul><li>Web Servers </li><ul><li>One always needs to be available
  49. 49. Don't use SSL on the web server level! </li></ul><li>Headers </li><ul><li>Pass a header that says whether SSL is on or not
  50. 50. Client IP is likely in X-Forwarded-For
  51. 51. If using virtual hosts, pass the Host header </li></ul><li>Sessions </li><ul><li>Need a solution if not using sticky routing </li></ul></ul>
  52. 52. Web Servers
  53. 53. Several Options <ul><li>Apache
  54. 54. IIS
  55. 55. Nginx
  56. 56. Lighttpd
  57. 57. etc. </li></ul>
  58. 58. Configuration <ul><li>Server name should be the same on all servers </li><ul><li>Make a server alias so you can reach individual servers without load balancing </li></ul><li>Each configuration SHOULD or MUST be the same.
  59. 59. Client IP will likely be in X-Forwarded-For.
  60. 60. SSL status will not be in $_SERVER['HTTPS']; look for it in an HTTP_ header instead. </li></ul>
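
A minimal sketch of how an application behind a load balancer might recover the client IP and SSL status from forwarded headers. It assumes the balancer sets `X-Forwarded-For` (named in the slides) and `X-Forwarded-Proto` (an assumption, since the slides leave the SSL header unnamed). The function names and the `trusted_proxies` set are hypothetical.

```python
def client_ip(headers, remote_addr, trusted_proxies):
    """Return the original client IP, trusting forwarded headers only from our balancers."""
    forwarded = headers.get("X-Forwarded-For", "")
    if remote_addr in trusted_proxies and forwarded:
        # Simplification: take the left-most entry. If the balancer appends to an
        # existing header instead of overwriting it, that entry is client-supplied.
        return forwarded.split(",")[0].strip()
    return remote_addr


def is_https(headers, remote_addr, trusted_proxies):
    """Report whether the client spoke SSL to the balancer.

    Assumes SSL is terminated at the balancer, so a request that did not come
    through a trusted balancer is treated as plain HTTP.
    """
    if remote_addr not in trusted_proxies:
        return False
    return headers.get("X-Forwarded-Proto", "").lower() == "https"
```

Checking `remote_addr` first matters: anyone who can reach the web server directly can send these headers with whatever values they like.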
  61. 61. What We Need to Remember <ul><li>Files </li><ul><li>All web servers need our files.
  62. 62. Static content could be tagged in version control.
  63. 63. Static content may need a file server / CDN / etc.
  64. 64. User-generated content can live on an NFS mount or be served from the cloud or a CDN. </li></ul><li>Sessions </li><ul><li>All web servers need access to our sessions.
  65. 65. Remember disk is slow and the database will be a bottleneck. How about distributed caching? </li></ul></ul>
  66. 66. Other Thoughts <ul><li>Running PHP on your web server can be a resource hog; you may want to offload static content requests to nginx, lighttpd or some other lightweight web server. </li><ul><li>Proxying to your main web servers works great for hardworking processes, while static content is served from the lightweight server. </li></ul></ul>
  67. 67. Database Servers
  68. 68. Where We All Start <ul>Single Database Server <li>Lots of options and steps as we move forward. </li></ul>
  69. 69. Replication <ul>Single Master, Single Slave <li>Write code that writes to the master and reads from the slave. </li><ul><li>Exception: be smart, don't read from the slave on a table you just wrote to on the master. </li></ul></ul>
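
One way to picture the read/write split and its exception is a small router that remembers which tables were written recently. This is a sketch under stated assumptions: the class name, the `lag_window` value and the `"master"` / `"slave"` labels are hypothetical, and a real application would return database connections rather than strings.

```python
import time


class ReplicationRouter:
    """Route writes to the master and reads to the slave, except right after a write."""

    def __init__(self, lag_window=2.0, clock=time.monotonic):
        # lag_window: seconds during which a freshly written table is read from the master.
        self.lag_window = lag_window
        self.clock = clock
        self._last_write = {}  # table name -> time of the most recent write

    def for_write(self, table):
        self._last_write[table] = self.clock()
        return "master"

    def for_read(self, table):
        written = self._last_write.get(table)
        if written is not None and self.clock() - written < self.lag_window:
            # The slave may not have replayed this write yet.
            return "master"
        return "slave"
```

A fixed window is only a guess at replication lag; it narrows the stale-read problem rather than eliminating it.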
  70. 70. Multiple Slaves <ul>Single Master, Multiple Slaves <li>It is a great time to start to implement connection pooling. </li></ul>
  71. 71. Multiple Masters <ul>Multiple Master, Multiple Slaves <li>Do NOT write to both masters at once with MySQL!
  72. 72. Be warned: auto-increment settings must now change so that IDs from different masters do not conflict. </li></ul>
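
The usual way to avoid auto-increment conflicts is to give every master the same step and a different starting offset (in MySQL, the `auto_increment_increment` and `auto_increment_offset` server variables). The Python below only simulates the resulting ID sequences for two masters; the function name is hypothetical.

```python
from itertools import count, islice


def id_sequence(offset, increment):
    """Yield the IDs one master would generate: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# Two masters, so the step is 2 and the offsets are 1 and 2.
master_a = list(islice(id_sequence(offset=1, increment=2), 5))
master_b = list(islice(id_sequence(offset=2, increment=2), 5))

# The two sequences interleave and never produce the same ID.
assert not set(master_a) & set(master_b)
```

With N masters, the step becomes N and each master gets its own offset from 1 to N.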
  73. 73. Partitioning <ul>Segmenting your Data <li>Vertical Partitioning </li><ul><li>Move less accessed columns, large data columns and columns unlikely to appear in a WHERE clause to other tables. </li></ul><li>Horizontal Partitioning </li><ul><li>Done by moving rows into different tables. </li><ul><li>Based on Range, Date, User or Interlaced </li></ul></ul></ul>
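
A small sketch of how application code might choose a table under horizontal partitioning. "Interlaced" is read here as modulo-based distribution, which is an interpretation rather than something the slides spell out. The table naming scheme (`users_0`, `users_r0`, ...) and function names are hypothetical.

```python
import bisect


def shard_by_user(user_id, shard_count):
    """Interlaced (modulo) partitioning: consecutive IDs land in different tables."""
    return f"users_{user_id % shard_count}"


def shard_by_range(user_id, boundaries):
    """Range partitioning.

    boundaries is a sorted list of exclusive upper bounds, e.g. [1000, 2000]
    gives three tables: below 1000, 1000 to 1999, and 2000 and above.
    """
    index = bisect.bisect_right(boundaries, user_id)
    return f"users_r{index}"
```

Modulo spreads load evenly but makes changing the table count painful; ranges are easy to extend but can leave the newest table carrying most of the traffic.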
  74. 74. What We Need to Remember <ul><li>Replication </li><ul><li>There may be a lag!
  75. 75. All reports / read queries should go here
  76. 76. Don't read here directly after a write </li><ul><li>Transactions / Lag / etc. </li></ul></ul><li>Sessions </li><ul><li>Never store sessions in the DB </li><ul><li>Large binlogs; garbage collection causes slow queries; the queue may fill up and cause a crash or hit max connections. </li></ul></ul></ul>
  77. 77. Cache Servers (not full page)
  78. 78. Caching <ul>“Caching is imperative in scaling and performance” <ul><li>Single Server </li><ul><li>Shared Memory: APC / XCache / etc
  79. 79. File Based: Files / SQLite / etc
  80. 80. Not highly scalable, great for configuration files. </li></ul><li>Distributed </li><ul><li>Memcached, Redis, etc.
  81. 81. Set up consistent hashing. </li></ul></ul><li>Do not cache what cannot be re-created. </li></ul>
  82. 82. Caching <ul>In The Beginning <li>Single Caching Server
  83. 83. Start caching fetches: invalidate the cache on write, write the new value to the cache, and always read from the cache. </li></ul>
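
The read and write path described above can be sketched as follows. Plain dictionaries stand in for both the database and the cache server, and the class name and the `db_reads` counter are hypothetical additions used only to make the behaviour visible.

```python
class CachedStore:
    """Read through the cache; on write, invalidate and then store the new value."""

    def __init__(self, database):
        self.database = database  # dict standing in for the database
        self.cache = {}           # dict standing in for the cache server
        self.db_reads = 0         # how many reads had to go to the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fetch from the database and remember the result.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)  # invalidate the stale entry
        self.cache[key] = value    # write the new cache entry
```

Because the cache is only a copy, losing it costs database reads but no data, which is the point of "do not cache what cannot be re-created".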
  84. 84. Distributed Caching <ul>Distributed Mania <li>Write based on consistent hashing (hash of a key that you are writing)
  85. 85. Server depends on the hash.
  86. 86. Hint – use the memcached PECL extension. </li></ul>
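
To make "server depends on the hash" concrete, here is a minimal consistent-hash ring in Python. It is a sketch, not what the memcached extension does internally: the class name, the use of MD5, the 100 virtual points per server and the `cache1` style server names are all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to servers so that adding a server only moves a fraction of the keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server, to even out the spread
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def server_for(self, key):
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[index]]
```

When a fourth server joins a ring of three, the only keys that change owner are those the new server takes over. Those entries are effectively lost from the cache, while the rest stay where they were.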
  87. 87. The Read / Write Process <ul>In its simplest form... </ul>
  88. 88. What We Need to Remember <ul><li>Replicated or not...
  89. 89. Elasticity </li><ul><li>Consistent hashing – cannot add or remove servers without losing data </li></ul><li>Sessions </li><ul><li>Store me here... please please please! </li></ul><li>Memory Caches </li><ul><li>Durability - If it fails, it's gone!
  90. 90. Ensure dedicated memory!
  91. 91. If you run out of memory, does it evict an old item to make room for the new one, or refuse to accept anything new? </li></ul></ul>
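
The out-of-memory question above comes down to two policies, which this sketch contrasts: evict the least recently used item, or refuse the new one. The class name and the `evict` switch are hypothetical, and capacity is counted in items rather than bytes to keep the example short.

```python
from collections import OrderedDict


class BoundedCache:
    """A fixed-size cache that either evicts the oldest entry or rejects new ones."""

    def __init__(self, capacity, evict=True):
        self.capacity = capacity
        self.evict = evict
        self.items = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as recently used
        return self.items[key]

    def set(self, key, value):
        """Store the value; return False if the cache is full and eviction is off."""
        if key in self.items:
            self.items[key] = value
            self.items.move_to_end(key)
            return True
        if len(self.items) >= self.capacity:
            if not self.evict:
                return False
            self.items.popitem(last=False)  # drop the least recently used entry
        self.items[key] = value
        return True
```

Eviction suits re-creatable cache data; refusing writes is safer for data such as sessions that must not silently vanish, though the application then has to handle the failure.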
  92. 92. Job Servers
  93. 93. “Message queues and mailboxes are software-engineering components used for interprocess communication, or for inter-thread communication within the same process. They use a queue for messaging – the passing of control or of content.”
  94. 94. Messages are Everywhere
  95. 95. What are Message Queues <ul><li>A FIFO buffer
  96. 96. Asynchronous push / pull
  97. 97. An application framework for sending and receiving messages.
  98. 98. A way to communicate between applications / systems.
  99. 99. A way to decouple components.
  100. 100. A way to offload work. </li></ul>
  101. 101. Where We All Start <ul>Single Job Server <li>Lots of options and steps as we move forward. </li></ul>[Diagram: a Producer queues messages on the Message Queue Server; a Consumer receives them]
  102. 102. Distributed Job Servers <ul>Distributed Mania <li>Load balance a message queue for scale
  103. 103. Can continue to create more workers </li></ul>[Diagram: three Producers, three Message Queue Servers, five Consumers]
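
The producer / queue / consumer shape can be sketched in a single process using Python's standard `queue` and `threading` modules. This is only a stand-in for a real message queue server: the function name `run_jobs` is hypothetical, and the handler is assumed not to raise exceptions.

```python
import queue
import threading


def run_jobs(jobs, handler, worker_count=3):
    """Push jobs onto a FIFO queue and let several workers consume them."""
    job_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            try:
                if job is None:  # sentinel: no more work for this worker
                    return
                outcome = handler(job)
                with results_lock:
                    results.append(outcome)
            finally:
                job_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()
    for job in jobs:           # the producer side
        job_queue.put(job)
    for _ in workers:          # one sentinel per worker
        job_queue.put(None)
    job_queue.join()
    for thread in workers:
        thread.join()
    return results
```

Adding workers raises throughput without touching the producer. Results come back in completion order, not submission order.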
  104. 104. Why are Message Queues Useful? <ul><li>Asynchronous Processing
  105. 105. Communication between Applications / Systems
  106. 106. Image Resizing
  107. 107. Video Processing
  108. 108. Sending out Emails
  109. 109. Auto-Scaling Virtual Instances
  110. 110. Log Analysis
  111. 111. The list goes on... </li></ul>
  112. 112. What We Need to Remember <ul><li>Replication or not?
  113. 113. You need to keep your workers running </li><ul><li>Supervisord or monit or some other monitoring... </li></ul><li>Don't offload things just to offload </li><ul><li>If it needs to be real-time rather than near real-time, a queue is not a good place for it – however, your boss does not need to know :) </li></ul></ul>
  114. 114. DNS Servers
  115. 115. What to do <ul><li>Just about every domain registrar runs DNS </li><ul><li>DO NOT RUN YOUR OWN! </li></ul><li>Anycast DNS </li><ul><li>Anycast is a network addressing and routing scheme whereby data is routed to the "nearest" or "best" destination as viewed by the routing topology.
  116. 116. It's sexy, it's sweet and it is FAST!
  117. 117. A “cheaper” provider is DNS Made Easy. </li><ul><li>Yes the interface is ugly. </li></ul></ul></ul>
  118. 118. What to look for... <ul><li>Wildcard support
  119. 119. Failover / Distributed
  120. 120. CNAME support
  121. 121. TXT support
  122. 122. Name Server support </li></ul>
  123. 123. CDN Servers
  124. 124. Why Use a CDN <ul><li>Free your bandwidth
  125. 125. Free your server from serving basic files
  126. 126. Distributed servers around the globe </li></ul>
  127. 127. What you need to know <ul><li>Origin Pull </li><ul><li>The CDN pulls content from your own web server and stores it on its nodes. </li></ul><li>PoP Push </li><ul><li>You upload the content to something like S3, which has a CDN such as CloudFront on top of it. </li></ul></ul>
  128. 128. What's the best? <ul><li>Depends on your need...
  129. 129. Origin Pull is great if you want to maintain all of the content on your web server.
  130. 130. PoP Push is great for storing things like user-generated content. </li></ul>
  131. 131. Front-End Performance
  132. 132. Discussion Points <ul><li>Tactics </li><ul><li>Minification (JavaScript / CSS)
  133. 133. CSS Sprites
  134. 134. GZIP
  135. 135. Cookies are evil
  136. 136. Parallel downloads (using subdomains for serving)
  137. 137. HTTP Expires </li></ul></ul>
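
As an illustration of the HTTP Expires tactic, the sketch below builds far-future caching headers for static files. The function name and the one-day default are hypothetical, and the `Cache-Control` header is an addition not mentioned in the slides (it is the usual companion to `Expires`).

```python
import time
from email.utils import formatdate


def caching_headers(max_age=86400, now=None):
    """Build response headers telling browsers to cache a static file for max_age seconds."""
    if now is None:
        now = time.time()
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }
```

Far-future expiry works best when file names change with each release, so a stale copy is never requested under the new name.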
  138. 138. Discussion Points <ul><li>Tools </li><ul><li>YSlow
  139. 139. Firebug
  140. 140. Google Page Speed
  141. 141. Google Webmaster Tools </li></ul></ul>
  142. 142. Mike Willbanks Blog: Twitter : mwillbanks IRC : lubs on freenode Questions?