MNPHP Scalable Architecture 101 - Feb 3 2011

An overview of scaling out your system, starting from a single server, and of the many options you may face along the way.

  • 1. Scalable Architectures 101, MNPHP, Feb 3, 2011. Mike Willbanks (Blog: http://blog.digitalstruct.com, Twitter: mwillbanks, IRC: lubs on freenode)
  • 2. Scalability?
      Your application is growing, your systems are slowing and growth is inevitable...
    • Where do we go from here?
      • Load Balancing
      • Web Servers
      • Database Servers
      • Cache Servers
      • Job Servers
      • DNS Servers
      • CDN Servers
      • Front-End Performance
  • 9. The Beginning...
      Single Server Syndrome
    • One Server, Many Functions
      • Web Server, Database Server, Cache Server, Job Server, DNS Server, Mail Server...
    • How we know it's time
      • iostat, CPU load, overall degradation
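To make "how we know it's time" a little more concrete, here is a minimal Python sketch that normalizes the load average by CPU count. The helper names and the 1.0-per-CPU threshold are hypothetical rules of thumb, not something the talk prescribes; disk I/O (iostat) would still need to be checked separately.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs (guarding against 0)."""
    return load_avg / max(cpu_count, 1)


def time_to_scale(load_avg: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Return True when load per CPU exceeds the (hypothetical) threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() only exists on Unix-like systems.
    _, _, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"15-minute load per CPU: {load_per_cpu(fifteen_min, cpus):.2f}")
    print("Time to split things up?", time_to_scale(fifteen_min, cpus))
```

Using the 15-minute average rather than the 1-minute one avoids reacting to short spikes; sustained overload is the "overall degradation" signal.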
  • 10. The Next Step...
      Single Separation Syndrome
    • Separation of Web and Database
      • Fix the main disk I/O bottleneck.
    • However, our web server still can't handle the current I/O, CPU load, or request volume.
  • 11. Load Balancing
  • 12. Load Balancing Our Environment
  • 13. Several Options
    • DNS Rotation (Little to No Cost)
      • Not very reliable, but works on a small scale.
    • Software Based (Commodity Server Cost)
      • HAProxy, Pound, Varnish, Squid, Wackamole, Perlbal, Web Server Proxy...
    • Hardware Based (High Cost Appliance)
      • Several vendors, with products that vary by need.
        • A10, F5, etc.
  • 14. Routing Types of Load Balancers
  • 24. Open Source Software Options
    • Out of the many options, we will focus on 3
      • HAProxy – By and large one of the most popular.
      • Pound – Said to be great for medium-traffic sites.
      • Varnish – A caching solution that also does load balancing
  • 27. HAProxy
    • Pros
      • Extremely full featured
      • Very well known
      • Handles just about every type of routing
      • Several examples online
      • Has a web-based GUI
    • Cons
      • No native SSL support (use Stunnel)
      • Setup can be complex and take a lot of time
  • 33. Sample HAProxy Configuration
```
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen localhost 0.0.0.0:80
    option httpchk GET /
    balance roundrobin
    cookie SERVERID
    server serv1 0.0.0.0:8080 check inter 2000 rise 2 fall 5
    server serv2 0.0.0.0:8080 check inter 2000 rise 2 fall 5
    option httpclose
    stats enable
    stats uri /lb?stats
    stats realm haproxy
    stats auth test:test
```
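The `check inter 2000 rise 2 fall 5` options and `balance roundrobin` above can be modeled roughly as follows. This is a simplified Python sketch of the documented behavior (a server is taken out after `fall` consecutive failed checks and brought back after `rise` consecutive successes), not HAProxy's actual implementation; the `Backend` and `RoundRobin` classes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    rise: int = 2     # consecutive successful checks before a down server is used again
    fall: int = 5     # consecutive failed checks before an up server is taken out
    up: bool = True
    streak: int = 0   # consecutive results contradicting the current state

    def record_check(self, ok: bool) -> None:
        if ok == self.up:
            self.streak = 0  # result agrees with the current state
            return
        self.streak += 1
        if self.up and self.streak >= self.fall:
            self.up, self.streak = False, 0
        elif not self.up and self.streak >= self.rise:
            self.up, self.streak = True, 0


class RoundRobin:
    """Hand out healthy backends in turn, skipping any marked down."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self.next_index = 0

    def pick(self) -> Optional[Backend]:
        for _ in range(len(self.backends)):
            backend = self.backends[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.backends)
            if backend.up:
                return backend
        return None  # every backend is down
```

Requiring several consecutive failures before pulling a server keeps a single slow response from flapping it in and out of rotation.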
  • 34. Pound
    • Pros
      • chroot support
      • Native SSL support
      • Insanely simple setup
      • Supports virtually all types of routing
      • Many online tutorials
  • 40. Sample Pound Configuration
```
User "www-data"
Group "www-data"
LogLevel 1
Alive 30
Control "/var/run/pound/poundctl.socket"

ListenHTTP
    Address 127.0.0.1
    Port 80
    xHTTP 0
    Service
        BackEnd
            Address 127.0.0.1
            Port 8080
        End
        BackEnd
            Address 127.0.0.1
            Port 8080
        End
    End
End
```
  • 41. Varnish
    • Pros
      • Supports front-end caching
      • Fairly simple setup
      • Extremely well known
      • Many online tutorials
      • Large suite of tools (varnishstat, varnishtop, varnishlog, varnishreplay, varnishncsa)
    • Cons
      • No native SSL support (use Pound or Stunnel)
      • If you want a web GUI, you must PAY
  • 47. Sample Varnish Configuration
```
backend default1 {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

backend default2 {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

director default round-robin {
    { .backend = default1; }
    { .backend = default2; }
}

sub vcl_recv {
    if (req.http.host ~ "^127.0.0.1$") {
        set req.backend = default;
    }
}
```
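The probe settings above (`.window = 5; .threshold = 3;`) mean a backend counts as healthy when at least 3 of its last 5 probes succeeded. A minimal Python model of that rule, using a hypothetical `Probe` class, looks like this. Real Varnish also has an `.initial` setting for the state before enough probes have run; this sketch ignores it and simply starts unhealthy.

```python
from collections import deque


class Probe:
    """Healthy when at least `threshold` of the last `window` probes succeeded."""

    def __init__(self, window: int = 5, threshold: int = 3) -> None:
        self.threshold = threshold
        self.results = deque(maxlen=window)  # older results fall off automatically

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def healthy(self) -> bool:
        return sum(self.results) >= self.threshold
```

Unlike a strict "N in a row" rule, a sliding window tolerates an occasional failed probe as long as most recent probes pass.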
  • 48. What We Need to Remember
    • Web Servers
      • One always needs to be available
      • Don't handle SSL at the web server level!
    • Headers
      • Pass a header indicating whether SSL is on or not
      • Client IP is likely in X-Forwarded-For
      • If using virtual hosts, pass the Host header
    • Sessions
      • Need a shared session solution if not using sticky routing
  • 52. Web Servers
  • 53. Several Options
  • 58. Configuration
    • Server name should be the same on all servers
      • Make a server alias so you can reach individual servers without load balancing
    • Each configuration SHOULD or MUST be the same.
    • Client IP will likely be in X-Forwarded-For.
    • SSL status will not show up in $_SERVER['HTTPS']; look for an HTTP_ header instead.
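Because SSL ends at the load balancer, the application has to recover the client IP and the scheme from forwarded headers. A sketch in Python, assuming the balancer appends to `X-Forwarded-For` and sets an `X-Forwarded-Proto` header (a common convention; check what your balancer actually sends). `trusted_proxies` is a hypothetical list of your load balancer addresses, and `headers` is assumed to be a plain dict with canonical header capitalization.

```python
from typing import Iterable, Mapping


def client_ip(remote_addr: str, headers: Mapping[str, str],
              trusted_proxies: Iterable[str]) -> str:
    """Best-effort client IP behind a load balancer."""
    trusted = set(trusted_proxies)
    if remote_addr not in trusted:
        return remote_addr  # did not come through our balancer; ignore the header
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    # Walk right to left: our own proxies append on the right, while the
    # leftmost entries can be forged by the client.
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return remote_addr


def is_https(headers: Mapping[str, str]) -> bool:
    """SSL ended at the balancer, so trust its (assumed) X-Forwarded-Proto header."""
    return headers.get("X-Forwarded-Proto", "").lower() == "https"
```

Reading the rightmost untrusted hop, rather than the first entry, prevents a client from spoofing its IP by sending its own X-Forwarded-For header.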
  • 61. What We Need to Remember
    • Files
      • All web servers need our files.
      • Static content could be tagged in version control.
      • Static content may need a file server / CDN / etc.
      • User-generated content on an NFS mount, or served from the cloud or a CDN.
    • Sessions
      • All web servers need access to our sessions.
      • Remember: disk is slow and the database will be a bottleneck. How about distributed caching?
  • 66. Other Thoughts
    • Running PHP on your web server can be a resource hog; you may want to offload static content requests to nginx, lighttpd, or some other lightweight web server.
      • Proxying to your main web servers works great for the hardworking (PHP) processes, while the lightweight server handles static content.
  • 67. Database Servers
  • 68. Where We All Start
      Single Database Server
    • Lots of options and steps as we move forward.
  • 69. Replication
      Single Master, Single Slave
    • Write code that can write to the master and read from the slave.
      • Exception: be smart; don't write to a table on the master and then immediately read that same table from the slave.
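One way to honor that exception is to remember which tables the current request has written and send later reads of those tables to the master. The Python sketch below uses a naive regex to find table names, which is an assumption for illustration only; `QueryRouter` is hypothetical, and a real implementation would hook into your database layer rather than parse SQL text.

```python
import re
from typing import Set

WRITE_RE = re.compile(r"^\s*(INSERT|UPDATE|DELETE|REPLACE)\b", re.IGNORECASE)
TABLE_RE = re.compile(r"\b(?:INTO|UPDATE|FROM|JOIN)\s+`?(\w+)`?", re.IGNORECASE)


class QueryRouter:
    """Route writes to the master and reads to a slave, per request."""

    def __init__(self) -> None:
        self.written: Set[str] = set()  # tables written during this request

    def route(self, sql: str) -> str:
        tables = set(TABLE_RE.findall(sql))
        if WRITE_RE.match(sql):
            self.written |= tables
            return "master"
        if tables & self.written:
            return "master"  # avoid reading stale rows from a lagging slave
        return "slave"
```

Create one router per request so the "recently written" set does not grow forever and unrelated requests still read from the slaves.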
  • 70. Multiple Slaves
      Single Master, Multiple Slaves
    • This is a great time to start implementing connection pooling.
  • 71. Multiple Masters
      Multiple Masters, Multiple Slaves
    • Do NOT write to both masters at once with MySQL!
    • Be warned: auto-increment settings now need to change so IDs do not conflict.
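MySQL's `auto_increment_increment` and `auto_increment_offset` server variables are the usual way to do this: each master steps by the number of masters and starts at a different offset. The Python sketch below only simulates the resulting ID sequences (simplified to start from an empty table) to show that two masters never hand out the same ID; the two-master layout is a hypothetical example.

```python
from itertools import count, islice
from typing import Iterator, List


def id_sequence(offset: int, increment: int) -> Iterator[int]:
    """IDs a master hands out with auto_increment_offset=offset and
    auto_increment_increment=increment (simplified: empty table)."""
    return count(offset, increment)


def first_ids(offset: int, increment: int, n: int) -> List[int]:
    return list(islice(id_sequence(offset, increment), n))


# Hypothetical two-master setup (my.cnf on each server):
#   master A: auto_increment_increment = 2, auto_increment_offset = 1
#   master B: auto_increment_increment = 2, auto_increment_offset = 2
master_a = first_ids(1, 2, 5)
master_b = first_ids(2, 2, 5)
print(master_a, master_b)  # [1, 3, 5, 7, 9] [2, 4, 6, 8, 10]
```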
  • 73. Partitioning
      Segmenting your Data
    • Vertical Partitioning
      • Move less-accessed columns, large data columns, and columns not likely to appear in the WHERE clause to other tables.
    • Horizontal Partitioning
      • Done by moving rows into different tables.
        • Based on Range, Date, User or Interlaced
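A sketch of how an application might map a row to a horizontal partition under three of these schemes (range by user ID, date, and interlaced). The table names (`users_N`, `logs_YYYY_MM`, `users_pN`) and the partition sizes are hypothetical.

```python
from datetime import date


def table_by_range(user_id: int, rows_per_table: int = 1_000_000) -> str:
    """Range: IDs 0-999999 -> users_0, 1000000-1999999 -> users_1, ..."""
    return f"users_{user_id // rows_per_table}"


def table_by_date(day: date) -> str:
    """Date: one table per month, e.g. logs_2011_02."""
    return f"logs_{day.year}_{day.month:02d}"


def table_interlaced(user_id: int, partitions: int = 4) -> str:
    """Interlaced: consecutive IDs are spread across all partitions."""
    return f"users_p{user_id % partitions}"
```

Range and date schemes keep related rows together (easy to archive old months), while interlacing spreads new, usually hot rows evenly across tables.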
  • 74. What We Need to Remember
    • Replication
      • There may be a lag!
      • All reports / read queries should go here
      • Don't read here directly after a write
        • Transactions / Lag / etc.
    • Sessions
      • Never store sessions in the DB
        • Large binlogs; garbage collection causes slow queries; the queue may fill up and cause a crash or hit max connections.
  • 77. Cache Servers (not full page)
  • 78. Caching
      “Caching is imperative in scaling and performance”
      • Single Server
        • Shared Memory: APC / XCache / etc.
        • File Based: Files / SQLite / etc.
        • Not highly scalable; great for configuration files.
      • Distributed
        • Memcached, Redis, etc.
        • Set up consistent hashing.
    • Do not cache what cannot be re-created.
  • 82. Caching
      In The Beginning
    • Single Caching Server
    • Start caching fetches: on write, invalidate the cache and write the new value to it; always read from the cache.
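This read/write pattern is often called cache-aside. A minimal Python sketch, with a plain dict standing in for the cache server and hypothetical `load`/`save` callables standing in for the database:

```python
from typing import Any, Callable, Dict


class CachedRepository:
    """Cache-aside: read through the cache, refresh it on every write."""

    def __init__(self, load: Callable[[str], Any], save: Callable[[str, Any], None]) -> None:
        self.cache: Dict[str, Any] = {}  # stand-in for a cache server
        self._load = load                # hypothetical database read
        self._save = save                # hypothetical database write

    def get(self, key: str) -> Any:
        if key not in self.cache:        # miss: hit the database once, then cache
            self.cache[key] = self._load(key)
        return self.cache[key]

    def put(self, key: str, value: Any) -> None:
        self._save(key, value)           # the database remains the source of truth
        self.cache[key] = value          # overwrite (invalidate) the stale entry
```

Writing to the database first means a crash between the two steps leaves a stale cache entry rather than lost data, which fits "do not cache what cannot be re-created".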
  • 84. Distributed Caching
      Distributed Mania
    • Write based on consistent hashing (hash of a key that you are writing)
    • The server is chosen by the hash.
    • Hint: use the memcached PECL extension.
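A minimal consistent-hashing ring in Python, to show why only some keys move when a server is added. This is an illustrative sketch (MD5 ring points and 100 virtual nodes per server are arbitrary choices, and the server names in the usage note are hypothetical). In PHP, the memcached PECL extension can handle this for you by setting its `Memcached::OPT_DISTRIBUTION` option to `Memcached::DISTRIBUTION_CONSISTENT`.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self.replicas = replicas  # virtual nodes per server smooth out the distribution
        self._ring: Dict[int, str] = {}
        self._keys: List[int] = []  # sorted ring points
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._ring[point] = server
            bisect.insort(self._keys, point)

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            del self._ring[point]
            self._keys.remove(point)

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

With three servers, adding a fourth (e.g. `ring.add("cache4:11211")`) moves only about a quarter of the keys, all of them onto the new server; a plain `hash(key) % n` scheme would remap most keys.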
  • 87. The Read / Write Process
      In its simplest form...
  • 88. What We Need to Remember
    • Replicated or not...
    • Elasticity
      • Consistent hashing: adding or removing a server still loses some cached data (a fraction of the keys remap)
    • Sessions
      • Store me here... please please please!
    • Memory Caches
      • Durability: if it fails, it's gone!
      • Ensure dedicated memory!
      • If you run out of memory, does it evict an old item to make room for the new one, or refuse new items?
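Memcached answers that question with least-recently-used eviction by default (its `-M` option makes it return errors instead of evicting). Both behaviors can be sketched with a small Python LRU cache; the `LRUCache` class and its `evict` flag are hypothetical, for illustration only.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    def __init__(self, capacity: int, evict: bool = True) -> None:
        self.capacity = capacity
        self.evict = evict  # False mimics refusing new items when full
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: Any) -> bool:
        if key in self._items:
            self._items.move_to_end(key)
        elif len(self._items) >= self.capacity:
            if not self.evict:
                return False  # full: reject the new item
            self._items.popitem(last=False)  # drop the least recently used item
        self._items[key] = value
        return True
```

Evicting keeps the cache accepting writes at the cost of silently dropping old data, which is exactly why sessions stored here can disappear.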
  • 92. Job Servers
  • 93. “Message queues and mailboxes are software-engineering components used for interprocess communication, or for inter-thread communication within the same process. They use a queue for messaging – the passing of control or of content.” http://en.wikipedia.org/wiki/Message_queue
  • 94. Messages are Everywhere
  • 95. What are Message Queues
    • A FIFO buffer
    • Asynchronous push / pull
    • An application framework for sending and receiving messages.
    • A way to communicate between applications / systems.
    • A way to decouple components.
    • A way to offload work.
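The FIFO, asynchronous push/pull, and offloading ideas can be shown in-process with Python's standard `queue.Queue` and a few worker threads. This is only a stand-in for a real message queue server; `run_jobs` and the sentinel-based shutdown are illustrative choices.

```python
import queue
import threading
from typing import List, Optional


def run_jobs(jobs: List[str], workers: int = 3) -> List[str]:
    """Producer pushes jobs onto a FIFO queue; consumer threads pull them off."""
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this worker
                break
            with lock:
                # Real work (resize an image, send an email, ...) would go here.
                results.append(f"processed {job}")

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:   # the producer only enqueues; it never waits on the work
        q.put(job)
    for _ in threads:  # one sentinel per worker
        q.put(None)
    for t in threads:
        t.join()
    return results
```

With one worker the jobs come out in FIFO order; with several, they are processed in parallel and may finish in any order.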
  • 101. Where We All Start
      Single Job Server
    • Lots of options and steps as we move forward.
    (Diagram: Producer → Message Queue Server → Consumer)
  • 102. Distributed Job Servers
      Distributed Mania
    • Load balance a message queue for scale
    • Can continue to create more workers
    (Diagram: three Producers → three Message Queue Servers → five Consumers)
  • 104. Why are Message Queues Useful?
    • Asynchronous Processing
    • Communication between Applications / Systems
    • Image Resizing
    • Video Processing
    • Sending out Emails
    • Auto-Scaling Virtual Instances
    • Log Analysis
    • The list goes on...
  • 112. What We Need to Remember
    • Replication or not?
    • You need to keep your workers running
      • Use supervisord, monit, or some other process monitor...
    • Don't offload things just to offload
      • If it needs to be real-time rather than near real-time, a queue is not the right place for it; however, your boss does not need to know :)
  • 114. DNS Servers
  • 115. What to do
    • Just about every domain registrar runs DNS
      • DO NOT RUN YOUR OWN!
    • Anycast DNS
      • Anycast is a network addressing and routing scheme whereby data is routed to the "nearest" or "best" destination as viewed by the routing topology.
      • It's sexy, it's sweet and it is FAST!
      • A “cheaper” provider is DNS Made Easy.
        • Yes, the interface is ugly.
  • 118. What to look for...
  • 123. CDN Servers
  • 124. Why Use a CDN
    • Free your bandwidth
    • Free your server from serving basic files
    • Distributed servers around the globe
  • 127. What you need to know
    • Origin Pull
      • The CDN pulls content from your own web server and stores it on its nodes.
    • PoP Push
      • You upload the content to something like S3, which has a CDN such as CloudFront on top of it.
  • 128. What's the best?
    • Depends on your need...
    • Origin Pull is great if you want to keep all of the content on your web server.
    • PoP Push is great for storing things like user-generated content.
  • 131. Front-End Performance
  • 132. Discussion Points
    • Tactics
      • Minification (JavaScript / CSS)
      • CSS Sprites
      • GZIP
      • Cookies are evil
      • Parallel downloads (using subdomains for serving static content)
      • HTTP Expires headers
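For the HTTP Expires item, a static file response usually carries both an `Expires` date and a `Cache-Control: max-age` value. A small, hypothetical Python helper that builds both for a given lifetime:

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def cache_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Build far-future caching headers for a static asset."""
    now = time.time() if now is None else now
    return {
        # HTTP/1.1 caches prefer max-age; Expires covers older HTTP/1.0 caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires must be an HTTP-date in GMT, e.g. "Thu, 01 Jan 1970 01:00:00 GMT".
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

Far-future expiry pairs well with versioned asset URLs: when a file changes, its URL changes too, so browsers never serve a stale copy.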
  • 138. Discussion Points
  • 142. Questions? Mike Willbanks (Blog: http://blog.digitalstruct.com, Twitter: mwillbanks, IRC: lubs on freenode)