MNPHP Scalable Architecture 101 - Feb 3 2011




An overview of scaling out your system, starting from a single server, and of the many options you may face along the way.




Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Scalable Architectures 101 (MNPHP, Feb 3, 2011)
        Mike Willbanks
        Blog:
        Twitter: mwillbanks
        IRC: lubs on freenode
    • Scalability?
        Your application is growing, your systems are slowing and growth is inevitable...
      • Where do we go from here?
        • Load Balancing
        • Web Servers
        • Database Servers
        • Cache Servers
        • Job Servers
        • DNS Servers
        • CDN Servers
        • Front-End Performance
    • The Beginning...
        Single Server Syndrome
      • One Server Many Functions
        • Web Server, Database Server, Cache Server, Job Server, DNS Server, Mail Server....
      • How we know it's time
        • iostat output, CPU load, overall performance degradation
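One rough way to turn "CPU load" into a trigger is to compare the 1-minute load average per core against a threshold. A minimal sketch, assuming a Unix host (`os.getloadavg` is Unix-only) and a hypothetical threshold of 1.0 per core; disk metrics of the kind iostat reports are not covered here.

```python
import os


def load_per_core(load1: float, cores: int) -> float:
    """Return the 1-minute load average normalized by CPU core count."""
    return load1 / max(cores, 1)


def time_to_scale(load1: float, cores: int, threshold: float = 1.0) -> bool:
    """True when per-core load exceeds the (hypothetical) threshold.

    A sustained per-core load above ~1.0 means processes are waiting for
    CPU (on Linux, also for uninterruptible disk I/O).
    """
    return load_per_core(load1, cores) > threshold


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load/core = {load_per_core(one_min, cores):.2f}, "
          f"scale? {time_to_scale(one_min, cores)}")
```

A single reading is noisy; in practice you would watch the trend over time before adding hardware.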
    • The Next Step...
        Single Separation Syndrome
      • Separation of Web and Database
        • Removes the main disk I/O bottleneck.
      • However, the web server still can't handle our current I/O, CPU load or request volume.
    • Load Balancing
    • Load Balancing Our Environment
    • Several Options
      • DNS Rotation (Little to No Cost)
        • Not very reliable, but works on a small scale.
      • Software Based (Commodity Server Cost)
        • HAProxy, Pound, Varnish, Squid, Wackamole, Perlbal, Web Server Proxy...
      • Hardware Based (High Cost Appliance)
        • Several vendors, with products ranging by need.
          • A10, F5, etc.
    • Routing Types of Load Balancers
      • Round Robin
      • Static
      • Least Connections
      • Source
      • IP
      • Basic Authentication
      • URI
      • URI Parameter
      • Header
      • Cookie
      • Regular Expression
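Three of these strategies (round robin, least connections and source/IP hashing) are easy to illustrate. A minimal sketch with hypothetical backend names; real balancers such as HAProxy also apply server weights and health checks, which are omitted here.

```python
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends: List[str]) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest active connections."""

    def __init__(self, backends: List[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def pick(self) -> str:
        # Ties go to the backend listed first.
        backend = min(self.active, key=lambda b: self.active[b])
        self.active[backend] += 1  # caller must call release() when done
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


def source_hash(backends: List[str], client_ip: str) -> str:
    """Map a client IP to the same backend every time (sticky by source)."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

Source hashing gives stickiness without cookies, but adding or removing a backend reshuffles most clients because of the plain modulo.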
    • Open Source Software Options
      • Of the many options, we will focus on three:
        • HAProxy – By and large one of the most popular.
        • Pound – Said to be great for medium traffic sites.
        • Varnish – A caching solution that also does load balancing
    • HAProxy
      • Pros
        • Extremely full featured
        • Very well known
        • Handles just about every type of routing
        • Several examples online
        • Has a web-based GUI
      • Cons
        • No native SSL support (use Stunnel)
        • Setup can be complex and take a lot of time
    • Sample HAProxy Configuration
        global
            log local0
            log local1 notice
            maxconn 4096
            user haproxy
            group haproxy
            daemon

        defaults
            log global
            mode http
            option httplog
            option dontlognull
            retries 3
            option redispatch
            maxconn 2000
            contimeout 5000
            clitimeout 50000
            srvtimeout 50000

        listen localhost
            option httpchk GET /
            balance roundrobin
            cookie SERVERID
            server serv1 check inter 2000 rise 2 fall 5
            server serv2 check inter 2000 rise 2 fall 5
            option httpclose
            stats enable
            stats uri /lb?stats
            stats realm haproxy
            stats auth test:test
    • Pound
      • Pros
        • chroot support
        • Native SSL support
        • Insanely simple setup
        • Supports virtually all types of routing
        • Many online tutorials
      • Cons
        • Not a cache; pair it with Varnish or Squid if you need caching
    • Sample Pound Configuration
        User "www-data"
        Group "www-data"
        LogLevel 1
        Alive 30
        Control "/var/run/pound/poundctl.socket"

        ListenHTTP
            Address
            Port 80
            xHTTP 0
            Service
                BackEnd
                    Address
                    Port 8080
                End
                BackEnd
                    Address
                    Port 8080
                End
            End
        End
    • Varnish
      • Pros
        • Supports front-end caching
        • Fairly simple setup
        • Extremely well known
        • Many online tutorials
        • Large suite of tools (varnishstat, varnishtop, varnishlog, varnishreplay, varnishncsa)
      • Cons
        • No native SSL support (use Pound or Stunnel)
        • A web GUI is only available as a paid product
    • Sample Varnish Configuration
        backend default1 {
            .host = "";
            .port = "8080";
            .probe = {
                .url = "/";
                .interval = 5s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
            }
        }

        backend default2 {
            .host = "";
            .port = "8080";
            .probe = {
                .url = "/";
                .interval = 5s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
            }
        }

        director default round-robin {
            { .backend = default1; }
            { .backend = default2; }
        }

        sub vcl_recv {
            if ( ~ "^$") {
                set req.backend = default;
            }
        }
    • What We Need to Remember
      • Web Servers
        • One always needs to be available
        • Don't terminate SSL at the web server level!
      • Headers
        • Pass a header indicating whether SSL is on
        • Client IP is likely in X-Forwarded-For
        • If using virtual hosts, pass the Host header
      • Sessions
        • Need a shared session store if not using sticky routing
    • Web Servers
    • Several Options
      • Apache
      • IIS
      • Nginx
      • Lighttpd
      • etc.
    • Configuration
      • Server name should be the same on all servers
        • Make a server alias so you can reach individual servers without going through the load balancer
      • Each server's configuration SHOULD, or even MUST, be the same.
      • Client IP will likely be in X-Forwarded-For.
      • SSL status will not be in $_SERVER['HTTPS']; it arrives in an HTTP_ header instead.
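Behind a load balancer, the client IP and SSL status have to be recovered from forwarded headers (in PHP these show up as `$_SERVER['HTTP_X_FORWARDED_FOR']` and similar). A minimal sketch, assuming the balancer sits at a hypothetical address `10.0.0.1` and signals SSL with an `X-Forwarded-Proto` header; the slide does not name the SSL header, so that name is an assumption, as is the header-name capitalization.

```python
from typing import Mapping, Set

# Hypothetical load balancer address; only trust forwarded headers from it.
TRUSTED_PROXIES: Set[str] = {"10.0.0.1"}


def client_ip(remote_addr: str, headers: Mapping[str, str]) -> str:
    """Return the real client IP for a request.

    X-Forwarded-For is a comma-separated chain that clients can spoof, so we
    only read it when the direct peer is our balancer, and we take the
    right-most entry that is not one of our own proxies.
    """
    if remote_addr not in TRUSTED_PROXIES:
        return remote_addr  # direct hit: ignore spoofable headers
    chain = [ip.strip() for ip in headers.get("X-Forwarded-For", "").split(",")
             if ip.strip()]
    for ip in reversed(chain):
        if ip not in TRUSTED_PROXIES:
            return ip
    return remote_addr


def is_https(headers: Mapping[str, str]) -> bool:
    """SSL was terminated at the balancer; trust its (assumed) header."""
    return headers.get("X-Forwarded-Proto", "").lower() == "https"
```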
    • What We Need to Remember
      • Files
        • All web servers need our files.
        • Static content could be tagged in version control.
        • Static content may need a file server / CDN / etc.
        • User-generated content on an NFS mount, or served from the cloud or a CDN.
      • Sessions
        • All web servers need access to our sessions.
        • Remember disk is slow and the database will be a bottleneck. How about distributed caching?
    • Other Thoughts
      • Running PHP in your web server can make it a resource hog; consider offloading static content requests to nginx, lighttpd or another lightweight web server.
        • Proxying to your main web servers works well for the heavy dynamic requests, while the lightweight server serves the static content.
    • Database Servers
    • Where We All Start
        Single Database Server
      • Lots of options and steps as we move forward.
    • Replication
        Single Master, Single Slave
      • Write code that can write to the master and read from the slave.
        • Exception: be smart; don't write to a table on the master and then immediately read that same table from the slave.
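The read/write split and its exception can be captured in a small per-request router. A minimal sketch; the connection names `"master"` and `"slave"` and the `ReplicationRouter` class are hypothetical stand-ins for whatever your database layer uses.

```python
from typing import Set


class ReplicationRouter:
    """Choose "master" or "slave" for each query (hypothetical names).

    Writes always go to the master. Reads go to the slave, except reads of
    a table this request has already written, which stay on the master so
    replication lag cannot return stale data.
    """

    def __init__(self) -> None:
        self.dirty_tables: Set[str] = set()

    def for_write(self, table: str) -> str:
        self.dirty_tables.add(table)
        return "master"

    def for_read(self, table: str) -> str:
        return "master" if table in self.dirty_tables else "slave"
```

Create one router per request so the "dirty" set does not pin every later request to the master.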
    • Multiple Slaves
        Single Master, Multiple Slaves
      • This is a good time to start implementing connection pooling.
    • Multiple Masters
        Multiple Masters, Multiple Slaves
      • Do NOT write to both masters at once with MySQL!
      • Be warned: auto-increment settings must now change so the masters do not generate conflicting keys.
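In MySQL this is usually done with the `auto_increment_increment` and `auto_increment_offset` server variables: with two masters, both use an increment of 2, one uses offset 1 and the other offset 2. The sketch below only simulates the resulting ID sequences for a fresh table; it assumes each master gets a distinct offset between 1 and the increment.

```python
from itertools import count, islice
from typing import Iterator


def id_sequence(offset: int, increment: int) -> Iterator[int]:
    """IDs a fresh table receives with auto_increment_offset=offset and
    auto_increment_increment=increment: offset, offset+increment, ..."""
    return count(offset, increment)


if __name__ == "__main__":
    master_a = list(islice(id_sequence(1, 2), 4))
    master_b = list(islice(id_sequence(2, 2), 4))
    print(master_a, master_b)  # [1, 3, 5, 7] [2, 4, 6, 8]
```

The two sequences interleave and never collide, which is what lets both masters accept inserts.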
    • Partitioning
        Segmenting your Data
      • Vertical Partitioning
        • Move less-accessed columns, large data columns and columns unlikely to appear in a WHERE clause into other tables.
      • Horizontal Partitioning
        • Done by moving rows into different tables.
          • Based on Range, Date, User or Interlaced
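Horizontal partitioning comes down to a function that maps a row to its table. A minimal sketch of the range, date and interlaced schemes; the table names (`users_N`, `logs_YYYY_MM`) and the range bounds are hypothetical.

```python
from datetime import date
from typing import List


def table_by_range(user_id: int, bounds: List[int]) -> str:
    """Range: bounds are exclusive upper limits, e.g. [100000, 200000]."""
    for i, upper in enumerate(bounds):
        if user_id < upper:
            return f"users_{i}"
    return f"users_{len(bounds)}"


def table_by_date(day: date) -> str:
    """Date: one table per month, e.g. logs_2011_02."""
    return f"logs_{day.year}_{day.month:02d}"


def table_interlaced(user_id: int, partitions: int) -> str:
    """Interlaced: consecutive IDs alternate across partitions by modulo."""
    return f"users_{user_id % partitions}"
```

Range and date schemes make archiving easy but can create hot partitions; interlacing spreads load evenly but makes range queries touch every partition.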
    • What We Need to Remember
      • Replication
        • There may be a lag!
        • All reports / read queries should go here
        • Don't read here directly after a write
          • Transactions / Lag / etc.
      • Sessions
        • Never store sessions in the DB
          • They bloat the binlogs, garbage collection causes slow queries, and the queue may fill up, causing a crash or hitting max connections.
    • Cache Servers (not full page)
    • Caching
        “Caching is imperative in scaling and performance”
        • Single Server
          • Shared Memory: APC / Xcache / etc
          • File Based: Files / Sqlite / etc
          • Not highly scalable, great for configuration files.
        • Distributed
          • Memcached, Redis, etc.
          • Set up consistent hashing.
      • Do not cache what cannot be re-created.
    • Caching
        In The Beginning
      • Single Caching Server
      • Start caching fetches: on write, invalidate the cache and write the new value; always read from the cache.
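This is the cache-aside pattern with a refresh on write. A minimal sketch in which two plain dicts stand in for memcached and the database (hypothetical stand-ins), with a counter to show how often the database is actually hit.

```python
from typing import Any, Dict


class CachedRepository:
    """Cache-aside reads, with the cache refreshed on every write."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}  # stand-in for memcached
        self.db: Dict[str, Any] = {}     # stand-in for the database
        self.db_reads = 0

    def get(self, key: str) -> Any:
        if key in self.cache:        # 1. always try the cache first
            return self.cache[key]
        self.db_reads += 1           # 2. miss: fall back to the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # 3. populate the cache for next time
        return value

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value         # write to the source of truth first
        self.cache[key] = value      # replace (invalidate + rewrite) the entry
```

Because the database stays the source of truth, a lost cache entry only costs one extra read, which is why you should not cache what cannot be re-created.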
    • Distributed Caching
        Distributed Mania
      • Writes are distributed with consistent hashing (a hash of the key being written).
      • The target server is chosen by that hash.
      • Hint – use the memcached pecl extension.
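With the pecl memcached extension you would normally let the client do this (for example by setting `Memcached::OPT_DISTRIBUTION` to `Memcached::DISTRIBUTION_CONSISTENT`), but the idea fits in a few lines. A minimal sketch of a hash ring with virtual nodes; the server names and the 100 replicas per server are hypothetical choices.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring with virtual nodes (replicas) per server."""

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._points: List[int] = []      # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def server_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next server point."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a fourth server only moves the keys that now land on its points (roughly a quarter of them); with plain modulo hashing most keys would move.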
    • The Read / Write Process
        In its simplest form...
    • What We Need to Remember
      • Replicated or not...
      • Elasticity
        • Consistent hashing: adding or removing a server remaps a portion of the keys, and that cached data is lost
      • Sessions
        • Store me here... please please please!
      • Memory Caches
        • Durability - If it fails, it's gone!
        • Ensure dedicated memory!
        • When memory runs out, does it evict old entries to make room for new ones, or refuse new entries?
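Memcached's default answer is to evict least recently used items (its `-M` option makes it return errors instead). A minimal sketch of both behaviors; capacity is counted in items for simplicity, whereas memcached actually manages memory per slab class.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int, evict: bool = True) -> None:
        self.capacity = capacity
        self.evict = evict  # False mimics "refuse new entries when full"
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: Any) -> bool:
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.capacity:
            if not self.evict:
                return False  # full: reject the new item
            self._data.popitem(last=False)  # drop the least recently used
        self._data[key] = value
        return True
```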
    • Job Servers
    • “Message queues and mailboxes are software-engineering components used for interprocess communication, or for inter-thread communication within the same process. They use a queue for messaging – the passing of control or of content.”
    • Messages are Everywhere
    • What are Message Queues
      • A FIFO buffer
      • Asynchronous push / pull
      • An application framework for sending and receiving messages.
      • A way to communicate between applications / systems.
      • A way to decouple components.
      • A way to offload work.
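The FIFO, asynchronous push/pull idea can be shown in-process with Python's standard `queue` module. A minimal sketch: a thread stands in for the consumer and `str.upper` stands in for real work such as resizing an image; a real deployment would put a message queue server between the two.

```python
import queue
import threading
from typing import List, Optional

STOP = None  # sentinel telling the consumer to exit


def consumer(q: "queue.Queue[Optional[str]]", processed: List[str]) -> None:
    while True:
        msg = q.get()  # blocks until a message is available
        if msg is STOP:
            break
        processed.append(msg.upper())  # stand-in for real work


def run_pipeline(messages: List[str]) -> List[str]:
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    processed: List[str] = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    for msg in messages:  # producer: push and move on without waiting
        q.put(msg)
    q.put(STOP)
    worker.join()
    return processed
```

With a single consumer the messages are processed in the order they were pushed.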
    • Where We All Start
        Single Job Server
      • Lots of options and steps as we move forward.
      Diagram: Producer → Message Queue Server → Consumer (queue / receive)
    • Distributed Job Servers
        Distributed Mania
      • Load balance a message queue for scale
      • Can continue to create more workers
      Diagram: several Producers → several load-balanced Message Queue Servers → many Consumers
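Scaling out means spreading messages over several queue servers and attaching more workers to each. A minimal in-process sketch: each `queue.Queue` is a hypothetical stand-in for a queue server, the producer balances across them round robin, and squaring a number stands in for real work.

```python
import itertools
import queue
import threading
from typing import List, Optional

STOP = None  # sentinel: each worker consumes exactly one


def worker(q: "queue.Queue[Optional[int]]", results: List[int],
           lock: threading.Lock) -> None:
    while True:
        job = q.get()
        if job is STOP:
            break
        with lock:
            results.append(job * job)  # stand-in for real work


def run(jobs: List[int], servers: int = 2, workers_per_server: int = 3) -> List[int]:
    queues = [queue.Queue() for _ in range(servers)]  # stand-ins for queue servers
    results: List[int] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for q in queues for _ in range(workers_per_server)]
    for t in threads:
        t.start()
    rotation = itertools.cycle(queues)  # producer balances across servers
    for job in jobs:
        next(rotation).put(job)
    for q in queues:  # one STOP per worker, queued after the real jobs
        for _ in range(workers_per_server):
            q.put(STOP)
    for t in threads:
        t.join()
    return results
```

With several workers the completion order is no longer guaranteed, so jobs should not depend on one another.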
    • Why are Message Queues Useful?
      • Asynchronous Processing
      • Communication between Applications / Systems
      • Image Resizing
      • Video Processing
      • Sending out Emails
      • Auto-Scaling Virtual Instances
      • Log Analysis
      • The list goes on...
    • What We Need to Remember
      • Replication or not?
      • You need to keep your workers running
        • Supervisord or monit or some other monitoring...
      • Don't offload things just to offload
        • If it needs to be real-time rather than near real-time, a queue is not the place for it; however, your boss does not need to know :)
    • DNS Servers
    • What to do
      • Just about every domain registrar runs DNS
        • DO NOT RUN YOUR OWN!
      • Anycast DNS
        • Anycast is a network addressing and routing scheme whereby data is routed to the "nearest" or "best" destination as viewed by the routing topology.
        • It's sexy, it's sweet and it is FAST!
        • A “cheaper” provider is DNS Made Easy.
          • Yes the interface is ugly.
    • What to look for...
      • Wildcard support
      • Failover / Distributed
      • CNAME support
      • TXT support
      • Name Server support
    • CDN Servers
    • Why Use a CDN
      • Free your bandwidth
      • Free your server from serving basic files
      • Distributed servers around the globe
    • What you need to know
      • Origin Pull
        • Pulls the content from your own web server and stores it on the CDN's nodes.
      • PoP Push
        • You upload the content to something like S3, with a CDN such as CloudFront on top of it.
    • What's the best?
      • Depends on your need...
      • Origin Pull is great if you want to keep all of the content on your web server.
      • PoP Push is great for storing things like user generated content.
    • Front-End Performance
    • Discussion Points
      • Tactics
        • Minification (JavaScript / CSS)
        • CSS Sprites
        • GZIP
        • Cookies are evil
        • Parallel downloads (using subdomains for serving static content)
        • HTTP Expires
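Two of these tactics, GZIP and far-future expiry, amount to transforming the body and adding headers. A minimal sketch; the 30-day lifetime is a hypothetical default, `Cache-Control` is added alongside `Expires` for HTTP/1.1 clients, and a real server should only gzip when the request's `Accept-Encoding` includes gzip.

```python
import gzip
import time
from email.utils import formatdate
from typing import Dict, Optional, Tuple


def static_response(body: bytes, max_age: int = 30 * 24 * 3600,
                    now: Optional[float] = None) -> Tuple[bytes, Dict[str, str]]:
    """Gzip a static asset and attach far-future caching headers."""
    now = time.time() if now is None else now
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "Cache-Control": f"public, max-age={max_age}",
        # formatdate(..., usegmt=True) produces an HTTP-style GMT date.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
    return compressed, headers
```

Far-future expiry pairs well with versioned file names, so a new release changes the URL instead of waiting for caches to expire.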
    • Discussion Points
      • Tools
        • YSlow
        • Firebug
        • Google Page Speed
        • Google Webmaster Tools
    • Questions?
        Mike Willbanks
        Blog:
        Twitter: mwillbanks
        IRC: lubs on freenode