I ❤ HAProxy
National Airspace System - FAA
Simplified Web Architecture
Clients     Web Server   Dynamic    “Data”

iPhones     Nginx        PHP        Memcache
Androids    Apache       Ruby       PostgreSQL
Browsers    lighttpd     Perl       MySQL
                         Python     Mongo
                         Node.js    CouchDB
                                    Redis
                                    Oracle
ChOP Architecture
Clients     Web Server   Dynamic    “Data”

iPhone      Nginx        PHP5-FPM   Memcache
Android                             MySQL
Desktop                             Redis
                                    Chat
YouVersion Architecture
Clients     Web Server   Dynamic               “Data”

iPhone      Nginx        PHP5-FPM              Memcache
Android                  Ruby (coming soon)    PostgreSQL
Desktop                                        Mongo
                                               Oracle
HAProxy
¡  High Availability Proxy

¡  TCP load balancing proxy with awesome health
    checking built in

¡  Fast

¡  Scalable

¡  Makes non-HA services HA
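To make "load balancing with health checking" concrete, here is a minimal Python sketch of round-robin selection that skips unhealthy servers. It is an illustration of the idea only, not how HAProxy is implemented: the class name, the server names, and the `is_healthy` callback are all hypothetical, and a real proxy probes servers on a timer rather than on every request.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional


class HealthCheckedPool:
    """Round-robin over servers, skipping any that fail a health probe."""

    def __init__(self, servers: Iterable[str], is_healthy: Callable[[str], bool]):
        self._servers = list(servers)
        self._is_healthy = is_healthy
        self._ring = cycle(self._servers)

    def pick(self) -> Optional[str]:
        # Try each server at most once per call, so a fully-down pool
        # returns None instead of looping forever.
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._is_healthy(candidate):
                return candidate
        return None


down = {"app2"}  # hypothetical: pretend app2 is failing its checks
pool = HealthCheckedPool(["app1", "app2", "app3"], lambda s: s not in down)
picks = [pool.pick() for _ in range(4)]
print(picks)
```

Traffic keeps flowing to app1 and app3 while app2 is down. If every server is down, `pick()` returns `None`, which is roughly the point where a real proxy starts answering with errors.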
How I Love Thee, Let Me
Count The Ways…
¡  Rock solid

¡  Dead simple to run and configure

¡  Comprehensive Health Checking

¡  Lots of statistics
HAProxy Uses
¡  Not really a service unto itself

¡  Fits into the gaps between layers well

¡  Issue: Becomes a single point of failure itself


Clients -> HAProxy -> Web Server -> HAProxy* -> Dynamic Engine -> HAProxy* -> “Data”

* = potential future use
Eliminating SPOFs
¡  Two types of HAProxy SPOFs:
 ¡  Service Outage
     (Hardware failure or HAProxy service failure)
 ¡  HAProxy Limit Outage / Upstream Outage
     (Hit some arbitrary limit we defined somewhere or
     ran out of some slots somewhere)
Service Outage
¡  HAProxy service crashes or dies for some reason
    (has never happened, knock on wood)

¡  Hardware / Network Failure
Service Outage: Solution
¡  Corosync & Pacemaker

¡  Hard to configure at first, but don’t really need to
    touch it later

¡  Pretty much magic

¡  Two Corosync HAProxy clusters: DFW and SAN

¡  Setup is blogged about here:
    http://itand.me/41901523
HAProxy Limit Outage /
Upstream Outage
¡  Usually because of an outage further upstream
    at the Dynamic or “Data” layer

¡  Completely Hypothetical Situation: Mongo slows
    down, causing PHP processes to back up,
    causing the connection count to go through the
    roof and hit the limit, causing a total outage
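The chain of events above follows from Little's law: requests in flight = arrival rate × time per request. A rough Python sketch with made-up numbers (the 500 req/s rate, both latencies, and the 1000-connection limit are illustrative assumptions, not measurements from our setup):

```python
def in_flight(arrival_rate_per_s: float, latency_ms: float) -> float:
    """Little's law: average concurrent requests = arrival rate x time in system."""
    return arrival_rate_per_s * latency_ms / 1000


CONNECTION_LIMIT = 1000  # hypothetical limit, for illustration only

healthy = in_flight(500, 20)     # upstream answering in 20 ms
degraded = in_flight(500, 4000)  # same traffic, upstream now taking 4 s

print(healthy, degraded, degraded > CONNECTION_LIMIT)
```

Traffic did not change at all. Only the upstream latency did, and that alone is enough to push the number of open connections past the limit.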
What it looks like on the
graph (Yesterday)




OR: WHY WE MUST MOVE MONGO STAT!
For ChOP (Chat), it’s a little
different…
Upstream Outage
¡  Usually the result of running out of PHP processes.

¡  Normally each PHP process can handle
    hundreds of req/s

¡  Something slows them down (Mongo, Postgres,
    etc.), so each process handles far fewer
    req/s (or, worse, takes seconds per request)

¡  Inevitably, the slow requests tie up every PHP process,
    nothing else can run, and HAProxy fails all health
    checks and shows you Binary Jesus
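The arithmetic behind running out of PHP processes, as a small Python sketch. The pool size of 100 and the per-request times are assumptions chosen only to match the "hundreds of req/s per process" ballpark:

```python
def pool_capacity(processes: int, ms_per_request: float) -> float:
    """Maximum req/s a fixed pool can sustain when each request holds one process."""
    return processes * 1000 / ms_per_request


fast = pool_capacity(100, 5)     # 5 ms per request: 200 req/s per process
slow = pool_capacity(100, 2000)  # 2 s per request: 0.5 req/s per process

print(fast, slow)
```

With these numbers the same pool drops from 20000 req/s to 50 req/s. Anything arriving faster than that waits for a free process, and health checks are stuck in the same line.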
“Solutions”
¡  Start Hashing URLs to avoid upstream failures
 ¡  Want to send all requests for a given URL to the same app
     server, so if that URL is slow only that app server goes down
 ¡  Some benefit to caching as well
 ¡  Challenge: want to hash only part of a URL
 ¡  Challenge: need to separate app servers into
     “availability groups”
 ¡  Challenge: deployments, monitoring, alerting, all
     that crap…
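A sketch of the "hash only part of a URL" idea in Python. This is not our HAProxy config: the function names, the `depth` default, and the server list are hypothetical. HAProxy's own `balance uri` algorithm documents `depth` and `len` options for a similar purpose, so check the configuration manual before relying on the details here.

```python
import hashlib
from typing import Sequence
from urllib.parse import urlsplit


def hash_key(url: str, depth: int = 2) -> str:
    """Keep only the first `depth` path segments; ignore host and query string."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def pick_server(url: str, servers: Sequence[str], depth: int = 2) -> str:
    """Map the partial URL to one server with a stable hash."""
    digest = hashlib.sha256(hash_key(url, depth).encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["app1", "app2", "app3", "app4"]  # hypothetical availability group
a = pick_server("http://example.com/api/notes/123?page=2", servers)
b = pick_server("http://example.com/api/notes/456", servers)
print(hash_key("http://example.com/api/notes/123?page=2"), a, b)
```

Both URLs share the `/api/notes` prefix, so they land on the same server, and a slow endpoint only drags down that one machine. Plain modulo hashing reshuffles most keys whenever the server list changes, which is part of the deployment challenge.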
HAProxy Limit Outage
¡  We set limits on all HAProxy frontends, backends,
    and servers to ensure they don’t get
    overwhelmed

¡  Sometimes these limits are too low

¡  Solution: Raise them

¡  Challenge: Raise them too high without regard
    for the backend, and you could cause more
    harm than good (stampeding herd, a.k.a. thundering herd)
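One way to sanity-check a limit against the backend before raising it, sketched in Python. The sizing rule (split a share of the backend's workers evenly across proxies) is an assumption for illustration, not a rule from the HAProxy documentation, and all names and numbers are hypothetical.

```python
def per_server_limit(backend_workers: int, proxies: int, headroom_pct: int = 80) -> int:
    """Cap per proxy so that all proxies combined use only a share of the workers."""
    if backend_workers < 1 or proxies < 1:
        raise ValueError("backend_workers and proxies must be positive")
    return max(1, backend_workers * headroom_pct // 100 // proxies)


def limit_is_safe(limit: int, backend_workers: int, proxies: int) -> bool:
    """True if every proxy hitting its limit at once still fits in the worker pool."""
    return limit * proxies <= backend_workers


suggested = per_server_limit(backend_workers=100, proxies=2)
print(suggested, limit_is_safe(suggested, 100, 2), limit_is_safe(200, 100, 2))
```

With 100 workers behind two proxies this suggests 40 connections per proxy per server. A limit of 200 would let the proxies send four times more concurrent work than the backend can actually run.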
Q&A

HAProxy tech talk
