Scaling PHP
… in an “Enterprise” environment
(what’s that?)
Sarel van der Walt
- I’m from the Internet … a.k.a. afrihost.com
facebook.com/sarelvdwalt
twitter.com/sfvdwalt
“Scalability is about the entire architecture,
not some minor code optimisations”
– Dustin Whittle
Is PHP really the problem?
• Have you looked at your DB?
• Have you looked at your network?
• Have you looked at your assets?
And that’s just to name a few!
Vertical versus horizontal scaling…
Vertical = expensive
Horizontal = inexpensive,
but complex
Why Afrihost cared (suddenly!)
From this… …to this!
Our architecture, more or less (actually very high level)
haproxy haproxy
Internet
CZ4 CZ5 CZ6 CZ7 CZ8 CZ...
MySQL Master
MySQL Slave 0 MySQL Slave 24h
LogStash
$_SESSION
• File system is bad :: you can only use ONE server!
• Database (MySQL, Postgres) is ok :: vertical scaling needed
$_SESSION
• IN-Memory Cache is better - think memcached / redis
• You could also put it in MongoDB, or Cassandra…
  …session data lends itself nicely to serialisation
• Browser based is the BEST - but be careful of the 4k limit of older
browsers (and security)
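A minimal sketch of the browser-based option, written in Python rather than PHP so it runs on the standard library alone. It assumes the session is stored in a signed cookie: the data is serialised, an HMAC is appended, and anything over 4096 bytes is rejected. `SECRET_KEY`, `encode_session` and `decode_session` are hypothetical names, not part of the talk.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"   # hypothetical; load from server-side config in practice
MAX_COOKIE_BYTES = 4096     # the conservative per-cookie limit mentioned above


def encode_session(data: dict) -> str:
    """Serialise session data and append an HMAC so tampering is detectable."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    cookie = payload.decode() + "." + signature
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session too large for a cookie")
    return cookie


def decode_session(cookie: str) -> dict:
    """Verify the signature before trusting anything the browser sent back."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("session cookie was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))
```

Note that signing only detects tampering; the user can still read the contents, so secrets do not belong in such a cookie.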
IN-Memory Cache
• File-System Cache - old, but trusted - vertical
• memcached is better :: multiple servers through sharding
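To make "multiple servers through sharding" concrete, here is a rough sketch (in Python, stdlib only) of how a client might pick a cache server per key. The host names are made up, and real memcached clients typically use consistent hashing rather than this plain modulo scheme.

```python
import hashlib

SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]  # hypothetical hosts


def server_for(key: str, servers=SERVERS) -> str:
    """Map a cache key to one server; the same key always lands on the same host."""
    digest = hashlib.md5(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
```

The weakness of modulo sharding is that adding or removing a server remaps most keys, which is why consistent hashing is the usual refinement.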
IN-Memory Cache [continued]
• redis is awesome (but more complicated) :: also has sharding

/* redis can do so much more */
… more redis (I love redis!)
#atomic
#fancy_replacing_your_auto_increments_with_this??
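The #atomic hashtag presumably refers to Redis's INCR command, which increments a counter atomically and returns the new value, so it can hand out unique IDs without a database auto-increment. The sketch below (Python, not PHP) uses an in-memory stand-in called `FakeRedis` so that it runs without a server; `next_id` and the `seq:` key prefix are assumptions for illustration.

```python
import threading


class FakeRedis:
    """In-memory stand-in exposing only incr(), so the sketch runs without a server."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key: str) -> int:
        # Real Redis executes INCR atomically; the lock imitates that here.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + 1
            return self._values[key]


def next_id(client, sequence: str) -> int:
    """Return the next unique ID for a named sequence."""
    return client.incr("seq:" + sequence)
```

Any client object with a compatible `incr` method could be passed in place of the stand-in.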
… more redis (I love redis!)
#wondering_about_message_queues
Background Work (non-blocking)
• 3rd Party Calls (tweeting, api-calls, emailing, etc)
• Expensive DB (or any other) work - especially unimportant ones
• Q: What’s “expensive”?
  A: 150+ milliseconds
• Ask yourself: Is it important for the next request?!
• Q: So what do I do?
  A: Queue it!!
  Q: How?
  A: Why, with Redis, of course! :)
  … or something else… like MySQL… or not?
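The "queue it" idea can be sketched as follows (Python, stdlib only). A `deque` stands in for a Redis list, with `appendleft` playing the role of LPUSH and `pop` the role of RPOP/BRPOP; the job type, handler and function names are hypothetical.

```python
import json
from collections import deque

job_queue = deque()  # stand-in for a Redis list (LPUSH on one end, BRPOP on the other)


def enqueue(job_type: str, **args) -> None:
    """Called inside the web request: cheap, returns immediately."""
    job_queue.appendleft(json.dumps({"type": job_type, "args": args}))


def send_email(to: str) -> str:
    """Hypothetical slow task; the real thing would talk to a mail server."""
    return "emailed " + to


HANDLERS = {"send_email": send_email}


def work_once():
    """Called by a background CLI worker: pop the oldest job and run it."""
    if not job_queue:
        return None
    job = json.loads(job_queue.pop())
    return HANDLERS[job["type"]](**job["args"])
```

The request only pays for serialising the job; the 150+ millisecond work happens later in a worker process.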
During the billing run (25th to 2nd) Afrihost runs…
550 000 background CLI jobs, spread over 253 scripts
Some of these actions are called...
75k times
during a month, consuming...
9.7 days of server time!!!
> composer.phar require chrisboulton/php-resque
40 million records added in 42 minutes
... that's 954k per minute...
... that's 15 906 per second!! (MySQL took 34 seconds to insert the same number)
... that's 35 seconds (rounded up, lol) to insert the entire billing run into the Afrihost job queue!!
MySQL didn’t stand a chance!!
(It would need 1177 seconds = 19.6 minutes!!)
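For anyone checking the numbers, the arithmetic can be reproduced roughly as below. This assumes the "34 seconds" figure means MySQL needed 34 seconds for the same 15 906 records Redis took in one second; that reading is an interpretation of the slide, and the result lands slightly under the quoted 1177 seconds, presumably because the slide's inputs were rounded.

```python
records_per_second = 15_906    # Redis insert rate quoted on the slide
billing_run_jobs = 550_000     # background CLI runs in one billing run
mysql_slowdown = 34            # assumption: MySQL takes 34 s per 15 906 records

records_per_minute = records_per_second * 60
redis_seconds = billing_run_jobs / records_per_second
mysql_seconds = redis_seconds * mysql_slowdown

print(records_per_minute)           # per-minute rate, about 954k
print(round(redis_seconds, 1))      # seconds for Redis to queue the whole run
print(round(mysql_seconds / 60, 1)) # minutes for MySQL under the same assumption
```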
HTTP Caching
• Are you caching assets on the browsers?
• Should they really download new CSS that often?
• How do you cache-bust?
• ETag much? No, it’s not this…
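One common answer to the cache-busting and ETag questions is to derive both from a hash of the file content. The sketch below (Python, stdlib only) is an illustration under that assumption; `busted_name`, `etag_for` and `respond` are made-up helper names, not any framework's API.

```python
import hashlib


def busted_name(filename: str, content: bytes) -> str:
    """style.css -> style.<hash>.css, so a changed file gets a new URL."""
    stem, dot, ext = filename.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


def etag_for(content: bytes) -> str:
    """A strong ETag is just an opaque quoted string identifying this exact content."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match=None):
    """Return (status, body): 304 with an empty body when the browser copy is current."""
    if if_none_match == etag_for(content):
        return 304, b""
    return 200, content
```

With hashed file names the assets can be cached for a very long time, because a new release simply references a new URL.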
Reverse Proxy
• varnish - varnish-cache.org
• nginx - nginx.org
• haproxy - haproxy.org
• squid - squid-cache.org
• PS: Most frameworks have a reverse proxy built in!!
Reverse Proxy - haproxy
Database Optimisation
• SQL Slow Query Log - look at it regularly!
• Add slaves for read-scaling!
• Side-line: Disaster Recovery :: Slave 0-seconds + Slave 24-hour!
• Archive!! - makes reading AND writing faster
• Offload!! - perhaps relational DB isn’t the right place for data?
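Read-scaling with slaves usually means routing queries in the application: writes go to the master, reads go to a replica. A rough sketch in Python, with connections represented as plain strings; `ConnectionRouter` is a hypothetical class, and the 24-hour delayed slave is deliberately left out of the read pool since it exists for disaster recovery.

```python
import random


class ConnectionRouter:
    """Send SELECTs to a replica and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def connection_for(self, sql: str):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return random.choice(self.replicas)
        return self.master


# Only the up-to-date slave serves reads; the delayed one is for recovery.
router = ConnectionRouter("mysql-master", ["mysql-slave-0"])
```

Replication lag means a read issued immediately after a write may not see it yet, so reads that must be current are usually sent to the master as well.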
Database Optimisation [continued…]
id   username              in_bytes  out_bytes  created_at
123  sarel@afrihost.co.za  1024      2048       2014-07-15 10:01:15
124  dude@afrihost.co.za   971826    12823      2014-07-15 10:01:23
125  geek@afrihost.co.za   17263712  7126373    2014-07-15 10:01:58
126  27838127372@afrihost  12382     9563       2014-07-15 10:02:07
2009-06 :: 3110 records
2009-10 :: 1.7 million records
2010-10 :: 11.9 million records
2011-10 :: 17.4 million records
2012-10 :: 26.6 million records
2013-10 :: 59.6 million records
2014-06 :: 101.1 million records
Database Optimisation [continued…]
• A good strategy that worked for us was ztableYYYYMM!
• You could use any strategy you like, the basics of it:
• Select a bunch of rows
• Insert them into the applicable archive table
• Delete original
• Check out: https://github.com/sarelvdwalt/SFDatabaseArchiveBundle
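The select / insert / delete steps above can be sketched as follows. This uses Python with SQLite so it runs anywhere; it is not the SFDatabaseArchiveBundle's actual implementation. The table name `usage_log` and the function `archive_month` are assumptions, and the archive table follows the ztableYYYYMM naming.

```python
import re
import sqlite3


def archive_month(conn, year_month: str, batch_size: int = 500) -> int:
    """Move rows created in year_month (e.g. '2014-06') into zusage_logYYYYMM."""
    if not re.fullmatch(r"\d{4}-\d{2}", year_month):
        raise ValueError("expected YYYY-MM")
    archive = "zusage_log" + year_month.replace("-", "")
    # Empty copy of the source table's columns, created once per month.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {archive} AS SELECT * FROM usage_log WHERE 0"
    )
    moved = 0
    while True:
        # 1. Select a bunch of rows.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM usage_log WHERE substr(created_at, 1, 7) = ? LIMIT ?",
            (year_month, batch_size),
        )]
        if not ids:
            return moved
        marks = ",".join("?" * len(ids))
        with conn:  # one transaction per batch: insert and delete succeed together
            # 2. Insert them into the applicable archive table.
            conn.execute(
                f"INSERT INTO {archive} SELECT * FROM usage_log WHERE id IN ({marks})",
                ids,
            )
            # 3. Delete the originals.
            conn.execute(f"DELETE FROM usage_log WHERE id IN ({marks})", ids)
        moved += len(ids)
```

Working in small batches keeps each transaction short, so the live table is never locked for long.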
Framework Optimisation
• Did you just install it and leave? #default_install!
• Go through your configs, switch things on and off for production
• PRO Hint :: Switch debugging off in production
• PRO Hint #2 :: Switch APC / OpCache on INSIDE your Framework (if
it has it) for things like Doctrine / ORM
Doing Essential work Up-Front
• Pre-cache data - hey, Redis / Cassandra might be a good place to
put this?? :)
• Warm up your cache during rollouts
• Careful about rollout cache busting!
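One way to warm the cache during a rollout without busting what the old release is still reading is to prefix keys with a release identifier. The sketch below is an assumption about how that could look, in Python with a plain `dict` standing in for Redis or memcached; `RELEASE`, `cache_key` and `warm_cache` are hypothetical names.

```python
RELEASE = "2014-07-15.1"  # hypothetical release identifier


def cache_key(name: str, release: str = RELEASE) -> str:
    """Prefix keys with the release so old and new code never share entries."""
    return f"{release}:{name}"


def warm_cache(cache, names, loader, release: str = RELEASE) -> int:
    """Populate entries for the new release before it receives traffic."""
    warmed = 0
    for name in names:
        key = cache_key(name, release)
        if key not in cache:
            cache[key] = loader(name)
            warmed += 1
    return warmed
```

The old release's entries simply expire on their own once nothing reads them.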
Move work to the client
• The math is simple: 100k clients times 4 cores = 400
000 CPU cores! Use them … use them ALL!!!!
• Seriously, render calculations belong in the
browser. CSS3, JavaScript
• Do you have an IF statement in your HTML
generation (on server)? Now you have two problems!
• Send data via API, with templates - can utilise CDN
for templates - SOA (Service Oriented
Architecture)
• AngularJS, BackboneJS, check out TodoMVC.com
Learn to debug… LIVE!
• xdebug + webgrind - this will hurt, it’s slow and eats CPUs (and
HDD space)
• xhprof + xhprofGUI - fast, can run on production (best to keep it off
until you need it)
• Sometimes logging is good enough! Use LogStash for history and
trend… which brings me to monitoring…
Learn to debug… LIVE!
Have you tried:
Don’t lie… you echo’d
in your wetsuit, didn’t
you?
Perhaps better to wrap that…
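The slide's original code is not in this transcript, so the following is only a guess at what "wrapping that" might mean: route debug output through one helper that writes to a log and does nothing unless debugging is switched on. It is sketched in Python; `APP_DEBUG` and `debug_dump` are made-up names.

```python
import logging
import os

logger = logging.getLogger("app.debug")
DEBUG_ENABLED = os.environ.get("APP_DEBUG") == "1"  # hypothetical switch


def debug_dump(label: str, value, enabled=None) -> bool:
    """Log a value instead of echoing it into the page; no-op unless enabled."""
    if enabled is None:
        enabled = DEBUG_ENABLED
    if not enabled:
        return False
    logger.warning("%s = %r", label, value)
    return True
```

Because the output goes to a log rather than the response, a forgotten call cannot leak into a customer's page.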
Monitoring!
LogStash - aggregate logs from multiple servers into central location.
… let’s zoom and see WTF…
Monitoring… looking closer…
… hmmmm… what started around 18:08 on the 25th?
Monitoring… looking closer…
… hmmmm… what started around 18:08 on the 25th?
TURNS OUT FALSE ALARM, BUT STILL…
USEFUL!
Monitoring… DevOps Board…
SMS’s (Texts)…
… lots and lots of notifications!
ON-Call Much?
Accessibility
• How close are you to your servers?
• Do you NEED to log in to see logs?
• Catch-22 :: Developers should be able to access anything (or close
to) without having to log into the servers… but don’t deny them
access.
Side-line…
The most junior developer should do the rollouts!
Let’s look at something practical…
the Afrihost order form.
Each click does:
1. HTTP request (hits Apache, hits Symfony)
2. Asks the DB for the next products in the category
3. Returns the value, and renders it
Let’s go look at it live: clientzone.afrihost.com/get-connected
Remember this?
Final thoughts…
… but sometimes it’s not so elementary.
Thank you.
Charging beer… 36%
github.com/sarelvdwalt/php-jhb
PHP Johannesburg meetup - talk 2014 - Scaling PHP in the enterprise
