Improving the Performance of Your Web Application
  Joe Stump, Lead Architect, Digg.com
Introductions
“Web 2.0 sucks (for scaling).”
             Joe Stump, Lead Architect, Digg.com




  Users want access to all of their cr...
Backend Considerations
    Language considerations
    Scaling out
    Caching strategies
    Content storage and delivery...
Frontend Considerations

    Reduce HTTP requests
    Avoid inline JavaScript and CSS
    Compression and Minification
    ...
“PHP doesn’t scale.”
 Cal Henderson, Director of Development, Flickr.com




Languages don’t scale
Bytecode caching (PHP, ...
Discussion!

What language do you use?
Why?
Does it help you or hurt to use it?
Your mom lied; don’t share.

 Decentralize data, storage, processing, etc.
 Increased redundancy
 Scaling becomes simple; ...
Scaling Up
Scaling Out
How do I scale easily?
1. Caching
2. Caching
3. Caching!
What are my options?

Disk based caching (e.g. Cache_Lite)
In memory caching (e.g. APC, Memcached)
Cloud caching (e.g. Mog...
Disk based caching
           Stupid simple
           Cheap
           Fairly easy to scale out
           Dynamic images...
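
A minimal sketch of disk based caching, in Python rather than the PHP Cache_Lite package named earlier. The class name `DiskCache`, the `ttl_seconds` parameter and the `front_page` key are all hypothetical and chosen only for illustration. Each key becomes one file, and a file older than the time-to-live counts as a miss.

```python
import hashlib
import os
import pickle
import tempfile
import time


class DiskCache:
    """Tiny file-per-key cache with a time-to-live (illustrative only)."""

    def __init__(self, directory, ttl_seconds=60):
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def get(self, key):
        path = self._path(key)
        try:
            age = time.time() - os.path.getmtime(path)
            if age > self.ttl_seconds:
                return None  # entry is stale, treat it as a miss
            with open(path, "rb") as handle:
                return pickle.load(handle)
        except OSError:
            return None  # no file on disk means a miss

    def set(self, key, value):
        # Write to a temporary file, then rename it into place, so a
        # reader never sees a half-written entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(value, handle)
        os.replace(tmp_path, self._path(key))


cache = DiskCache(tempfile.mkdtemp(), ttl_seconds=300)
cache.set("front_page", {"stories": [1, 2, 3]})
```

Pointing `directory` at a RAM disk is one way to follow the "use fast disks" advice without changing the code.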
APC (PHP)
  Bytecode caching
  In memory user cache
  Insanely fast
  Not centralized or shared
Memcache
If you’re not using this you’re crazy
Easy to set up and use
Insanely fast over the network
Scales to insane heig...
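
One common way to put a shared cache in front of a database is to check the cache first and fall back to the database on a miss. The sketch below assumes that pattern. `FakeMemcache` is a hypothetical in-process stand-in with memcached-style `get` and `set` methods, not a real client library, and `load_user_from_db` is a placeholder for a real query.

```python
class FakeMemcache:
    """In-process stand-in with memcached-style get/set (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


db_calls = []  # records every trip to the "database"


def load_user_from_db(user_id):
    # Placeholder for a real, comparatively slow, database query.
    db_calls.append(user_id)
    return {"id": user_id, "username": "user%d" % user_id}


def get_user(cache, user_id):
    key = "user:%d" % user_id
    user = cache.get(key)
    if user is None:
        # Cache miss: load from the database and remember the result.
        user = load_user_from_db(user_id)
        cache.set(key, user)
    return user


cache = FakeMemcache()
first = get_user(cache, 42)   # miss, goes to the database
second = get_user(cache, 42)  # hit, served from the cache
```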
MogileFS
File and data store
Runs over WebDAV
Scales out infinitely (in theory)
Serialize data, store in file
Centralized a...
Amazon S3
File and data store
Runs over HTTP
Scales out infinitely (in theory)
Serialize data, store in file
Centralized and...
Discussion!

Are you using caching?
Why not?
If so, what’s your strategy?
Content Storage/Delivery

  What are your storage needs?
  Is it critical YOU store them?
  How costly is it to store in-h...
i can has free storage?

     YouTube for video
     Scribd for documents
     Flickr for images
Cloud Services (S3)

Simple to get up and running
No hardware maintenance
Costs money, but not as much as you think
NFS
Simple to set up and get running
Costs money, requires colocation, etc.
Does. Not. Scale.
Did I mention it doesn’t sca...
MogileFS
Somewhat complicated to set up
Costs money, requires colocation, etc.
Scales exceptionally well
Used at Digg, Li...
Roll Your Own

File storage IS your business
Highly specialized and customized
Costs money, requires colocation, etc.
Last...
CDN

Completely outsource it
Costs a ton of money
Out of your control
Scales and scales and scales
Discussion!

What are you using for storage?
What’s worked for you?
What’s failed epically?
Parallel Data Requests

    Access your data in parallel
    Make data access asynchronous (WHAT?!)
    Loosely couple you...
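
A rough sketch of parallel data access using a Python thread pool. The three `fetch_*` functions are hypothetical and only sleep to imitate network latency. In a real application each would call a separate service or data store.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_profile(user_id):
    time.sleep(0.05)  # pretend network latency
    return {"id": user_id}


def fetch_friends(user_id):
    time.sleep(0.05)
    return [user_id + 1, user_id + 2]


def fetch_stories(user_id):
    time.sleep(0.05)
    return ["story-a", "story-b"]


def load_page_data(user_id):
    fetchers = {
        "profile": fetch_profile,
        "friends": fetch_friends,
        "stories": fetch_stories,
    }
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        # Submit every request first so they all run at the same time,
        # then collect the results.
        futures = {name: pool.submit(fn, user_id) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


page = load_page_data(7)
```

Because the three calls overlap, the page waits roughly as long as the slowest fetch instead of the sum of all three.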
HTTP

       Parallel
       Asynchronous
       Non-blocking
       Loosely coupled
       Free foot massages!
Gearman
     Parallel
     Asynchronous
     Scales well
Discussion!


Which format to use for exchange?
Anyone doing this already?
Amazon, Google, Yahoo!
Near time processing

  Does this need to be done NOW?
  Offload to background processes
  Offloading must be a no op
  Feed...
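
A small sketch of offloading work to a background process. It uses an in-process queue and one worker thread as a stand-in for a real job server, so it shows the shape of the idea only. The names `handle_upload` and `worker` and the image resizing job are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def worker():
    # Pull jobs off the queue until the None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        processed.append("resized:" + job)  # stand-in for slow work
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()


def handle_upload(image_name):
    # The request path only enqueues the job and returns right away.
    jobs.put(image_name)
    return "accepted"


status = handle_upload("avatar.png")
jobs.join()     # only so this example can inspect the result
jobs.put(None)  # ask the worker to stop
thread.join()
```

A web request would not normally wait on `jobs.join()`. It is here only to make the outcome observable.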
Cron

Run every minute or two
Simple
Great for batch jobs
Not decentralized, locking issues
Gearman
     Fire and forget
     Simple
     Scales well
     Digg Images
     Nearly instant
     Decentralized
     No ...
Queues

Grid Engine by Sun
Starling by Twitter
Others?
Amazon’s EC2
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/




            Near limit...
Discussion!


What’s low(er) priority?
Where would you implement this?
Partitioning Data

Horizontal v. Vertical
Not all data lives in a single place
Hash records to partitions
App smart / logic...
Horizontal
192.168.0.1         192.168.0.2         192.168.0.3

Users               Users               Users
id int(11)  ...
Hashing your data
   oh hai! were’s mai dataz?!
How?

Put 10,000 users per partition
Partition users alphabetically
Partition home listings by zip code
Partition products...
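
The partitioning schemes listed above can be sketched as small lookup functions. The three addresses come from the horizontal partitioning slide. The letter boundaries and the use of CRC32 as the hash are assumptions made for this example.

```python
import zlib

PARTITIONS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
USERS_PER_PARTITION = 10000


def partition_by_range(user_id):
    # Ids 1..10000 map to partition index 0, 10001..20000 to index 1,
    # and so on. The index grows as users are added, so new partitions
    # have to be brought online over time.
    return (user_id - 1) // USERS_PER_PARTITION


def partition_alphabetically(username):
    # Hypothetical split points: a-i, j-r, s-z.
    first = username[0].lower()
    if first < "j":
        return PARTITIONS[0]
    if first < "s":
        return PARTITIONS[1]
    return PARTITIONS[2]


def partition_by_hash(key):
    # CRC32 is stable across runs, unlike Python's built-in hash().
    return PARTITIONS[zlib.crc32(key.encode("utf-8")) % len(PARTITIONS)]
```

Hashing spreads load evenly but makes adding a partition expensive, since most keys then map to a different server. Range and alphabetical schemes are easier to reason about but can leave some partitions much busier than others.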
Vertical
192.168.0.1          192.168.0.2      192.168.0.3

Users                UsersPrf         UsersStg
id int(11)     ...
Why?

Avoid altering large tables
Save time during insert
Many small tables v. one large table
Lazy loading of rarely used...
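
A sketch of lazy loading across a vertical split. The `Users` and `UsersPrf` table names and the `fname`, `lname` and `url` columns come from the vertical partitioning slide. The `ProfileTable` and `User` classes are hypothetical stand-ins for a real data access layer.

```python
class ProfileTable:
    """Stand-in for the UsersPrf table on its own server."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0  # counts how often the table is hit

    def fetch(self, user_id):
        self.queries += 1
        return self.rows[user_id]


class User:
    def __init__(self, user_id, users_table, profile_table):
        self.id = user_id
        self.username = users_table[user_id]["username"]
        self._profile_table = profile_table
        self._profile = None

    @property
    def profile(self):
        # Only query the profile partition the first time it is needed.
        if self._profile is None:
            self._profile = self._profile_table.fetch(self.id)
        return self._profile


users = {1: {"username": "joe"}}
users_prf = ProfileTable(
    {1: {"fname": "Joe", "lname": "Stump", "url": "http://joestump.net"}}
)

user = User(1, users, users_prf)
queries_before = users_prf.queries  # nothing loaded yet
fname = user.profile["fname"]       # first access triggers one query
user.profile                        # second access reuses the result
queries_after = users_prf.queries
```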
Discussion!


Natural partitions in your data?
How would you hash your data?
Reduce HTTP Requests

   Bundle JavaScript and CSS
   Use sprites for images
   Reduce images / outside objects
Avoid inline JS/CSS

           External = Cached

           Inline = Not Cached
Compression / Minify

  Enable Gzip compression sitewide
  Use minification software on JS
  jQuery/Prototype Minified
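
To get a feel for what Gzip buys on text assets, the sketch below compresses a made-up block of JavaScript with Python's standard `gzip` module. The script content is invented for the example and is far more repetitive than real code, so real savings will be smaller.

```python
import gzip

# Invented, highly repetitive JavaScript used only as sample input.
script = "\n".join(
    "function handler%d(event) { document.getElementById('item%d').className = 'active'; }"
    % (i, i)
    for i in range(200)
).encode("utf-8")

compressed = gzip.compress(script)
restored = gzip.decompress(compressed)

# Fraction of the original size that actually goes over the wire.
ratio = len(compressed) / len(script)
```

In production the web server usually handles this, so no application code is needed once compression is switched on.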
Learn to Love HTTP/1.1

 Cache-Control: public/private
 Connection: close
 Expires: Thu, 28 Feb 2008 16:00:00 GMT
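
A sketch of building the caching headers listed above. The helper name `caching_headers` is hypothetical, and the `max-age` directive is an addition that does not appear on the slide. The date formatting uses Python's standard `email.utils.format_datetime`, which produces the HTTP date format when given a UTC time.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(now, max_age_seconds, shared=True):
    # "public" lets shared proxies cache the response, "private" limits
    # caching to the user's own browser.
    visibility = "public" if shared else "private"
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": "%s, max-age=%d" % (visibility, max_age_seconds),
        "Expires": format_datetime(expires, usegmt=True),
    }


# One day after 27 Feb 2008 16:00 UTC gives the date shown on the slide.
headers = caching_headers(
    datetime(2008, 2, 27, 16, 0, 0, tzinfo=timezone.utc), 86400
)
```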
Conclusions

Share nothing, decentralize, redundancy
Caching, caching, caching, caching
Reduce, recycle and reuse
Resources

High Performance Web Sites
Essential Knowledge for Front-End Engineers
by Steve Souders


Serving JavaScript Fa...
Questions?!
Contact/Flame Me


          Joe Stump
          joe@digg.com
          http://joestump.net

These are the slides from my FOWA workshop on how to scale your web apps.


Improving The Performance of Your Web App

  1. Improving the Performance of your Web Application Joe Stump, Lead Architect, Digg.com
  2. Introductions
  3. “Web 2.0 sucks (for scaling).” Joe Stump, Lead Architect, Digg.com Users want access to all of their crap at all times. I, personally, don’t find your dog funny or cute, but I’ll be damned if I’m the one who’ll stand in the way of you posting it and others consuming it.
  4. Backend Considerations Language considerations Scaling out Caching strategies Content storage and delivery Parallel data requests Near time data processing Partitioning data
  5. Frontend Considerations Reduce HTTP requests Avoid inline JavaScript and CSS Compression and Minification Learn to love HTTP/1.1
  6. “PHP doesn’t scale.” Cal Henderson, Director of Development, Flickr.com Languages don’t scale Bytecode caching (PHP, Python, etc) Robust library & driver support Active developer communities
  7. Discussion! What language do you use? Why? Does it help you or hurt to use it?
  8. Your mom lied; don’t share. Decentralize data, storage, processing, etc. Increased redundancy Scaling becomes simple; add more boxes
  9. Scaling Up
  10. Scaling Up
  11. Scaling Up
  12. Scaling Out
  13. Scaling Out
  14. Scaling Out
  15. Scaling Out
  16. Scaling Out
  17. Scaling Out
  18. Scaling Out
  19. Scaling Out
  20. Scaling Out
  21. Scaling Out
  22. How do I scale easily? 1. Caching 2. Caching 3. Caching!
  23. What are my options? Disk based caching (e.g. Cache_Lite) In memory caching (e.g. APC, Memcached) Cloud caching (e.g. MogileFS, S3)
  24. Disk based caching Stupid simple Cheap Fairly easy to scale out Dynamic images Slower than others Use fast disks! RAM disks are faster
  25. APC (PHP) Bytecode caching In memory user cache Insanely fast Not centralized or shared
  26. Memcache If you’re not using this you’re crazy Easy to set up and use Insanely fast over the network Scales to insane heights Failover, widely supported, etc. Centralized and shared across site
  27. MogileFS File and data store Runs over WebDAV Scales out infinitely (in theory) Serialize data, store in file Centralized and shared across site
  28. Amazon S3 File and data store Runs over HTTP Scales out infinitely (in theory) Serialize data, store in file Centralized and shared across site Costs money Widely supported in all languages Check out ThruDB
  29. Discussion! Are you using caching? Why not? If so, what’s your strategy?
  30. Content Storage/Delivery What are your storage needs? Is it critical YOU store them? How costly is it to store in-house? Can you do it for free? (YAY! Mooching!)
  31. i can has free storage? YouTube for video Scribd for documents Flickr for images
  32. Cloud Services (S3) Simple to get up and running No hardware maintenance Costs money, but not as much as you think
  33. NFS Simple to set up and get running Costs money, requires colocation, etc. Does. Not. Scale. Did I mention it doesn’t scale? Stop gap solution at best
  34. MogileFS Somewhat complicated to set up Costs money, requires colocation, etc. Scales exceptionally well Used at Digg, LiveJournal, others Check out File_Mogile by Digg (PEAR)
  35. Roll Your Own File storage IS your business Highly specialized and customized Costs money, requires colocation, etc. Last resort
  36. CDN Completely outsource it Costs a ton of money Out of your control Scales and scales and scales
  37. Discussion! What are you using for storage? What’s worked for you? What’s failed epically?
  38. Parallel Data Requests Access your data in parallel Make data access asynchronous (WHAT?!) Loosely couple your data access layer All for the low, low price of FREE!* *Offer only available for hardcore nerds looking for street cred.
  39. HTTP Parallel Asynchronous Non-blocking Loosely coupled Free foot massages!
  40. HTTP
  41. Gearman Parallel Asynchronous Scales well
  42. Discussion! Which format to use for exchange? Anyone doing this already? Amazon, Google, Yahoo!
  43. Near time processing Does this need to be done NOW? Offload to background processes Offloading must be a no op Feeds, Facebook, crawling, etc.
  44. Cron Run every minute or two Simple Great for batch jobs Not decentralized, locking issues
  45. Gearman Fire and forget Simple Scales well Digg Images Nearly instant Decentralized No guarantees
  46. Queues Grid Engine by Sun Starling by Twitter Others?
  47. Amazon’s EC2 http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/ Near limitless computing resources Remember; don’t share Awesome for bots, crawling, etc.
  48. Discussion! What’s low(er) priority? Where would you implement this?
  49. Partitioning Data Horizontal v. Vertical Not all data lives in a single place Hash records to partitions App smart / logical sharding
  50. Horizontal 192.168.0.1 192.168.0.2 192.168.0.3 Users Users Users id int(11) id int(11) id int(11) username char(15) username char(15) username char(15) password char(15) password char(15) password char(15) email char(45) email char(45) email char(45)
  51. Hashing your data oh hai! were’s mai dataz?!
  52. How? Put 10,000 users per partition Partition users alphabetically Partition home listings by zip code Partition products by SKU
  53. Vertical 192.168.0.1 192.168.0.2 192.168.0.3 Users UsersPrf UsersStg id int(11) id int(11) id int(11) username char(15) fname char(50) cmts_pg tinyint(2) password char(15) lname char(50) cmts_lvl tinyint(1) email char(45) url char(255) cmts_prf tinyint(1)
  54. Why? Avoid altering large tables Save time during insert Many small tables v. one large table Lazy loading of rarely used data
  55. Discussion! Natural partitions in your data? How would you hash your data?
  56. Reduce HTTP Requests Bundle JavaScript and CSS Use sprites for images Reduce images / outside objects
  57. Reduce HTTP Requests Bundle JavaScript and CSS Use sprites for images Reduce images / outside objects
  58. Avoid inline JS/CSS External = Cached Inline = Not Cached
  59. Compression / Minify Enable Gzip compression sitewide Use minification software on JS jQuery/Prototype Minified
  60. Learn to Love HTTP/1.1 Cache-Control: public/private Connection: close Expires: Thu, 28 Feb 2008 16:00:00 GMT
  61. Conclusions Share nothing, decentralize, redundancy Caching, caching, caching, caching Reduce, recycle and reuse
  62. Resources High Performance Web Sites Essential Knowledge for Front-End Engineers by Steve Souders Serving JavaScript Fast http://www.thinkvitamin.com/features/webapps/serving-javascript-fast by Cal Henderson, Director of Development, Flickr.com
  63. Questions?!
  64. Contact/Flame Me Joe Stump joe@digg.com http://joestump.net
