Improving The Performance of Your Web App

From joestump, 4 months ago

These are the slides from my FOWA workshop on how to scale your we...

Slideshow transcript

Slide 1: Improving the Performance of your Web Application Joe Stump, Lead Architect, Digg.com

Slide 2: Introductions

Slide 3: “Web 2.0 sucks (for scaling).” Joe Stump, Lead Architect, Digg.com Users want access to all of their crap at all times. I, personally, don’t find your dog funny or cute, but I’ll be damned if I’m the one who’ll stand in the way of you posting it and others consuming it.

Slide 4: Backend Considerations Language considerations Scaling out Caching strategies Content storage and delivery Parallel data requests Near time data processing Partitioning data

Slide 5: Frontend Considerations Reduce HTTP requests Avoid inline JavaScript and CSS Compression and Minification Learn to love HTTP/1.1

Slide 6: “PHP doesn’t scale.” Cal Henderson, Director of Development, Flickr.com Languages don’t scale Bytecode caching (PHP, Python, etc) Robust library & driver support Active developer communities

Slide 7: Discussion! What language do you use? Why? Does it help you or hurt to use it?

Slide 8: Your mom lied; don’t share. Decentralize data, storage, processing, etc. Increased redundancy Scaling becomes simple; add more boxes

Slide 9: Scaling Up

Slide 10: Scaling Up

Slide 11: Scaling Up

Slide 12: Scaling Out

Slide 13: Scaling Out

Slide 14: Scaling Out

Slide 15: Scaling Out

Slide 16: Scaling Out

Slide 17: Scaling Out

Slide 18: Scaling Out

Slide 19: Scaling Out

Slide 20: Scaling Out

Slide 21: Scaling Out

Slide 22: How do I scale easily? 1. Caching 2. Caching 3. Caching!

Slide 23: What are my options? Disk based caching (e.g. Cache_Lite) In memory caching (e.g. APC, Memcached) Cloud caching (e.g. MogileFS, S3)

Slide 24: Disk based caching Stupid simple Cheap Fairly easy to scale out Dynamic images Slower than others Use fast disks! RAM disks are faster
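
The disk-based approach on this slide can be sketched in a few lines. This is a minimal illustration, not how Cache_Lite works internally: the `DiskCache` class, its method names, the 60 second time-to-live and the `front_page` key are all hypothetical. Each key becomes one file, and the file's modification time decides whether the entry is still fresh.

```python
import hashlib
import os
import pickle
import tempfile
import time


class DiskCache:
    """Hypothetical file-per-key cache with a time-to-live."""

    def __init__(self, directory, ttl_seconds=300):
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def get(self, key):
        path = self._path(key)
        try:
            age = time.time() - os.path.getmtime(path)
            if age > self.ttl_seconds:
                return None  # entry is stale
            with open(path, "rb") as handle:
                return pickle.load(handle)
        except OSError:
            return None  # entry is missing or unreadable

    def set(self, key, value):
        # Write to a temp file first, then rename, so readers never
        # see a half-written entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(value, handle)
        os.replace(tmp_path, self._path(key))


cache = DiskCache(tempfile.mkdtemp(), ttl_seconds=60)
cache.set("front_page", {"stories": [1, 2, 3]})
cached_page = cache.get("front_page")
```

Pointing `directory` at a RAM disk would be one way to act on the "RAM disks are faster" bullet without changing the code.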

Slide 25: APC (PHP) Bytecode caching In memory user cache Insanely fast Not centralized or shared

Slide 26: Memcache If you’re not using this you’re crazy Easy to set up and use Insanely fast over the network Scales to insane heights Failover, widely supported, etc. Centralized and shared across site
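
A common way to use a shared cache like this is the cache-aside pattern: look in the cache first and only hit the database on a miss. The sketch below assumes nothing about any real Memcached client library. `FakeCacheClient` is a hypothetical in-process stand-in with a `get`/`set` shape so the example runs without a server, and `load_user_from_db` is a placeholder for a real query.

```python
import time


class FakeCacheClient:
    """Hypothetical in-process stand-in for a shared cache server."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at < time.time():
            del self._items[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.time() + ttl_seconds)


db_calls = []


def load_user_from_db(user_id):
    # Placeholder for an expensive database query.
    db_calls.append(user_id)
    return {"id": user_id, "username": "user%d" % user_id}


def get_user(cache, user_id):
    key = "user:%d" % user_id
    user = cache.get(key)
    if user is None:
        # Cache miss: load from the database and remember the result.
        user = load_user_from_db(user_id)
        cache.set(key, user, ttl_seconds=300)
    return user


cache = FakeCacheClient()
first = get_user(cache, 42)   # miss, goes to the database
second = get_user(cache, 42)  # hit, served from the cache
```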

Slide 27: MogileFS File and data store Runs over WebDAV Scales out infinitely (in theory) Serialize data, store in file Centralized and shared across site

Slide 28: Amazon S3 File and data store Runs over HTTP Scales out infinitely (in theory) Serialize data, store in file Centralized and shared across site Costs money Widely supported in all languages Check out ThruDB

Slide 29: Discussion! Are you using caching? Why not? If so, what’s your strategy?

Slide 30: Content Storage/Delivery What are your storage needs? Is it critical YOU store them? How costly is it to store in-house? Can you do it for free? (YAY! Mooching!)

Slide 31: i can has free storage? YouTube for video Scribd for documents Flickr for images

Slide 32: Cloud Services (S3) Simple to get up and running No hardware maintenance Costs money, but not as much as you think

Slide 33: NFS Simple to set up and get running Costs money, requires colocation, etc. Does. Not. Scale. Did I mention it doesn’t scale? Stop gap solution at best

Slide 34: MogileFS Somewhat complicated to set up Costs money, requires colocation, etc. Scales exceptionally well Used at Digg, LiveJournal, others Check out File_Mogile by Digg (PEAR)

Slide 35: Roll Your Own File storage IS your business Highly specialized and customized Costs money, requires colocation, etc. Last resort

Slide 36: CDN Completely outsource it Costs a ton of money Out of your control Scales and scales and scales

Slide 37: Discussion! What are you using for storage? What’s worked for you? What’s failed epically?

Slide 38: Parallel Data Requests Access your data in parallel Make data access asynchronous (WHAT?!) Loosely couple your data access layer All for the low, low price of FREE!* *Offer only available for hardcore nerds looking for street cred.
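
One way to picture parallel data access is to issue every independent request at once and wait for all of them together, so the page costs roughly as long as the slowest request rather than the sum. The sketch below uses Python's standard thread pool. The three `fetch_*` functions are hypothetical stand-ins for real backend calls, and the sleeps only simulate latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_profile(user_id):
    time.sleep(0.1)  # simulated backend latency
    return {"id": user_id}


def fetch_friends(user_id):
    time.sleep(0.1)
    return [2, 3]


def fetch_stories(user_id):
    time.sleep(0.1)
    return ["story-a", "story-b"]


def load_page_data(user_id):
    sources = {
        "profile": fetch_profile,
        "friends": fetch_friends,
        "stories": fetch_stories,
    }
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {name: pool.submit(fn, user_id) for name, fn in sources.items()}
        # Collect results; each call blocks only until that request is done.
        return {name: future.result() for name, future in futures.items()}


page = load_page_data(1)
```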

Slide 39: HTTP Parallel Asynchronous Non-blocking Loosely coupled Free foot massages!

Slide 41: HTTP

Slide 42: Gearman Parallel Asynchronous Scales well

Slide 43: Discussion! Which format to use for exchange? Anyone doing this already? Amazon, Google, Yahoo!

Slide 44: Near time processing Does this need to be done NOW? Offload to background processes Offloading must be a no op Feeds, Facebook, crawling, etc.
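
The idea of offloading can be sketched with an in-process queue: the request handler only enqueues a job and returns, and a background worker does the slow part. This is an illustration under simplifying assumptions. A real deployment would use a separate job server or worker processes, and `handle_upload` and the thumbnail job are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel value tells the worker to stop
            jobs.task_done()
            break
        # The slow work happens here, off the request path.
        processed.append("thumbnail for " + job)
        jobs.task_done()


def handle_upload(filename):
    # The request only enqueues the job, so it returns immediately.
    jobs.put(filename)
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_upload("dog.jpg")

jobs.put(None)  # shut the worker down for this demo
jobs.join()
thread.join()
```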

Slide 45: Cron Run every minute or two Simple Great for batch jobs Not decentralized, locking issues
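
The locking issue usually shows up when a job scheduled every minute takes longer than a minute and a second copy starts on top of it. A lock file is one simple guard on a single machine, sketched below. The `run_exclusively` helper and the lock path are hypothetical, and this sketch does not handle stale locks left behind by a crashed process.

```python
import os
import tempfile


def run_exclusively(lock_path, job):
    """Run job() unless another run already holds the lock file."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a previous run is still going, so skip this one
    try:
        os.write(fd, str(os.getpid()).encode("ascii"))
        job()
        return True
    finally:
        os.close(fd)
        os.remove(lock_path)


lock_path = os.path.join(tempfile.mkdtemp(), "batch.lock")
overlap_results = []


def batch_job():
    # Simulate cron firing again while the first run is still active.
    overlap_results.append(run_exclusively(lock_path, lambda: None))


ran = run_exclusively(lock_path, batch_job)
```

Because the lock lives on one machine's disk, this does nothing for the "not decentralized" bullet. It only prevents overlapping runs on the same box.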

Slide 46: Gearman Fire and forget Simple Scales well Digg Images Nearly instant Decentralized No guarantees

Slide 47: Queues Grid Engine by Sun Starling by Twitter Others?

Slide 48: Amazon’s EC2 http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/ Near limitless computing resources Remember; don’t share Awesome for bots, crawling, etc.

Slide 49: Discussion! What’s low(er) priority? Where would you implement this?

Slide 50: Partitioning Data Horizontal v. Vertical Not all data lives in a single place Hash records to partitions App smart / logical sharding

Slide 51: Horizontal
192.168.0.1          192.168.0.2          192.168.0.3
Users                Users                Users
id int(11)           id int(11)           id int(11)
username char(15)    username char(15)    username char(15)
password char(15)    password char(15)    password char(15)
email char(45)       email char(45)       email char(45)

Slide 52: Hashing your data oh hai! were’s mai dataz?!

Slide 53: How? Put 10,000 users per partition Partition users alphabetically Partition home listings by zip code Partition products by SKU
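
Two of these schemes can be sketched as small lookup functions. This is an illustration only: the function names are hypothetical, the host list reuses the addresses from the horizontal partitioning slide, and CRC32 is just one stable hash picked for the example, not something the slides prescribe.

```python
import zlib

PARTITIONS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
USERS_PER_PARTITION = 10000


def partition_by_range(user_id):
    """Fixed-size ranges: ids 1-10000 map to index 0, 10001-20000 to 1, ..."""
    return (user_id - 1) // USERS_PER_PARTITION


def partition_by_hash(username):
    """Hash a record's key to one of a fixed set of hosts."""
    # crc32 gives the same value on every machine and every run,
    # unlike Python's built-in hash() for strings.
    checksum = zlib.crc32(username.encode("utf-8"))
    return PARTITIONS[checksum % len(PARTITIONS)]


range_index = partition_by_range(10001)
host = partition_by_hash("joestump")
```

The trade-off is that range partitioning grows by adding a new partition for the next block of ids, while modulo hashing moves most keys to a different host whenever the host list changes.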

Slide 54: Vertical
192.168.0.1          192.168.0.2          192.168.0.3
Users                UsersPrf             UsersStg
id int(11)           id int(11)           id int(11)
username char(15)    fname char(50)       cmts_pg tinyint(2)
password char(15)    lname char(50)       cmts_lvl tinyint(1)
email char(45)       url char(255)        cmts_prf tinyint(1)

Slide 55: Why? Avoid altering large tables Save time during insert Many small tables v. one large table Lazy loading of rarely used data
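
Lazy loading of rarely used data can be sketched as a property that queries the secondary table only on first access. The `User` class and `fetch_profile_from_usersprf` are hypothetical names, the returned values are placeholders, and the fetch function stands in for a query against the UsersPrf table from the vertical partitioning slide.

```python
class User:
    def __init__(self, user_id, username, fetch_profile):
        self.user_id = user_id
        self.username = username
        self._fetch_profile = fetch_profile
        self._profile = None  # not loaded yet

    @property
    def profile(self):
        # Query the profile partition only the first time it is needed.
        if self._profile is None:
            self._profile = self._fetch_profile(self.user_id)
        return self._profile


profile_queries = []


def fetch_profile_from_usersprf(user_id):
    # Placeholder for a query against the UsersPrf partition.
    profile_queries.append(user_id)
    return {"fname": "Example", "lname": "User", "url": "http://example.com"}


user = User(1, "example", fetch_profile_from_usersprf)
queries_before_access = len(profile_queries)
first_name = user.profile["fname"]
last_name = user.profile["lname"]  # reuses the already loaded profile
```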

Slide 56: Discussion! Natural partitions in your data? How would you hash your data?

Slide 57: Reduce HTTP Requests Bundle JavaScript and CSS Use sprites for images Reduce images / outside objects

Slide 58: Reduce HTTP Requests Bundle JavaScript and CSS Use sprites for images Reduce images / outside objects

Slide 59: Avoid inline JS/CSS External = Cached Inline = Not Cached

Slide 60: Compression / Minify Enable Gzip compression sitewide Use minification software on JS jQuery/Prototype Minified
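
To get a feel for what Gzip does to text assets, the sketch below compresses a made-up, highly repetitive stylesheet with Python's standard `gzip` module. In practice compression is normally switched on in the web server rather than done by hand, and real files will compress less dramatically than this artificial sample.

```python
import gzip

# A made-up stylesheet; real CSS and JS are also repetitive text,
# which is why they tend to compress well.
stylesheet = ("body { margin: 0; padding: 0; }\n" * 200).encode("utf-8")

compressed = gzip.compress(stylesheet)
restored = gzip.decompress(compressed)

original_size = len(stylesheet)
compressed_size = len(compressed)
```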

Slide 61: Learn to Love HTTP/1.1 Cache-Control: public/private Connection: close Expires: Thu, 28 Feb 2008 16:00:00 GMT
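
The caching headers on this slide can be generated rather than hard-coded. The sketch below builds `Cache-Control` and `Expires` values in the HTTP date format. The `caching_headers` helper is hypothetical, and the `max-age` directive is an addition that does not appear on the slide (it is a standard HTTP/1.1 `Cache-Control` directive).

```python
import time
from email.utils import formatdate


def caching_headers(max_age_seconds, shared=True, now=None):
    """Build response headers that let clients cache a static asset."""
    now = time.time() if now is None else now
    # "public" allows shared caches (proxies) to store the response,
    # "private" limits it to the end user's browser cache.
    visibility = "public" if shared else "private"
    return {
        "Cache-Control": "%s, max-age=%d" % (visibility, max_age_seconds),
        # usegmt=True produces the HTTP date format ending in "GMT".
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# One hour before 28 Feb 2008 16:00:00 GMT, as a Unix timestamp.
request_time = 1204214400 - 3600
headers = caching_headers(3600, shared=True, now=request_time)
```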

Slide 62: Conclusions Share nothing, decentralize, redundancy Caching, caching, caching, caching Reduce, recycle and reuse

Slide 63: Resources High Performance Web Sites Essential Knowledge for Front-End Engineers by Steve Souders Serving JavaScript Fast http://www.thinkvitamin.com/features/webapps/serving-javascript-fast by Cal Henderson, Director of Development, Flickr.com

Slide 64: Questions?!

Slide 65: Contact/Flame Me Joe Stump joe@digg.com http://joestump.net