Geek Sessions Talk
© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Geek Sessions Talk Geek Sessions Talk Presentation Transcript

  • Web Infrastructure: Surviving The “Hockey Stick” (Jonathan Abrams, geekSessions, July 31, 2007)
  • Friendster Growth
  • How to scale a webapp
    • Lightweight non-sticky sessions
    • Cache almost everything (memcached)
    • Decouple slow processes from webapp
    • Segment the database (don’t use replication to scale)
    • Scale out not up (innovate on your app, not your infrastructure)
  • Session Management
    • Use simple, lightweight sessions stored in a centralized location that is allowed to be volatile (memcached at Socializr)
    • The session maps an opaque key to a user ID, e.g. 9F6077E3322B4E24C90C43178F42B9C0FFD1E4A43AED1BCA -> 656280029
    • Keep other user data in cache or cookies
    • Avoid sticky sessions, keep load balancing simple
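A minimal sketch of the lightweight-session idea, assuming a hypothetical in-process `VolatileStore` class as a stand-in for memcached; the `session:` key prefix and the 30-minute TTL are also illustrative assumptions. The session holds nothing but a token-to-user-ID mapping, so any web node can resolve it and the load balancer needs no sticky routing.

```python
import secrets
import time


class VolatileStore:
    """Hypothetical stand-in for a centralized, volatile cache such as memcached.

    Entries can expire (or, in a real cache, be evicted) at any time,
    so callers must treat a miss as a normal outcome.
    """

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


SESSION_TTL = 30 * 60  # assumed 30-minute timeout (hypothetical)


def create_session(store, user_id):
    # The session stores only token -> user ID; other user data lives
    # in cache or cookies, keeping the session tiny.
    token = secrets.token_hex(24).upper()  # 48 hex chars, like the slide's example key
    store.set("session:" + token, user_id, SESSION_TTL)
    return token


def lookup_session(store, token):
    # Any web node can resolve the token, so no sticky sessions are needed.
    # A miss (expired or evicted) just means the user logs in again.
    return store.get("session:" + token)


def destroy_session(store, token):
    store.delete("session:" + token)
```

For example, `create_session(store, 656280029)` returns a 48-character key that `lookup_session` resolves back to `656280029` on any node sharing the store.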
  • Tim Bray, “Comparing Frameworks” (Nov 2006)
    • “For Web apps, I’ve given PHP the edge, because I think building scalable PHP is a little easier. By default, PHP gives you a “shared-nothing” (or at least “shared very little”) architecture, which means you’re going to scale out pretty well until your database hits the wall. Java is a much richer system and assumes you’re smart enough to know whether a shared-nothing architecture is appropriate or not. The effect is, you have to be smarter to get the same kind of scaling out of Java.”
  • Evite Session Management
  • Caching
    • Use memcached, don’t invent your own
    • Put a large memcached instance on every webapp node
    • Cache almost everything, but plan your expiration strategy and invalidation rules
  • Avoid queries in loops
    • Queries in loops are SLOW and strain the database
    • Friendster in 2006: hundreds of database and cache queries per page
    • Don’t be afraid of joins when they are optimized and well-indexed and the results are cached
    • Cache big results
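To illustrate the cost of queries in loops, here is a sketch using Python's built-in `sqlite3` in place of MySQL; the `users`/`friends` schema and the `CountingConnection` wrapper are hypothetical. The loop version makes one round trip per friend, while the indexed join returns the same result in a single query whose output is easy to cache.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
    conn.execute("CREATE INDEX friends_user ON friends (user_id)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "Tim"), (2, "Ann"), (3, "Bo"), (4, "Cy")])
    conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3), (1, 4)])
    return conn


class CountingConnection:
    """Hypothetical wrapper that counts round trips to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def friend_names_in_loop(db, user_id):
    # Anti-pattern: one query for the IDs, then one more query per friend.
    ids = [r[0] for r in db.execute(
        "SELECT friend_id FROM friends WHERE user_id = ?", (user_id,))]
    names = []
    for fid in ids:
        row = db.execute("SELECT name FROM users WHERE id = ?", (fid,)).fetchone()
        names.append(row[0])
    return sorted(names)


def friend_names_with_join(db, user_id):
    # One indexed join replaces the whole loop.
    rows = db.execute(
        "SELECT u.name FROM friends f JOIN users u ON u.id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.name", (user_id,))
    return [r[0] for r in rows]
```

With three friends the loop costs four queries and the join costs one; the gap grows linearly with the friend count.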
  • MySQL replication is not for scaling
    • If you have mostly reads, just use memcached, not slaves
    • If you have many writes, the master will still be a bottleneck, and you will experience slave lag
    • Scaling requires you to segment the db, not replicate (especially blobs)
    • Use replication only for redundancy
    • (There are some exceptions, e.g. joins on shards)
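A sketch of segmenting (sharding) instead of replicating, assuming hypothetical in-memory `sqlite3` databases as shards and simple modulo routing on the user ID. Each write lands on exactly one shard, so write capacity grows with the shard count; with replication, every slave still has to replay every write the master takes.

```python
import sqlite3

N_SHARDS = 4  # hypothetical shard count


def shard_for(user_id: int) -> int:
    # Simple modulo routing. A lookup (directory) table is a common
    # alternative that lets users be moved without rehashing everyone.
    return user_id % N_SHARDS


shards = [sqlite3.connect(":memory:") for _ in range(N_SHARDS)]
for conn in shards:
    conn.execute("CREATE TABLE photos (owner INTEGER, data BLOB)")


def add_photo(owner: int, data: bytes) -> None:
    # Blobs and all other per-user rows go to the owner's shard only.
    shards[shard_for(owner)].execute(
        "INSERT INTO photos VALUES (?, ?)", (owner, data))


def photo_count(owner: int) -> int:
    # Reads for one user also touch exactly one shard.
    return shards[shard_for(owner)].execute(
        "SELECT COUNT(*) FROM photos WHERE owner = ?", (owner,)).fetchone()[0]
```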
  • How not to architect a “friend tracker”
    • Tim adds a new photo -> update his 100 friends’ trackers
    • Each user has their own tracker rows, sharded based on owner: insert into tracker (owner, user, date, type, param) values …
    • For each trackable event, # of db writes = # of friends for that user
    • …
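The write amplification of this first design can be sketched as follows; the integer user IDs, the 4-shard layout, and the in-memory lists standing in for sharded `tracker` tables are all hypothetical. One photo from a user with 100 friends costs 100 writes, even though reading a single tracker later touches only one shard.

```python
from collections import defaultdict

N_SHARDS = 4  # hypothetical shard count
friends = {1: list(range(100, 200))}  # hypothetical: user 1 ("Tim") has 100 friends
tracker_shards = defaultdict(list)     # shard index -> rows (owner, user, date, type, param)


def shard_of(user_id):
    return user_id % N_SHARDS


def track_event_fan_out(user, date, type_, param):
    # Naive design: copy the event into every friend's own tracker rows,
    # stored on the shard of the owner (the friend who will read it).
    writes = 0
    for owner in friends.get(user, []):
        tracker_shards[shard_of(owner)].append((owner, user, date, type_, param))
        writes += 1
    return writes


def read_tracker(owner):
    # Reads are cheap: one shard, filtered by owner.
    return [r for r in tracker_shards[shard_of(owner)] if r[0] == owner]
```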
  • A better “friend tracker” approach
    • Tim adds a new photo -> update his 100 friends’ trackers
    • Each user has their own update rows, sharded based on user: insert into tracker (user, date, type, param) values …
    • For each trackable event, # of db writes = 1
    • To compute someone’s “tracker”: for each of n shards: select * from tracker where user in (list), aggregate in the code
    • …
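A sketch of this better design under similar assumptions: hypothetical in-memory `sqlite3` shards, integer user IDs, modulo routing, and a hard-coded `friends` map. Each event is written once to the acting user's shard; building a tracker queries every shard with the reader's friend list and merges the rows in application code, newest first.

```python
import sqlite3

N_SHARDS = 4  # hypothetical shard count
shards = [sqlite3.connect(":memory:") for _ in range(N_SHARDS)]
for conn in shards:
    conn.execute("CREATE TABLE tracker (user INTEGER, date TEXT, type TEXT, param INTEGER)")
    conn.execute("CREATE INDEX tracker_user ON tracker (user)")

friends = {1: [2, 3], 9: [1, 2, 3]}  # hypothetical: reader -> users they follow


def shard_of(user_id):
    return user_id % N_SHARDS


def track_event(user, date, type_, param):
    # One write per event, on the shard of the user who acted.
    shards[shard_of(user)].execute(
        "INSERT INTO tracker VALUES (?, ?, ?, ?)", (user, date, type_, param))
    return 1


def compute_tracker(reader, limit=20):
    friend_ids = friends.get(reader, [])
    if not friend_ids:
        return []
    placeholders = ",".join("?" * len(friend_ids))
    rows = []
    for conn in shards:  # for each of n shards: where user in (list)
        rows.extend(conn.execute(
            f"SELECT user, date, type, param FROM tracker WHERE user IN ({placeholders})",
            friend_ids))
    # Aggregate in the code: merge all shards' rows, newest first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:limit]
```

The trade-off is that reads now fan out to every shard, but a tracker page is a good candidate for caching, while writes no longer scale with friend count.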
  • Decouple slow processes
    • Expensive computations (e.g. graph computations)
    • Uploads & photo processing
    • External content integration (via screen scraping, APIs, RSS, etc.)
    • How: iframe, AJAX, POSTs and redirects, subdomains, etc.
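One generic way to decouple slow work is to have the request handler only enqueue a job and return immediately, leaving a background worker to do the processing. This sketch uses Python's standard `queue` and `threading` modules as a stand-in for separate queue servers; `resize_photo`, `handle_upload`, and the `processed` list are hypothetical placeholders, and the sketch is not presented as the architecture from the talk.

```python
import queue
import threading

jobs = queue.Queue()
processed = []  # hypothetical sink standing in for, e.g., resized-photo storage


def resize_photo(photo_id):
    # Placeholder for slow work such as photo processing or a graph computation.
    processed.append(photo_id)


def worker():
    while True:
        photo_id = jobs.get()
        if photo_id is None:  # sentinel value: shut the worker down
            jobs.task_done()
            break
        try:
            resize_photo(photo_id)
        finally:
            jobs.task_done()


def handle_upload(photo_id):
    # The web request only enqueues and returns; the page can later fetch the
    # finished result via AJAX, an iframe, or a redirect.
    jobs.put(photo_id)
    return {"status": "accepted", "photo_id": photo_id}


t = threading.Thread(target=worker, daemon=True)
t.start()
```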
  • How not to decouple (diagram: WebApp, Queue Servers)
    • add photo -> POST url: trackevent.php
    • trackevent.php?user=1…
  • Thank You
    • [email_address]