Geek Sessions Talk


Usage Rights

© All Rights Reserved

    Geek Sessions Talk: Presentation Transcript

    • Web Infrastructure: Surviving the “Hockey Stick” (Jonathan Abrams, geekSessions, July 31, 2007)
    • Friendster Growth
    • How to scale a webapp
      • Lightweight non-sticky sessions
      • Cache almost everything (memcached)
      • Decouple slow processes from webapp
      • Segment the database (don’t use replication to scale)
      • Scale out not up (innovate on your app, not your infrastructure)
    • Session Management
      • Use simple, lightweight sessions; store them in a centralized location that can be volatile (memcached at Socializr)
      • 9F6077E3322B4E24C90C43178F42B9C0FFD1E4A43AED1BCA –> 656280029
      • Keep other user data in cache or cookies
      • Avoid sticky sessions, keep load balancing simple
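A minimal sketch of the session approach above: an opaque token maps to a user ID in a central store that is allowed to lose data. The `VolatileSessionStore` class, its TTL, and the token generator are assumptions for illustration; an in-memory dict stands in for memcached, and nothing here reflects Socializr's actual code.

```python
import secrets
import time


class VolatileSessionStore:
    """In-memory stand-in for a central, volatile store such as memcached."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._data = {}  # token -> (user_id, expires_at)

    def create(self, user_id, now=None):
        now = time.time() if now is None else now
        # 24 random bytes -> 48 hex characters, the same length as the token on the slide.
        token = secrets.token_hex(24).upper()
        self._data[token] = (user_id, now + self.ttl)
        return token

    def lookup(self, token, now=None):
        """Return the user ID, or None if the token is unknown or expired."""
        now = time.time() if now is None else now
        entry = self._data.get(token)
        if entry is None:
            return None
        user_id, expires_at = entry
        if now >= expires_at:
            del self._data[token]
            return None
        return user_id
```

Because any web node can resolve the token against the shared store, the load balancer does not need to pin a user to a particular node. A lost entry only forces that user to log in again.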
    • Tim Bray - Nov 2006 “Comparing Frameworks”
      • “For Web apps, I’ve given PHP the edge, because I think building scalable PHP is a little easier. By default, PHP gives you a “shared-nothing” (or at least “shared very little”) architecture, which means you’re going to scale out pretty well until your database hits the wall. Java is a much richer system and assumes you’re smart enough to know whether a shared-nothing architecture is appropriate or not. The effect is, you have to be smarter to get the same kind of scaling out of Java.”
    • Evite Session Management
    • Caching
      • Use memcached, don’t invent your own
      • Put a large memcached instance on every webapp node
      • Cache almost everything but think of your expiration strategy and invalidation rules
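The expiration and invalidation points above can be sketched as a cache-aside read with a TTL plus an explicit delete on write. `TTLCache` and `ProfileService` are hypothetical names; the cache is a small in-memory stand-in for a memcached client, and the 300-second TTL is an arbitrary assumption.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a memcached client (get/set/delete)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None or now >= entry[1]:
            self._data.pop(key, None)
            return None
        return entry[0]

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now + ttl)

    def delete(self, key):
        self._data.pop(key, None)


class ProfileService:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db        # dict standing in for the database
        self.db_reads = 0   # counts how often the database is hit

    def get_name(self, user_id, now=None):
        key = f"profile:{user_id}"
        name = self.cache.get(key, now=now)
        if name is None:
            # Cache miss: read from the database, then populate the cache.
            self.db_reads += 1
            name = self.db[user_id]
            self.cache.set(key, name, ttl=300, now=now)
        return name

    def rename(self, user_id, new_name):
        # Invalidation rule: every write deletes the cached copy.
        self.db[user_id] = new_name
        self.cache.delete(f"profile:{user_id}")
```

The TTL bounds how stale an entry can get if an invalidation is missed; the delete on write keeps the common case fresh.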
    • Avoid queries in loops
      • Queries in loops are SLOW and strain the database
      • Friendster in 2006 – hundreds of DB and cache queries per page
      • Don’t be afraid of joins when they are optimized and well-indexed and the results are cached
      • Cache big results
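To illustrate the difference, the sketch below fetches the same names once with a query per ID and once with a single `IN (...)` query. The `users` table and both function names are made up for the example, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "Tim"), (2, "Ann"), (3, "Bob")],
)


def names_in_loop(conn, ids):
    """One round trip per ID: the pattern the slide warns against."""
    result = {}
    for user_id in ids:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is not None:
            result[user_id] = row[0]
    return result


def names_batched(conn, ids):
    """One round trip for the whole list."""
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)
```

The batched result is also a natural unit to cache as one "big result" rather than as many small entries.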
    • MySQL replication is not for scaling
      • If you have mostly reads, just use memcached, not slaves
      • If you have many writes, the master will still be a bottleneck, and you will experience slave lag
      • Scaling requires you to segment the db, not replicate (especially blobs)
      • Use replication only for redundancy
      • (there are some exceptions to this, e.g. joins on shards)
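Segmenting the database means each row lives on exactly one shard, chosen by a key. The sketch below routes by `user_id % n_shards`, which is an assumption made for illustration (the talk does not say how shards were chosen); `ShardedPhotoStore` is a hypothetical name, and in-memory SQLite databases stand in for separate MySQL servers.

```python
import sqlite3


class ShardedPhotoStore:
    def __init__(self, n_shards):
        # Each connection stands in for an independent database server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(n_shards)]
        for shard in self.shards:
            shard.execute("CREATE TABLE photos (user_id INTEGER, name TEXT)")

    def shard_index(self, user_id):
        # Assumed routing rule: simple modulo on the user ID.
        return user_id % len(self.shards)

    def add_photo(self, user_id, name):
        shard = self.shards[self.shard_index(user_id)]
        shard.execute(
            "INSERT INTO photos (user_id, name) VALUES (?, ?)", (user_id, name)
        )

    def photos_for(self, user_id):
        shard = self.shards[self.shard_index(user_id)]
        rows = shard.execute(
            "SELECT name FROM photos WHERE user_id = ? ORDER BY rowid", (user_id,)
        ).fetchall()
        return [row[0] for row in rows]
```

Writes for different users land on different shards, so write capacity grows with the number of shards, which a single replicated master cannot offer.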
    • How not to architect a “friend tracker”
      • Tim adds a new photo -> update his 100 friends’ trackers
      • Each user has their own tracker rows, sharded based on owner: insert into tracker (owner, user, date, type, param) values …
      • For each trackable event, # of db writes = # of friends for that user
      • …
    • A better “friend tracker” approach:
      • Tim adds a new photo -> update his 100 friends’ trackers
      • Each user has their own update rows, sharded based on user: insert into tracker (user, date, type, param) values …
      • For each trackable event, # of db writes = 1
      • To compute someone’s “tracker”: for each of n shards: select * from tracker where user in (list), aggregate in the code
      • …
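A rough sketch of the better approach: one write per event on the acting user's shard, and a read that queries every shard for the friend list and aggregates in code. The column names follow the slide; the shard count, the modulo routing, and the function names are assumptions, and in-memory SQLite databases stand in for the real shards.

```python
import sqlite3

N_SHARDS = 4  # assumed shard count
shards = [sqlite3.connect(":memory:") for _ in range(N_SHARDS)]
for shard in shards:
    shard.execute(
        "CREATE TABLE tracker (user INTEGER, date TEXT, type TEXT, param TEXT)"
    )


def record_event(user, date, type_, param):
    """Exactly one write per trackable event, on the acting user's shard."""
    shards[user % N_SHARDS].execute(
        "INSERT INTO tracker (user, date, type, param) VALUES (?, ?, ?, ?)",
        (user, date, type_, param),
    )


def tracker_for(friend_ids, limit=20):
    """Query each shard for the friends' events, then merge and sort in code."""
    friend_ids = list(friend_ids)
    if not friend_ids:
        return []
    placeholders = ",".join("?" for _ in friend_ids)
    rows = []
    for shard in shards:
        rows.extend(
            shard.execute(
                "SELECT user, date, type, param FROM tracker "
                f"WHERE user IN ({placeholders})",
                friend_ids,
            ).fetchall()
        )
    rows.sort(key=lambda row: row[1], reverse=True)  # newest first
    return rows[:limit]
```

The cost moves from the write path (one insert per friend) to the read path (one query per shard), and the read result is a good candidate for caching.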
    • Decouple slow processes
      • Expensive computations (e.g. graph)
      • Uploads & photo processing
      • External content integration (via screen scraping, APIs, RSS, etc.)
      • How: iframe, AJAX, POSTs and redirects, subdomains, etc.
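The general idea of decoupling can be sketched with a request handler that only enqueues work and returns, while a separate worker does the slow part. This is an in-process stand-in using Python's standard `queue` and `threading` modules, not one of the web-level mechanisms listed above; `handle_upload`, `worker`, and the fake "thumbnail" step are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
processed = []  # stands in for wherever finished work is stored


def worker():
    """Runs outside the request path and performs the slow work."""
    while True:
        photo_name = jobs.get()
        if photo_name is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        processed.append(f"thumbnail:{photo_name}")  # placeholder for real processing
        jobs.task_done()


def handle_upload(photo_name):
    """Request handler: enqueue and return immediately."""
    jobs.put(photo_name)
    return "queued"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The handler's response time no longer depends on how long the processing takes; a slow or failing worker delays the thumbnail, not the page.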
    • How not to decouple [diagram labels: WebApp, Queue, Servers; “add photo” POST, url: trackevent.php]
      • trackevent.php?user=1…
    • Thank You
      • [email_address]
      • www.socializr.com/jobs