Why and how of highly scalable web sites




A presentation I made on high scalability for websites/applications. Discusses why scalability is important, and provides an overview of scaling techniques for network, database and applications.





    Presentation Transcript

    • Why and How of Highly S-c-a-l-a-b-l-e Web Sites
      Faizan Javed, Ph.D.
    • Some stats to get you pumped up…..
      Facebook:
      2007: 10m users, 2008: 60m, 2009: 120m, 2010: 500m+. 13 million queries per second.
      570 billion page views a month.
      1.2 million photos served per second.
      Amazon in 1997:
      “Obidos”1 big DB, 1 big server. Not a whole lot of customers.
      Today: 100-150 services build a page. 60m+ unique users.
      Google: King of scalability.
      Started as a single server research project in 1997.
      2005: indexed 8 billion pages. Now??
      YouTube: founded 2/2005.
      2006: 100m views per day.
      Now: 1 billion views per day.
    • It's not just the big guys…
      : social web gaming and Facebook apps.
      50m monthly users. 10m daily active users.
      Zynga’s : Fast and furious growth!
      1 million daily players after 4 days.
      10 million after 60 days.
      Currently 35 million+.
      2009 stats 30m users, 2 billion requests a month, 13,000 requests per second.
    • Some definitions and comments..
      Scalable vs high performance:
      Performance: Blindingly fast for 1000 users and 1 GB data
      Scalable: maintain that performance for up to 10 times the data and users.
      Scale up vs Scale out: Hardware/Architecture
      Up: Buy a bigger box! Easy, but cost doesn’t scale linearly..
      Out: Add regular boxes! Cheap(er), but admin costs and balance ..
      How about hybrid systems? (requires proper capacity planning)
      Infrastructure/plumbing is important!
      Google’s PageRank algorithm was available almost immediately, but Google’s infrastructure stayed secret for far longer…
    • How Web 2.0 is driving this
      TechCrunch/Slashdot/Digg effect – Rapid unexpected customer demand/growth!
      Web 1.0 was mostly static pages:
      Push content to RDBMS, cache in HTML on front-end, round robin webservers and voila!
      Web 2.0’s social phenomena bring unique challenges…
      Comments, “liking”, content suggestions, earned virtual currency, reputation systems…all DYNAMIC and written back to data store.
      Real time social graphs (connectivity between people, places, and things)
      Facebook example
      You are hip. You are popular. You have 500 friends. You log in…
      Facebook gathers status of all your 500 friends at the same time!
      500 requests, replies merged, services contacted, all in a reasonable amount of time!
    • Technology adoption lifecycle
      Fast growth and rapid adoption occurs here!
      Ted Dziuba, the ‘anti-Arrington’:
      Rant1: “Scalability is not your problem, getting people to give a sh*t is.”
      Discuss capacity planning – what do you need to scale to ??
      “Every year, we take the busiest minute of the busiest hour of the busiest day and build capacity on that, we built our systems to (handle that load) and we went above and beyond that.” –Scott Gulbransen, Intuit Spokesman
      Rant 2: “Saying "Rails doesn't scale" is like saying "my car doesn't go infinitely fast". ”
      Don’t blame a single technology, chances are you are doing it wrong.  
      PHP doesn’t “scale”? Java doesn’t “scale”? Both are used extensively by Google!
      Rant 3: Silicon valley “machismo”:
      Yea! I am gonna write a post about scalability and once it hits reddit everyone will know how hardcore I am !
    • Why care about high scalability
      It's not 1998 anymore:
      More and more Web 2.0 apps out there..more data to process.
      Raw lust for real-time data.
      Jack posts something…his friends expect it to “ping” pop-up on their screen…
      Web 3.0, the Semantic web:
      “Intelligent Web”, software (personal) agents that will make recommendations based on user (browsing) profiles
      Cutting edge web/software technology:
      Great advances being made in all areas of computer science…
      YouTube content-infringement detection is but one example..
      Start-ups and capacity planning:
      Good to be aware of what to scale to
    • Scaling: Hardware & Network
      Machine redundancy: Master/Slave, Cold/Warm/Hot spares
      Load balancing (for horizontal scaling):
      Hardware (Cisco routers - $$$), Software (Pound, LVS)
      Layer 4 (TCP), sticky sessions (not needed in REST model)
      Layer 7 (HTTP), mapping URLs to servers
      Content caching :
      Reverse proxy: a load balancer that can cache static and dynamic content
      CDN (content delivery network): Akamai, Netscaler, etc.
      Geographically dispersed caching of content to minimize network latency
    • Layer 4 (TCP) load balancing
      Round robin algorithm: rotates amongst the listed servers
      Least connections algorithm: checks the active connections on each server and assigns the request to the server with the fewest (doesn’t overload servers that are handling slow queries).
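      The two algorithms above can be sketched as follows. This is a minimal in-process illustration, not how Pound or LVS implement them; the server names and the class names are assumptions made for the example.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Tracks active connections and picks the least-loaded server."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the connection to `server` closes.
        self.active[server] -= 1


rr = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical servers
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(["web1", "web2"])
first = lc.pick()    # tie, so the first listed server
second = lc.pick()   # the other server, which is still idle
lc.release(second)   # the second request finishes quickly
third = lc.pick()    # goes back to the server that just became idle
```

      Round robin would have sent the third request to the busy server; least connections routes around it.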
    • Layer 7 (HTTP) load balancing
      Hash table: Create an entry for each URL with server to redirect to.
      Simple indexing: Perform hashing function on URL. Ensure a uniform distribution.
      Why do this? For serving large files, a cache farm may be needed, and layer 4 balancing will store duplicates while layer 7 will allow each single object to exist only once on a cache server.
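      A minimal sketch of the "simple indexing" idea, assuming three hypothetical cache servers and SHA-256 as the hashing function (the slide does not name one): every request for the same URL lands on the same cache server, so each object is cached only once.

```python
import hashlib

CACHE_SERVERS = ["cache1", "cache2", "cache3"]  # hypothetical server names


def server_for_url(url, servers=CACHE_SERVERS):
    """Map a URL to one cache server using a uniformly distributed hash."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


chosen = server_for_url("/videos/launch.flv")
```

      Note that with plain modulo indexing, adding or removing a server remaps most URLs, which empties much of the cache farm at once.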
    • Content Delivery Networks (CDNs)
      Performance golden rule:
      10-20%: time spent downloading HTML doc.
      80-90%: components in the page.
      Component servers closer to the user: fewer network hops and shorter response time, so the response times of many HTTP requests improve.
      Alternative to re-architecting database, application
      smaller companies/startups may not be able to afford CDN services. (Akamai, SAVVIS, etc.)
      Free CDN services: Globule, CoDeeN, CoralCDN
    • Scaling: Database/backend
      RDBMS such as MySQL, SQL Server, Oracle:
      Denormalize database: Reduce joins, create redundant data
      Fixing data inconsistency now job of application!
      Replication (to scale reads):Master-Slave, Tree, Master-Master
      Caching (to scale reads): Memcached, a cache layer on top of the database
      Partitioning (to scale writes): Clustering (vertical), Federation/Sharding (horizontal)
      • What if your app does far more writes than can be handled by RDBMS systems?
      • “NoSQL” movement: a data store based on key/value pairs
      • Leading methodologies: Amazon Dynamo, Google BigTable
      Cassandra (Dynamo): Used at Twitter, Facebook, Digg, Rackspace
      Voldemort (Dynamo), CouchDB, MongoDB, HBase, etc.
    • Brewer's CAP theorem http://citeseerx.ist.psu.edu/viewdoc/download?doi=
      “..though it's desirable to have Consistency, High-Availability and Partition-tolerance in every system, unfortunately no system can achieve all three at the same time.”
      Consistent: Guarantees state of a system at any time unless explicitly changed. Example 3 is not consistent (master-master set-up).
      Available: Examples 1 and 2 are not highly available. If a node goes down, there is total data loss.
      Partition-tolerance: Example 3 is partition tolerant but not consistent (Bank ATM withdrawal example).
      BigTable: Consistent + Available
      Dynamo: Available + Partition-tolerant
    • Denormalizing the database – Simple example
      Query 1:
      SELECT product_name, order_date
      FROM orders INNER JOIN products USING (product_id)
      WHERE product_name LIKE 'A%' ORDER BY order_date DESC
      • Scan order_date index on Orders table, compare product_name in Products table
      Query 2:
      SELECT product_name, order_date
      FROM orders
      WHERE product_name LIKE 'A%' ORDER BY order_date DESC
      • No join, single index, but data replication
      Denormalization vs normalization – which is better? For small n it doesn’t matter!
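      The two queries can be tried side by side with an in-memory SQLite database. This is only a sketch: the sample rows and the orders_denorm table name are assumptions for the example (the slide reuses the name orders for the denormalized table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER,
                         order_date TEXT);
    -- Denormalized copy: product_name is duplicated into every order row.
    CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY, product_id INTEGER,
                                product_name TEXT, order_date TEXT);
    INSERT INTO products VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Avocado');
    INSERT INTO orders VALUES (10, 1, '2010-01-05'), (11, 2, '2010-01-06'),
                              (12, 3, '2010-01-07');
    INSERT INTO orders_denorm
        SELECT o.order_id, o.product_id, p.product_name, o.order_date
        FROM orders o JOIN products p ON p.product_id = o.product_id;
""")

# Query 1: normalized schema, needs a join.
joined = conn.execute("""
    SELECT product_name, order_date
    FROM orders INNER JOIN products USING (product_id)
    WHERE product_name LIKE 'A%' ORDER BY order_date DESC
""").fetchall()

# Query 2: denormalized schema, single table.
denormalized = conn.execute("""
    SELECT product_name, order_date
    FROM orders_denorm
    WHERE product_name LIKE 'A%' ORDER BY order_date DESC
""").fetchall()
```

      Both queries return the same rows; the price of the second is that renaming a product now means updating every matching order row, which becomes the application's job.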
    • Replication (scaling reads): Master-Slave
      Read/Write ratio generally 80/20 or 90/10.
      All writes performed on master.
      Each write event is recorded in the binary log, which is transmitted to the slaves.
      Slave is read only.
      Provides more read power!
      Slave needs to be at least as powerful as the master as it needs to do all writes a master can.
      Every box has to perform every write, so can’t scale writes.
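      A rough sketch of how an application might split traffic in this set-up: writes go to the master, reads rotate over the slaves. The class name, the server names and the "starts with SELECT" test are simplifying assumptions for illustration.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Sends writes to the master and spreads reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Crude classification: only SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


router = ReplicatedDatabaseRouter("master", ["slave1", "slave2"])
routes = [
    router.route("SELECT * FROM users WHERE userid = 1"),
    router.route("UPDATE users SET name = 'Jack' WHERE userid = 1"),
    router.route("SELECT * FROM users WHERE userid = 2"),
    router.route("INSERT INTO users (name) VALUES ('Jill')"),
]
```

      Because replication is asynchronous, a read routed to a slave right after a write may still see the old value.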
    • Replication (scaling reads): Master-Master
      Provides High Availability – each machine is copy of other.
      Writes faster than single master
      Problems with auto-incrementing IDs: inserting records into masters simultaneously..
      Solution: relax reliance on IDs being sequential
      Replication can be playing catchup..consistency issues..
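      One common way to relax the reliance on sequential IDs is to give each master its own interleaved sequence (MySQL exposes this through the auto_increment_increment and auto_increment_offset settings). A minimal sketch of the idea, with hypothetical names:

```python
import itertools


def id_sequence(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    next_id = offset
    while True:
        yield next_id
        next_id += increment


# Two masters: one hands out odd IDs, the other even IDs.
master_a = id_sequence(offset=1, increment=2)
master_b = id_sequence(offset=2, increment=2)

ids_a = list(itertools.islice(master_a, 4))
ids_b = list(itertools.islice(master_b, 4))
```

      The IDs stay unique across masters but are no longer globally sequential, so they should not be used to infer insertion order.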
    • Load balancing and replication
    • Caching (scaling reads): Memcached
      function get_foo(int userid) {
          /* first try the cache */
          data = memcached_fetch("userrow:" + userid);
          if (!data) {
              /* not found: query the database */
              data = db_select("SELECT * FROM users WHERE userid = ?", userid);
              /* then store it in the cache for subsequent gets */
              memcached_add("userrow:" + userid, data);
          }
          return data;
      }
      Great for speeding up expensive fetch/read queries - e.g., a product details page on an e-commerce site.
      Not good for update/write queries – each write invalidates the cached entry, so the next read is a cache miss and a database call PLUS a cache update, impacting performance.
    • Clustering (scaling writes)
      Vertical partitioning (easy but limited).
      Distribute tables across different clusters.
      Application logic needs to know where tables are located.
      Design: Go through every query to check which tables join.
      Managing clusters is difficult.
      Increases SQL connections.
      Example: a large database with 6 tables
      Cluster 1: Tables 1 and 2
      Cluster 2: Tables 3 and 4
      Cluster 3: Tables 5 and 6
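      A sketch of the application-side lookup this implies, using the six-table layout above. The table and cluster names are placeholders, and refusing cross-cluster joins is one possible policy rather than the only one.

```python
TABLE_TO_CLUSTER = {
    "table1": "cluster1", "table2": "cluster1",
    "table3": "cluster2", "table4": "cluster2",
    "table5": "cluster3", "table6": "cluster3",
}


def cluster_for(tables):
    """Return the cluster that holds all of `tables`, or fail if they are split."""
    clusters = {TABLE_TO_CLUSTER[table] for table in tables}
    if len(clusters) != 1:
        raise ValueError(f"query would join across clusters: {sorted(clusters)}")
    return clusters.pop()


same_cluster = cluster_for(["table1", "table2"])
```

      Running every query's table list through a check like this during design shows which tables must live together.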
    • Sharding (scaling writes)
      “Shared-nothing” partitioning scheme.
      Slice data over multiple servers.
      E.g., Users table.
      Odd user IDs on server 1
      Even user IDs on server 2
      Design: Shard data so that all related records reside on the same shard. Avoid cross-server joins!
      Referential integrity might need to be enforced in application code.
    • Code change due to good sharding..
      Before sharding:
      string connectionString = ConfigurationSettings.AppSettings["ConnectionInfo"];
      OdbcConnection conn = new OdbcConnection(connectionString);
      conn.Open();
      OdbcCommand cmd = new OdbcCommand("SELECT Name, Address FROM Customers WHERE CustomerID = ?", conn);
      OdbcParameter param = cmd.Parameters.Add("@CustomerID", OdbcType.Int);
      param.Value = customerId;
      OdbcDataReader reader = cmd.ExecuteReader();
      After sharding:
      // Only the connection lookup changes: pick the shard that holds this customer.
      string connectionString = GetDatabaseFor(customerId);
      OdbcConnection conn = new OdbcConnection(connectionString);
      conn.Open();
      OdbcCommand cmd = new OdbcCommand("SELECT Name, Address FROM Customers WHERE CustomerID = ?", conn);
      OdbcParameter param = cmd.Parameters.Add("@CustomerID", OdbcType.Int);
      param.Value = customerId;
      OdbcDataReader reader = cmd.ExecuteReader();
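      The slide does not show the body of GetDatabaseFor. One possible version, following the odd/even split from the previous slide, is sketched below; the connection strings are made up for the example.

```python
# Hypothetical connection strings: index 0 holds even IDs, index 1 odd IDs.
SHARD_CONNECTION_STRINGS = ["DSN=customers_even", "DSN=customers_odd"]


def get_database_for(customer_id, shards=SHARD_CONNECTION_STRINGS):
    """Pick the shard for a customer by taking the ID modulo the shard count."""
    return shards[customer_id % len(shards)]


even_shard = get_database_for(1024)
odd_shard = get_database_for(1025)
```

      Modulo placement is simple, but changing the number of shards moves most customers, so a lookup table or a hash ring is often preferred once resharding becomes likely.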
    • Back to denormalization: An average social network profile schema in great need of de-normalization….
    • DIGG case study: Denormalizing the database
      CREATE TABLE `Diggs` (
        `id` INT(11),
        `itemid` INT(11),
        `userid` INT(11),
        `digdate` DATETIME,
        PRIMARY KEY (`id`),
        KEY `user` (`userid`),
        KEY `item` (`itemid`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      CREATE TABLE `Friends` (
        `id` INT(10) AUTO_INCREMENT,
        `userid` INT(10),
        `username` VARCHAR(15),
        `friendid` INT(10),
        `friendname` VARCHAR(15),
        `mutual` TINYINT(1),
        `date_created` DATETIME,
        PRIMARY KEY (`id`),
        UNIQUE KEY `Friend_unique` (`userid`,`friendid`),
        KEY `Friend_friend` (`friendid`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      The problem – Intersection of two sets:
      Users who dugg an item (millions of rows)
      Users that have befriended the digger
      (100s of millions of rows)
      Why Digg made the shift to Cassandra
      (Dynamo/BigTable hybrid)
    • DIGG case study: Denormalizing the database
      JOIN too slow in SQL, do it in PHP:
      Query ‘Friends’ for all my friends. With a cold cache takes 1.5 s.
      Query ‘Diggs’ for any diggs of a specific item by a user in the set of friend user IDs. An enormous query that takes 14 seconds with a cold cache; it looks somewhat like:
      SELECT `digdate`, `id`
      FROM `Diggs`
      WHERE `userid` IN (59, 9006, 15989, 16045, 29183, 30220, 62511, 75212, 79006 /* can balloon to hundreds of user IDs.. */) AND `itemid` = 13084479
      ORDER BY `digdate` DESC, `id` DESC LIMIT 4;
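      The two-step, application-side join described above can be sketched as follows. It is written in Python rather than PHP, in-memory lists stand in for the two tables, and all sample rows are invented for the example.

```python
# Stand-ins for the Friends and Diggs tables (sample data is made up).
friends = {1: [59, 9006, 15989]}  # userid -> IDs of that user's friends
diggs = [
    {"id": 1, "itemid": 13084479, "userid": 59, "digdate": "2009-09-01 10:00:00"},
    {"id": 2, "itemid": 13084479, "userid": 777, "digdate": "2009-09-01 11:00:00"},
    {"id": 3, "itemid": 13084479, "userid": 9006, "digdate": "2009-09-02 09:30:00"},
    {"id": 4, "itemid": 555, "userid": 59, "digdate": "2009-09-03 08:00:00"},
]


def friends_who_dugg(userid, itemid, limit=4):
    """Step 1: fetch the friend IDs. Step 2: filter the item's diggs by them."""
    friend_ids = set(friends.get(userid, []))
    hits = [d for d in diggs
            if d["itemid"] == itemid and d["userid"] in friend_ids]
    # Same ordering as the SQL: newest digg first, ties broken by id.
    hits.sort(key=lambda d: (d["digdate"], d["id"]), reverse=True)
    return hits[:limit]


recent = friends_who_dugg(1, 13084479)
```

      The intersection itself is cheap; the cost in the slide comes from fetching both sets from tables with millions of rows.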
    • Amazon Dynamo - A Distributed Storage System http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
      Traditional databases: hard to create redundancy and parallelism, and they are a single point of failure.
      Two db servers for identical data: difficult to synchronize!
      Master/Slave: master has to take all heat when writes are occurring!
      Huge issue for mega e-commerce sites.
      Adding more web servers doesn’t help…it’s the database that is the problem!
    • Dynamo – A Distributed Storage System http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
      Ring of identical computers.
      Fault tolerance: data is redundant
      Eventually consistent storage system
      Hard to create a responsive and consistent distributed storage system…so redundancy is accomplished asynchronously.
      Partitioning algorithm is complex (which node will store an object to scale).
      Simple Put and Get interface
      Put requires Key, Context and Object; the Context is used by Dynamo to validate requests.
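      Dynamo's partitioning is based on consistent hashing over the ring. The sketch below is a heavily simplified version (no virtual nodes, no versioning) meant only to show how a key maps to a list of nodes; the class and parameter names are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Place a string on the ring as a 64-bit integer position."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._ring = sorted((_hash(node), node) for node in nodes)

    def preference_list(self, key):
        """Walk clockwise from the key's position and collect `replicas` nodes."""
        start = bisect.bisect(self._ring, (_hash(key), ""))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.preference_list("cart:42")  # nodes that store this key
```

      Removing a node that is not in a key's preference list leaves that key's placement untouched, which is what makes adding and removing machines cheap.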
    • Google BigTable (open source: HBase) http://labs.google.com/papers/bigtable.html
      • Applications store data rows in tables.
      • Collection of rows located by (sortable) row key (and optional timestamp)
      • Columns may be sparse, and arbitrary in number
      • Column names have the form “<family>:<label>”.
      • Only a single row at a time may be locked by default.
      • Conceptual view:
    • Google BigTable (open source: HBase) http://labs.google.com/papers/bigtable.html
      Physical storage view:
      Stored on a per-column family basis.
      Empty cells of the conceptual view not stored (requests return no value)
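      The data model can be mimicked with nested dictionaries. This toy sketch only illustrates the row key / "family:label" / timestamp addressing and the per-family, sparse storage; it is not how BigTable or HBase are implemented, and the class name is an assumption.

```python
from collections import defaultdict


class SparseTable:
    def __init__(self):
        # family -> row key -> column label -> {timestamp: value}
        self._families = defaultdict(
            lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, column, value, timestamp):
        family, label = column.split(":", 1)
        self._families[family][row][label][timestamp] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is empty."""
        family, label = column.split(":", 1)
        versions = self._families.get(family, {}).get(row, {}).get(label)
        if not versions:
            return None
        return versions[max(versions)]


table = SparseTable()
table.put("com.cnn.www", "contents:", "<html>v1</html>", timestamp=1)
table.put("com.cnn.www", "contents:", "<html>v2</html>", timestamp=2)
table.put("com.cnn.www", "anchor:cnnsi.com", "CNN", timestamp=1)
latest = table.get("com.cnn.www", "contents:")
missing = table.get("com.cnn.www", "anchor:my.look.ca")
```

      Only cells that were written take up space; asking for an empty cell simply returns no value.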
    • Scaling: Application
      Front-end “performance” enhancements:
      Minimize HTTP requests, Use Gzip for compression, Reduce DNS lookups, Minify JavaScript and CSS, CDN, etc.
      • “Special-purpose” computations:
      Crawled documents, web request logs, inverted indices, most frequent queries, etc.
      Enter MapReduce: distributed processing framework by Google
      Apache Hadoop – open source MapReduce by Yahoo!
      “(Key,Value) pairs + a Map function and a Reduce function”
      ** Powers 90%+ of Google’s jobs and apps internally.
      ** Recently supplanted by Google Percolator for Instant Search
      Microsoft note: DryadLINQ, their answer to MapReduce.
    • MapReduce/Hadoop: http://labs.google.com/papers/mapreduce.html
      For processing and generating large datasets on clusters.
      Input dataset split into independent chunks.
      Operates on <key, value> pairs.
      Implicit parallelization: splitting and distributing data, starting maps, reduces, collecting output
      One Master, multiple Workers
    • MapReduce: WordCount example
      File 1:
      Hello World Bye World
      File 2:
      Hello Hadoop Goodbye Hadoop
      The output of the first map:
      < Bye, 1> < Hello, 1> < World, 2>
      The output of the second map:
      < Goodbye, 1> < Hadoop, 2> < Hello, 1>
      The output of the job is:
      < Bye, 1> < Goodbye, 1> < Hadoop, 2>
      < Hello, 2> < World, 2>
      map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
          EmitIntermediate(w, "1");
      reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
          result += ParseInt(v);
        Emit(AsString(result));
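      The same job can be run in a single Python process to check the expected output. This is a sketch of the programming model only: the run_job driver is a stand-in for the framework's split, shuffle and collect steps.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Emit (word, 1) for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(documents):
    # Shuffle phase: group the intermediate values by key.
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            grouped[key].append(value)
    return dict(reduce_fn(key, grouped[key]) for key in sorted(grouped))


word_counts = run_job({
    "file1": "Hello World Bye World",
    "file2": "Hello Hadoop Goodbye Hadoop",
})
```

      The per-map outputs listed above already show < World, 2> because a combiner pre-sums counts on each mapper; this sketch skips that optimization and sums everything in the reduce step.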
    • Word count on Amazon Elastic MapReduce
    • Microsoft Dryad/DryadLINQ http://research.microsoft.com/en-us/projects/dryad/
      Microsoft’s answer to MapReduce
      Dryad jobs are directed acyclic graphs (DAGs) vs. Map/Distribute/Sort/Reduce operations of MapReduce.
      No fault tolerance between stages in MapReduce.
      Big jobs can be more efficient with Dryad.
      Allows developers to specify data communication mechanisms of computations (TCP pipes, files, FIFOs, etc)
      Allows arbitrary number of inputs and outputs for computations (MapReduce is restricted to 1-1)
    • DryadLINQ = LINQ + Dryad
      Collection<T> collection;
      bool IsLegal(Key k);
      string Hash(Key k);
      var results = from c in collection where IsLegal(c.key) select new { hash = Hash(c.key), c.value };
    • Sawzall – Parallel Analysis of Data http://research.google.com/archive/sawzall.html
      Interpreted, procedural domain-specific language to handle huge quantities of data.
      Type-safe scripting language which utilizes Google infrastructure
      Used to process log data generated by Google servers.
      Suitable for the map-phase of map-reduce. Widely used at Google.
      topwords: table top(3) of word: string weight count: int;
      fields: array of bytes = splitcsvline(input);
      w: string = string(fields[0]);
      c: int = int(string(fields[1]), 10);
      if (c != 0) {
        emit topwords <- w weight c;
      }
      Input (one record per line):
      abc,1
      def,2
      ghi,3
      def,4
      jkl,5
      Output:
      topwords[] = def, 6, 0
      topwords[] = jkl, 5, 0
      topwords[] = ghi, 3, 0
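      For readers without access to Sawzall, the same aggregation can be approximated in Python. This is an exact, single-process stand-in for the top(3) table, with a hypothetical function name; it does not model how Sawzall distributes the work.

```python
from collections import Counter


def top_words(lines, n=3):
    """Sum the weight of each word and return the n heaviest words."""
    weights = Counter()
    for line in lines:
        word, count = (field.strip() for field in line.split(","))
        if int(count) != 0:
            weights[word] += int(count)
    return weights.most_common(n)


top3 = top_words(["abc,1", "def,2", "ghi,3", "def,4", "jkl,5"])
```

      The result lists def with weight 6, then jkl with 5 and ghi with 3, matching the order of the Sawzall output above.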
    • Percolator – Incremental distributed processing http://www.google.com/research/pubs/pub36726.html
      Near-instant processing of newly crawled web documents.
      Incremental vs Batch processing (MapReduce)
      Allows changes to web index without rebuilding entire index from scratch.
      Akin to database triggers: sits atop BigTable, can make changes to web maps.
      “Fresher” web results, faster indexing and crawling.
    • Summary of topics presented