6.1. Web Scale
34330 EEDC Execution Environments for Distributed Computing
6.1.1. Anatomy of a service
6.1.2. Too many Writes to Database
6.1.3. Cheaper peaks
6.1.4. Facebook Platform
Master in Computer Architecture, Networks and Systems - CANS
Anatomy of a Web Service
Problems may arise in… various browsers, plugins, operating systems, performance, screen size, PEBKAC, etc.
Problems may arise in… Internet partitioning, performance bottlenecks, packet loss, jitter
Problems may arise in… DDoS targeting another customer, routing problems, capacity, power/cooling problems, «lazy» remote hands
Problems may arise in… performance limits, bugs, configuration errors, faulty HW
Problems may arise in… network limits, interrupt limits, OS limits, bugs, configuration errors, faulty HW, error recovery
Problems may arise in… speed of clients, #threads, content not in sync, unresponsive Apps, too many sources of content, user persistence, configuration errors, bugs
Problems may arise in… Requests/sec per asset (100 KB, 5 MB, 50 KB, 5 KB, 50 KB, 50 KB). Default configuration of Tomcat allows 200 threads/instance
Problems may arise in… speed of clients, #threads, content not in sync, unresponsive Apps, too many sources of content, user persistence, configuration errors, bugs
Problems may arise in… database concurrency, access to 3rd party data (APIs), CPU- or memory-bound problems, datacenter replication, logging user actions
Problems may arise in… database concurrency, modifying schemas, massive tables -> indexes, disk performance, CPU/memory bound, datacenter replication
Problems may arise in… availability and performance, more than 24h to analyze daily logs, not reaching the Inbox (spam folders), surpassing monitoring capacity
6.1. Web Scale
34330 EEDC Execution Environments for Distributed Computing
6.1.1. Anatomy of a service
6.1.2. Too many Writes to Database
6.1.3. Cheaper peaks
6.1.4. Facebook Platform
Master in Computer Architecture, Networks and Systems - CANS
Too many writes to database
There's no machine that could do 44k/sec over 1 TB of data.
Scaling reads is easier:
- Big cache
- Replication
On write you have to:
- Update data
- Update transaction log
- Update indexes
- Invalidate cache
- Replicate
- Write to 2 or more disks (RAID x)
http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day
Case: Database Federation
- Sharding per User-ID
- Global Ring: knowing where the data is
- PHP logic to connect shards and keep data consistent
What's a Shard? Horizontal partitioning of a table, usually by Primary Key.
Benefits:
- You can scale as long as you have budget
Disadvantages:
- You lose the ability to do any JOIN, COUNT, or RANGE between shards
- Your application logic has to be aware of sharding
- If you want to rebalance shards, you will need some kind of globally unique ID; beware of auto-increments
- More services needing HA, BCP, change control, and so on
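The user-ID sharding above can be sketched as a routing lookup: an explicit mapping decides which shard owns a user's rows, so every query is routed before it touches a database. A minimal sketch, assuming a fixed shard count and an in-memory mapping (in Flickr's design this lives in the Global Ring, not in process memory):

```python
# Minimal sketch of per-user sharding: route every query to the shard
# that owns the user. The assignment is stored explicitly (as in a
# "global ring") rather than derived by hashing, so it can be rebalanced.
NUM_SHARDS = 4

# Hypothetical mapping: user_id -> shard_id (would live in a DB/memcached)
user_to_shard = {}

def assign_shard(user_id):
    """Assign a new user to a shard (here: simple modulo by ID)."""
    shard_id = user_id % NUM_SHARDS
    user_to_shard[user_id] = shard_id
    return shard_id

def shard_for(user_id):
    """Look up where a user's data lives; every data access starts here."""
    return user_to_shard[user_id]

assign_shard(42)   # user 42 lands on shard 2
assign_shard(7)    # user 7 lands on shard 3
```

Because the mapping is stored rather than computed, moving a user to another shard only means copying the rows and updating one entry.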
Case: Global Ring?
Storing key-value pairs:
- User_ID -> Shard_ID
- Photo_ID -> User_ID
- Group_ID -> Shard_ID
Every access to data has to know where it lives -> memcached with a TTL of 30 minutes.
Global IDs? You don't want two objects with the same ID!
Strategies:
- GUIDs: 128-bit IDs, so bigger indexes, and poorly supported by MySQL
- Central autoincrement: a table where, for every ID needed, you do an insert and let MySQL take care of everything. At 60 photos/sec it will be a BIG table
- REPLACE INTO: a MySQL-only solution, small tables, and allows for redundancy (one server provides odd IDs and another even)
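The «every access has to know where» step can be mimicked with a tiny TTL cache: look up User_ID -> Shard_ID in the cache, and fall back to the authoritative store on a miss. A sketch with a plain dict standing in for memcached and a hypothetical `shard_db` as the source of truth:

```python
import time

TTL_SECONDS = 30 * 60        # 30-minute TTL, as in the slide

cache = {}                   # user_id -> (shard_id, expiry); memcached stand-in
shard_db = {1: 3, 2: 0}      # hypothetical authoritative User_ID -> Shard_ID store

def get_shard(user_id, now=None):
    """Return the shard for a user, caching the answer for TTL_SECONDS."""
    now = time.time() if now is None else now
    hit = cache.get(user_id)
    if hit is not None and hit[1] > now:
        return hit[0]                        # fresh cache hit
    shard = shard_db[user_id]                # miss: ask the source of truth
    cache[user_id] = (shard, now + TTL_SECONDS)
    return shard
```

The TTL bounds how stale a routing decision can be after a shard rebalance, at the cost of one authoritative lookup per key per half hour.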
Case: REPLACE INTO
The Tickets64 schema looks like:

CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM

SELECT * FROM Tickets64 returns a single row that looks something like:

+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 | a    |
+-------------------+------+

When they need a new globally unique 64-bit ID they issue the following SQL:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
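The odd/even redundancy works because each ticket server hands out a disjoint sequence (in MySQL: `auto_increment_increment = 2` with offsets 1 and 2). A sketch of the same idea without MySQL, with two counters standing in for the two servers:

```python
import itertools

# Two "ticket servers": one issues odd IDs, the other even IDs.
# (In MySQL terms: auto_increment_offset 1 and 2, auto_increment_increment 2.)
odd_tickets = itertools.count(1, 2)
even_tickets = itertools.count(2, 2)

servers = [odd_tickets, even_tickets]

def next_global_id(server=0):
    """Get a globally unique ID; either server alone stays collision-free."""
    return next(servers[server])
```

If one server dies, clients simply draw from the other; the two sequences can never collide, which is exactly the redundancy the slide refers to.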
Case: PHP Logic
- You lose any kind of inter-shard relational query (no JOINs)
- You lose any kind of referential integrity (no foreign keys)
- You have to control distributed transactions yourself. You select a Favorite (so they need to update your shard and the other user's):
  - Open 2 connections to the two shards
  - Begin a transaction on both shards
  - Add the data
  - If everything is ok -> commit, else roll back and error
So we improve scalability but impact code complexity and the performance of a single page view (hint: async database access)
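The favorite-marking steps above (two connections, two transactions, commit both or roll back both) can be sketched with sqlite3 standing in for the two MySQL shards; table and column names are made up for illustration:

```python
import sqlite3

# Two in-memory databases stand in for the two MySQL shards.
# isolation_level=None puts sqlite3 in autocommit so we manage BEGIN/COMMIT.
shard_a = sqlite3.connect(":memory:", isolation_level=None)
shard_b = sqlite3.connect(":memory:", isolation_level=None)
for db in (shard_a, shard_b):
    db.execute("CREATE TABLE favorites (user_id INTEGER, photo_id INTEGER)")

def add_favorite(user_id, owner_id, photo_id):
    """Record the favorite on both shards, or on neither."""
    try:
        for db in (shard_a, shard_b):
            db.execute("BEGIN")
        shard_a.execute("INSERT INTO favorites VALUES (?, ?)", (user_id, photo_id))
        shard_b.execute("INSERT INTO favorites VALUES (?, ?)", (owner_id, photo_id))
        for db in (shard_a, shard_b):
            db.execute("COMMIT")   # a crash between the commits can still diverge
        return True
    except sqlite3.Error:
        for db in (shard_a, shard_b):
            db.execute("ROLLBACK")
        return False
```

Note the hedge in the comment: this is application-managed, not two-phase commit, so a failure between the two COMMITs leaves the shards inconsistent; that is part of the complexity cost the slide mentions.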
Case
They get an arbitrarily scalable infrastructure.
They have marginally more complex code.
Hai! I'm working!
Case
They get an arbitrarily scalable infrastructure.
They have marginally more complex code.
They "only" have 20 engineers, so scalability also means:
- Roughly 2.5 million Flickr members per engineer
- Roughly 200 million photos per engineer
- 28 user-facing pages, 23 administrative pages
- 20 API methods, though only 7.5 public API methods
- 80 API calls per second
- 250 CPUs
- 850 annual deploys
- 16 feature flags
6.1. Web Scale
34330 EEDC Execution Environments for Distributed Computing
6.1.1. Anatomy of a service
6.1.2. Too many Writes to Database
6.1.3. Cheaper peaks
6.1.4. Facebook Platform
Master in Computer Architecture, Networks and Systems - CANS
Cheaper peaks
If your capacity planning comes from the aggregate of all your customers, and you plan to have thousands of them, what could you do?
And your performance impacts the brand of your customers (so you'll have problems).
You are a start-up without loads of money.
Case
What does a recommendation engine look like?
Case
- Have to store data for every page view their customers get
- Do MAGIC over millions of rows to calculate related items for YOU
- Show recommendations to the user
- Only 2 snippets of Javascript/HTML
- Less than 0.5 seconds per view
Case: Option A
- Every hit to the tracker becomes an INSERT into a MySQL sharded by customer
- Every hit to the recommender recalculates the list of items to show, based on collective intelligence
Benefits:
- Straightforward to code and manage
- Quick and easy for a proof of concept
Disadvantages:
- One customer at their peak could surpass the capacity of their MySQL instance
- The same customer at their valley could be wasting money on an idle instance
- Our webserver could be overloaded by the sum of all our customers
- The recommender is a CPU and memory hog, and we need too many servers to cope with our estimated demand
Case: Option B
- Every hit to the tracker becomes an INSERT into a MySQL sharded by customer
- A cron job recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
Benefits:
- Straightforward to code
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages:
- One customer at their peak could surpass the capacity of their MySQL instance
- The same customer at their valley could be wasting money on an idle instance
- Our webserver could be overloaded by the sum of all our customers
- We have to control what our cron jobs are doing, check for errors, and tune them so they don't bring down the database
Case: Option C
- Every hit to the tracker is only a static image file with various parameters: /a.gif?b=1&c=2&…
- A cron job gets the log files from the webservers and the database-stored items, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB (sharded by customer)
Benefits:
- Straightforward to code; we only had to move and parse files
- A surge in pageviews doesn't bring down the database with writes
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages:
- One customer at their peak could surpass the capacity of their MySQL instance
- The same customer at their valley could be wasting money on an idle instance
- We have to control what our cron jobs are doing, check for errors, and tune them so they don't bring down the database
- We could hit bandwidth limits
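The tracker in Option C is just a static GIF whose query string carries the data; the cron job reconstructs events by parsing web-server log lines. A sketch of that parsing step (the log format and the parameter names `customer`/`item` are made up for illustration):

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical access-log lines: the tracker pixel's URL holds the payload.
log_lines = [
    'GET /a.gif?customer=7&item=123 HTTP/1.1',
    'GET /a.gif?customer=7&item=456 HTTP/1.1',
    'GET /index.html HTTP/1.1',          # not a tracker hit: ignored
]

def parse_hits(lines):
    """Extract (customer, item) events from tracker-pixel requests."""
    hits = []
    for line in lines:
        path = line.split()[1]           # the request path of the log line
        url = urlparse(path)
        if url.path != "/a.gif":
            continue                     # only the pixel carries tracking data
        params = parse_qs(url.query)
        hits.append((params["customer"][0], params["item"][0]))
    return hits
```

This is why a surge in page views no longer hurts the database: at request time the webserver only appends a log line, and the expensive work happens later in batch.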
Case: Option D
- Every hit to the tracker is only a static image file with various parameters: /a.gif?b=1&c=2&…
- A cron job gets the log files from the webservers and the database-stored items, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way; no more sharding
Benefits:
- Easy to add and remove data servers on demand, so no waste or limits here
- A surge in page views only costs money, and as we get paid per page view, that's ok
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages:
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
- We could hit bandwidth limits
Case: Map/Reduce
Hadoop: it's "only" a framework for running Map/Reduce applications on large clusters. It provides replication and fault tolerance, as HW failure will be the norm, using a distributed file system, HDFS.
Map/Reduce: in a map/reduce application there are two kinds of jobs, Map and Reduce. Mappers read the HDFS blocks, do local processing, and run in parallel; from a webserver log file they could emit <url, #hits>. Reducers get the output of many mappers and consolidate the data; if there was a mapper per day, a reducer could calculate how many monthly hits a URL gets.
HBase: the Hadoop/MR design favors throughput over latency, so it's used as an analytical platform, but HBase allows low-latency random access to very big tables (billions of rows by millions of columns).
Column-oriented DB: Table -> Row -> ColumnFamily -> Timestamp => Value
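The <url, #hits> example above is the canonical map/reduce job: mappers emit (url, 1) per log line, and reducers sum the counts per key. A minimal in-process sketch of the programming model (no Hadoop, no distribution, just the three phases):

```python
from collections import defaultdict

def mapper(log_line):
    """Emit one (url, 1) pair per request, as a mapper over an HDFS block would."""
    url = log_line.split()[1]
    yield (url, 1)

def reducer(url, counts):
    """Consolidate all mapper output for one key."""
    return (url, sum(counts))

def map_reduce(lines):
    shuffled = defaultdict(list)          # the "shuffle": group values by key
    for line in lines:                    # map phase (parallel on a real cluster)
        for key, value in mapper(line):
            shuffled[key].append(value)
    # reduce phase: one reducer call per distinct key
    return dict(reducer(k, v) for k, v in shuffled.items())
```

On a real cluster the map calls run on the nodes holding the HDFS blocks and the shuffle moves data over the network, but the dataflow is exactly this.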
Case: Option D
- Every hit to the tracker is only a static image file with various parameters: /a.gif?b=1&c=2&…
- A cron job gets the log files from the webservers and the database-stored items, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way; no more sharding
Benefits:
- Easy to add and remove data servers on demand, so no waste or limits here
- A surge in page views only costs money, and as we get paid per page view, that's ok
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages:
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
- We could hit bandwidth limits
Case: Option E
- Every hit to the tracker is only a static image file with various parameters: /a.gif?b=1&c=2&…
- A cron job gets the log files from the webservers and the database-stored items, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way; no more sharding
- All static files served by a CDN
Benefits:
- Easy to add and remove data servers on demand, so no waste or limits here
- A surge in page views only costs money, and as we get paid per page view, that's ok
- The compute-intensive task is out of the critical path; it's asynchronous
- Unlimited bandwidth
Disadvantages:
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
Case: CDN
What's a Content Delivery Network?
- Your server or HTTP repository (Amazon S3, …) is the Origin of the content
- They give you a DNS name (bb.cdn.net) and you have to create a CNAME to this name (www.example.com -> bb.cdn.net.)
- When a user asks for www.example.com, the CDN will choose which of their nodes is nearest to the user and hand out its IP addresses
- The user asks that CDN node for a content item (/a.gif); the node sends a fresh copy if it has one, or on a MISS checks with its upstream caches all the way to your Origin
So we get unlimited bandwidth and better latency (we can't surpass the speed of light).
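The hit/miss flow described above can be sketched as an edge node that serves cached content while it is fresh and falls back to the origin otherwise (a toy single-tier model; real CDNs add upstream cache tiers and HTTP cache-control semantics):

```python
import time

class EdgeNode:
    """Toy CDN edge: serve cached content while fresh, else fetch from origin."""

    def __init__(self, origin, max_age=60):
        self.origin = origin       # callable path -> content (your Origin server)
        self.max_age = max_age     # freshness window in seconds
        self.cache = {}            # path -> (content, fetched_at)
        self.misses = 0

    def get(self, path, now=None):
        now = time.time() if now is None else now
        entry = self.cache.get(path)
        if entry and now - entry[1] < self.max_age:
            return entry[0]        # HIT: served locally, no trip to the origin
        self.misses += 1           # MISS: go upstream to the origin
        content = self.origin(path)
        self.cache[path] = (content, now)
        return content
```

After the first request for a path, every user near that edge is served without touching your servers, which is where the "unlimited bandwidth" of the slide comes from.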
Case: Option E
- Every hit to the tracker is only a static image file with various parameters: /a.gif?b=1&c=2&…
- A cron job gets the log files from the webservers and the database-stored items, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way; no more sharding
- All static files served by a CDN
Benefits:
- Easy to add and remove data servers on demand, so no waste or limits here
- A surge in page views only costs money, and as we get paid per page view, that's ok
- The compute-intensive task is out of the critical path; it's asynchronous
- Unlimited bandwidth
Disadvantages:
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
Case
- They get a completely scalable infrastructure at AWS
- They can provision a new Cruncher, Datastore, or Recommender in a matter of minutes and remove it as soon as it's no longer needed
- They don't have any upper limit on how many requests they can serve
- All the requests that can impact the user experience of their customers are served by a CDN
- As there are only 3 kinds of servers, all managed as images, they don't need as many engineers to take care of the infrastructure
6.1. Web Scale
34330 EEDC Execution Environments for Distributed Computing
6.1.1. Anatomy of a service
6.1.2. Too many Writes to Database
6.1.3. Cheaper peaks
6.1.4. Facebook Platform
Master in Computer Architecture, Networks and Systems - CANS
Facebook Platform
If your primary data source is not under your control and it's too far away, what happens?
An API case
Case: Duplicated Gifts
Case: Loving it. More «Pongos». Hitting the bullseye?
Case
- It's a social wish list application
- When you access it, it checks which of your friends have enabled the application and shows their wish lists
- You can share your wish lists on Facebook
- You can capture wishes (gifts) and be shown a feed of possible merchants
- Initial loading time is critical
- Expect virality, so we won't have much response-time margin
Case: Flow
Case
Nice but slow: 3 to 7 seconds to load
Case
Define goals. Define metrics. Analyze metrics. Improve one at a time.
Case: Goals
Time to load < 1 second
Everything works
Case: Metrics
Time to session setup:
- Validating to Facebook
- Getting Friends Information
- Lookups to local Database (lists, items, captured items)
Time to load «home» page:
- Get HTML
- Get widgets
- Get Javascripts
- Get various graphic assets
Case: Analyzing Metrics
Time to session setup:
- Validating to Facebook (300 ms)
- Getting Friends Information (3 sec)
- Lookups to local Database (lists, items, captured items) (30 ms)
Time to load «home» page:
- Get HTML (400 ms)
- Get widgets (300 ms)
- Get Javascripts (300 ms)
- Get various graphic assets (500 ms)
Case: Facebook access
From 3 seconds to 500 ms!
Case: Facebook access
In ASP.NET we "only" have 12 threads/CPU -> only 12 concurrent requests, so from 4 users/sec to 24/sec.
We could use asynchronous calls, but there is low parallelism: until we have the result of GetAppUsers, we can't ask for GetUserInfo, so no speedup.
We could increase the default #threads to another number (.NET 4.0 defaults to 5000/CPU).
We can get failure resiliency by adjusting timeouts and increasing threads, connections, and so on.
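The «low parallelism» point is the key constraint: GetUserInfo depends on GetAppUsers' result, so async calls cannot overlap those two, only the independent per-user lookups. A sketch with asyncio (the call names mirror the slide; the API itself is hypothetical, with sleeps standing in for network round-trips):

```python
import asyncio

async def get_app_users():
    await asyncio.sleep(0.01)          # stands in for a Facebook API round-trip
    return [1, 2, 3]

async def get_user_info(user_id):
    await asyncio.sleep(0.01)          # one round-trip per user
    return {"id": user_id}

async def load_friends():
    users = await get_app_users()      # must finish before anything else starts
    # Only the per-user lookups are independent, so only they run concurrently.
    return await asyncio.gather(*(get_user_info(u) for u in users))

infos = asyncio.run(load_friends())
```

The total latency is one GetAppUsers round-trip plus one (not N) GetUserInfo round-trip; the sequential dependency at the front is irreducible, exactly as the slide says.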
Case: Leveraging "free" tools
Set far-future Expires headers on static files: users leverage their browser's cache, and requests are lighter on the server side.
Use a "free" CDN to get jQuery et al.: Microsoft and Google provide public, free repositories of Javascript tools.
Use CSS sprites: although graphic files are small, each needs a TCP connection to retrieve. Combine most graphic assets into one big file and use CSS to select which one to show:
#nav li a {background-image:url('../img/image_nav.gif')}
#nav li a.item1 {background-position:0px 0px}
#nav li a:hover.item1 {background-position:0px -72px}
Case: More on Sprites
- Avg size 2 KB/file
- HTTP/1.1 (RFC 2616) suggests that browsers download no more than 2 components in parallel per hostname
- Small files don't use all the available bandwidth (TCP slow start…)
- Latency also plays an important role
About this session
Sergi Morales, Founder & CTO of Expertos en TI
Phone: +34 6688-XPNTI
Email: sergi.morales+eedc@expertosenti.com
Blog: http://blog.expertosenti.com
Web: http://www.expertosenti.com
Expertos en TI: we help Internet-oriented projects leverage all the research done by the big sites (Flickr, Facebook, Twitter, Salesforce, Google, and so on) so they can improve their bottom line and be prepared for growth.
About the EEDC course
34330 Execution Environments for Distributed Computing (EEDC), Master in Computer Architecture, Networks and Systems (CANS)
Computer Architecture Department (AC)
Universitat Politècnica de Catalunya – Barcelona Tech (UPC)
ECTS credits: 6
INSTRUCTOR
Professor Jordi Torres
Phone: +34 93 401 7223
Email: torres@ac.upc.edu
Office: Campus Nord, Modul C6, Room 217
Web: http://www.JordiTorres.org

EEDC 2010. Scaling Web Applications