How to serve 2500 Ad requests per second

SAPO's text advertisement system is a service that provides contextualized text ads on web pages and search pages. It has been developed in house by SAPO for 4 years now.

The current implementation serves more than 90M ad requests per day at peak.
It peaks at ~2500 requests per second.
Average latency is 33 milliseconds.
99.9% of requests are served in under 150 milliseconds.

The goal of this presentation is to broadly describe how such a system works and focus on the following topics:
- functionality/features
- performance
- availability & reliability
- technologies and software used.

Published in: Technology, Design

  1. 1. How to serve 2500 requests per second SAPO text Ads Miguel Mascarenhas Filipe @ codebits, November 2010
  2. 2. Who am I ? Team lead of Anúncios Sapo (text ads) team Team of 5 (+ me) software engineers, designing & developing the text ads platform
  3. 3. How to serve VS How we serve Are we a role-model? Are there recipes ? Should there be a how to?
  4. 4. Summary ● Project Overview ● Full System Overview ● Serving Text Ads ● Latency & Speed ● Scaling on the Front ● Backend & backend Services ● Availability, Reliability & Fault Tolerance ● Scalability Issues ● Monitoring & Logging ● Programming Languages & Technologies
  5. 5. Anúncios SAPO ● Text Advertisement System ● Search Engines: ● BING ( http://www.bing.pt ) ● SAPO's Search ( http://sl.pt ) ● SAPO & Partners' Content Network ● ~200 web sites ● Impresa, Publico, Cofina, IOL, controlinveste ● Peaks at 90M ad requests per day, 2500/sec
  6. 6. Serving Text Ads
  7. 7. Concepts ● Pay Per Click Business Model ● CPC - Cost Per Click ● QPS – Queries Per Second ● CTR - Click Through Rate (clicks / impressions)
  8. 8. Serving text-ads .. Major features: ● choose & serve ads ● register requests, impressions, clicks, conversions ● maintain user budget up to date ● Quickly reflect changes in ad pool
  9. 9. Serving text-ads ...
     elect_get_ads() {
         if (words)
             ads = get_ads_keywords()
         else {
             if (crawled_site)
                 ads = get_ads_crawler()
             else
                 ads = get_ads_site_keywords()
         }
         site_ads = get_ads_site_targeting()
         merge_ads(ads, site_ads)
     }
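The election flow on this slide can be sketched in Python. This is a minimal illustration, not SAPO's code: the `sources` dictionary of callables and the de-duplicating behaviour of `merge_ads` are assumptions, since the slide does not say how the lookups are implemented or how the two lists are merged.

```python
from typing import Callable, Dict, List

Ad = str  # assumption: an ad is represented by its identifier


def merge_ads(ads: List[Ad], site_ads: List[Ad]) -> List[Ad]:
    """Concatenate both lists, dropping duplicates and keeping order.

    The merge policy is an assumption; the slide only names the step.
    """
    seen = set()
    merged = []
    for ad in ads + site_ads:
        if ad not in seen:
            seen.add(ad)
            merged.append(ad)
    return merged


def elect_get_ads(
    words: List[str],
    crawled_site: bool,
    sources: Dict[str, Callable[[], List[Ad]]],
) -> List[Ad]:
    """Pick the ad source the way the slide's pseudocode does."""
    if words:
        # Search page: the request carries keywords.
        ads = sources["keywords"]()
    elif crawled_site:
        # Content page already analysed by the crawler.
        ads = sources["crawler"]()
    else:
        # Fall back to keywords configured for the site.
        ads = sources["site_keywords"]()
    # Site-targeted ads are always considered and merged in.
    site_ads = sources["site_targeting"]()
    return merge_ads(ads, site_ads)
```

Passing the lookups in as callables keeps the sketch self-contained; in a real frontend they would be index queries.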
  10. 10. Serving text-ads ... Election of ads: ● requires index ads['word'] -> [ad1, ad2, ad3..] ● ads ordered by: 'score' -> f(CTR, CPC, quality) ● Auction based on Generalized second-price Auction
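A generalized second-price auction over scored ads might look like the sketch below. The slide does not define `f(CTR, CPC, quality)`, so the product `cpc * ctr * quality` is an assumption, as are the `Bid` type and the zero price for the last ranked ad (a real system would likely apply a reserve price).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bid:
    ad: str
    cpc: float      # maximum cost per click the advertiser accepts
    ctr: float      # estimated click-through rate, assumed > 0
    quality: float  # quality factor, assumed > 0


def score(bid: Bid) -> float:
    # Assumed scoring function; the talk only says score = f(CTR, CPC, quality).
    return bid.cpc * bid.ctr * bid.quality


def run_auction(bids: List[Bid], slots: int) -> List[Tuple[str, float]]:
    """Rank bids by score and charge each winner the second price.

    Each winner pays the smallest CPC that would still keep its score
    at or above the score of the ad ranked just below it.
    """
    ranked = sorted(bids, key=score, reverse=True)
    results = []
    for i, bid in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            next_score = score(ranked[i + 1])
            price = next_score / (bid.ctr * bid.quality)
        else:
            price = 0.0  # nobody ranked below (assumption: no reserve price)
        # Guard against rounding: never charge more than the bid.
        results.append((bid.ad, min(price, bid.cpc)))
    return results
```

With equal CTR and quality this reduces to the textbook rule: each winner pays the bid of the advertiser ranked immediately below.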
  11. 11. Serving text-ads.. Other essential features: contextualization of webpages/sites blacklisting of ads per site reporting information scoring quality of ads anti-fraud systems/fraud detectors
  12. 12. LATENCY & SPEED .. and their importance
  13. 13. Latency Low latency is required: Search pages (BING, SAPO, ...) have to search ads (that's us!), search results, and merge the results together. «ads added last» - site developers put the ad-request code at the end of the page (usually the last thing to load)
  14. 14. Latency Without good latency ads are slow to appear and users have moved on...
  15. 15. Latency Slow ads Low CTR BAD! Latency has a BIG impact on REVENUE.
  16. 16. Latency Service Level Agreement 99.9% of reqs under: 150 milliseconds Average response time is: 20 milliseconds Never take more than 1 second. serve blank ads in that case
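The "never take more than 1 second, serve blank ads in that case" rule can be sketched as a deadline around the election call. This is an illustration under assumptions: the thread pool, `serve_with_deadline`, and the empty-string `BLANK_AD` are hypothetical names, not the frontend's actual mechanism.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

BLANK_AD = ""           # assumption: a blank ad is an empty payload
DEADLINE_SECONDS = 1.0  # hard limit from the SLA slide

_executor = ThreadPoolExecutor(max_workers=8)


def serve_with_deadline(elect, deadline=DEADLINE_SECONDS):
    """Run elect() and return its result, or a blank ad on timeout."""
    future = _executor.submit(elect)
    try:
        return future.result(timeout=deadline)
    except TimeoutError:
        # The worker thread keeps running in the background; only the
        # response to the client is cut short.
        return BLANK_AD
```

The design choice is to bound what the client waits for rather than to cancel the work, which keeps the page from stalling on a slow lookup.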
  17. 17. How to keep low Latency ? Pre-computing everything is essential Fast contextualization lookup Handle lack of information gracefully (turning essential into optional)
  18. 18. How to keep low Latency ? Decouple (and postpone) everything that isn't essential to serve ads .. such as DB writes & other side effects of serving ads.
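Decoupling side effects from the request path can be sketched with a queue and a background writer. The in-process `queue.Queue` below is only a stand-in for a message broker, and `serve_ad`, `start_writer`, and the `stored` list are hypothetical names used for illustration.

```python
import queue
import threading

events = queue.Queue()  # stand-in for a message queue
stored = []             # stand-in for the database


def serve_ad(request_id):
    """Answer the request immediately; record the impression later."""
    ad = f"ad-for-{request_id}"
    # Enqueueing is cheap; the slow DB write happens elsewhere.
    events.put(("impression", request_id, ad))
    return ad


def _writer():
    while True:
        event = events.get()
        try:
            if event is None:  # sentinel: stop the writer
                return
            stored.append(event)  # a real writer would batch DB inserts
        finally:
            events.task_done()


def start_writer():
    thread = threading.Thread(target=_writer, daemon=True)
    thread.start()
    return thread
```

The request handler never waits on storage, so a slow or unavailable database delays bookkeeping but not ad serving.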
  19. 19. Fast word lookups - LiveDB Fast word/site lookup(inverted index of ads) ● cache it in local RAM (memcached) ● 'persistent' backing store is RAM
  20. 20. Fast word lookups - LiveDB Offline creation of index: ads['word'] -> [ ad1, ad2, ad3, ad4, ... ] Lots of details, need to compute additional information for each tuple: (word, ad, CPC): CTR, Evaluation Score
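The offline index build can be illustrated with a small in-memory version. The talk describes doing this with Python and MySQL stored procedures; the sketch below only shows the shape of the result, and the score `cpc * ctr` is an assumed placeholder for the real evaluation score.

```python
from collections import defaultdict


def build_index(bids):
    """Build ads['word'] -> [ad1, ad2, ...] ordered by descending score.

    bids: iterable of (word, ad, cpc, ctr) tuples.
    """
    scored = defaultdict(list)
    for word, ad, cpc, ctr in bids:
        # Precompute the score offline so serving only needs a lookup.
        scored[word].append((cpc * ctr, ad))
    return {
        word: [ad for _, ad in sorted(entries, key=lambda e: e[0], reverse=True)]
        for word, entries in scored.items()
    }
```

Because ordering is done at build time, a lookup at serving time is a single key access followed by taking the first few ads.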
  21. 21. Fast word lookups - LiveDB We chose MySQL for: ● fast 'inverted index' creation (by using Stored procedures & replication) ● fast index lookup based on the 'fame' of MySQL speed in simple workloads ● Replication for free using MySQL's master-slave replication
  22. 22. Fast word lookups - LiveDB Workload is almost read-only. (in fact, we can make it read-only with some tricks) Storage engines FAST for read-only workloads: MySQL MEMORY MySQL MyISAM Very, very similar MEMORY has more restrictions & limitations
  23. 23. MySQL MEMORY Extremely fast lookup. data is guaranteed to be in RAM (or in swap..) Benchmarked MySQL Memory engine: .. avg response time was around 10-20msecs, ..within our needs! Constraints: • There is a maximum table size in bytes • VARCHAR is in fact.. CHAR()
  24. 24. MySQL MyISAM .. After months in production use, MEMORY engine proved problematic.. Evaluated MyISAM, did benchmarks: same speed, lower standard deviation.
  25. 25. Speed Speed is .. ? Queries per second ? Sequential or Concurrently (Throughput) ?
  26. 26. Speed Speed is .. Queries per second. Sequential or Concurrently (Throughput) ? Throughput is obviously what matters in this case..
  27. 27. Speed! avg time is 20msecs = 50 QPS but... it's totally parallel workload. And most of the time is IOwait on index lookup. 1 server cpu can do ~6x this: ~300 QPS .. current servers: ~1200 QPS ..
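The arithmetic on this slide can be written out explicitly. The figures come from the slides; the final frontend count is a back-of-envelope lower bound with no headroom or redundancy, and is not a number stated in the talk.

```python
import math

avg_latency_s = 0.020      # 20 ms average response time
concurrency_per_cpu = 6    # "~6x" from the slide (mostly IO wait)
server_qps = 1200          # current servers, from the slide
peak_qps = 2500            # peak load, from the slide

# One request at a time: 1 / 0.020 s = 50 requests per second.
sequential_qps = 1 / avg_latency_s

# Overlapping requests while others wait on IO: ~300 QPS per CPU.
per_cpu_qps = sequential_qps * concurrency_per_cpu

# Minimum number of frontends needed to absorb the peak.
min_frontends = math.ceil(peak_qps / server_qps)
```

This is why throughput, not single-request speed, is the number that matters: the per-request latency stays at 20 ms while the capacity grows with concurrency and with the number of servers.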
  28. 28. Scaling on the Front.. We scale horizontally because: ● We can add more Frontends to handle more QPS ● We can add more LiveDB slaves to handle more SQL Queries
  29. 29. Backend Message queueing system: SAPO BROKER
  30. 30. Backend Services ● 'compact' & apply operations to the DBMS ● runs anti-fraud system ● runs contextualizer & crawler ● runs semantic analyser ● runs reports & suggestions system
  31. 31. Building the LiveDB MySQL is the ACID DBMS MySQL is the non-ACID LiveDB. Python & Stored Procedures create LiveDB in a MySQL DBMS slave, MySQL replication pushes to the read-only slaves
  32. 32. Availability & Reliability (no downtime please..)
  33. 33. Reliability & Fault Tolerance Almost every service/component is redundant. Currently there are only 2 single points of failure: ● Master DB server ● Master LiveDB server And even if BOTH FAIL, we keep serving ads...
  34. 34. Reliability & Fault Tolerance Failure in Master LiveDB server: ● We have a hot spare ● Can change master in approx. 5 to 10 minutes Failure in Master DB: ● Data starts to pile up on Backend services ● Backoffices are unable to operate (no new ads) ● If the failure lasts a long time, we might serve ads without budget ● Electing a new Master is performed manually Has happened before, no ad serving downtime.
  35. 35. Scalability Issues We can scale horizontally in all but two components currently: ● Master DBMS MySQL server (but we are far from saturating it..) we currently don't plan to 'solve' this ● Master LiveDB server ...
  36. 36. Scalability Issues Building LiveDB doesn't scale: ● We build a full new LiveDB every time ● It isn't distributed nor is it easily made parallel ● Time is proportional to nº of active Bids LiveDB should be updated only with recent changes in ad pool. Impossible to do with current main DB data model and with current LiveDB design We are currently investing heavily in a solution to this, LiveCouchDB
  37. 37. Monitoring & Logging (is everything okay?)
  38. 38. Monitoring & Logging Bad things happen: Log it, deal with it... We need to know about it: monitor logs trigger alarm if errors on log..
  39. 39. Monitoring & Alarmistics frontend code failures intercept error serve blank ad log error trigger alarm
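The four steps on this slide (intercept error, serve blank ad, log error, trigger alarm) can be sketched as a wrapper around the election call. The names `safe_serve` and `trigger_alarm`, and the list used as an alarm sink, are assumptions for illustration; the talk does not describe the alarm transport.

```python
import logging

logger = logging.getLogger("adserver")
BLANK_AD = ""  # assumption: a blank ad is an empty payload


def trigger_alarm(message, alarms):
    # Stand-in for whatever notifies the team (pager, e-mail, dashboard).
    alarms.append(message)


def safe_serve(elect, alarms):
    """Never let a frontend failure reach the page that embeds the ad."""
    try:
        return elect()
    except Exception as exc:  # intercept error
        logger.error("ad election failed: %s", exc)        # log error
        trigger_alarm(f"frontend failure: {exc}", alarms)  # trigger alarm
        return BLANK_AD                                    # serve blank ad
```

Catching broadly is deliberate here: from the publisher's point of view a missing ad is acceptable, while a broken page is not.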
  40. 40. Monitoring & Alarmistics network failures reconnect with exponential backoff log error trigger alarm ?
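Reconnecting with exponential backoff can be sketched as follows. The parameter values and the injectable `sleep` function are assumptions chosen to keep the example small and testable; the slides do not give the actual delays or retry limits.

```python
import logging
import time

logger = logging.getLogger("adserver.network")


def connect_with_backoff(connect, max_attempts=5, base_delay=0.1,
                         max_delay=5.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            logger.error("connect attempt %d failed: %s", attempt, exc)
            if attempt == max_attempts:
                raise  # give up; the caller decides whether to alarm
            sleep(delay)
            # Double the delay, but cap it so retries never stall for too long.
            delay = min(delay * 2, max_delay)
```

Doubling the delay avoids hammering a service that is already struggling, and the cap keeps recovery time bounded once the service is back.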
  41. 41. Monitoring & Alarmistics machine failures replication & redundancy save state to disk
  42. 42. Monitoring & Alarmistics software bugs.. bad (or lack of) data radio silence log error trigger alarm
  43. 43. Programming Languages .. and software used
  44. 44. Programming Languages Python (backend) Perl (frontend code) C (1 app only) Java (broker & reporting) PHP (backoffices) SQL Javascript
  45. 45. Software used Linux memcached MySQL squid nginx
  46. 46. Currently Evaluating Languages: Hadoop PIG Technologies: Hadoop CouchDB
  47. 47. Questions?
