How AOL Advertising Uses NoSQL to Make Millions of Smart Targeting Decisions Every Hour
 


Real-time content, offer and ad targeting decisions must happen quickly. When a user requests information from a web application, a processing clock starts, requiring a decision in as little as 40 msec. Delays in targeting decisions lead to delays in responding to the user. These delays can lead to user dissatisfaction and, ultimately, loss of audience and revenue.

This session describes how AOL Advertising uses Hadoop to create sophisticated user profiles and NoSQL database technology from Couchbase to access those profiles in real-time, with sub-millisecond latency. This architecture leaves the bulk of the processing time budget for improved content, offer and ad targeting and even real-time content customization.

Usage Rights: CC Attribution-NonCommercial-NoDerivs License

How AOL Advertising Uses NoSQL to Make Millions of Smart Targeting Decisions Every Hour Presentation Transcript

  • 1. Simple. Fast. Elastic. How AOL Advertising Uses NoSQL to Make Millions of Smart Targeting Decisions Every Hour. NoSQL Now! 2011. Matt Ingenthron.
  • 2. AD AND OFFER TARGETING. "AOL serves billions of impressions a day from our ad serving platforms, and any incremental improvement in processing time translates to huge benefits in our ability to more effectively serve the ads needed to meet our contractual commitments. Traditional databases lack the scalability required to support our goal of five milliseconds per read/write. Creating user profiles with Hadoop, then serving them from Couchbase, reduces profile read and write access to under a millisecond, leaving the bulk of the processing time budget for improved targeting and customization." (Pero Subasic, Chief Architect, AOL)
  • 3. Ad and offer targeting: 40 milliseconds to respond with the decision. (Diagram: (1) events flow in; (2) profiles and campaigns are looked up; (3) profiles and real-time campaign statistics feed the decision.)
  • 4. Proven at small and extra large scale. Heroku: leading cloud service (PaaS) provider; over 150,000 hosted applications; Couchbase Server serving over 6,200 Heroku customers. Zynga: social game leader (FarmVille, Mafia Wars, Café World); over 230 million monthly active users; Couchbase Server is the primary database behind key Zynga properties.
  • 5. Modern interactive software architecture. Application scales out: just add more commodity web servers. Database scales up: get a bigger, more complex server; sharding is expensive and disruptive, and it doesn't perform at large scale.
  • 6. Couchbase data layer scales like the application logic tier. The data layer now scales with linear cost and constant performance. Application scales out: just add more commodity web servers. Database scales out: just add more commodity data servers (Couchbase Servers). Horizontally scalable, schema-less, auto-sharding, high performance at web scale. Scaling out flattens the cost and performance curves.
  • 7. Couchbase is a distributed database. (Diagram: application users reach a web application server, which talks to the Couchbase Servers in the data center; the Couchbase Web Console runs on the administrator console.)
  • 8. Couchbase is Simple, Fast, Elastic NoSQL. Simple to deploy (Membase ServerTemplate), develop (memcached), and manage (UI and RESTful API). Fast: predictable low latency, sub-ms response times, built-in memcached technology. Zero-downtime elasticity: spread I/O and data across instances, consistent performance with linear cost, dynamic rebalancing of a live cluster.
  • 9. COUCHBASE SERVER ARCHITECTURE OVERVIEW
  • 10. Couchbase "write" data flow, application view: (1) a user action results in the need to change the VALUE of a KEY; (2) the application updates the key's VALUE and performs a SET operation; (3) the Couchbase client hashes the KEY and identifies the KEY's master server; (4) the SET request is sent over the network to the master server; (5) Couchbase replicates the KEY-VALUE pair, caches it in memory, and stores it to disk.
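The client-side routing in steps 3-5 can be sketched in miniature. This is an illustrative simulation only: the node names, the single-replica ring, and the in-memory dicts standing in for server storage are all assumptions for the sake of the sketch, not the real client API.

```python
import hashlib

# Hypothetical three-node cluster; each dict stands in for a server's storage.
SERVERS = ["node1", "node2", "node3"]
stores = {name: {} for name in SERVERS}

def master_for(key):
    """Step 3: hash the KEY to identify its master server (illustrative hashing)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

def replica_for(key):
    """Pick the next server in the ring as a single replica (assumption)."""
    idx = SERVERS.index(master_for(key))
    return SERVERS[(idx + 1) % len(SERVERS)]

def set_value(key, value):
    """Steps 4-5: send SET to the master, then replicate the KEY-VALUE pair."""
    stores[master_for(key)][key] = value
    stores[replica_for(key)][key] = value

set_value("user:1001", {"segment": "auto-intender"})
```

Because every client hashes the same way, any application server resolves a given key to the same master without coordination, which is what keeps reads and writes at sub-millisecond latency.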
  • 11. Couchbase data flow, under the hood: the SET request arrives at the KEY's master server; the listener-sender hands it to the Couchbase storage engine, which caches it in RAM; the pair is persisted to disk and replicated to Replica Server 1 and Replica Server 2 for that KEY; finally, a SET acknowledgement is returned to the application.
  • 12. Couchbase architecture. Data Manager (on each node): moxi (memcapable 1.0, port 11211); memcached with protocol listener/sender and engine interface (memcapable 2.0, port 11210); CouchDB storage. Cluster Manager (one per cluster, Erlang/OTP): REST management API/Web UI (HTTP, port 8080); vBucket state and replication manager; global singleton supervisor; rebalance orchestrator; configuration manager; node health monitor; process monitor; heartbeat. Erlang port mapper on 4369; distributed erlang on 21100-21199.
  • 13. Couchbase architecture: the same diagram, with the REST management API/Web UI served on port 8091 rather than 8080.
  • 14. Data buckets are secure Couchbase "slices". (Diagram: application users reach web application servers, which write to Bucket 1 and Bucket 2; the buckets draw on the aggregate cluster memory and disk capacity of the Couchbase data servers in the data center, managed from the administrator console.)
  • 15. Elastic rebalancing, before: Node 1 holds vBuckets 1-6 and Node 2 holds vBuckets 7-12; Node 3 is being added and is in pending state; clients talk to Nodes 1 and 2 only.
  • 16. Elastic rebalancing, during: the rebalance orchestrator recalculates the vBucket map (including replicas); vBucket migrators move vBuckets to the new server; the migration is finalized.
  • 17. Elastic rebalancing, after: Node 3 is balanced (vBuckets 5, 6, 11, and 12 now live there) and clients are reconfigured to talk to Node 3.
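The vBucket indirection in the three slides above can be modeled as a plain lookup table: keys hash to a fixed vBucket id, and only the vBucket-to-node map changes during a rebalance. A minimal sketch, assuming CRC32 hashing and a 12-vBucket map to match the slides (real clusters use many more vBuckets):

```python
import zlib

NUM_VBUCKETS = 12  # tiny map matching the slides; production maps are larger

# Before: vBuckets 0-5 live on node1, vBuckets 6-11 on node2.
vbucket_map = ["node1"] * 6 + ["node2"] * 6

def vbucket_of(key):
    # Clients hash the key to a vBucket id, never to a node directly.
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

def node_for(key):
    return vbucket_map[vbucket_of(key)]

# Rebalance: the orchestrator reassigns four vBuckets to the new node3 and
# pushes the updated map to clients. Keys keep their vBucket ids forever,
# so only data in the moved vBuckets changes location.
for vb in (4, 5, 10, 11):
    vbucket_map[vb] = "node3"
```

Because the key-to-vBucket hash never changes, rebalancing is purely a matter of migrating vBuckets and republishing the map, which is why it can happen on a live cluster with zero downtime.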
  • 18. Aol and Couchbase: Ad Targeting
  • 19. Online advertising: publishers, advertisers, and internet users, with Aol Advertising as the "match maker".
    Advertiser constraints: payment model (may pay per impression, click, or conversion); allowability (may restrict on what web sites to be served); targeting (may only want to be shown to internet users in a certain geo location, or from a specific demographic); frequency (may limit how often the same user is shown the ad); campaign delivery (the total ad budget may have to be delivered according to a plan, and the served impressions may have to generate no less than a prescribed click-through or conversion rate).
    Publisher constraints: payment model (may charge per impression, click, or conversion); allowability (may prohibit certain types of ads to be displayed).
    Terminology: CPM = Cost Per Mille, e.g. $1.50 per 1000 impressions; CPC = Cost Per Click, e.g. $2 per click; CPA = Cost Per Acquisition, e.g. $15 per acquisition/conversion.
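To match campaigns with different payment models, ad servers commonly normalize all three to expected revenue per thousand impressions (eCPM). A small illustration using the slide's example prices; the click-through and conversion rates below are hypothetical:

```python
def ecpm_from_cpm(cpm):
    # CPM is already priced per 1000 impressions.
    return cpm

def ecpm_from_cpc(cpc, ctr):
    # Expected revenue per impression is cpc * ctr; scale to 1000 impressions.
    return cpc * ctr * 1000

def ecpm_from_cpa(cpa, ctr, conversion_rate):
    # A conversion requires a click first, so both rates multiply.
    return cpa * ctr * conversion_rate * 1000

# Hypothetical rates: 0.5% click-through, 10% of clicks convert.
print(ecpm_from_cpm(1.50))
print(ecpm_from_cpc(2.00, 0.005))
print(ecpm_from_cpa(15.00, 0.005, 0.10))
```

With these assumed rates the $2 CPC campaign is actually worth more per impression than the $1.50 CPM one, which is exactly the kind of comparison the "match maker" must make within the 40 ms budget.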
  • 20. Large-scale analytics: mission, team, data, research, technologies. Data: ad serving logs, content, and 3rd-party data to be processed. Technologies: Cloudera (Hadoop, HDFS, Flume, Workflow Manager); distributed operational store: Couchbase; light DB: MySQL; MPI for model building; constantly experimenting...
  • 21. (Diagram: data feeds arrive via Flume ingestion; large-scale analytics covers reporting and insights, predictive segments, and contextual analysis and segmentation; the stack is CPU-intensive (MPI-based ML), with the operational store highly cached in Couchbase, distributed search (Sphinx), and DB support for Hadoop and MySQL; the Couchbase DB cluster turns data from the internet into actionable data for ad serving.)
  • 22. Use cases today: data set enrichment (given a field in a data set stored on HDFS, enrich by adding related fields, e.g. the media -> campaign -> advertiser chain); blackboard for inter-process/job communication (contextual segmentation pipelines; predictive modeling can load per-campaign models to be used for large-scale scoring); larger map-side joins (where Hadoop DistributedCache and the in-memory process/task cache are insufficient); aggregations with a large number of item lookups (e.g. user-level contextual profiles aggregated from visited-URL contextual profiles stored in memcache); Flume integration for data flow reliability and recovery; segment generation currently carried out through Hadoop pipelines and uploaded into server-side Membase for targeting; but a strong tendency to move closer to ad serving motivates thinking about new architectures to reduce segment generation time.
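The first use case, enriching HDFS records by following the media -> campaign -> advertiser chain via key-value lookups, might look roughly like the mapper below. Plain dicts stand in for the Couchbase buckets, and the record fields and ids are invented for illustration:

```python
# Dicts standing in for key-value lookups against Couchbase.
media_to_campaign = {"media:77": "campaign:12"}
campaign_to_advertiser = {"campaign:12": "advertiser:3"}

def enrich(record):
    """Map-side enrichment: attach campaign and advertiser to a media record."""
    campaign = media_to_campaign.get(record["media_id"])
    out = dict(record, campaign_id=campaign)
    out["advertiser_id"] = campaign_to_advertiser.get(campaign)
    return out

row = enrich({"media_id": "media:77", "impressions": 4})
```

Doing these lookups against a low-latency store rather than a relational database is what makes the chain affordable inside a mapper that runs once per log record.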
  • 23. RT framework: capture, compute, and forward. Data feeds arrive via Flume ingestion; CAPTURE into a front-end Couchbase cluster; COMPUTE on a compute cluster backed by a back-end Couchbase cluster; FORWARD to the ad-serving logic. A big-data loop runs through Hadoop.
  • 24. RT contextual segmentation. (Diagram: data feeds arrive via Flume ingestion; a User-ContentID mapper and a User-Segment mapper maintain the UC map, the US map, and the ContentID-Segment map over an active event frame in short-term Membase, feeding Couchbase plus the ad-serving logic.)
  • 25. Rough capacity estimates.
    Data volume calculation: 60,000 events per second -> 60,000 * 900 = 54 million events during a 15-minute burst; at 1 KB per event, that is ~54 GB for the staging frame + ~54 GB for the loading frame, roughly 110 GB; the rest, ~800 GB, is for data output from processes; 10 nodes at 128 GB gives ~1.28 TB -> more than enough (assuming one copy); exact calculations at http://wiki.membase.org/display/membase/Sizing+Guidelines.
    Processing bandwidth: assuming the cluster supports 200K ops per second (conservative), 60,000 ops/sec are reserved for loading the current 15-minute frame, leaving the remaining 140K ops/sec for jobs.
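The back-of-the-envelope numbers above can be checked mechanically; this reproduces the slide's arithmetic, with the burst length, event size, node count, and ops budget taken straight from the slide:

```python
events_per_sec = 60_000
burst_sec = 15 * 60                       # one 15-minute frame
events = events_per_sec * burst_sec       # 54,000,000 events per burst

event_kb = 1
frame_gb = events * event_kb / 1_000_000           # ~54 GB per frame
staging_plus_loading = 2 * frame_gb                # staging + loading frames

cluster_gb = 10 * 128                     # 10 nodes at 128 GB RAM each
headroom_gb = cluster_gb - staging_plus_loading    # left for process output

ops_capacity = 200_000                    # conservative cluster throughput
ops_for_loading = events_per_sec          # loading must keep pace with the feed
ops_for_jobs = ops_capacity - ops_for_loading
```

The arithmetic confirms the slide's conclusion: two 54 GB frames fit comfortably in ~1.28 TB of cluster RAM, and 140K ops/sec remain for analytics jobs.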
  • 26. AOL Targeting: Lessons Learned using Couchbase
  • 27. Couchbase architecture.
    Requirements: support initially up to 1.2 billion keys (1 key per user in the system); a minimum of 10K writes per second; two clusters, one on each coast, to reduce latency; easily scalable, supporting an increasing number of keys and writes/reads per second while seamlessly allowing growth for the future.
    Couchbase setup: initially 1.1 billion keys, now 650 million keys; 250 to ~2K writes/second; 1K to 7K reads/second; 2 clusters, 10 nodes each; dual writing, once to each cluster; 1.19 TB of RAM available, 124/128 GB allocated on each server, 200 GB in use at the moment.
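The dual-writing scheme above means the application issues each update once to each coastal cluster, instead of relying on cross-cluster replication. A minimal sketch, with two dicts standing in for the east and west clusters; routing reads to the geographically nearest cluster is an assumption about the design, not something stated on the slide:

```python
# Stand-ins for the two coastal clusters.
east, west = {}, {}

def dual_write(key, value):
    """Write the profile once to each cluster so both coasts stay current."""
    for cluster in (east, west):
        cluster[key] = value

def read_nearest(key, local_cluster):
    """Assumed read path: serve from the nearest cluster for low latency."""
    return local_cluster.get(key)

dual_write("user:31337", {"segments": ["sports", "travel"]})
```

The trade-off is that the writer pays for two round trips (and must handle one cluster failing), but every read stays local to its coast, which is what the latency requirement demands.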
  • 28. Lessons learned.
    Issue: replication across data centers by writing to a local moxi dramatically reduces throughput. Resolution: do not use the local moxi; the membase client (Spy) should connect directly to an instance of a remote moxi to perform updates.
    Issue: correctly sizing the membase cluster based on the expected number of keys and the size of the objects is critical to membase operation. Resolution: membase needs 150 bytes per item for metadata; ensure mem_high_wat is not exceeded to prevent spillover to disk; if incoming data arrives faster than the data writes to disk, the system returns errors.
    Issue: membase settings such as the memory high/low water marks modified by flushctl are reverted to defaults when the service restarts. Resolution: re-issue the flushctl command every time the membase server restarts; membase indicated that a better configuration system allowing permanent changes of settings is coming, and until then they recommend sticking with the default settings.
  • 29. THANK YOU. Matt Ingenthron, matt@couchbase.com, @ingenthr
  • 30. Customers and partners: customers (partial listing) and partners.