Membase, ShareThis, AOL - Phillips, Mukerji, Jackson - Hadoop World 2010


Better Ad, Offer and Content Targeting with Membase and Hadoop

James Phillips, Membase
Manu Mukerji, ShareThis
Ben Jackson, AOL

Slide notes:
  • "Get better channels" at the bottom (MySpace, Forbes). About ShareThis: a consumer-facing stats/metrics company, but we collect data at scale.
  • Data is used as signals for the user model: sharing behavior, which social network was used, search keywords, and page views. Membase stores the mapping from user cookie to segments.
  • What we offer: display campaigns that are audience-targeted; examples of ads, content, different partners, etc.
    Better Ad, Offer and Content Targeting with Membase and Hadoop
      James Phillips, Co-founder, Membase
      Manu Mukerji, Architect, ShareThis
      Ben Jackson, Chief Architect, Aol.

    What is Membase?

    Membase Server
      • Data Manager: pure key-value store; built-in Memcached; CP behavior (AP option); clients speak the Memcached protocol
      • Cluster Manager: configuration manager; replication supervisor; rebalance orchestration
      • NodeCode Manager (in development): protocol extenders; real-time aggregation; index management
    Membase Clustering
      • Data and I/O spread across nodes
      • Peer-to-peer replication
      • Hot cluster maintenance: zero downtime

    Targeting at ShareThis
    About ShareThis
      • Largest integrated sharing network: we make sharing simple, engaging & valuable
      • Powerful social analytics & audience monetization
      • 450 million consumers/month, ~850 thousand sites, 50+ social channels

    This is how it works
      • Sharing behavior, search keywords, and page views are captured in log files and loaded into HDFS
      • MapReduce jobs run content analysis against a taxonomy
      • The resulting user-to-segment mappings are stored in Membase
      • The ad server looks up a user's segments in Membase at serving time
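The pipeline above ends in a simple per-request key-value lookup: Membase maps a user cookie to audience segments. A minimal sketch of that read/write path, with a plain dict standing in for the Membase cluster (a real client would issue the same get/set calls over the memcached protocol); all names here are illustrative, not ShareThis code:

```python
# Sketch of the cookie -> segments path. A dict stands in for Membase,
# which clients reach via ordinary memcached get/set operations.
import json

store = {}  # stand-in for the Membase cluster

def write_segments(cookie_id, segments):
    # MapReduce output step: persist segments keyed by user cookie.
    store[cookie_id] = json.dumps(sorted(segments))

def lookup_segments(cookie_id):
    # Ad-server read path: one key-value get per ad request.
    raw = store.get(cookie_id)
    return json.loads(raw) if raw is not None else []

write_segments("st-cookie-123", {"autos", "travel"})
print(lookup_segments("st-cookie-123"))   # ['autos', 'travel']
print(lookup_segments("unknown-cookie"))  # []
```

The design point is that the ad server never touches Hadoop at serving time; it only pays for one low-latency key-value read.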
    ShareThis Ad Products

    Targeting at Aol.

    About AOL
      • Internet content and advertising
      • Over 80 O&O internet brands
      • 3rd-party ad network
      • Reaches 90% of the US
      • Sells ad-serving technology (ADTECH)

    How AOL uses Hadoop
      • Ad servers send events into the Hadoop cluster
      • Hadoop handles cookie profiling, model building, and scoring
      • Cookie classification flows back to the ad servers
      • Reporting runs on a database-backed LAMP stack

    Real-Time Cookie Scoring?
      • Ad-server events stream through Flume to HDFS and to a Membase/MPI cluster running Node Code
      • The Hadoop cluster handles model creation
      • The Membase/MPI cluster does cookie scoring and report generation, with no database
      • Cookie classification and campaign insights flow back to the ad servers and the reporting system
    The Components
      • HDFS/MapReduce: stores months of historic data; runs long model-building jobs; supports research and development
      • Flume: a reliable, open-source solution for streaming event data from multiple sources to multiple sinks
      • Membase/MPI cluster: stores recent data (a few weeks); exposes how a cookie scores for each model; MPI performs very fast in-memory processing on the data, given real-time updates to cookie classification
    Node Code Next Year
      • Data is stored keyed by AOL user cookie
      • All events (impressions, clicks, …) are stored keyed by user cookie
      • Enriching data (known demographic and behavioral data) would be ideal, but is not yet available
      • Sets are stored for each key, with atomic insert/delete
      • Membase Node Code is coming
    MPI
      • Parallel processing environment, used heavily in high-performance scientific computing
      • Parallel reductions outside of Hadoop MapReduce
      • MPI and Membase run on the same cluster, giving local access to Membase data for fast analysis
      • Cookie scoring and reporting via MPI
    MPI processing using the traditional Memcached API
      • Memcached has no facility to get the list of available keys (this is outside the traditional memcached model)
      • The list of cookie keys must therefore be stored in special index locations in Memcached
      • All MPI nodes talk to all Memcached nodes, which creates large communication overhead
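The workaround described above, keeping the key list in special index locations because memcached cannot enumerate keys, might look like this. A dict stands in for the memcached cluster, and the index-key naming and slot count are assumptions for illustration:

```python
# Sketch of the key-index workaround: every write also appends the
# cookie key to one of N special "idx:" keys, which MPI nodes later
# scan to discover the key space.
import zlib

N_INDEX_SLOTS = 4
cache = {}  # stand-in for the memcached cluster

def index_key(cookie):
    # Shard the key list across a fixed number of index slots.
    return "idx:%d" % (zlib.crc32(cookie.encode()) % N_INDEX_SLOTS)

def put_cookie(cookie, value):
    cache[cookie] = value
    slot = index_key(cookie)
    keys = cache.get(slot, [])
    if cookie not in keys:
        cache[slot] = keys + [cookie]

def all_cookie_keys():
    # What each MPI node must do before processing: fetch every index
    # slot from every memcached node -- the communication overhead the
    # slide flags.
    keys = []
    for slot in range(N_INDEX_SLOTS):
        keys.extend(cache.get("idx:%d" % slot, []))
    return sorted(keys)

for c in ("cookie-a", "cookie-b", "cookie-c"):
    put_cookie(c, {"score": 0.5})
print(all_cookie_keys())  # ['cookie-a', 'cookie-b', 'cookie-c']
```

Because every MPI rank must fetch every index slot and then every key it is assigned, traffic grows with ranks times nodes, which is exactly why the deck moves to local streaming next.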
    The fast way, with Membase streaming
      • Not yet fully baked: currently only available via Python, while MPI is traditionally C/C++
      • Each MPI instance talks to the local Membase daemon, so no external communication is needed
      • Each instance processes all data held in its local Membase node
      • Future exploration needed
    Membase POC: Part 1
      • Goal: test the speed of the Membase implementation of Memcached. How quickly can ad servers access data associated with a user cookie? Target: 5 millisecond latency for all requests. Can Membase handle concurrent reads and writes?
      • Testbed: 3 Memcached servers with replication factor 1
      • Results: pushed ~14 GB of data onto the servers; read data while writing up to 20k keys/sec; saw <1 to 3 millisecond response times for queries, easily meeting our needs
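The measurement pattern behind the Part 1 numbers can be sketched as below: time individual reads while writes are in flight, then compare the worst case against the 5 ms target. A dict stands in for the 3-node Membase testbed, so the absolute numbers are not meaningful; only the measurement shape matches the POC:

```python
# Hedged sketch of the POC-style latency check: timed gets interleaved
# with writes, compared against a 5 ms per-request target.
import time

store = {"cookie-%d" % i: b"x" * 64 for i in range(10_000)}

def timed_get(key):
    t0 = time.perf_counter()
    value = store.get(key)
    return value, (time.perf_counter() - t0) * 1000.0  # latency in ms

latencies = []
for i in range(1000):
    store["cookie-%d" % (10_000 + i)] = b"y" * 64   # concurrent write load
    _, ms = timed_get("cookie-%d" % (i % 10_000))   # timed read
    latencies.append(ms)

worst = max(latencies)
print("max latency: %.3f ms, within 5 ms target: %s" % (worst, worst < 5.0))
```

Against a real cluster the same loop would use a memcached client and many concurrent worker processes, which is how the 20k writes/sec figure was driven.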
    Membase POC: Part 2 (ongoing)
      • Goals: integrate MPI/Membase; get very fast processing of recent server data; determine whether this is faster than similar processing on Hadoop
      • Testbed: 10 servers with 128 GB of RAM each
      • Flow: push cookie profiles into Membase from HDFS (waiting on Node Code for a fully realized streaming implementation); run a simple aggregation job in MPI, using MPI for parallel reductions; stream data using TAP; do most of the computation locally, with the bare minimum in parallel reductions
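The "compute locally, reduce minimally" flow above can be sketched as follows. Plain Python stands in for mpi4py, with ranks simulated as list entries; the data, threshold, and aggregate shape are all illustrative assumptions:

```python
# Sketch of the Part 2 pattern: each MPI rank aggregates the cookie
# profiles held by its local Membase instance, and only the small
# per-rank partial results go through the parallel reduction.
from functools import reduce

# Simulated local data: cookie -> model score, partitioned by rank.
rank_data = [
    {"cookie-a": 0.9, "cookie-b": 0.2},   # rank 0's local Membase data
    {"cookie-c": 0.7, "cookie-d": 0.4},   # rank 1
    {"cookie-e": 0.8},                    # rank 2
]

def local_aggregate(data, threshold=0.5):
    # The bulk of the work stays local: count cookies over threshold.
    return {"total": len(data),
            "qualified": sum(1 for s in data.values() if s > threshold)}

def combine(a, b):
    # The bare-minimum reduction step (an MPI_Reduce in the real setup).
    return {k: a[k] + b[k] for k in a}

partials = [local_aggregate(d) for d in rank_data]
print(reduce(combine, partials))  # {'total': 5, 'qualified': 3}
```

The reduction only moves a few counters per rank rather than the raw cookie data, which is what makes colocating MPI with Membase attractive compared with shipping everything through a shuffle.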
    Have Questions?
      James Phillips, Membase
      Manu Mukerji, ShareThis
      Ben Jackson, AOL