
Open recommendation platform

Director Data Engineering
Oct. 13, 2013

  1. Open Recommendation Platform. ACM RecSys 2013, Hong Kong. Torben Brodt, plista GmbH. Keynote, International News Recommender Systems Workshop and Challenge, October 13th, 2013
  2. Where it’s coming from. Recommendations appear where? ● news websites ● below the article. Different types: ● content ● advertising. (diagram: Visitors, Publisher)
  3. Where it’s coming from: good recommendations for... User happy! Advertiser happy! Publisher happy! plista* happy! (* the company I am working for)
  4. Where it’s coming from: some years ago. (diagram: Visitors, Publisher, Context, Recommendations via Collaborative Filtering)
  5. Where it’s coming from: one recommender, Collaborative Filtering ● well-known algorithm ● more data means more knowledge. Parameter tuning ● time ● trust ● mainstream
  6. Where it’s coming from: one recommender = good results. 2008 ● finished studies ● 1st publication ● plista was born. Today ● 5k recs/second ● many publishers
  7. Where it’s coming from: the Netflix Prize. "use as many recommenders as possible!"
  8. Where it’s coming from more recommenders Collaborative Filtering Most Popular Text Similarity etc ...
  9. understanding performance: lost in serendipity ● we have one score ● lucky success? bad loss? ● we needed to keep track of the different recommenders. success: 0.31 %
  10. understanding performance: how to measure success (on a scale from bad to good): number of ● clicks ● orders ● engagements ● time on site ● money
  11. understanding performance: evaluation technology ● features? ● big data math? ● counting! For blending we just count floats. (table: simple per-algorithm counters, e.g. Algo1, Algo2, ...)
  12. understanding performance: evaluation technology, impressions. One counter per recommender (collaborative filtering 500 +1, most popular 500, text similarity 500), kept in a Redis sorted set: ZINCRBY "impressions" 1 "collaborative_filtering"; ZREVRANGEBYSCORE "impressions"
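A minimal sketch of that counting step, assuming redis-py as the client; the key name "impressions" follows the slide, while the connection settings and helper names are illustrative.

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def record_impression(recommender):
        # ZINCRBY "impressions" 1 <recommender>: one sorted set, one counter per algorithm
        r.zincrby("impressions", 1, recommender)

    def impression_ranking():
        # ZREVRANGEBYSCORE "impressions" +inf -inf WITHSCORES: highest counters first
        return r.zrevrangebyscore("impressions", "+inf", "-inf", withscores=True)

    record_impression("collaborative_filtering")
    print(impression_ranking())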
  13. understanding performance: evaluation technology, clicks. Click counters (collaborative filtering 100, most popular 10, ... 1) next to the impression counters (collaborative filtering 500, most popular 500, text similarity 500); ranking needs division by impressions: ZREVRANGEBYSCORE "clicks"; ZREVRANGEBYSCORE "impressions"
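A sketch of the "needs division" step in the same style: raw click counts are not comparable across recommenders, so rank by clicks divided by impressions. The key names follow the slides; `r` is the Redis client from the previous sketch.

    def click_through_rates(r):
        impressions = dict(r.zrevrangebyscore("impressions", "+inf", "-inf", withscores=True))
        clicks = dict(r.zrevrangebyscore("clicks", "+inf", "-inf", withscores=True))
        # click-through rate per recommender, highest first
        ctr = {algo: clicks.get(algo, 0) / count
               for algo, count in impressions.items() if count > 0}
        return sorted(ctr.items(), key=lambda kv: kv[1], reverse=True)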
  14. understanding performance: evaluation results. success ● CF is "always" the best recommender ● but "always" is just the average over all contexts. Let's check the context!
  15. Context ● We like anonymization! ● We have a rich context provided by the web: URL + HTTP headers give us ○ user agent -> device -> mobile ○ IP address -> geolocation ○ referer -> origin (search, direct)
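A rough sketch of deriving those context attributes from a raw request; `lookup_country` is a hypothetical placeholder for whatever GeoIP lookup is actually used, and the classification rules are simplified examples.

    from urllib.parse import urlparse

    def lookup_country(ip):
        # placeholder: a real system would query a GeoIP database here
        return "unknown"

    def extract_context(headers, client_ip):
        user_agent = headers.get("User-Agent", "").lower()
        ref_host = urlparse(headers.get("Referer", "")).netloc
        return {
            "device": "mobile" if "mobile" in user_agent else "desktop",  # user agent -> device
            "geo": lookup_country(client_ip),                             # IP address -> geolocation
            "origin": "search" if "google." in ref_host                   # referer -> origin
                      else ("direct" if not ref_host else "other"),
        }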
  16. Context: keep a list of the best recommenders for each context attribute, sorted by what is relevant ● clicks (content recs) ● price (advertising recs). Example context: publisher = welt.de, category = archive, hour = 15. (tables: per-attribute recommender scores, e.g. collaborative filtering 689, most popular 135, text similarity 400, ...)
  17. Context: evaluation with context. Blend the per-context counters with weights, e.g. for publisher = welt.de, weekday = sunday, category = archive: ZUNIONSTORE clk 3 p:welt.de:clk w:sunday:clk c:archive:clk WEIGHTS 4 1 1, then ZREVRANGEBYSCORE "clk"; likewise ZUNIONSTORE imp 3 p:welt.de:imp w:sunday:imp c:archive:imp WEIGHTS 4 1 1, then ZREVRANGEBYSCORE "imp". (tables: blended click and impression scores per recommender, e.g. 689, 420, 135, ...)
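The same blend expressed with redis-py, assuming ZUNIONSTORE as the concrete form of the ZUNION shown on the slide; the key patterns and the 4/1/1 weights follow the slide, the function name is illustrative and `r` is the client from the earlier sketch.

    def best_recommenders_for_context(r, publisher, weekday, category):
        # ZUNIONSTORE clk 3 p:<pub>:clk w:<day>:clk c:<cat>:clk WEIGHTS 4 1 1 (same for imp)
        r.zunionstore("clk", {f"p:{publisher}:clk": 4, f"w:{weekday}:clk": 1, f"c:{category}:clk": 1})
        r.zunionstore("imp", {f"p:{publisher}:imp": 4, f"w:{weekday}:imp": 1, f"c:{category}:imp": 1})
        clk = dict(r.zrevrangebyscore("clk", "+inf", "-inf", withscores=True))
        imp = dict(r.zrevrangebyscore("imp", "+inf", "-inf", withscores=True))
        # rank recommenders by their context-weighted click-through rate
        ctr = {algo: clk.get(algo, 0) / count for algo, count in imp.items() if count > 0}
        return sorted(ctr.items(), key=lambda kv: kv[1], reverse=True)

    # e.g. best_recommenders_for_context(r, "welt.de", "sunday", "archive")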
  18. Context: Targeting. Context can be used for optimization and targeting; classical targeting is only a limitation.
  19. Context: Livecube. Advertising: RWE Europe 500 +1, IBM Germany 500, Intel Austria 500. Recommenders: collaborative filtering 500 +1, most popular 500, text similarity 500. Onsite: new iphone su... 500 +1, twitter buys p.. 500, google has seri. 500
  20. Context: evaluation with context, success recap ● added another dimension: context. result ● better for news: Collaborative Filtering ● better for content: Text Similarity
  21. now breathe! what did we get? ● possibly many recommenders ● we know how to measure success ● technology to see success
  22. the ensemble ● real-time evaluation technology exists ● to choose the best algorithm for the current context we need to learn: the multi-armed Bayesian bandit
  23. Data Science: the "shuffle" between exploration and exploitation. Is the current No. 1 just getting the most temporary success? Are we stuck in a local minimum? Interested? Look for Ted Dunning + Bayesian Bandit
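A minimal Thompson-sampling sketch of that exploration/exploitation shuffle, assuming the click/impression counters collected above; the counts in the example are illustrative, not measured values.

    import random

    def choose_recommender(stats):
        # stats maps recommender -> (clicks, impressions)
        best, best_sample = None, -1.0
        for name, (clicks, impressions) in stats.items():
            # Beta posterior over the CTR; sampling it lets uncertain arms win occasionally
            sample = random.betavariate(clicks + 1, impressions - clicks + 1)
            if sample > best_sample:
                best, best_sample = name, sample
        return best

    stats = {"collaborative_filtering": (100, 500),
             "most_popular": (10, 500),
             "text_similarity": (1, 50)}
    print(choose_recommender(stats))  # usually CF, sometimes an underexplored arm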
  24. ✓ better results. success ● the new total/average is much better ● thanks to the bandit ● thanks to the ensemble. more research ● time series
  25. ✓ easy exploration ● a tradeoff (money decision) ● between the price/time we "waste" in offline evaluation ● and the price we lose with bad recommendations
  26. trial and error ● minimum pre-testing ● no risk if a recommender crashes ● "bad" code might find its context
  27. collaboration ● now plista developers can try ideas ● and allow researchers to do the same
  28. big pool of algorithms: the Ensemble is able to choose among Collaborative Filtering, Most Popular, Text Similarity, and research algorithms (BPR-Linear, WR-MF, SVD++, etc.)
  29. researcher has idea
  30. researcher has idea ● first and only dataset in news context ○ millions of items ○ only relevant for a short time ● dataset has many attributes !! ● many publishers have user intersection ○ regional ○ contextual ● real world !!! ○ you can guide the user ○ you don’t need to follow his route ● real time !! ○ this is industry, it has to be usable
  31. ... needs to start the server ... probably hosted by a university, plista or any cloud provider?
  32. ... API implementation: a "message bus" ● event notifications ○ impression ○ click ● error notifications ● item updates. Train the model from these messages.
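A sketch of a consumer for that message bus: each JSON notification is routed by its "type" field to a handler that updates the model or the counters. The "impression" and "click" types appear on the following slides; the other type strings and the handler bodies are assumptions for illustration.

    import json

    def handle_message(raw, handlers):
        message = json.loads(raw)
        handler = handlers.get(message.get("type"))
        if handler:
            handler(message)  # e.g. train the model from impressions and clicks

    handlers = {
        "impression": lambda m: print("update model with impression", m.get("context")),
        "click": lambda m: print("reinforce the clicked item", m.get("context")),
        "error": lambda m: print("log error notification", m),
        "item_update": lambda m: print("refresh item metadata", m),
    }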
  33. ... package content (JSON, API specs hosted at http://orp.plista.com):
      { // json
        "type": "impression",
        "context": {
          "simple": {
            "27": 418,    // publisher
            "14": 31721,  // widget
            ...
          },
          "lists": {
            "10": [100, 101]  // channel
          }
          ...
        }
      }
  34. ... package content (JSON, API specs hosted at http://orp.plista.com):
      { // json
        "type": "impression",
        "recs": ...  // what was recommended
      }
  35. ... package content (JSON, API specs hosted at http://orp.plista.com):
      { // json
        "type": "click",
        "context": ...  // will include the position
      }
  36. ... reply to recommendation requests, generated by researchers and shown to the real user via the API (specs hosted at http://orp.plista.com):
      { // json
        "recs": {
          "int": {
            "3": [13010630, 84799192]  // 3 refers to content recommendations
          }
        }
        ...
      }
      (diagram: Researcher -> API -> recs -> Real User)
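A sketch of building that reply, assuming the field layout shown on the slide ("recs" -> "int" -> list id -> item ids, with 3 standing for content recommendations); the function name and the item-id argument are illustrative.

    import json

    def build_response(item_ids, limit=2):
        return json.dumps({
            "recs": {
                "int": {
                    "3": item_ids[:limit]  # 3 refers to content recommendations
                }
            }
        })

    # e.g. build_response([13010630, 84799192]) is returned to the ORP server
    # and shown to the real user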
  37. quality is win-win #2 ● happy user ● happy researcher ● happy plista. research can profit from ● real user feedback ● a real benchmark. (diagram: Researcher -> recs -> Real User)
  38. how to build a fast system? use common frameworks
  39. quick and fast ● no movies! ● news articles go stale quickly! ● visitors need the recs NOW ● => handle the data very fast
  40. "send quickly" technologies ● fast web server ● fast network protocol or Apache Kafka ● fast message queue ● fast storage 40
  41. comparison to plista: "real-time features feel better in a real-time world". our setup ● PHP, it's easy ● Redis, it's fast ● R, it's well known. we don't need batch! see http://goo.gl/AJntul
  42. Overview (architecture diagram): Visitors, Publisher, Recommendations, Feedback, Preferences; the Ensemble chooses among Collaborative Filtering, Most Popular, Text Similarity, etc.
  43. Overview ● 2012 ○ Contest v1 ● 2013 ○ ACM RecSys “News Recommender Challenge” ● 2014 ○ CLEF News Recommendation Evaluation Labs “newsreel”
  44. questions? Contact http://goo.gl/pvXm5 (Blog) torben.brodt@plista.com http://lnkd.in/MUXXuv xing.com/profile/Torben_Brodt www.plista.com News Recommender Challenge https://sites.google.com/site/newsrec2013/ #RecSys @torbenbrodt @NRSws2013 @plista