Open Recommendation Platform

ACM RecSys 2013, Hong Kong

Torben Brodt
plista GmbH
Keynote
International News Recommender
Systems Workshop and Challenge
October 13th, 2013
Where it’s coming from

Recommendations

where
● news websites
● below the article
Visitors

Publisher

different types
● content
● advertising
Where it’s coming from

good recommendations for...

User
happy!

Advertiser
happy!

Publisher
happy!

plista*
happy!
* the company I work for
Where it’s coming from

some years ago

Recommendations
Context

Visitors

Publisher

Collaborative
Filtering
Where it’s coming from

one recommender
Collaborative Filtering
● well-known algorithm
● more data means more knowledge
Parameter Tuning
● time
● trust
● mainstream
Where it’s coming from

one recommender = good results
2008
● finished studies
● 1st publication
● plista was born
today
● 5k recs/second
● many publishers
Where it’s coming from

netflix prize

"use as many recommenders as possible!"
Where it’s coming from

more recommenders
Collaborative Filtering

Most Popular
Text Similarity
etc ...
understanding performance

lost in serendipity

● we have one score
● lucky success? bad loss?
● we needed to keep track of the different recommenders

success: 0.31 %
understanding performance

how to measure success
[scale: bad -> good]

number of
● clicks
● orders
● engagements
● time on site
● money
understanding performance

evaluation technology

[diagram: each algorithm (Algo1, Algo2, ...) keeps its own running counters]

● features?
● big data math?
● counting!
for blending we just count floats
understanding performance

evaluation technology

impressions
collaborative filtering   500 +1
most popular              500
text similarity           500

ZINCRBY "impressions" "collaborative_filtering" "1"
ZREVRANGEBYSCORE "impressions"
understanding performance

evaluation technology
clicks
collaborative filtering   100
most popular               10
...                          1

ZREVRANGEBYSCORE "clicks"

impressions
collaborative filtering   500
most popular              500
text similarity           500

ZREVRANGEBYSCORE "impressions"

needs division (clicks / impressions)
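
The division the slide asks for is just clicks over impressions per recommender; a small pure-Python sketch (the counter values are made up for illustration):

# Sketch: click-through rate per recommender from the two counters.
clicks = {"collaborative_filtering": 100, "most_popular": 10, "text_similarity": 1}
impressions = {"collaborative_filtering": 500, "most_popular": 500, "text_similarity": 500}

ctr = {name: clicks.get(name, 0) / imp for name, imp in impressions.items() if imp > 0}

# Rank recommenders by CTR, best first.
for name, rate in sorted(ctr.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.2%}")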
understanding performance

evaluation results
success

● CF is "always" the best recommender
● but "always" is just the average over all contexts

let's check the context!
Context

Context
● We like anonymization!
● We have a big context provided by the web:
URL + HTTP headers provide
○ user agent -> device -> mobile
○ IP address -> geolocation
○ referer -> origin (search, direct)
Context

Context

consider the list of the best recommenders for each context attribute
sorted list of what is relevant, by
● clicks (content recs)
● price (advertising recs)

[diagram: per-attribute sorted lists of recommenders with their counts,
e.g. for publisher = welt.de, hour = 15, category = archive, covering
collaborative filtering, most popular, text similarity, recent, ...]
Context

evaluation context

context: publisher = welt.de, weekday = sunday, category = archive

[diagram: a weighted union of the per-context click and impression counters
yields one sorted list of recommenders for this context]

ZUNION clk ... WEIGHTS
  p:welt.de:clk 4
  w:sunday:clk  1
  c:archive:clk 1
ZREVRANGEBYSCORE "clk"

ZUNION imp ... WEIGHTS
  p:welt.de:imp 4
  w:sunday:imp  1
  c:archive:imp 1
ZREVRANGEBYSCORE "imp"
Context

Targeting
Context can be used for optimization and targeting.

classical targeting is a limitation
Context

Livecube

[live counters, three panels:]

Advertising
RWE Europe       500 +1
IBM Germany      500
Intel Austria    500

Recommenders
collaborative filtering   500 +1
most popular              500
text similarity           500

Onsite
new iphone su...     500 +1
twitter buys p..     500
google has seri.     500
Context

evaluation context

success

recap
● added another dimension: context

result
● better for news: Collaborative Filtering
● better for content: Text Similarity
now breathe!

what did we get?
● possibly many recommenders
● know how to measure success
● technology to see success
the ensemble

● real-time evaluation technology exists
● to choose the best algorithm for the
current context we need to learn:
the multi-armed Bayesian bandit
Data Science

"shuffle": exploration vs. exploitation
● is No. 1 getting the most traffic?
● temporary success?
● local minima?

Interested? Look for Ted Dunning + Bayesian Bandit
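
A minimal sketch of the Bayesian-bandit idea (Thompson sampling over the per-recommender click/impression counters); the counter values and names are made up for illustration:

# Sketch: Thompson sampling over recommenders, using Beta posteriors
# built from click/impression counters. Values are illustrative only.
import random

counters = {
    "collaborative_filtering": {"clicks": 100, "impressions": 500},
    "most_popular": {"clicks": 10, "impressions": 500},
    "text_similarity": {"clicks": 1, "impressions": 500},
}

def choose_recommender() -> str:
    # Sample a plausible CTR from each recommender's Beta posterior and
    # pick the largest sample: exploitation with built-in exploration.
    samples = {
        name: random.betavariate(1 + c["clicks"], 1 + c["impressions"] - c["clicks"])
        for name, c in counters.items()
    }
    return max(samples, key=samples.get)

print(choose_recommender())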
✓ better results

[chart: success over time]

● the new total / average is much better
● thanks to the bandit
● thanks to the ensemble

more research
● time series
✓ easy exploration

● a tradeoff (a money decision)
● between the price/time we "waste" in offline evaluation
● and the price we lose with bad recommendations
trial and error

● minimum pre-testing
● no risk if a recommender crashes
● "bad" code might find its context
collaboration

● now plista developers
can try ideas
● and allow
researchers to do the
same
big pool of algorithms

the Ensemble is able to choose among
● Collaborative Filtering
● Most Popular
● Text Similarity
● Research Algorithms: BPR-Linear, WR-MF, SVD++, etc.
researcher has idea

src http://g-ecx.images-amazon.com/images/G/03/video/m/feature/wickie_figur.jpg
researcher has idea

● first and only dataset in the news context
○ millions of items
○ only relevant for a short time
● dataset has many attributes!
● many publishers have user intersection
○ regional
○ contextual
● real world!
○ you can guide the user
○ you don't need to follow his route
● real time!
○ this is industry, it has to be usable

src http://userserve-ak.last.fm/serve/_/7291575/Wickie%2B4775745.jpg
... needs to start a server
... probably hosted by a university, plista, or any cloud provider?
... api implementation
"message bus"
● event notifications
○ impression
○ click
● error notifications
● item updates
train a model from it
... package content
{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418,    // publisher
      "14": 31721,  // widget
      ...
    },
    "lists": {
      "10": [100, 101]  // channel
    }
  }
}
api specs hosted at http://orp.plista.com
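
A sketch of how a participant might consume such messages; only the JSON shape comes from the slides, the handler name and the simple counter logic are assumptions:

# Sketch: dispatch incoming ORP-style messages by "type" and keep
# simple per-publisher counters.
import json
from collections import defaultdict

impressions = defaultdict(int)
clicks = defaultdict(int)

def handle_message(raw: str) -> None:
    msg = json.loads(raw)
    publisher = msg.get("context", {}).get("simple", {}).get("27")  # 27 = publisher id
    if msg["type"] == "impression":
        impressions[publisher] += 1
    elif msg["type"] == "click":
        clicks[publisher] += 1
    # error notifications and item updates would be handled here as well

handle_message('{"type": "impression", "context": {"simple": {"27": 418, "14": 31721}}}')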
... package content

{ // json
  "type": "impression",
  "recs": ...
  // what was recommended
}

api specs hosted at http://orp.plista.com
... package content

{ // json
  "type": "click",
  "context": ...
  // will include the position
}

api specs hosted at http://orp.plista.com
... reply to recommendation requests

[diagram: Researcher -> API -> Real User; the recs are generated by
researchers and shown to a real user]

{ // json
  "recs": {
    "int": {
      "3": [13010630, 84799192]
      // 3 refers to content recommendations
    }
    ...
  }
}

api specs hosted at http://orp.plista.com
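
A minimal sketch of answering such a request over HTTP; the endpoint, port and item ids are made up, only the reply shape follows the slide:

# Sketch: reply to a recommendation request with the JSON shape above.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class OrpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # ... decide which items to recommend for the request in `body` ...
        reply = {"recs": {"int": {"3": [13010630, 84799192]}}}  # 3 = content recs
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("", 8080), OrpHandler).serve_forever()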

quality is win-win #2

[diagram: Researcher -> recs -> Real User]

● happy user
● happy researcher
● happy plista

research can profit from
● real user feedback
● a real benchmark
how to build a fast system?
use common frameworks

src http://en.wikipedia.org/wiki/Pac-Man
quick and fast

● no movies!
● news articles outdate quickly!
● visitors need the recs NOW
● => handle the data very fast

src http://static.comicvine.com/uploads/original/10/101435/2026520-flash.jpg
"send quickly" technologies

● fast web server
● fast network protocol
or Apache Kafka

● fast message queue
● fast storage

40
comparison to plista
"real-time features feel better in a real-time world"
our setup
● PHP, it's easy
● Redis, it's fast
● R, it's well known
we don't need batch! see http://goo.gl/AJntul
Overview

[diagram: Visitors, Publisher, the Ensemble (Collaborative Filtering,
Most Popular, Text Similarity, etc.), Recommendations, Feedback, Preferences]
Overview
● 2012
○ Contest v1
● 2013
○ ACM RecSys “News Recommender Challenge”
● 2014
○ CLEF News Recommendation Evaluation Labs “newsreel”
questions?
Contact
http://goo.gl/pvXm5 (Blog)
torben.brodt@plista.com
http://lnkd.in/MUXXuv
xing.com/profile/Torben_Brodt
www.plista.com
News Recommender Challenge
https://sites.google.com/site/newsrec2013/
#RecSys
@torbenbrodt @NRSws2013 @plista
