Open Recommendation Platform

ACM RecSys 2013, Hong Kong

Torben Brodt
plista GmbH
Keynote
International News Recommender
Systems Workshop and Challenge
October 13th, 2013
Where it’s coming from

Recommendations

where
● news websites
● below the article
Visitors

Publisher

different types
● content
● advertising
Where it’s coming from

good recommendations for...

User
happy!

Advertiser
happy!

Publisher
happy!

plista*
happy!
* the company I work for
Where it’s coming from

some years ago

Recommendations
Context

Visitors

Publisher

Collaborative
Filtering
Where it’s coming from

one recommender
Collaborative Filtering
● well-known algorithm
● more data means more knowledge
Parameter Tuning
● time
● trust
● mainstream
Where it’s coming from

one recommender = good results
2008
● finished studies
● 1st publication
● plista was born
today
● 5k recs/second
● many publishers
Where it’s coming from

netflix prize

"use as many recommenders as possible!"
Where it’s coming from

more recommenders
Collaborative Filtering

Most Popular
Text Similarity
etc ...
understanding performance

lost in serendipity

● we have one score
● lucky success? bad loss?
● we needed to keep track of the different recommenders

success: 0.31 %
understanding performance

how to measure success
[scale: bad -> good]

number of
● clicks
● orders
● engagements
● time on site
● money
understanding performance

evaluation technology

[diagram: each algorithm (Algo1, Algo2, ...) keeps its own running counters]

● features?
● big data math?
● counting!
for blending we just count floats
understanding performance

evaluation technology

impressions
collaborative filtering   500 +1
most popular              500
text similarity           500

ZINCRBY "impressions" "collaborative_filtering" "1"
ZREVRANGEBYSCORE "impressions"
understanding performance

evaluation technology
clicks
collaborative filtering   100
most popular               10
...                          1

ZREVRANGEBYSCORE "clicks"

impressions
collaborative filtering   500
most popular              500
text similarity           500

ZREVRANGEBYSCORE "impressions"

needs division (clicks / impressions)
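
The division the slide asks for is just clicks over impressions per recommender; a small pure-Python sketch (the counter values are made up for illustration):

# Sketch: click-through rate per recommender from the two counters.
clicks = {"collaborative_filtering": 100, "most_popular": 10, "text_similarity": 1}
impressions = {"collaborative_filtering": 500, "most_popular": 500, "text_similarity": 500}

ctr = {name: clicks.get(name, 0) / imp for name, imp in impressions.items() if imp > 0}

# Rank recommenders by CTR, best first.
for name, rate in sorted(ctr.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.2%}")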
understanding performance

evaluation results
success

● CF is "always" the best recommender
● but "always" is just the average over all contexts

let's check the context!
Context

Context
● We like anonymization!
● We have a big context provided by the web:
URL + HTTP headers provide
○ user agent -> device -> mobile
○ IP address -> geolocation
○ referer -> origin (search, direct)
Context

Context

consider the list of the best recommenders for each context attribute
sorted list of what is relevant, by
● clicks (content recs)
● price (advertising recs)

[diagram: per-attribute sorted lists of recommenders with their counts,
e.g. for publisher = welt.de, hour = 15, category = archive, covering
collaborative filtering, most popular, text similarity, recent, ...]
Context

evaluation context

context: publisher = welt.de, weekday = sunday, category = archive

[diagram: a weighted union of the per-context click and impression counters
yields one sorted list of recommenders for this context]

ZUNION clk ... WEIGHTS
  p:welt.de:clk 4
  w:sunday:clk  1
  c:archive:clk 1
ZREVRANGEBYSCORE "clk"

ZUNION imp ... WEIGHTS
  p:welt.de:imp 4
  w:sunday:imp  1
  c:archive:imp 1
ZREVRANGEBYSCORE "imp"
Context

Targeting
Context can be used for optimization and targeting.

classical targeting is a limitation
Context

Livecube

[live counters, three panels:]

Advertising
RWE Europe       500 +1
IBM Germany      500
Intel Austria    500

Recommenders
collaborative filtering   500 +1
most popular              500
text similarity           500

Onsite
new iphone su...     500 +1
twitter buys p..     500
google has seri.     500
Context

evaluation context

success

recap
● added another dimension: context

result
● better for news: Collaborative Filtering
● better for content: Text Similarity
now breathe!

what did we get?
● possibly many recommenders
● know how to measure success
● technology to see success
the ensemble

● real-time evaluation technology exists
● to choose the best algorithm for the
current context we need to learn:
the multi-armed Bayesian bandit
Data Science

"shuffle": exploration vs. exploitation
● is No. 1 getting the most traffic?
● temporary success?
● local minima?

Interested? Look for Ted Dunning + Bayesian Bandit
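
A minimal sketch of the Bayesian-bandit idea (Thompson sampling over the per-recommender click/impression counters); the counter values and names are made up for illustration:

# Sketch: Thompson sampling over recommenders, using Beta posteriors
# built from click/impression counters. Values are illustrative only.
import random

counters = {
    "collaborative_filtering": {"clicks": 100, "impressions": 500},
    "most_popular": {"clicks": 10, "impressions": 500},
    "text_similarity": {"clicks": 1, "impressions": 500},
}

def choose_recommender() -> str:
    # Sample a plausible CTR from each recommender's Beta posterior and
    # pick the largest sample: exploitation with built-in exploration.
    samples = {
        name: random.betavariate(1 + c["clicks"], 1 + c["impressions"] - c["clicks"])
        for name, c in counters.items()
    }
    return max(samples, key=samples.get)

print(choose_recommender())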
✓ better results

[chart: success over time]

● the new total / average is much better
● thanks to the bandit
● thanks to the ensemble

more research
● time series
✓ easy exploration

● a tradeoff (a money decision)
● between the price/time we "waste" in offline evaluation
● and the price we lose with bad recommendations
trial and error

● minimum pre-testing
● no risk if a recommender crashes
● "bad" code might find its context
collaboration

● now plista developers
can try ideas
● and allow
researchers to do the
same
big pool of algorithms

the Ensemble is able to choose among
● Collaborative Filtering
● Most Popular
● Text Similarity
● Research Algorithms: BPR-Linear, WR-MF, SVD++, etc.
researcher has idea

src http://g-ecx.images-amazon.com/images/G/03/video/m/feature/wickie_figur.jpg
researcher has idea

● first and only dataset in the news context
○ millions of items
○ only relevant for a short time
● dataset has many attributes!
● many publishers have user intersection
○ regional
○ contextual
● real world!
○ you can guide the user
○ you don't need to follow his route
● real time!
○ this is industry, it has to be usable

src http://userserve-ak.last.fm/serve/_/7291575/Wickie%2B4775745.jpg
... needs to start a server
... probably hosted by a university, plista, or any cloud provider?
... api implementation
"message bus"
● event notifications
○ impression
○ click
● error notifications
● item updates
train a model from it
... package content
{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418,    // publisher
      "14": 31721,  // widget
      ...
    },
    "lists": {
      "10": [100, 101]  // channel
    }
  }
}
api specs hosted at http://orp.plista.com
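
A sketch of how a participant might consume such messages; only the JSON shape comes from the slides, the handler name and the simple counter logic are assumptions:

# Sketch: dispatch incoming ORP-style messages by "type" and keep
# simple per-publisher counters.
import json
from collections import defaultdict

impressions = defaultdict(int)
clicks = defaultdict(int)

def handle_message(raw: str) -> None:
    msg = json.loads(raw)
    publisher = msg.get("context", {}).get("simple", {}).get("27")  # 27 = publisher id
    if msg["type"] == "impression":
        impressions[publisher] += 1
    elif msg["type"] == "click":
        clicks[publisher] += 1
    # error notifications and item updates would be handled here as well

handle_message('{"type": "impression", "context": {"simple": {"27": 418, "14": 31721}}}')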
... package content

{ // json
  "type": "impression",
  "recs": ...
  // what was recommended
}

api specs hosted at http://orp.plista.com
... package content

{ // json
  "type": "click",
  "context": ...
  // will include the position
}

api specs hosted at http://orp.plista.com
... reply to recommendation requests

[diagram: Researcher -> API -> Real User; the recs are generated by
researchers and shown to a real user]

{ // json
  "recs": {
    "int": {
      "3": [13010630, 84799192]
      // 3 refers to content recommendations
    }
    ...
  }
}

api specs hosted at http://orp.plista.com
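
A minimal sketch of answering such a request over HTTP; the endpoint, port and item ids are made up, only the reply shape follows the slide:

# Sketch: reply to a recommendation request with the JSON shape above.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class OrpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # ... decide which items to recommend for the request in `body` ...
        reply = {"recs": {"int": {"3": [13010630, 84799192]}}}  # 3 = content recs
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("", 8080), OrpHandler).serve_forever()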

quality is win-win #2

[diagram: Researcher -> recs -> Real User]

● happy user
● happy researcher
● happy plista

research can profit from
● real user feedback
● a real benchmark
how to build a fast system?
use common frameworks

src http://en.wikipedia.org/wiki/Pac-Man
quick and fast

● no movies!
● news articles outdate quickly!
● visitors need the recs NOW
● => handle the data very fast

src http://static.comicvine.com/uploads/original/10/101435/2026520-flash.jpg
"send quickly" technologies

● fast web server
● fast network protocol
or Apache Kafka

● fast message queue
● fast storage

40
comparison to plista
"real-time features feel better in a real-time world"
our setup
● PHP, it's easy
● Redis, it's fast
● R, it's well known
we don't need batch! see http://goo.gl/AJntul
Overview

[diagram: Visitors, Publisher, the Ensemble (Collaborative Filtering,
Most Popular, Text Similarity, etc.), Recommendations, Feedback, Preferences]
Overview
● 2012
○ Contest v1
● 2013
○ ACM RecSys “News Recommender Challenge”
● 2014
○ CLEF News Recommendation Evaluation Labs “newsreel”
questions?
Contact
http://goo.gl/pvXm5 (Blog)
torben.brodt@plista.com
http://lnkd.in/MUXXuv
xing.com/profile/Torben_Brodt
www.plista.com
News Recommender Challenge
https://sites.google.com/site/newsrec2013/
#RecSys
@torbenbrodt @NRSws2013 @plista
