Open Recommendation Platform
ACM RecSys 2013, Hong Kong
Torben Brodt
plista GmbH
Keynote
International News Recommender Systems Workshop and Challenge
October 13th, 2013
Where it’s coming from
Recommendations
where
● news websites
● below the article
Visitors
Publisher
different types
● content
● advertising
Where it’s coming from
good recommendations for...
User
happy!
Advertiser
happy!
Publisher
happy!
plista*
happy!
* the company I work for
Where it’s coming from
some years ago
Recommendations
Context
Visitors
Publisher
Collaborative Filtering
Where it’s coming from
one recommender
Collaborative Filtering
● well known algorithm
● more data means more knowledge
Parameter Tuning
● time
● trust
● mainstream
Where it’s coming from
one recommender = good results
2008
● finished studies
● 1st publication
● plista was born
today
● 5k recs/second
● many publishers
Where it’s coming from
netflix prize
"use as many recommenders as possible!"
Where it’s coming from
more recommenders
Collaborative Filtering
Most Popular
Text Similarity
etc ...
understanding performance
lost in serendipity
● we have one score
● lucky success? bad loss?
● we needed to keep track of the different recommenders
success: 0.31 %
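Keeping track of the different recommenders could look roughly like the sketch below. It is only an illustration: the class name `RecommenderStats`, its methods, and the counts in the usage lines are assumptions, not plista's implementation or data.

```python
from collections import defaultdict


class RecommenderStats:
    """Counts impressions and clicks per recommender (hypothetical helper)."""

    def __init__(self):
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record_impression(self, recommender):
        self.impressions[recommender] += 1

    def record_click(self, recommender):
        self.clicks[recommender] += 1

    def success_rate(self, recommender):
        """Clicks divided by impressions; 0.0 if nothing was shown yet."""
        shown = self.impressions.get(recommender, 0)
        if shown == 0:
            return 0.0
        return self.clicks.get(recommender, 0) / shown


# Made-up numbers, for illustration only.
stats = RecommenderStats()
for _ in range(1000):
    stats.record_impression("collaborative filtering")
for _ in range(3):
    stats.record_click("collaborative filtering")

rate = stats.success_rate("collaborative filtering")
```

With one rate per recommender instead of one global score, a lucky success of one algorithm can be told apart from a bad loss of another.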
Context
● We like anonymization!
● The web already gives us a rich context
URL + HTTP headers provide
○ user agent -> device -> mobile
○ IP address -> geolocation
○ referer -> origin (search, direct)
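The three mappings above can be sketched as a small function. This is a simplified illustration: the hint lists, the function name `extract_context`, and the optional `geo_lookup` callback are assumptions, and real device or geolocation detection would need a proper database.

```python
from urllib.parse import urlparse

# Illustrative substrings only; not a complete device or search-engine list.
MOBILE_HINTS = ("mobile", "android", "iphone", "ipad")
SEARCH_HOSTS = ("google.", "bing.", "yahoo.")


def extract_context(headers, client_ip, geo_lookup=None):
    """Derive anonymous context attributes from HTTP headers and the IP."""
    user_agent = headers.get("User-Agent", "").lower()
    device = "mobile" if any(h in user_agent for h in MOBILE_HINTS) else "desktop"

    referer = headers.get("Referer", "")
    host = urlparse(referer).netloc.lower()
    if not referer:
        origin = "direct"
    elif any(s in host for s in SEARCH_HOSTS):
        origin = "search"
    else:
        origin = "other"

    # geo_lookup is a hypothetical callback (IP -> region); the IP itself
    # is not stored in the returned context.
    geo = geo_lookup(client_ip) if geo_lookup else None
    return {"device": device, "origin": origin, "geo": geo}


context = extract_context(
    {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)",
        "Referer": "https://www.google.com/search?q=news",
    },
    "203.0.113.7",
)
```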
Context
for each context attribute, keep a sorted list of the best recommenders
each list is sorted by what is relevant:
● clicks (content recs)
● price (advertising recs)
[figure: one sorted list of recommenders with scores per context attribute (category = archive, hour = 15, publisher = welt.de); recommenders shown: text similarity, recent, collaborative filtering, most popular; scores range from 5 to 689]
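One way to use the per-attribute lists above is to look up the list for every attribute of the current request and combine them. The sketch below sums the scores, which is an assumption (the slides do not say how the lists are merged), and all scores in `SCORES` are made up for illustration.

```python
from collections import defaultdict

# (attribute, value) -> {recommender: score}. Scores are invented examples.
SCORES = {
    ("publisher", "welt.de"): {"collaborative filtering": 30, "most popular": 12},
    ("hour", 15): {"text similarity": 20, "collaborative filtering": 5},
    ("category", "archive"): {"text similarity": 25, "most popular": 3},
}


def rank_recommenders(context, scores=SCORES):
    """Sum each recommender's score over all matching context attributes
    and return the recommender names, best first."""
    totals = defaultdict(int)
    for attribute, value in context.items():
        for recommender, score in scores.get((attribute, value), {}).items():
            totals[recommender] += score
    return sorted(totals, key=lambda name: totals[name], reverse=True)


ranking = rank_recommenders(
    {"publisher": "welt.de", "hour": 15, "category": "archive"}
)
```

For content recommendations the scores would be clicks; for advertising recommendations they would be prices.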
Context
Livecube
Advertising
● RWE Europe: 500 +1
● IBM Germany: 500
● Intel Austria: 500
Recommenders
● collaborative filtering: 500 +1
● most popular: 500
● text similarity: 500
Onsite
● new iphone su...: 500 +1
● twitter buys p..: 500
● google has seri.: 500
big pool of algorithms
Ensemble is able to choose
● Collaborative Filtering
● Most Popular
● Text Similarity
Research Algorithms
● BPR-Linear
● WR-MF
● SVD++
● etc.
researcher has idea
src http://g-ecx.images-amazon.com/images/G/03/video/m/feature/wickie_figur.jpg
researcher has idea
● first and only dataset in news context
○ millions of items
○ only relevant for short time
● dataset has many attributes !!
● many publishers have user intersection
○ regional
○ contextual
● real world !!!
○ you can guide the user
○ you don’t need to follow his route
● real time !!
○ This is industry, it has to be usable
src http://userserve-ak.last.fm/serve/_/7291575/Wickie%2B4775745.jpg
... needs to start the server
... probably hosted by university, plista or any cloud provider?
... api implementation
"message bus"
● event notifications
○ impression
○ click
● error notifications
● item updates
train model from it
... package content
{ // json
"type": "impression",
"recs": ...
// what was recommended
}
api specs hosted at http://orp.plista.com
... package content
{ // json
"type": "click",
"context": ...
// will include the position
}
api specs hosted at http://orp.plista.com
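On the researcher's side, consuming these messages might look like the sketch below. Only the `"type"` values `impression` and `click` and the field names `recs` and `context` come from the slides; the class `MessageHandler`, the payload shapes in the usage lines (including the `position` key), and the handling of other message types are assumptions. The real format is in the api specs.

```python
import json


class MessageHandler:
    """Collects event notifications so a model can be trained from them."""

    def __init__(self):
        self.impressions = []  # what was recommended
        self.clicks = []       # click contexts (include the position)
        self.unknown = 0       # anything this sketch does not understand

    def handle(self, raw):
        """Parse one JSON message and file it by its "type" field."""
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "impression":
            self.impressions.append(message.get("recs"))
        elif kind == "click":
            self.clicks.append(message.get("context"))
        else:
            # Error notifications and item updates would be handled here;
            # their exact type names are not given on the slides.
            self.unknown += 1
        return kind


handler = MessageHandler()
# Hypothetical payloads, for illustration only.
first = handler.handle('{"type": "impression", "recs": {"int": {"3": [13010630]}}}')
second = handler.handle('{"type": "click", "context": {"position": 1}}')
```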
... reply to recommendation requests
{ // json
"recs": {
"int": {
"3": [13010630, 84799192]
// 3 refers to content recommendations
}
...
}
}
generated by researchers
to be shown to real user
api specs hosted at http://orp.plista.com
[diagram: Researcher -> API -> recs -> Real User]
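Building such a reply can be sketched in a few lines. The function name `build_reply` is an assumption, and the sketch leaves out whatever fields the elided "..." stands for; the api specs are the authority on the full format.

```python
import json

# Per the slide, key "3" refers to content recommendations.
CONTENT_RECS_KEY = "3"


def build_reply(item_ids):
    """Serialize a list of recommended item ids in the reply layout shown
    on the slide (fields hidden behind "..." are omitted here)."""
    return json.dumps({"recs": {"int": {CONTENT_RECS_KEY: list(item_ids)}}})


reply = build_reply([13010630, 84799192])
```

The item ids are the ones from the slide; in practice they would come from the researcher's own algorithm.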
quality is win win #2
● happy user
● happy researcher
● happy plista
research can profit
● real user feedback
● real benchmark
how to build a fast system?
use common frameworks
src http://en.wikipedia.org/wiki/Pac-Man
quick and fast
● no movies!
● news articles go out of date quickly!
● visitors need the recs NOW
● => handle the data very fast
src http://static.comicvine.com/uploads/original/10/101435/2026520-flash.jpg
"send quickly" technologies
● fast web server
● fast network protocol
● fast message queue, or Apache Kafka
● fast storage
comparison to plista
"real-time features feel better in a real-time world"
our setup
● php, it's easy
● redis, it's fast
● r, it's well known
we don't need batch! see http://goo.gl/AJntul