1.
Open Recommendation Platform
ACM RecSys 2013, Hong Kong
Torben Brodt
plista GmbH
Keynote
International News Recommender
Systems Workshop and Challenge
October 13th, 2013
2.
Where it's coming from
Recommendations
where
● news websites
● below the article
different types
● content
● advertising
[diagram: Visitors - Recommendations - Publisher]
3.
Where it's coming from
good recommendations for...
● User: happy!
● Advertiser: happy!
● Publisher: happy!
● plista*: happy!
* the company I am working for
4.
Where it's coming from
some years ago
[diagram: Visitors, Publisher and Context feed a single Collaborative Filtering recommender that produces Recommendations]
5.
Where it’s coming from
one recommender
Collaborative Filtering
● well known algorithm
● more data means more knowledge
Parameter Tuning
● time
● trust
● mainstream
6.
Where it’s coming from
one recommender = good results
2008
● finished studies
● 1st publication
● plista was born
today
● 5k recs/second
● many publishers
7.
Where it’s coming from
netflix prize
"use as many recommenders as possible!"
8.
Where it’s coming from
more recommenders
Collaborative Filtering
Most Popular
Text Similarity
etc ...
9.
understanding performance
lost in serendipity
● we have one score
● lucky success? bad loss?
● we needed to keep track of the different recommenders
success: 0.31 %
10.
understanding performance
how to measure success
[scale from bad to good]
number of
● clicks
● orders
● engagements
● time on site
● money
11.
understanding performance
evaluation technology
[figure: per-algorithm counters: Algo1, Algo2, Algo...]
● features?
● big data math?
● counting!
for blending we just count floats
12.
understanding performance
evaluation technology
impressions (sorted set)
● collaborative filtering: 500 +1
● most popular: 500
● text similarity: 500
ZINCRBY "impressions" 1 "collaborative_filtering"
ZREVRANGEBYSCORE "impressions" +inf -inf WITHSCORES
13.
understanding performance
evaluation technology
clicks (sorted set)
● collaborative filtering: 100
● most popular: 10
● ...: 1
impressions (sorted set)
● collaborative filtering: 500
● most popular: 500
● text similarity: 500
ZREVRANGEBYSCORE "clicks" +inf -inf WITHSCORES
ZREVRANGEBYSCORE "impressions" +inf -inf WITHSCORES
needs division: success = clicks / impressions
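Combining both sorted sets into a per-recommender success rate could look roughly like this (again redis-py, purely illustrative):

import redis

r = redis.Redis()  # assumed local Redis instance

def success_rate(recommender):
    # clicks / impressions for one recommender; 0.0 if it was never shown
    clicks = r.zscore("clicks", recommender) or 0.0
    impressions = r.zscore("impressions", recommender) or 0.0
    return clicks / impressions if impressions else 0.0

print(success_rate("collaborative_filtering"))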
14.
understanding performance
evaluation results
[chart: success per recommender]
● CF is "always" the best recommender
● but "always" is just the average over all contexts
let's check on context!
15.
Context
Context
● We like anonymization!
● the web already gives us a big context
● URL + HTTP headers provide
○ user agent -> device -> mobile
○ IP address -> geolocation
○ referer -> origin (search, direct)
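As a rough sketch of turning URL + HTTP headers into such context features (the substring checks and feature names below are illustrative, not plista's actual rules):

def extract_context(url, headers):
    # derive simple context features from URL + HTTP headers
    ua = headers.get("User-Agent", "").lower()
    referer = headers.get("Referer", "")
    device = "mobile" if ("mobile" in ua or "android" in ua) else "desktop"
    if any(s in referer for s in ("google.", "bing.")):
        origin = "search"
    elif referer:
        origin = "link"
    else:
        origin = "direct"
    # IP address -> geolocation would need a GeoIP database (omitted here)
    return {"url": url, "device": device, "origin": origin}

print(extract_context("http://example.com/article", {"User-Agent": "Mozilla/5.0 (Android; Mobile)"}))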
16.
Context
Context
consider the list of best recommenders per context attribute
sorted list of what is relevant, by
● clicks (content recs)
● price (advertising recs)
[tables: one sorted list of recommenders per context attribute, e.g. category = archive, hour = 15, publisher = welt.de, with scores such as collaborative filtering 689, text similarity 400, most popular 135, ...]
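One possible implementation of such per-context lists (a sketch, not plista's actual schema) is one Redis sorted set per context attribute/value pair:

import redis

r = redis.Redis()  # assumed local Redis instance, redis-py >= 3.0

def count_click(recommender, context):
    # one sorted set per attribute/value pair, e.g. "clicks:publisher=welt.de"
    for attr, value in context.items():
        r.zincrby("clicks:%s=%s" % (attr, value), 1, recommender)

def best_recommenders(attr, value):
    # sorted list of recommenders for one context attribute, best first
    return r.zrevrangebyscore("clicks:%s=%s" % (attr, value), "+inf", "-inf", withscores=True)

count_click("collaborative_filtering", {"publisher": "welt.de", "hour": 15, "category": "archive"})
print(best_recommenders("publisher", "welt.de"))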
18.
Context
Targeting
Context can be used for optimization and targeting.
classical targeting is just a limitation
19.
Context
Livecube
Advertising
● RWE Europe: 500 +1
● IBM Germany: 500
● Intel Austria: 500
Recommenders
● collaborative filtering: 500 +1
● most popular: 500
● text similarity: 500
Onsite
● new iphone su...: 500 +1
● twitter buys p..: 500
● google has seri.: 500
20.
Context
evaluation context
[chart: success]
recap
● added another dimension: context
result
● better for news: Collaborative Filtering
● better for content: Text Similarity
21.
now breathe!
what did we get?
● possibly many recommenders
● know how to measure success
● technology to see success
22.
the ensemble
● real-time evaluation technology exists
● to choose the best algorithm for the current context we need to learn: multi-armed Bayesian bandit
23.
Data Science
"shuffle": exploration vs. exploitation
● the current No. 1 gets most of the traffic
● temporary success?
● local minima?
Interested? Look for Ted Dunning + Bayesian Bandit
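For intuition, a minimal Thompson-sampling sketch, one common flavour of a Bayesian bandit (in the spirit of Ted Dunning's write-ups); plista's production blending is certainly more involved than this:

import random

class BetaBandit:
    # one Beta(successes + 1, failures + 1) posterior per recommender
    def __init__(self, arms):
        self.stats = {arm: [1, 1] for arm in arms}

    def choose(self):
        # sample a plausible CTR per arm, then exploit the best sample
        return max(self.stats, key=lambda a: random.betavariate(*self.stats[a]))

    def update(self, arm, clicked):
        self.stats[arm][0 if clicked else 1] += 1

bandit = BetaBandit(["collaborative_filtering", "most_popular", "text_similarity"])
arm = bandit.choose()       # recommender to use for the next impression
bandit.update(arm, False)   # no click this time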
24.
✓
better results
[chart: success over time]
● new total / avg is much better
● thx bandit
● thx ensemble
more research
● time series
25.
✓
easy exploration
● tradeoff (money decision)
● between the price/time we "waste" in offline evaluation
● and the price we lose with bad recommendations
26.
trial and error
● minimal pre-testing
● no risk if a recommender crashes
● "bad" code might find its context
27.
collaboration
● now plista developers can try ideas
● and allow researchers to do the same
28.
big pool of algorithms
the Ensemble is able to choose from
● Collaborative Filtering
● Most Popular
● Text Similarity
● Research Algorithms: BPR-Linear, WR-MF, SVD++, etc.
29.
researcher has idea
src http://g-ecx.images-amazon.com/images/G/03/video/m/feature/wickie_figur.jpg
30.
researcher has idea
● first and only dataset in a news context
○ millions of items
○ only relevant for a short time
● dataset has many attributes !!
● many publishers have user intersection
○ regional
○ contextual
● real world !!!
○ you can guide the user
○ you don't need to follow his route
● real time !!
○ this is industry, it has to be usable
src http://userserve-ak.last.fm/serve/_/7291575/Wickie%2B4775745.jpg
31.
... needs to start the server
... probably hosted by a university, plista or any cloud provider?
32.
... api implementation
"message bus"
● event notifications
○ impression
○ click
● error notifications
● item updates
train model from it
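A hedged sketch of what the researcher-side endpoint for this message bus could look like; the Flask framework, route name and dispatch below are assumptions, only the message types come from the slides (the JSON payloads follow on the next slides):

from flask import Flask, request

app = Flask(__name__)

@app.route("/orp", methods=["POST"])  # route name is an assumption
def handle_message():
    msg = request.get_json(force=True)
    if msg.get("type") == "impression":
        pass  # count the impression / train the model
    elif msg.get("type") == "click":
        pass  # positive feedback for the recommended items
    elif msg.get("type") == "error":
        pass  # log the error notification
    elif msg.get("type") == "item_update":
        pass  # add or refresh the item in the local store
    return "OK"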
34.
... package content
{ // json
"type": "impression",
"recs": ...
// what was recommended
}
api specs hosted at http://orp.plista.com
35.
... package content
{ // json
"type": "click",
"context": ...
// will include the position
}
api specs hosted at http://orp.plista.com
36.
... reply to recommendation requests
[figure: Researcher -> API -> recs -> Real User]
{ // json
  "recs": {
    "int": {
      "3": [13010630, 84799192]
      // 3 refers to content recommendations
    }
  }
}
generated by researchers, to be shown to the real user
api specs hosted at http://orp.plista.com
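For illustration, building that reply body in Python; the item ids are the example values from the slide and "3" marks content recommendations:

import json

def build_reply(item_ids):
    # structure follows the example above; see http://orp.plista.com for the full spec
    return json.dumps({"recs": {"int": {"3": item_ids}}})

print(build_reply([13010630, 84799192]))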
37.
quality is win win #2
● happy user
● happy researcher
● happy plista
research can profit from
● real user feedback
● real benchmark
38.
how to build a fast system?
use common frameworks
src http://en.wikipedia.org/wiki/Pac-Man
39.
quick and fast
● no movies!
● news articles become outdated quickly!
● visitors need the recs NOW
● => handle the data very fast
src http://static.comicvine.com/uploads/original/10/101435/2026520-flash.jpg
40.
"send quickly" technologies
● fast web server
● fast network protocol
or Apache Kafka
● fast message queue
● fast storage
40
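For the "fast message queue" part, a small illustration with Apache Kafka via kafka-python; broker address, topic name and payload below are assumptions, not plista's actual setup:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orp-events", {"type": "impression"})  # topic name is illustrative
producer.flush()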
41.
comparison to plista
"real-time features feel better in a real-time world"
our setup
● php, it's easy
● redis, it's fast
● r, it's well known
we don't need batch! see http://goo.gl/AJntul
42.
Overview
[architecture diagram: Visitors and the Publisher provide Feedback and Preferences; the Ensemble chooses among Collaborative Filtering, Most Popular, Text Similarity, etc. and delivers Recommendations]