Real-world News Recommender Systems
1. Processing Data Streams in Real Time
Tobias Heintz¹, Benjamin Kille²
¹plista GmbH
²Technische Universität Berlin
September 26, 2014
2. Table of Contents
Introduction
Recommender Systems
Unpersonalised Recommendation
Collaborative Filtering
Content-based Filtering
Evaluation
News Recommendation
Big Data Issues
3. Who are we?
▪ Tobias Heintz, plista GmbH
▪ Benjamin Kille, Technische Universität Berlin
plista GmbH
Pioneers in targeted advertising and content distribution.
▪ founded on 31 July 2008
▪ incorporated into the WPP Group as of 1 January 2014
▪ headquarters in Berlin, Germany
▪ 120 employees (30 % R&D)
Technische Universität Berlin
▪ >30 000 enrolled students
▪ 331 professors
▪ >2600 researchers
4. What problems do we address?
Recommender Systems
We will introduce recommender systems; we will discuss a variety
of algorithms; we will explore how to evaluate recommender
systems.
News
We will talk about specific challenges when recommending news;
we will illustrate issues arising as systems fail to build
comprehensive user profiles; we will depict how news evolving over
time affect recommender systems.
Big Data
We will exemplify in what way news represent a source of big data;
we will introduce a system which grants researchers access to big
data; we will show how you can compete with your own approaches.
7. Why are these problems important?
Users increasingly face information overload as they interact with
item collections. For instance:
▪ 43 000 000 songs on Apple's iTunes
▪ 100 h of video are uploaded to YouTube every minute
▪ 3 000 000 movies on IMDb
▪ ...
Collections continue to grow, causing even more severe information
overload. The same holds for news articles.
8. Table of Contents
Introduction
Recommender Systems
Unpersonalised Recommendation
Collaborative Filtering
Content-based Filtering
Evaluation
News Recommendation
Big Data Issues
12. Recommender systems help users filter the item collection. More
formally, a general-purpose recommender system is a triple (U, I, φ).
U → set of users {u1, u2, ..., uM}
I → set of items {i1, i2, ..., iN}
φ → a filter function
The performance of different recommendation algorithms typically
depends on φ.
14. Filter Functions
Filter functions take a user u, the entire item collection I, and a
model M. They return a subset of items to be recommended, Î.
φ(u, I, M) = Î
Recommender systems' success or failure strongly depends on the
model M, in particular on how accurately the model reflects actual
user preferences. M may take various kinds of input, as we will
discuss for a selection of recommendation algorithms.
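As a minimal sketch of this abstraction in Python (the type aliases and the score-based model below are illustrative choices, not taken from the slides), a filter function φ can be written as a plain function mapping a user, the item collection, and a model to a recommended subset:

```python
# Minimal sketch of the (U, I, phi) abstraction; all names are illustrative.
from typing import Dict, List

User = str
Item = str
Model = Dict[Item, float]                       # here: one score per item

def phi(user: User, items: List[Item], model: Model, n: int = 3) -> List[Item]:
    """Return the n items with the highest model score for the given user."""
    return sorted(items, key=lambda i: model.get(i, 0.0), reverse=True)[:n]

# Example: a model that simply scores items by (made-up) interaction counts.
model = {"Aviator": 12.0, "Bad Boys": 7.0, "Cars": 3.0}
print(phi("Anna", list(model), model, n=2))     # ['Aviator', 'Bad Boys']
```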
17. Most-Popular Recommendation
M orders the item collection according to the number of
interactions, K ≤ L ≤ M ≤ N.
[Figure: items ranked by their interaction counts (K, L, M, and N
interactions); the item with N interactions sits at the top as the
most popular one]
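A hedged sketch of such a model: count interactions per item and return the most frequent ones (the interaction log below is made up for illustration):

```python
# Most-popular recommendation: order items by their number of interactions.
from collections import Counter

# Hypothetical interaction log: one entry per (user, item) interaction.
interactions = [("Anna", "Bad Boys"), ("Bob", "Bad Boys"), ("Bob", "Aviator"),
                ("Clara", "District 9"), ("Dan", "Aviator"), ("Anna", "Aviator")]

popularity = Counter(item for _, item in interactions)   # the model M

def recommend_most_popular(n=2):
    """Return the n most frequently interacted-with items."""
    return [item for item, _ in popularity.most_common(n)]

print(recommend_most_popular())   # ['Aviator', 'Bad Boys']
```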
18. Summary: Unpersonalised Recommenders
Advantages
▪ low computational complexity
▪ easy to update M
▪ domain independent
Disadvantages
▪ disregard personal taste
▪ disregard context
▪ high chance of recommending already-known or unpopular items
19. Collaborative Filtering
Basic Assumptions
▪ systems have access to users' preferences
▪ users with similar tastes in the past will continue to like
similar items
▪ systems have means to compare users' tastes
Distinctions
▪ model-based vs memory-based
▪ item-based vs user-based
22. Example
[Figure: bipartite graph linking the users Anna, Bob, Clara, and Dan
to the movies Aviator, Bad Boys, Cars, District 9, and Elektra]
user profile: Anna → [Bad Boys, District 9, Elektra]
23. Example
user profiles:
Anna → [Bad Boys, District 9, Elektra]
Bob → [Aviator, Bad Boys, District 9, Elektra]
Clara → [Cars, District 9, Elektra]
Dan → [Aviator]
28. Preference Elicitation
Explicit Preferences
▪ Likes
▪ Thumbs Up/Down
▪ Ratings
▪ Comments
▪ Purchase
Implicit Preferences
▪ Click
▪ Dwell Time
▪ Returns
How can we measure whether users like items and how much they
do?
29. Collaborative Filtering Algorithms with Ratings
Memory-based
Algorithm uses the complete set of data in the recommendation
process. M contains the full rating matrix.
▪ user-based k-nearest neighbour
▪ item-based k-nearest neighbour
Model-based
Algorithm learns a model M and uses it to recommend items.
▪ matrix factorisation with ALS
▪ matrix factorisation with SGD
30. User-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(u, v)
[Figure: rating matrix with the users Anna, Bob, Clara, and Dan as
rows and the movies Aviator, Bad Boys, Cars, District 9, and Elektra
as columns]
32. User-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(u, v)
[Figure: binarised rating rows (0/1) for Anna and Bob over the
movies Aviator, Bad Boys, Cars, District 9, and Elektra]
33. Similarity Measures
Number of items in common
sim(u, v) = Σ_{i∈I} I(i), where I(i) = 1 if both u and v liked i,
and 0 otherwise
Cosine similarity
sim(u, v) = (u · v) / (||u|| ||v||)
Pearson's correlation coefficient
sim(u, v) = cov(u, v) / (std(u) std(v))
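The three measures in a few lines of Python (numpy arrays stand in for two user rows of the rating matrix; the values are illustrative):

```python
import numpy as np

# Two users as binary rows of the rating matrix (1 = liked); values illustrative.
u = np.array([0., 1., 0., 1., 1.])
v = np.array([1., 1., 0., 1., 1.])

common = int(np.sum((u == 1) & (v == 1)))              # items both users liked
cosine = u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
pearson = np.cov(u, v)[0, 1] / (np.std(u, ddof=1) * np.std(v, ddof=1))

print(common, round(cosine, 3), round(pearson, 3))
```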
34. User-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(u, v)
[Figure: symmetric user-user similarity matrix over Anna, Bob,
Clara, and Dan with 1s on the diagonal; sim(Anna, Bob) = sim(Bob, Anna)]
35. User-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(u, v)
[Figure: user-user similarity matrix; Anna's similarity vector reads
[1, sBob, sClara, sDan]]
36. User-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(u, v)
[Figure: rating matrix with a missing entry (?) to be predicted for
one user-movie pair]
38. User-based k-nearest Neighbour
user profile:
u = (r(i1), r(i2), ..., r(iN))
similarity vector:
sim(u, ·) = (sim(u, v1), sim(u, v2), ..., sim(u, u), ..., sim(u, vM))
preference prediction:
r̂(j) = Σ_v sim(u, v) · r_v(j)
Result
We obtain a prediction for each item's preference and can rank them
accordingly. The algorithm returns as many items as requested,
starting from the top rank.
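Putting the pieces together, a compact sketch of user-based prediction with cosine similarity, following the weighted-sum reading of the prediction formula above (the binary matrix mirrors the movie example; all names and values are illustrative):

```python
import numpy as np

# Rows: Anna, Bob, Clara, Dan; columns: Aviator, Bad Boys, Cars, District 9, Elektra.
R = np.array([[0, 1, 0, 1, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 1, 1],
              [1, 0, 0, 0, 0]], dtype=float)

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def predict_user_based(R, u):
    """Score every item for user u as a similarity-weighted sum of the other users' rows."""
    sims = np.array([cosine(R[u], R[v]) if v != u else 0.0 for v in range(R.shape[0])])
    return sims @ R                              # one predicted score per item

scores = predict_user_based(R, u=0)              # predictions for Anna
unseen = np.where(R[0] == 0)[0]                  # only rank items Anna has not liked yet
best = unseen[np.argmax(scores[unseen])]
print(scores.round(2), "-> recommend item index", best)
```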
39. Item-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(i, j)
[Figure: rating matrix with the users Anna, Bob, Clara, and Dan as
rows and the movies Aviator, Bad Boys, Cars, District 9, and Elektra
as columns]
41. Item-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(i, j)
[Figure: binarised rating columns (0/1) for Aviator and Bad Boys
over the users Anna, Bob, Clara, and Dan]
42. Similarity Measures
Number of users in common
sim(i, j) = Σ_{u∈U} I(u), where I(u) = 1 if both i and j are liked
by u, and 0 otherwise
Cosine similarity
sim(i, j) = (i · j) / (||i|| ||j||)
Pearson's correlation coefficient
sim(i, j) = cov(i, j) / (std(i) std(j))
43. Item-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(i, j)
[Figure: symmetric item-item similarity matrix over Aviator, Bad
Boys, Cars, District 9, and Elektra with 1s on the diagonal;
sim(Aviator, Bad Boys) = sim(Bad Boys, Aviator)]
44. Item-based k-nearest Neighbour
Input: M×N rating matrix R, similarity measure sim(i, j)
[Figure: rating matrix with a missing entry (?) to be predicted for
one user-movie pair]
46. Item-based k-nearest Neighbour
item profile:
i = (r(u1), r(u2), ..., r(uM))
similarity vector:
sim(i, ·) = (sim(i, j1), sim(i, j2), ..., sim(i, i), ..., sim(i, jN))
preference prediction:
r̂(u) = Σ_j sim(i, j) · r(u, j)
Result
We obtain a prediction for each item's preference and can rank them
accordingly. The algorithm returns as many items as requested,
starting from the top rank.
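An analogous sketch for the item-based variant: similarities are computed between columns of R, and user u's score for item i is the similarity-weighted sum of u's other ratings (same illustrative matrix as before; not the authors' implementation):

```python
import numpy as np

# Same binary matrix: rows = users (Anna, Bob, Clara, Dan), columns = movies.
R = np.array([[0, 1, 0, 1, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 1, 1],
              [1, 0, 0, 0, 0]], dtype=float)

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def predict_item_based(R, u, i):
    """Predict user u's preference for item i from u's ratings of similar items."""
    sims = np.array([cosine(R[:, i], R[:, j]) if j != i else 0.0
                     for j in range(R.shape[1])])
    return sims @ R[u]                  # weighted sum over the items u has rated

print(round(predict_item_based(R, u=0, i=0), 2))   # Anna's predicted score for Aviator
```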
47. Matrix Factorisation
Input: M×N rating matrix R
R = [ ·  1  ·  1  1
      1  1  ·  1  1
      ·  ·  1  1  1
      1  ·  ·  ·  · ]
(rows: Anna, Bob, Clara, Dan; columns: Aviator, Bad Boys, Cars,
District 9, Elektra; · marks a missing preference)
Goal
Fill the gaps of missing preferences.
48. Matrix Factorisation
Idea
Project preferences into a low-dimensional space to detect latent
structures.
[R]_{M×N} ≈ [P]_{M×K} · [Q]ᵀ_{N×K}, with K ≪ M, N
Problem
How to determine P and Q?
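A shape-level sketch of the idea, assuming K = 2 latent dimensions (the factor values are random placeholders; the point is that P·Qᵀ produces a score for every user-item pair, including the missing ones):

```python
import numpy as np

M, N, K = 4, 5, 2                        # users, items, latent dimensions (K << M, N)
rng = np.random.default_rng(0)

P = rng.normal(scale=0.1, size=(M, K))   # user factors, M x K
Q = rng.normal(scale=0.1, size=(N, K))   # item factors, N x K

R_hat = P @ Q.T                          # M x N matrix of predicted preferences
print(R_hat.shape)                       # (4, 5): a score for every user-item pair
```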
49. Matrix Factorisation
Learning P and Q
Input: error metric
E(P, Q, R) = Σ_{(u,i)∈R} (r(u, i) - Pu Qiᵀ)²  (quadratic error)
E(P, Q, R) = Σ_{(u,i)∈R} |r(u, i) - Pu Qiᵀ|  (absolute error)
50. Matrix Factorisation
Stochastic Gradient Descent
Optimise the error metric by selecting data points at random (a code
sketch follows after this list).
▪ initialise P, Q with small random values
▪ pick a preference (u, i) at random
▪ determine the gradient at that point
▪ adjust P, Q accordingly
▪ continue
Alternating Least Squares
Optimise either P or Q while keeping the other fixed.
▪ initialise P, Q with small random values
▪ optimise the error metric with respect to P
▪ optimise the error metric with respect to Q
▪ continue
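A compact SGD sketch minimising the quadratic error from the previous slide (learning rate, number of epochs, and K = 2 are arbitrary illustrative choices; the matrix is the binary example used earlier):

```python
import numpy as np

# Known binary preferences (rows = users, columns = movies); illustrative data.
R = np.array([[0, 1, 0, 1, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 1, 1],
              [1, 0, 0, 0, 0]], dtype=float)
observed = list(zip(*np.nonzero(R)))             # (u, i) pairs with a recorded preference

K, lr, epochs = 2, 0.05, 200
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], K))
Q = rng.normal(scale=0.1, size=(R.shape[1], K))

for _ in range(epochs):
    for idx in rng.permutation(len(observed)):   # pick preferences at random
        u, i = observed[idx]
        err = R[u, i] - P[u] @ Q[i]              # residual of the quadratic error
        p_u = P[u].copy()
        P[u] += lr * err * Q[i]                  # gradient step on the user factors
        Q[i] += lr * err * p_u                   # gradient step on the item factors

print(np.round(P @ Q.T, 2))                      # filled-in preference estimates
```

The sketch only illustrates the update rule; with purely positive signals one would in practice also sample unobserved entries or use explicit ratings, otherwise all predictions drift towards 1.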
52. Summary: Collaborative Filtering
Advantages
▪ takes personal taste into account
▪ successful in the Netflix Prize competition
▪ domain-independent
Disadvantages
▪ cold-start problem
▪ sparsity
▪ grey sheep
53. Cold-Start Problem
▪ user without known preferences
▪ item without preferences
▪ similarity measures fail
▪ inconclusive latent factors
54. Grey Sheep
▪ users rate all their items as average
▪ user profile: [3, 3, 3, 3, ..., 3]
▪ collaborative systems cannot distinguish good from bad items
56. Content-based Filtering
Idea
Suggest items which are similar to items users have liked.
Similarity
▪ based on content → features
▪ depending on the domain
62. Content-based Filtering
Input: user profile, item collection, item features, and similarity
measure
Features
▪ Name/ID
▪ Meta data
▪ Content
  ▪ audio stream → songs
  ▪ video stream → movies
  ▪ text → books, news articles
65. Content-based Filtering
Similarity: Examples
▪ keyword overlap → text (a small sketch in code follows below)
▪ average colour match → images/video
▪ maximum amplitude → audio/sound
▪ common actors → movies
▪ common interests → friends/partnership
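One simple way to realise the first example, keyword overlap for text items, is a Jaccard similarity over word sets (the two example headlines are made up):

```python
def keyword_overlap(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the sets of (lower-cased) words of two texts."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

liked = "central bank raises interest rates again"
candidate = "bank expected to raise interest rates"
print(round(keyword_overlap(liked, candidate), 2))   # 0.33
```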
66. Summary: Content-based Filtering
Advantages
▪ considers personal taste
▪ high expectability
Disadvantages
▪ costly for high-volume content, e.g., video
▪ low serendipity
▪ user cold-start
67. Evaluation
Important aspects
▪ how well does the system predict preferences?
▪ how often do users receive useful suggestions?
▪ how long does it take for the system to provide suggestions?
▪ how many requests cannot be answered?
▪ how often do users return to the site?
▪ how often do users purchase/rent/consume items which the system
had recommended?
▪ how well did users perceive the system?
68. Evaluation: Rating Prediction
Goal
The evaluation ought to show how well the system estimates
preferences.
Assumptions
▪ system can access recorded explicit numerical preferences
▪ tastes remain stable over time
▪ the more accurately the system estimates preferences, the more
suited the suggestions
Metrics (see the code sketch below)
▪ root mean squared error:
RMSE = sqrt( (1/|R|) Σ_{(u,i)∈R} (r(u, i) - r̂(u, i))² )
▪ mean absolute error:
MAE = (1/|R|) Σ_{(u,i)∈R} |r(u, i) - r̂(u, i)|
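Both metrics in a few lines (the ratings and predictions below are placeholder values):

```python
import numpy as np

r_true = np.array([5, 3, 4, 1, 2], dtype=float)       # recorded explicit ratings
r_pred = np.array([4.5, 3.5, 4, 2, 2], dtype=float)   # system estimates

rmse = np.sqrt(np.mean((r_true - r_pred) ** 2))
mae = np.mean(np.abs(r_true - r_pred))
print(round(rmse, 3), round(mae, 3))                  # 0.548 0.4
```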
69. Evaluation: Ranking
Goal
The evaluation ought to show how well the system ranks items
according to users' preferences.
Assumptions
▪ system can access preference relations between items
▪ tastes remain stable over time
▪ the better the system ranks items, the more suited the
suggestions
Metrics (see the code sketch below)
▪ normalised discounted cumulative gain: nDCG = DCG / IDCG
▪ mean reciprocal rank: MRR = (1/|U|) Σ_{u∈U} 1/rank_u
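A sketch of both ranking metrics for binary relevance judgements (the relevance lists are illustrative; the DCG uses the common log2 position discount, one of several conventions):

```python
import numpy as np

def ndcg(relevances):
    """nDCG = DCG / IDCG with a log2 position discount, for binary relevance."""
    rel = np.asarray(relevances, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum(rel * discounts)
    idcg = np.sum(np.sort(rel)[::-1] * discounts)    # best possible ordering
    return dcg / idcg if idcg > 0 else 0.0

def mrr(first_relevant_ranks):
    """Mean reciprocal rank: average of 1 / rank of the first relevant item per user."""
    return np.mean([1.0 / r for r in first_relevant_ranks])

print(round(ndcg([0, 1, 1, 0, 1]), 3))   # one user's ranked list
print(round(mrr([1, 3, 2]), 3))          # rank of the first relevant item per user
```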
70. Evaluation: Top-N
Goal
The evaluation ought to show how well the system selects the top
suggestions.
Assumptions
▪ system can access preference relations between items
▪ tastes remain stable over time
▪ the better the system selects the top suggestions, the more
suited they are
Metrics (see the code sketch below)
▪ precision@N = TP / (TP + FP)
▪ recall@N = TP / (TP + FN)
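Precision@N and recall@N over a recommended list and a set of relevant items (the data is illustrative):

```python
def precision_recall_at_n(recommended, relevant, n):
    """precision@N = TP / (TP + FP); recall@N = TP / (TP + FN)."""
    top_n = recommended[:n]
    tp = len(set(top_n) & set(relevant))            # recommended items that are relevant
    precision = tp / len(top_n) if top_n else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

recommended = ["Aviator", "Cars", "Elektra", "Bad Boys", "District 9"]
relevant = {"Aviator", "District 9", "Elektra"}
print(precision_recall_at_n(recommended, relevant, n=3))   # roughly (0.667, 0.667)
```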
71. Evaluation: Problems
▪ explicit preferences may not be available
▪ tastes change over time
▪ recorded data does not fully reflect the current situation
Solution
Access real systems with current user interactions to see whether a
method performs better than the existing one → second part of the
tutorial
72. Summary: Recommender Systems
▪ support users by suggesting interesting items
▪ counteract information overload
▪ unpersonalised recommenders
▪ collaborative filtering
  ▪ user-based k-nearest neighbour
  ▪ item-based k-nearest neighbour
  ▪ matrix factorisation
▪ content-based filtering
75. Table of Contents
Introduction
Recommender Systems
Unpersonalised Recommendation
Collaborative Filtering
Content-based Filtering
Evaluation
News Recommendation
Big Data Issues
76. News Recommendation: Special Characteristics
Collection Dynamics
▪ thousands of new articles published daily
▪ older articles' relevance decays
Contextual Differences
▪ users perceive recommendations differently
▪ devices render recommendations differently
▪ dependence on time of day and day of the week
Popularity Bias
▪ few items receive a lot of attention
▪ most items receive hardly any attention
80. Table of Contents
Introduction
Recommender Systems
Unpersonalised Recommendation
Collaborative Filtering
Content-based Filtering
Evaluation
News Recommendation
Big Data Issues
81. Big Data
Goal
Intelligent real-time processing of huge amounts of data.
Recommender Systems → personalisation
▪ volume → amount of data to be stored increases
▪ variety → heterogeneous data
▪ velocity → data streams in (near) real-time
▪ veracity → noisy data
83. Do news fulfil the requirements of big data?
Volume
hundreds of GB every day ✓
Variety
news entail textual data and images, inducing some variety
Velocity
news arise continuously → second part of the tutorial ✓
Veracity
news have some consistent attributes (headline, text), but also
comprise features which are missing or wrong (date, location, image)