Paper the plista dataset

  1. The plista Dataset
     ACM RecSys 2013, Hong Kong
     Authors: Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias
     Speaker: Brodt, Torben
     International News Recommender Systems Workshop and Challenge
     October 13th, 2013
  2. Introduction and Motivation
     ● Context: News Article Recommendation
  3. Introduction and Motivation
     ● Do we need another recommendation data set? we have ...
     ● What features are those data sets missing?
     ● What requirements do news articles impose on recommendation?
  4. Introduction and Motivation
     ● Features that had not been available in existing data sets:
       ○ contextual features: device, operating system, browser, etc.
       ○ cross-domain features: 13 different news providers included
       ○ different interaction types: interactions with recommendations (clicks), as well as news items (impressions)
       ○ content features: headline, URL, images, text snippets, etc.
  5. Introduction and Motivation
     ● Additional requirements for recommending news articles
       ○ real-time → recommendations must be provided within a short time interval (< 200 ms)
       ○ changing relevancy → items’ relevancy decreases with time
       ○ dynamics → new news items are continuously being added
     ● Requirements inherent to existing recommender systems:
       ○ sparsity → users typically read only a few news articles
       ○ cold start → systems refrain from asking users to create profiles; as a result, most user profiles are small
  6. Dataset characteristics
     { // json
       "type": "impression",
       "context": {
         "simple": {
           "27": 418,   // publisher
           "14": 31721, // widget
           ...
         },
         "lists": {
           "10": [100, 101] // channel
         }
         ...
       }
     }
     api specs hosted at http://orp.plista.com
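A minimal Python sketch of reading a message shaped like the one on slide 6. It assumes the inline comments and the elided parts ("...") are dropped so that the text is valid JSON. The key meanings (27 = publisher, 14 = widget, 10 = channel) are taken from the slide; the function name `summarize_message` and the sample string are illustrative only and not part of the plista API.

```python
import json

# Sample built only from the fields shown on the slide; real messages
# contain further attributes that the slide elides.
SAMPLE = """
{
  "type": "impression",
  "context": {
    "simple": {"27": 418, "14": 31721},
    "lists": {"10": [100, 101]}
  }
}
"""

def summarize_message(raw: str) -> dict:
    """Extract the message type plus publisher, widget, and channel IDs."""
    msg = json.loads(raw)
    context = msg.get("context", {})
    simple = context.get("simple", {})
    lists = context.get("lists", {})
    return {
        "type": msg.get("type"),
        "publisher": simple.get("27"),    # key 27 = publisher (per slide)
        "widget": simple.get("14"),       # key 14 = widget (per slide)
        "channels": lists.get("10", []),  # key 10 = channel (per slide)
    }

print(summarize_message(SAMPLE))
```

Missing keys come back as `None` or an empty list rather than raising, which seems a reasonable default given that the slide shows only part of the context.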
  7. Dataset characteristics
     ● object types
       ○ impressions → users reading news articles
       ○ clicks → users clicking recommendations
       ○ creates → news articles being created
       ○ updates → news articles being updated
     api specs hosted at http://orp.plista.com
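As a rough illustration of working with the four object types, the sketch below tallies messages by their `type` field. Only the label "impression" appears on the slides; the strings "click", "create", and "update" are assumed names, and `count_object_types` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical type labels: only "impression" appears on the slides; the
# other three strings are assumed names for clicks, creates, and updates.
KNOWN_TYPES = {"impression", "click", "create", "update"}

def count_object_types(messages):
    """Count messages per object type, grouping unknown labels as 'other'."""
    counts = Counter()
    for msg in messages:
        label = msg.get("type")
        counts[label if label in KNOWN_TYPES else "other"] += 1
    return counts

messages = [
    {"type": "impression"},
    {"type": "impression"},
    {"type": "click"},
    {"type": "create"},
]
print(count_object_types(messages))
```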
  8. Dataset usage
  9. Dataset usage
     ● Evaluation based on Click-Through-Rate (CTR)
     ● ~ 84 million impressions
     ● ~ 1 million clicks
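A short worked example of the CTR figure implied by these totals, assuming CTR is taken as clicks divided by impressions (the slide does not spell out the denominator). The function name is illustrative.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks divided by impressions, or 0.0 when there are no impressions."""
    if impressions <= 0:
        return 0.0
    return clicks / impressions

# Approximate totals quoted on the slide.
overall_ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{overall_ctr:.2%}")
```

With the rounded totals from the slide this comes to roughly 1.2 %, so the exact value depends on the unrounded counts.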
  10. Dataset usage
     ● evaluating cross-portal news recommenders
     ● 10 to 36 % user overlap between different news portals
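The slides do not define how user overlap is measured. One possible reading, sketched below, is the share of one portal's users that also appear on another portal; the function name, the set-based representation, and the sample IDs are all assumptions.

```python
def user_overlap(users_a: set, users_b: set) -> float:
    """Share of portal A's users that also appear on portal B."""
    if not users_a:
        return 0.0
    return len(users_a & users_b) / len(users_a)

# Made-up session IDs for two hypothetical portals.
portal_a = {"s1", "s2", "s3", "s4"}
portal_b = {"s3", "s4", "s5"}
print(user_overlap(portal_a, portal_b))  # 0.5
```

This measure is asymmetric: the overlap of A with B generally differs from that of B with A. A symmetric alternative would be the Jaccard index.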
  11. Dataset usage
     ● news portal comparisons
     ● do we observe similar user behaviour on news portals offering similar content?
  12. Dataset usage
     ● evaluating contextual recommendation algorithms
     ● sensitive to
       ○ weekday
       ○ hour of day
       ○ ...
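To make the weekday and hour-of-day sensitivity concrete, the sketch below buckets events by (weekday, hour) and computes a CTR per bucket. The event shape (a Unix timestamp paired with "impression" or "click") and the use of UTC are assumptions for illustration; the slides do not show a timestamp field.

```python
from collections import defaultdict
from datetime import datetime, timezone

def ctr_by_weekday_hour(events):
    """Group events by (weekday, hour) in UTC and return CTR per bucket.

    Each event is a (unix_seconds, kind) pair, where kind is "impression"
    or "click". This event shape is an assumption for illustration.
    """
    tallies = defaultdict(lambda: {"impression": 0, "click": 0})
    for ts, kind in events:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        tallies[(moment.weekday(), moment.hour)][kind] += 1
    # Buckets without impressions are skipped to avoid dividing by zero.
    return {
        bucket: t["click"] / t["impression"]
        for bucket, t in tallies.items()
        if t["impression"] > 0
    }

# Four impressions and one click on Sunday, 13 October 2013, 09:00 UTC.
nine_am = datetime(2013, 10, 13, 9, 0, tzinfo=timezone.utc).timestamp()
events = [(nine_am, "impression")] * 4 + [(nine_am, "click")]
print(ctr_by_weekday_hour(events))
```

`weekday()` counts Monday as 0, so the Sunday bucket in this example is keyed as `(6, 9)`.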
  13. Dataset usage
     When using the data set you may consider…
     ● … we identify users by session IDs
       ○ individual users may have several IDs
       ○ users sharing their device might be mapped to one ID
     ● … interactions (clicks, impressions) and content dynamics (creates, updates) differ between news portals
     ● … contents are restricted to German
     ● … preferences are represented on a binary scale (user read article, user clicked recommendation)
     ● … clicking on recommendations might not reveal the actual relevancy of an item
  14. Conclusions
     ● we introduce a new data set intended to support recommender systems research
     ● we outlined novel features which existing data sets lacked
     ● we presented scenarios which can be evaluated using the data set
     ● we pointed to critical aspects which ought to be considered when working with the data set
  15. Summary
     ● news articles
       ○ of ~13 publishers
     ● transactional data
       ○ Impressions
       ○ Clicks
     ● contextual data
       ○ of ~50 attributes
     ● cross domain application
  16. The plista Dataset
     @inproceedings{Kille:2013,
       title     = {The plista Dataset},
       author    = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias},
       booktitle = {NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems},
       year      = {2013},
       month     = {10},
       location  = {Hong Kong, China},
       publisher = {ACM},
       pages     = {14--21}
     }
