Paper  the plista dataset
Transcript

  • 1. The plista Dataset ACM RecSys 2013, Hong Kong Authors: Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias Speaker: Brodt, Torben International News Recommender Systems Workshop and Challenge October 13th, 2013
  • 2. Introduction and Motivation ● Context: News Article Recommendation
  • 3. Introduction and Motivation
      ● Do we need another recommendation data set? we have ...
      ● Which features are those data sets missing?
      ● What requirements do news articles impose on recommendation?
  • 4. Introduction and Motivation
      ● Features that had not been available in existing data sets:
          ○ contextual features: device, operating system, browser, etc.
          ○ cross-domain features: 13 different news providers included
          ○ different interaction types: interactions with recommendations (clicks), as well as news items (impressions)
          ○ content features: headline, URL, images, text snippets, etc.
  • 5. Introduction and Motivation
      ● Additional requirements for recommending news articles
          ○ real-time → recommendations must be provided within a short time interval (< 200 ms)
          ○ changing relevancy → items' relevancy decreases with time
          ○ dynamics → new news items are continuously being added
      ● Requirements inherent to existing recommender systems:
          ○ sparsity → users typically read only a few news articles
          ○ cold start → systems refrain from asking users to create profiles, so most user profiles are small
  • 6. Dataset characteristics
      { // json
        "type": "impression",
        "context": {
          "simple": {
            "27": 418, // publisher
            "14": 31721, // widget
            ...
          },
          "lists": {
            "10": [100, 101] // channel
          }
          ...
      }
      api specs hosted at http://orp.plista.com
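The record on slide 6 stores context under numeric keys. The following is a minimal sketch of how such a record might be decoded in Python. It assumes only the three key meanings given in the slide's comments (27 = publisher, 14 = widget, 10 = channel); the function name `decode_context` and the sample string are hypothetical, and the sample omits the `//` comments and the elided fields because they are not valid JSON.

```python
import json

# Key-to-name mapping taken from the comments on slide 6. The dataset
# defines further numeric keys that are not listed on the slide.
SIMPLE_KEYS = {"27": "publisher", "14": "widget"}
LIST_KEYS = {"10": "channel"}


def decode_context(record):
    """Return the record's context with readable names where they are known."""
    context = record.get("context", {})
    decoded = {}
    for key, value in context.get("simple", {}).items():
        # Unknown keys are kept under their numeric identifier.
        decoded[SIMPLE_KEYS.get(key, key)] = value
    for key, values in context.get("lists", {}).items():
        decoded[LIST_KEYS.get(key, key)] = list(values)
    return decoded


# Hypothetical record containing only the fields visible on the slide.
raw = (
    '{"type": "impression", "context": {'
    '"simple": {"27": 418, "14": 31721}, '
    '"lists": {"10": [100, 101]}}}'
)
record = json.loads(raw)
decoded = decode_context(record)
print(record["type"], decoded)
```

Keys without a known name pass through unchanged, so the sketch does not need the full attribute list from the API specification.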
  • 7. Dataset characteristics
      ● object types
          ○ impressions → users reading news articles
          ○ clicks → users clicking recommendations
          ○ creates → news articles being created
          ○ updates → news articles being updated
      api specs hosted at http://orp.plista.com
  • 8. Dataset usage
  • 9. Dataset usage
      ● Evaluation based on Click-Through-Rate (CTR)
      ● ~ 84 million impressions
      ● ~ 1 million clicks
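As a rough illustration of the CTR figure implied by slide 9, the sketch below divides the approximate click count by the approximate impression count. Treating every impression as a recommendation opportunity is an assumption made here; the slides do not state the exact denominator used in the evaluation, and `click_through_rate` is a hypothetical helper name.

```python
def click_through_rate(clicks, impressions):
    """Return clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Approximate counts quoted on slide 9.
approx_ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{approx_ctr:.2%}")  # prints 1.19%
```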
  • 10. Dataset usage
      ● evaluating cross-portal recommenders
      ● 10-36% user overlap between different news portals
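The slides do not say how the user overlap on slide 10 was measured. One plausible definition, used here purely as an assumption, is the share of the smaller portal's users who also appear on the other portal. The function name and the user IDs below are hypothetical.

```python
def user_overlap(users_a, users_b):
    """Share of the smaller user set that also occurs in the other set."""
    a, b = set(users_a), set(users_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical user IDs for two portals.
portal_a = {"u1", "u2", "u3", "u4"}
portal_b = {"u3", "u4", "u5", "u6", "u7"}
overlap = user_overlap(portal_a, portal_b)
print(f"{overlap:.0%}")  # prints 50%
```

Other definitions, such as the Jaccard index, would give different percentages for the same data, so any comparison with the 10-36% range depends on this choice.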
  • 11. Dataset usage
      ● news portal comparisons
      ● do we observe similar user behaviour on news portals offering similar content?
  • 12. Dataset usage
      ● evaluating contextual recommendation algorithms
      ● sensitive to
          ○ weekday
          ○ hour of day
          ○ ...
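The weekday and hour-of-day features named on slide 12 can be derived from an interaction timestamp. The sketch below assumes the timestamp is given in milliseconds since the Unix epoch and is interpreted in UTC; both are assumptions for illustration, not details confirmed by the slides, and `time_context` is a hypothetical helper.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_context(timestamp_ms):
    """Derive weekday and hour of day (UTC) from a millisecond timestamp."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return {"weekday": WEEKDAYS[moment.weekday()], "hour": moment.hour}


# 2013-10-13 10:00:00 UTC, the date of the workshop.
ctx = time_context(1381658400000)
print(ctx)  # prints {'weekday': 'Sunday', 'hour': 10}
```

A recommender that is sensitive to context could use these two values as features, or as keys for separate popularity statistics.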
  • 13. Dataset usage
      When using the data set you may consider…
      ● … we identify users by session IDs
          ○ individual users may have several IDs
          ○ users sharing their device might be mapped to one ID
      ● … interactions (clicks, impressions) and content dynamics (creates, updates) differ between news portals
      ● … contents are restricted to German
      ● … preferences are represented on a binary scale (user read article, user clicked recommendation)
      ● … clicking on recommendations might not reveal the actual relevancy of an item
  • 14. Conclusions
      ● we introduced a new data set intended to support recommender systems research
      ● we outlined novel features which existing data sets lacked
      ● we presented scenarios which can be evaluated using the data set
      ● we pointed to critical aspects which ought to be considered when working with the data set
  • 15. Summary
      ● news articles
          ○ of ~13 publishers
      ● transactional data
          ○ Impressions
          ○ Clicks
      ● contextual data
          ○ of ~50 attributes
      ● cross domain application
  • 16. The plista Dataset
      @inproceedings{Kille:2013,
        title = {The plista Dataset},
        author = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias},
        booktitle = {NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems},
        year = {2013},
        month = {10},
        location = {Hong Kong, China},
        publisher = {ACM},
        pages = {14--21}
      }
