Mining Cross-Domain Rating Datasets from Structured Data on Twitter
Simon Dooms (@sidooms)

Slides on mining cross-domain rating datasets, presented by Simon Dooms at the WWW 2014 conference in Seoul, Korea, on April 8, 2014.

  1. Mining Cross-Domain Rating Datasets from Structured Data on Twitter (Simon Dooms, @sidooms)
  2. Rating Datasets
     • What are ratings? Explicit user preference information
     • Why ratings? Recommender systems
     Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014
  4. Ratings Scarcity in Research
     • Ratings = private data
     • Public datasets to the rescue?
       – MovieLens 100K (1998)
       – MovieLens 1M (2000)
       – MovieLens 10M (2008)
       – More on recsyswiki.com
     Old, Synthetic Datasets
  5. Social Sharing = Ratings Goldmine
     • Previous research: MovieTweetings
  6. Social Sharing = Ratings Goldmine
     • Previous research: MovieTweetings
       – Movie rating dataset built from IMDb ratings shared on Twitter
       – https://github.com/sidooms/MovieTweetings
     • What about other domains? Websites? Well, let’s try it out!
  7. Target Websites - Goodreads
     Fields: Twitter user, rating, book title, book author, Goodreads URL, time
  8. Target Websites - Pandora
     Fields: Twitter user, song, Pandora URL, time
  9. Target Websites - YouTube
     Fields: Twitter user, (video uploader), YouTube URL, time
  10. Mining Experiment
     • But words are wind…
       – 2-week experiment
       – 4 online platforms
  11. Python code + Task Scheduler = Dataset files
     https://github.com/sidooms/Twitter-ratings
  12. The Numbers
     One more thing …
  13. Cross-Domain Rating Dataset
  14. Applications
     • Collect ratings as input for recsys research
     • Cross-domain recsys research
     • Trend detection, analytics, ...
     • Applicable to all social sharing websites
  15. Conclusions
     • Ratings scarcity in research
     • Public datasets are old and synthetic
     • Social sharing = ratings goldmine
     • 2-week experiment, 4 major websites
     • Python code & datasets on GitHub
     • True cross-domain ratings dataset
  16. Mining Cross-Domain Rating Datasets from Structured Data on Twitter (Simon Dooms, @sidooms)
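The Goodreads slide lists the fields recovered from each shared tweet: Twitter user, rating, book title, book author, Goodreads URL and time. The following is a minimal Python sketch showing how structured tweet text of this kind could be turned into a rating record. The tweet template matched by GOODREADS_PATTERN, the BookRating type and parse_goodreads_tweet are all hypothetical illustrations. They are not the code from the Twitter-ratings repository, and the real Goodreads share text may be worded differently.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical share-text template (an assumption, not the verified Goodreads wording):
# "4 of 5 stars to <title> by <author> <url>"
GOODREADS_PATTERN = re.compile(
    r"(?P<rating>[1-5]) of 5 stars to (?P<title>.+?) by (?P<author>.+?) "
    r"(?P<url>https?://\S+)"
)


@dataclass(frozen=True)
class BookRating:
    """One explicit rating mined from a tweet, with the fields named on the slide."""
    user: str
    rating: int
    title: str
    author: str
    url: str
    time: datetime


def parse_goodreads_tweet(user: str, text: str, time: datetime) -> Optional[BookRating]:
    """Return a BookRating if the tweet text matches the assumed template, else None."""
    match = GOODREADS_PATTERN.search(text)
    if match is None:
        return None
    return BookRating(
        user=user,
        rating=int(match.group("rating")),
        title=match.group("title"),
        author=match.group("author"),
        url=match.group("url"),
        time=time,
    )
```

For example, the text "4 of 5 stars to The Hobbit by J.R.R. Tolkien http://example.com/book/1" yields rating 4, title "The Hobbit" and author "J.R.R. Tolkien". Tweets that do not follow the assumed template return None. The title group is non-greedy, so a title that itself contains " by " would be cut short. A real parser would need a more robust rule for that case.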
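One slide summarizes the mining setup as "Python code + Task Scheduler = Dataset files": a script runs periodically and writes what it finds to files. The following is a minimal sketch of the file-writing side only. It assumes that each record carries its tweet ID, so repeated scheduled runs do not store the same tweet twice. The "::"-separated line layout, the Record tuple and both function names are hypothetical and are not taken from the repository.

```python
import os
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (tweet_id, user, item_url, rating, unix_time).
# rating is an empty string for shares that carry no explicit score.
Record = Tuple[str, str, str, str, int]

SEPARATOR = "::"


def load_seen_ids(path: str) -> Set[str]:
    """Collect the tweet IDs already written to the file by earlier runs."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {line.split(SEPARATOR, 1)[0] for line in fh if line.strip()}


def append_new_records(path: str, records: Iterable[Record]) -> int:
    """Append records whose tweet ID is not yet stored; return how many were added."""
    seen = load_seen_ids(path)
    added = 0
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            tweet_id = record[0]
            if tweet_id in seen:
                continue  # already stored by this or an earlier run
            fh.write(SEPARATOR.join(str(field) for field in record) + "\n")
            seen.add(tweet_id)
            added += 1
    return added
```

Keying on the tweet ID makes the append idempotent. If consecutive scheduled runs return overlapping results, each tweet is still stored only once.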
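Every target website's record starts with the Twitter user. The Twitter account can therefore serve as the key that links ratings across domains. The following sketch assumes the per-site files have already been merged into hypothetical (user, domain, item) triples. It finds the users who shared items from at least two domains. Those users are what make the result a cross-domain dataset rather than a set of separate single-domain ones.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical merged input: (twitter_user, domain, item) triples,
# e.g. ("alice", "goodreads", "<book url>").
Triple = Tuple[str, str, str]


def domains_per_user(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Map each Twitter user to the set of domains they shared items from."""
    domains: Dict[str, Set[str]] = defaultdict(set)
    for user, domain, _item in triples:
        domains[user].add(domain)
    return dict(domains)


def cross_domain_users(triples: Iterable[Triple], min_domains: int = 2) -> Set[str]:
    """Return the users active in at least `min_domains` distinct domains."""
    return {
        user
        for user, doms in domains_per_user(triples).items()
        if len(doms) >= min_domains
    }
```

Raising min_domains narrows the result to users who link more platforms at once. This is the subset of interest for cross-domain recommender experiments.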