lastfm crawler

Mini-project result presentation in class

Published in: Education

  1. last.fm crawler: RW vs RWRW. Mário Almeida (malmeida@kth.se), Zafar Gilani (szuhgi@kth.se), Arinto Murdopo (arinto@kth.se)
  2. Outline ● Parameters ● Methodology ● Results ● Challenges ● Conclusion
  3. Parameters: 1) playcounts, 2) playlists, 3) ages, 4) IDs, 5) number of friends (degree). Compare the averages estimated with RW and RWRW.
  4. Methodology: We used the last.fm APIs to obtain ● user info ● the number of friends (degree). RW with UIS-WR: on the fly, we apply the RW formula.
  5. Methodology: For RWRW, we apply the re-weighted formula, in which the weight W_v is set to the user's number of friends (degree).
  6. Results: We crawled for ~10 hours. Number of samples: 48,000. Number of age samples: 36,363 (not all users show their age).
  7. Results - Ages: RW estimates lower values. After about 25k age samples, the average age stabilizes. There is a strong correlation between age and degree.
  8. Results - Playlists: Most users do not have playlists. RW estimates a higher number of playlists. Users with higher degrees tend to have more playlists.
  9. Results - Playcounts: We found some users with playcounts on the order of millions. RW estimates higher playcounts. Users with a higher degree tend to have higher playcounts.
  10. Results - IDs: Not yet stable. RW estimates a lower average ID than RWRW. A user with a lower ID generally has a higher degree.
  11. Results - Degrees: RWRW reduces the bias toward nodes that are more likely to be visited because of their high degree. The resulting estimate is indeed close to the expected degree value.
  12. Conclusion ● A simple random walk in a social network generally results in biased averages. ○ A node with a higher degree has a higher probability of being discovered. ● RWRW normalizes the averages. ○ Large variations do not abruptly affect the estimate. ○ RWRW reduces the biases of RW. ● Low variance means a smaller difference between RW and RWRW. ● Crawling last.fm poses many challenges ○ e.g. users with 0 degree, banned users, huge playcounts.
  13. Questions? Check the code at: ● http://code.google.com/p/lastfm-rwrw/
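The Methodology slides refer to an RW formula and an RWRW formula without spelling them out in text. Assuming the standard forms (a plain sample mean for RW, and a Hansen-Hurwitz style ratio estimator for RWRW), they can be written as follows, where x_i is the attribute (age, playcount, ...) of the i-th sampled user, n is the number of samples, and W_i = d_i is that user's degree:

```latex
\hat{\mu}_{\mathrm{RW}} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\hat{\mu}_{\mathrm{RWRW}} = \frac{\sum_{i=1}^{n} x_i / W_i}{\sum_{i=1}^{n} 1 / W_i},
\quad W_i = d_i
```

Dividing each sample by W_i = d_i undoes the walk's tendency to visit a user in proportion to its degree; if all W_i are equal, the two estimators coincide.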
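A minimal sketch of the RW side of the methodology. It assumes the friend lists have already been fetched into an in-memory dict (in the real crawler they would come from the last.fm API); `graph`, `random_walk` and `rw_running_mean` are hypothetical names, not taken from the project's code. The walk moves to a uniformly chosen friend at each step, and the RW estimate is updated on the fly as a running arithmetic mean.

```python
import random


def random_walk(graph, start, steps, seed=None):
    """Simple random walk: at each step move to a uniformly chosen friend.

    `graph` (hypothetical) maps a user to the list of that user's friends,
    standing in for friend lists fetched from the last.fm API. The walk
    stops early if it reaches a user with no friends.
    """
    rng = random.Random(seed)
    current = start
    visited = []
    for _ in range(steps):
        visited.append(current)
        friends = graph[current]
        if not friends:
            break  # degree-0 user: the walk cannot continue
        current = rng.choice(friends)
    return visited


def rw_running_mean(values):
    """On-the-fly RW estimate: running arithmetic mean of sampled values."""
    total = 0.0
    estimates = []
    for n, x in enumerate(values, start=1):
        total += x
        estimates.append(total / n)
    return estimates
```

For example, on a three-user star where `a` (age 30, 2 friends) is linked to `b` and `c` (age 20, 1 friend each), a 10-step walk from `a` alternates between `a` and a leaf, so the final RW estimate of the average age is 25.0, above the true mean of about 23.3: the hub is oversampled.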
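The RWRW estimate can be sketched in a few lines. This assumes each sample is stored as a (value, weight) pair with the weight set to the user's degree, as the Methodology slide states; `rwrw_estimate` is a hypothetical helper, not a function from the project's code.

```python
def rwrw_estimate(samples):
    """Re-weighted random walk estimate of a mean.

    `samples` is a list of (value, weight) pairs, one per visited user,
    with weight = the user's degree. Each value is weighted by 1/weight
    to undo the random walk's bias toward high-degree users.
    """
    num = 0.0
    den = 0.0
    for value, weight in samples:
        if weight <= 0:
            raise ValueError("weights must be positive; degree-0 users cannot be re-weighted")
        num += value / weight
        den += 1.0 / weight
    return num / den
```

Suppose a walk on a three-user star sampled the hub (age 30, degree 2) five times and the two leaves (age 20, degree 1) five times in total. The plain mean of those samples is 25.0, while `rwrw_estimate` returns 70/3 ≈ 23.33, the true average over the three users. If every sampled value is the same, the weights cancel and both estimates agree, which is consistent with the slides' remark that low variance means a smaller gap between RW and RWRW.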
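The age results mention two practical details: not every user shows an age, and the average only stabilizes after about 25k samples. A sketch of how both could be tracked, assuming users with a hidden age are simply skipped (the slides only say they are missing from the age sample count) and that stability is judged by watching the running RWRW estimate; `running_rwrw` is a hypothetical name.

```python
def running_rwrw(samples):
    """Running RWRW estimate after each usable sample.

    `samples` holds (value, degree) pairs; value is None when the user
    hides the attribute (e.g. age). Such samples, and degree-0 users,
    are skipped (an assumption, not necessarily the project's rule).
    """
    num = 0.0
    den = 0.0
    estimates = []
    for value, degree in samples:
        if value is None or degree <= 0:
            continue  # hidden attribute or unusable weight: nothing to add
        num += value / degree
        den += 1.0 / degree
        estimates.append(num / den)
    return estimates
```

Plotting the returned list against the sample index gives the kind of curve on which one would look for the estimate flattening out.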
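The Degrees result can be checked with a small worked example. When the attribute being averaged is the degree itself, each RWRW term x_i / W_i equals 1, so the estimate reduces to the harmonic mean of the sampled degrees, while a plain RW average tends toward E[d²]/E[d] rather than E[d]. The sketch assumes the walk's samples are just a list of degrees; both function names are hypothetical.

```python
def rw_degree_estimate(degrees):
    """Plain RW average of sampled degrees (biased toward high-degree users)."""
    return sum(degrees) / len(degrees)


def rwrw_degree_estimate(degrees):
    """RWRW average degree with W = degree: the harmonic mean of the samples."""
    return len(degrees) / sum(1.0 / d for d in degrees)


# Star example: hub with degree 2, two leaves with degree 1 (true mean 4/3).
walk_degrees = [2, 1] * 5
print(rw_degree_estimate(walk_degrees))    # 1.5
print(rwrw_degree_estimate(walk_degrees))  # about 1.3333
```

Here the RW value 1.5 equals E[d²]/E[d] = 2 / (4/3) for this graph, while the RWRW value recovers the true average degree of 4/3.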
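The conclusion lists degree-0 and banned users among the crawling challenges. One possible way to keep a walk going, sketched here as an assumption rather than what the project did, is to ignore banned friends and jump back to a seed user whenever the current user has no usable friends. `robust_walk`, `seeds` and `banned` are hypothetical names.

```python
import random


def robust_walk(graph, seeds, steps, banned=frozenset(), seed=None):
    """Random walk that restarts from a seed user at dead ends.

    `graph` maps a user to a friend list; users absent from `graph`,
    users with no friends, and users whose only friends are in `banned`
    are treated as dead ends, after which the walk jumps to a random seed.
    """
    rng = random.Random(seed)
    current = rng.choice(seeds)
    visited = []
    while len(visited) < steps:
        visited.append(current)
        friends = [f for f in graph.get(current, []) if f not in banned]
        if friends:
            current = rng.choice(friends)
        else:
            current = rng.choice(seeds)  # dead end: restart from a seed
    return visited
```

The restarts are a trade-off: they keep the crawl alive, but the jumps mean users are no longer visited exactly in proportion to their degree, so degree-based RWRW weights become an approximation.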
