lastfm crawler

Mini-project result presentation in class

Published in: Education

Transcript

  • 1. last.fm crawler: RW vs RWRW
    Mário Almeida malmeida@kth.se
    Zafar Gilani szuhgi@kth.se
    Arinto Murdopo arinto@kth.se
  • 2. Outline
    ● Parameters
    ● Methodology
    ● Results
    ● Challenges
    ● Conclusion
  • 3. Parameters
    1. Playcounts
    2. Playlists
    3. Ages
    4. IDs
    5. Number of friends (degrees)
    Compare the averages estimated with RW and RWRW.
  • 4. Methodology
    We used the lastfm APIs to obtain:
    ● user info
    ● number of friends (degree)
    RW with UIS-WR
    We apply the RW formula on the fly, as samples arrive.
  • 5. Methodology
    For RWRW, we apply the RWRW formula, where the weight Wv is set to the number of friends (degree).
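
The formulas from the methodology slides did not survive in the transcript. As a minimal sketch, assuming RW is the plain average over visited users and RWRW is the usual re-weighted average in which each sample is weighted by 1 / Wv (with Wv the user's degree), both estimates can be maintained on the fly. The class and method names below are hypothetical and not taken from the authors' code.

```python
class RunningEstimates:
    """Keeps running RW and RWRW averages of one user attribute."""

    def __init__(self):
        self.count = 0            # number of samples seen so far
        self.plain_sum = 0.0      # sum of values (RW numerator)
        self.weighted_sum = 0.0   # sum of value / degree (RWRW numerator)
        self.weight_total = 0.0   # sum of 1 / degree (RWRW denominator)

    def add(self, value, degree):
        """Record one visited user; degree plays the role of the weight Wv."""
        if degree <= 0:
            raise ValueError("degree must be positive for a visited user")
        self.count += 1
        self.plain_sum += value
        self.weighted_sum += value / degree
        self.weight_total += 1.0 / degree

    def rw(self):
        """Plain average over the walk (assumed RW estimate)."""
        return self.plain_sum / self.count

    def rwrw(self):
        """Average with weights 1 / degree (assumed RWRW estimate)."""
        return self.weighted_sum / self.weight_total


# Made-up (value, degree) pairs, e.g. playlists and number of friends.
estimates = RunningEstimates()
for value, degree in [(10, 1), (20, 2), (40, 4)]:
    estimates.add(value, degree)
print(estimates.rw(), estimates.rwrw())
```

Because only four running totals are stored, the estimates can be read at any point during a long crawl without keeping every sample in memory.
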
  • 6. Results
    Crawled for ~10 hours.
    Number of samples: 48000
    Number of age samples: 36363 (not all users show their age)
  • 7. Results - Ages
    RW estimates lower average age values.
    After about 25k samples, the age stabilizes.
    There is a big correlation between age and the degree.
  • 8. Results - Playlists
    Most users do not have playlists.
    RW estimates higher numbers of playlists.
    Users with higher degrees tend to have more playlists.
  • 9. Results - Playcounts
    We found some users with playcounts on the order of millions.
    RW estimates higher playcounts.
    Users with higher degrees tend to have higher playcounts.
  • 10. Results - IDs
    Not yet stable.
    RW estimates a lower average ID compared to RWRW.
    A user with a lower ID generally has a higher degree.
  • 11. Results - Degrees
    RWRW reduces the bias toward nodes that are more likely to be visited because of their high degree.
    The RWRW estimate is indeed close to the expected degree value.
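
The degree bias described in the degree results can be reproduced on a small scale. The sketch below is an illustration on a made-up six-node graph, not the last.fm data: it runs a simple random walk and compares the plain (RW) average degree with the average re-weighted by 1 / degree. For this toy graph the true average degree is 2, while a long walk visits nodes in proportion to their degree, so the plain average should approach 36 / 12 = 3. The graph, seed and function names are assumptions chosen for the example.

```python
import random


def random_walk(graph, start, steps, rng):
    """Return the nodes visited by a simple random walk of `steps` moves."""
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(graph[node])  # move to a uniformly chosen neighbour
        visited.append(node)
    return visited


# Toy undirected graph: node 0 is a hub, nodes 3-5 are leaves.
graph = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
    5: [0],
}

rng = random.Random(42)  # fixed seed so the run is repeatable
walk = random_walk(graph, start=0, steps=20000, rng=rng)
degrees = [len(graph[node]) for node in walk]

true_mean = sum(len(neighbours) for neighbours in graph.values()) / len(graph)
rw_estimate = sum(degrees) / len(degrees)
# Re-weighted estimate of the degree: sum(d / d) / sum(1 / d).
rwrw_estimate = len(degrees) / sum(1.0 / d for d in degrees)

print("true:", true_mean, "RW:", rw_estimate, "RWRW:", rwrw_estimate)
```
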
  • 12. Conclusion
    ● A simple random walk in a social network generally results in biased averages.
      ○ A node with a higher degree has a higher probability of being discovered.
    ● RWRW normalizes the averages.
      ○ High variations do not abruptly impact the estimation.
      ○ RWRW reduces the biases of RW.
    ● Low variance means a smaller difference between RW and RWRW.
    ● Crawling lastfm poses many challenges
      ○ e.g.: 0 degree, banned user, huge playcounts
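
The crawling challenges listed in the conclusion (users with 0 degree, banned users) have to be handled somewhere in the walk loop. The slides do not say how the authors did this, so the following is only one possible policy, sketched under assumptions: `fetch_friends` is a hypothetical callable standing in for the API lookup, `UserUnavailable` is a hypothetical error for banned or unreadable profiles, and an unusable neighbour is skipped by drawing another one. Skipping neighbours this way slightly changes which users the walk favours, so it is a pragmatic workaround rather than an exact sampler.

```python
import random


class UserUnavailable(Exception):
    """Hypothetical error for a banned or otherwise unreadable profile."""


def crawl(seed, fetch_friends, steps, rng, max_retries=10):
    """Random-walk crawl that records (user, degree) and skips unusable users."""
    current = seed
    friends = fetch_friends(current)
    if not friends:
        raise ValueError("seed user has no friends; the walk cannot start")
    samples = []
    for _ in range(steps):
        samples.append((current, len(friends)))
        for _attempt in range(max_retries):
            candidate = rng.choice(friends)
            try:
                candidate_friends = fetch_friends(candidate)
            except UserUnavailable:
                continue  # banned user: draw another neighbour
            if candidate_friends:
                current, friends = candidate, candidate_friends
                break
        # If every retry failed, the walk stays on the current user.
    return samples


# Toy stand-in for the friendship lookup, with one banned user.
network = {"a": ["b", "banned", "c"], "b": ["a", "c"], "c": ["a", "b"]}


def toy_fetch_friends(user):
    if user == "banned":
        raise UserUnavailable(user)
    return network[user]


samples = crawl("a", toy_fetch_friends, steps=50, rng=random.Random(0))
print(samples[:5])
```
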
  • 13. Questions
    Check the code in:
    ● http://code.google.com/p/lastfm-rwrw/
