
lastfm crawler


Mini-project result presentation in class

Published in: Education

  1. last.fm crawler: RW vs RWRW
     Mário Almeida (malmeida@kth.se), Zafar Gilani (szuhgi@kth.se), Arinto Murdopo (arinto@kth.se)
  2. Outline
     ● Parameters
     ● Methodology
     ● Results
     ● Challenges
     ● Conclusion
  3. Parameters
     1. Playcounts
     2. Playlists
     3. Ages
     4. IDs
     5. Number of friends (degrees)
     Compare the averages obtained using RW and RWRW.
  4. Methodology
     We used the last.fm APIs to obtain:
     ● user info
     ● number of friends (degree)
     RW with UIS-WR
     We apply the RW formula on the fly.
  5. Methodology
     For RWRW, we apply the re-weighted formula; the weight Wv is set to the number of friends (degree).
  6. Results
     Crawled for ~10 hours.
     Number of samples: 48,000.
     Number of age samples: 36,363 (not all users show their age).
  7. Results - Ages
     RW estimates lower average age values.
     After about 25k samples, the age stabilizes.
     There is a strong correlation between age and degree.
  8. Results - Playlists
     Most users do not have playlists.
     RW estimates higher numbers of playlists.
     Users with higher degrees tend to have more playlists.
  9. Results - Playcounts
     We found some users with playcounts in the order of millions.
     RW estimates higher playcounts.
     Users with higher degrees tend to have higher playcounts.
  10. Results - IDs
      Not yet stable.
      RW estimates a lower average ID compared to RWRW.
      A user with a lower ID generally has a higher degree.
  11. Results - Degrees
      RWRW reduces the bias toward nodes that are more likely to be visited because of their high degree.
      The RWRW estimate is indeed close to the expected degree value.
  12. Conclusion
      ● A simple random walk in a social network generally results in biased averages.
        ○ A node with a higher degree has a higher probability of being discovered.
      ● RWRW normalizes the averages.
        ○ High variations do not abruptly impact the estimation.
        ○ RWRW reduces the biases of RW.
      ● Low variance means a smaller difference between RW and RWRW.
      ● Crawling last.fm poses many challenges.
        ○ e.g. users with 0 degree, banned users, huge playcounts
  13. Questions
      Check the code at:
      ● http://code.google.com/p/lastfm-rwrw/
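
The comparison between RW and RWRW described in the slides can be illustrated with a small sketch. The formulas from the original slides are not preserved in this transcript, so the code below assumes the commonly used forms: the RW estimate is the plain mean over the sampled nodes, and the RWRW estimate divides each sample by its weight Wv (here the degree) and normalizes by the sum of 1/Wv. It does not call the last.fm API; it runs on a hypothetical synthetic graph (one hub connected to every node of a ring), and all function names are illustrative rather than taken from the authors' code.

```python
import random


def build_hub_ring_graph(n):
    """Hypothetical test graph: node 0 is a hub linked to ring nodes 1..n (n >= 3)."""
    graph = {0: list(range(1, n + 1))}
    for i in range(1, n + 1):
        left = n if i == 1 else i - 1
        right = 1 if i == n else i + 1
        graph[i] = [0, left, right]
    return graph


def random_walk(graph, start, steps, rng):
    """Simple random walk: record the current node, then move to a random friend."""
    node = start
    samples = []
    for _ in range(steps):
        samples.append(node)
        node = rng.choice(graph[node])
    return samples


def rw_average(samples, value):
    """Plain mean over the walk; biased toward high-degree nodes."""
    return sum(value(v) for v in samples) / len(samples)


def rwrw_average(samples, value, weight):
    """Re-weighted mean: sum(value/Wv) / sum(1/Wv), assuming Wv is the degree."""
    # Assumption: samples with a non-positive weight (e.g. 0 friends) are skipped.
    usable = [v for v in samples if weight(v) > 0]
    numerator = sum(value(v) / weight(v) for v in usable)
    denominator = sum(1.0 / weight(v) for v in usable)
    return numerator / denominator


if __name__ == "__main__":
    graph = build_hub_ring_graph(50)

    def degree(v):
        return len(graph[v])

    walk = random_walk(graph, start=1, steps=20000, rng=random.Random(0))
    true_avg = sum(degree(v) for v in graph) / len(graph)
    print("true average degree:", true_avg)
    print("RW estimate:        ", rw_average(walk, degree))
    print("RWRW estimate:      ", rwrw_average(walk, degree, degree))
```

On this graph the hub is visited far more often than its share of the population, so the RW average degree overshoots, while the re-weighted estimate stays close to the true mean. The same two functions could be applied to any per-user value (age, playcount, number of playlists) by passing a different `value` function.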
