Mining the social web for music-related data: a hands-on tutorial

Slides for the Tutorial given as part of the International Symposium on Music Information Retrieval (ISMIR '09) in Kobe, October 2009


  1. 1. Mining the social web for music-related data: a hands-on tutorial
      Claudio Baccigalupo & Benjamin Fields
      ISMIR 2009 – Kobe (Japan)
  2. 2. Welcome!
      Claudio Baccigalupo, IIIA–CSIC, Barcelona, Spain
      Ben Fields, Goldsmiths, University of London, UK
      ismir2009.benfields.net
  3. 3. It’s a hands-on tutorial
      You will write code for real MIR applications:
        1. Evaluating hypotheses
        2. Comparing lyrics by genre
        3. Revealing trends
        4. Performing audio analysis
        5. Capturing social data
        6. Collecting feedback
      exploring multiple languages and web sites.
  4. 4. Source code archive
      All the code examples are included in the file: ismir2009.benfields.net/tutorial.zip
      Unzip the archive and open a shell in its folder. Then check that Python and Ruby are installed:
          $ ruby --version
          $ python --version
      Or download them from python.org and ruby-lang.org
  5. 5. #1 EVALUATING HYPOTHESES
  6. 6. The evaluation process
      Say you have built the “ultimate genre recogniser”:
        song → genre recogniser → genre
      How would you evaluate its precision rate?
        1. Build a local collection of varied songs
        2. Assign each of them a genre label
        3. Run the algorithm and check its output
      A cumbersome, boring process!
  7. 7. The traditional approach
      Evaluate with a few manually labelled examples:
          $ cd <PACKAGE PATH>/c
          $ python isrock.py ../m/rock.mp3            =» True
          $ python isrock.py ../m/metal.mp3           =» True
          $ python isrock.py ../m/vocal.mp3           =» False
          $ python isrock.py ../m/experimental.mp3    =» False
      The “ultimate” recogniser or just a coincidence?
  8. 8. The web-based approach
      The web contains thousands of genre-classified songs that can be legally downloaded for free
      Jamendo alone includes 170K songs by 14K artists
  9. 9. From browser to web API
      A web API lets you retrieve data from a site in a compact format by means of simple queries
        Browser: jamendo.com/en/albums
        Web API: api.jamendo.com/get2/name+artist_name/album/plain/?order=ratingmonth_desc
  10. 10. From browser to web API
      Documentation at: developer.jamendo.com
      API query to retrieve 50 random Rock songs:
          api.jamendo.com/get2/stream/track/plain/?tag_idstr=rock&n=50&order=random_desc
  11. 11. Creating a Python script
      Verify the genre recogniser on Jamendo tracks:
          $ python
          # Python 2 syntax, as used in the tutorial
          from urllib import urlopen
          from isrock import isRock
          # ask Jamendo for the stream URLs of 50 random tracks tagged "rock"
          query = "http://api.jamendo.com/get2/stream/track/plain/?n=50&tag_idstr=rock&order=random_desc"
          result = urlopen(query).read()
          songs = result.split()
          # run the recogniser on each track and report the share classified as rock
          rock = [isRock(song) for song in songs]
          print "The ratio of rock songs is: %.2f" % (float(rock.count(True))/len(songs))
  12. 12. Evaluating the genre recogniser
      The code is included in the file c/jamendo_1.py:
          $ python jamendo_1.py
          =» The ratio of rock songs is: 0.58
      The script c/jamendo_2.py lets you specify the number of tests and multiple genres at once:
          $ python jamendo_2.py 30 rock jazz country rnb
      The result shows that the isRock recogniser is not able to distinguish Rock from other genres
  13. 13. Lessons learnt
      Music data can easily be retrieved from the web
      Thousands of songs can be downloaded for free
      Songs in Jamendo already have a genre label attached, so you do not have to assign one yourself
      Working with a web API simplifies the process
      Different musical objects are available to work on (songs, artists, albums, playlists, users)
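The evaluation in part #1 can be sketched in Python 3 (the slides use Python 2). This is a minimal sketch, not the tutorial's `jamendo_2.py`: the query URL mirrors the one shown on the slides, while `build_query`, `ratio_accepted`, `evaluate`, the stand-in classifier, and the `example.org` URLs are hypothetical names introduced for illustration. No network call is made.

```python
from urllib.parse import urlencode

# Endpoint as shown on the slides; whether it is still online is not assumed.
JAMENDO_TRACKS = "http://api.jamendo.com/get2/stream/track/plain/"


def build_query(genre, n=50):
    """Build the query URL for n random tracks tagged with the given genre."""
    params = {"n": n, "tag_idstr": genre, "order": "random_desc"}
    return JAMENDO_TRACKS + "?" + urlencode(params)


def ratio_accepted(track_urls, classifier):
    """Fraction of tracks for which classifier(url) returns True."""
    if not track_urls:
        raise ValueError("no tracks to evaluate")
    hits = sum(1 for url in track_urls if classifier(url))
    return hits / len(track_urls)


def evaluate(tracks_by_genre, classifier):
    """Map each genre to the share of its tracks accepted by the classifier."""
    return {genre: ratio_accepted(urls, classifier)
            for genre, urls in tracks_by_genre.items()}


# Stand-in classifier for the demo (hypothetical; the tutorial's isRock is not used).
def stub_is_rock(url):
    return "rock" in url


# Placeholder track URLs, grouped by the genre label they were retrieved under.
sample = {
    "rock": ["http://example.org/rock-1.mp3", "http://example.org/rock-2.mp3"],
    "jazz": ["http://example.org/jazz-1.mp3", "http://example.org/rock-jazz.mp3"],
}

print(build_query("rock"))
for genre, ratio in evaluate(sample, stub_is_rock).items():
    print("%s: %.2f classified as rock" % (genre, ratio))
```

A recogniser that works should score high on the rock tracks and low on every other genre; similar ratios across genres suggest it is not discriminating at all.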
  14. 14. #2 COMPARING LYRICS BY GENRE
  15. 15. The relevance of lyrics
      Recent interest in lyrics-based analysis:
        1. Knees, Schedl, Widmer, Multiple Lyrics Alignment: Automatic Retrieval of Song Lyrics, 2005
        2. Geleijnse, Korst, Efficient Lyrics Extraction from the web, 2006
        3. Kleedorfer, Knees, Pohle, Oh Oh Oh Whoah! Towards Automatic Topic Detection in Song Lyrics, 2008
        4. Mayer, Neumayer, Rauber, Rhyme and Style Features for Musical Genre Categorisation By Song Lyrics, 2008
      In these works, lyrics were retrieved without using any web API
  16. 16. An online song lyrics database
      Lyricsfly provides a web API to retrieve lyrics
        Browser: lyricsfly.com/search/view.php?1524965aad&view=578812
        Web API: lyricsfly.com/api/api.php?i=KEY&a=Rihanna&t=Umbrella
  17. 17. Creating a Ruby script
      To retrieve the lyrics for “Umbrella” (Rihanna), first store your API key in a Ruby file:
          $ echo '$lyricsfly_key = "PASTE YOUR KEY HERE"' > lyricsfly_key.rb
  18. 18. Creating a Ruby script
      To retrieve the lyrics for “Umbrella” (Rihanna):
          $ irb
          require 'net/http'
          require 'rexml/document'
          require './lyricsfly_key'   # defines $lyricsfly_key
          url = "http://lyricsfly.com/api/api.php?a=Rihanna&t=Umbrella&i=#{$lyricsfly_key}"
          result = Net::HTTP.get_response(URI.parse(url))
          # the lyrics text is in the first <tx> element of the XML response
          response = REXML::Document.new(result.body).elements['//tx']
          puts response.text
  19. 19. Retrieving multiple lyrics
      The code is included in the file c/lyricsfly_1.rb:
          $ ruby lyricsfly_1.rb
          =» You have my heart[br] And we'll never be ...
      The script c/lyricsfly_2.rb lets you specify the artist name and track title:
          $ ruby lyricsfly_2.rb "John Lennon" Imagine
          =» Imagine there's no Heaven
             It's easy if you try
             No Hell below us ...
  20. 20. Lyrics-based analysis
      Mayer, Neumayer, Rauber, Rhyme and Style Features for Musical Genre Categorisation By Song Lyrics, 2008
      Textual features of lyrics are related to the genre
      Hip-hop lyrics have more ‘?’ than Country ones
      Evaluated on 29 Hip-hop and 41 Country songs
      Does this hold with larger data sets?
  21. 21. Repeating the experiment
      ‘Country’ and ‘Hip-hop’
        → music web API #1: List songs by genre
        → music web API #2: List lyrics by genre
        → Count ‘?’ by genre
  22. 22. Retrieving songs by genre
      Last.fm has 4M songs classified by tags/genres
        Browser: last.fm/music/+tag/country
        API documentation: last.fm/api/show?service=285
  23. 23. Retrieving songs by genre
      Last.fm has 4M songs classified by tags/genres
        Browser: last.fm/music/+tag/country
        Web API: ws.audioscrobbler.com/2.0/?method=tag.gettoptracks&tag=disco&api_key=KEY
  24. 24. Combining two music web APIs
      The code is included in the file c/lyricsfly_3.rb:
          require 'net/http'
          require 'rexml/document'
          require "#{File.dirname(__FILE__)}/lyricsfly_key"
          require "#{File.dirname(__FILE__)}/lastfm_key"

          # Return the lyrics for an [artist, title] pair, or nil if none are found
          def get_lyrics(artist_and_title)
            # replace every non-alphanumeric character with %25 (a URL-encoded '%')
            artist, title = artist_and_title.collect{|arg| arg.gsub(/[^a-zA-Z0-9]/, '%25')}
            url = "http://lyricsfly.com/api/api.php?"
            url += "a=#{artist}&t=#{title}&i=#{$lyricsfly_key}"
            result = Net::HTTP.get_response(URI.parse(url))
            response = REXML::Document.new(result.body).elements['//tx']
            response.text.gsub("[br]", "") unless response.nil?
          end
  25. 25. Combining two music web APIs
          # Return a list of [artist, title] pairs for the top tracks of a genre tag
          def get_artists_and_titles(genre)
            url = "http://ws.audioscrobbler.com/2.0/?method="
            url += "tag.gettoptracks&tag=#{genre}&api_key=#{$lastfm_key}"
            result = Net::HTTP.get_response(URI.parse(url))
            response = REXML::Document.new(result.body)
            response.elements.collect('//track') do |track|
              [ track.elements['artist'].elements['name'].text, track.elements['name'].text ]
            end unless response.nil?
          end

          # For each genre on the command line, average the '?' count over the lyrics found
          ARGV.each do |genre|
            tracks = get_artists_and_titles(genre)
            lyrics = tracks.collect{|track| get_lyrics(track)}.compact
            qm = lyrics.inject(0.0) {|sum, lyric| sum + lyric.count("?")}
            p "#{genre} avg question marks: %.2f" % (qm/lyrics.length)
          end
      Finally:
          $ ruby lyricsfly_3.rb country hip-hop
  26. 26. Lessons learnt
      Hip-hop lyrics have more “?” than Country ones
      Any programming language with libraries to retrieve pages and parse XML can do the work
      Data from different web APIs can be aggregated
      A mash-up application can uncover hidden musical relationships among different domains
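The final counting step of part #2 can be sketched in Python as follows. This is a minimal sketch that assumes the lyrics have already been fetched: `average_question_marks` and `compare_genres` are hypothetical names, the lyric strings are invented placeholders rather than real lyrics, and `None` stands for a song whose lyrics could not be found (the Ruby script drops those with `compact`).

```python
def average_question_marks(lyrics):
    """Mean number of '?' per song, ignoring songs with missing lyrics (None).

    Returns None when no lyrics were found at all, instead of dividing by zero.
    """
    found = [text for text in lyrics if text is not None]
    if not found:
        return None
    return sum(text.count("?") for text in found) / len(found)


def compare_genres(lyrics_by_genre):
    """Map each genre to its average number of question marks per song."""
    return {genre: average_question_marks(lyrics)
            for genre, lyrics in lyrics_by_genre.items()}


# Invented placeholder lyrics, only to exercise the functions.
sample = {
    "hip-hop": ["Who is it? What is it? Why?", "Where you at?", None],
    "country": ["Down the old road again", "Will you wait for me?"],
}

for genre, average in compare_genres(sample).items():
    print("%s avg question marks: %s" % (genre, average))
```

Keeping the counting separate from the two API calls makes it easy to rerun the comparison on cached lyrics without querying either service again.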
  27. 27. A more advanced mash-up
      ismir2009.benfields.net/gmapradio
  28. 28. From instance to concept
      There is no limit to the chain of API calls
      To connect even more resources, unique identifiers work better than ambiguous names
      Many web sites identify musical objects through a specific set of MusicBrainz IDs, which make it easy to match the same item in multiple places
  29. 29. Music and web ontologies
      The Linking Open Data project is a prominent attempt at expressing and connecting objects of different domains using semantic web technology
      linkeddata.org
      musicontology.com
  30. 30. #3 REVEALING TRENDS
  31. 31. What is trendy?
      “Trend” is related to a specific time and context
      Anything that is rapidly becoming famous in your environment (your friends, your location, …)
      Mavens are the first to pick up on nascent trends
      Music example: which artists should you be listening to now, to keep up with the latest trends?
  32. 32. Trendy artist of the month
      Your friends:
        Last month they mostly listened to: [image]
        This month they are mostly listening to: [image]
      This kind of information can be retrieved from music-related web communities such as Last.fm
  33. 33. Hiding API calls with wrappers
      Last.fm provides for each user the list of friends and the most played artists in a given period
      These data can be retrieved via the Last.fm API or, more easily, using the Python wrapper pylast, available at code.google.com/p/pylast:
          $ wget http://pypi.python.org/packages/source/p/pylast/pylast-0.4.15.tar.gz
          $ gunzip < pylast-0.4.15.tar.gz | tar xvf -
          $ cd pylast-0.4.15
          $ sudo python setup.py install
  34. 34. Retrieving lists of friends
      API wrappers abstract the functions that make HTTP calls to send and receive information
      Using the pylast wrapper, the code to obtain lists of friends from Last.fm is compact and clear:
          $ python
          import pylast
          from lastfm_key import lastfm_key
          api = pylast.get_lastfm_network(lastfm_key)
          friends = api.get_user("claudiob").get_friends()
          print "Last.fm friends: %s" % friends
  35. 35. Extracting trendy artists of the month
      To reveal which artists a user should listen to:
        1. Retrieve the list of friends of that user
        2. Retrieve the artists most played by the friends during this month and the previous one, printing those who have ‘grown’ the most in this period while excluding artists the user is already aware of
      The code is included in the file c/lastfm_2.py:
          $ python lastfm_2.py claudiob
          =» Trendy artists for claudiob:
             1) Amy Winehouse, already known by daddyrho, recently discovered by kobra_cccpozzi, pilomatic, econ-luca, ...
  36. 36. Lessons learnt
      Social data from the web helps uncover trends
      Data for trends imply a temporal dimension, a context (friends, geographical location, …) and a class of objects (artists, tracks, …) to observe
      More transparent and ‘human’ than using collaborative filtering for recommendations
      API wrappers make the code shorter and clearer
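The trend detection of part #3 can be sketched in Python without calling Last.fm. This is a sketch under stated assumptions, not the logic of `c/lastfm_2.py`: "growth" is taken to mean the increase in the number of friends who have an artist among their most played, and `listeners`, `trendy_artists`, the friend names, and the artist names are all hypothetical.

```python
def listeners(charts):
    """Map each artist to the set of friends who list it among their most played.

    charts: {friend: [artist, ...]} for one month.
    """
    result = {}
    for friend, artists in charts.items():
        for artist in artists:
            result.setdefault(artist, set()).add(friend)
    return result


def trendy_artists(previous, current, known=()):
    """Rank artists by how many listeners they gained among the friends.

    previous, current: {friend: [artist, ...]} for the previous and current month.
    known: artists the user already listens to, which are excluded.
    Returns a list of (artist, gained_listeners), largest gain first.
    """
    before, now = listeners(previous), listeners(current)
    growth = {}
    for artist, fans in now.items():
        if artist in known:
            continue
        gained = len(fans) - len(before.get(artist, set()))
        if gained > 0:
            growth[artist] = gained
    # Sort by descending gain, then alphabetically to keep the order stable.
    return sorted(growth.items(), key=lambda item: (-item[1], item[0]))


# Placeholder charts for three hypothetical friends.
last_month = {"ann": ["Artist A"], "bob": ["Artist B"]}
this_month = {"ann": ["Artist A", "Artist C"],
              "bob": ["Artist A", "Artist C"],
              "cat": ["Artist A", "Artist B"]}

for artist, gained in trendy_artists(last_month, this_month, known={"Artist B"}):
    print("%s gained %d listeners" % (artist, gained))
```

With pylast, the two dictionaries would be filled from each friend's top artists for the two periods; only the comparison logic is shown here.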
  37. 37. #4 PERFORMING AUDIO ANALYSIS
  38. 38. The web as a source of tools
      How do you extract acoustic features of a song?
        1. Write your own code
        2. Use a software package
        3. Retrieve them from a web site: echonest.com/analyze
  39. 39. Estimating the tempo of a song
      Acoustic analysis performed through a web API (developer.echonest.com/pages/overview):
        upload: ‘Upload a track to The Echo Nest’s analyzer for analysis and later retrieval of track information’
        get_tempo: ‘Retrieve the overall estimated tempo of a track in beats per minute after previously calling for analysis via upload’
      Authentication is required
  40. 40. Creating a Ruby script
      Estimate tempo for the track m/120bpm.mp3:
          $ irb
          require 'net/http'
          require 'rexml/document'
          require './echonest_key'   # defines $echonest_key
          song = 'http://ismir2009.benfields.net/m/120bpm.mp3'
          url = 'http://developer.echonest.com/api/upload'
          # submit the track URL for analysis
          result = Net::HTTP.post_form(URI.parse(url),
            {'api_key' => $echonest_key, 'version' => '3', 'url' => song})
  41. 41. Creating a Ruby script
      Estimate tempo for the track m/120bpm.mp3:
          # read the track id returned by the upload call
          song_id = REXML::Document.new(result.body).elements['//track'].attributes['id']
          url = 'http://developer.echonest.com/api/get_tempo'
          url += "?id=#{song_id}"
          url += "&version=3&api_key=#{$echonest_key}"
          result = Net::HTTP.get_response(URI.parse(url))
          tempo = REXML::Document.new(result.body).elements['//tempo'].text
          puts "The estimated tempo is #{tempo} BPM"
  42. 42. Complete audio analysis
      The code is included in the file c/echonest_1.rb:
          $ ruby echonest_1.rb
          =» The estimated tempo is 120.013 BPM
      The script c/echonest_2.rb lets you specify the track location and estimates more features:
          $ ruby echonest_2.rb http://ismir2009.benfields.net/m/120bpm.mp3
          =» "time_signature"=> 4, "mode"=> 1, "key"=> 5, "tempo"=> 120
      developer.echonest.com/forums/thread/9
  43. 43. Minor/major vs. sad/happy
      Songs in minor are sad
      Songs in major are happy
      Would you agree?
      ‘Sad’ and ‘Happy’
        → List songs by mood
        → List modes by mood
        → Compare minor/major
  44. 44. Running the experiment
      The code is included in the file c/echonest_3.rb:
          $ ruby echonest_3.rb sad happy
          =» sad songs are 0.25 major, 0.75 minor
             happy songs are 1.00 major, 0.00 minor
      Repeating the experiment with more songs can serve as a proper evaluation of the statement
      Do not submit too many simultaneous queries!
  45. 45. Lessons learnt
      The web makes available both musical data and tools for acoustic analysis
      Even DJs can use web-based tools to remix
      Symbolic analysis not available… yet?
      The future of music software is on the web
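The minor/major comparison of part #4 reduces to counting modes per mood. The sketch below assumes the modes have already been retrieved and are encoded as in the Echo Nest output shown on the slides, with 1 for major and 0 for minor; `mode_shares`, `report`, and the sample values are hypothetical, chosen only to reproduce the shape of the output printed by `echonest_3.rb`.

```python
def mode_shares(modes):
    """Return (major_share, minor_share) for a list of modes (1 = major, 0 = minor)."""
    if not modes:
        raise ValueError("no songs analysed")
    major = sum(1 for mode in modes if mode == 1) / len(modes)
    return major, 1.0 - major


def report(modes_by_mood):
    """Format one summary line per mood, in the style of the tutorial output."""
    lines = []
    for mood, modes in modes_by_mood.items():
        major, minor = mode_shares(modes)
        lines.append("%s songs are %.2f major, %.2f minor" % (mood, major, minor))
    return lines


# Placeholder modes: four 'sad' songs and two 'happy' songs.
sample = {"sad": [0, 0, 0, 1], "happy": [1, 1]}

for line in report(sample):
    print(line)
```

With so few songs per mood the shares swing widely, which is why the slides recommend repeating the experiment on a larger set before drawing conclusions.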
  46. 46. #5 CAPTURING SOCIAL DATA
  47. 47. Social networks for musicians
      Web sites for musician-to-musician networking:
        1. Grant access to an artist’s public music
        2. Record relationships among musicians in the same network
        3. Provide social data in the domain of music
      Can be very useful for music informatics
  48. 48. A different kind of musical resource
      SoundCloud, an advanced music-sharing platform
        Browser: soundcloud.com
        Web API: soundcloud.com/api/console
  49. 49. Retrieving lists of followers
      Install the Python wrapper for the SoundCloud API:
          $ git clone git://github.com/soundcloud/python-api-wrapper.git
          $ cd python-api-wrapper
          $ sudo python setup.py install
      The code is included in the file c/soundcloud_1.py:
          import scapi
          from soundcloud_oauth import init_scope
          root = init_scope()
          user = root.users("bfields")
          for friend in user.followings():
              print "Following %s" % friend["username"]
  50. 50. A different type of authentication
      SoundCloud authenticates with the OAuth protocol:
        c/soundcloud_1.py runs the full protocol
        c/soundcloud_2.py includes a valid token only for this application
          $ python soundcloud_2.py
          =» bfields is following: Forss, atl, stunna, ...
  51. 51. Plotting networks of friends
      Install the igraph library and Python wrapper from igraph.sf.net/download.html and pypi.python.org/pypi/python-igraph, and test by drawing a simple graph into a file:
          $ python
          import igraph
          g = igraph.Graph(n=2, edges=[(0,1)])
          g.write_svg("test.svg", g.layout("kk"))
          =» [test.svg]
  52. 52. Plotting networks of friends
      The code included in the file c/soundcloud_3.py:
        1. Adds a vertex for a given seed user
        2. Gets from SoundCloud the list of people the user “follows”
        3. For each of these persons:
           • Recursively restarts from 1. until the desired level of depth
           • Adds an edge connecting the person to the seed user
          $ python soundcloud_3.py bfields 2    =» [image]
  53. 53. Plotting networks of friends
      The same script with a depth of 3:
          $ python soundcloud_3.py bfields 3    =» [image]
  54. 54. Plotting as a web service
      How to plot a snapshot of the full network?
      c/soundcloud_4.py collects the full network of SoundCloud friends (takes a long time!)
      LaNet-vi offers a web service that draws large-scale networks of data (no software required!)
      xavier.informatics.indiana.edu/lanet-vi/
  55. 55. Plotting as a web service
      ismir2009.benfields.net/m/soundcloud_16k.png
  56. 56. Lessons learnt
      Musicians relate to each other in online communities (as friends, followers, …)
      Social networks can easily be extracted and plotted, either partially or completely
      Plotting can also be outsourced to a web service
      Researchers can benefit in several applications: playlist generation, recommender systems, …
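The network collection of part #5 can be sketched in Python without contacting SoundCloud. This is not the code of `c/soundcloud_3.py`: it replaces the recursion described on the slides with an equivalent breadth-first loop, and `crawl`, `to_indexed`, `get_followings`, and the toy network are hypothetical. The index pairs it produces have the shape expected by the `edges` argument of `igraph.Graph` used on the slides.

```python
def crawl(seed, depth, get_followings):
    """Collect the 'following' network around seed, up to depth hops away.

    get_followings: function mapping a user name to the list of users they follow
    (in the tutorial this would wrap the SoundCloud API call).
    Returns (vertices, edges), with edges as (follower, followed) name pairs.
    """
    vertices = [seed]
    edges = []
    seen = {seed}
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for user in frontier:
            for followed in get_followings(user):
                edges.append((user, followed))
                if followed not in seen:
                    seen.add(followed)
                    vertices.append(followed)
                    next_frontier.append(followed)
        frontier = next_frontier
    return vertices, edges


def to_indexed(vertices, edges):
    """Convert name pairs to integer index pairs, as a graph library expects."""
    index = {name: i for i, name in enumerate(vertices)}
    return [(index[source], index[target]) for source, target in edges]


# Toy network standing in for SoundCloud data.
network = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}


def lookup(user):
    return network.get(user, [])


vertices, edges = crawl("a", 2, lookup)
print(vertices)
print(to_indexed(vertices, edges))
```

Tracking the `seen` set keeps each user from being expanded twice, which matters on a real network where follow relationships form cycles.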
  57. 57. #6 COLLECTING FEEDBACK
  58. 58. Subjective evaluation
      Researchers need feedback in many scenarios:
        Recommender systems
        Automatic composition
        Mood-based analysis
        …and more
      Setting up a web survey has some drawbacks!
        isolated
        hard to share
        can vote twice (or never)
        requires personal data
  59. 59. Advantages of social networks
      Benefits of publishing the survey on Facebook:
        1. Collect personal data without filling in any form
        2. Explore social connections between users
        3. Share and publish surveys through networks of friends
        4. Potentially reach millions of users in a friendly environment
        5. Attract more people with a game-styled application
  60. 60. Advantages of social networks
      apps.facebook.com/herd-it
  61. 61. Creating a PHP survey
      The code is included in the file c/facebook_1.php:
          <?php if(!($vote = @$_GET['vote'])) { ?>
          <object type="application/x-shockwave-flash" height="30"
            data="mp3player.swf?autoplay=true&amp;song_url=http://ismir2009.benfields.net/m/saxex.mp3"></object>
          <form>
            <b>Personal data:</b><br />
            Your name: <input type="text" name="first_name" size="17">
                       <input type="text" name="last_name" size="17">
            Your birth year: <input type="text" name="birth_year" size="4">
            Your sex: <input type="radio" name="sex" value="male"> Male
                      <input type="radio" name="sex" value="female"> Female
            Your current home-town: <input type="text" name="city" size="25">
            <b>Survey:</b><br />
            This song was performed by a <input type="submit" value="Human" name="vote" />
            or by a <input type="submit" value="Robot" name="vote" />?
          </form>
  62. 62. Creating a PHP survey
      The code is included in the file c/facebook_1.php:
          <?php } else {
            $info = $_GET;
            $data_file = "/tmp/fb_survey.txt";
            // append one pipe-delimited record per vote
            $text  = "|". date("Y.m.d H:i:s");
            $text .= "|". $vote;
            $text .= "|". $info['first_name'] ." ". $info['last_name'];
            $text .= "|". $info['sex'];
            $text .= "|". $info['birth_year'];
            $text .= "|". $info['city'];
            $file = fopen($data_file, "a");
            fwrite($file, $text ."\n");
            fclose($file);
            // re-read the log and count the votes recorded so far
            $file = file_get_contents($data_file);
            $human = substr_count($file, '|Human|');
            $robot = substr_count($file, '|Robot|');
            echo "Votes so far: Human ". $human ." - Robot ". $robot;
            echo "<pre>". $file ."</pre>";
          } ?>
  63. 63. Creating a Facebook application
        1. Create a Facebook account
        2. Navigate to facebook.com/developers/apps.php
        3. Add the Developer application to the Facebook profile
        4. Set up a new application
  64. 64. Setting up the environment
        5. Copy and paste the application keys to c/facebook_key.php:
          <?php
          $app_id = "142884214891";
          $api_key = "33e8acab5343ebfc3bc545c57c81c7e0";
          $secret_key = "d2721fe8b3276bd24a2d5a0d62b3f518";
          ?>
        6. Download and unzip the Facebook PHP client library from facebook.com/developers/apps.php to the c/ folder
        7. Add the following lines at the top of c/facebook_1.php:
          <?php
          require_once('facebook_key.php');
          require_once("facebook-platform/php/facebook.php");
          $facebook = new Facebook($api_key, $secret_key);
          $user = $facebook->require_login(); ?>
  65. 65. Creating a Facebook survey in PHP
        8. Modify the rest of c/facebook_1.php as follows (the <fb:mp3> tag replaces the Flash <object> player):
          <?php if(!($vote = @$_GET['vote'])) { ?>
          <fb:mp3 src="http://ismir2009.benfields.net/m/saxex.mp3"
            title="Autumn Leaves" album="Autumn Leaves" artist="Human or Robot?" />
          <form>
            <b>Personal data:</b><br />
            Your name: <input type="text" name="first_name" size="17">
                       <input type="text" name="last_name" size="17">
            Your birth year: <input type="text" name="birth_year" size="4">
            Your sex: <input type="radio" name="sex" value="male"> Male
                      <input type="radio" name="sex" value="female"> Female
            Your current home-town: <input type="text" name="city" size="25">
            <b>Survey:</b><br />
            This song was performed by a <input type="submit" value="Human" name="vote" />
            or by a <input type="submit" value="Robot" name="vote" />?
          </form>
  66. 66. Creating a Facebook survey in PHP
        9. Modify the rest of c/facebook_1.php as follows (the personal data now come from Facebook instead of $_GET):
          <?php } else {
            $info = reset($facebook->api_client->users_getStandardInfo($user,
              array('first_name', 'last_name', 'sex', 'birthday', 'current_location')));
            $info['birth_year'] = isset($info['birthday']) ?
              date("Y", strptime($info['birthday'], "%m %d, %Y")) : '';
            $info['city'] = isset($info['current_location']) ?
              $info['current_location']['city'] : '';
            $data_file = "/tmp/fb_survey.txt";
            $text  = "|". date("Y.m.d H:i:s");
            $text .= "|". $vote;
            [...]
            echo "Votes so far: Human ". $human ." - Robot ". $robot;
            echo "<pre>". $file ."</pre>";
          } ?>
  67. 67. Deploying to Facebook
      Upload the code from the file c/facebook_2.php to a web server and update the Facebook settings:
        Click on the canvas tab
        Fill in the desired web address
        Specify the location of the code
        Select FBML as render method
        Save changes
        Visit the application page
  68. 68. Deploying to Facebook
      apps.facebook.com/CANVAS_PAGE_URL
  69. 69. Lessons learnt
      Developing Facebook apps is fast and easy
      Users can benefit from the social environment
      More social features can be exploited
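The survey log written in part #6 is a plain text file with one pipe-delimited record per vote, so it can also be analysed offline. The sketch below parses that layout in Python, taking the field order from the PHP code on the slides; `FIELDS`, `parse_line`, `tally`, and the sample records are hypothetical and not part of the tutorial archive.

```python
from collections import Counter

# Field order used by the PHP script when it appends a record.
FIELDS = ("timestamp", "vote", "name", "sex", "birth_year", "city")


def parse_line(line):
    """Split one '|'-delimited record into a dict keyed by field name."""
    # Each record starts with '|', so the first split element is empty and is dropped.
    parts = line.rstrip("\n").split("|")[1:]
    return dict(zip(FIELDS, parts))


def tally(lines):
    """Return (human_votes, robot_votes) counted from the log lines."""
    votes = Counter(parse_line(line).get("vote") for line in lines if line.strip())
    return votes["Human"], votes["Robot"]


# Invented records in the same layout as /tmp/fb_survey.txt.
sample_log = [
    "|2009.10.26 10:00:00|Human|Ann Example|female|1980|Kobe\n",
    "|2009.10.26 10:01:30|Robot|Bob Example|male|1975|Barcelona\n",
    "|2009.10.26 10:02:10|Human|Cat Example|female|1988|London\n",
]

human, robot = tally(sample_log)
print("Votes so far: Human %d - Robot %d" % (human, robot))
```

Parsing the records into fields, rather than counting substrings as the PHP script does, also gives access to the other columns, for instance to break the votes down by birth year or city.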
  70. 70. CONCLUSIONS
  71. 71. Mining the web for musical data is easy and fast
      Every researcher can benefit from this approach
      Web APIs can be combined for specific goals
      Online social networks are a practical way to collect human experiences and evaluate hypotheses
      The web also offers tools for analysis and graphs
      Give it a try!
