USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK?
Sten Govaerts, Nik Corthaut, Erik Duval
• Our problem

• Classification using search engines

• The setup

• The evaluation

• Conclusion
TUNIFY
HOW DOES IT WORK?

• manually annotated metadata

• 5 music experts at Aristo Music and different consultants

• almost ...
PROBLEMS

• satisfying the music choice of all customers

  • retail and catering differ from you a...
GENERATE THE METADATA

• from different sources:

  • the audio signal
  • web sources
  • the Aristo database
  • atte...
GENRE...

• our master thesis looked at different ways to generate genre...
ONE APPROACH...

• M. Schedl, T. Pohle, P. Knees, G. Widmer, “Assigning ...
CLASSIFICATION WITH SEARCH ENGINES
using co-occurrence

   Artist + Genre + Schema
Rock:      Jazz:
Blues:     Pop:
Country:   Metal:

Rock:      Jazz:
0,013      0,013
Blues:     Pop:
0,009      0,0...
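
The co-occurrence idea on the preceding slides (one query per artist, genre and schema, then compare the resulting scores) can be sketched roughly as below. This is only an illustration under stated assumptions: the slides do not give the scoring formula, so dividing the co-occurrence hit count by the hit count for the artist alone is an assumption, as is the schema string "music genre". The names `build_query`, `classify` and `fake_page_count` are hypothetical, and the hit counts are made up rather than taken from any search engine.

```python
from typing import Callable, Dict, List, Tuple


def build_query(artist: str, genre: str, schema: str) -> str:
    """Build one co-occurrence query: artist + genre + schema keywords."""
    return f'"{artist}" "{genre}" {schema}'


def classify(
    artist: str,
    genres: List[str],
    schema: str,
    page_count: Callable[[str], int],
) -> Tuple[str, Dict[str, float]]:
    """Score every genre for one artist; return the best genre and all scores."""
    # ASSUMPTION: normalise by the hit count of the artist alone.
    artist_hits = page_count(f'"{artist}"')
    scores: Dict[str, float] = {}
    for genre in genres:
        co_hits = page_count(build_query(artist, genre, schema))
        scores[genre] = co_hits / artist_hits if artist_hits else 0.0
    # The genre with the highest normalised score wins.
    best = max(scores, key=lambda g: scores[g])
    return best, scores


# Made-up hit counts standing in for a real search engine (illustration only).
FAKE_COUNTS = {
    '"Example Artist"': 1000,
    '"Example Artist" "Jazz" music genre': 13,
    '"Example Artist" "Rock" music genre': 5,
}


def fake_page_count(query: str) -> int:
    """Hypothetical stand-in for a search engine's result count."""
    return FAKE_COUNTS.get(query, 0)


best_genre, genre_scores = classify(
    "Example Artist", ["Rock", "Jazz"], "music genre", fake_page_count
)
print(best_genre, genre_scores)
```

Passing the hit-count lookup in as a function keeps the sketch independent of any particular search engine, so the same scoring could be pointed at different engines or schemas.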
RESULTS

• master thesis student’s results were much worse

• what happened?

  • did Google search result count chan...
HOW TO EVALUATE THIS?


• re-run the original experiment

  • evaluate on the same data set: 1995 artists and 9 genres...
THE DATA SET
       Blues   Country   Electronic
       Folk    Jazz      Metal
       Rap     Reggae    RnB
MOTION CHART



• http://hmdb.cs.kuleuven.be/muzik/gapminder.html
MORE FINE-GRAINED...

• 18 artists

• more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com

• twice a ...
2 Pac             Rap
Alan Lomax        Folk
Art Pepper        Jazz
Cradle of Filth   Metal
David Parsons     El...
MAIN SEARCH ENGINE RESULTS
REGIONAL GOOGLES
WHAT TO USE?

• use Google when it’s stable, else rely on Yahoo!

• when is it stable? test with a small set

  • some...
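
The "test with a small set" advice could look roughly like the sketch below: classify a handful of artists with known genres, compute today's accuracy, and fall back to the second engine when that accuracy drops clearly below the average of earlier runs. The function names, the `tolerance` parameter and its default value are assumptions made for illustration, and the predictions in the usage part are made up; only the four artist and genre pairs come from the test set listed in this deck.

```python
from statistics import mean
from typing import Dict, List


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of artists in the small test set that got the correct genre."""
    correct = sum(1 for artist, genre in truth.items() if predicted.get(artist) == genre)
    return correct / len(truth)


def choose_engine(
    today_accuracy: float,
    past_accuracies: List[float],
    tolerance: float = 0.05,  # ASSUMPTION: hypothetical margin, not from the slides
    primary: str = "Google",
    fallback: str = "Yahoo!",
) -> str:
    """Use the primary engine unless today's accuracy is clearly below its average."""
    if not past_accuracies:
        # Without a baseline we cannot judge stability, so play it safe.
        return fallback
    baseline = mean(past_accuracies)
    return primary if today_accuracy >= baseline - tolerance else fallback


# Ground truth: four artists from the deck's small test set.
truth = {
    "2 Pac": "Rap",
    "Alan Lomax": "Folk",
    "Art Pepper": "Jazz",
    "Cradle of Filth": "Metal",
}
# Made-up predictions for a "bad day" (illustration only).
predicted_today = {
    "2 Pac": "Rap",
    "Alan Lomax": "Folk",
    "Art Pepper": "Blues",
    "Cradle of Filth": "Metal",
}

today = accuracy(predicted_today, truth)
print(today, choose_engine(today, [0.9, 0.9, 0.9]))
```

How large the tolerance should be depends on how much the accuracy normally varies between runs, which this sketch does not try to estimate.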
CONCLUSION

• still works after 3 years

• Google -> Yahoo! -> Live! Search

• why does Google fluctuate?

• a ge...
FUTURE WORK

• understand the performance differences of regional search engines

• use alternative search engines

• ...
Q & A.
DEMO METADATA GENERATION

• http://ariadne.cs.kuleuven.be/samgi-service/
My presentation at the adMIRe workshop on ISM 2009 in San Diego. The presentation is about our study on the use of search engines to classify genres.

  • NOT the Southern African Media and Gender Institute.
  • 1. MG is better than MS; a possible explanation is that style is a broader term than genre for music
    2. Google outperforms Yahoo! and Live!
    3. results fluctuate over time
    4. technical issues with Yahoo!: only a fraction of the artists are retrieved
  • 1. the accuracy is not exactly the same as for the large data set, but the overall trends are similar
    2. MG schema is still more accurate
    3. Yahoo! MG is very stable
    4. Live! is still the worst and Google the best!
  • 1. Yahoo! is very stable
    2. Live! is the worst, Google the best!
    3. no noticeable differences between Live! and Bing. Bing was launched on 3 June.
    4. On 29 July, collaboration between Bing and Yahoo!
  • 1. .com performs best! -> co.uk -> fr -> be
    2. fr and be worse: maybe because genres are in English
    3. one could also check if local artists are classified better
  • correct: light
    incorrect: dark

    1. Yahoo! most stable
    2. Google changes most often
    3. changing from correct to incorrect occurs most, but no clear pattern
    4. Live! seems to struggle with the same artists: one time it classifies them correctly, the next time wrong.
  • Transcript of "Using search engines for classification: does it still work?"

    1. USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK? Sten Govaerts, Nik Corthaut, Erik Duval
    2. • Our problem • Classification using search engines • The setup • The evaluation • Conclusion
    3. TUNIFY
    4. TUNIFY
    5. TUNIFY
    6. HOW DOES IT WORK? • manually annotated metadata • 5 music experts at Aristo Music and different consultants • almost 80,000 songs • but, not enough...
    7. PROBLEMS • satisfying the music choice of all customers • retail and catering differ from you and me! • new markets • react fast on emerging music trends • adding the full Belgian library catalog
    8. GENERATE THE METADATA • from different sources: • the audio signal • web sources • the Aristo database • attention metadata • using our metadata generation framework: SamgI
    9. GENRE... • our master thesis looked at different ways to generate genre...
    10. ONE APPROACH... • M. Schedl, T. Pohle, P. Knees, G. Widmer, “Assigning and Visualizing Music Genres by Web-based Co-occurrence Analysis”, Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 260-265. • G. Geleijnse, J. Korst, “Web-based Artist Categorization”, Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 266-271.
    11. CLASSIFICATION WITH SEARCH ENGINES using co-occurrence
    12. CLASSIFICATION WITH SEARCH ENGINES using co-occurrence
    13. CLASSIFICATION WITH SEARCH ENGINES using co-occurrence Artist + Genre + Schema
    14. CLASSIFICATION WITH SEARCH ENGINES using co-occurrence Artist + Genre + Schema
    15. CLASSIFICATION WITH SEARCH ENGINES using co-occurrence Artist + Genre + Schema
    16. CLASSIFICATION WITH SEARCH ENGINES using co-occurrence Artist + Genre + Schema
    17. Rock: Jazz: Blues: Pop: Country: Metal:
    18. Rock: Jazz: 0,013 0,013 Blues: Pop: 0,009 0,015 Country: Metal: 0,009 0,005
    19. RESULTS • master thesis student’s results were much worse • what happened? • did Google search result count change? • has Google Search API different results? • is the student’s implementation correct?
    20. HOW TO EVALUATE THIS? • re-run the original experiment • evaluate on the same data set: 1995 artists and 9 genres. • different search engines: Google, Yahoo! and Live! Search. • over time: 8 times over a period of 36 days.
    21. THE DATA SET Blues Country Electronic Folk Jazz Metal Rap Reggae RnB
    22. THE DATA SET Blues Country Electronic Folk Jazz Metal Rap Reggae RnB 10% 9% 3% 2% 12% 13% 5% 4% 41%
    23. THE DATA SET Blues Country Electronic Folk Jazz Metal Rap Reggae RnB
    24. MOTION CHART • http://hmdb.cs.kuleuven.be/muzik/gapminder.html
    25. MORE FINE-GRAINED... • 18 artists • more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com • twice a day for 53 days • 250,000 queries!
    26. 2 Pac Rap Alan Lomax Folk Art Pepper Jazz Cradle of Filth Metal David Parsons Electronic Desmond Dekker Reggae Downpour Metal IceT Rap Jerry Butler RnB Joy Lynn White Country Louisiana Red Blues Lou Rawls RnB LTJ Bukem Electronic Peter Tosh Reggae Pinetop Smith Jazz Robert Johnson Blues Roy Rogers Country Steeleye Span Folk
    27. MAIN SEARCH ENGINE RESULTS
    28. REGIONAL GOOGLES
    29. WHAT TO USE? • use Google when it’s stable, else rely on Yahoo! • when is it stable? test with a small set • some artists get classified incorrectly on bad days • compare the accuracy achieved with the test set to the average.
    30. CONCLUSION • still works after 3 years • Google -> Yahoo! -> Live! Search • why does Google fluctuate? • a generic version of an all purpose classifier is implemented in metadata generation framework
    31. FUTURE WORK • understand the performance differences of regional search engines • use alternative search engines • tweak the genre taxonomy depending on the search engine
    32. Q & A.
    33. DEMO METADATA GENERATION • http://ariadne.cs.kuleuven.be/samgi-service/