
The Dark Art: Is Music Recommendation Science a Science?


WOMRAD Keynote at RecSys 2010



  1. The Dark Art Is Music Recommendation Science a Science? Michael Papish 2010.09.26 WOMRAD | Barcelona
  2. Agenda
     - 45 minutes talking
     - 10 minutes of questions/discussion
     - … But, feel free to shout out questions/comments as we go. Please do not throw things until the end, however.
  3. Our perspective on MIR
     A little like outsiders peering in… [image: hippie drum circle]
  4. Your perspective on us
     … or, less charitably [image caption: Requires a change to the USA Constitution before he can be elected President]
  5. Thesis
     - Making good recommendations is more art than science
     - Might it be a Dark Art, since the practitioner can manipulate the subject?
  6. Touchpoints with early MIR
     - Pattie Maes
       - Early advisor
       - Practical experience running into limits of pure CF
     - Eric Scheirer
       - Attended dissertation defense
       - Concludes that acoustics alone couldn't predict preference
     - Rashmi Sinha
       - Swearingen, K. & Sinha, R. Beyond Algorithms: An HCI Perspective on Recommender Systems (2001)
       - MUI part of early user surveys
       - Concludes that user psychology and trust might play the biggest roles in recommendation efficacy
  7. Our approach to recommendations
     These foundations plus 10 years of practical experience building recommenders have led us to an approach that…
     - Combines both humans and machines
     - Focuses on understanding both content and listeners
     - Only uses transparent algorithms which contain human-understandable features, so we can tweak, tune and curate results
     - Embraces the emotional and subjective aspects of the problem
     - Optimizes user trust
     - Uses different techniques for different applications and domains
  8. My immature attempt at a Philosophy of Recommendation Science
  9. What is Science?
     - Karl Popper: "In so far as a scientific statement speaks about reality, it must be falsifiable; and in so far as it is not falsifiable, it does not speak about reality."
       - Doesn't actually work in practice (e.g. induction), but a nice idea
     - Thomas Kuhn (in one PPT slide): "Under normal conditions the research scientist is not an innovator but a solver of puzzles, and the puzzles upon which he concentrates are just those which he believes can be both stated and solved within the existing scientific tradition."
       - Science consists of puzzle solving
       - A solution to the puzzle is reproducible and measurable
  10. What is the puzzle of recommendations?
     Puzzle: Understand the listener's preferences and help her find and discover music she likes.
     Foundational problem: How do we measure success? Without an objective metric, how do we conduct scientific research?
  11. This is not a new observation
     ISMIR 2001 Resolution: "There is a current need for metrics to evaluate the performance and accuracy of the various approaches and algorithms being developed within the Music Information Retrieval research community."
     Herlocker et al., Evaluating collaborative filtering recommender systems (2004): "Each algorithmic approach has adherents who claim it to be superior for some purpose. Clearly identifying the best algorithm for a given purpose has proven challenging, in part because researchers disagree on which attributes should be measured, and on which metrics should be used for each attribute. Researchers who survey the literature will find over a dozen quantitative metrics and additional qualitative evaluation techniques."
  12. Example Metrics
     - Music Information Retrieval Evaluation eXchange (MIREX), established in 2005
       - Very acoustics-focused
       - Many tasks compare computer guesses on audio similarity with human-generated reference data
       - Strength: Objective; easy to measure; automated once a reference set exists
       - Problem: Too narrowly defined for the problem of music recommendations. Similarity ≠ good recommendation
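As a minimal sketch (not from the talk; the track IDs and judgments are invented), a MIREX-style similarity task of this kind reduces to scoring an algorithm's ranked list of "similar" tracks against a human-generated reference set, for example with precision@k:

```python
# Hypothetical MIREX-style evaluation: compare an algorithm's similarity
# ranking for a query track against human-judged reference data.

def precision_at_k(algorithm_ranking, human_reference, k):
    """Fraction of the algorithm's top-k tracks that humans also judged similar."""
    top_k = algorithm_ranking[:k]
    hits = sum(1 for track in top_k if track in human_reference)
    return hits / k

# Tracks human annotators judged similar to the query (invented data):
reference = {"track_b", "track_c", "track_f"}
# The algorithm's similarity ranking for the same query (invented data):
ranking = ["track_b", "track_d", "track_c", "track_a", "track_e"]

print(precision_at_k(ranking, reference, k=3))  # 2 of the top 3 are human-confirmed
```

This is exactly the kind of metric the slide calls objective and automatable once a reference set exists, and also exactly where its caveat bites: a perfect precision score still only certifies similarity, not a good recommendation.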
  13. Example Metrics
     - Statistical comparisons across standardized data sets of usage data
       - e.g. the Netflix Prize
       - How well can the recommender predict ratings/usage?
       - Strength: Objective, easy to measure and completely automated
       - Problem: Doesn't measure "discovery"; the resolution of ratings data isn't very high; unclear if better predictions actually equal a better user experience
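To make the "predict ratings" framing concrete: the Netflix Prize scored entries by root mean squared error (RMSE) between predicted and actual ratings. A minimal sketch, with invented ratings:

```python
# RMSE between a recommender's predicted ratings and users' actual ratings,
# the Netflix Prize-style objective metric. All ratings here are invented.
import math

def rmse(predicted, actual):
    """Root mean squared error over paired ratings; lower = better prediction."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual_ratings = [4, 2, 5, 3, 1]                # what users really rated
predicted_ratings = [3.8, 2.5, 4.6, 3.1, 1.9]   # the recommender's guesses

print(round(rmse(predicted_ratings, actual_ratings), 3))  # prints 0.504
```

Note how cleanly this illustrates the slide's objection: the metric is fully automated and objective, but a lower RMSE says nothing about whether the listener discovered anything new.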
  14. Example Metrics
     - More holistic approaches to measurement
       - Surveying users about trust and satisfaction
         - e.g. Swearingen & Sinha
         - Strength: Measures the thing we actually care about
         - Weakness: Time-consuming to measure; neither automated nor standardized; objective?
       - Practical "business" metrics
         - e.g. additional sales; lower churn; more "sticky"
         - Weakness (?): Adopting these metrics seems to immediately remove the problem from the realm of science into practical application
  15. The (abbreviated) History of MIR [timeline]
     - The dawn of time (1960): "Hmm, it's hard to file LPs in the card catalog"
     - The start of infinite music (2001): Learned how to stop making bad recs
     - The golden age of Mark Zuckerberg house parties (2005)
     - Dotcoms are cool again (2006): The Age of Good Recs
     - The Future (according to Gartner/Jupiter, always happening at present + 3 years): Crisis? The Dark Arts Age; the Wall of Really Good Recs (no David Gilmour)
  16. Worries that Recommendation Science is no longer a Science
  17. State of the Art
     - It is easy to know when we make bad recommendations
     - But it is hard (impossible?) to know if we are making good vs. really good recommendations
     - We have exhausted all the simple and objective metrics
     - In order to create better listener experiences, we need to adopt more holistic metrics
       - We are seeing this at WOMRAD with the Pirkka Åman & Lassi Liikkanen and Audrey Laplante papers
  18. Worries
     - But what if these holistic metrics aren't objective?
     - What if the psychological aspects of user trust (UI, presentation, context) drown out MIR when it comes to good vs. really good recs?
       - Hunch: We may have crossed this barrier
     - What if user preference is too variable and unstable to measure carefully?
       - Hunch: Practical experience interacting with normal listeners indicates this is true
  19. Conclusions
     - Music Recommendation Science is transitioning from science to pure practical challenge
     - The fuzzier aspects of the problem will begin to dominate
     - We are entering the Age of the Dark Art
  20. A Way Forward
  21. A Way Forward
     We have 2 options…
     - Option #1: Focus on unsolved research problems in MIR, such as…
       - How can we better elicit user preference?
       - What are the limits of the average listener?
         - Can she discern between a random arrangement of songs vs. an expert arrangement of songs?
         - What happens when we show a list of artists vs. albums vs. tracks vs. playlists to a listener?
         - Can we build tools/games to expand these limits?
       - What is a listener profile?
       - Can we quantify the importance of sonic vs. cultural/social preference?
       - Can we add relevance layers to search/browse cases?
       - And, most importantly…
  22. The biggest problem in music science today
     Can we get this guy to stop ruining every show we go to?
  23. Or, Option #2
     Or, seeing as this presentation is the very first session at the very first workshop on the very first day of RecSys 2010, we could…
  24. Or, Option #2
     … adjourn to here for the rest of the week
  25. Discussion / Tomato Throwing Time
  26. Tik tok.