Auditory Display in MIR

Becky Stewart, researcher at Centre for Digital Music, Queen Mary, University of London
stop looking for music
and start listening to it:

auditory display in music information retrieval interfaces

Becky Stewart
rebecca.stewart@eecs.qmul.ac.uk



Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary, University of London
In this talk we will ...

• Review how we search and browse for information


• Look at current commercially-available interfaces


• Discuss why listening should be integrated


• Look at solutions presented by academia


• Review recent research from C4DM


• Wrap up and conclude
how do we find information?
let’s start with something easy...
Familiar interface

Summarizes information

Users seldom scroll down, almost never go to next page
how about better browsing?
Easy to traverse information

Relationships between items can be inferred

Encourages browsing
what about something other than text?
Users seldom go on to next page of results

Broad overview, but can zoom in on specific result

All other information beyond image is suppressed, but recallable
what about time-based media?
Less helpful than the image search results

Difficult to navigate results

Have to go to web page to view any portion of the video

Restricting results to music or audio only is not an option
so what about music interfaces?
how do we find music?
commercial interfaces use a combination
of text fields and seed songs/artists

academic interfaces like maps

for searches, results are lists of text, perhaps enhanced with images,
general knowledge and hyperlinks

songs are played back one at a time, and only if explicitly requested by the user
Also a recent increase in
network interaction paradigms.



[Screenshot: an artist network browser ("Maze Radio") showing Fiona Apple among linked artists such as Tori Amos, Regina Spektor and Aimee Mann. Powered by The Echo Nest; music powered by Rdio.]
why should audio be integrated?
Bjork / Björk

• textual metadata can be malformed or wrong



• an empty text field is less than inspiring



• text can be a barrier to discovery

   • previous knowledge is needed

   • difficult to move into the long tail; users will stay in the head
 Celma and Cano. From hits to niches? Or how popular artists can bias music recommendation and discovery. In
 Proc. of 2nd Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (ACM KDD),
 Las Vegas, Nevada, USA, August 2008.
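The "Bjork / Björk" problem can be made concrete with a small sketch. It is not taken from the talk: the function name `fold_artist_name` is hypothetical, and accent folding is only one possible way to reconcile spellings. It shows why an exact text match fails and how a folded key would match both forms.

```python
import unicodedata


def fold_artist_name(name: str) -> str:
    """Return a case- and accent-insensitive key for an artist name.

    Hypothetical helper for illustration only.
    """
    # NFKD splits 'ö' into 'o' followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, leaving only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())


# An exact comparison treats the two spellings as different artists.
print("Bjork" == "Björk")
# The folded keys are identical, so a lookup on the key finds both.
print(fold_artist_name("Bjork") == fold_artist_name("Björk"))
```

Folding of this kind only papers over one class of metadata error, and it can merge names that are genuinely distinct in some languages, which is part of why text alone remains a fragile route to discovery.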
listening makes a difference

• users make different judgements about playlists when metadata is missing

 L. Barrington, R. Oda, and G. Lanckriet. Smarter than Genius: human evaluation of music recommender
 systems. In Proc. of ISMIR’09: 10th Int. Society for Music Information Retrieval Conf., pages 357–362, Kobe,
 Japan, October 2009.
listening is faster

• when search results are compiled into a single audio stream instead of a list
  of results, users find what they are looking for more quickly

 S. Ali and P. Aarabi. A cyclic interface for the presentation of multiple music files. IEEE Trans. on Multimedia,
 10(5):780–793, August 2008.



• listeners can find music without a GUI faster than with an iPod, and be just as
 happy with their selection

 Andreja Andric, Pierre-Louis Xech, and Andrea Fantasia, “Music mood wheel: Improving browsing experience on
 digital content through an audio interface,” in Proc. of 2nd Int. Conf. on Automated Production of Cross Media
 Content for Multi-Channel Distribution (AXMEDIS’06), 2006.
listening is effective

• users can understand and navigate a collection of music as effectively
  without a GUI as with one


• they are slower, but don’t make significantly more mistakes

 S. Pauws, D. Bouwhuis, and B. Eggen. Programming and enjoying music with your eyes closed. In CHI ’00:
 Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pages 376–383. ACM, 2000. doi:
 10.1145/332040.332460.
how can interfaces use more listening?
not by being VoiceOver
maps
mused

• passive listening

 G. Coleman. Mused: navigating the personal
 sample library. In Proc. of ICMC: Int.
 Computer Music Conf., Copenhagen,
 Denmark, August 2007.



• youtube
 http://www.youtube.com/watch?v=DuuESpj558Y&feature=related
sonic browser

• hugely influential interface

• introduced aurally exploring a
  map of sounds

• direct sonification
 M. Fernström and E. Brazil. Sonic browsing:
 an auditory tool for multimedia asset
 management. In Proc. of ICAD ’01:
 Int. Conf. on Auditory Display, pages
 132–135, Espoo, Finland, August 2001.

 M. Fernström and C. McNamara. After direct
 manipulation - direct sonification. In Proc. of
 ICAD ’98: Int. Conf. on Auditory Display,
 1998.
soundtorch

• 3D version of sonic browser
 S. Heise, M. Hlatky, and J. Loviscach.
 SoundTorch: Quick browsing in large audio
 collections. In Proc. of AES 125th Conv., San
 Francisco, CA, October 2008.

 S. Heise, M. Hlatky, and J. Loviscach. Aurally
 and visually enhanced audio search with
 SoundTorch. In CHI ’09: Proc. of the 27th Int.
 Conf. Extended Abstracts on Human Factors
 in Computing Systems, pages 3241–3246,
 Boston, MA, USA, April 2009. doi:
 10.1145/1520340.1520465.


• youtube
 http://www.youtube.com/watch?v=eiwj7Td7Pec
neptune
• based on Islands of Music

 P. Knees, M. Schedl, T. Pohle, and G.
 Widmer. An innovative three-dimensional
 user interface for exploring music
 collections enriched with meta-information
 from the web. In MULTIMEDIA ’06: Proc. of
 the 14th annual ACM Int. Conf. on Multimedia,
 pages 17–24, Santa Barbara, CA, USA,
 2006. doi: 10.1145/1180639.1180652.
sonixplorer
• extension of neptune

• landscape can be marked up
  by user

• introduced focus

• youtube
 http://www.youtube.com/watch?v=mIfWg2Eex74

 D. Lübbers. Sonixplorer: Combining
 visualization and auralization for content-based
 exploration of music collections. In
 Proc. of ISMIR’05: 6th Int. Society for Music
 Information Retrieval Conf., pages 590–593,
 London, UK, 2005.

 D. Lübbers and M. Jarke. Adaptive
 multimodal exploration of music collections.
 In Proc. of ISMIR’09: 10th Int. Society for
 Music Information Retrieval Conf., pages
 195–200, Kobe, Japan, 2009.
what’s the problem?

• too much information thrown at the user




• does not translate well to mobile devices

   • rendering spatial audio

   • reliance on screens
my research
map paradigm without any visuals
evaluation

• user study with 12 users


• most liked the idea


• but the implementation needed improvement


• confusion as to how to navigate through the space


• some people were averse to concurrent playback
add visuals and improve physical controller,
but keep dependence on audio
cyclic playback

• inspired by
 S. Ali and P. Aarabi. A cyclic interface for the
 presentation of multiple music files. IEEE
 Trans. on Multimedia, 10(5):780–793, August
 2008.



• hear everything within 20
  seconds


• user can control concurrent
  playback
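One way to read the two constraints above (everything heard within 20 seconds, with user-controlled concurrency) is as a simple scheduling problem. The sketch below is an assumption made for illustration, not the method of Ali and Aarabi or of the interface shown in the talk: the function `cyclic_schedule` and its parameters are hypothetical, and it simply divides one cycle into equal slots and fills each slot with up to `concurrency` songs.

```python
import math


def cyclic_schedule(songs, cycle_seconds=20.0, concurrency=1):
    """Assign each song a (start, duration) excerpt within one cycle.

    Hypothetical sketch: the cycle is cut into equal slots and each slot
    plays up to `concurrency` songs at the same time.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if not songs:
        return []
    # Number of time slots needed to fit every song into the cycle.
    slots = math.ceil(len(songs) / concurrency)
    excerpt = cycle_seconds / slots
    # Song i plays in slot i // concurrency.
    return [
        (song, (i // concurrency) * excerpt, excerpt)
        for i, song in enumerate(songs)
    ]


# Four songs, two at a time: two 10-second slots per 20-second cycle.
for song, start, duration in cyclic_schedule(["a", "b", "c", "d"], 20.0, 2):
    print(song, start, duration)
```

Raising `concurrency` lengthens each excerpt at the cost of more overlapping audio, which is the trade-off the user control is meant to expose.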
evaluation

• no formal evaluation, but demonstrated to a variety of individuals and small
  groups (approximately 40 people)


• improved interaction with physical controller


• perhaps too many controls, much steeper learning curve


• much room for improvement
art installation
Michela Magas
public installation

• shown in Information Aesthetics at SIGGRAPH 2009


• approximately 1000 people passed through the exhibit


• children, students, artists, designers, technologists


• quick to bring smiles - it was fun, people even brought back friends to
  experience it


• easy to learn how to use
conclusions drawn from research

• context is key when shaping interaction

   • users will approach an interface with previous knowledge, need to build on
     and incorporate that knowledge


• audio can’t be subtle

   • can’t rely on complex information to be universally implied through only
     audio


• can (and should) be fun


• maps aren’t great, there must be something better
why haven’t these ideas caught on?

• solutions use non-scalable algorithms that are impractical for commercial
  applications (a problem not limited to interfaces within MIR)


• music is increasingly in the cloud, looking at entire collections at once is not
  useful


• portability across devices


• many of them just don’t work that well
   • most have very simple acoustic models
   • too much information thrown at user, or information is not organized in an
     accessible way
flickr:jlcwalker




flickr:matsber
what am I doing at nyu?
concentrating on how a small collection of songs
can be best presented to a user




i.e. how can the results of a search or browse
query be better presented?
Experimental Design - Aims of Experiment




 To determine the best interface parameters for music
 search and browsing tasks.
Experimental Design - Independent Variables

                   Number of Songs:
                   1 to 5 songs play concurrently




                   Musical and Signal Content of Songs:
                   Similar or dissimilar.




                   Visualization:
                   Whether interactive graphics representing each
                   song are presented
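The three independent variables above can be enumerated as a set of experimental conditions. The sketch below assumes a fully crossed design, which the slides do not state; the variable names are hypothetical. Under that assumption there are 5 × 2 × 2 = 20 conditions.

```python
from itertools import product

# Levels taken from the slide; the names are illustrative.
NUM_SONGS = range(1, 6)                 # 1 to 5 songs playing concurrently
CONTENT = ("similar", "dissimilar")     # musical and signal content
VISUALIZATION = (True, False)           # interactive graphics shown or not

# Assumption: every level of each variable is crossed with every other.
conditions = [
    {"num_songs": n, "content": c, "visualization": v}
    for n, c, v in product(NUM_SONGS, CONTENT, VISUALIZATION)
]

print(len(conditions))
```

With a single song, "similar" and "dissimilar" cannot be distinguished, so an actual design might collapse those cells; the count here is an upper bound under the crossing assumption.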
Experimental Design - Dependent Variables

 Search

  • A song is played and the participant needs to find that song in the
    collection.

  • No metadata is displayed.

  • The task is timed.

 Browse

  • A situation is described and the participant is asked to find a song that fits
    the situation.

  • The task is timed.
Experimental Design - Participant Experience

1. Participant uses simplified version of interface with only 1 song to choose
   an HRTF set.
2. A video explains how to use the interface and the participant has
   approximately 5 minutes to practice a search task and a browsing task.
3. For about 45 minutes, the participant completes a series of search and
   browsing tasks.
4. The participant completes a short questionnaire about their experience so
   far.
5. The participant takes a 15-minute break away from the computer and headphones.
6. The participant completes a second 45-minute session of search and
   browsing tasks.
7. The participant completes a final questionnaire.
to conclude
search engines are tuned for the type of
information being sought

but they break down when presenting time-based
media

in our case, music
direct manipulation to direct sonification

listen to the music first, then get more information if
so desired

this is done by using auditory displays
a lot of focus on map-based paradigms, but it is
time to move on

concurrent presentation of audio is a good idea

but spatialization should not be used to represent
complex relationships

music is complex
incorporating listening improves music search
and discovery

so it should continue

the work I am doing during my visit at nyu will
measure whether this presented interface can
assist people in performing search and browse
tasks more efficiently
however, what I believe to be the most difficult
problem still remains to be addressed:

the cold start problem

future work needs to concentrate on how you
initiate a search or browsing task
thank you


these slides can be found at http://www.slideshare.net/beckystewart/presentations
1 of 110

Recommended

Noise monitoring by
Noise monitoringNoise monitoring
Noise monitoringNor Faizah Mohamad Zaki
9.6K views9 slides
Stop Looking and Start Listening by
Stop Looking and Start ListeningStop Looking and Start Listening
Stop Looking and Start ListeningBecky Stewart
500 views104 slides
Rf0310 Triple Scoop Music Blanchfield by
Rf0310 Triple Scoop Music BlanchfieldRf0310 Triple Scoop Music Blanchfield
Rf0310 Triple Scoop Music BlanchfieldRenegadePR
415 views4 slides
Music discovery on the net by
Music discovery on the netMusic discovery on the net
Music discovery on the netguestbf080
3.6K views15 slides
Artcasting at SFMOMA: First-Year Lessons, Future Challenges for Museum Podca... by
Artcasting at SFMOMA: First-Year Lessons, Future Challenges for Museum Podca...Artcasting at SFMOMA: First-Year Lessons, Future Challenges for Museum Podca...
Artcasting at SFMOMA: First-Year Lessons, Future Challenges for Museum Podca...Stephanie Pau
620 views30 slides
Inmi symposium williamsonandmullensiefen_2012 by
Inmi symposium williamsonandmullensiefen_2012Inmi symposium williamsonandmullensiefen_2012
Inmi symposium williamsonandmullensiefen_2012vickywilliamson
422 views25 slides

More Related Content

Featured

ChatGPT and the Future of Work - Clark Boyd by
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
28K views69 slides
Getting into the tech field. what next by
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
6.6K views22 slides
Google's Just Not That Into You: Understanding Core Updates & Search Intent by
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
6.9K views99 slides
How to have difficult conversations by
How to have difficult conversations How to have difficult conversations
How to have difficult conversations Rajiv Jayarajah, MAppComm, ACC
5.6K views19 slides
Introduction to Data Science by
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceChristy Abraham Joy
82.6K views51 slides
Time Management & Productivity - Best Practices by
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
169.8K views42 slides

Featured(20)

ChatGPT and the Future of Work - Clark Boyd by Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd28K views
Getting into the tech field. what next by Tessa Mero
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero6.6K views
Google's Just Not That Into You: Understanding Core Updates & Search Intent by Lily Ray
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray6.9K views
Time Management & Productivity - Best Practices by Vit Horky
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky169.8K views
The six step guide to practical project management by MindGenius
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius36.7K views
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright... by RachelPearson36
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson3612.7K views
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present... by Applitools
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools55.5K views
12 Ways to Increase Your Influence at Work by GetSmarter
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter401.7K views
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G... by DevGAMM Conference
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
DevGAMM Conference3.6K views
Barbie - Brand Strategy Presentation by Erica Santiago
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
Erica Santiago25.1K views
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well by Saba Software
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Saba Software25.3K views
Introduction to C Programming Language by Simplilearn
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming Language
Simplilearn8.5K views
The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr... by Palo Alto Software
The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr...The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr...
The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr...
Palo Alto Software88.4K views
9 Tips for a Work-free Vacation by Weekdone.com
9 Tips for a Work-free Vacation9 Tips for a Work-free Vacation
9 Tips for a Work-free Vacation
Weekdone.com7.2K views
How to Map Your Future by SlideShop.com
How to Map Your FutureHow to Map Your Future
How to Map Your Future
SlideShop.com275.1K views

Auditory Display in MIR

  • 1. stop looking for music and start listening to it: auditory display in music information retrieval interfaces Becky Stewart rebecca.stewart@eecs.qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary, University of London
  • 2. In this talk we will ...
  • 3. In this talk we will ... • Review how search and browse for information
  • 4. In this talk we will ... • Review how search and browse for information • Look at current commercially-available interfaces
  • 5. In this talk we will ... • Review how search and browse for information • Look at current commercially-available interfaces • Discuss why listening should be integrated
  • 6. In this talk we will ... • Review how search and browse for information • Look at current commercially-available interfaces • Discuss why listening should be integrated • Look at solutions presented by academia
  • 7. In this talk we will ... • Review how search and browse for information • Look at current commercially-available interfaces • Discuss why listening should be integrated • Look at solutions presented by academia • Review recent research from C4DM
  • 8. In this talk we will ... • Review how search and browse for information • Look at current commercially-available interfaces • Discuss why listening should be integrated • Look at solutions presented by academia • Review recent research from C4DM • Wrap up and conclude
  • 9. how do we find information?
  • 10. let’s start with something easy...
  • 12. Familiar interface Summarizes information Users seldom scroll down, almost never go to next page
  • 13. how about better browsing?
  • 15. Easy to traverse information Relationships between items can be inferred Encourages browsing
  • 16. what about something other than text?
  • 18. Users seldom go on to next page of results Broad overview, but can zoom in on specific result All other information beyond image is suppressed, but recallable
  • 21. Less helpful than the image search results Difficult to navigate results Have to go to web page to view any portion of the video Music or audio results only is not an option
  • 22. so what about music interfaces? how do we find music?
  • 24. commercial interfaces use a combination of text fields and seed songs/artists
  • 25. commercial interfaces use a combination of text fields and seed songs/artists academic interfaces like maps
  • 26. commercial interfaces use a combination of text fields and seed songs/artists academic interfaces like maps for searches results are lists of text perhaps enhanced with images, general knowledge and hyperlinks
  • 27. commercial interfaces use a combination of text fields and seed songs/artists academic interfaces like maps for searches results are lists of text perhaps enhanced with images, general knowledge and hyperlinks songs are played back one at a time and only if explicitly requested by user
  • 28. Also a recent increase in network interaction paradigms. Have an account? Sign In Share Path history Joan as Police Woman Mystery Jets Enter artist name Jeremy Warmsley Alanis Morissette Laura Marling Nellie McKay Emmy the Great Ani DiFranco Kimya Dawson Basia Bulat Aimee Mann Fiona Apple Regina Spektor Liz Phair Imogen Heap Fast As You Can Sara Bareilles Fiona Apple Rilo Kiley Sarah McLachlan Tori Amos 00:22 Maze Radio Fiona Apple has been visited 376 times. Powered by The Echo Nest. Music powered by Rdio More info at Music Machinery Check out the Labyrinth of Genre
  • 29. why should audio be integrated?
  • 30. Bjork / Björk • textual metadata can be malformed or wrong • an empty text field is less than inspiring • text can be a barrier to discovery • previous knowledge is needed • difficult to move into tail, will stay in the head Celma and Cano From hits to niches? or how popular artists can bias music recommendation and discovery. In Proc. of 2nd Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (ACM KDD), Las Vegas, Nevada, USA, August 2008.
  • 31. listening makes a difference • users make different judgements about playlists when metadata is missing. L. Barrington, R. Oda, and G. Lanckriet. Smarter than Genius: human evaluation of music recommender systems. In Proc. of ISMIR’09: 10th Int. Society for Music Information Retrieval Conf., pages 357–362, Kobe, Japan, October 2009.
  • 32. listening is faster • when search results are compiled into a single audio stream instead of a list of results, users find what they are looking for more quickly. S. Ali and P. Aarabi. A cyclic interface for the presentation of multiple music files. IEEE Trans. on Multimedia, 10(5):780–793, August 2008. • listeners can find music faster without a GUI than with an iPod, and be just as happy with their selection. A. Andric, P.-L. Xech, and A. Fantasia. Music mood wheel: Improving browsing experience on digital content through an audio interface. In Proc. of 2nd Int. Conf. on Automated Production of Cross Media Content for Multi-Channel Distribution (AXMEDIS’06), 2006.
  • 33. listening is effective • users can understand and navigate a collection of music as effectively without a GUI as with one • they are slower, but don’t make significantly more mistakes S. Pauws, D. Bouwhuis, and B. Eggen. Programming and enjoying music with your eyes closed. In CHI ’00: Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pages 376–383. ACM, 2000. doi: 10.1145/332040.332460.
  • 34. how can interfaces use more listening?
  • 35. not by being VoiceOver
  • 37. maps
  • 38. mused • passive listening. G. Coleman. Mused: navigating the personal sample library. In Proc. of ICMC: Int. Computer Music Conf., Copenhagen, Denmark, August 2007. • youtube http://www.youtube.com/watch?v=DuuESpj558Y&feature=related
  • 40. sonic browser • hugely influential interface • introduced aurally exploring a map of sounds • direct sonification. M. Fernström and E. Brazil. Sonic browsing: an auditory tool for multimedia asset management. In Proc. of ICAD ’01: Int. Conf. on Auditory Display, pages 132–135, Espoo, Finland, August 2001. M. Fernström and C. McNamara. After direct manipulation - direct sonification. In Proc. of ICAD ’98: Int. Conf. on Auditory Display, 1998.
  • 41. soundtorch • 3D version of sonic browser. S. Heise, M. Hlatky, and J. Loviscach. SoundTorch: Quick browsing in large audio collections. In Proc. of AES 125th Conv., San Francisco, CA, October 2008. S. Heise, M. Hlatky, and J. Loviscach. Aurally and visually enhanced audio search with SoundTorch. In CHI ’09: Proc. of the 27th int. conf. extended abstracts on Human factors in computing systems, pages 3241–3246, Boston, MA, USA, April 2009. doi: 10.1145/1520340.1520465. • youtube http://www.youtube.com/watch?v=eiwj7Td7Pec
  • 43. neptune • based on Islands of Music. P. Knees, M. Schedl, T. Pohle, and G. Widmer. An innovative three-dimensional user interface for exploring music collections enriched with meta-information from the web. In MULTIMEDIA ’06: Proc. of the 14th annual ACM int. conf. on Multimedia, pages 17–24, Santa Barbara, CA, USA, 2006. doi: 10.1145/1180639.1180652.
  • 45. sonixplorer • extension of neptune • landscape can be marked up by user • introduced focus • youtube http://www.youtube.com/watch?v=mIfWg2Eex74 D. Lübbers. Sonixplorer: Combining visualization and auralization for content-based exploration of music collections. In Proc. of ISMIR’05: 6th Int. Society for Music Information Retrieval Conf., pages 590–593, London, UK, 2005. D. Lübbers and M. Jarke. Adaptive multimodal exploration of music collections. In Proc. of ISMIR’09: 10th Int. Society for Music Information Retrieval Conf., pages 195–200, Kobe, Japan, 2009.
  • 49. what’s the problem? • too much information thrown at the user • does not translate well to mobile devices • rendering spatial audio • reliance on screens
  • 51. map paradigm without any visuals
  • 55. evaluation • user study with 12 users • most liked the idea • but the implementation needed improvement • confusion as to how to navigate through the space • some people averse to concurrent playback
  • 56. add visuals and improve physical controller, but keep dependence on audio
  • 57. cyclic playback • inspired by S. Ali and P. Aarabi. A cyclic interface for the presentation of multiple music files. IEEE Trans. on Multimedia, 10(5):780–793, August 2008. • hear everything within 20 seconds • user can control concurrent playback
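One way to picture "hear everything within 20 seconds" with user-controlled concurrency is a repeating cycle in which each song gets an evenly spaced window. The following is a minimal sketch under that assumption only; it is not the algorithm from Ali and Aarabi or from the interface shown in the talk, and the names `Slot`, `cyclic_schedule` and `audible_at` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    song: str
    start: float     # seconds from the start of the cycle
    duration: float  # how long the song stays audible, in seconds


def cyclic_schedule(songs, cycle_seconds=20.0, concurrency=1):
    """Give every song one window per cycle (assumed even spacing).

    concurrency is the number of songs audible at any moment; raising it
    lengthens each window so that neighbouring windows overlap.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if not songs:
        return []
    n = len(songs)
    concurrency = min(concurrency, n)
    step = cycle_seconds / n  # spacing between window starts
    return [Slot(song, i * step, step * concurrency)
            for i, song in enumerate(songs)]


def audible_at(slots, t, cycle_seconds=20.0):
    """Return the songs audible at time t; windows wrap around the cycle."""
    return [s.song for s in slots
            if (t - s.start) % cycle_seconds < s.duration]


if __name__ == "__main__":
    slots = cyclic_schedule(["a", "b", "c", "d"], concurrency=2)
    for t in (0.0, 2.0, 7.0, 12.0):
        print(t, audible_at(slots, t))
```

With four songs and a concurrency of 2, each window lasts 10 seconds and two songs are always audible, while every song is still heard once per 20-second cycle.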
  • 61. evaluation • no formal evaluation, but demonstrated to a variety of individuals and small groups (approximately 40 people) • improved interaction with physical controller • perhaps too many controls, much steeper learning curve • much room for improvement
  • 64. public installation • shown in Information Aesthetics at SIGGRAPH 2009 • approximately 1000 people passed through the exhibit • children, students, artists, designers, technologists • quick to bring smiles - it was fun, people even brought back friends to experience it • easy to learn how to use
  • 69. conclusions drawn from research • context is key when shaping interaction • users will approach an interface with previous knowledge, need to build on and incorporate that knowledge • audio can’t be subtle • can’t rely on complex information to be universally implied through only audio • can (and should) be fun • maps aren’t great, there must be something better
  • 71. why haven’t these ideas caught on? • solutions use non-scalable algorithms that are impractical for commercial applications (a problem not limited to interfaces within MIR) • music is increasingly in the cloud, looking at entire collections at once is not useful • portability across devices • many of them just don’t work that well • most have very simple acoustic models • too much information thrown at the user, or information is not organized in an accessible way
  • 73. what am I doing at nyu?
  • 75. concentrating on how a small collection of songs can be best presented to a user, i.e. how can the results of a search or browse query be better presented?
  • 80. Experimental Design - Aims of Experiment To determine the best interface parameters for music search and browsing tasks.
  • 83. Experimental Design - Independent Variables. Number of Songs: 1 to 5 songs play concurrently. Musical and Signal Content of Songs: similar or dissimilar. Visualization: whether interactive graphics representing each song are presented.
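The three independent variables can be enumerated as a condition list. The sketch below assumes a fully crossed design (5 × 2 × 2 = 20 conditions) with a seeded random presentation order; the slides do not say whether every combination was actually used or how trials were ordered, and the names `build_conditions` and `randomized_order` are hypothetical.

```python
import itertools
import random

NUM_SONGS = range(1, 6)               # 1 to 5 songs playing concurrently
CONTENT = ("similar", "dissimilar")   # musical and signal content of the songs
VISUALIZATION = (True, False)         # interactive graphics shown or not


def build_conditions():
    """Cross every level of every factor (assumed full factorial design)."""
    return [
        {"num_songs": n, "content": c, "visualization": v}
        for n, c, v in itertools.product(NUM_SONGS, CONTENT, VISUALIZATION)
    ]


def randomized_order(conditions, seed):
    """Return a shuffled copy; seeding per participant keeps it reproducible."""
    rng = random.Random(seed)
    order = list(conditions)
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    conditions = build_conditions()
    print(len(conditions), "conditions")
    for condition in randomized_order(conditions, seed=1)[:3]:
        print(condition)
```

Note that with a single song the similar/dissimilar factor has no meaning, so a real design would probably collapse those cells.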
  • 86. Experimental Design - Dependent Variables. Search: • A song is played and the participant needs to find that song in the collection. • No metadata is displayed. • The task is timed. Browse: • A situation is described and the participant is asked to find a song that fits the situation. • The task is timed.
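Both tasks are timed, and the search task also has a right answer. A minimal sketch of how one trial might be recorded is given below; the record fields, the `run_trial` helper and the injectable `clock` are assumptions for illustration, not the logging used in the study.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrialResult:
    task: str                # "search" or "browse"
    chosen: str              # song the participant selected
    seconds: float           # time taken to make the selection
    correct: Optional[bool]  # only defined for search; browse has no single answer


def run_trial(task: str,
              choose: Callable[[], str],
              target: Optional[str] = None,
              clock: Callable[[], float] = time.perf_counter) -> TrialResult:
    """Time one trial. choose() blocks until the participant picks a song."""
    start = clock()
    chosen = choose()
    elapsed = clock() - start
    correct = (chosen == target) if task == "search" else None
    return TrialResult(task, chosen, elapsed, correct)


if __name__ == "__main__":
    # Stand-in for the real interface: the "participant" picks immediately.
    result = run_trial("search", lambda: "song_b", target="song_b")
    print(result)
```

Passing the clock in as a parameter makes the timing logic easy to check without waiting on real time.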
  • 94. Experimental Design - Participant Experience. 1. Participant uses simplified version of interface with only 1 song to choose an HRTF set. 2. A video explains how to use the interface and the participant has approximately 5 minutes to practice a search task and a browsing task. 3. For about 45 minutes, the participant completes a series of search and browsing tasks. 4. The participant completes a short questionnaire about their experience so far. 5. 15-minute break away from the computer and headphones. 6. The participant completes a second 45-minute session of search and browsing tasks. 7. The participant completes a final questionnaire.
  • 98. search engines are tuned for the type of information being sought, but they break down when presenting time-based media: in our case, music
  • 101. direct manipulation to direct sonification: listen to the music first, then get more information if so desired; this is done by using auditory displays
  • 105. a lot of focus on map-based paradigms, but it is time to move on; concurrent presentation of audio is a good idea, but spatialization should not be used to represent complex relationships; music is complex
  • 108. incorporating listening improves music search and discovery, so it should continue; the work I am doing during my visit at nyu will measure whether the presented interface can assist people in performing search and browse tasks more efficiently
  • 109. however, what I believe to be the most difficult problem still remains to be addressed: the cold start problem. future work needs to concentrate on how you initiate a search or browsing task
  • 110. thank you. these slides can be found at http://www.slideshare.net/beckystewart/presentations