stop looking for music
and start listening to it:
auditory display in music collection interfaces
Becky Stewart
rebecca.stewart@eecs.qmul.ac.uk
Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary, University of London
In this talk we will ...
• Review how we search and browse for information
• Look at current commercially-available interfaces
• Discuss why listening should be integrated
• Look at solutions presented by academia
• Review recent research from C4DM
• Wrap up and conclude
Users seldom go on to next page of results
Broad overview, but can zoom in on specific result
All information beyond the image is suppressed, but recallable
Less helpful than the image search results
Difficult to navigate results
Have to go to web page to view any portion of the video
Filtering to music- or audio-only results is not an option
commercial interfaces use a combination
of text fields and seed songs/artists
results are lists of text perhaps enhanced with images, general
knowledge and hyperlinks
songs are played back one at a time and only if explicitly
requested by user
Bjork / Björk
• textual metadata can be malformed or wrong
• an empty text field is less than inspiring
• text can be a barrier to discovery
• previous knowledge is needed
• difficult to move into the tail; users will stay in the head of the popularity distribution
Ò. Celma and P. Cano. From hits to niches? Or how popular artists can bias music recommendation and discovery. In Proc. of 2nd Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (ACM KDD), Las Vegas, Nevada, USA, August 2008.
listening makes a difference
• users make different judgements about playlists when metadata is missing
L. Barrington, R. Oda, and G. Lanckriet. Smarter than Genius: human evaluation of music recommender systems. In Proc. of ISMIR’09: 10th Int. Society for Music Information Retrieval Conf., pages 357–362, Kobe, Japan, October 2009.
listening is faster
• when search results are compiled into a single audio stream instead of a list
of results, users find what they are looking for quicker
S. Ali and P. Aarabi. A cyclic interface for the presentation of multiple music files. IEEE Trans. on Multimedia,
10(5):780–793, August 2008.
• listeners can find music without a GUI faster than with an iPod, and be just as
happy with their selection
A. Andric, P.-L. Xech, and A. Fantasia. Music mood wheel: improving browsing experience on digital content through an audio interface. In Proc. of 2nd Int. Conf. on Automated Production of Cross Media Content for Multi-Channel Distribution (AXMEDIS’06), 2006.
listening is effective
• users can understand and navigate a collection of music as effectively
without a GUI as with one
• they are slower, but don’t make significantly more mistakes
S. Pauws, D. Bouwhuis, and B. Eggen. Programming and enjoying music with your eyes closed. In CHI ’00:
Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pages 376–383. ACM, 2000. doi:
10.1145/332040.332460.
mused
• passive listening
G. Coleman. Mused: navigating the personal
sample library. In Proc. of ICMC: Int.
Computer Music Conf., Copenhagen,
Denmark, August 2007.
• youtube: http://www.youtube.com/watch?v=DuuESpj558Y&feature=related
sonic browser
• hugely influential interface
• introduced aurally exploring a
map of sounds
• direct sonification
M. Fernström and E. Brazil. Sonic browsing: an auditory tool for multimedia asset management. In Proc. of ICAD ’01: Int. Conf. on Auditory Display, pages 132–135, Espoo, Finland, August 2001.
M. Fernström and C. McNamara. After direct manipulation - direct sonification. In Proc. of ICAD ’98: Int. Conf. on Auditory Display, 1998.
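The "aura" behind direct sonification can be sketched simply: sounds placed on a 2-D map are mixed with gains that fall off with distance from the cursor, so moving the cursor aurally scans the map. This is an illustrative sketch of the concept, not the published implementation; the linear falloff and the function name `aura_gains` are my own assumptions.

```python
import numpy as np

def aura_gains(cursor, positions, radius):
    """Per-sound gain on a 2-D map: only sounds inside the cursor's
    'aura' are heard, louder the closer they are to the cursor.
    Linear falloff is an assumption, not the published mapping."""
    d = np.linalg.norm(positions - cursor, axis=1)  # distance to each sound
    return np.clip(1.0 - d / radius, 0.0, None)    # 1 at cursor, 0 at radius
```

Mixing each sound's signal scaled by its gain then gives the concurrent "aura" playback as the cursor moves.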
soundtorch
• 3D version of sonic browser
S. Heise, M. Hlatky, and J. Loviscach. SoundTorch: quick browsing in large audio collections. In Proc. of AES 125th Conv., San Francisco, CA, October 2008.
S. Heise, M. Hlatky, and J. Loviscach. Aurally and visually enhanced audio search with SoundTorch. In CHI ’09: Proc. of the 27th Int. Conf. Extended Abstracts on Human Factors in Computing Systems, pages 3241–3246, Boston, MA, USA, April 2009. doi: 10.1145/1520340.1520465.
• youtube: http://www.youtube.com/watch?v=eiwj7Td7Pec
neptune
• based on Islands of Music
P. Knees, M. Schedl, T. Pohle, and G. Widmer. An innovative three-dimensional user interface for exploring music collections enriched with meta-information from the web. In MULTIMEDIA ’06: Proc. of the 14th Annual ACM Int. Conf. on Multimedia, pages 17–24, Santa Barbara, CA, USA, 2006. doi: 10.1145/1180639.1180652.
sonixplorer
• extension of neptune
• landscape can be marked up
by user
• introduced focus
• youtube: http://www.youtube.com/watch?v=mIfWg2Eex74
D. Lübbers. SoniXplorer: combining visualization and auralization for content-based exploration of music collections. In Proc. of ISMIR’05: 6th Int. Society for Music Information Retrieval Conf., pages 590–593, London, UK, 2005.
D. Lübbers and M. Jarke. Adaptive multimodal exploration of music collections. In Proc. of ISMIR’09: 10th Int. Society for Music Information Retrieval Conf., pages 195–200, Kobe, Japan, 2009.
what’s the problem?
• too much information thrown at the user
• does not translate well to mobile devices
• rendering spatial audio
• reliance on screens
Can still do more efficient things in the B-format domain.
When compared to direct binaural, there are measurable differences.
But listeners can’t tell the difference.
So use the more efficient implementation: only 3 convolutions, and only 3 filters need to be stored.
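The efficiency argument can be made concrete. In horizontal first-order B-format, rotating the whole sound field is just a 2x2 rotation applied to the X/Y channels, and binaural rendering then needs one convolution per B-format channel (three in total) with fixed filters, rather than a fresh pair of HRTF convolutions per source. A minimal numpy sketch, with function names my own and the filters standing in for real HRTF-derived decoding filters:

```python
import numpy as np

def encode_bformat(mono, azimuth):
    """Encode a mono signal into horizontal first-order B-format (W, X, Y)."""
    w = mono / np.sqrt(2.0)        # omnidirectional component
    x = mono * np.cos(azimuth)     # front-back figure-of-eight
    y = mono * np.sin(azimuth)     # left-right figure-of-eight
    return np.stack([w, x, y])

def rotate_bformat(b, theta):
    """Rotate the whole sound field by theta radians: a 2x2 rotation on X/Y."""
    w, x, y = b
    x2 = np.cos(theta) * x - np.sin(theta) * y
    y2 = np.sin(theta) * x + np.cos(theta) * y
    return np.stack([w, x2, y2])

def binaural_left(b, filters):
    """Left-ear signal: one convolution per B-format channel, 3 in total.
    (The right ear follows by left/right symmetry of the filters.)"""
    return sum(np.convolve(ch, h) for ch, h in zip(b, filters))
```

Rotating then encoding and encoding at the rotated angle give identical channels, which is why head-tracking or browsing motion stays cheap in the B-format domain.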
build an interface which uses virtual ambisonics
task: browse a collection to select a single song
evaluation
• user study with 12 users
• most liked the idea
• but the implementation needed improvement
• confusion as to how to navigate through the space
• some people averse to concurrent playback
add visuals and improve physical controller,
but keep dependence on audio
cyclic playback
• inspired by
S. Ali and P. Aarabi. A cyclic interface for the
presentation of multiple music files. IEEE
Trans. on Multimedia, 10(5):780–793, August
2008.
• hear everything within 20
seconds
• user can control concurrent
playback
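The cyclic idea can be sketched as follows: lay a snippet of every clip around a fixed 20-second loop, with an overlap knob standing in for the user's control over concurrent playback. This is an illustrative sketch of the concept, not Ali and Aarabi's published algorithm; the function name and the linear layout are assumptions.

```python
import numpy as np

def cyclic_mix(clips, sr=44100, cycle_secs=20.0, overlap=1.5):
    """Mix N clips into one looping cycle so each is heard within one pass.

    Clip i starts at offset i * (cycle / N); with overlap > 1, adjacent
    snippets play concurrently (the user-controlled concurrency knob).
    A real implementation would wrap around and crossfade at the seam.
    """
    n = len(clips)
    cycle = int(sr * cycle_secs)
    slot = cycle // n                       # samples per clip per cycle
    mix = np.zeros(cycle)
    for i, clip in enumerate(clips):
        start = i * slot
        snippet = clip[: int(slot * overlap)]
        snippet = snippet[: cycle - start]  # truncate at cycle end (no wrap)
        mix[start : start + len(snippet)] += snippet
    return mix
```

Looping the returned buffer guarantees that every clip in the set has been heard within one 20-second pass.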
evaluation
• no formal evaluation, but demonstrated to a variety of individuals and small
groups (approximately 40 people)
• improved interaction with physical controller
• perhaps too many controls, much steeper learning curve
• much room for improvement
public installation
• shown in Information Aesthetics at SIGGRAPH 2009
• approximately 1000 people passed through the exhibit
• children, students, artists, designers, technologists
• quick to bring smiles - it was fun, people even brought back friends to
experience it
• easy to learn how to use
conclusions drawn from research
• context is key when shaping interaction
• users will approach an interface with prior knowledge; we need to build on
and incorporate that knowledge
• audio can’t be subtle
• can’t rely on complex information to be universally implied through only
audio
• can (and should) be fun
• maps aren’t great, there must be something better
why haven’t these ideas caught on?
• solutions use non-scalable algorithms that are impractical for commercial
applications (a problem not limited to interfaces within MIR)
• portability across devices
• many of them just don’t work that well
• most have very simple acoustics models
• too much information thrown at user, or information is not organized in an
accessible way
direct manipulation to direct sonification
listen to the music first, then get more information if
so desired
this is done by using auditory displays
a lot of focus on map-based paradigms, but it may
be time to move on
concurrent presentation of audio is a good idea
but spatialization should not be used to represent
complex relationships
music is complex
incorporating listening improves music search
and discovery
so it should continue
we haven’t yet figured out how to do it perfectly
we need to turn fun toys into useful tools