Turnbull Music Vocab
 

  • Start iTunes with talk playlist. Start web browser for music survey site - cosmal.ucsd.edu/cal/friends - username: dougturnbull

Presentation Transcript

  • Identifying Words that are Musically Meaningful
    David Torres, Douglas Turnbull, Luke Barrington, Gert Lanckriet
    Computer Audition Lab, UC San Diego
    ISMIR, September 25, 2007
  • Introduction
    • Our Goal: Create a content-based music search engine for natural language queries.
      • CAL Music Search Engine [SIGIR07]
    • Problem: how do we pick a vocabulary of musically meaningful words?
      • If a word is present, there should be a corresponding pattern in the audio content
    • Solution: find words that are correlated with a set of acoustic signals
  • Two-View Representation
    • Consider a set of annotated songs. Each song is represented by:
      • Annotation vector in a Semantic Space
      • Audio feature vector (s) in an Acoustic Space
    [Figure: three songs plotted in a 2-D semantic space (‘funky’, ‘Ireland’) and a 2-D acoustic space (x, y): ‘Mustang Sally’ (The Commitments), ‘Riverdance’ (Bill Whelan), ‘Hot Pants’ (James Brown).]
  • Semantic Representation
    • Vocabulary of words:
      • CAL500 : 174 phrases from a human survey
        • Instrumentation, genre, emotion, usages, vocal characteristics
      • LastFM: ~15,000 tags from social music site
      • Web Mining: 100,000+ words mined from text documents
    • Annotation Vector, denoted s
      • Each element represents the ‘semantic association’ between a word and the song.
      • Dimension (D_S) = size of the vocabulary
      • Example: Frank Sinatra’s ‘Fly Me to the Moon’
        • Vocabulary = {funk, jazz, guitar, female vocals, sad, passionate}
        • Annotation (s_i) = [0/4, 3/4, 4/4, 0/4, 2/4, 1/4]
    • Data is represented by an N x D_S matrix S whose rows are the annotation vectors s_1, ..., s_N.
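The annotation vector above can be sketched in a few lines of Python. The vocabulary, the vote counts, and the 4-listener panel are taken from the slide's 'Fly Me to the Moon' example; the variable names are mine:

```python
import numpy as np

# Hypothetical 6-word vocabulary from the slide's example.
vocab = ["funk", "jazz", "guitar", "female vocals", "sad", "passionate"]

# Each listener marks which words apply to the song; the annotation
# vector is the fraction of the 4 listeners who used each word.
listener_votes = {"jazz": 3, "guitar": 4, "sad": 2, "passionate": 1}
n_listeners = 4
s = np.array([listener_votes.get(w, 0) / n_listeners for w in vocab])
# s == [0.0, 0.75, 1.0, 0.0, 0.5, 0.25], i.e. the slide's
# annotation [0/4, 3/4, 4/4, 0/4, 2/4, 1/4].

# Stacking one annotation vector per song gives the N x D_S matrix S.
S = np.vstack([s])  # here N = 1, D_S = 6
```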
  • Acoustic Representation
    • Each song is represented by an audio feature vector a that is automatically extracted from the audio-content.
    • Data is represented by an N x D_A matrix A whose rows are the audio feature vectors a_1, ..., a_N.
    [Figure: ‘Mustang Sally’ (The Commitments) mapped from the semantic space (‘funky’, ‘Ireland’) to the acoustic space (x, y).]
  • Canonical Correlation Analysis (CCA)
    • CCA is a technique for exploring dependencies between two related spaces.
      • Generalization of PCA to multiple spaces
      • Constrained optimization problem
        • Find weight vectors w_s and w_a:
          • 1-D projection of the data in the semantic space: S w_s
          • 1-D projection of the data in the acoustic space: A w_a
        • Maximize the correlation of the projections:
          • max (S w_s)^T (A w_a)
        • Constrain w_s and w_a to keep the objective from growing without bound
    max over w_s, w_a of (S w_s)^T (A w_a), subject to: (S w_s)^T (S w_s) = 1, (A w_a)^T (A w_a) = 1
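A minimal numpy sketch of this optimization, solving for the first pair of canonical weight vectors via an SVD of the whitened cross-covariance. The slides do not specify a solver; the centering step and the small ridge `reg` are my additions for numerical stability:

```python
import numpy as np

def cca_first_pair(S, A, reg=1e-8):
    """First CCA pair (w_s, w_a) maximizing (S w_s)^T (A w_a)
    under (S w_s)^T (S w_s) = (A w_a)^T (A w_a) = 1."""
    S = S - S.mean(axis=0)
    A = A - A.mean(axis=0)
    Css = S.T @ S + reg * np.eye(S.shape[1])
    Caa = A.T @ A + reg * np.eye(A.shape[1])
    Csa = S.T @ A

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    # SVD of the whitened cross-covariance; the leading singular
    # value is the canonical correlation.
    K = inv_sqrt(Css) @ Csa @ inv_sqrt(Caa)
    U, sing, Vt = np.linalg.svd(K)
    w_s = inv_sqrt(Css) @ U[:, 0]
    w_a = inv_sqrt(Caa) @ Vt[0, :]
    return w_s, w_a, sing[0]
```

On toy data where one semantic dimension tracks one audio feature, the returned canonical correlation approaches 1.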
  • CCA Visualization
    [Worked example: four songs plotted in the semantic space (‘funky’, ‘Ireland’) and the audio feature space (x, y). With the data matrices S and A from the slide and weight vectors w_s = (1, 0), w_a = (1, -1), the projections are S w_s = (1, 0, 0, -1) and A w_a = (2, 0, 0, -2), so (S w_s)^T (A w_a) = 4. The solution is sparse: the ‘Ireland’ dimension receives zero weight.]
  • What Sparsity means…
    • In the previous example,
      • w_{s,‘funky’} ≠ 0
        • ⇒ ‘funky’ is correlated with the audio signals
        • ⇒ a musically meaningful word
      • w_{s,‘Ireland’} = 0
        • ⇒ ‘Ireland’ is not correlated
        • ⇒ no linear relationship with the acoustic representation
    • In practice, w_s is dense even if most words are uncorrelated
      • ‘dense’ means many non-zero values
      • due to random variability in the data
    • Key Idea: reformulate CCA to produce a sparse solution.
  • Introducing Sparse CCA [ICML07]
    • Plan: penalize the objective function for each non-zero semantic dimension
      • Pick a penalty function f(w_s) that penalizes each non-zero dimension
        • Take 1: cardinality of w_s: f(w_s) = ||w_s||_0
          • Combinatorial problem: NP-hard
        • Take 2: L1 relaxation: f(w_s) = ||w_s||_1
          • Resulting problem is non-convex, and not a very tight approximation
        • Take 3: SDP relaxation
          • Prohibitively expensive for large problems
        • Solution: f(w_s) = Σ_i log |w_{s,i}|
          • Still a non-convex problem, but
          • can be solved efficiently as a DC (difference of convex functions) program
          • and gives a tight approximation
  • Introducing Sparse CCA [ICML07]
    • Plan: penalize the objective function for each non-zero semantic dimension
      • Pick a penalty function f(w_s) that penalizes each non-zero dimension
        • f(w_s) = Σ_i log |w_{s,i}|
      • Use a tuning parameter λ to control the importance of sparsity
        • Increasing λ ⇒ a smaller set of ‘musically relevant’ words
    max over w_s, w_a of (S w_s)^T (A w_a) - λ f(w_s), subject to: (S w_s)^T (S w_s) = 1, (A w_a)^T (A w_a) = 1
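The authors solve the log-penalized problem with a DC program, which is not reproduced here. As a rough stand-in, the sketch below alternates between the two weight vectors and applies L1-style soft-thresholding to w_s (in the spirit of Witten-style penalized CCA). It illustrates the key behavior: a sparsity penalty drives the weights of weakly correlated words exactly to zero:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: zeroes small coefficients and
    shrinks the rest toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_cca_sketch(S, A, penalty=0.1, n_iter=50, seed=0):
    """Alternating maximization with shrinkage on w_s. A simplified
    stand-in for the paper's log-penalty / DC-programming algorithm,
    not the authors' method."""
    S = S - S.mean(axis=0)
    A = A - A.mean(axis=0)
    rng = np.random.default_rng(seed)
    w_a = rng.normal(size=A.shape[1])
    w_a /= np.linalg.norm(A @ w_a)
    w_s = np.zeros(S.shape[1])
    for _ in range(n_iter):
        # Pull w_s toward the current audio projection, then shrink.
        w_s = soft_threshold(S.T @ (A @ w_a), penalty)
        norm = np.linalg.norm(S @ w_s)
        if norm == 0:            # penalty pruned every word
            break
        w_s /= norm              # enforce (S w_s)^T (S w_s) = 1
        w_a = A.T @ (S @ w_s)
        w_a /= np.linalg.norm(A @ w_a)
    return w_s, w_a
```

Zero entries of w_s correspond to pruned words; raising `penalty` (the λ of the slide) prunes more of the vocabulary.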
  • Experimental Setup
    • CAL500 Data Set [SIGIR07]
      • 500 songs by 500 Artists
      • Semantic Representation
        • 173 words
          • genre, instrumentation, usages, emotions, vocals, etc…
        • Annotation vector is average from 3+ listeners
        • Word Agreement Score
          • measures how consistently listeners apply a word to songs
      • Acoustic Representation
        • Bag of Dynamic MFCC vectors [McKinney03]
          • 52-D vector of spectral modulation intensities
          • 160 vectors per minute of audio content
        • Duplicate annotation vector for each Dynamic MFCC
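A Dynamic MFCC bag might be sketched as follows, assuming a precomputed (13, T) MFCC matrix at a 100 Hz frame rate: for each short window, take an FFT of each coefficient's trajectory over time and sum the modulation power in four bands, giving 13 x 4 = 52 dimensions. The band edges, window, and hop here are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def dynamic_mfcc_bag(mfcc, frame_rate=100.0, win=256, hop=128):
    """Bag of 52-D spectral-modulation vectors from a (13, T) MFCC
    matrix, loosely after McKinney's Dynamic MFCC features."""
    # Assumed modulation-frequency bands (Hz): DC, low, mid, high.
    bands = [(0.0, 0.5), (0.5, 2.0), (2.0, 15.0), (15.0, 43.0)]
    freqs = np.fft.rfftfreq(win, d=1.0 / frame_rate)
    vectors = []
    for start in range(0, mfcc.shape[1] - win + 1, hop):
        seg = mfcc[:, start:start + win]
        # Modulation power spectrum of each MFCC trajectory.
        power = np.abs(np.fft.rfft(seg, axis=1)) ** 2   # (13, win//2+1)
        feat = [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                for lo, hi in bands]                    # 4 x (13,)
        vectors.append(np.concatenate(feat))            # (52,)
    return np.array(vectors)
```

Each row of the result is one Dynamic MFCC vector; the annotation vector of the song is duplicated once per row, as the slide describes.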
  • Experiment 1: Qualitative Results
    • Words with high acoustic correlation
    • hip-hop, arousing, sad, drum machine, heavy beat, at a party, rapping
    • Words with no acoustic correlation
    • classic rock, normal, constant energy, going to sleep, falsetto
  • Experiment 2: Vocabulary Pruning
    • AMG2131 Text Corpus [ISMIR06]
      • AMG Allmusic song reviews for most of CAL500 songs
      • 315 word vocabulary
      • Annotation vector based on the presence or absence of a word in the review
      • Noisier word-song relationships than CAL500
    • Experimental Design:
      • Merge vocabularies: 173+315 = 488 words
      • Prune noisy words as we increase amount of sparsity in CCA
    • Hypothesis:
      • AMG words will be pruned before CAL500 words
  • Experiment 2: Vocabulary Pruning
    • Experimental Design:
      • Merge vocabularies: 488 words
      • Prune noisy words as we increase amount of sparsity in CCA
    • Result:
    • As Sparse CCA is more aggressive, more AMG words are pruned.
    Vocabulary size   # AMG2131 words   # CAL500 words   % AMG2131 words
    488               315               173              0.64
    249               131               118              0.52
    149               64                85               0.42
    50                11                39               0.22
  • Experiment 3: Vocabulary Selection
    • Experimental Design:
    • Rank words by:
      • how aggressive Sparse CCA must be before the word gets pruned.
      • how consistently humans use the word across the CAL500 corpus.
    • As we decrease the vocabulary size, calculate the average AROC.
    • Result: Sparse CCA does predict words that have better AROC.
    [Plot: average AROC vs. vocabulary size; average AROC rises from 0.68 with all 173 words to about 0.76 at a 20-word vocabulary.]
  • Recap
    • Constructing a ‘meaningful vocabulary’ is the first step in building a content-based, natural-language search engine for music.
    • Given a semantic representation and acoustic representation Sparse CCA can be used to find ‘musically meaningful’ words.
      • i.e., semantic dimensions linearly correlated with audio features
    • Automatically pruning words is important when using noisy sources of semantic information
      • e.g., LastFM Tags or Web Documents
  • Future Work
    • Theory: moving beyond linear correlation with kernel methods
    • Application: Sparse CCA can be used to find ‘musically meaningful’ audio features by imposing sparsity in the acoustic space
    • Practice: handling large, noisy semantically annotated music corpora
  • Experiment 3: Vocabulary Selection
    • Our content-based music search engine rank-orders songs given a text-based query [SIGIR07]
      • Area under the ROC curve (AROC) measures quality of each ranking
        • 0.5 is random, 1.0 is perfect
        • 0.68 is average AROC for all 1-word queries
    • Can Sparse CCA pick words that will have higher AROC?
      • Idea: words with high correlation have more signal in the audio representation and will be easier to model.
      • How does it compare with picking words that humans consistently use to label songs?
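AROC for a single-word query can be computed directly from the ranked list via the rank-sum (Mann-Whitney) statistic: the probability that a randomly chosen relevant song is scored above a randomly chosen irrelevant one. A small sketch (function and variable names are mine):

```python
import numpy as np

def aroc(scores, labels):
    """Area under the ROC curve for a ranking: scores are the
    search engine's relevance scores, labels mark which songs
    actually match the query word. Ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, a reversed ranking 0.0, and chance performance 0.5, matching the scale quoted on the slide.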