
Gómez et al., ISMIR 2012


Predominant Fundamental Frequency Estimation vs Singing Voice Separation for the Automatic Transcription of Accompanied Flamenco Singing
Emilia Gómez1, Francisco Jesús Cañadas Quesada2, Justin Salamon1, Jordi Bonada1, Pedro Vera Candeas2, Pablo Cabañas Molero2
1Music Technology Group, Universitat Pompeu Fabra; 2Telecommunication Engineering, Universidad de Jaén

Presented by Emilia Gómez, at ISMIR 2012 (International Society of Music Information Retrieval Conference) on the 12th of October 2012.



  1. Predominant fundamental frequency estimation versus singing voice separation for the automatic transcription of accompanied flamenco singing. Emilia Gómez1, Francisco Cañadas2, Justin Salamon1, Jordi Bonada1, Pedro Vera2, Pablo Cabañas2. 1 Music Technology Group, Universitat Pompeu Fabra; 2 Universidad de Jaén. emilia.gomez@upf.edu
  2. To future ISMIR organizers. Minimizing the “banquet/last day” effect: ‣ Schedule the best paper presentation ‣ Convert it to a poster session ‣ Invite a great keynote speaker ‣ ...
  3. This talk at ISMIR 2012: ‣ Musical cultures ‣ Music transcription (Benetos et al.) ‣ Predominant f0 estimation (Salamon et al.) ‣ Onset detection (Böck et al.) ‣ NMF (Boulanger-Lewandowski et al.; Kirchhoff et al.), singing voice separation (Sprechmann et al.) ‣ Ground truth evaluation (Peeters and Fort; Urbano et al.) ‣ Flamenco (Pikrakis et al.) ‣ Singing (Devaney et al.; Proutskova et al.; Lagrange et al.; Ross et al.; Koduri et al.)
  4. (Section divider: talk title repeated.)
  5. (Section divider: talk title repeated.)
  6. Flamenco singing: ‣ Music tradition from Andalusia, southern Spain. ‣ Singing tradition (Gamboa, 2005): cante. ‣ Accompanying instruments: flamenco guitar (toque); other instruments: hand claps (palmas), rhythmic footwork (zapateado), percussion (cajón).
  7. (Section divider: talk title repeated.)
  8. Music material: ‣ Previous work on a cappella singing (Mora et al. 2012; Gómez and Bonada 2012). ‣ Focus on accompanied styles: fandangos, 4 variants (Valverde, Almonaster, Calañas, Valiente-Alosno, Valiente-Huelva).
  9. Arcángel: http://www.youtube.com/watch?v=p2hTeDJblBs
  10. (Section divider: talk title repeated.)
  11. Flamenco singing transcription: ‣ Tedious ‣ No standard methodology ‣ ‘Computer-assisted’ transcription ‣ Note-level: Donnier (2011)
  12. (Section divider: talk title repeated.)
  13. Automatic singing transcription. Challenges: ‣ General: singing voice ‣ Specific: polyphonic material; ornamentation, melisma; recording conditions (e.g. reverb, noise); voice quality; tuning. [Audio example: Fandango, Cojo de Málaga, 1921]
  14. Approach: ‣ System based on previous work by Bonada et al. (2010), used in online castings for TV shows. [Block diagram: singing voice f0 estimation, then note transcription]
  15. Approach. [Block diagram: singing voice f0 estimation, then note transcription]
  16. (Section divider: talk title repeated.)
  17. (1) Separation-based approach (UJA). Singing voice separation: ‣ A mixture spectrogram X is factorized into three spectrograms: percussive (Xp): smoothness in frequency, sparseness in time; harmonic (Xh): sparseness in frequency, smoothness in time; vocal (Xv): sparseness in frequency and in time. ‣ Our NMF proposal does not need any clustering process to discriminate the basis vectors.
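The three-way factorization on slide 17 can be illustrated with a plain NMF sketch. This is not the paper's constrained UJA formulation: the smoothness/sparseness penalties that actually distinguish the percussive, harmonic and vocal components are omitted, and the grouping of basis vectors into the three parts is simply fixed by index here.

```python
import numpy as np

def nmf(X, n_components, n_iter=200, seed=0):
    """Plain multiplicative-update NMF (Euclidean cost) on a magnitude
    spectrogram X (freq x time). The UJA system adds per-group
    smoothness/sparseness constraints; omitted here for brevity."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, n_components)) + 1e-9   # basis spectra
    H = rng.random((n_components, T)) + 1e-9   # activations
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy usage: split a random "mixture" into 3 groups of basis vectors,
# standing in for the percussive (Xp), harmonic (Xh) and vocal (Xv)
# spectrograms -- grouping fixed by index, no clustering step.
X = np.abs(np.random.default_rng(1).random((64, 100)))
W, H = nmf(X, n_components=9)
groups = np.split(np.arange(9), 3)
Xp, Xh, Xv = (W[:, g] @ H[g, :] for g in groups)
```

The multiplicative updates keep W and H nonnegative by construction, which is why NMF components can be read as additive spectral parts.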
  18. (1) Separation-based approach (UJA). Singing voice separation stages: 1. Segmentation: manual labelling. 2. Training: learn percussive and harmonic basis vectors from the instrumental regions, using an unsupervised NMF percussive/harmonic separation approach. 3. Separation: Xv is extracted from the vocal regions, keeping the percussive and harmonic basis vectors fixed from the previous stage.
  19. (1) Separation-based approach (UJA). Monophonic f0 estimation: ‣ Cumulative mean normalized difference function (de Cheveigné and Kawahara, 2002), which indicates the cost of having a period equal to τ at time frame t. ‣ f0 sequence: lowest-cost path, found by dynamic programming step by step along time, giving a continuous and smooth f0 contour.
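The cumulative mean normalized difference function from slide 19 can be sketched as follows. This is a minimal single-frame version: the slide's dynamic-programming path search across frames is replaced here by YIN's simple absolute-threshold period pick, and the threshold value 0.1 is an illustrative choice.

```python
import numpy as np

def cmndf(frame, max_lag):
    """Cumulative mean normalized difference function d'(tau)
    (de Cheveigné and Kawahara, 2002): the cost of assuming that the
    signal has period tau in this frame."""
    d = np.array([np.sum((frame[:len(frame) - tau] - frame[tau:]) ** 2)
                  for tau in range(max_lag)])
    out = np.ones(max_lag)                      # d'(0) is defined as 1
    running = np.cumsum(d[1:])
    out[1:] = d[1:] * np.arange(1, max_lag) / np.maximum(running, 1e-12)
    return out

def pick_period(d, threshold=0.1):
    """Take the first lag whose cost dips below the threshold, then walk
    down to the local minimum; preferring the first dip avoids picking a
    multiple of the true period (an octave error)."""
    for tau in range(2, len(d) - 1):
        if d[tau] < threshold:
            while tau + 1 < len(d) and d[tau + 1] < d[tau]:
                tau += 1
            return tau
    return int(np.argmin(d[2:])) + 2

# Usage: a 220 Hz sine at 8 kHz; the true period is ~36.4 samples
sr = 8000
frame = np.sin(2 * np.pi * 220 * np.arange(1024) / sr)
d = cmndf(frame, max_lag=400)
f0 = sr / pick_period(d)   # near 220 Hz, at integer-lag resolution
```

A production estimator would add parabolic interpolation around the chosen lag to get sub-sample period resolution.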
  20. (Section divider: talk title repeated.)
  21.–25. (2) Predominant f0 estimation (MTG). [Five figure-only slides.]
  26. (2) Predominant f0 estimation (MTG): ‣ More details in Salamon et al. @ ISMIR. ‣ Default parameters (MTG). ‣ Per-excerpt adapted parameters (MTGAdaptedParam): minimum and maximum frequency thresholds; strictness of the voicing filter. [Figure: f0 over the mix; song: Fandango de Valverde, Raya]
  27. Approach. [Block diagram: singing voice f0 estimation, then note transcription]
  28. Approach. [Block diagram: singing voice f0 estimation, then note transcription]
  29. Note segmentation. Tuning frequency estimation: 1. Histogram of f0 deviations, 1-cent resolution. 2. Give more weight to stable frames (low f0 derivative). 3. Use a bell-shaped window to assign f0 values to histogram bins. 4. The maximum of the histogram (bmax) determines the estimated tuning frequency fref = 440·2^(bmax/1200).
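Slide 29's tuning estimator can be sketched directly from fref = 440·2^(bmax/1200). This simplified version uses hard bin assignment instead of the bell-shaped window, and takes optional per-frame weights as a stand-in for the stability weighting of step 2.

```python
import numpy as np

def tuning_frequency(f0, weights=None, resolution=1):
    """Estimate the tuning reference from per-frame f0 values (Hz):
    histogram of deviations from A440 equal temperament in cents
    (1-cent bins by default); the histogram peak b_max gives
    fref = 440 * 2**(b_max / 1200). Hard bin assignment is a
    simplification of the slide's bell-shaped window."""
    f0 = np.asarray(f0, float)
    cents = 1200 * np.log2(f0 / 440.0)
    dev = ((cents + 50) % 100) - 50        # deviation from nearest semitone
    edges = np.arange(-50, 50 + resolution, resolution)
    hist, edges = np.histogram(dev, bins=edges, weights=weights)
    i = int(np.argmax(hist))
    b_max = 0.5 * (edges[i] + edges[i + 1])    # bin centre, in cents
    return 440.0 * 2 ** (b_max / 1200.0)

# Usage: frames sung consistently ~20 cents sharp of A440 tuning
semitones = np.array([0, 2, 4, 5, 7, 0, 4])
f0 = 440.0 * 2 ** ((semitones * 100 + 20) / 1200.0)
fref = tuning_frequency(f0)   # close to 440 * 2**(20/1200) ~ 445 Hz
```

Folding deviations into (-50, 50] cents makes the estimate independent of which notes were sung, only of how far they sit from the equal-tempered grid.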
  30. Note segmentation. Short note transcription: dynamic programming (DP) algorithm. Each candidate note np_i gets a likelihood combining four criteria: duration (Ld), pitch (Lc), existence of voiced and unvoiced frames (Lv), and low-level features related to stability (Ls): L(np_i) = Ld(np_i) · Lc(np_i) · Lv(np_i) · Ls(np_i). ‣ Duration: Ld is small for very short and very long durations. ‣ Pitch: Lc is higher the closer the frame f0 values are to the note's nominal pitch c_pi, giving more relevance to frames with low f0 derivative. ‣ Voicing: Lv makes segments with a high percentage of unvoiced frames unlikely to be a voiced note, and segments with a high percentage of voiced frames unlikely to be an unvoiced note. ‣ Stability: Ls considers that a voiced note is unlikely to have fast and significant timbre or energy changes in the middle; this does not contradict the smooth vowel changes characteristic of flamenco singing.
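A sketch of slide 30's likelihood product L = Ld·Lc·Lv·Ls for one candidate segment. All four likelihood shapes below (the log-duration bump, the cent-distance decay, and the constants 0.3 s, 100 and 200) are illustrative stand-ins, not the paper's actual parametrisations, and the 5.8 ms hop is an assumed value.

```python
import numpy as np

def note_likelihood(f0_seg, voiced_seg, nominal_cents, hop=0.0058):
    """Combine the four criteria of the slide for one candidate note.
    f0_seg and nominal_cents are in cents; voiced_seg is boolean."""
    f0_seg = np.asarray(f0_seg, float)
    dur = len(f0_seg) * hop
    # Ld: small for very short and very long notes (bump around 0.3 s)
    Ld = np.exp(-0.5 * np.log(max(dur, 1e-3) / 0.3) ** 2)
    # Lc: closeness of frame pitches to the nominal pitch, with frames
    # of low f0 derivative weighted more
    deriv = np.abs(np.diff(f0_seg, prepend=f0_seg[0]))
    w = 1.0 / (1.0 + deriv)
    Lc = np.exp(-np.average(np.abs(f0_seg - nominal_cents), weights=w) / 100)
    # Lv: a voiced note should contain mostly voiced frames
    Lv = np.mean(np.asarray(voiced_seg, float))
    # Ls: stability -- penalise large jumps inside the note
    Ls = np.exp(-np.max(deriv) / 200)
    return Ld * Lc * Lv * Ls

# Usage: a stable on-pitch segment scores higher than a wobbly one
voiced = np.ones(50, bool)
L_stable = note_likelihood(np.full(50, 6000.0), voiced, 6000.0)
L_wobble = note_likelihood(6000 + 300 * np.sin(np.arange(50.0)), voiced, 6000.0)
```

In the full system this score feeds a DP search over segment boundaries, so that the best-scoring sequence of notes is chosen jointly rather than note by note.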
  31. Note transcription. Iterative note transcription: 1. Note consolidation: merge consecutive notes with the same pitch and a soft transition in terms of energy and timbre (stability below a threshold). 2. Tuning frequency refinement: consider note pitch values, giving higher weight to longer and louder notes. 3. Note pitch refinement.
  32. (Section divider: talk title repeated.)
  33. Evaluation strategy: ‣ Music material: 30 excerpts, mean duration 53.48 s, 2392 notes; variety of singers and recording conditions. ‣ Ground truth (a big problem!): all perceptible notes (including ornamentations); equal-tempered chromatic scale; working examples discussed with flamenco experts; annotations by a single subject. ‣ Evaluation measures (another big problem!): as proposed by MIREX (Audio Melody Extraction task), on a frame basis, comparing quantized pitch values.
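The frame-based measures mentioned on slide 33 can be sketched as below. This follows the slide's description (frame basis, quantized pitch values, 0 Hz meaning unvoiced); the official MIREX definitions differ in details such as the pitch tolerance, so treat this as an illustration.

```python
import numpy as np

def frame_metrics(ref_hz, est_hz):
    """Frame-based melody metrics in the spirit of the MIREX Audio
    Melody Extraction task: pitches quantized to semitones, a frame
    value of 0 Hz meaning unvoiced."""
    ref = np.asarray(ref_hz, float)
    est = np.asarray(est_hz, float)
    ref_v, est_v = ref > 0, est > 0

    def semitone(f):
        # avoid log of 0 for unvoiced frames; they are masked out below
        return np.round(12 * np.log2(np.where(f > 0, f, 1.0) / 440.0))

    pitch_ok = ref_v & est_v & (semitone(ref) == semitone(est))
    raw_pitch = pitch_ok.sum() / max(ref_v.sum(), 1)          # voiced frames
    overall = (pitch_ok | (~ref_v & ~est_v)).sum() / len(ref)  # all frames
    voicing_fa = (est_v & ~ref_v).sum() / max((~ref_v).sum(), 1)
    return dict(raw_pitch=raw_pitch, overall=overall, voicing_fa=voicing_fa)

# Usage: 6 frames; one octave error, one guitar frame wrongly voiced
ref = [220, 220, 247, 0, 0, 262]
est = [220, 440, 247, 0, 196, 262]
m = frame_metrics(ref, est)
# raw_pitch = 3/4, voicing false alarm = 1/2, overall = 4/6
```

The voicing false alarm rate isolates exactly the failure the results slide describes: unvoiced (guitar-only) frames that the system reports as melody.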
  34. Results: ‣ Satisfying results for both strategies. ‣ Good guitar timbre estimation in the separation-based approach, but it requires manual segmentation. ‣ Predominant f0 estimation (MTG) yields slightly higher accuracy and is fully automatic. ‣ Best results when adapting parameters (84.68% overall accuracy, 77.92% pitch accuracy). ‣ Voicing false alarm rate around 10%: the guitar is detected as melody. ‣ Better results than for a cappella singing; no tuning errors.
  35. Qualitative error analysis. Limitations: ‣ f0 estimation: highly accompanied sections (voicing, fifth/octave errors). ‣ Note segmentation and labelling: highly ornamented sections. ‣ Overall agreement: [figure]
  36. Case study: Fandango de Valverde, Raya.
  37.–42. Case study. [Figure slides.]
  43. Conclusions: ‣ Adaptive algorithms according to repertoire and use case. ‣ Limitations as challenges: f0 estimation (voicing); note transcription (onset detection, pitch labelling). ‣ Accurate enough for higher-level analyses: similarity, style classification, motive analysis (COFLA project: http://mtg.upf.edu/research/projects/cofla). Thanks! emilia.gomez@upf.edu
