Predominant fundamental frequency estimation
             versus singing voice separation 
             for the automatic transcription 
            of accompanied flamenco singing	

          Emilia Gómez1, Francisco Cañadas2, Justin Salamon1, Jordi Bonada1,
                            Pedro Vera2, Pablo Cabañas2	

                           1 Music Technology Group, Universitat Pompeu Fabra	

                                          2 Universidad de Jaén





emilia.gomez@upf.edu	
  
To future ISMIR organizers	

 Minimizing the “banquet/last day” effect:	


 ‣     Schedule the best paper presentation	

 ‣     Convert it to a poster session	

 ‣     Invite a great keynote speaker	

 ‣     ...	





This talk & ISMIR 2012


 ‣     Musical cultures	

 ‣     Music transcription (Benetos et al.)	

 ‣     Predominant f0 estimation (Salamon et al.) 	

 ‣     Onset detection (Böck et al.)	

 ‣     NMF (Boulanger-Lewandowski et al., Kirchhoff et al.), Singing voice
       separation (Sprechmann et al.; )	

 ‣     Ground truth & evaluation (Peeters & Fort; Urbano et al.)

 ‣     Flamenco (Pikrakis et al.)

 ‣     Singing (Devaney et al., Proutskova et al., Lagrange et al., Ross et al., Koduri
       et al.)




Flamenco singing	

‣     Music tradition from Andalusia, south of Spain.	

‣     Singing tradition (Gamboa, 2005): cante. 	

‣     Accompanying instruments: 	

       ‣  Flamenco guitar: toque.	

       ‣  Other instruments: claps (palmas), rhythmic
          feet (zapateado), percussion (cajón)	





Music material	


‣    Previous work on a cappella (Mora et al.
     2012, Gómez and Bonada 2012)	


‣    Focus on accompanied styles:
     Fandangos, 4 variants (Valverde,
     Almonaster, Calañas, Valiente-Alosno,
     Valiente-Huelva)	





Arcángel





                           http://www.youtube.com/watch?v=p2hTeDJblBs
Flamenco singing transcription	


 ‣     Tedious	

 ‣     No standard methodology	

 ‣     ‘Computer-assisted’
       transcription	

 ‣     Note-level	





                                    Donnier (2011)	

Automatic singing transcription	



Challenges	


  ‣     General: singing voice	

  ‣     Specific: 	

         ‣  Polyphonic material	

         ‣  Ornamentation, melisma	

         ‣  Recording conditions 
            (e.g. reverb, noise)	

               	

Fandango (Cojo de Málaga) 1921	

          ‣    Voice quality	

          ‣    Tuning	





Approach	



 ‣     System based on previous work (Bonada et al., 2010) that is used
       in online castings for TV shows.



       Singing voice f0 estimation → Note transcription





(1) Separation-based approach (UJA)	



Singing voice separation	


    ‣     A mixture spectrogram X is factorized into three
          different spectrograms:	

           ‣  Percussive (Xp): smoothness in f, sparseness in t	

           ‣  Harmonic (Xh): sparseness in f, smoothness in t	

           ‣  Vocal (Xv): sparseness in f, sparseness in t	

    ‣     Our NMF proposal does not use any clustering
          process to discriminate between basis vectors
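The three smoothness/sparseness profiles can be made concrete with toy spectrograms. The sketch below uses two simple stand-in measures chosen only for illustration (they are assumptions, not the UJA cost terms): roughness, the sum of squared differences between neighbouring cells along an axis (0 means perfectly smooth), and sparsity, the mean fraction of zero cells along an axis. The helper names and toy matrices are hypothetical.

```python
# Toy illustration of the smoothness/sparseness properties of Xp, Xh and Xv.
# The measures are simple stand-ins, not the UJA factorization cost terms.
# A spectrogram S is a list of frequency bins, each a list of frame values.

def _profiles(S, axis):
    """'t': one time series per frequency bin; 'f': one spectrum per frame."""
    if axis == "t":
        return [list(row) for row in S]
    return [list(col) for col in zip(*S)]

def roughness(S, axis):
    """Sum of squared neighbour differences along `axis` (0 = perfectly smooth)."""
    return sum((p[i] - p[i - 1]) ** 2
               for p in _profiles(S, axis) for i in range(1, len(p)))

def sparsity(S, axis):
    """Mean fraction of zero cells along `axis`, over non-silent profiles."""
    active = [p for p in _profiles(S, axis) if any(p)]
    return sum(p.count(0) / len(p) for p in active) / len(active)

# 4 frequency bins x 4 frames
harmonic = [[0, 0, 0, 0],
            [1, 1, 1, 1],   # one partial sustained over time
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
percussive = [[0, 1, 0, 0] for _ in range(4)]  # broadband onset in frame 1
vocal = [[0, 0, 0, 0],
         [0, 1, 0, 0],      # short partials that move in frequency
         [0, 0, 0, 1],
         [0, 0, 0, 0]]

for name, S in [("harmonic", harmonic), ("percussive", percussive), ("vocal", vocal)]:
    print(name, "roughness t/f:", roughness(S, "t"), roughness(S, "f"),
          "sparsity t/f:", sparsity(S, "t"), sparsity(S, "f"))
```

The sustained partial is smooth in time and sparse in frequency, the onset is the reverse, and the vocal pattern is sparse along both axes, which is the split the factorization relies on.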





(1) Separation-based approach (UJA)	



Singing voice separation	


    ‣     Stages:	

           1.  Segmentation: manual labelling.	

           2.  Training: learn percussive and harmonic basis vectors
               from instrumental regions, using an unsupervised NMF
               percussive/harmonic separation approach.	

           3.  Separation: Xv is extracted from the vocal regions by
               keeping the percussive and harmonic basis vectors
               fixed from the previous stage. 	
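Stages 2 and 3 amount to semi-supervised NMF: bases learnt on instrumental regions are frozen while extra bases absorb the voice. The sketch below uses plain Euclidean NMF with Lee-Seung multiplicative updates; it is a simplified stand-in that omits the smoothness/sparseness constraints and the percussive/harmonic split of the UJA method. Function names and the toy spectrograms are hypothetical.

```python
import random

EPS = 1e-9  # avoids division by zero in the multiplicative updates

def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(X, k, iters=300, W=None, n_fixed=0, seed=0):
    """Euclidean NMF X ~= W H with Lee-Seung multiplicative updates.
    The first `n_fixed` columns of W are never updated (pre-trained basis)."""
    rng = random.Random(seed)
    if W is None:
        W = [[rng.random() + 0.1 for _ in range(k)] for _ in X]
    H = [[rng.random() + 0.1 for _ in X[0]] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)                       # H <- H * (W^T X) / (W^T W H)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + EPS) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)                       # W <- W * (X H^T) / (W H H^T)
        num, den = matmul(X, Ht), matmul(matmul(W, H), Ht)
        W = [[w if j < n_fixed else w * n / (d + EPS)
              for j, (w, n, d) in enumerate(zip(wr, nr, dr))]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def separate_vocals(X_inst, X_mix, k_inst, k_voc, seed=0):
    """Stage 2: learn instrumental bases on X_inst.
    Stage 3: factorize X_mix with those bases fixed plus k_voc free bases."""
    W_inst, _ = nmf(X_inst, k_inst, seed=seed)
    rng = random.Random(seed + 1)
    W0 = [row + [rng.random() + 0.1 for _ in range(k_voc)] for row in W_inst]
    W, H = nmf(X_mix, k_inst + k_voc, W=W0, n_fixed=k_inst, seed=seed + 2)
    W_v = [row[k_inst:] for row in W]
    X_v = matmul(W_v, H[k_inst:])               # vocal spectrogram estimate
    return X_v, W, W_inst

# Toy data: 6 bins x 4 frames; the "guitar" uses bins 0 and 2, the "voice" bins 3 and 4.
guitar, voice = [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0]
X_inst = [[g * a for a in [1, 2, 1, 3]] for g in guitar]
X_mix = [[g * a + v * s for a, s in zip([2, 0, 1, 2], [1, 2, 0, 1])]
         for g, v in zip(guitar, voice)]
X_v, W, W_inst = separate_vocals(X_inst, X_mix, k_inst=1, k_voc=1)
```

Freezing the first columns of W is what forces the free columns to model only what the instrumental bases cannot explain.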





(1) Separation-based approach (UJA)	



Monophonic f0 estimation	

   ‣     Cumulative mean normalized difference function (de Cheveigné and
         Kawahara, 2002).	

          ‣  Indicates the cost of having a period equal to τ at time frame t	

          ‣  f0 sequence: lowest-cost path, found by dynamic programming

          ‣  Computed step by step along time, yielding a continuous and smooth f0 contour
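A minimal sketch of CMNDF-based tracking, assuming the lowest-cost path is found with a Viterbi-style recursion whose transition cost is proportional to the lag jump. The slide does not specify the DP cost, so `jump_penalty` and the function names are assumptions; the difference function follows de Cheveigné and Kawahara (2002).

```python
import math

def cmndf(frame, max_lag):
    """Cumulative mean normalized difference d'(tau), tau = 0..max_lag."""
    n = len(frame) - max_lag                       # integration window
    d = [sum((frame[j] - frame[j + tau]) ** 2 for j in range(n))
         for tau in range(max_lag + 1)]
    out, running = [1.0], 0.0                      # d'(0) = 1 by definition
    for tau in range(1, max_lag + 1):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out

def lowest_cost_path(cost, min_lag, jump_penalty):
    """DP over frames: one lag per frame minimising
    sum of d'_t(tau_t) + jump_penalty * |tau_t - tau_{t-1}|."""
    lags = range(min_lag, len(cost[0]))
    acc = {tau: cost[0][tau] for tau in lags}
    back = []
    for frame_cost in cost[1:]:
        new_acc, ptr = {}, {}
        for tau in lags:
            prev = min(lags, key=lambda p: acc[p] + jump_penalty * abs(tau - p))
            new_acc[tau] = frame_cost[tau] + acc[prev] + jump_penalty * abs(tau - prev)
            ptr[tau] = prev
        acc = new_acc
        back.append(ptr)
    tau = min(acc, key=acc.get)                    # best final lag
    path = [tau]
    for ptr in reversed(back):                     # backtrack
        tau = ptr[tau]
        path.append(tau)
    return path[::-1]

sr = 8000
frames = [[math.sin(2 * math.pi * f0 * n / sr) for n in range(400)]
          for f0 in [200, 200, 200, 250, 250, 250]]
cost = [cmndf(f, max_lag=60) for f in frames]      # lags 20..60 <-> 133..400 Hz
path = lowest_cost_path(cost, min_lag=20, jump_penalty=0.01)
f0_track = [sr / tau for tau in path]
```

The jump penalty favors continuous contours: a change of lag is accepted only when the CMNDF cost it saves outweighs the penalty.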





(2) Predominant f0 estimation (MTG)	



‣       More details (Salamon et al. @ ISMIR)	

‣       Default parameters (MTG)	

‣       Per-excerpt adapted parameters
        (MTGAdaptedParam):	

          ‣    Minimum and maximum frequency
               threshold	

          ‣    Strictness of the voicing filter	

          [Example: Fandango de Valverde (Raya); f0 and mix]
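The two adapted parameters can be illustrated on pitch contours. This sketch assumes a MELODIA-style voicing filter (Salamon and Gómez, 2012), in which contours whose mean salience falls below the mean minus ν standard deviations (computed over all contours) are discarded, so a smaller ν is stricter. The `Contour` type, the default values and the toy contours are hypothetical, not the settings used in the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List

@dataclass
class Contour:
    label: str
    pitch_hz: List[float]   # pitch value per frame of the contour
    salience: List[float]   # salience value per frame of the contour

def filter_contours(contours, min_hz=80.0, max_hz=1000.0, nu=0.2):
    """Keep contours whose mean pitch lies in [min_hz, max_hz] and whose mean
    salience is at least mean - nu * std over the remaining contours."""
    in_range = [c for c in contours if min_hz <= mean(c.pitch_hz) <= max_hz]
    if not in_range:
        return []
    sal = [mean(c.salience) for c in in_range]
    threshold = mean(sal) - nu * pstdev(sal)
    return [c for c, s in zip(in_range, sal) if s >= threshold]

contours = [
    Contour("A", [300.0, 305.0], [10.0, 10.0]),   # strong vocal contour
    Contour("B", [320.0, 318.0], [8.0, 8.0]),     # vocal contour
    Contour("C", [60.0, 61.0], [9.0, 9.0]),       # below the frequency range
    Contour("D", [400.0, 401.0], [2.0, 2.0]),     # weak contour, e.g. guitar
]
strict = [c.label for c in filter_contours(contours, nu=0.2)]
lenient = [c.label for c in filter_contours(contours, nu=2.0)]
```

Per-excerpt adaptation then amounts to choosing `min_hz`, `max_hz` and `nu` for each recording instead of using one global setting.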





Note segmentation	



 ‣     Tuning frequency estimation: 	

        1.  Histogram of f0 deviations, 1 cent resolution	

        2.  Give more weight to stable frames (low f0 derivative)	

        3.  Use a bell-shaped window to assign f0 values to histogram
            bins

        4.  The maximum of the histogram (b_max, in cents) determines the
            estimated tuning frequency $f_{\mathrm{ref}} = 440 \cdot 2^{b_{\max}/1200}$
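The four steps map onto a short routine. This sketch assumes a stability weight of 1/(1 + |Δcents|) per frame, a Gaussian bell (σ = 3 cents) spread over neighbouring bins, and a circular histogram over the ±50-cent deviation range; all three are illustrative choices, and the function name is hypothetical.

```python
import math

def tuning_frequency(f0_hz, sigma=3.0, half_width=10):
    """Estimate the tuning frequency from a voiced f0 sequence (Hz).
    Returns (f_ref in Hz, b_max in cents)."""
    cents = [1200 * math.log2(f / 440.0) for f in f0_hz]
    hist = [0.0] * 100                        # bin i <-> deviation i - 50 cents
    for i, c in enumerate(cents):
        slope = abs(c - cents[i - 1]) if i > 0 else 0.0
        weight = 1.0 / (1.0 + slope)          # stable frames weigh more
        dev = c - 100 * round(c / 100)        # deviation from the 440 Hz grid
        for k in range(-half_width, half_width + 1):
            b = round(dev) + k                # spread over neighbouring bins
            bell = math.exp(-0.5 * ((b - dev) / sigma) ** 2)
            hist[(b + 50) % 100] += weight * bell  # circular: +50 == -50
    b_max = max(range(100), key=hist.__getitem__) - 50
    return 440.0 * 2 ** (b_max / 1200), b_max

# Three notes sung 10 cents sharp, joined by an unstable glide frame.
notes_cents = [10] * 5 + [150] + [310] * 5 + [710] * 5
f0 = [440.0 * 2 ** (c / 1200) for c in notes_cents]
f_ref, b_max = tuning_frequency(f0)
```

The glide frame lands far from the peak and, because of its large f0 derivative, contributes very little weight.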





Note segmentation

‣        Short note transcription: Dynamic programming (DP) algorithm.
         Each candidate note $n_{p_i}$ is scored by the product of four
         likelihoods:

         $L(n_{p_i}) = L_d(n_{p_i}) \cdot L_c(n_{p_i}) \cdot L_v(n_{p_i}) \cdot L_s(n_{p_i})$    (8)

          ‣    Duration ($L_d$): small for short and long durations

          ‣    Pitch ($L_c$): higher the closer the frame f0 values are to the
               note nominal pitch $c_{p_i}$, giving more weight to frames with
               low f0 derivative

          ‣    Voicing ($L_v$): according to the % of voiced frames; segments
               with a high percentage of unvoiced frames are unlikely to be a
               voiced note, and segments with a high percentage of voiced
               frames are unlikely to be an unvoiced note

          ‣    Stability ($L_s$): a voiced note should be more or less stable
               in timbre & energy, i.e. unlikely to have fast and significant
               timbre or energy changes in the middle. This does not
               contradict the smooth vowel changes characteristic of
               flamenco singing.

          [Figure: DP lattice over frame index k and note pitch index j;
          node (k, j) is reached from frames k-d_max to k-d_min]
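A minimal sketch of the DP in equation (8), working in the log domain so that the product of likelihoods becomes a sum. The likelihood shapes (Gaussian in duration and in pitch distance, fraction of voiced frames) and their parameters are assumptions; $L_s$ is omitted (taken as 1) because it needs timbre and energy features. f0 is given in cents relative to the estimated tuning frequency, with None for unvoiced frames, and all names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def _log(x):
    return math.log(max(x, 1e-12))                  # guard against log(0)

def segment_log_likelihood(cents, start, end, pitch,
                           d_pref=10.0, d_spread=8.0, sigma=30.0):
    """log Ld + log Lc + log Lv for frames [start, end); Ls taken as 1.
    `pitch` is a nominal pitch in cents, or None for an unvoiced segment."""
    d = end - start
    log_ld = -0.5 * ((d - d_pref) / d_spread) ** 2   # small for short and long notes
    frac_voiced = sum(c is not None for c in cents[start:end]) / d
    if pitch is None:
        return log_ld + _log(1.0 - frac_voiced)
    num = den = 0.0
    for i in range(start, end):
        c = cents[i]
        if c is None:
            continue
        prev = cents[i - 1] if i > start and cents[i - 1] is not None else c
        w = 1.0 / (1.0 + abs(c - prev))              # low f0 derivative -> more weight
        num += w * math.exp(-0.5 * ((c - pitch) / sigma) ** 2)
        den += w
    return log_ld + _log(num / den if den else 0.0) + _log(frac_voiced)

def transcribe(cents, d_min=2, d_max=40):
    """best[k] = max over d in [d_min, d_max] and pitch j of
    best[k - d] + log L(frames k-d..k, j); returns (start, end, pitch) tuples."""
    n = len(cents)
    pitches = sorted({100 * round(c / 100) for c in cents if c is not None})
    best, back = [NEG_INF] * (n + 1), [None] * (n + 1)
    best[0] = 0.0
    for k in range(1, n + 1):
        for d in range(d_min, min(d_max, k) + 1):
            if best[k - d] == NEG_INF:
                continue
            for p in pitches + [None]:
                score = best[k - d] + segment_log_likelihood(cents, k - d, k, p)
                if score > best[k]:
                    best[k], back[k] = score, (k - d, p)
    notes, k = [], n
    while k > 0:                                      # backtrack
        start, p = back[k]
        notes.append((start, k, p))
        k = start
    return notes[::-1]

cents = [0.0] * 10 + [None] * 10 + [200.0] * 10      # note, rest, note
notes = transcribe(cents)
```

The sequence length must be reachable with durations in [d_min, d_max]; with d_min = 2 that holds for any length of at least 2 frames.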
Note transcription	




 ‣     Iterative note transcription:	

         1.  Note consolidation: merge consecutive notes with the same pitch
             and a soft transition in energy and timbre (stability
             below a threshold)

         2.  Tuning frequency refinement: consider note pitch values,
             giving higher weight to longer and louder notes	

         3.  Note pitch refinement.	
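One pass of the three refinement steps might look like this. The `instability` field stands in for the energy/timbre stability measure, and the merge threshold and duration × energy weighting are illustrative assumptions; the slide describes the steps, not these exact formulas, and all names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Note:
    start: int                 # first frame
    end: int                   # frame after the last one
    cents: float               # measured pitch, cents relative to 440 Hz
    energy: float              # mean energy (arbitrary units)
    label: int                 # quantized pitch, semitones relative to 440 Hz
    instability: float = 0.0   # energy/timbre change at the onset (hypothetical)

def consolidate(notes, threshold=0.5):
    """Step 1: merge consecutive notes with the same label and a soft transition."""
    merged = [notes[0]]
    for note in notes[1:]:
        prev = merged[-1]
        if note.label == prev.label and note.instability < threshold:
            d1, d2 = prev.end - prev.start, note.end - note.start
            merged[-1] = replace(
                prev, end=note.end,
                cents=(prev.cents * d1 + note.cents * d2) / (d1 + d2),
                energy=(prev.energy * d1 + note.energy * d2) / (d1 + d2))
        else:
            merged.append(note)
    return merged

def refine_tuning(notes):
    """Step 2: tuning offset (cents) as the mean deviation from the semitone
    grid, weighted by duration x energy so longer, louder notes count more."""
    weights = [(n.end - n.start) * n.energy for n in notes]
    devs = [n.cents - 100 * n.label for n in notes]
    return sum(w * d for w, d in zip(weights, devs)) / sum(weights)

def refine_pitches(notes, tuning_cents):
    """Step 3: re-quantize each note against the refined tuning."""
    return [replace(n, label=round((n.cents - tuning_cents) / 100)) for n in notes]

notes = [Note(0, 10, 12.0, 1.0, 0),
         Note(10, 20, 8.0, 1.0, 0, instability=0.1),    # soft transition: merged
         Note(20, 30, 210.0, 1.0, 2, instability=0.9),
         Note(30, 32, 340.0, 0.2, 3, instability=0.9)]  # short, quiet ornament
merged = consolidate(notes)
tuning = refine_tuning(merged)
final = refine_pitches(merged, tuning)
```

In an iterative scheme these steps would be repeated until the note labels stop changing; the short, quiet ornament barely moves the tuning estimate.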





Evaluation strategy	


 ‣     Music material: 	

        ‣  30 excerpts (mean duration: 53.48 s), 2392 notes

        ‣  Variety of singers, recording conditions.	

 ‣     Ground truth (big problem!):	

        ‣  All perceptible notes (including ornamentations)	

        ‣  Equal-tempered chromatic scale	

        ‣  Discussion of working examples with flamenco experts	

        ‣  Annotations by a single subject	

 ‣     Evaluation measures (another big problem!): those proposed by
       MIREX for the Audio Melody Extraction task, computed on a frame
       basis by comparing quantized pitch values
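The frame-based measures can be sketched as follows, assuming both the ground truth and the estimate are quantized to semitones (e.g. MIDI note numbers) with None marking unvoiced frames. This is a simplified reading of the MIREX definitions: MIREX compares continuous pitch within a ±50-cent tolerance, which quantized equality approximates. The function name is hypothetical.

```python
def _ratio(num, den):
    return num / den if den else 0.0

def melody_measures(ref, est):
    """Frame-level measures in the style of MIREX Audio Melody Extraction.
    `ref` and `est` hold one quantized pitch per frame, or None if unvoiced."""
    if len(ref) != len(est):
        raise ValueError("ref and est must have the same number of frames")
    voiced = [i for i, r in enumerate(ref) if r is not None]
    unvoiced = [i for i, r in enumerate(ref) if r is None]
    pitch_ok = sum(est[i] == ref[i] for i in voiced)  # None never equals a pitch
    return {
        "voicing_recall": _ratio(sum(est[i] is not None for i in voiced), len(voiced)),
        "voicing_false_alarm": _ratio(sum(est[i] is not None for i in unvoiced), len(unvoiced)),
        "raw_pitch_accuracy": _ratio(pitch_ok, len(voiced)),
        "overall_accuracy": _ratio(pitch_ok + sum(est[i] is None for i in unvoiced), len(ref)),
    }

m = melody_measures([60, 60, 62, None, None, 64], [60, 61, 62, 62, None, None])
```

A voicing false alarm counts frames where the reference is silent but the system reports a pitch, e.g. when the guitar is taken as melody.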



Results	


‣      Satisfactory results for both strategies.

‣      Separation-based approach (UJA): good estimation of the guitar
       timbre, but requires manual segmentation.

‣      Predominant f0 estimation (MTG): slightly higher accuracy, and
       fully automatic.

‣      Best results when adapting parameters per excerpt
       (84.68% overall accuracy, 77.92% pitch accuracy)

‣      Voicing false alarm rate of around 10%: the guitar is
       sometimes detected as melody.

‣      Better results than for a cappella singing, with no tuning
       errors.




Qualitative error analysis	


 ‣     Limitations: 	

        ‣  F0 estimation:	

            ‣  Highly accompanied sections: voicing errors and
               fifth/octave errors

        ‣  Note segmentation & labelling

            ‣  Highly ornamented sections	

        ‣  Overall agreement:	





Case study	


 ‣     Fandango de Valverde, Raya	





Conclusions	


 ‣     Adaptive algorithms according to repertoire & use case

 ‣     Limitations & challenges:

        ‣  F0 estimation: voicing

        ‣  Note transcription: onset detection, pitch labelling.

 ‣     Accurate enough for higher-level analyses: similarity,
       style classification, motive analysis
       (COmputation & FLAmenco, http://mtg.upf.edu/research/projects/cofla)


                           Thanks!	

emilia.gomez@upf.edu	
  

More Related Content

Recently uploaded

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 

Recently uploaded (20)

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
Marius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
Expeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
Pixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
Skeleton Technologies
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Gomezetal ismir2012

  • 1. Predominant fundamental frequency estimation versus singing voice separation for the automatic transcription of accompanied flamenco singing Emilia Gómez1, Francisco Cañadas2, Justin Salamon1, Jordi Bonada1, Pedro Vera2, Pablo Cabañas2 1 Music Technology Group, Universitat Pompeu Fabra 2 Universidad de Jaen emilia.gomez@upf.edu  
  • 2. To future ISMIR organizers 2/35 Minimizing the “banquet/last day” effect: ‣  Schedule the best paper presentation ‣  Convert it to a poster session ‣  Invite a great keynote speaker ‣  ... emilia.gomez@upf.edu  
  • 3. This talk & ISMIR 2012 ‣ Musical cultures ‣ Music transcription (Benetos et al.) ‣ Predominant f0 estimation (Salamon et al.) ‣ Onset detection (Böck et al.) ‣ NMF (Boulanger-Lewandowski et al., Kirchhoff et al.), singing voice separation (Sprechmann et al.; ) ‣ Ground truth & evaluation (Peeters & Fort; Urbano et al.) ‣ Flamenco (Pikrakis et al.) ‣ Singing (Devaney et al., Proutskova et al., Lagrange et al., Ross et al., Koduri et al.)
  • 6. Flamenco singing ‣ Music tradition from Andalusia, south of Spain. ‣ Singing tradition (Gamboa, 2005): cante. ‣ Accompanying instruments: ‣ Flamenco guitar: toque. ‣ Other instruments: claps (palmas), rhythmic feet (zapateado), percussion (cajón)
  • 8. Music material ‣ Previous work on a cappella singing (Mora et al., 2012; Gómez and Bonada, 2012) ‣ Focus on accompanied styles: Fandangos, 4 variants (Valverde, Almonaster, Calañas, Valiente-Alosno, Valiente-Huelva)
  • 9. Arcangel http://www.youtube.com/watch?v=p2hTeDJblBs
  • 11. Flamenco singing transcription ‣ Tedious ‣ No standard methodology ‣ ‘Computer-assisted’ transcription ‣ Note level (Donnier, 2011)
  • 13. Automatic singing transcription: challenges ‣ General: singing voice ‣ Specific: ‣ Polyphonic material ‣ Ornamentation, melisma ‣ Recording conditions, e.g. reverb, noise (example: Fandango, Cojo de Málaga, 1921) ‣ Voice quality ‣ Tuning
  • 14. Approach ‣ System based on previous work (Bonada et al., 2010) used in online castings for TV shows. Pipeline: singing voice → f0 estimation → note transcription.
  • 15. Approach: singing voice → f0 estimation → note transcription
  • 17. (1) Separation-based approach (UJA): singing voice separation ‣ A mixture spectrogram X is factorized into three different spectrograms: ‣ Percussive (Xp): smoothness in f, sparseness in t ‣ Harmonic (Xh): sparseness in f, smoothness in t ‣ Vocal (Xv): sparseness in f, sparseness in t ‣ Our NMF proposal does not use any clustering process to discriminate the basis vectors
  • 18. (1) Separation-based approach (UJA): singing voice separation ‣ Stages: 1. Segmentation: manual labelling. 2. Training: learn percussive and harmonic basis vectors from the instrumental regions, using an unsupervised NMF percussive/harmonic separation approach. 3. Separation: extract Xv from the vocal regions, keeping the percussive and harmonic basis vectors from the previous stage fixed.
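The train-then-fix idea of slide 18 can be sketched with plain NMF. This is a minimal sketch, not the UJA system: it assumes Euclidean NMF with Lee-Seung multiplicative updates, one accompaniment basis set instead of separate percussive/harmonic sets, no smoothness or sparseness constraints from slide 17, and a Wiener-like soft mask to recover Xv. The function names and ranks are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero in the multiplicative updates


def train_accompaniment_bases(V_inst, rank, n_iter=200, seed=0):
    """Unsupervised Euclidean NMF, V_inst ~ W @ H, on instrumental-only frames.

    Returns the learned basis matrix W (n_freq x rank)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V_inst.shape
    W = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius cost
        H *= (W.T @ V_inst) / (W.T @ W @ H + EPS)
        W *= (V_inst @ H.T) / (W @ H @ H.T + EPS)
    return W


def separate_vocals(V_mix, W_acc, rank_voc, n_iter=200, seed=0):
    """Decompose a vocal-region spectrogram with W_acc held fixed.

    Only the vocal bases W_voc and all activations are updated; the vocal
    estimate is obtained with a soft (Wiener-like) mask on V_mix."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V_mix.shape
    k_acc = W_acc.shape[1]
    W_voc = rng.random((n_freq, rank_voc)) + EPS
    H = rng.random((k_acc + rank_voc, n_frames)) + EPS
    for _ in range(n_iter):
        W = np.hstack([W_acc, W_voc])
        H *= (W.T @ V_mix) / (W.T @ W @ H + EPS)
        H_voc = H[k_acc:]
        # update restricted to the vocal columns of W; W_acc never changes
        W_voc *= (V_mix @ H_voc.T) / (W @ H @ H_voc.T + EPS)
    W = np.hstack([W_acc, W_voc])
    V_voc = W_voc @ H[k_acc:]
    mask = V_voc / (W @ H + EPS)  # in [0, 1): vocal share of the model
    return mask * V_mix
```

In use, `V_inst` and `V_mix` would be magnitude spectrograms of the manually labelled instrumental and vocal regions. Keeping `W_acc` fixed forces the extra vocal bases to explain only what the guitar model cannot.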
  • 19. (1) Separation-based approach (UJA): monophonic f0 estimation ‣ Cumulative mean normalized difference function (de Cheveigné and Kawahara, 2002), which indicates the cost of having a period equal to τ at time frame t ‣ f0 sequence: the lowest-cost path, found by dynamic programming step by step along time, giving a continuous and smooth f0 contour
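A minimal sketch of slide 19: the YIN cumulative mean normalized difference d'(τ) is used as a per-frame cost, and a Viterbi-style pass finds the lowest-cost lag path. The linear jump penalty (`jump_weight`), the `min_lag` cutoff and the function names are assumptions for illustration; parabolic lag interpolation is omitted, so periods are integer samples.

```python
import numpy as np


def cmndf(frame, max_lag):
    """YIN cumulative mean normalized difference d'(tau) for tau = 0..max_lag-1."""
    frame = np.asarray(frame, dtype=float)
    w = len(frame) - max_lag  # integration window length
    d = np.zeros(max_lag)
    for tau in range(1, max_lag):
        diff = frame[:w] - frame[tau:tau + w]
        d[tau] = np.dot(diff, diff)
    out = np.ones(max_lag)  # d'(0) = 1 by definition
    running_sum = np.cumsum(d[1:])
    out[1:] = d[1:] * np.arange(1, max_lag) / np.maximum(running_sum, 1e-12)
    return out


def track_period(cost, min_lag, jump_weight=0.01):
    """Lowest-cost lag path through cost[t, tau] by dynamic programming.

    A penalty proportional to |tau_t - tau_(t-1)| favours smooth contours."""
    cost = np.array(cost, dtype=float)
    cost[:, :min_lag] = np.inf  # exclude implausibly short periods
    n_frames, n_lags = cost.shape
    lags = np.arange(n_lags)
    trans = jump_weight * np.abs(lags[:, None] - lags[None, :])  # [prev, cur]
    acc = cost[0].copy()
    back = np.zeros((n_frames, n_lags), dtype=int)
    for t in range(1, n_frames):
        total = acc[:, None] + trans
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], lags] + cost[t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmin(acc)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Given a sample rate `sr`, the f0 contour is `sr / path`. The jump penalty is what makes the path "continuous and smooth" rather than a frame-by-frame argmin.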
  • 21–25. (2) Predominant f0 estimation (MTG)
  • 26. (2) Predominant f0 estimation (MTG) ‣ More details: Salamon et al. (ISMIR 2012) ‣ Default parameters (MTG) ‣ Per-excerpt adapted parameters (MTGAdaptedParam): ‣ Minimum and maximum frequency thresholds ‣ Strictness of the voicing filter [Figure: Fandango de Valverde (Raya); panels: song, f0 of the mix]
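To make the two adapted parameters concrete, here is a hedged sketch of how a frequency range and a voicing-strictness parameter could prune candidate pitch contours. It assumes a Melodia-style rule in which a contour is kept when its mean salience is at least the mean of all contour mean saliences minus `nu` times their standard deviation (a smaller `nu` is stricter). `Contour`, `filter_contours` and its parameters are hypothetical, and computing the threshold only over in-range contours is a simplification; this is not the MTG implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Contour:
    f0_hz: np.ndarray     # pitch of the contour, one value per frame
    salience: np.ndarray  # salience of the contour, one value per frame


def filter_contours(contours, min_hz, max_hz, nu):
    """Keep contours inside [min_hz, max_hz] that pass a salience-based voicing filter."""
    # 1) frequency range (per-excerpt minimum and maximum thresholds)
    in_range = [c for c in contours if min_hz <= np.mean(c.f0_hz) <= max_hz]
    if not in_range:
        return []
    # 2) voicing filter: threshold relative to the distribution of mean saliences
    means = np.array([np.mean(c.salience) for c in in_range])
    threshold = means.mean() - nu * means.std()
    return [c for c, m in zip(in_range, means) if m >= threshold]
```

Adapting per excerpt then amounts to choosing `min_hz`/`max_hz` around the singer's range and `nu` to trade missed voice against guitar contours leaking into the melody.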
  • 27–28. Approach: singing voice → f0 estimation → note transcription
  • 29. Note segmentation ‣ Tuning frequency estimation: 1. Histogram of f0 deviations, 1-cent resolution 2. Give more weight to stable frames (low f0 derivative) 3. Use a bell-shaped window to assign f0 values to histogram bins 4. The maximum of the histogram, $b_{max}$ (in cents), determines the estimated tuning frequency: $f_{ref} = 440 \cdot 2^{b_{max}/1200}$
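The four tuning steps can be sketched as follows. Assumptions: deviations are measured from the 440 Hz equal-tempered grid and wrapped to [-50, 50) cents, the stability weight is `1 / (1 + |df0|)`, and the bell-shaped window is a Gaussian with a hypothetical `sigma_cents`. The input needs at least two voiced frames.

```python
import numpy as np


def estimate_tuning(f0_hz, sigma_cents=3.0):
    """Tuning-frequency estimate from voiced f0 values (Hz) via a 1-cent deviation histogram."""
    cents = 1200.0 * np.log2(np.asarray(f0_hz, dtype=float) / 440.0)
    # Deviation from the nearest equal-tempered semitone, wrapped to [-50, 50).
    dev = (cents + 50.0) % 100.0 - 50.0
    # Stable frames (small f0 derivative) get larger weights; this form is an assumption.
    weight = 1.0 / (1.0 + np.abs(np.gradient(cents)))
    bins = np.arange(-50, 50)  # 1-cent bin centres
    hist = np.zeros(bins.size)
    for d, w in zip(dev, weight):
        dist = (bins - d + 50.0) % 100.0 - 50.0  # circular distance in cents
        hist += w * np.exp(-0.5 * (dist / sigma_cents) ** 2)  # bell-shaped window
    b_max = bins[np.argmax(hist)]
    return 440.0 * 2.0 ** (b_max / 1200.0)
```

For a singer who is consistently 20 cents sharp, the histogram peaks at bin 20, so $f_{ref} = 440 \cdot 2^{20/1200} \approx 445.1$ Hz. The circular distance matters because a deviation of +49 cents and one of -49 cents are only 2 cents apart.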
  • 30. Note segmentation ‣ Short note transcription: dynamic programming (DP) algorithm. Each candidate note $n_{p_i}$ is scored with a likelihood combining four criteria: duration ($L_d$), pitch ($L_c$), existence of voiced and unvoiced frames ($L_v$), and low-level features related to stability ($L_s$): $L(n_{p_i}) = L_d(n_{p_i}) \cdot L_c(n_{p_i}) \cdot L_v(n_{p_i}) \cdot L_s(n_{p_i})$ ‣ Duration: $L_d$ is small for both short and long durations ‣ Pitch: $L_c$ is higher the closer the frame f0 values are to the note's nominal pitch $c_{p_i}$, giving more weight to frames with low f0 derivative ‣ Voicing: according to the percentage of voiced frames; segments with a high percentage of unvoiced frames are unlikely to be a voiced note, and segments with a high percentage of voiced frames are unlikely to be an unvoiced note ‣ Stability: a voiced note should be more or less stable in timbre and energy, i.e. unlikely to show fast and significant timbre or energy changes in the middle. This does not contradict the smooth vowel changes characteristic of flamenco singing. [Figure: DP lattice over frame index k (note start between k−dmax and k−dmin) and note pitch index j]
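A hedged sketch of the segmentation DP on slide 30: `best[k]` holds the best log-likelihood for frames `[0, k)`, and each step tries note durations between `d_min` and `d_max`. The concrete forms of $L_d$ (a Gaussian in duration), $L_c$ (weighted squared distance to the rounded weighted-mean pitch) and $L_v$ (voiced fraction), and all parameter values, are assumptions; $L_s$ is omitted because it needs timbre and energy features.

```python
import numpy as np


def segment_notes(semitones, d_min=3, d_max=50, mu_d=15.0, sigma_d=10.0,
                  pitch_scale=10.0, eps=1e-6):
    """DP note segmentation maximizing the sum of log L_d + log L_c + log L_v.

    semitones: per-frame pitch in semitones relative to the tuning frequency,
    NaN for unvoiced frames (needs at least d_min frames).
    Returns (start, end, pitch) tuples; pitch is None for unvoiced segments."""
    x = np.asarray(semitones, dtype=float)
    n = len(x)
    voiced = ~np.isnan(x)
    best = np.full(n + 1, -np.inf)  # best[k]: best score for frames [0, k)
    best[0] = 0.0
    choice = [None] * (n + 1)
    for k in range(1, n + 1):
        for d in range(d_min, min(d_max, k) + 1):
            j = k - d
            if np.isneginf(best[j]):
                continue
            v = voiced[j:k].mean()  # fraction of voiced frames
            log_ld = -0.5 * ((d - mu_d) / sigma_d) ** 2  # small for short and long notes
            cands = [(log_ld + np.log(1.0 - v + eps), None)]  # unvoiced hypothesis
            if v > 0:
                s = x[j:k][voiced[j:k]]
                # frames with a low f0 derivative weigh more
                w = 1.0 / (1.0 + np.abs(np.gradient(s))) if s.size > 1 else np.ones(1)
                p = int(np.round(np.average(s, weights=w)))  # nominal pitch
                log_lc = -pitch_scale * np.average((s - p) ** 2, weights=w)
                cands.append((log_ld + log_lc + np.log(v + eps), p))
            for score, p in cands:
                if best[j] + score > best[k]:
                    best[k] = best[j] + score
                    choice[k] = (j, p)
    notes, k = [], n
    while k > 0:  # backtrack the best segmentation
        j, p = choice[k]
        notes.append((j, k, p))
        k = j
    return notes[::-1]
```

Working in log-likelihoods turns the product $L_d \cdot L_c \cdot L_v$ into a sum, which is what lets the DP accumulate scores segment by segment.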
  • 31. Note transcription ‣ Iterative note transcription: 1. Note consolidation: consecutive notes with the same pitch and a soft transition in terms of energy and timbre (stability below a threshold) 2. Tuning frequency refinement: consider note pitch values, giving higher weight to longer and louder notes 3. Note pitch refinement.
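One iteration of the three steps on slide 31 might look like this. It is a sketch under stated assumptions: the per-boundary `instability` score is a hypothetical precomputed energy/timbre change value, merged notes average their pitch and loudness by duration, and the tuning offset is the mean note deviation weighted by duration × loudness.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass
class Note:
    start: int       # first frame (inclusive)
    end: int         # last frame (exclusive)
    cents: float     # mean f0 of the note, in cents relative to 440 Hz
    loudness: float  # mean energy, assumed non-negative
    pitch: int       # semitone label relative to the current tuning


def consolidate(notes, instability, threshold):
    """Merge adjacent same-pitch notes whose boundary change score is below threshold.

    instability[i] is a (hypothetical) energy/timbre change score at the
    boundary between notes[i] and notes[i + 1]."""
    merged = [notes[0]]
    for i, nxt in enumerate(notes[1:]):
        cur = merged[-1]
        if nxt.pitch == cur.pitch and nxt.start == cur.end and instability[i] < threshold:
            d1, d2 = cur.end - cur.start, nxt.end - nxt.start
            merged[-1] = Note(cur.start, nxt.end,
                              (d1 * cur.cents + d2 * nxt.cents) / (d1 + d2),
                              (d1 * cur.loudness + d2 * nxt.loudness) / (d1 + d2),
                              cur.pitch)
        else:
            merged.append(nxt)
    return merged


def refine_tuning(notes, tuning_cents):
    """Shift the tuning by the mean note deviation; longer and louder notes weigh more."""
    w = np.array([(n.end - n.start) * n.loudness for n in notes], dtype=float)
    dev = np.array([n.cents - tuning_cents - 100 * n.pitch for n in notes])
    return tuning_cents + np.average(dev, weights=w)


def refine_pitches(notes, tuning_cents):
    """Relabel each note with the nearest semitone under the refined tuning."""
    return [replace(n, pitch=int(round((n.cents - tuning_cents) / 100))) for n in notes]
```

The refined tuning in cents converts back to Hz as `440 * 2 ** (tuning_cents / 1200)`. Repeating the three calls until the labels stop changing gives the iterative scheme.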
  • 33. Evaluation strategy ‣ Music material: ‣ 30 excerpts, mean duration 53.48 s, 2392 notes ‣ Variety of singers and recording conditions ‣ Ground truth (big problem!): ‣ All perceptible notes (including ornamentations) ‣ Equal-tempered chromatic scale ‣ Discussion of working examples with flamenco experts ‣ Annotations by a single subject ‣ Evaluation measures (another big problem!): those proposed by MIREX for the Audio Melody Extraction task, computed on a frame basis by comparing quantized pitch values
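The frame-based measures on slide 33 can be sketched as below, in the style of the MIREX Audio Melody Extraction task. Assumptions: both sequences are per-frame (quantized) MIDI pitches on the same time grid, NaN marks unvoiced frames, a frame counts as correct when the pitches are within half a semitone, and the estimate carries no pitch guesses on frames it declares unvoiced (MIREX allows such guesses; this simplification ignores them).

```python
import numpy as np


def melody_metrics(ref_midi, est_midi, tol_semitones=0.5):
    """Frame-based melody measures; NaN marks an unvoiced frame."""
    ref = np.asarray(ref_midi, dtype=float)
    est = np.asarray(est_midi, dtype=float)
    ref_v, est_v = ~np.isnan(ref), ~np.isnan(est)
    both = ref_v & est_v
    hit = np.zeros(ref.size, dtype=bool)
    hit[both] = np.abs(ref[both] - est[both]) <= tol_semitones
    return {
        "voicing_recall": both.sum() / max(ref_v.sum(), 1),
        "voicing_false_alarm": (~ref_v & est_v).sum() / max((~ref_v).sum(), 1),
        "raw_pitch_accuracy": hit.sum() / max(ref_v.sum(), 1),
        # correct pitch on voiced frames, or both agree the frame is unvoiced
        "overall_accuracy": (hit | (~ref_v & ~est_v)).sum() / ref.size,
    }
```

With quantized values the half-semitone tolerance reduces to exact note equality, so a guitar note reported during a sung rest counts against the voicing false alarm rate.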
  • 34. Results ‣ Satisfying results for both strategies. ‣ Separation-based approach: good guitar timbre estimation, but it requires manual segmentation. ‣ Predominant f0 estimation (MTG): slightly higher accuracy, and fully automatic. ‣ Best results with adapted parameters (84.68% overall accuracy, 77.92% pitch accuracy) ‣ Voicing false alarm rate around 10%: the guitar is detected as melody. ‣ Better results than for a cappella singing; no tuning errors.
  • 35. Qualitative error analysis ‣ Limitations: ‣ F0 estimation: ‣ Highly accompanied sections: voicing, 5th/8th errors ‣ Note segmentation & labelling ‣ Highly ornamented sections ‣ Overall agreement:
  • 36. Case study ‣ Fandango de Valverde, Raya
  • 43. Conclusions ‣ Adaptive algorithms according to repertoire & use case ‣ Limitations & challenges: ‣ F0 estimation: voicing ‣ Note transcription: onset detection, pitch labelling ‣ Accurate enough for higher-level analyses: similarity, style classification, motive analysis (COmputation FLAmenco, http://mtg.upf.edu/research/projects/cofla) Thanks!