
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music


Hands-on tutorial given at IISc Bengaluru, 2016.


- 1. Computational Approaches to Melodic Analysis of Indian Art Music. Indian Institute of Science, Bengaluru, India, 2016. Sankalp Gulati, Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain.
- 2. Melodic description: tonic, melody, intonation, raga, motifs, similarity.
- 3. Tonic Identification
- 4. Tonic Identification. [Figures: spectrogram of an excerpt; multi-pitch salience histogram, 1 bin = 10 cents, ref. 55 Hz, with peaks f2-f6 and the tonic marked.] Two families of approaches: signal processing and learning. Cues: tanpura/drone background sound; extent of gamakas on the Sa and Pa svaras; vadi and samvadi svaras of the rāga. Accuracy: ~90%.
  S. Gulati, A. Bellur, J. Salamon, H. G. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(1):55-73, 2014.
  J. Salamon, S. Gulati, and X. Serra. A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR), pp. 499-504, Porto, Portugal, 2012.
  A. Bellur, V. Ishwar, X. Serra, and H. Murthy. A knowledge based signal processing approach to tonic identification in Indian classical music. In Proc. 2nd CompMusic Workshop, pp. 113-118, Istanbul, Turkey, 2012.
  H. G. Ranjani, S. Arthi, and T. V. Sreenivas. Carnatic music analysis: shadja, swara identification and raga verification in alapana using stochastic models. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 29-32, New Paltz, NY, 2011.
- 5. Tonic Identification: Multipitch Approach. Audio example; utilizing the drone sound; multi-pitch analysis (vocals + drone). J. Salamon, E. Gómez, and J. Bonada. Sinusoid extraction and salience function design for predominant melody estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFx-11), pp. 73-80, Paris, France, Sep. 2011.
- 6. Tonic Identification: Block Diagram. Audio → sinusoid extraction (STFT → spectral peak picking → frequency/amplitude correction) → sinusoids → salience function computation (bin salience mapping → harmonic summation) → time-frequency salience → tonic candidate generation (salience peak picking → multi-pitch histogram → histogram peak picking) → tonic candidates.
- 7. Tonic Identification: Signal Processing. STFT: hop size 11 ms; window length 46 ms; Hamming window; 8192-point FFT.
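The STFT parameters above can be sketched with SciPy. This is a minimal illustration; the 44.1 kHz sampling rate is an assumption, since the slide does not state it:

```python
import numpy as np
from scipy.signal import stft

# Assumed sampling rate (not stated on the slide): 44.1 kHz
fs = 44100
hop = int(round(0.011 * fs))      # 11 ms hop    -> 485 samples
win = int(round(0.046 * fs))      # 46 ms window -> 2029 samples
n_fft = 8192                      # zero-padded 8192-point FFT

x = np.random.randn(fs)           # stand-in for one second of audio
f, t, X = stft(x, fs=fs, window='hamming', nperseg=win,
               noverlap=win - hop, nfft=n_fft)
mag = np.abs(X)                   # magnitude spectrogram, n_fft//2 + 1 bins
```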
- 8. Tonic Identification: Signal Processing. Spectral peak picking: absolute threshold -60 dB.
- 9. Tonic Identification: Signal Processing. Frequency/amplitude correction via parabolic interpolation.
- 10. Tonic Identification: Signal Processing. Harmonic summation: spectrum considered 55-7200 Hz; frequency range 55-1760 Hz; base frequency 55 Hz; bin resolution 10 cents per bin (120 per octave); 5 octaves; up to 20 harmonics; squared-cosine window across 50 cents.
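A simplified sketch of the salience computation under these parameters. The helper `add_peak` and the harmonic weighting constant `alpha` are illustrative assumptions; the actual Melodia salience function is more involved:

```python
import numpy as np

REF_HZ = 55.0          # base frequency from the slide
CENTS_PER_BIN = 10.0   # 120 bins per octave

def hz_to_bin(f):
    """Map frequency (Hz) to salience-function bin (10 cents/bin, ref 55 Hz)."""
    return 1200.0 * np.log2(np.asarray(f) / REF_HZ) / CENTS_PER_BIN

n_bins = int(5 * 1200 / CENTS_PER_BIN)   # 5 octaves -> 600 bins

def add_peak(salience, f_peak, a_peak, n_harm=20, alpha=0.8):
    """Each spectral peak adds salience at f, f/2, ..., f/n_harm
    (candidate fundamentals), weighted by alpha^(h-1) and spread with
    a squared-cosine window of +/-50 cents around each candidate."""
    for h in range(1, n_harm + 1):
        f0 = f_peak / h
        if not (REF_HZ <= f0 < REF_HZ * 2 ** 5):   # 55-1760 Hz range
            continue
        center = hz_to_bin(f0)
        lo, hi = int(np.ceil(center - 5)), int(np.floor(center + 5))
        for b in range(max(lo, 0), min(hi, n_bins - 1) + 1):
            d = (b - center) / 5.0           # distance in half-window units
            w = np.cos(np.pi * d / 2) ** 2   # squared-cosine weighting
            salience[b] += w * (alpha ** (h - 1)) * a_peak

salience = np.zeros(n_bins)
add_peak(salience, f_peak=220.0, a_peak=1.0)
```

A 220 Hz peak produces its strongest salience at bin 240 (2400 cents above 55 Hz), with weaker contributions at its subharmonics.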
- 11. Tonic Identification: Signal Processing. Tonic candidate generation (multi-pitch histogram): 5 salience peaks per frame; frequency range 110-550 Hz.
- 12. Tonic Identification: Feature Extraction. Identifying the tonic in the correct octave using the multi-pitch histogram; classification-based template learning; the class of an instance is the rank of the tonic peak. [Figure: multi-pitch histogram, 1 bin = 10 cents, ref. 55 Hz, with peaks f2-f5 marked.]
- 13. Tonic Identification: Classification. A decision tree over the frequency and salience relations of the multi-pitch histogram peaks (f2, f3, f5) assigns the tonic its rank (1st to 5th); the tests effectively check the expected Sa and Pa salience and frequency relationships.
- 14. Tonic Identification: Results. S. Gulati, A. Bellur, J. Salamon, H. G. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(1):55-73, 2014.
- 15. Predominant Pitch Estimation
- 16. Pitch Estimation Algorithms. Time-domain approaches: ACF-based (Rabiner, 1977); AMDF-based, e.g. YIN (de Cheveigné and Kawahara, 2002). Frequency-domain approaches: two-way mismatch (Maher and Beauchamp, 1994); subharmonic summation (Hermes, 1988). Multi-pitch approaches: source-separation-based (Klapuri, 2003); harmonic summation, Melodia (Salamon and Gómez, 2012).
  Rabiner, L. (1977). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24-33.
  de Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917-1930.
  Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40-48.
  Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95(4), 2254-2263.
  Hermes, D. (1988). Measurement of pitch by subharmonic summation. The Journal of the Acoustical Society of America, 83(1), 257-264.
  Klapuri, A. (2003). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804-816.
  Salamon, J., & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759-1770.
- 18. Predominant Pitch Estimation: YIN. Signal → difference function → cumulative mean normalized difference function.
  Autocorrelation (Eq. 1): r_t(τ) = Σ_{j=t+1}^{t+W} x_j x_{j+τ}, where W is the integration window size; an alternative definition (Eq. 2) shrinks the window with increasing lag τ, tapering the envelope of the function toward zero.
  Difference function (Eq. 6): d_t(τ) = Σ_{j=1}^{W} (x_j - x_{j+τ})², which is zero at all multiples of the period.
  Cumulative mean normalized difference function: d′_t(τ) = 1 for τ = 0, and d_t(τ) / [(1/τ) Σ_{j=1}^{τ} d_t(j)] otherwise; it starts at 1 rather than 0 and remains high until the dip at the period, removing the bias toward small lags.
  Gross error rates for the cumulated refinement steps (25 ms integration window, one-sample hop, 40-800 Hz search range, threshold 0.1): step 1: 10.0%; step 2: 1.95%; step 3: 1.69%; step 4: 0.78%; step 5: 0.77%; step 6: 0.50%.
  de Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917-1930.
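The difference function and cumulative mean normalized difference can be sketched in NumPy. This minimal version implements only the absolute-threshold step; parabolic interpolation and the paper's later refinement steps are omitted:

```python
import numpy as np

def yin_f0(x, fs, tau_max=1000, threshold=0.1):
    """Difference function + cumulative mean normalized difference +
    absolute threshold (a simplified sketch of YIN)."""
    W = len(x) - tau_max
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):
        diff = x[:W] - x[tau:tau + W]
        d[tau] = np.dot(diff, diff)          # Eq. (6): difference function
    # d'(0) = 1; d'(tau) = d(tau) / ((1/tau) * sum_{j=1}^{tau} d(j))
    dprime = np.ones(tau_max)
    dprime[1:] = d[1:] * np.arange(1, tau_max) / np.maximum(
        np.cumsum(d[1:]), 1e-12)
    # First dip below the absolute threshold, then walk to its local minimum
    below = np.where(dprime[1:] < threshold)[0] + 1
    if below.size:
        tau = int(below[0])
        while tau + 1 < tau_max and dprime[tau + 1] < dprime[tau]:
            tau += 1
    else:
        tau = int(np.argmin(dprime[1:])) + 1
    return fs / tau

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100.0 * t)   # 100 Hz test tone
f0 = yin_f0(x, fs)                  # expect close to 100 Hz
```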
- 19. Predominant Pitch Estimation: YIN
- 20. Predominant Pitch Estimation: Melodia. Salamon, J., & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759-1770.
- 21. Predominant Pitch Estimation: Melodia
- 22. Predominant Pitch Estimation: Melodia. Audio → spectrogram → spectral peaks
- 23. Predominant Pitch Estimation: Melodia. Spectral peaks → time-frequency salience
- 24. Predominant Pitch Estimation: Melodia. Time-frequency salience → salience peaks → contours
- 25. Predominant Pitch Estimation: Melodia. Contours → predominant melody contours
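The contour-formation idea above can be illustrated with a toy sketch that links per-frame salience peaks into pitch contours by frequency continuity. The continuity threshold and the greedy nearest-peak matching are illustrative assumptions; the real algorithm also applies salience filtering and tolerates short gaps:

```python
import numpy as np

MAX_JUMP_CENTS = 80.0  # hypothetical frame-to-frame continuity threshold

def form_contours(peaks_per_frame):
    """peaks_per_frame: list (over frames) of lists of peak frequencies (Hz).
    Returns a list of contours, each a list of (frame, freq) pairs."""
    contours, active = [], []
    for n, peaks in enumerate(peaks_per_frame):
        unused = list(peaks)
        next_active = []
        for contour in active:
            last_f = contour[-1][1]
            if unused:
                # nearest remaining peak, measured in cents
                dists = [abs(1200 * np.log2(f / last_f)) for f in unused]
                i = int(np.argmin(dists))
                if dists[i] <= MAX_JUMP_CENTS:
                    contour.append((n, unused.pop(i)))
                    next_active.append(contour)
                    continue
            contours.append(contour)          # no continuation: contour ends
        for f in unused:                      # leftover peaks seed contours
            next_active.append([(n, f)])
        active = next_active
    return contours + active

# Two interleaved tracks: one near 220 Hz, one near 330 Hz
frames = [[220.0, 330.0], [222.0, 331.0], [221.0, 329.0]]
contours = form_contours(frames)              # two 3-frame contours
```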
- 26-33. Essentia implementation of Melodia
- 34. Essentia implementation of Melodia. Audio → spectrogram
- 35. Essentia implementation of Melodia
- 36. Essentia implementation of Melodia. Spectrogram → spectral peaks
- 37. Essentia implementation of Melodia
- 38. Essentia implementation of Melodia. Spectral peaks → time-frequency salience
- 39. Essentia implementation of Melodia
- 40. Essentia implementation of Melodia. Time-frequency salience → salience peaks
- 41. Essentia implementation of Melodia
- 42. Essentia implementation of Melodia. Salience peaks → all contours
- 43. Essentia implementation of Melodia
- 44. Essentia implementation of Melodia. All contours → predominant melody contours
- 45-48. Essentia implementation of Melodia
- 49. Predominant Pitch Estimation: Melodia
- 50. What about loudness and timbre?
- 51. What about loudness and timbre?
- 52. Loudness features in Essentia
- 53. Loudness of predominant voice
- 54-59. Loudness of predominant voice [figure: F0 contour in the frequency vs. time plane]
- 60. Loudness of predominant voice: example
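One simple way to realize "loudness of the predominant voice" is to sum spectral energy at the harmonics of the extracted F0 in each frame. This sketch is an illustrative assumption (`harmonic_energy_db` is a hypothetical helper), not Essentia's exact computation:

```python
import numpy as np

def harmonic_energy_db(mag, freqs, f0, n_harm=10):
    """Sum spectral energy at the harmonics of F0 for one analysis frame
    and return it in dB (a simple proxy for the voice's loudness)."""
    energy = 0.0
    for h in range(1, n_harm + 1):
        fh = h * f0
        if fh > freqs[-1]:
            break
        k = int(np.argmin(np.abs(freqs - fh)))   # nearest spectral bin
        energy += mag[k] ** 2
    return 10 * np.log10(energy + 1e-12)

# Synthetic frame: harmonics of 200 Hz with 1/h amplitudes
freqs = np.linspace(0, 4000, 2049)
mag = np.zeros_like(freqs)
for h in range(1, 6):
    mag[int(np.argmin(np.abs(freqs - 200.0 * h)))] = 1.0 / h
loud = harmonic_energy_db(mag, freqs, f0=200.0)
```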
- 61. Spectral centroid of predominant voice
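The spectral centroid of the predominant voice follows the same pattern: restrict the spectrum to the voice (e.g. to its harmonic bins, as in the loudness case) and compute the magnitude-weighted mean frequency. A minimal sketch of the centroid itself:

```python
import numpy as np

def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency (Hz) of one spectral frame."""
    mag = np.asarray(mag, dtype=float)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

# A flat 4-bin spectrum centers at the mean of its frequencies (250 Hz)
freqs = np.array([100.0, 200.0, 300.0, 400.0])
c = spectral_centroid(np.ones(4), freqs)
```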
- 62. CompMusic: Dunya
- 63. CompMusic: Dunya API Internet
- 64. CompMusic: Dunya Web
- 65. CompMusic: Dunya API. https://github.com/MTG/pycompmusic
- 66. Dunya API Examples q Metadata q Features
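Metadata and features are fetched over the Dunya REST API, normally through the pycompmusic client linked above. As a hedged sketch, a metadata URL can be built by hand; the endpoint path is an assumption based on the client's conventions, real requests also require an API token from the Dunya website, and the MBID below is a hypothetical placeholder:

```python
# Hedged sketch: building a Dunya REST URL by hand. The
# "/api/<style>/recording/<mbid>" path is an assumption; an
# "Authorization: Token ..." header is needed for actual requests.
API_ROOT = "https://dunya.compmusic.upf.edu"

def recording_url(mbid, style="hindustani"):
    """URL for a recording's metadata; recordings are keyed by their
    MusicBrainz ID (the mbid used below is a hypothetical placeholder)."""
    return f"{API_ROOT}/api/{style}/recording/{mbid}"

url = recording_url("00000000-0000-0000-0000-000000000000")
```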
