SlideShare a Scribd company logo
Landmark Detection in Hindustani
Music Melodies
Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1
sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu
1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
3Indian Institute of Technology Bombay, Mumbai, India
SMC-ICMC 2014, Athens, Greece
5
Indian Art Music
§  Hindustani music (North Indian music)
§  Carnatic music
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
S	 r	 R	 g	 G	 m	 M	 P	 d	 D	 n	 N	
Do	 Re	 Mi	 Fa	 Sol	 La	 Ti	
Svaras
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
S	 r	 R	 g	 G	 m	 M	 P	 d	 D	 n	 N	
Do	 Re	 Mi	 Fa	 Sol	 La	 Ti	
Svaras
Bhairavi Thaat
S	 r	 g	 m	 P	 d	 n
Melodies in Hindustani Music
§  Rāg: melodic framework of Indian art music
S	 r	 R	 g	 G	 m	 M	 P	 d	 D	 n	 N	
Do	 Re	 Mi	 Fa	 Sol	 La	 Ti	
Svaras
Bhairavi Thaat
*Nyās Svar (Rāg Bilaskhani todi)
S	 r*	 g*	 m*	 P	 d*	 n	
S	 r	 g	 m	 P	 d	 n	
Nyās translates to home/residence
§  Example
Melodic Landmark: Nyās Occurrences
178 180 182 184 186 188 190
−200
0
200
400
600
800
Time (s)
F0frequency(cents)
N1 N5N4N3N2
A.	K.	Dey,	Nyāsa	in		rāga:	the	pleasant	pause	in	Hindustani	music.	Kanishka	Publishers,	Distributors,	
2008.
Goal and Motivation
§  Methodology for detecting nyās occurrences
Goal and Motivation
§  Methodology for detecting nyās occurrences
N	 N	 N	 N	 N
Goal and Motivation
§  Methodology for detecting nyās occurrences
§  Motivation
§  Melodic motif discovery [Ross and Rao 2012]
§  Melodic segmentation
§  Music transcription
N	 N	 N	 N	 N	
J.	C.	Ross	and	P.	Rao,“DetecKon	of	raga-characterisKc	phrases	from	Hindustani	classical	music	
audio,”	in	Proc.	of	2nd	CompMusic	Workshop,	2012,	pp.	133–	138.
Methodology: Block Diagram
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Block	diagram	of	the	proposed	methodology
Methodology: Pred. Pitch Estimation
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Methodology: Segmentation
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Methodology: Feature Extraction
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
0.12, 0.34, 0.59, 0.23, 0.54
0.21, 0.24, 0.54, 0.54, 0.42
0.32, 0.23, 0.34, 0.41, 0.63
0.66, 0.98, 0.74, 0.33, 0.12
0.90, 0.42, 0.14, 0.83, 0.76
Methodology: Segment Classification
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
0.12, 0.34, 0.59, 0.23, 0.54
0.21, 0.24, 0.54, 0.54, 0.42
0.32, 0.23, 0.34, 0.41, 0.63
0.66, 0.98, 0.74, 0.33, 0.12
0.90, 0.42, 0.14, 0.83, 0.76
Methodology: Segment Classification
Predominant pitch
estimation
Tonic identification
Audio
Nyās svars
Pred. pitch est.
and
representation
Histogram
computation
SegmentationSvar
identification
Segmentation
Local
Feature
extractionContextual Local + Contextual
Segment classification Segment fusion
Segment
classification and
fusion
Nyās
Non nyās
Nyās
Non nyās
Non nyās
Pred. Pitch Estimation and Representation
§  Predominant pitch estimation
§  Method by Salamon and Gómez (2012)
§  Favorable results in MIREX’11
§  Tonic Normalization
§  Pitch values converted from Hertz to Cents
§  Multi-pitch approach by Gulati et al. (2014)
J.	Salamon	and	E.	Go	́mez,	“Melody	extracKon	from	polyphonic	music	signals	using	pitch	contour	charac-	terisKcs,”	IEEE	TransacKons	on	
Audio,	Speech,	and	Language	Processing,	vol.	20,	no.	6,	pp.	1759–1770,	2012.	
S.	GulaK,	A.	Bellur,	J.	Salamon,	H.	Ranjani,	V.	Ishwar,	H.	A.	Murthy,	and	X.	Serra,	“AutomaKc	Tonic	IdenK-	ficaKon	in	Indian	Art	Music:	
Approaches	and	Evalua-	Kon,”	Journal	of	New	Music	Research,	vol.	43,	no.	01,	pp.	55–73,	2014.
Melody Segmentation
1 2 3 4 5 6 7 8 9 10
850
900
950
1000
1050
1100
1150
T1
Sn
Sn-1
Sn+1
ε
ρ1T2 T3
ρ2
T4 T7 T8 T9
T5 T6
Nyās segment
Time (s)
FundamentalFrequency(cents)
§  Baseline: Piecewise linear
segmentation (PLS)
E.	Keogh,	S.	Chu,	D.	Hart,	and	M.	Pazzani,	“SegmenKng	Kme	series:	A	survey	and	novel	approach,”	Data	
Mining	in	Time	Series	Databases,	vol.	57,	pp.	1–22,	2004.
Feature Extraction
Feature Extraction
§  Local (9 features)
Feature Extraction
§  Local (9 features)
§  Contextual (24 features)
Feature Extraction
§  Local (9 features)
§  Contextual (24 features)
§  Local + Contextual (33 features)
Feature Extraction: Local
§  Segment Length
l
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
μ	σ
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
d1	 d2	 d3	 d4	 d5
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
§  μ and σ of the peak and valley amplitudes
p1	
p2	
p3	
p4	
p5	
p6
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
§  μ and σ of the peak and valley amplitudes
§  Temporal centroid (length normalized)
Feature Extraction: Local
§  Segment Length
§  μ and σ of pitch values
§  μ and σ of difference in adjacent peaks and valley
locations
§  μ and σ of the peak and valley amplitudes
§  Temporal centroid (length normalized)
§  Binary flatness measure
Flat	or	non-flat
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
L1	
L2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
§  Time difference from the succeeding and preceding
breath pauses
d1	 d2	
Breath	
Pause	
Breath	
Pause
Feature Extraction: Contextual
§  4 different normalized segment lengths
§  Time difference from the succeeding and
proceeding unvoiced regions
§  Local features of neighboring segments (9*2=18)
Breath	
Pause	
Breath	
Pause
Segment Classification
§  Class: Nyās and Non-nyās
§  Classifiers:
§  Trees (min_sample_split=10)
§  K nearest neighbors (n_neighbors=5)
§  Naive bayes (fit_prior=False)
§  Logistic regression (class_weight=‘auto’)
§  Support vector machines (RBF)(class weight=‘auto’)
§  Testing methodology
§  Cross-fold validation
§  Software: Scikit-learn, version 0.14.1
F.	Pedregosa,	G.	Varoquaux,	A.	Gramfort,	V.	Michel,	B.	Thirion,	O.	Grisel,	M.	Blondel,	P.	Prejenhofer,	R.	Weiss,	V.	Dubourg,	J.	Vanderplas,	A.	
Passos,	D.	Cournapeau,	M.	Brucher,	M.	Perrot,	and	E.	Duches-	nay,	“Scikit-learn:	machine	learning	in	Python,”	Journal	of	Machine	Learning	
Research,	vol.	12,	pp.	2825–	2830,	2011
Evaluation: Dataset
§  Audio:
§  Number of recordings: 20 Ālap vocal pieces
§  Duration of recordings: 1.5 hours
§  Number of artists: 8
§  Number of rāgs: 16
§  Type of audio: 15 polyphonic commercial recordings, 5
in-house monophonic recordings**
§  Annotations: Musician with > 15 years of training
§  Statistics:
§  1257 nyās segments
§  150 ms to16.7 s
§  mean 2.46 s, median 1.47 s.
**Openly	available	under	CC	license	in	freesound.org
Evaluation: Measures
§  Boundary annotations (F-scores)
§  Hit rate
§  Allowed deviation: 100 ms
§  Label annotations (F-scores):
§  Pairwise frame clustering method [Levy and Sandler
2008]
§  Statistical significance: Mann-Whitney U test
(p=0.05)
§  Multiple comparison: Holm-Bonferroni method
M.	Levy	and	M.	Sandler,	“Structural	segmentaKon	of	musical	audio	by	constrained	clustering,”	IEEE	TransacKons	on	
Audio,	Speech,	and	Language	Processing,	vol.	16,	no.	2,	pp.	318–326,	2008.	
H.	B.	Mann	and	D.	R.	Whitney,	“On	a	test	of	whether	one	of	two	random	variables	is	stochasKcally	larger	than	the	
other,”	The	annals	of	mathemaKcal	staKsKcs,	vol.	18,	no.	1,	pp.	50–60,	1947.	
S.	Holm,	“A	simple	sequenKally	rejecKve	mulKple	test	procedure,”	Scandinavian	journal	of	staKsKcs,	pp.	65–	70,	1979.
Evaluation: Baseline Approach
§  DTW based kNN classification (k=5)
§  Frequently used for time series classification
§  Random baselines
§  Randomly planting boundaries
§  Evenly planting boundaries at every 100 ms *
§  Ground truth boundaries, randomly assign class labels
X.	Xi,	E.	J.	Keogh,	C.	R.	Shelton,	L.	Wei,	and	C.	A.	Ratanamahatana,	“Fast	Kme	series	classificaKon	using	numerosity	
reducKon,”	in	Proc.	of	the	Int.	Conf.	on	Ma-	chine	Learning,	2006,	pp.	1033–1040.	
X.	Wang,	A.	Mueen,	H.	Ding,	G.	Trajcevski,	P.	Scheuermann,	and	E.	J.	Keogh,	“Experimental	com-	parison	of	
representaKon	methods	and	distance	mea-	sures	for	Kme	series	data,”	Data	Mining	and	Knowl-	edge	Discovery,	
vol.	26,	no.	2,	pp.	275–309,	2013.
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Boundary Annotation
Feat. DTW Tree KNN NB LR SVM
A
L 0.356 0.407 0.447 0.248 0.449 0.453
C 0.284 0.394 0.387 0.383 0.389 0.406
L+C 0.289 0.414 0.426 0.409 0.432 0.437
B
L 0.524 0.672 0.719 0.491 0.736 0.749
C 0.436 0.629 0.615 0.641 0.621 0.673
L+C 0.446 0.682 0.708 0.591 0.725 0.735
Table 1. F-scores for ny¯as boundary detection using PLS
method (A) and the proposed segmentation method (B).
Results are shown for different classifiers (Tree, KNN,
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
L 0.553 0.685 0.723 0.621 0.727 0.722
5. CONCLUSI
We proposed a metho
odies of Hindustani m
broad steps: melody
cation. For melody s
which incorporates d
boundary annotations
cal, contextual and th
that the performance
cantly better compar
dard dynamic time w
est neighbor classifie
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whic
the future we plan to
mances, which is a co
Results: Nyās Label Annotation
NB, LR, SVM) and local (L), contextual (C) and local to-
gether with contextual (L+C) features. DTW is the base-
line method used for comparison. F-score for the random
baseline obtained using RB2 is 0.184.
Feat. DTW Tree KNN NB LR SVM
A
L 0.553 0.685 0.723 0.621 0.727 0.722
C 0.251 0.639 0.631 0.690 0.688 0.674
L+C 0.389 0.694 0.693 0.708 0.722 0.706
B
L 0.546 0.708 0.754 0.714 0.749 0.758
C 0.281 0.671 0.611 0.697 0.689 0.697
L+C 0.332 0.672 0.710 0.730 0.743 0.731
Table 2. F-scores for ny¯as and non-ny¯as label annotations
task using PLS method (A) and the proposed segmenta-
tion method (B). Results are shown for different classifiers
(Tree, KNN, NB, LR, SVM) and local (L), contextual (C)
and local together with contextual (L+C) features. DTW is
the baseline method used for comparison. The best random
baseline F-score is 0.153 obtained using RB2.
mentation method consistently performs better than PLS.
However, the differences are not statistically significant.
proposed segmentatio
proach based on piece
set that includes only
form best. However,
textual information w
curacy. This indicate
melodic context whi
the future we plan to
mances, which is a co
sic. We also plan to
and different evaluati
Acknowledgments
This work is partly s
Council under the Eu
Program, as part of
agreement 267583).
from Generalitat de C
the European Commi
and European Social
6.
[1] A. D. Patel, Musi
UK: Oxford Univ
[2] W. S. Rockstro,
ers, and J. Rushto
Conclusions and Future work
§  Proposed segmentation better than PLS method
§  Proposed methodology better than standard DTW based
kNN classification
§  Local features yield highest accuracy
§  Contextual features are also important (maybe not
complementary to local features)
§  Future work
§  Perform similar analysis on Bandish performances
§  Incorporate Rāga specific knowledge
Landmark Detection in Hindustani
Music Melodies
Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1
sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu
1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
3Indian Institute of Technology Bombay, Mumbai, India
SMC-ICMC 2014, Athens, Greece
5

More Related Content

Similar to Landmark Detection in Hindustani Music Melodies

Computational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicComputational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art Music
Sankalp Gulati
 
Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernels
Dev Nath
 
Automatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesAutomatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course Lectures
Yun-Nung (Vivian) Chen
 
Sequence Learning with CTC technique
Sequence Learning with CTC techniqueSequence Learning with CTC technique
Sequence Learning with CTC technique
Chun Hao Wang
 
Phrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingPhrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space Modeling
Sankalp Gulati
 
Metric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsMetric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target Playlists
Ying-Shu Kuo
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
IJERA Editor
 
Transferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given SongTransferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given Song
NAVER Engineering
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
Enrico Daga
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based Ranking
Andrea Gazzarini
 
LSH
LSHLSH
More than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic AnalysisMore than Words: Advancing Prosodic Analysis
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Sease
 
Rna seq
Rna seqRna seq
Rna seq
Sean Davis
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
Ana Marasović
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
ssuser4293bd
 
is2015_poster
is2015_posteris2015_poster
is2015_poster
Jan Svec
 
Inductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonologyInductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonology
Kevin McMullin
 
AlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.pptAlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.ppt
SkanderBena
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
Akshay Hegde
 

Similar to Landmark Detection in Hindustani Music Melodies (20)

Computational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicComputational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art Music
 
Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernels
 
Automatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesAutomatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course Lectures
 
Sequence Learning with CTC technique
Sequence Learning with CTC techniqueSequence Learning with CTC technique
Sequence Learning with CTC technique
 
Phrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingPhrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space Modeling
 
Metric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsMetric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target Playlists
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
 
Transferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given SongTransferring Singing Expressions from One Voice to Another for a Given Song
Transferring Singing Expressions from One Voice to Another for a Given Song
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based Ranking
 
LSH
LSHLSH
LSH
 
More than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic AnalysisMore than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic Analysis
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
 
Rna seq
Rna seqRna seq
Rna seq
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
is2015_poster
is2015_posteris2015_poster
is2015_poster
 
Inductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonologyInductive learning of long-distance dissimilation as a problem for phonology
Inductive learning of long-distance dissimilation as a problem for phonology
 
AlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.pptAlgoAlignementGenomicSequences.ppt
AlgoAlignementGenomicSequences.ppt
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 

More from Sankalp Gulati

[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
Sankalp Gulati
 
Computational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicComputational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art Music
Sankalp Gulati
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music Corpora
Sankalp Gulati
 
Hindify
HindifyHindify
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Sankalp Gulati
 
Tonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicTonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic Music
Sankalp Gulati
 
Tonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicTonic Identification System for Indian Art Music
Tonic Identification System for Indian Art Music
Sankalp Gulati
 

More from Sankalp Gulati (7)

[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
 
Computational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicComputational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art Music
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music Corpora
 
Hindify
HindifyHindify
Hindify
 
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
 
Tonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicTonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic Music
 
Tonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicTonic Identification System for Indian Art Music
Tonic Identification System for Indian Art Music
 

Recently uploaded

Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
EduSkills OECD
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
melliereed
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
TechSoup
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
nitinpv4ai
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDFLifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Vivekanand Anglo Vedic Academy
 

Recently uploaded (20)

Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDFLifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
 

Landmark Detection in Hindustani Music Melodies

  • 1. Landmark Detection in Hindustani Music Melodies Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1 sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu 1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain 2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain 3Indian Institute of Technology Bombay, Mumbai, India SMC-ICMC 2014, Athens, Greece 5
  • 2. Indian Art Music §  Hindustani music (North Indian music) §  Carnatic music
  • 3. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music
  • 4. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music S r R g G m M P d D n N Do Re Mi Fa Sol La Ti Svaras
  • 5. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music S r R g G m M P d D n N Do Re Mi Fa Sol La Ti Svaras Bhairavi Thaat S r g m P d n
  • 6. Melodies in Hindustani Music §  Rāg: melodic framework of Indian art music S r R g G m M P d D n N Do Re Mi Fa Sol La Ti Svaras Bhairavi Thaat *Nyās Svar (Rāg Bilaskhani todi) S r* g* m* P d* n S r g m P d n Nyās translates to home/residence
  • 7. §  Example Melodic Landmark: Nyās Occurrences 178 180 182 184 186 188 190 −200 0 200 400 600 800 Time (s) F0frequency(cents) N1 N5N4N3N2 A. K. Dey, Nyāsa in rāga: the pleasant pause in Hindustani music. Kanishka Publishers, Distributors, 2008.
  • 8. Goal and Motivation §  Methodology for detecting nyās occurrences
  • 9. Goal and Motivation §  Methodology for detecting nyās occurrences N N N N N
  • 10. Goal and Motivation §  Methodology for detecting nyās occurrences §  Motivation §  Melodic motif discovery [Ross and Rao 2012] §  Melodic segmentation §  Music transcription N N N N N J. C. Ross and P. Rao,“DetecKon of raga-characterisKc phrases from Hindustani classical music audio,” in Proc. of 2nd CompMusic Workshop, 2012, pp. 133– 138.
  • 11. Methodology: Block Diagram Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion Block diagram of the proposed methodology
  • 12. Methodology: Pred. Pitch Estimation Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion
  • 13. Methodology: Segmentation Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion
  • 14. Methodology: Feature Extraction Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion 0.12, 0.34, 0.59, 0.23, 0.54 0.21, 0.24, 0.54, 0.54, 0.42 0.32, 0.23, 0.34, 0.41, 0.63 0.66, 0.98, 0.74, 0.33, 0.12 0.90, 0.42, 0.14, 0.83, 0.76
  • 15. Methodology: Segment Classification Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion 0.12, 0.34, 0.59, 0.23, 0.54 0.21, 0.24, 0.54, 0.54, 0.42 0.32, 0.23, 0.34, 0.41, 0.63 0.66, 0.98, 0.74, 0.33, 0.12 0.90, 0.42, 0.14, 0.83, 0.76
  • 16. Methodology: Segment Classification Predominant pitch estimation Tonic identification Audio Nyās svars Pred. pitch est. and representation Histogram computation SegmentationSvar identification Segmentation Local Feature extractionContextual Local + Contextual Segment classification Segment fusion Segment classification and fusion Nyās Non nyās Nyās Non nyās Non nyās
  • 17. Pred. Pitch Estimation and Representation §  Predominant pitch estimation §  Method by Salamon and Gómez (2012) §  Favorable results in MIREX’11 §  Tonic Normalization §  Pitch values converted from Hertz to Cents §  Multi-pitch approach by Gulati et al. (2014) J. Salamon and E. Go ́mez, “Melody extracKon from polyphonic music signals using pitch contour charac- terisKcs,” IEEE TransacKons on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012. S. GulaK, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra, “AutomaKc Tonic IdenK- ficaKon in Indian Art Music: Approaches and Evalua- Kon,” Journal of New Music Research, vol. 43, no. 01, pp. 55–73, 2014.
  • 18. Melody Segmentation 1 2 3 4 5 6 7 8 9 10 850 900 950 1000 1050 1100 1150 T1 Sn Sn-1 Sn+1 ε ρ1T2 T3 ρ2 T4 T7 T8 T9 T5 T6 Nyās segment Time (s) FundamentalFrequency(cents) §  Baseline: Piecewise linear segmentation (PLS) E. Keogh, S. Chu, D. Hart, and M. Pazzani, “SegmenKng Kme series: A survey and novel approach,” Data Mining in Time Series Databases, vol. 57, pp. 1–22, 2004.
  • 21. Feature Extraction §  Local (9 features) §  Contextual (24 features)
  • 22. Feature Extraction §  Local (9 features) §  Contextual (24 features) §  Local + Contextual (33 features)
  • 24. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values μ σ
  • 25. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations d1 d2 d3 d4 d5
  • 26. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations §  μ and σ of the peak and valley amplitudes p1 p2 p3 p4 p5 p6
  • 27. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations §  μ and σ of the peak and valley amplitudes §  Temporal centroid (length normalized)
  • 28. Feature Extraction: Local §  Segment Length §  μ and σ of pitch values §  μ and σ of difference in adjacent peaks and valley locations §  μ and σ of the peak and valley amplitudes §  Temporal centroid (length normalized) §  Binary flatness measure Flat or non-flat
  • 29. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 30. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 31. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 32. Feature Extraction: Contextual §  4 different normalized segment lengths L1 L2 Breath Pause Breath Pause
  • 33. Feature Extraction: Contextual §  4 different normalized segment lengths §  Time difference from the succeeding and preceding breath pauses d1 d2 Breath Pause Breath Pause
  • 34. Feature Extraction: Contextual §  4 different normalized segment lengths §  Time difference from the succeeding and proceeding unvoiced regions §  Local features of neighboring segments (9*2=18) Breath Pause Breath Pause
  • 35. Segment Classification §  Class: Nyās and Non-nyās §  Classifiers: §  Trees (min_sample_split=10) §  K nearest neighbors (n_neighbors=5) §  Naive bayes (fit_prior=False) §  Logistic regression (class_weight=‘auto’) §  Support vector machines (RBF)(class weight=‘auto’) §  Testing methodology §  Cross-fold validation §  Software: Scikit-learn, version 0.14.1 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prejenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duches- nay, “Scikit-learn: machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825– 2830, 2011
  • 36. Evaluation: Dataset §  Audio: §  Number of recordings: 20 Ālap vocal pieces §  Duration of recordings: 1.5 hours §  Number of artists: 8 §  Number of rāgs: 16 §  Type of audio: 15 polyphonic commercial recordings, 5 in-house monophonic recordings** §  Annotations: Musician with > 15 years of training §  Statistics: §  1257 nyās segments §  150 ms to16.7 s §  mean 2.46 s, median 1.47 s. **Openly available under CC license in freesound.org
  • 37. Evaluation: Measures §  Boundary annotations (F-scores) §  Hit rate §  Allowed deviation: 100 ms §  Label annotations (F-scores): §  Pairwise frame clustering method [Levy and Sandler 2008] §  Statistical significance: Mann-Whitney U test (p=0.05) §  Multiple comparison: Holm-Bonferroni method M. Levy and M. Sandler, “Structural segmentaKon of musical audio by constrained clustering,” IEEE TransacKons on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 318–326, 2008. H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochasKcally larger than the other,” The annals of mathemaKcal staKsKcs, vol. 18, no. 1, pp. 50–60, 1947. S. Holm, “A simple sequenKally rejecKve mulKple test procedure,” Scandinavian journal of staKsKcs, pp. 65– 70, 1979.
  • 38. Evaluation: Baseline Approach §  DTW based kNN classification (k=5) §  Frequently used for time series classification §  Random baselines §  Randomly planting boundaries §  Evenly planting boundaries at every 100 ms * §  Ground truth boundaries, randomly assign class labels X. Xi, E. J. Keogh, C. R. Shelton, L. Wei, and C. A. Ratanamahatana, “Fast Kme series classificaKon using numerosity reducKon,” in Proc. of the Int. Conf. on Ma- chine Learning, 2006, pp. 1033–1040. X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. J. Keogh, “Experimental com- parison of representaKon methods and distance mea- sures for Kme series data,” Data Mining and Knowl- edge Discovery, vol. 26, no. 2, pp. 275–309, 2013.
  • 39. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 40. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 41. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 42. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 43. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 44. Results: Nyās Boundary Annotation Feat. DTW Tree KNN NB LR SVM A L 0.356 0.407 0.447 0.248 0.449 0.453 C 0.284 0.394 0.387 0.383 0.389 0.406 L+C 0.289 0.414 0.426 0.409 0.432 0.437 B L 0.524 0.672 0.719 0.491 0.736 0.749 C 0.436 0.629 0.615 0.641 0.621 0.673 L+C 0.446 0.682 0.708 0.591 0.725 0.735 Table 1. F-scores for ny¯as boundary detection using PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM L 0.553 0.685 0.723 0.621 0.727 0.722 5. CONCLUSI We proposed a metho odies of Hindustani m broad steps: melody cation. For melody s which incorporates d boundary annotations cal, contextual and th that the performance cantly better compar dard dynamic time w est neighbor classifie proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whic the future we plan to mances, which is a co
  • 45. Results: Nyās Label Annotation NB, LR, SVM) and local (L), contextual (C) and local to- gether with contextual (L+C) features. DTW is the base- line method used for comparison. F-score for the random baseline obtained using RB2 is 0.184. Feat. DTW Tree KNN NB LR SVM A L 0.553 0.685 0.723 0.621 0.727 0.722 C 0.251 0.639 0.631 0.690 0.688 0.674 L+C 0.389 0.694 0.693 0.708 0.722 0.706 B L 0.546 0.708 0.754 0.714 0.749 0.758 C 0.281 0.671 0.611 0.697 0.689 0.697 L+C 0.332 0.672 0.710 0.730 0.743 0.731 Table 2. F-scores for ny¯as and non-ny¯as label annotations task using PLS method (A) and the proposed segmenta- tion method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C) and local together with contextual (L+C) features. DTW is the baseline method used for comparison. The best random baseline F-score is 0.153 obtained using RB2. mentation method consistently performs better than PLS. However, the differences are not statistically significant. proposed segmentatio proach based on piece set that includes only form best. However, textual information w curacy. This indicate melodic context whi the future we plan to mances, which is a co sic. We also plan to and different evaluati Acknowledgments This work is partly s Council under the Eu Program, as part of agreement 267583). from Generalitat de C the European Commi and European Social 6. [1] A. D. Patel, Musi UK: Oxford Univ [2] W. S. Rockstro, ers, and J. Rushto
  • 46. Conclusions and Future work §  Proposed segmentation better than PLS method §  Proposed methodology better than standard DTW based kNN classification §  Local features yield highest accuracy §  Contextual features are also important (maybe not complementary to local features) §  Future work §  Perform similar analysis on Bandish performances §  Incorporate Rāga specific knowledge
  • 47. Landmark Detection in Hindustani Music Melodies Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1 sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu 1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain 2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain 3Indian Institute of Technology Bombay, Mumbai, India SMC-ICMC 2014, Athens, Greece 5