In this paper we describe the system proposed by the NNI (NWPU-NTU-I2R) team for the QUESST task of the MediaEval 2014 evaluation. We used both dynamic time warping (DTW) and symbolic search (SS) approaches. The DTW systems perform template matching with the subsequence DTW algorithm on posterior representations. The symbolic search operates on phone sequences generated by phone recognizers. For both symbolic and DTW search, partial sequence matching is performed to reduce the miss rate, especially for query types 2 and 3. After fusing 9 DTW systems, 7 symbolic systems, and query length side information, we obtained an actual normalized cross entropy (actCnxe) of 0.6023 for all queries combined. For type 3 complex queries, we achieved an actCnxe of 0.7252.
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_69.pdf
The NNI Query-by-Example System for MediaEval 2014
1. The NNI QbE-STD System for MediaEval 2014
Peng Yang1, Haihua Xu2, Xiong Xiao2, Lei Xie1, Cheung-Chi Leung3
Hongjie Chen1, Jia Yu1, Hang Lv1, Lei Wang3, Su Jun Leow2
Bin Ma3, Eng Siong Chng1, Haizhou Li2,3
1Northwestern Polytechnical University, Xi’an, China
2Nanyang Technological University, Singapore
3Institute for Infocomm Research, A*STAR, Singapore
Presented by Haihua Xu
Temasek Laboratories@NTU, Singapore
NNI QbE-STD system, MediaEval 2014 Workshop, Barcelona
2. System Diagram
Two groups of subsystems are used:
• Subsequence DTW-based template matching on Gaussian/phone posteriorgram
and bottleneck features.
• Symbolic search (SS) using phone tokenizers and weighted finite-state transducers (WFSTs)
3. Tokenizers
Tokenizers are used to convert the audio signal into
• posteriorgram or bottleneck features for DTW based systems
• phone sequences/lattices for SS systems
4. DTW-based Systems
• Full sequence matching¹: conventional subsequence DTW. Good for type 1 queries.
• Partial matching is used for type 2 and 3 queries.
  • Partial feature segments of the query are used for matching.
  • Segments are 600 ms long and shifted by 50 ms.
  • Improved performance for type 3 queries.
• 9 DTW systems
  • 5 using full matching
  • 4 using partial matching
¹ Yang P. et al., “Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection”, in Proc. INTERSPEECH, 2014
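The two matching modes above can be sketched roughly as follows. This is a minimal illustration, not the NNI implementation: the local distance (negative log inner product of posterior vectors), the normalization by query length, and the 10 ms frame rate (so that 600 ms and 50 ms become 60 and 5 frames) are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def subsequence_dtw(query, utterance):
    """Return (cost, end_frame) of the best match of `query` inside `utterance`.

    Both inputs are (frames, dims) arrays of posterior vectors. The local
    distance used here is an assumption of this sketch.
    """
    eps = 1e-10
    # Local distance between every query frame and every utterance frame.
    dist = -np.log(np.clip(query @ utterance.T, eps, None))
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # free start: a match may begin at any frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1]))  # free end: pick the best final frame
    return acc[-1, end] / n, end   # length normalization is an assumption


def partial_segments(query, seg_len=60, shift=5):
    """Cut the query into fixed-length, overlapping segments.

    Defaults assume 10 ms frames: 600 ms -> 60 frames, 50 ms -> 5 frames.
    """
    if len(query) <= seg_len:
        return [query]
    starts = range(0, len(query) - seg_len + 1, shift)
    return [query[s:s + seg_len] for s in starts]
```

For partial matching, each segment returned by `partial_segments` would be passed to `subsequence_dtw` in place of the full query; how the per-segment scores are then combined is not stated on the slide and is left open here.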
5. Why Symbolic Search (SS)
• DTW is effective¹, but it
  • is computationally expensive and difficult to index,
  • does not easily handle inexact matches.
• Symbolic search allows indexing and fast search, e.g. using weighted finite-state transducers (WFSTs).
¹ Anguera X., Rodriguez-Fuentes L.J., Szoke I., Buzo A., and Metze F., “Query by example search on speech at MediaEval 2014”, in Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, Oct. 16-17, 2014
6. Symbolic Search System
• Limitations of symbolic search for QbE-STD:
  • Phone recognizers of other languages must be used for tokenization, which yields a poor symbolic representation.
  • The phone representation is inconsistent between the query and the search audio.
7. Limitation of Conventional Symbolic Search
• Full – Full symbolic search method
• pMiss – Miss rate
• pFA – False alarm rate
• ATWV – Actual Term Weighted Value
As query length increases:
• Miss rate approaches 100%
• False alarm rate approaches 0
• ATWV approaches 0
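The trend above follows from the standard definition of term weighted value, TWV = 1 − (pMiss + beta · pFA), with ATWV being the TWV at the system's actual decision threshold. The sketch below assumes already-averaged miss and false alarm rates and uses the beta value of 66.7 quoted later in these slides; the function name is hypothetical.

```python
def term_weighted_value(p_miss, p_fa, beta=66.7):
    """TWV = 1 - (pMiss + beta * pFA), for already-averaged rates."""
    return 1.0 - (p_miss + beta * p_fa)


# A system that returns nothing: every term is missed, no false alarms.
print(term_weighted_value(1.0, 0.0))    # 0.0

# A hypothetical operating point: half the terms missed, pFA = 0.001.
print(term_weighted_value(0.5, 0.001))
```

With pMiss at 100% and pFA at 0, the value is exactly 0, so a search that returns no hits for long queries scores no better than an empty system.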
8. Partial Phone Sequence Matching
Partial Matching Steps
• If a query phone hypothesis is longer than 6 phones, get all partial sequences of the hypothesis.
• Use all the unique partial sequences to search.
• Search results are pooled and all treated as matches of the query.
• Score normalization is applied, and the decision is made.
• The high miss rate of long queries can be reduced by simply shortening the query representation.
• Rationale: let the system return something first, and then decide which results are true matches.
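The first three steps above might look like the sketch below. Several details are assumptions rather than facts from the slides: partial sequences are taken as contiguous windows of exactly 6 phones, pooled hits keep the best score per utterance, and `search_fn` is a hypothetical stand-in for the WFST search that returns `(utterance_id, score)` pairs. Score normalization and the final decision are omitted because the slides do not specify them.

```python
def partial_sequences(phones, max_len=6):
    """Return the unique contiguous windows of `max_len` phones.

    Hypotheses of at most `max_len` phones are returned unchanged.
    The fixed window length is an assumption of this sketch.
    """
    if len(phones) <= max_len:
        return [tuple(phones)]
    unique = []
    for start in range(len(phones) - max_len + 1):
        window = tuple(phones[start:start + max_len])
        if window not in unique:
            unique.append(window)
    return unique


def pooled_search(phones, search_fn, max_len=6):
    """Search with every partial sequence and pool the hits for the query.

    `search_fn(partial)` is a hypothetical callable yielding
    (utterance_id, score) pairs; the best score per utterance is kept.
    """
    hits = {}
    for partial in partial_sequences(phones, max_len):
        for utt_id, score in search_fn(partial):
            hits[utt_id] = max(score, hits.get(utt_id, float("-inf")))
    return hits
```

An 8-phone hypothesis yields three 6-phone windows under these assumptions, so the query is searched three times and any utterance matching at least one window is returned as a candidate.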
9. Effectiveness of Partial Phone Sequence Matching
Full – Full symbolic search method
Partial – Partial symbolic search method
pMiss – Miss rate
pFA – False alarm rate
ATWV – Actual Term Weighted Value
For queries longer than 6 phones:
• Miss rate reduced
• False alarm rate increased
• ATWV increased
If beta is not 66.7, the best trade-off point between pMiss and pFA will change.
10. Results
• For type 1 queries, the partial SS method is clearly worse than the DTW method.
• For type 2 and 3 queries, the partial SS method is comparable with the DTW method.
• For type 3 queries, the partial SS method is significantly better than the DTW method in terms of MTWV.
• The two methods are highly complementary.
11. Conclusion
We have described the NNI system for the QUESST 2014 task:
• DTW-based subsystem
• Symbolic search subsystem
• Why the conventional SS system fails, especially for long queries
• The proposed partial phone sequence SS method
• Results of the NNI system
Future work will focus on reducing the false alarms introduced by the partial matching method.