Nni v7

The NNI QbE-STD System for
MediaEval 2014
Peng Yang1, Haihua Xu2, Xiong Xiao2, Lei Xie1, Cheung-Chi Leung3
Hongjie Chen1, Jia Yu1, Hang Lv1, Lei Wang3, Su Jun Leow2
Bin Ma3, Eng Siong Chng1, Haizhou Li2,3
1Northwestern Polytechnical University, Xi’an, China
2Nanyang Technological University, Singapore
3Institute for Infocomm Research, A*STAR, Singapore
Presented by Haihua Xu
Temasek Laboratories@NTU, Singapore


NNI QbE-STD system, MediaEval 2014 Workshop, Barcelona

System Diagram
Two groups of subsystems are used:
•  Subsequence DTW-based template matching on Gaussian/phone posteriorgram
and bottleneck features.
•  Symbolic search (SS) using phone tokenizer and weighted finite state transducer
(WFST)

Tokenizers
Tokenizers are used to convert the audio signal into
•  posteriorgram or bottleneck features for DTW based systems
•  phone sequences/lattices for SS systems


DTW-based Systems
•  Full sequence matching1: conventional subsequence DTW. Works well
for type 1 queries.
•  Partial matching is used for type 2 and 3 queries.
    •  Only a partial feature segment of the query is used for matching.
    •  Segments are 600 ms long and shifted by 50 ms.
    •  Improves performance for type 3 queries.
•  9 DTW systems in total
    •  5 using full matching
    •  4 using partial matching
1Yang P. et al., “Intrinsic spectral analysis based on temporal context features for query-by-example spoken term
detection”, in Proc. INTERSPEECH, 2014
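To make the two matching modes concrete, the following is a minimal sketch, not the NNI implementation. It assumes a 10 ms frame shift (so a 600 ms segment is 60 frames and a 50 ms shift is 5 frames), Euclidean frame distance, and a cost normalized by query length; the function names `query_segments` and `subsequence_dtw` are hypothetical.

```python
import math

def query_segments(features, frame_shift_ms=10, seg_ms=600, hop_ms=50):
    """Split a query (list of frame vectors) into overlapping segments.

    Assumption: frames are 10 ms apart, so 600 ms = 60 frames, 50 ms = 5 frames.
    """
    seg_len = seg_ms // frame_shift_ms
    hop = hop_ms // frame_shift_ms
    if len(features) <= seg_len:
        return [features]  # query too short to split: use it whole
    return [features[s:s + seg_len]
            for s in range(0, len(features) - seg_len + 1, hop)]

def euclidean(a, b):
    """Frame-level distance (an assumption; other distances are possible)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def subsequence_dtw(query, utterance, dist=euclidean):
    """Find the best match of `query` anywhere inside `utterance`.

    Returns (cost normalized by query length, index of the last matched frame).
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    # First row: the match may start at any utterance frame at no extra cost.
    prev = [dist(query[0], utterance[j]) for j in range(m)]
    for i in range(1, n):
        cur = [inf] * m
        for j in range(m):
            best = prev[j]                                 # vertical step
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])  # diagonal, horizontal
            cur[j] = dist(query[i], utterance[j]) + best
        prev = cur
    # Last row: the match may end at any utterance frame.
    end = min(range(m), key=prev.__getitem__)
    return prev[end] / n, end
```

For partial matching under these assumptions, each segment returned by `query_segments` would be passed to `subsequence_dtw` in place of the full query.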


Why Symbolic Search (SS)
•  DTW is effective1, but it is
    •  computationally expensive and difficult to index,
    •  not well suited to handling inexact matches.
•  Symbolic search allows indexing and fast search, e.g. using a weighted
finite state transducer (WFST).
1Anguera X., Rodriguez-Fuentes L.J., Szoke I., Buzo A., and Metze F., “Query by example search on speech at MediaEval
2014”, in Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, Oct. 16-17
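As a rough illustration of why symbol sequences are easy to index, here is a small sketch that uses a plain phone n-gram inverted index. This is a stand-in chosen for brevity, not a WFST and not the NNI system; the names `build_index` and `lookup`, the trigram order, and exact matching are all assumptions.

```python
from collections import defaultdict

def build_index(utterances, n=3):
    """Map each phone n-gram to the set of (utterance id, start position)."""
    index = defaultdict(set)
    for utt_id, phones in utterances.items():
        for s in range(len(phones) - n + 1):
            index[tuple(phones[s:s + n])].add((utt_id, s))
    return index

def lookup(index, query, n=3):
    """Return ids of utterances that contain the query phone sequence exactly."""
    if len(query) < n:
        raise ValueError("query must contain at least n phones")
    # Start from occurrences of the first n-gram, then require every later
    # n-gram of the query to occur at the matching offset.
    candidates = index.get(tuple(query[:n]), set())
    for k in range(1, len(query) - n + 1):
        gram_hits = index.get(tuple(query[k:k + n]), set())
        candidates = {(u, s) for (u, s) in candidates if (u, s + k) in gram_hits}
    return sorted({u for u, _ in candidates})
```

Once the index is built, a lookup touches only the entries for the query's n-grams rather than scanning every utterance frame by frame, which is the contrast with DTW drawn above.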


Symbolic Search System

•  Limitations of symbolic search for QbE-STD:
    •  Must use phone recognizers of other languages for
    tokenization → poor symbolic representation.
    •  Inconsistent phone representation between query
    and search audio.


Limitation of Conventional Symbolic Search
•  Full – Full symbolic search method
•  pMiss – Miss rate
•  pFA – False alarm rate
•  ATWV – Actual Term Weighted Value
As query length increases,
•  Miss rate approaches 100%
•  False alarm rate approaches 0
•  ATWV approaches 0


Partial Phone Sequence Matching
Partial Matching Steps
•  If a query phone hypothesis is longer
than 6 phones, extract all partial sequences of the
hypothesis.
•  Search with all the unique partial sequences.
•  Pool the search results and treat all of them
as matches of the query.
•  Apply score normalization and make
the decision.
•  The high miss rate of long queries can be reduced by simply shortening the query
representation.
•  Rationale: let the system return something first, and then decide which hits are true matches.
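The steps above can be sketched as follows. The slides do not define "partial sequences" precisely, so this sketch assumes they are contiguous windows of exactly 6 phones; pooling by keeping the best score per utterance is also an assumption, and score normalization is left out. `partial_sequences`, `pooled_search`, and the `search_fn` callback are hypothetical names.

```python
def partial_sequences(phones, max_len=6):
    """Return the unique contiguous windows of `max_len` phones.

    Assumption: a "partial sequence" is a contiguous window of fixed length.
    Queries of at most `max_len` phones are returned unchanged.
    """
    if len(phones) <= max_len:
        return [tuple(phones)]
    seen, out = set(), []
    for s in range(len(phones) - max_len + 1):
        part = tuple(phones[s:s + max_len])
        if part not in seen:  # keep each distinct window once
            seen.add(part)
            out.append(part)
    return out

def pooled_search(phones, search_fn, max_len=6):
    """Search with every partial sequence and pool the hits.

    `search_fn(part)` is a hypothetical callback that returns an iterable of
    (utterance id, score) pairs. Every hit counts as a match of the original
    query; here the best score per utterance is kept (an assumption).
    """
    hits = {}
    for part in partial_sequences(phones, max_len):
        for utt_id, score in search_fn(part):
            hits[utt_id] = max(score, hits.get(utt_id, float("-inf")))
    return hits
```

An 8-phone hypothesis yields three 6-phone windows under this assumption, so a recognition error in one phone no longer rules out every window.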


Effectiveness of Partial Phone Sequence
Matching
Full – Full symbolic search method
Partial – Partial symbolic search method
pMiss – Miss rate
pFA – False alarm rate
ATWV – Actual Term Weighted Value
For queries longer than 6 phones:
•  Miss rate is reduced.
•  False alarm rate is increased.
•  ATWV is increased.
If beta is not 66.7, the best trade-off point between pMiss and pFA will
change.
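The dependence on beta follows from the standard term weighted value, TWV = 1 - (pMiss + beta * pFA). A minimal sketch, assuming pMiss and pFA are already averaged over queries; the operating points in the usage lines are made-up numbers for illustration, not results from the NNI system, and the function names are hypothetical.

```python
def twv(p_miss, p_fa, beta=66.7):
    """Term weighted value for one operating point."""
    return 1.0 - (p_miss + beta * p_fa)

def best_operating_point(points, beta=66.7):
    """Pick the threshold with the highest TWV.

    `points` is a list of (threshold, p_miss, p_fa) tuples.
    Returns (threshold, twv) for the best point.
    """
    return max(((thr, twv(pm, pfa, beta)) for thr, pm, pfa in points),
               key=lambda item: item[1])

# Illustrative (made-up) operating points: (threshold, pMiss, pFA).
points = [(0.2, 0.30, 0.004), (0.5, 0.45, 0.001)]
print(best_operating_point(points, beta=66.7))
print(best_operating_point(points, beta=10.0))
```

With beta = 66.7 the stricter threshold wins because each false alarm is costly; with a smaller beta the looser threshold, which has fewer misses, gives the higher TWV.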


Results
•  For type 1 queries, the partial SS method is
clearly worse than the DTW method.
•  But for type 2 and 3 queries, the partial SS
method is comparable with the DTW method.
•  For type 3 queries, the partial SS method is
significantly better than the DTW method in terms of
MTWV.
•  The two methods are highly complementary.

Conclusion
We have described the NNI system for the QUESST 2014 Task
•  DTW based subsystem
•  Symbolic search subsystem
•  Why the conventional SS system does not work well, especially for long queries
•  A partial phone sequence SS method is proposed
•  The NNI system results are reported
Future research will focus on reducing the false alarms introduced by the
partial matching method.

Thanks!