NNI QbE system, MediaEval 2015 Workshop, Wurzen, Germany
The NNI Query-by-Example System
for MediaEval 2015
Jingyong Hou¹, Van Tung Pham², Cheung-Chi Leung³, Lei Wang³, Haihua Xu², Hang Lv¹, Lei Xie¹,
Zhonghua Fu¹, Chongjia Ni³, Xiong Xiao², Hongjie Chen¹, Shaofei Zhang¹, Sining Sun¹, Yougen Yuan¹,
Pengcheng Li¹, Tin Lay Nwe³, Sunil Sivadas³, Bin Ma³, Eng Siong Chng², Haizhou Li²,³
¹ Northwestern Polytechnical University (NWPU), Xi’an, China
² Nanyang Technological University (NTU), Singapore
³ Institute for Infocomm Research (I2R), A*STAR, Singapore
Presented by Cheung-Chi Leung
Institute for Infocomm Research (I2R), A*STAR, Singapore
System Diagram
•  Score-level fusion of 66 systems from our 3 groups:
–  15 DTW systems from NWPU
–  39 DTW systems from I2R
–  8 DTW systems and 4 SS systems from NTU
•  Our submitted system involves:
–  DTW mainly on bottleneck features/stacked bottleneck features
–  Symbolic search (SS) using phoneme tokenizers and weighted finite state transducers (WFST)
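The slides do not say how the scores are fused, so the following is only a minimal sketch of one common form of score-level fusion. It assumes each system's scores are z-normalized per query and then combined by a weighted average. The function names, the equal default weights and the toy scores are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one system's scores for a single query to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0.0:
        # A constant score list carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse(system_scores, weights=None):
    """Weighted average of normalized scores.

    system_scores[k][i] is the score that system k gives to search utterance i.
    Equal weights are an assumption; in practice they would be tuned on dev data.
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    normalized = [z_normalize(scores) for scores in system_scores]
    n_utts = len(normalized[0])
    return [
        sum(w * norm[i] for w, norm in zip(weights, normalized))
        for i in range(n_utts)
    ]


# Toy scores for three search utterances from one DTW and one SS system.
dtw_scores = [0.2, 0.9, 0.4]
ss_scores = [1.0, 3.0, 2.0]
fused = fuse([dtw_scores, ss_scores])
best = max(range(len(fused)), key=fused.__getitem__)
```

Normalizing before averaging keeps a system with a wide score range from dominating the fused score.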
Highlight of this year’s system:
–  Noise robustness techniques to deal with this year’s noisy data
[Figure: system diagram. Query audio and search audio pass through multiple tokenizers; the tokenizer outputs feed the DTW and SS systems, whose scores go through intra-group and inter-group fusion to give the results.]
Training Resources for Tokenizers
•  Tokenizers are used to convert the audio signal into
–  bottleneck features (BNF)/stacked bottleneck features (SBNF)/posteriorgrams for DTW systems
–  phone sequences/lattices for SS systems
Training corpora or phoneme recognizers used by each group
[Table columns on the slide: NWPU, I2R, NTU. Ticks are listed in slide order; for rows with fewer than three ticks the column positions were lost in extraction.]
Switchboard (English): √  √  √√
Development languages in OpenKWS:
    Cantonese: √  √  √
    Pashto: √  √  √
    Tagalog: √  √  √
    Tamil: √  √
    Turkish: √  √  √
    Vietnamese: √  √
Fisher Spanish: √
HKUST Mandarin: √
CallHome Egyptian Arabic: √
SEAME (mixed Mandarin-English): √
MASS (Malay): √
BUT phoneme recognizers (Czech, Hungarian and Russian): √
Legend: √ used in SS system(s); √ used in DTW system(s)
DTW Systems
•  Exact matching systems: conventional subsequence DTW; good for type 1 queries
•  Approximate matching systems to deal with type 2 and 3 queries
–  Use partial feature segments of the query for matching
–  1) Fixed-window based¹: segments of 70-90 frames shifted by 5-10 frames
–  2) Phoneme-sequence based²: segments formed by 8 consecutive phonemes (phoneme boundaries derived from phoneme recognizers)
¹ P. Yang et al., “The NNI query-by-example system for MediaEval 2014,” in Proc. MediaEval 2014 Workshop, pp. 16-17.
² J. Hou et al., “Spoken term detection technology based on DTW,” Journal of Tsinghua University (Sci and Tech), 2015 (to be published).
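The two matching styles above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a Euclidean frame distance, the three standard DTW steps and normalization by query length, none of which the slides specify. All function names are hypothetical, and the tiny window sizes in the example stand in for the 70-90 frame windows on the slide.

```python
import math


def euclidean(a, b):
    """Frame-level distance (an assumption; the slides do not name the metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance, dist=euclidean):
    """Align the whole query to any contiguous region of the utterance.

    Returns (cost normalized by query length, end frame of the best match).
    """
    m, n = len(query), len(utterance)
    # Free start: the match may begin at any utterance frame at no extra cost.
    prev = [dist(query[0], utterance[j]) for j in range(n)]
    for i in range(1, m):
        cur = [0.0] * n
        cur[0] = prev[0] + dist(query[i], utterance[0])
        for j in range(1, n):
            cur[j] = dist(query[i], utterance[j]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    # Free end: pick the cheapest end frame in the last row.
    end = min(range(n), key=prev.__getitem__)
    return prev[end] / m, end


def fixed_windows(n_frames, win=80, shift=10):
    """Start/end frame pairs of fixed-length query segments."""
    if n_frames <= win:
        return [(0, n_frames)]
    return [(s, s + win) for s in range(0, n_frames - win + 1, shift)]


def approximate_match(query, utterance, win=80, shift=10):
    """Best subsequence-DTW cost over all fixed-window segments of the query."""
    return min(
        subsequence_dtw(query[s:e], utterance)[0]
        for s, e in fixed_windows(len(query), win, shift)
    )


# Toy 1-D features: the query's last frame has no counterpart in the utterance.
utterance = [[5.0], [0.0], [1.0], [2.0], [5.0]]
query = [[0.0], [1.0], [2.0], [9.0]]
exact_cost, exact_end = subsequence_dtw(query, utterance)
approx_cost = approximate_match(query, utterance, win=3, shift=1)
```

In this toy case the exact match is penalized by the unmatched final frame, while the windowed match finds a segment of the query that aligns perfectly. That is the motivation the slide gives for approximate matching on type 2 and 3 queries.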
Exact matching and approximate
matching DTW Systems
•  Fused results of 13 exact matching and 13 approximate matching (fixed-window based) DTW systems (from the 13 SBNF/BNF tokenizers)

minCnxe (maxTWV) on dev

                  Exact matching DTW   Approx. matching DTW   Exact+Approx. matching DTW
Type 1 queries    0.700 (0.293)        0.711 (0.312)          0.685 (0.314)
Type 2 queries    0.893 (0.083)        0.853 (0.112)          0.852 (0.122)
Type 3 queries    0.874 (0.124)        0.867 (0.120)          0.856 (0.135)
All queries       0.844 (0.166)        0.828 (0.179)          0.817 (0.190)
Adding Noise to Training Data for
Tokenizers
•  Precautions:
–  The signal-to-noise ratio (SNR) distribution of the noise-added training data should be similar to that of the development data
–  Only a portion (~50%) of the training data has noise added (as not all of this year’s utterances are highly noisy)
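A minimal sketch of the two precautions above, under stated assumptions: waveforms are plain lists of samples, noise segments are tiled to the utterance length, and target SNRs are drawn from a list of SNR values measured on the dev data. The function names and the sampling scheme are hypothetical; the slides do not describe the actual procedure.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a waveform."""
    return sum(v * v for v in x) / len(x)


def add_noise(speech, noise, snr_db):
    """Mix a noise segment into speech at the requested SNR (in dB)."""
    reps = len(speech) // len(noise) + 1
    tiled = (noise * reps)[: len(speech)]  # repeat the noise to cover the utterance
    noise_power = power(tiled)
    if noise_power == 0.0:
        raise ValueError("noise segment is silent")
    # Choose the gain so that power(speech) / power(gain * noise) = 10^(snr_db / 10).
    gain = math.sqrt(power(speech) / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]


def corrupt_corpus(utterances, noise_segments, dev_snrs_db, fraction=0.5, seed=0):
    """Add noise to roughly `fraction` of the utterances.

    Drawing each target SNR from dev_snrs_db (SNRs measured on dev data) is one
    simple way to make the training SNR distribution resemble the dev data.
    """
    rng = random.Random(seed)
    out = []
    for utt in utterances:
        if rng.random() < fraction:
            out.append(add_noise(utt, rng.choice(noise_segments), rng.choice(dev_snrs_db)))
        else:
            out.append(list(utt))
    return out


# Toy check: the SNR measured after mixing matches the requested one.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
noisy = add_noise(speech, noise, snr_db=10.0)
added = [y - s for y, s in zip(noisy, speech)]
measured_snr_db = 10 * math.log10(power(speech) / power(added))
```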
[Figure: noise segments are extracted from the QUESST dev data and added to the training data of a tokenizer; the tokenizer model is then trained on this data.]
Adding Noise to Training Data for
Tokenizers
•  Results of an exact matching DTW system using SBNF (tokenizer trained using Switchboard corpus)

minCnxe (maxTWV) on dev

                  Baseline (orig. Switchboard data)   baseline+noise1   baseline+noise2
Type 1 queries    0.762 (0.227)                       0.733 (0.258)     0.735 (0.270)
Speech Enhancement
•  A Wiener filter is used to reduce noise in utterances¹
•  Initial results show that this leads to better DTW search performance for some tokenizers
•  Further investigation will be conducted

minCnxe (maxTWV) of exact matching DTW systems on type 1 dev queries

                              baseline         w/ speech enhancement
Switchboard monophone SBNF    0.894 (0.097)    0.870 (0.110)
BUT-CZ posteriorgrams         0.931 (0.018)    0.872 (0.103)
BUT-HU posteriorgrams         0.909 (0.070)    0.857 (0.114)
¹ J. Chen, J. Benesty, Y. Huang, and T. Gaensler, "On single-channel noise reduction in the time domain," in Proc. ICASSP, 2011, pp. 277-280.
Symbolic Search Systems
•  A symbolic search system with approximate phoneme-sequence matching¹ is used to handle type 2 and 3 queries
•  Key steps:
–  Represent search audio by phone lattices and index them in WFST format
–  Represent query audio by N-best phone sequences
–  Extract partial phone sequences of queries
–  Search by composition of query and search WFSTs
¹ H. Xu et al., “Language independent query-by-example spoken term detection using n-best phone sequences and partial matching,” in Proc. ICASSP, 2015, pp. 5191-5195.
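The partial-matching idea in the key steps can be sketched in a simplified form. This sketch deliberately swaps the WFST machinery for plain list matching: the search audio is a single 1-best phone string rather than a lattice, and search is exact sub-sequence lookup rather than WFST composition. The function names, the minimum length of 3 phones and the toy phone strings are all hypothetical.

```python
def partial_sequences(phones, min_len=3):
    """All distinct contiguous sub-sequences with at least min_len phones, longest first."""
    seen = set()
    out = []
    for length in range(len(phones), min_len - 1, -1):
        for start in range(len(phones) - length + 1):
            sub = tuple(phones[start:start + length])
            if sub not in seen:
                seen.add(sub)
                out.append(sub)
    return out


def find(sub, doc):
    """Start positions where the phone sub-sequence occurs in a 1-best phone string."""
    k = len(sub)
    return [i for i in range(len(doc) - k + 1) if tuple(doc[i:i + k]) == sub]


def search(nbest, doc, min_len=3):
    """Collect (sub-sequence, position) hits over every N-best query hypothesis."""
    hits = []
    for hypothesis in nbest:
        for sub in partial_sequences(hypothesis, min_len):
            for pos in find(sub, doc):
                if (sub, pos) not in hits:
                    hits.append((sub, pos))
    return hits


# Toy example: two query hypotheses and one decoded search utterance.
nbest = [["k", "ae", "t", "s"], ["k", "ah", "t", "s"]]
doc = ["dh", "ah", "k", "ae", "t", "s", "ae", "t"]
hits = search(nbest, doc)
```

A real system would also weight each hit, for example by hypothesis rank and matched length. That is left out here because the slides give no details.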
Symbolic Search Systems	
  
•  Further improvement by fusing 4 SS systems and 8 DTW systems (4 exact matching and 4 fixed-window approximate matching)
–  The different types of systems use the same 4 tokenizers

minCnxe (maxTWV) on dev

                  DTW (including exact+approx.)   SS              DTW + SS        relative improvement
Type 1 queries    0.683 (0.321)                   0.871 (0.150)   0.680 (0.331)   0.4% (3.1%)
Type 2 queries    0.878 (0.098)                   0.902 (0.068)   0.831 (0.168)   5.4% (71.4%)
Type 3 queries    0.878 (0.113)                   0.934 (0.072)   0.854 (0.174)   2.7% (54.0%)
All queries       0.836 (0.177)                   0.910 (0.094)   0.809 (0.224)   3.2% (26.5%)
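The relative improvement column can be recomputed from the DTW and DTW + SS columns. The sketch below assumes the improvement is measured against the DTW-only result, and that lower is better for minCnxe while higher is better for maxTWV. The function name is hypothetical. Because the inputs are the rounded table entries, the results agree with the slide only to within rounding.

```python
def relative_improvement(baseline, fused, lower_is_better):
    """Relative gain of `fused` over `baseline`, in percent."""
    delta = baseline - fused if lower_is_better else fused - baseline
    return 100.0 * delta / baseline


# (minCnxe, maxTWV) for DTW only and for DTW + SS, taken from the table above.
rows = {
    "Type 1 queries": ((0.683, 0.321), (0.680, 0.331)),
    "Type 2 queries": ((0.878, 0.098), (0.831, 0.168)),
    "Type 3 queries": ((0.878, 0.113), (0.854, 0.174)),
    "All queries": ((0.836, 0.177), (0.809, 0.224)),
}

gains = {}
for name, ((cnxe_dtw, twv_dtw), (cnxe_fused, twv_fused)) in rows.items():
    gains[name] = (
        relative_improvement(cnxe_dtw, cnxe_fused, lower_is_better=True),
        relative_improvement(twv_dtw, twv_fused, lower_is_better=False),
    )
    print(f"{name}: minCnxe {gains[name][0]:.1f}%, maxTWV {gains[name][1]:.1f}%")
```

The large relative gains in maxTWV for type 2 and 3 queries partly reflect the small DTW-only baselines (0.098 and 0.113).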
  
Results	
  
•  Each group gained performance by:
–  fusing exact-matching and approximate-matching systems
–  fusing systems that use different speech preprocessing techniques (e.g. noise extraction, speech enhancement or VAD)
–  fusing systems with different tokenizers
•  Further performance gain from inter-group fusion
•  Compared with our single best exact matching DTW system, system fusion brings around 13.5% relative improvement in minCnxe (115% in maxTWV) on all query types in dev
Conclusion
•  We have described the NNI system for QUESST 2015
•  Noise robustness techniques are used to deal with the noisy condition of the data, and lead to better search performance
•  The same observations are obtained as last year:
–  DTW and SS systems are complementary
–  Exact matching and approximate matching systems are complementary
•  Further investigation will be conducted into speech enhancement techniques and the gain provided by BNF and SBNF
Thanks !