The NNI QbE-STD System for MediaEval 2014
Peng Yang1, Haihua Xu2, Xiong Xiao2, Lei Xie1, Cheung-Chi Leung3
Hongjie Chen1, Jia Yu1, Hang Lv1, Lei Wang3, Su Jun Leow2
Bin Ma3, Eng Siong Chng1, Haizhou Li2,3
1Northwestern Polytechnical University, Xi’an, China
2Nanyang Technological University, Singapore
3Institute for Infocomm Research, A*STAR, Singapore
Presented by Haihua Xu
Temasek Laboratories@NTU, Singapore
NNI QbE-STD system, MediaEval 2014 Workshop, Barcelona
System Diagram
Two groups of subsystems are used:
•  Subsequence DTW-based template matching on Gaussian/phone posteriorgrams and bottleneck features.
•  Symbolic search (SS) using a phone tokenizer and a weighted finite-state transducer (WFST).
Tokenizers
Tokenizers are used to convert the audio signal into
•  posteriorgrams or bottleneck features for DTW-based systems
•  phone sequences/lattices for SS systems
DTW-based Systems
•  Full sequence matching¹: conventional subsequence DTW. Good for type 1 queries.
•  Partial matching is used for type 2 and 3 queries.
   •  A partial feature segment of the query is used for matching.
   •  Segments are 600 ms long and shifted by 50 ms.
   •  This improves performance for type 3 queries.
•  9 DTW systems
   •  5 using full matching
   •  4 using partial matching
¹Yang P. et al., “Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection”, in Proc. INTERSPEECH, 2014
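The two matching modes can be sketched roughly as follows. This is only an illustration, not the NNI implementation: the cosine local distance, the step pattern, the length normalisation and the function names are all assumptions, and the 60-frame window with a 5-frame shift corresponds to 600 ms and 50 ms only if features are computed every 10 ms (also an assumption). Plain Python lists are used to keep the sketch dependency-free.

```python
import math

def cosine_distance(a, b):
    """Local distance between two feature frames (an assumed choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def subsequence_dtw(query, utterance):
    """Return (cost, end_frame) of the best match of `query` anywhere in `utterance`."""
    n, m = len(query), len(utterance)
    dist = [[cosine_distance(q, u) for u in utterance] for q in query]
    acc = [[math.inf] * m for _ in range(n)]
    acc[0] = dist[0][:]                      # free start: a match may begin at any frame
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        for j in range(1, m):
            acc[i][j] = dist[i][j] + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    end = min(range(m), key=acc[-1].__getitem__)   # free end: best final frame
    return acc[-1][end] / n, end             # normalise by query length

def partial_segments(query, win=60, shift=5):
    """Cut the query into overlapping windows of `win` frames, `shift` frames apart."""
    if len(query) <= win:
        return [query]
    return [query[s:s + win] for s in range(0, len(query) - win + 1, shift)]

def partial_match(query, utterance, win=60, shift=5):
    """Score an utterance by its best-matching query segment."""
    return min(subsequence_dtw(seg, utterance) for seg in partial_segments(query, win, shift))
```

With full matching, the whole query must align with one region of the utterance. With partial matching, it is enough for one window of the query to align well, which is why it is more tolerant of the variation in type 2 and 3 queries.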
Why Symbolic Search (SS)
•  DTW is effective¹, but it is
   •  computationally expensive and difficult to index,
   •  not well suited to handling inexact matches.
•  Symbolic search allows indexing and fast search, e.g. using a weighted finite-state transducer (WFST).
¹Anguera X., Rodriguez-Fuentes L.J., Szoke I., Buzo A., and Metze F., “Query by example search on speech at MediaEval 2014”, in Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, Oct. 16-17
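To illustrate why a symbolic representation is easy to index, here is a small sketch that uses a phone n-gram inverted index. Note that this is a deliberately simpler stand-in for the WFST-based index mentioned on the slide, and the function names, the trigram size and the toy phone strings are all hypothetical.

```python
from collections import defaultdict

def build_index(utterances, n=3):
    """Map each phone n-gram to the (utterance id, position) pairs where it occurs."""
    index = defaultdict(list)
    for utt_id, phones in utterances.items():
        for pos in range(len(phones) - n + 1):
            index[tuple(phones[pos:pos + n])].append((utt_id, pos))
    return index

def search(index, utterances, query, n=3):
    """Exact search: look up the first n-gram, then verify the rest of the query.

    Queries shorter than n phones return no hits in this simplified sketch.
    """
    query = list(query)
    hits = []
    for utt_id, pos in index.get(tuple(query[:n]), []):
        if utterances[utt_id][pos:pos + len(query)] == query:
            hits.append((utt_id, pos))
    return hits

# Toy phone transcriptions (made up for illustration).
utterances = {
    "u1": "s ih l ah k ah n".split(),
    "u2": "k ah n t eh n t".split(),
}
index = build_index(utterances)
print(search(index, utterances, ["k", "ah", "n"]))
```

The index is built once over the search audio, so each query costs a lookup plus a short verification instead of a DTW pass over every utterance.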
Symbolic Search System
•  Limitations of symbolic search for QbE-STD:
   •  Phone recognizers of other languages must be used for tokenization → poor symbolic representation.
   •  Inconsistent phone representation between query and search audio.
Limitation of Conventional Symbolic Search
•  Full – Full symbolic search method
•  pMiss – Miss rate
•  pFA – False alarm rate
•  ATWV – Actual Term Weighted Value
As query length increases,
•  Miss rate approaches 100%
•  False alarm rate approaches 0
•  ATWV approaches 0
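One way to build intuition for this trend is a toy model, which is an assumption of this note and not an analysis from the slides. If each phone of the query independently agrees with the corresponding phone in the search audio with probability p, an exact match of an n-phone query succeeds with probability p^n, which shrinks quickly as n grows. The value 0.7 below is hypothetical.

```python
def exact_match_probability(per_phone_accuracy, n_phones):
    """Chance that all n_phones tokens agree, if errors are independent (an assumption)."""
    return per_phone_accuracy ** n_phones

# 0.7 is a hypothetical per-phone agreement rate, not a measured value.
for n_phones in (3, 6, 9, 12):
    p_miss = 1.0 - exact_match_probability(0.7, n_phones)
    print(f"{n_phones:2d} phones -> miss probability {p_miss:.2f}")
```

Under this model, long queries are almost never matched exactly, so both hits and false alarms dry up, which is consistent with the miss rate approaching 100% and the false alarm rate approaching 0.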
Partial Phone Sequence Matching
Partial Matching Steps
•  If a query phone hypothesis is longer than 6 phones, get all partial sequences of the hypothesis.
•  Use all the unique partial sequences to search.
•  Search results are pooled and all treated as matches of the query.
•  Score normalization is applied, and a decision is made.
•  The high miss rate of long queries can be reduced by simply shortening the query representation.
•  Rationale: let the system return something first, and then decide which is a true match.
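The first three steps can be sketched as below. The slides do not say how the partial sequences are cut, so this sketch assumes contiguous windows of exactly 6 phones. The exact-match `find_all` helper stands in for the real WFST search, and score normalization is left out because its details are not given. All names are hypothetical.

```python
def partial_sequences(phones, length=6):
    """Return the unique contiguous sub-sequences of `length` phones, in order.

    Hypotheses of at most `length` phones are returned unchanged.
    """
    phones = list(phones)
    if len(phones) <= length:
        return [tuple(phones)]
    unique = []
    for start in range(len(phones) - length + 1):
        window = tuple(phones[start:start + length])
        if window not in unique:
            unique.append(window)
    return unique

def find_all(haystack, needle):
    """Positions where `needle` occurs exactly in `haystack` (stand-in for the real search)."""
    k = len(needle)
    return [i for i in range(len(haystack) - k + 1)
            if tuple(haystack[i:i + k]) == tuple(needle)]

def pooled_search(query_phones, utterances, length=6):
    """Search with every partial sequence and pool all hits as matches of the query."""
    hits = set()
    for part in partial_sequences(query_phones, length):
        for utt_id, phones in utterances.items():
            for pos in find_all(phones, part):
                hits.add((utt_id, pos))
    return sorted(hits)
```

An utterance that contains only part of a long query is now returned as a candidate. This is what lowers the miss rate and, at the same time, raises the number of false alarms that the later decision stage has to reject.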
Effectiveness of Partial Phone Sequence Matching
Full – Full symbolic search method
Partial – Partial symbolic search method
pMiss – Miss rate
pFA – False alarm rate
ATWV – Actual Term Weighted Value
For queries longer than 6 phones:
•  Miss rate is reduced.
•  False alarm rate is increased.
•  ATWV is increased.
If beta is not 66.7, the best trade-off point between pMiss and pFA will change.
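The role of beta is easier to see from the metric itself. The sketch below follows the usual NIST-style definition, TWV = 1 - average over queries of (pMiss + beta * pFA). ATWV is this value at the system's chosen threshold, and MTWV is its maximum over all thresholds. The function name and the way counts are passed in are assumptions made for illustration.

```python
def term_weighted_value(queries, n_trials, beta=66.7):
    """Compute TWV from per-query counts.

    queries:  list of (n_true, n_correct, n_false_alarm) tuples, one per query
    n_trials: total number of trials per query (e.g. seconds of search audio)
    beta:     weight of false alarms relative to misses
    """
    total = 0.0
    counted = 0
    for n_true, n_correct, n_false_alarm in queries:
        if n_true == 0:
            continue                      # queries with no true occurrence are skipped
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_false_alarm / (n_trials - n_true)
        total += p_miss + beta * p_fa
        counted += 1
    return 1.0 - total / counted
```

Because pFA is multiplied by beta, a larger beta makes each false alarm more costly, so the operating point that maximizes TWV shifts towards fewer detections, and a smaller beta shifts it the other way.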
Results
•  For type 1 queries, the partial SS method is clearly worse than the DTW method.
•  For type 2 and 3 queries, however, the partial SS method is comparable with the DTW method.
•  For type 3 queries, the partial SS method is significantly better than the DTW method in terms of MTWV.
•  The two methods are highly complementary.
Conclusion
We have described the NNI system for the QUESST 2014 task:
•  DTW-based subsystem
•  Symbolic search subsystem
•  Why the conventional SS system does not work well, especially for long queries
•  The proposed partial phone sequence SS method
•  The NNI system results
Future research will focus on reducing the false alarms introduced by the partial matching method.
Thanks!
12	
  NNI QbE-STD system, MedialEval 2014 Workshop, Barcelona

More Related Content

Viewers also liked

MediaEval 2015 - The Placing Task at MediaEval 2015
MediaEval 2015 - The Placing Task at MediaEval 2015MediaEval 2015 - The Placing Task at MediaEval 2015
MediaEval 2015 - The Placing Task at MediaEval 2015multimediaeval
 
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...multimediaeval
 
MediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images Task
MediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images TaskMediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images Task
MediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images Taskmultimediaeval
 
MediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia Search
MediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia SearchMediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia Search
MediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia Searchmultimediaeval
 
MediaEval 2015 - Affective Impact of Movies: Task Overview and Results
MediaEval 2015 - Affective Impact of Movies: Task Overview and ResultsMediaEval 2015 - Affective Impact of Movies: Task Overview and Results
MediaEval 2015 - Affective Impact of Movies: Task Overview and Resultsmultimediaeval
 
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015multimediaeval
 
MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...
MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...
MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...multimediaeval
 
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images TaskMediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Taskmultimediaeval
 
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
MediaEval 2016 - UNIFESP Predicting Media Interestingness TaskMediaEval 2016 - UNIFESP Predicting Media Interestingness Task
MediaEval 2016 - UNIFESP Predicting Media Interestingness Taskmultimediaeval
 
MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...
MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...
MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...multimediaeval
 
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery ChallengeMediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challengemultimediaeval
 
MediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing TaskMediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing Taskmultimediaeval
 

Viewers also liked (12)

MediaEval 2015 - The Placing Task at MediaEval 2015
MediaEval 2015 - The Placing Task at MediaEval 2015MediaEval 2015 - The Placing Task at MediaEval 2015
MediaEval 2015 - The Placing Task at MediaEval 2015
 
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
 
MediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images Task
MediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images TaskMediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images Task
MediaEval 2015 - TUW @ MediaEval 2015 Retrieving Diverse Social Images Task
 
MediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia Search
MediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia SearchMediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia Search
MediaEval 2015 - EURECOM @ SAVA2015: Visual Features for Multimedia Search
 
MediaEval 2015 - Affective Impact of Movies: Task Overview and Results
MediaEval 2015 - Affective Impact of Movies: Task Overview and ResultsMediaEval 2015 - Affective Impact of Movies: Task Overview and Results
MediaEval 2015 - Affective Impact of Movies: Task Overview and Results
 
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 2015
 
MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...
MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...
MediaEval 2015 - Time-continuous estimation of real-valued dimensions of emot...
 
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images TaskMediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
 
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
MediaEval 2016 - UNIFESP Predicting Media Interestingness TaskMediaEval 2016 - UNIFESP Predicting Media Interestingness Task
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
 
MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...
MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...
MediaEval 2015 - Automatically Estimating Emotion in Music with Deep Long-Sho...
 
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery ChallengeMediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
 
MediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing TaskMediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing Task
 

Similar to Nni v7

MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015multimediaeval
 
Deep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionDeep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionSai Kiran Kadam
 
Deep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event DetectionDeep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event DetectionSai Kiran Kadam
 
160628 giab for festival of genomics
160628 giab for festival of genomics160628 giab for festival of genomics
160628 giab for festival of genomicsGenomeInABottle
 
Giab jan2016 analysis team breakout summary
Giab jan2016 analysis team breakout summaryGiab jan2016 analysis team breakout summary
Giab jan2016 analysis team breakout summaryGenomeInABottle
 
Text Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.pptText Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.pptGrace136708
 
A Survey Paper on Detection of Voice Pathology Using Machine Learning
A Survey Paper on Detection of Voice Pathology Using Machine LearningA Survey Paper on Detection of Voice Pathology Using Machine Learning
A Survey Paper on Detection of Voice Pathology Using Machine LearningIRJET Journal
 
T he SPL - IT Query by Example Search on Speech system for MediaEval 2014
T he SPL - IT Query by Example Search on Speech system for MediaEval 2014T he SPL - IT Query by Example Search on Speech system for MediaEval 2014
T he SPL - IT Query by Example Search on Speech system for MediaEval 2014multimediaeval
 
Donald K. - Innovation in molecular diagnosis, next generation sequencing and...
Donald K. - Innovation in molecular diagnosis, next generation sequencing and...Donald K. - Innovation in molecular diagnosis, next generation sequencing and...
Donald K. - Innovation in molecular diagnosis, next generation sequencing and...EuFMD
 
Mediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesMediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesXavier Anguera
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemVani011
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsGenomeInABottle
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016GenomeInABottle
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marcGenomeInABottle
 
DEVELOPMENT OF SPEAKER VERIFICATION UNDER LIMITED DATA AND CONDITION
DEVELOPMENT OF SPEAKER VERIFICATION  UNDER LIMITED DATA AND CONDITIONDEVELOPMENT OF SPEAKER VERIFICATION  UNDER LIMITED DATA AND CONDITION
DEVELOPMENT OF SPEAKER VERIFICATION UNDER LIMITED DATA AND CONDITIONniranjan kumar
 
Financial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech SignalsFinancial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech SignalsIJERA Editor
 

Similar to Nni v7 (20)

MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
 
Deep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionDeep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker Recognition
 
Deep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event DetectionDeep Learning - Speaker Verification, Sound Event Detection
Deep Learning - Speaker Verification, Sound Event Detection
 
160628 giab for festival of genomics
160628 giab for festival of genomics160628 giab for festival of genomics
160628 giab for festival of genomics
 
Giab jan2016 analysis team breakout summary
Giab jan2016 analysis team breakout summaryGiab jan2016 analysis team breakout summary
Giab jan2016 analysis team breakout summary
 
Text Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.pptText Independent Speaker recognitom framework for detecting criminals.ppt
Text Independent Speaker recognitom framework for detecting criminals.ppt
 
A Survey Paper on Detection of Voice Pathology Using Machine Learning
A Survey Paper on Detection of Voice Pathology Using Machine LearningA Survey Paper on Detection of Voice Pathology Using Machine Learning
A Survey Paper on Detection of Voice Pathology Using Machine Learning
 
T he SPL - IT Query by Example Search on Speech system for MediaEval 2014
T he SPL - IT Query by Example Search on Speech system for MediaEval 2014T he SPL - IT Query by Example Search on Speech system for MediaEval 2014
T he SPL - IT Query by Example Search on Speech system for MediaEval 2014
 
Donald K. - Innovation in molecular diagnosis, next generation sequencing and...
Donald K. - Innovation in molecular diagnosis, next generation sequencing and...Donald K. - Innovation in molecular diagnosis, next generation sequencing and...
Donald K. - Innovation in molecular diagnosis, next generation sequencing and...
 
Mediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesMediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slides
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition System
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference Materials
 
Towards the study of sentiment in the public opinion of science in Spanish
Towards the study of sentiment in the public opinion of science in SpanishTowards the study of sentiment in the public opinion of science in Spanish
Towards the study of sentiment in the public opinion of science in Spanish
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
DEVELOPMENT OF SPEAKER VERIFICATION UNDER LIMITED DATA AND CONDITION
DEVELOPMENT OF SPEAKER VERIFICATION  UNDER LIMITED DATA AND CONDITIONDEVELOPMENT OF SPEAKER VERIFICATION  UNDER LIMITED DATA AND CONDITION
DEVELOPMENT OF SPEAKER VERIFICATION UNDER LIMITED DATA AND CONDITION
 
DSRG report 2001
DSRG report 2001DSRG report 2001
DSRG report 2001
 
Financial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech SignalsFinancial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech Signals
 

More from multimediaeval

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...multimediaeval
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...multimediaeval
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...multimediaeval
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...multimediaeval
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Taskmultimediaeval
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...multimediaeval
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimatormultimediaeval
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...multimediaeval
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Imagesmultimediaeval
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matchingmultimediaeval
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...multimediaeval
 
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...multimediaeval
 
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...multimediaeval
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentationmultimediaeval
 
A Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image DetectionA Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image Detectionmultimediaeval
 
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...multimediaeval
 
Fine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with AttentionFine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with Attentionmultimediaeval
 
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...multimediaeval
 
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...multimediaeval
 
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ... Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...multimediaeval
 

More from multimediaeval (20)

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimator
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Images
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching

Nni v7

  • 1. The NNI QbE-STD System for MediaEval 2014
       Peng Yang1, Haihua Xu2, Xiong Xiao2, Lei Xie1, Cheung-Chi Leung3, Hongjie Chen1, Jia Yu1, Hang Lv1, Lei Wang3, Su Jun Leow2, Bin Ma3, Eng Siong Chng1, Haizhou Li2,3
       1Northwestern Polytechnical University, Xi’an, China
       2Nanyang Technological University, Singapore
       3Institute for Infocomm Research, A*STAR, Singapore
       Presented by Haihua Xu, Temasek Laboratories@NTU, Singapore
       NNI QbE-STD system, MediaEval 2014 Workshop, Barcelona
  • 2. System Diagram
       Two groups of subsystems are used:
       •  Subsequence DTW-based template matching on Gaussian/phone posteriorgrams and bottleneck features.
       •  Symbolic search (SS) using a phone tokenizer and weighted finite state transducers (WFSTs).
  • 3. Tokenizers
       Tokenizers are used to convert the audio signal into:
       •  posteriorgrams or bottleneck features for the DTW-based systems
       •  phone sequences/lattices for the SS systems
  • 4. DTW-based Systems
       •  Full sequence matching [1]: conventional subsequence DTW. Good for type 1 queries.
       •  Partial matching is used for type 2 and type 3 queries.
          •  Partial feature segments of the query are used for matching.
          •  Segments are 600 ms long and shifted by 50 ms.
          •  Performance improves for type 3 queries.
       •  9 DTW systems:
          •  5 using full matching
          •  4 using partial matching
       [1] Yang P. et al., “Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection”, in Proc. INTERSPEECH, 2014.
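The difference between full and partial matching can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: only the 600 ms window and the 50 ms shift come from the slide, while the 10 ms frame shift, the negative-log inner-product distance, the normalization by query length, and the rule of keeping the best-scoring segment are assumptions made for the example.

```python
import math


def neg_log_dot(p, q, floor=1e-10):
    """Local distance between two posterior vectors (assumed, not from the slides)."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), floor))


def subsequence_dtw(query, utterance, dist=neg_log_dot):
    """Align the whole query to any sub-span of the utterance.

    Returns (cost normalized by query length, index of the best end frame).
    """
    m = len(utterance)
    # The first query frame may start anywhere in the utterance at no extra cost.
    prev = [dist(query[0], utterance[j]) for j in range(m)]
    for i in range(1, len(query)):
        cur = [prev[0] + dist(query[i], utterance[0])]
        for j in range(1, m):
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur.append(best + dist(query[i], utterance[j]))
        prev = cur
    # The alignment may also end anywhere: pick the cheapest end frame.
    end = min(range(m), key=prev.__getitem__)
    return prev[end] / len(query), end


def query_segments(num_frames, frame_shift_ms=10, seg_ms=600, hop_ms=50):
    """Frame ranges of 600 ms windows shifted by 50 ms (10 ms frames assumed)."""
    seg = seg_ms // frame_shift_ms
    hop = hop_ms // frame_shift_ms
    if num_frames <= seg:
        return [(0, num_frames)]
    return [(s, s + seg) for s in range(0, num_frames - seg + 1, hop)]


def partial_match(query, utterance, **seg_args):
    """Score every query segment and keep the best one (an assumed fusion rule)."""
    costs = [
        subsequence_dtw(query[a:b], utterance)[0]
        for a, b in query_segments(len(query), **seg_args)
    ]
    return min(costs)
```

Under these assumptions a 1-second query (100 frames) yields nine overlapping 60-frame segments, each of which is matched independently against the search utterance.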
  • 5. Why Symbolic Search (SS)
       •  DTW is effective [1], but it
          •  is computationally expensive and difficult to index,
          •  does not easily handle inexact matches.
       •  Symbolic search allows indexing and fast search, e.g. using weighted finite state transducers (WFSTs).
       [1] Anguera X., Rodriguez-Fuentes L.J., Szoke I., Buzo A., and Metze F., “Query by example search on speech at MediaEval 2014”, in Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, Oct. 16-17.
  • 6. Symbolic Search System
       •  Limitations of symbolic search for QbE-STD:
          •  Phone recognizers of other languages must be used for tokenization → poor symbolic representation.
          •  Inconsistent phone representation between query and search audio.
  • 7. Limitation of Conventional Symbolic Search
       •  Full: full symbolic search method
       •  pMiss: miss rate
       •  pFA: false alarm rate
       •  ATWV: Actual Term Weighted Value
       As query length increases:
       •  the miss rate approaches 100%
       •  the false alarm rate approaches 0
       •  ATWV approaches 0
  • 8. Partial Phone Sequence Matching
       Partial Matching Steps
       •  If a query phone hypothesis is longer than 6 phones, get all partial sequences of the hypothesis.
       •  Use all the unique partial sequences to search.
       •  Search results are pooled and all treated as matches of the query.
       •  Score normalization is applied, and a decision is made.
       Notes
       •  The high miss rate of long queries can be reduced by simply shortening the query representation.
       •  Rationale: let the system return something first, and then decide which results are true matches.
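The first two steps can be illustrated with a short Python sketch. The slide does not say how long each partial sequence is, so the example assumes contiguous windows of exactly 6 phones; the function name and the phone labels are hypothetical.

```python
def partial_sequences(phones, max_len=6):
    """Return the unique contiguous windows of max_len phones, in order of first appearance.

    Hypotheses that are not longer than max_len are returned unchanged as a single sequence.
    The window length is an assumption; the slide only gives the threshold of 6 phones.
    """
    if len(phones) <= max_len:
        return [tuple(phones)]
    seen = set()
    unique = []
    for start in range(len(phones) - max_len + 1):
        window = tuple(phones[start:start + max_len])
        if window not in seen:  # keep each partial sequence only once
            seen.add(window)
            unique.append(window)
    return unique


# Hypothetical 8-phone query hypothesis.
for seq in partial_sequences(["s", "p", "o", "k", "ax", "n", "t", "er"]):
    print(" ".join(seq))
```

Each resulting sequence would then be searched on its own, with the hits pooled and normalized as described in the remaining steps.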
  • 9. Effectiveness of Partial Phone Sequence Matching
       •  Full: full symbolic search method
       •  Partial: partial symbolic search method
       •  pMiss: miss rate
       •  pFA: false alarm rate
       •  ATWV: Actual Term Weighted Value
       For queries longer than 6 phones:
       •  the miss rate is reduced
       •  the false alarm rate is increased
       •  ATWV is increased
       If beta is not 66.7, the best trade-off point between pMiss and pFA will change.
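The dependence on beta follows from how the term-weighted value combines the two error rates. The sketch below is a simplification: it assumes pooled miss and false alarm rates rather than the per-query averaging used in official scoring, and the operating points are hypothetical values for illustration, not results from the slides.

```python
def term_weighted_value(p_miss, p_fa, beta=66.7):
    """TWV = 1 - (pMiss + beta * pFA); beta=66.7 is the value quoted on the slide."""
    return 1.0 - (p_miss + beta * p_fa)


# Hypothetical operating points, for illustration only.
print(term_weighted_value(1.0, 0.0))    # every occurrence missed, no false alarms
print(term_weighted_value(0.5, 0.001))  # fewer misses at the price of some false alarms
print(term_weighted_value(0.5, 0.001, beta=10.0))  # same rates, smaller false alarm penalty
```

With pMiss at 100% and pFA at 0 the value is exactly 0, which matches the trend reported for long queries under full matching. Because pFA is multiplied by beta, a different beta shifts the point at which trading misses for false alarms pays off.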
  • 10. Results
       •  For type 1 queries, the partial SS method is clearly worse than the DTW method.
       •  For type 2 and type 3 queries, however, the partial SS method is comparable to the DTW method.
       •  For type 3 queries, the partial SS method is significantly better than the DTW method in terms of MTWV.
       •  The two methods are highly complementary.
  • 11. Conclusion
       We have described the NNI system for the QUESST 2014 task:
       •  DTW-based subsystem
       •  Symbolic search subsystem
       •  Why the conventional SS system does not work, especially for long queries
       •  The proposed partial phone sequence SS method
       •  The reported NNI system results
       Future research will focus on reducing the false alarms introduced by the partial matching method.
  • 12. Thanks!