The Task
– The Query by Example Search on Speech Task (QUESST) involves searching for audio within audio content using audio queries.
– Queries may present small variations, filler content, or reordered words, and may originate from spontaneous requests.
– They may also have significant background or intermittent noise and reverberation.
Our System: Spectral Subtraction to filter background noise; fusion of 6 Dynamic Time Warping (DTW) path variants obtained from the output of phonetic recognizers for 5 languages.
The SPL-IT-UC Query by Example Search on Speech system for MediaEval 2015
Jorge Proença, Luis Castela, Fernando Perdigão
Instituto de Telecomunicações, Pole of Coimbra, Portugal
University of Coimbra – DEEC-FCTUC, Coimbra, Portugal
{jproenca, fp}@co.it.pt
Conclusions
Main contributions:
– Performing a careful Spectral Subtraction to diminish severe background noise, which greatly influences the output of phonetic recognizers;
– Using the average distance matrix of all languages as a 6th sub-system;
– Considering 6 possible DTW paths to tackle complex match cases;
– Truncating large distances per query, which may help to lower the burden of critical false negatives.
– Apart from side-info, all of the improvements also improve the ATWV metric.
MediaEval 2015 - QUESST
| September 14-15 2015, Wurzen, GERMANY
2. Phonetic Recognizer
– We used the long temporal context neural network system from Brno University of Technology (BUT).
– 5 sub-systems/languages (for 8 kHz):
Czech
Hungarian
Russian
Portuguese (trained)
English (trained)
– Output: state-level posteriorgrams (3 states per phoneme).
– Silence/noise frames are removed from queries.
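The poster does not say how silence/noise frames are detected on the queries. The following is only a minimal sketch of one plausible way to do it, assuming that the posteriorgram has known silence/noise state columns and that a frame is dropped when those states hold most of its probability mass. The function name, the `silence_cols` argument, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def drop_silence_frames(post, silence_cols, threshold=0.5):
    """Remove frames dominated by silence/noise states.

    post:         (n_frames, n_states) posteriorgram, each row sums to 1.
    silence_cols: indices of states treated as silence/noise (assumption).
    threshold:    frames with more silence mass than this are dropped (assumption).
    """
    post = np.asarray(post, dtype=float)
    silence_mass = post[:, silence_cols].sum(axis=1)
    return post[silence_mass <= threshold]

# Toy posteriorgram: 2 phonemes x 3 states (columns 0-5) plus 3 silence states (6-8).
toy = np.full((4, 9), 0.01)
toy[0, 0] = toy[1, 3] = toy[3, 4] = 0.92   # speech-dominated frames
toy[2, 6] = 0.92                           # silence-dominated frame
kept = drop_silence_frames(toy, silence_cols=[6, 7, 8])
print(kept.shape)
```

In the toy example only the third frame is dominated by silence states, so three of the four frames remain.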
3. Dynamic Time Warping (DTW)
– Per-language local distance matrix:
– Dot product of query and audio posterior probability vectors;
– Back-off with $\lambda = 10^{-4}$.
6 sub-systems for DTW:
– 5 distance matrices from the 5 languages;
– a 6th one, the average of the 5 distance matrices (ML).
(Improvement in Cnxe: 5-language fusion 0.7971; ML 0.8136; 5 languages + ML 0.7873)
Basic DTW strategy (A1):
– Smallest distance in identically weighted unitary jumps.
– Output: average distance of the final path.
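The basic strategy (A1) can be illustrated with a small subsequence DTW. This is a sketch under stated assumptions, not the authors' implementation: the path may start and end at any audio frame, the three unit-weight jumps are horizontal, vertical, and diagonal, the predecessor is chosen by smallest accumulated distance, and the score is the accumulated distance divided by the path length. The function name is hypothetical.

```python
import numpy as np

def subsequence_dtw(dist):
    """Subsequence DTW over a (n_query, n_audio) local distance matrix.

    Returns (average distance of the best path, start column, end column).
    All three jumps carry the same unit weight.
    """
    dist = np.asarray(dist, dtype=float)
    n_q, n_a = dist.shape
    acc = np.full((n_q, n_a), np.inf)        # accumulated distance
    length = np.zeros((n_q, n_a), dtype=int)  # number of cells on the path
    start = np.zeros((n_q, n_a), dtype=int)   # audio column where the path began

    # The first query frame may align with any audio frame at no extra cost.
    acc[0, :] = dist[0, :]
    length[0, :] = 1
    start[0, :] = np.arange(n_a)

    for i in range(1, n_q):
        for j in range(n_a):
            cands = [(acc[i - 1, j], i - 1, j)]               # advance query only
            if j > 0:
                cands.append((acc[i - 1, j - 1], i - 1, j - 1))  # advance both
                cands.append((acc[i, j - 1], i, j - 1))          # advance audio only
            best, pi, pj = min(cands)
            acc[i, j] = best + dist[i, j]
            length[i, j] = length[pi, pj] + 1
            start[i, j] = start[pi, pj]

    avg = acc[-1, :] / length[-1, :]
    end = int(np.argmin(avg))
    return float(avg[end]), int(start[-1, end]), end

# Toy matrix: a zero-cost diagonal match at audio frames 2..4.
toy = np.ones((3, 6))
toy[0, 2] = toy[1, 3] = toy[2, 4] = 0.0
print(subsequence_dtw(toy))
```

Dividing by path length keeps scores comparable across queries of different durations, which matters when the distances are later fused.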
1. Noise Filtering
Spectral Subtraction (SS) to counter constant background noise.
1. High-pass filter to remove low-frequency artefacts.
2. Analyze the averaged energy of the signal and determine high and low levels through the median of quartiles.
3. High-SNR signals: no SS applied, because it would introduce distortions. Other signals: find candidate "noise" segments longer than 100 ms.
4. Subtract the average noise spectrum with classical SS.
(Improvement: from 0.8368 Cnxe → 0.8130 with SS)
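The final subtraction step can be sketched as a classical magnitude spectral subtraction. The sketch assumes that a noise-only excerpt has already been selected; the high-pass filter, the energy-level analysis, and the SNR decision are not reproduced. The frame length, hop, spectral floor, and function name are illustrative choices, not values from the poster.

```python
import numpy as np

def spectral_subtraction(signal, noise, frame_len=256, hop=128, floor=0.01):
    """Subtract the average noise magnitude spectrum from each frame.

    signal: noisy 1-D waveform.
    noise:  1-D noise-only excerpt, at least frame_len samples long.
    floor:  fraction of the original magnitude kept as a lower bound (assumption).
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    window = np.hanning(frame_len)

    def frames(x):
        n = 1 + (len(x) - frame_len) // hop
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
        return x[idx] * window

    # Average noise magnitude spectrum over all noise frames.
    noise_mag = np.abs(np.fft.rfft(frames(noise), axis=1)).mean(axis=0)

    spec = np.fft.rfft(frames(signal), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # Weighted overlap-add resynthesis with the noisy phase.
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for k, frame in enumerate(clean):
        s = k * hop
        out[s:s + frame_len] += frame * window
        norm[s:s + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
noise_only = 0.1 * rng.standard_normal(2000)
cleaned = spectral_subtraction(noisy, noise_only)
```

The spectral floor limits over-subtraction, which is one source of the distortions that make SS undesirable on signals that are already clean.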
[Figure: Czech posteriorgram example for one query]
5. Fusion and Calibration
Linear Fusion (with the Bosaris Toolkit), calibrating for Cnxe.
– 6 sub-systems × 6 paths = 36 distance vectors of audio-query pairs.
1. Per-query distribution: truncate large distances to the mean of the distribution.
(Improvement: from 0.7939 → 0.7873 Cnxe)
2. Normalize per query: subtract the mean, divide by the standard deviation.
3. Side-info: 7 additional vectors for fusion:
– mean of distances per query before truncation and normalization (from the best approach and sub-system: ML-A2);
– query size in frames and log of query size;
– 4 SNR values: original and post-SS SNRs of the query and of the audio.
4 systems submitted:
1. Linear Fusion of all approaches and sub-systems + side-info
2. Harmonic Mean of approaches and Linear Fusion of sub-systems + side-info
3. Same as 1, without side-info
4. Same as 2, without side-info
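The per-query truncation and normalization can be sketched as follows. The sketch assumes that the mean and standard deviation used for normalization are those of the distribution after truncation, which the poster does not state; the function name is hypothetical.

```python
import numpy as np

def truncate_and_normalize(dists):
    """Process the distances between one query and every audio file.

    1. Truncate distances above the mean to the mean.
    2. Subtract the mean and divide by the standard deviation
       (both taken after truncation, which is an assumption).
    """
    d = np.asarray(dists, dtype=float)
    d = np.minimum(d, d.mean())
    std = d.std()
    if std == 0:
        return np.zeros_like(d)
    return (d - d.mean()) / std

# One query scored against five audio files; the first is a likely match.
print(truncate_and_normalize([0.2, 0.9, 1.0, 1.1, 0.8]))
```

After truncation all clear non-matches share the same value, so only the low-distance candidates keep their relative ordering.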
6. Results
– Side-info is always helpful for the Cnxe metric.
– Fusion of all is best on the Dev set.
– Harmonic mean is best on the Eval set (the fusion of all may be overfitted to Dev).
– Best Dev A1: 0.8041, A2: 0.7978, A3: 0.8335, A4: 0.8137, A5: 0.8184, A6: 0.8460
(A2) is the best overall; it may help in all cases because of co-articulation or intonation.
(A6) performs poorly; filler in a query may be an extension rather than a gap.
– Best Eval T1: 0.7107, T2: 0.8147, T3: 0.8115
[Figure: Query vs. Audio posterior distance matrix (top) and the best path from A5 (bottom)]
Processing Speed
– Indexing Speed Factor: 2.14
– Searching Speed Factor: 0.0034 per sec
– Peak Memory: 120 MB
[Figure: Query vs. Audio axes with step weights 1, 1, 1]
Back-off: $q' = (1 - \lambda)\,q + \lambda u$
Fusion system           Dev Cnxe   Dev MinCnxe   Eval Cnxe   Eval MinCnxe
1. All + side-info      0.7782     0.7716        0.7866      0.7809
2. H.mean + side-info   0.7862     0.7800        0.7842      0.7786
3. All, no side         0.7873     0.7816        0.7930      0.7875
4. H.mean, no side      0.7957     0.7893        0.7914      0.7865
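The Cnxe figures of the four submitted systems can be checked with a few lines of arithmetic. The numbers are copied from the results above; lower Cnxe is better. The dictionary layout and variable names are illustrative only.

```python
# Cnxe of the four submitted fusion systems (Dev, Eval).
cnxe = {
    "All + side-info":    {"dev": 0.7782, "eval": 0.7866},
    "H.mean + side-info": {"dev": 0.7862, "eval": 0.7842},
    "All, no side":       {"dev": 0.7873, "eval": 0.7930},
    "H.mean, no side":    {"dev": 0.7957, "eval": 0.7914},
}

best_dev = min(cnxe, key=lambda name: cnxe[name]["dev"])
best_eval = min(cnxe, key=lambda name: cnxe[name]["eval"])

# Cnxe reduction obtained by adding side-info to each fusion scheme.
side_info_gain = {
    scheme: {
        split: round(cnxe[f"{scheme}, no side"][split]
                     - cnxe[f"{scheme} + side-info"][split], 4)
        for split in ("dev", "eval")
    }
    for scheme in ("All", "H.mean")
}

print(best_dev, best_eval)
print(side_info_gain)
```

Side-info lowers Cnxe in all four comparisons, while the ranking of the two fusion schemes flips between Dev and Eval.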
4. DTW Modifications
– 5 additional approaches:
(A2) – Cutting up to 250 ms at the end of the query, keeping the total above 500 ms.
(A3) – Cutting up to 250 ms at the beginning of the query, keeping the total above 500 ms.
(A4) – Allowing one 'jump' along the audio of up to ½ of the query's length, which:
– cannot occur in the initial or final 250 ms of the query;
– cannot occur for queries shorter than 800 ms.
(A5) – Accounting for re-ordering of words.
– Find the best path for the beginning of the query, ahead of the end of the first one, with restrictions similar to (A4).
(A6) – Allowing one 'jump' along the query, of at most ⅓ of the query length.
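The duration constraints of (A2), (A3), and (A4) reduce to simple arithmetic, sketched here in milliseconds. The function names are hypothetical, and treating "above 500 ms" as "at least 500 ms" is an assumption.

```python
def cut_limit_ms(query_ms, max_cut_ms=250, min_keep_ms=500):
    """Largest cut allowed at one end of the query for (A2)/(A3).

    The cut never exceeds max_cut_ms and never leaves less than
    min_keep_ms of query (boundary treated as inclusive, an assumption).
    """
    return max(0, min(max_cut_ms, query_ms - min_keep_ms))

def audio_jump_allowed(query_ms, jump_at_ms, jump_ms, edge_ms=250, min_query_ms=800):
    """Check the (A4) restrictions for one jump along the audio.

    query_ms:   query duration.
    jump_at_ms: position in the query where the jump happens.
    jump_ms:    length of the jump along the audio.
    """
    if query_ms < min_query_ms:
        return False                      # query too short for a jump
    if jump_at_ms < edge_ms or jump_at_ms > query_ms - edge_ms:
        return False                      # too close to the query edges
    return 0 < jump_ms <= query_ms / 2    # at most half the query length

print(cut_limit_ms(1000), cut_limit_ms(600), cut_limit_ms(400))
print(audio_jump_allowed(1200, 600, 500), audio_jump_allowed(700, 350, 100))
```

A 600 ms query can lose at most 100 ms, and a 400 ms query cannot be cut at all, so the shortest queries are matched only with the basic strategy.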
Local distance: $D(q, x) = -\log(q \cdot x)$
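The local distance with back-off, and the averaged matrix used as the 6th sub-system (ML), can be sketched in a few lines. The sketch assumes that u in the back-off formula is the uniform distribution over states and that the back-off is applied to the query vectors only, as the formula is written for q. Function names are hypothetical.

```python
import numpy as np

LAMBDA = 1e-4  # back-off weight given on the poster

def back_off(post, lam=LAMBDA):
    """q' = (1 - lam) * q + lam * u, with u assumed uniform over states."""
    post = np.asarray(post, dtype=float)
    u = 1.0 / post.shape[1]
    return (1.0 - lam) * post + lam * u

def local_distance(query_post, audio_post, lam=LAMBDA):
    """D(q, x) = -log(q' . x) for every query/audio frame pair.

    query_post: (n_query, n_states), audio_post: (n_audio, n_states).
    Returns an (n_query, n_audio) matrix.
    """
    q = back_off(query_post, lam)
    x = np.asarray(audio_post, dtype=float)
    return -np.log(q @ x.T)

def average_distance(matrices):
    """Element-wise mean of the per-language distance matrices (ML)."""
    return np.mean(np.stack(matrices), axis=0)

query = np.array([[1.0, 0.0, 0.0]])
audio = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
D = local_distance(query, audio)
print(D)
```

Without the back-off, the mismatched pair in the toy example would have a dot product of zero and an infinite distance; with it, the distance is large but finite.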
[Figure: average energy (dB) versus query frames (5 ms)]

More Related Content

Viewers also liked

MediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience TaskMediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience Taskmultimediaeval
 
MediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons LearnedMediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons Learnedmultimediaeval
 
MediaEval 2016 - MLPBOON Predicting Media Interestingness System
MediaEval 2016 - MLPBOON Predicting Media Interestingness SystemMediaEval 2016 - MLPBOON Predicting Media Interestingness System
MediaEval 2016 - MLPBOON Predicting Media Interestingness Systemmultimediaeval
 
7 крутых военных роботов
7 крутых военных роботов7 крутых военных роботов
7 крутых военных роботовNanoJam.ru
 
10 важных факторов при выборе кружка робототехники
10 важных факторов при выборе кружка робототехники10 важных факторов при выборе кружка робототехники
10 важных факторов при выборе кружка робототехникиNanoJam.ru
 
Kabul edilemez Ermeni iddialari
Kabul edilemez Ermeni iddialariKabul edilemez Ermeni iddialari
Kabul edilemez Ermeni iddialariRaci Göktaş
 
Titanic review
Titanic reviewTitanic review
Titanic reviewneesh2
 
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...multimediaeval
 
MediaEval 2015 - Multimodal Person Discovery in Broadcast TV
MediaEval 2015 - Multimodal Person Discovery in Broadcast TVMediaEval 2015 - Multimodal Person Discovery in Broadcast TV
MediaEval 2015 - Multimodal Person Discovery in Broadcast TVmultimediaeval
 
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep ModelsMediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Modelsmultimediaeval
 
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...multimediaeval
 
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep ModelsMediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Modelsmultimediaeval
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...multimediaeval
 
MediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness TaskMediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness Taskmultimediaeval
 
MediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing TaskMediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing Taskmultimediaeval
 
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...multimediaeval
 
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...multimediaeval
 

Viewers also liked (20)

MediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience TaskMediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience Task
 
MediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons LearnedMediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons Learned
 
MediaEval 2016 - MLPBOON Predicting Media Interestingness System
MediaEval 2016 - MLPBOON Predicting Media Interestingness SystemMediaEval 2016 - MLPBOON Predicting Media Interestingness System
MediaEval 2016 - MLPBOON Predicting Media Interestingness System
 
Assessment, Story, and Action White Paper
Assessment, Story, and Action White PaperAssessment, Story, and Action White Paper
Assessment, Story, and Action White Paper
 
7 крутых военных роботов
7 крутых военных роботов7 крутых военных роботов
7 крутых военных роботов
 
10 важных факторов при выборе кружка робототехники
10 важных факторов при выборе кружка робототехники10 важных факторов при выборе кружка робототехники
10 важных факторов при выборе кружка робототехники
 
Imran_CV
Imran_CV Imran_CV
Imran_CV
 
Curriculum vitae
Curriculum vitaeCurriculum vitae
Curriculum vitae
 
Kabul edilemez Ermeni iddialari
Kabul edilemez Ermeni iddialariKabul edilemez Ermeni iddialari
Kabul edilemez Ermeni iddialari
 
Titanic review
Titanic reviewTitanic review
Titanic review
 
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
 
MediaEval 2015 - Multimodal Person Discovery in Broadcast TV
MediaEval 2015 - Multimodal Person Discovery in Broadcast TVMediaEval 2015 - Multimodal Person Discovery in Broadcast TV
MediaEval 2015 - Multimodal Person Discovery in Broadcast TV
 
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep ModelsMediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
 
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
 
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep ModelsMediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
 
MediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness TaskMediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness Task
 
MediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing TaskMediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing Task
 
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
MediaEval 2015 - Privacy Protection Filter Using StegoScrambling in Video Sur...
 
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
 

Similar to MediaEval 2015 - The SPL-IT-UC Query by Example Search on Speech system for MediaEval 2015 - poster

A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfA_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfBala Murugan
 
MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...
MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...
MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...multimediaeval
 
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015multimediaeval
 
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...a3labdsp
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
Encrypted Traffic Mining
Encrypted Traffic MiningEncrypted Traffic Mining
Encrypted Traffic MiningHenry Huang
 
Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...csandit
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...csandit
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...cscpconf
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingIAESIJAI
 
Audio Signal Processing
Audio Signal Processing Audio Signal Processing
Audio Signal Processing Ahmed A. Arefin
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...IRJET Journal
 
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Sri Manakula Vinayagar Engineering College
 
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...csandit
 
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...cscpconf
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...csandit
 

Similar to MediaEval 2015 - The SPL-IT-UC Query by Example Search on Speech system for MediaEval 2015 - poster (20)

A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfA_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
 
MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...
MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...
MediaEval2015 - The SPL-IT-UC Query by Example Search on Speech system for Me...
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
 
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
 
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
Encrypted Traffic Mining
Encrypted Traffic MiningEncrypted Traffic Mining
Encrypted Traffic Mining
 
Speaker Segmentation (2006)
Speaker Segmentation (2006)Speaker Segmentation (2006)
Speaker Segmentation (2006)
 
Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoising
 
Audio Signal Processing
Audio Signal Processing Audio Signal Processing
Audio Signal Processing
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
 
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
 
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
 
Une18apsipa
Une18apsipaUne18apsipa
Une18apsipa
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
AN EFFICIENT PEAK VALLEY DETECTION BASED VAD ALGORITHM FOR ROBUST DETECTION O...
 
An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 

More from multimediaeval

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...multimediaeval
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...multimediaeval
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...multimediaeval
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...multimediaeval
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Taskmultimediaeval
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...multimediaeval
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimatormultimediaeval
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...multimediaeval
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Imagesmultimediaeval
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matchingmultimediaeval
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...multimediaeval
 
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...multimediaeval
 
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...multimediaeval
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentationmultimediaeval
 
A Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image DetectionA Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image Detectionmultimediaeval
 
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...multimediaeval
 
Fine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with AttentionFine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with Attentionmultimediaeval
 
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...multimediaeval
 
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...multimediaeval
 
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ... Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...multimediaeval
 

MediaEval 2015 - The SPL-IT-UC Query by Example Search on Speech system for MediaEval 2015 - poster

The SPL-IT-UC Query by Example Search on Speech system for MediaEval 2015
Jorge Proença, Luis Castela, Fernando Perdigão
Instituto de Telecomunicações, Pole of Coimbra, Portugal
University of Coimbra – DEEC-FCTUC, Coimbra, Portugal
{jproenca, fp}@co.it.pt
MediaEval 2015 - QUESST | September 14-15 2015, Wurzen, GERMANY

The Task
– The Query by Example Search on Speech Task (QUESST) involves searching for audio within audio content using audio queries.
– Queries may present small changes, filler content, reordering of words, and originate from spontaneous inquiries.
– They may also have significant background or intermittent noise and reverberation.

Our System
Spectral Subtraction to filter background noise; Fuses 6 special Dynamic Time Warping (DTW) paths obtained from the output of phonetic recognizers for 5 languages.

1. Noise Filtering
Spectral Subtraction (SS) to counter constant background noise.
1. High pass filter for low-frequency artefacts.
2. Analyze averaged Energy of the signal and determine high and low levels through median of quartiles.
3. High SNR signals: no SS applied due to distortions. Others: get >100ms candidate segments for "noise".
4. Subtract the average noise spectrum with classical SS.
(Improvement: from 0.8368 Cnxe → 0.8130 with SS)
[Figure: average energy (dB) over query frames (5 ms)]

2. Phonetic Recognizer
– We used the long temporal context neural network system from Brno University of Technology (BUT).
– 5 sub-systems/languages (for 8 kHz):
  Czech
  Hungarian
  Russian
  Portuguese (trained)
  English (trained)
– Output: state level posteriorgrams (3 states per phoneme).
– Silence/Noise frames removed on queries.
[Figure: Czech posteriorgram example for one query]

3. Dynamic Time Warping (DTW)
– Per language Local Distance matrix:
  – Dot Product of Query and Audio posterior probability vectors: $D(q, x) = -\log(q \cdot x)$
  – Back-off with $\lambda = 10^{-4}$: $q' = (1 - \lambda) q + \lambda u$
6 sub-systems for DTW:
– 5 distance matrices from the 5 languages
– a 6th one, the average of the 5 distance matrices – ML
(Improvement: 5langs fusion - 0.7971 Cnxe, ML - 0.8136, 5langs+ML - 0.7873)
Basic DTW strategy (A1):
– Smallest distance in identically weighted unitary jumps.
– Output average distance of the final path.

4. DTW Modifications
– 5 additional approaches:
(A2) – Cutting up to 250ms at the end of the query, keeping the total above 500ms.
(A3) – Cutting up to 250ms at the beginning of the query, keeping the total above 500ms.
(A4) – Allowing one 'jump' along the audio up to ½ query's length, that
  – Can't occur at initial and final 250ms of the query
  – Can't occur for queries shorter than 800ms
(A5) – Accounting for re-ordering of words.
  – Find the best path for the beginning of the query, ahead of the end of the first one, with restrictions similar to (A4).
(A6) – Allowing one 'jump' along the query, of maximum ⅓ of query length.
[Figure: Query vs. Audio posterior distance matrix (top) and the best path from A5 (bottom)]

5. Fusion and Calibration
Linear Fusion (with Bosaris Toolkit), calibrating for Cnxe.
– 6 sub-systems x 6 paths = 36 distance vectors of audio-query pairs.
1. Per query distribution: Truncate large distances to the mean of the distribution. (Improvement: from 0.7939 → 0.7873 Cnxe)
2. Normalize per-query: subtract mean, divide by standard deviation.
3. Side-info: 7 additional vectors for fusion:
  – mean of distances per query before truncation and normalization (from the best approach and sub-system: ML-A2);
  – Query size in frames and log of query size;
  – 4 SNR values: original and post SS SNRs of query and of audio.
4 systems submitted:
1. Linear Fusion of all approaches and sub-systems + side-info
2. Harmonic Mean of approaches and Linear Fusion of sub-systems + side-info
3. Same as 1, without side-info
4. Same as 2, without side-info

6. Results
– Side-info always helpful for the Cnxe metric.
– Fusion of All best on Dev set.
– Harmonic mean: best on Eval (fusion of all may be over fitted for Dev).
– Best Dev A1: 0.8041, A2: 0.7978, A3: 0.8335, A4: 0.8137, A5: 0.8184, A6: 0.8460
  (A2) overall best, may help in all cases due to co-articulation or intonation.
  (A6) performs badly. Filler in query may be extension and not gap.
– Best Eval T1: 0.7107, T2: 0.8147, T3: 0.8115

Fusion Systems          Dev: Cnxe, MinCnxe    Eval: Cnxe, MinCnxe
1. All + side-info      0.7782, 0.7716        0.7866, 0.7809
2. H.mean + side-info   0.7862, 0.7800        0.7842, 0.7786
3. All, no side         0.7873, 0.7816        0.7930, 0.7875
4. H.mean, no side      0.7957, 0.7893        0.7914, 0.7865

Processing Speed
– Indexing Speed Factor – 2.14
– Searching Speed Factor – 0.0034 per sec
– Peak Memory – 120MB

Conclusions
Main contributions:
– Performing a careful Spectral Subtraction – to diminish severe background noise which greatly influences the output of phonetic recognizers;
– Using the average distance matrix of all languages as 6th sub-system;
– Considering 6 possible DTW paths to tackle complex match cases;
– Truncating large distances per-query – may help to lower the burden of critical false negatives.
– Besides side-info, all of the improvements also improve the ATWV metric.
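
As a rough illustration of the local distance used for DTW, the sketch below applies the back-off $q' = (1 - \lambda) q + \lambda u$ to each query frame, computes $D(q, x) = -\log(q' \cdot x)$ for every query/audio frame pair, and averages several per-language matrices into the extra "ML" matrix. It assumes that u is the uniform distribution over phone states and that only the query posteriors are backed off; the function names are hypothetical and are not taken from the authors' code.

```python
import math


def back_off(posterior, lam=1e-4):
    """Return q' = (1 - lam) * q + lam * u, with u assumed uniform."""
    n = len(posterior)
    return [(1.0 - lam) * p + lam / n for p in posterior]


def local_distance(query, audio, lam=1e-4):
    """Build D[i][j] = -log(q'_i . x_j) for query frame i and audio frame j.

    `query` and `audio` are lists of posterior vectors (one per frame).
    Backing off the query alone keeps the dot product strictly positive
    as long as each audio vector sums to one.
    """
    smoothed = [back_off(q, lam) for q in query]
    return [
        [-math.log(sum(a * b for a, b in zip(q, x))) for x in audio]
        for q in smoothed
    ]


def average_matrices(matrices):
    """Element-wise mean of equally sized distance matrices (the 'ML' matrix)."""
    count = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(cols)]
        for i in range(rows)
    ]
```

The per-language matrices can be averaged even though each language has its own phone set, because every matrix is indexed by the same query and audio frames.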
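
The basic DTW strategy (A1) can be sketched as follows, under several assumptions that the poster does not spell out: the query may start and end at any audio frame, the three unitary jumps are vertical, horizontal and diagonal with equal weight, and the path is chosen by smallest accumulated distance before being scored by its average distance. The function name is hypothetical.

```python
def dtw_average_distance(dist):
    """Subsequence DTW over dist[i][j] (query frame i, audio frame j).

    Returns the smallest average distance among the paths that consume the
    whole query, where each path is the minimum-accumulated-cost path
    ending at a given audio frame.
    """
    n, m = len(dist), len(dist[0])
    cost = [[0.0] * m for _ in range(n)]   # accumulated distance
    steps = [[1] * m for _ in range(n)]    # number of cells on the path
    for i in range(n):
        for j in range(m):
            if i == 0:
                # Free start: the path may begin at any audio frame.
                cost[i][j] = dist[i][j]
                continue
            candidates = [(cost[i - 1][j], steps[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], steps[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))
            best_cost, best_steps = min(candidates)
            cost[i][j] = best_cost + dist[i][j]
            steps[i][j] = best_steps + 1
    # Free end: pick the audio frame whose path has the lowest average.
    return min(cost[n - 1][j] / steps[n - 1][j] for j in range(m))
```

Dividing by the path length keeps scores comparable across queries of different durations, which matters when the distances are later normalized and fused.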
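
The per-query truncation and normalization applied before fusion can be sketched as below. It assumes that the mean and standard deviation used for normalization are computed after truncation and that the population standard deviation is used; neither detail is stated on the poster, and the function name is hypothetical.

```python
import statistics


def truncate_and_normalize(distances):
    """Process the distances of one query against all audio files.

    1. Truncate distances larger than the mean down to the mean.
    2. Subtract the mean and divide by the standard deviation.
    """
    mean = statistics.fmean(distances)
    truncated = [min(d, mean) for d in distances]
    mu = statistics.fmean(truncated)
    sigma = statistics.pstdev(truncated)
    if sigma == 0.0:
        # All values identical after truncation: nothing to scale.
        return [0.0 for _ in truncated]
    return [(d - mu) / sigma for d in truncated]
```

For example, the distances `[1, 2, 3, 10]` have mean 4, so the outlier 10 is truncated to 4 before normalization and no longer dominates the scale of the remaining scores.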