SlideShare a Scribd company logo
1 of 1
Download to read offline
CUNI at MediaEval 2014 Search and Hyperlinking Task: 
Visual and Prosodic Features in Hyperlinking 
Petra Galuščáková, Martin Kruliš, Jakub Lokoč and Pavel Pecina 
Charles University in Prague, Faculty of Mathematics and Physics 
• Find segments similar to a given (query) segment 
in the collection of audio-visual recordings (TV 
programmes provided by BBC). 
• Available metadata, subtitles, and automatic 
transcripts (LIMSI, LIUM, and NST-Sheffield), 
visual features (shots and keyframes), and 
prosodic features. 
1. Search System 
• The search system used for Hyperlinking is 
identical to the system used in the Search sub-task. 
• Segmentation: we used segments of fixed length 
and our segmentation system which based on 
Decision Trees (DT). 
• The calculated VisualSimilarity was used to modify the 
final score of the segment in the retrieval as follows: 
4. Prosodic Similarity 
• We took sequences of vectors of prosodic 
features appearing up to 1 second from the 
beginning of the query segment and found the 
most similar sequence of the vectors in each 
segment. 
Transcripts Segmentation Weightening Metadata Overlap MAP P 5 P 10 P 20 MAP-bin MAP-tol 
Subtitles Fixed None No No 0.1618 0.4786 0.4107 0.2893 0.1423 0.1216 
Subtitles Fixed Visual No No 0.1660 0.4929 0.4143 0.3000 0.1483 0.1245 
Subtitles Fixed None Yes No 0.4301 0.8600 0.7767 0.5483 0.2689 0.2465 
Subtitles Fixed Visual Yes Yes 4.1824 0.9667 0.9567 0.8967 0.3080 0.0996 
Subtitles Fixed Visual Yes No 0.4366 0.8667 0.7700 0.5633 0.2724 0.2580 
Subtitles Fixed Prosodic Yes No 0.4321 0.8533 0.7767 0.5517 0.2687 0.2473 
Subtitles DT Visual Yes No 0.8253 0.8867 0.8567 0.7383 0.2525 0.1991 
LIMSI DT Visual Yes No 0.5196 0.8333 0.7233 0.5817 0.2681 0.1976 
LIMSI Fixed Visual Yes Yes 4.0331 0.9400 0.9233 0.8983 0.3042 0.0950 
LIUM DT Visual Yes No 0.5195 0.8200 0.7467 0.5733 0.2674 0.2134 
LIUM Fixed Visual Yes Yes 3.8916 0.9267 0.8800 0.8800 0.2880 0.0993 
NST-Sheffield DT Visual Yes No 0.6889 0.8400 0.8067 0.6500 0.2666 0.1955 
NST-Sheffield Fixed Visual Yes Yes 3.9822 0.9267 0.9267 0.8817 0.2949 0.0914 
Tab1. Results of the Hyperlinking sub-task for different transcripts, segmentation types, 
weightening types, metadata, and removal of overlapping retrieved segments. 
3. Visual Similarity 
• We calculated VisualSimilarity between each 
segment and each query segment using the 
keyframes. 
• We used the Signature Quadratic Form Distance 
and Feature Signatures. 
• The fixed-length segmentation outperforms the 
DT-based segmentation with 120-seconds-long 
segments 
• 50-seconds-long DT-segments outperform the 
fixed-length segments measured by MAP and 
precision-based measures, but are outperformed by 
fixed-length segments using the MAP-bin and 
MAP-tol measures 
• There is a constant improvement obtained by using 
the visual weights. 
• There is small but promising improvement in 
MAP, P20, and MAP-tol measures by using the 
prosodic features. 
• The concatenation with the context and metadata 
is proved to be beneficial. 
6. Conclusion 
• We successfully combine features from multiple 
modalities (textual, visual, and prosodic). 
• The segmentation employing the Decision Trees 
outperforms the fixed-length segmentation. 
Acknowledgments 
This research is supported by the Czech Science Foundation, grant 
number P103/12/G084, Charles University Grant Agency GA UK, grant 
number 920913, and by SVV project number 260 104. 
2. Hyperlinking 
• We first transformed the query segment into a 
textual query by including all the words of the 
subtitles lying within the segment boundary. 
• We extended the segment boundary by including 
the context surrounding the query segment. 
• We combined textual and visual/prosodic 
similarity. 
5. Results 
http://siret.ms.mff.cuni.cz

More Related Content

Similar to CUNI at MediaEval 2014 Search and Hyperlinking Task: Visual and Prosodic Features in Hyperlinking

Deep learning-based switchable network for in-loop filtering in high efficie...
Deep learning-based switchable network for in-loop filtering in  high efficie...Deep learning-based switchable network for in-loop filtering in  high efficie...
Deep learning-based switchable network for in-loop filtering in high efficie...IJECEIAES
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Ijripublishers Ijri
 
RIPE 78: A review of the KSK Roll
RIPE 78: A review of the KSK RollRIPE 78: A review of the KSK Roll
RIPE 78: A review of the KSK RollAPNIC
 
Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...
Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...
Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...IJECEIAES
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Maria Eskevich
 
Fast channel zapping with destination oriented multicast for ip video delivery
Fast channel zapping with destination oriented multicast for ip video deliveryFast channel zapping with destination oriented multicast for ip video delivery
Fast channel zapping with destination oriented multicast for ip video deliveryecway
 
Java fast channel zapping with destination-oriented multicast for ip video d...
Java  fast channel zapping with destination-oriented multicast for ip video d...Java  fast channel zapping with destination-oriented multicast for ip video d...
Java fast channel zapping with destination-oriented multicast for ip video d...ecwayerode
 
2016-02-17 research seminar
2016-02-17 research seminar2016-02-17 research seminar
2016-02-17 research seminarifi8106tlu
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...Dongmin Choi
 
[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇
[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇
[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇台灣資料科學年會
 
Research and activity report
Research and activity reportResearch and activity report
Research and activity reportMarco Cagnazzo
 
New Microsoft PowerPoint Presentation.pptx
New Microsoft PowerPoint Presentation.pptxNew Microsoft PowerPoint Presentation.pptx
New Microsoft PowerPoint Presentation.pptxAbdulRehman743834
 
scopus indexed journals list
scopus indexed journals listscopus indexed journals list
scopus indexed journals listrikaseorika
 
published journals
published journalspublished journals
published journalsrikaseorika
 
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...Vignesh V Menon
 
Mobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesMobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesUnited States Air Force Academy
 
Machine Learning approaches at video compression
Machine Learning approaches at video compression Machine Learning approaches at video compression
Machine Learning approaches at video compression Roberto Iacoviello
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Ijripublishers Ijri
 
CNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical Fashion
CNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical FashionCNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical Fashion
CNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical FashionInterpretOmics
 

Similar to CUNI at MediaEval 2014 Search and Hyperlinking Task: Visual and Prosodic Features in Hyperlinking (20)

Deep learning-based switchable network for in-loop filtering in high efficie...
Deep learning-based switchable network for in-loop filtering in  high efficie...Deep learning-based switchable network for in-loop filtering in  high efficie...
Deep learning-based switchable network for in-loop filtering in high efficie...
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
 
RIPE 78: A review of the KSK Roll
RIPE 78: A review of the KSK RollRIPE 78: A review of the KSK Roll
RIPE 78: A review of the KSK Roll
 
Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...
Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...
Novel Approach for Evaluating Video Transmission using Combined Scalable Vide...
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
 
Fast channel zapping with destination oriented multicast for ip video delivery
Fast channel zapping with destination oriented multicast for ip video deliveryFast channel zapping with destination oriented multicast for ip video delivery
Fast channel zapping with destination oriented multicast for ip video delivery
 
Java fast channel zapping with destination-oriented multicast for ip video d...
Java  fast channel zapping with destination-oriented multicast for ip video d...Java  fast channel zapping with destination-oriented multicast for ip video d...
Java fast channel zapping with destination-oriented multicast for ip video d...
 
Internship Presentation
Internship PresentationInternship Presentation
Internship Presentation
 
2016-02-17 research seminar
2016-02-17 research seminar2016-02-17 research seminar
2016-02-17 research seminar
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...
 
[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇
[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇
[2018 台灣人工智慧學校校友年會] 視訊畫面生成 / 林彥宇
 
Research and activity report
Research and activity reportResearch and activity report
Research and activity report
 
New Microsoft PowerPoint Presentation.pptx
New Microsoft PowerPoint Presentation.pptxNew Microsoft PowerPoint Presentation.pptx
New Microsoft PowerPoint Presentation.pptx
 
scopus indexed journals list
scopus indexed journals listscopus indexed journals list
scopus indexed journals list
 
published journals
published journalspublished journals
published journals
 
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
 
Mobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesMobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large Repositories
 
Machine Learning approaches at video compression
Machine Learning approaches at video compression Machine Learning approaches at video compression
Machine Learning approaches at video compression
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
 
CNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical Fashion
CNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical FashionCNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical Fashion
CNVMiner: Pipeline to Mine CNV & Structural Variation in Hierarchical Fashion
 

More from multimediaeval

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...multimediaeval
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...multimediaeval
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...multimediaeval
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...multimediaeval
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Taskmultimediaeval
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...multimediaeval
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimatormultimediaeval
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...multimediaeval
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Imagesmultimediaeval
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matchingmultimediaeval
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...multimediaeval
 
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...multimediaeval
 
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...multimediaeval
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentationmultimediaeval
 
A Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image DetectionA Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image Detectionmultimediaeval
 
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...multimediaeval
 
Fine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with AttentionFine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with Attentionmultimediaeval
 
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...multimediaeval
 
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...multimediaeval
 
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ... Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...multimediaeval
 

More from multimediaeval (20)

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimator
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Images
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
 
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
 
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentation
 
A Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image DetectionA Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image Detection
 
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
 
Fine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with AttentionFine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with Attention
 
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
 
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
 
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ... Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 

CUNI at MediaEval 2014 Search and Hyperlinking Task: Visual and Prosodic Features in Hyperlinking

  • 1. CUNI at MediaEval 2014 Search and Hyperlinking Task: Visual and Prosodic Features in Hyperlinking Petra Galuščáková, Martin Kruliš, Jakub Lokoč and Pavel Pecina Charles University in Prague, Faculty of Mathematics and Physics • Find segments similar to a given (query) segment in the collection of audio-visual recordings (TV programmes provided by BBC). • Available metadata, subtitles, and automatic transcripts (LIMSI, LIUM, and NST-Sheffield), visual features (shots and keyframes), and prosodic features. 1. Search System • The search system used for Hyperlinking is identical to the system used in the Search sub-task. • Segmentation: we used segments of fixed length and our segmentation system which based on Decision Trees (DT). • The calculated VisualSimilarity was used to modify the final score of the segment in the retrieval as follows: 4. Prosodic Similarity • We took sequences of vectors of prosodic features appearing up to 1 second from the beginning of the query segment and found the most similar sequence of the vectors in each segment. Transcripts Segmentation Weightening Metadata Overlap MAP P 5 P 10 P 20 MAP-bin MAP-tol Subtitles Fixed None No No 0.1618 0.4786 0.4107 0.2893 0.1423 0.1216 Subtitles Fixed Visual No No 0.1660 0.4929 0.4143 0.3000 0.1483 0.1245 Subtitles Fixed None Yes No 0.4301 0.8600 0.7767 0.5483 0.2689 0.2465 Subtitles Fixed Visual Yes Yes 4.1824 0.9667 0.9567 0.8967 0.3080 0.0996 Subtitles Fixed Visual Yes No 0.4366 0.8667 0.7700 0.5633 0.2724 0.2580 Subtitles Fixed Prosodic Yes No 0.4321 0.8533 0.7767 0.5517 0.2687 0.2473 Subtitles DT Visual Yes No 0.8253 0.8867 0.8567 0.7383 0.2525 0.1991 LIMSI DT Visual Yes No 0.5196 0.8333 0.7233 0.5817 0.2681 0.1976 LIMSI Fixed Visual Yes Yes 4.0331 0.9400 0.9233 0.8983 0.3042 0.0950 LIUM DT Visual Yes No 0.5195 0.8200 0.7467 0.5733 0.2674 0.2134 LIUM Fixed Visual Yes Yes 3.8916 0.9267 0.8800 0.8800 0.2880 0.0993 NST-Sheffield DT Visual Yes No 0.6889 0.8400 0.8067 0.6500 0.2666 0.1955 NST-Sheffield Fixed Visual Yes Yes 3.9822 0.9267 0.9267 0.8817 0.2949 0.0914 Tab1. Results of the Hyperlinking sub-task for different transcripts, segmentation types, weightening types, metadata, and removal of overlapping retrieved segments. 3. Visual Similarity • We calculated VisualSimilarity between each segment and each query segment using the keyframes. • We used the Signature Quadratic Form Distance and Feature Signatures. • The fixed-length segmentation outperforms the DT-based segmentation with 120-seconds-long segments • 50-seconds-long DT-segments outperform the fixed-length segments measured by MAP and precision-based measures, but are outperformed by fixed-length segments using the MAP-bin and MAP-tol measures • There is a constant improvement obtained by using the visual weights. • There is small but promising improvement in MAP, P20, and MAP-tol measures by using the prosodic features. • The concatenation with the context and metadata is proved to be beneficial. 6. Conclusion • We successfully combine features from multiple modalities (textual, visual, and prosodic). • The segmentation employing the Decision Trees outperforms the fixed-length segmentation. Acknowledgments This research is supported by the Czech Science Foundation, grant number P103/12/G084, Charles University Grant Agency GA UK, grant number 920913, and by SVV project number 260 104. 2. Hyperlinking • We first transformed the query segment into a textual query by including all the words of the subtitles lying within the segment boundary. • We extended the segment boundary by including the context surrounding the query segment. • We combined textual and visual/prosodic similarity. 5. Results http://siret.ms.mff.cuni.cz