Multimodality and Deep Learning when predicting Media Interestingness
Eloise Berson, Claire-Hélène Demarty, Ngoc Duong
Technicolor, France
MediaEval 2017 Workshop
September 13-15, 2017
Motivation
• Build incrementally from last year’s systems
• Re-use similar features and DNN architectures
  ➢ Make use of the multimodal nature of content
  ➢ Model its temporal evolution
• Investigate the benefit of adding semantic and contextual information to the content
  ➢ Add a textual modality from the IMDb movie description
  ➢ Use Image Captioning-based features
9/14/2017
Features
• For image and frame:
  ➢ CNN features from the fc7 layer of the CaffeNet model
  ➢ Dimension: 4096
• For audio:
  ➢ 60 MFCC features + first & second derivatives
  ➢ Dimension: 180
• For image only:
  ➢ Image Captioning Based (ICB) features [1]
  ➢ Dimension: 1024
• For text (used for the image & video subtasks):
  ➢ From IMDb description → keyword extraction → Word2Vec (W2V) description
  ➢ Dimension: 300
[1] R. Kiros, R. Salakhutdinov and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539, 2014.
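The dimensions on the Features slide follow from simple stacking. As a minimal sketch (not the authors' code), the snippet below appends first and second differences to a 60-coefficient frame sequence, giving 180 values per frame, and concatenates per-image vectors of the stated sizes. The finite-difference delta and all function names are assumptions made for illustration; the slides do not say how the derivatives were computed.

```python
# Illustrative only: helper names and the finite-difference delta are assumptions.

def delta(frames):
    """First-order difference along time; the first frame gets a zero delta."""
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out

def add_derivatives(mfcc):
    """Stack static, first-derivative and second-derivative coefficients per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]

def concat_features(*vectors):
    """Early fusion by plain concatenation of per-item feature vectors."""
    fused = []
    for v in vectors:
        fused.extend(v)
    return fused

# Dummy data with the dimensions given on the slide.
mfcc = [[float(t + k) for k in range(60)] for t in range(5)]  # 5 frames x 60
audio = add_derivatives(mfcc)

cnn = [0.0] * 4096   # stands in for an fc7 CaffeNet feature
icb = [0.0] * 1024   # stands in for an image-captioning-based feature
w2v = [0.0] * 300    # stands in for a Word2Vec description
fused = concat_features(cnn, icb, w2v)

print(len(audio[0]))  # 180
print(len(fused))     # 5420
```

The CNN+ICB+W2V concatenation used in the image subtask would therefore have 4096 + 1024 + 300 = 5420 dimensions per image under this plain-concatenation assumption.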
Image subtask – different feature concatenations
Run#1: Baseline: 2016 best run
Run#2: ICB features
Run#3: CNN+W2V features
Run#4: CNN+ICB features
Run#5: CNN+ICB+W2V features
➢ Cross-validation (80%-20%)
➢ Re-train on complete devset
➢ Resampling of the data
➢ Classifier: a single MLP layer, ReLU activation, dropout=0.5
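The validation protocol above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the slide only says "resampling", so random oversampling of the minority class is an assumption, as are the function names and the toy data.

```python
import random

def split_80_20(samples, seed=0):
    """Shuffle and split into 80% training / 20% validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def oversample(train, seed=0):
    """Randomly duplicate minority-class samples until both classes are balanced.

    Assumption: oversampling is one common reading of "resampling";
    the slides do not specify the scheme.
    """
    rng = random.Random(seed)
    pos = [s for s in train if s[1] == 1]
    neg = [s for s in train if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return train[:]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Toy data: (sample_id, label) with 10% "interesting" items.
data = [(i, 1 if i % 10 == 0 else 0) for i in range(100)]
train, val = split_80_20(data)
balanced = oversample(train)
print(len(train), len(val), len(balanced))
```

Resampling is applied to the training part only, so the validation split keeps the original class balance. After model selection, the same steps would be repeated on the complete devset before predicting on the test set.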
Image subtask - results
                     Dev set            Test set
Run  Features        MAP@10   MAP       MAP@10   MAP
1    CNN             0.27     0.31      0.1028   0.2615
2    ICB             0.33     0.36      0.1054   0.2525
     W2V             0.23     0.28      -        -
3    CNN+W2V         0.35     0.38      0.0693   0.2244
     ICB+W2V         0.33     0.37      -        -
4    CNN+ICB         0.29     0.32      0.0875   0.2382
5    CNN+ICB+W2V     -        -         0.0861   0.2347
2016 MAP: 0.2336
• Improved MAP: dataset/annotations are better?
• Very low MAP@10 values!
• When compared with dev set: opposite results
  ➢ Dev set: Seems that semantic information brings improvement?
  ➢ Test set: Overfitting?
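For readers unfamiliar with the two metrics in the table, here is a minimal sketch of MAP and MAP@10 over ranked lists of binary interestingness labels. Normalisation conventions differ between evaluation tools; this version divides by the total number of relevant items in the list, which is one common convention and not necessarily the one used by the task's official scorer. The function names and the toy rankings are hypothetical.

```python
def average_precision(ranked_labels, k=None):
    """AP for one ranked list of 0/1 labels, optionally truncated at rank k.

    Assumption: the sum of precisions is divided by the total number of
    relevant items in the full list, even when a cutoff k is applied.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    considered = ranked_labels if k is None else ranked_labels[:k]
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(considered, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def mean_average_precision(rankings, k=None):
    """Mean of per-list AP values (one list per query, e.g. per video)."""
    return sum(average_precision(r, k) for r in rankings) / len(rankings)

# Two toy rankings, ordered by decreasing predicted interestingness.
rankings = [[1, 0, 1, 0, 0], [0, 1]]
print(mean_average_precision(rankings))        # MAP over full lists
print(mean_average_precision(rankings, k=10))  # MAP@10
```

Under this convention a cutoff can only lower the score when relevant items fall beyond rank k, which is one possible reason for MAP@k values well below MAP.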
Video subtask – different levels of embedding
Run#1: Baseline: 2016 run
Run#2: embedding after temporal average, no duplication
Run#3: embedding with combined (audio+video), duplication
Run#4: embedding in parallel to audio and video, duplication
Run#5: same as Run#4 but with the softmax and temporal average steps swapped
[Figure: system diagram labeled "run 2"]
Multimodal processing:
• either one LSTM-ResNet layer (if temporal processing)
• or one simple MLP layer
Run#4 & Run#5: influence of the location of the decision step (softmax)
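The difference between Run#4 and Run#5 comes down to whether the softmax decision is taken per frame before temporal averaging, or once after averaging. The sketch below contrasts the two orderings on toy per-frame scores. It is an illustration under assumptions: the function names and logits are hypothetical, and this slide does not state which ordering belongs to which run.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one score vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_average(frames):
    """Element-wise mean over the time axis."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def decide_then_average(frame_logits):
    """Softmax on every frame, then average the per-frame probabilities."""
    return temporal_average([softmax(f) for f in frame_logits])

def average_then_decide(frame_logits):
    """Average the raw scores over time, then apply a single softmax."""
    return softmax(temporal_average(frame_logits))

# Three frames, two classes (interesting, not interesting).
logits = [[2.0, 0.0], [0.0, 0.5], [0.0, 0.5]]
p_early = decide_then_average(logits)
p_late = average_then_decide(logits)
print(p_early, p_late)
```

Because softmax is nonlinear, the two orderings generally give different segment-level probabilities, so they can rank segments differently and lead to different MAP values.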
Video subtask - results
                                     Dev set          Test set
Run  Embedding                       MAP@10   MAP     MAP@10   MAP
1    2016 system (A+V)               0.28     0.30    0.0589   0.1856
2    After temporal modeling         -        0.27    0.0465   0.1768
3    After (A+V) merging             0.29     0.31    0.0563   0.1825
4    In parallel to (A+V)            0.30     0.32    0.0641   0.1878
5    Run#4 – location of decision    -        -       0.0609   0.1918
• Slightly improved MAP compared to 2016
• This time, similar results for both dev and test sets
• Semantic information did bring improvement
  ➢ Embedding at a lower level, even with duplication, seems to work better
  ➢ Keeping the decision for the very last step seems to be better, at least for MAP
Thank you!

MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when predicting Media Interestingness
