What’s in a Question: Using Visual Questions as a Form of Supervision

CVPR 2017 Spotlight presentation
Authors: Siddha Ganju, Olga Russakovsky, Abhinav Gupta
Institute: Carnegie Mellon University
Project Page: http://sidgan.me/whats_in_a_question/

Published in: Data & Analytics

  1. 1. What’s in a Question: Using Visual Questions as a Form of Supervision
 Siddha Ganju, Olga Russakovsky, Abhinav Gupta http://sidgan.me/whats_in_a_question/
  2. 2. Questions are informative
     Example question: What breed of dog is this?
     Information from the question:
     • The animal in the scene is a ‘dog’
     • ‘Breed’ is a property of ‘dog’
     • All ‘dogs’ in the scene are of the same ‘breed’
     • Knowing the ‘breed’ may be important
     Goal: quantify and utilize this information
  3. 3. Analysis of Visual Questions (with no answers)
     • Questions → image captions
     • Questions → object classification
     • Questions to improve VQA
     Example question for inferring objects: What color is the bus?
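One simple way to picture how objects might be inferred from an unanswered question is keyword matching against a list of object categories. The sketch below is only an illustration under that assumption: the slides do not say how the inference is done, and the `OBJECT_CATEGORIES` set and the `infer_objects` function are hypothetical names introduced here.

```python
import re

# Hypothetical category list for illustration; not taken from the slides.
OBJECT_CATEGORIES = {"bus", "dog", "plane", "water"}


def infer_objects(question):
    """Return known object categories mentioned in a question, in order of first mention."""
    words = re.findall(r"[a-z]+", question.lower())
    found = []
    for word in words:
        # Crude plural handling (an assumption): drop one trailing "s"
        # only if doing so yields a known category.
        if word.endswith("s") and word[:-1] in OBJECT_CATEGORIES:
            word = word[:-1]
        if word in OBJECT_CATEGORIES and word not in found:
            found.append(word)
    return found


print(infer_objects("What color is the bus?"))
```

A real system would likely need synonyms and a much larger vocabulary; exact string matching is used here only to keep the example short.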
  4. 4. Improving VQA with extra questions
     [Diagram] iBOWIMG [Zhou ArXiv15]: CNN (image) + Text Embedding (target question: "What is under the plane") → Multiple choice: dolphin (0.2), yes (0.1), water (0.7)
     [Diagram] iBOWIMG-2x: additional Text Embedding for the other questions ("Can this plane land on water", "How many planes are there")
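The slide's diagram suggests that iBOWIMG-2x feeds the answer classifier with image features, an embedding of the target question, and a second text embedding built from the other questions about the same image. The following is a minimal sketch of that idea, not the authors' implementation: it uses plain bag-of-words counts in place of a learned embedding, random untrained weights, and made-up image features, and every name in it (`bag_of_words`, `score_answers`, `image_features`, and so on) is hypothetical.

```python
import math
import random


def bag_of_words(text, vocab):
    """Count vector of `text` over a fixed vocabulary (word -> index)."""
    counts = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            counts[vocab[word]] += 1.0
    return counts


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_answers(image_features, target_question, other_questions, vocab, weights):
    """Concatenate image features, target-question BoW, and other-questions BoW,
    then apply one linear layer followed by softmax over candidate answers."""
    features = (
        list(image_features)
        + bag_of_words(target_question, vocab)
        + bag_of_words(" ".join(other_questions), vocab)
    )
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


# Toy setup built from the questions shown on the slide.
words = "what is under the plane can this land on water how many planes are there".split()
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
answers = ["dolphin", "yes", "water"]
image_features = [0.5, -1.2, 0.3, 0.8]  # stand-in for CNN features (assumption)

# Untrained random weights: one row per candidate answer.
rng = random.Random(0)
n_inputs = len(image_features) + 2 * len(vocab)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in answers]

probs = score_answers(
    image_features,
    "What is under the plane",
    ["Can this plane land on water", "How many planes are there"],
    vocab,
    weights,
)
for answer, p in zip(answers, probs):
    print(f"{answer}: {p:.2f}")
```

Because the weights are untrained, the printed probabilities are arbitrary; the point is only the shape of the input, where the extra questions contribute a second text vector alongside the target question.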
  5. 5. Experiment #1: With unanswered questions
     Results on MSCOCO VQA 1.0
     Training: 1 target question (with answer) and 2 other questions per image
     Test: VQA Val set

     Model                     Accuracy
     iBOWIMG [Zhou ArXiv15]    47.3
     iBOWIMG-2x                50.4
  6. 6. Experiment #2: Standard benchmark
     Results on MSCOCO VQA 1.0
     Training: All questions and answers on train+val
     Test: VQA test-dev set

                                          Accuracy by question type
     Model                     Accuracy   Yes/no   Number   Word
     iBOWIMG [Zhou ArXiv15]    55.7       76.5     34.9     42.6
     iBOWIMG-2x                62.8       80.7     37.9     53.1
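To make the size of the improvement explicit, the short script below computes the absolute gain of iBOWIMG-2x over iBOWIMG for each column of the Experiment #2 results. The numbers are copied from the slide; the variable names are illustrative only.

```python
# Accuracies copied from the Experiment #2 results (VQA test-dev).
ibowimg = {"Accuracy": 55.7, "Yes/no": 76.5, "Number": 34.9, "Word": 42.6}
ibowimg_2x = {"Accuracy": 62.8, "Yes/no": 80.7, "Number": 37.9, "Word": 53.1}

# Absolute gain in accuracy points, rounded to one decimal to hide float noise.
gains = {name: round(ibowimg_2x[name] - ibowimg[name], 1) for name in ibowimg}

for name, gain in gains.items():
    print(f"{name}: +{gain}")
```

On these figures the largest gain is in the Word column (10.5 points) and the smallest in Number (3.0 points), with an overall gain of 7.1 points.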
  7. 7. Conclusions
     • Study using visual questions themselves as a form of supervision
     • Provide both qualitative and quantitative analysis of how much information is contained within the questions
     • Demonstrate significant improvements over baselines on standard benchmarks
