
ReferItGame: Referring to Objects in Photographs of Natural Scenes


Survey of ReferItGame: Referring to Objects in Photographs of Natural Scenes

Published in: Technology

  1. ReferItGame: Referring to Objects in Photographs of Natural Scenes
     Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, Tamara L. Berg
     Presenters: Ruofei Du, Hao Zhou
     09/19/2014
  2. Introduction: Natural language referring expressions
     Natural Language Description: Two chimpanzees are playing with a soccer ball on grass.
     Referring Expression Generation (REG): [soccer, on, grass] [chimpanzee, young] [chimpanzee, old]
  3. Outline: Referring to Objects in Photographs of Natural Scenes
     ● ReferItGame
       o Crowdsourcing, two-player validation
     ● ReferItGame dataset:
       o Data collection
       o NLP parsing, analyses and insights
     ● Generating Referring Expressions
       o Optimization-based model to generate referring expressions
  4. ReferItGame: Gameplay and interface
     ● A two-player online game:
     ● Player 1
       o Write a referring expression for the segmentation in the red contour.
     ● Player 2
       o Click on the location of the described object.
  5. ReferItGame: Training session
  6. ReferItGame: Points rewarding mechanism
  7. ReferItGame: Writing referring expressions
  8. ReferItGame: Expression validation
  9. ReferItGame: Playing against the computer
     ● Single-player version, used when no other players are online
     ● 5,000 pre-recorded referring expressions
       o Amazon’s Mechanical Turk
     ● Automating Player 1
       o Easy: show the pre-collected expressions
     ● Automating Player 2
       o Compare the written expression against the pre-recorded expressions
       o Compute cosine similarity between two expressions with a bag-of-words representation
         - No similarity: generate a random wrong click
         - Successful match: generate the canned click
  10. ReferItGame Dataset: Images and labels
      ImageCLEF IAPR: 20,000 images
      SAIAPR TC-12: segmentation, 238 object categories
  11. ReferItGame Dataset: Collecting the dataset
      ● Collection method
        o 4,000 expressions in 3 weeks
          - Social media
          - Survey section of Reddit
        o Remaining
          - Mechanical Turk
          - 80% approval ratings and US only
      ● Statistics
        o 130,525 completed games
          - 10,431 canned
          - 120,094 real
        o 96,654 objects
        o 19,984 images
  12. ReferItGame Dataset: Processing the dataset
      ● 7-tuple attributes:
        o r1: entry-level category (bird, oscine, vertebrate)
        o r2: color (blue)
        o r3: size (tiny)
        o r4: absolute location (top of the image)
        o r5: relative location relation … the car
  13. ReferItGame Dataset: Parsing
      ● Stanford CoreNLP parser (91% accuracy, 4,500 expressions, manually)
        o Predefined dictionary of attribute values
        o Head noun in the parse tree as subject, template-based for attributes
  14. ReferItGame Dataset: Frequency and attribute occurrence for common categories
      Normalization?
      “For example, color attributes are used more frequently for categories like “car” or “woman” than for categories like “sky” or “rock”.”
  15. ReferItGame Dataset: Objects frequently used as reference points
  16. ReferItGame Dataset: Frequency of using 0, 1, or 2 attributes within the same expression
  17. ReferItGame Dataset: Object locations vs. location words used
  18. ReferItGame Dataset: Normalized object size vs. size words used
  19. ReferItGame Dataset: Frequency of usage of each attribute type for single and multiple instances of the category
  20. ReferItGame Dataset: Tag clouds showing entry-level category words used in referring expressions
  21. Generating Referring Expressions: Generation model
      Input images → referring expression (RE)
      RE obtained by optimizing:
  22. Generating Referring Expressions: Generation model
      Function E:
      Constraints:
  23. Generating Referring Expressions: Content-based potentials
      Color attribute:
      Size attribute:
      Absolute-location attribute:
  24. Generating Referring Expressions: Content-based potentials
      Relative-location attribute:
      Relative-object attribute:
  25. Generating Referring Expressions: Prior statistics-based potentials
      Unary potentials:
      Relative-object attribute:
  26. Experiments: Qualitative
      Examples that work quite well; examples that work less well
  27. Experiments: Example objects predicted by color attribute values
      Cons: The color predictor makes mistakes.
      Suggestions: Cluster and predict color using a vision method.
  28. Experiments: Quantitative
  29. Summary
      ● Invented a two-player online game to collect and verify REs
      ● A new large-scale dataset containing REs
      ● Analysis and study of category-specific variations
      ● Optimization-based model to generate REs for objects
  30. Questions
      ● How should the similarity threshold for the “canned” game be chosen? If just one meaningless word matches, does that count as a match? Is it a problem to include such games in the training data?
      ● Should plot A of Figure 3 be normalized to allow comparison?
      ● How are precision and recall determined (Table 1)?
      ● For r5, should the relative position information be included in order to find the correct r5? (Relying only on probability is weird.)
      ● For r6, should the salience of objects be considered? What if there are many objects around the target? Would the algorithm scale?
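
The automated Player 2 from the "Playing against the computer" slide, and the threshold question raised on the final slide, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the whitespace tokenizer, and the `threshold` value of 0.5 are all assumptions made for illustration; the slides do not say which threshold the game used. The fallback click here is drawn uniformly over the image, so a real system would additionally need to check that it falls outside the target object to guarantee a "wrong" click.

```python
import math
import random
from collections import Counter


def bag_of_words(expression):
    """Lowercase the expression, split on whitespace, and count tokens."""
    return Counter(expression.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def canned_click(written, recorded, canned_xy, image_size, threshold=0.5, rng=random):
    """Simulate Player 2 for one object.

    written    -- expression typed by the human Player 1
    recorded   -- pre-recorded expressions for the same object
    canned_xy  -- stored click location used on a successful match
    image_size -- (width, height) in pixels
    threshold  -- hypothetical similarity cutoff; not taken from the slides
    Returns (click_xy, matched).
    """
    query = bag_of_words(written)
    best = max(
        (cosine_similarity(query, bag_of_words(r)) for r in recorded),
        default=0.0,
    )
    if best >= threshold:
        return canned_xy, True
    # Simplification: a uniformly random click, not checked against the object mask.
    width, height = image_size
    return (rng.randrange(width), rng.randrange(height)), False


if __name__ == "__main__":
    recorded = ["the man on the left", "guy in red shirt"]
    print(canned_click("man on left", recorded, (120, 85), (480, 360)))
    print(canned_click("blue sky", recorded, (120, 85), (480, 360)))
```

With a plain bag-of-words representation, a single shared stop word such as "the" yields a nonzero similarity, which is exactly the concern in the first question above. Raising `threshold` or dropping stop words before counting are two simple ways to explore that trade-off.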