# Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Slides of our MMM 2012 paper.


### Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

1. Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags
    Tina Walber, Ansgar Scherp, Steffen Staab
    University of Koblenz-Landau, Koblenz, Germany
    Multimedia Modeling Conference, Klagenfurt, Austria, January 4-6, 2012
2. Motivation: Image Tagging
    Example tags: tree, girl, car, store, people, sidewalk
    - Find specific objects in images
    - Analyze the user's gaze path only
    T. Walber, A. Scherp, S. Staab – Identifying Objects in Images
3. Research Questions
    1. What is the best fixation measure to find the correct image region given a specific tag?
    2. Can we differentiate two regions in the same image?
4. 3 Steps Conducted by Users
    - Look at the red blinking dot
    - Decide whether the tag can be seen ("y" or "n")
5. Dataset
    - LabelMe community images
      - Manually drawn polygons
      - Regions annotated with tags
    - 182,657 images (August 2010)
    - High-quality segmentation and annotation
    - Used as ground truth
6. Experiment Images and Tags
    - Randomly selected 51 images
      - Each contains at least two tagged regions
    - Created two tag sets for the 51 images
      - Each image is assigned two tags (one per set)
    - Tags are either "true" or "false"
      - "true": the object described by the tag can be seen
      - "false": the object cannot be seen in the image
      - This keeps subjects concentrated during the experiment
7. Subjects & Experiment System
    - 20 subjects
      - 16 male, 4 female (age: 23-40, Ø = 29.6)
      - Undergrads (6), PhD (12), office clerks (2)
    - Experiment system
      - Simple web page in Internet Explorer
      - Standard notebook, resolution 1680x1050
      - Tobii X60 eye tracker (60 Hz, 0.5° accuracy)
8. Conducting the Experiment
    - Each user looked at 51 tag-image pairs
      - First tag-image pair dismissed
    - 94.3% correct answers
      - Equal for true/false tags
    - ~3 s until decision (average)
    - 85% of users strongly agreed or agreed that they felt comfortable during the experiment
      - The eye tracker did not much influence comfort
9. Pre-processing of Eye-tracking Data
    - Obtained 547 gaze paths from 20 users where
      - users gave correct answers and
      - the image has a "true" tag assigned
    - Fixation extraction
      - Tobii Studio's velocity & distance thresholds
      - Fixation: focus on a particular point on the screen
    - Requirement: at least one fixation inside or near the correct region
      - 476 (87%) gaze paths fulfill this requirement
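The fixation-extraction step can be sketched as a simple velocity-threshold filter. The slides only name Tobii Studio's velocity and distance thresholds, so the concrete threshold values, the pixel-based velocity, and the sample format below are illustrative assumptions:

```python
import math

def extract_fixations(gaze_points, velocity_threshold=30.0, min_duration=0.1):
    """Group raw gaze samples into fixations with a velocity threshold,
    in the spirit of Tobii Studio's fixation filter (values are assumed).

    gaze_points: list of (t_seconds, x, y) samples.
    Returns a list of fixations as (t_start, t_end, center_x, center_y).
    """
    def to_fixation(samples):
        cx = sum(s[1] for s in samples) / len(samples)
        cy = sum(s[2] for s in samples) / len(samples)
        return (samples[0][0], samples[-1][0], cx, cy)

    fixations, current = [], []
    for prev, cur in zip(gaze_points, gaze_points[1:]):
        dt = cur[0] - prev[0]
        dist = math.hypot(cur[1] - prev[1], cur[2] - prev[2])
        velocity = dist / dt if dt > 0 else float("inf")
        if velocity < velocity_threshold:   # slow movement: part of a fixation
            if not current:
                current.append(prev)
            current.append(cur)
        elif current:                        # fast movement (saccade) ends it
            fixations.append(to_fixation(current))
            current = []
    if current:
        fixations.append(to_fixation(current))
    # discard groups too short to count as a fixation
    return [f for f in fixations if f[1] - f[0] >= min_duration]
```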
10. Analysis of Gaze Fixations (1)
    - Applied 13 fixation measures to the 476 paths (2 new, 7 standard Tobii, 4 from the literature)
    - Fixation measure: a function on users' gaze paths
    - Calculated for each image region, over all users viewing the same tag-image pair
11. Considered Fixation Measures

    | Nr | Name | Favorite region r | Origin |
    |----|------|-------------------|--------|
    | 1 | firstFixation | No. of fixations before 1st on r | Tobii |
    | 2 | secondFixation | No. of fixations before 2nd on r | [13] |
    | 3 | fixationsAfter | No. of fixations after last on r | [4] |
    | 4 | fixationsBeforeDecision | fixationsAfter, but before decision | New |
    | 5 | fixationsAfterDecision | fixationsBeforeDecision and after | New |
    | 6 | fixationDuration | Total duration of all fixations on r | Tobii |
    | 7 | firstFixationDuration | Duration of first fixation on r | Tobii |
    | 8 | lastFixationDuration | Duration of last fixation on r | [11] |
    | 9 | fixationCount | Number of fixations on r | Tobii |
    | 10 | maxVisitDuration | Max time from first fixation until outside r | Tobii |
    | 11 | meanVisitDuration | Mean time from first fixation until outside r | Tobii |
    | 12 | visitCount | No. of fixations until outside r | Tobii |
    | 13 | saccLength | Saccade length, before fixation on r | [6] |
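A few of the simpler measures above (fixationCount, fixationDuration, firstFixation) can be sketched as follows. The axis-aligned box standing in for a LabelMe polygon and the `(t_start, t_end, x, y)` fixation format are assumptions for illustration:

```python
def fixation_measures(fixations, region):
    """Compute three of the fixation measures for one region r.

    fixations: time-ordered list of (t_start, t_end, x, y) for one gaze path.
    region: axis-aligned box (x0, y0, x1, y1) standing in for a polygon.
    """
    x0, y0, x1, y1 = region

    def inside(f):
        return x0 <= f[2] <= x1 and y0 <= f[3] <= y1

    on_r = [i for i, f in enumerate(fixations) if inside(f)]
    return {
        # measure 9: number of fixations on r
        "fixationCount": len(on_r),
        # measure 6: total duration of all fixations on r
        "fixationDuration": sum(fixations[i][1] - fixations[i][0] for i in on_r),
        # measure 1: number of fixations before the first one on r
        "firstFixation": on_r[0] if on_r else None,
    }
```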
12. Analysis of Gaze Fixations (2)
    - For every image region (b), the fixation measure is calculated over all gaze paths (c)
    - Results are summed up per region
    - Regions are ordered according to the fixation measure
    - If the favorite region (d) and the tag (a) match, the result is a true positive (tp), otherwise a false positive (fp)
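The per-region aggregation and the tp/fp accounting described above might look like this. `duration_in` is a hypothetical helper implementing the fixationDuration measure on rectangular stand-in regions:

```python
def duration_in(path, box):
    """fixationDuration for one gaze path: summed time of fixations on box.
    path: list of (t_start, t_end, x, y); box: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return sum(t1 - t0 for t0, t1, x, y in path
               if x0 <= x <= x1 and y0 <= y <= y1)

def favorite_region(gaze_paths, regions, measure=duration_in):
    """Sum a fixation measure per region over all gaze paths of one
    tag-image pair; the highest-scoring region becomes the favorite."""
    scores = {name: sum(measure(p, box) for p in gaze_paths)
              for name, box in regions.items()}
    return max(scores, key=scores.get)

def precision(assignments):
    """Each tag-image pair yields either a tp (favorite matches the
    tagged region) or an fp, so P = tp / (tp + fp)."""
    tp = sum(1 for favorite, correct in assignments if favorite == correct)
    return tp / len(assignments)
```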
13. Precision per Fixation Measure
    [Bar chart: precision P (over the sum of tp and fp assignments) per fixation measure; the best-performing measures include meanVisitDuration, fixationsBeforeDecision, lastFixationDuration, and fixationDuration]
14. Adding Boundaries and Weights
    - Take eye-tracker inaccuracies into account
      - Extend region boundaries by 13 pixels
    - Larger regions are more likely to be fixated
      - Give extra weight to regions < 5% of the image size
    - meanVisitDuration increases to P = 0.67
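A minimal sketch of both adjustments. The 13-pixel margin and the 5% size threshold come from the slide; the weight factor 2.0 is a made-up placeholder, since the slides do not state the actual weight:

```python
def expand_box(box, margin=13):
    """Grow region boundaries by `margin` pixels before testing whether a
    fixation falls on the region, to absorb eye-tracker inaccuracy."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def weighted_score(score, box, image_area, small_weight=2.0):
    """Up-weight regions smaller than 5% of the image, since large regions
    are fixated more often by chance. small_weight=2.0 is an assumption."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    return score * small_weight if area < 0.05 * image_area else score
```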
15. Examples: Tag-Region Assignments
16. Comparison with Baselines
    - Naïve baseline: the largest region r is the favorite
    - Random baseline: randomly select the favorite r
    - Gaze / Gaze* significantly better (χ², α < 0.001)
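The two baselines can be sketched as follows, again with rectangular stand-ins for LabelMe polygons:

```python
import random

def naive_baseline(regions):
    """Naïve baseline: pick the largest region as the favorite.
    regions: dict mapping region name to box (x0, y0, x1, y1)."""
    def area(box):
        x0, y0, x1, y1 = box
        return (x1 - x0) * (y1 - y0)
    return max(regions, key=lambda name: area(regions[name]))

def random_baseline(regions, rng=random):
    """Random baseline: pick a uniformly random region as the favorite."""
    return rng.choice(sorted(regions))
```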
17. Effect of Gaze Path Aggregation
    [Chart: precision P over the number of gaze paths used; aggregation of precision P for Gaze*]
    - A single user is still significantly better than the baselines (χ² vs. naïve with α < 0.001 and vs. random with α < 0.002)
18. Research Questions
    1. What is the best fixation measure to find the correct image region given a specific tag? → meanVisitDuration with a precision of 67%
    2. Can we differentiate two regions in the same image?
19. Differentiating Two Objects
    - Use the second tag set to identify different objects in the same image
    - 16 images (of our 51) have two "true" tags
    - 6 images had both correct regions identified
      - A proportion of 38%
    - Average precision for a single object is 67%
      - Expected correct assignment of both tags in one image: 44%
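Assuming the two tag-region assignments in one image are independent, the 44% figure follows from squaring the single-object precision:

```python
p_single = 0.67           # precision for one object (meanVisitDuration)
p_both = p_single ** 2    # independence assumption for the two assignments
print(f"{p_both:.4f}")    # 0.4489, reported as 44% on the slide
```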
20. Correctly Differentiated Objects
21. Research Questions
    1. What is the best fixation measure to find the correct image region given a specific tag? → meanVisitDuration with a precision of 67%
    2. Can we differentiate two regions in the same image? → Accuracy of 38%

    Acknowledgement: This research was partially supported by the EU projects Petamedia (FP7-216444) and SocialSensor (FP7-287975).
22. Influence of the Red Dot
    - First 5 fixations, over all subjects and all images
23. Experiment Data Cleaning
    Manually replaced images with
    a) tags that are incomprehensible, require expert knowledge, or are nonsense
    b) a tag that refers to multiple regions, but not all of them are drawn into the image (e.g., bicycle)
    c) obstructed objects (a bicycle behind a car)
    d) a "false" tag that actually refers to a visible part of the image and thus was a "true" tag