Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags
Tina Walber, Ansgar Scherp, Steffen Staab
University of Koblenz-Landau, Koblenz, Germany

Multimedia Modeling Conference
Klagenfurt, Austria
January 4-6, 2012
Motivation: Image Tagging
[Figure: example image with tagged regions: tree, girl, car, store, people, sidewalk]
▪ Find specific objects in images
▪ Analyzing the user's gaze path only
 T. Walber, A. Scherp, S. Staab – Identifying Objects in Images                     2 of 21
Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?



2. Can we differentiate two regions in the
   same image?


3 Steps Conducted by Users
▪ Look at red blinking dot
▪ Decide whether tag can be seen (“y” or “n”)
Dataset
▪ LabelMe community images
  ➢ Manually drawn polygons
  ➢ Regions annotated with tags
▪ 182,657 images (August 2010)

▪ High-quality segmentation and annotation
▪ Used as ground truth

Experiment Images and Tags
▪ Randomly selected 51 images
▪ Contain at least two tagged regions

▪ Created two tag sets for the 51 images
▪ Each image is assigned two tags (one per set)

▪ Tags are either “true” or “false”
  ➢ “true” → object described by tag can be seen
  ➢ “false” → object cannot be seen in the image
▪ Keep subjects concentrated during experiment
Subjects & Experiment System
▪ 20 subjects
  ➢ 16 male, 4 female (age: 23-40, Ø=29.6)
  ➢ Undergrads (6), PhD (12), office clerks (2)

▪ Experiment system
  ➢ Simple web page in Internet Explorer
  ➢ Standard notebook, resolution 1680×1050
  ➢ Tobii X60 eye tracker (60 Hz, 0.5° accuracy)

Conducting the Experiment
▪ Each user looked at 51 tag-image pairs
▪ First tag-image pair dismissed

▪ 94.3% correct answers
▪ Equal for true/false tags
▪ ~3 s until decision (average)

▪ 85% of users strongly agreed or agreed that
  they felt comfortable during the experiment
  ➢ Eye tracker had little influence on comfort
Pre-processing of Eye-tracking Data
▪ Obtained 547 gaze paths from 20 users where
  ➢ Users gave correct answers
  ➢ Image has “true” tag assigned
▪ Fixation extraction
  ➢ Tobii Studio's velocity & distance thresholds
  ➢ Fixation: focus on particular point on screen

▪ Requirement: at least one fixation inside or near the correct region
▪ 476 (87%) gaze paths fulfill this requirement

Analysis of Gaze Fixations (1)
▪ Applied 13 fixation measures on the 476 paths
  (2 new, 7 standard Tobii, 4 from the literature)

▪ Fixation measure: function on users' gaze paths
▪ Calculated for each image region, over all users
  viewing the same tag-image pair




Considered Fixation Measures
Nr  Name                     Favorite region r                          Origin
1   firstFixation            No. of fixations before 1st on r           Tobii
2   secondFixation           No. of fixations before 2nd on r           [13]
3   fixationsAfter           No. of fixations after last on r           [4]
4   fixationsBeforeDecision  fixationsAfter, but before decision        New
5   fixationsAfterDecision   fixationsBeforeDecision and after          New
6   fixationDuration         Total duration of all fixations on r       Tobii
7   firstFixationDuration    Duration of first fixation on r            Tobii
8   lastFixationDuration     Duration of last fixation on r             [11]
9   fixationCount            Number of fixations on r                   Tobii
10  maxVisitDuration         Max time first fixation until outside r    Tobii
11  meanVisitDuration        Mean time first fixation until outside r   Tobii
12  visitCount               No. of fixations until outside r           Tobii
13  saccLength               Saccade length, before fixation on r       [6]
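
A few of the measures in the table can be sketched in code. This is a minimal sketch and not the authors' implementation: it assumes each fixation has already been mapped to a region, and the `Fixation` class and all function names are hypothetical. A visit is taken to be a run of consecutive fixations on the same region, and its duration is approximated by the sum of the fixation durations in that run (time spent in saccades between them is ignored).

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Fixation:
    region: Optional[str]   # hypothetical: id of the fixated region, None if outside all regions
    duration_ms: float


def first_fixation(path: List[Fixation], r: str) -> Optional[int]:
    """Number of fixations before the first fixation on r (None if r is never fixated)."""
    for i, f in enumerate(path):
        if f.region == r:
            return i
    return None


def fixation_count(path: List[Fixation], r: str) -> int:
    """Number of fixations on r."""
    return sum(1 for f in path if f.region == r)


def fixation_duration(path: List[Fixation], r: str) -> float:
    """Total duration of all fixations on r."""
    return sum(f.duration_ms for f in path if f.region == r)


def visit_durations(path: List[Fixation], r: str) -> List[float]:
    """One duration per visit, i.e. per run of consecutive fixations on r."""
    return [
        sum(f.duration_ms for f in run)
        for region, run in groupby(path, key=lambda f: f.region)
        if region == r
    ]


def mean_visit_duration(path: List[Fixation], r: str) -> float:
    """Mean visit duration on r, 0.0 if r was never visited."""
    visits = visit_durations(path, r)
    return sum(visits) / len(visits) if visits else 0.0


def max_visit_duration(path: List[Fixation], r: str) -> float:
    """Longest visit on r, 0.0 if r was never visited."""
    return max(visit_durations(path, r), default=0.0)


# Made-up gaze path: two visits to "car", one to "tree".
path = [
    Fixation("car", 200), Fixation("car", 100), Fixation(None, 150),
    Fixation("tree", 300), Fixation("car", 250),
]
print(mean_visit_duration(path, "car"))
```

Treating visits as runs keeps the count-based and duration-based measures consistent with each other on the same input.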
Analysis of Gaze Fixations (2)
▪ For every image region (b) the fixation
  measure is calculated over all gaze paths (c)
▪ Results are summed up per region
▪ Regions ordered according to fixation measure
▪ If favorite region (d) and tag (a) match, result is
  true positive (tp), otherwise false positive (fp)
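
The aggregation steps above can be sketched as follows. This is an illustrative sketch only: the function names and the sample values are hypothetical, and it assumes a measure where a larger summed value is better (for measures such as firstFixation the ordering would presumably be reversed).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def favorite_region(per_path_values: List[Dict[str, float]]) -> Optional[str]:
    """Sum one fixation measure per region over all gaze paths and
    return the region with the highest total (the favorite region)."""
    totals: Dict[str, float] = defaultdict(float)
    for values in per_path_values:
        for region, value in values.items():
            totals[region] += value
    if not totals:
        return None
    return max(totals, key=totals.get)


def precision(assignments: List[Tuple[str, str]]) -> float:
    """assignments holds (favorite_region, correct_region) per tag-image pair.
    A match is a true positive, anything else a false positive."""
    tp = sum(1 for favorite, correct in assignments if favorite == correct)
    fp = len(assignments) - tp
    return tp / (tp + fp) if assignments else 0.0


# Made-up measure values (e.g. mean visit duration in ms) from three gaze paths
# for one tag-image pair.
paths = [
    {"car": 300.0, "tree": 100.0},
    {"car": 150.0, "tree": 400.0},
    {"car": 275.0},
]
print(favorite_region(paths))  # totals: car 725.0, tree 500.0
```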
Precision per Fixation Measure
[Figure: precision P and sum of tp and fp assignments per fixation measure; labeled measures: meanVisitDuration, fixationsBeforeDecision, lastFixationDuration, fixationDuration]
Adding Boundaries and Weights
▪ Take eye-tracker inaccuracies into account
▪ Extension of region boundaries by 13 pixels

▪ Larger regions more likely to be fixated
▪ Give extra weight to regions smaller than 5% of the image size
▪ Precision of meanVisitDuration increases to P = 0.67
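
One way the boundary extension and the weighting could be realized is sketched below. This is an assumption-laden sketch, not the authors' method: it treats a fixation as hitting a region if it lies inside the polygon or within 13 pixels of its outline, and the value of `small_region_weight` is purely hypothetical because the slides do not state the weight that was used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def hits_region(p: Point, poly: List[Point], margin_px: float = 13.0) -> bool:
    """True if p is inside the polygon or within margin_px of its outline."""
    if point_in_polygon(p, poly):
        return True
    n = len(poly)
    return any(
        dist_to_segment(p, poly[i], poly[(i + 1) % n]) <= margin_px
        for i in range(n)
    )


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula."""
    n = len(poly)
    s = sum(
        poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
        for i in range(n)
    )
    return abs(s) / 2.0


def region_weight(poly: List[Point], image_w: int, image_h: int,
                  threshold: float = 0.05,
                  small_region_weight: float = 2.0) -> float:
    """Hypothetical weighting: boost regions covering less than 5% of the image."""
    fraction = polygon_area(poly) / (image_w * image_h)
    return small_region_weight if fraction < threshold else 1.0


square = [(100, 100), (200, 100), (200, 200), (100, 200)]
print(hits_region((210, 150), square))  # 10 px outside the right edge
```

A measure value for a region would then be multiplied by `region_weight` before the regions are ordered.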
Examples: Tag-Region-Assignments




Comparison with Baselines
▪ Naïve baseline: largest region r is favorite
▪ Random baseline: randomly select favorite r

▪ Gaze / Gaze* significantly better (χ², α<0.001)

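
The two baselines can be sketched in a few lines. The function names, the region areas, and the seeded random generator are assumptions made for illustration; the slides do not say how region size was measured.

```python
import random
from typing import Dict


def naive_baseline(region_areas: Dict[str, float]) -> str:
    """Naive baseline: the largest region is the favorite."""
    return max(region_areas, key=region_areas.get)


def random_baseline(region_areas: Dict[str, float], rng: random.Random) -> str:
    """Random baseline: pick the favorite region uniformly at random."""
    return rng.choice(sorted(region_areas))


# Made-up region areas in pixels for one image.
areas = {"car": 5000.0, "tree": 12000.0, "girl": 800.0}
rng = random.Random(0)  # seeded only to make the sketch repeatable
print(naive_baseline(areas), random_baseline(areas, rng))
```

Either baseline's picks can be scored with the same tp/fp counting used for the gaze-based assignment.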
Effect of Gaze Path Aggregation
[Figure: precision P plotted against the number of gaze paths used]

▪ Aggregation of precision P for Gaze*

▪ Single user still significantly better than the baselines
  (χ²: naive with α<0.001, random with α<0.002)
Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?
   ➢ meanVisitDuration with precision of 67%


2. Can we differentiate two regions in the
   same image?


Differentiate Two Objects
▪ Use second tag set to identify different objects
  in the same image
▪ 16 images (of our 51) have two “true” tags
▪ 6 images had two correct regions identified
  ➢ Proportion of 38%

▪ Average precision for single object is 67%
  → Correct tag assignment for two images: 44%


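
The 38% proportion can be reproduced with a short sketch. Only the counts 6 and 16 come from the slide; the function name and the per-image outcomes for the other 10 images are made up for illustration.

```python
from typing import List, Tuple


def both_correct_rate(results: List[Tuple[bool, bool]]) -> float:
    """results holds, per image with two "true" tags, whether the region for
    the first and for the second tag was identified correctly."""
    both = sum(1 for first_ok, second_ok in results if first_ok and second_ok)
    return both / len(results)


# 6 of 16 images with both regions correct; the split of the remaining
# 10 images is hypothetical.
results = [(True, True)] * 6 + [(True, False)] * 10
print(both_correct_rate(results))  # 6 / 16 = 0.375, reported as 38%
```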
Correctly Differentiated Objects




Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?
   ➢ meanVisitDuration with precision of 67%


2. Can we differentiate two regions in the
   same image?
   ➢ Accuracy of 38%
Acknowledgement: This research was partially supported by the EU projects
Petamedia (FP7-216444) and SocialSensor (FP7-287975).
Influence of Red Dot
▪ First 5 fixations, over all subjects and all images
Experiment Data Cleaning
▪ Manually replaced images with
a) Tags that are incomprehensible, nonsensical, or
   require expert knowledge
b) Tags that refer to multiple regions, not all of which
   are drawn into the image (e.g., bicycle)
c) Obstructed objects (e.g., a bicycle behind a car)
d) “False” tags that actually refer to a visible part of
   the image and thus were “true” tags

