Interaction Lab. Seoul National University of Science and Technology
Neural Networks for Semantic Gaze
Analysis in XR Settings
Jeong Jae-Yeop
ETRA2021, ACM Symposium on Eye Tracking Research and Applications
Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Lena Stubbemann, Dominik Dürrschnabel, Robert Refflinghaus 2021
■Intro
■Approach
■Evaluation
■Conclusion and future work
Agenda
Intro
Approach
■Semantic gaze analysis
 The process of identifying the objects or features that receive visual and cognitive attention
• Well-controlled settings
• Visual patterns and oculometric parameters
• What users are looking at
Intro(1/6)
■Semantic gaze analysis in XR settings
 ROI (Region of Interest)
• Two-dimensional depiction of an object
 VOI (Volume of Interest)
• Three-dimensional object that the gaze ray intersects in the scene
Intro(2/6)
■Annotating VOI data(1/2)
 VOI data for gaze
• User-specific gaze videos with constantly changing perspectives on the target object
• Objects move, vanish, reappear, and change shape, size, or illumination
• Annotation is a time-consuming process
• Manual annotation is thus still considered the standard procedure
Intro(3/6)
■Annotating VOI data(2/2)
 VOI annotation problem → image classification
• CAD (Computer-Aided Design) model
• CNN (Convolutional Neural Network)
• Three-dimensional problem → two-dimensional problem: simplified
• A CNN can also recognize different perspectives on the same three-dimensional body
Intro(4/6)
■Data augmentation
 GAN (Generative Adversarial Network)
• Image augmentation technique to adapt the training data to real environmental factors
• Overcome the need for challenging photorealistic simulations
• VOI annotation not only on an object level but also on a product feature level
Intro(5/6)
■Overview
Intro(6/6)
Approach
Evaluation
■Addressing the annotation problem with object recognition
 Methodological details
• Use a CAD model to prepare training data for Cycle-GAN
• Use Cycle-GAN to create a realistic synthetic data set
• Use the synthetic data set to train a CNN (Convolutional Neural Network)
• Predict VOIs of experimental data with the trained CNN model
Approach(1/10)
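The four steps above form a simple pipeline. A minimal sketch of that flow follows; every function here is a hypothetical stand-in (a real implementation would wrap a CAD renderer, a Cycle-GAN, and a CNN), not the authors' code.

```python
# Hypothetical sketch of the four-stage annotation pipeline.
# Each stage is a placeholder illustrating the data flow only.

def render_cad_views(cad_model):
    """Step 1: render labeled 2D views of the CAD model."""
    return [{"image": f"view_{i}", "voi": i % 3} for i in range(6)]

def cyclegan_translate(views):
    """Step 2: translate synthetic renders toward the real domain."""
    return [{"image": v["image"] + "_realistic", "voi": v["voi"]} for v in views]

def train_cnn(dataset):
    """Step 3: 'train' a classifier; here it just memorizes the majority label."""
    labels = [d["voi"] for d in dataset]
    majority = max(set(labels), key=labels.count)
    return lambda image: majority

def annotate(frames, classifier):
    """Step 4: predict a VOI class for each experimental frame."""
    return [classifier(f) for f in frames]

views = render_cad_views("coffee_machine.step")   # hypothetical file name
synthetic = cyclegan_translate(views)
model = train_cnn(synthetic)
predictions = annotate(["frame_0", "frame_1"], model)
print(predictions)
```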
■Use a CAD model to prepare training data for Cycle-GAN(1/2)
 The essential resource for using object recognition algorithms is a suitable database
 Feature level annotation
• CAD model or virtual prototype
Approach(2/10)
■Use a CAD model to prepare training data for Cycle-GAN(2/2)
 Training data
Approach(3/10)
■Experimental data
 Egocentric videos, which are split into frames
 Only the fixation marker, not the scan path
• Only one fixation marker is contained in each frame
 Gaze coordinates (x, y)
Approach(4/10)
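Because each frame carries exact gaze coordinates (x, y), a classification thumbnail can be cut directly around the fixation. A minimal sketch with boundary clamping follows; the 224 px size matches the training setup described later, while the clamping logic is our own assumption, not the paper's code.

```python
def gaze_crop(frame_w, frame_h, x, y, size=224):
    """Return the top-left corner of a size-by-size crop centered on the
    gaze point (x, y), clamped so the crop stays inside the frame."""
    half = size // 2
    left = min(max(x - half, 0), frame_w - size)
    top = min(max(y - half, 0), frame_h - size)
    return left, top

# Fixation near the image border: the crop window is shifted inward.
left, top = gaze_crop(1920, 1200, 30, 1190)
print(left, top)  # 0 976
```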
■Use Cycle-GAN to create a realistic synthetic data set
Approach(5/10)
■GAN (Generative Adversarial Network)
Approach(6/10)
■Cycle-GAN (Cycle Generative Adversarial Network)
Approach(7/10)
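Cycle-GAN's key addition to the plain GAN is a cycle-consistency term: translating an image to the other domain and back should reproduce it, L_cyc = ||F(G(x)) - x||_1. A toy numeric sketch of that term follows; the "generators" are trivial invertible maps chosen purely for illustration.

```python
def l1(a, b):
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Toy 'generators': G maps domain A -> B, F maps B -> A.
G = lambda img: [p * 2.0 for p in img]   # e.g. synthetic -> realistic
F = lambda img: [p / 2.0 for p in img]   # realistic -> synthetic

x = [0.1, 0.5, 0.9]
cycle_loss = l1(F(G(x)), x)  # forward cycle ||F(G(x)) - x||_1
print(cycle_loss)  # 0.0: F perfectly inverts G, so the cycle is consistent
```

In the real Cycle-GAN this term is added to the adversarial losses of both generator/discriminator pairs, which is what keeps the translated images aligned with their source content.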
■Use the synthetic data set to train a CNN (Convolutional Neural Network)
Approach(8/10)
■Object recognition
 Object localization combined with image classification
• Pixels are grouped into instances via adjacent pixels that share textures, colors, or intensities
• Feature-level recognition
 Eye tracking data
• Semantic or instance segmentation can be dispensed with
• Eye tracking provides the exact coordinates of the fixation relative to the gaze replay
Approach(9/10)
■Predict VOIs of experimental data with trained CNN model
 ResNet50v2
Approach(10/10)
Evaluation
Conclusion and future work
■Experimental setup
 Real-world and virtual-reality settings
 Fully automated coffee machine
 VOI annotation on feature level
Evaluation(1/7)
■Conditions/baseline
 Comparison with an existing method
• EyeSee3D (https://eyesee3d.eyemovementresearch.com/)
 Ground truth: manual annotation
 Performance metrics
• Weighted precision and recall, weighted F1-score
Evaluation(2/7)
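Weighted precision, recall, and F1 average the per-class scores weighted by class support. A minimal sketch of the weighted F1 follows, matching the usual definition (the same as scikit-learn's `average='weighted'`); the VOI class names in the example are made up.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1: per-class F1 scores averaged by class frequency."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score

y_true = ["display", "display", "spout", "spout", "button"]
y_pred = ["display", "spout", "spout", "spout", "button"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.787
```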
■User study design
 Participants
• 24 participants (6 female, 18 male)
• 3-point calibration of the eye tracking system
• Interaction with the product in both the virtual and the real setting
 First phase of the experiment
• Freely explore the object for 60 seconds
• Free movement around the machine
 Second phase of the experiment
• Subjects are asked about their perceptual impressions
• Guided toward certain product features by tasks such as brewing coffee
Evaluation(3/7)
■Apparatus
 Unity3D
• Two projectors with a resolution of 1920 × 1200 pixels each
 SMI mobile eye-tracking glasses + SMI 3D-6D head tracking
 Outside-in motion tracking: OptiTrack PrimeX 13W
 Fixation detection with BeGaze 3.7
 Desktop
• Nvidia GeForce RTX 2060 SUPER GPU
• 8 GB RAM
Evaluation(4/7)
■Network training
 Thumbnail size: 224 × 224 px
 Image augmentation using Cycle-GAN
• Simulation images: 1,000
• Virtual images: 1,000
• Real images: 1,000
• Default settings except for 50 epochs
 Total training data after augmentation
• Simulation images: 100,000
• Virtual images: 100,000
• Real images: 100,000
Evaluation(5/7)
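Growing 1,000 source images per domain into 100,000 training images implies roughly 100 augmented variants per source image. A hypothetical fan-out sketch follows; the jitter parameters are illustrative placeholders, not the paper's actual augmentations.

```python
import random

def augment(image_id, n_variants=100):
    """Produce n_variants parameterized augmentations of one source image.
    The rotation/brightness jitters are placeholder parameters."""
    rng = random.Random(image_id)  # deterministic per source image
    return [
        {
            "source": image_id,
            "rotation_deg": rng.uniform(-10, 10),
            "brightness": rng.uniform(0.8, 1.2),
        }
        for _ in range(n_variants)
    ]

# 1,000 source images x 100 variants = 100,000 training samples per domain.
dataset = [v for img in range(1000) for v in augment(img)]
print(len(dataset))  # 100000
```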
■Data preparation
Evaluation(6/7)
■Network training
 CNN classification
• ResNet50v2 architecture
• Output layer with 12 neurons (10 VOIs + “Coffee machine but no VOI” and “No coffee machine”)
• Input size: 224 × 224 px
• Adam optimizer, learning rate of 0.001 over 20 epochs with sparse categorical cross-entropy loss
Evaluation(7/7)
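Sparse categorical cross-entropy takes integer class labels (here 0-11 for the 12 output neurons) rather than one-hot vectors: the loss is the negative log-probability the network assigns to the true class. A minimal pure-Python sketch:

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability of the true class.
    probs: post-softmax output, one value per class; label: integer index."""
    return -math.log(probs[label])

# 12-way output: 10 VOIs plus the two default classes.
probs = [0.01] * 12
probs[3] = 0.89  # model is fairly confident in class 3
loss_confident = sparse_categorical_crossentropy(probs, 3)
loss_wrong = sparse_categorical_crossentropy(probs, 7)
print(loss_confident < loss_wrong)  # True: confident correct prediction -> low loss
```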
Conclusion and future work
■Result
 The CNN approach performs slightly better in virtual reality than in the real world
 Human annotation
• About 30,000 frames, roughly 25 hours (20 images per minute)
Conclusion and future work(1/7)
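The manual-annotation cost quoted above is simple arithmetic: 30,000 frames at 20 images per minute is 1,500 minutes, i.e. 25 hours.

```python
frames = 30_000
rate_per_minute = 20          # images an annotator can label per minute
hours = frames / rate_per_minute / 60
print(hours)  # 25.0
```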
■Discussion(1/3)
 In misclassified frames, the fixation marker is ambiguously located between four different VOIs and the default classes
• Some of these are adjacent, while others are simultaneously hidden due to depth effects
Conclusion and future work(2/7)
■Discussion(2/3)
 Some VOIs are well recognized and some are not
• Well classified: display
 Standard classification problem
Conclusion and future work(3/7)
■Discussion(3/3)
 Cycle-GAN can also degrade image quality
• Possible remedy: use gaze coordinates instead of the fixation marker
Conclusion and future work(4/7)
■Limitation
 The study gave a proof of concept for two different domains
• Only one object was tested: a coffee machine
Conclusion and future work(5/7)
■Conclusion
 Propose a method for semantic gaze analysis using machine learning, while eliminating the resource-intensive process of human annotation
 Neither markers nor motion tracking systems are required
 Does not contain personal bias and is thus not prone to evaluator effects
 The same methodological evaluation can be used across platforms
Conclusion and future work(6/7)
■Future work
 Our work is to be seen as a proof of concept
• Potential future work: further increasing the accuracy of predictions
 Chances for improving our approach
• Advanced image classification methods or further improved image augmentation techniques
Conclusion and future work(7/7)
Q&A
37

More Related Content

Similar to Neural networks for semantic gaze analysis in xr settings

Unsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimationUnsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimationJaey Jeong
 
Mlp mixer an all-mlp architecture for vision
Mlp mixer  an all-mlp architecture for visionMlp mixer  an all-mlp architecture for vision
Mlp mixer an all-mlp architecture for visionJaey Jeong
 
Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...Jaey Jeong
 
Tablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tabletsTablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tabletsJaey Jeong
 
Gaze estimation using transformer
Gaze estimation using transformerGaze estimation using transformer
Gaze estimation using transformerJaey Jeong
 
Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...Jaey Jeong
 
Accurate and low complex cell histogram generation by bypass the gradient of ...
Accurate and low complex cell histogram generation by bypass the gradient of ...Accurate and low complex cell histogram generation by bypass the gradient of ...
Accurate and low complex cell histogram generation by bypass the gradient of ...Nothing!
 
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...AnuragVijayAgrawal
 
Improving accuracy of binary neural networks using unbalanced activation dist...
Improving accuracy of binary neural networks using unbalanced activation dist...Improving accuracy of binary neural networks using unbalanced activation dist...
Improving accuracy of binary neural networks using unbalanced activation dist...Jaey Jeong
 
Targeting accurate object extraction from an image a comprehensive study of ...
Targeting accurate object extraction from an image  a comprehensive study of ...Targeting accurate object extraction from an image  a comprehensive study of ...
Targeting accurate object extraction from an image a comprehensive study of ...LogicMindtech Nologies
 
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...thanhdowork
 
Implementation of Automated Attendance System using Deep Learning
Implementation of Automated Attendance System using Deep LearningImplementation of Automated Attendance System using Deep Learning
Implementation of Automated Attendance System using Deep LearningMd. Mahfujur Rahman
 
Long-term Face Tracking in the Wild using Deep Learning
Long-term Face Tracking in the Wild using Deep LearningLong-term Face Tracking in the Wild using Deep Learning
Long-term Face Tracking in the Wild using Deep LearningElaheh Rashedi
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsSangmin Woo
 
John W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final PresentationJohn W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final PresentationJohn Vinti
 
Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...Wesley De Neve
 

Similar to Neural networks for semantic gaze analysis in xr settings (20)

Unsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimationUnsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimation
 
Mlp mixer an all-mlp architecture for vision
Mlp mixer  an all-mlp architecture for visionMlp mixer  an all-mlp architecture for vision
Mlp mixer an all-mlp architecture for vision
 
Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...
 
Tablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tabletsTablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tablets
 
Gaze estimation using transformer
Gaze estimation using transformerGaze estimation using transformer
Gaze estimation using transformer
 
Progress Reprot.pptx
Progress Reprot.pptxProgress Reprot.pptx
Progress Reprot.pptx
 
Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...
 
Accurate and low complex cell histogram generation by bypass the gradient of ...
Accurate and low complex cell histogram generation by bypass the gradient of ...Accurate and low complex cell histogram generation by bypass the gradient of ...
Accurate and low complex cell histogram generation by bypass the gradient of ...
 
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
A Comprehensive Analysis on Co-Saliency Detection on Learning Approaches in 3...
 
Improving accuracy of binary neural networks using unbalanced activation dist...
Improving accuracy of binary neural networks using unbalanced activation dist...Improving accuracy of binary neural networks using unbalanced activation dist...
Improving accuracy of binary neural networks using unbalanced activation dist...
 
Targeting accurate object extraction from an image a comprehensive study of ...
Targeting accurate object extraction from an image  a comprehensive study of ...Targeting accurate object extraction from an image  a comprehensive study of ...
Targeting accurate object extraction from an image a comprehensive study of ...
 
Resume_updated_job
Resume_updated_jobResume_updated_job
Resume_updated_job
 
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
 
Implementation of Automated Attendance System using Deep Learning
Implementation of Automated Attendance System using Deep LearningImplementation of Automated Attendance System using Deep Learning
Implementation of Automated Attendance System using Deep Learning
 
Long-term Face Tracking in the Wild using Deep Learning
Long-term Face Tracking in the Wild using Deep LearningLong-term Face Tracking in the Wild using Deep Learning
Long-term Face Tracking in the Wild using Deep Learning
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
 
Word
WordWord
Word
 
John W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final PresentationJohn W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final Presentation
 
ISM2014
ISM2014ISM2014
ISM2014
 
Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...Sparse representation based human action recognition using an action region-a...
Sparse representation based human action recognition using an action region-a...
 

More from Jaey Jeong

핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNNJaey Jeong
 
hands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model traininghands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model trainingJaey Jeong
 
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forestJaey Jeong
 
deep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnndeep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnnJaey Jeong
 
deep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skillsdeep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skillsJaey Jeong
 
deep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagationdeep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagationJaey Jeong
 
deep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learingdeep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learingJaey Jeong
 
deep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkdeep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkJaey Jeong
 

More from Jaey Jeong (8)

핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN
 
hands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model traininghands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model training
 
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
 
deep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnndeep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnn
 
deep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skillsdeep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skills
 
deep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagationdeep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagation
 
deep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learingdeep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learing
 
deep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkdeep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural network
 

Recently uploaded

Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 

Recently uploaded (20)

Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 

Neural networks for semantic gaze analysis in xr settings

  • 1. Interaction Lab. Seoul National University of Science and Technology Neural Networks for Semantic Gaze Analysis in XR Settings Jeong Jae-Yeop ETRA2021, ACM Symposium on Eye Tracking Research and Applications Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG) Lena Stubbemann, Dominik Dürrschnabel, Robert Refflinghaus 2021
  • 2. Interaction Lab., Seoul National University of Science and Technology ■Intro ■Approach ■Evaluation ■Conclusion and future work Agenda 2
  • 4. Interaction Lab., Seoul National University of Science and Technology ■Semantic gaze analysis  The process to identify objects or features of visual and cognitive attention • Well controlled settings • Visual patterns and oculometric parameters • What users are looking at Intro(1/6) 4
  • 5. Intro(2/6) ■Semantic gaze analysis in XR settings
     ROI (Region of Interest)
    • A two-dimensional depiction of an object
     VOI (Volume of Interest)
    • The three-dimensional object that emerges from the intersection of the gaze ray with the scene
  • 6. Intro(3/6) ■Annotation of VOI data (1/2)
     VOI data for gaze
    • User-specific gaze videos with constantly changing perspectives on the target object
    • Objects move, vanish, reappear, and change shape, size, or illumination
    • Annotation is a time-consuming process
    • Manual annotation is thus still considered the standard procedure
  • 7. Intro(4/6) ■Annotation of VOI data (2/2)
     VOI annotation problem → image classification
    • CAD (Computer Aided Design) model
    • CNN (Convolutional Neural Network)
    • The three-dimensional problem is simplified to a two-dimensional one
    • A CNN can also recognize different perspectives on the same three-dimensional body
  • 8. Intro(5/6) ■Data augmentation
     GAN (Generative Adversarial Network)
    • Image augmentation technique to adapt the training data to real environmental factors
    • Overcomes the need for challenging photorealistic simulations
    • Enables VOI annotation not only on the object level but also on the product-feature level
  • 9. Intro(6/6) ■Overview
  • 11. Approach(1/10) ■Addressing the annotation problem using object recognition
     Methodological details
    • Use a CAD model to prepare training data for Cycle-GAN
    • Use Cycle-GAN to create a reality-alike synthetic data set
    • Use the synthetic data set to train a CNN (Convolutional Neural Network)
    • Predict the VOIs of the experimental data with the trained CNN model
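The four methodological steps above can be sketched as a minimal pipeline. This is an illustrative skeleton only: every function name is hypothetical and each stage is a stub standing in for the real component (renderer, Cycle-GAN, CNN), not the authors' implementation.

```python
# Illustrative skeleton of the four-step annotation pipeline.
# All names are hypothetical; each stage is a stub for demonstration.

def render_cad_views(n_views):
    """Step 1: render synthetic views of the CAD model (stubbed as IDs)."""
    return [f"cad_view_{i}" for i in range(n_views)]

def cycle_gan_translate(views):
    """Step 2: translate synthetic renders toward the real/virtual domain."""
    return [v.replace("cad", "realistic") for v in views]

def train_cnn(images):
    """Step 3: train a VOI classifier on the translated images (stub)."""
    return {"n_train": len(images)}  # stands in for a fitted model

def predict_vois(model, frames):
    """Step 4: classify gaze thumbnails from experimental frames (stub)."""
    return ["VOI_0" for _ in frames]

model = train_cnn(cycle_gan_translate(render_cad_views(4)))
labels = predict_vois(model, ["frame_a", "frame_b"])
```

The point of the sketch is the data flow: synthetic CAD renders feed the GAN, the GAN's output trains the classifier, and only then are real experimental frames touched.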
  • 12. Approach(2/10) ■Use a CAD model to prepare training data for Cycle-GAN (1/2)
     The essential resource for any object recognition algorithm is a suitable database
     Feature-level annotation
    • CAD model or virtual prototype
  • 13. Approach(3/10) ■Use a CAD model to prepare training data for Cycle-GAN (2/2)
     Training data
  • 14. Approach(4/10) ■Experimental data
     Egocentric videos, which are split into frames
     Only the fixation marker, not the scan path
    • Only one fixation marker is contained in each frame
     Gaze coordinates (x, y)
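Since each frame carries exactly one pair of gaze coordinates, the classifier input can be a fixed-size thumbnail cut around that point. A minimal numpy sketch of such a crop (the edge-padding strategy and function name are my assumptions, not the paper's code):

```python
import numpy as np

def crop_gaze_thumbnail(frame, x, y, size=224):
    """Cut a size x size patch centred on the gaze point (x, y).

    The frame is edge-padded so fixations near the border still yield a
    full-sized thumbnail (padding strategy is an assumption).
    """
    half = size // 2
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)), mode="edge")
    # After padding, the gaze point (x, y) sits at (x + half, y + half),
    # so slicing from (y, x) in the padded image centres the patch on it.
    return padded[y:y + size, x:x + size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # toy video frame
thumb = crop_gaze_thumbnail(frame, x=600, y=10)  # fixation near a corner
```

The 224 x 224 default matches the thumbnail size reported later in the evaluation section.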
  • 15. Approach(5/10) ■Use Cycle-GAN to create a reality-alike synthetic data set
  • 16. Approach(6/10) ■GAN (Generative Adversarial Network)
  • 17. Approach(7/10) ■Cycle-GAN (Cycle-Consistent Generative Adversarial Network)
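Cycle-GAN's defining idea is the cycle-consistency loss: a forward generator G (synthetic → realistic) and a backward generator F (realistic → synthetic) are trained so that F(G(x)) reconstructs x. A toy numpy illustration of that loss term, with simple affine maps standing in for the real CNN generators (purely for demonstration):

```python
import numpy as np

# Toy illustration of Cycle-GAN's cycle-consistency idea: G maps the
# synthetic domain toward the realistic one, F maps back, and the two
# should invert each other.  Real generators are CNNs; these affine
# stand-ins are chosen so F is the exact inverse of G.
G = lambda x: 2.0 * x + 1.0     # forward "generator"
F = lambda y: (y - 1.0) / 2.0   # exact inverse, so the loss vanishes

def cycle_consistency_loss(x):
    """Mean L1 reconstruction error after a full forward-backward cycle."""
    return np.abs(F(G(x)) - x).mean()

x = np.array([0.0, 0.5, 1.0])
loss = cycle_consistency_loss(x)  # 0.0 here, since F inverts G exactly
```

In the actual model this L1 term is added to the usual adversarial losses of both generator-discriminator pairs; it is what keeps the "realistic" output aligned with the synthetic input's content.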
  • 18. Approach(8/10) ■Use the synthetic data set to train a CNN (Convolutional Neural Network)
  • 19. Approach(9/10) ■Object recognition
     Object localization combined with image classification
    • Pixels are grouped into instances by means of adjacent pixels that share textures, colors, or intensities
    • Feature-level recognition
     Eye tracking data
    • Semantic or instance segmentation can be dispensed with
    • The data provide the exact coordinates of the fixation relative to the gaze replay
  • 20. Approach(10/10) ■Predict the VOIs of the experimental data with the trained CNN model
     ResNet50v2
  • 22. Evaluation(1/7) ■Experimental setup
     Real-world and virtual-reality settings
     Fully automated coffee machine
     VOI annotation on the feature level
  • 23. Evaluation(2/7) ■Conditions/baseline
     Comparison with another method
    • EyeSee3D (https://eyesee3d.eyemovementresearch.com/)
     Ground truth: manual annotation
     Performance metrics
    • Weighted precision and recall, weighted F1-score
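"Weighted" here follows the common convention of averaging per-class scores with weights proportional to each class's true-label count (support). A small self-contained sketch of that computation, written from the standard definition rather than the authors' evaluation code:

```python
from collections import Counter

def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all true classes.

    Per-class precision, recall and F1 are computed as usual, then
    averaged with weights support(c) / N, the fraction of ground-truth
    labels belonging to class c.
    """
    support = Counter(y_true)
    n = len(y_true)
    P = R = F = 0.0
    for c, s in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / s
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        P += s / n * prec
        R += s / n * rec
        F += s / n * f1
    return P, R, F

# Toy VOI labels (class names are illustrative, not the paper's VOIs).
p, r, f = weighted_prf(["display", "display", "spout", "spout"],
                       ["display", "spout", "spout", "spout"])
```

With imbalanced VOI classes this weighting keeps rarely fixated features from dominating (or vanishing from) the aggregate score.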
  • 24. Evaluation(3/7) ■User study design
     Participants
    • 24 (6 female and 18 male)
    • 3-point calibration of the eye tracking system
    • Interaction with the product in both the virtual and the real setting
     First phase of the experiment
    • Freely explore the object for 60 seconds
    • Free movement around the machine
     Second phase of the experiment
    • Subjects are asked about their perceptual impressions
    • Subjects are led to certain product features as they solve tasks such as brewing coffee
  • 25. Evaluation(4/7) ■Apparatus
     Unity3D
    • Two projectors with a resolution of 1920 x 1200 pixels each
     SMI mobile eye tracking glasses + SMI 3D-6D head tracking
     Outside-in motion tracking with OptiTrack PrimeX 13W cameras
     Fixation detection with BeGaze 3.7
     Desktop
    • Nvidia GeForce RTX 2060 SUPER GPU
    • 8 GB RAM
  • 26. Evaluation(5/7) ■Network training
     Thumbnail size: 224 x 224 px
     Image augmentation using Cycle-GAN
    • Simulation images: 1,000
    • Virtual images: 1,000
    • Real images: 1,000
    • Default settings, except training for 50 epochs
     Total training data after augmentation
    • Simulation images: 100,000
    • Virtual images: 100,000
    • Real images: 100,000
  • 27. Evaluation(6/7) ■Data preparation
  • 28. Evaluation(7/7) ■Network training
     CNN classification
    • ResNet50v2 architecture
    • Output layer with 12 neurons (10 VOIs + "Coffee machine but no VOI" and "No coffee machine")
    • Input size 224 x 224
    • Adam optimizer, learning rate of 0.001, 20 epochs, sparse categorical cross-entropy loss
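The "sparse" in sparse categorical cross-entropy means the target is an integer class index (here 0..11 for the 12 output neurons) rather than a one-hot vector. A minimal numpy sketch of that loss for a single prediction, written from the standard definition (not framework internals):

```python
import numpy as np

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy between raw logits and an integer class label.

    Computes -log softmax(logits)[label] via a numerically stable
    log-softmax (shifting by the max logit before exponentiating).
    """
    z = logits - logits.max()                 # stability shift
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax
    return -log_probs[label]

# A 12-way head that is maximally uncertain: all logits equal, so the
# softmax is uniform and the loss is log(12) regardless of the label.
logits = np.zeros(12)
loss = sparse_categorical_crossentropy(logits, label=3)
```

With 12 classes, a loss hovering near log(12) ≈ 2.48 during training would indicate the network has learned nothing beyond chance; values well below that indicate the thumbnails carry usable VOI signal.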
  • 30. Conclusion and future work(1/7) ■Results
     The CNN approach performs slightly better in virtual reality than in the real world
     Human annotation
    • About 30,000 images, i.e., roughly 25 hours at 20 images per minute
  • 31. Conclusion and future work(2/7) ■Discussion(1/3)
     In ambiguous cases the fixation marker lies between four different VOIs and the default classes
    • Some of these are adjacent, while others are simultaneously hidden due to depth effects
  • 32. Conclusion and future work(3/7) ■Discussion(2/3)
     Some VOIs are recognized well and some are not
    • Well classified: Display
     A standard classification problem
  • 33. Conclusion and future work(4/7) ■Discussion(3/3)
     Cycle-GAN can also degrade image quality
    • Use gaze coordinates instead of a rendered fixation marker
  • 34. Conclusion and future work(5/7) ■Limitations
     The study gave a proof of concept for two different domains
    • But only for a single product, a coffee machine
  • 35. Conclusion and future work(6/7) ■Conclusion
     The paper proposes a method for semantic gaze analysis using machine learning that eliminates the resource-intensive process of human annotation
     Neither markers nor motion tracking systems are required
     The method contains no personal bias and is thus not prone to evaluator effects
     The same methodical evaluation can be used across platforms
  • 36. Conclusion and future work(7/7) ■Future work
     This work is to be seen as a proof of concept
    • Potential future work could further increase the accuracy of predictions
     Opportunities for improving the approach
    • Advanced image classification methods, or further improving the image augmentation techniques