Presentation given by Mark Billinghurst at the 2024 XR Spring Summer School on March 7, 2024. This lecture covers evaluation methods that can be used for Social XR/AR/VR experiences.
5. Typical Research Questions
• Is collaboration with AR/VR better than video conferencing?
• What is the impact of a particular input method?
• How should people be represented in Social XR interfaces?
• What communication cues can be added to improve collaboration?
• How can you effectively collaborate in hybrid interfaces?
• And more….
6. ISMAR Paper Trends
• ISMAR papers surveyed from 2008 – 2017
• Collaboration identified as new trend
• Only 9/526 papers = 1.7%
Kim, K., Billinghurst, M., Bruder, G., Duh, H. B. L., &
Welch, G. F. (2018). Revisiting trends in augmented
reality research: A review of the 2nd decade of ISMAR
(2008–2017). IEEE transactions on visualization and
computer graphics, 24(11), 2947-2962.
7. AR User Studies
• Key findings
• < 10% of all AR papers have a user study
• Few collaborative user studies
• Only 12/291 (< 5%) of user study papers
• Less than half used HMD
• Most studies in lab/indoor
• 1/15 studies outdoor, 3/15 field studies
Dey, A., Billinghurst, M., Lindeman, R. W., &
Swan, J. (2018). A systematic review of 10 years
of augmented reality usability studies: 2005 to
2014. Frontiers in Robotics and AI, 5, 37.
11. Existing AR Collaborative Studies
• Many papers use a combination of subjective and objective measures
• Typically have a small number of subjects
• Typically fewer than 20, usually university students
• Most involve pairs of users
• Less than half of the studies use HMDs
• Split between HMDs and HHDs
• Most experiments in controlled environments
• Lack of experimentation in real-world conditions, heuristic evaluations, and pilot studies
• Most evaluation is in a remote collaboration setting
• 30% in face-to-face collaboration
12. Opportunities
• Need for increased user studies in collaboration
• More use of field studies, natural user experiences
• Need a wider range of evaluation methods
• Use a more diverse selection of participants
• Increase number of participants
• More user studies conducted outdoors are needed
• Report participant demographics, study design, and experimental task
13. Example: Collocated Communication Behaviours
• Is there a difference between AR-based & screen-based face-to-face (FtF) collaboration?
• Hypothesis: FtF AR produces similar behaviours to FtF non-AR
Billinghurst, M., Belcher, D., Gupta, A., & Kiyokawa, K. (2003). Communication behaviors in colocated
collaborative AR interfaces. International Journal of Human-Computer Interaction, 16(3), 395-423.
14. Experiment Design
• Building arranging task
• Both people have half the requirements
• Conditions
• Face to Face – FtF with real buildings
• Projection – FtF with screen projection
• Augmented Reality – FtF with AR buildings
15. Measures
• Objective
• Performance time
• Communication Process Measures
• The number and type of gestures made
• The number of deictic phrases spoken
• The average number of words per phrase
• The number of speaker turns
• Subjective
• Subjective survey
• User comments
• Post experiment interview
Example survey questions:
“How well could you work with your partner?” (1 = not very well, 5 = very well)
“How easy was it to move the virtual objects?” (1 = not very easy, 5 = very easy)
Example deictic phrase: “What is that?” (pointing)
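As an illustration, the communication process measures listed above could be extracted from a coded session transcript roughly as follows. This is a minimal sketch: the deictic marker list, the transcript format, and the function name are my assumptions, and in real studies such coding is typically done or verified by human coders.

```python
import re

# Hypothetical list of deictic markers; real coding schemes are richer.
DEICTIC_WORDS = {"this", "that", "here", "there", "these", "those"}

def communication_measures(transcript):
    """Compute simple process measures from a list of (speaker, utterance) pairs."""
    turns = 0
    words = 0
    deictic_phrases = 0
    prev_speaker = None
    for speaker, utterance in transcript:
        tokens = re.findall(r"[a-z']+", utterance.lower())
        words += len(tokens)
        if any(t in DEICTIC_WORDS for t in tokens):
            deictic_phrases += 1
        if speaker != prev_speaker:       # a new speaker starts a new turn
            turns += 1
            prev_speaker = speaker
    return {
        "speaker_turns": turns,
        "words_per_phrase": words / len(transcript) if transcript else 0,
        "deictic_phrases": deictic_phrases,
    }

session = [
    ("A", "Put that building over there"),
    ("B", "Which one?"),
    ("A", "This one, next to the tower"),
]
print(communication_measures(session))
# → {'speaker_turns': 3, 'words_per_phrase': 4.33..., 'deictic_phrases': 2}
```

Counting the number of utterances containing a deictic marker, rather than every marker, matches the per-phrase style of measure named above; either convention works if applied consistently.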
16. Results
• Performance time
• Sig. diff. between conditions – AR slowest
• Communication measures
• No difference in number of words/turns
• Sig. diff. in deictic phrases (FtF same as AR)
• Sig. diff. in pick gestures (FtF same as AR)
• Subjective measures
• FtF manipulation same as AR
• FtF easier to work with than AR
[Figures: percentage breakdown of gestures; subjective survey results]
17. Lessons Learned
• Positive Lessons
• Communication process measures valuable
• Gesture, speech analysis
• Collect user feedback/interviews
• Lessons for Improvement
• Stronger statistical analysis
• Make observations
• Fewer mistakes
• Surveys could be stronger
• Validated surveys
• Better interview analysis
• Thematic analysis
“AR’s biggest limit was lack of peripheral
vision. The interaction physically (trading
buildings back and forth) as well as spatial
movement was natural, it was just a little
difficult to see.
By contrast in the Projection condition you
could see everything beautifully but
interaction was tough because the interface
didn’t feel instinctive.”
“working solo together”.
18. Example 2: Virtual Communication Cues (2019)
• Using AR/VR to share communication cues
• Gaze, gesture, head pose, body position
• Sharing same environment
• Virtual copy of real world
• Collaboration between AR/VR
• VR user appears in AR user’s space
Piumsomboon, T., Dey, A., Ens, B., Lee, G., & Billinghurst, M. (2019). The effects of sharing awareness cues in collaborative mixed reality. Frontiers in Robotics and AI, 6, 5.
20. Conditions
• Baseline: In the Baseline condition, we showed only the head and hands of the
collaborator in the scene. The head and hands were presented in all conditions
• Field-of-view (FoV): We showed the FoV frustum of each collaborator to the
other. This enabled collaborators to understand roughly where their partner was
looking and how much area the other person could see at any point in time.
• Head-gaze (FoV + Head-gaze ray): FoV frustum plus a ray originating from the
user's head to identify the center of the FoV, which provided a more precise
indication of where the other collaborator was looking.
• Eye-gaze (FoV + Eye-gaze ray): In this cue, we showed a ray originating from
the user's eye to show exactly where the user was looking.
21. Task
• Search task
• Find specific blocks together
• Two phases:
• Object identification
• Object placement
• Designed to force collaboration
• Each person seeing different information
• Within-subject Design
• Everyone experiences all conditions
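A within-subject design like the one above needs the condition order counterbalanced across pairs so that learning and fatigue effects do not favour any one condition. A common approach (a sketch only — the paper does not state that this exact scheme was used) is a balanced Latin square:

```python
def balanced_latin_square(n):
    """Balanced Latin square for n conditions (n even): row i gives the
    condition order for participant (pair) i. Each condition appears once
    in every position, and each first-order sequence A->B occurs once."""
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    left, right = 1, n - 1
    take_left = True
    while len(first) < n:
        if take_left:
            first.append(left)
            left += 1
        else:
            first.append(right)
            right -= 1
        take_left = not take_left
    # Remaining rows shift every entry by 1 (mod n).
    return [[(c + i) % n for c in first] for i in range(n)]

# Four conditions: Baseline, FoV, Head-gaze, Eye-gaze.
for row in balanced_latin_square(4):
    print(row)
```

With 16 pairs and 4 conditions, each of the 4 orderings would be used by 4 pairs.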
23. Measures
• Performance (Objective)
• Rate of Mutual Gaze
• Task completion time
• Observed (Objective)
• Number of hand gestures
• Physical movement
• Distance between collaborators
• Subjective
• Usability Survey (SUS)
• Social Presence Survey
• Interview
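The System Usability Scale (SUS) listed above has a standard scoring rule: ten 1–5 Likert items, where odd-numbered items are positively worded (contribute score − 1) and even-numbered items negatively worded (contribute 5 − score), with the sum scaled by 2.5 to a 0–100 range. A small sketch:

```python
def sus_score(responses):
    """Standard SUS scoring: responses is a list of ten 1-5 Likert ratings."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

# All-neutral responses (3s) give the midpoint of 50.
print(sus_score([3] * 10))  # → 50.0
```

Scores above roughly 68 are conventionally read as above-average usability.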
24. Data Collected
• Participants
• 16 pairs = 32 people
• 9 women
• Aged 20 – 55, average 31 years
• Experience
• No experience with VR (6), no experience with AR (10), no HMD experience (7)
• Data collection
• Objective
• 4 (conditions) × 8 (trials per condition) × 16 pairs = 512 data points
• Subjective
• 4 (conditions) × 32 (participants) = 128 data points.
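The rate-of-mutual-gaze measure from the previous slide can be operationalised as the fraction of session time during which both collaborators' gaze was on each other (or a shared target). A minimal sketch, assuming the logging system has already produced sorted (start, end) gaze-on-partner intervals for each user:

```python
def intersect(intervals_a, intervals_b):
    """Total overlap between two sorted, non-overlapping interval lists."""
    total = 0.0
    i = j = 0
    while i < len(intervals_a) and j < len(intervals_b):
        a0, a1 = intervals_a[i]
        b0, b1 = intervals_b[j]
        total += max(0.0, min(a1, b1) - max(a0, b0))
        # Advance whichever interval ends first.
        if a1 < b1:
            i += 1
        else:
            j += 1
    return total

def mutual_gaze_rate(gaze_a, gaze_b, session_length):
    """Fraction of the session in which both users' gaze overlapped."""
    return intersect(gaze_a, gaze_b) / session_length

# User A looked at B during 0-4s and 6-10s; B looked at A during 2-7s.
print(mutual_gaze_rate([(0, 4), (6, 10)], [(2, 7)], 10.0))  # → 0.3
```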
27. Results
• Predictions
• Eye/head gaze pointing better than no cues
• Gaze cues could reduce the need for pointing gestures
• Results
• No difference in task completion time
• Head-gaze/eye-gaze gave a greater mutual gaze rate
• Head-gaze rated greater ease of use than baseline
• All cues provide higher co-presence than baseline
• Pointing gestures reduced in cue conditions
• But
• No difference between head-gaze and eye-gaze
28. Example 3: Scaling Up (2020)
• IEEE VR 2020
• Large scale virtual conference
• 1965 attendees
Ahn, S. J., Levy, L., Eden, A., Won, A. S., MacIntyre,
B., & Johnsen, K. (2021). IEEEVR2020: Exploring
the first steps toward standalone virtual
conferences. Frontiers in Virtual Reality, 2, 648575.
29. Tools Used
• Mozilla Hubs
• 3D social VR
• Twitch
• Streaming
• Slack
• Text messaging
• Social Network
• Text-based
30. Analysis
•Subjective Survey
• Demographics
• Likert scale questions
• Conference effectiveness
• Media appropriateness
• Social Presence
• Open ended responses
• Thematic analysis
• Observation
• User behaviour
33. Thematic Analysis
• Look for common themes in the text from the open-ended questions
• Themes observed
• Fun and Playful Connections and Conversations
• Split Views on Posters in Hubs
• New Ways to Attend Conference Talks in Hubs
• Infrastructure Challenges
“The BOFs were super
enjoyable and a real hit for
learning and networking.”
“It was intimidating that there
were so few other people
there. Most often it was just
me and the presenter.”
“I think the experience would have
been vastly better with a better
connection”
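Real thematic analysis is iterative human work (familiarisation, coding, generating and reviewing themes), not automation. Purely as a toy illustration of how coded responses end up grouped under themes, here is a keyword tagger; the theme names and keyword lists below are hypothetical, loosely echoing the quotes above:

```python
# Toy illustration only: thematic analysis is done by human coders, and
# these theme/keyword mappings are invented for this example.
THEME_KEYWORDS = {
    "social connection": ["fun", "networking", "conversation", "enjoyable"],
    "attendance/presence": ["intimidating", "few people", "presenter"],
    "infrastructure": ["connection", "lag", "crash", "audio"],
}

def tag_themes(response):
    """Return every theme whose keywords appear in the response text."""
    text = response.lower()
    return [theme for theme, words in THEME_KEYWORDS.items()
            if any(w in text for w in words)]

print(tag_themes("The BOFs were super enjoyable and a real hit for networking."))
# → ['social connection']
```

In practice a tool like this might at most pre-sort responses for a human coder to review.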
34. Field Observations
• Process
• Moving between rooms
• Short interviews
• Observe and code behaviors
• Observation styles
• Broad observation – observe whole room
• Spotlight - focus on one participant for 10 minutes
• Categories of Behavior
• spatial (how attendees interacted in the room)
• interactions (how attendees interacted with each other)
• harassment (toxic interactions)
• communication (how attendees talked about their experience)
35. Field Observations
• Spatial navigation issues
• Difficulty in navigating space and interacting with each other
• Need to remove HMD to use keyboard
• Evolving interactions over time
• Learning interaction methods over time
• HMD use dropped by end of conference
• Limitations of social interactions
• Most users moving to less social platforms (Twitch)
• Audio issues – being heard anywhere
• Democratization of Academic Conferences
• Increased diversity and removal of status
• Significantly increased participation
36. Example 4: More Detail (2022)
• Evaluating large scale social VR
• Using wider range of measures
Moreira, C., Simões, F. P., Lee, M. J., Zorzal, E. R.,
Lindeman, R. W., Pereira, J. M., ... & Jorge, J.
(2022). Toward VR in VR: Assessing Engagement
and Social Interaction in a Virtual Conference. IEEE
Access, 11, 1906-1922.
37. IEEE VR 2021
• Fully online virtual conference – 1200+ attendees
• Tools
• VirBELA 3D platform – virtual avatars, desktop or HMD viewing
• Discord for chat/messaging
• Twitch/YouTube for video streaming
46. Key Lessons Learned
• There is a need for more Social XR evaluation studies
• Use a variety of subjective and objective measures
• Focus on the communication measures, not performance
• There are opportunities for new evaluation methods
• Adapt the tools to the number of participants
47. New Tools
• New types of sensors
• EEG, ECG, GSR, etc
• Sensors integrated into AR/VR systems
• Integrated into HMDs
• Data processing and capture tools
• iMotions, etc
• AR/VR Analytics tools
• Cognitive3D, etc
48. Sensor Enhanced VR HMDs
• HP Omnicept: eye tracking, heart rate, pupillometry, and face camera
• Project Galea: EEG, EMG, EDA, PPG, EOG, eye gaze, etc.
49. Multiple Physiological Sensors into HMD
• Incorporate range of sensors on HMD faceplate and over head
• EMG – muscle movement
• EOG – Eye movement
• EEG – Brain activity
• EDA – Skin conductance
• PPG – Heart rate
51. Cognitive3D
• Data capture and analytics for VR
• Multiple sensor inputs (eye tracking, HR, EEG, body movement, etc.)
53. Moving Beyond Questionnaires
• Move data capture from post experiment to during experiment
• Move from performance measures to process measures
• Richer types of data captured
• Physiological Cues
• EEG, GSR, EMG, Heart rate, etc.
• Richer Behavioural Cues
• Body motion, user positioning, etc.
• Higher level understanding
• Map data to Emotion recognition, Cognitive load, etc.
• Use better analysis tools
• Video analysis, conversation analysis, multi-modal analysis, etc.
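As one example of turning a physiological stream into a process measure during (rather than after) an experiment, skin conductance responses (SCRs) in an EDA/GSR signal can be counted as a rough arousal proxy. This is a simplified sketch: the threshold, assumed microsiemens units, and peak definition are my assumptions, and real pipelines filter and decompose the signal first.

```python
def count_scr_peaks(eda, threshold=0.05):
    """Count skin-conductance responses in an EDA sample sequence:
    local maxima whose rise from the preceding trough exceeds
    `threshold` (assumed microsiemens). Simplified illustration only."""
    peaks = 0
    trough = eda[0]
    rising = False
    for prev, cur in zip(eda, eda[1:]):
        if cur > prev:                      # signal rising
            if not rising:
                trough = prev               # remember where the rise began
            rising = True
        elif cur < prev:                    # signal falling: prev was a peak
            if rising and prev - trough >= threshold:
                peaks += 1
            rising = False
    return peaks

# Two clear rises of 0.2 and 0.1 microsiemens -> two SCRs.
samples = [0.10, 0.10, 0.20, 0.30, 0.25, 0.20, 0.20, 0.30, 0.22]
print(count_scr_peaks(samples))  # → 2
```

A per-minute SCR rate from each collaborator could then be aligned with task events or conversation turns for the multi-modal analysis mentioned above.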
54. Research Opportunities
• Types of Studies
• Need for increased user studies in collaboration
• More use of field studies, natural user experiences
• Use a more diverse selection of participants
• Evaluation measures
• Need a wider range of evaluation methods
• Establish correlations between objective and subjective measures
• Better tools
• New types of physiological sensors
• Develop new analytics