Wihte paper : Remote assistance in Mixed Reality

whIte pAper
bY ImmerSIon
ReMOte
assIstaNCe
IN MIxed
RealIty

What is
Mixed Reality
APPLICATION DOMAINS FOR
REMOTE ASSISTANCE
3. TeleAdvisor: An example of
Remote Assistance in AR for the industry ............................. 17
3.1 Remote assistance challenges ................................................ 18
3.2 AR for remote assistance .......................................................... 18
3.2 Design and implementation
of TeleAdvisor ....................................................................................... 19
3.3 Evaluation and limitations
of the system ......................................................................................... 20
4. Remote assistance
in Augmented Surgery ............................................................... 21
4.1 Challenges of MR for surgery ................................................... 22
4.2 Remotely guiding a surgeon in AR ......................................... 23
The zoom: Groupware ....................................................................... 25
Introduction.............................................................................................. 4
1.AnhistoricperspectiveonMixedReality.................................7
1.1 Reality-Virtuality Continuum and Virtual Reality .............. 7
1.2 Hybrid displays: Augmented
Reality and Augmented Virtuality .................................................. 8
1.3 A taxonomy for Mixed Reality displays .............................. 9
2.Whatis(really)MixedReality?..................................................11
2.1 MR beyond visual perception .................................................. 11
2.2 Blurred borders between AR and MR .................................. 12
2.3 Different definitions for different aspects of MR .............. 13
2.4 A framework for MR systems ................................................. 14
The zoom: Cooperation vs Collaboration ................................. 15

VISUALLY REPRESENTING
USeRS AND THEIR ACTIVITY
OUT-OF-THE-BOX
CONCEPTS
5. Visual cues for social presence in MR ................................. 27
5.1 Different aspects of presence ................................................... 27
5.2 Improving collaboration using visual cues ........................... 29
6. Avatar and telepresence of remote tutor ........................... 31
6.1 Industry 4.0 and
machine taks.......................................................................................... 32
6.2 Visually representing the remote user .................................. 33
7. Mini-Me: adding a
miniature adaptative avatar ..................................................... 35
7.1 Design of the Mini-Me system .................................................. 36
7.2 Experimental results
for cooperative and collaborative tasks ....................................... 37
The zoom: Full- immersion avatars ................................................ 39
8. Using light fields for
Hand-held mobile MR ................................................................. 41
8.1 MR light fields and system calibration .................................. 41
8.2 Adding annotations
into the shared workspace ............................................................... 43
8.3 Evaluating the usability of the system ................................... 41
9. Facilitating spatial referencing in MR ................................. 45
9.1 Letting the MR system
handle the referencing process ....................................................... 45
9.2 Evaluating the prototype ............................................................ 47
10. Using virtual replicas for
object-positionning tasks .......................................................... 49
10.1 Design of the two
interaction techniques ........................................................................ 50
10.2 Comparing virtual replicas
to a 2D baseline technique ............................................................... 50
About US................................................................................................. 53
Acronyms and definitions.................................................................. 54
References ............................................................................................. 55

P4
INtROdUCtION
Working with others has always raised multiple ques-
tions. What is the optimal process to take together
the best decisions? Which solutions can facilitate the
communication between participants? How to handle
conﬂicts and contradictory opinions?
Answering such questions is already complex when
users are co-located, but it becomes even more tricky
when it is not the case.
Remote assistance scenarios imply two main characte-
ristics: 1) users do not share the same physical space
and 2) they do not have the same knowledge and capa-
cities. On the one hand, local users can physically act on
their surroundings, but need help because they do not
know how to proceed with the task they have in mind.
On the other hand, remote helpers have the expertise
to perform this task, but cannot achieve it because they
are not physically present at the corresponding location.
Remote assistance is thus closely linked to remote gui-
dance.
ACM copyright for selected papers: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit
is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from permissions@acm.org.

P5
The recent Covid-19 pandemic and technological pro-
gress increased further the already growing interest for
remote assistance. In particular, Mixed Reality (MR) is
currently explored as a promising tool for many applica-
tion domains like industry [43] and surgery [18].
The goal of this white paper is to give an overview of
the current research about remote assistance in MR. To
do so, we present 10 selected research articles on this
topic: 9 recent articles (from 2015 or more recent) and 1
legacy article (from 1994). These articles are regrouped
into four main sections. After discussing the notion of
MR (Section 1), we present two key application domains
for remote assistance: industry and surgery (Section 2).
Local worker Remote helper
Figure 1 : Example of remote assistance scenario in Mixed Reality.
Then, we focus on visual activity cues and methods to
represent remote users in order to facilitate guidance
and remote cooperation (Section 3). Eventually, we go
over a selection of out-of-the-box papers with unique
concepts or approaches (Section 4).
By adopting a Human-Computer Interaction (HCI) point
of view, we hope to inspire developers, designers and
researchers interested in remote assistance to bring fur-
ther Mixed Reality applications.

SECTION 1
An historic perspective 07
Current definitions of MR 12
MIXED
REALITY?
What is

P7
whAt IS mixed reALitY
aN hIstORIC PeRsPeCtIVe ON
MIxed RealIty
Article : Milgram, P., & Kishino, F. (1994). A taxonomy
of mixed reality visual displays. IEICE TRANSACTIONS
on Information and Systems, 77(12), 1321-1329. PDF
freely accessible here.
Before focusing on remote assistance, it is necessary
to precise what lies behind the terms Mixed Reality
(MR). Many technologies like Augmented Reality (AR)
and Virtual Reality (VR) are interconnected to MR.
To the point that it may be difficult to differenciate
these different approaches merging real and virtual
environments.
To address this confusion, we chose to start from the
historical point of view of the notion of MR. In the early
90s, Milgram and Kishino proposed a first definition of
MR based on the Reality-Virtuality Continuum (Figure 1).
This vision had a ground-breaking impact on different
research communities, and to this date is still one of the
most used definition of MR. This is particularly true for
the Human-Computer Interaction (HCI) community [53].
In this section, we start by presenting this Continuum
to define the main existing technologies related to MR.
Then, we detailed the definition of MR based on it and
the taxonomy proposed by the authors to classify MR
displays.
1.1 Reality-Virtuality Continuum and Virtual
Reality
While the democratization of cheap Head-Mounted
Displays (HMD) only started a few years ago, VR is far
from being a new technology [10]. Immersing the user
inside a synthetic environment was concertized as early
as the mid 1950 with the Sensorama device. In 1960,
the first VR HMD was created. And only five years later,
Sutherland proposed the concept of Ultimate display, a
fictional technology allowing to simulate a virtual world
so realistically that it could not be distinguished from the
actual reality [56].
DID YOU KNOW?
The intricate relations between concepts like
Augmented Reality, Augmented Virtuality,
Virtual Reality and Mixed Reality is a common
source of mistakes, even for professionals. One
example in many: the Usine Digitale magazine
published an article about the training of
surgeons in VR. The article was illustrated
with an AR headset, the Hololens… which is
considered as a MR headset by its constructor
Microsoft. Confusing indeed.
mixed
reALitY (mr)
reAL
enVironment
Augmented
Reality (AR)
Augmented
Virtuality (AV)
VIrtUAL
enVironment
Figure 2 : The Reality-Virtuality Continuum proposed by Milgram and Kishino.

P8
Nearly 30 years later, Milgram and Kishino start their
work with this notion of Virtual Reality (VR), where
the user is fully immersed in a computer-generated
environment and can interact with virtual objects.
The authors observe that beyond the technological
progresses, other paradigms have started to appear.
Some systems do not provide a total user immersion, but
rather merge real and virtual elements up to a certain
degree. To classify these systems, they propose to use a
continuous scale: the Reality-Virtuality Continuum. This
continuum can be divided in three sections: 1) the real
environment on one side, 2) the fully virtual environment
on the other side and 3) everything in-between (Figure
1).
The extremities of the continuum are straightforward.
On the one hand, the real environment corresponds to
the world we are used to, fully perceived by our bare
senses and without computer-based medium. On the
other hand, the virtual environment refers to a totally
synthetic world and is directly linked to VR. According
to Milgram and Kishino, everything in between this fully
real and fully virtual extremum belongs to Mixed Reality
[29]. In other words, they do not envision MR as a specific
technology but rather as a super set of technologies
mixing the real and virtual(s) environment(s) (Figure 2).
A fascinating phenomenon is that reading the Reality-
Virtuality continuum from left to right does not match at
all with the historical development of these technologies.
As mentioned at the beginning, VR systems appeared
first for technical reasons. AR systems appeared second.
Approaches in the middle of the spectrum like Remixed
Reality [32] only became possible recently. Moreover,
it is also interesting to note that the case of VR is not
fully clear. Milgram and Kishino placed VR at the right
extremity of the continuum, which leaves a confusion
about deciding if it can be considered as part of MR or
not.
1.2 Hybrid displays: Augmented Reality and
Augmented Virtuality
To complete this definition of MR, Milgram and Kishino
identified six classes of displays than they consider
as MR interfaces [29]. As shown in T1, these classes
cover a large range of technologies, from augmenting
videos on a monitor to partially immersive large displays
allowing tangible interaction.
The authors then link these classes of display to existing
technologies.Forinstance,theyexplainthattheemerging
terminology of Augmented Reality mainly corresponds
to class 3 displays. This observation would need to be
nuanced nowadays. Over the last two decades, mobiles
devices have tremendously evolved, allowing the rise
of AR with mobile devices such as smartphones and
tablets instead of HMDs only. Interestingly, Milgram and
Kishino also report that they also started to consider
displays from classes 1, 2 and 4 as AR displays in their
lab. They argue that the core principle is the same for all
displays: augmenting real scenes with virtual content.
While this still holds for class 4 displays nowadays, it
may not be the case for classes 1 and 2.
a) Augmented Reality a) Augmented Virtuality,
Remixed Reality
c) Virtual Reality
Real environment augmented with
virtual elements
Virtual environment augmented with
real elements
Fully virtual environment
Mixed Reality
Mixed Reality
Figure3 : Main technologies mixing real and virtual environments

P9
On the contrary, the terms
Augmented Virtuality did not exist in
the 90’s literature and was proposed
by the authors. The concept of
augmenting a virtual world with
real elements had just started to
be explored in early studies [34].
Many technological have been
made since, and current video see-
through HMDs like the Varjo 3 [38]
have started to blurry the limits of
AR and AV, as predicted by Milgram
and Kishino.
Besides, other studies have started
to explore new concepts based on
video see-through, such as Remixed
Reality [32].
DID YOU KNOW?
The opposite of Augmented Reality is also based on a video-see-through approach!
Called Diminished Reality, it involves masking real-world elements by filtering them before
displaying the scene on a video see-through device. This allows to remove, replace objects or
see through obstacles [16].
# Description Current equivalent nowadays
1 Monitor displays where a video of
an environment is augmented with
virtual images overlaid on it.
Using a video-editing software and
seeing the result on a monitor.
2 Same as #1, but using an HMD Watching an edited video with an
HMD
3 See-through HMD. The user
perceives directly the current real
environment, which is augmented
with virtual objects
Optical see-through AR
4 Same as #3, but with video see-
through HMD. The user cannot see
the real world directly but watches
a real-time video reconstruction of
it based on cameras input
Video see-through AR
5 Completely graphic displays, on
which videos of real elements are
overlaid.
Augmented Virtuality
6 Completely graphic, partially
immersive displays (for instance:
large screens) where the user can
use real-world objects to interact
Tangible interaction on a tabletop,
tangible AR
Table 1: The 6 classes of MR display identified by Milgram and Kishino.
1.3 A taxonomy for Mixed Reality displays
In the rest of their paper, Milgram and Kishino refine
the classes of displays into a complete taxonomy. This
taxonomy is based on three axes: the Extend of world
knowledge, the Reproduction fidelity and the Extend of
presence metaphor.
The Extend of world knowledge axis refers to amount
of knowledge possessed by the system about the
environment. In some basic cases, the system does
not need to know anything about the environment. For
instance, a basic video-editing software can consider
the video frames as black-box images, leaving the user
freely superimpose virtual elements on it and handle
visual clues like occlusion and shadows. On the contrary,
VR systems fully know virtual world they generated.
Similarly to the Reality-Virtuality Continuum, many AR
systems can be placed somewhere in between since they
need to “understand” and model the real environment to
be able to correctly display virtual objects within it. As
shown in Figure 4, the authors refer to the intermediary
state with the Where and What keywords, which
correspond to the knowledge of locations and objects/
elements respectively.

P10
worLd pArtiALLY
modeLLed
worLd
UnmodeLLed
monoScopic
VideoS
monitor -
bASed
coLor
VideoS
hd
VideoS
StereroScopicS
VideoS
LArge Sceen
Where/What
Simple wireframes
Monoscopic
imaging
Shadings, texture,
transparency
Panoramic
imaging
Surrograte
travel
Where + What
Real-time, high-fidelity
3D anmations
Real-time
imaging
worLd fULLY
modeLLed
3d
hdtV
hdmS
Figure 4 : Extent of World Knowledge axis in the taxonomy of MR displayed by
Milgram and Kishino.
Figure 5 : The two other axes of the taxonomy. A) Reproduction Fidelity axis. B)
Extend of Presence Metaphor axis.
The two other axes are more straightforward. Milgram
and Kishino present them as two distinct ways to
convey realism: image quality for Reproduction Fidelity
and immersion for Extend of Presence Metaphor.
One could argue that the Reproduction Fidelity axis
(Figure 5a) may need to be updated to better match
with the current technologies. Nowadays, even cheap
hardware can handle stereoscopy or high-quality
rendering. However, the principle behind this axis still
holds since we have not yet reached the “ultimate
display” where virtual elements would be too realistic
to be distinguished from real ones. In fact, current
techniques often involve clever tricks such as foveal
rendering to maximize the image quality only where it is
strictly necessary (i.e. where is user is currently looking
at).
Similarly, the idea behind the Extend of Presence
Metaphor (Figure 5b) is still perfectly relevant to this
day. The feeling of presence is still an active research
topic in MR [33]. However, researchers have also
started to explore other approaches to increase this
feeling of immersion that go beyond visual perception,
as discussed in the next chapter.
a
b
KEY TAKE-AWAYS
The historic definition of MR: everything in the middle of the Reality-Virtuality Continuum. In other words, a
set of technologies mixing the real and a virtual environment, including Augmented Reality and Augmented
Virtuality.
This definition and the taxonomy proposed by the authors are focused on visual displays, and thus consider
only visual perception.

P11
whAt IS (reALLY) mixed reALitY
Article : Maximilian Speicher, Brian D. Hall, and
Michael Nebeling. 2019. What is Mixed Reality?. In CHI
Conference on Human Factors in Computing Systems
Proceedings (CHI 2019), May 4–9, 2019, Glasgow,
Scotland, UK. ACM, New York, NY, USA, 15 pages.
https://doi.org/10.1145/3290605.3300767.
Did we not just explain what Mixed Reality is? Yes… And
no. As mentioned in the previous section, defining MR
by using the Reality-Virtuality Continuum is probably
the most classical approach. However, that does not
mean that it is the best or the only existing one. In
fact, defining precisely and completely what is MR is
so complex that there is to this date no consensus on
this notion. This is true both in industry and academia
[53]. Despite the recent technological progress and
an increasing popularity, the exact boundaries of MR
remain a subject of intense discussions.
The problem is far from being a simple rhetorical
argument of experts around a given terminology.
Defining the limits of MR implies considering major
aspects of mixing real and virtual environments such as
the possible level of immersion and the user interactions.
Therefore… What is (really) Mixed Reality? This question
is at the heart of the second paper presented in this
book, a work by Speicher et al. recently presented at the
CHI 2019 conference. By conducting interviews with
experts and a literature survey, the authors identified
the currently co-existing definitions of MR and proposed
a conceptual framework to classify the different aspects
of MR systems. In the following, we present this work
and use it to precise the definition of MR that we will use
hereafter in this book.
2.1 MR beyond visual perception
Speicher et al. start their work by highlighting this
absence of agreement around the notion of MR and the
limitations of Milgram and Kishino’s definition. Its main
weakness is that it only relies on the visual perception.
As presented in the previous section, the authors
considered how realistic is the displayed environment
and to which extend is the user visually immerged in it.
This approach can be explained by the dominance
of visual perception for humans. Nonetheless, this
observation should not obscure the fact that Mixed
Reality could also imply mixing real and virtual
environments by using our other senses. This is the
case for haptics that has been expensively studied,
especially in VR [9]. For instance, having a realistic
haptic feedback is crucial for the training of surgeons
in AR and VR [49]. The interns must develop both their
dexterity and learn to recognize the haptic feedback of
different kind of surfaces and organic tissues. A few
studies also considered other senses such as audio [11]
and smell [47] in the context of MR. Mixing virtual and
real environments thus goes much further than inserting
virtual objects into the field of vision of users.
DID YOU KNOW?
Some studies even explored augmenting the sense
of taste. For instance, Niijima and Ogawa proposed
a method to simulate different virtual food textures
[40]. Ranasinghe and Do were also able to virtually
simulate the sensation of sweetness using thermal
stimulation [48].
Will a complete virtual dinner be possible in a few
years?

P12
2.2 Blurred borders between AR and MR
Speicher et al. also report the result of interviews
conducted with 10 experts from academia or industry.
Theseinterviewsleadtoaclearconclusion:thedifference
between AR and MR is far from being straightforward.
Some of the interviewed experts argued that MR is a «
stronger version» of AR, in the sense that the blending
between virtual and real is seamless.
They explained that in the case of MR users can interact
with both real and virtual content. On the contrary, other
experts argued that this is also the case in AR. They
even declared that MR is mainly a « marketing term ».
This vision may come from the efforts of companies
like Microsoft promoting the usage of the terms MR to
describe their own product like the Hololens HMD [35].
Is one version more relevant that the other one? Previous
definitions of AR do not necessarily help to decide. For
instance, Azuma defined 3 criteria for AR systems: [3]
• The combination of virtual and real,
• The possibility to interact in real time,
• The registration in three dimensions.
Nonetheless, if it is possible to interact in AR… When
does an AR system becomes an MR system? Is there
a specific threshold in terms of interaction techniques?
There is no clear answer to this question for now.
However, Speicher et al. found that most experts
agreed at least on something: the spatial nature of MR.
More precisely, they referred to the notion of spatial
registration (Figure 6).
A virtual object is spatially registered when its spatial
position considers all the features of the 3D environment.
Visual clues like occlusion with other physical objects
are respected. In other words, the virtual object is
positioned withing the reference of the 3D world (Figure
6b). On the contrary, a virtual object defined according
to the reference of its display (Figure 6a) is not spatially
registered.
Figure 6 : The notion of spatial registration. a) The virtual panel in blue is displayed according to the tablet screen
only. b) Virtual elements have a coherent spatial position within the 3D environment: they are spatially registered.
b
a

P13
2.3 Different definitions for
different aspects of MR
To further explore these questions,
Speicher et al. conducted a literature
review to identify the different
usages of the terms Mixed Reality.
The authors analyzed 68 papers
from well-known Human-Computer
Interaction (HCI) conferences such
as CHI, ISMAR and UIST. Overall,
Speicher et al. identified 6 co-
existing definitions of Mixed Reality
(Table 2).
As mentioned by the authors,
the goal of this study was not to
determine which of these definitions
is the best. On the contrary, they
aimed at highlighting the complexity
of wrapping all the aspects of
MR into a single vision shared by
all actors working with MR. The
authors explain that the priority for
MR actors is to clearly communicate
their understanding of what is MR.
# Name Description
1 The most common definition, from
the Reality-Virtuality Continuum
by Milgram and Kishino [29]. MR is
seen has the superset regrouping
every technology in-between the
real and virtual environment..
2 MR AR MR being a synonym for AR. Some-
times also noted as “MR/AR”.
3 MR = AR + VR MR as the combination of AR and
VR parts inside a system.
4 Collaboration In this definition, the emphasis is
put on the collaboration between
AR and VR users, potentially in
different physical locations.
5 Alignment of virtual and real
environments
Augmented Virtuality The syn-
chronization between two different
environments, one being physical
and the other one virtual.
For instance, Yannier et al. pro-
posed a MR system where a Kinect
observes physical block towers on
a table during an earthquake and
reﬂect in real time their state on
digital towers [61].
6 MR as “stronger” AR MR defined as a spatially registered
and interactive AR
Table 2: The 6 co-existing definitions of MR identified by Speicher et al
Name EarthShake MR game
by Yannier et al
Number of
environments
Many (physical bock towers and virtual copies)
Number of
users
One to many
Level of
immersion
No immersion (physical towers)
Partial immersion (virtual towers)
Level of
virtuality
Not immersive (physical towers), Partially immersive (virtual
towers)
Degree of
interaction
Implicit and explicit interaction
Input Motion (shaking the towers)
Output Visual (seeing which tower falls first after the earthquake
Figure 7: The framework of Speicher et al. Left: framework dimensions with respect to the
work of Yannier et al. mentioned in Table 2. Right: Setup of the EarthShake MR game [61],
picture courtesy of the authors.
Continuum-based
MR

P14
KEY TAKE-AWAYS
So, what is Mixed Reality? It depends, multiple definitions co-exist. But MR systems go beyond visual
perception only.
In the following, we will use MR as a mix of definitions 3 and 4. Collaboration is of course a crucial aspect
since we focus on remote assistance. Besides, we will mainly consider AR/VR technologies and interactions.
2.4 A framework for MR systems
To help classifying the existing MR systems independently of a global definition, Speicher et al. proposed a
conceptual framework based on 7 criteria, as shown in Figure 7.
The 5 initial criteria include dimensions like the number of environments and users, the level of immersion and
virtuality and the degree of interaction. Two other criteria were then added to consider the input and output of the
system. Such a framework aims at being general enough to be usable of many MR systems as possible and not
only on precise use-cases.
Can such a framework solve our initial question about what is MR? Probably not, but it is not its objective. Speicher
et al. end their paper by insisting on the importance of building a common unambiguous vocabulary to characterize
MR systems. Their framework is a step in this direction.

P15
the zoom:
cooperAtIon VS coLLAborAtIon
In specialized conferences such as CSCW, a distinction is sometimes made between cooperating and collaborating.
While this distinction varies between different academic fields [15], it is interesting to consider within the scope of
remote assistance.
Of course, it is possible to be somewhere in between cooperation and collaboration or to switch from one to
the other. However, remote assistance often involves predefined roles. Therefore, we will prioritize the notion of
cooperation hereafter. When both cooperation and collaboration could be involved, we will use group work instead.
Cooperation implies that participants have predefined
roles: technical expert, presenter, spectator gest…
These roles directly impact interactions between group
members, defining the responsibilities and privileges for
each type of user. For instance, only the organizer may
have access to screen-sharing at the beginning of a
brainstorming session.
On the contrary, collaboration imply a group
working without hierarchy defined in advance. The
roledistribution can evolve freely between participants
during the meeting. This may encourage less formal
exchanges with a dynamic evolution of tasks.

SECTION 2
Industry: example of TeleAdvisor 17
Surgery and mixed reality 21
FOR REMOTE
ASSISTANCE
Application domains

P17
AppLIcAtIon domAInS for remote ASSIStAnce
teleadVIsOR: aN exaMPle OF ReMOte
assIstaNCe IN aR FOR the INdUstRy
Article : Gurevich, P. et al. 2015. Design and
Implementation of TeleAdvisor: a Projection-Based
Augmented Reality System for Remote Collaboration.
Computer Supported Cooperative Work (CSCW). 24,
6 (Dec. 2015), 527–562. DOI:https://doi.org/10.1007/
s10606-015-9232-7.
Now that we distinguish better the multiple aspects of
Mixed Reality, it is time to explore the second key-notion
of this book: remote assistance. Once again, this is a wide
notion with many potential meaning and applications.
And while narrating the tribulations of calling customer
service because the cat confused the Internet box with a
mouse is tempting, well… We will focus on professional
cases of remote assistance instead.
Many studies have considered surgery [1] and industry
[12,21]askeyapplicationdomainsforremoteassistance
scenarios. The complexity of these environments and
tasks plays a major role here. Surgeons often do not
have extended knowledge of a given procedure for a
complex case and need some advice from colleagues
that are specialized on it. Technicians cannot know
every aspect of each machine in the factory. Instead of
relying on cumbersome paper documentation, MR is in
itself a promising solution. MR can be used to support
training and guidance purposes [18, 43]. However, when
no pre-existing guidance solution is available (which is
currently often the case), remote assistance given by an
experienced colleague is a powerful tool to save time
while reducing errors and accidents.
This chapter presents a study about an AR system for
remote assistance: TeleAdvisor [21], illustrated in Figure
8. In their work, Gurevich et al. detail the benefits and
challenges of AR for remote assistance in industrial
scenarios. The system they propose is an interesting
entry point on these questions.
Figure 8: TeleAdvisor, an AR remote assistance system by Gurevich et al [21]. Picture courtesy from the authors.

P18
Figure 9: Example of deictic gesture: a user in AR pointing at
a component on a virtual machine.
3.1 Remote assistance challenges
The first characteristic of remote assistance noted by
the authors is its asymmetry. The remote helper has
knowledge of the task to be performed but cannot
access the physical environment while the local worker
is inside this environment but does not know how to
proceed. This difference raises major constraints on the
cooperation. Only the local worker can concretely act
to achieve the physical task that must be performed.
Besides, the two users are not in the same location, and
thus cannot see each other.
Studies have shown that beyond having audio
communications, sharing a common visual view is
crucial for cooperative tasks [31]. Being able to see
the gestures of the other user significantly facilitates
the communication. This is typically the case for deictic
gestures, i.e. gestures made to designate a specific
location or point at a given object (see Figure 9). The
“Put-that-there” metaphor [8] is a well know HCI
example of voice command linked with a deictic gesture.
This kind of multimodal interaction is very common in
everyday life scenarios, especially when working within
a group.
DID YOU KNOW?
Most of the times yes, the local worker is the only
one able to interact with the physical environment
to perform the task. However, this may change in
the incoming years thanks to Digital Twins [25].
This technology allows to recreate an exact virtual
replica of a given physical system (for instance,
a building). Many sets of sensors can be used to
make sure that the virtual replica reﬂects in real
time the state of its physical twin.
What is the link with remote assistance? In fact,
the data connection between the twins goes
in both directions. This means that interacting
with the virtual version of a production line could
also impact its physical version by sending the
corresponding commands to the real machines.
Digital twins may thus allow a remote expert in MR
to directly inﬂuence the physical environment!
The question is now to determine how to make these
gestures perceivable for both users. Imitating common
videoconference tools by adding a video screen in each
workspace could seem a suitable solution. Nonetheless,
it would force the local worker to visually focus on both
the task to be performed and the distant screen. Such
a divided attention task a heavily impact performance.
Displaying the video on a mobile device may solve this
issue, but at the cost of mobilizing one local worker’s
hand.
3.2 AR for remote assistance
Augmented Reality is a promising technology for remote
assistance because it addresses many of these issues.
In particular, AR with an HMD leaves both user hands
free and allows to display virtual content in the current
Field of View (FoV) of the user. This approach may limit
divided attention side effects compared to a distant
monitor approach [54]. However, virtual objects can still
impact attentiveness because they can distract users
and prevent them for noticing real anomalies in the
workspace [16].
Besides, Gurevich et al. highlight in their state of the art
that with HMDs, cameras are directly linked with the
head position [11]. This allows the local worker to share
a real-time, movable view of the workspace, sure. But it
also means that the remote helper has no control over
this view and is constrained to look at the same location.
Head jittering (“shaky-cam” effect) and sudden head
movements can also disturb the other user.

P19
Mobile AR AR with HMD Projection-based AR
Benefits
Benefits Mobility
Mobility Hands free
Hands free
Mobility
Mobility
No jittering
No jittering
No equipment on user
No equipment on user
Limitations
Limitations Require at least 1 hand
Require at least 1 hand
Hand jittering
Hand jittering
Local worker dependent
Head jittering
Head jittering
No mobility
No mobility
Table 3: Comparison of the classical benefits and limitations of the
three main approaches for AR.
In their work, Gurevich et al. focus on projector-based
AR [21]. Early studies on this approach used mainly
pointers projected in the local worker environment
while later studies explored sharing hand gestures
and annotations from the remote helper [28]. With
TeleAdvisor, the authors aim at going one step further
by overcoming the lack of mobility of fixed projector AR
solutions. Of course, mobility is required when the local
user moves between different locations, for instance if a
technician needs to inspect machines in different rooms.
However, mobility is also a key feature to allow a remote
helper to get a different point of view inside a given
workspace without disturbing the local worker.
3.3DesignandimplementationofTeleAdvisor
The TeleAdvisor system is designed to achieve
independent view navigation. In other words, it allows
the remote helper to freely explore the workspace in
real-time, independently of the local worker. Gurevich
et al. wanted to reproduce the metaphor of a someone
looking above the shoulder of a user to see what he/
she is doing, providing visual guidance by pointing at
workspace objects and virtual drawing while orally
giving explanations and details [21].
To achieve such a result, the authors conceived a device
regrouping two cameras and a projector fixed on the
same 5-DOF articulated arm (Figure 10). This arm is
itself placed on a robotic wheeled workstation with a
laptop handling computations and communications. The
remote helper can control both the wheeled workstation
and the robotic arm to view the workspace and project
AR content from many different viewpoints.
A significant challenge with this kind of approach is
to make sure correctly synchronize the observed view
(from cameras) and the projection view. If the mapping
between the two is erroneous, the local worker will
see projected virtual objects with a significant spatial
offset… While the remote helper will be convinced to
be perfectly aligned with real world objects! This has of
course a non-negligible impact on performance, causing
confusion and creating errors.
Figure 10: The second TeleAdvisor prototype conceived by [21]. Picture courtesy of the authors.

P20
Figure 11: The Graphical User Interface of TeleAdvisor for the
remote helper. Picture courtesy from the authors.
With a non-movable system and a fixed workspace, this
issue may be addressed thanks to a careful calibration
[28]. However, TeleAdvisor is a mobile solution. It thus
requires a dynamic camera-projector views mapping
which consider in real time the distance between
the projector and the location virtual objects are
projected on. The authors propose an approach based
on an ofﬂine calibration (for stereo cameras and the
projector) followed by a real-time correction based on
homography. Discussing the technical implementation
of this procedure is out of the scope of this book but all
details can be found in the paper [21].
To control the robotic arm and change the point of view,
the remote helper has access to a 2D Graphical User
Interface (GUI) on a traditional computer. This approach
has the benefit of being straightforward to use. The
remote helper point at real objects by sharing a virtual
cursor, draw annotations and insert text and predefined
shapes into the workspace (Figure 11). This can be
achieved either by mouse and keyboard or by using a
touch screen.
The authors made the choice of having a single
exocentric view of the workspace instead of multiple
ones. Multiples views is a common paradigm in
Information Visualization [59], but the authors argue
than it may also be confusing. Gurevich et al. propose
positional bookmarks instead: the ability to save
KEY TAKE-AWAYS
TeleAdvisor is a great example of remote assistance
system based on projected AR. It involves many
important cooperation features such as free hand
drawing and independent navigation for the
remote helper.
The immersion feeling is however limited for the
remote helper.
different camera locations in order to automatically go
back to these positions later on [21]. This concept is
close to the discrete viewpoint switching technique in
AR conceived by Sukan et al. [55].
3.4 Evaluation and limitations of the system
Experimental evaluations of TeleAdvisor suggest two
main results. First, the system seems to be a promising
tool for remote assistance. Participants were able to
use the system effectively and mainly focused on the
free hand tool for drawing annotations. Qualitative
results indicate that TeleAdvisor was judged intuitive
and very useful. Secondly, the authors also compared
the classical system (remote helper controlling the
view) with an alternate one where the local worker is
in charge of physically moving the arm. Results suggest
that it may be better to let the remote helper manage
the view. Such a phenomenon may seem intuitive, but it
needed to be confirmed experimentally and quantified.
Nonetheless, TeleAdvisor still comes with a few
limitations. While using a 2D GUI on a computer makes
the learning phase straightforward, it also drastically
limits the remote helper immersion. This limitation may
impact performance and usability in complex large-
sized workspaces or when the task to be performed
involves a lot of 3D interaction. Besides, the feeling of
telepresence is limited for the local worker, who cannot
see hand gestures or facial expressions made by the
remote helper.

P21
AppLIcAtIon domAInS for remote ASSIStAnce
ReMOte assIstaNCe IN
aUGMeNted sURGeRy
Article : Andersen, D. et al. 2016. Virtual annotations
of the surgical field through an augmented reality
transparent display. The Visual Computer. 32, 11 (Nov.
2016), 1481–1498. DOI:https://doi.org/10.1007/s00371-
015-1135-6.
Mixed Reality has impacted many industrial sectors
and the TeleAdvisor system [21] presented in the
previous chapter is an example of remote assistance
system among many. However, there is another large
application domain where MR is more and more
investigated and used: the medical field, and especially
surgery. Adding medical information into the operating
room or directly superimposing it on the patient’s body
is a very interesting feature to facilitate the work of
surgeons. Nonetheless, MR goes beyond this addition of
virtual content: it also bring further remote assistance
features. In some cases, surgeons need to use a specific
surgical procedure they are not fully familiar with.
The assistance of expert colleagues thus becomes a
valuable support.
In this chapter, we present a paper focusing on
remote assistance for surgical operations, also called
telesurgery. Andersen et al. proposed a collaborative
system where a remote expert surgeon can create
AR content like annotations and virtual instructions to
guide a local trainee [1]. This local trainee is inside the
operating room and visualizes the AR content thanks to
a tablet fixed above the patient’s body. An overview of
the system is available in Figure 12.
Before entering into the details of the system proposed
by Andersen et al., we will start by reviewing the
challenges of surgery in MR.
FiGURe 12
Figure 12: The envisioned system proposed by Andersen et al.: a tablet above the patient’s body acting as a
«transparent» AR display [1]. Images courtesy from the authors.

P22
4.1 Challenges of MR for surgery
Surgeries are long, complex and stressful procedures. In
addition to the complex technical gestures to perform,
surgeons must adapt their work to the differences of
each patient’s body and sometimes take life-or-death
decisions on the fly [13]. Surgeons thus have a significant
cognitive load during operation. Anything breaking their
concentration or their feeling of being in control should
be avoided or removed from the operating room (OR).
The main constraint is the asepsis: every object in
contact with the patient must have been sterilized
beforehand by following the appropriate procedure. To
reduce as much as possible the risks of infection for the
patient, all medical team members also have a specific
sterilization phase before entering the OR. For instance,
surgeons wear sterile gloves and cannot touch non-
sterilized objects. HMD-based AR is compatible with
OR requirements but is far from being a perfect solution.
Since the HMD cannot be fully sterilized (electronic
components would be damaged in the process), the
surgeons cannot touch it after putting it. This can be an
issue it the HMD needs to be repositioned on the head or
if some projections (blood for instance) reach the HMD
glass or sensors. Besides, wearing an HMD for extended
periods of time (up to several hours) can increase the
physical tiredness of surgeons.
AR has a lot of potential to support the work of surgeons because it facilitates the access to medical information. It
allows to visualize virtual content such as patient’s data and radios within the patient area. Virtual instruction and
medical information can also be directly superimposed on the patient’s body to guide surgical gestures. Instead of
going back and forth between the patient and a distant monitor, the surgeon can thus visually focus only on the
patient [5]. Nonetheless, the OR context imposes several strict constraints to surgeons which directly impact MR
usage, as detailed in Table 4.
DID YOU KNOW?
The different technologies of MR can be useful for
surgery, but in different contexts [18]. For instance,
VR can be useful for training, teaching and
patient reeducation purposes. However, during an
operation, surgeons need to focus on the patient’s
body. That is why hereafter we mostly discuss
about AR.
DID YOU KNOW?
These constraints did not stopped Microsoft
from promoting the usage of the Hololens 2 for
augmented surgery. After a first operation in AR
at the end of 2017, the company organized in
February 2021 a 24 hours marathon of augmented
surgeries. Surgeons wearing the HMD could see
holograms in the OR and exchange in real-time
with remote colleagues. Followed by 15000 viewers
from 130 countries, the event is a clear sign of the
current interest about MR for surgery.
Contraint Name Description
Asepsis
Asepsis
OR environment
OR environment No contact with non-sterile object
No contact with non-sterile object
No hand-held device (tablet, controllers…)
No hand-held device (tablet, controllers…)
Cannot reposition or clean the HMD with sterile gloves
Cannot reposition or clean the HMD with sterile gloves
No body-touch interaction techniques like [2]
No body-touch interaction techniques like [2]
High
High
luminosity
luminosity
OR environment
OR environment Holograms may be harder to see
Holograms may be harder to see
Gestures may be more difficult to detect
Gestures may be more difficult to detect
Ambient
Ambient
noise
noise
OR environment
OR environment Harder to use voice commands: noisy medical machines,
Harder to use voice commands: noisy medical machines,
medical team communications, surgical masks…
medical team communications, surgical masks…
High stress
High stress
and cognitive
and cognitive
load
load
Surgical task
Surgical task Surgeons need to focus on the patient, not on MR content
Must not disturb surgical workflow
Must not disturb surgical workflow
Must be able to turn off MR at any time
Must be able to turn off MR at any time
Need of
precision
Surgical task
Surgical task Requires accurate real-time tracking and positioning of
Requires accurate real-time tracking and positioning of
virtual content (order of magnitude: a few mm, sometimes
virtual content (order of magnitude: a few mm, sometimes
less)
less)
Table 4: Overview of the main constraints in the Operating Room (OR) and their consequences on MR usage.
Need of
precision

P23
What about remote assistance? As mentioned before,
surgeons may need to seek the help of colleagues for an
operation. It can be because they are facing a specific
patient profile or because they need to perform a state-
of-the-art procedure they are not fully familiar with.
Since can for instance happen in rural hospitals where
surgeons perform less operations. Training surgeons
is difficult, costly and time-consuming while surgical
techniques are evolving quickly. Real-time guidance is
thus a valuable tool, especially compared to transferring
patients to another hospital with specialists.
4.2 Remotely guiding a surgeon in AR
The paper of Andersen et al. focus on this need of remote
cooperation in the operation room [1]. The authors
envision an AR system based on tablets, as illustrated in
Figure 12. To respect the asepsis constraints, the tablet is
not hand-held by the surgeon but fixed on a mechanical
arm above the patient. Thanks to its camera, the tablet
acts as a “transparent” device through which the
patient body can be seen. In addition, virtual AR content
created by the remote expert is displayed to guide the
surgeon. The surgeon do not need to hold the tablet,
which is suitable in the OR (hands free, no contact with
sterile gloves). However, if really needed, the position
and orientation of the tablet can still be adjusted.
The remote expert receives the real-time video stream
from the local surgeon’s tablet and can see the patient
body. This remote expert is not in the OR and is thus
not affected by its constraints. The authors proposed
a touch-based interface on a tablet to create virtual
annotations. They implemented three main hand
gestures to draw different types of annotations,
representing different surgical gestures and tools
(incision, stitch and palpation) [1]. An overview of the
corresponding GUI is available in Figure 13.
As mentioned in Table 4, operations require precision
from surgeons and a high manual dexterity. This
crucial need of precision is already hard to match in a
static context. However, the respiration cycle creates
movements within the patient’s body. Soft tissues may
be particularly difficult to track in real time because they
are easily deformed. To address this issue, the authors
proposed an annotation anchoring approach based on
reference video frames with OpenCV (for more details,
please refer to the paper).
Andersen et al. conducted three evaluations on their
prototype [1]. First, they conducted a performance test
to check the robustness of their annotation anchoring.
Then, they collected qualitative feedback during a
usability study with two surgeons. Finally, the authors
compared their AR system to a classical monitorbased
one during a pilot study (participants did not have a
medical background). An overview of experimental
results is available in Table 5.
Figure 13: The system interface for the remote expert. Images courtesy from the authors.

P24
Test Observed results
Performance test
Performance test System fairly robust to tablet
System fairly robust to tablet
movements and occlusions
movements and occlusions
Deformations on patient’s body
Deformations on patient’s body
cause much more issues
cause much more issues
Usability study
Usability study
with surgeons
with surgeons
Surgical field area: need lowest la-
Surgical field area: need lowest la-
tency and highest framerat
tency and highest framerate
The GUI for remote expert was per-
The GUI for remote expert was per-
ceived as complex
ceived as complex
Pilot study
Pilot study
Less visual focus shifts with AR
Less visual focus shifts with AR
Slightly slower with AR but much
Slightly slower with AR but much
better accuracy
better accuracy
Table 5: Overview of experimental results from the three evaluations
conducted by Andersen et al. [1].
Experimental results suggest that the proposed system
has potential for remote assistance in augmented
surgery. Preserving the visual attention of users (less
visual shifts) and allowing them to perform more precise
gestures are key desired outcomes of MR. The ability for
remote experts to add virtual annotations anchored on
the surgical field is also a great step forwards compared
to classical oral guidance only.
The system is fully implemented in the two tablets:
no computation is done by an external device. This is
a valuable design choice for surgery since the OR is
a resource-limited environment. Nonetheless, some
participants of the pilot study reported than the lack of
depth perception from the tablet screen increased the
task difficulty. It would be interesting to compare an
improved version of this prototype with a HMD-based
approach.
Moreover, qualitative feedback from the usability study
highlight the strong need to include surgeons in the
design of remote assistance systems. Surgeons are
particular users facing unique challenges and specific
constraints in the OR: generic designs, interfaces and
MR interaction techniques may not be adapted to the
surgical context.
KEY TAKE-AWAYS
Surgery is a key application domain for MR and remote assistance, but raises unique challenges related to
the operating room environment and the complexity of surgical procedures.
Instead of using an HMD, a tablet above the patient’s body is an interesting approach to visualize anchored
virtual content. This approach respects the constraints of surgeons and can facilitate remote guidance.

P25
the zoom:
groUpwAre
Groupware are a specific type of software designed for cooperative and collaborative tasks. They are built upon a
well-known statement: groups are complex social entities which are difficult to study. Many social and psychological
factors can inﬂuence the activity of a group: the location and personality of its members, the number of participants,
the chosen method to take decisions and handle conﬂicts…
It is thus difficult to design a piece of software adapted to a task with concurrent users. Yes, a Google Doc can do
the trick for a short school report, but have you tried to use it to redact a complete European project call with many
partners? You may soon realize that many key-features are missing to work together efficiently…
Test Description – Related features
Group awareness
Being conscious of the activity of
others
Observability of
resources
and actions
Observing, making public or filtering
elements
Level of coupling
Having the same unique view (“What
You See Is What I
See”) or different views for each user
See”) or different views for each user
And many others…
Many conceptual tools have been proposed in the
literature to analyze groupware [23, 52]. For instance,
ergonomic criteria regroup a set of properties like group
awareness and identification. The table has the right
below gives an overview of a few of them:
Remote assistance solutions can be considered as a subset of groupware. It may thus be valuable to havea look at
the design guidelines and past CSCW studies about groupware. They can simply give ideas or inform the conception
of the whole system!

SECTION 3
Visual cues and social presence 27
Avatars and telepresence 31
Mini-me: miniature avatars 35
USERS AND
THEIR ACTIVITY
Visually representing

VISUALLY representing users and their activity
VISUAL CUES FOR SOCIAL
PRESENCE IN MR
Article : Teo, T. et al. 2019. Investigating the use of
Different Visual Cues to Improve Social Presence within
a 360 Mixed Reality Remote Collaboration. The 17th
International Conference on Virtual-Reality Continuum
and its Applications in Industry (Brisbane QLD
Australia, Nov. 2019), 1–9.
Giving the feeling that local and remote users are
working next to each other in the same environment
can have a significant impact on user experience and
performance. In fact, this goes further than remote
assistance scenarios: remote communications in general
can benefit from “adding human into the loop”, of
getting closer to face-to-face physical exchanges. This
is particularly true in the current Covid-19 pandemic
context, were technological tools must be used to stay
connected to others. However, simple videos on a 2D
screen are far from giving the feeling of really being
together. How can we achieve such a result? MR is
a powerful tool for sure, but we are not quite able to
project perfect representation of ourselves like in many
SF books and movies..
In this chapter, we present a paper from Teo and al.
about this feeling of telepresence [57]. The authors
focused on a 360° panorama system in MR and
investigated different cues to increase this feeling and
to facilitate the collaboration between remote users.
While visual cues like shared pointers and drawings
may seem straightforward, it is interesting to see their
benefits and challenges in the case of a 3D environment
in MR. Besides, this paper regroups three different
experimental evaluations and extract valuable design
guidelines from them.
DID YOU KNOW?
Physicists are currently struggling to teleport away even a single molecule in a controlled environment.
Nonetheless, while the laws of Physics may be stubborn, virtual teleportation seems much more feasible in a
not so far future. Holoportation (for holographic teleportation) consists in capturing a volumetric video of users
in real time and displaying the corresponding hologram in the same shared environment. For an overview of
what it currently looks like, we recommend you this video (https://www.youtube.com/watch?v=Yy8XoPsbAk4
) from the i2CAT foundation about holoconferences.
Star Wars better watch out!
5.1 Different aspects of presence
Let’s start with a bit of terminology. The notion of remote
presence is complex and similarly to MR, multiples
definitions have been used over time to characterize it.
In their work, the authors focus on two different aspects:
• Spatial presence refers to the feeling of self-location
in a given environment and the perception of possible
actions within it. For instance, the MEC questionnaire
about spatial presence includes questions about the
feeling of being part of the environment, of being
physically present there and being part of the action
[58]. To some extends, it is similar to the concept of
immersion.
• Social presence (also called co-presence), on the
contrary, is focused on others. It refers to the feeling
than other users are “there” with us [41]. Social presence
is linked with the realism of the representation of other

P28
users and the feeling of being connected to them through
the medium. This aspect is particularly important for
collaborative tasks and remote assistance.
What about telepresence then? This term refers to the
notion of presence through a technological medium. It
is thus a broader concept englobing spatial and social
presence, as shown in Figure 14.
In their work, Teo et al. used a 360° panorama system
to study both spatial and social presence in the context
of remote collaboration in MR [57]. The local worker
wears a 360 camera mounted on top of an AR HMD (the
Hololens). This camera records a live 360 video of the
scene, allowing the remote helper in VR to be completely
immersed in the environment of the local worker (cf
Figure 15). Even better, it becomes possible to have
independent viewpoints between users: the remote
helper is not restricted to the current field of view of the
local worker but has access to the whole panorama.
Of course, this solution still has some limitations. The remote user can freely execute rotations (“turning the head”)
but is still restricted to the physical position of the local worker. It is thus not possible for the remote helper to
translate to other positions in the workspace to have a different point of view. Besides, it may be hard to be aware
of where the other user is currently looking at and 360 camera can also be affected by head tremors which might
impact user comfort.
Figure 14: Comparison of the notions of presence and telepresence.
Presence Telepresence
FiGURe 15
Figure 15: Overview of the 360 panorama MR system used in [57].
Local User
Remote User
VR controller
VR HMD
Hand tracker
360° Camera
AR HMD
LIVE 360 VIDEO

5.2 Improving collaboration using visual cues
Many studies have proposed visual cues to facilitate
the collaboration between remote users in MR. Here,
Teo et al. implemented several visual feedback for hand
gestures [36]. The remote user’s hand is represented
in AR by a virtual model in the FOV of the local user.
Pointing gestures are supported by drawing the
corresponding virtual ray. This line is drawn 1) from the
extremity of the local user’s finger or 2) from the head of
the VR controller of the remote helper. A dot at the end
of this ray can be used as a precise cursor. Moreover,
the remote helper can also draw annotation to guide the
local worker. These annotation are spatially registered
in the real environment and thus stay at the same fixed
position independently of user movements.
The authors also added two additional visual cues
related to users field of view. The View Frame is a
colored rectangle indicating the FOV of the other user
while the View Arrow always points at the View Frame.
This arrow becomes visible as soon as the view frame
is out of sight to help users know where the other is
looking at. Figure 16 gives an overview of the visual
cues implemented by the authors.
Figure 14: Visual cues considered by Teo et al. [57]. Image courtesy from the authors.
Teo et al. conducted three experimental studies to
investigate the effect of these visual cues on social and
spatial presence and user experience:
• Study A compared individual visual cues. Four
conditions were considered: no visual cues, virtual hand
only, pointing ray only and virtual hand + pointing ray.
In each condition, verbal communications were allowed
and View frames/arrows available.
• Study B focused on two conditions: virtual hand only
vs virtual hand + annotations. Contrary to study A, users
had to perform an asymmetric task this time: instead of
having the same role, users were either acting as worker
or helper.
• Study C explored users preference about the different
visual cues, allowing them to switch at will between
conditions.
Each time, users were performing a collaborative task
based on decorating or filling a bookshelf with different
objects.
Experimental results suggest than more than a single
visual cue, it is the combination of several cues that
matters. Such a combination can increase social
presence, partially improve spatial presence and reduce
subjective cognitive load [57]. The number of visual cues
thus plays a significant role. However, there may be a
threshold about this number as too much cues would
create visual occlusion. This is particularly true for AR
HMDs like the Hololens: many users reported that their
experience was negatively impacted by the limited size
of the augmented FOV.
DID YOU KNOW?
The field of vision of humans is close to 180°
horizontally and 125° vertically. Even if our gaze
converges on a precise point only at a time, the
different sectors of the peripheral vision still allow
us to perceive colors and movements. Therefore,
having augmented FOV of 30-40° (horizontally,
and even less vertically) with current AR headsets
represents a strong limitation. The challenge is
to address optical issues (distortions, luminance
of virtual objects, and so on) while preserving
use comfort (eye tiredness, bulkiness of head
equipment…).
Some studies have nonetheless investigated
the effects of having a large AR FOV [30], with
sometimes surprising results. For instance, it seems
than having a bigger FOV does not necessarily lead
to better performance in visual search tasks [30].

P30
Table 6: Design guidelines about MR systems for remote collaboration [57].
Design
guideline
Description
DG1
The size and number of visual cues must match with the size of
the FoV. The benefits of combining several cues may be lost if
they create too much visual occlusion.
DG2
Ensure that hand trackers can handle a suitable range of
different angles. The goal is to convey natural gestures.
DG3
The relevance of the different visual cues depends on the task
to be performed. However, a shared pointer can be a primary
cue for many tasks, followed by drawings and then hand
gestures.
Another interesting result is the
effect of user roles. Teo et al.
observed that the combination of
visual cues had an effect mostly on
the local user and not on the remote
user [57]. This role asymmetry
was also observed for subjective
preferences: local users were more
interested on easily-noticeable cues
while remote users preferred cues
that were easy to use. The nature of
the task may also inﬂuence results:
hand gestures may be more useful
for object manipulation tasks while a
shared pointer may be more suitable
for time-critical tasks.
The authors highlighted the fact that participants mainly used verbal communication to achieve the task. Visual
cues were only supporting oral exchanges and were mostly used when verbal communication was not efficient.
Finally, Teo et al. proposed three design guidelines based on their experimental results, as shown in Table 6.
It is interesting to notice than a similar study of Bai et al. [4], which also included a visual feedback for user eye-
gaze, also led to similar results. This ray-cast gaze cue even gave better results than the combination of other cues
for spatial layout and self-location awareness. Eye-gaze thus seems an promising candidate modality to support
with visual cues.
KEY TAKE-AWAYS
Combining several visual cues can support remote collaboration by increasing social and spatial presence.
These visual cues must be chosen with respect to the nature of the task and the roles of users to be
efficient. In particular, the local worker should have access in priority to visual cues as the remote helper
may not benefit as much from them.

P31
VISUALLY repreSenting USerS And their ActiVitY
aVataR aNd telePReseNCe
OF ReMOte tUtOR
Article : Cao, Y. et al. 2020. An Exploratory Study of
Augmented Reality Presence for Tutoring Machine
Tasks. Proceedings of the 2020 CHI Conference on
Human Factors in Computing Systems (Honolulu HI
USA, Apr. 2020), 1–13.
In the previous chapter, we focused on visual feedback
related to the current activity of users. However, instead
of simple activity cues, can we not directly represent
remote users? Would it not be better to display the whole
user body in MR in order to convey the whole non-verbal
communication? While this option seems reasonable, it
also raises many questions. What would be the best
approach to represent the user full-body? Is a realistic
representation always better than a “cartoonish” one?
What about visual occlusion with a human-sized
avatar?
To start exploring this topic of user representation, Cao
et al. recently proposed an experimental study of AR
presence for a remote tutoring task [12]. The authors
investigate different representations for the remote tutor,
from a simple location cue to a full-body avatar in AR. An
overview of the considered representations is available
at Figure 17. In addition to the valuable experimental
results based on different types of interaction, choosing
this paper also allow us to explore the domain of remote
tutoring, which shares many similarities with remote
assistance.
Figure17: The different remote user representations explored by Cao et al. [12]. Image courtesy from the authors.
(1) - a Video
(2) - b Non-Avatar-AR
(3) - c Full-body+AR
(4) - d Half-body+AR

P32
Figure 18: The different types of steps identified by Cao et al. [12]. Image courtesy from the authors.
6.1 Industry 4.0 and machine tasks
The authors start their work by highlighting the need of
adapted formations for Industry 4.0 workers. Industry
4.0 aims at transforming traditional factories into “smart
factories” thanks to the Internet of Things (IoT) and
autonomous systems including AI and Machine Learning
technologies (often called cyber-physical systems). In
other words, new processes and equipment are quickly
emerging and workers need to adapt to these changes.
In particular, they need to master new machines and
systems.
AR has been and is still currently explored as a
promising tool to facilitate learning phases and tutoring
sessions in industrial scenarios. Such scenarios include
maintenance training on machines and vehicles, facility
monitoring and mechanical part assembly [12]. AR also
allows to share complex 3D virtual objects of tools and
machine components [39]. For asynchronous scenarios
like tutoring sessions motivating this paper, experts can
create guidance content in advance by recording videos
of the procedure to be performed (see Figure 17a) and
DID YOU KNOW?
Despite its potential, MR is still far from being
deployed in a majority of factories. There are
of course technical limitations related to the
technology itself (size of FoV, limited computational
power on current HMD, and so on). But beyond
these, an appropriate network infrastructure is
also required to support MR usages for remote
assistance, especially in terms of latency and
bandwidth.
European projects like Evolved-5G aim at
overcoming this gap using 5G network capabilities.
Meanwhile, researchers already started working
on 6G [51]. See you in a few years!
by adding virtual instructions into the workspace. For
synchronous remote group work, sharing visual cues
allow to guide the attention of other users and to convey
their current activity.
Nonetheless, Cao et al. argue that many previous studies
and tutoring systems only consider local tasks, i.e. steps
that can be performed within arm’s reach [12]. In that
case, simply adding virtual content on the machine
may be enough to guide the local worker. Nonetheless,
machine tasks may require larger spatial movements
like moving inside the workspace or physically turning
around a large machine. Could adding an avatar
to represent this kind of movement help users? To
investigate this question, the authors explored here
types of steps illustrated in Figure 18:
1. Local steps can be performed with one hand and
without body-scale movement. For instance, pressing a
switch button within arm’s reach.
2. Body-coordinated steps imply a two-handed action
requiring body, hand and eye coordination to be
achieved. Turning two knobs at the same time while
monitoring their effect on a temperature gauge would
fall into this category.
3. Spatial steps require a significant navigation phase
before the machine interaction. An example of spatial
step could be to look for a specific tool a few meters
away in the workspace before using it on the machine.
What is the link between these three types of steps and
our chapter topic about avatars? In one sentence: the
authors explore different visual representations of the
remote tutor with respect to these different steps.
(2)
(1) (3)

P33
6.2 Visually representing the remote user
Cao et al. explored four different representations of
the remote user, as illustrated in Figure 17. The Video
condition represents the classical non-AR baseline. The
Non-avatar-AR condition represents the standard AR
approach, with virtual content superimposed on the
machine and a circle representing the location of the
remote user. The Half-Body+AR conditions builds on
the previous one by adding a partial avatar with only
a head, a torso and hands. Finally, the Full-Body+AR
condition completes this avatar by adding arms and
legs on it. Overall, there is thus a progressive increase of
visual information about the remote tutor.
To evaluate these representations and their impact on
social presence, the authors conceived a conducted
an experimental study based on the mockup machine
illustrated in Figure 19.
This testbed machine was conceived with two goals in
mind: 1) reproducing interaction metaphors found on
real machines (with physical widgets like knobs and
levers) and 2) allowing to test local, bodycoordinated
and spatial steps. Participants were invited to perform 4
sessions of machine tasks, where each session included
a mix the different types of steps.
Overall, the two avatar conditions (Half-Body+AR and
Full-Body+AR) were preferred over the two baselines
(Video and Non-avatar-AR) [12]. However, most
participants preferred the Half-Body+AR condition
because it created less visual occlusion. Quantitative
results support this preference: participants were
quicker when using this representation while keeping
the same level of accuracy. The authors suggest that
by masking a larger section of the machine, the Full-
Body+AR representation may have increased user
cognitive load and attention distraction.
Nonetheless, experimental results also highlight
the importance of the type of task. The Full-
Body+AR condition was perceived as the most useful
representation for body-coordinated tasks and gave a
better feeling of social presence. Using a representation
closer to a real human made the tutor more “friendly
and believable”. Meanwhile, the Non-avatar-AR was
the favorite and quickest condition for local tasks. In that
case, avatar representations provided little to benefit, or
where even judged cumbersome by some participants.
Figure 19: Experimental setup used by Cao et al. [12]. Edited pictures courtesy of the authors.
DID YOU KNOW?
Getting closer to a realistic human representation
may provide benefits, but be careful about the
uncanny valley [62]! This famous effect was
proposed as early as 1970. Mori theorized that
at a given point, humans would feel repulsion
in front of robots that are too close to humans
in terms of appearance and motions. Instead of
trying to be as realistic as possible, other studies
focus on other approaches to trigger a positive
emotional response about machines. For instance,
Herdel et al. explored giving cartoonish (and cute)
expressions to drones [22].
Overall, it thus seems that the type of task should be
considered by interaction designers as a major factor, as
summarized in Table 7.
Another observation made by the authors concerns
the tutor following paradigm. One the one hand, some
participants preferred staying “inside” of the tutor
avatar and reproduce its gestures synchronously. This
allowed them to have a first person view of the gestures
to be performed. On the other hand, other participants
preferred to stay apart from the tutor avatar. They
explained that they preferred this third person view
because they felt uneasy with colliding into a virtual
humanoid. This effect was already observed in a

P34
previous study [27]. Two guidelines may be extracted
from these observations:
• It is important to let users choose between a first person
and third person view of the remote user’s avatar.
• Spatially aware avatars avoiding “collisions” with
humans may increase the comfort of some users.
More generally, Cao et al. suggest following a user-
responsive design for tutoring [12]. Beyond having a
avatar aware of the movements of users, this also mean
adapting the AR content to the activity of users. For
instance, a recorded tutor avatar could be active only
when workers are looking at it to avoid disturbing their
attention.
Of course, classical cases of remote assistance is a
bit different since it is users form both sides are often
working synchronously. Nonetheless, most of the
findings of this paper can be generalized to other MR
remote group work contexts.
KEY TAKE-AWAYS
Avatars are useful to represent a remote user in MR.
They can increase performance, social presence
while reducing subjective cognitive load.
However, their size and level of visual details should
be considered carefully to limit visual occlusion.
The responsiveness to the activity of users is also
important.
Both first-person and third person views of the
remote user avatar can be useful, depending on
users.
Type of task Representation to consider Reasons
Local Only local AR content Avatar provide limited to no benefit
Better performance and comfort without it
Body-coordinated Full-body avatar Increased social presence
Spatial Half-body avatar Limited visual occlusion
Preferred overall
Overall Half-body avatar
Table 7: Visual representations to consider depending of the nature of the task.

P35
VISUALLY repreSenting USerS And their ActiVitY
MINI-Me: addING a MINIatURe
adaPtatIVe aVataR
Article : Piumsomboon, T. et al. 2018. Mini-Me:
An Adaptive Avatar for Mixed Reality Remote
Collaboration. Proceedings of the 2018 CHI Conference
on Human Factors in Computing Systems (Montreal
QC Canada, Apr. 2018), 1–13.
Mixed Reality allows to explore many dimensions and
new concepts for group work and remote assistance,
including scaling. Scaling virtual objects allows to
make them bigger to see specific details or to shrunk
them down to prevent them from occupying too much
space. In both VR and AR, scaling virtual objects is
now straightforward and natively available in existing
frameworks. In some cases, a whole environment can
be unscaled, for instance to obtain a World-In-Miniature
(WIM) [14]. More exotically, it is also possible to change
the scale of users. This can give the same impression
than having a WIM if the user becomes gigantic
compared to the environment. Or, on the contrary, MR
can be used to transform the user into the equivalent of
AntMan, lost in a world much bigger than usual.
Thammathip Piumsomboon, an HCI researcher working
on immersive technologies and AI, proposed several
studies built around these concepts of different scales
in MR. The paper presented in this chapter, Mini-Me,
explores an innovative concept: adding a second,
miniature avatar to complete the traditional human-
sized one. The remote user in VR is thus represented
by two avatars with different scale, location and
orientation. The Mini-Me avatar reﬂects the eye-gaze
direction and the gestures of the remote user and stays
within the local worker’s FoV. An overview of the system
is available in Figure 20.
Instead of playing on the amount of visible details of the
avatar like in the previous chapter [12], Piumsomboon
et al. thus play on its duplication and its size [46].
This approach is an interesting compromise between
increase the feeling of social presence without creating
too much visual occlusion.
Figure 20: Overview of the Mini-Me system [46]. The local user in AR can see two avatars conveying the activity of the remote
user: a human-sized avatar and a miniature one. Image courtesy from the authors.

P36
Figure 21: Different scales of the remote user in VR. a) VR user shrunk down, seeing the AR user (woman) as a giant. b) How
the AR user sees the miniaturized remote user (small avatar inside the dome). c) VR user (man) scaled up as a giant. Image
courtesy from the authors [46].
different perspective, as shown in Figure 21. The authors
thus applied a specific shader (toon shader) to the Mini-
Me avatar to make it more distinguishable from the main
avatar. A ring indicator is also displayed around the feet
of the Mini-Me and to indicates the direction of the VR
user. This additional feedback seems particularly useful
in this kind of setup because the VR user can move using
teleportation.
The authors also considered the size and the positioning
of the Mini-Me avatar in the local user’s FoV. Always
placing the miniature avatar in front of the gaze of the
user was too distracting. Placing it on one side of the
HMD screen was better but still created significant
visual occlusion. Therefore, Piumsomboon et al. made a
third iteration made a third design iteration. In this final
version, the scale of the Mini-Me avatar is dynamically
adapted by taking into account it’s distance to the
AR user. Besides, the surface where the user’s gaze
is projected also inﬂuences the miniature avatar. For
instance, if the user is looking at a table, the Mini-Me will
appear as if it was standing on it.
DID YOU KNOW?
Remote embodiment is a type of activity cues
based on physical states representations. They
convey body information like location, pose and
kinematics. Avatars are the one of the most
common approach for remote embodiment, but
they do not necessarily need to represent the
full body. For instance, Eckhoff et al. proposed a
pipeline to extract the tutor hand gestures on a
first-aid video and to display the corresponding
hands in AR over a training dummy [17].
7.1 Design of the Mini-Me system
The authors start by motivating their work: they state
that MR group work between AR and VR users may
become commonplace in the future. Their system is thus
designed for a local user in AR with the Hololens HMD
and a remote user in VR. Both users share the workspace
as the system targets room-scale scenarios. Mini-Me
builds on previous work about remote embodiment and
user miniaturization.
Piumsomboon et al. identified several issues and needs
related to group work in MR and used them to guide
the design of their system. We already mentioned
some of these problems: the limited size of augmented
FoV in current HMDs, the need to share non-verbal
communication cues or to know the location of remote
users… However, the authors also identified other
requirements like the need of transitions when an avatar
becomes visible to users or when it disappears. The
goal is to respect social conventions to avoid disturbing
users. Gracefully entering or exiting the user’s FoV? Yes.
Jumpscares? No thank you. Therefore, the authors added
a blue aura around the miniature avatar (see Figure 20).
This aura indicates in advance the proximity of avatar.
While the authors first intention was to use this halo
only during the avatar enters or exit the user’s FoV, their
realized that such a temporary visual effect was mostly
disturbing for participants. Therefore, they transform
the transient initial aura into a permanent visual cue.
Another specific requirement concerns the ability to
differentiate easily the two avatars of the remote user.
In a classical scenario, the difference of size between
the two would be an reliable indicator for the local user.
Nonetheless, the system allow the remote VR user to
scale up or down to explore the environment from a

P37
Figure 22: Mini-Me features. A) Reflecting eye gaze and pointing gestures of the remote VR user. b) Merging with human-sized
avatar when the local user is looking at it. c) Pinning the Mini-Me at a fixed location to prevent it from following the gaze of the
local user. Image courtesy from the authors.
DID YOU KNOW?
This puzzle game was inspired by previous studies
on group work based on AR [6]. More precisely,
they were using Tangible AR[7]: users were
manipulating squared tiles with a unique visual
pattern on each to be identified by the system.
Such a configuration is often not required
anymore to interact with virtual objects in AR.
However, using physical proxies to manipulate
virtual objects still has valuable benefits! Haptic
feedback, proprioception and object affordance
are valuable tools which are often absent from
mid-air interaction techniques.
Experimental results suggest that overall, the Mini-
Me avatar increase the social presence of the remote
user. Most participants found that Mini-Me was very
useful and 12/16 participants preferred it over that the
baseline. For instance, they reported that the miniature
avatar required “less looking back and forth between the
partner and the task space”. The adaptive positioning
of Mini-Me may thus help to limit divided attention.
Objective data also suggest that participants achieve
the cooperating task faster with Mini-Me than without
it. No quantitative result is reported for the collaborative
task as no time constraint was given to participants.
Nonetheless, the realism of the human-sized avatar
was also appreciated and judged positively with respect
to social presence. Moreover, participants paid similar
level of attentions to the remote user regardless of the
presence/absence of Mini-Me.
7.2 Experimental results for cooperative and collaborative tasks
To evaluate the benefits and usability of their system, Piumsomboon et al. conducted an experimental study [46].
In the baseline, the Mini-Me avatar is absent: only the human-sized avatar is visible and reflects the location
and actions of the remote user. Independently of the condition, the remote user was played by an experimenter:
participants always had the role of the local user in AR.
The study was divided in two tasks: a cooperative task (called asymmetric collaboration task by the authors) and
a collaborative task. In the cooperative task, the remote helper was guiding the participant to organize a retail
shelf following a precise configuration. Only the remote helper knew this configuration and only the participant
could place the AR objects on the shelf. This task thus perfectly reflected a remote assistance scenario. In the
collaborative task, participants had to solve an AR puzzle game together and had equal roles.
To convey the activity of the remote user, the Mini-Me reflects both gaze and pointing gestures. Its head is controlled
to always face the point the remote VR user is currently looking at. Similarly, the arms of the miniature avatar are
linked to the real-time tracking data from the VR controllers and the HMD. A visual ray is emerging from the finger
of the avatar also appears to reflect pointing actions, as shown in Figure 22a. Inverse kinematics are then applied
to obtain a coherent body pose.
Overall, the authors thus present a complete system with several interesting features about this combination of
avatars. The rest of their paper is focus on its evaluation through an experimental study.

P38
Criteria Cooperative task Collaborative task
Perceived task
difficulty
Lower with Mini-Me No observed effect
Subjective cognitive
load
Lower with Mini-Me No observed effect
Level of task focus No observed effect Higher with Mini-Me…
Task completion time Lower with Mini-Me No time constraint
Table 8: Summary of observed differences between the cooperative task and the collaborative task.
Interestingly, the authors also observed differences
between the two types of tasks, as reported in Table 8.
The authors then draw a few implications for the design
of groupware systems in MR. They encourage to use
Mini-Me or an equivalent to reduce the need to look at
the other user. This may be especially for cooperative
tasks around a spatial layout, which is the case of many
remote assistance scenarios. Having an adaptative
avatar: 1) conveying the eye-gaze and pointing
gestures and 2) visible at salient locations at any time
may facilitate the task to be performed.
Encouraged by these promising results, Piumsomboon
et al. envision to bring further their work by adding
facial expressions to the Mini-Me avatar [46]. They also
mention going further than visual feedback by adding
spatial sound to the system.
KEY TAKE-AWAYS
Adding a secondary, miniature avatar reﬂecting
the activity of the remote user is a promising
approach to support group work in MR (and
especially remote cooperation).
It is important to consider the positioning and the
scale of this secondary avatar. It should be visible
enough to increase user awareness without
disturbing the task.

P39
the zoom:
fULL- ImmerSIon AVAtArS
In several Science Fiction works, the notion of avatar can go to an extreme aspect: the complete immersion into
another body or mind. All it takes is a genius scientist, a complex machine with many strange lights and a bit of
scenarium to temporarily become someone else. This concept shares some similarities with the idea of the Ultimate
Display by Sutherland [56]: a system so perfect that it cannot be distinguished from the “classical”, unmediated
reality. It may not sound that much visionary nowadays as many authors and artists explored similar concepts, but
Sutherland wrote this report in 1965!
The concept of full-immersion is for instance present in
the Avatar. No, not the animated series about a ﬂying
arrow-head monk (which are cool too, but that is not the
point). Here, we are referring to James Cameron’s movie,
where the main character’s consciousness into the body
of a blue humanoid alien. With a bit of training, Jake is
soon able to control this body and has access to all its
five senses.
Another recent example of full-immersion can be found
in the recent Cyberpunk 2077 video game. Braindance
is a bit different as it involves a full immersion into
someone’s memory. Once again, the five senses are
involved as the subject experiences the senses and
emotions felt by the target at the selected moment. A
concept maybe inspired from the Strange Days movie
from 1995, which story was originally written a few
years before by… James Cameron.
What is the link with remote assistance in MR? The fact that Science Fiction and other anticipation works have
always inﬂuenced technological progress (the other way around is also true). MR already raises questions about
the notion of reality and immersion. Of course, we are very far from full-immersion avatars, but the progresses
made in domains like Brain-Computer Interfaces (BCI) are impressive. Maybe future remote assistance systems
will be a mix of both?

SECTION 4
Light fields for mobile MR 41
Spatial referrencing 45
Using virtual replicas 49
CONCEPTS
Out-of-the-box

P41
oUt-of-the-box conceptS
UsING lIGht FIelds FOR
haNd-held MOBIle MR
Article : Mohr, P. et al. 2020. Mixed Reality Light Fields
for Interactive Remote Assistance.
Proceedings of the 2020 CHI Conference on Human
Factors in Computing Systems
(Honolulu HI USA, Apr. 2020), 1–12.
So far, we mostly presented studies based on MR with
HMDs. Depending on the setup, the mobility of users
may be strictly limited to a predefined environment: for
instance, Cao et al. used external cameras in the room
to track user gestures [10]. In other studies where only
HMDs are involved, we could imagine letting users freely
move between different workspaces. The TeleAdvisor
system and its wheeled camera+projector robotic
arm [21] was even built with this purpose in mind.
Nonetheless, the significant mobility offered to users
by these systems comes at the cost of a non-negligible
amount of hardware and calibration. This limitation
may be a barrier for on-the-fly remote assistance in
unprepared environments.
Onthecontrary,Mohretal.proposedaremoteassistance
system only based on smartphones [36]. This approach
offer a great mobility with commonly found hardware.
Problem solved? Yes and no. MR with hand-held mobile
devices has its own limitations: it blocks at least one user
hand to hold the device, can suffer from hand tremor
and arm fatigue, offers a small augmented FoV… And it
is far from being new [19].
The novelty of the work of Mohr et al. comes from their
innovative approach: the exploitation of unstructured
light fields. To learn more about this intriguing concept,
please follow the guide…
DID YOU KNOW?
AR with hand-held mobile devices is often compared to peephole pointing. This concepts refers to cases
where the workspace is much greater than the screen. Thus, a window of the virtual space (or the augmented
FoV) is moved to reveal the targeted content [26].
Many studies proposed new interaction techniques to guide users towards this offscreen hidden content
[44]. It is far from simply adding virtual arrows pointing at every objects!
8.1 MR light fields and system calibration
The authors identified several requirements in their
review of previous work. They begin by stating that
adding visual remote instructions is an interesting and
well-known tool for remote assistance. However, using
2Doverlaysmayonlyworkwithastaticview.Inarealistic,
3D environment, both the local and remote users often
need to have dynamic and independent viewpoints. In
that case, 3D spatially-registered annotations become
necessary.
Two main approaches to obtain such annotations are
discussed: scanning in advance the environment to
know its 3D features or doing it in real-time. On the
one hand, the former conflicts with the spontaneous,
on-the-fly remote assistance aimed by the authors,
and was thus not considered [36]. On the other hand,
real-time scanning approaches like Simultaneous
Localization And Mapping (SLAM) require high-quality
sensors and high computational power to obtain a good
visual quality. These conditions are often not matched

P42
with current mobile devices. Besides, geometric reconstructions may be hard to achieve in some specific conditions,
for instance with shiny or textureless objects.
To overcome the limitations of these approaches, Mohr et al. proposed an alternative based on a database of
images registered in 3D space. These images represent a sampling of the light rays emitted from the local user’s
workspace, (hence the light field terminology). The images are pictures taken by the local user under the guidance
of the remote user, as illustrated in Figure 23.
Figure 23: Scene capture. a) The local worker takes pictures of the targeted workspace. b) The remote helper, using
an Augmented Virtuality interface, can explore the coarse scene and guide the local users to complete the scene
capture. c) The local user sees the virtual annotations in AR. Images courtesy from the authors [36].
Figure 24: Visual feedback for dense light field recording. The triangles become more and more green as they gain
in sample density from the corresponding angle. Snapshots are taken automatically: the local the local user only
needs to move the mobile device. Images courtesy from the authors [36].
Light fields require dense image spaces. Therefore, after the initial recording of a few reference images, the local
user must focus on specific positions proposed by the remote helper. To support this recording of dense local light
fields, a virtual sphere in AR is displayed. Its color indicates the level of sampling for each direction, as illustrated
in Figure 24. With enough image density, it is possible to obtain a photometric appearance of the workspace. In
addition, this high-quality view support a large variety of textures and materials like shiny, metallic or transparent
objects [36].
After the cooperative scene recording phase, the local user has thus recorded a set of local light fields. Using
only salient fragment of the environment allows to reduce the required network and computational resources:
it is thus adapted to the resources of current mobile devices. The downside of this approach is its limited depth
knowledge about the environment. Reconstructing all 3D surface information would be computation-heavy and
time expensive, which do not match with the mobile devices context. Fortunately, there are still ways to share
spatially-registered annotations.
a/ Local user
Local user
b/ Remote user c/Local user

P43
Figure 25: Overview of 3D annotations. a) The remote user draws an arrow on a 2D canvas. b) The local user sees
the corresponding virtual arrow in AR. Images courtesy from the authors [36].
8.2 Adding annotations into the shared
workspace
Mohr et al. included a 3D scene annotation feature in
their system [36]. Thanks to the scene recording phase,
the remote user can navigate within the workspace until
finding a suitable viewpoint. A tap gesture on the tactile
phone screen indicates to the system the main object of
interest. Based on the depth of the corresponding area,
the system then places a 2D plane on the scene. The
remote helper can use this plane as a canvas to draw
annotations and share them with the local worker.
Entering into the details of the automatic canvas
placement is beyond the scope of this chapter, but the
whole method is described in the paper [36]. It is worth
to mention that the remote helper can still translate and
rotate (yaw and pitch rotations) the canvas afterwards
if needed using the provided GUI.
Overall, this approach allows to make a good trade-off
between the limited 3D information and the amount
of computations required to create annotations in the
3D workspace. Moreover, both users share the same
system coordinates (the one from the local user during
the scene recording phase). Displaying the correct
spatially registered annotations on the local user side
is thus straightforward. An overview of the resulting 3D
annotations is available in Figure 25.
8.3 Evaluating the usability of the system
The authors conducted three experimental studies to
evaluate mainly the usability of their system:
• Study 1 focus on the authoring of annotations. To do
so, the proposed system is compared with an alternative
3D interaction technique based on multiple views.
• Study 2 evaluates the effectiveness of annotations
for local users. It also considers the impact of erroneous
registration (offsets between the annotation and the
targeted real object) on users.
• Study 3 focus on the scene recording phase.
The main experimental results from these three studies
are reported in Table 9.
The current prototype is limited to annotations made
on 2D planes and the scene recording phase may
need a few improvements. Nonetheless, the usability
of the system proposed by the authors seems overall
already good in the current state. This approach thus
seems promising for on-the-ﬂy remote assistance
with smartphone devices only. Besides, it introduces
interesting and innovative aspect for remote assistance
such as a cooperative setup phase and the usage of
local light fields in MR.

P44
Table 9: Overview of the main experimental results from [36].
Experimental results
Study 1 Initial results: Visual feedback required to support the canvas placement phase
After corresponding changes: Faster and less errors with the proposed system,
Also preferred by participants
Study 2 Positive qualitative feedback about usability
No observed impact from registration errors
Study 3 Overall, positive qualitative feedback about usability
Several improvement suggestions:
• Adding a live-stream view of the local worker activity for the remote user
• Facilitating the estimation of object scales for the remote user
• Facilitating the local light field recording by adding performance feedback and sharing the virtual sphere
with the remote helper.
Future work may focus on spatial and social presence of this kind of system, two aspects absent from the current
paper.
KEY TAKE-AWAYS
The notion of light fields refers to a set of images registered in a 3D environment. They provide a high visual
quality of the workspace, but may be difficult to capture and interact with.
• Mohr et al. proposed a MR system based on local light fields [36]. This system allows to share spatially
registered annotations after a cooperative setup phase.
• The proposed system seems promising for on-the-ﬂy remote assistance in terms of usability.

P45
oUt-of-the-box conceptS
FaCIlItatING sPatIal
ReFeReNCING IN MR
Article : Johnson, J.G. et al. 2021. Do You Really
Need to Know Where “That” Is? Enhancing Support
for Referencing in Collaborative Mixed Reality
Environments. Proceedings of the 2021 CHI Conference
on Human Factors in Computing Systems (Yokohama
Japan, May 2021), 1–14.
In the context of group work, referencing corresponds
to the ability of referring to an object in a way that is
understood by others [24]. This ability is at the heart
of many remote assistance tasks: remote helpers often
need to indicate target objects of interest while local
users may reference objects to ask confirmations or
more precise details about them. By nature, referencing
is thus a spatial ability linked to pointing gestures.
For co-located users, being in the same environment
significantly facilitate this process. However, in remote
assistance scenarios, users need to share enough
context to make referencing possible.
This penultimate chapter focus on spatial referencing in
MR by presenting a recent study conducted by Johnson et
al. [24]. The authors investigate the impact of providing
spatial information and system-generated guidance to
users. For local users, it could seem straightforward that
having visual guidance in MR would help to perform
the task compared to having no guidance. However,
what about remote helpers? Can a partially automated
guidance facilitate referencing on their side? Do they
need as much spatial information as possible (for
instance, being completely immersed in a shared VR
environment) or is a more abstract representation of the
environment enough? These are the kind of questions
explored in this paper.
9.1 Letting the MR system handle the
referencing process
Johnson et al. start by reviewing previous studies and
highlight the fact that remote helpers often prefer a
least-effort approach [24]. Instead of lengthy verbal
descriptions, we naturally tend to use short phrases
complemented by deictic gestures. The good news is
that MR remote assistance systems allow precisely to
do that: sharing a common virtual or mixed environment,
make visible activity cues and highlight user gestures
and body language…
More precisely, the authors distinguish two methods to
share contextual elements in MR:
1. Passive sharing uses features already present in
the environment. For instance, sharing a view of the
workspace allows users to visualize virtual and physical
elements present within it. One way to achieve this is to
use 3D reconstructions of the local user’s environment.
However, it is still complex to overcome the technical
limitations of current hardware and to achieve a high
level of performance. Using a 2D video feed like [1,
21] is simpler, but also raises questions like viewpoint
independence (see Chapter 3) and depth perception
(Chapter 4).
2. Explicit sharing is based on features added on purpose
into the environment. It includes visual feedback, audio
cues… Interestingly, such added features often rely on
passive ones (for instance, remote helpers may need
to view the local environment to correctly register a
guidance feedback into it). Explicit features independent
from any workspace view is quite rare.

Wihte paper : Remote assistance in Mixed Reality

Wihte paper : Remote assistance in Mixed Reality

Recommended

Recommended

More Related Content

Similar to Wihte paper : Remote assistance in Mixed Reality

Similar to Wihte paper : Remote assistance in Mixed Reality (20)

Recently uploaded

Recently uploaded (20)

Wihte paper : Remote assistance in Mixed Reality