



Self-talk discrimination in Human-Robot Interaction Situations
For Engagement Characterization
Jade LE MAITRE · Mohamed CHETOUANI




Received: date / Accepted: date


Abstract The estimation of engagement is a fundamental issue in Human-Robot Interaction and assistive applications. In this paper, we describe (1) the design of a triadic situation for cognitive stimulation of elderly users; (2) the characterization of social signals describing engagement: system directed speech (SDS) and self-talk (ST); (3) a framework for estimating an interaction effort measure that reveals the engagement of users. The proposed triadic situation is formed by a user, a computer providing cognitive exercises and a robot providing encouragement and help through verbal and non-verbal signals. The methodology followed for the design of this situation is presented. Wizard-of-Oz experiments have been carried out and analyzed through eye-contact behaviors and dialogue acts (SDS and ST). An automatic recognition system for these dialogue acts is proposed, with k-NN, decision tree and SVM classifiers trained on pitch-, energy- and rhythm-based features. The best recognition system achieved an accuracy of 71%. Durations of both manually and automatically labelled SDS and ST were combined to estimate the Interaction Effort (IE) measure. Experiments on the collected data show the effectiveness of the IE measure in capturing the engagement of elderly patients during the cognitive stimulation task.

Keywords Social signal processing · Measuring engagement · Prosodic cues

Jade Le Maitre
ISIR UMR 7222
Université Pierre et Marie Curie
E-mail: lemaitre@isir.upmc.fr

Mohamed Chetouani
ISIR UMR 7222
Université Pierre et Marie Curie
E-mail: mohamed.chetouani@upmc.fr

1 Introduction

During the past decades, there has been growing interest in service robotics, partly driven by human assistive applications. The proposed robotic systems are designed to provide various kinds of support: physical, cognitive or social. Human-Robot Interaction (HRI) plays a major role in these applications, and Socially Assistive Robotics (SAR) [1] in particular has been identified as a promising field. Indeed, SAR aims to aid patients through social interaction, with applications including motivation and encouragement during exercises [1–3].

Social signals are continuously produced during human-human interaction [4], and a lack of them is observed in pathologies such as autism [5]. Interpretation and generation of social signals allow sustaining and enriching interactions with conversational agents and/or robots. In [6], an early system is presented that realizes the full action-reaction cycle of communication by interpreting multimodal user input and generating multimodal agent behaviors. The importance of feedback for the regulation of interaction has been highlighted in several situations [7,8].

The ROBADOM project [9] is devoted to the design of a robot-based solution for assistance in daily living: management of shopping lists, meetings and medicines, and reminders of appointments. Within the project, we are developing a specific robot to provide verbal and non-verbal help such as encouragement and coaching during cognitive stimulation exercises. Cognitive stimulation is identified as one of the methodologies that alleviate the decline of some cognitive functions (memory, attention) in the elderly [10]. The robot is intended for patients with Mild Cognitive Impairment (MCI), i.e. the presence of cognitive impairment that is not severe enough to meet the criteria of dementia. Cognitive impairment
is one of the major health problems facing elderly people in the new millennium. This does not only refer to dementia, but also to lesser degrees of cognitive deficit that are associated with a decreased quality of life and, in many cases, progress to dementia.

In the work described in this paper, an engagement metric is developed for the estimation of interaction efforts during cognitive stimulation exercises. Engagement is considered as the process by which partners establish, maintain and end interactions [13]. Engagement detection is identified as a key element for the design of socially assistive robots. We propose to study engagement in a triadic framework: user - computer (providing cognitive exercises) - robot (providing encouragement and backchannels). We identified specific social signals such as system directed speech and self-talk as indicators of engagement during interaction. In this work, engagement is not considered as an all-or-none phenomenon; rather, a continuous characterization is proposed. To our knowledge, this is perhaps the first study that attempts to automatically estimate a metric, termed Interaction Effort, by exploiting dialogue acts. Specifically, our contributions are:

 – An automatic recognition system for the detection of both system directed speech and self-talk. Self-talk provides insights about the cognitive load of the patient with MCI. We also propose relevant rhythmic features for the characterization.
 – The definition and evaluation of a measure of engagement based on the previously detected acts. This measure is employed to understand the strategy of the patients during cognitive stimulation exercises.

The remainder of this paper is organized as follows: Section 2 describes the related works in human-human and human-machine interactions. Section 4 and section 5 give an overview of the cognitive stimulation situation including the design of the robot and the Wizard-of-Oz experiment. Section 6 describes the analysis of the manually labelled data for the extraction of dialogue acts from the Wizard-of-Oz experiment: self-talk (ST) and system directed speech (SDS). Section 7 shows and discusses the engagement characterization framework. The experiments carried out for the evaluation of the proposed metric are described in section 8. Finally, section 9 concludes our work.

2 Related work

2.1 Social cues of engagement

The problem of detecting engagement has been studied using verbal and non-verbal cues, but many existing approaches attempt to estimate engagement from gaze [12–15, 24] by considering eye-contact as a prominent social signal. Eye-contact is usually employed to regulate the communication between humans [21]: initial contact, turn-taking, triggering backchannels...

Mutual gaze has been shown to contribute to smooth turn-taking [15,16]. Goffman [17] mentioned that eye-contact during interaction tends to signal to each partner that they agree to engage in social interaction. Deficiency or failure in gaze during interaction may be interpreted as a lack of interest and attention, as noticed by Argyle and Cook [18]. In face-to-face communication, initiation, regulation and/or disambiguation can be achieved by eye-gaze behaviors. The efficiency of an interaction is based on the ability to shift roles, which is again possible via eye-gaze behaviors [19,20]. During interaction, gaze might be combined with speech. Kendon [21] analyzed these situations and, for instance, identified the fact that speakers look away from their partners at the beginning of an utterance, and look at their partners at the end of an utterance. This behavior might be useful since it serves to limit cognitive load (i.e. during the planning of the utterance) as well as to shift roles with the partner.

In HRI situations, robots are required to estimate the engagement of the addressee for efficient communication. Estimation of gaze is a difficult task in HRI due to the greater distances between the robot and the addressee; consequently, other cues such as head orientation, body posture, and pointing might also be used to indicate at least the direction of attention. Most of the proposed techniques can be seen as based on the concept of face engagement proposed by Goffman [17] to describe the process in which people employ eye contact, gaze and facial gestures to interact with or engage each other. Basically, the engagement detection framework is based on (1) face detection and (2) facial/head gesture classification. In [22], in order to understand the behaviors of the potential addressee in a human-robot interaction task, the authors proposed to combine multiple cues. A set of utterances is defined and used to start an interaction. In addition to the detection step, the authors estimate the visual focus of attention of users. They compute probabilities that the partner is looking at a pre-defined list of possible focus targets. Since the focus targets include the robot itself and other potential users, engagement estimation is reinforced and makes it possible to benefit from the eye-gaze functions without explicit modeling. A similar work done in [12] in a multi-robot interaction framework, based on face detection and gesture classification, makes it possible to select and command individual robots.
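
The focus-of-attention estimation attributed to [22] above can be illustrated with a small sketch. This is not the model used in [22]: it simply assumes that a head yaw angle is available, that each focus target has a known direction, and that a Gaussian likelihood around each target direction is normalized into probabilities. The target names, angles and the `sigma_deg` spread are hypothetical values chosen for illustration.

```python
import math


def focus_probabilities(head_yaw_deg, targets, sigma_deg=15.0):
    """Return a probability for each focus target given a head yaw angle.

    Assumption (not from [22]): the likelihood of a target is a Gaussian
    of the angular difference between the head yaw and the target direction.
    """
    weights = {}
    for name, target_yaw_deg in targets.items():
        diff = head_yaw_deg - target_yaw_deg
        weights[name] = math.exp(-0.5 * (diff / sigma_deg) ** 2)
    total = sum(weights.values())
    # Normalize so that the probabilities over the target list sum to 1.
    return {name: w / total for name, w in weights.items()}


# Hypothetical target directions (degrees) as seen from the robot.
targets = {"robot": 0.0, "screen": -40.0, "other_user": 45.0}
probs = focus_probabilities(-35.0, targets)
most_likely = max(probs, key=probs.get)
```

With these assumed values, a head turned 35 degrees to one side is attributed mostly to the screen rather than to the robot, which is the kind of information an engagement estimator could exploit without tracking the eyes directly.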

Robots could also use eye-gaze behaviors for the improvement of the interaction. Mutlu et al. [23] conducted experiments where their robot, Robovie, employed various strategies of eye-gaze behaviors to signal specific roles during interaction: addressee, bystander, and overhearer. The authors show that gaze direction serves as a moderator since gaze cues support the conversation by reinforcing the roles and participation of human subjects.

All the previously mentioned works have shown the benefit of using eye-gaze behaviors for measuring engagement during interaction. However, this social signal should be more precisely characterized. Rich et al. [24] investigated some relevant cues for recognizing engagement, first in human-human interaction, and then proposed an automatic system for HRI situations. Regarding eye-gaze behaviors, they identified directed gaze and mutual facial gaze. Directed gaze characterizes events in which one person looks at some object, following which the other person looks at it too. Mutual gaze refers to events when one person looks at the other person's face. As a result, various features are employed to describe the communicative functions of eye-gaze behaviors. Ishii et al. [15] also extract various features from eye-gaze behaviors such as gaze transitions, occurrence of mutual gaze, gaze duration, distance of eye movement, and pupil size. These features are employed to predict the user's conversational engagement. The statistical modeling and combination of the features have shown the relevance of all of them in the recognition of the user's attitudes (engagement vs disengagement).

Other cues can be employed to detect engagement in social interactions. Castellano et al. [14] proposed to combine eye-contact with smiling, which is considered in their specific game scenario as an indicator of engagement. The authors enriched the characterization by adding contextual features such as the game state and the behavior of the robot used (iCat facial expressions). A Bayesian network is employed for the modeling of cause-effect relationships between the social signals and contextual features. The evaluation makes it possible to identify a set of user actions correlated with engagement. Spatial movements have been used for initiating conversations [25] and more generally for engagement characterization (see [26] for an interesting discussion on relevant social cues).

All these results show that in various tasks eye-gaze behaviors could be combined with other cues in order to improve the detection rates. Other strategies avoid the estimation of eye-gaze behaviors altogether. For instance in [27], the authors employ physiological signals such as skin response and skin temperature for the estimation of engagement. The work was motivated by the fact that, in a rehabilitative context, the extraction of implicit information on the patient is of seminal importance. Physiological signals are more correlated to the internal state of a user and consequently can be employed to infer emotional information [28]. Engagement detection from these signals can be used by a robot to alter the interaction scenario such as coaching, assistance...

Peters et al. [29] have highlighted the importance of engagement and the various elements related to it. They proposed a simplified model based on an action-cognition-perception loop, which makes it possible to differentiate between the several aspects of engagement: perception (e.g. detection of cues), cognition (e.g. internal state: motivation), action (e.g. displaying interest). They also identified a dimension termed experience, which aims at covering subjective experiences felt by individuals. This work shows that engagement is not a simple concept and, as for other social signals, investigations on its characterization, detection and understanding are still required for the design of adaptive interfaces including robots.

In this paper, we deal with the characterization of engagement in an assistive context and more precisely in a cognitive stimulation situation with elderly people. Research works on related topics are usually devoted to acceptability [30–32]; however, maintaining engagement is identified as a key component of socially assistive robots [33,34]. The use of interactive technology may be more challenging for elderly users. Xiao et al. [35] have investigated how seniors deal with multimodal interfaces and they found that elderly people require more time and make errors, as can be expected. But, more interestingly, they identified that older users employ an audible speech register termed self-talk, which is a kind of think-aloud process produced during difficult tasks. Discriminating self-talk (ST) from system/robot directed speech (SDS) is of great importance for two main reasons: (1) intended interactions are produced during SDS, (2) production of ST can be used as an indicator of engagement. The next section aims at describing this specific speech register more precisely.

2.2 Self-talk in machine interaction

Following the definition of Oppermann [11], self-talk, or private speech, refers to audible or visible talk people use to communicate with themselves. This register can be considered as a part of off-talk, which is a special dialogue act characterizing "every utterance that is not directed to the system as a question, a feedback utterance or an instruction". Off-talk is identified as a problem
for automatic speech recognition systems and distinguishing it from on-talk (or system directed speech) will clearly improve recognition rates. However, the characteristics of off-talk make the task difficult. For instance, in case the user is reading instructions, lexical information is not discriminant and other features should be employed. One relevant strategy is to try to combine audio and visual features as proposed in [36,37]. Batliner et al. formulated the problem by defining on-talk vs off-talk and on-view vs off-view strategies. The combination of them leads to on-focus (on-talk + on-view) and off-focus, where on-view is not discriminant: listening to someone and looking away. The authors employed an audio-visual framework for the classification of on-talk from various off-talk elements (read, paraphrasing and spontaneous off-talk). Prosodic features, part-of-speech (POS) features and visual features (a simple face detection system) are employed. The detection of the user's focus on interaction yields 76.6% by using prosodic features, and the combination with linguistic and visual features makes it possible to achieve 80.8% and 84.5% respectively.

From a conceptual point of view, on-talk (robot-directed speech) is considered as social speech because it is produced with the objective of communication, while self-talk is known to be a means for thinking, planning and for self-regulation of behavior [38]. Lunsford et al. [37] investigated audio-visual cues and reviewed some functions of self-talk. Among the most interesting ones, the authors reported that self-talk supports task performance and self-regulation. Potential benefits of improving the estimation of engagement are (1) the design of adaptive social interfaces (including robots), (2) improvement of the impact of assistive devices and (3) understanding the strategies and behaviors of individuals.

3 Objectives

The purpose of this paper is engagement characterization in an interaction between an MCI patient and a robot, with the help of the classification of verbal utterances into Self-Talk or System-Directed Speech.

As seen in [58], during the completion of a spatial task by seniors, a high amount of self-talk is observed: 80% of the subjects of their study engaged in ST at some point during their session. This amount increases with the difficulty level of the task, which is in strong correlation with the cognitive load of the person: the ST amount increased from low to high difficulty tasks (26.9% versus 43.7%, respectively). A similar situation is proposed through the cognitive stimulation experiment (section 4) and the key idea is to identify phases where the patient is less engaged in this activity, which is reflected in the production of self-talk. By being aware of these phases, the coach robot will be able to produce useful feedback, encouragement or help.

In order to identify the interaction phases between a therapist and a patient, we conducted experiments to acquire interaction data. In section 4 we describe the actions of a therapist during a cognitive stimulation exercise, and how the robot should be adapted. The robot's interactions are described in section 5, and the technical setup of the Wizard of Oz experiments is detailed in section 5.2.

After the completion of all experiments, the corpus analysis begins, as described in section 6. The very first analysis is the manual annotation of all the records taken during the experiments. After the annotation of gaze and keywords, the next step is the pooling of the annotated keywords into clusters using a Latent Semantic Analysis method. This method groups together semantically similar keywords in meaningful clusters, structuring the dialogue acts in an unsupervised way and giving them a semantic meaning, as explained in figure 3.

Within these clusters, a further manual annotation splits the keywords into two categories, according to whether they were addressed to the robot and/or the computer (System-Directed Speech) or to the patient himself (Self-Talk). With all these labelled keywords (cluster, ST or SDS), we are able to perform an engagement characterization as described in section 7.

Fig. 1 Steps of engagement characterization

4 Patient-Therapist Interaction

Before conducting experiments between a robot and patients, we collected data about how patients interacted with a therapist. The patient had to solve exercises on a tactile screen, while the therapist, seated near the patient, observed the situation and provided help whenever the patient needed it. The therapist could help with the technical setup, indicating how to deal with the tactile screen, or just provide help for a particular exercise, explain how to correct an answer or just say whether the patient answered correctly. The interaction between the patient, the therapist and the cognitive stimulation exercise is a triadic situation, as shown in Figure 2. Backchannels of the therapist are important for these
patients to gain confidence and solve these exercises correctly, even when the therapist does not have to say anything: the mere presence of the therapist matters. Psychologists at the Broca Hospital therefore organized sessions of recorded cognitive stimulation, which on our side led to the analysis of the interaction between the patient and the therapist. Thanks to these sessions, we were able to determine the interaction phases of the therapist, and later duplicate them on the robot. The therapist is always at the side of the patient, measuring and evaluating the attention and engagement of the patient. As shown in Figure 2, during the cognitive stimulation exercises, the patient sits in front of the tactile screen with the therapist at his right. Because acceptance of the robot is our second concern, alongside the robot reacting appropriately, we also investigated with our colleagues which kind of robot would be acceptable to the targeted end-users with Mild Cognitive Impairment.

Fig. 2 Triadic situation either with robot or therapist

5 Patient-Robot Interaction

5.1 Designing Robot for MCI patients

Focus group sessions were conducted at the Broca Hospital to identify how the elderly perceive a robot's appearance. 15 adults over the age of 65, divided into three groups, took part in the sessions. 13 of them were recruited from the Memory Clinic at the Broca Hospital; two were recruited from an association for the elderly. Seven of the participants suffered from Mild Cognitive Impairment, according to the definition criteria of [39]. From the results of the focus group, the robots defined as attractive to them were small robots, often with a modern design, shaped like animals or objects they could use in their daily life [40], like Mamoru or Paro. Various robots fitting these characteristics can be employed but we selected the rabbit-shaped Nabaztag from Violet, type Nabaztag:tag. This electronic device is Wi-Fi enabled, and can connect to the internet to process specific services via a distant server located at http://www.nabaztag.com. The Nabaztag has motorized ears, 4 color LEDs, a speaker and a microphone. As will be described in section 5.2, ears and LEDs are employed to enhance the expressiveness of the Nabaztag. Regarding the acceptability of this robot, experimental results can be found in other projects such as SERA (Social Engagement with Robots and Agents), where social engagement is investigated [30, 31].

5.2 Description of the Wizard of Oz Experiments

5.2.1 Technical and Experimental Design

Human-robot communication differs from human-human communication. Therefore, to gather reliable information about human-robot communication it is important to observe human behaviour in a situation in which humans believe they are interacting with a real robotic system. It is important that the user thinks he or she is communicating with the system, not a human, as noted for example by [57]. The purpose of our Wizard of Oz experiments is to record interactions between the patient and the robot, using the interaction schemes observed between the patient and the therapist. After analyzing the videos of the sessions between the therapist and the patient, patterns of interaction were detected (therapist encouraging, therapist answering a question, backchannels) and adapted to the Nabaztag. The purpose was to give the Nabaztag an interaction panel, leaving the Wizard responsible for choosing the right couple [answer+backchannel] for each situation.

The patient sits in front of the tablet PC, with the Nabaztag at his left as shown in Figure 2. During the sequence of cognitive exercises solved on the tablet PC, the robot interacts with the patient. The Wizard gathers information about the situation with the help of two cameras and a screen capture of the tactile screen. The Wizard can hear the patient, but the reverse is impossible. The Nabaztag is remotely controlled by the Wizard, who activates both elements of the couple [answer+backchannel] at the same time. The mean duration of a WOZ experiment is 7 min 30 s, for a total of 96 min.
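
As an illustration of the interaction panel idea, the sketch below models each [answer+backchannel] couple as a small record that a Wizard could trigger with a single call. It is only a hypothetical data structure, not the software used in the experiments: the class names, the situation keys and two of the utterances are assumptions (marked in the comments), the third utterance is borrowed from the example dialogue of section 5.3, and the LED colours follow the convention given in section 5.2.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backchannel:
    """Nonverbal part of a couple: ear movement plus LED pattern."""
    ear_speed: str      # "fast" or "slow"
    led_colors: tuple   # colour names shown on the LEDs
    blink_speed: str    # "fast" or "slow"


@dataclass(frozen=True)
class Couple:
    """One [answer+backchannel] entry of the interaction panel."""
    answer: str
    backchannel: Backchannel


# Pleasant: fast green/yellow blinking; unpleasant: slow blue/violet (section 5.2.2).
PLEASANT = Backchannel("fast", ("green", "yellow"), "fast")
UNPLEASANT = Backchannel("slow", ("blue", "violet"), "slow")

# Hypothetical panel; the situation keys are illustrative names.
PANEL = {
    # Hypothetical utterance.
    "encouragement": Couple("You are doing well, keep going!", PLEASANT),
    # Utterance taken from the example dialogue of section 5.3.
    "answer_question": Couple("Press on the image, then drag it to the box.", PLEASANT),
    # Hypothetical utterance.
    "user_lost": Couple("Take your time, let us look at it again.", UNPLEASANT),
}


def activate(situation):
    """Return the speech and backchannel commands to be issued together."""
    couple = PANEL[situation]
    return [("speak", couple.answer), ("backchannel", couple.backchannel)]


commands = activate("answer_question")
```

Returning both commands from one call mirrors the description above, in which the Wizard activates the answer and the backchannel at the same time rather than as two separate actions.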
5.2.2 Verbal and Nonverbal Behaviors for the Nabaztag     5.3 Participants Analysis

                                                          A total of eight participants were chosen by the ther-
Regarding the Nabaztag, nonverbal behaviors should
                                                          apeuths at the Broca hospital, seven females and one
be defined. Similar to other robots, such as Paro [41],
                                                          male, aged from 64 to 82, participated in the experi-
Emotirob [42] or Aibo [8], the Nabaztag can exploit
                                                          ments. Two of them had a slight MCI, the remaining
movements and sounds as social communicative signals.
                                                          six did’nt have any communication, hearing, or vision
According to the work of Lee and Nam [43] about the
                                                          impairments. Examples of interaction between the par-
relation between physical movement and the augmen-
                                                          ticipants and the Nabaztag are shown in the following
tation of emotional interaction, the expressions of the
                                                          two paragraphs, one dedicated to the SDS, the second
Nabaztag will be correlated with both speed of move-
                                                          to ST.
ments and LEDS blinking. As shown in Figure 3, slow
movements or blinking LEDS will express unpleasant
                                                          Example Dialogue Between a User and the Nabaztag:
expressions such as sadness or annoyance, expressed
                                                          System-Directed-Speech
when the user is getting lost or doesn’t know how to
solve the exercise. Positive expressions are related to Nab. : Good Morning, my name is Carole, what’s your
active movements and blinking, which are employed to          name?
encourage the user, for instance.                       User : My name is Bob.
                                                        Nab. : Hello, Bob. I’m here to help you solve your exer-
    The nonverbal behaviors for the Nabaztag were im-
                                                              cises. Do you want to start them now?
plemented using the NabazFrame, developed by the
                                                        User : Yes!
University of Bretagne Sud1 . Nonverbal choreographies
                                                        Nab. : Let’s go! First, you have to drag the images into
contain ear movements and different sets of blinking
                                                              the box corresponding to their name.
LEDs, the color depending of the mood or feedback
                                                        User : How do I do that?
we wanted to transmit. Green and yellow fast blinking
                                                        Nab. : Press on the image, then drag it to the box.
LEDs express pleasant expressions, while slow move-
ments with blue and violet LEDs express unpleasant
                                                          Example Dialogue Between a User and the Nabaztag:
signals (Fig. 3). These behaviors are currently tested
                                                          Self-Talk
by end-users in another work.
                                                        User : And I drag the little image with the tree to the autumn
                                                              box... oh, it’s not coming! She’s gone, uh, found, I
                                                              drag it to the box, oh, it escaped...
                                                        Nab. : When dragging the image to the correct box, you
                                                              must always touch the screen.
                                                        User : Uh, yes, I drag it, I drag the image, the tree is
                                                              in the box. The image with the flowers can’t be the
                                                              summer, these flowers grow in the spring, oh, I don’t
                                                              know, I drag the image to the spring, she’s not com-
                                                              ing, ah, yes, let’s go to the next image.


                                                          6 Analysis of the annotation of WOZ
                                                          experiments

                                                          6.1 Manual Annotation of the Content of the WOZ
                                                          Experiments

                                                          For each of the 8 participants of the recorded exper-
Fig. 3 Relationship between movements and expressions     iments, the first step was the manual annotation of
                                                          the videos taken during the Wizard of Oz experimenta-
                                                          tions. Before the annotation began, the videos from the
                                                          two cameras were synchronized and edited together, so
                                                          that the annotator could view the patient from
                                                          the two different angles: computer and robot. First, the
    1
        www-valoria.univ-ubs.fr                           gaze was annotated, without paying attention to the


speech. The annotator carefully marked when the pa-           detection, and evaluation through dialogue acts such as
tient was looking at the robot (the setup made it clearly     self-talk and open-talk is relevant. Eye-contact is
visible when the patient was looking at the robot: from       not discriminant in our case, but we found that the
the computer’s point of view, the person moves her head,      verbal production of the patient could help to charac-
and from the robot’s point of view the person’s gaze is fo-   terize their difficulties. Because the robot needs to know
cused on the camera), and then when the patient was           when the patient encounters difficulties, to produce the
looking directly at the computer (same but reversed).         appropriate feedback and help the patient, we decided
The second step was to perform the speech annotation.         to focus our work on the verbal utterances.
The annotator first listened, then annotated the rele-
vant keywords found in the patient’s utterances. Key-
words can be a single word, but also an expression of         6.2 Latent semantic analysis of the content of
words with a close meaning: “I’m doing well” will be          annotation
annotated as one keyword, for example. Filled pauses
were not annotated. The spoken keywords were primarily        Understanding the content of annotation of interactive
divided into eight simple categories, defined by              databases is made difficult by the strategies of indi-
the annotator after viewing all eight films. The gen-         viduals in a cognitive stimulation task. Indeed, several
eral structure is given below:                                keywords or utterances correspond to one of the 8 cate-
 – Agreement: “Let’s start”, “Yes”, “You’re right”            gories. To provide insights on the annotation databases,
 – Technical Question (often about the use of the             and consequently on the behaviors exhibited, we per-
   tactile screen)                                            formed a latent semantic analysis (LSA) [44]. LSA is
 – Contextual Question (about the cognitive exer-             based on a Singular Value Decomposition (SVD) of the
   cise itself, such as “What should I do with this pic-      term-document matrix, and results in a reduced di-
   ture?”, “Should I put it here, or there?”)                 mension of the feature space. The interpretation of the
 – Non-Obligatory Turn-Taking: comment with-                  reduced space makes it possible to identify semantic
   out the need of an answer by the robot (users with         concepts. Although LSA is usually performed on texts,
   MCI often comment on what they are do-                     we decided to use it here because the verbal production
   ing, “And I move the picture over there...”)               of the patients is not spontaneous. The verbal utter-
 – Obligatory Turn-Taking: comment with the need              ances are structured (around the exercise, or the technical
   of an answer by the robot (such as “I hope I am do-        difficulties). Even in the Self-Talk parts of the interac-
   ing right”); these indicate the difficulty felt by the     tion, the verbal production is structured and limited.
   user                                                       The term-by-documents matrix becomes, in our case,
 – Support Needed: the user is confused and needs more        a keyword-by-clusters matrix. To train LSA we esti-
   support (“I don’t remember”, “I don’t know what I          mated the occurrence of each one of the 247 different
   should do?”)                                               keywords or utterances produced by the 8 participants
 – Thanks (some users thank the Nabaztag when they            regarding the 8 categories. Using a Kaiser criterion
   receive help, and one complimented it on its color         (80% of the information), the reduced rank was set to
   and shape)                                                 5. We carefully analyzed the content (keywords) of the
 – Disagreement (“No, I don’t want”)                          obtained clusters and they were interpreted as:
    The gaze annotation shows that 89.76% of the time          – Positive Feeling: positive feeling expressed toward
users are looking at the computer. This is explained by          the robot, or toward the exercises.
the fact that they were asked to complete the exercises.       – Comments: useful keywords, but expressing noth-
Another explanation lies in the specific design                  ing more than a simple statement about the exercise.
of the triadic situation: eye-contact with the robot is not    – Social Etiquette: the keywords put in this cluster
required for effective engagement. The robot is only there       all express agreement with the robot (“Okay”,
to help and encourage; to solve an exercise the patient          “I’m in”, “Thanks”). This cluster is based on J.
has to gaze at the computer. As previously discussed,            Light’s original work, analyzing communication and
eye-gaze behaviors might not be discriminant for en-             characterizing the communication messages into dif-
gagement detection on some tasks, and more specifi-              ferent categories [59].
cally for seniors. Similar results have been obtained in       – Request Information: all the keywords express-
[37], where 99.5% of the self-talk utterances were asso-         ing a question on the exercises were put in this clus-
ciated with a gaze behavior directed to the system, and          ter. The questions are context based: they depend
98.1% for system directed speech. Eye-contact is not             on the part of the exercises in which the patient experi-
always discriminant for engagement / disengagement               ences difficulties.


 – Others: the last cluster, in which the words are rel-     highest amount of ST are respectively the Positive Feel-
   evant but the number of words is too small to form a      ing cluster, with 91 utterances, and the Others cluster, with
   specific cluster. This last cluster covers various com-   130 utterances. These two clusters are directly re-
   ments that are useless for the cognitive exercise but     lated to the exercises: the users express their feelings
   fully relevant for engagement because of the amount       toward the exercise in the first cluster, and the last
   of self-talk (see Table 1).                               one gathers various utterances about the exercises
                                                             expressed by the patients. In fact, all the patients were
                                                             fully engaged in the interaction and did not perform
6.3 Speech corpus
                                                             any other interfering task.
Our speech corpus consists of 543 utterances or key-
words, and each of them corresponds to one of the se-        Table 1 Distribution of self-talk and system directed speech
mantic clusters. Each utterance or keyword has been          over the clusters
carefully annotated as (1) self-talk or (2) system/robot di-   Semantic cluster      Self-Talk   System Directed Speech
rected speech. The annotator simply answered                   Positive Feeling             91                        9
two questions: “According to you, does the person speak        Comments                     21                       24
                                                               Social Etiquette             43                       69
to herself?” (ST) or “speak to the robot or the com-           Request Information          27                       33
puter?” (SDS).                                                 Others                      130                       96
    We evaluated the subjectivity of the annotation pro-
cess by measuring inter-judge agreement. A second, naive
annotator was chosen, who had no contact                         In Table 2, patients 3 and 7 have the highest num-
with the participants and did not know beforehand the        ber of self-talk verbalizations; they are in fact the two
differences between ST and SDS. This annotator watched       MCI patients, and this result is consistent
the videos and had to choose, based on the same ques-        with their pathology. Patient 3 was the most talkative
tions, whether the verbal utterances were ST or SDS.         patient, and expressed a wide range of positive feel-
We then computed an inter-annotator score between            ings. A measure of the importance of ST per patient
the first labelling and that of the naive an-                could trace the evolution of a given patient.
notator. With the kappa statistic, the result was a score
of 0.68, showing substantial agreement.
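
The agreement score discussed above can be illustrated with a minimal sketch. It assumes that the "kappa method" refers to Cohen's kappa for two annotators, which the text does not state explicitly; the function name and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Toy example (hypothetical labels, NOT the corpus annotations).
first = ["ST", "ST", "SDS", "SDS", "ST", "SDS", "ST", "SDS", "ST", "ST"]
naive = ["ST", "ST", "SDS", "ST", "ST", "SDS", "ST", "SDS", "SDS", "ST"]
print(round(cohen_kappa(first, naive), 2))  # 0.58
```

In this toy case the annotators agree on 8 of 10 items, but 52% agreement is already expected by chance, so kappa is noticeably lower than the raw agreement rate.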
                                                             Table 2 Number of self-talk and system directed speech
    The distribution of the utterances over the clusters       Users    Self-Talk    System Directed Speech
is given in Table 1. Most of the self-talk verbalizations        1           20                        19
are present in the most heterogeneous semantic cluster           2            1                         7
(Others), which shows that the amount of self-talk plays         3          106                        85
a role in engagement. Positive Feeling is mostly com-            4           14                         2
posed of self-talk elements. This can be explained               5           30                         6
by the fact that we are dealing with the expression of           6           37                        20
feelings, or positive comments, which appear when the            7           58                        37
robot gives satisfaction to the user. As the patient is          8           49                        55
solving a problem thanks to the robot, he is grate-
ful but concentrated on the exercise. He will speak to          After an acoustic analysis of all keywords and utter-
himself about his feeling of joy, or gratitude, because      ances annotated as self-talk and system directed speech,
he is concentrated on the exercise. After the exercises      we selected 293 utterances for ST and 223 for SDS. The
are complete, the cognitive load of the patient is lower,    utterances were removed mainly because of their short
thus the patients often express their gratitude, but this    duration (less than 1 s). Durations of the retained utterances
time directly to the robot. As one can expect, system        are between 1 and 2.5 s.
directed speech is more present in semantic clusters
characterized by a direct relationship to the cognitive ex-
ercise: comments, social etiquette and request informa-      7 Using self-talk detection for engagement
tion. In fact, these clusters express direct speech to
the robot because the person is asking, or requesting,       In this section we describe the system developed for
something very precise: as in a discussion between two       the measure of engagement from self-talk. Figure 4 de-
persons, the patient instinctively directs his speech to     picts the proposed system. It requires discriminating
the system. A high amount of ST doesn’t mean a low           self-talk from system directed speech. Then, we com-
engagement in the interaction: the two clusters with the     bine the durations of both dialogue acts in a measure of


engagement, which aims at characterizing the degree of        7.2 Classification
interaction effort.
    As previously mentioned, self-talk is produced with       After the extraction of features, supervised classifica-
a lower intensity compared to open-talk or robot-directed     tion is used to categorise the data into classes cor-
speech. In this paper, we propose to investigate prosodic     responding to (1) self-talk, (2) system/robot directed
features including pitch, energy and rhythm. This is          speech. This can be implemented using standard ma-
motivated by the fact that most of the classification         chine learning methods. In this study, three different
frameworks for specific speech registers such as emotions     classifiers were investigated: decision trees, k-nearest
[45], infant-directed speech [46], robot-directed speech      neighbour (k-NN, with k experimentally set to 3) [53], and
[47] are mainly based on the characterization of supra-       Support Vector Machines (SVM) with a Radial Basis
segmental features. In addition, Batliner et al. [48] have    Function kernel [54].
shown the relevance of prosodic features and more pre-            A k-fold cross-validation scheme was used for the
cisely duration features to discriminate on-talk from read    experimental setup (k is set to 10) and the performance
off-talk. However, research on speech characterisa-           is expressed in terms of the overall accuracy: the average
tion and feature extraction shows that it is difficult to reach   of the accuracies obtained over the k partitions.
a consensus on relevant features for the characterization
of emotions, intentions as well as personality.               7.3 Evaluating engagement

                                                              Engagement is an interactive process [13], in which two
                                                              participants decide when to establish, maintain and end
                                                              their connection. Thus, it is important to evaluate the
                                                              engagement and find a way to establish the interac-
                                                              tion effort caused by maintaining the interaction. Once
                                                              the system directed speech and the self-talk sequences
                                                              are detected, we propose to combine them to estimate
                                                              a dimension termed interaction effort. The interaction
Fig. 4 Description of the proposed system for engagement      effort is based on the dialogue: the interaction goes
measure
                                                              hand in hand with the dialogue in our experiments.
                                                              Because this is a verbal interaction, it is important to
                                                              analyze the differences in the addressee (self or system)
                                                              to evaluate the interaction effort. In [55], interaction ef-
                                                              fort (IE) is defined as a unitless measure of how much
7.1 Feature extraction                                        effort a user puts into interacting with a robot. The au-
                                                              thors show that IE cannot be measured easily because it
Several studies have shown the relevance of both fun-         would require advanced tools such as eye-tracking and/or
damental frequency (F0) and energy based features for         brain activity sensors. In our application, we argue
emotion recognition applications [49]. F0 and energy          that IE is related to the quantity of robot/system di-
were estimated every 20 ms using Praat [50], and we           rected speech, which is characterized by (1) intended
computed several statistics for each voiced segment (seg-     communication and (2) on-view state (gazing to the
ment based method) [51]: maximum, minimum, mean,              system). While self-talk might be an indicator of plan-
standard deviation, interquartile range, mean absolute        ning, cognitive load is related to the difficulty of the
deviation, quantiles corresponding to the cumulative          task. We propose to estimate the interaction effort of a
probability values 0.25 and 0.75, resulting in a 16-di-       given human robot interaction during a cognitive stim-
mensional vector (8 statistics each for F0 and energy).       ulation situation by the following expression:
    Rhythmic features were obtained by applying a set         \[ IE = \frac{SDS}{SDS + ST} \]                        (1)
of perceptual filters to the audio signal dedicated to
characterization of prominent events in speech termed         SDS and ST refer to the duration of system directed
p-centers [52]. Then, we estimated the spectrum of the        speech and self-talk (in seconds). The numerator is the
prominent signal for characterizing the speaking rate.        amount of time of intended interaction, and the denom-
We estimated 3 features: mean frequency, entropy and          inator is the total interaction time (speak-
barycenter of the spectrum. Differences in rhythm are         ing). IE is a unitless measure (0 ≤ IE ≤ 1). If SDS is
indicators of efficiency and clarity of interaction (fluid-      small relative to ST then IE is quite small. Typically, ef-
ity).                                                         ficient interactions, which do not require self-regulatory


behaviors from elderly people, will obtain an IE close to     Table 3 Accuracy of classifiers using 10-fold cross-validation
1. In some interactions, self-regulatory speech can be a        Features                   Decision Tree    k-NN      SVM
positive measure, improving the interaction, but in our         Pitch-based                    49.8%       53.35%    52.16%
case, it reflects the cognitive load of the patient. The        Energy-based                  55.54%       54.29%    59.51%
robot has to be aware of the patient’s difficulties, linked     Rhythm-based                  52.78%       56.58%    56.97%
to his cognitive load; we therefore describe the interac-       Pitch + Energy                57.42%       59.28%    64.31%
tion as efficient when the cognitive load (the amount of        Pitch + Energy + Rhythm       55.46%       58.20%    71.62%
ST) is low.
    The IE measure proposed in this paper makes it possible to
evaluate the efficiency of the interaction. In future work,
this measure will make it possible to change the verbal and non-
verbal behaviors of the robot and, more interestingly,        8.2 Engagement characterization
to adapt both the cognitive exercises and the encouragements
provided.
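
Equation (1) can be written as a small function. This is a minimal sketch: the function name is hypothetical, the durations in the example are invented for illustration, and returning `None` when the user never speaks is an assumption, since the text does not define IE for that case.

```python
def interaction_effort(sds_seconds, st_seconds):
    """IE = SDS / (SDS + ST), with both durations in seconds."""
    if sds_seconds < 0 or st_seconds < 0:
        raise ValueError("durations must be non-negative")
    total = sds_seconds + st_seconds
    if total == 0:
        # Assumption: IE is left undefined when there is no speech at all.
        return None
    return sds_seconds / total

# Hypothetical durations (seconds), for illustration only.
print(interaction_effort(30.0, 10.0))  # 0.75: mostly system directed speech
print(interaction_effort(5.0, 45.0))   # 0.1: mostly self-talk
```

A user who only addresses the system obtains IE = 1, and a user who only talks to himself obtains IE = 0, which matches the bounds given for the measure.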
                                                              This section describes the estimation of the Interac-
                                                              tion Effort measure (equation 1). For the evaluation
8 Experimental results                                        of the IE measure alone, we exploited the manually la-
                                                              belled data. The results are presented in Table 4. The highest
This section describes experiments and results obtained       IE measure is obtained for user 2 (0.83), but one
for the characterization of engagement based on the de-       should be careful with this result since he produced only
tection of self-talk. First, the performance of our self-     1 self-talk and 7 system directed speech utterances (Ta-
talk detection system is presented; then we propose to        ble 2). For the most talkative user (patient 3), the IE
derive a measure of engagement.                               measure provides insights about his behavior: a relative
                                                              balance between ST and SDS.
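
The evaluation protocol of Section 7.2 (k-NN with k = 3, 10-fold cross-validation, accuracy averaged over the folds) can be sketched as follows. This is not the authors' implementation: the Euclidean distance and the deterministic fold assignment are assumptions, the function names are hypothetical, and the 2-D points are synthetic stand-ins for the prosodic feature vectors.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k nearest training vectors (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(xs, ys, n_folds=10, k=3):
    """Average of the per-fold accuracies over n_folds partitions."""
    if len(xs) < n_folds:
        raise ValueError("need at least one item per fold")
    accuracies = []
    for fold in range(n_folds):
        # Assumption: item i is assigned to fold i % n_folds (no shuffling).
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                   for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / n_folds

# Synthetic, well-separated points (NOT real prosodic features).
xs = [(0.1 * i, 0.0) for i in range(20)] + [(10 + 0.1 * i, 5.0) for i in range(20)]
ys = ["ST"] * 20 + ["SDS"] * 20
print(cross_val_accuracy(xs, ys))  # 1.0 on this trivially separable toy set
```

On real ST/SDS features the classes overlap, so the averaged accuracy would be far from this toy value.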
8.1 Detection of self-talk                                        For patients 4 and 5, the interaction effort estimate is
                                                              at most 0.20. This is easily explained: these
Table 3 shows the recognition rates of all the classi-        patients rarely talked directly to the robot. They ad-
fiers trained with different feature sets. In [37], energy    dressed the system directly at the beginning of the ex-
is found to discriminate system directed speech from          ercise, showing they understood the instructions. The
self-talk. Compared to pitch based features, classifiers      other verbal utterances were only comments addressed
trained with energy are more efficient and the best results   to themselves. Patient 5 had a few difficulties, for which she
are obtained by SVM. One possible explanation is that         had to address the robot directly to obtain a proper
the extraction of pitch might be more complex for self-       answer, but the amount of her self-talk utterances was
                                                              too large to be balanced by these SDS utterances.
talk, which is produced by the users for themselves and
consequently with a lower energy and intelligibility.             For the automatic estimation, we followed the frame-
    In addition to what has been described in the lit-        work described in Figure 4. A Vocal Activity Detector (VAD),
erature concerning energy, we argue that rhythm               suitable for real-time detection in robotics [8], is em-
should be relevant for our application because of the         ployed for the segmentation of speech. The self-talk /
change of speaking rate observed during self-talk. The        system directed speech discrimination system is based
experimental results show that using only rhythm based        on the SVM classifier trained with pitch, energy and
features achieves an acceptable SVM performance (56.97%),     rhythmic features as previously described. Once the speech
between those obtained by energy (59.51%) and those of        utterances are classified, we extract their duration and at
pitch features (52.16%). Energy and rhythmic features         the end of the experiment we estimate the IE measure.
are more robust. Rhythmic features are related to the         Table 4 shows that the IE measures computed by the au-
vocalic energy of the signal [56] and have similar characteris-   tomatic approach capture the individual behaviors of
tics to those of short-term energy, except that perceptual    each user. In addition, high and low IE measures are
filters are employed before the computation of energy         correctly characterized. However, for very low IE mea-
(from acoustic prominence enhancement). The SVM classi-       sures such as that of user 4, the automatic approach underes-
fier trained with the three sets of features outperforms      timated the measure. This may be due to factors such
(71.62%) all the other configurations. One should note that   as the small number of verbalizations of this user.
adding features does not yield the same gain                  Due to the imperfect classifications, some of the IE mea-
for all the classifiers: adding rhythm to pitch and energy    sures are over- or underestimated, but they always allow
lowers the decision tree and k-NN accuracies.                 a trend to be characterized.
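
The claim that the automatic IE values follow the same trend as the annotated ones can be checked directly on the values of Table 4. The sketch below is an illustration only: the choice of mean absolute error and of Spearman rank correlation is ours, not the authors', and the helper names are hypothetical.

```python
# IE values copied from Table 4 (users 1 to 8).
ie_annotation = [0.50, 0.83, 0.45, 0.13, 0.20, 0.43, 0.42, 0.57]
ie_automatic = [0.62, 0.78, 0.53, 0.08, 0.26, 0.46, 0.38, 0.63]

def mean_absolute_error(reference, estimate):
    """Average absolute difference between two equally long lists."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)

def ranks(values):
    """Rank positions (1 = smallest); assumes no ties, as in Table 4."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(a, b):
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(round(mean_absolute_error(ie_annotation, ie_automatic), 3))  # 0.061
print(spearman_rho(ie_annotation, ie_automatic))                   # 1.0
```

The two columns rank the eight users in exactly the same order, even though individual values differ by 0.06 on average, which is consistent with the statement that the automatic estimate preserves the trend.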


Table 4 Interaction effort (IE) measure estimation             Acknowledgements This work has been supported by French
                                                              National Research Agency (ANR) through TecSan program
 Users    From Annotation     From Automatic                  (project Robadom ANR-09-TECS-012). The authors would
                             Self-talk Detection              like to thank the Broca hospital for their work: Ya-Huei Wu,
   1            0.50                 0.62                     Christine Fassert, Victoria Cristancho-Lacroix and Anne-Sophie
   2            0.83                 0.78                     Rigaud
   3            0.45                 0.53
   4            0.13                 0.08
   5            0.20                 0.26                     References
   6            0.43                 0.46
   7            0.42                 0.38                     1. Feil-Seifer D. J., Mataric, M. J. (2005) Defining Socially
   8            0.57                 0.63                       Assistive Robotics. In International Conference on Rehabil-
                                                                itation Robotics, pages 465-468, Chicago, IL.
                                                              2. Fasola J., Mataric, M. J. (2010) Robot Exercise Instruc-
                                                                tor: A Socially Assistive Robot System to Monitor and En-
9 Conclusions
                                                                courage Physical Exercise for the Elderly. In 19th IEEE
                                                                International Symposium in Robot and Human Interactive
In this paper, we have demonstrated promising results           Communication (Ro-Man 2010), Viareggio, Italy.
on automatically estimating the interaction efforts of         3. Mataric M.J., Tapus A., Winstein C. J., and Eriksson
                                                                J. (2009) Socially Assistive Robotics for Stroke and Mild
patients during a coaching experiment with cognitive stim-      TBI Rehabilitation. In Advanced Technologies in Reha-
ulation exercises. After an analysis of WOZ experiments         bilitation, Andrea Gaggioli, Emily A. Keshner, Patrice L.
(triadic situation), we identified relevant social signals       (Tamar) Weiss, Giuseppe Riva (eds.), IOS Press, 249-262.
                                                              4. Vinciarelli A., Pantic M., and Bourlard H. (2009) Social
characterizing the engagement of elderly patients: system
                                                                Signal Processing: Survey of an Emerging Domain, Image
directed speech and self-talk. The latter is employed           and Vision Computing Journal, Vol. 27, no. 12, pp. 1743-
during difficult tasks by the users for planning, think-          1759.
ing and for self-regulation. We proposed a system to          5. Saint-Georges C., Cassel R.S, Cohen D., Chetouani M.,
                                                                Laznik M-C., Maestro S., and Muratori F. (2010) What
automatically detect these two social signals based on          studies of family home movies can teach us about autistic
the extraction of relevant features: pitch, energy and          infants: A literature review. Research in Autism Spectrum
rhythm with three different classifiers. The experimen-           Disorders. Vol 4 No 3, pages 355-366.
tal results have shown the discriminative function of         6. Cassell J., Bickmore J., Billinghurst M., Campbell L.,
                                                                Chang K., Vilhjálmsson H., and Yan H. (1999) Embodi-
energy as described in the literature. In addition, the         ment in conversational interfaces: Rea. In CHI’99, pages
performance achieved by the proposed rhythmic features          520–527, Pittsburgh.
demonstrates that users employ a different speech regis-      7. Wrede B., Kopp S., Rohlfing K., Lohse M., and Muhl C.
ter for intended communicative acts.                            (2010) Appropriate feedback in asymmetric interactions,
                                                                Journal of Pragmatics, vol. 42, no. 9, pp. 2369 - 2384.
    Future work in this area should investigate multi-        8. Al Moubayed S., Baklouti M., Chetouani M., Dutoit
modal cues for the extension to off-talk situations. Off-         T., Mahdhaoui A., Martin J.-C., Ondas S., Pelachaud
                                                                C., Urbain J., Yilmaz M. (2009) Generating Robot/Agent
talk is defined as the act of not speaking to an ad-
                                                                Backchannels During a Storytelling Experiment, In
dressee, and it includes self-talk but also talking to a        ICRA’09, IEEE International Conference on Robotics and
third addressee... In this case, the automatic detection        Automation. Kobe, Japan.
of on-view states could be of great importance. Among         9. Chetouani M., Wu Y.H., Jost C., Le Pevedic B., Fassert
                                                                C., Cristancho-Lacroix V., Lassiaille S., Granata, Tapus A.,
the automatic cues that should be developed, eye-gaze           Duhaut D., Rigaud A.S. (2010) Cognitive Services for El-
social signal detection remains one of the challenges.          derly People: The ROBADOM project, ECCE 2010 Work-
Eye-tracking systems are not really suitable for com-           shop: Robots that Care, European Conference on Cognitive
plex assistive applications and will change the behav-          Ergonomics.
                                                              10. Yanguas J., Buiza C., Etxeberria I., Urdaneta E., Gal-
iors during the interaction. Interaction effort can also         dona N., Gonzlez M.F. (2008) Effectiveness of a non-
include touching and/or manipulation and a more gen-            pharmacological cognitive intervention on elderly factorial
eral definition of multimodal and integrative engage-            analisys of Donostia Longitudinal Study. Adv. Gerontol. 3,
ment characterization should be proposed.                       30-41.
                                                              11. Oppermann D., Schiel F., Steininger S., Beringer
    Furthermore, we intent to use the IE measure for            N. (2001) Off-talk - a problem for human-machine-
the characterization of users and giving the opportu-           interaction?, In EUROSPEECH-2001, 2197-2200.
                                                              12. Couture-Beil A., Vaughan R., and Mori G. (2010) Se-
nity to adapt the robots behaviors and in our specific
                                                                lecting and Commanding Individual Robots in a Vision-
task: the encouragements and potentially the difficulty           Based Multi-Robot System. Seventh Canadian Conference
of the cognitive exercises. As future work, we will ex-         on Computer and Robot Vision (CRV).
ploit questionnaires in order to understand and esti-         13. Sidner, Candace L. and Kidd, Cory D. and Lee, Christo-
                                                                pher and Lesh, Neal (2004) Where to look: a study of
mate the engagement awareness of the users during in-
                                                                human-robot engagement. Proceedings of the 9th interna-
teraction (experience of engagement).                           tional conference on Intelligent user interfaces (IUI’04).
14. Castellano G., Pereira A., Leite I., Paiva A., McOwan P. W. (2009)
  Detecting user engagement with a robot companion using task and social
  interaction-based features. Proceedings of the 2009 international
  conference on Multimodal interfaces (ICMI-MLMI’09), pages 119–126.
15. Ishii R., Shinohara Y., Nakano T., and Nishida T. (2011) Combining
  Multiple Types of Eye-gaze Information to Predict User’s Conversational
  Engagement. 2nd Workshop on Eye Gaze on Intelligent Human Machine
  Interaction.
16. Nakano Y.I., Ishii R. (2010) Estimating User’s Engagement from Eye-gaze
  Behaviors in Human-Agent Conversations. In 2010 International Conference
  on Intelligent User Interfaces (IUI2010).
17. Goffman, E. (1963) Behavior in Public Places: Notes on the Social
  Organization of Gatherings. New York: The Free Press.
18. Argyle M. and Cook M. (1976) Gaze and Mutual Gaze. Cambridge: Cambridge
  University Press.
19. Duncan S. (1972) Some signals and rules for taking speaking turns in
  conversations. Journal of Personality and Social Psychology, vol. 23,
  no. 2, pp. 283-292.
20. Goodwin C. (1986) Gestures as a resource for the organization of mutual
  attention. Semiotica, vol. 62, no. 1/2, pp. 29-49.
21. Kendon, A. (1967) Some Functions of Gaze Direction in Social Interaction.
  Acta Psychologica. 26: pp. 22-63.
22. Klotz D., Wienke J., Peltason J., Wrede B., Wrede S., Khalidov V.,
  Odobez J.M. (2011) Engagement-based multi-party dialog with a humanoid
  Robot. Proceedings of SIGDIAL 2011: the 12th Annual Meeting of the Special
  Interest Group on Discourse and Dialogue, pages 341-343.
23. Mutlu B., Shiwa T., Kanda T., Ishiguro H., Hagita N. (2009) Footing in
  human-robot conversations: How robots might shape participant roles using
  gaze cues. In Proc. of ACM Conf. Human Robot Interaction.
24. Rich C., Ponsler B., Holroyd A., Sidner C. L. (2010) Recognizing
  engagement in human-robot interaction. In Proc. of ACM Conf. Human Robot
  Interaction.
25. Shi C., Shimada M., Kanda T., Ishiguro H., Hagita N. (2011) Spatial
  Formation Model for Initiating Conversation. Proceedings of Robotics:
  Science and Systems.
26. Michalowski M.P., Sabanovic S., Simmons R. (2006) A Spatial Model of
  Engagement for a Social Robot. IEEE International Workshop on Advanced
  Motion Control, pp. 762-767.
27. Mower E., Mataric M. J., and Narayanan S. (2011) A Framework for
  Automatic Human Emotion Classification Using Emotional Profiles, IEEE
  Transactions on Audio, Speech, and Language Processing.
28. Zong, C. and Chetouani, M. (2009) Hilbert-Huang transform based
  physiological signals analysis for emotion recognition. IEEE Symposium on
  Signal Processing and Information Technology (ISSPIT’09).
29. Peters C., Castellano G., de Freitas S. (2009) An exploration of user
  engagement in HCI. Proceedings of AFFINE’09.
30. Payr S., Wallis P., Cunningham S., Hawley M. (2009) Research on Social
  Engagement with a Rabbitic User Interface, In Tscheligi M., de Ruyter B.,
  Soldatos J., Meschtscherjakov A., Buiza C., Streitz N., Mirlacher T.
  (eds.), Roots for the Future of Ambient Intelligence. Adjunct Proceedings,
  3rd European Conference on Ambient Intelligence (AmI09), ICT&S Center,
  Salzburg.
31. Klamer T., Ben Allouch S. (2010) Acceptance and use of a social robot by
  elderly users in a domestic environment, ICST PERVASIVE Health 2010.
32. Heerink M., Krose B.J.A., Wielinga B.J., Evers V. (2006) The Influence of
  a Robot’s Social Abilities on Acceptance by Elderly Users. Proceedings
  RO-MAN, Hertfordshire, September 2006, pp. 521-526.
33. Mataric M.J. (2005) The Role of Embodiment in Assistive Interactive
  Robotics for the Elderly, AAAI Fall Symposium on Caring Machines: AI for
  the Elderly, Arlington, VA.
34. Tapus A., Tapus C., and Mataric M. J. (2009) The Use of Socially
  Assistive Robots in the Design of Intelligent Cognitive Therapies for
  People with Dementia, Proceedings, International Conference on
  Rehabilitation Robotics (ICORR-09), Kyoto, Japan.
35. Xiao B., Lunsford R., Coulston R., Wesson M., Oviatt S. (2003) Modeling
  multimodal integration patterns and performance in seniors: Toward
  adaptive processing of individual differences. Proceedings of the 5th
  international conference on Multimodal interfaces.
36. Batliner A., Hacker C., Kaiser M., Mogele H., Noth E. (2007) Taking into
  account the user’s focus of attention with the help of audio-visual
  information: towards less artificial human-machine communication,
  Auditory-Visual Speech Processing (AVSP 2007).
37. Lunsford R., Oviatt S., Coulston R. (2005) Audio-visual cues
  distinguishing self- from system-directed speech in younger and older
  adults. Proceedings of the 7th international conference on Multimodal
  interfaces (ICMI’05), pages 167-174.
38. Diaz, R. & Berk, L.E., ed. (1992) Private speech: From social interaction
  to self regulation, Erlbaum, New Jersey, NJ.
39. Petersen R.C., Doody R., Kurtz A., Mohs R.C., Morris J.C., Rabins P.V.,
  Ritchie K., Rossor M., Thal L., Winblad B. (2001) Current concepts in mild
  cognitive impairment. Arch. Neurol. 58, 1985-1992.
40. Wu Y.H., Fassert C., Rigaud A.S. (2011) Designing robots for the elderly:
  appearance issue and beyond. Archives of Gerontology and Geriatrics.
41. Shibata T., Wada K., Saito T., and Tanie K. (2001) Mental Commit Robot
  and its Application to Therapy of Children. In IEEE/ASME International
  Conference On AIM’01.
42. Saint-Aime S., Le Pevedic B., Duhaut D. (2008) EmotiRob: an emotional
  interaction model, In IEEE RO-MAN 2008, 17th International Symposium on
  Robot and Human Interactive Communication.
43. Lee J., Nam T-J. (2006) Augmenting Emotional Interaction Through Physical
  Movement, UIST2006, the 19th Annual ACM Symposium on User Interface
  Software and Technology.
44. Steinberger J. (2004) Using Latent Semantic Analysis in Text
  Summarization. Evaluation 93-100.
45. Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J.,
  Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V. (2007)
  The relevance of feature type for the automatic classification of
  emotional user states: low level descriptors and functionals. Proceedings
  of Interspeech, pages 2253-2256.
46. Mahdhaoui A. and Chetouani M. (2011) Supervised and semi-supervised
  infant-directed speech classification for parent-infant interaction
  analysis. Speech Communication.
47. Breazeal C. and Aryananda L. (2002) Recognizing affective intent in robot
  directed speech, Autonomous Robots, 12:1, pp. 83-104.
48. Hacker C., Batliner A., and Noth E. (2006) Are you looking at me, are you
  talking with me: multimodal classification of the focus of attention. In
  Sojka P., Kopcek I., Pala K. (Eds): TSD 2006, LNAI 4188, pp. 581-588.
49. Truong K., van Leeuwen D. (2007) Automatic discrimina-
  tion between laughter and speech. Speech Communication,
  vol. 49, pp. 144-158.
50. Boersma P., Weenink D. (2005) Praat, doing phonetics by
  computer, Tech. rep., Institute of Phonetic Sciences, Uni-
  versity of Amsterdam, The Netherlands, URL www.praat.org
51. Shami, M., Verhelst, W. (2007) An Evaluation of the
  Robustness of Existing Supervised Machine Learning Ap-
  proaches to the Classification of Emotions in Speech. Speech
  Communication, vol. 49, issue 3, pages 201-212.
52. Tilsen S. and Johnson, K. (2008). Low-frequency Fourier
  analysis of speech rhythm. Journal of the Acoustical Society
  of America, 124:2, pp. EL34-39.
53. Duda, R., Hart, P., Stork, D. (2000) Pattern Classifica-
  tion, second edition.
54. Vapnik V. (1995) The Nature of Statistical Learning The-
  ory. Springer-Verlag.
55. Olsen D. R., Goodrich M. (2003) Metrics for Evaluating
  Human-Robot Interaction, PERMIS 2003.
56. Delaherche E. and Chetouani M. (2010). Multimodal co-
  ordination: exploring relevant features and measures. Sec-
  ond International Workshop on Social Signal Processing,
  ACM Multimedia 2010.
57. Dahlbaeck N., Joensson A., and Ahrenberg L. (1993) Wizard of
  Oz Studies: Why and How. Proceedings of the 1993 Inter-
  national Workshop on Intelligent User Interfaces (IUI’93),
  ACM Press, 193-200.
58. Xiao B., Lunsford R., Coulston R., Wesson M., Ovi-
  att S. (2003) Modeling multimodal integration patterns and
  performance in seniors: toward adaptive processing of in-
  dividual differences, Proceedings of the 5th international
  conference on Multimodal interfaces, Vancouver, British
  Columbia, Canada.
59. Light J. (1997) Communication is the essence of human
  life: Reflections on communicative competence. AAC Aug-
  mentative and Alternative Communication, 61-70.

TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of TransportBasic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
 
Đề tieng anh thpt 2024 danh cho cac ban hoc sinh
Đề tieng anh thpt 2024 danh cho cac ban hoc sinhĐề tieng anh thpt 2024 danh cho cac ban hoc sinh
Đề tieng anh thpt 2024 danh cho cac ban hoc sinh
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024
 

Self-talk discrimination in Human-Robot Interaction Situations For Engagement Characterization

Jade LE MAITRE · Mohamed CHETOUANI

Abstract The estimation of engagement is a fundamental issue in Human-Robot Interaction and assistive applications. In this paper, we describe (1) the design of a triadic situation for cognitive stimulation for elderly users; (2) the characterization of social signals describing engagement: system directed speech (SDS) and self-talk (ST); (3) a framework for estimating an interaction effort measure revealing the engagement of users. The proposed triadic situation is formed by a user, a computer providing cognitive exercises and a robot providing encouragement and help through verbal and non-verbal signals. The methodology followed for the design of this situation is presented. Wizard-of-Oz experiments have been carried out and analyzed through eye-contact behaviors and dialogue acts (SDS and ST). An automatic recognition system for these dialogue acts is proposed with k-NN, decision tree and SVM classifiers trained with pitch, energy and rhythmic based features. The best recognition system achieved an accuracy of 71%. Durations of both manually and automatically labelled SDS and ST were combined to estimate the Interaction Effort (IE) measure. Experiments on collected data prove the effectiveness of the IE measure in capturing the engagement of elderly patients during the cognitive stimulation task.

Keywords Social signal processing · Measuring engagement · Prosodic cues

Jade Le Maitre
ISIR UMR 7222, Université Pierre et Marie Curie
E-mail: lemaitre@isir.upmc.fr

Mohamed Chetouani
ISIR UMR 7222, Université Pierre et Marie Curie
E-mail: mohamed.chetouani@upmc.fr

1 Introduction

During the past decades, there has been growing interest in service robotics, partially due to human assistive applications. The proposed robotic systems are designed to provide various kinds of support: physical, cognitive or social. Human-Robot Interaction (HRI) plays a major role in these applications, and Socially Assistive Robotics (SAR) [1] in particular has been identified as a promising field. Indeed, SAR aims to aid patients through social interaction, with several applications including motivation and encouragement during exercises [1–3].

Social signals are continuously provided during human-human interaction [4], and a lack of them is identified in pathologies such as autism [5]. Interpretation and generation of social signals allow sustaining and enriching interactions with conversational agents and/or robots. In [6], an early system that realizes the full action-reaction cycle of communication by interpreting multimodal user input and generating multimodal agent behaviors is presented. The importance of feedback for the regulation of interaction has been highlighted in several situations [7,8].

The ROBADOM project [9] is devoted to the design of a robot-based solution for assistive daily living aids: management of shopping lists, meetings, medicines, reminders of appointments. Within the project, we are developing a specific robot to provide verbal and non-verbal help such as encouragement and coaching during cognitive stimulation exercises. Cognitive stimulation is identified as one of the methodologies alleviating the decline of some cognitive functions (memory, attention) in the elderly [10]. The robot would be dedicated to MCI patients (Mild Cognitive Impairment, i.e. the presence of cognitive impairment that is not severe enough to meet the criteria of dementia). Cognitive impairment is one of the major health problems facing elderly people in the new millennium. This does not only refer to dementia, but also to lesser degrees of cognitive deficit that are associated with a decreased quality of life and, in many cases, progress to dementia.

In the work described in this paper, an engagement metric is developed for the estimation of interaction efforts during cognitive stimulation exercises. Engagement is considered as the process by which partners establish, maintain and end interactions [13]. Engagement detection is identified as a key element for the design of socially assistive robots. We propose to study engagement in a triadic framework: user - computer (providing cognitive exercises) - robot (providing encouragement and backchannels). We identified specific social signals such as system directed speech and self-talk as indicators of engagement during interaction. In this work, engagement is not considered as an all-or-none phenomenon; rather, a continuous characterization is proposed. To our knowledge, this is perhaps the first study that attempts to automatically estimate a metric, termed Interaction Effort, by exploiting dialogue acts. Specifically, our contributions are:

– An automatic recognition system for the detection of both system directed speech and self-talk. Self-talk provides insights about the cognitive load of the patient with MCI. We also propose relevant rhythmic features for the characterization.
– The definition and evaluation of a measure of engagement based on the previously detected acts. This measure is employed to understand the strategy of the patients during cognitive stimulation exercises.

The remainder of this paper is organized as follows: Section 2 describes the related works in human-human and human-machine interactions. Section 4 and Section 5 give an overview of the cognitive stimulation situation, including the design of the robot and the Wizard-of-Oz experiment. Section 6 describes the analysis of the manually labelled data for the extraction of dialogue acts from the Wizard-of-Oz experiment: self-talk (ST) and system directed speech (SDS). Section 7 shows and discusses the engagement characterization framework. The experiments carried out for the evaluation of the proposed metric are described in Section 8. Finally, Section 9 concludes our work.

2 Related work

2.1 Social cues of engagement

The problem of detecting engagement has been studied using verbal and non-verbal cues, but many existing approaches attempt to estimate engagement from gaze [12, 13–15, 24] by considering eye-contact as a prominent social signal. Eye-contact is usually employed to regulate the communication between humans [21]: initial contact, turn-taking, triggering backchannels...

Mutual gaze has been shown to contribute to smooth turn-taking [16,15]. Goffman [17] mentioned that eye-contact during interaction tends to signal to each partner that they agree to engage in social interaction. Deficiency or failure in gaze during interaction may be interpreted as a lack of interest and attention, as noticed by Argyle and Cook [18]. In face-to-face communication, initiation, regulation and/or disambiguation can be achieved by eye-gaze behaviors. The efficiency of an interaction is based on the ability to shift roles, which is again possible via eye-gaze behaviors [19,20]. During interaction, gaze might be combined with speech. Kendon [21] analyzed these situations and, for instance, identified the fact that speakers look away from their partners at the beginning of an utterance, and look at their partners at the end of an utterance. This procedure might be useful since it serves to avoid cognitive load (i.e. planning of the utterance) as well as to shift roles with the partner.

In HRI situations, robots are required to estimate the engagement of the addressee for efficient communication. Estimation of gaze is a difficult task in HRI due to greater distances between the robot and the addressee; consequently, other cues such as head orientation, body posture, and pointing might also be used to indicate at least the direction of attention. Most of the proposed techniques can be seen as based on the concept of face engagement proposed by Goffman [17] to describe the process in which people employ eye contact, gaze and facial gestures to interact with or engage each other. Basically, the engagement detection framework is based on (1) face detection and (2) facial/head gesture classification. In [22], in order to understand the behaviors of the potential addressee in a human-robot interaction task, the authors proposed to combine multiple cues. A set of utterances is defined and used to start an interaction. In addition to the detection step, the authors estimate the visual focus of attention of users. They compute probabilities that the partner is looking at each of a pre-defined list of possible focus targets. Since the focus targets include the robot itself and other potential users, engagement estimation is reinforced and benefits from the eye-gaze functions without explicit modeling. A similar work in [12], in a multi-robot interaction framework based on face detection and gesture classification, makes it possible to select and command individual robots.
Robots could also use eye-gaze behaviors for the improvement of the interaction. Mutlu et al. [23] conducted experiments where their robot, Robovie, employed various strategies of eye-gaze behaviors to signal specific roles during interaction: addressee, bystander, and overhearer. The authors show that gaze direction serves as a moderator, since gaze cues support the conversation by reinforcing the roles and participation of human subjects.

All the previously mentioned works have shown the benefit of using eye-gaze behaviors for measuring engagement during interaction. However, this social signal should be more precisely characterized. Rich et al. [24] investigated some relevant cues for recognizing engagement, first in human-human interaction, and then proposed an automatic system for HRI situations. Regarding eye-gaze behaviors, they identified directed gaze and mutual facial gaze. Directed gaze characterizes events when one person looks at some object, following which the other person looks at it. Mutual gaze refers to events when one person looks at the other person's face. As a result, various features are employed to describe the communicative functions of eye-gaze behaviors. Ishii et al. [15] also extract various features from eye-gaze behaviors, such as gaze transitions, occurrence of mutual gaze, gaze duration, distance of eye movement, and pupil size. These features are employed to predict the user's conversational engagement. The statistical modeling and combination of the features have shown the relevance of all of them in the recognition of the user's attitudes (engagement vs disengagement).

Other cues can be employed to detect engagement in social interactions. Castellano et al. [14] proposed to combine eye-contact with smiling, which is considered in their specific game scenario as an indicator of engagement. The authors enriched the characterization by adding contextual features such as the game state and the behavior of the robot used (iCat facial expressions). A Bayesian network is employed for the modeling of cause-effect relationships between the social signals and contextual features. The evaluation makes it possible to identify a set of user actions correlated with engagement. Spatial movements have been used for initiating conversations [25] and more generally for engagement characterization (see [26] for an interesting discussion on relevant social cues).

All these results show that, in various tasks, eye-gaze behaviors could be combined with other cues in order to improve the detection rates. Other strategies avoid the estimation of eye-gaze behaviors altogether. For instance in [27], the authors employ physiological signals such as skin response and skin temperature for the estimation of engagement. The work was motivated by the fact that, in a rehabilitative context, the extraction of implicit information on the patient is of seminal importance. Physiological signals are more correlated with the internal state of a user and consequently can be employed to infer emotional information [28]. Engagement detection from these signals can be used by a robot to alter the interaction scenario, such as coaching, assistance...

Peters et al. [29] have highlighted the importance of engagement and the various elements related to it. They proposed a simplified model based on an action-cognition-perception loop, which makes it possible to differentiate between the several aspects of engagement: perception (e.g. detection of cues), cognition (e.g. internal state: motivation), action (e.g. display interest). They also identified a dimension termed experience, which aims at covering subjective experiences felt by individuals. This work shows that engagement is not a simple concept and that, as for other social signals, investigations on its characterization, detection and understanding are still required for the design of adaptive interfaces including robots.

In this paper, we deal with the characterization of engagement detection in an assistive context, and more precisely in a cognitive stimulation situation with elderly people. Research works on related topics are usually devoted to acceptability [30–32]; however, maintaining engagement is identified as a key component of socially assistive robots [33,34]. The use of interactive technology may be more challenging for elderly users. Xiao et al. [35] have investigated how seniors deal with multimodal interfaces and found that elderly people require more time and make errors, as can be expected. But, more interestingly, they identified that older users employ an audible speech register termed self-talk, which is a kind of think-aloud process produced during difficult tasks. Discriminating self-talk (ST) from system/robot directed speech (SDS) is of great importance for two main reasons: (1) intended interactions are produced during SDS, (2) production of ST can be used as an indicator of engagement. The next section aims at describing this specific speech register more precisely.

2.2 Self-talk in machine interaction

Following the definition of Oppermann [11], self-talk, or private speech, refers to audible or visible talk people use to communicate with themselves. This register can be considered as a part of off-talk, which is a special dialogue act characterizing "every utterance that is not directed to the system as a question, a feedback utterance or an instruction". Off-talk is identified as a problem for automatic speech recognition systems, and distinguishing it from on-talk (or system directed speech) will clearly improve recognition rates. However, the characteristics of off-talk make the task difficult. For instance, when the user is reading instructions, lexical information is not discriminant and other features should be employed. One relevant strategy is to try to combine audio and visual features as proposed in [36,37]. Batliner et al. formulated the problem by defining on-talk vs off-talk and on-view vs off-view strategies. The combination of them leads to on-focus (on-talk + on-view) and off-focus, where on-view is not discriminant: listening to someone and looking away. The authors employed an audio-visual framework for the classification of on-talk from various off-talk elements (read, paraphrasing and spontaneous off-talk). Prosodic features, part-of-speech (POS) features and visual features (a simple face detection system) are employed. The detection of the user's focus on interaction reaches 76.6% using prosodic features, and the combination with linguistic and visual features achieves 80.8% and 84.5% respectively.

From a conceptual point of view, open-talk (robot-directed speech) is considered as social speech because it is produced with the objective of communication, while self-talk is known to be a means for thinking, planning and self-regulation of behavior [38]. Lunsford et al. [37] investigated audio-visual cues and reviewed some functions of self-talk. Among the most interesting ones, the authors reported that self-talk supports task performance and self-regulation. Potential benefits of improving the estimation of engagement are (1) the design of adaptive social interfaces (including robots), (2) improvement of the impact of assistive devices and (3) understanding the strategies and behaviors of individuals.

3 Objectives

The purpose of this paper is engagement characterization in an interaction between an MCI patient and a robot, with the help of the classification of verbal utterances into Self-Talk or System-Directed Speech.

As seen in [58], during the completion of a spatial task by seniors, a high amount of self-talk is observed: 80% of the subjects of their study engaged in ST at some point during their session. This amount increases with the difficulty level of the task, which is strongly correlated with the cognitive load of the person: the ST amount increased from low to high difficulty tasks (26.9% versus 43.7%, respectively). A similar situation is proposed through the cognitive stimulation experiment (Section 4), and the key idea is to identify phases where the patient is less engaged in this activity, which is reflected by the production of self-talk. By being aware of these phases, the coach robot will be able to produce useful feedback, encouragement or help.

In order to identify the interaction phases between a therapist and a patient, we conducted experiments to acquire interaction data. In Section 4 we describe the actions of a therapist during a cognitive stimulation exercise, and how the robot should be adapted. The robot's interactions are described in Section 5, and the technical setup of the Wizard of Oz experiments is detailed in Section 5.2.

After the completion of all experiments, the corpus analysis begins, as described in Section 6. The very first analysis is the manual annotation of all the recordings taken during the experiments. After gaze and keyword annotation, the next step is the pooling of the annotated keywords into clusters using a Latent Semantic Analysis method. This method groups together semantically similar keywords into meaningful clusters, structuring the dialogue acts in an unsupervised way and giving them a semantic meaning, as explained in Figure 1.

Within these clusters, another manual annotation takes place, splitting the keywords into two categories according to whether they were spoken to the robot and/or the computer (System-Directed Speech) or to the patient himself (Self-Talk). With all these labelled keywords (cluster, ST or SDS), we are able to perform an engagement characterization as described in Section 7.

Fig. 1 Steps of engagement characterization

4 Patient-Therapist Interaction

Before conducting experiments between a robot and patients, we collected data about how patients interacted with a therapist. The patient had to solve exercises on a tactile screen, while the therapist, seated near the patient, observed the situation and provided help whenever the patient needed it. The therapist could help with the technical setup, indicating how to deal with the tactile screen, or just provide help for a particular exercise, explain how to correct an answer or simply say whether the patient answered correctly. The interaction between the patient, the therapist and the cognitive stimulation exercise is a triadic situation, as shown in Figure 2. Backchannels of the therapist are important for these patients to gain confidence and solve the exercises correctly, even if the therapist doesn't have to say anything: the mere presence of the therapist matters. Psychologists at the Broca Hospital therefore organized recorded sessions of cognitive stimulation, which allowed us to analyze the interaction between the patient and the therapist. Thanks to these sessions, we were able to determine the interaction phases of the therapist, and later duplicate them on the robot. The therapist is always at the side of the patient, measuring and evaluating the attention and engagement of the patient. The presence of the therapist is important for the patient to gain confidence. As shown in Figure 2, during the cognitive stimulation exercises, the patient sits in front of the tactile screen with the therapist at his right. Because the acceptance of the robot is our second concern, alongside the robot reacting appropriately, we also investigated with our colleagues which kind of robot might be acceptable to the targeted end-users with Mild Cognitive Impairment.

Fig. 2 Triadic situation either with robot or therapist

5 Patient-Robot Interaction

5.1 Designing Robot for MCI patients

Focus group sessions were conducted at the Broca Hospital to identify how the elderly perceive a robot's appearance. 15 adults over the age of 65, divided into three groups, took part in the sessions. 13 of them were recruited from the Memory Clinic at the Broca Hospital; two were recruited from an association for the elderly. Seven of the participants suffered from Mild Cognitive Impairment, according to the definition criteria of [39]. From the results of the focus group, the robots defined as attractive to them were small robots, often with a modern design, shaped like animals or objects they could use in their daily life [40], like Mamoru or Paro. Various robots fitting these characteristics can be employed, but we selected the rabbit-shaped Violet Nabaztag, model Nabaztag:tag. This electronic device is Wi-Fi enabled and can connect to the internet to process specific services via a distant server located at http://www.nabaztag.com. The Nabaztag has motorized ears, 4 color LEDs, a speaker and a microphone. As will be described in Section 5.2, ears and LEDs are employed to enhance the expressiveness of the Nabaztag. Regarding the acceptability of this robot, experimental results can be found in other projects such as SERA (Social Engagement with Robots and Agents), where social engagement is investigated [30, 31].

5.2 Description of the Wizard of Oz Experiments

5.2.1 Technical and Experimental Design

Human-robot communication differs from human-human communication. Therefore, to gather reliable information about human-robot communication, it is important to observe human behaviour in a situation in which humans believe they are interacting with a real robotic system. It is important that the user thinks he or she is communicating with the system, not a human, as noted for example by [57]. The purpose of our Wizard of Oz experiments is to record interactions between the patient and the robot, using the interaction schemes observed between the patient and the therapist. After analyzing the videos of the sessions between the therapist and the patient, patterns of interaction were detected (therapist encouraging, therapist answering a question, backchannels) and adapted to the Nabaztag. The purpose was to give the Nabaztag an interaction panel, leaving the Wizard the responsibility of choosing the right couple [answer+backchannel] for each situation.

The patient sits in front of the tablet PC, with the Nabaztag at his left as shown in Figure 2. During the sequence of cognitive exercises solved on the tablet PC, the robot interacts with the patient. The Wizard gathers information about the situation with the help of two cameras and a screen capture of the tactile screen. The Wizard can hear the patient, but the reverse is impossible. The Nabaztag is remotely controlled by the Wizard, who activates both parts of the couple [answer+backchannel] at the same time. The mean duration of a WOZ experiment is 7 min 30 s, for a total of 96 min.
  • 6. 6 Jade LE MAITRE, Mohamed CHETOUANI 5.2.2 Verbal and Nonverbal Behaviors for the Nabaztag 5.3 Participants Analysis A total of eight participants were chosen by the ther- Regarding the Nabaztag, nonverbal behaviors should apeuths at the Broca hospital, seven females and one be defined. Similar to other robots, such as Paro [41], male, aged from 64 to 82, participated in the experi- Emotirob [42] or Aibo [8], the Nabaztag can exploit ments. Two of them had a slight MCI, the remaining movements and sounds as social communicative signals. six did’nt have any communication, hearing, or vision According to the work of Lee and Nam [43] about the impairments. Examples of interaction between the par- relation between physical movement and the augmen- ticipants and the Nabaztag are shown in the following tation of emotional interaction, the expressions of the two paragraphs, one dedicated to the SDS, the second Nabaztag will be correlated with both speed of move- to ST. ments and LEDS blinking. As shown in Figure 3, slow movements or blinking LEDS will express unpleasant Example Dialogue Between a User and the Nabaztag: expressions such as sadness or annoyance, expressed System-Directed-Speech when the user is getting lost or doesn’t know how to solve the exercise. Positive expressions are related to Nab. : Good Morning, my name is Carole, what’s your active movements and blinking, which are employed to name? encourage the user, for instance. User : My name is Bob. Nab. : Hello, Bob. I’m here to help you solve your exer- The nonverbal behaviors for the Nabaztag were im- cises. Do you want to start them now? plemented using the NabazFrame, developed by the User : Yes! University of Bretagne Sud1 . Nonverbal choreographies Nab. : Let’s go! First, you have to drag the images into contain ear movements and different sets of blinking the box corresponding to their name. LEDs, the color depending of the mood or feedback User : How do I do that? we wanted to transmit. 
Green and yellow fast blinking Nab. : Press on the image, then drag it to the box. LEDs express pleasant expressions, while slow move- ments with blue and violet LEDs express unpleasant Example Dialogue Between a User and the Nabaztag: signals (Fig. 3). These behaviors are currently tested Self-Talk by end-users in another work. User : And I drag the little image with the tree to automn box... oh, it’s not coming! She’s gone, uh, found, i drag it to the box, oh, it escaped... Nab. : When dragging the image to the correct box, you must always touch the screen. User : Uh, yes, I drag it, i drag the image, the tree is in the box. The image with the flowers can’t be the summer, these flowers grow in the spring, oh, i don’t know, I drag the image to the spring, she’s not com- ing, ah, yes, let’s go the the next image. 6 Analysis of the annotation of WOZ experiments 6.1 Manual Annotation of the Content of the WOZ Experiments For each of the 8 participants of the recorded exper- Fig. 3 Relationship between movements and expressions iments, the first step was the manual annotation of the videos taken during the Wizard of Oz experimenta- tions. Before the annotation begun, the videos from the two cameras were synchronized and edited together, in order that the annotater could view the patient from the two different angles: computer and robot. First, the 1 www-valoria.univ-ubs.fr gaze was annotated, without paying attention to the
speech. The annotator carefully annotated when the patient was looking at the robot (the setup made this clearly visible: from the computer's point of view the person turns her head, and from the robot's point of view the person's gaze is focused on the camera), and then when the patient was looking directly at the computer (same but reversed). The second step was to perform the speech annotation. The annotator first listened, then annotated the relevant keywords found in the patient's utterances. A keyword can be a single word, but also an expression made of words with a close meaning: "I'm doing well" is annotated as one keyword, for example. Filled pauses were not annotated. The spoken keywords were first divided into eight simple categories, defined by the annotator after viewing all eight films. The general structure is given below:

– Agreement: "Let's start", "Yes", "You're right"
– Technical Question (often about the use of the tactile screen)
– Contextual Question (about the cognitive exercise itself, such as "What should I do with this picture?", "Should I put it here, or there?")
– Non-Obligatory Turn-Taking: comment that does not need an answer from the robot (users with MCI often comment on what they are doing, "And I move the picture over there...")
– Obligatory Turn-Taking: comment that needs an answer from the robot (such as "I hope I am doing right"); these comments indicate the difficulty felt by the user
– Support Needed: the user is confused and needs more support ("I don't remember", "I don't know what I should do?")
– Thanks (some users thank the Nabaztag when they receive help, and one complimented it on its color and shape)
– Disagreement ("No, I don't want")

The gaze annotation shows that users are looking at the computer 89.76% of the time. This is explained by the fact that they were asked to carry out the exercises. Another explanation lies in the specific design of the triadic situation, in which eye-contact with the robot is not required for effective engagement: the robot is only there to help and encourage, and to solve an exercise the patient has to gaze at the computer. As previously discussed, eye-gaze behaviors might not be discriminant for engagement detection on some tasks, and more specifically for seniors. Similar results have been obtained in [37], where 99.5% of the self-talk utterances were associated with a gaze behavior directed to the system, and 98.1% for system directed speech. Eye-contact is not always discriminant for engagement / disengagement detection, and evaluation through dialogue acts such as self-talk and open-talk is relevant. Eye-contact is not discriminant in our case, but we found that the verbal production of the patients could help to characterize their difficulties. Because the robot needs to know when the patient encounters difficulties in order to produce the appropriate feedback and help the patient, we decided to focus our work on the verbal utterances.

6.2 Latent semantic analysis of the content of annotation

Understanding the content of the annotation of interactive databases is made difficult by the strategies of individuals in a cognitive stimulation task. Indeed, several keywords or utterances correspond to one of the 8 categories. To provide insights into the annotation databases, and consequently into the behaviors exhibited, we performed a latent semantic analysis (LSA) [44]. LSA is based on a Singular Value Decomposition (SVD) of the term-document matrix, and results in a reduced dimension of the feature space. The interpretation of the reduced space makes it possible to identify semantic concepts. Although LSA is usually performed on texts, we decided to use it here because the verbal production of the patients is not spontaneous. The verbal utterances are structured (around the exercise, or the patients' technical difficulties). Even in the Self-Talk parts of the interaction, the verbal production is structured and limited. The term-by-document matrix becomes, in our case, a keyword-by-cluster matrix. To train LSA we estimated the occurrence of each of the 247 different keywords or utterances produced by the 8 participants with respect to the 8 categories. Using a Kaiser criterion (80% of the information), the reduced rank was set to 5. We carefully analyzed the content (keywords) of the obtained clusters, which were interpreted as:

– Positive Feeling: positive feeling expressed toward the robot, or toward the exercises.
– Comments: useful keywords, but expressing nothing more than a simple statement about the exercise.
– Social Etiquette: the keywords put in this cluster all express agreement with the robot ("Okay", "I'm in", "Thanks"). This cluster is based on J. Light's original work, which analyzes communication and characterizes communication messages into different categories [59].
– Request Information: all the keywords expressing a question on the exercises were put in this cluster. The questions are context based: they depend on the part of the exercises in which the patient experiences difficulties.
– Others: the last cluster, in which the words are relevant but too few to form a specific cluster. This last cluster covers various comments that are of no use for the cognitive exercise but highly relevant for engagement because of the amount of self-talk (see Table 1).

6.3 Speech corpus

Our speech corpus consists of 543 utterances or keywords, each of which corresponds to one of the semantic clusters. Each utterance or keyword has been carefully annotated as (1) self-talk or (2) system/robot directed speech. The annotator simply answered these two questions: "According to you, does the person speak to herself?" (ST) or "speak to the robot or the computer?" (SDS).

We evaluated the subjectivity of the annotation process by measuring inter-judge agreement. A second, naive annotator was chosen, who had no contact with the participants and did not know beforehand the differences between ST and SDS. This annotator watched the videos and had to choose, based on the same questions, whether the verbal utterances were ST or SDS. We then computed an inter-annotator score between the first labelling and that of the naive annotator. With the kappa method, the result was a score of 0.68, showing substantial agreement.

The distribution of the utterances over the clusters is given in Table 1. Most of the self-talk verbalizations are present in the most heterogeneous semantic cluster (Others), which shows that the amount of self-talk plays a role in engagement. Positive Feeling is mostly composed of self-talk elements. This can be explained by the fact that we are dealing with the expression of feelings, or positive comments, which appear when the robot gives satisfaction to the user. As the patient is solving a problem thanks to the robot, he is grateful but concentrated on the exercise. He will speak to himself about his feeling of joy, or gratitude, because he is concentrated on the exercise. After the exercises are complete, the cognitive load of the patient is lower, so the patients often express their gratitude, but this time directly to the robot. As one can expect, system directed speech is more present in semantic clusters characterized by a direct relationship to the cognitive exercise: Comments, Social Etiquette and Request Information. In fact, these clusters express speech directed to the robot because the person is asking, or requesting, something very precise: as in a discussion between two persons, the patient instinctively directs his speech to the system. A high amount of ST does not mean a low engagement in the interaction: the two clusters with the highest amount of ST are respectively Positive Feeling, with 91 utterances, and Others, with 130 utterances. These two clusters are in direct relationship with the exercises: the users express their feelings toward the exercise in the first cluster, and the last one gathers various utterances about the exercises expressed by the patients. In fact, all the patients were fully engaged in the interaction and did not perform any other distracting task.

Table 1 Distribution of self-talk and system directed speech over the clusters

Semantic cluster       Self-Talk   System Directed Speech
Positive Feeling       91          9
Comments               21          24
Social Etiquette       43          69
Request Information    27          33
Others                 130         96

In Table 2, patients 3 and 7 have the highest number of self-talk verbalizations. They are in fact the two MCI patients, and this result is well explained by their pathology. Patient 3 was the most talkative patient, and expressed a wide range of positive feelings. A measure of the importance of ST per patient could trace the evolution of a given patient.

Table 2 Number of self-talk and system directed speech

Users   Self-Talk   System Directed Speech
1       20          19
2       1           7
3       106         85
4       14          2
5       30          6
6       37          20
7       58          37
8       49          55

After an acoustic analysis of all keywords and utterances annotated as self-talk and system directed speech, we selected 293 utterances for ST and 223 for SDS. The removed utterances were mainly discarded because of their short duration (less than 1 s). The durations of the selected utterances are between 1 and 2.5 s.

7 Using self-talk detection for engagement

In this section we describe the system developed for the measure of engagement from self-talk. Figure 4 depicts the proposed system. It requires discriminating self-talk from system directed speech. Then, we combine the duration of both dialogue acts in a measure of
engagement, which aims at characterizing the degree of interaction effort.

As previously mentioned, self-talk is produced with a lower intensity compared to open-talk or robot-directed speech. In this paper, we propose to investigate prosodic features including pitch, energy and rhythm. This is motivated by the fact that most of the classification frameworks for specific speech registers such as emotions [45], infant-directed speech [46] and robot-directed speech [47] are mainly based on the characterization of supra-segmental features. In addition, Batliner et al. [48] have shown the relevance of prosodic features, and more precisely duration features, to discriminate on-talk from read off-talk. However, research on speech characterisation and feature extraction shows that it is difficult to reach a consensus on relevant features for the characterization of emotions, intentions and personality.

Fig. 4 Description of the proposed system for engagement measure

7.1 Feature extraction

Several studies have shown the relevance of both fundamental frequency (F0) and energy based features for emotion recognition applications [49]. F0 and energy were estimated every 20 ms using Praat [50], and we computed several statistics for each voiced segment (segment based method) [51]: maximum, minimum, mean, standard deviation, interquartile range, mean absolute deviation, and the quantiles corresponding to the cumulative probability values 0.25 and 0.75, resulting in a 16-dimensional vector (8 statistics for each of F0 and energy).

Rhythmic features were obtained by applying to the audio signal a set of perceptual filters dedicated to the characterization of prominent events in speech termed p-centers [52]. Then, we estimated the spectrum of the prominence signal to characterize the speaking rate. We estimate 3 features: mean frequency, entropy and barycenter of the spectrum. Differences in rhythm are indicators of the efficiency and clarity of the interaction (fluidity).

7.2 Classification

After the extraction of features, supervised classification is used to categorise the data into classes corresponding to (1) self-talk and (2) system/robot directed speech. This can be implemented using standard machine learning methods. In this study, three different classifiers were investigated: decision trees, k-nearest neighbour (k-NN, with k experimentally set to 3) [53], and Support Vector Machines (SVM) with a Radial Basis Function kernel [54].

A k-fold cross-validation scheme was used for the experimental setup (k is set to 10) and the performance is expressed in terms of the overall accuracy: the average of the accuracies obtained over the k partitions.

7.3 Evaluating engagement

Engagement is an interactive process [13], in which two participants decide when to establish, maintain and end their connection. Thus, it is important to evaluate the engagement and to find a way to establish the interaction effort caused by maintaining the interaction. Once the system directed speech and the self-talk sequences are detected, we propose to combine them to estimate a dimension termed interaction effort. The interaction effort is based on the dialogue: the interaction goes hand in hand with the dialogue in our experiments. Because this is a verbal interaction, it is important to analyze the differences in the addressee (self or system) to evaluate the interaction effort. In [55], interaction effort (IE) is defined as a unitless measure of how much effort a user puts into interacting with a robot. The authors show that IE cannot be measured easily because it requires advanced tools such as eye-tracking and/or brain activity sensors. In our application, we argue that IE is related to the quantity of robot/system directed speech, which is characterized by (1) intended communication and (2) on-view state (gazing at the system). While self-talk might be an indicator of planning, cognitive load is related to the difficulty of the task. We propose to estimate the interaction effort of a given human-robot interaction during a cognitive stimulation situation by the following expression:

\[ IE = \frac{SDS}{SDS + ST} \qquad (1) \]

SDS and ST refer to the durations of system directed speech and self-talk (in seconds). The numerator is the amount of time of intended interaction, and the denominator is the total amount of interaction time (speaking). IE is a unitless measure (0 ≤ IE ≤ 1). If SDS is small relative to ST then IE is quite small. Typically, efficient interactions, which do not require self-regulatory
behaviors from elderly people, will obtain an IE close to 1. In some interactions, self-regulatory speech can be a positive measure, improving the interaction, but in our case it reflects the cognitive load of the patient. The robot has to be aware of the patient's difficulties, which are linked to his cognitive load; we therefore describe the interaction as efficient when the cognitive load (the amount of ST) is low.

The IE measure proposed in this paper makes it possible to evaluate the efficiency of the interaction. In future work, this measure will make it possible to change the verbal and non-verbal behaviors of the robot and, more interestingly, to adapt both the cognitive exercises and the encouragements provided.

8 Experimental results

This section describes the experiments performed and the results obtained for the characterization of engagement based on the detection of self-talk. First, the performance of our self-talk detection system is presented; then we propose to derive a measure of engagement.

8.1 Detection of self-talk

Table 3 shows the recognition rates of all the classifiers trained with different feature sets. In [37], energy is found to discriminate system directed speech from self-talk. Compared to pitch based features, classifiers trained with energy are more efficient, and the best results are obtained by SVM. One possible explanation is that the extraction of pitch might be more complex for self-talk, which is produced by the users for themselves and consequently with a lower energy and intelligibility.

In addition to what has been described in the literature concerning energy, we argue that rhythm should be relevant for our application because of the change of speaking rate observed during self-talk. The experimental results show that using only rhythm based features achieves acceptable performance (56.97%), between that obtained by energy (59.51%) and that of pitch features (52.16%). Energy and rhythmic features are more robust. Rhythmic features are related to the vocalic energy of the signal [56] and have characteristics similar to those of short-term energy, except that perceptual filters are employed before the computation of energy (for acoustic prominence enhancement). The SVM classifier trained with the three sets of features outperforms all the other configurations (71.62%). One should note that adding features does not produce the same effect for all the classifiers: adding the rhythm features to pitch and energy decreases the performance of the decision tree and k-NN classifiers.

Table 3 Accuracy of classifiers using 10-fold cross-validation

Features                  Decision Tree   k-NN     SVM
Pitch-based               49.8%           53.35%   52.16%
Energy-based              55.54%          54.29%   59.51%
Rhythm-based              52.78%          56.58%   56.97%
Pitch + Energy            57.42%          59.28%   64.31%
Pitch + Energy + Rhythm   55.46%          58.20%   71.62%

8.2 Engagement characterization

This section describes the estimation of the Interaction Effort measure (equation 1). To evaluate the IE measure alone, we exploited the manually labelled data. The results are presented in Table 4. The best IE measure is obtained for user 2 (0.83), but one should be careful with this result since he produced only 1 self-talk and 7 system directed speech utterances (Table 2). For the most talkative user (patient 3), the IE measure provides insights about his behavior: a relative balance between ST and SDS.

The interaction effort estimated for patients 4 and 5 is at or below 0.20. This is easily explained because these patients did not talk directly to the robot. They addressed the system directly at the beginning of the exercise, showing that they understood the instructions. The other verbal utterances were only comments addressed to themselves. Patient 5 had a few difficulties, for which she had to address the robot directly to obtain a proper answer, but the amount of her self-talk utterances was too large to be balanced by these SDS utterances.

For the automatic estimation, we followed the framework described in Figure 4. A Vocal Activity Detector (VAD), suitable for real-time detection in robotics [8], is employed for the segmentation of speech. The self-talk / system directed speech discrimination system is based on the SVM classifier trained with pitch, energy and rhythmic features as previously described. Once the speech utterances are classified, we extract their duration, and at the end of the experiment we estimate the IE measure. Table 4 shows that the IE measures computed by the automatic approach capture the individual behaviors of each user. In addition, high and low IE measures are efficiently characterized. However, for very low IE measures such as that of user 4, the automatic approach underestimates the measure. This may be due to factors such as the small amount of verbalizations of this user. Because of the imperfect classification, some of the IE measures are over- or underestimated, but they always make it possible to characterize a trend.
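As a minimal illustration of equation 1, the sketch below computes IE from a list of utterances that have already been labelled and timed. The function name `interaction_effort`, the label strings "SDS" and "ST", and the example durations are assumptions made for illustration only; they are not taken from the system or the data reported here.

```python
from typing import Iterable, Optional, Tuple


def interaction_effort(utterances: Iterable[Tuple[str, float]]) -> Optional[float]:
    """Return IE = SDS / (SDS + ST), with durations given in seconds.

    Each utterance is a (label, duration) pair, where label is "SDS"
    (system directed speech) or "ST" (self-talk). The labels are a
    hypothetical encoding chosen for this sketch.
    """
    sds = 0.0  # total duration of system directed speech
    st = 0.0   # total duration of self-talk
    for label, duration in utterances:
        if duration < 0:
            raise ValueError("durations must be non-negative")
        if label == "SDS":
            sds += duration
        elif label == "ST":
            st += duration
        else:
            raise ValueError(f"unknown label: {label!r}")
    total = sds + st
    if total == 0:
        return None  # IE is undefined when the user did not speak
    return sds / total


# Hypothetical durations, for illustration only.
example = [("SDS", 1.5), ("ST", 2.0), ("ST", 1.0), ("SDS", 1.5)]
print(interaction_effort(example))  # 0.5
```

Returning `None` for a silent user is a design choice of this sketch: equation 1 is undefined when SDS + ST = 0, and a sentinel avoids reporting a misleading value of 0 or 1.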
Table 4 Interaction effort (IE) measure estimation

Users   From Annotation   From Automatic Self-talk Detection
1       0.5               0.62
2       0.83              0.78
3       0.45              0.53
4       0.13              0.08
5       0.20              0.26
6       0.43              0.46
7       0.42              0.38
8       0.57              0.63

9 Conclusions

In this paper, we have demonstrated promising results on automatically estimating the interaction effort of patients during a coaching experiment with cognitive stimulation exercises. After an analysis of WOZ experiments (triadic situation), we identified relevant social signals characterizing the engagement of elderly patients: system directed speech and self-talk. The latter is employed by the users during difficult tasks for planning, thinking and self-regulation. We proposed a system to automatically detect these two social signals based on the extraction of relevant features (pitch, energy and rhythm) with three different classifiers. The experimental results have shown the discriminative power of energy, as described in the literature. In addition, the performance achieved by the proposed rhythmic features demonstrates that users employ a different speech register for intended communicative acts.

Future work in this area should investigate multimodal cues for the extension to off-talk situations. Off-talk is defined as the act of not speaking to an addressee, and it includes self-talk but also talking to a third addressee. In this case, the automatic detection of on-view states could be of great importance. Among the automatic cues that should be developed, eye-gaze social signal detection remains one of the challenges. Eye-tracking systems are not really suitable for complex assistive applications and will change the behaviors during the interaction. Interaction effort can also include touching and/or manipulation, and a more general definition of multimodal and integrative engagement characterization should be proposed.

Furthermore, we intend to use the IE measure for the characterization of users, giving the opportunity to adapt the robot's behaviors and, in our specific task, the encouragements and potentially the difficulty of the cognitive exercises. As future work, we will exploit questionnaires in order to understand and estimate the engagement awareness of the users during interaction (experience of engagement).

Acknowledgements This work has been supported by the French National Research Agency (ANR) through the TecSan program (project Robadom ANR-09-TECS-012). The authors would like to thank the Broca hospital for their work: Ya-Huei Wu, Christine Fassert, Victoria Cristancho-Lacroix and Anne-Sophie Rigaud.

References

1. Feil-Seifer D. J., Mataric M. J. (2005) Defining Socially Assistive Robotics. In International Conference on Rehabilitation Robotics, pages 465-468, Chicago, IL.
2. Fasola J., Mataric M. J. (2010) Robot Exercise Instructor: A Socially Assistive Robot System to Monitor and Encourage Physical Exercise for the Elderly. In 19th IEEE International Symposium in Robot and Human Interactive Communication (Ro-Man 2010), Viareggio, Italy.
3. Mataric M.J., Tapus A., Winstein C. J., and Eriksson J. (2009) Socially Assistive Robotics for Stroke and Mild TBI Rehabilitation. In Advanced Technologies in Rehabilitation, Andrea Gaggioli, Emily A. Keshner, Patrice L. (Tamar) Weiss, Giuseppe Riva (eds.), IOS Press, 249-262.
4. Vinciarelli A., Pantic M., and Bourlard H. (2009) Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing Journal, Vol. 27, no. 12, pp. 1743-1759.
5. Saint-Georges C., Cassel R.S., Cohen D., Chetouani M., Laznik M-C., Maestro S., and Muratori F. (2010) What studies of family home movies can teach us about autistic infants: A literature review. Research in Autism Spectrum Disorders. Vol 4 No 3, pages 355-366.
6. Cassell J., Bickmore J., Billinghurst M., Campbell L., Chang K., Vilhjálmsson H., and Yan H. (1999) Embodiment in conversational interfaces: Rea. In CHI'99, pages 520-527, Pittsburgh.
7. Wrede B., Kopp S., Rohlfing K., Lohse M., and Muhl C. (2010) Appropriate feedback in asymmetric interactions, Journal of Pragmatics, vol. 42, no. 9, pp. 2369-2384.
8. Al Moubayed S., Baklouti M., Chetouani M., Dutoit T., Mahdhaoui A., Martin J.-C., Ondas S., Pelachaud C., Urbain J., Yilmaz M. (2009) Generating Robot/Agent Backchannels During a Storytelling Experiment, In ICRA'09, IEEE International Conference on Robotics and Automation. Kobe, Japan.
9. Chetouani M., Wu Y.H., Jost C., Le Pevedic B., Fassert C., Cristancho-Lacroix V., Lassiaille S., Granata, Tapus A., Duhaut D., Rigaud A.S. (2010) Cognitive Services for Elderly People: The ROBADOM project, ECCE 2010 Workshop: Robots that Care, European Conference on Cognitive Ergonomics.
10. Yanguas J., Buiza C., Etxeberria I., Urdaneta E., Galdona N., González M.F. (2008) Effectiveness of a non-pharmacological cognitive intervention on elderly factorial analysis of Donostia Longitudinal Study. Adv. Gerontol. 3, 30-41.
11. Oppermann D., Schiel F., Steininger S., Beringer N. (2001) Off-talk - a problem for human-machine-interaction?, In EUROSPEECH-2001, 2197-2200.
12. Couture-Beil A., Vaughan R., and Mori G. (2010) Selecting and Commanding Individual Robots in a Vision-Based Multi-Robot System. Seventh Canadian Conference on Computer and Robot Vision (CRV).
13. Sidner, Candace L. and Kidd, Cory D. and Lee, Christopher and Lesh, Neal (2004) Where to look: a study of human-robot engagement. Proceedings of the 9th international conference on Intelligent user interfaces (IUI'04).
14. Castellano G., Pereira A., Leite I., Paiva A., McOwan P. W. (2009) Detecting user engagement with a robot companion using task and social interaction-based features. Proceedings of the 2009 international conference on Multimodal interfaces (ICMI-MLMI'09), pages 119-126.
15. Ishii R., Shinohara Y., Nakano T., and Nishida T. (2011) Combining Multiple Types of Eye-gaze Information to Predict User's Conversational Engagement. 2nd Workshop on Eye Gaze on Intelligent Human Machine Interaction.
16. Nakano Y.I., Ishii R. (2010) Estimating User's Engagement from Eye-gaze Behaviors in Human-Agent Conversations. In 2010 International Conference on Intelligent User Interfaces (IUI2010).
17. Goffman, E. (1963), Behavior in Public Places: Notes on the Social Organization of Gatherings. New York: The Free Press.
18. Argyle M. and Cook M. (1976) Gaze and Mutual Gaze. Cambridge: Cambridge University Press.
19. Duncan S. (1972) Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology, vol. 23, no. 2, pp. 283-292.
20. Goodwin C. (1986) Gestures as a resource for the organization of mutual attention. Semiotica, vol. 62, no. 1/2, pp. 29-49.
21. Kendon, A. (1967) Some Functions of Gaze Direction in Social Interaction. Acta Psychologica. 26: pp. 22-63.
22. Klotz D., Wienke J., Peltason J., Wrede B., Wrede S., Khalidov V., Odobez J.M. (2011) Engagement-based multi-party dialog with a humanoid Robot. Proceedings of SIGDIAL 2011: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 341-343.
23. Mutlu B., Shiwa T., Kanda T., Ishiguro H., Hagita N. (2009) Footing in human-robot conversations: How robots might shape participants roles using gaze cues. In Proc. of ACM Conf. Human Robot Interaction.
24. Rich C., Ponsler B., Holroyd A., Sidner C. L. (2010) Recognizing engagement in human-robot interaction. In Proc. of ACM Conf. Human Robot Interaction.
25. Shi C., Shimada M., Kanda T., Ishiguro H., Hagita N. (2011) Spatial Formation Model for Initiating Conversation. Proceedings of Robotics: Science and Systems.
26. Michalowski M.P., Sabanovic S., Simmons R. (2006) A Spatial Model of Engagement for a Social Robot. IEEE International Workshop on Advanced Motion Control, pp. 762-767.
27. Mower E., Mataric M. J., and Narayanan S. (2011) A Framework for Automatic Human Emotion Classification Using emotional Profiles, IEEE Transactions on Audio, Speech, and Language Processing.
28. Zong, C. and Chetouani, M. (2009). Hilbert-Huang transform based physiological signals analysis for emotion recognition. IEEE Symposium on Signal Processing and Information Technology (ISSPIT'09).
29. Peters C., Castellano G., de Freitas S. (2009) An exploration of user engagement in HCI. Proceedings of AFFINE'09.
30. Payr S., Wallis P., Cunningham S., Hawley M. (2009) Research on Social Engagement with a Rabbitic User Interface, In Tscheligi M., de Ruyter B., Soldatos J., Meschtscherjakov A., Buiza C., Streitz N., Mirlacher T. (eds.), Roots for the Future of Ambient Intelligence. Adjunct Proceedings, 3rd European Conference on Ambient Intelligence (AmI09), ICT&S Center, Salzburg.
31. Klamer T., Ben Allouch S. (2010) Acceptance and use of a social robot by elderly users in a domestic environment, ICST PERVASIVE Health 2010.
32. Heerink M., Krose B.J.A., Wielinga B.J., Evers V. (2006) The Influence of a Robot's Social Abilities on Acceptance by Elderly Users. Proceedings RO-MAN, Hertfordshire, September 2006, pp. 521-526.
33. Mataric M.J. (2005) The Role of Embodiment in Assistive Interactive Robotics for the Elderly, AAAI Fall Symposium on Caring Machines: AI for the Elderly, Arlington, VA.
34. Tapus A., Tapus C., and Mataric M. J. (2009) The Use of Socially Assistive Robots in the Design of Intelligent Cognitive Therapies for People with Dementia, Proceedings, International Conference on Rehabilitation Robotics (ICORR-09), Kyoto, Japan.
35. Xiao B., Lunsford R., Coulston R., Wesson M., Oviatt S. (2003) Modeling multimodal integration patterns and performance in seniors: Toward adaptive processing of individual differences. Proceedings of the 5th international conference on Multimodal interfaces.
36. Batliner A., Hacker C., Kaiser M., Mogele H., Noth E. (2007) Taking into account the user's focus of attention with the help of audio-visual information: towards less artificial human-machine communication, Auditory-Visual Speech Processing (AVSP 2007).
37. Lunsford R., Oviatt S., Coulston R. (2005) Audio-visual cues distinguishing self- from system-directed speech in younger and older adults. Proceedings of the 7th international conference on Multimodal interfaces (ICMI'05), pages 167-174.
38. Diaz, R. & Berk, L.E., ed. (1992), Private speech: From social interaction to self regulation, Erlbaum, New Jersey, NJ.
39. Petersen R.C., Doody R., Kurtz A., Mohs R.C., Morris J.C., Rabins P.V., Ritchie K., Rossor M., Thal L., Winblad B. (2001) Current concepts in mild cognitive impairment. Arch. Neurol. 58, 1985-1992.
40. Wu Y.H., Fassert C., Rigaud A.S. (2011) Designing robots for the elderly: appearance issue and beyond. Archives of Gerontology and Geriatrics.
41. Shibata T., Wada K., Saito T., and Tanie K. (2001) Mental Commit Robot and its Application to Therapy of Children. In IEEE/ASME International Conference On AIM'01.
42. Saint-Aime S., Le Pevedic B., Duhaut D. (2008) EmotiRob: an emotional interaction model, In IEEE RO-MAN 2008, 17th International Symposium on Robot and Human Interactive Communication.
43. Lee J., Nam T-J. (2006) Augmenting Emotional Interaction Through Physical Movement, UIST2006, the 19th Annual ACM Symposium on User Interface Software and Technology.
44. Steinberger J. (2004) Using Latent Semantic Analysis in Text Summarization. Evaluation 93-100.
45. Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V. (2007) The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. Proceedings of Interspeech, pages 2253-2256.
46. Mahdhaoui A. and Chetouani M. (2011) Supervised and semi-supervised infant-directed speech classification for parent-infant interaction analysis. Speech Communication.
47. Breazeal C. and Aryananda L. (2002) Recognizing affective intent in robot directed speech, Autonomous Robots, 12:1, pp. 83-104.
48. Hacker C., Batliner A., and Noth E. (2006) Are you looking at me, are you talking with me: multimodal classification of the focus of attention. In Sojka P., Kopcek I., Pala K. (Eds): TSD 2006, LNAI 4188, pp. 581-588.
49. Truong K., van Leeuwen D. (2007) Automatic discrimination between laughter and speech. Speech Communication 49 (2007) 144-158.
50. Boersma P., Weenink D. (2005) Praat, doing phonetics by computer, Tech. rep., Institute of Phonetic Sciences, University of Amsterdam, Pays-Bas, URL www.praat.org
51. Shami, M., Verhelst, W. (2007) An Evaluation of the Robustness of Existing Supervised Machine Learning Approaches to the Classification of Emotions, Speech. Speech Communication, vol. 49, issue 3, pages 201-212.
52. Tilsen S. and Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. Journal of the Acoustical Society of America, 124:2, pp. EL34-39.
53. Duda, R., Hart, P., Stork, D. (2000) Pattern Classification, second edition.
54. Vapnik V. (1995) The Nature of Statistical Learning Theory. Springer-Verlag.
55. Olsen D. R., Goodrich M. (2003) Metrics for Evaluating Human-Robot Interaction, PERMIS 2003.
56. Delaherche E. and Chetouani M. (2010). Multimodal coordination: exploring relevant features and measures. Second International Workshop on Social Signal Processing, ACM Multimedia 2010.
57. Dahlbaeck N., Joensson A., and Ahrenberg L. (1993) Wizard of Oz Studies - Why and How. Proceedings of the 1993 International Workshop on Intelligent User Interfaces (IUI'93), ACM Press, 193-200.
58. Xiao B., Lunsford R., Coulston R., Wesson M., Oviatt S. (2003) Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences, Proceedings of the 5th international conference on Multimodal interfaces, Vancouver, British Columbia, Canada.
59. Light J. (1997) Communication is the essence of human life: Reflections on communicative competence. AAC Augmentative and Alternative Communication, 61-70.