Self-talk discrimination in Human-Robot Interaction Situations For Engagement Characterization
Jade LE MAITRE · Mohamed CHETOUANI

Abstract The estimation of engagement is a fundamental issue in Human-Robot Interaction and assistive applications. In this paper, we describe (1) the design of a triadic situation for cognitive stimulation of elderly users; (2) the characterization of social signals describing engagement: system directed speech (SDS) and self-talk (ST); (3) a framework for estimating an interaction effort measure that reveals the engagement of users. The proposed triadic situation is formed by a user, a computer providing cognitive exercises and a robot providing encouragement and help through verbal and non-verbal signals. The methodology followed for the design of this situation is presented. Wizard-of-Oz experiments have been carried out and analyzed through eye-contact behaviors and dialogue acts (SDS and ST). An automatic recognition system for these dialogue acts is proposed, with k-NN, decision tree and SVM classifiers trained on pitch, energy and rhythm based features. The best recognition system achieved an accuracy of 71%. Durations of both manually and automatically labelled SDS and ST were combined to estimate the Interaction Effort (IE) measure. Experiments on the collected data show the effectiveness of the IE measure in capturing the engagement of elderly patients during the cognitive stimulation task.

Keywords Social signal processing · Measuring engagement · Prosodic cues

Jade Le Maitre
ISIR UMR 7222, Université Pierre et Marie Curie
E-mail: firstname.lastname@example.org

Mohamed Chetouani
ISIR UMR 7222, Université Pierre et Marie Curie
E-mail: email@example.com

1 Introduction

Over the past decades, interest in service robotics has grown, partly because of human assistive applications. The proposed robotic systems are designed to provide various kinds of support: physical, cognitive or social. Human-Robot Interaction (HRI) plays a major role in these applications, and Socially Assistive Robotics (SAR) in particular has been identified as a promising field. Indeed, SAR aims to aid patients through social interaction, with several applications including motivation and encouragement during exercises [1–3].

Social signals are continuously provided during human-human interaction, and their absence is identified in pathologies such as autism. Interpreting and generating social signals makes it possible to sustain and enrich interactions with conversational agents and/or robots. An early system that realizes the full action-reaction cycle of communication, by interpreting multimodal user input and generating multimodal agent behaviors, has been presented. The importance of feedback for the regulation of interaction has been highlighted in several situations [7,8].

The ROBADOM project is devoted to the design of a robot-based solution for assistive daily living aids: management of shopping lists, meetings, medicines, reminders of appointments. Within the project, we are developing a specific robot to provide verbal and non-verbal help, such as encouragement and coaching, during cognitive stimulation exercises. Cognitive stimulation is identified as one of the methodologies that alleviate the decline of some cognitive functions (memory, attention) in the elderly. The robot would be dedicated to MCI patients (Mild Cognitive Impairment, i.e. the presence of cognitive impairment that is not severe enough to meet the criteria of dementia). Cognitive impairment
is one of the major health problems facing elderly people in the new millennium. This does not only refer to dementia, but also to lesser degrees of cognitive deficit that are associated with a decreased quality of life and, in many cases, progress to dementia.

In the work described in this paper, an engagement metric is developed for the estimation of interaction efforts during cognitive stimulation exercises. Engagement is considered as the process by which partners establish, maintain and end interactions. Engagement detection is identified as a key element for the design of socially assistive robots. We propose to study engagement in a triadic framework: user - computer (providing cognitive exercises) - robot (providing encouragement and backchannels). We identified specific social signals, namely system directed speech and self-talk, as indicators of engagement during interaction. In this work, engagement is not considered as an all-or-none phenomenon; a continuous characterization is proposed instead. To our knowledge, this is perhaps the first study that attempts to automatically estimate a metric, termed Interaction Effort, by exploiting dialogue acts. Specifically, our contributions are:

– An automatic recognition system for the detection of both system directed speech and self-talk. Self-talk provides insights about the cognitive load of the patient with MCI. We also propose relevant rhythmic features for the characterization.
– The definition and evaluation of a measure of engagement based on the previously detected acts. This measure is employed to understand the strategy of the patients during cognitive stimulation exercises.

The remainder of this paper is organized as follows: Section 2 describes the related work in human-human and human-machine interactions. Section 4 and section 5 give an overview of the cognitive stimulation situation, including the design of the robot and the Wizard-of-Oz experiment. Section 6 describes the analysis of the manually labelled data for the extraction of dialogue acts from the Wizard-of-Oz experiment: self-talk (ST) and system directed speech (SDS). Section 7 presents and discusses the engagement characterization framework. The experiments carried out for the evaluation of the proposed metric are described in section 8. Finally, section 9 concludes our work.

2 Related work

2.1 Social cues of engagement

The problem of detecting engagement has been studied using verbal and non-verbal cues, but many existing approaches attempt to estimate engagement from gaze [12–15,24] by considering eye-contact as a prominent social signal. Eye-contact is usually employed to regulate the communication between humans: initial contact, turn-taking, triggering backchannels...

Mutual gaze has been shown to contribute to smooth turn-taking [15,16]. Goffman mentioned that eye-contact during interaction tends to signal to each partner that they agree to engage in social interaction. Deficiency or failure in gaze during interaction may be interpreted as a lack of interest and attention, as noticed by Argyle and Cook. In face-to-face communication, initiation, regulation and/or disambiguation can be achieved by eye-gaze behaviors. The efficiency of an interaction is based on the ability to shift roles, which is again possible via eye-gaze behaviors [19,20]. During interaction, gaze might be combined with speech. Kendon analyzed these situations and, for instance, identified the fact that speakers look away from their partners at the beginning of an utterance, and look at their partners at the end of an utterance. This procedure might be useful since it serves to avoid cognitive load (i.e. planning of the utterance) as well as to shift roles with the partner.

In HRI situations, robots are required to estimate the engagement of the addressee for efficient communication. Estimation of gaze is a difficult task in HRI due to greater distances between the robot and the addressee; consequently, other cues such as head orientation, body posture and pointing might also be used to indicate at least the direction of attention. Most of the proposed techniques can be seen as based on the concept of face engagement proposed by Goffman to describe the process in which people employ eye contact, gaze and facial gestures to interact with or engage each other. Basically, the engagement detection framework is based on (1) face detection and (2) facial/head gesture classification. In one such approach, in order to understand the behaviors of the potential addressee in a human-robot interaction task, the authors proposed to combine multiple cues. A set of utterances is defined and used to start an interaction. In addition to the detection step, the authors estimate the visual focus of attention of users. They compute probabilities that the partner is looking at each of a pre-defined list of possible focus targets. Since the focus targets include the robot itself and other potential users, engagement estimation is reinforced and benefits from the eye-gaze functions without explicit modeling. A similar work in a multi-robot interaction framework, based on face detection and gesture classification, makes it possible to select and command individual robots.
Robots could also use eye-gaze behaviors to improve the interaction. Mutlu et al. conducted experiments where their robot, Robovie, employed various strategies of eye-gaze behaviors to signal specific roles during interaction: addressee, bystander, and overhearer. The authors show that gaze direction serves as a moderator, since gaze cues support the conversation by reinforcing the roles and participation of human subjects.

All the previously mentioned works have shown the benefit of using eye-gaze behaviors for measuring engagement during interaction. However, this social signal should be more precisely characterized. Rich et al. investigated some relevant cues for recognizing engagement, first in human-human interaction, and then proposed an automatic system for HRI situations. Regarding eye-gaze behaviors, they identified directed gaze and mutual facial gaze. Directed gaze characterizes events when one person looks at some object, following which the other person looks. Mutual gaze refers to events when one person looks at the other person's face. As a result, various features are employed to describe the communicative functions of eye-gaze behaviors. Ishii et al. also extract various features from eye-gaze behaviors, such as gaze transitions, occurrence of mutual gaze, gaze duration, distance of eye movement, and pupil size. These features are employed to predict the user's conversational engagement. The statistical modeling and combination of the features have shown the relevance of all of them in the recognition of the user's attitudes (engagement vs disengagement).

Other cues can be employed to detect engagement in social interactions. Castellano et al. proposed to combine eye-contact with smiling, which is considered in their specific game scenario as an indicator of engagement. The authors enriched the characterization by adding contextual features such as the game state and the behavior of the robot used (iCat facial expressions). A Bayesian network is employed for modeling cause-effect relationships between the social signals and the contextual features. The evaluation makes it possible to identify a set of user actions correlated with engagement. Spatial movements have been used for initiating conversations and, more generally, for engagement characterization.

All these results show that, in various tasks, eye-gaze behaviors could be combined with other cues in order to improve the detection rates. Other strategies avoid the estimation of eye-gaze behaviors altogether. For instance, some authors employ physiological signals such as skin response and skin temperature for the estimation of engagement. That work was motivated by the fact that, in a rehabilitative context, the extraction of implicit information on the patient is of seminal importance. Physiological signals are more correlated with the internal state of a user and consequently can be employed to infer emotional information. Engagement detection from these signals can be used by a robot to alter the interaction scenario, such as coaching, assistance...

Peters et al. have highlighted the importance of engagement and the various elements related to it. They proposed a simplified model based on an action-cognition-perception loop, which makes it possible to differentiate between the several aspects of engagement: perception (e.g. detection of cues), cognition (e.g. internal state: motivation), action (e.g. display interest). They also identified a dimension termed experience, which aims at covering subjective experiences felt by individuals. This work shows that engagement is not a simple concept and that, as for other social signals, investigations on its characterization, detection and understanding are still required for the design of adaptive interfaces, including robots.

In this paper, we deal with the characterization of engagement detection in an assistive context, and more precisely in a cognitive stimulation situation with elderly people. Research works on related topics are usually devoted to acceptability [30–32]; however, maintaining engagement is identified as a key component of socially assistive robots [33,34]. The use of interactive technology may be more challenging for elderly users. Xiao et al. have investigated how seniors deal with multimodal interfaces, and they found that elderly people require more time and make errors, as can be expected. But, more interestingly, they identified that older users employ an audible speech register termed self-talk, which is a kind of think-aloud process produced during difficult tasks. Discriminating self-talk (ST) from system/robot directed speech (SDS) is of great importance for two main reasons: (1) intended interactions are produced during SDS, (2) production of ST can be used as an indicator of engagement. The next section describes this specific speech register more precisely.

2.2 Self-talk in machine interaction

Following the definition of Oppermann, self-talk, or private speech, refers to audible or visible talk people use to communicate with themselves. This register can be considered as a part of off-talk, which is a special dialogue act characterizing "every utterance that is not directed to the system as a question, a feedback utterance or an instruction". Off-talk is identified as a problem
for automatic speech recognition systems, and distinguishing it from on-talk (or system directed speech) will clearly improve recognition rates. However, the characteristics of off-talk make the task difficult. For instance, when the user is reading instructions, lexical information is not discriminant and other features should be employed. One relevant strategy is to try to combine audio and visual features, as proposed in [36,37]. Batliner et al. formulated the problem by defining on-talk vs off-talk and on-view vs off-view strategies. Their combination leads to on-focus (on-talk + on-view) and off-focus, where on-view is not discriminant: listening to someone and looking away. The authors employed an audio-visual framework for the classification of on-talk from various off-talk elements (read, paraphrasing and spontaneous off-talk). Prosodic features, part-of-speech (POS) features and visual features (a simple face detection system) are employed. The detection of the user's focus on interaction yields 76.6% using prosodic features, and the combination with linguistic and visual features achieves 80.8% and 84.5% respectively.

From a conceptual point of view, open-talk (robot-directed speech) is considered as social speech because it is produced with the objective of communication, while self-talk is known to be a means for thinking, planning and self-regulation of behavior. Lunsford et al. investigated audio-visual cues and reviewed some functions of self-talk. Among the most interesting ones, the authors reported that self-talk supports task performance and self-regulation. Potential benefits of improving the estimation of engagement are (1) the design of adaptive social interfaces (including robots), (2) the improvement of the impact of assistive devices and (3) the understanding of the strategies and behaviors of individuals.

3 Objectives

The purpose of this paper is engagement characterization in an interaction between an MCI patient and a robot, with the help of the classification of verbal utterances into Self-Talk or System-Directed Speech.

As seen in previous work, during the completion of a spatial task by seniors, a high amount of self-talk is observed: 80% of the subjects of that study engaged in ST at some point during their session. This amount increases with the difficulty level of the task, which is strongly correlated with the cognitive load of the person: the ST amount increased from low to high difficulty tasks (26.9% versus 43.7%, respectively). A similar situation is proposed through the cognitive stimulation experiment (section 4), and the key idea is to identify phases where the patient is less engaged in this activity, which is reflected by the production of self-talk. By being aware of these phases, the coach robot will be able to produce useful feedback, encouragement or help.

In order to identify the interaction phases between a therapist and a patient, we conducted experiments to acquire interaction data. In section 4 we describe the actions of a therapist during a cognitive stimulation exercise, and how the robot should be adapted. The robot's interactions are described in section 3, and the technical setup of the Wizard of Oz experiments is detailed in section 5.2.

After the completion of all experiments, the corpus analysis begins, as described in section 6. The very first analysis is the manual annotation of all the records taken during the experiments. After the gaze and keyword annotation, the next step is the pooling of the annotated keywords into clusters using a Latent Semantic Analysis method. This method groups semantically similar keywords into meaningful clusters, structuring the dialogue acts in an unsupervised way and giving them a semantic meaning, as explained in figure 3.

Within these clusters, another manual annotation takes place, splitting the keywords into two categories, depending on whether they were spoken to the robot and/or the computer (System-Directed-Speech) or to the patient himself (Self-Talk). With all these labelled keywords (cluster, ST or SDS), we are able to perform an engagement characterization as described in section 7.

Fig. 1 Steps of engagement characterization

4 Patient-Therapist Interaction

Before conducting experiments between a robot and patients, we collected data about how patients interacted with a therapist. The patient had to solve exercises on a tactile screen, while the therapist, seated near the patient, observed the situation and provided help whenever the patient needed it. The therapist could help with the technical setup, indicating how to deal with the tactile screen, or just provide help for a particular exercise, explain how to correct an answer, or simply say whether the patient answered correctly. The interaction between the patient, the therapist and the cognitive stimulation exercise is a triadic situation, as shown in Figure 2. Backchannels of the therapist are important for these
patients to gain confidence and solve these exercises correctly, even if the therapist doesn't have to say anything: the mere presence of the therapist matters. Psychologists at the Broca Hospital therefore organized sessions of recorded cognitive stimulation, which led on our side to the analysis of the interaction between the patient and the therapist. Thanks to these sessions, we were able to determine the interaction phases of the therapist, and to duplicate them later on the robot. The therapist is always at the side of the patient, measuring and evaluating the attention and engagement of the patient. The presence of the therapist is important for the patient to gain confidence. As shown in Figure 2, during the cognitive stimulation exercises, the patient sits in front of the tactile screen with the therapist at his right. Because the acceptance of the robot is our second concern, alongside the robot reacting appropriately, we also investigated with our colleagues which kind of robot might be acceptable to the targeted end-users with Mild Cognitive Impairment.

Fig. 2 Triadic situation either with robot or therapist

5 Patient-Robot Interaction

5.1 Designing Robot for MCI patients

Focus group sessions were conducted at the Broca Hospital to identify how the elderly perceive a robot's appearance. 15 adults over the age of 65, divided into three groups, took part in the sessions. 13 of them were recruited from the Memory Clinic at the Broca Hospital; two were recruited from an association for the elderly. Seven of the participants suffered from Mild Cognitive Impairment, according to published definition criteria. From the results of the focus group, the robots defined as attractive to them were small robots, often with a modern design, shaped like animals or objects they could use in their daily life, like Mamoru or Paro. Various robots fitting these characteristics can be employed, but we selected the rabbit-shaped Nabaztag from Violet, model Nabaztag:tag. This electronic device is Wi-Fi enabled and can connect to the internet to process specific services via a distant server located at http://www.nabaztag.com. The Nabaztag has motorized ears, 4 color LEDs, a speaker and a microphone. As described in section 5.2, ears and LEDs are employed to enhance the expressiveness of the Nabaztag. Regarding the acceptability of this robot, experimental results can be found in other projects such as SERA (Social Engagement with Robots and Agents), where social engagement is investigated [30,31].

5.2 Description of the Wizard of Oz Experiments

5.2.1 Technical and Experimental Design

Human-robot communication differs from human-human communication. Therefore, to gather reliable information about human-robot communication, it is important to observe human behaviour in a situation in which humans believe they are interacting with a real robotic system. It is important that the user thinks he or she is communicating with the system, not a human, as previously noted in the literature. The purpose of our Wizard of Oz experiments is to record interactions between the patient and the robot, using the interaction schemes observed between the patient and the therapist. After analyzing the videos of the sessions between the therapist and the patient, patterns of interaction were detected (therapist encouraging, therapist answering a question, backchannels) and adapted to the Nabaztag. The purpose was to give the Nabaztag an interaction panel, leaving the Wizard in charge of choosing the right couple [answer+backchannel] for each situation.

The patient sits in front of the tablet PC, with the Nabaztag at his left, as shown in Figure 2. During the sequence of cognitive exercises solved on the tablet PC, the robot interacts with the patient. The Wizard gathers information about the situation with the help of two cameras and a screen capture of the tactile screen. The Wizard can hear the patient, but the reverse is impossible. The Nabaztag is remotely controlled by the Wizard, who activates both elements of the couple [answer+backchannel] at the same time. The mean duration of a WOZ experiment is 7min30s, with a total of 96min.
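The [answer+backchannel] pairing described above can be pictured as a small lookup table from which the Wizard picks one entry per situation. The sketch below is only illustrative and is not the actual Wizard interface: the names `PanelEntry`, `PANEL` and `activate`, the situation keys, and the backchannel labels are all assumptions introduced here. The two answers are taken from the example dialogue given in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelEntry:
    """One couple [answer+backchannel] (hypothetical structure)."""
    answer: str        # verbal answer spoken by the robot
    backchannel: str   # label of the nonverbal behavior played with it


# Hypothetical interaction panel; keys and backchannel labels are assumptions.
PANEL = {
    "encourage": PanelEntry(
        answer="Let's go! First, you have to drag the images into the box "
               "corresponding to their name.",
        backchannel="encouraging_choreography",
    ),
    "answer_question": PanelEntry(
        answer="Press on the image, then drag it to the box.",
        backchannel="attentive_choreography",
    ),
}


def activate(situation: str) -> tuple[str, str]:
    """Return the answer and the backchannel together, as one couple."""
    entry = PANEL[situation]
    return entry.answer, entry.backchannel


if __name__ == "__main__":
    answer, backchannel = activate("answer_question")
    print(f"say: {answer!r} / play: {backchannel}")
```

Keeping the answer and the backchannel in one record mirrors the constraint that the Wizard activates both at the same time; a real panel would of course hold more entries.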
5.2.2 Verbal and Nonverbal Behaviors for the Nabaztag

Regarding the Nabaztag, nonverbal behaviors should be defined. Similar to other robots, such as Paro, Emotirob or Aibo, the Nabaztag can exploit movements and sounds as social communicative signals. Following the work of Lee and Nam on the relation between physical movement and the augmentation of emotional interaction, the expressions of the Nabaztag are correlated with both the speed of movements and the blinking of the LEDs. As shown in Figure 3, slow movements or blinking LEDs express unpleasant expressions such as sadness or annoyance, used when the user is getting lost or doesn't know how to solve the exercise. Positive expressions are related to active movements and blinking, which are employed to encourage the user, for instance.

The nonverbal behaviors for the Nabaztag were implemented using the NabazFrame, developed by the University of Bretagne Sud (www-valoria.univ-ubs.fr). Nonverbal choreographies contain ear movements and different sets of blinking LEDs, the color depending on the mood or feedback we wanted to transmit. Green and yellow fast blinking LEDs express pleasant expressions, while slow movements with blue and violet LEDs express unpleasant signals (Fig. 3). These behaviors are currently being tested by end-users in another work.

Fig. 3 Relationship between movements and expressions

5.3 Participants Analysis

A total of eight participants chosen by the therapists at the Broca Hospital, seven females and one male, aged from 64 to 82, participated in the experiments. Two of them had a slight MCI; the remaining six didn't have any communication, hearing, or vision impairments. Examples of interaction between the participants and the Nabaztag are shown in the following two paragraphs, one dedicated to SDS, the second to ST.

Example Dialogue Between a User and the Nabaztag: System-Directed-Speech

Nab. : Good Morning, my name is Carole, what's your name?
User : My name is Bob.
Nab. : Hello, Bob. I'm here to help you solve your exercises. Do you want to start them now?
User : Yes!
Nab. : Let's go! First, you have to drag the images into the box corresponding to their name.
User : How do I do that?
Nab. : Press on the image, then drag it to the box.

Example Dialogue Between a User and the Nabaztag: Self-Talk

User : And I drag the little image with the tree to autumn box... oh, it's not coming! She's gone, uh, found, I drag it to the box, oh, it escaped...
Nab. : When dragging the image to the correct box, you must always touch the screen.
User : Uh, yes, I drag it, I drag the image, the tree is in the box. The image with the flowers can't be the summer, these flowers grow in the spring, oh, I don't know, I drag the image to the spring, she's not coming, ah, yes, let's go to the next image.

6 Analysis of the annotation of WOZ experiments

6.1 Manual Annotation of the Content of the WOZ Experiments

For each of the 8 participants of the recorded experiments, the first step was the manual annotation of the videos taken during the Wizard of Oz experiments. Before the annotation began, the videos from the two cameras were synchronized and edited together, so that the annotator could view the patient from the two different angles: computer and robot. First, the gaze was annotated, without paying attention to the
speech. The annotator carefully annotated when the patient was looking at the robot (the setup made it clearly visible when the patient was looking at the robot: from the computer point of view, the person moves her head, and from the robot point of view the person's gaze is focused on the camera), and then when the patient was looking directly at the computer (same but reversed). The second step was to perform the speech annotation. The annotator first listened, then annotated the relevant keywords found in the patient's utterances. A keyword can be a single word, but also an expression made of words with a close meaning: "I'm doing well" will be annotated as one keyword, for example. Filled pauses were not annotated. The spoken keywords were first divided into eight simple categories, defined by the annotator after viewing all eight films. A general structure was defined, as given below:

– Agreement: "Let's start", "Yes", "You're right"
– Technical Question (often about the use of the tactile screen)
– Contextual Question (about the cognitive exercise itself, such as "What should I do with this picture?", "Should I put it here, or there?")
– Non-Obligatory Turn-Taking: comment without the need of an answer by the robot (users with MCI often comment on what they are doing, "And I move the picture over there...")
– Obligatory Turn-Taking: comment with the need of an answer by the robot (such as "I hope I am doing right"); these comments indicate the difficulty felt by the user
– Support Needed: the user is confused and needs more support ("I don't remember", "I don't know what I should do?")
– Thanks (some users thank the Nabaztag when they receive help, and one complimented it on its color and shape)
– Disagreement ("No, I don't want")

The gaze annotation shows that users are looking at the computer 89.76% of the time. This is explained by the fact that they were asked to carry out the exercises. Other explanations can be found in the specific design of the triadic situation, in which eye-contact with the robot is not required for effective engagement: the robot is only there to help and encourage, whereas to solve an exercise the patient has to gaze at the computer. As previously discussed, eye-gaze behaviors might not be discriminant for engagement detection on some tasks, and more specifically for seniors. Similar results have been obtained in previous work, where 99.5% of the self-talk utterances were associated with a gaze behavior directed to the system, and 98.1% for system directed speech. Eye-contact is not always discriminant for engagement/disengagement detection, and evaluation through dialogue acts such as self-talk and open-talk is relevant. Eye-contact is not discriminant in our case, but we found that the verbal production of the patients could help to characterize their difficulties. Because the robot needs to know when the patient encounters difficulties, in order to produce the appropriate feedback and help the patient, we decided to focus our work on the verbal utterances.

6.2 Latent semantic analysis of the content of annotation

Understanding the content of the annotation of interactive databases is made difficult by the strategies of individuals in a cognitive stimulation task. Indeed, several keywords or utterances correspond to each of the 8 categories. To provide insights on the annotation databases, and consequently on the behaviors exhibited, we performed a latent semantic analysis (LSA). LSA is based on a Singular Value Decomposition (SVD) of a term-document matrix, and results in a reduced dimension of the feature space. The interpretation of the reduced space makes it possible to identify semantic concepts. Although LSA is usually performed on texts, we decided to use it here because the verbal production of the patients is not spontaneous. The verbal utterances are structured (the exercise, or the patients' technical difficulties). Even in the Self-Talk parts of the interaction, the verbal production is structured and limited. The term-by-document matrix becomes, in our case, a keyword-by-cluster matrix. To train LSA, we estimated the occurrence of each of the 247 different keywords or utterances produced by the 8 participants with respect to the 8 categories. Using a Kaiser criterion (80% of the information), the reduced rank was set to 5. We carefully analyzed the content (keywords) of the obtained clusters, and they were interpreted as:

– Positive Feeling: positive feeling expressed toward the robot, or toward the exercises.
– Comments: useful keywords, but expressing nothing more than a simple statement about the exercise.
– Social Etiquette: the keywords put in this cluster all express agreement with the robot ("Okay", "I'm in", "Thanks"). This cluster is based on J. Light's original work, which analyzes communication and characterizes communication messages into different categories.
– Request Information: all the keywords expressing a question on the exercises were put in this cluster. The questions are context based: they depend on the part of the exercises in which the patient experiences difficulties.
– Others: the last cluster, in which the words are relevant but too few to form a specific cluster. This last cluster covers various comments that are useless for the cognitive exercise but highly relevant for engagement because of the amount of self-talk (see table 1).

6.3 Speech corpus

Our speech corpus consists of 543 utterances or keywords, and each of them corresponds to one of the semantic clusters. Each utterance or keyword has been carefully annotated as (1) self-talk or (2) system/robot directed speech. The annotator simply answered these two questions: "According to you, does the person speak to herself?" (ST) or "speak to the robot or the computer?" (SDS).

We evaluated the subjectivity of the annotation process by measuring inter-judge agreement. A second, naive annotator was chosen, who had no contact with the participants and did not know beforehand the differences between ST and SDS. This annotator watched the videos and had to choose, based on the same questions, whether the verbal utterances were ST or SDS. We then computed an inter-annotator score between the first labelling and that of the naive annotator. With the kappa method, the result was a score of 0.68, showing a substantial agreement.

The distribution of the utterances over the clusters is given in table 1. Most of the self-talk verbalizations are present in the most heterogeneous semantic cluster (Others), which shows that the amount of self-talk plays a role in engagement. Positive Feeling is mostly composed of self-talk elements. This can be explained by the fact that we are dealing with the expression of feelings, or positive comments, which appear when the robot gives satisfaction to the user. As the patient is solving a problem thanks to the robot, he is grateful but concentrated on the exercise. He will speak to himself about his feeling of joy, or gratitude, because he is concentrated on the exercise. After the exercises are complete, the cognitive load of the patient is lower, so the patients often express their gratitude, but this time directly to the robot. As one can expect, system directed speech is more present in semantic clusters characterized by a direct relationship to the cognitive exercise: comments, social etiquette and request information. In fact, these clusters express direct speech to the robot because the person is asking, or requesting, something very precise: as in a discussion between two persons, the patient instinctively directs his speech to the system. A high amount of ST does not mean a low engagement in the interaction: the two clusters with the highest amount of ST are respectively Positive Feeling, with 91 utterances, and the Others cluster, with 130 utterances. These two clusters are in direct relationship with the exercises: the user expresses his feelings toward the exercise in the first cluster, and the last one gathers various utterances about the exercises expressed by the patients. In fact, all the patients were totally engaged in the interaction and did not perform any other interfering task.

Table 1 Distribution of self-talk and system directed speech over the clusters

Semantic cluster       Self-Talk   System Directed Speech
Positive Feeling       91          9
Comments               21          24
Social Etiquette       43          69
Request Information    27          33
Others                 130         96

In table 2, patients 3 and 7 have the highest number of self-talk verbalizations. They are in fact the two MCI patients, and this result is well explained by their pathology. Patient 3 was the most talkative patient, and expressed a wide range of positive feelings. A measure of the importance of ST per patient could trace the evolution of a given patient.

Table 2 Number of self-talk and system directed speech

Users   Self-Talk   System Directed Speech
1       20          19
2       1           7
3       106         85
4       14          2
5       30          6
6       37          20
7       58          37
8       49          55

After an acoustic analysis of all keywords and utterances annotated as self-talk and system directed speech, we selected 293 utterances for ST and 223 for SDS. The utterances were mainly removed because of their short duration (less than 1 s). Durations of the retained utterances are between 1 and 2.5 s.

7 Using self-talk detection for engagement

In this section we describe the system developed for the measure of engagement from self-talk. Figure 4 depicts the proposed system. It requires discriminating self-talk from system directed speech. Then, we combine the duration of both dialogue acts in a measure of
engagement, which aims at characterizing the degree of interaction effort.

As previously mentioned, self-talk is produced with a lower intensity compared to open-talk or robot-directed speech. In this paper, we propose to investigate prosodic features including pitch, energy and rhythm. This is motivated by the fact that most of the classification frameworks of specific speech registers such as emotions , infant-directed speech , robot-directed speech  are mainly based on the characterization of supra-segmental features. In addition, Batliner et al.  have shown the relevance of prosodic features, and more precisely duration features, to discriminate on-talk from read off-talk. However, research on speech characterisation and feature extraction shows that it is difficult to reach a consensus on relevant features for the characterization of emotions, intentions as well as personality.

Fig. 4 Description of the proposed system for engagement measure

7.1 Feature extraction

Several studies have shown the relevance of both fundamental frequency (F0) and energy based features for emotion recognition applications . F0 and energy were estimated every 20 ms using Praat , and we computed several statistics for each voiced segment (segment based method) : maximum, minimum, mean, standard deviation, interquartile range, mean absolute deviation, and the quantiles corresponding to the cumulative probability values 0.25 and 0.75. These 8 statistics, computed for both F0 and energy, result in a 16 dimensional vector.

Rhythmic features were obtained by applying a set of perceptual filters to the audio signal, dedicated to the characterization of prominent events in speech termed p-centers . Then, we estimated the spectrum of the prominence signal for characterizing the speaking rate. We estimate 3 features: mean frequency, entropy and barycenter of the spectrum. Differences in rhythm are indicators of efficiency and clarity of interaction (fluidity).

7.2 Classification

After the extraction of features, supervised classification is used to categorise the data into classes corresponding to (1) self-talk and (2) system/robot directed speech. This can be implemented using standard machine learning methods. In this study, three different classifiers were investigated: decision trees, k-nearest neighbour (k-NN, with k experimentally set to 3) , and Support Vector Machines (SVM) with a Radial Basis Function kernel .

A k-fold cross-validation scheme was used for the experimental setup (k is set to 10) and the performance is expressed in terms of the overall accuracy: the average of the accuracies obtained over the k partitions.

7.3 Evaluating engagement

Engagement is an interactive process , in which two participants decide when to establish, maintain and end their connection. Thus, it is important to evaluate the engagement and find a way to establish the interaction effort caused by maintaining the interaction. Once the system directed speech and the self-talk sequences are detected, we propose to combine them to estimate a dimension termed interaction effort. The interaction effort is based on the dialogue: the interaction goes hand in hand with the dialogue in our experiments. Because this is a verbal interaction, it is important to analyze the differences in the addressee (self or system) to evaluate the interaction effort. In , interaction effort (IE) is defined as a unitless measure of how much effort a user puts into interacting with a robot. The authors show that IE cannot be measured easily because it requires advanced tools such as eye-tracking and/or brain activity sensors. In our application, we argue that IE is related to the quantity of robot/system directed speech, which is characterized by (1) intended communication and (2) on-view state (gazing at the system). While self-talk might be an indicator of planning, cognitive load is related to the difficulty of the task. We propose to estimate the interaction effort of a given human-robot interaction during a cognitive stimulation situation by the following expression:

$$\mathrm{IE} = \frac{\mathrm{SDS}}{\mathrm{SDS} + \mathrm{ST}} \qquad (1)$$

SDS and ST refer to the durations of system directed speech and self-talk (in seconds). The numerator is the amount of time of intended interaction, and the denominator is the total amount of interaction time (speaking). IE is a unitless measure (0 ≤ IE ≤ 1). If SDS is small relative to ST then IE is quite small. Typically, efficient interactions, which do not require self-regulatory
behaviors from elderly people, will obtain an IE close to 1. In some interactions, self-regulatory speech can be a positive measure, improving the interaction, but in our case it reflects the cognitive load of the patient. The robot has to be aware of the patient's difficulties, which are linked to his cognitive load; we therefore describe the interaction as efficient when the cognitive load (the amount of ST) is low.

The IE measure proposed in this paper makes it possible to evaluate the efficiency of the interaction. In future work, this measure will make it possible to change the verbal and non-verbal behaviors of the robot and, more interestingly, to adapt both the cognitive exercises and the encouragements provided.

8 Experimental results

This section describes the experiments performed and the results obtained for the characterization of engagement based on the detection of self-talk. First, the performance of our self-talk detection system is presented; then we propose to derive a measure of engagement.

8.1 Detection of self-talk

Table 3 shows the recognition rates of all the classifiers trained with different feature sets. In , energy is found to discriminate system directed speech from self-talk. Compared to pitch based features, classifiers trained with energy are more efficient, and the best results are obtained by the SVM. One possible explanation is that the extraction of pitch might be more complex for self-talk, which is produced by the users for themselves and consequently with a lower energy and intelligibility.

In addition to what has been described in the literature concerning energy, we argue that rhythm should be relevant for our application because of the change of speaking rate observed during self-talk. The experimental results show that using only rhythm based features with the SVM achieves an acceptable performance (56.97%), between those obtained with energy (59.51%) and pitch features (52.16%). Energy and rhythmic features are more robust. Rhythmic features are related to the vocalic energy of the signal  and have characteristics similar to those of short-term energy, except that perceptual filters are employed before the computation of energy (from acoustic prominence enhancement). The SVM classifier trained with the three sets of features outperforms all the other configurations (71.62%). One should note that adding features does not have the same effect for all the classifiers: adding rhythm features to the pitch and energy set decreases the performance of the decision tree (57.42% to 55.46%) and k-NN (59.28% to 58.20%) classifiers.

Table 3 Accuracy of classifiers using 10-fold cross-validation

Features                   Decision Tree   k-NN     SVM
Pitch-based                49.8%           53.35%   52.16%
Energy-based               55.54%          54.29%   59.51%
Rhythm-based               52.78%          56.58%   56.97%
Pitch + Energy             57.42%          59.28%   64.31%
Pitch + Energy + Rhythm    55.46%          58.20%   71.62%

8.2 Engagement characterization

This section describes the estimation of the Interaction Effort measure (equation 1). To evaluate the IE measure alone, we exploited the manually labelled data. The results are presented in table 4. The highest IE measure is obtained for user 2 (0.83), but one should be careful with this result since he produced only 1 self-talk and 7 system directed speech utterances (table 2). For the most talkative user (patient 3), the IE measure provides insights about his behavior: a relative balance between ST and SDS.

The Interaction Effort estimates of patients 4 and 5 do not exceed 0.20. This is easily explained: these patients rarely talked directly to the robot. They addressed the system directly only at the beginning of the exercise, showing that they understood the instructions. The other verbal utterances were only comments addressed to themselves. Patient 5 had a few difficulties, for which she had to address the robot directly to obtain a proper answer, but the amount of her self-talk utterances was too large to be balanced by these SDS utterances.

For the automatic estimation, we followed the framework described in figure 4. A Vocal Activity Detector (VAD), suitable for real-time detection in robotics , is employed for the segmentation of speech. The self-talk / system directed speech discrimination system is based on the SVM classifier trained with pitch, energy and rhythmic features as previously designed. Once the speech utterances are classified, we extract their durations, and at the end of the experiment we estimate the IE measure.

Table 4 shows that the IE measures computed by the automatic approach capture the individual behaviors of each user. In addition, high and low IE measures are efficiently characterized. However, for very low IE measures such as that of user 4, the automatic approach underestimates the measure. This may be due to factors such as the small number of verbalizations of this user. Due to the imperfect classifications, some of the IE measures are over- or underestimated, but they always make it possible to characterize a trend.
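As an illustration of the last step of this framework, the IE measure of equation 1 can be sketched from a list of classified utterances. This is only a minimal sketch under stated assumptions: the `Utterance` record, the `interaction_effort` function and the durations in the usage example are hypothetical and are not taken from the experiments; only the formula IE = SDS / (SDS + ST), with durations in seconds, comes from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record for one classified speech segment."""
    label: str       # "SDS" (system directed speech) or "ST" (self-talk)
    duration: float  # duration of the segment in seconds


def interaction_effort(utterances: Iterable[Utterance]) -> Optional[float]:
    """Return IE = SDS / (SDS + ST), or None if there is no speech at all."""
    sds = 0.0  # accumulated duration of system directed speech
    st = 0.0   # accumulated duration of self-talk
    for u in utterances:
        if u.label == "SDS":
            sds += u.duration
        elif u.label == "ST":
            st += u.duration
        else:
            raise ValueError(f"unknown label: {u.label!r}")
    total = sds + st
    if total == 0:
        return None  # IE is undefined when the user did not speak
    return sds / total


# Made-up session: 4 s of system directed speech, 4 s of self-talk.
session = [
    Utterance("SDS", 2.0),
    Utterance("ST", 1.5),
    Utterance("ST", 2.5),
    Utterance("SDS", 2.0),
]
print(interaction_effort(session))
```

Returning `None` for a silent session is a design choice of this sketch; the paper does not state how a session without speech should be handled.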
Table 4 Interaction effort (IE) measure estimation

Users   From Annotation   From Automatic Self-talk Detection
1       0.5               0.62
2       0.83              0.78
3       0.45              0.53
4       0.13              0.08
5       0.20              0.26
6       0.43              0.46
7       0.42              0.38
8       0.57              0.63

9 Conclusions

In this paper, we have demonstrated promising results on automatically estimating the interaction effort of patients during a coaching experiment with cognitive stimulation exercises. After an analysis of WOZ experiments (triadic situation), we identified relevant social signals characterizing the engagement of elderly patients: system directed speech and self-talk. The latter is employed by the users during difficult tasks for planning, thinking and self-regulation. We proposed a system to automatically detect these two social signals based on the extraction of relevant features (pitch, energy and rhythm) with three different classifiers. The experimental results have shown the discriminative function of energy as described in the literature. In addition, the performance achieved by the proposed rhythmic features demonstrates that users employ a different speech register for intended communicative acts.

Future work in this area should investigate multimodal cues for the extension to off-talk situations. Off-talk is defined as the act of not speaking to an addressee, and it includes self-talk but also talking to a third addressee. In this case, the automatic detection of on-view states could be of great importance. Among the automatic cues that should be developed, eye-gaze social signal detection remains one of the challenges. Eye-tracking systems are not really suitable for complex assistive applications and will change the behaviors during the interaction. Interaction effort can also include touching and/or manipulation, and a more general definition of multimodal and integrative engagement characterization should be proposed.

Furthermore, we intend to use the IE measure for the characterization of users and to give the opportunity to adapt the robot's behaviors, which in our specific task means the encouragements and potentially the difficulty of the cognitive exercises. As future work, we will exploit questionnaires in order to understand and estimate the engagement awareness of the users during interaction (experience of engagement).

Acknowledgements This work has been supported by the French National Research Agency (ANR) through the TecSan program (project Robadom ANR-09-TECS-012). The authors would like to thank the Broca hospital for their work: Ya-Huei Wu, Christine Fassert, Victoria Cristancho-Lacroix and Anne-Sophie Rigaud.

References

1. Feil-Seifer D. J., Mataric M. J. (2005) Defining Socially Assistive Robotics. In International Conference on Rehabilitation Robotics, pages 465-468, Chicago, IL.
2. Fasola J., Mataric M. J. (2010) Robot Exercise Instructor: A Socially Assistive Robot System to Monitor and Encourage Physical Exercise for the Elderly. In 19th IEEE International Symposium in Robot and Human Interactive Communication (Ro-Man 2010), Viareggio, Italy.
3. Mataric M.J., Tapus A., Winstein C. J., and Eriksson J. (2009) Socially Assistive Robotics for Stroke and Mild TBI Rehabilitation. In Advanced Technologies in Rehabilitation, Andrea Gaggioli, Emily A. Keshner, Patrice L. (Tamar) Weiss, Giuseppe Riva (eds.), IOS Press, 249-262.
4. Vinciarelli A., Pantic M., and Bourlard H. (2009) Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing Journal, Vol. 27, no. 12, pp. 1743-1759.
5. Saint-Georges C., Cassel R.S, Cohen D., Chetouani M., Laznik M-C., Maestro S., and Muratori F. (2010) What studies of family home movies can teach us about autistic infants: A literature review. Research in Autism Spectrum Disorders. Vol 4 No 3, pages 355-366.
6. Cassell J., Bickmore J., Billinghurst M., Campbell L., Chang K., Vilhjálmsson H., and Yan H. (1999) Embodiment in conversational interfaces: Rea. In CHI'99, pages 520-527, Pittsburgh.
7. Wrede B., Kopp S., Rohlfing K., Lohse M., and Muhl C. (2010) Appropriate feedback in asymmetric interactions, Journal of Pragmatics, vol. 42, no. 9, pp. 2369-2384.
8. Al Moubayed S., Baklouti M., Chetouani M., Dutoit T., Mahdhaoui A., Martin J.-C., Ondas S., Pelachaud C., Urbain J., Yilmaz M. (2009) Generating Robot/Agent Backchannels During a Storytelling Experiment, In ICRA'09, IEEE International Conference on Robotics and Automation. Kobe, Japan.
9. Chetouani M., Wu Y.H., Jost C., Le Pevedic B., Fassert C., Cristancho-Lacroix V., Lassiaille S., Granata, Tapus A., Duhaut D., Rigaud A.S. (2010) Cognitive Services for Elderly People: The ROBADOM project, ECCE 2010 Workshop: Robots that Care, European Conference on Cognitive Ergonomics.
10. Yanguas J., Buiza C., Etxeberria I., Urdaneta E., Galdona N., González M.F. (2008) Effectiveness of a non-pharmacological cognitive intervention on elderly factorial analysis of Donostia Longitudinal Study. Adv. Gerontol. 3, 30-41.
11. Oppermann D., Schiel F., Steininger S., Beringer N. (2001) Off-talk - a problem for human-machine-interaction?, In EUROSPEECH-2001, 2197-2200.
12. Couture-Beil A., Vaughan R., and Mori G. (2010) Selecting and Commanding Individual Robots in a Vision-Based Multi-Robot System. Seventh Canadian Conference on Computer and Robot Vision (CRV).
13. Sidner C. L., Kidd C. D., Lee C., and Lesh N. (2004) Where to look: a study of human-robot engagement. Proceedings of the 9th international conference on Intelligent user interfaces (IUI'04).
14. Castellano G., Pereira A., Leite I., Paiva A., McOwan P. W. (2009) Detecting user engagement with a robot companion using task and social interaction-based features. Proceedings of the 2009 international conference on Multimodal interfaces (ICMI-MLMI'09), pages 119-126.
15. Ishii R., Shinohara Y., Nakano T., and Nishida T. (2011) Combining Multiple Types of Eye-gaze Information to Predict User's Conversational Engagement. 2nd Workshop on Eye Gaze on Intelligent Human Machine Interaction.
16. Nakano Y.I., Ishii R. (2010) Estimating User's Engagement from Eye-gaze Behaviors in Human-Agent Conversations. In 2010 International Conference on Intelligent User Interfaces (IUI2010).
17. Goffman E. (1963) Behavior in Public Places: Notes on the Social Organization of Gatherings. New York: The Free Press.
18. Argyle M. and Cook M. (1976) Gaze and Mutual Gaze. Cambridge: Cambridge University Press.
19. Duncan S. (1972) Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology, vol. 23, no. 2, pp. 283-292.
20. Goodwin C. (1986) Gestures as a resource for the organization of mutual attention. Semiotica, vol. 62, no. 1/2, pp. 29-49.
21. Kendon A. (1967) Some Functions of Gaze Direction in Social Interaction. Acta Psychologica. 26: pp. 22-63.
22. Klotz D., Wienke J., Peltason J., Wrede B., Wrede S., Khalidov V., Odobez J.M. (2011) Engagement-based multi-party dialog with a humanoid Robot. Proceedings of SIGDIAL 2011: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 341-343.
23. Mutlu B., Shiwa T., Kanda T., Ishiguro H., Hagita N. (2009) Footing in human-robot conversations: How robots might shape participants roles using gaze cues. In Proc. of ACM Conf. Human Robot Interaction.
24. Rich C., Ponsler B., Holroyd A., Sidner C. L. (2010) Recognizing engagement in human-robot interaction. In Proc. of ACM Conf. Human Robot Interaction.
25. Shi C., Shimada M., Kanda T., Ishiguro H., Hagita N. (2011) Spatial Formation Model for Initiating Conversation. Proceedings of Robotics: Science and Systems.
26. Michalowski M.P., Sabanovic S., Simmons R. (2006) A Spatial Model of Engagement for a Social Robot. IEEE International Workshop on Advanced Motion Control, pp. 762-767.
27. Mower E., Mataric M. J., and Narayanan S. (2011) A Framework for Automatic Human Emotion Classification Using Emotional Profiles, IEEE Transactions on Audio, Speech, and Language Processing.
28. Zong C. and Chetouani M. (2009) Hilbert-Huang transform based physiological signals analysis for emotion recognition. IEEE Symposium on Signal Processing and Information Technology (ISSPIT'09).
29. Peters C., Castellano G., de Freitas S. (2009) An exploration of user engagement in HCI. Proceedings of AFFINE'09.
30. Payr S., Wallis P., Cunningham S., Hawley M. (2009) Research on Social Engagement with a Rabbitic User Interface, In Tscheligi M., de Ruyter B., Soldatos J., Meschtscherjakov A., Buiza C., Streitz N., Mirlacher T. (eds.), Roots for the Future of Ambient Intelligence. Adjunct Proceedings, 3rd European Conference on Ambient Intelligence (AmI09), ICT&S Center, Salzburg.
31. Klamer T., Ben Allouch S. (2010) Acceptance and use of a social robot by elderly users in a domestic environment, ICST PERVASIVE Health 2010.
32. Heerink M., Krose B.J.A., Wielinga B.J., Evers V. (2006) The Influence of a Robot's Social Abilities on Acceptance by Elderly Users. Proceedings RO-MAN, Hertfordshire, September 2006, pp. 521-526.
33. Mataric M.J. (2005) The Role of Embodiment in Assistive Interactive Robotics for the Elderly, AAAI Fall Symposium on Caring Machines: AI for the Elderly, Arlington, VA.
34. Tapus A., Tapus C., and Mataric M. J. (2009) The Use of Socially Assistive Robots in the Design of Intelligent Cognitive Therapies for People with Dementia, Proceedings, International Conference on Rehabilitation Robotics (ICORR-09), Kyoto, Japan.
35. Xiao B., Lunsford R., Coulston R., Wesson M., Oviatt S. (2003) Modeling multimodal integration patterns and performance in seniors: Toward adaptive processing of individual differences. Proceedings of the 5th international conference on Multimodal interfaces.
36. Batliner A., Hacker C., Kaiser M., Mogele H., Noth E. (2007) Taking into account the user's focus of attention with the help of audio-visual information: towards less artificial human-machine communication, Auditory-Visual Speech Processing (AVSP 2007).
37. Lunsford R., Oviatt S., Coulston R. (2005) Audio-visual cues distinguishing self- from system-directed speech in younger and older adults. Proceedings of the 7th international conference on Multimodal interfaces (ICMI'05), pages 167-174.
38. Diaz R. & Berk L.E., ed. (1992) Private speech: From social interaction to self regulation, Erlbaum, New Jersey, NJ.
39. Petersen R.C., Doody R., Kurtz A., Mohs R.C., Morris J.C., Rabins P.V., Ritchie K., Rossor M., Thal L., Winblad B. (2001) Current concepts in mild cognitive impairment. Arch. Neurol. 58, 1985-1992.
40. Wu Y.H., Fassert C., Rigaud A.S. (2011) Designing robots for the elderly: appearance issue and beyond. Archives of Gerontology and Geriatrics.
41. Shibata T., Wada K., Saito T., and Tanie K. (2001) Mental Commit Robot and its Application to Therapy of Children. In IEEE/ASME International Conference On AIM'01.
42. Saint-Aime S., Le Pevedic B., Duhaut D. (2008) EmotiRob: an emotional interaction model, In IEEE RO-MAN 2008, 17th International Symposium on Robot and Human Interactive Communication.
43. Lee J., Nam T-J. (2006) Augmenting Emotional Interaction Through Physical Movement, UIST2006, the 19th Annual ACM Symposium on User Interface Software and Technology.
44. Steinberger J. (2004) Using Latent Semantic Analysis in Text Summarization. Evaluation 93-100.
45. Schuller B., Batliner A., Seppi D., Steidl S., Vogt T., Wagner J., Devillers L., Vidrascu L., Amir N., Kessous L., Aharonson V. (2007) The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. Proceedings of Interspeech, pages 2253-2256.
46. Mahdhoui A. and Chetouani M. (2011) Supervised and semi-supervised infant-directed speech classification for parent-infant interaction analysis. Speech Communication.
47. Breazeal C. and Aryananda L. (2002) Recognizing affective intent in robot directed speech, Autonomous Robots, 12:1, pp. 83-104.
48. Hacker C., Batliner A., and Noth E. (2006) Are you looking at me, are you talking with me: multimodal classification of the focus of attention. In Sojka P., Kopcek I., Pala K. (Eds): TSD 2006, LNAI 4188, pp. 581-588.
49. Truong K., van Leeuwen D. (2007) Automatic discrimination between laughter and speech, Speech Communication 49, 144-158.
50. Boersma P., Weenink D. (2005) Praat, doing phonetics by computer, Tech. rep., Institute of Phonetic Sciences, University of Amsterdam, The Netherlands, URL www.praat.org
51. Shami M., Verhelst W. (2007) An Evaluation of the Robustness of Existing Supervised Machine Learning Approaches to the Classification of Emotions, Speech. Speech Communication, vol. 49, issue 3, pages 201-212.
52. Tilsen S. and Johnson K. (2008) Low-frequency Fourier analysis of speech rhythm. Journal of the Acoustical Society of America, 124:2, pp. EL34-39.
53. Duda R., Hart P., Stork D. (2000) Pattern Classification, second edition.
54. Vapnik V. (1995) The Nature of Statistical Learning Theory. Springer-Verlag.
55. Olsen D. R., Goodrich M. (2003) Metrics for Evaluating Human-Robot Interaction, PERMIS 2003.
56. Delaherche E. and Chetouani M. (2010) Multimodal coordination: exploring relevant features and measures. Second International Workshop on Social Signal Processing, ACM Multimedia 2010.
57. Dahlbaeck N., Joensson A., and Ahrenberg L. (1993) Wizard of Oz Studies: Why and How. Proceedings of the 1993 International Workshop on Intelligent User Interfaces (IUI'93), ACM Press, 193-200.
58. Xiao B., Lunsford R., Coulston R., Wesson M., Oviatt S. (2003) Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences, Proceedings of the 5th international conference on Multimodal interfaces, Vancouver, British Columbia, Canada.
59. Light J. (1997) Communication is the essence of human life: Reflections on communicative competence. AAC Augmentative and Alternative Communication, 61-70.