Total: 18 minutes including questions => 15 minutes for the presentation. Thank you very much, chairperson, for your kind introduction. My name is Le Quoc Anh. I am a PhD student from Paris, where I work on an expressive gesture model for humanoid robots under the direction of Professor Catherine Pelachaud.
As Mancini showed (ACII 2011), expressivity can contribute to conveying emotional content from the agent to users. The main objective of my work is to generate communicative gestures for the humanoid robot Nao while it is reading a story. For many years, we have developed a virtual agent, named Greta, that can communicate with humans through voice, facial expressions and hand gestures. We want to extend this framework to control the nonverbal behaviors of the Nao robot. From given communicative intentions, the GRETA system selects and plans the corresponding gestures. The animation scripts are encoded in a symbolic language. My work therefore focuses on creating gesture animations for the robot. In detail, the question is how to synchronize gestures with speech and how to render gestures with expressivity. The work takes place within the French national project GVLEX (Gesture and Voice for an Expressive Lecture). Its objective is to use the robot NAO to tell a story with expressive gestures to children. The project has four partners: LIMSI works on linguistic aspects, Aldebaran works on robotics (the mechanics and operating system of the robot NAO), Acapela works on speech synthesis (text-to-speech), and we at Telecom ParisTech work on nonverbal behaviors in general and especially on expressive gestures accompanying speech.
Recently, several systems have been developed to create gestures for humanoid robots. For example, Salem et al. build on the gesture engine of the virtual agent system MAX to control the gestures of the robot ASIMO. These systems have some common characteristics: they use symbolic languages to specify gestural scripts, and the synchronization of gestures and speech is done by adapting the gestures to the speech. Our system has some differences compared to the others. It follows the SAIBA framework, a standard architecture. In our system, the gesture lexicon is an external parameter that can be modified to be adapted to a specific agent. The system also focuses on gestural expressivity.
In our system, we use BML (a symbolic behavior representation language) to specify an animation script. The expressivity of gestures is translated into a set of expressivity parameters such as the speed of movement, the power of movement, etc. We predefine a repository of gestures, called the gesture lexicon. The elaboration of gestures is based on gesture annotations extracted from a storytelling video corpus. The system selects and plans gestures from the lexicon and then realizes them. The animation is obtained by translating the symbolic description of gestures into joint values of the robot. (Notes: feasibility is limited by the robot's physical constraints; the animation is specified by scripts described with BML = gesture specification + descriptions.)
Using a virtual agent framework to control a physical robot raises several problems, because the robot has certain physical constraints such as limits on movement space and speed. This is a really important point. Our solution is to use the same representation language to control both agent systems (virtual and physical), so that we can use the same algorithm for selecting and planning gestures but a different algorithm for creating the animation. Additionally, we plan to build a dedicated gesture database for the robot, in which the gesture movement space and velocity specification are predefined.
Now I would like to turn to the implementation section. As you can see, the system consists of two separate modules. The first module, Behavior Planning, selects and plans gestures corresponding to the given intentions encoded in the FML message. The second module, Behavior Realizer, schedules the phases of the gestures and creates the gesture animations. In a bit more detail, the same process for selecting and planning gestures is applied to both agents, but the method for producing the animation is different for each agent. The next slide will present the gesture lexicons.
Gestures are elaborated from gesture annotations made on a storytelling video corpus. We use BML syntax to encode gestures. Following the observations of Kendon, a gesture can be divided into several phases (preparation, stroke and retraction), in which the stroke phase is the most important one, as it conveys the meaning of the gesture. In the lexicon, only stroke phases are specified. The other phases will be generated automatically by the system.
In order to synchronize gestures with speech, the stroke phase of a gesture must coincide with the emphasized words. In our system, the timing of the gesture stroke phase is specified by synchronization points.
In detail, the system has to predict the duration of the preparation phase so that the agent knows exactly when to start the gesture in order to be synchronized with the speech. In this step, the system verifies whether the agent has enough time to perform the gesture. If not, the gesture has to be deleted. In the opposite case, if the planned gesture duration is too long, a hold phase is added to make the gesture more natural. The coarticulation between two consecutive gestures is handled by checking the available time between them. If there is enough time, the retraction phase of the first gesture is executed. Otherwise, the retraction phase is canceled and the hand moves from the end of the stroke of the first gesture to the preparation of the second gesture.
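The timing decisions above can be sketched as follows. This is a minimal illustration with hypothetical function names and simplified timing, not the actual GVLEX implementation:

```python
# Sketch of the scheduling logic: start the gesture early enough for the
# stroke to land on the stressed word; drop it if there is no time, and
# decide between a full retraction and a direct coarticulation.

def plan_gesture(stroke_start, stroke_end, prep_duration, hand_free_at):
    """Schedule one gesture on the speech timeline (times in seconds).

    Returns the planned phase times, or None if the hand cannot reach
    the stroke-start position in time (the gesture is deleted).
    """
    gesture_start = stroke_start - prep_duration
    if gesture_start < hand_free_at:
        return None  # not enough time to prepare: delete the gesture
    return {"start": gesture_start,
            "stroke_start": stroke_start,
            "stroke_end": stroke_end}

def coarticulate(prev_stroke_end, next_prep_start, retraction_duration):
    """Between two consecutive gestures: retract fully to the rest
    position if there is time, otherwise move straight from the end of
    the first stroke to the preparation of the next gesture."""
    if next_prep_start - prev_stroke_end >= retraction_duration:
        return "retract"
    return "skip_retraction"
```

A gesture whose preparation would overlap the previous gesture is simply dropped, which matches the deletion rule described above.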
Agents can perform the same gesture in different ways, depending on the context, the personality and the affective state of the agent. For example, if the agent is angry, it makes hand movements faster and stronger. To make gestures more expressive, we define several parameters such as spatial extent, temporal extent, power, etc.
From the selected gestures, the system plans the gesture phases while taking into account the expressivity parameters. After that, the symbolic gesture descriptions are translated into joint values and sent to the robot.
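As an illustration, the expressivity parameters could modulate a planned stroke roughly like this. The parameter range [-1, 1] follows the slides, but the scaling factors are assumptions for the sketch:

```python
# Illustrative modulation of one stroke by two expressivity parameters:
# SPC (spatial extent) widens or narrows the movement, TMP (temporal
# extent) speeds it up or slows it down. Not the actual model.

def apply_expressivity(amplitude, duration, spc=0.0, tmp=0.0):
    """Scale a stroke's amplitude and duration; spc, tmp in [-1, 1]."""
    scaled_amplitude = amplitude * (1.0 + 0.5 * spc)
    # Higher temporal extent -> faster movement -> shorter duration.
    scaled_duration = duration / (1.0 + 0.5 * tmp)
    return scaled_amplitude, scaled_duration
```

With the sad-beat values from the example slide (SPC = -0.3, TMP = -0.2), the stroke comes out smaller and slower, which is the intended "weak and slow" rendering.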
Let’s look at a concrete example. On the left is the description of a beat gesture with sadness. The expressivity parameters are set so that the gesture is performed weakly and slowly.
Now you can see the results of the system. http://www.youtube.com/watch?v=MSNHqmIMnpk As you can see,…
I have just presented an expressive gesture model designed and implemented for the humanoid robot NAO. The platform is used not only for the Nao robot but also for the virtual agent Greta. This model allows us to create gestures with different emotional characteristics. In the near future, we will complete and validate the model through perceptive evaluations. (Each pair may be different in form but convey a similar meaning.) Why expressivity is needed: the same gesture can be performed with different expressions, depending on the conversational context, the current emotion and the character.
Thank you for your attention. I would be happy to try to answer any questions you might have. Remarks: Not clear: which part of the GRETA system is worked on for Nao. Stay focused.
Gestures in the lexicon are specified symbolically. Following the observations of Kendon, each gesture can be divided into several phases (preparation, stroke and retraction), in which the stroke phase is the most important one, as it conveys the meaning of the gesture. In the lexicon, only the description of the stroke phase is specified with BML. The other phases will be generated automatically by the system. The stroke phase is formed by a number of key poses. Each key pose is described by the wrist position, the palm orientation and the hand shape. Some temporal constraints may also be included in this description, such as the minimum time for executing a gesture.
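A lexicon entry of this kind can be pictured as a small data structure holding only the stroke key poses. The Python types and field names below are hypothetical, mirroring the BML description:

```python
# Sketch of a lexicon entry: only the stroke phase is stored, as a list
# of key poses; the other phases are generated later by the system.
from dataclasses import dataclass, field

@dataclass
class KeyPose:
    wrist_position: str    # symbolic location, e.g. "YUpperPeriphery/XPeriphery/ZNear"
    palm_orientation: str  # e.g. "AWAY"
    hand_shape: str        # e.g. "OPEN"

@dataclass
class GestureEntry:
    gesture_id: str
    stroke: list           # key poses of the stroke phase only
    min_time: float = 1.0  # temporal constraint: minimum execution time

# The greeting gesture from the specification slide, as two stroke key poses.
greeting = GestureEntry(
    gesture_id="greeting",
    stroke=[
        KeyPose("YUpperPeriphery/XPeriphery/ZNear", "AWAY", "OPEN"),
        KeyPose("YUpperPeriphery/XExtremePeriphery/ZNear", "AWAY", "OPEN"),
    ],
)
```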
Affective Computing and Intelligent Interaction (ACII 2011)
Expressive Gesture Model for a Humanoid Robot Le Quoc Anh - Catherine Pelachaud CNRS, LTCI Telecom-ParisTech, France Doctoral Consortium, ACII 2011, Memphis, USA
Objectives Generate communicative gestures for Nao robot • Integrated within an existing platform (GRETA) • Scripts with a symbolic language • Synchronization (gestures and speech) • Expressivity of gestures GVLEX project (Gesture & Voice for Expressive Reading) • Robot tells a story expressively. • Partners: LIMSI (linguistic aspects), Aldebaran (robotics), Acapela (speech synthesis), Telecom ParisTech (expressive gestures) page 2 ACII 2011 Le Quoc Anh & Catherine Pelachaud
State of the art Several initiatives recently: Salem et al. (2010), Holroyd et al. (2011), Ng-Thow-Hing et al. (2010), Shi et al. (2010), Nozawa et al. (2006). • Motion scripts with MURML, BML, MPML-HR, etc. • Adapt gestures to speech (for synchronization) • Mechanism for receiving and processing feedback from the robot • Gesture animation: no expressivity Our system: focus on expressivity and synchronization of gestures with speech
Our methodology Gestures described with a symbolic language (BML) Gestural expressivity (amplitude, fluidity, power, repetition, speed, stiffness, …) Elaboration of gestures from a storytelling video corpus (Martin et al., 2009) Execution of the animation by translating into joint values
Problem and Solution Using a virtual agent framework to control a physical robot raises several problems: • Different degrees of freedom • Limit of movement space and speed Solution: • Use the same representation language - same algorithm for selecting and planning gestures - different algorithm for creating the animation • Elaborate one gesture repository for the robot and another one for the Greta agent • Gesture movement space and velocity specification
System Overview [Pipeline diagram: FML → Behavior Planning → BML → Behavior Realizer (gesture phases instantiation, temporal information, lexicons for Nao and Greta) → symbolic description → Animation Computation (joint values) → Animation Production (interpolation module); WAV file input] Behavior Planning: selects and plans gestures. Behavior Realizer: schedules and creates gesture animations.
Gesture Elaboration • Annotation of gestures from a storytelling video corpus from Martin et al. (2009): base of gesture elaboration in lexicons • From gesture annotation to entries in Nao lexicon • BML description of each gesture: Gesture -> Phases -> Hands (wrist position, palm orientation, shape, …) Only stroke phases are specified. Other phases will be generated automatically by the system
Synchronization of gestures with speech The stroke phase coincides with or precedes emphasized words of the speech (McNeill, 1992) Gesture stroke phase timing specified by synchronization points
Synchronization of gestures with speech Algorithm • Compute preparation phase • Delete gesture if not enough time (strokeEnd(i-1) > strokeStart(i) + duration) • Add a hold phase to fit gesture planned duration • Coarticulation between several gestures - If enough time, retraction phase (i.e. go back to rest position) - Otherwise, go from end of stroke to preparation phase of next gesture [Timing diagrams: Start/End and Stroke-start/Stroke-end markers]
Gesture expressivity Spatial Extent (SPC): Amplitude of movement Temporal Extent (TMP): Speed of movement Power (PWR): Acceleration of movement Fluidity (FLD): Smoothness and continuity Repetition (REP): Number of stroke repetitions Stiffness (STF): Tension/Flexibility
Animation Computation & Execution Schedule and plan gesture phases Compute expressivity parameters Translate symbolic descriptions into joint values Execute animation • Send timed key-positions to the robot using available APIs • Animation is obtained by interpolating between joint values with robot built-in proprietary procedures.
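Since the robot's built-in interpolation procedures are proprietary, a simple linear interpolation between two timed key-positions conveys the idea (an illustrative sketch, not the NAO implementation):

```python
# Interpolate one joint value between two timed key-positions
# (t0, q0) and (t1, q1). The robot's own procedures are proprietary;
# linear interpolation is the simplest stand-in.

def interpolate_joint(t, t0, q0, t1, q1):
    """Return the joint value at time t, clamped to the key interval."""
    if t <= t0:
        return q0
    if t >= t1:
        return q1
    alpha = (t - t0) / (t1 - t0)  # fraction of the way through the segment
    return q0 + alpha * (q1 - q0)
```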
Example

BML message (speech with a synchronization point, and a gesture reference with expressivity values):

<bml>
  <speech id="s1" start="0.0">
    vce=speaker=Antoine spd=180
    Et le troisième dit tristement:
    vce=speaker=AntoineSad spd=90 pau=200
    <tm id="tm1"/> J'ai très faim!
  </speech>
  <gesture id="beat_hungry" start="s1:tm1" end="start+1.5" stroke="0.5">
    <FLD.value>0</FLD.value>
    <OAC.value>0</OAC.value>
    <PWR.value>-1.0</PWR.value>
    <REP.value>0</REP.value>
    <SPC.value>-0.3</SPC.value>
    <TMP.value>-0.2</TMP.value>
  </gesture>
</bml>

Lexicon entry (stroke phase only):

<gesture id="beat_hungry" min_time="1.0">
  <phase type="STROKE-START">
    <hand side="BOTH">
      <verticalLocation>YCC</verticalLocation>
      <horizontalLocation>XCenter</horizontalLocation>
      <distanceLocation>Zmiddle</distanceLocation>
      <handShape>OPENHAND</handShape>
      <palmOrientation>INWARD</palmOrientation>
    </hand>
  </phase>
  <phase type="STROKE-END">
    <hand side="BOTH">
      <verticalLocation>YLowerEP</verticalLocation>
      <horizontalLocation>XCenter</horizontalLocation>
      <distanceLocation>ZNear</distanceLocation>
      <handShape>OPEN</handShape>
      <palmOrientation>INWARD</palmOrientation>
    </hand>
  </phase>
</gesture>

Resulting animation phases:

<phase="preparation", start-time="Start", end-time="Ready", description of stroke-start position>
<phase="stroke", start-time="Stroke-start", end-time="Stroke-end", description of stroke-end position>
<phase="retraction", start-time="Relax", end-time="End", description of rest position>
Example
Conclusion and future work Conclusion • A gesture model is designed and implemented for Nao, taking into account the physical constraints of the robot • Common platform for both virtual agent and robot • Expressivity model • Allows us to create gestures with different affective states and personal styles Future work • Build two repositories of gestures, one for Greta and another one for NAO • Improve expressivity and synchronization of gestures with speech • Receive and process feedback from the robot • Validate the model through perceptive evaluations
Acknowledgment This work has been funded by the ANR GVLEX project. It is supported by members of the laboratory TSI, Telecom-ParisTech.
Gesture Specification Gesture -> Phases -> Hands (wrist position, palm orientation, shape, …) Only stroke phases are specified. Other phases will be generated automatically by the system.

<gesture id="greeting" category="ICONIC" min_time="1.0" hand="RIGHT">
  <phase type="STROKE-START" twohand="ASSYMMETRIC">
    <hand side="RIGHT">
      <vertical_location>YUpperPeriphery</vertical_location>
      <horizontal_location>XPeriphery</horizontal_location>
      <location_distance>ZNear</location_distance>
      <hand_shape>OPEN</hand_shape>
      <palm_orientation>AWAY</palm_orientation>
    </hand>
  </phase>
  <phase type="STROKE-END" twohand="ASSYMMETRIC">
    <hand side="RIGHT">
      <vertical_location>YUpperPeriphery</vertical_location>
      <horizontal_location>XExtremePeriphery</horizontal_location>
      <location_distance>ZNear</location_distance>
      <hand_shape>OPEN</hand_shape>
      <palm_orientation>AWAY</palm_orientation>
    </hand>
  </phase>
</gesture>

An example for a greeting gesture