ICMI 2012 Workshop on gesture and speech production
In these slides, we present a common gesture and speech production framework for both virtual agents (e.g., ECAs, IVAs, virtual humans) and physical agents such as humanoid robots. The framework is designed for different embodiments so that its processes are independent of any specific agent.

Notes
  • Pronunciation reminders: schedule, mechanism, such as, account, realize, obtain, architecture, exchange, twice, table, creating, message, virtual.
  • Give a description of the keyframes: what information do they contain?
  • Add the missing definitions. "Power": acceleration simulation through slerp (frame interpolation) or trajectory interpolation using time-variation functions (easing in/out functions). Expressive posture: volume editing; the power parameter makes the relative torso rotation vary with time and with the gesture target positions, due to inertia. Expressive animated sequence: sequential editing; "fluidity" and "tension" using TCB splines and noise functions (for the trajectory).
  • Joint rotation interpolation: use slerp (spherical linear interpolation) with time warping (easing in/out functions). Definition of trajectory parameters: various trajectory paths (line, circle, spiral, etc.). Expressivity: Kochanek-Bartels splines (TCB splines).
  • For posture generation, we use forward kinematics (FK): FK defines the initial states and IK retargets the postures. Relative torso movement is generated first, using a potential torso target (vt1, vl5) that depends on the positions of both hand gestures. We decompose torso movement into horizontal and vertical components that depend on the center of the two hand targets, and solve it directly with an analytical method. Head direction is generated by FK, with a trigonometric function for gaze. For arm gestures we use a mass-spring solver, which adds lightweight shoulder movements by defining the arm chain from the sternoclavicular joint to the wrist; this allows us to model passive shoulder movement.
  • The system of Salem et al. produces gesture parameters that can result in mistimed synchronization with the speech affiliate because of physical joint velocity limits. In Max, gesture shapes are designed for the virtual agent, so a mapping solution is needed for the robot.
  • Long-term plan: Mutual synchronization: Adapting phoneme duration to gestures

Transcript

  • 1. A Common Gesture and Speech Production Framework for Virtual and Physical Agents. Quoc Anh Le, Jing Huang, Catherine Pelachaud. CNRS, LTCI, Telecom ParisTech, France. Workshop on Speech and Gesture Production, ICMI 2012, Santa Monica, CA, USA.
  • 2. Introduction. Motivations: virtual agents and humanoid robots can use similar approaches, yet existing systems are agent dependent. Objective: a common co-verbal gesture generation framework for both virtual and physical agents. Methodology: based on the GRETA system; the same representation languages and the same algorithm for selecting and planning gestures are used for all agents, with different algorithms for creating the animation.
  • 3. Architecture Overview. Input data (text, audio, video, etc.) feeds the Intent Planner (common module), which produces FML-APML; the Behavior Planner (common module) turns this into BML; the Behavior Realizer (common module) produces keyframes; and an agent-specific Animation Realizer turns the keyframes into FAP-BAP values for the Greta player or joint values for the Nao built-in player with its proprietary procedures. The modules communicate through an ActiveMQ central messaging system. Each agent has its own data: baselines (Intent Lexicon), a gestuary (Behavior Lexicon), and an animation lexicon for Nao and for Greta.
  • 4. Behavior Realizer. (The architecture diagram from the previous slide, highlighting the common Behavior Realizer module.)
  • 5. Behavior Realizer: Outline. Processes common to all agents: (1) create the gesture from the agent's gestuary; (2) schedule the timing of the gesture phases; (3) generate keyframes, i.e. pairs (absolute time, symbolic description of the hand configuration at that time). The databases differ per agent. For Nao: a gestuary (for instance, pointing with the fully stretched arm) and a velocity profile determined empirically on Nao. For Greta: a gestuary (for instance, pointing with one finger) and a velocity profile determined empirically from real humans.
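A minimal sketch of what such a (time, symbolic description) keyframe could look like; the class and field names are illustrative assumptions, not the actual GRETA data structures.

    # Illustrative sketch: a keyframe as (absolute time, symbolic hand description).
    # Names are assumptions for illustration, not the actual GRETA classes.
    from dataclasses import dataclass

    @dataclass
    class SymbolicHandPose:
        vertical: str        # e.g. "YUpperP"
        horizontal: str      # e.g. "XEP"
        distance: str        # e.g. "XFar"
        hand_shape: str      # e.g. "OPEN" (Nao) or "INDEX" (Greta)

    @dataclass
    class Keyframe:
        time: float              # absolute time in seconds
        pose: SymbolicHandPose   # symbolic description of the hand at that time

    stroke_start = Keyframe(1.2, SymbolicHandPose("YUpperP", "XEP", "XFar", "OPEN"))
    print(stroke_start)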
  • 6. Example: different pointing gestures. The BML input contains the speech with sync points and a gesture tag referring to the lexeme "pointing", with GRETA expressivity values attached:
    <bml id="bml1">
      <speech id="s1" start="0"> <text>It is <sync id="tm1"/> over there! <sync id="tm2"/> </text> </speech>
      <gesture id="g1" lexeme="pointing" start="s1:tm1" end="s1:tm2">
        <description priority="1" type="GRETA"> <GRETA:SPC>0.80</GRETA:SPC> <GRETA:TMP>0.50</GRETA:TMP> <GRETA:FLD>-0.62</GRETA:FLD> <GRETA:PWR>0.30</GRETA:PWR> <GRETA:REP>0.00</GRETA:REP> <GRETA:OPE>1.00</GRETA:OPE> <GRETA:TEN>0.20</GRETA:TEN> </description>
      </gesture>
    </bml>
    Step 1 looks the lexeme up in each agent's gestuary: the Nao entry describes a stroke phase with vertical YUpperP, horizontal XEP, distance XFar and hand shape OPEN, while the Greta entry uses vertical YP, horizontal XP, distance XMiddle and hand shape INDEX. Steps 2-3 produce the keyframes 1..N (time, description) for each agent, and step 4 turns them into joint values for Nao and BAP values for Greta.
  • 7. Behavior Realizer: Synchronization with speech. Algorithm: compute the preparation phase; do not perform the gesture if there is not enough time (strokeEnd(i-1) > strokeStart(i) + duration); add a hold phase to fit the planned gesture duration; handle co-articulation between consecutive gestures: if there is enough time, insert a retraction phase (i.e. go back to the rest position); otherwise, go directly from the end of the stroke to the preparation phase of the next gesture. (Timeline diagrams of the two cases omitted.)
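The scheduling logic on this slide can be sketched roughly as follows; the field names, the preparation-duration field and the exact co-articulation rule are assumptions made for illustration, not the Behavior Realizer's actual code.

    # Illustrative sketch of the gesture/speech scheduling described above.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PlannedGesture:
        stroke_start: float          # from the speech sync points (seconds)
        stroke_end: float
        prep_duration: float         # assumed time needed to reach the stroke start
        retraction_duration: float
        phases: List[str] = field(default_factory=list)

    def schedule(gestures: List[PlannedGesture]) -> List[PlannedGesture]:
        scheduled: List[PlannedGesture] = []
        for g in gestures:
            prev: Optional[PlannedGesture] = scheduled[-1] if scheduled else None
            free_from = prev.stroke_end if prev else 0.0
            # Not enough time to reach the stroke start: skip this gesture.
            if g.stroke_start - g.prep_duration < free_from:
                continue
            if prev is not None:
                gap = g.stroke_start - g.prep_duration - prev.stroke_end
                if gap >= prev.retraction_duration:
                    # Enough time: the previous gesture retracts to the rest position.
                    prev.phases.append("retraction")
                # Otherwise co-articulate: go straight from the previous stroke end
                # to the preparation of this gesture, without retraction.
            g.phases = ["preparation", "stroke", "hold"]  # hold pads to the planned duration
            scheduled.append(g)
        return scheduled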
  • 8. Behavior Realizer: Velocity profiles. Gesture velocity: predict the movement duration with Fitts' law, MovementTime = a + b * log2(Distance + 1); apply empirically determined thresholds on maximal speeds; the stroke phase differs from the other phases in velocity and acceleration (Quek, 1995). Adding expressivity: the temporal extent parameter (TMP) modulates the duration of the whole gesture by changing the coefficients of Fitts' law.
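A worked sketch of this duration model in Python; the coefficient values and the way TMP rescales the slope are illustrative assumptions, not the calibrated values used in the system.

    import math

    def movement_time(distance: float, a: float, b: float, tmp: float = 0.0) -> float:
        """Fitts'-law style estimate: MT = a + b * log2(distance + 1).

        The TMP expressivity parameter is assumed here to scale the slope b
        (tmp > 0 speeds the gesture up); the actual calibration may differ.
        """
        b_eff = b * (1.0 - 0.5 * tmp)  # illustrative modulation only
        return a + b_eff * math.log2(distance + 1.0)

    # With made-up coefficients, a 0.4-unit reach takes about 0.34 s.
    print(round(movement_time(distance=0.4, a=0.1, b=0.5), 3))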
  • 9. Behavior Realizer: Building the coefficients of Fitts' law.
  • 10. Animation Realizer. (The architecture diagram again, highlighting the agent-specific Animation Realizer modules that turn keyframes into FAP-BAP values for Greta and joint values for Nao.)
  • 11. Implemented expressivity parameters (parameter, definition, realization on Nao vs. Greta):
    - TMP, velocity of movement: Nao - change the coefficients of Fitts' law; Greta - change the coefficients of Fitts' law.
    - SPC, amplitude of movement: Nao - limited to the predefined key positions; Greta - change the gesture space scales.
    - PWR, acceleration of movement: Nao - modulate the stroke duration; Greta - modulate the stroke acceleration.
    - REP, number of stroke repetitions: Nao - yes; Greta - yes.
    - FLD, smoothness and continuity: Nao - no; Greta - no.
    - OPN, relative spatial extent to the body: Nao - no; Greta - elbow swivel angle.
    - TEN, muscular tension: Nao - no; Greta - no.
    The Animation Realizer then creates the animation parameters: joint values for Nao, BAP values for Greta.
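As an illustration of how one expressivity parameter can be realized differently per embodiment, here is a hypothetical sketch for SPC; the scaling rule for Greta and the mapping to Nao's discrete distance classes are assumptions, not the implemented formulas.

    # Illustrative sketch: applying SPC (spatial extent) per embodiment.
    def apply_spc_greta(position, spc):
        """Greta: scale the gesture-space coordinates (assumed spc in [-1, 1])."""
        scale = 1.0 + 0.5 * spc
        return tuple(scale * c for c in position)

    def apply_spc_nao(spc):
        """Nao: amplitude is limited to predefined key positions, so SPC is
        assumed to only select among the discrete distance classes."""
        classes = ["ZNear", "ZMiddle", "ZFar"]
        index = min(len(classes) - 1, max(0, round((spc + 1.0) / 2.0 * (len(classes) - 1))))
        return classes[index]

    print(apply_spc_greta((0.3, 1.1, 0.4), spc=0.8))
    print(apply_spc_nao(spc=0.8))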
  • 12. Create animation parameters. The gestural space of McNeill (1992) is discretized; each symbolic position is translated into concrete joint values of the agent (for instance the 6 joints of a Nao arm, as in the table below).
    Code / ArmX / ArmY / ArmZ / joint values (LShoulderPitch, LShoulderRoll, LElbowYaw, LElbowRoll, LWristYaw, Hand):
    - 000: XEP, YUpperEP, ZNear -> (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0)
    - 001: XEP, YUpperEP, ZMiddle -> (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0)
    - 002: XEP, YUpperEP, ZFar -> (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0)
    - 010: XEP, YUpperP, ZNear -> (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0)
    - ...
    Symbolic keyframes are translated into joint values; the animation is then obtained by interpolating between joint values, using the robot's built-in proprietary procedures for Nao, and slerp (spherical linear interpolation) with time warping (easing in/out functions) for Greta.
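A sketch of the symbolic-to-joint translation and the eased interpolation step, reusing the codes and joint values from the table above; the smoothstep easing curve and the scalar interpolation (standing in for the quaternion slerp applied to rotations in Greta) are simplifying assumptions.

    # Sketch: translate a symbolic position code into Nao joint values and
    # interpolate between keyframes with an ease-in/out time warp.
    NAO_ARM_LOOKUP = {
        "000": (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0),
        "001": (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0),
        "002": (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0),
        "010": (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0),
    }

    def ease_in_out(u: float) -> float:
        """Smoothstep time warping on [0, 1]; stands in for the easing functions."""
        return u * u * (3.0 - 2.0 * u)

    def interpolate(code_a: str, code_b: str, t: float, t_a: float, t_b: float):
        """Joint values at time t between two symbolic keyframes.
        Linear interpolation of joint angles is used here as a stand-in for the
        spherical linear interpolation (slerp) applied to rotations in Greta."""
        u = ease_in_out((t - t_a) / (t_b - t_a))
        a, b = NAO_ARM_LOOKUP[code_a], NAO_ARM_LOOKUP[code_b]
        return tuple((1.0 - u) * x + u * y for x, y in zip(a, b))

    print(interpolate("000", "002", t=0.5, t_a=0.0, t_b=1.0))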
  • 13. Greta: Full-body IK. Torso IK with an analytic method (arm to torso); the torso target depends on the hand positions.
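A minimal sketch of the idea that the torso target follows the two hand targets and is then split into horizontal and vertical components; the averaging weight and the positional (rather than rotational) formulation are assumptions made for brevity.

    # Sketch: derive a torso target from the two hand gesture targets, then split
    # it into horizontal and vertical components (the 0.3 influence is an assumption).
    def torso_target(left_hand, right_hand, influence=0.3):
        centre = tuple((l + r) / 2.0 for l, r in zip(left_hand, right_hand))
        # The torso only partially follows the hands (relative movement).
        return tuple(influence * c for c in centre)

    def split_horizontal_vertical(target):
        x, y, z = target
        return (x, 0.0, z), (0.0, y, 0.0)   # each component solved analytically

    tgt = torso_target((0.3, 1.2, 0.4), (0.5, 1.0, 0.2))
    print(split_horizontal_vertical(tgt))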
  • 14. Demo: Greta.
  • 15. Demo: Nao.
  • 16. Perceptive evaluation. Objective: evaluate how the robot's gestures are perceived by human users. Procedure: 63 French-speaking participants rated videos of Nao telling a story; the versions were displayed to participants in random order: gestures with expressivity vs. gestures without expressivity, and gesture-speech synchronization vs. gesture-speech asynchronization. Results (one-way ANOVA): synchronization, F(1, 124) = 4.94, p < .05, with 76% agreeing that gestures were synchronized with speech in the synchronized version; expressivity, F(1, 124) = 4.43, p < .05, with 70% agreeing that gestures were expressive in the expressive version.
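For readers unfamiliar with the test, a sketch of how such ratings could be compared with a one-way ANOVA using scipy; the rating arrays below are made-up placeholders, not the data collected in the study.

    # Sketch only: one-way ANOVA on Likert-style ratings for two conditions.
    from scipy import stats

    sync_ratings = [5, 4, 4, 5, 3, 4, 5, 4]     # placeholder values
    async_ratings = [3, 3, 4, 2, 3, 4, 3, 2]    # placeholder values

    f_value, p_value = stats.f_oneway(sync_ratings, async_ratings)
    print(f"F = {f_value:.2f}, p = {p_value:.3f}")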
  • 17. State of the art. The most similar work is Salem et al. (2012): the same idea, based on the existing Max virtual agent system. Main differences: our system re-designs GRETA as a common framework, whereas Salem et al. adjusted Max's ACE to the ASIMO robot. Feature comparison (our model vs. Salem et al.'s system):
    - Gesture production: produced online from templates, independently of a specific domain, vs. generated automatically from a data corpus trained on a specific domain.
    - Gesture shapes: agent-specific parameter configurations vs. original shapes for Max, mapped to ASIMO.
    - Gesture timing: agent-specific parameters vs. original timing for Max, adapted to ASIMO by feedback.
    - Expressivity: yes vs. no.
    - Synchronization: adapt gesture to speech vs. cross-modal adjustment.
  • 18. Future work. Short-term plan: more human-like gestures (enhance the velocity profiles) and expressivity (implement fluidity and tension). Long-term plan: a feedback mechanism, and a study of the coherence between consecutive gestures within a G-Unit (Kendon, 2004).