This document introduces a system for generating communicative expressive gestures that accompany speech for virtual and physical agents. The key aspects of the system are:
1) It takes into account gesture expressivity when animating gestures on the fly from abstract gesture templates.
2) It schedules gestures to ensure their execution is tightly synchronized with speech.
3) The first implementation of this system controls co-verbal gestures for the Greta virtual agent and Nao physical robot. It represents an attempt to develop a common framework to control gesture behaviors for both virtual and physical agents.
This paper presents an expressive gesture model for generating communicative gestures with speech for the humanoid robot Nao. The model extends an existing virtual agent platform called GRETA to control gestures of both virtual and physical agents. Gestures are stored symbolically in a lexicon and selected based on intentions and emotions. Parameters of gestural expressivity like temporal extension and fluidity are applied when gestures are instantiated on the robot while considering its physical limitations.
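To make the pipeline concrete, here is a minimal, illustrative sketch (not the GRETA/Nao codebase) of how a symbolic gesture template might be instantiated with expressivity parameters while respecting an embodiment's physical limits; all names, value ranges and scaling formulas are assumptions:

```python
# Illustrative sketch, not the actual GRETA implementation: a symbolic
# gesture template is scaled by expressivity parameters before being
# mapped onto an embodiment with limited joint speeds (e.g. Nao).
from dataclasses import dataclass

@dataclass
class GestureTemplate:
    name: str                 # lexicon entry, e.g. "beat", "greet"
    keyframes: list           # symbolic hand positions, e.g. ("chest", "periphery")
    default_duration: float   # seconds for the stroke phase

@dataclass
class Expressivity:
    spatial_extent: float     # -1..1: contracted .. expanded amplitude
    temporal_extent: float    # -1..1: slow .. fast stroke
    fluidity: float           # -1..1: jerky .. smooth transitions

def instantiate(template: GestureTemplate, expr: Expressivity,
                max_joint_speed: float) -> dict:
    """Scale the abstract template by expressivity, clamped to what the
    embodiment can physically execute (assumed scaling factors)."""
    duration = template.default_duration * (1.0 - 0.5 * expr.temporal_extent)
    amplitude = 1.0 + 0.5 * expr.spatial_extent
    # A real realizer would solve joint trajectories; here we only check
    # that the requested speed stays under the embodiment's limit.
    required_speed = amplitude / duration
    if required_speed > max_joint_speed:
        duration = amplitude / max_joint_speed   # slow down rather than fail
    return {"name": template.name, "duration": duration,
            "amplitude": amplitude,
            "smoothing": (expr.fluidity + 1.0) / 2.0}
```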
Self-talk discrimination in Human-Robot Interaction Situations For Engagement... – Jade Le Maitre
This document describes a study on developing a metric to characterize engagement in human-robot interaction situations for cognitive stimulation exercises with elderly users. The researchers designed a triadic situation involving a user, a computer providing exercises, and a robot providing encouragement. They analyzed social signals like self-talk and system-directed speech during wizard-of-oz experiments. An automatic recognition system was developed to detect these dialogue acts, achieving 71% accuracy. The durations of detected acts were combined to estimate an "Interaction Effort" measure of user engagement during exercises. The measure effectively captured engagement levels of elderly patients in cognitive stimulation tasks.
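The summary does not give the exact formula, but a natural way to combine the detected act durations into such a measure is the fraction of exercise time covered by detected dialogue acts; the sketch below makes that assumption explicit:

```python
# Hedged sketch: one plausible way to combine detected dialogue-act
# durations into an "Interaction Effort" score; the paper's exact
# formula may differ from this simple time fraction.
def interaction_effort(act_durations, exercise_duration):
    """act_durations: seconds of detected self-talk / system-directed speech."""
    if exercise_duration <= 0:
        raise ValueError("exercise_duration must be positive")
    return sum(act_durations) / exercise_duration   # fraction of time engaged

print(interaction_effort([2.5, 4.0, 1.2], 60.0))    # 0.128...
```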
A probabilistic model for recursive factorized image features – irisshicat
This document proposes a probabilistic model for learning hierarchical visual representations in a recursive manner. The model, based on Latent Dirichlet Allocation, learns image features at multiple layers of abstraction jointly rather than in a strictly feedforward way. The model represents local image patches as distributions over visual words at the lowest layer, and higher layers learn distributions over the representations of lower layers. Evaluating the model on a standard recognition dataset, it outperforms existing hierarchical models and achieves performance on par with state-of-the-art single-feature models, demonstrating the benefits of joint learning and inference in hierarchical visual processing.
Ontology 101 - New York Semantic Technology Conference – Robert Kost
The document provides an introduction to ontologies and knowledge representation. It defines ontologies as specifications of conceptualizations that use formal logic to describe domains of interest through terminology, concepts, properties, and relations. The document also introduces the Web Ontology Language (OWL) as a tool for building ontologies and discusses how ontologies can be used to enable accurate search, information sharing, and machine understanding across different sources and applications.
This document provides a high-level summary of an introduction to ontology and knowledge representation:
1) It defines ontology as a specification of a conceptualization that provides a rich description of terminology, concepts, properties, and relations in a particular domain.
2) It explains that knowledge representation uses formal symbolic expressions to represent knowledge and ideas, and that logic and ontology can be applied to build computable models of domains.
3) It gives an overview of some key aspects of knowledge representation languages like their vocabulary of logical and domain-dependent symbols, their syntax for expressing statements, and their semantics for interpreting statements.
The document discusses dynamic service generation through agent interactions on the Grid. It introduces dynamic service generation and describes key concepts of service-oriented computing and the Grid, including Grid services and their lifecycle. It also covers multi-agent systems and the STROBE model, which defines agent representation and communication. Finally, it proposes a service-based integration of Grid and multi-agent system technologies.
The Role of Skills and Qualifications in Ensuring Quality of Service Delivery – FEANTSA
Presentation given by Paolo Brusa, Multipolis, Italy, at a FEANTSA Conference on "Quality in Social Services from the Perspective of Services Working with Homeless People", Luxembourg City, Luxembourg, 2011
This document discusses personalization in e-learning and the challenges it poses for instructional designers. It covers five types of personalization and associated challenges, including understanding learner-content interactions and designing personalized learning paths. The key needs for personalization are identified as a learner model, a learning-object design model, ontologies, and learning analytics. Several studies are summarized that explore modeling learner characteristics, visual search performance, memory spans, navigation design, and levels of processing to better understand learners and design personalized instruction. Challenges remain around differentiating learning paths for individuals and groups.
ICMI 2012 Workshop on gesture and speech production – Lê Anh
In these slides, we present a common gesture-speech framework for both virtual agents (e.g., ECAs, IVAs, virtual humans) and physical agents such as humanoid robots. The framework is designed for different embodiments so that its processes are independent of any specific agent.
This document discusses the four types of combined sounds in the Malay language: diphthongs, which consist of two vowels pronounced within a single syllable; adjacent vowels (vokal berganding), which consist of two vowels pronounced separately; digraphs, which consist of two specific consonants; and consonant clusters, which consist of two or more consecutive consonants within a word. Examples are given.
Automatic vs. human question answering over multimedia meeting recordings – Lê Anh
Information access in meeting recordings can be assisted by meeting browsers, or can be fully automated following a question-answering (QA) approach. An information access task is defined, aiming at discriminating true vs. false parallel statements about facts in meetings. An automatic QA algorithm is applied to this task, using passage retrieval over a meeting transcript. The algorithm scores 59% accuracy for passage retrieval, while random guessing is below 1%, but only 60% on combined retrieval and question discrimination, for which humans reach 70%–80% and the baseline is 50%. The algorithm clearly outperforms humans in speed, at less than 1 second per question vs. 1.5–2 minutes per question for humans. Running on ASR output rather than manual transcripts degrades the scores, but they remain acceptable, especially for passage identification. Automatic QA thus appears to be a promising enhancement to meeting browsers used by humans, as an assistant for identifying relevant passages.
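As a rough illustration of the passage-retrieval step (the paper's actual algorithm is not reproduced in the summary above), passages of a transcript can be ranked by TF-IDF cosine similarity to the question; the passages and question below are invented:

```python
# Minimal passage-retrieval baseline in the spirit of the QA system
# described above; a sketch, not the paper's algorithm.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "The budget for the remote control was set at twelve euros.",
    "The design team preferred a curved case with rubber buttons.",
    "Speech recognition errors were discussed in the previous meeting.",
]
question = "What budget was decided for the remote control?"

vec = TfidfVectorizer(stop_words="english")
passage_matrix = vec.fit_transform(passages)        # fit on the transcript
query_vec = vec.transform([question])               # embed the question
scores = cosine_similarity(query_vec, passage_matrix).ravel()
best = scores.argmax()
print(f"best passage ({scores[best]:.2f}): {passages[best]}")
```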
The document describes a system for generating co-speech gestures for the humanoid robot NAO through the Behavior Markup Language (BML). The system extends an existing virtual agent system to generate communicative gestures for both virtual and physical agents like NAO. It takes as input a specification of multi-modal behaviors encoded in BML and synchronizes and realizes the verbal and nonverbal behaviors on the robot. The system includes a behavior planner that selects gestures from a repository and a behavior realizer that generates the animations displayed by the robot based on the BML output from the planner.
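For readers unfamiliar with BML, a hand-written minimal example of the kind of input such a Behavior Realizer consumes might look like this; the exact attributes used by the described system are not shown in the summary, so this sketch follows the general BML convention of anchoring a gesture stroke to a speech sync point:

```python
# Hand-written illustration of a minimal BML block: the gesture stroke
# is anchored to a sync point inside the speech. Parsed here only to
# show the structure; attribute names follow common BML usage.
import xml.etree.ElementTree as ET

bml = """
<bml id="bml1">
  <speech id="s1">
    <text>This is a <sync id="tm1"/> great day.</text>
  </speech>
  <gesture id="g1" lexeme="beat" stroke="s1:tm1"/>
</bml>
"""
root = ET.fromstring(bml)
for child in root:
    print(child.tag, child.attrib)
```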
This document summarizes a research paper about designing and implementing an expressive gesture model for a humanoid robot. The goal is to equip a humanoid robot named NAO with the ability to perform communicative gestures while telling a story. The system selects gestures from a database called a lexicon based on the intentions and emotions to convey. It then plans the timing of the gestures to synchronize with speech. The gestures are instantiated as robot joint movements and sent to the robot to execute. The research aims to address the robot's physical constraints like limited movement and joint speeds when generating gestures.
IRJET- CARES (Computerized Avatar for Rhetorical & Emotional Supervision) – IRJET Journal
This document proposes the CARES (Computerized Avatar for Rhetorical and Emotional Supervision) framework. CARES aims to generate a 3D photo-realistic voice-enabled avatar that can express sentiments and emotions to act as a personal assistant. The avatar would use integrated aural and visual capabilities to provide emotional support to users. It would allow users to interact with virtual versions of deceased family members to stay connected. The framework builds upon prior work on avatar development, speech synthesis, and synchronizing facial animation with speech. It also incorporates techniques for modeling facial morphology and generating emotional transitions between different states.
Computer Science
Active and Programmable Networks
Active safety systems
Ad Hoc & Sensor Network
Ad hoc networks for pervasive communications
Adaptive, autonomic and context-aware computing
Advanced Computing Technology and its Applications
Advanced Computing Architectures and New Programming Models
Advanced control and measurement
Aeronautical Engineering
Agent-based middleware
Alert applications
Automotive, marine and aero-space control and all other control applications
Autonomic and self-managing middleware
Autonomous vehicle
Biochemistry
Bioinformatics
BioTechnology (Chemistry, Mathematics, Statistics, Geology)
Broadband and intelligent networks
Broadband wireless technologies
CAD/CAM/CAT/CIM
Call admission and flow/congestion control
Capacity planning and dimensioning
Changing Access to Patient Information
Channel capacity modelling and analysis
Civil Engineering
Cloud Computing and Applications
Collaborative applications
Communication application
Communication architectures for pervasive computing
Communication systems
Computational intelligence
Computer and microprocessor-based control
Computer Architecture and Embedded Systems
Computer Business
Computer Sciences and Applications
Computer Vision
Computer-based information systems in health care
Computing Ethics
Computing Practices & Applications
Congestion and/or Flow Control
Content Distribution
Context-awareness and middleware
Creativity in Internet management and retailing
Cross-layer design and Physical layer based issue
Cryptography
Data Base Management
Data fusion
Data Mining
Data retrieval
Data Storage Management
Decision analysis methods
Decision making
Digital Economy and Digital Divide
Digital signal processing theory
Distributed Sensor Networks
Drives automation
Drug Design
Drug Development
DSP implementation
E-Business
E-Commerce
E-Government
Electronic transceiver device for Retail Marketing Industries
Electronics Engineering
Embedded Computer Systems
Emerging advances in business and its applications
Emerging signal processing areas
Enabling technologies for pervasive systems
Energy-efficient and green pervasive computing
Environmental Engineering
Estimation and identification techniques
Evaluation techniques for middleware solutions
Event-based, publish/subscribe, and message-oriented middleware
Evolutionary computing and intelligent systems
Expert approaches
Facilities planning and management
Flexible manufacturing systems
Formal methods and tools for designing
Fuzzy algorithms
Fuzzy logics
GPS and location-based app
CONSIDERATION OF HUMAN COMPUTER INTERACTION IN ROBOTIC FIELD – ijcsit
This document discusses considerations for improving human-robot interaction through applying principles of human-computer interaction. It summarizes several existing models of human-computer interaction and adapts the action theory model to the context of human-robot interaction. The adapted model incorporates the simulation of robot emotions and awareness of human emotions into the interaction process. The summary then provides examples of how this adapted model could be applied to scenarios of social interaction between humans and robots.
Command, Goal Disambiguation, Introspection, and Instruction in Gesture-Free... – Vladimir Kulyukin
The document discusses a robotic office assistant capable of gesture-free spoken dialogue with a human operator. It describes the robot's architecture, which includes a deliberation tier for knowledge processing, an execution tier for managing behaviors, and a control tier for interacting with the world through skills. The robot's knowledge representation includes a semantic network for symbolic knowledge shared between language and vision, as well as procedural knowledge in the form of behaviors and skills. The document focuses on how the robot achieves command, goal disambiguation, introspection, and instruction through dialogue without gestures.
An ontology for semantic modelling of virtual world – ijaia
This article presents a new representation of semantic virtual environments. We propose to use an ontology as the implementation tool. Our model, called SVHsIEVs, provides a consistent representation of the following aspects: the simulated environment, its structure and knowledge items (using an ontology), and the interactions and tasks that virtual humans can perform in the environment. In SVHsIEVs there are two types of ontology: the global ontology and the local ontology for the Virtual Human. Our architecture has been successfully tested in 3D dynamic environments.
We demonstrate a prototype for unsupervised modeling of the structure of task-oriented dialogues. The prototype aims to assist in the design of a conversational agent architecture. A graphical representation displays the main stages of the dialogues and the transitions between them, and our tool lets the user manipulate this representation. We detail the various functionalities demonstrated.
AN EMOTIONAL MIMICKING HUMANOID BIPED ROBOT AND ITS QUANTUM... – butest
The document describes research on developing an emotional mimicking humanoid biped robot controlled using a quantum computing approach. The robot is able to mimic human gestures seen by a camera using a finite state machine. The researchers acquired two KHR-1 robots and integrated them into their robot theater system. They used OpenCV for computer vision and interfaced the robots with a state machine behavior controller. The researchers propose using the D-Wave quantum computer to help solve large constraint satisfaction problems needed for complex whole-body motion planning in real-time. This would allow the robot to exhibit a broader range of emotional behaviors and movements.
AN EMOTIONAL MIMICKING HUMANOID BIPED ROBOT AND ITS QUANTUM... – butest
This document summarizes research on developing an emotional mimicking humanoid biped robot controlled using a quantum computing approach. Key points include:
- The robot responds to human gestures seen by a camera using a finite state machine or a constraint-satisfaction model linking vision, motion, emotions and planning.
- Quantum computing could quadratically speed up the constraint-satisfaction problems required for large-scale motion planning, emotions and behaviors, allowing real-time solutions.
- The researchers propose using the D-Wave Orion quantum computer to control a humanoid robot.
- They acquired two KHR-1 robots and integrated sensors, vision, speech and a robot programming language to develop the robot and research emotional robotics, robot
How women think robots perceive them – as if robots were men – Matthijs Pontier
In previous studies, we developed an empirical account of user engagement with software agents. We formalized this model, tested it for internal consistency, and implemented it into a series of software agents to have them build up an affective relationship with their users. In addition, we equipped the agents with a module for affective decision-making, as well as the capability to generate a series of emotions (e.g., joy and anger). As a follow-up to a successful pilot study with real users, the current paper employs a non-naïve version of a Turing Test to compare an agent's affective performance with that of a human. We compared the performance of an agent equipped with our cognitive model to the performance of a human who controlled the agent in a Wizard-of-Oz condition during a speed-dating experiment in which participants were told they were dealing with a robot in both conditions. Participants did not detect any differences between the two conditions in the emotions the agent experienced or in the way it supposedly perceived the participants. As is, our model can be used for designing believable virtual agents or humanoid robots at the surface level of emotion expression.
To Ask or To Sense? Planning to Integrate Speech and Sensorimotor Acts – toukaigi
This document describes research into developing a conversational robot that can integrate speech acts and sensorimotor acts when resolving ambiguities. The robot needs to decide whether to ask a clarifying question or perform a sensory action like moving its head to see from a different perspective. The researchers present a planning algorithm that treats speech acts and sensory actions in a common framework by calculating the expected costs and information rewards of different actions. They evaluate the algorithm's performance under various settings and discuss possible extensions.
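The cost/reward trade-off described above can be made concrete with a small expected-utility sketch; all numbers and action names are invented for illustration:

```python
# Hedged sketch of the decision rule: choose the act (clarifying
# question vs. sensing action) whose expected information reward,
# minus its cost, is highest. Values are illustrative only.
def expected_gain(info_reward: float, success_prob: float, cost: float) -> float:
    return success_prob * info_reward - cost

acts = {
    "ask_which_object": expected_gain(info_reward=1.0, success_prob=0.9, cost=0.4),
    "move_head_left":   expected_gain(info_reward=0.7, success_prob=0.8, cost=0.1),
}
print(max(acts, key=acts.get))   # pick the act with the best trade-off
```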
The document discusses visual interpretation of hand gestures for human-computer interaction. It proposes using pointing gestures with a depth camera to interact with large displays. The system tracks hand movements using RGB-D cameras and uses the hand position and orientation to control the movement and rotation of virtual objects in a display. It discusses approaches for modeling, recognizing, and analyzing hand gestures as well as applications of gesture-based interaction systems. The methodology presented uses color segmentation and centroid tracking of a user's hand to determine coordinates and control a virtual object similarly to a computer mouse.
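The color-segmentation-plus-centroid step lends itself to a short OpenCV sketch; the HSV thresholds below are illustrative placeholders, not the paper's calibrated values:

```python
# Minimal sketch of the hand-tracking step described above: skin-color
# segmentation followed by centroid computation via image moments.
import cv2
import numpy as np

def hand_centroid(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None                      # no hand-colored pixels found
    return (int(m["m10"] / m["m00"]),    # x of centroid
            int(m["m01"] / m["m00"]))    # y of centroid
```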
Temporal Reasoning Graph for Activity Recognition – IRJET Journal
This document discusses using a convolutional neural network and background subtraction for human activity recognition in videos. It proposes a model that uses CNN to extract features from video frames and classify human activities. The proposed system first acquires and preprocesses video data. It then extracts frames from the videos using background subtraction. These frames are split into training and testing sets for the CNN model. The CNN model is tested on the testing set to evaluate its ability to accurately classify human activities. Experimental results show the CNN model combined with background subtraction achieves good performance for human activity recognition.
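A minimal sketch of the background-subtraction preprocessing described above (the CNN classifier itself is omitted); parameter values are assumptions:

```python
# Sketch of the preprocessing stage: MOG2 background subtraction
# isolates moving regions before frames are handed to a CNN.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=32)

def foreground_frames(video_path, min_pixels=500):
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        if cv2.countNonZero(mask) > min_pixels:   # keep frames with motion
            yield cv2.bitwise_and(frame, frame, mask=mask)
    cap.release()
```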
Gestures are expressive, meaningful body motions, i.e., physical movements of the fingers, hands, arms, head, face, or body with the intent to convey information or interact with the environment.
Our research aims to propose a global approach for the specification, design and verification of context-aware Human-Computer Interfaces (HCI). It follows a Model-Based Design (MBD) approach. The methodology describes the ubiquitous environment through ontologies, using OWL as the standard for this purpose. The specification and modeling of the human-computer interaction are based on Petri nets (PN), which raises the question of representing Petri nets in XML. We use the PNML modeling standard for this purpose. In this paper, we propose an extension of this standard for the specification, generation and verification of HCI. The extension is a methodological approach to constructing PNML from Petri nets, whose design principle uses the composition of elementary Petri-net structures, as in Modular PNML. The objective is to obtain a valid interface by verifying the properties of the elementary Petri nets represented in PNML.
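Since the approach rests on Petri-net semantics, a tiny sketch of the token game (enabling and firing a transition) may help; this is generic Petri-net code, not the PNML tooling described above:

```python
# Illustrative Petri-net semantics: a transition fires when every
# input place holds enough tokens; firing moves tokens from the
# transition's preset to its postset.
def enabled(marking, pre):
    """pre: tokens each input place must hold for the transition to fire."""
    return all(marking.get(p, 0) >= n for p, n in pre.items())

def fire(marking, pre, post):
    assert enabled(marking, pre), "transition not enabled"
    m = dict(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] = m.get(p, 0) + n
    return m

# Example: a button press moves the interface from 'idle' to 'active'.
m0 = {"idle": 1}
m1 = fire(m0, pre={"idle": 1}, post={"active": 1})
print(m1)   # {'idle': 0, 'active': 1}
```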
The document discusses strategies for mid-level robot control and learning. It proposes using a common language called teleo-reactive (T-R) programs to represent robot control programs that were explicitly programmed, planned, or learned. The document describes T-R programs and some preliminary experiments on learning T-R programs through reinforcement learning. It found that perceptual imperfections like noise and aliasing pose challenges for learning but that T-R programs can still be learned to achieve robot tasks. Further experimentation is needed to develop the approach.
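A teleo-reactive program is essentially an ordered list of condition-action rules that is re-evaluated continuously, with the first rule whose condition holds firing; the following minimal interpreter sketch illustrates the idea with invented rules, not the paper's learned programs:

```python
# Minimal sketch of a teleo-reactive (T-R) program interpreter:
# rules are ordered by priority and re-evaluated on every step.
def tr_step(rules, state):
    for condition, action in rules:        # first satisfied condition wins
        if condition(state):
            return action
    return None                            # no applicable rule

rules = [
    (lambda s: s["at_goal"],          "stop"),
    (lambda s: s["obstacle_ahead"],   "turn"),
    (lambda s: True,                  "move_forward"),   # default rule
]
print(tr_step(rules, {"at_goal": False, "obstacle_ahead": True}))  # turn
```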
A gesture recognition system for the Colombian sign language based on convolu... – journalBEEI
Sign languages (or signed languages) are languages that use visual techniques, primarily with the hands, to transmit information and enable communication for deaf people. These languages are traditionally learned only by people affected by deafness, which is why communication between deaf and hearing people is difficult. To address this problem we propose an autonomous model based on convolutional networks to translate the Colombian Sign Language (CSL) into standard Spanish text. The scheme uses characteristic images of each static sign of the language, drawn from a base of 24,000 images (1,000 images per category, with 24 categories), to train a deep convolutional network of the NASNet type (Neural Architecture Search Network). The images in each category were taken from different people with positional variations to cover any angle of view. The performance evaluation showed that the system is capable of recognizing all 24 signs used, with an 88% recognition rate.
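As an illustration of the kind of NASNet-based classifier described (the paper's exact architecture and training regime may differ), a transfer-learning head over NASNetMobile in Keras could look like this:

```python
# Hedged sketch: NASNetMobile backbone with a 24-way softmax head,
# matching the 24 static sign categories mentioned above.
import tensorflow as tf

base = tf.keras.applications.NASNetMobile(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                      # start by training only the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(24, activation="softmax"),   # 24 sign categories
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```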
Human-robot interaction raises new challenges for artificial intelligence. It calls for the integration of many AI domains: modelling humans and human cognition, acquiring and representing knowledge, using this knowledge at the human level, supporting decision-making processes, and turning decisions into physical action performed in coordination with humans. A wide range of AI techniques is involved, from task planning to theory-of-mind building, from visual processing to symbolic reasoning, and from reactive control to action recognition and learning. The focus here is on human-robot interaction in particular, where multi-modal and situated communication can support collaborative task achievement between human and robot. The present study deals with the process of using artificial intelligence (AI) for human-robot interaction. Citation: Vishal Dineshkumar Soni. 2018. Artificial Cognition for Human-robot Interaction. International Journal on Integrated Education 1(1) (Dec. 2018), 49-53. DOI: https://doi.org/10.31149/ijie.v1i1.482. https://journals.researchparks.org/index.php/IJIE/article/view/482/459 https://journals.researchparks.org/index.php/IJIE/article/view/482
Computers still have a long way to go before they can interact with users in a truly natural fashion. From a user's perspective, the most natural way to interact with a computer would be through a speech and gesture interface. Although speech recognition has made significant advances in the past ten years, gesture recognition has been lagging behind. Sign Languages (SL) are the most accomplished forms of gestural communication. Their automatic analysis is therefore a real challenge, one that is closely tied to their lexical and syntactic levels of organization. Statements dealing with sign language occupy a significant place in the Automatic Natural Language Processing (ANLP) domain. In this work, we deal with sign language recognition, in particular French Sign Language (FSL). FSL has its own specificities, such as the simultaneity of several parameters, the important role of facial expression and movement, and the use of space for the proper organization of utterances. Unlike speech, FSL events occur both sequentially and simultaneously. The computational processing of FSL is thus more complex than that of spoken languages. We present a novel HMM-based approach to reduce the recognition complexity.
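Classic HMM-based sign recognition trains one model per sign and classifies by log-likelihood; the sketch below illustrates that baseline with hmmlearn (feature extraction is assumed done, and the paper's novel approach is more elaborate than this):

```python
# Illustrative HMM baseline for sign recognition: one Gaussian HMM per
# sign, classification by highest log-likelihood. A sketch only.
import numpy as np
from hmmlearn import hmm

def train_sign_model(sequences):
    """sequences: list of (T_i, D) arrays of hand-feature vectors."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify(models, sequence):
    """models: dict mapping sign name -> trained GaussianHMM."""
    return max(models, key=lambda sign: models[sign].score(sequence))
```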
This document is a doctoral thesis presented by Ms. Ngoc Anh Nguyen to obtain a Doctor of Management Science degree from Telecom Ecole de Management and Université d'Evry-Val-d'Essonne. The thesis examines ethnic identity, socialization factors, and their impacts on ethnic consumption behavior and ethnic food consumption in France.
The thesis includes a literature review to develop a conceptual model of the relationships between ethnic identity, socialization factors, and ethnic consumption behaviors. It also describes the methodology, which involves collecting and analyzing data to test the hypotheses in the conceptual model.
The results of a survey on France's ethnic population confirmed the hypotheses in general. The findings provide both theoretical
The document describes a thesis that aims to develop an expressive gesture model for a humanoid agent. It discusses the importance of gestures in communication and defines the challenges in generating believable gestures for virtual agents or robots. The thesis proposes procedures to address three main aspects of gesture generation: defining the form of gestures using a gesture lexicon with symbolic templates, modeling gesture expressivity using quality parameters, and temporally synchronizing gestures with speech. The developed gesture model is integrated within a multimodal behavior generation system to select and realize expressive gestures that are coordinated with synthesized speech.
Applying Computer Vision to Traffic Monitoring System in Vietnam – Lê Anh
This document summarizes research on applying computer vision algorithms to develop an automatic traffic monitoring system in Vietnam. Key aspects of the system include vehicle detection using differences between frames, vehicle segmentation using edge detection and dilation, vehicle classification based on area and shape, and vehicle tracking across frames to count vehicles and estimate speeds. Experimental results found the system could detect 90-95% of vehicles and estimate speeds accurately 90-93% of the time. The research aims to improve traffic management by providing real-time traffic information.
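The frame-differencing detection step can be sketched in a few lines of OpenCV; the threshold and minimum blob area below are illustrative assumptions, not the paper's tuned values:

```python
# Sketch of frame-differencing vehicle detection: pixels that change
# between consecutive grayscale frames mark moving vehicles.
import cv2

def moving_regions(prev_gray, curr_gray, thresh=25):
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, binary = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    binary = cv2.dilate(binary, None, iterations=2)   # close gaps in blobs
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
```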
This document discusses using Fitts' Law to simulate the movement duration of human gestures for a virtual agent system. It retrieves training data from video annotations of human gestures. A regression line is fitted to the data to determine the Fitts' Law parameters for the virtual agent. Limitations include only considering 2D spatial information and not accounting for gesture context or articulation constraints.
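Concretely, the fit estimates the parameters a and b of Fitts' law, MT = a + b * log2(D/W + 1), from (distance, width, duration) triples; the sketch below uses invented data points in place of the paper's video annotations:

```python
# Worked sketch of fitting Fitts' law parameters by least squares.
# D: movement distances, W: target widths, MT: observed durations (s).
import numpy as np

D = np.array([0.10, 0.25, 0.40, 0.60])
W = np.array([0.05, 0.05, 0.08, 0.08])
MT = np.array([0.35, 0.52, 0.58, 0.70])

ID = np.log2(D / W + 1)                 # index of difficulty (bits)
b, a = np.polyfit(ID, MT, 1)            # slope b, intercept a
print(f"MT ~= {a:.3f} + {b:.3f} * ID")
print("predicted duration for ID=3:", a + b * 3)
```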
Affective Computing and Intelligent Interaction (ACII 2011) – Lê Anh
The document proposes an expressive gesture model for generating communicative gestures on a Nao humanoid robot. The model aims to synchronize gestures with speech and add expressivity based on parameters like amplitude, speed, and fluidity. It uses a symbolic language to describe gestures and translates these into joint values for animation execution on the robot. The system is integrated within an existing virtual agent platform to leverage existing gesture selection and planning algorithms while using robot-specific gesture repertoires and animation approaches.
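On the robot side, timed joint trajectories of this kind can be sent through the NAOqi Python SDK; the following hedged sketch (placeholder address, joints and angle values, Python 2 era SDK) shows a gesture stroke scheduled to overlap speech:

```python
# Hedged sketch using the NAOqi SDK: a timed arm trajectory whose
# keyframe times could come from the expressivity/synchronization model.
# "nao.local", the joint names and the angles are placeholders.
from naoqi import ALProxy

motion = ALProxy("ALMotion", "nao.local", 9559)
tts = ALProxy("ALTextToSpeech", "nao.local", 9559)

names = ["RShoulderPitch", "RElbowRoll"]
angles = [[0.5, -0.3], [0.8, 0.2]]     # one keyframe list per joint (rad)
times = [[0.6, 1.2], [0.6, 1.2]]       # stroke timed against the speech

tts.post.say("This is a great day!")   # non-blocking, so gesture overlaps speech
motion.angleInterpolation(names, angles, times, True)
```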
The document discusses expressive gesture generation for the NAO robot. It aims to 1) generate communicative gestures integrated within an existing virtual agent platform and 2) focus on expressivity and synchronization of gestures with speech. The methodology includes building a gesture library from video, using a common framework to control virtual and physical agents, and specifying gestures symbolically to convey meaning while accounting for different embodiments.
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot – Lê Anh
This document describes work on modeling expressive gestures for the NAO humanoid robot. Researchers captured video of storytellers and annotated over 125 gestures. They developed a gesture schema and built a gesture repertoire by symbolically describing gestures. The researchers used the Greta platform to compute gestures based on a story's content. They pre-calculated joint positions and programmed behavioral scripts. The goal is to have NAO read stories expressively using verbal and nonverbal behaviors synchronized with speech. Future work includes defining invariant gesture meanings and evaluating the approach.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Things to Consider When Choosing a Website Developer for your Website | FODUUFODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionization.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
20240609 QFM020 Irresponsible AI Reading List May 2024
ACM ICMI Workshop 2012
A Common Gesture and Speech Production Framework for Virtual and Physical Agents
Quoc Anh Le (Telecom ParisTech), Jing Huang (Telecom ParisTech), Catherine Pelachaud (CNRS, LTCI)
37 rue Dareau, 75014 Paris
quoc@enst.fr, jing.huang@enst.fr, catherine.pelachaud@enst.fr
ABSTRACT
We introduce a modular system that generates communicative, expressive gestures accompanying speech for an agent. The system is designed as a common model for different embodiments, so that its processes are independent of any specific agent. It has two main features. Firstly, gesture expressivity is taken into account when gesture animations are computed on the fly from abstract gesture templates. Secondly, gestures are scheduled so that their execution is tightly tied to speech. In this paper, we present the first implementation of this system, used to control the co-verbal gestures of the Greta virtual agent and of the Nao physical robot.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: Miscellaneous

General Terms
Algorithms, Design, Language

Keywords
Gesture, Speech, Synchronization, Expressivity, HRI, HMI, BML, FML, SAIBA, GRETA, NAO

1. INTRODUCTION
For many years, we have been developing a virtual intelligent agent (IVA) system, GRETA [25], which produces and responds appropriately to verbal and non-verbal behaviors such as gaze, facial expressions, head movements and gestures directed at human users. The modular architecture of this system follows SAIBA (Situation, Agent, Intention, Behavior, Animation), an international standard multimodal behavior generation framework for embodied agents [29]. Recently, advances in robotics technology have brought us humanoid robots with behavior capacities approaching those of virtual agents [15]. For instance, the expressive anthropomorphic robot Kismet at MIT can communicate rich information through its facial expressions [2]. The ASIMO robot produces gestures accompanying speech in human communication [27]. The Nao humanoid robot can convey several emotions, such as anger, happiness and sadness, through its dynamic body movements [9, 20]. The convergence of these two domains, virtual embodied agents (e.g., embodied conversational agents) and physical embodied agents (e.g., robots), invites a common framework that controls their behaviors in the same way. For this reason, we aim at extending our existing system to handle both virtual and physical agents. The common gesture generation model for the virtual agent Greta [25] and the robot Nao [8] is our first attempt to reach this goal. In this model we focus on three main aspects of human gestures: the form of gestures, the expressivity of gestures, and the synchronization of gestures with speech. Since virtual and physical agents have different motion capacities (e.g., the robot has fewer degrees of freedom and is limited in its movement speed), our methodology is to control the agents' behaviors at a symbolic level through representation languages such as FML [12] and BML [29]. This solution makes it possible to use the same processes for selecting and planning gestures, with embodiment-specific algorithms only for creating the animation.

Regarding the form of gestures, the robot and the virtual agent may not be able to display the same gestures, but their selected gestures have to convey the same meaning (or at least similar meanings). For this reason, we create two repertoires of gesture templates, one for the virtual agent and another for the robot. The two repertoires have entries for the same list of communicative intentions. Given an intent, the system selects appropriate gestures from either repertoire. For instance, to point at an object, Greta can select a pointing gesture with one extended index finger. Nao has only two hand configurations, open and closed; it cannot extend one finger as the virtual agent does, but it can fully stretch its arm to point at the object. As a result, for the same intent of object pointing, the Nao repertoire contains a gesture with a fully stretched arm, while the Greta repertoire contains an index-finger pointing gesture.

Concerning gesture expressivity, we have designed a set of quality dimensions: 1) Spatial extent (SPC) determines the amplitude of movements (e.g., contracted vs. expanded); 2) Fluidity (FLD) refers to the smoothness and continuity of movements (e.g., smooth vs. jerky); 3) Power (PWR) defines the acceleration and dynamic properties of movements (e.g., weak vs. strong);
4) Temporal extent (TMP) refers to the global duration of movements (e.g., quick vs. sustained actions); 5) Repetition (REP) defines the tendency to rhythmically repeat specific movements; 6) Tension (TEN) refers to hand-arm muscle states (e.g., relaxed vs. tense); 7) Openness (OPE) determines the spatial relation of hand-arm positions to the body (e.g., away from the body in an open gesture). These parameters have been implemented for the virtual agent Greta [11]. We want to realize the same set of expressivity parameters for the Nao robot's gestures. From the same gesture template, an agent can animate a gesture in different ways depending on its current emotional state or personality. For instance, a sad agent may perform gestures slowly and weakly, whereas an angry agent may gesture quickly and strongly.

In this framework, the synchronization of gestures with speech is ensured by adapting the gesture movements to the speech timing. According to Kendon and McNeill [16, 21], the most meaningful part of a gesture (i.e., the stroke phase) mainly happens at the same time as, or slightly before, the stressed syllables of speech. Since a robot may need more time than a virtual agent to execute hand movements, our synchronization engine has to predict gesture durations for each embodiment type so that gestures are scheduled correctly. In our case, the durations of gesture movements between any two positions in the gesture space of the Nao robot are pre-calculated, because they cannot be obtained on the fly.

The paper is structured as follows. The next section presents some recent initiatives in generating gestures for virtual agents and for humanoid robots, and shows how our approach differs from these existing works. Section 3 gives an overview of our system and explains how it is designed to be common to both virtual and physical agents. Section 4 presents the gesture lexicons, which are elaborated to be adapted to the agents' embodiments. Sections 5 and 6 describe the mechanism that selects and plans gestures from the lexicons, synchronizes them with speech, and renders them expressive. Section 7 shows how gestures with expressivity are produced and realized for Greta and Nao. Section 8 concludes the paper and proposes future work.

2. STATE OF THE ART
This section presents some recent initiatives to generate co-verbal gestures for virtual agents and physical robots. The differences and similarities between these approaches and our system are analyzed in detail.

Co-verbal Gesture Production for Virtual Agents
The first system that generates gestures for a virtual agent was proposed by Cassell et al. [3]. In their system, gestures are selected and computed from gesture templates. These templates are predefined and stored in a gesture repertoire called a lexicon. A similar method is still used in our system; however, our model takes a set of expressivity parameters into account while creating gesture animations, so that we can produce variants of a gesture from the same abstract gesture template.

Stone et al. [28] proposed a data-driven method for synchronizing small units of pre-recorded gesture animation and speech. Their approach automatically generates gestures synchronized with each phrase of speech. Different combination schemes simulate the agent's communicative style. Another data-driven method was proposed by Neff et al. [22]. Their model creates gesture animation based on gesturing styles extracted from gesture annotations of real human subjects. In general, both of these systems and our model create gestures from predefined gestural prototypes. In our system, the gestural prototypes are abstract gesture templates that have no reference to the specific animation parameters of an agent (e.g., wrist joint).

The model of Bergmann et al. [1] combines data-driven machine learning techniques and rule-based decision methods, and introduces several contextual factors. The whole architecture is used for a computational human-computer interaction simulation, focusing on the production of speech-accompanying iconic gestures. This model allows the generation of gestures on the fly and is one of the few models with such a capacity. However, it is a domain-dependent gesture generation model. While our model can handle all types of gestures regardless of domain, the model of Bergmann is limited to iconic gestures and has to be re-trained on a new data corpus to produce appropriate gestures for a new domain.

Concerning the expressivity of nonverbal behaviors (e.g., gesture expressivity), several expressivity models exist that either act as a filter over an animation or modulate the gesture specification ahead of time. EMOTE implements the effort and shape components of Laban Movement Analysis [4]. Its parameters affect the wrist location of the humanoid and act as a filter on the overall animation of the virtual humanoid. On the other hand, a model of nonverbal behavior expressivity has been defined that acts on the synthesis computation of a behavior [10]. It is based on perceptual studies conducted by Wallbott [30]. Among the large set of variables considered in those studies, six parameters [11] were retained and implemented in the Greta ECA system.

Speech Gesture Production for Humanoid Robots
The approach most similar to our model is the work of Salem et al. [27]. We share the same idea of using an existing virtual agent system to control a physical humanoid robot, and both groups face the difficulties of physical constraints when creating robot gestures (e.g., limits on the space and speed of robot movements). However, we differ in how we resolve these problems. While Salem et al. use the MAX system in full to produce gesture parameters (i.e., joint angles or effector targets) that are still designed for the virtual agent, our existing GRETA system is extended so that its external parameters can be customized to produce gesture parameters for a specific agent embodiment (e.g., a virtual agent or a physical robot). For instance, the MAX system produces iconic gestures with complicated hand shapes that are feasible for the MAX agent but have to be mapped to one of the three basic hand shapes of ASIMO. In our system, we deal with this problem ahead of time, when elaborating the lexicon for each agent type. This allows us to ensure that both agents convey the same information. In addition, the quality of our robot's gestures is increased by a set of expressivity parameters taken into account while the system generates gesture animations. Such gesture expressivity has not yet been studied in Salem's robot system, although it was mentioned in the development of the MAX agent [1].
An implementation and evaluation of gesture expressivity was done in the robot gesture generation system of Ng-Thow-Hing et al. [23]. This system selects gesture types corresponding to the input text through a part-of-speech analysis. It then schedules the gestures to be synchronized with speech, using temporal information returned by a text-to-speech engine. The system calculates gesture trajectories on the fly from gesture templates while taking its style parameters into account. Differently from our model, this system was not designed as a common framework for both virtual and physical agents.

There are also other initiatives that generate gestures for a humanoid robot, such as [24, 14], but they are limited to simple gestures or gestures for certain functions only, for instance pointing gestures in a presentation [24].

All of the above systems have a mechanism to synchronize gestures with speech. Gesture movements are adapted to the speech timing in [27, 23, 24]; this solution is also used in our system. Some systems have a feedback mechanism to receive and process feedback from the robot in real-time, which is then used to improve the smoothness of gesture movements [27] or the synchronization of gestures with speech [14]. They also share a common characteristic: robot gestures are driven by a script language such as MURML [27], BML [14] or MPML-HR [24].

3. SYSTEM OVERVIEW
Our system follows the architecture of the SAIBA framework [29] (cf. Figure 1). This architecture consists of three separate modules: (i) the first module, the Intent Planner, defines the communicative intents that the agent aims to communicate to the users, such as emotional states, beliefs or goals; (ii) the second, the Behavior Planner, selects and plans the corresponding multimodal behaviors to be realized; (iii) the third module, the Behavior Realizer, synchronizes and realizes the planned behaviors. The results of the first module are the input of the second module, through an interface described with the Function Markup Language (FML) [13]. The output of the second module is encoded in the Behavior Markup Language (BML) [29] and then sent to the third module. Both FML and BML are XML-based and do not refer to the specific animation parameters of an agent (e.g., wrist joint). This means that the Intent Planner and Behavior Planner modules in this platform are independent of the agent's embodiment and of the animation player technology.

Figure 1: SAIBA framework.

The Behavior Realizer receives the BML message and instantiates the BML tags from either gesture repertoire (i.e., one repertoire for the virtual agent and another for the physical robot) in order to schedule the gesture phases and generate a set of gesture keyframes. This module is common to both agents. The next module, the Animation Realizer, is responsible for generating the animation from the keyframes; only this module is specific to each agent. Figure 2 illustrates the data flow of our model. A message service system (in our case ActiveMQ) is used to exchange data between modules in real-time. ActiveMQ makes it easy to integrate a new module into the system that sends and receives messages from the other modules.

Figure 2: A Common Gesture Generation Framework for Virtual and Physical Agents.

The following subsections present each process in the system in detail.

4. GESTURE TEMPLATES
In our system, gestures are generated on the fly from abstract gesture templates in a gestuary, a notion first introduced by De Ruiter [5]. Each entry in the gestuary is a pair: the name of a communicative intention and the description of a gesture that conveys this intention. Gesture templates are described symbolically with a representation language that extends BML [29]. Their descriptions have no reference to the specific animation parameters of an agent (e.g., wrist joint).

Gestures are specified symbolically in the agent and robot lexicons. We rely on the gesture theory of McNeill [21] and the gestural hierarchy of Kendon [16] to specify a symbolic gesture. A gestural action may be divided into several phases of wrist movement, in which the obligatory phase, called the stroke, transmits the meaning of the gesture. The stroke phase may be preceded by a preparatory phase, which takes the articulatory joints (e.g., hand and wrist) to the position where the stroke occurs. It may be followed by a retraction phase, which returns the articulatory joints to a relaxed position or to a position that initializes the next gesture. In our lexicons, only the description of the stroke phase is specified for each gesture; the other phases are generated automatically by the system. A stroke phase is represented as a sequence of key poses, each described by hand shape, wrist position, palm orientation, etc. A trajectory type (linear, curved, etc.) indicates how to move from one key pose to the next.

5. FML-APML TO BML
Since the FML language has not yet been standardized, we use our FML-APML language [19]. FML-APML is based on the Affective Presentation Markup Language (APML) [6] and has a syntax similar to FML [12].

An FML message includes two description parts: one for speech and another for communicative intents. The description of speech is borrowed from the BML syntax. It indicates the text to be uttered by the agent as well as time markers for synchronization purposes. The second part is based on the work of Poggi [26]; it defines information about the world and about the speaker's mind. In this part, each tag corresponds to one of the communicative intentions. Each intention has tag attributes that indicate its degree of importance (probability of happening), its timing (absolute or relative to the speech's time markers), etc. The Behavior Planner selects from the agent's lexicon the behaviors that convey the specified communicative acts. It also calculates their absolute start and end times, as well as the values of the expressivity parameters. A speech synthesizer (e.g., Acapela or OpenMary) is called in this module to create the audio data and to instantiate the time markers. The selected gestures and the speech information are output within a BML message and sent to the Behavior Realizer module.
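To make the gestuary and planning steps above concrete, the following Python sketch shows how a symbolic lexicon entry and the Behavior Planner's selection step could look. It is an illustration only: the names (KeyPose, GestureTemplate, plan_gesture) and the example values are hypothetical, not the actual GRETA data structures or APIs.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class KeyPose:
        # Symbolic stroke description: no agent-specific joint values.
        hand_shape: str        # e.g. "index" for Greta, "open"/"closed" for Nao
        wrist_position: str    # symbolic position in the gesture space
        palm_orientation: str

    @dataclass
    class GestureTemplate:
        intention: str              # communicative intention the entry conveys
        stroke: List[KeyPose]       # only the stroke phase is stored in the lexicon
        trajectory: str = "linear"  # how to move between key poses

    # One lexicon per embodiment, indexed by the same communicative intentions.
    greta_lexicon: Dict[str, GestureTemplate] = {
        "point-at-object": GestureTemplate(
            "point-at-object",
            [KeyPose("index", "periphery-right", "palm-down")]),
    }
    nao_lexicon: Dict[str, GestureTemplate] = {
        "point-at-object": GestureTemplate(
            "point-at-object",
            [KeyPose("open", "arm-stretched-right", "palm-down")]),
    }

    def plan_gesture(intention: str, lexicon: Dict[str, GestureTemplate],
                     stressed_syllable_time: float) -> dict:
        # The stroke end is anchored on the stressed syllable given by the
        # speech synthesizer's time markers (see Section 6).
        template = lexicon[intention]
        return {"template": template, "stroke_end": stressed_syllable_time}

    print(plan_gesture("point-at-object", nao_lexicon, stressed_syllable_time=1.35))

Because both lexicons are indexed by the same intentions, the planning code path is identical for Greta and Nao; only the looked-up template differs, which mirrors the embodiment independence argued for in Section 3.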
6. BML TO KEYFRAMES
This process has two main tasks: scheduling the gesture phases to synchronize with speech, while taking the expressivity parameters into account, and loading gestures from one of the gestural lexicons to create the corresponding keyframes. Each keyframe contains the symbolic description and timing of one gesture phase. The symbolic representation of keyframes allows us to use the same synchronization algorithm for gestures and speech, independently of the agent embodiment or animation parameters. The speech signal is also described within a keyframe, which indicates the audio source provided by the speech synthesizer as well as the start time at which to play this audio.

SYNCHRONIZATION
In our system, the synchronization between the gesture signal and speech is realized by adapting the gesture timing to the speech. This means that the temporal information of gestures within a bml tag (i.e., for gesture phases) is relative to the speech. It is specified through time markers encoded by seven synchronization points: start, ready, stroke-start, stroke, stroke-end, relax and end [29] (cf. Figure 3). The most meaningful part occurs between stroke-start and stroke-end (i.e., the stroke phase). The preparation phase goes from start to ready.

Figure 3: Standard BML synchronization points.

In our system, the synchronization between gesture and speech is ensured by forcing the end time of the stroke phase (i.e., the stroke-end sync point) to coincide with the stressed syllables. The durations of the preparation and stroke phases are therefore pre-estimated, so that the system can calculate exactly when to start the gesture; this ensures that the stroke happens on the stressed syllables. The pre-estimation is done by calculating the distance between the current hand-arm position and the next desired position, and by computing how long it takes to perform the trajectory. If the allocated time is not enough for the preparation phase, the whole gesture is canceled, leaving free time to prepare for the next gesture. Conversely, if the total duration allocated to a gesture is too long, a hold phase is added to make the gesture movement more natural. The retraction phase is optional; it depends on the available time and on the start time of the next gesture. This phase is canceled if there is not enough time to move the hands to a defined relax position.

We apply Fitts' law (i.e., a law simulating human movement) [7] to obtain a natural movement speed. The parameters of the Fitts' law function are customized for each agent.

GESTURE EXPRESSIVITY
The set of expressivity parameters is divided into two subsets. The first subset, comprising spatial extent (SPC), temporal extent (TMP) and stroke repetition (REP), is taken into account while the timing of the gesture phases is calculated. The second subset, comprising the other parameters of the set (i.e., fluidity, power, openness and tension of gesture movement), is applied when creating the gesture animation. The reason is that the expressivity parameters in the second subset depend on the agent's embodiment; for instance, the Nao robot does not support real-time modulation of the acceleration of gesture movements. In the first subset, the temporal extent (TMP) modifies the duration of a gesture. If the TMP value increases, the gesture lasts less, meaning the movement is faster. However, in order to keep the synchronization with speech, the time of the stroke-end sync point cannot be changed; consequently, the times of the stroke-start and start sync points become later. Conversely, they become earlier if the TMP value decreases. The spatial extent (SPC) modulates the amplitude of gesture movements along the vertical, horizontal and depth dimensions. When a gesture is elaborated, certain dimensions are fixed to preserve the gesture's meaning, so that only the resizable dimensions are affected by the SPC parameter; these are increased if the SPC value increases, and vice versa. The REP parameter defines the number of repetitions of the stroke phase in a gesture action. The duration of the complete gesture increases linearly with the REP value.
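The scheduling and expressivity rules above can be summarized in a short sketch. The following Python fragment is illustrative only: the Fitts'-law coefficients A and B and the function names are assumptions, not values from the actual system (which, for Nao, pre-calculates movement durations instead of computing them on the fly).

    import math

    # Hypothetical Fitts'-law coefficients; the system calibrates these
    # (or pre-computes the resulting durations, for Nao) per embodiment.
    A, B = 0.10, 0.15  # seconds

    def movement_duration(distance: float, target_width: float) -> float:
        # Fitts' law: duration grows with the index of difficulty log2(2D/W).
        return A + B * math.log2(2.0 * distance / target_width)

    def schedule_gesture(stroke_end: float, prep_dist: float, stroke_dist: float,
                         tmp: float, earliest_start: float, width: float = 0.05):
        # Anchor the stroke end on the stressed syllable and work backwards.
        # TMP > 1 speeds the gesture up, so start and stroke-start shift later;
        # TMP < 1 slows it down, so they shift earlier.
        stroke_dur = movement_duration(stroke_dist, width) / tmp
        prep_dur = movement_duration(prep_dist, width) / tmp
        stroke_start = stroke_end - stroke_dur
        start = stroke_start - prep_dur
        if start < earliest_start:
            return None  # not enough time for the preparation: cancel the gesture
        return {"start": start, "stroke-start": stroke_start, "stroke-end": stroke_end}

    # Stressed syllable at t = 2.0 s; the previous gesture frees the hands at 0.9 s.
    print(schedule_gesture(stroke_end=2.0, prep_dist=0.3, stroke_dist=0.2,
                           tmp=1.2, earliest_start=0.9))

When the allocated slot is longer than needed, a hold phase would be inserted after the stroke instead, and the retraction phase is dropped whenever the next gesture follows immediately, matching the rules above.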
7. KEYFRAMES TO ANIMATION
The process that computes the animation from a given set of keyframes is specific to each embodiment. While all previous computations use the common agent framework, this stage is embodiment-dependent. The following subsections present in detail how the values of the animation parameters are calculated for the Greta virtual agent and for the Nao robot.

7.1 Generating Greta gesture animation
In this section, we present the implementation of our animation pipeline. It starts by receiving BML-like symbolic keyframes, time-stamped in the motion planner. All keyframes are received by streaming, and hence our animation computations need to be achieved on the fly. Each keyframe includes gesture phases, expressivity parameters, the gesture trajectory, and the description of shape and motion for the hand, torso, head, etc. We group keyframes per modality (torso movements, head movements, and arm gesture movements in two groups, left and right side) in order to create full-body information. A keyframe is defined by two computational attribute types: movement descriptions, and targets to be reached through forward and inverse kinematics techniques. Direct movement descriptions are used to define forward kinematics (FK); the data can be abstracted either from motion capture or from edited motions of different body parts. The targets describe the gesture trajectory: we can perform a targeting process to reorganize the gesture trajectory, which can take the form of a line, curve, circle or spiral. After this path-targeting process, we obtain animation sequences for each body part (head, torso, gestures, etc.). The next step is to gather these animation sequences into a single time-stamped sequence covering the whole body. With this gathering process, we can create full-body animation dependencies, such as arm gestures influencing torso movements; this influence mechanism is part of the reaching model. We use forward kinematics to define the initial states of our agent's skeleton system. Our IK method is applied to complete the keyframe specification for the body. When the full-body posture is computed, we apply retargeting while processing the second subset of expressivity parameters (FLD, PWR, OPE, TEN) (see the section Gesture Expressivity). Using various easing functions to modulate the speed and acceleration interpolation curves allows the simulation of PWR and TEN. The last process of our pipeline generates animation frames from the keyframes and finally converts these animation frames into BAP (MPEG-4 body animation parameters) to animate our conversational virtual agent. This process is performed in 3D rotation space only. All the BAP frames are sent to the rendering and animation player.

7.2 Generating Nao gesture animation
Similarly to the Greta gesture animation module, this process receives and processes keyframes on the fly (through ActiveMQ). It then translates the keyframes into joint values for the robot. The second subset of expressivity parameters is applied at this stage.

To avoid singular positions in the gesture movement space of the robot, we predefine a set of wrist positions the robot can reach. In our case this set has 105 positions, corresponding to the key positions in McNeill's gesture space [17]. The symbolic position of a gesture keyframe is instantiated with the corresponding wrist position. From the actual position of the wrist, the palm orientation and hand shape are computed in real-time. The robot has only two hand-shape configurations (i.e., open and closed). The TMP value modifies the complete duration of a gesture; the PWR value modulates the acceleration of the gesture's movement. For the Nao robot, since the movement acceleration cannot be modified, the system adjusts the duration of each phase of the gesture to simulate a change of movement speed. A hold time is also added after the stroke phase when the PWR value increases, to simulate a powerful movement. The fluidity (FLD) parameter modifies the smoothness of a single gesture and the continuity between consecutive gestures by modifying the motion curve. However, modifying the acceleration and trajectory curves is not possible on the Nao robot, so we cannot apply these changes. So far, the FLD value modulates the way the robot links consecutive gestures: when the FLD value increases, the movement between two consecutive gestures is smoother, and the robot performs a movement liaison from the first gesture, without a retraction phase, to the second gesture.

Lastly, all joint values with their timing information are sent to the robot (as an animation layer). The animation is obtained by interpolating between joint values with the robot's built-in proprietary procedures [8].

Experimental results
The Nao gesture generation system was evaluated through perceptual tests. We wanted to evaluate how the robot's gestures were perceived by human users in terms of expressivity, naturalness and synchronization with speech, while the robot was telling a French tale [18]. 63 French speakers participated in our experiment. The results showed that the co-verbal expressive gestures generated by our model and displayed by the Nao robot were acceptable: 48 participants (76%) agreed that the gestures were synchronized with the speech, and 44 participants (70%) agreed that the gestures were expressive. However, the naturalness of the gestures was not judged appropriate and needs to be improved in future work.

8. CONCLUSIONS
We have designed and implemented a framework to animate virtual and physical agents. This framework is as independent as possible of the agents' embodiment: only the last step, which interpolates keyframes into animation frames, is agent-dependent. In our system, a gesture lexicon is elaborated for each agent, which allows us to encompass the variations and limitations of the agents' embodiments. The elements of the lexicons are stored using the same symbolic language. An extended set of expressivity parameters has been implemented.
The parameters act on the volume and dynamism of gesture production. Our gesture engine also ensures that the timing of the gesture phases is synchronized with the speech.

9. ACKNOWLEDGMENTS
The authors would like to thank André-Marie Pez for his help in implementing the system. This work has been partially supported by the French national projects ANR CECIL, GVLEX and IMMEMO.

10. REFERENCES
[1] K. Bergmann and S. Kopp. Modeling the production of coverbal iconic gestures by learning Bayesian decision networks. Applied Artificial Intelligence, 24(6):530–551, 2010.
[2] C. Breazeal. Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1-2):119–155, 2003.
[3] J. Cassell, T. Bickmore, M. Billinghurst, L. Campbell, K. Chang, H. Vilhjálmsson, and H. Yan. Embodiment in conversational interfaces: Rea. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 520–527. ACM, 1999.
[4] D. Chi, M. Costa, L. Zhao, and N. Badler. The EMOTE model for effort and shape. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 173–182. ACM Press/Addison-Wesley, 2000.
[5] J. P. De Ruiter. Gesture and Speech Production. Doctoral dissertation, Catholic University of Nijmegen, Netherlands, 1998.
[6] B. DeCarolis, C. Pelachaud, I. Poggi, and M. Steedman. APML, a mark-up language for believable behavior generation. In Life-like Characters. Tools, Affective Functions and Applications.
[7] P. Fitts. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47(6):381, 1954.
[8] D. Gouaillier, V. Hugel, P. Blazevic, C. Kilner, J. Monceaux, P. Lafourcade, B. Marnier, J. Serre, and B. Maisonnier. Mechatronic design of NAO humanoid. In The International Conference on Robotics and Automation, pages 769–774, 2009.
[9] M. Häring, N. Bee, and E. André. Creation and evaluation of emotion expression with body movement, sound and eye color for humanoid robots. In RO-MAN 2011 IEEE, pages 204–209, 2011.
[10] B. Hartmann, M. Mancini, and C. Pelachaud. Towards affective agent action: Modelling expressive ECA gestures. In International Conference on Intelligent User Interfaces, Workshop on Affective Interaction, San Diego, CA, 2005.
[11] B. Hartmann, M. Mancini, and C. Pelachaud. Implementing expressive gesture synthesis for embodied conversational agents. LNCS: Gesture in Human-Computer Interaction and Simulation, pages 188–199, 2006.
[12] D. Heylen, S. Kopp, S. Marsella, C. Pelachaud, and H. Vilhjálmsson. The next step towards a function markup language. Intelligent Virtual Agents, pages 270–280, 2008.
[13] D. Heylen, S. Kopp, S. Marsella, C. Pelachaud, and H. Vilhjálmsson. The next step towards a function markup language. pages 270–280, 2008.
[14] A. Holroyd and C. Rich. Using the behavior markup language for human-robot interaction. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, pages 147–148. ACM, 2012.
[15] T. Holz, M. Dragone, and G. O'Hare. Where robots and virtual agents meet. International Journal of Social Robotics, 1(1):83–93, 2009.
[16] A. Kendon. Gesture: Visible Action as Utterance. Cambridge University Press, 2004.
[17] Q. Le, S. Hanoune, and C. Pelachaud. Design and implementation of an expressive gesture model for a humanoid robot. In 11th IEEE-RAS Humanoid Robots, pages 134–140, 2011.
[18] Q. A. Le and C. Pelachaud. Evaluating an expressive gesture model for a humanoid robot: Experimental results. Submitted to the 8th ACM/IEEE International Conference on Human-Robot Interaction, 2012.
[19] M. Mancini and C. Pelachaud. The FML-APML language. The First FML Workshop, 2008.
[20] V. Manohar, S. al Marzooqi, and J. W. Crandall. Expressing emotions through robots: a case study using off-the-shelf programming interfaces. In The 6th International Conference on HRI, pages 199–200. ACM, 2011.
[21] D. McNeill. Hand and Mind: What Gestures Reveal about Thought. 1996.
[22] M. Neff, M. Kipp, I. Albrecht, and H. Seidel. Gesture modeling and animation based on a probabilistic re-creation of speaker style. ACM Transactions on Graphics, 27(1):5, 2008.
[23] V. Ng-Thow-Hing, P. Luo, and S. Okita. Synchronized gesture and speech production for humanoid robots. In The International Conference on Intelligent Robots and Systems (IROS'10). IEEE/RSJ, 2010.
[24] Y. Nozawa, H. Dohi, H. Iba, and M. Ishizuka. Humanoid robot presentation controlled by multimodal presentation markup language MPML. Computer Animation and Virtual Worlds, pages 153–158, 2004.
[25] C. Pelachaud. Multimodal expressive embodied conversational agents. pages 683–689, 2005.
[26] I. Poggi, C. Pelachaud, and E. Caldognetto. Gestural mind markers in ECAs. Gesture-Based Communication in Human-Computer Interaction, pages 481–482, 2004.
[27] M. Salem, S. Kopp, I. Wachsmuth, K. Rohlfing, and F. Joublin. Generation and evaluation of communicative robot gesture. International Journal of Social Robotics, pages 1–17, 2012.
[28] M. Stone, D. DeCarlo, I. Oh, C. Rodriguez, A. Stere, A. Lees, and C. Bregler. Speaking with hands: Creating animated conversational characters from recordings of human performance. ACM Transactions on Graphics, 23(3):506–513, 2004.
[29] H. Vilhjálmsson et al. The behavior markup language: Recent developments and challenges. Intelligent Virtual Agents, pages 99–111, 2007.
[30] H. Wallbott. Bodily expression of emotion. European Journal of Social Psychology, 28(6):879–896, 1998.