Kismet, the robot






    Kismet, the robot: Presentation Transcript

    • KISMET, THE ROBOT HARDWARE DESIGN Kismet is an expressive robotic creature with perceptual and motor modalities tailored to natural human communication channels. To facilitate a natural infant-caretaker interaction, the robot is equipped with visual, auditory, and proprioceptive sensory inputs. The motor outputs include vocalizations, facial expressions, and motor capabilities to adjust the gaze direction of the eyes and the orientation of the head. Note that these motor systems serve to steer the visual and auditory
    • Our hardware and software control architectures have been designed to meet the challenge of real-time processing of visual signals (approaching 30 Hz) and auditory signals (8 kHz sample rate and frame windows of 10 ms) with minimal latencies (less than 500 ms). The high-level perception system, the motivation system, the behavior system, the motor skill system, and the face motor system execute on four Motorola 68332 microprocessors running L, a multi-threaded Lisp developed in our lab. Vision processing, visual attention, and eye/neck control are performed by nine networked 400 MHz PCs running QNX (a real-time Unix operating system). Expressive speech synthesis and vocal affective intent recognition run on a dual 450 MHz PC running NT, and the speech recognition system runs on a
    • Vision System The robot's vision system consists of four color CCD cameras mounted on a stereo active vision head. Two wide field of view (fov) cameras are mounted centrally and move with respect to the head. These are 0.25 inch CCD lipstick cameras with 2.2 mm lenses manufactured by Elmo Corporation. They are used to decide what the robot should pay attention to, and to compute a distance estimate. There is also a camera mounted within the pupil of each eye. These are 0.5 inch CCD foveal cameras with 8 mm focal length lenses, and are used for higher resolution post-attentional processing, such as eye detection. Kismet has three degrees of freedom to control gaze direction and three degrees of freedom to control its neck. The degrees of freedom are driven by Maxon DC servo motors with high resolution optical encoders for accurate position control. This gives the robot the ability to move and orient its eyes like a human, engaging in a variety of human visual behaviors. This is not only advantageous from a visual
    • Auditory System The caregiver can influence the robot's behavior through speech by wearing a small unobtrusive wireless microphone. This auditory signal is fed into a 500 MHz PC running Linux. The real-time, low-level speech processing and recognition software was developed at MIT by the Spoken Language Systems Group. These auditory features are sent to a dual 450 MHz PC running NT. The NT machine processes these features in real-time to recognize the spoken affective intent of the caregiver.
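The timing figures quoted in the hardware description (8 kHz audio sampling, 10 ms frame windows, vision approaching 30 Hz, latency under 500 ms) can be turned into concrete per-frame budgets. The following is a small arithmetic sketch, not part of Kismet's software; the function names are made up for illustration.

```python
# Figures quoted in the hardware description of the transcript.
AUDIO_SAMPLE_RATE_HZ = 8_000
AUDIO_FRAME_MS = 10
VIDEO_RATE_HZ = 30
MAX_LATENCY_MS = 500


def samples_per_audio_frame(rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples that fall inside one frame window."""
    return rate_hz * frame_ms // 1000


def frames_within_latency(rate_hz: int, latency_ms: int) -> int:
    """How many frames at the given rate fit inside the latency ceiling."""
    return rate_hz * latency_ms // 1000


audio_samples = samples_per_audio_frame(AUDIO_SAMPLE_RATE_HZ, AUDIO_FRAME_MS)
video_frames = frames_within_latency(VIDEO_RATE_HZ, MAX_LATENCY_MS)
audio_frames = MAX_LATENCY_MS // AUDIO_FRAME_MS

print(audio_samples, video_frames, audio_frames)
```

Under these figures each audio frame holds 80 samples, and the 500 ms latency ceiling spans 15 video frames or 50 audio frames.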
    • EXPRESSIVE MOTOR SYSTEM Kismet has a 15 DoF face that displays a wide assortment of facial expressions to mirror its "emotional" state as well as to serve other communicative purposes. Each ear has two degrees of freedom that allow Kismet to perk its ears in an interested fashion, or fold them back in a manner reminiscent of an angry animal. Each eyebrow can lower and furrow in frustration, elevate upwards for surprise, or slant the inner corner of the brow upwards for sadness. Each eyelid can open and close independently, allowing the robot to wink an eye or blink both. The robot has four lip actuators, one at each corner of the mouth, that can be curled upwards for a smile or downwards for a frown. There is also a single degree of freedom jaw.
    • VOCALIZATION SYSTEM The robot's vocalization capabilities are generated through an articulatory synthesizer. The underlying software (DECtalk v4.5) is based on the Klatt synthesizer, which models the physiological characteristics of the human articulatory tract. By adjusting the parameters of the synthesizer it is possible to convey speaker personality (Kismet sounds like a young child) as well as to add emotional qualities to synthesized speech (Cahn 1990).
    • THE FRAMEWORK The system architecture consists of six subsystems: the low-level feature extraction system, the high-level perception system, the attention system, the motivation system, the behavior system, and the motor system. The low-level feature extraction system extracts sensor-based features from the world, and the high-level perceptual system encapsulates these features into percepts that can influence behavior, motivation, and motor processes. The robot has many behaviors in its repertoire, and several motivations to satiate, so its goals vary over time. The motor system carries out these goals by orchestrating the output modalities (actuator or vocal) to achieve them. For Kismet, these actions are realized as motor skills that accomplish the task physically, or expressive motor acts that accomplish the task via social signals.
    • THE LOW-LEVEL FEATURE EXTRACTION SYSTEM The low-level feature extraction system is responsible for processing the raw sensory information into quantities that have behavioral significance for the robot. The routines are designed to be cheap, fast, and just adequate. Of particular interest are those perceptual cues that infants seem to rely on. For instance, visual and auditory cues such as detecting eyes and the recognition of vocal affect are important for infants.
    • THE ATTENTION SYSTEM The low-level visual percepts are sent to the attention system. The purpose of the attention system is to pick out low-level perceptual stimuli that are particularly salient or relevant at that time, and to direct the robot's attention and gaze toward them. This provides the robot with a locus of attention that it can use to organize its behavior. A perceptual stimulus may be salient for several reasons. It may capture the robot's attention because of its sudden appearance, or perhaps due to its sudden change. It may stand out because of its inherent saliency, as a red ball may stand out from the background. Or perhaps its quality has special behavioral significance for the robot, such as being a typical indication of danger.
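One way to picture how several sources of salience could be combined is a weighted sum over per-stimulus feature scores, with attention going to the highest total. This is only an illustrative sketch: the transcript does not say how Kismet combines these cues, and the feature names (`color`, `motion`, `relevance`), the weights, and the `most_salient` function are all assumptions.

```python
from typing import Dict

Features = Dict[str, float]


def salience(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of the feature scores for one stimulus (hypothetical model)."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def most_salient(stimuli: Dict[str, Features], weights: Dict[str, float]) -> str:
    """Return the name of the stimulus with the highest weighted salience."""
    return max(stimuli, key=lambda name: salience(stimuli[name], weights))


# Hypothetical scene: a brightly colored ball and a moving hand.
stimuli = {
    "red_ball": {"color": 0.9, "motion": 0.1},
    "waving_hand": {"color": 0.2, "motion": 0.6},
}

# With equal weights, inherent color salience wins.
neutral = {"color": 1.0, "motion": 1.0, "relevance": 1.0}
# Raising the motion weight, as a behavior might, shifts attention.
motion_biased = {"color": 1.0, "motion": 2.0, "relevance": 1.0}

print(most_salient(stimuli, neutral))
print(most_salient(stimuli, motion_biased))
```

Changing the weights changes which stimulus wins, which is one simple way the robot's current goals could bias what it looks at.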
    • THE PERCEPTUAL SYSTEM The low-level features corresponding to the target stimuli of the attention system are fed into the perceptual system. Here they are encapsulated into behaviorally relevant percepts. To environmentally elicit processes in these systems, each behavior and emotive response has an associated releaser. As conceptualized by Tinbergen and Lorenz, a releaser can be viewed as a collection of feature detectors that are minimally necessary to identify a particular object or event of behavioral significance. The function of the releasers is to ascertain if all environmental (perceptual) conditions are right for the response to become active.
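The releaser idea, a set of feature detectors that must all be satisfied before a response can become active, can be sketched as follows. The `Releaser` class, the percept keys, and the thresholds are hypothetical names chosen for illustration, not Kismet's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Percept = Dict[str, float]
Detector = Callable[[Percept], bool]


@dataclass
class Releaser:
    """A named collection of feature detectors (hypothetical structure)."""
    name: str
    detectors: List[Detector]

    def is_active(self, percept: Percept) -> bool:
        # The releaser fires only when every detector's condition holds.
        return all(detect(percept) for detect in self.detectors)


# Hypothetical releaser: a colorful object is present and no face is seen.
toy_present = Releaser(
    name="toy_present",
    detectors=[
        lambda p: p.get("color_saturation", 0.0) > 0.5,
        lambda p: p.get("face_score", 0.0) < 0.2,
    ],
)

print(toy_present.is_active({"color_saturation": 0.8, "face_score": 0.0}))
print(toy_present.is_active({"color_saturation": 0.8, "face_score": 0.9}))
```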
    • THE MOTIVATION SYSTEM The motivation system consists of the robot's basic "drives" and "emotions". The "drives" represent the basic "needs" of the robot and are modeled as simple homeostatic regulation mechanisms. When the needs of the robot are being adequately met, the intensity level of each "drive" is within a desired regime. However, as the intensity level moves farther away from the homeostatic regime, the robot becomes more strongly motivated to engage in behaviors that restore that "drive". Hence the "drives" largely establish the robot's own agenda, and play a significant role in determining which behavior(s) the robot activates at any one time. The "emotions" are modeled from a functional perspective. Based on simple appraisals of the benefit or detriment of a given stimulus, the robot evokes positive emotive responses that serve to bring itself closer to it, or negative emotive responses in order to withdraw from it. There is a distinct emotive response for each class of eliciting conditions.
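A homeostatic drive of the kind described here can be sketched as a scalar intensity with a desired band: the intensity drifts when the need is unattended, and the urgency to act grows with the distance outside the band. The `Drive` class, its bounds, and its drift rate are assumed values for illustration only; the transcript gives no numbers.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """A single drive modeled as simple homeostatic regulation (hypothetical)."""
    name: str
    intensity: float = 0.0
    low: float = -0.2    # lower edge of the homeostatic regime (assumed)
    high: float = 0.2    # upper edge of the homeostatic regime (assumed)
    drift: float = 0.1   # per-step drift when the need is not met (assumed)

    def step(self, satiation: float = 0.0) -> None:
        # Unattended, the drive drifts; satiating stimuli push it back.
        self.intensity += self.drift - satiation

    def urgency(self) -> float:
        # Zero inside the regime; grows with distance outside it.
        if self.intensity > self.high:
            return self.intensity - self.high
        if self.intensity < self.low:
            return self.low - self.intensity
        return 0.0


social = Drive("social")
for _ in range(5):
    social.step()            # no interaction for five steps
print(round(social.urgency(), 3))

social.step(satiation=0.6)   # a caregiver engages the robot
print(round(social.urgency(), 3))
```

In this sketch the urgency value is what would feed into behavior selection, so that a neglected drive gradually comes to dominate the robot's agenda.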
    • THE BEHAVIOR SYSTEM The behavior system organizes the robot's task-based behaviors into a coherent structure. Each behavior is viewed as a self-interested, goal-directed entity that competes with other behaviors to establish the current task. An arbitration mechanism is required to determine which behavior(s) to activate and for how long, given that the robot has several motivations that it must tend to and different behaviors that it can use to achieve them. The main responsibility of the behavior system is to carry out this arbitration. In particular, it addresses the issues of relevancy, coherency, persistence, and opportunism. By doing so, the robot is able to behave in a sensible manner in a complex and dynamic environment.
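Competition between behaviors, together with persistence and opportunism, can be illustrated with a winner-take-all rule in which the currently active behavior receives a small bonus. This is a generic sketch of such arbitration, not the mechanism used in Kismet; the `arbitrate` function, the behavior names, and the bonus value are assumptions.

```python
from typing import Dict, Optional


def arbitrate(
    activations: Dict[str, float],
    current: Optional[str] = None,
    persistence_bonus: float = 0.2,
) -> str:
    """Pick the behavior with the highest activation.

    The active behavior gets a bonus so the robot does not dither between
    near-equal options (persistence), yet a clearly stronger competitor can
    still take over (opportunism).
    """
    scores = dict(activations)
    if current in scores:
        scores[current] += persistence_bonus
    return max(scores, key=lambda name: scores[name])


# A slightly stronger rival does not displace the active behavior...
print(arbitrate({"seek_toy": 0.5, "seek_face": 0.6}, current="seek_toy"))
# ...but a much stronger one does.
print(arbitrate({"seek_toy": 0.5, "seek_face": 0.9}, current="seek_toy"))
```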
    • THE MOTOR SYSTEM The motor system arbitrates the robot's motor skills and expressions. It consists of four subsystems: the motor skills system, the facial animation system, the expressive vocalization system, and the oculomotor system. Given that a particular goal and behavioral strategy have been selected, the motor system determines how to move the robot so as to carry out that course of action. Overall, the motor skills system coordinates body posture, gaze direction, vocalizations, and facial expressions to address issues of blending and sequencing the action primitives from the specialized motor systems.
    • SOCIALIZING WITH PEOPLE Kismet is designed to make use of human social protocol for various purposes. One such purpose is to make life easier for its vision system. If a person is visible, but is too distant for their face to be imaged at adequate resolution, Kismet engages in a calling behavior to summon the person closer. People who come too close to the robot also cause difficulties for the cameras with narrow fields of view, since only a small part of a face may be visible. In this circumstance, a withdrawal response is invoked, where Kismet draws back physically from the person. This behavior, by itself, aids the cameras somewhat by increasing the distance between Kismet and the human. But the behavior can have a secondary and greater effect through social amplification -- for a human close to Kismet, a withdrawal response is a strong social cue to back away, since it is analogous to the human response to invasions of "personal space."
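The distance regulation described above amounts to a rule with two thresholds: call when the person is too far, withdraw when too close, and otherwise carry on interacting. The sketch below assumes a distance estimate in meters and uses made-up threshold values; neither the function name nor the numbers come from the transcript.

```python
def regulate_distance(distance_m: float, near_m: float = 0.5, far_m: float = 2.0) -> str:
    """Choose a regulatory response from an estimated distance to the person.

    The thresholds are placeholder values, not measurements from Kismet.
    """
    if distance_m > far_m:
        return "call"       # face too small to image: summon the person closer
    if distance_m < near_m:
        return "withdraw"   # face overflows the narrow field of view: back away
    return "engage"         # distance suits the cameras: keep interacting


for d in (3.0, 1.0, 0.2):
    print(d, regulate_distance(d))
```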
    • ENVELOPE DISPLAYS Such regulatory mechanisms play roles in more complex social interactions, such as conversational turn-taking. Here control of gaze direction is important for regulating conversation rate. In general, people are likely to glance aside when they begin their turn, and make eye contact when they are prepared to relinquish their turn and await a response. People tend to raise their brows when listening or waiting for the other to speak. Blinks occur most frequently at the end of an utterance. These envelope displays and other cues allow Kismet to influence the flow of conversation to the advantage of its auditory processing. The visual-motor system can also be driven by the requirements of a nominally unrelated sensory modality, just as behaviors that seem completely orthogonal to vision (such as ear wiggling during the call behavior to attract a person's attention) are nevertheless recruited for the purposes of regulation.
    • OTHER REGULATORY DISPLAYS Some regulatory displays also help protect the robot. Objects that suddenly appear close to the robot trigger a looming reflex, causing the robot to quickly withdraw and appear startled. If the event is repeated, the response quickly habituates and the robot simply appears annoyed, since its best strategy for ending these repetitions is to clearly signal that they are undesirable. Similarly, rapidly moving objects close to the robot are threatening and trigger an escape response. These mechanisms are all designed to elicit natural and intuitive responses from humans, without any special training. But even without these carefully crafted mechanisms, it is often clear to a human when Kismet's perception is failing, and what corrective action would help, because the robot's perception is reflected in behavior in a familiar way. Inferences made based on our human preconceptions are actually likely to work.
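The habituation of the looming reflex, startled at first and annoyed once the event repeats, can be sketched with a counter that resets after a quiet period. The class name, the repeat count, and the recovery time are assumptions made for illustration; the transcript does not specify how habituation is implemented.

```python
from typing import Optional


class LoomingReflex:
    """Startle on a looming object, then habituate if it keeps happening."""

    def __init__(self, habituation_count: int = 3, recovery_s: float = 10.0):
        self.habituation_count = habituation_count  # repeats before annoyance (assumed)
        self.recovery_s = recovery_s                # quiet time that resets the count (assumed)
        self.count = 0
        self.last_time: Optional[float] = None

    def on_loom(self, t: float) -> str:
        # A long enough pause since the last event restores the full reflex.
        if self.last_time is not None and t - self.last_time > self.recovery_s:
            self.count = 0
        self.count += 1
        self.last_time = t
        return "startle" if self.count < self.habituation_count else "annoyed"


reflex = LoomingReflex()
responses = [reflex.on_loom(t) for t in (0.0, 1.0, 2.0, 20.0)]
print(responses)
```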