Kismet (robot)
Kismet is a robot made in the late 1990s at the Massachusetts Institute of Technology by Dr. Cynthia Breazeal. The robot's auditory, visual and expressive systems were intended to allow it to participate in human social interaction and to demonstrate simulated human emotion and appearance. The name Kismet comes from the Arabic, Turkish, Urdu, Hindi and Punjabi word meaning "fate" or sometimes "luck".
Hardware design and construction
In order for Kismet to interact properly with human beings, it contains input devices that give it auditory, visual, and proprioceptive abilities. Kismet simulates emotion through various facial expressions, vocalizations, and movement. Facial expressions are created through movements of the ears, eyebrows, eyelids, lips, jaw, and head. The physical materials cost an estimated US$25,000.
Four color CCD cameras mounted on a stereo active vision head and two wide-field-of-view cameras allow Kismet to decide what to pay attention to and to estimate distances. A 0.5-inch CCD foveal camera with an 8 mm focal-length lens is used for higher-resolution post-attentional processing, such as eye detection.
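The sketch below applies ideal pinhole relations to the camera figures given above as a rough illustration of the geometry. It assumes a nominal 1/2-inch sensor format of 6.4 mm by 4.8 mm, because the article states only the format and the 8 mm lens. It estimates distance with the standard rectified-stereo formula Z = fB/d, and the pixel focal length, baseline, and disparity in the usage lines are hypothetical values, not Kismet's calibration.

```python
import math

# Assumed nominal 1/2-inch CCD format; the text gives only the format and the lens.
SENSOR_WIDTH_MM = 6.4
SENSOR_HEIGHT_MM = 4.8
FOCAL_LENGTH_MM = 8.0


def field_of_view_deg(sensor_extent_mm: float, focal_length_mm: float) -> float:
    """Angular field of view of an ideal pinhole camera along one sensor axis."""
    return math.degrees(2.0 * math.atan(sensor_extent_mm / (2.0 * focal_length_mm)))


def stereo_depth_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance to a point seen by two rectified cameras: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    h = field_of_view_deg(SENSOR_WIDTH_MM, FOCAL_LENGTH_MM)
    v = field_of_view_deg(SENSOR_HEIGHT_MM, FOCAL_LENGTH_MM)
    print(f"foveal FOV: {h:.1f} deg x {v:.1f} deg")
    # Hypothetical stereo pair: 500 px focal length, 6 cm baseline, 25 px disparity.
    print(f"estimated depth: {stereo_depth_m(500.0, 0.06, 25.0):.2f} m")
```

Under these assumptions the script reports a field of view of about 43.6° by 33.4° for the foveal camera and a depth of 1.20 m for the hypothetical stereo pair.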
By wearing a small microphone, a user can influence Kismet's behavior. The auditory signal is carried into a 500 MHz PC running Linux, where software developed by the Spoken Language Systems Group at MIT processes low-level speech patterns in real time. A 450 MHz PC running Windows NT processes these features in real time to recognize the spoken affective intent of the caregiver.
In addition to the computers mentioned above, Kismet uses four Motorola 68332 microcontrollers, nine 400 MHz PCs, and another 500 MHz PC.
Maxon DC servo motors with high-resolution optical encoders give Kismet three degrees of freedom of eye movement. These let it control its gaze direction and move and orient its eyes like a human. As a result, Kismet can simulate human visual behaviors, people can assign communicative value to its eye movements, and Kismet can focus on what it deems important in its field of vision.
Software system
Kismet's social intelligence software system, or synthetic nervous system (SNS), was designed with human models of intelligent behavior in mind. It contains the following six subsystems.
Low-level feature extraction system
This system processes raw visual and auditory information from cameras and microphones. Kismet's vision system can perform eye detection, motion detection and, controversially, skin-color detection. Whenever Kismet moves its head, it momentarily disables its motion detection system to avoid detecting its own motion. It also uses its stereo cameras to estimate the distance of an object in its visual field. One use is detecting threats, defined as large, close objects with a lot of movement.
Kismet's audio system is mainly tuned toward identifying affect in infant-directed speech. In particular, it can detect five different types of affective speech: approval, prohibition, attention, comfort, and neutral. The affective intent classifier was created as follows. Low-level features such as pitch mean and energy (volume) variance were extracted from samples of recorded speech. The classes of affective intent were then modeled as Gaussian mixture models and trained on these samples using the expectation-maximization algorithm. Classification is done in multiple stages. An utterance is first assigned to one of two general groups (e.g. soothing/neutral vs. prohibition/attention/approval), and a more detailed classification is then performed within that group. This architecture significantly improved performance for hard-to-distinguish classes, such as approval ("You're a clever robot") versus attention ("Hey Kismet, over here").
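The training and two-stage classification described above can be sketched in Python. This minimal illustration rests on stated assumptions and is not the MIT implementation:

- per-frame pitch and energy tracks are assumed to be given;
- each class, and each first-stage group, is modeled by a two-component Gaussian mixture with diagonal covariances fitted by expectation-maximization;
- the clustered training data from the hypothetical `synthetic_samples` helper is invented for illustration.

```python
import math
import random

# First-stage groups, following the soothing/neutral vs.
# prohibition/attention/approval split described in the text.
GROUPS = {
    "soothing_or_neutral": ["comfort", "neutral"],
    "prohibition_attention_approval": ["prohibition", "attention", "approval"],
}


def _var(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def _log_gauss(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
               for xi, mi, v in zip(x, mean, var))


def extract_features(pitch_hz, energy):
    """(pitch mean, energy variance) of one utterance; pitch 0 marks an unvoiced frame."""
    voiced = [p for p in pitch_hz if p > 0]
    return (sum(voiced) / len(voiced), _var(energy))


class DiagonalGMM:
    """Gaussian mixture model with diagonal covariances, trained by EM."""

    def __init__(self, n_components=2, n_iter=40, seed=0):
        self.k, self.n_iter, self.rng = n_components, n_iter, random.Random(seed)

    def _component_logs(self, x):
        return [math.log(w) + _log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def log_likelihood(self, x):
        logs = self._component_logs(x)
        top = max(logs)  # log-sum-exp trick for numerical stability
        return top + math.log(sum(math.exp(l - top) for l in logs))

    def fit(self, data):
        dim, n = len(data[0]), len(data)
        # Variance floor stops a component from collapsing onto a single sample.
        floor = [max(1e-3 * _var([p[d] for p in data]), 1e-9) for d in range(dim)]
        self.means = [list(p) for p in self.rng.sample(data, self.k)]
        self.vars = [[max(_var([p[d] for p in data]), floor[d]) for d in range(dim)]
                     for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each sample.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                top = max(logs)
                w = [math.exp(l - top) for l in logs]
                total = sum(w)
                resp.append([wi / total for wi in w])
            # M-step: re-estimate weights, means and variances from the responsibilities.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-12
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                        for r, x in zip(resp, data)) / nj, floor[d])
                                for d in range(dim)]
        return self


class TwoStageAffectClassifier:
    """Stage 1 picks a broad group; stage 2 picks a class inside that group."""

    def fit(self, samples):
        self.group_models = {g: DiagonalGMM().fit([x for c in cs for x in samples[c]])
                             for g, cs in GROUPS.items()}
        self.class_models = {c: DiagonalGMM().fit(samples[c])
                             for cs in GROUPS.values() for c in cs}
        return self

    def predict(self, x):
        group = max(GROUPS, key=lambda g: self.group_models[g].log_likelihood(x))
        return max(GROUPS[group], key=lambda c: self.class_models[c].log_likelihood(x))


def synthetic_samples(n=40, seed=1):
    """Invented (pitch mean Hz, energy variance) clusters; not Kismet's recordings."""
    centres = {"comfort": (180, 1.0), "neutral": (240, 3.0), "prohibition": (200, 12.0),
               "attention": (320, 8.0), "approval": (380, 10.0)}
    rng = random.Random(seed)
    return {c: [(rng.gauss(p, 10.0), rng.gauss(e, 0.5)) for _ in range(n)]
            for c, (p, e) in centres.items()}
```

This mirrors the rationale given in the text. The second-stage models only have to separate classes within one group, such as approval from attention, rather than all five classes at once.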
Attention system
Kismet's attention system selects the stimuli in its environment toward which the robot directs its attention and gaze, for example something that suddenly appears. The system has two stages. The pre-attentive stage uses the low-level visual feature detectors to detect colors and motion. The limited-capacity stage processes a selected region of the visual field; facial expression recognition and object detection, for example, are done there. The attention system is influenced not only by external factors but also by Kismet's current task (seek-people vs. seek-toys) and by habituation (its limited attention span).
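A toy version of this attention loop is sketched below. It assumes the visual field is divided into a handful of regions, each scored by hypothetical motion, color, and skin-tone feature maps. Task-dependent gains stand in for the influence of the seek-people and seek-toys tasks, and the `TASK_GAINS` values are invented. A simple habituation term stands in for the limited attention span: it grows while a region is attended and recovers when the region is not attended.

```python
from dataclasses import dataclass, field

# Hypothetical task-dependent gains: seeking people boosts skin tone,
# seeking toys boosts saturated color.
TASK_GAINS = {
    "seek-people": {"motion": 1.0, "color": 0.5, "skin": 2.0},
    "seek-toys": {"motion": 1.0, "color": 2.0, "skin": 0.5},
}


@dataclass
class AttentionSystem:
    n_regions: int
    habituation_rate: float = 0.3  # interest lost per step while a region is attended
    recovery_rate: float = 0.1     # interest regained per step by unattended regions
    habituation: list = field(default_factory=list, init=False)

    def __post_init__(self):
        self.habituation = [0.0] * self.n_regions

    def saliency(self, features, task):
        """Pre-attentive stage: task-weighted sum of feature maps minus habituation."""
        gains = TASK_GAINS[task]
        return [sum(gains[name] * fmap[i] for name, fmap in features.items())
                - self.habituation[i] for i in range(self.n_regions)]

    def step(self, features, task):
        """Select the most salient region for the limited-capacity stage, then habituate."""
        scores = self.saliency(features, task)
        target = max(range(self.n_regions), key=lambda i: scores[i])
        for i in range(self.n_regions):
            if i == target:
                self.habituation[i] += self.habituation_rate
            else:
                self.habituation[i] = max(0.0, self.habituation[i] - self.recovery_rate)
        return target
```

As a usage example, take `features = {"motion": [0.0, 0.0, 0.0], "color": [1.0, 0.0, 0.0], "skin": [0.0, 0.8, 0.0]}`. A seek-people system picks region 1 at once. A seek-toys system picks region 0 for six steps and then, as habituation builds up, switches to region 1.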
High-level perceptual system
Kismet's perceptual system translates low-level features into meaningful events. This is done through releasers. A releaser is a kind of checklist that assesses a combination of low-level features to decide what kind of event has occurred. For example, "big, fast motion, and close" indicates a threat.
Motivation system
The motivation system coordinates Kismet's drives and emotions. The drive subsystem regulates Kismet's social, stimulation, and fatigue-related needs. Like hunger in an animal, each drive becomes more intense until it is satiated. These drives affect Kismet's emotion system, which contains the six basic emotions described by Paul Ekman: anger, disgust, fear, joy, sorrow, and surprise. In addition, it contains three arousal states: boredom, interest, and calm. These emotional states can activate behaviors; for example, fear can trigger the escape behavior.
Kismet can be in only one emotional state at a time. Breazeal states, however, that Kismet is not conscious and therefore does not have feelings.
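The chain from a releaser through drives and emotions to a behavior can be sketched as follows. Only a few elements come from the text: the six Ekman emotions, the "big, fast motion, and close" threat releaser, the one-emotion-at-a-time rule, and the fear-to-escape example. The thresholds, growth rate, and intensity values are hypothetical, and so is the link from an unmet social drive to sorrow.

```python
from dataclasses import dataclass

EMOTIONS = ["anger", "disgust", "fear", "joy", "sorrow", "surprise"]  # Ekman's six

# Only fear -> escape comes from the text; other emotions map to no behavior here.
EMOTION_BEHAVIOR = {"fear": "escape"}


@dataclass
class Drive:
    """A need that grows every tick until a suitable stimulus satiates it."""
    name: str
    level: float = 0.0
    growth: float = 0.1  # hypothetical growth per tick

    def tick(self, satiated: bool) -> None:
        self.level = 0.0 if satiated else min(1.0, self.level + self.growth)


def threat_releaser(size: float, speed: float, distance_m: float) -> bool:
    """'Big, fast motion, and close'; the numeric thresholds are hypothetical."""
    return size > 0.5 and speed > 0.5 and distance_m < 0.5


def active_emotion(intensities: dict):
    """Winner-take-all: at most one emotion is active at a time."""
    best = max(EMOTIONS, key=lambda e: intensities.get(e, 0.0))
    return best if intensities.get(best, 0.0) > 0.0 else None


def motivation_step(social: Drive, person_present: bool, threat: bool):
    """One update: drive -> emotion intensities -> active emotion -> behavior."""
    social.tick(satiated=person_present)
    intensities = {
        "sorrow": social.level,           # assumed: an unmet social need feeds sorrow
        "fear": 1.0 if threat else 0.0,   # a threat releaser feeds fear
        "joy": 0.6 if person_present else 0.0,
    }
    emotion = active_emotion(intensities)
    return emotion, EMOTION_BEHAVIOR.get(emotion)
```

Taking the maximum enforces the single-emotion rule. A richer model could blend intensities before selection without changing that rule.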
Behavior system
Kismet's behavior system decides which behavior to carry out. Behaviors include playing with a toy, greeting a person, sleeping, and so on. Each behavior receives input from the emotion system, the drive system, and various releasers. The values from these modules are combined into an activation level for each behavior. If a behavior's activation level reaches a certain threshold, Kismet performs that behavior.
Motor system
The motor system controls Kismet's body posture, facial expressions, speech and lip synchronization, and gaze direction. The robot has nine basis postures, or expressions: fear, accepting, tired, content, stern, disgust, anger, surprise, and unhappy. However, Kismet's emotion space is continuous, not discrete. For example, if Kismet shows an unhappy posture and the observer starts speaking in a soothing voice, Kismet's expression can smoothly transition to accepting. In addition to facial features such as the eyebrows and mouth shape, Kismet can also change the orientation of its ears; for instance, it conveys arousal by pointing its ears upward.
Kismet speaks a proto-language with a variety of phonemes, similar to a baby's babbling. It uses the DECtalk voice synthesizer and changes pitch, timing, articulation, and other parameters to express various emotions. Intonation is used to distinguish question-like from statement-like utterances. Lip synchronization was important for realism, and the developers used a strategy from animation: simplicity is the secret to successful lip animation. Thus, they did not try to imitate lip motions perfectly, but instead created a visual shorthand that passes unchallenged by the viewer.
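Because the expression space is continuous, intermediate expressions can be produced by blending basis postures. The sketch below assumes each basis posture is a set of normalized actuator targets, and the actuator names and numbers are hypothetical. The surprise entry, for instance, simply raises the ears to reflect high arousal. Linear interpolation, one simple choice among many, moves the face from unhappy to accepting as in the soothing-voice example.

```python
# Hypothetical normalized actuator targets in [-1, 1]; not Kismet's calibration.
BASIS_POSTURES = {
    "unhappy": {"ears": -0.6, "brows": -0.5, "lip_corners": -0.7, "jaw": 0.0},
    "accepting": {"ears": 0.2, "brows": 0.3, "lip_corners": 0.5, "jaw": 0.1},
    "surprise": {"ears": 0.9, "brows": 0.9, "lip_corners": 0.0, "jaw": 0.8},
}
ACTUATORS = ["ears", "brows", "lip_corners", "jaw"]


def blend(weights):
    """Weighted average of basis postures; weights must not all be zero."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one posture needs a positive weight")
    return {a: sum(w * BASIS_POSTURES[p][a] for p, w in weights.items()) / total
            for a in ACTUATORS}


def transition(start, end, steps):
    """Frames moving smoothly from one basis posture to another by linear interpolation."""
    return [blend({start: steps - i, end: i}) for i in range(steps + 1)]
```

For example, `transition("unhappy", "accepting", 4)` returns five frames in which the ear target rises from -0.6 to 0.2 in steps of 0.2.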
In the media
Kismet has been featured on NBC and in Discover magazine. It also played a small role in Steve Reich's opera Three Tales, as a symbol of the development of artificial intelligence and as a voice of traditional ethics.
A replica of Kismet is part of the traveling exhibition Star Wars: Where Science Meets Imagination, along with Breazeal in a pre-recorded segment.
See also
- Affective computing
- Artificial intelligence