Computer facial animation
Encyclopedia
Computer facial animation is primarily an area of computer graphics that encapsulates models and techniques for generating and animating images of the human head and face. Because of its subject and output type, it is also related to many other scientific and artistic fields, from psychology to traditional animation. The importance of human faces in verbal and non-verbal communication, together with advances in computer graphics hardware and software, has generated considerable scientific, technological, and artistic interest in computer facial animation.
Although development of computer graphics methods for facial animation started in the early 1970s, major achievements in this field are more recent and have occurred since the late 1980s.
Computer facial animation includes a variety of techniques, from morphing to three-dimensional modeling and rendering. It has become well known and popular through animated feature films and computer games, but its applications include many more areas, such as communication, education, scientific simulation, and agent-based systems (for example, online customer service representatives).
History
Human facial expression has been the subject of scientific investigation for more than one hundred years. The study of facial movements and expressions started from a biological point of view. After some older investigations, for example by John Bulwer in the late 1640s, Charles Darwin's book The Expression of the Emotions in Man and Animals can be considered a major point of departure for modern research in behavioural biology.
More recently, one of the most important attempts to describe facial activities (movements) was the Facial Action Coding System (FACS). Introduced by Ekman and Friesen in 1978, FACS defines 46 basic facial Action Units (AUs). A major group of these Action Units represents primitive movements of facial muscles in actions such as raising brows, winking, and talking. Eight AUs are for rigid three-dimensional head movements, i.e. turning and tilting left and right and going up, down, forward and backward. FACS has been used successfully for describing desired movements of synthetic faces and also for tracking facial activities.
Computer-based facial expression modelling and animation is not a new endeavour. The earliest work with computer-based facial representation was done in the early 1970s. The first three-dimensional facial animation was created by Parke in 1972. In 1973, Gillenson developed an interactive system to assemble and edit line-drawn facial images, and in 1974 Parke developed a parameterized three-dimensional facial model.
The early 1980s saw the development of the first physically based muscle-controlled face model by Platt and the development of techniques for facial caricatures by Brennan. In 1985, the short animated film "Tony de Peltrie" was a landmark for facial animation. In it, for the first time, computer facial expression and speech animation were a fundamental part of telling the story.
The late 1980s saw the development of a new muscle-based model by Waters, the development of an abstract muscle action model by Magnenat-Thalmann and colleagues, and approaches to automatic speech synchronization by Lewis and by Hill. The 1990s and early 2000s saw increasing activity in the development of facial animation techniques and the use of computer facial animation as a key storytelling component, as illustrated in animated films such as Toy Story, Antz, Shrek, and Monsters, Inc., and computer games such as The Sims. Casper (1995) is a milestone in this period, being the first movie with a lead character produced exclusively using digital facial animation (Toy Story was released later the same year).
The sophistication of the films increased after 2000. In The Matrix Reloaded and The Matrix Revolutions, dense optical flow from several high-definition cameras was used to capture realistic facial movement at every point on the face. The Polar Express used a large Vicon system to capture upward of 150 points. Although these systems are automated, a large amount of manual clean-up effort is still needed to make the data usable. Another milestone in facial animation was reached by The Lord of the Rings, where a character-specific shape-based system was developed. Mark Sagar pioneered the use of FACS in entertainment facial animation, and FACS-based systems developed by Sagar were used on Monster House, King Kong, and other films.
2D Animation
Two-dimensional facial animation is commonly based upon the transformation of images, including both images from still photography and sequences of video. Image morphing is a technique which allows in-between transitional images to be generated between a pair of target still images or between frames from sequences of video. These morphing techniques usually consist of a combination of a geometric deformation technique, which aligns the target images, and a cross-fade, which creates the smooth transition in the image texture. An early example of image morphing can be seen in Michael Jackson's video for "Black or White". In 1997, Ezzat and Poggio, working at the MIT Center for Biological and Computational Learning, created a system called MikeTalk, which morphs between image keyframes representing visemes to create speech animation.
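The two ingredients of a morph described above, geometric alignment and cross-fade, can be sketched in a few lines of Python. This is a minimal illustration, not the method of any system named here: the function names are hypothetical, images are assumed to be small greyscale grids stored as nested lists, and the resampling step that actually warps pixels to follow the moved feature points is omitted.

```python
def lerp(a, b, t):
    """Linear interpolation between two numbers for t in [0, 1]."""
    return (1.0 - t) * a + t * b


def morph_points(src_pts, dst_pts, t):
    """Geometric part: move each feature point from its source position
    towards its target position (hypothetical helper)."""
    return [(lerp(x0, x1, t), lerp(y0, y1, t))
            for (x0, y0), (x1, y1) in zip(src_pts, dst_pts)]


def cross_fade(src_img, dst_img, t):
    """Texture part: blend two already-aligned greyscale images pixel by pixel."""
    return [[lerp(p, q, t) for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(src_img, dst_img)]


# Halfway between two mouth-corner positions and two tiny 1x2 images.
halfway_points = morph_points([(0, 0)], [(2, 4)], 0.5)
quarter_image = cross_fade([[0, 100]], [[100, 100]], 0.25)
```

Sweeping t from 0 to 1 yields the sequence of in-between frames; a production morph would warp both images towards the interpolated points before cross-fading them.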
Another form of animation from images consists of concatenating sequences captured from video. In 1997, Bregler et al. described a technique called Video Rewrite, in which existing footage of an actor is cut into segments corresponding to phonetic units, which are blended together to create new animations of a speaker. Video Rewrite uses computer vision techniques to automatically track lip movements in video, and these features are used in the alignment and blending of the extracted phonetic units. This technique only generates animations of the lower part of the face; these are then composited with video of the original actor to produce the final animation.
3D Animation
Three-dimensional head models provide the most powerful means of generating computer facial animation. One of the earliest works on computerized head models for graphics and animation was done by Parke. The model was a mesh of 3D points controlled by a set of conformation and expression parameters. The former group controls the relative location of facial feature points such as eye and lip corners. Changing these parameters can re-shape a base model to create new heads. The latter group of parameters (expression) are facial actions that can be performed on the face, such as stretching the lips or closing the eyes. This model was extended by other researchers to include more facial features and add more flexibility. Different methods for initializing such a "generic" model based on individual (3D or 2D) data have been proposed and successfully implemented. Parameterized models are effective because they use a limited number of parameters associated with the main facial feature points. The MPEG-4 standard (Section 7.15.3 – Face animation parameter data) defines a minimum set of parameters for facial animation.
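The split between conformation and expression parameters can be illustrated with a small Python sketch. It is not Parke's model or the MPEG-4 parameter set: the function `evaluate_face`, the feature names and the offset tables are all assumptions made for the example. Conformation offsets reshape the base head once, and expression parameters then displace feature points in proportion to their value.

```python
def evaluate_face(base, conformation, actions, expression):
    """Return feature-point positions for one head and one expression.

    base:         feature name -> (x, y, z) of the generic neutral model
    conformation: feature name -> (dx, dy, dz) offset defining this head
    actions:      expression parameter -> {feature name: (dx, dy, dz) at value 1}
    expression:   expression parameter -> current value, e.g. in [0, 1]
    """
    points = {}
    # Conformation: reshape the generic base model into a specific head.
    for name, (x, y, z) in base.items():
        cx, cy, cz = conformation.get(name, (0.0, 0.0, 0.0))
        points[name] = [x + cx, y + cy, z + cz]
    # Expression: add each facial action, scaled by its parameter value.
    for param, value in expression.items():
        for name, (dx, dy, dz) in actions[param].items():
            p = points[name]
            p[0] += value * dx
            p[1] += value * dy
            p[2] += value * dz
    return {name: tuple(p) for name, p in points.items()}


base = {"lip_corner_l": (-1.0, 0.0, 0.0), "lip_corner_r": (1.0, 0.0, 0.0)}
wide_mouth = {"lip_corner_l": (-0.5, 0.0, 0.0), "lip_corner_r": (0.5, 0.0, 0.0)}
actions = {"stretch_lips": {"lip_corner_l": (-1.0, 0.0, 0.0),
                            "lip_corner_r": (1.0, 0.0, 0.0)}}
pose = evaluate_face(base, wide_mouth, actions, {"stretch_lips": 0.5})
```

Animating the face then amounts to calling the function once per frame with expression values that change over time while the conformation stays fixed.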
Animation is done by changing parameters over time. Facial animation is approached in different ways; traditional techniques include:
1. Shape-based systems offer fast playback as well as a high degree of fidelity of expressions. The technique involves modelling portions of the face mesh to approximate expressions and visemes and then blending the different sub-meshes, known as morph targets or shapes. Perhaps the most accomplished character using this technique was Gollum, from The Lord of the Rings. The drawbacks of this technique are that it involves intensive manual labor, is specific to each character, and must be animated by slider parameter tables.
2. Skeletal muscle systems: physically based head models form another approach to modeling the head and face. Here the physical and anatomical characteristics of bones, tissues, and skin are simulated to provide a realistic appearance (e.g. spring-like elasticity). Such methods can be very powerful for creating realism, but the complexity of facial structures makes them computationally expensive and difficult to create. Considering the effectiveness of parameterized models for communicative purposes (as explained in the next section), it may be argued that physically based models are not a very efficient choice in many applications. This does not deny the advantages of physically based models, or the fact that they can even be used within the context of parameterized models to provide local details when needed. Waters, Terzopoulos, Kahler, and Seidel (among others) have developed physically based facial animation systems.
3. 'Envelope Bones' or 'Cages' are commonly used in games. They produce simple and fast models, but are not well suited to portraying subtlety.
4. Motion capture uses cameras placed around a subject. The subject is generally fitted either with reflectors (passive motion capture) or sources (active motion capture) that precisely determine the subject's position in space. The data recorded by the cameras is then digitized and converted into a three-dimensional computer model of the subject. Until recently, the size of the detectors/sources used by motion capture systems made the technology inappropriate for facial capture. However, miniaturization and other advancements have made motion capture a viable tool for computer facial animation. Facial motion capture was used extensively in The Polar Express by Imageworks, where hundreds of motion points were captured. This film was very accomplished, but while it attempted to recreate realism, it was criticised for having fallen into the 'uncanny valley', the realm where animation realism is sufficient for human recognition but fails to convey the emotional message. The main difficulties of motion capture are the quality of the data, which may include vibration, and the retargeting of the geometry of the points. A recent technology developed at the Applied Geometry Group and Computer Vision Laboratory at ETH Zurich achieves real-time performance without the use of any markers, using a high-speed structured light scanner. The system is based on a robust offline face tracking stage which trains the system with different facial expressions. The matched sequences are used to build a person-specific linear face model that is subsequently used for online face tracking and expression transfer.
5. Deformation Solver Face Robot.
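The shape-based technique in item 1 of the list above can be sketched as a weighted sum of offsets from a neutral mesh. The Python below is a minimal illustration under stated assumptions: the function `blend_shapes`, the two-vertex mesh and the target names are hypothetical, and a real system would use full meshes and an array library rather than nested tuples.

```python
def blend_shapes(neutral, targets, weights):
    """Blend morph targets: vertex = neutral + sum(w_i * (target_i - neutral)).

    neutral: list of (x, y, z) vertices of the neutral face
    targets: shape name -> list of (x, y, z) vertices, same order as neutral
    weights: shape name -> slider value, typically in [0, 1]
    """
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        target = targets[name]
        for i, (n, t) in enumerate(zip(neutral, target)):
            for axis in range(3):
                # Add the weighted offset of this target from the neutral pose.
                result[i][axis] += w * (t[axis] - n[axis])
    return [tuple(v) for v in result]


neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "smile": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    "jaw_open": [(0.0, -2.0, 0.0), (1.0, 0.0, 0.0)],
}
frame = blend_shapes(neutral, targets, {"smile": 0.5, "jaw_open": 0.25})
```

Each weight corresponds to one of the animator's sliders, which is why the approach plays back quickly but requires a hand-built set of targets for every character.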
Speech Animation
Visemes are used to represent the key poses in observed speech (i.e. the position of the lips, jaw and tongue when producing a particular phoneme); however, there is a great deal of variation in the realisation of visemes during the production of natural speech. The source of this variation is termed coarticulation, which is the influence of surrounding visemes upon the current viseme (i.e. the effect of context). To account for coarticulation, current systems either explicitly take context into account when blending viseme keyframes or use longer units such as diphone, triphone, syllable, or even word- and sentence-length units.
One of the most common approaches to speech animation is the use of dominance functions introduced by Cohen and Massaro. Each dominance function represents the influence over time that a viseme has on a speech utterance. Typically the influence will be greatest at the center of the viseme and will degrade with distance from the viseme center. Dominance functions are blended together to generate a speech trajectory in much the same way that spline basis functions are blended together to generate a curve. The shape of each dominance function will be different according to both which viseme it represents and what aspect of the face is being controlled (e.g. lip width, jaw rotation etc.). This approach to computer-generated speech animation can be seen in the Baldi talking head.
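Dominance blending can be sketched as a normalised weighted average of viseme targets. The Python below is a minimal illustration, assuming a negative-exponential dominance shape; the function names, the parameter names (`magnitude`, `rate`, `power`) and the example values are hypothetical and are not taken from the text above.

```python
import math


def dominance(t, center, magnitude=1.0, rate=1.0, power=1.0):
    """Influence of one viseme at time t: greatest at its center and
    decaying with distance (assumed negative-exponential shape)."""
    return magnitude * math.exp(-rate * abs(t - center) ** power)


def blend_trajectory(t, segments):
    """Value of one facial control parameter (e.g. lip width) at time t.

    segments: list of (center_time, target_value, magnitude, rate),
              one entry per viseme in the utterance.
    """
    num = 0.0
    den = 0.0
    for center, target, magnitude, rate in segments:
        d = dominance(t, center, magnitude, rate)
        num += d * target
        den += d
    # Dominance-weighted average of the viseme targets.
    return num / den


# Two visemes: lips closed (target 0.0) at t = 0 and open (target 1.0) at t = 1.
segments = [(0.0, 0.0, 1.0, 4.0), (1.0, 1.0, 1.0, 4.0)]
midpoint = blend_trajectory(0.5, segments)
```

Because neighbouring dominance functions overlap, the parameter at the center of one viseme is still pulled slightly towards its neighbours, which is how the approach approximates coarticulation. A separate set of segments would be used for each controlled aspect of the face.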
Other models of speech use basis units which include context (e.g. diphones, triphones etc.) instead of visemes. As the basis units already incorporate the variation of each viseme according to context and, to some degree, the dynamics of each viseme, no model of coarticulation is required. Speech is simply generated by selecting appropriate units from a database and blending the units together. This is similar to concatenative techniques in audio speech synthesis. The disadvantage of these models is that a large amount of captured data is required to produce natural results, and whilst longer units produce more natural results, the size of the database required expands with the average length of each unit.
Finally, some models directly generate speech animations from audio. These systems typically use hidden Markov models or neural nets to transform audio parameters into a stream of control parameters for a facial model. The advantage of this method is that it handles voice context, natural rhythm, tempo, emotion and dynamics without complex approximation algorithms. The training database does not need to be labeled, since no phonemes or visemes are needed; the only data required is the voice and the animation parameters. An example of this approach is the Johnnie Talker system (http://digitus.itk.ppke.hu/~flugi/johnnie/).
Face Animation Languages
Face animation languages are related to other multimedia presentation languages such as SMIL and VRML. Due to the popularity and effectiveness of XML as a data representation mechanism, most face animation languages are XML-based; one example is Virtual Human Markup Language (VHML). More advanced languages, such as Face Modeling Language (FML), allow decision-making, event handling, and parallel and sequential actions.
Computer graphics
Computer graphics are graphics created using computers and, more generally, the representation and manipulation of image data by a computer with help from specialized software and hardware....
that encapsulates models and techniques for generating and animating images of the human head
Human head
In human anatomy, the head is the upper portion of the human body. It supports the face and is maintained by the skull, which itself encloses the brain.-Cultural importance:...
and face
Face
The face is a central sense organ complex, for those animals that have one, normally on the ventral surface of the head, and can, depending on the definition in the human case, include the hair, forehead, eyebrow, eyelashes, eyes, nose, ears, cheeks, mouth, lips, philtrum, temple, teeth, skin, and...
. Due to its subject and output type, it is also related to many other scientific and artistic fields from psychology
Psychology
Psychology is the study of the mind and behavior. Its immediate goal is to understand individuals and groups by both establishing general principles and researching specific cases. For many, the ultimate goal of psychology is to benefit society...
to traditional animation
Animation
Animation is the rapid display of a sequence of images of 2-D or 3-D artwork or model positions in order to create an illusion of movement. The effect is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in several ways...
. The importance of human faces
Face
The face is a central sense organ complex, for those animals that have one, normally on the ventral surface of the head, and can, depending on the definition in the human case, include the hair, forehead, eyebrow, eyelashes, eyes, nose, ears, cheeks, mouth, lips, philtrum, temple, teeth, skin, and...
in verbal and non-verbal communication
Communication
Communication is the activity of conveying meaningful information. Communication requires a sender, a message, and an intended recipient, although the receiver need not be present or aware of the sender's intent to communicate at the time of communication; thus communication can occur across vast...
and advances in computer graphics hardware
Graphics processing unit
A graphics processing unit or GPU is a specialized circuit designed to rapidly manipulate and alter memory in such a way so as to accelerate the building of images in a frame buffer intended for output to a display...
and software have caused considerable scientific, technological, and artistic interests in computer facial animation.
Although development of computer graphics
Computer graphics
Computer graphics are graphics created using computers and, more generally, the representation and manipulation of image data by a computer with help from specialized software and hardware....
methods for facial animation started in the early 1970s, major achievements in this field are more recent and happened since the late 1980s.
Computer facial animation includes a variety of techniques from morphing
Morphing
Morphing is a special effect in motion pictures and animations that changes one image into another through a seamless transition. Most often it is used to depict one person turning into another through technological means or as part of a fantasy or surreal sequence. Traditionally such a depiction...
to three-dimensional modeling and rendering
Rendering (computer graphics)
Rendering is the process of generating an image from a model , by means of computer programs. A scene file contains objects in a strictly defined language or data structure; it would contain geometry, viewpoint, texture, lighting, and shading information as a description of the virtual scene...
. It has become well-known and popular through animated feature film
Film
A film, also called a movie or motion picture, is a series of still or moving images. It is produced by recording photographic images with cameras, or by creating images using animation techniques or visual effects...
s and computer games
Computer Games
"Computer Games" is a single by New Zealand group, Mi-Sex released in 1979 in Australia and New Zealand and in 1981 throughout Europe. It was the single that launched the band, and was hugely popular, particularly in Australia and New Zealand...
but its applications include many more areas such as communication
Communication
Communication is the activity of conveying meaningful information. Communication requires a sender, a message, and an intended recipient, although the receiver need not be present or aware of the sender's intent to communicate at the time of communication; thus communication can occur across vast...
, education
Education
Education in its broadest, general sense is the means through which the aims and habits of a group of people lives on from one generation to the next. Generally, it occurs through any experience that has a formative effect on the way one thinks, feels, or acts...
, scientific simulation
Simulation
Simulation is the imitation of some real thing available, state of affairs, or process. The act of simulating something generally entails representing certain key characteristics or behaviours of a selected physical or abstract system....
, and agent
Software agent
In computer science, a software agent is a piece of software that acts for a user or other program in a relationship of agency, which derives from the Latin agere : an agreement to act on one's behalf...
-based systems (for example online customer service representatives).
History
Human facial expressionFacial expression
A facial expression one or more motions or positions of the muscles in the skin. These movements convey the emotional state of the individual to observers. Facial expressions are a form of nonverbal communication. They are a primary means of conveying social information among humans, but also occur...
has been the subject of scientific investigation for more than one hundred years. Study of facial movements and expressions started from a biological point of view. After some older investigations, for example by John Bulwer
John Bulwer
John Bulwer was an English physician and early Baconian natural philosopher who wrote five works exploring the Body and human communication, particularly by gesture....
in late 1640s, Charles Darwin
Charles Darwin
Charles Robert Darwin FRS was an English naturalist. He established that all species of life have descended over time from common ancestry, and proposed the scientific theory that this branching pattern of evolution resulted from a process that he called natural selection.He published his theory...
’s book The Expression of the Emotions in Men and Animals can be considered a major departure for modern research in behavioural biology
Biology
Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy. Biology is a vast subject containing many subdivisions, topics, and disciplines...
.
More recently, one of the most important attempts to describe facial activities (movements) was Facial Action Coding System
Facial Action Coding System
Facial Action Coding System is a system to taxonomize human facial expressions, originally developed by Paul Ekman and Wallace V. Friesen in 1978...
(FACS). Introduced by Ekman
Paul Ekman
Paul Ekman is a psychologist who has been a pioneer in the study of emotions and their relation to facial expressions. He has been considered one of the 100 most eminent psychologists of the twentieth century...
and Friesen
Friesen
Friesen is a commune in the Haut-Rhin department in Alsace in north-eastern France.-References:*...
in 1978, FACS
FACS
FACS may refer to:* Facial Action Coding System, a procedure to systematically describe human facial expressions.* Fluorescence-activated cell sorting, a biological technique used in flow cytometry.* Family and consumer science...
defines 46 basic facial Action Units (AUs). A major group of these Action Units represent primitive movements of facial muscles in actions such as raising brows, winking, and talking. Eight AUs are for rigid three-dimensional head movements, i.e. turning and tilting left and right and going up, down, forward and backward. FACS has been successfully used for describing desired movements of synthetic faces and also in tracking facial activities.
Computer based facial expression modelling and animation
Animation
Animation is the rapid display of a sequence of images of 2-D or 3-D artwork or model positions in order to create an illusion of movement. The effect is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in several ways...
is not a new endeavour. The earliest work with computer based facial representation was done in the early 1970s. The first three-dimensional facial animation was created by Parke
Fred Parke
Frederic Ira Parke graduated from the University of Utah with a BS degree in physics in 1965. He was then a graduate student of the University of Utah College of Engineering where he received his MS and PhD in computer science. Parke was the creator of the first CG physically modeled human face...
in 1972. In 1973, Gillenson developed an interactive system to assemble and edit line drawn facial images. And in 1974, Parke
Fred Parke
Frederic Ira Parke graduated from the University of Utah with a BS degree in physics in 1965. He was then a graduate student of the University of Utah College of Engineering where he received his MS and PhD in computer science. Parke was the creator of the first CG physically modeled human face...
developed a parameterized three-dimensional facial model.
The early 1980s saw the development of the first physically based muscle-controlled face model by Platt and the development of techniques for facial caricatures by Brennan. In 1985, the short animated film ``Tony de Peltrie’’ was a landmark for facial animation. In it for the first time computer facial expression and speech animation were a fundamental part of telling the story.
The late 1980s saw the development of a new muscle-based model by Waters
Keith Waters
Keith Waters , formerly of LifeFX Networks, Inc., has been involved in facial animation for the past 20 years. He development of a muscle-based model for facial animation including a physically based skin tissue model as well as a visual text-to-speech system called DECface...
, the development of an abstract muscle action model by Magnenat-Thalmann and colleagues, and approaches to automatic speech synchronization by Lewis and by Hill. The 1990s have seen increasing activity in the development of facial animation techniques and the use of computer facial animation as a key storytelling component as illustrated in animated films such as Toy Story
Toy Story
Toy Story is a 1995 American computer-animated film released by Walt Disney Pictures. It is Pixar's first feature film as well as the first ever feature film to be made entirely with CGI. The film was directed by John Lasseter and featuring the voices of Tom Hanks and Tim Allen...
, Antz
Antz
Antz is a 1998 American computer animated action adventure film produced by DreamWorks Animation. It features the voices of well-known actors such as Woody Allen, Sharon Stone, Jennifer Lopez, Sylvester Stallone, Dan Aykroyd, Anne Bancroft, Gene Hackman, Christopher Walken, and Danny Glover as...
, Shrek
Shrek
Shrek is a 2001 American computer-animated fantasy comedy film directed by Andrew Adamson and Vicky Jenson, featuring the voices of Mike Myers, Eddie Murphy, Cameron Diaz, and John Lithgow. Loosely based on William Steig's 1990 fairy tale picture book Shrek!...
, and Monsters, Inc, and computer games
Computer Games
"Computer Games" is a single by New Zealand group, Mi-Sex released in 1979 in Australia and New Zealand and in 1981 throughout Europe. It was the single that launched the band, and was hugely popular, particularly in Australia and New Zealand...
such as Sims
SIMS
- Last name :* Andrew Sims, American rapper, member of the Doomtree collective* Ashton Sims, Australian rugby league footballer* Charles Sims , British painter* Christopher A. Sims, American economist* Ernie Sims, NFL linebacker...
. Casper
Casper
Caspar, one of the Three Biblical MagiCasper may refer to:-Given name:*Casper , 5th Century ruler of the Mayan city of Palenque*Caspar Badrutt , Swiss businessman and pioneer of alpine resorts...
(1995) is a milestone in this period, being the first movie with a lead actor produced exclusively using digital facial animation (Toy Story
Toy Story
Toy Story is a 1995 American computer-animated film released by Walt Disney Pictures. It is Pixar's first feature film as well as the first ever feature film to be made entirely with CGI. The film was directed by John Lasseter and featuring the voices of Tom Hanks and Tim Allen...
was released later the same year).
The sophistication of the films increased after 2000. In The Matrix Reloaded
The Matrix Reloaded
The Matrix Reloaded is a 2003 American science fiction film and the second installment in The Matrix trilogy, written and directed by the Wachowskis. It premiered on May 7, 2003, in Westwood, Los Angeles, California, and went on general release by Warner Bros. in North American theaters on May 15,...
and Matrix Revolutions dense optical flow
Optical flow
Optical flow or optic flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. The concept of optical flow was first studied in the 1940s and ultimately published by American psychologist James J....
from several high-definition cameras was used to capture realistic facial movement at every point on the face. Polar Express (film) used a large Vicon system to capture upward of 150 points. Although these systems are automated, a large amount of manual clean-up effort is still needed to make the data usable. Another milestone in facial animation was reached by The Lord of the Rings
The Lord of the Rings film trilogy
The Lord of the Rings is an epic film trilogy consisting of three fantasy adventure films based on the three-volume book of the same name by English author J. R. R. Tolkien. The films are The Fellowship of the Ring , The Two Towers and The Return of the King .The films were directed by Peter...
where a character specific shape base system was developed. Mark Sagar pioneered the use of FACS
FACS
FACS may refer to:* Facial Action Coding System, a procedure to systematically describe human facial expressions.* Fluorescence-activated cell sorting, a biological technique used in flow cytometry.* Family and consumer science...
in entertainment facial animation, and FACS based systems developed by Sagar were used on Monster House
Monster House (film)
Monster House is a 2006 computer animated motion capture horror/comedy film produced by ImageMovers and Amblin Entertainment, and distributed by Columbia Pictures. Executive produced by Robert Zemeckis and Steven Spielberg, this is the first time since Back to the Future Part III that they have...
, King Kong
King Kong (2005 film)
King Kong is a 2005 fantasy adventure film directed by Peter Jackson. It is a remake of the 1933 film of the same name and stars Naomi Watts, Jack Black and Adrien Brody. Andy Serkis, through performance capture, portrays Kong....
, and other films.
2D Animation
Two-dimensional facial animation is commonly based upon the transformation of images, including both images from still photography and sequences of video. Image morphingMorphing
Morphing is a special effect in motion pictures and animations that changes one image into another through a seamless transition. Most often it is used to depict one person turning into another through technological means or as part of a fantasy or surreal sequence. Traditionally such a depiction...
is a technique which allows in-between transitional images to be generated between a pair of target still images or between frames from sequences of video. These morphing
Morphing
Morphing is a special effect in motion pictures and animations that changes one image into another through a seamless transition. Most often it is used to depict one person turning into another through technological means or as part of a fantasy or surreal sequence. Traditionally such a depiction...
techniques usually consist of a combination of a geometric deformation technique, which aligns the target images, and a cross-fade which creates the smooth transition in the image texture. An early example of image morphing
Morphing
Morphing is a special effect in motion pictures and animations that changes one image into another through a seamless transition. Most often it is used to depict one person turning into another through technological means or as part of a fantasy or surreal sequence. Traditionally such a depiction...
can be seen in Michael Jackson
Michael Jackson
Michael Joseph Jackson was an American recording artist, entertainer, and businessman. Referred to as the King of Pop, or by his initials MJ, Jackson is recognized as the most successful entertainer of all time by Guinness World Records...
's video for "Black Or White". In 1997 Ezzat and Poggio working at the MIT Center for Biological and Computational Learning created a system called MikeTalk which morphs between image keyframes, representing viseme
Viseme
A viseme is a representational unit used to classify speech sounds in the visual domain. The term viseme was introduced based on the interpretation of the phoneme as a basic unit of speech in the acoustic/auditory domain,...
s, to create speech animation.
Another form of animation from images consists of concatenating together sequences captured from video. In 1997 Bregler et al. described a technique called video-rewrite where existing footage of an actor is cut into segments corresponding to phonetic units which are blended together to create new animations of a speaker. Video-rewrite uses computer vision
Computer vision
Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions...
techniques to automatically track lip movements in video and these features are used in the alignment and blending of the extracted phonetic units. This animation technique only generates animations of the lower part of the face, these are then composited with video of the original actor to produce the final animation.
3D Animation
Three-dimensional head models provide the most powerful means of generating computer facial animation. One of the earliest works on computerized head models for graphics and animation was done by Parke. The model was a mesh of 3D points controlled by a set of conformation and expression parameters. The conformation parameters control the relative location of facial feature points such as eye and lip corners; changing them can reshape a base model to create new heads. The expression parameters are facial actions that can be performed on the face, such as stretching the lips or closing the eyes. This model was extended by other researchers to include more facial features and add more flexibility. Different methods for initializing such a "generic" model from individual (3D or 2D) data have been proposed and successfully implemented. Parameterized models are effective because they use a limited number of parameters, associated with the main facial feature points. The MPEG-4 standard (Section 7.15.3 – Face animation parameter data) defines a minimum set of parameters for facial animation.
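The split between conformation and expression parameters can be illustrated with a toy model. The sketch below is not Parke's model and does not use MPEG-4 parameters: the three feature points, the parameter names (`mouth_width`, `lip_stretch`, `eye_close`) and the displacement amounts are all assumptions chosen for illustration. It shows a conformation parameter reshaping a base head, expression parameters displacing feature points, and animation produced by interpolating the expression parameters over time.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Mesh = Dict[str, Point]

# Hypothetical base head reduced to three named feature points.
BASE_MESH: Mesh = {
    "left_lip_corner": (-1.0, -1.0, 0.0),
    "right_lip_corner": (1.0, -1.0, 0.0),
    "upper_eyelid": (0.0, 1.0, 0.0),
}


def apply_conformation(mesh: Mesh, mouth_width: float) -> Mesh:
    """Conformation: rescale lip-corner x positions to make a new head."""
    out = dict(mesh)
    for name in ("left_lip_corner", "right_lip_corner"):
        x, y, z = mesh[name]
        out[name] = (x * mouth_width, y, z)
    return out


def apply_expression(mesh: Mesh, lip_stretch: float, eye_close: float) -> Mesh:
    """Expression: displace feature points from their rest positions."""
    out = dict(mesh)
    for name, sign in (("left_lip_corner", -1.0), ("right_lip_corner", 1.0)):
        x, y, z = mesh[name]
        out[name] = (x + sign * 0.2 * lip_stretch, y, z)
    x, y, z = mesh["upper_eyelid"]
    out["upper_eyelid"] = (x, y - 0.5 * eye_close, z)
    return out


def animate(mesh: Mesh, start: Dict[str, float], end: Dict[str, float],
            n_frames: int) -> List[Mesh]:
    """Animate by linearly interpolating expression parameters over time."""
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        params = {k: start[k] + t * (end[k] - start[k]) for k in start}
        frames.append(apply_expression(mesh, **params))
    return frames


head = apply_conformation(BASE_MESH, mouth_width=1.5)
frames = animate(head,
                 {"lip_stretch": 0.0, "eye_close": 0.0},
                 {"lip_stretch": 1.0, "eye_close": 1.0},
                 n_frames=3)
```

Only two numbers per frame drive the whole face here, which is the property that makes parameterized models compact enough for communication standards.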
Animation is done by changing parameters over time. Facial animation is approached in different ways; traditional techniques include:
- shapes/morph targets,
- skeleton-muscle systems,
- bones/cages,
- motion capture on points on the face, and
- knowledge-based solver deformations.
1. Shape-based systems offer fast playback as well as a high degree of fidelity of expressions. The technique involves modelling portions of the face mesh to approximate expressions and visemes and then blending the different sub-meshes, known as morph targets or shapes. Perhaps the most accomplished character using this technique was Gollum, from The Lord of the Rings. The drawbacks of this technique are that it involves intensive manual labor, the shapes are specific to each character, and the shapes must be animated through tables of slider parameters.
2. Skeletal muscle systems, or physically based head models, form another approach to modeling the head and face. Here the physical and anatomical characteristics of bones, tissues, and skin are simulated to provide a realistic appearance (e.g. spring-like elasticity). Such methods can be very powerful for creating realism, but the complexity of facial structures makes them computationally expensive and difficult to create. Considering the effectiveness of parameterized models for communicative purposes (as explained in the next section), it may be argued that physically based models are not a very efficient choice in many applications. This does not deny the advantages of physically based models, or the fact that they can even be used within the context of parameterized models to provide local details when needed. Waters, Terzopoulos, Kahler, and Seidel (among others) have developed physically based facial animation systems.
3. 'Envelope bones' or 'cages' are commonly used in games. They produce simple and fast models, but are not well suited to portraying subtlety.
4. Motion capture uses cameras placed around a subject. The subject is generally fitted with either reflectors (passive motion capture) or sources (active motion capture) that precisely determine the subject's position in space. The data recorded by the cameras is then digitized and converted into a three-dimensional computer model of the subject. Until recently, the size of the detectors/sources used by motion capture systems made the technology inappropriate for facial capture. However, miniaturization and other advancements have made motion capture a viable tool for computer facial animation. Facial motion capture was used extensively in The Polar Express by Imageworks, where hundreds of motion points were captured. The film was very accomplished, but while it attempted to recreate realism, it was criticised for having fallen into the 'uncanny valley', the realm where animation realism is sufficient for human recognition but fails to convey the emotional message. The main difficulties of motion capture are the quality of the data, which may include vibration, and the retargeting of the geometry of the points. A recent technology developed at the Applied Geometry Group and Computer Vision Laboratory at ETH Zurich achieves real-time performance without the use of any markers, using a high-speed structured light scanner. The system is based on a robust offline face tracking stage, which trains the system with different facial expressions. The matched sequences are used to build a person-specific linear face model that is subsequently used for online face tracking and expression transfer.
5. Deformation solvers, for example Face Robot.
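The shape-based technique in item 1 above reduces to a weighted sum of offsets from a neutral mesh. The following is a minimal sketch of that blending step, assuming every morph target has the same vertex count and ordering as the neutral mesh; the two-vertex mesh, the target names (`smile`, `jaw_open`) and the weights are hypothetical.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


def blend_shapes(neutral: List[Vertex],
                 targets: Dict[str, List[Vertex]],
                 weights: Dict[str, float]) -> List[Vertex]:
    """Return neutral + sum_i w_i * (target_i - neutral), per vertex."""
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        target = targets[name]
        if len(target) != len(neutral):
            raise ValueError(f"target {name!r} has a different vertex count")
        for i, (nv, tv) in enumerate(zip(neutral, target)):
            for k in range(3):
                # Add this target's weighted offset from the neutral pose.
                result[i][k] += w * (tv[k] - nv[k])
    return [(v[0], v[1], v[2]) for v in result]


# Hypothetical two-vertex "face" with two morph targets.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "smile": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    "jaw_open": [(0.0, 0.0, 0.0), (1.0, -2.0, 0.0)],
}
# Slider values as an animator might set them for one frame.
pose = blend_shapes(neutral, targets, {"smile": 0.5, "jaw_open": 0.25})
```

Because each frame is only a weighted sum, playback is fast; the cost lies in sculpting the targets by hand for each character, as noted above.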
Speech Animation
Speech is usually treated differently from the animation of facial expressions, because simple keyframe-based approaches to animation typically provide a poor approximation to real speech dynamics. Often visemes are used to represent the key poses in observed speech (i.e. the position of the lips, jaw and tongue when producing a particular phoneme); however, there is a great deal of variation in the realisation of visemes during the production of natural speech. The source of this variation is termed coarticulation, which is the influence of surrounding visemes upon the current viseme (i.e. the effect of context). To account for coarticulation, current systems either explicitly take context into account when blending viseme keyframes or use longer units such as diphone, triphone, syllable or even word- and sentence-length units.
One of the most common approaches to speech animation is the use of dominance functions, introduced by Cohen and Massaro. Each dominance function represents the influence over time that a viseme has on a speech utterance. Typically the influence is greatest at the center of the viseme and degrades with distance from the viseme center. Dominance functions are blended together to generate a speech trajectory in much the same way that spline basis functions are blended together to generate a curve. The shape of each dominance function differs according to both which viseme it represents and what aspect of the face is being controlled (e.g. lip width, jaw rotation etc.). This approach to computer-generated speech animation can be seen in the Baldi talking head.
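The blending of dominance functions can be sketched for a single control parameter. The code below assumes a negative-exponential dominance shape, D(t) = magnitude * exp(-rate * |t - center|^power), and a trajectory formed as the dominance-weighted average of the viseme targets. This form is commonly associated with the Cohen and Massaro model, but the class name `VisemeSegment` and the parameter values and targets used here are hypothetical, not taken from any published system.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class VisemeSegment:
    center: float            # time of the viseme center, in seconds
    target: float            # target value of one control (e.g. lip width)
    magnitude: float = 1.0   # peak dominance
    rate: float = 10.0       # how quickly dominance falls off with distance
    power: float = 1.0       # exponent applied to the time distance

    def dominance(self, t: float) -> float:
        """Influence of this viseme at time t; greatest at its center."""
        return self.magnitude * math.exp(
            -self.rate * abs(t - self.center) ** self.power)


def blend_trajectory(segments: List[VisemeSegment], t: float) -> float:
    """Dominance-weighted average of the viseme targets at time t."""
    weights = [s.dominance(t) for s in segments]
    total = sum(weights)
    if total == 0.0:
        # All dominances underflowed; fall back to the nearest viseme.
        return min(segments, key=lambda s: abs(t - s.center)).target
    return sum(w * s.target for w, s in zip(weights, segments)) / total


# Hypothetical utterance: a closed-lip viseme followed by an open one.
utterance = [VisemeSegment(center=0.0, target=0.0),
             VisemeSegment(center=0.2, target=1.0)]
curve = [blend_trajectory(utterance, i * 0.05) for i in range(5)]
```

Because neighbouring visemes still carry some dominance at each viseme's center, the curve never quite reaches either target, which is how this scheme models coarticulation.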
Other models of speech use basis units which include context (e.g. diphones, triphones etc.) instead of visemes. As the basis units already incorporate the variation of each viseme according to context and, to some degree, the dynamics of each viseme, no model of coarticulation is required. Speech is simply generated by selecting appropriate units from a database and blending the units together. This is similar to concatenative techniques in audio speech synthesis. The disadvantage of these models is that a large amount of captured data is required to produce natural results, and whilst longer units produce more natural results, the size of the database required expands with the average length of each unit.
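The select-and-blend step described above can be sketched as a lookup followed by a short cross-fade at each join. This is a simplified illustration, not any particular system: the database contents, the diphone naming scheme (`"m-a"`) and the single jaw-opening parameter per frame are assumptions, and real systems also choose among several candidate recordings per unit.

```python
from typing import Dict, List

# Hypothetical database: diphone name -> captured jaw-opening values per frame.
UNIT_DB: Dict[str, List[float]] = {
    "sil-m": [0.0, 0.0],
    "m-a": [0.0, 0.5, 1.0],
    "a-sil": [1.0, 0.5, 0.0],
}


def to_diphones(phones: List[str]) -> List[str]:
    """Turn a phone sequence into the names of its adjacent pairs."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


def concatenate(units: List[List[float]], overlap: int) -> List[float]:
    """Join units, cross-fading `overlap` frames at each boundary."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            t = (i + 1) / (n + 1)
            idx = len(out) - n + i
            out[idx] = (1.0 - t) * out[idx] + t * unit[i]
        out.extend(unit[n:])
    return out


def synthesize(phones: List[str], overlap: int = 1) -> List[float]:
    """Look up each diphone (KeyError if missing) and blend the units."""
    units = [UNIT_DB[name] for name in to_diphones(phones)]
    return concatenate(units, overlap)


track = synthesize(["sil", "m", "a", "sil"])
```

The lookup also shows why the database grows with unit length: for an inventory of P phones there are up to P² possible diphones and P³ possible triphones, so a 40-phone inventory allows at most 1,600 diphones but 64,000 triphones.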
Finally, some models directly generate speech animations from audio. These systems typically use hidden Markov models or neural nets to transform audio parameters into a stream of control parameters for a facial model. The advantage of this method is that it handles voice context, natural rhythm, tempo, emotion and dynamics without complex approximation algorithms. The training database does not need to be labeled, since no phonemes or visemes are needed; the only data needed are the voice and the animation parameters. An example of this approach is the Johnnie Talker system (http://digitus.itk.ppke.hu/~flugi/johnnie/).
Face Animation Languages
Many face animation languages are used to describe the content of facial animation. They can be input to compatible "player" software, which then creates the requested actions. Face animation languages are closely related to other multimedia presentation languages such as SMIL and VRML. Due to the popularity and effectiveness of XML as a data representation mechanism, most face animation languages are XML-based; one example is the Virtual Human Markup Language (VHML).
More advanced languages allow decision-making, event handling, and parallel and sequential actions; one example is the Face Modeling Language (FML).
See also
- Animation
- Caricature
- Computer animation
- Computer graphics
- Facial expression
- Face Modeling Language
- Interactive online characters
- Morphing
- Parametric surface
- Texture mapping
Further reading
- Computer Facial Animation by Frederic I. Parke and Keith Waters, 2008, ISBN 1568814488
- Data-driven 3D facial animation by Zhigang Deng and Ulrich Neumann, 2007, ISBN 1846289068
- Handbook of Virtual Humans by Nadia Magnenat-Thalmann and Daniel Thalmann, 2004, ISBN 0470023163
External links
- Face/Off: Live Facial Puppetry - Realtime markerless facial animation technology developed at ETH Zurich
- The "Artificial Actors" Project - Institute of Animation
- Cubic Motion - Facial Animation Specialist
- iFACE
- Direct voice to animation conversion, Johnnie Talker
- Animated Baldi
- Xface: Open Source 3D Facial Animation Toolkit with MPEG-4
- CU Animate Tools for Enabling Conversations with Animated Characters
- CU Animate Applications
- Rocketbox Libraries - Stock 3D Character Models with Facial Animation Rigs
- Animation with Equations - facial animation blog