Facial motion capture
Facial motion capture is the process of electronically converting the movements of a person's face into a digital database using cameras or laser scanners. This database may then be used to produce computer graphics (CG) animation for movies, games, or real-time avatars. Because the motion of the CG characters is derived from the movements of real people, the resulting animation is more realistic and nuanced than animation created manually.
A facial motion capture database describes the coordinates or relative positions of reference points on the actor's face. The capture may be in two dimensions, in which case the capture process is sometimes called "expression tracking", or in three dimensions. Two-dimensional capture can be achieved using a single camera and low-cost capture software such as Zign Creations' Zign Track. This produces less sophisticated tracking and cannot fully capture three-dimensional motions such as head rotation. Three-dimensional capture is accomplished using multi-camera rigs or laser marker systems. Such systems are typically far more expensive, complicated, and time-consuming to use.
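As a rough illustration of what such a database might hold, the following minimal sketch stores named reference points per frame and measures how far a point has moved from the first frame. The class names, the point name `lip_corner_left`, the millimeter units, and the sample coordinates are all assumptions made for the example; they do not reflect the format of any particular capture product.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, ...]  # (x, y) for 2D capture, (x, y, z) for 3D capture


@dataclass
class CaptureFrame:
    """Positions of named reference points at one instant."""
    time_s: float
    points: Dict[str, Point]


@dataclass
class CaptureDatabase:
    """A sequence of frames that share one set of reference points."""
    dimensions: int  # 2 or 3
    frames: List[CaptureFrame] = field(default_factory=list)

    def add_frame(self, time_s: float, points: Dict[str, Point]) -> None:
        # Reject points whose coordinate count does not match the capture type.
        for name, p in points.items():
            if len(p) != self.dimensions:
                raise ValueError(f"{name}: expected {self.dimensions} coordinates")
        self.frames.append(CaptureFrame(time_s, dict(points)))

    def displacement(self, name: str, frame_index: int) -> float:
        """Distance a point has moved relative to the first frame."""
        neutral = self.frames[0].points[name]
        current = self.frames[frame_index].points[name]
        return dist(neutral, current)


# Hypothetical 3D capture: two frames, coordinates assumed to be in millimeters.
db = CaptureDatabase(dimensions=3)
db.add_frame(0.00, {"lip_corner_left": (-25.0, -30.0, 10.0)})
db.add_frame(0.04, {"lip_corner_left": (-25.3, -29.6, 10.0)})
moved_mm = db.displacement("lip_corner_left", 1)
```

A 2D capture would use the same structure with `dimensions=2`, which makes the limitation concrete: the depth coordinate is simply absent, so motion along the camera axis cannot be represented.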
Facial motion capture is related to body motion capture, but is more challenging because higher resolution is required to detect and track the subtle expressions produced by small movements of the eyes and lips. These movements are often smaller than a few millimeters, requiring greater resolution and fidelity, and different filtering techniques, than are usually used in full-body capture. The additional constraints of the face also allow more opportunities for using models and rules.
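The need for different filtering can be illustrated with a small sketch. It applies a centered moving average, chosen here only because it is simple, to a hypothetical eyelid trajectory containing a single-frame 0.8 mm twitch. The sample values and window sizes are assumptions for illustration and are not taken from any capture system.

```python
from typing import List


def moving_average(samples: List[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks near the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical vertical position of an eyelid point, in millimeters:
# a brief 0.8 mm twitch that lasts a single frame.
eyelid_mm = [0.0] * 7 + [0.8] + [0.0] * 7

light = moving_average(eyelid_mm, window=3)  # gentle smoothing
heavy = moving_average(eyelid_mm, window=7)  # stronger smoothing

peak_light = max(light)  # the twitch is reduced to a third of its size
peak_heavy = max(heavy)  # the twitch is reduced to a seventh of its size
```

The wider window flattens the sub-millimeter twitch far more than the narrow one. Smoothing that is acceptable for large limb movements can therefore erase exactly the detail that facial capture is meant to preserve.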
Two predominant technologies exist: marker-based and markerless tracking systems.
History
One of the first papers discussing performance-driven animation was published by Lance Williams in 1990. There, he describes 'a means of acquiring the expressions of real faces, and applying them to computer-generated faces'.
Marker-based
Traditional marker-based systems apply up to 350 markers to the actor's face and track the marker movement with high-resolution cameras. This has been used on movies such as The Polar Express and Beowulf to allow an actor such as Tom Hanks to drive the facial expressions of several different characters. Unfortunately, this is relatively cumbersome and makes the actor's expressions overly driven once the smoothing and filtering have taken place. Next-generation systems such as CaptiveMotion use offshoots of the traditional marker-based system with higher levels of detail.
Active LED marker technology is currently being used to drive facial animation in real time to provide user feedback.
Markerless
Markerless technologies use features of the face, such as the nostrils, the corners of the lips and eyes, and wrinkles, and then track them. This technology is discussed and demonstrated at CMU, IBM, the University of Manchester (where much of this started with Tim Cootes, Gareth Edwards and Chris Taylor) and other locations, using active appearance models, principal component analysis, eigen tracking, deformable surface models and other techniques to track the desired facial features from frame to frame. This technology is much less cumbersome, and allows greater expression for the actor.
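To make frame-to-frame feature tracking concrete, here is a deliberately simple sketch. It uses plain template matching with a sum of squared differences, not an active appearance model or any of the other techniques named above, which are considerably more involved. The tiny synthetic images, the patch values, and the search radius are all assumptions for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel rows


def ssd(image: Image, template: Image, top: int, left: int) -> int:
    """Sum of squared differences between the template and an image patch."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = image[top + r][left + c] - value
            total += diff * diff
    return total


def track_feature(image: Image, template: Image,
                  previous: Tuple[int, int], radius: int) -> Tuple[int, int]:
    """Find the best match for the template near its previous (top, left)."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = previous, None
    for top in range(previous[0] - radius, previous[0] + radius + 1):
        for left in range(previous[1] - radius, previous[1] + radius + 1):
            # Skip candidate positions that fall partly outside the image.
            if top < 0 or left < 0:
                continue
            if top + th > len(image) or left + tw > len(image[0]):
                continue
            score = ssd(image, template, top, left)
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos


def frame_with_patch(patch: Image, top: int, left: int, size: int = 6) -> Image:
    """Build a hypothetical black test frame containing one textured patch."""
    image = [[0] * size for _ in range(size)]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            image[top + r][left + c] = value
    return image


lip_corner = [[9, 7], [5, 8]]  # template cut from the first frame at (1, 1)
next_frame = frame_with_patch(lip_corner, 2, 3)
new_position = track_feature(next_frame, lip_corner, previous=(1, 1), radius=2)
```

Restricting the search to a small radius around the previous position is what makes this tracking rather than detection: it assumes the feature moves only slightly between consecutive frames.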
These vision-based approaches can also track pupil movement, eyelids, and occlusion of the teeth by the lips and tongue, which are obvious problems in most computer-animated features. Typical limitations of vision-based approaches are resolution and frame rate, both of which are becoming less of an issue as high-speed, high-resolution CMOS cameras become available from multiple sources.
The technology for markerless face tracking is related to that in a facial recognition system, since a facial recognition system can potentially be applied sequentially to each frame of video, resulting in face tracking. For example, the Neven Vision system (formerly Eyematics, since acquired by Google) allowed real-time 2D face tracking with no person-specific training; the system was also among the best-performing facial recognition systems in the U.S. Government's 2002 Face Recognition Vendor Test (FRVT). On the other hand, some recognition systems do not explicitly track expressions, or even fail on non-neutral expressions, and so are not suitable for tracking. Conversely, systems such as deformable surface models pool temporal information to disambiguate and obtain more robust results, and thus cannot be applied to a single photograph.
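The contrast between per-frame detection and temporal pooling can be sketched as follows. The detector here is a stand-in that fails on non-neutral expressions, the frame dictionaries are invented test data, and holding the last known position is only the crudest possible use of temporal information. None of this represents how Neven Vision or any deformable surface model actually works.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Detector = Callable[[dict], Optional[Point]]


def track_by_detection(frames: Sequence[dict],
                       detect: Detector) -> List[Optional[Point]]:
    """Apply a single-image detector independently to every frame."""
    return [detect(frame) for frame in frames]


def fill_gaps(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Use earlier frames to cover failures: reuse the last known position."""
    filled, last = [], None
    for point in track:
        if point is not None:
            last = point
        filled.append(last)
    return filled


def neutral_only_detector(frame: dict) -> Optional[Point]:
    """Hypothetical detector that only succeeds on neutral expressions."""
    if frame["expression"] != "neutral":
        return None
    return frame["mouth"]


# Invented test frames; a real system would receive images, not dictionaries.
frames = [
    {"expression": "neutral", "mouth": (10.0, 20.0)},
    {"expression": "smile", "mouth": (11.0, 20.5)},
    {"expression": "neutral", "mouth": (12.0, 21.0)},
]

raw_track = track_by_detection(frames, neutral_only_detector)
pooled_track = fill_gaps(raw_track)
```

`track_by_detection` works on any single photograph but leaves a gap at the smiling frame, while `fill_gaps` closes the gap only because it can look at earlier frames, so it has no meaning for an isolated image.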
Markerless face tracking has progressed to commercial systems such as Image Metrics and has been applied in movies such as The Matrix sequels and The Curious Case of Benjamin Button. The latter used the Mova Contour system to capture a deformable facial model, which was then animated with a combination of manual and vision tracking. Avatar was another prominent performance capture movie; however, it used painted markers rather than being markerless.
Markerless systems can be classified according to several distinguishing criteria:
- 2D versus 3D tracking
- whether person-specific training or other human assistance is required
- real-time performance (which is only possible if no training or supervision is required)
- whether an additional source of information is needed, such as projected patterns or invisible paint of the kind used in the Mova system.
To date, no system is ideal with respect to all these criteria. For example, the Neven Vision system was fully automatic and required no hidden patterns or per-person training, but was 2D. The Face/Off system is 3D, automatic, and real-time but requires projected patterns.
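The classification above can be written down as a small record type. In this sketch the field names are invented, the values for the two systems are read directly from the descriptions in this article, and treating "automatic" as "no person-specific training or human assistance" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MarkerlessSystem:
    name: str
    tracks_3d: bool
    needs_training_or_assistance: bool
    real_time: bool
    needs_extra_information: bool  # e.g. projected patterns or invisible paint

    def shortcomings(self) -> List[str]:
        """List the criteria on which this system falls short of the ideal."""
        issues = []
        if not self.tracks_3d:
            issues.append("2D only")
        if self.needs_training_or_assistance:
            issues.append("needs training or human assistance")
        if not self.real_time:
            issues.append("not real-time")
        if self.needs_extra_information:
            issues.append("needs extra information source")
        return issues


# Values as described in the text; "automatic" is read as no training needed.
neven_vision = MarkerlessSystem(
    name="Neven Vision",
    tracks_3d=False,
    needs_training_or_assistance=False,
    real_time=True,
    needs_extra_information=False,
)
face_off = MarkerlessSystem(
    name="Face/Off",
    tracks_3d=True,
    needs_training_or_assistance=False,
    real_time=True,
    needs_extra_information=True,
)

summary = {s.name: s.shortcomings() for s in (neven_vision, face_off)}
```

Each system has exactly one shortcoming under this encoding, which matches the observation that neither is ideal on every criterion.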