Topics to Cover:
• Facial Animation Parameters (FAP)
• Facial Definition Parameters (FDP)
• Face Model
• Coding of FAPs
• Integration of Face Animation and Text-to-Speech (TTS) synthesis
• Binary Format for Scenes (BIFS) for Facial Animation
• What are Facial Animation Parameters (FAPs)? They are based on the minimal perceptible actions of human beings, such as expressions and emotions, and are closely related to muscle actions.
• What are Facial Definition Parameters (FDPs)? They allow the user to configure the 3D facial model to be used at the receiver, either by adapting a model that is already available there or by sending a new model.
Face Model:
• Every MPEG-4 terminal that is able to decode FAP streams has to provide a face model for animation.
• This model is proprietary to the decoder.
• The encoder does not know what the decoder's face model looks like.
• Using an FDP node, MPEG-4 allows the encoder to completely specify the face model to animate.
• The FDP node can also be used to calibrate the proprietary model of the decoder.
• The encoder may choose to specify the locations of all or some of the feature points.
• Once the feature points are specified, the decoder can adapt its own proprietary face model so that the model conforms to the feature point positions.
• Face model adaptation also allows texture maps for the face to be downloaded.
• Each feature point has a different texture map.
• To specify how the texture map is mapped onto the face model, the encoder sends texture coordinates for each feature point.
• Encoder specific.
• The process of adapting the feature point locations of the face model to the encoder's specifications is referred to as Face Model Calibration.
• It is sometimes also called Face Model Adaptation.
Figure: Simplified scene graph for a head model. Nodes shown: Root, Group, Head Transform X / Y / Z, Face, Hair, Tongue, Teeth, and Left Eye and Right Eye, each with its own Transform X and Transform Y.
• A root node is a collection of objects.
• For objects to move together as a group, they need to be in the same transform group.
• When nested transform nodes contain different transforms, their effect is cumulative.
• The Transform node defines geometric 3D transformations such as scaling, rotation, etc.
• An Indexed Face Set is used to define the geometry and the surface attributes (color and texture) of an object.
• The rotations for the left eye and right eye are also embedded in the scene graph.
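The cumulative effect of nested transform nodes can be sketched as follows. This is only an illustration in plain Python, not BIFS or VRML syntax: the `TransformNode` class, the helper functions and the node names `head_y` and `left_eye_y` are all hypothetical, and only rotations are modeled.

```python
import math

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def rot_y(deg):
    """3x3 rotation matrix about the Y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

class TransformNode:
    """Hypothetical stand-in for a transform node with child nodes."""
    def __init__(self, name, matrix, children=()):
        self.name = name
        self.matrix = matrix
        self.children = list(children)

def cumulative_matrices(node, parent=IDENTITY, out=None):
    """Walk the graph; each node's matrix is composed with its ancestors'."""
    out = {} if out is None else out
    world = matmul(parent, node.matrix)
    out[node.name] = world
    for child in node.children:
        cumulative_matrices(child, world, out)
    return out

# The eye transform sits below the head transform, so both rotations apply.
left_eye = TransformNode("left_eye_y", rot_y(20))
head = TransformNode("head_y", rot_y(30), [left_eye])
world = cumulative_matrices(head)
gaze = apply(world["left_eye_y"], [0.0, 0.0, 1.0])
```

Here the eye ends up rotated by the head rotation and its own rotation combined, which is why the eyes follow the head while still being able to move on their own.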
Coding of Facial Animation Parameters (FAPs):
• Tools used for coding:
1) Arithmetic encoder (low delay)
2) DCT coding technique (high delay)
• 1) Using the arithmetic encoder:
- Allows low-delay FAP coding.
- Coding efficiency is lower.
• 2) Using DCT:
- Introduces a larger delay.
- Achieves higher coding efficiency.
• The first set of FAP values, FAP(0), at time instant zero is coded without prediction.
• The value of a FAP at time instant k, FAP(k), is predicted from the previously coded value FAP(k-1).
• The prediction error e is the difference between FAP(k) and this prediction.
• e is quantized using the step size QP multiplied by a quantization parameter FAP_QUANT; the quantized error is denoted e`.
• 0 < FAP_QUANT < 31
• The quantized prediction error e` is arithmetically encoded using a separate adaptive probability model for each FAP.
• FAP_QUANT > 15 is usually not used because the quality of the animation degrades.
• At the decoder, the received data is arithmetically decoded, dequantized and added to the previously decoded value.
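The prediction loop above can be sketched for a single FAP as follows. This is a simplified illustration, not the normative MPEG-4 procedure: the arithmetic coding stage is left out, FAP(0) is modeled as a prediction from zero using the same quantizer, and the function names and sample values are made up for the example.

```python
import math

def quantize(value, step):
    """Round value / step to the nearest integer level."""
    return math.floor(value / step + 0.5)

def encode_fap(values, qp, fap_quant):
    """Return the quantized prediction errors for one FAP track."""
    step = qp * fap_quant
    symbols = []
    previous = 0  # assumption: FAP(0) is treated as predicted from 0
    for value in values:
        level = quantize(value - previous, step)
        symbols.append(level)
        # Track what the decoder will reconstruct, so that quantization
        # errors do not accumulate from frame to frame.
        previous += level * step
    return symbols

def decode_fap(symbols, qp, fap_quant):
    """Dequantize each error and add it to the previously decoded value."""
    step = qp * fap_quant
    decoded = []
    previous = 0
    for level in symbols:
        previous += level * step
        decoded.append(previous)
    return decoded

track = [0, 10, 25, 27, 20]          # made-up FAP values over five frames
symbols = encode_fap(track, qp=1, fap_quant=4)
decoded = decode_fap(symbols, qp=1, fap_quant=4)
```

Because the encoder predicts from the value the decoder will actually reconstruct, each decoded value stays within half a quantization step of the original. A larger FAP_QUANT widens that step, which is consistent with the quality loss noted above.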
DCT:
• Applied to 16 consecutive values of a FAP.
• Hence, it introduces a significant delay in the coding and decoding processes.
• After computing the DCT of 16 consecutive values of one FAP, the DC and AC coefficients are coded separately.
• DC coefficients are coded using the prediction method.
• AC coefficients are coded directly.
• The AC and DC coefficients are quantized separately.
• Each nonzero quantized coefficient is encoded with two VLC words: one giving the number of zero coefficients that precede it, and one giving its amplitude.
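The segment-based scheme can be sketched as follows. This is a rough illustration under several assumptions: an orthonormal DCT-II is used, the DC prediction is taken as the difference from the previous segment's quantized DC value, trailing zeros are simply dropped, and the actual VLC tables are omitted. The step sizes and function names are hypothetical.

```python
import math

SEGMENT = 16  # number of consecutive values of one FAP per DCT

def dct_ii(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                    for i in range(n))
        out.append(scale * total)
    return out

def run_length_pairs(coeffs):
    """(zeros before coefficient, amplitude) for each nonzero coefficient."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # assumption: trailing zeros are not signaled here

def encode_segment(values, previous_dc, dc_step, ac_step):
    """Return (DC residual, AC run/amplitude pairs, quantized DC)."""
    assert len(values) == SEGMENT
    coeffs = dct_ii(values)
    dc = round(coeffs[0] / dc_step)                  # DC quantized on its own
    ac = [round(c / ac_step) for c in coeffs[1:]]    # AC quantized separately
    return dc - previous_dc, run_length_pairs(ac), dc

# A FAP that does not change over the segment has no AC energy.
dc_residual, ac_pairs, dc = encode_segment([5] * SEGMENT, previous_dc=0,
                                           dc_step=1, ac_step=1)
```

Each pair in `ac_pairs` corresponds to the two VLC words described above. The encoder has to collect all 16 values before it can compute the transform, which is where the extra delay comes from.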
Integration of a TTS synthesizer into an MPEG-4 face animation system
Integration of Face Animation and Text-to-Speech (TTS) synthesis
• Synchronization of a FAP stream with a TTS synthesizer using the TTS interface (TTSI) is only possible if the encoder sends the timing information.
• This is because a conventional TTS is an asynchronous source.
• DECODER: Decodes the text and passes it to the proprietary speech synthesizer.
• SYNTHESIZER: Creates speech samples that are handed to the compositor.
• COMPOSITOR: Provides audio or video output to the user.
• The second output interface of the synthesizer sends the phonemes of the synthesized speech, together with the start time and duration of each phoneme, to the FAP converter.
• The converter translates the phonemes and timing information into FAPs that the face renderer can use to animate the face model.
• Bookmarks in the TTS text are used to animate facial expressions.
• When the TTS finds a bookmark in the text, it sends this bookmark to the FAP converter.
• The FAP converter transforms the phonemes into visemes and the timing information into FAPs.
• The bookmark defines the start point and duration of the transition to a FAP amplitude.
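The role of the FAP converter can be sketched as follows. Everything here is hypothetical: the phoneme-to-viseme table and its IDs are placeholders rather than the table defined by the standard, unknown phonemes are mapped to a neutral viseme 0, and the bookmark transition is assumed to be a linear ramp starting from zero amplitude.

```python
from dataclasses import dataclass

# Placeholder mapping for illustration only.
PHONEME_TO_VISEME = {"p": 1, "b": 1, "m": 1, "f": 2, "v": 2, "a": 10}
NEUTRAL_VISEME = 0

@dataclass
class Phoneme:
    symbol: str
    start_ms: int
    duration_ms: int

@dataclass
class VisemeEvent:
    viseme: int
    start_ms: int
    duration_ms: int

@dataclass
class Bookmark:
    start_ms: int
    duration_ms: int
    target_amplitude: float

def phonemes_to_visemes(phonemes):
    """Map each timed phoneme to a viseme event with the same timing."""
    return [VisemeEvent(PHONEME_TO_VISEME.get(p.symbol, NEUTRAL_VISEME),
                        p.start_ms, p.duration_ms)
            for p in phonemes]

def amplitude_at(bookmark, t_ms):
    """Expression amplitude at time t_ms, ramping linearly to the target."""
    if t_ms <= bookmark.start_ms:
        return 0.0
    if t_ms >= bookmark.start_ms + bookmark.duration_ms:
        return bookmark.target_amplitude
    elapsed = t_ms - bookmark.start_ms
    return bookmark.target_amplitude * elapsed / bookmark.duration_ms

events = phonemes_to_visemes([Phoneme("m", 0, 80),
                              Phoneme("a", 80, 120),
                              Phoneme("?", 200, 50)])
halfway = amplitude_at(Bookmark(start_ms=100, duration_ms=200,
                                target_amplitude=40.0), 200)
```

Since the synthesizer supplies the start time and duration of every phoneme, the viseme events inherit exact timing, which is what keeps the mouth movement aligned with the synthesized speech.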
Integration with MPEG-4 Systems (BIFS):
• To use face animation in MPEG-4 Systems, a BIFS scene graph has to be transmitted.
• The minimum scene graph should contain a Face node and a FAP node.
• The FAP node may contain nodes for the high-level FAPs such as visemes and expressions.
• This scene graph enables the encoder to animate the proprietary face model of the decoder.
• Downloading a face model to the decoder requires an FDP node.
• An FDP node is further divided into its children, namely the Face Definition Table (Fdef), Face Definition Mesh (FDM) and Face Definition Transform (FDT).
Nodes of the BIFS scene that are used to describe and animate a face