Video Transcoding with Intel IPP Eric Shufro April 27, 2004 COT6930
Introduction to Transcoding
  • What is transcoding?
  • Why transcode?
  • What is involved?
  • Performance
  • Quality
  • Intel IPP
  • Applications
Transcoding Overview
  • Goals: reduced bitrate, reasonable quality, performance gain.
  • Approach 1: MPEG-2 bit stream → Decode → Encode → MPEG-4 bit stream.
  • Approach 2: MPEG-2 bit stream → Partial Decode and Encode → MPEG-4 bit stream.
Applications
  • Streaming video for both broadband and narrowband networks.
  • Decreased video bitrate for playback on mobile or other small embedded devices.
  • Conversion and modification of pre-encoded bit streams.
  • Perhaps steganography? (jpg, mp3…)
Integrated Performance Primitives
  • Provides source code and libraries for media types such as MP3, MPEG-2, MPEG-4, H.263, JPEG, JPEG2000, GSM-AMR, G.723, and computer vision. [Intel]
  • Well documented.
  • Easy to use.
MPEG-2 Decoder
  • Runs in two separate threads.
  • Splits the input stream into its audio and video parts, then decodes the video stream into YUV components.
  • The YUV buffers are exposed through the transcoder class as the input of the encoder.
MPEG-4 Encoder
  • Uses input data from the decoder YUV frame buffers on a frame-by-frame basis.
  • Creates an MPEG-4 bit stream file, out.cms, between 800 KB and 1 MB in size, video only.
  • Motion Estimation can be disabled for testing purposes.
  • Requires input parameters via a parameter file, though some elements can be ignored.
Transcoder Architecture
  • MPEG-2 decoder and MPEG-4 encoder based on the IPP.
  • Transcoder class encapsulates both the encoder and the decoder.
  • Memory is accessible between the encoder and the decoder.
  • Transcoder runs in three separate threads.
Transcoder Initialization
  • Read application parameters: source file, encoder parameter file, and output file name.
  • Read the encoder parameters into memory.
  • Create decoder and splitter thread.
  • DecoderInit()
  • EncoderInit(), overwrite parameters.
  • Splitter: open bit stream.
  • Encode MPEG-4 header.
  • Begin transcoding.
Input Stream
  • MPEG-2.
  • Contains both audio and video.
  • 147 frames @ 720x480.
  • Source file is 3.57 MB.
  • Intra and Inter coded frames (I, P, B).
Encoder Modifications
  • ExpandFrame() bypassed.
  • StepLuma and StepChroma artificially set after decoder init.
  • Parameters overwritten after encoder init.
  • mp4_MacroBlock changed to a public member of ippVideoEncoderMPEG4.
  • Motion Estimation can be disabled, which avoids the Sum of Absolute Differences (SAD) computation.
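
To make concrete what disabling Motion Estimation avoids, here is a minimal sketch of a Sum of Absolute Differences over one 16x16 luma block. The function name sad16x16 and its signature are hypothetical and are not part of the IPP or of the transcoder source; a real encoder would use an optimized primitive and would evaluate this cost for many candidate positions per macroblock.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not an IPP call): SAD between a 16x16 block of the
// current frame and a candidate 16x16 block of the reference frame.
// 'stride' is the distance in bytes between the starts of consecutive rows,
// assumed to be the same for both buffers.
int sad16x16(const std::uint8_t* cur, const std::uint8_t* ref, int stride) {
    assert(cur != nullptr && ref != nullptr && stride >= 16);
    int sum = 0;
    for (int row = 0; row < 16; ++row) {
        for (int col = 0; col < 16; ++col) {
            const int a = cur[row * stride + col];
            const int b = ref[row * stride + col];
            sum += std::abs(a - b);
        }
    }
    return sum;
}
```

A motion search repeats this 256-pixel comparison for every candidate displacement, which is why reusing the decoder's vectors can remove a large share of the encoder's work.
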
Modified Parameters
  • Input filename (ignored).
  • Resolution (ignored).
  • Frame count (ignored).
  • Frame rate = 30.
  • ME algorithm and accuracy.
  • Number of motion vectors, 1 or 4.
Decoder / Encoder Interfacing
    memcpy(mp4enc.mCurrPtrY, frame->Y_comp_data, mp4par.Width * mp4par.Height);
    memcpy(mp4enc.mCurrPtrU, frame->U_comp_data, mp4par.Width * mp4par.Height / 4);
    memcpy(mp4enc.mCurrPtrV, frame->V_comp_data, mp4par.Width * mp4par.Height / 4);
  • Transcoder is hard coded to work with 4:2:0 only.
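
The plane sizes in the three memcpy calls follow from 4:2:0 sampling: each chroma plane has half the width and half the height of the luma plane, hence Width * Height / 4 bytes. The sketch below restates that copy with a hypothetical Yuv420Frame container (not an IPP or transcoder type) so the arithmetic can be checked in isolation. Like the slide's code, it assumes each plane is stored contiguously with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes in the luma (Y) plane of an 8-bit frame.
std::size_t lumaBytes(int width, int height) {
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
}

// Bytes in one chroma (U or V) plane for 4:2:0 sampling.
std::size_t chromaBytes(int width, int height) {
    return lumaBytes(width, height) / 4;
}

// Hypothetical container for one planar 4:2:0 frame (illustration only).
struct Yuv420Frame {
    int width;
    int height;
    std::vector<std::uint8_t> y, u, v;

    Yuv420Frame(int w, int h)
        : width(w), height(h),
          y(lumaBytes(w, h)), u(chromaBytes(w, h)), v(chromaBytes(w, h)) {}
};

// Copies all three planes from the decoder-side frame to the encoder-side
// frame. Both frames must share the same even dimensions.
void copyFrame(const Yuv420Frame& src, Yuv420Frame& dst) {
    assert(src.width == dst.width && src.height == dst.height);
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    std::memcpy(dst.y.data(), src.y.data(), src.y.size());
    std::memcpy(dst.u.data(), src.u.data(), src.u.size());
    std::memcpy(dst.v.data(), src.v.data(), src.v.size());
}
```

For the 720x480 clip used here, the luma plane is 345,600 bytes and each chroma plane is 86,400 bytes.
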
Motion Vectors
  • No ME for I-VOPs.
  • Better quality with 4MV.
  • With ME disabled, all MVs are copied, but only the correct number is coded.
    mp4enc.MBinfo->mv[0].dx = this->context->macroblock.vector_luma[0]; // x
    mp4enc.MBinfo->mv[0].dy = this->context->macroblock.vector_luma[1]; // y
    mp4enc.MBinfo->mv[1].dx = this->context->macroblock.vector_luma[2]; // x
    mp4enc.MBinfo->mv[1].dy = this->context->macroblock.vector_luma[3]; // y
    mp4enc.MBinfo->mv[2].dx = this->context->macroblock.vector_luma[4]; // x
    mp4enc.MBinfo->mv[2].dy = this->context->macroblock.vector_luma[5]; // y
    mp4enc.MBinfo->mv[3].dx = this->context->macroblock.vector_luma[6]; // x
    mp4enc.MBinfo->mv[3].dy = this->context->macroblock.vector_luma[7]; // y
  • Block sizes: 16x16 with 1MV, 8x8 with 4MV.
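
The eight assignments above follow one pattern: the decoder stores vectors as a flat array of interleaved x and y components, and entry i of the encoder's array takes elements 2i and 2i+1. A minimal sketch of that mapping as a loop, using hypothetical stand-in types (MotionVector and unpackVectors are not the IPP structures):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one encoder-side motion vector.
struct MotionVector {
    int dx;
    int dy;
};

// Converts a flat {x0, y0, x1, y1, x2, y2, x3, y3} array, as in the
// decoder's vector_luma, into four (dx, dy) pairs.
std::array<MotionVector, 4> unpackVectors(const std::array<int, 8>& flat) {
    std::array<MotionVector, 4> mv{};
    for (std::size_t i = 0; i < mv.size(); ++i) {
        mv[i].dx = flat[2 * i];      // even index: x component
        mv[i].dy = flat[2 * i + 1];  // odd index: y component
    }
    return mv;
}
```

All four entries are always filled; whether the encoder then codes one vector (16x16) or four (8x8) is decided separately by the motion vector count parameter.
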
Output Stream
  • MPEG-4.
  • Contains only video.
  • 147 frames @ 720x480.
  • Output file is 824 KB.
  • Intra and Inter coded VOPs (I, P).
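
The file sizes can be turned into rough average bitrates. The sketch below assumes 30 frames per second for both streams (the encoder frame rate given under Modified Parameters; the input rate is an assumption), 1 KB = 1024 bytes, and 1 MB = 1024 KB. The helper name averageBitrate is hypothetical. Note that the input figure includes the audio track, so the two numbers are not a like-for-like video comparison.

```cpp
#include <cassert>
#include <cmath>

// Average bitrate in bits per second for a clip of 'frameCount' frames
// played at 'framesPerSecond', stored in 'fileBytes' bytes.
double averageBitrate(double fileBytes, int frameCount, double framesPerSecond) {
    assert(frameCount > 0 && framesPerSecond > 0.0);
    const double seconds = frameCount / framesPerSecond;
    return fileBytes * 8.0 / seconds;
}
```

With 147 frames at 30 frames per second the clip lasts 4.9 seconds, so averageBitrate(824.0 * 1024.0, 147, 30.0) gives about 1.4 Mbit/s for the output, while averageBitrate(3.57 * 1024.0 * 1024.0, 147, 30.0) gives about 6.1 Mbit/s for the input including audio.
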
Image Comparison: MPEG-2 to MPEG-4 with 1 MV and ME Enabled
  • [Figure: MPEG-2 frame and MPEG-4 frame]
Peak Signal to Noise Ratio
  • F(i,j): the average decoded luminance pixel shade (0-255).
  • N^2: the number of pixels present.
  • [Figure: error image]
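
The slide's formula did not survive in this transcript. The commonly used definition for 8-bit luma is PSNR = 20 * log10(255 / RMSE), where RMSE is the square root of the mean of [f(i,j) - F(i,j)]^2 over all N^2 pixels; f(i,j) is assumed here to denote the reference pixel, a symbol the slide text does not define. A minimal sketch under that assumption (psnrLuma is a hypothetical helper, not part of CalcPSNR or the IPP):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two 8-bit luma planes of equal size.
// Returns +infinity when the planes are identical (zero error).
double psnrLuma(const std::vector<std::uint8_t>& reference,
                const std::vector<std::uint8_t>& decoded) {
    assert(reference.size() == decoded.size() && !reference.empty());
    double sumSquaredError = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double diff = static_cast<double>(reference[i]) -
                            static_cast<double>(decoded[i]);
        sumSquaredError += diff * diff;
    }
    const double mse = sumSquaredError / static_cast<double>(reference.size());
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    // 10 * log10(255^2 / MSE) is the same value as 20 * log10(255 / RMSE).
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

As a sanity check, two planes that differ by exactly one gray level at every pixel have an MSE of 1 and a PSNR of about 48.13 dB.
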
PSNR of Sample Clip: Motion Estimation Enabled
  • [Figure: PSNR graphs for 1 MV and 4 MV]
  • Graphs created by CalcPSNR, a freely distributed product of VideoSoft Inc.
PSNR of Sample Clip: Motion Estimation Disabled
  • [Figure: PSNR graphs for 1 MV and 4 MV]
  • Graphs created by CalcPSNR, a freely distributed product of VideoSoft Inc.
Conclusion
  • Transcoding is practical for many applications, and quality can be maintained.
  • PSNR is a reasonable measure of quality, but does not reveal everything.
  • Partial decoding and encoding along with motion vector reuse can save execution time (168%)!
  • Dramatic difference in execution time between AMD and Intel processors of nearly equivalent speed, due to the use of the Intel IPP.
Limitations
  • Resolution (Input = Output)
  • Format (4:2:0)
  • Audio (None)
References
  • Intel: http://www.intel.com/software/products/ipp/overview.htm
  • VideoSoft: http://www.videosoftinc.com (for PSNR)
  • HK: H. Kalva, A. Vetro, and H. Sun, "Performance Optimization of the MPEG-2 to MPEG-4 Video Transcoder", May 2003.
  • GIT: Seong Hwan Jang, Nikil Jayant (Georgia Institute of Technology)
