Homogeneous Motion Discovery Oriented Reference
Frame for High Efficiency Video Coding
Ashek Ahmmed1, David Taubman2, Aous Naman2, Mark Pickering3
1Department of Computer Science and Engineering
Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh.
2School of Electrical Engineering and Telecommunications
The University of New South Wales, Sydney, Australia.
3School of Engineering and Information Technology
The University of New South Wales, Canberra, Australia.
Picture Coding Symposium, 2016
Traditional video coding
Modern video coding standards are pixel and frame centric where
motion modeling plays a central role.
Block-based translational motion models are used.
Neighboring pixels are grouped together into square or rectangular
blocks to form an artificial partitioning of the current frame.
Motion modeling then involves searching the set of already coded
frames for an identically shaped block that closely resembles the
target block.
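The block-matching search described above can be sketched as a minimal full-search example using the sum of absolute differences (SAD) criterion. The function name, block size, and search range are illustrative choices for this sketch, not taken from any standard.

```python
# Sketch of exhaustive translational block matching (assumed names/sizes).
import numpy as np

def best_match(ref, cur, by, bx, bs=8, rng=4):
    """Find the motion vector (dy, dx) minimising SAD for the bs x bs block
    whose top-left corner in the current frame is (by, bx)."""
    target = cur[by:by + bs, bx:bx + bs]
    best, best_sad = (0, 0), np.inf
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y + bs, x:x + bs].astype(int)
                         - target.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

Note that the cost of this search grows with the search range, which is one reason practical encoders use fast (non-exhaustive) search patterns.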
Traditional video coding
Figure 1: Block partitioning used by the HM 16.7 encoder for HEVC to code frame 5 of
the Kimono sequence. Hierarchical GOP structure is used with QP value 22.1
1
Intel Video Pro Analyzer is used to generate the above image.
Traditional video coding
In most video sequences, the motion vector field is a complex
combination of camera motion and object motion.
Block-based translational motion model ties all the pixels inside a
block to a single motion vector.
This assumption of uniform motion within a block does not hold in
the vicinity of object boundaries, i.e. where motion discontinuities exist.
Figure 2: A block in the Foreman sequence
that contains motion discontinuity.
Figure 3: Motion compensated prediction
error for this block.
Traditional video coding
Approach 1: square or rectangular block partitioning
Partitioning motion blocks around object boundaries into smaller
square or rectangular sub-blocks can improve the coding efficiency
due to better matching between blocks and objects in the scene.
“Variable-size motion blocks”1 are found in H.264/AVC, where
several types of block partitions from 4 × 4 up to 16 × 16 pixels
are supported.
HEVC allows a range of symmetric and asymmetric partitions, with
the maximum block size going up as far as 64 × 64 pixels.
1
M. Chan, Y. Yu, and A. Constantinides, “Variable size block matching motion compensation with applications to video
coding,” Communications, Speech and Vision, IEE Proceedings I, vol. 137, no. 4, pp. 205–212, Aug 1990.
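The idea behind variable-size partitioning can be illustrated with a toy quad-tree split rule in the spirit of the HEVC coding-unit tree: a block is split into four sub-blocks whenever its residual energy is high. The threshold, block sizes, and function name below are arbitrary illustrative values, not taken from any standard.

```python
# Toy quad-tree partitioning driven by residual energy (hypothetical rule).
import numpy as np

def partition(residual, y, x, size, min_size=4, thresh=1000.0):
    """Return a list of (y, x, size) leaf blocks covering the region."""
    sse = float((residual[y:y + size, x:x + size].astype(float) ** 2).sum())
    if size <= min_size or sse <= thresh:
        return [(y, x, size)]          # homogeneous enough: keep as one block
    half = size // 2
    leaves = []
    for oy in (0, half):               # recurse into the four quadrants
        for ox in (0, half):
            leaves += partition(residual, y + oy, x + ox, half, min_size, thresh)
    return leaves
```

A frame region with low residual stays a single large block, while a region containing an object boundary (high residual) is recursively subdivided, mirroring the behaviour visible in the partitioning figures.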
Traditional video coding
Figure 4: Block partitioning structure and the motion compensated prediction error for
the block shown in Figure 2. The encoder used is the JM reference software for H.264/AVC.1
1
The bit stream analyzer used is Elecard StreamEye.
Traditional video coding
Approach 2: slanted block partitioning
Another approach1,2,3,4,5 also partitions motion blocks into smaller
sub-blocks.
It takes into account the actual underlying motion discontinuities by
performing slanted and arbitrarily positioned partitioning of blocks.
1
R. De Forni and D. Taubman, “On the benefits of leaf merging in quad-tree motion models,” IEEE International
Conference on Image Processing, pp. 858–861, Sept 2005.
2
M. Tagliasacchi, M. Sarchi, and S. Tubaro, “Motion estimation by quadtree pruning and merging,” IEEE International
Conference on Multimedia and Expo, pp. 1861–1864, July 2006.
3
E. Hung, R. de Queiroz, and D. Mukherjee, “On macroblock partition for motion compensation,” IEEE International
Conference on Image Processing, pp. 1697–1700, Oct 2006.
4
A. Muhit, M. Pickering, and M. Frater, “A fast approach for geometry adaptive block partitioning,” Picture Coding
Symposium, pp. 1–4, May 2009.
5
R. Mathew and D. Taubman, “Quad-tree motion modeling with leaf merging,” IEEE Transactions on Circuits and
Systems for Video Technology, vol. 20, no. 10, pp. 1331–1345, Oct 2010.
Traditional video coding
Figure 5: Fast wedge algorithm.1 Figure 6: Geometry adaptive partitioning.2
1
E. Hung, R. de Queiroz, and D. Mukherjee, “On macroblock partition for motion compensation,” IEEE International
Conference on Image Processing, pp. 1697–1700, Oct 2006.
2
A. Muhit, M. Pickering, and M. Frater, “A fast approach for geometry adaptive block partitioning,” Picture Coding
Symposium, pp. 1–4, May 2009.
Issues with traditional video coding
Distortion in the prediction is reduced at the expense of increasing the
bit budget associated with motion information.
This increase in bit rate arises because the motion model itself is
used to approximate the geometric boundaries of moving objects in
the scene.
The current frame f_p is partitioned and each block is predicted from
f_r1, f_r2, or both, using per-block motion models.
Motion hint based video coding
The inspiration behind motion hints is to avoid using the motion
model to describe object boundaries, since the spatial structure of
previously-decoded frames can be exploited to infer appropriate
boundaries for future ones.
A motion hint1,2 provides a global description of motion over a
specific domain.
It is related to foreground-background segmentation, where the
foreground and background motions are the hints.
1
A. Naman, D. Edwards, and D. Taubman, “Efficient communication of video using metadata,” IEEE International
Conference on Image Processing, pp. 581–584, Sept 2011.
2
A. Naman, R. Xu, and D. Taubman, “Inter-frame prediction using motion hints,” IEEE International Conference on Image
Processing,pp. 1792–1796, Sept 2013.
Motion hint based video coding
The appealing thing about motion hints is that even though the
observed motion field for a frame is discontinuous and non-invertible,
a motion hint is continuous and invertible, at least over a defined
temporal window (domain).
Hence frame data available at one time instant may be mapped to
any other time instant within the domain.
In this example, the domain of the foreground motion hint, D^F_r1, is just a single rectangular
region, and the domain of the background, D^B_r1, is assumed to be the whole frame.
Motion hint based video coding
A bi-directional motion hints based coding paradigm was proposed by
the authors1,2.
While existing approaches3,4 perform segmentation in the current
frame, the proposed approach performs foreground-background
segmentation in the reference frame.
These segmented foreground and background regions were then
mapped (motion compensated) and fused together to generate a
prediction for the current frame.
1
A. Ahmmed, R. Xu, A. T. Naman, M. J. Alam, M. Pickering, and D. Taubman, “Motion segmentation initialization
strategies for bidirectional inter-frame prediction,” IEEE International Workshop on Multimedia Signal Processing, pp. 58–63,
Sept 2013.
2
A. Ahmmed, M. J. Alam, M. Pickering, R. Xu, A. T. Naman, and D. Taubman, “Motion hints based inter-frame
prediction for hybrid video coding,” Picture Coding Symposium, pp. 177–180, Dec 2013.
3
M. Orchard, “Predictive motion-field segmentation for image sequence coding,” IEEE Transactions on Circuits and
Systems for Video Technology, vol. 3, no. 1, pp. 54–70, Feb 1993.
4
J. H. Kim, A. Ortega, P. Yin, P. Pandit, and C. Gomila, “Motion compensation based on implicit block segmentation,”
IEEE International Conference on Image Processing, pp. 2452–2455, Oct 2008.
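The segment-map-fuse pipeline described above can be sketched under strong simplifying assumptions: purely translational hints, integer motion vectors, and wrap-around warping via `np.roll`. The function and its signature are hypothetical, not the authors' implementation.

```python
# Minimal sketch of hint-based prediction: segment the reference frame,
# motion-compensate each region with its own hint, and fuse the results.
import numpy as np

def predict_from_hints(ref, fg_mask, fg_mv, bg_mv):
    """ref: reference frame; fg_mask: boolean foreground segmentation of ref;
    fg_mv / bg_mv: (dy, dx) motion hints mapping ref to the current frame."""
    fg = np.roll(ref, fg_mv, axis=(0, 1))        # foreground, compensated
    bg = np.roll(ref, bg_mv, axis=(0, 1))        # background, compensated
    mask = np.roll(fg_mask, fg_mv, axis=(0, 1))  # mask moves with the foreground
    return np.where(mask, fg, bg)                # fuse into one prediction
```

Because the background layer here is just the reference frame itself, a ghost of the foreground object remains at its old position; handling such exposed regions is part of what makes real hint-based prediction attractive (and non-trivial).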
Motion hint based video coding
Figure 7: Bi-directional motion hints based inter-frame prediction paradigm.
Motion hint based video coding
Advantages of using motion hints
Avoids ascribing motion vectors to regions where content in the
current frame is occluded in a reference frame.
Segmenting the reference frames permits the use of information from
additional frames, or even other media types, e.g. depth maps, as
evidence within the segmentation estimation step.
A bit rate saving of up to 4.14%, for CIF resolution video
sequences with an IBBBP GOP structure, is achieved over the HM
reference encoder for HEVC1.
1
A. Ahmmed and M. Pickering, “Motion hint field with content adaptive motion model for high efficiency video coding
(HEVC),” Data Compression Conference, March 2016.
Issues with motion hint based video coding
Involves the challenging task of foreground-background motion
segmentation (an inverse problem).
The hint segmentation algorithm used in our previous works is not
suitable for HD, full HD and higher resolution sequences.
Super-pixels need to be grouped into homogeneous motion groups, but
for higher resolution frames the number of super-pixels becomes too
large to arrive at representative enough foreground and background
shapes within a tractable number of iterations.
Figure 8: An example frame. Figure 9: Converged region 1. Figure 10: Converged region 2.
Contribution of the present work
Bi-directional affine motion model compensated prediction is used as
a reference frame for predicting the intermediate frame.
The prediction generation process does not require any foreground
and background segmentation; it is therefore computationally simpler
even for high resolution sequences.
The presented design enables re-use of an HEVC encoder without
modifications to its low-level coding tools.
Coding/Decoding architecture
[Block diagram: I-, P-, and Bh-frames pass through HEVC coding/decoding; the decoded
I-, P-, and Bh-frames enter a frame buffer; frame pairs feed the affine model based
prediction process, which outputs Raffine; the I-, P-, Bh-frames and Raffine then serve
as reference frames.]
Block diagram of the coding/decoding framework that uses the bi-directional affine motion
model compensated prediction as a reference frame, along with the usual temporal reference(s),
for the B-frames.
Bi-directional affine motion compensated prediction
The affine motion field between the reference frame Ri and the
current B-frame C is estimated.
The resultant affine motion model M_1^(Ri→C) is approximated by
3-corner motion vectors1 whose fractional parts are quantized.
Ri is warped by this quantized motion model M_1^(Ri→C) to generate
an affine motion compensated prediction of C:
C_1^(Ri→C) = M_1^(Ri→C) Ri (1)
1
H. Lakshman, H. Schwarz, and T. Wiegand, “Adaptive motion model selection using a cubic spline based estimation
framework,” IEEE International Conference on Image Processing (ICIP), pp. 805–808, Sept 2010.
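The 3-corner construction can be sketched as follows: assuming motion vectors are given at three frame corners, the six affine parameters follow from solving a small linear system, and the reference frame is then warped by nearest-neighbour resampling. The corner placement, quantization step, and interpolation method are illustrative assumptions, not the authors' implementation.

```python
# Sketch: affine model from 3-corner motion vectors, then frame warping.
import numpy as np

def affine_from_corners(h, w, mvs, q=0.25):
    """mvs: motion vectors (dy, dx) at corners (0, 0), (0, w-1), (h-1, 0),
    quantized to step q. Returns the 2x3 affine matrix mapping current-frame
    coordinates (y, x, 1) to reference-frame coordinates."""
    mvs = np.round(np.asarray(mvs, float) / q) * q    # quantize fractional parts
    src = np.array([[0, 0], [0, w - 1], [h - 1, 0]], float)  # (y, x) corners
    dst = src + mvs                                   # where each corner maps to
    X = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(X, dst).T                  # exact fit on 3 points

def affine_warp(ref, M):
    """Nearest-neighbour warp: sample ref at M @ (y, x, 1) for every pixel."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel(), np.ones(h * w)])
    yy, xx = np.rint(M @ coords).astype(int)
    return ref[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)].reshape(h, w)
```

With zero corner vectors the model is the identity; with equal corner vectors it reduces to a pure translation, and unequal vectors encode rotation, scaling, and shear.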
Bi-directional affine motion compensated prediction
An error analysis is performed on the prediction error associated
with the prediction C_1^(Ri→C).
Prediction error blocks having a sum-squared error (SSE) greater
than the blockwise mean SSE of this error image are identified.
Blocks with high SSE are then used to estimate another affine motion
model, M_2^(Ri→C), which in turn is employed to generate a second
prediction of C:
C_2^(Ri→C) = M_2^(Ri→C) Ri (2)
Predictions C_1^(Ri→C) and C_2^(Ri→C) are fused into a single
prediction of C from the reference frame Ri:
C^(Ri→C) = (1 − f_2^(Ri→C)) · C_1^(Ri→C) + f_2^(Ri→C) · C_2^(Ri→C) (3)
where f_2^(Ri→C) is a binary image associated with the blocks having high SSE.
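The error analysis and the fusion of equation (3) can be sketched as follows, taking the two predictions as given. The 8 × 8 block size, frame dimensions divisible by the block size, and function name are assumptions for this sketch.

```python
# Sketch of blockwise SSE analysis and binary-mask fusion (equation (3)).
import numpy as np

def fuse_predictions(cur, pred1, pred2, bs=8):
    """Select pred2 in blocks where pred1's SSE exceeds the blockwise mean SSE,
    and pred1 elsewhere. Returns the fused prediction and the pixel-level mask."""
    h, w = cur.shape
    err = (cur.astype(float) - pred1.astype(float)) ** 2
    # blockwise SSE of the first prediction's error image
    sse = err.reshape(h // bs, bs, w // bs, bs).sum(axis=(1, 3))
    f2 = sse > sse.mean()                         # binary selection, per block
    f2_pix = np.kron(f2.astype(int),              # expand to pixel resolution
                     np.ones((bs, bs), int)).astype(bool)
    return np.where(f2_pix, pred2, pred1), f2_pix
```

In the actual scheme the high-SSE blocks also drive the estimation of the second affine model M_2^(Ri→C); here that second prediction is simply passed in.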
Bi-directional affine motion compensated prediction
Figure 11: The affine motion model M_1^(Ri→C) performed poorly in blocks with white boundary
pixels, i.e. these blocks have high prediction error energy. The example scenario is for predicting
frame 5 using coded frames 1 and 9 of the Kimono (1920 × 1080) sequence.
Bi-directional affine motion compensated prediction
Similarly, using the reference frame Rj and C, a prediction of C,
namely C^(Rj→C), is formed.
It is blended with C^(Ri→C) to generate the bi-directional affine
motion models compensated prediction, Raffine, of C:
Raffine = wi · C^(Ri→C) + wj · C^(Rj→C) (4)
where wi = wj = 0.5.
Raffine is used as a reference frame in addition to the normal temporal
references for the B-frames.
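Equation (4) is a simple per-pixel weighted blend of the two per-reference predictions; a one-line sketch:

```python
# Equal-weight blend of the two per-reference predictions (equation (4)).
import numpy as np

def blend(c_i, c_j, w_i=0.5, w_j=0.5):
    """Raffine = w_i * C^(Ri->C) + w_j * C^(Rj->C)."""
    return w_i * np.asarray(c_i, float) + w_j * np.asarray(c_j, float)
```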
Experimental analysis
(a) HEVC (translational model)
(b) Modified HEVC (translational model, Raffine )
Figure 12: The prediction unit (PU) structures used by the HM encoder in predicting frame 5 of the Kimono video
sequence. Only the prediction of the upper-right region is shown. The random access main configuration is used. Visual
comparison reveals that with the proposed hybrid approach it is possible to employ bigger blocks for motion compensated
prediction, and thus achieve a bit rate saving. These results were obtained by setting the quantization parameter (QP) to 22.
Experimental analysis
The first 105 frames of each sequence are coded by the HM encoder,
which is configured using the random access main configuration1.
Figure 13: Rate distortion performance of different
coding strategies on the Kimono (1920 × 1080) sequence.
Sequence Delta rate
Kimono −2.30%
Park Scene −1.76%
Basketball Drive −1.00%
1
F. Bossen, “Common test conditions and software reference configurations,” in document JCTVC-H1100, JCT-VC, San
Jose, CA, Feb 2012.
Conclusions
An approach is presented that attempts to find homogeneous motion
regions and does not involve any super-pixel level segmentation.
These homogeneous regions are then affine motion compensated to
generate a prediction of the current frame.
This may pave the way for piecewise smooth motion model identification,
which is necessary for a volumetric description of motion fields.
Computational complexity is increased due to the additional reference
frame.
The side information could be further optimized.
