Homogeneous Motion Discovery Oriented Reference
Frame for High Efficiency Video Coding
Ashek Ahmmed1, David Taubman2, Aous Naman2, Mark Pickering3
1Department of Computer Science and Engineering
Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh.
2School of Electrical Engineering and Telecommunications
The University of New South Wales, Sydney, Australia.
3School of Engineering and Information Technology
The University of New South Wales, Canberra, Australia.
Picture Coding Symposium, 2016
Traditional video coding
Modern video coding standards are pixel and frame centric where
motion modeling plays a central role.
Block-based translational motion models are used.
Neighboring pixels are grouped together into square or rectangular
blocks to form an artificial partitioning of the current frame.
Motion modeling then involves searching the set of already coded
frames for an identically shaped block that closely resembles the
target block.
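The block-matching search described above can be sketched as a minimal full-search example using the sum of absolute differences (SAD) criterion. The function name, block size, and search range are illustrative choices for this sketch, not taken from any standard.

```python
# Sketch of exhaustive translational block matching (assumed names/sizes).
import numpy as np

def best_match(ref, cur, by, bx, bs=8, rng=4):
    """Find the motion vector (dy, dx) minimising SAD for the bs x bs block
    whose top-left corner in the current frame is (by, bx)."""
    target = cur[by:by + bs, bx:bx + bs]
    best, best_sad = (0, 0), np.inf
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y + bs, x:x + bs].astype(int)
                         - target.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

Note that the cost of this search grows with the search range, which is one reason practical encoders use fast (non-exhaustive) search patterns.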
Traditional video coding
Figure 1: Block partitioning used by the HM 16.7 encoder for HEVC to code frame 5 of
the Kimono sequence. Hierarchical GOP structure is used with QP value 22.1
1
Intel Video Pro Analyzer is used to generate the above image.
Traditional video coding
In most video sequences, the motion vector field is a complex
combination of camera motion and object motion.
Block-based translational motion model ties all the pixels inside a
block to a single motion vector.
This assumption of uniform motion within a block does not hold in
the vicinity of object boundaries, i.e. where motion discontinuities exist.
Figure 2: A block in the Foreman sequence
that contains motion discontinuity.
Figure 3: Motion compensated prediction
error for this block.
Traditional video coding
Approach 1: square or rectangular block partitioning
Partitioning motion blocks around object boundaries into smaller
square or rectangular sub-blocks can improve the coding efficiency
due to better matching between blocks and objects in the scene.
“Variable-size motion blocks”1 are found in H.264/AVC, where
several types of block partitions from 4 × 4 up to 16 × 16 pixels
are supported.
HEVC allows a range of symmetric and asymmetric partitions, with
the maximum block size going up as far as 64 × 64 pixels.
1
M. Chan, Y. Yu, and A. Constantinides, “Variable size block matching motion compensation with applications to video
coding,” Communications, Speech and Vision, IEE Proceedings I, vol. 137, no. 4, pp. 205–212, Aug 1990.
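The idea behind variable-size partitioning can be illustrated with a toy quad-tree split rule in the spirit of the HEVC coding-unit tree: a block is split into four sub-blocks whenever its residual energy is high. The threshold, block sizes, and function name below are arbitrary illustrative values, not taken from any standard.

```python
# Toy quad-tree partitioning driven by residual energy (hypothetical rule).
import numpy as np

def partition(residual, y, x, size, min_size=4, thresh=1000.0):
    """Return a list of (y, x, size) leaf blocks covering the region."""
    sse = float((residual[y:y + size, x:x + size].astype(float) ** 2).sum())
    if size <= min_size or sse <= thresh:
        return [(y, x, size)]          # homogeneous enough: keep as one block
    half = size // 2
    leaves = []
    for oy in (0, half):               # recurse into the four quadrants
        for ox in (0, half):
            leaves += partition(residual, y + oy, x + ox, half, min_size, thresh)
    return leaves
```

A frame region with low residual stays a single large block, while a region containing an object boundary (high residual) is recursively subdivided, mirroring the behaviour visible in the partitioning figures.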
Traditional video coding
Figure 4: Block partitioning structure and the motion compensated prediction error for
the block shown in Figure 2. The encoder used is the JM reference software for H.264/AVC.1
1
The bit stream analyzer used is Elecard StreamEye.
Traditional video coding
Approach 2: slanted block partitioning
Another approach1,2,3,4,5 also partitions motion blocks into smaller
sub-blocks.
It takes into account the actual underlying motion discontinuities by
performing slanted and arbitrarily positioned partitioning of blocks.
1
R. De Forni and D. Taubman, “On the benefits of leaf merging in quad-tree motion models,” IEEE International
Conference on Image Processing, pp. 858–861, Sept 2005.
2
M. Tagliasacchi, M. Sarchi, and S. Tubaro, “Motion estimation by quadtree pruning and merging,” IEEE International
Conference on Multimedia and Expo, pp. 1861–1864, July 2006.
3
E. Hung, R. de Queiroz, and D. Mukherjee, “On macroblock partition for motion compensation,” IEEE International
Conference on Image Processing, pp. 1697–1700, Oct 2006.
4
A. Muhit, M. Pickering, and M. Frater, “A fast approach for geometry adaptive block partitioning,” Picture Coding
Symposium, pp. 1–4, May 2009.
5
R. Mathew and D. Taubman, “Quad-tree motion modeling with leaf merging,” IEEE Transactions on Circuits and
Systems for Video Technology, vol. 20, no. 10, pp. 1331–1345, Oct 2010.
Traditional video coding
Figure 5: Fast wedge algorithm.1 Figure 6: Geometry adaptive partitioning.2
1
E. Hung, R. de Queiroz, and D. Mukherjee, “On macroblock partition for motion compensation,” IEEE International
Conference on Image Processing, pp. 1697–1700, Oct 2006.
2
A. Muhit, M. Pickering, and M. Frater, “A fast approach for geometry adaptive block partitioning,” Picture Coding
Symposium, pp. 1–4, May 2009.
Issues with traditional video coding
Distortion in the prediction is reduced at the expense of increasing the
bit budget associated with motion information.
This increase in bit rate arises because the motion model itself is
used to approximate the geometric boundaries of moving objects in
the scene.
The current frame f_p is partitioned and each block is predicted from
f_r1, f_r2, or both, using per-block motion models.
Motion hint based video coding
The inspiration behind motion hints is to avoid using the motion
model to describe object boundaries, since the spatial structure of
previously-decoded frames can be exploited to infer appropriate
boundaries for future ones.
A motion hint1,2 provides a global description of motion over a
specific domain.
It is related to foreground-background segmentation, where the
foreground and background motions are the hints.
1
A. Naman, D. Edwards, and D. Taubman, “Efficient communication of video using metadata,” IEEE International
Conference on Image Processing, pp. 581–584, Sept 2011.
2
A. Naman, R. Xu, and D. Taubman, “Inter-frame prediction using motion hints,” IEEE International Conference on Image
Processing,pp. 1792–1796, Sept 2013.
Motion hint based video coding
The appealing thing about motion hints is that even though the
observed motion field for a frame is discontinuous and non-invertible,
a motion hint is continuous and invertible, at least over a defined
temporal window (domain).
Hence frame data available at one time instant may be mapped to
any other time instant within the domain.
In this example, the domain of the foreground motion hint, D^F_r1, is just a single rectangular
region, and the domain of the background, D^B_r1, is assumed to be the whole frame.
Motion hint based video coding
A bi-directional motion hints based coding paradigm was proposed by
the authors1,2.
While existing approaches3,4 perform segmentation in the current
frame, the proposed approach performs foreground-background
segmentation in the reference frame.
These segmented foreground and background regions were then
mapped (motion compensated) and fused together to generate a
prediction for the current frame.
1
A. Ahmmed, R. Xu, A. T. Naman, M. J. Alam, M. Pickering, and D. Taubman, “Motion segmentation initialization
strategies for bidirectional inter-frame prediction,” IEEE International Workshop on Multimedia Signal Processing, pp. 58–63,
Sept 2013.
2
A. Ahmmed, M. J. Alam, M. Pickering, R. Xu, A. T. Naman, and D. Taubman, “Motion hints based inter-frame
prediction for hybrid video coding,” Picture Coding Symposium, pp. 177–180, Dec 2013.
3
M. Orchard, “Predictive motion-field segmentation for image sequence coding,” IEEE Transactions on Circuits and
Systems for Video Technology, vol. 3, no. 1, pp. 54–70, Feb 1993.
4
J. H. Kim, A. Ortega, P. Yin, P. Pandit, and C. Gomila, “Motion compensation based on implicit block segmentation,”
IEEE International Conference on Image Processing, pp. 2452–2455, Oct 2008.
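The segment-map-fuse pipeline described above can be sketched under strong simplifying assumptions: purely translational hints, integer motion vectors, and wrap-around warping via `np.roll`. The function and its signature are hypothetical, not the authors' implementation.

```python
# Minimal sketch of hint-based prediction: segment the reference frame,
# motion-compensate each region with its own hint, and fuse the results.
import numpy as np

def predict_from_hints(ref, fg_mask, fg_mv, bg_mv):
    """ref: reference frame; fg_mask: boolean foreground segmentation of ref;
    fg_mv / bg_mv: (dy, dx) motion hints mapping ref to the current frame."""
    fg = np.roll(ref, fg_mv, axis=(0, 1))        # foreground, compensated
    bg = np.roll(ref, bg_mv, axis=(0, 1))        # background, compensated
    mask = np.roll(fg_mask, fg_mv, axis=(0, 1))  # mask moves with the foreground
    return np.where(mask, fg, bg)                # fuse into one prediction
```

Because the background layer here is just the reference frame itself, a ghost of the foreground object remains at its old position; handling such exposed regions is part of what makes real hint-based prediction attractive (and non-trivial).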
Motion hint based video coding
Figure 7: Bi-directional motion hints based inter-frame prediction paradigm.
Motion hint based video coding
Advantages of using motion hints
Avoids ascribing motion vectors to regions where content in the
current frame is occluded in a reference frame.
Segmenting the reference frames permits the use of information from
additional frames, or even other media types, e.g. depth maps, as
evidence within the segmentation estimation step.
A bit rate saving of up to 4.14%, for CIF resolution video
sequences with an IBBBP GOP structure, is achieved over the HM
reference encoder for HEVC1.
1
A. Ahmmed and M. Pickering, “Motion hint field with content adaptive motion model for high efficiency video coding
(HEVC),” Data Compression Conference, March 2016.
Issues with motion hint based video coding
Involves the challenging task of foreground-background motion
segmentation (an inverse problem).
The hint segmentation algorithm used in our previous works is not
suitable for HD, full HD and higher resolution sequences.
Super-pixels need to be grouped into homogeneous motion groups, but
for higher resolution frames the number of super-pixels becomes too
large to arrive at representative enough foreground and background
shapes within a tractable number of iterations.
Figure 8: An example frame. Figure 9: Converged region 1. Figure 10: Converged region 2.
Contribution of the present work
Bi-directional affine motion model compensated prediction is used as
a reference frame for predicting the intermediate frame.
The prediction generation process does not require any foreground
and background segmentation; it is therefore computationally simpler
even for high resolution sequences.
The presented design enables re-use of an HEVC encoder without
modifications to its low-level coding tools.
Coding/Decoding architecture
[Block diagram: I-, P-, and Bh-frames pass through HEVC coding/decoding; the decoded
I-, P-, and Bh-frames enter a frame buffer; frame pairs feed the affine model based
prediction process, which outputs Raffine; the I-, P-, Bh-frames and Raffine then serve
as reference frames.]
Block diagram of the coding/decoding framework that uses the bi-directional affine motion
model compensated prediction as a reference frame, along with the usual temporal reference(s),
for the B-frames.
Bi-directional affine motion compensated prediction
The affine motion field between the reference frame Ri and the
current B-frame C is estimated.
The resultant affine motion model M_1^(Ri→C) is approximated by
3-corner motion vectors1 whose fractional parts are quantized.
Ri is warped by this quantized motion model M_1^(Ri→C) to generate
an affine motion compensated prediction of C:
C_1^(Ri→C) = M_1^(Ri→C) Ri (1)
1
H. Lakshman, H. Schwarz, and T. Wiegand, “Adaptive motion model selection using a cubic spline based estimation
framework,” IEEE International Conference on Image Processing (ICIP), pp. 805–808, Sept 2010.
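The 3-corner construction can be sketched as follows: assuming motion vectors are given at three frame corners, the six affine parameters follow from solving a small linear system, and the reference frame is then warped by nearest-neighbour resampling. The corner placement, quantization step, and interpolation method are illustrative assumptions, not the authors' implementation.

```python
# Sketch: affine model from 3-corner motion vectors, then frame warping.
import numpy as np

def affine_from_corners(h, w, mvs, q=0.25):
    """mvs: motion vectors (dy, dx) at corners (0, 0), (0, w-1), (h-1, 0),
    quantized to step q. Returns the 2x3 affine matrix mapping current-frame
    coordinates (y, x, 1) to reference-frame coordinates."""
    mvs = np.round(np.asarray(mvs, float) / q) * q    # quantize fractional parts
    src = np.array([[0, 0], [0, w - 1], [h - 1, 0]], float)  # (y, x) corners
    dst = src + mvs                                   # where each corner maps to
    X = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(X, dst).T                  # exact fit on 3 points

def affine_warp(ref, M):
    """Nearest-neighbour warp: sample ref at M @ (y, x, 1) for every pixel."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel(), np.ones(h * w)])
    yy, xx = np.rint(M @ coords).astype(int)
    return ref[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)].reshape(h, w)
```

With zero corner vectors the model is the identity; with equal corner vectors it reduces to a pure translation, and unequal vectors encode rotation, scaling, and shear.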
Bi-directional affine motion compensated prediction
An error analysis is performed on the prediction error associated
with the prediction C_1^(Ri→C).
Prediction error blocks having a sum-squared error (SSE) greater
than the blockwise mean SSE of this error image are identified.
Blocks with high SSE are then used to estimate another affine motion
model, M_2^(Ri→C), which in turn is employed to generate a second
prediction of C:
C_2^(Ri→C) = M_2^(Ri→C) Ri (2)
Predictions C_1^(Ri→C) and C_2^(Ri→C) are fused into a single
prediction of C from the reference frame Ri:
C^(Ri→C) = (1 − f_2^(Ri→C)) · C_1^(Ri→C) + f_2^(Ri→C) · C_2^(Ri→C) (3)
where f_2^(Ri→C) is a binary image associated with the blocks having high SSE.
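The error analysis and the fusion of equation (3) can be sketched as follows, taking the two predictions as given. The 8 × 8 block size, frame dimensions divisible by the block size, and function name are assumptions for this sketch.

```python
# Sketch of blockwise SSE analysis and binary-mask fusion (equation (3)).
import numpy as np

def fuse_predictions(cur, pred1, pred2, bs=8):
    """Select pred2 in blocks where pred1's SSE exceeds the blockwise mean SSE,
    and pred1 elsewhere. Returns the fused prediction and the pixel-level mask."""
    h, w = cur.shape
    err = (cur.astype(float) - pred1.astype(float)) ** 2
    # blockwise SSE of the first prediction's error image
    sse = err.reshape(h // bs, bs, w // bs, bs).sum(axis=(1, 3))
    f2 = sse > sse.mean()                         # binary selection, per block
    f2_pix = np.kron(f2.astype(int),              # expand to pixel resolution
                     np.ones((bs, bs), int)).astype(bool)
    return np.where(f2_pix, pred2, pred1), f2_pix
```

In the actual scheme the high-SSE blocks also drive the estimation of the second affine model M_2^(Ri→C); here that second prediction is simply passed in.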
Bi-directional affine motion compensated prediction
Figure 11: The affine motion model M_1^(Ri→C) performed poorly in blocks with white boundary
pixels, i.e. these blocks have high prediction error energy. The example scenario is for predicting
frame 5 using coded frames 1 and 9 of the Kimono (1920 × 1080) sequence.
Bi-directional affine motion compensated prediction
Similarly, using the reference frame Rj and C, a prediction of C,
namely C^(Rj→C), is formed.
It is blended with C^(Ri→C) to generate the bi-directional affine
motion models compensated prediction, Raffine, of C:
Raffine = wi · C^(Ri→C) + wj · C^(Rj→C) (4)
where wi = wj = 0.5.
Raffine is used as a reference frame in addition to the normal temporal
references for the B-frames.
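Equation (4) is a simple per-pixel weighted blend of the two per-reference predictions; a one-line sketch:

```python
# Equal-weight blend of the two per-reference predictions (equation (4)).
import numpy as np

def blend(c_i, c_j, w_i=0.5, w_j=0.5):
    """Raffine = w_i * C^(Ri->C) + w_j * C^(Rj->C)."""
    return w_i * np.asarray(c_i, float) + w_j * np.asarray(c_j, float)
```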
Experimental analysis
(a) HEVC (translational model)
(b) Modified HEVC (translational model, Raffine )
Figure 12: The prediction unit (PU) structures used by the HM encoder in predicting frame 5 of the Kimono video
sequence. Only the prediction of the upper-right region is shown. The random access main configuration is used. Visual
comparison reveals that with the proposed hybrid approach it is possible to employ bigger blocks for motion compensated
prediction, and thus achieve a bit rate saving. These results were obtained by setting the quantization parameter (QP) to 22.
Experimental analysis
The first 105 frames of each sequence are coded by the HM encoder,
which is configured using the random access main configuration1.
Figure 13: Rate distortion performance of different
coding strategies on the Kimono (1920 × 1080) sequence.
Sequence Delta rate
Kimono −2.30%
Park Scene −1.76%
Basketball Drive −1.00%
1
F. Bossen, “Common test conditions and software reference configurations,” in document JCTVC-H1100, JCT-VC, San
Jose, CA, Feb 2012.
Conclusions
An approach is presented that attempts to find homogeneous motion
regions and does not involve any super-pixel level segmentation.
These homogeneous regions are then affine motion compensated to
generate a prediction of the current frame.
This may pave the way for piecewise smooth motion model identification,
which is necessary for a volumetric description of motion fields.
Computational complexity is increased due to the additional reference
frame.
The side information could be further optimized.
