VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
NGUYEN MINH HOA
MOTION ANALYSIS FROM ENCODED VIDEO
BITSTREAM
MASTER’S THESIS
HA NOI – 2018
Major: Computer Science
Supervisor: Dr. Do Van Nguyen
Co-Supervisor: Dr. Tran Quoc Long
AUTHORSHIP
“I hereby declare that the work contained in this thesis is my own and that I
have not submitted this thesis to any other institution in order to obtain a
degree. To the best of my knowledge and belief, the thesis contains no material
previously published or written by another person other than that listed in the
bibliography and identified as references.”
Signature: ………………………………………………
SUPERVISOR’S APPROVAL
“I hereby approve that the thesis in its current form is ready for committee
examination as a requirement for the Master of Computer Science degree at the
University of Engineering and Technology.”
Signature: ………………………………………………
Signature: ………………………………………………
ACKNOWLEDGMENTS
First of all, I would like to express my special gratitude to my supervisors,
Dr. Do Van Nguyen and Dr. Tran Quoc Long, for their enthusiastic guidance,
technical explanations, and advice throughout this project.
I also sincerely thank Assoc. Prof. Dr. Ha Le Thanh and Assoc. Prof. Dr. Nguyen
Thi Thuy for their guidance and for the background knowledge behind this thesis.
I would also like to thank my teachers and my friends in the Human Machine
Interaction Lab for their support.
I thank my friends and colleagues in the project "Nghiên Cứu Công Nghệ Tóm Tắt
Video" (Research on Video Summarization Technology) and the project “Multimedia
application tools for intangible cultural heritage conservation and promotion”,
project number ĐTDL.CN-34/16, for their work and support.
Last but not least, I want to thank my family and all of my friends for their
motivation and support. They stand by me and inspire me whenever I face tough
times.
TABLE OF CONTENTS
AUTHORSHIP
SUPERVISOR’S APPROVAL
ACKNOWLEDGMENTS
TABLE OF CONTENTS
ABBREVIATIONS
List of Figures
List of Tables
INTRODUCTION
CHAPTER 1. LITERATURE REVIEW
1.1. Moving object detection in the pixel domain
1.2. Moving object detection in the compressed domain
1.2.1. Motion vector approaches
1.2.2. Size of Macroblock approaches
1.3. Chapter Summarization
CHAPTER 2. METHODOLOGY
2.1. Video compression standard H264
2.1.1. H264 file structure
2.1.2. Macroblock
2.1.3. Motion vector
2.2. Proposed method
2.2.1. Process video bitstream
2.2.2. Macroblock-based Segmentation
2.2.3. Object-based Segmentation
2.2.4. Object Refinement
2.3. Chapter Summarization
CHAPTER 3. RESULTS
3.1. The moving object detection application
3.1.1. The process of application
3.1.2. The motion information
3.1.3. Synthesizing movement information
3.1.4. Storing Movement Information
3.2. Experiments
3.2.1. Dataset
3.2.2. Evaluation methods
3.2.3. Implementations
3.2.4. Experimental results
3.3. Chapter Summarization
CONCLUSIONS
List of author’s publications related to the thesis
REFERENCES
ABBREVIATIONS
MB Macroblock
MV Motion vector
NALU Network Abstraction Layer Unit
RBSP Raw Byte Sequence Payload
SODB String Of Data Bits
List of Figures
Figure 1.1. The process of moving object detection with data in the pixel domain
Figure 1.2. The process of moving object detection with data in the compressed
domain
Figure 2.1. The structure of an H264 file
Figure 2.2. RBSP structure
Figure 2.3. Slice structure
Figure 2.4. Macroblock structure
Figure 2.5. The motion vector of a macroblock
Figure 2.6. The process of the moving object detection method
Figure 2.7. Skipped macroblock
Figure 2.8. (a) Outdoor and indoor frames, (b) the "size-map" of the frames,
(c) the "motion-map" of the frames
Figure 2.9. Example of the “consistency” of motion vectors
Figure 3.1. The implementation process of the approach
Figure 3.2. Data structure for storing motion information
Figure 3.3. Example frames of test videos
Figure 3.4. Example frames and their ground truth
Figure 3.5. An example frame of Pedestrians (a) and its ground truth image (b)
List of Tables
Table 2.1. NALU types
Table 2.2. Slice types
Table 3.1. The information of test videos
Table 3.2. The information of test sequences in group 1
Table 3.3. The performance of the two approaches with Pedestrians, PETS2006,
Highway, and Office
Table 3.4. The experimental results of Poppe’s approach on the 2nd group
Table 3.5. The experimental results of the proposed method on the 2nd group
INTRODUCTION
Today, video content is used extensively in many areas of life, such as indoor
monitoring and traffic monitoring. The number of videos shared over the
Internet at any given time is also extremely large. According to statistics,
hundreds of hours of video are uploaded to YouTube every minute [1]. In
addition, there is a general trend toward installing surveillance cameras in
homes for monitoring and security purposes. These cameras normally operate and
store surveillance videos automatically. Only when a special situation or event
occurs do people go back and review the video data. The problem is how such a
large volume of video can be evaluated in a short amount of time. For example,
when a burglary or an intrusion occurs, we cannot spend hours checking each
previously stored video. A tool that determines the moments at which an object
is moving in a long video is therefore essential for reducing the time and
effort of searching.
Normally, in order to reduce the size of videos for transmission or storage,
video compression is performed at the surveillance camera. The compressed
information, in the form of a bitstream, is then stored or transmitted to a
server for analysis. The video analysis process needs many features that
describe different aspects of the visual content. Typically, these features are
extracted from the pixel values of each video frame by fully decompressing the
bitstream. The decompression procedure requires a device with high
computational capacity. However, with the trend of the "Internet of Things",
there are many devices with low processing capacity that cannot perform full
video decompression at high speed. It is therefore difficult to run an approach
that requires a lot of computing power in real time.
Another way to extract features from a video is to use the data in the
compressed video. These data can be transform coefficients, motion vectors,
quantization steps, quantization parameters, etc. By processing and analyzing
these data, we can handle some important computer vision tasks, including
moving object detection, human action detection, face recognition, and moving
object tracking.
This thesis proposes a new method to detect moving objects by exploring and
applying motion estimation techniques in the video compressed domain. The
method is then used to build an application that supports searching for
movement in home surveillance videos. The compression format of the videos in
the thesis is the H264 compression standard (MPEG-4 Part 10), a popular video
compression standard today.
Aims
The goal of the thesis is to propose a method for detecting moving objects in
the compressed domain of a video. I then build an application that uses the
method to support searching for the moments in a video that contain moving
objects.
Object and Scope of the study
Within the framework of the thesis, I study algorithms for detecting moving
objects in video, especially algorithms that detect moving objects in the
compressed domain. The video compression standard used in the thesis is
H264/AVC.
The theory of video compression and computer vision is taken from scientific
articles on video analysis in the compressed domain and on detecting motion in
the compressed domain of a video.
The videos for testing and experiments are obtained from both indoor and
outdoor surveillance cameras.
Method and procedures
- Research on existing systems for motion analysis and evaluation on compressed
video, and on scientific articles related to the analysis and evaluation of
motion on compressed video.
- Experimental research: Set up experiments for each theoretical part, such as
extracting video data, compiling data, and evaluating motion based on the
obtained data.
- Experimental evaluation: Conduct each experiment independently on each module,
then integrate and deploy the modules.
Contributions
The thesis proposes a new method for detecting moving objects in surveillance
video encoded with the H264 compression standard, using motion vectors and
macroblock sizes.
Thesis structure
Apart from the introduction, the conclusion, and the references, this thesis is
organized into three chapters with the following main contents:
Chapter 1 is the literature review. It presents the work related to the thesis,
including moving object detection methods in the pixel domain and moving object
detection methods in the compressed domain.
Chapter 2 presents basic knowledge about the H264 video compression standard,
such as the H264 file structure, macroblocks, and motion vectors, and describes
the proposed moving object detection method in detail, including the processing
of video bitstreams, the macroblock-based segmentation phase, the object-based
segmentation phase, and the object refinement phase.
Chapter 3 presents the results of the method, including an application that
uses the proposed method and the experimental results.
CHAPTER 1.
LITERATURE REVIEW
Today, surveillance cameras are used extensively around the world, and the
volume of surveillance video has grown tremendously. Problems often encountered
with surveillance video include event searching, motion tracking, and abnormal
behavior detection. To handle these tasks, a method is needed that can
determine the moments in each video at which movement occurs.
Video is usually compressed for storage and transmission. Previous moving
object detection methods usually use data from the pixel images, such as color
values and edges. To obtain images that can be displayed or processed, the
system must fully decode the video, which consumes a large amount of the
device's computing resources, time, and memory. I propose a method that can
quickly detect moving objects in high-resolution videos. The data used in the
method are taken from the compressed video domain and include the motion
vectors and the size of each macroblock (in bits) after encoding. This
considerably reduces processing time compared to methods that work with data in
the pixel domain.
The problem of motion detection in video has long been studied. It is the first
step in a series of computer vision problems such as object tracking, object
detection, and abnormal movement detection. There are usually two approaches to
this problem: using fully decoded video data (pixel domain data) or using data
taken directly from the undecoded video (compressed domain data). The following
sections outline the studies based on these two approaches.
1.1. Moving object detection in the pixel domain
Typically, to reduce the size of the video for transmission, video encoding is
performed inside the surveillance camera, and the compressed information is
transmitted as a bitstream to a server for video analysis. Common video
compression standards in use today include MPEG-4, H264, and H265. To be
viewable, compressed videos need to be decoded into image frames. We call these
image frames the pixel domain, and the data obtained from them the data in the
pixel domain. Fig. 1.1 describes the process of moving object detection methods
in the pixel domain. The data in the pixel domain include the color values of
the pixels, the number of color channels of each pixel, the edges, etc.
Figure 1.1. The process of moving object detection with data in the pixel domain
To detect moving objects in the pixel domain, background subtraction algorithms
are commonly used, and many results in this area were introduced long ago.
These methods usually exploit the relationship between frames in a time series.
Background subtraction is defined in [2] as follows: “Background subtraction is
a widely used approach for detecting moving objects in videos from static
cameras. The rationale in the approach is that of detecting the moving objects
from the difference between the current frame and a reference frame, often
called the “background image”, or “background model”. As a basic, the
background image must be a representation of the scene with no moving objects
and must be kept regularly updated so as to adapt to the varying luminance
conditions and geometry settings.”
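The idea in this definition can be illustrated with a minimal sketch. It is not
the method of [2] or of any cited work: it assumes grayscale frames stored as
nested lists, a running-average background model, and hypothetical parameter
values (alpha and threshold) chosen only for the example.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Blend the current frame into the background model (running average)."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 25.0) -> List[List[int]]:
    """Mark a pixel as moving (1) when it differs enough from the background."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Toy 2x3 scene: one pixel becomes much brighter, one changes only slightly.
background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
frame = [[10.0, 200.0, 10.0], [12.0, 10.0, 10.0]]

mask = foreground_mask(background, frame)          # detect against the old model
background = update_background(background, frame)  # then adapt the model
```

Updating the model after the comparison keeps the reference "regularly
updated", as the definition requires, without letting the current frame mask
its own changes.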
Research results include methods that use a Gaussian average, such as those of
Wren et al. [3] and Koller et al. [4]; methods that use a temporal median
filter, such as those of Lo and Velastin [5] and Cucchiara et al. [6]; and
methods that use a mixture of Gaussians, such as those of Stauffer and Grimson
[7] and Power and Schoonees [8].
The above methods share a common characteristic: the data they process are
obtained by fully decompressing the compressed bitstream, and this
decompression requires a computationally powerful device. However, with the
trend of the "Internet of Things", most low-end devices are not capable of
high-speed decompression. Therefore, there should be a video analysis mechanism
that works on the compressed video alone, without full decompression.
1.2. Moving object detection in the compressed domain
Normally, videos are encoded using some compression standard. Each compression
standard specifies a structure by which the video size is reduced, so that the
compressed video contains less data. For example, with the H264 compression
standard, the data contained in the compressed video include information about
macroblocks, motion vectors, frames, etc. We call these data the data in the
compressed domain. Fig. 1.2 shows the process of moving object detection
methods that use the data in the compressed domain.
Figure 1.2. The process of moving object detection with data in the compressed
domain
In general, the amount of data in the compressed domain is much smaller than
the amount of data in the pixel domain. The idea of using compressed-domain
data from the H264 standard for video analysis has been investigated by a
number of researchers. To detect motion in the compressed domain, two types of
data are usually used: the motion vectors and the size (in bits) of the
macroblocks.
1.2.1. Motion vector approaches
A number of algorithms have been proposed to analyze video content in the H264
compressed domain, and good performance has been reported [9] [10]. Zeng et al.
[11] proposed a method to detect moving objects in H264 compressed videos based
on motion vectors. Motion vectors are extracted from the motion field and
classified into several types. Then, they are grouped into blocks through a
Markov Random Field (MRF) classification process. Liu et al. [12] recognized
the shape of an object by using a map for each object. This approach is based
on a binary partition tree built from macroblocks. Cipres et al. [13] presented
a moving object detection approach in the H264 compressed domain based on fuzzy
logic. The motion vectors are used to remove the noise that appears during the
encoding process and to represent the concepts that describe the detected
regions. Then, the valid motion vectors are grouped into blocks, each of which
could be identified as a moving object in the video scene. The moving objects
of each frame are described with common terms such as shape, size, position,
and velocity. Mak et al. [14] used the length, angle, and direction of motion
vectors to track objects by applying an MRF. Bruyne et al. [15] estimated the
reliability of motion vectors by comparing them with projected motion vectors
from surrounding frames. Then, they combined this information with the
magnitude of the motion vectors to distinguish foreground objects from the
background. This method can localize noisy motion vectors so that their effect
during classification is diminished. Wang et al. [16] proposed a background
modeling method that uses motion vectors and the local binary pattern (LBP) to
detect moving objects. When a background block is similar to a foreground
block, a noisy motion vector appears. To obtain a more reliable and dense
motion vector field, the initial motion vector fields were preprocessed by
temporal accumulation over three inter frames and 3×3 median filtering. After
that, the LBP feature was introduced to describe the spatial correlation among
neighboring blocks. This approach can reduce the time needed to extract moving
objects while also performing an effective synopsis analysis. Marcus Laumer
[17] proposed an approach to segment video frames into foreground and
background and, according to this segmentation, to identify regions containing
moving objects. The approach uses a map that indicates the "weight" of each
(sub-)macroblock for the presence of a moving object. This map is the input of
a new spatiotemporal detection algorithm that refines the weight indicating the
level of motion for each block. Then, the quantization parameters of the
macroblocks are used to apply individual thresholds to the block weights in
order to segment the video frames. The accuracy of the approach was
approximately 50%.
To identify human actions, Tom et al. [18] proposed a fast action
identification algorithm. The algorithm uses the quantization parameter
gradient image (QGI) and motion vectors with support vector machines (SVM) to
classify the types of actions. The algorithm can also handle variations in
lighting, scale, and some other environmental variables, with an accuracy of
85% on videos with a resolution of 176×144. It can identify actions such as
walking and running. Similarly, Tom, Rangarajan, and their colleagues also used
the QGI and motion vectors to propose a new method for classifying human
actions, the Projection Based Learning of the Meta-cognitive Radial Basis
Function Network (PBL-McRBFN).
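Several of the works above clean the motion vector field before using it; for
instance, 3×3 median filtering is mentioned as a preprocessing step. The sketch
below shows one way such a filter could look. It is an illustration only, not
the implementation of any cited paper: it assumes the field is a grid of
(dx, dy) tuples, filters each component independently, and uses fewer samples
at the borders.

```python
from statistics import median
from typing import List, Tuple

Vector = Tuple[float, float]  # (dx, dy) of one block


def median_filter_3x3(field: List[List[Vector]]) -> List[List[Vector]]:
    """Replace each vector by the component-wise median of its 3x3 neighbourhood."""
    rows, cols = len(field), len(field[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Neighbours that fall inside the field (borders use fewer samples).
            window = [
                field[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append((median(v[0] for v in window), median(v[1] for v in window)))
        filtered.append(out_row)
    return filtered


# A static 3x3 field with a single outlier vector in the centre.
noisy_field = [[(0.0, 0.0)] * 3 for _ in range(3)]
noisy_field[1][1] = (8.0, -8.0)
filtered = median_filter_3x3(noisy_field)
```

An isolated outlier surrounded by consistent neighbours is suppressed, whereas
a region of several blocks moving together would survive the filter.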
For the motion tracking problem, Biswas et al. [19] proposed a method for
detecting abnormal actions by analyzing motion vectors. The method mainly
relies on observing the motion vectors to find the difference between abnormal
actions and normal situations. The classifier used is the Gaussian Mixture
Model (GMM). The approach is based on their earlier approach [20] but improves
it by using the direction of the motion vectors. In their experiments, the
approach ran at about 70 fps. Thilak et al. [21] proposed a Probabilistic Data
Association Filter that detects multiple target clusters. This method can
handle cases in which a target splits into multiple clusters or in which
several clusters should be detected (classified) as a single target. Similarly,
You et al. [22] used probabilistic spatio-temporal MB filtering to mark
macroblocks as objects and then separate them from the noise. The algorithm can
track many objects in real time, but it can only be applied with a fixed
camera, and objects must span at least two macroblocks. Kas et al. [23]
overcame the fixed camera problem by using Global Motion Estimation and Object
History Images to handle background movement. However, the number of moving
objects needs to be small, and the moving objects must not occupy most of the
frame area.
1.2.2. Size of Macroblock approaches
The methods mentioned above all use motion vectors to detect moving objects.
However, since motion vectors are created at the video encoder to optimize the
compression ratio, they do not always represent the real motion in the video
sequence. Because of this coding-oriented nature, the motion vector fields must
be preprocessed and refined to remove noise before they can be used to detect
moving objects.
For this reason, Poppe et al. [24] proposed an approach that detects moving
objects in H264 video by using the size of the macroblocks after encoding (in
bits). To achieve sub-macroblock-level (4×4) precision, the information from
the transform coefficients was also utilized. The system achieved high
execution speeds, up to 20 times faster than related works based on motion
vectors. The analysis was restricted to Predicted (P) frames, and a simple
interpolation technique was employed to handle Intra (I) frames. The whole
algorithm is based on the assumption that macroblocks containing an edge of a
moving object are more difficult to compress, since it is hard to find a good
match for those macroblocks in the reference frame(s).
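The assumption behind size-based detection can be sketched as a simple
threshold on a per-macroblock "size map". This is a deliberately simplified
illustration and not the algorithm of [24]: the map values, the function name,
and the fixed threshold of 64 bits are hypothetical.

```python
from typing import List


def size_map_to_mask(mb_sizes: List[List[int]], threshold_bits: int) -> List[List[int]]:
    """Mark a macroblock as foreground (1) when its coded size exceeds the threshold."""
    return [[1 if size > threshold_bits else 0 for size in row] for row in mb_sizes]


# Coded size in bits of each macroblock of a tiny 3x4-macroblock P frame.
# Well-predicted (static) blocks cost almost nothing; blocks on the edge of a
# moving object are hard to predict and therefore cost many bits.
mb_sizes = [
    [0, 4, 0, 0],
    [0, 180, 12, 0],
    [0, 210, 150, 0],
]
mask = size_map_to_mask(mb_sizes, threshold_bits=64)
```

The same sketch also shows the weakness discussed later in this section: a
macroblock inside a large, uniformly colored moving object would have a size
near zero and would be labeled as background by any such threshold.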
Based on Poppe’s idea, Vacavant et al. [25] used the macroblock size to detect
moving objects by applying a Gaussian mixture model (GMM). The approach can
represent the distribution of macroblock sizes well.
Although the methods of Poppe and Vacavant are good at removing background
motion noise, they cannot produce good motion detection results for videos with
high spatial resolution (such as 1920 × 1080 or 1280 × 720). When the moving
objects are large and contain a region of uniform color (such as a black car),
the size of the macroblocks corresponding to the inside of the moving object is
very small (normally around zero), and a filtering threshold or parameter,
however small, is not effective. In those cases, the algorithm classifies these
regions as background.
1.3. Chapter Summarization
This chapter reviewed the research on moving object detection in both the pixel
domain and the compressed domain. The approaches that use data from the pixel
domain usually have high accuracy but consume a large amount of computing
resources and time. The approaches that use data from the compressed domain
have lower accuracy because the compressed domain usually contains less
information. In the next chapters, I propose a method that can efficiently
detect moving objects, especially in video streams with high spatial
resolution. The method uses data taken from the compressed domain: the size of
the macroblocks to detect the skeleton of the moving object, and the motion
vectors to detect the details of the moving object.
CHAPTER 2.
METHODOLOGY
2.1. Video compression standard H264
Before proposing the moving object detection method, this chapter presents some
information about H264, a popular video compression standard, which is used to
encode and decode the surveillance video in the thesis.
Nowadays, installing surveillance cameras in homes has become quite common.
Video data recorded by a surveillance camera over a long period of time is
usually very large. Consequently, videos need to be preprocessed and encoded
before being used and transmitted over the network. There are many recognized
and widely used compression standards. One of these is H264, or MPEG-4 Part 10
[26], a compression standard recognized by the ITU-T Video Coding Experts Group
and the ISO/IEC Moving Picture Experts Group.
2.1.1. H264 file structure
Normally, video captured by the camera is compressed using a common video
compression standard such as H261, H263, MPEG-4, H264/AVC, or H265/HEVC. In
this thesis, I encode and decode the video using H264/AVC. The H264 video
codec, or MPEG-4 Part 10, is recognized by the ITU-T Video Coding Experts Group
and the ISO/IEC Moving Picture Experts Group.
Typically, an H264 file is split into packets called Network Abstraction Layer
Units (NALUs) [27], as shown in Fig. 2.1.
Figure 2.1. The structure of an H264 file
The first byte of a NALU indicates its type. The NALU type determines the
structure of the NALU: it can be a slice or a parameter set for decompression.
The meaning of each NALU type is given in Table 2.1.
Table 2.1. NALU types
Type Definition
0 Unspecified
1 Slice layer without partitioning, non-IDR
2 Slice data partition A layer
3 Slice data partition B layer
4 Slice data partition C layer
5 Slice layer without partitioning, IDR
6 Supplemental enhancement information (SEI)
7 Sequence parameter set
8 Picture parameter set
9 Access unit delimiter
10 End of sequence
11 End of stream
12 Filler data
13..23 Reserved
24..31 Unspecified
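As a hedged illustration of how the NALU type in Table 2.1 can be read, the
sketch below splits a byte stream into NALUs and decodes the first byte of
each. It assumes the stream uses the Annex B byte stream format, in which NALUs
are separated by 00 00 01 start codes (an assumption, since the text above does
not state the container), and it ignores emulation prevention bytes. The
function names and the sample bytes are made up for the example.

```python
from typing import List, Tuple

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    nalus = []
    start = stream.find(START_CODE)
    while start != -1:
        begin = start + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding, not NALU data.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        start = nxt
    return nalus


def parse_nalu_header(nalu: bytes) -> Tuple[int, int, int]:
    """Return (forbidden_zero_bit, nal_ref_idc, nal_unit_type) from the first byte."""
    header = nalu[0]
    forbidden_zero_bit = header >> 7      # 1 bit, must be 0 in a valid stream
    nal_ref_idc = (header >> 5) & 0x03    # 2 bits, importance as a reference
    nal_unit_type = header & 0x1F         # 5 bits, the "Type" column of Table 2.1
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


# Hypothetical stream: SPS (0x67), PPS (0x68), IDR slice (0x65) with dummy payloads.
stream = (
    b"\x00\x00\x00\x01\x67\x42\x00\x1e"
    b"\x00\x00\x01\x68\xce"
    b"\x00\x00\x01\x65\x88"
)
nalu_types = [parse_nalu_header(nalu)[2] for nalu in split_annexb(stream)]
```

Because only the five least significant bits of the first byte carry the type,
the values always fall in the range 0..31 covered by Table 2.1.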
Apart from the header, the rest of the NALU is called the RBSP (Raw Byte
Sequence Payload). The RBSP contains the data of the SODB (String Of Data
Bits). According to the H264 specification (ISO/IEC 14496-10), if the SODB is
empty (no bits are present), the RBSP is also empty. Otherwise, the first byte
of the RBSP (leftmost) contains the first 8 bits of the SODB, the next byte
contains the next 8 bits, and so on until fewer than 8 bits of the SODB remain.
Figure 2.2. RBSP structure
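The packing of the SODB into RBSP bytes can be sketched as follows. The sketch
relies on my reading of the specification for the final byte: the remaining
SODB bits are followed by a single stop bit equal to 1 and then by zero bits up
to the byte boundary. Representing the SODB as a string of '0' and '1'
characters, and the function names, are choices made only for this example;
emulation prevention is not handled.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Pack a SODB, given as a string of '0'/'1' characters, into RBSP bytes."""
    if not sodb_bits:
        return b""  # an empty SODB gives an empty RBSP
    bits = sodb_bits + "1"          # stop bit that marks the end of the SODB
    bits += "0" * (-len(bits) % 8)  # zero bits up to the next byte boundary
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def rbsp_to_sodb(rbsp: bytes) -> str:
    """Recover the SODB by dropping the stop bit and the alignment zeros."""
    if not rbsp:
        return ""
    bits = "".join(f"{byte:08b}" for byte in rbsp)
    return bits[:bits.rindex("1")]


sodb = "1010101011"          # 10 data bits: one full byte plus 2 remaining bits
rbsp = sodb_to_rbsp(sodb)    # second byte holds "11", the stop bit, then zeros
recovered = rbsp_to_sodb(rbsp)
```

The stop bit is what lets a decoder find the true end of the data even when
the SODB itself ends in zero bits.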
A video is normally divided into frames, and the encoder encodes them one by
one. Each frame is encoded into slices, and each slice is divided into
macroblocks (MBs). Typically, each frame corresponds to one slice, but a frame
can also be split into multiple slices. The slices are divided into the types
listed in Table 2.2. A slice consists of a header and a data section (Fig.
2.3). The slice header contains information about the type of the slice, the
type of MBs in the slice, and the frame number of the slice. The header also
contains information about the reference frames and the quantization
parameters. The data section of the slice contains the macroblocks.
Table 2.2. Slice types
Type Description
0 P-slice. Consists of P-macroblocks (each macroblock is predicted using
one reference frame) and/or I-macroblocks.
1 B-slice. Consists of B-macroblocks (each macroblock is predicted
using one or two reference frames) and/or I-macroblocks.
2 I-slice. Contains only I-macroblocks. Each macroblock is predicted
from previously coded blocks of the same slice.
3 SP-slice. Consists of P and/or I-macroblocks and lets you switch
between encoded streams.
4 SI-slice. It consists of a special type of SI-macroblocks and lets you
switch between encoded streams.
5 P-slice.
6 B-slice.
7 I-slice.
8 SP-slice.
9 SI-slice.
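The table can be turned into a small lookup, sketched below. One point is not
stated in the table and is added here from my understanding of the standard:
values 5 to 9 name the same slice kinds as 0 to 4 and additionally signal that
all slices of the current picture share that type. The names
SLICE_TYPE_NAMES and describe_slice_type are hypothetical.

```python
from typing import Tuple

SLICE_TYPE_NAMES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


def describe_slice_type(slice_type: int) -> Tuple[str, bool]:
    """Return the slice kind and whether the whole picture uses that kind."""
    if not 0 <= slice_type <= 9:
        raise ValueError(f"slice_type must be in 0..9, got {slice_type}")
    name = SLICE_TYPE_NAMES[slice_type % 5]  # 5..9 map onto 0..4
    whole_picture_same_type = slice_type >= 5
    return name, whole_picture_same_type


kind, uniform = describe_slice_type(7)
```

For a detector that only analyzes P frames, a check such as
describe_slice_type(t)[0] == "P" would be enough to select the relevant slices.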
Figure 2.3. Slice structure
2.1.2. Macroblock
The basic principle of a compression standard is to split the video into groups
of frames. Each frame is divided into basic processing units. In the H264/AVC
standard, this unit is the macroblock (MB), a region of 16×16 pixels. In
regions that carry more detail, MBs are subdivided into smaller sub-macroblocks
(4×4 or 8×8 pixels). After compression, each MB contains the information used
to recover the video later, including the motion vector, residual values, and
quantization parameter, as shown in Fig. 2.4, where:
• ADDR is the position of the macroblock in the frame;
• TYPE is the macroblock type;
• QUANT is the quantization parameter;
• VECTOR is the motion vector;
• CBP (Coded Block Pattern) indicates which blocks of the MB contain coded
residual coefficients;
• bN is the encoded residual data of the color channels (4 Y, 1 Cr, 1 Cb).
Figure 2.4. Macroblock structure
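The fields listed above could be held in a record such as the following
sketch. The class, its field types, and the position helper are illustrative
assumptions rather than the data structure used in the thesis; in particular,
the helper assumes that macroblocks are numbered in raster-scan order, row by
row from the top-left corner.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Macroblock:
    addr: int                 # ADDR: index of the macroblock in the frame
    mb_type: str              # TYPE: macroblock type
    quant: int                # QUANT: quantization parameter
    vector: Tuple[int, int]   # VECTOR: motion vector as (dx, dy)
    cbp: int                  # CBP: coded block pattern bits
    residual: List[bytes] = field(default_factory=list)  # bN: coded residual blocks

    def position(self, frame_width: int, mb_size: int = 16) -> Tuple[int, int]:
        """Top-left pixel (x, y) of this macroblock, assuming raster-scan order."""
        mbs_per_row = (frame_width + mb_size - 1) // mb_size
        column = self.addr % mbs_per_row
        row = self.addr // mbs_per_row
        return column * mb_size, row * mb_size


# Hypothetical macroblock number 85 of a 1280-pixel-wide frame (80 MBs per row).
mb = Macroblock(addr=85, mb_type="P", quant=26, vector=(-3, 1), cbp=0b001111)
x, y = mb.position(frame_width=1280)
```

Keeping the address rather than pixel coordinates mirrors the bitstream, and
the pixel position is derived only when a result has to be drawn on a frame.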
During decompression, the video decoder receives the compressed video data as a
stream of binary data, decodes the syntax elements, and extracts the encoded
information, including transform coefficients, the size of each MB (in bits),
motion prediction information, and so on. It then performs the inverse
transform to restore the original image data.
2.1.3. Motion vector
In H264 compression, the macroblocks of a frame are predicted from information
that has already been transferred from the encoder to the decoder. There are two
kinds of prediction: intra-frame prediction and inter-frame prediction.
Intra-frame prediction uses previously compressed image data in the same frame
as the macroblock being compressed, whereas inter-frame prediction uses
previously compressed frames. Inter-frame prediction is carried out through
motion estimation and motion compensation: the motion estimator searches the
reference frame for the macroblock that most closely matches the new macroblock
and calculates the motion vector, which characterizes the displacement of the
macroblock being encoded relative to the reference frame.
The referenced macroblock is sent to the subtractor together with the new
macroblock to be coded in order to find the prediction error, or residual
signal, which characterizes the difference between the predicted macroblock and
the actual macroblock. The residual signal is transformed with the Discrete
Cosine Transform and quantized to reduce the number of bits to be stored or
transmitted. The resulting coefficients, together with the motion vectors, are
passed to the entropy coder and written to the bitstream. The video bitstream
therefore includes transform coefficients, motion prediction information,
information about the structure of the compressed data, and more.
To perform video compression, one compares the values of two frames, one of
which is used as the reference. To compress the MB at position i of a frame, the
video compression algorithm searches the reference frame for the MB that differs
least from the MB at position i. If such an MB is found in the reference frame
at position j, the displacement between i and j is called the Motion vector (MV)
of the MB at position i (Fig. 2.5). Normally, an MV consists of two values: x
(the displacement in columns) and y (the displacement in rows).
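The search described above can be sketched as an exhaustive block-matching loop. This is a minimal illustration under stated assumptions, not the encoder's actual algorithm: the matching cost is taken to be the sum of absolute differences (SAD), the search window is a small square around the current position, and the vector is defined as the offset from the current block to its best match in the reference frame. The function names `sad`, `find_motion_vector` and `residual` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel values


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(cur[cy + r][cx + c] - ref[ry + r][rx + c])
        for r in range(n)
        for c in range(n)
    )


def find_motion_vector(cur: Frame, ref: Frame, cx: int, cy: int,
                       n: int = 16, search: int = 4) -> Tuple[int, int]:
    """Return (x, y): the offset of the best-matching reference block."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, n)  # start from the zero vector
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def residual(cur: Frame, ref: Frame, cx: int, cy: int,
             mv: Tuple[int, int], n: int) -> Frame:
    """Difference between the current block and its motion-compensated prediction."""
    dx, dy = mv
    return [
        [cur[cy + r][cx + c] - ref[cy + dy + r][cx + dx + c] for c in range(n)]
        for r in range(n)
    ]
```

When the best match is exact, the residual block is all zeros, so only the motion vector needs to be coded for that MB; otherwise the residual is what is passed on to the transform and quantization steps.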
Motion analysis from encoded video bitstream.pdf

  • 1. VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY NGUYEN MINH HOA MOTION ANALYSIS FROM ENCODED VIDEO BITSTREAM MASTER’S THESIS HA NOI – 2018
  • 2. VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY NGUYEN MINH HOA MOTION ANALYSIS FROM ENCODED VIDEO BITSTREAM Major: Computer Science MASTER’S THESIS Supervisor: Dr. Do Van Nguyen Co-Supervisor: Dr. Tran Quoc Long HA NOI - 2018
  • 3. i AUTHORSHIP “I hereby declare that the work contained in this thesis is of my own and I have not submitted this thesis at any other institution in order to obtain a degree. To the best of my knowledge and belief, the thesis contains no materials previously published or written by another person other than those listed in the bibliography and identified as references.” Signature: ………………………………………………
  • 4. ii SUPERVISOR’S APPROVAL “I hereby approve that the thesis in its current form is ready for committee examination as a requirement for the Master of Computer Science degree at the University of Engineering and Technology.” Signature: ……………………………………………… Signature: ………………………………………………
  • 5. iii ACKNOWLEDGMENTS First of all, I would like to express special gratitude to my supervisors, Dr. Do Van Nguyen and Dr. Tran Quoc Long, for their enthusiasm for instructions, the technical explanation as well as advices during this project. I also want to give sincere thanks to Assoc. Prof. Dr. Ha Le Thanh, Assoc. Prof. Dr. Nguyen Thi Thuy for the instructions as well as the background knowledge for this thesis. And I would like to also thank my teachers, my friends in Human Machine Interaction Lab for their support. Thank my friends, my colleagues in the project "Nghiên Cứu Công Nghệ Tóm Tắt Video", and project “Multimedia application tools for intangible cultural heritage conservation and promotion”, project number ĐTDL.CN-34/16 for their working and support. Last but not least, I want to thank my family and all of my friends for their motivation and support as well. They stand by and inspire me whenever I face the tough time.
  • 6. 1 TABLE OF CONTENTS AUTHORSHIP.......................................................................................................i SUPERVISOR’S APPROVAL.............................................................................ii ACKNOWLEDGMENTS....................................................................................iii TABLE OF CONTENTS......................................................................................1 ABBREVIATIONS...............................................................................................3 List of Figures .......................................................................................................4 List of Tables.........................................................................................................5 INTRODUCTION.................................................................................................6 CHAPTER 1. LITERATURE REVIEW ............................................................9 Moving object detection in the pixel domain..........................................9 Moving object detection in the compressed domain.............................10 1.2.1. Motion vector approaches.............................................................11 1.2.2. Size of Macroblock approaches ....................................................13 Chapter Summarization.........................................................................14 CHAPTER 2. METHODOLOGY ....................................................................15 Video compression standard h264 ........................................................15 2.1.1. H264 file structure.........................................................................15 2.1.2. Macroblock....................................................................................18 2.1.3. 
Motion vector................................................................................19 Proposed method ...................................................................................21 2.2.1. Process video bitstream.................................................................21 2.2.2. Macroblock-based Segmentation..................................................22 2.2.3. Object-based Segmentation...........................................................24 2.2.4. Object Refinement ........................................................................28
  • 7. 2 Chapter Summarization.........................................................................28 CHAPTER 3. RESULTS ..................................................................................30 The moving object detection application ..............................................30 3.1.1. The process of application ............................................................31 3.1.2. The motion information ................................................................34 3.1.3. Synthesizing movement information ............................................35 3.1.4. Storing Movement Information ....................................................36 Experiments...........................................................................................36 3.2.1. Dataset...........................................................................................36 3.2.2. Evaluation methods.......................................................................40 3.2.3. Implementations............................................................................41 3.2.4. Experimental results......................................................................41 Chapter Summarization.........................................................................44 CONCLUSIONS.................................................................................................45 List of of author’s publications related to thesis.................................................46 REFERENCES....................................................................................................47
ABBREVIATIONS
MB      Macroblock
MV      Motion vector
NALU    Network Abstraction Layer Unit
RBSP    Raw Byte Sequence Payload
SODB    String Of Data Bits
List of Figures
Figure 1.1. The process of moving object detection with data in the pixel domain
Figure 1.2. The process of moving object detection with data in the compressed domain
Figure 2.1. The structure of an H264 file
Figure 2.2. RBSP structure
Figure 2.3. Slice structure
Figure 2.4. Macroblock structure
Figure 2.5. The motion vector of a Macroblock
Figure 2.6. The process of moving object detection method
Figure 2.7. Skipped Macroblock
Figure 2.8. (a) Outdoor and indoor frames, (b) The "size-map" of frames, (c) The "motion-map" of frames
Figure 2.9. Example of the “consistency” of motion vectors
Figure 3.1. The implementation process of the approach
Figure 3.2. Data structure for storing motion information
Figure 3.3. Example frames of test videos
Figure 3.4. Example frames and their ground truth
Figure 3.5. An example frame of Pedestrians (a) and ground truth image (b)
List of Tables
Table 2.1. NALU types
Table 2.2. Slice types
Table 3.1. The information of test videos
Table 3.2. The information of test sequences in group 1
Table 3.3. The performance of two approaches with Pedestrians, PETS2006, Highway, and Office
Table 3.4. The experimental result of Poppe’s approach on the 2nd group
Table 3.5. The experimental result of proposed method on the 2nd group
INTRODUCTION
Today, video content is used extensively in many areas of life, such as indoor monitoring and traffic monitoring. The number of videos shared over the Internet at any given time is also extremely large. According to statistics, hundreds of hours of video are uploaded to YouTube every minute [1]. In addition, surveillance cameras are increasingly installed in homes for monitoring and security purposes. These cameras normally operate and store surveillance videos automatically. Only when a special situation or event occurs do people go back and review the video data. The problem is how such a large volume of video can be evaluated in a short amount of time. For example, when a burglary or an intrusion occurs, we cannot spend hours checking every stored video. A tool that determines the moments at which an object is moving in a long video is therefore essential to reducing the time and effort of searching.
Normally, in order to reduce the size of videos for transmission or storage, video compression is performed at the surveillance camera. The compressed information, in the form of a bitstream, is then stored or transmitted to a server for analysis. Video analysis needs many features that describe different aspects of the visual content. Typically, these features are extracted from the pixel values of each video frame, which requires fully decompressing the bitstream. Full decompression requires a device with high computational capacity. However, with the trend of the "Internet of Things", many devices have low processing capacity and cannot perform full video decompression at high speed, so it is difficult to run an approach that requires a lot of computing power in real time. Another way to extract features from the video is to use the data available in the compressed video.
These data include transform coefficients, motion vectors, quantization steps, quantization parameters, etc. By processing and analyzing these data, we can handle some important computer vision tasks, including moving object detection, human action detection, face recognition, and moving object tracking.
This thesis proposes a new method to detect moving objects by exploring and applying motion estimation techniques in the video compressed domain. The method is then used to build an application that supports searching for movement in home surveillance videos. The videos in the thesis are compressed with the H264 standard (MPEG-4 Part 10), a popular video compression standard today.
Aims
The goal of the thesis is to propose a method for detecting moving objects in the compressed domain of a video. I then build an application that uses the method to support searching for the moments in a video that contain moving objects.
Object and Scope of the study
Within the framework of the thesis, I study algorithms for detecting moving objects in video, especially those that work in the compressed domain. The video compression standard used in the thesis is H264/AVC. The theory of video compression and computer vision is taken from scientific articles on video analysis and motion detection in the compressed domain. The videos used for testing and experiments are obtained from both indoor and outdoor surveillance cameras.
Method and procedures
- Literature research: study existing systems for motion analysis and evaluation on compressed video, and the scientific articles related to the analysis and evaluation of motion on compressed video.
- Experimental research: conduct experimental setups for each theoretical part, such as extracting video data, compiling data, and evaluating motion based on the obtained data.
- Experimental evaluation: each experiment is conducted independently on each module, and the modules are then integrated and deployed.
Contributions
The thesis proposes a new moving object detection method for surveillance video encoded with the H264 compression standard, using the motion vectors and the sizes of macroblocks.
Thesis structure
Apart from the introduction, the conclusion, and the references, this thesis is organized into 3 chapters with the following main contents:
Chapter 1 is the literature review. It presents the work related to the thesis, including moving object detection methods in the pixel domain and in the compressed domain.
Chapter 2 covers the basic knowledge about the H264 video compression standard, such as the H264 file structure, macroblocks, and motion vectors. It then describes the proposed moving object detection method in detail, including processing the video bitstream, the macroblock-based segmentation phase, the object-based segmentation phase, and the object refinement phase.
Chapter 3 presents the results, including an application that uses the proposed method and the experimental results.
CHAPTER 1. LITERATURE REVIEW
Today, surveillance cameras are used extensively around the world, and the volume of surveillance video has grown tremendously. Problems often encountered with surveillance video include event searching, motion tracking, and abnormal behavior detection. To handle these tasks, a method is needed that can determine the moments in each video at which movement occurs. Video is usually compressed for storage and transmission. Previous moving object detection methods usually use data from the pixel images, such as color values and edges. To obtain images that can be displayed or processed, the system must fully decode the video, which consumes a large amount of the device's computing resources, time, and memory. I suggest a method that can quickly detect moving objects in high-resolution videos. The data used by the method are taken from the compressed video domain and include the motion vectors and the size of each macroblock (in bits) after encoding. The method considerably reduces processing time compared with methods that work on pixel-domain data.
The problem of motion detection in video has long been studied. It is the first step in a series of computer vision problems such as object tracking, object detection, and abnormal movement detection. There are usually two approaches to this problem: using fully decoded video data (pixel domain data) or using data taken directly from the undecoded video (compressed domain data). The following sections outline the studies based on these two approaches.
Moving object detection in the pixel domain
Typically, to reduce the size of the video for transmission, video encoding is performed inside the surveillance camera and the compressed information is transmitted as a bitstream to a server for video analysis. Common video compression standards in use today include MPEG-4, H264, and H265.
To be viewable, these compressed videos need to be decoded into image frames. We call these image frames the pixel domain, and the data obtained from them the data in the pixel domain. Fig. 1.1 describes the process of moving object detection methods in the pixel domain. The data in the pixel domain include the color values of the pixels, the number of color channels of each pixel, the edges, etc.
Figure 1.1. The process of moving object detection with data in the pixel domain
To detect moving objects in the pixel domain, background subtraction algorithms are commonly used, and many such methods were introduced long ago. These methods usually exploit the relationship between frames in a time series. Background subtraction is defined in [2] as: “Background subtraction is a widely used approach for detecting moving objects in videos from static cameras. The rationale in the approach is that of detecting the moving objects from the difference between the current frame and a reference frame, often called the “background image”, or “background model”. As a basic, the background image must be a representation of the scene with no moving objects and must be kept regularly updated so as to adapt to the varying luminance conditions and geometry settings.” Existing methods include those that use a running Gaussian average, such as the methods of Wren et al. [3] and Koller et al. [4]; those that use a temporal median filter, such as the methods of Lo and Velastin [5] and Cucchiara et al. [6]; and those that use a mixture of Gaussians, such as the methods of Stauffer and Grimson [7] and Wayne Power and Schoonees [8].
The above methods share one characteristic: the data they process are obtained by fully decompressing the compressed bitstream, and this decompression requires a device with high computational capacity. However, with the trend of the "Internet of Things", most low-end devices are not capable of high-speed decompression. Therefore, there should be a video analysis mechanism that operates directly on the compressed video, without full decompression.
Moving object detection in the compressed domain
Normally, videos are encoded using some compression standard. Each compression standard specifies a structure by which the video size is reduced, so the compressed video contains less data.
For example, with the H264 compression standard, the data contained in the compressed video includes
information about macroblocks, motion vectors, frame information, etc. We call these data the data in the compressed domain. Fig. 1.2 shows the process of moving object detection methods that use the data in the compressed domain.
Figure 1.2. The process of moving object detection with data in the compressed domain
In general, the amount of data in the compressed domain is much smaller than the amount of data in the pixel domain. The idea of using compressed domain data of the H264 standard for video analysis has been investigated by several researchers. To detect motion in the compressed video domain, two types of data are usually used: the motion vectors and the size (in bits) of the macroblocks.
1.2.1. Motion vector approaches
A number of algorithms have been proposed to analyze video content in the H264 compressed domain, and good performance has been reported [9] [10]. Zeng et al. [11] proposed a method to detect moving objects in H264 compressed videos based on motion vectors. Motion vectors are extracted from the motion field and classified into several types. They are then grouped into blocks through a Markov Random Field (MRF) classification process. Liu et al. [12] recognized the shape of an object by using a map for each object. This approach is based on a binary partition tree created from macroblocks. Cipres et al. [13] presented a moving object detection approach in the H264 compressed domain based on fuzzy logic. The motion vectors are used to remove the noise that appears during the encoding process and to represent the concepts that describe the detected regions. The valid motion vectors are then grouped into blocks, each of which can be identified as a moving object in the video scene. The moving objects of each frame are described with common terms such as shape, size, position, and velocity. Mak et al.
[14] used the length, angle, and direction of motion vectors to track the objects by applying the MRF. Bruyne et al. [15] estimated the
reliability of motion vectors by comparing them with projected motion vectors from surrounding frames. They then combined this information with the magnitude of the motion vectors to distinguish foreground objects from the background. This method can localize noisy motion vectors, so that their effect on the classification is diminished. Wang et al. [16] proposed a background modeling method that uses motion vectors and the local binary pattern (LBP) to detect moving objects. When a background block is similar to a foreground block, a noisy motion vector appears. To obtain a more reliable and dense motion vector field, the initial motion vector fields were preprocessed by temporal accumulation over three inter frames and 3×3 median filtering. The LBP feature was then introduced to describe the spatial correlation among neighboring blocks. This approach can reduce the time needed to extract moving objects while also performing an effective synopsis analysis. Marcus Laumer [17] proposed an approach that segments video frames into foreground and background and, according to this segmentation, identifies regions containing moving objects. The approach uses a map that indicates the "weight" of each (sub-)macroblock for the presence of a moving object. This map is the input of a new spatiotemporal detection algorithm that refines the weights, which indicate the level of motion of each block. Quantization parameters of the macroblocks are then used to apply individual thresholds to the block weights in order to segment the video frames. The accuracy of the approach was approximately 50%. To identify human actions, Tom et al. [18] proposed a fast action identification algorithm. The algorithm uses the quantization parameter gradient image (QGI) and motion vectors with support vector machines (SVM) to classify the types of actions.
The algorithm can also handle changes in lighting, scale, and some other environmental variables, with an accuracy rate of 85% on videos with a resolution of 176x144. It can identify walking, running, etc. Similarly, Tom, Rangarajan and their colleagues also used QGI and motion vectors in a new method that classifies human actions with the Projection Based Learning of the Meta-cognitive Radial Basis Function Network (PBL-McRBFN). For the motion tracking problem, Biswas et al. [19] proposed a method for detecting abnormal actions by analyzing motion vectors. This method mainly relies on observing the motion vectors to find the difference between abnormal actions and normal situations. The classifier used is the Gaussian Mixture Model (GMM). This approach is based on their earlier approach [20], improved by using the direction of the motion vectors. The speed of the approach in experiments is about 70 fps. Thilak et al. [21] propose a Probabilistic Data
Association Filter that detects multiple target clusters. This method can handle cases in which targets split into multiple clusters or clusters should be detected (classified) as a target. Similarly, You et al. [22] used probabilistic spatio-temporal MB filtering to mark macroblocks as objects and then separate them from the noise. The algorithm can track many objects in real time, but it can only be applied when the camera is fixed and the objects are at least two macroblocks in size. Kas et al. [23] overcame the fixed camera problem by using Global Motion Estimation and Object History Images to handle background movement. However, the number of moving objects needs to be small, and the moving objects must not occupy most of the frame area.
1.2.2. Size of Macroblock approaches
The methods mentioned above share the trait of using motion vectors to detect moving objects. However, since motion vectors are created by the video encoder to optimize the compression ratio, they do not always represent the real motion in the video sequence. Because of this coding-oriented nature, the motion vector fields must be preprocessed and refined to remove noise before they can be used to detect moving objects. For this reason, Poppe et al. [24] proposed an approach that detects moving objects in H264 video by using the size of the macroblocks after encoding (in bits). To achieve sub-macroblock-level (4×4) precision, the information from transform coefficients was also utilized. The system achieved high execution speeds, up to 20 times faster than the related motion vector-based works. The analysis was restricted to Predicted (P) frames, and a simple interpolation technique was employed to handle Intra (I) frames. The whole algorithm was based on the assumption that a macroblock that contains an edge of a moving object is more difficult to compress, since it is hard to find a good match for such a macroblock in the reference frame(s). Based on Poppe’s idea, Vacavant et al.
[25] used the macroblock size to detect moving objects by applying the Gaussian mixture model (GMM). The approach can represent the distribution of macroblock sizes well. Although the methods of Poppe and Vacavant are good at removing background motion noise, they cannot produce good motion detection results for videos with high spatial resolution (such as 1920 × 1080 or 1280 × 720). When the moving objects are large and contain a region of uniform color (such as a black car), the size of the macroblocks corresponding to the inside region of
the moving object will be very small (normally around zero), and a filtering threshold or parameter, however small, will not be effective. In those cases, the algorithm will classify these regions as background.
Chapter Summarization
This chapter reviewed the research on moving object detection in both the pixel domain and the compressed domain. Approaches that use pixel domain data usually have high accuracy but consume a large amount of computing resources and time. Approaches that use compressed domain data have lower accuracy because the compressed domain usually contains less information. In the next chapters, I will propose a method that can efficiently detect moving objects, especially in video streams with high spatial resolution. The method uses data taken from the video compressed domain: the size of the macroblocks to detect the skeleton of the moving object, and the motion vectors to detect the detail of the moving object.
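The limitation of size-only detection noted in Section 1.2.2 can be illustrated with a toy example. The sketch below is not the algorithm proposed in this thesis: the 5×5 grid of macroblock sizes, the motion vectors, the threshold values, and the simple union of the two masks are all assumptions chosen only to show why a uniform interior is missed by a size threshold and how motion vectors could fill it in.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def size_mask(mb_bits: List[List[int]], threshold: int) -> Mask:
    """Mark macroblocks whose encoded size (in bits) exceeds the threshold."""
    return [[bits > threshold for bits in row] for row in mb_bits]


def motion_mask(mvs: List[List[Tuple[int, int]]], min_len_sq: int) -> Mask:
    """Mark macroblocks whose squared motion vector length reaches min_len_sq."""
    return [[dx * dx + dy * dy >= min_len_sq for dx, dy in row] for row in mvs]


def union(a: Mask, b: Mask) -> Mask:
    """Element-wise OR of two masks of equal shape."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical data: a 3x3 object whose border macroblocks are costly to
# encode, while its flat interior (row 2, column 2) costs almost nothing.
MB_BITS = [
    [2, 0, 3, 1, 0],
    [0, 120, 95, 110, 2],
    [1, 105, 2, 98, 0],
    [0, 115, 90, 102, 1],
    [3, 0, 1, 0, 2],
]
# Hypothetical motion vectors: the whole object moves 3 pixels to the right.
MVS = [
    [(3, 0) if 1 <= r <= 3 and 1 <= c <= 3 else (0, 0) for c in range(5)]
    for r in range(5)
]

by_size = size_mask(MB_BITS, threshold=16)       # hollow: interior is missed
combined = union(by_size, motion_mask(MVS, 1))   # interior recovered
```

With these made-up numbers the size mask marks only the eight border macroblocks, while the combined mask also covers the interior block.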
CHAPTER 2. METHODOLOGY
Video compression standard H264
Before proposing the moving object detection method, this chapter presents some information about H264, a popular video compression standard, which is used to encode and decode the surveillance video in the thesis.
Nowadays, installing surveillance cameras in homes has become quite common. Video data recorded by a surveillance camera over a long period of time are usually very large. Consequently, videos need to be preprocessed and encoded before being used and transmitted over the network. There are many recognized and widely used compression standards. One of these is H264, or MPEG-4 Part 10 [26], a compression standard recognized by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group.
2.1.1. H264 file structure
Normally, the video captured from the camera is compressed using a common video compression standard such as H261, H263, MPEG-4, H264/AVC, or H265/HEVC. In the thesis, I encode and decode the video using H264/AVC. Typically, an H264 file is split into packets called Network Abstraction Layer Units (NALU) [27], as shown in Fig. 2.1.
Figure 2.1. The structure of an H264 file
The first byte of a NALU indicates the type of the NALU. The NALU type determines the structure of the NALU: it can be a slice or a parameter set used for decompression. The NALU types are listed in Table 2.1.
Table 2.1. NALU types
Type      Definition
0         Undefined
1         Slice layer without partitioning, non-IDR
2         Slice data partition A layer
3         Slice data partition B layer
4         Slice data partition C layer
5         Slice layer without partitioning, IDR
6         Additional information (SEI)
7         Sequence parameter set
8         Picture parameter set
9         Access unit delimiter
10        End of sequence
11        End of stream
12        Filler data
13..23    Reserved
24..31    Undefined
Apart from the NALU header, the rest of the NALU is called the RBSP (Raw Byte Sequence Payload). The RBSP contains the data of the SODB (String Of Data Bits). According to the H264 specification (ISO/IEC 14496-10), if the SODB is empty (no bits are present), the RBSP is also empty. Otherwise, the first byte of the RBSP contains the first (left-most) 8 bits of the SODB, the next byte of the RBSP contains the next 8 bits of the SODB, and so on until fewer than 8 bits of the SODB remain.
Figure 2.2. RBSP structure
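The NALU layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the parser used in the thesis: it assumes the stream is in the Annex B byte stream format (NALUs separated by the start code 00 00 01), and the function names are hypothetical. The header layout (1 forbidden bit, 2 bits of nal_ref_idc, 5 bits of nal_unit_type) and the removal of emulation prevention bytes (00 00 03 becomes 00 00) follow the H264 specification.

```python
from typing import Iterator, Tuple

START_CODE = b"\x00\x00\x01"


def split_nalus(stream: bytes) -> Iterator[bytes]:
    """Yield the NALUs of an Annex B byte stream (assumed input format)."""
    starts = []
    i = stream.find(START_CODE)
    while i != -1:
        starts.append(i + 3)
        i = stream.find(START_CODE, i + 3)
    for k, begin in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(stream)
        # Trailing zero bytes belong to a 4-byte start code or to padding.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            yield nalu


def parse_nalu_header(nalu: bytes) -> Tuple[int, int, int]:
    """Return (forbidden_zero_bit, nal_ref_idc, nal_unit_type) from byte 0."""
    first = nalu[0]
    return first >> 7, (first >> 5) & 0x03, first & 0x1F


def nalu_to_rbsp(nalu: bytes) -> bytes:
    """Drop the 1-byte header and every emulation prevention byte."""
    out = bytearray()
    zeros = 0
    for b in nalu[1:]:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the inserted 0x03
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

For example, a NALU whose first byte is 0x67 has nal_unit_type 7 (sequence parameter set in Table 2.1), and one whose first byte is 0x65 has nal_unit_type 5 (IDR slice).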
A video is normally divided into frames, and the encoder encodes them one by one. Each frame is encoded into slices, and each slice is divided into Macroblocks (MB). Typically, each frame corresponds to one slice, but sometimes a frame can be split into multiple slices. The slices are divided into the types shown in Table 2.2. A slice consists of a header and a data section (Fig. 2.3). The header of the slice contains information about the type of the slice, the type of MBs in the slice, and the frame number of the slice. The header also contains information about the reference frames and the quantization parameters. The data portion of the slice holds the information about the macroblocks.
Table 2.2. Slice types
Type    Description
0       P-slice. Consists of P-macroblocks (each macroblock is predicted using one reference frame) and/or I-macroblocks.
1       B-slice. Consists of B-macroblocks (each macroblock is predicted using one or two reference frames) and/or I-macroblocks.
2       I-slice. Contains only I-macroblocks. Each macroblock is predicted from previously coded blocks of the same slice.
3       SP-slice. Consists of P and/or I-macroblocks and allows switching between encoded streams.
4       SI-slice. Consists of a special type of SI-macroblocks and allows switching between encoded streams.
5       P-slice.
6       B-slice.
7       I-slice.
8       SP-slice.
9       SI-slice.
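Table 2.2 lists each slice type twice. In the H264 specification, the values 5 to 9 carry the same meaning as 0 to 4 and additionally signal that all slices of the current picture have the same type. A small sketch of that mapping follows; the function names are hypothetical and chosen only for illustration.

```python
SLICE_TYPE_NAMES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


def slice_type_name(slice_type: int) -> str:
    """Map a slice_type value (0..9) to its name as listed in Table 2.2."""
    if not 0 <= slice_type <= 9:
        raise ValueError(f"slice_type out of range: {slice_type}")
    return SLICE_TYPE_NAMES[slice_type % 5]


def all_slices_same_type(slice_type: int) -> bool:
    """True for values 5..9, which promise a single slice type per picture."""
    return 5 <= slice_type <= 9
```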
Figure 2.3. Slice structure
2.1.2. Macroblock
The basic principle of a compression standard is to split the video into groups of frames. Each frame is divided into basic processing units. In the H264/AVC standard, this unit is the Macroblock (MB), a region of 16x16 pixels. In regions that carry more detail, the MBs are further subdivided into smaller sub-macroblocks (4x4 or 8x8 pixels). After compression, each MB contains the information used to recover the video later, including the motion vector, the residual values, the quantization parameter, etc., as in Fig. 2.4, where:
• ADDR is the position of the Macroblock in a frame;
• TYPE is the Macroblock type;
• QUANT is the quantization parameter;
• VECTOR is the motion vector;
• CBP (Coded Block Pattern) indicates which blocks of the MB contain coded residual data;
• bN is the encoded residual data of the color channels (4 Y, 1 Cr, 1 Cb).
Figure 2.4. Macroblock structure
During decompression, the video decoder receives the compressed video data as a stream of binary data, decodes the syntax elements, and extracts the encoded information, including transform coefficients, size of MB (in bits), motion
prediction information, and so on, and performs the inverse transform to restore the original image data.
2.1.3. Motion vector
With H264 compression, the MBs of each frame are predicted based on information that has already been transferred from the encoder to the decoder. There are two kinds of prediction: intra-frame prediction and inter-frame prediction. Intra-frame prediction uses already compressed image data in the same frame as the macroblock being compressed, while inter-frame prediction uses previously compressed frames. Inter-frame prediction is accomplished through a motion estimation and compensation process, in which the motion estimator finds the macroblock in the reference frame that is closest to the new macroblock and calculates the motion vector. This vector characterizes the shift of the macroblock being encoded relative to the reference frame. The referenced macroblock is sent to the subtractor together with the new macroblock to obtain the prediction error, or residual signal, which characterizes the difference between the predicted macroblock and the actual macroblock. The residual signal is transformed with the Discrete Cosine Transform and quantized to reduce the number of bits to be stored or transmitted. These coefficients, together with the motion vectors, are passed to the entropy coder to produce the bitstream. The resulting binary video stream includes transform coefficients, motion prediction information, information about the compressed data structure, and more.
To perform video compression, the encoder compares the values of two frames, one of which is used as a reference. To compress the MB at position i of a frame, the video compression algorithm searches the reference frame for the MB with the smallest difference from the MB at position i. If this MB is found in the reference frame at position j, the displacement between i and j is called the Motion vector (MV) of the MB at position i (Fig. 2.5).
Normally, an MV consists of two values: x (the displacement along the columns) and y (the displacement along the rows).
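The search for the best-matching block described above can be sketched as an exhaustive block matching procedure. This is only an illustration under stated assumptions: real H264 encoders use faster search strategies, sub-pixel precision, and rate-distortion costs, none of which are modeled here. The function names, the sum of absolute differences (SAD) as the matching cost, the search range, and the sign convention (the MV points from the current block to its match in the reference frame) are all assumptions of this sketch.

```python
from typing import List, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[row][col]


def sad(cur: Frame, ref: Frame, cy: int, cx: int, ry: int, rx: int, size: int) -> int:
    """Sum of absolute differences between a block in cur and a block in ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def find_motion_vector(cur: Frame, ref: Frame, top: int, left: int,
                       size: int = 16, search: int = 4) -> Tuple[int, int]:
    """Return (x, y): the offset of the best match in ref for the block of cur
    whose top-left corner is at (top, left). Full search over +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, top, left, top, left, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, top, left, ry, rx, size)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

If the content of the current frame is the reference frame shifted two columns right and one row down, the block's best match lies two columns left and one row up in the reference frame, so the function returns (-2, -1) under this sign convention.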