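The probability density function of the noise coefficient is
Gaussian, as in equation (2).
MAP estimation is a common method in Bayesian estimation. It
finds the w that maximizes the posterior probability density
given the observation y [8]. According to Bayes' rule, the MAP
estimate is
$$\hat{w}(y) = \arg\max_{w}\; p_n(y-w)\, p_w(w) \qquad (3)$$
The resulting soft-threshold estimate is
$$\hat{w}(y) = \operatorname{sign}(y)\left(|y| - \frac{\sqrt{2}\,\sigma_n^2}{\sigma}\right)_+ \qquad (4)$$
Let $d = |y| - \sqrt{2}\,\sigma_n^2/\sigma$; then
$$(d)_+ = \begin{cases} d, & d > 0 \\ 0, & d \le 0 \end{cases}$$
The noise variance $\sigma_n^2$ is obtained from Donoho's robust
median estimate of the noise standard deviation:
$$\hat{\sigma}_n = \frac{\operatorname{median}(|y(i)|)}{0.6745}, \quad y(i) \in D_1$$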
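The shrinkage rule and the median-based noise estimate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Laplacian scale `sigma` is assumed to be supplied by the caller (the text does not say how it is estimated), and the sample coefficients are made up.

```python
import math
from statistics import median


def estimate_noise_sigma(detail_coeffs):
    """Robust median estimate of the noise standard deviation."""
    return median(abs(c) for c in detail_coeffs) / 0.6745


def map_soft_threshold(y, sigma_n, sigma):
    """Soft-threshold shrinkage of one wavelet coefficient y.

    sigma_n: noise standard deviation.
    sigma:   scale of the Laplacian prior (assumed known here).
    """
    threshold = math.sqrt(2.0) * sigma_n ** 2 / sigma
    d = abs(y) - threshold
    if d <= 0:
        return 0.0                 # coefficient treated as noise
    return math.copysign(d, y)     # shrink towards zero, keep the sign


# Made-up detail coefficients: three small (noise-like), two large (edges).
coeffs = [0.2, -0.1, 3.0, -4.0, 0.05]
sigma_n = estimate_noise_sigma(coeffs)
denoised = [map_soft_threshold(c, sigma_n, sigma=1.0) for c in coeffs]
```

Small coefficients fall below the threshold and are set to zero, while large ones are only reduced by the threshold, which is why edges survive the denoising.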
IV. MOTION ESTIMATION
Motion estimation is an important part of object
detection. Based on motion estimation theory and the
characteristics of aerial video, we use the parameters of the
aircraft and camera, find the relationship between them and
the video image, and calculate a global motion vector to
compensate the result of motion estimation. Considering
jitter and other disturbing factors, we set a threshold to
make sure the compensation is useful for the subsequent
object detection.
A. Global Motion Estimation
We compute a global motion vector by analyzing the
aircraft motion, the camera pose and some other parameters.
Let $d_g = (x_g, y_g)$ be the global motion vector. It is
determined by aircraft motion parameters (such as
altitude, velocity and course), gimbal parameters (such as
posterior probability density given the observation y [8].
Fig. 1 Flow chart of wavelet denoising
By applying (1) and (2) to (3), we obtain a MAP
estimate that has the form of a soft threshold.
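$$p_w(w) = \frac{1}{\sqrt{2}\,\sigma} \exp\left(-\frac{\sqrt{2}\,|w|}{\sigma}\right) \qquad (1)$$
$$p_n(n) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left(-\frac{n^2}{2\sigma_n^2}\right) \qquad (2)$$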
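The step from (1), (2) and (3) to the soft threshold can be sketched as follows. This is a reconstruction under the notation assumed here ($\sigma_n$ for the noise standard deviation, $\sigma$ for the scale of the Laplacian prior), not a derivation taken from the original text.

```latex
% Log of the MAP objective, with the Gaussian noise density
% and the Laplacian coefficient density written out:
\[
\hat{w}(y) = \arg\max_{w}\,\bigl[\ln p_n(y-w) + \ln p_w(w)\bigr]
           = \arg\max_{w}\,\left[-\frac{(y-w)^2}{2\sigma_n^2}
             - \frac{\sqrt{2}\,|w|}{\sigma}\right]
\]
% Stationary point for w \neq 0:
\[
\frac{y-w}{\sigma_n^2} - \frac{\sqrt{2}}{\sigma}\operatorname{sign}(w) = 0
\quad\Longrightarrow\quad
w = y - \frac{\sqrt{2}\,\sigma_n^2}{\sigma}\operatorname{sign}(w)
\]
% This is consistent only if sign(w) = sign(y) and
% |y| > \sqrt{2}\sigma_n^2/\sigma; otherwise the maximum is at w = 0:
\[
\hat{w}(y) = \operatorname{sign}(y)
             \left(|y| - \frac{\sqrt{2}\,\sigma_n^2}{\sigma}\right)_+
\]
```

The threshold grows with the noise variance and shrinks as the signal scale $\sigma$ grows, so subbands with strong signal content are shrunk less.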
Aerial images have some unique characteristics, such as
jitter, blur and noise pollution, so preprocessing is
necessary to achieve better detection results.
When an aircraft flies at an altitude of hundreds of meters
or higher, the region it can survey is large, but the camera's
field of view is often narrow, approximately 1° to 10° [5].
III. PREPROCESSING
Aerial video images are polluted by many kinds of noise
caused by atmospheric disturbance, weather conditions,
illumination and some onboard devices during shooting. The
noise includes Gaussian noise, salt noise and so on. It
introduces errors into the subsequent motion estimation and
therefore degrades the object detection algorithm [6].
Denoising is often implemented through a series of
filters, but some spatial-domain methods, such as median
filtering, blur the edges of objects while eliminating
noise. Wavelet-domain denoising, on the other hand, can
preserve the edge information while filtering out the
noise [7]. According to the characteristics of the wavelet
transform, the low-frequency part represents the contours and
smooth regions of an image, while the high-frequency part
contains the image details and the noise.
In the preprocessing stage, we present a wavelet denoising
method based on Bayesian estimation. Considering the
difference between the noise coefficients and the real image
coefficients after the wavelet transform, a threshold is
estimated to distinguish noise coefficients from signal
coefficients; the wavelet coefficients of the noise are then
eliminated to achieve denoising.
Let $g = x + e$, where $g$ is the observed image, $x$ is the
real image and $e$ is Gaussian white noise with zero mean
and variance $\sigma_n^2$. After the wavelet transform this becomes
$$y = w + n$$
where $y$ and $w$ are the corresponding wavelet coefficients of
$g$ and $x$, and $n \sim N(0, \sigma_n^2)$. Wavelet-domain denoising
is the process of obtaining an estimate $\hat{w}(y)$ of the
coefficient $w$ from $y$. The flow chart of wavelet denoising is
shown in Fig. 1.
Assume the coefficient $w$ follows a Laplacian
distribution, as in equation (1).
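The additive model carries over to the wavelet domain because the transform is linear. The sketch below illustrates this with one level of a 1-D orthonormal Haar transform. The Haar wavelet, the function name `haar_step` and the sample row are assumptions made only for this illustration; the text does not state which wavelet is used.

```python
import math
import random


def haar_step(signal):
    """One level of the orthonormal 1-D Haar transform.

    Returns (approximation, detail) coefficient lists.
    len(signal) must be even.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [s * (signal[i] + signal[i + 1]) for i in pairs]
    detail = [s * (signal[i] - signal[i + 1]) for i in pairs]
    return approx, detail


random.seed(0)
x = [10.0, 10.0, 10.0, 50.0, 50.0, 50.0, 50.0, 10.0]  # clean row with two edges
e = [random.gauss(0.0, 2.0) for _ in x]                # Gaussian white noise
g = [xi + ei for xi, ei in zip(x, e)]                  # observed row, g = x + e

_, w = haar_step(x)  # detail coefficients of the clean signal
_, n = haar_step(e)  # detail coefficients of the noise
_, y = haar_step(g)  # detail coefficients of the observation
```

Here `y[k]` equals `w[k] + n[k]` up to rounding, and `w` is zero except at the two edges, which is the sparsity that a Laplacian prior is meant to capture.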
Fig. 3 Prediction of MV
The predicted motion vector (MVp) is the median of MV1,
MV2 and MV3 [12].
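A minimal sketch of the median prediction follows. It assumes the median is taken separately for the horizontal and vertical components, as in H.264 median prediction; the function name and the sample vectors are hypothetical.

```python
from statistics import median


def predict_mv(mv1, mv2, mv3):
    """Component-wise median of three neighboring motion vectors."""
    xs = [mv1[0], mv2[0], mv3[0]]
    ys = [mv1[1], mv2[1], mv3[1]]
    return (median(xs), median(ys))


# Made-up neighbor vectors (x, y) in pixels.
mvp = predict_mv((4, -2), (6, 0), (5, -7))
```

Note that the predicted vector need not equal any single neighbor: here the x component comes from the third vector and the y component from the first.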
After motion vector compensation, clustering based on
centers and density is performed to realize object detection.
Definition 1: Draw a circle with center O and radius R. If
any $|MV_c|_x$ in the circle satisfies
$\bigl|\,|MV_c|_{center} - |MV_c|_x\,\bigr| < T$, then $|MV_c|_x$
and $|MV_c|_{center}$ are called directly density reachable
with respect to (R, T).
Definition 2: Let C be a set of data objects. If there is a
chain $P_1, P_2, \ldots, P_n$ with $P_i \in C$ ($1 \le i \le n$)
such that $P_{i+1}$ is directly density reachable from $P_i$,
then $P_n$ is called density reachable from $P_1$.
Definition 3: All the density reachable objects that belong to
the same center are combined to form a layer.
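The matching criterion is also important for block matching.
Common criteria are MAD (mean absolute difference)
and MSE (mean squared error) [13]:
$$MAD(i,j) = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left|f_k(m,n) - f_{k-1}(m+i,\,n+j)\right|$$
$$MSE(i,j) = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left[f_k(m,n) - f_{k-1}(m+i,\,n+j)\right]^2$$
where $f_k$ and $f_{k-1}$ are the current and reference frames
and $M \times N$ is the block size.
The result of interframe motion estimation is a set of block
displacements, called MVs and denoted $d_{MV} = (x, y)$.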
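Block matching with the MAD criterion can be sketched as an exhaustive search. This is only an illustration: real encoders use faster search patterns, and the function names, the (row, column) displacement convention and the tiny synthetic frames are assumptions of this sketch.

```python
def mad(cur, ref, top, left, i, j, size):
    """Mean absolute difference between the block of `cur` at (top, left)
    and the block of `ref` displaced by (i, j) rows/columns."""
    total = 0
    for m in range(size):
        for n in range(size):
            total += abs(cur[top + m][left + n] - ref[top + m + i][left + n + j])
    return total / (size * size)


def best_match(cur, ref, top, left, size, search):
    """Full search over displacements in [-search, search]; returns the
    displacement (i, j) with the smallest MAD and that MAD value."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for i in range(-search, search + 1):
        for j in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if top + i < 0 or top + i + size > height:
                continue
            if left + j < 0 or left + j + size > width:
                continue
            cost = mad(cur, ref, top, left, i, j, size)
            if cost < best_cost:
                best, best_cost = (i, j), cost
    return best, best_cost


def make_frame(top, left, size=8):
    """8x8 black frame with a bright 2x2 patch at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 255
    return frame


ref = make_frame(2, 3)   # reference frame
cur = make_frame(3, 2)   # patch moved one row down, one column left
mv, cost = best_match(cur, ref, top=3, left=2, size=2, search=2)
```

The returned displacement points from the block in the current frame to its best match in the reference frame, so it has the opposite sign of the patch's own motion between the two frames.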
C. Motion Vector Compensation
Motion vector compensation in this paper differs from that
in compression algorithms. In a compression algorithm,
compensation is applied to blocks, while in this paper we
compensate the motion vectors.
We use the global motion vector to compensate the block
motion vectors computed through block matching. After
compensation, the motion vectors of background blocks
(blocks that contain background pixels) are greatly reduced,
which makes the motion vectors of object blocks stand out.
Because of errors from the devices, the global estimation
sometimes fails, and the compensation may then mislead us
into ignoring objects. The results of compensation therefore
need to be checked.
A threshold can distinguish invalid results from valid
ones. When most of the compensated MVs are larger than the
threshold, the compensation is abandoned. If only one
component of a compensated MV is larger than the threshold,
only the other component is used.
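Let the threshold be $Th = (i_0, j_0)$ and the compensated
motion vector be $MV_c = (\tilde{x}, \tilde{y})$. The
compensation is as follows:
$$\tilde{x} = \begin{cases} x - x_g, & |x - x_g| < i_0 \\ x, & |x - x_g| > i_0 \end{cases} \qquad \tilde{y} = \begin{cases} y - y_g, & |y - y_g| < j_0 \\ y, & |y - y_g| > j_0 \end{cases}$$
The threshold can be set according to practical needs.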
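The component-wise rule can be sketched as below. The function name, the threshold value and the sample vectors are hypothetical; the case where the difference equals the threshold is not defined in the text and is treated here as "not compensated".

```python
def compensate(mv, global_mv, threshold):
    """Compensate a block MV with the global MV, component by component.

    A component is compensated only when its difference from the global
    motion is below the threshold; otherwise it is left unchanged.
    """
    x, y = mv
    xg, yg = global_mv
    i0, j0 = threshold
    xc = x - xg if abs(x - xg) < i0 else x
    yc = y - yg if abs(y - yg) < j0 else y
    return (xc, yc)


global_mv = (9.0, 2.0)   # illustrative global motion vector
th = (4.0, 4.0)          # illustrative threshold (i0, j0)

background = compensate((8.5, 2.2), global_mv, th)   # close to global motion
moving = compensate((15.0, 2.0), global_mv, th)      # x differs strongly
```

The background vector shrinks to nearly zero, while the strongly differing x component of the second block is left as it is, so the block keeps a large magnitude.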
V. MOTION VECTOR CLUSTERING AND OBJECT DETECTION
Fig. 3 block labels: MV1, MV2, MV3, Current MB
$$x_g = m_1 + m_3 x + m_5 y + m_7 x^2 + m_8 xy$$
$$y_g = m_2 + m_4 x + m_6 y + m_7 xy + m_8 y^2 \qquad (5)$$
All eight parameters are required when there is
significant camera rotation; for closely related views the
quadratic transformation is sufficient to approximate the
global motion. If there is little change between frames, a
simpler equation may suffice to model the
displacement [10].
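The eight-parameter quadratic model can be evaluated directly. The sketch below assumes the parameters are passed in the order m1 to m8; the function name and the sample values are hypothetical.

```python
def global_motion(params, x, y):
    """Evaluate the eight-parameter quadratic motion model at pixel (x, y).

    x_g = m1 + m3*x + m5*y + m7*x^2 + m8*x*y
    y_g = m2 + m4*x + m6*y + m7*x*y + m8*y^2
    """
    m1, m2, m3, m4, m5, m6, m7, m8 = params
    xg = m1 + m3 * x + m5 * y + m7 * x * x + m8 * x * y
    yg = m2 + m4 * x + m6 * y + m7 * x * y + m8 * y * y
    return (xg, yg)


# With only m1 and m2 non-zero the model reduces to a pure translation,
# the "simpler equation" case for small changes between frames.
translation = (9.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
d_g = global_motion(translation, x=100, y=50)
```

In the translation case the same vector is returned for every pixel; the linear and quadratic terms only matter when the motion varies across the image.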
Fig. 2 shows two consecutive frames from an aerial
surveillance video, both of size 352 × 288. Measured from
the images, the background motion is 8.5 pixels up and
2.2 pixels left, while the GPS information and other
equipment yield a computed background motion of (9.0, 2.0).
(a) (b)
Fig. 2 An example of global motion
B. Interframe Motion Estimation
Interframe motion estimation is widely used in video
compression and coding. The basic principle is to find, for
each block of the current frame, the best matching block
within a certain search range in the previous or the next
frame, and to take the block displacement as the block motion
vector (MV) [11]. In blocks that contain no object pixels, the
motion vector represents the background motion. For aerial
surveillance video, the background motion is related to the
motion of the aircraft and the camera.
Taking H.264 coding as an example, to save transmission
bits the MV is first predicted from the MVs of neighboring
encoded blocks, as shown in Fig. 3.
rotation) and camera parameters (such as zooming and
panning). All of these parameters can be obtained from
independent onboard devices such as an altimeter, or from
synthesized information, for example GPS (Global
Positioning System) [9].
The quadratic function in equation (5) can model the
displacement field with respect to a distant scene for
simple camera motions and stable aircraft motion.
(e) (f)
Fig. 4 Experiment and results
The experiments show that objects can be detected using
motion vector compensation and clustering, and that the
denoising preprocessing based on Bayesian estimation is
necessary and effective.
REFERENCES
[1] Rakesh Kumar, Harpreet Sawhney, Supun Samarasekera et al. Aerial
Video Surveillance and Exploitation[J]. Proceedings of the IEEE, 2001,
89(10): 1518-1520.
[2] H. Tao, H. S. Sawhney, and R. Kumar, "Dynamic layer representation
with applications to tracking," in Proc. IEEE Conf. Computer
Vision and Pattern Recognition.
[3] Nair D, Aggarwal J K. Recognition of targets by parts in second
generation forward looking infrared images[J]. Image and Vision
Computing, 2000, 18(11): 849-864.
[4] Bors Adrian G, Pitas Ioannis. Prediction and tracking of moving
objects in image sequences[J]. IEEE Transactions on Image Processing,
2000, 9(8): 1441-1445.
[5] Paul Robertson. Adaptive Image Analysis for Aerial Surveillance.
IEEE Intelligent Systems, Vol. 14, Issue 3, pp. 30-36, 1999.
[6] M. Mahmoudi and G. Sapiro. Fast image and video denoising via
nonlocal means of similar neighborhoods. IEEE Signal Processing
Letters, 12(12): 839-842, 2005.
[7] N. Lian, V. Zagorodnov, and Y. Tan, "Video denoising using vector
estimation of wavelet coefficients," in Proc. IEEE Int. Symp. Circuits and
Systems, pp. 2673-2676, May 2006.
[8] T. S. Jaakkola and M. I. Jordan. Bayesian parameter estimation via
variational methods. Statistics and Computing, 10: 25-37, 2000.
[9] Hong L, W. C. Wang et al. Multiplatform Multi-sensor Fusion with
Adaptive-Rate Data Communication[J]. IEEE Trans. on Aerospace and
Electronic Systems, 1997, 33(1): 123-126.
[10] J. R. Bergen, P. Anandan, K. Hanna, and R. Hingorani, "Hierarchical
model-based motion estimation," in Proc. Eur. Conf. Computer Vision,
1992.
[11] Wiegand T, Sullivan G J. The H.264/MPEG-4 AVC Video Coding
Standard[S]. IEEE, 2004.
Considering that $|MV_c|_{background} < |MV_c|_{object}$ most of
the time, and that background motion vectors are very
numerous, we select $|MV_c|_{max}$ as the first start point
(center). Let R be the search range; we then obtain the
directly density reachable cluster $S_{01}$. Using all of the
$|MV_c|$ in $S_{01}$ as start points to continue searching, a
cluster $S_{02}$ is acquired from all the search results. This
process continues until nothing more is density reachable in
the R neighborhood. All the $|MV_c|$ from $S_{01}$ to $S_{MN}$
are united to form a layer $L_1$, where M and N are integers in
$[0, +\infty)$. Then $|MV_c|_{max}$ is selected from the
remaining $|MV_c|$ as the new start point to get
$L_2, \ldots, L_W$, until all the $|MV_c|$ have been processed.
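The layer-building procedure can be read as region growing over the compensated MV magnitudes. The sketch below is one possible interpretation, not the authors' code: blocks are indexed by (row, column), R is measured in block units, each newly reached block becomes a center for further search, and all names and sample values are hypothetical.

```python
import math


def cluster_layers(magnitudes, radius, tol):
    """Group blocks into layers of density reachable MV magnitudes.

    magnitudes: dict mapping block position (row, col) to |MVc|.
    radius:     search range R in block units.
    tol:        magnitude tolerance T.
    """
    remaining = dict(magnitudes)
    layers = []
    while remaining:
        # Each layer starts from the largest remaining magnitude.
        start = max(remaining, key=remaining.get)
        del remaining[start]
        layer = {start}
        frontier = [start]
        while frontier:
            center = frontier.pop()
            reachable = [
                pos for pos, mag in remaining.items()
                if math.hypot(pos[0] - center[0], pos[1] - center[1]) <= radius
                and abs(magnitudes[center] - mag) < tol
            ]
            for pos in reachable:
                del remaining[pos]
                layer.add(pos)
                frontier.append(pos)   # reached blocks become new centers
        layers.append(layer)
    return layers


# 3x3 grid of blocks: small background magnitudes, two object blocks.
mags = {(r, c): 0.5 for r in range(3) for c in range(3)}
mags[(1, 1)] = 6.0
mags[(1, 2)] = 6.3
layers = cluster_layers(mags, radius=1.5, tol=1.0)
```

With these values the first layer contains the two object blocks and the second layer contains the seven background blocks, so the object can be outlined as the contour of the first layer.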
Object detection can be realized by highlighting the
contour of each layer. In practice, further processing, such
as eliminating isolated layers, can improve the result.
Scenes that contain more than one object pose no problem
as long as the object motions differ from the background
motion.
VI. EXPERIMENTS AND CONCLUSION
An aerial video object detection algorithm is proposed in
this paper. We use a road surveillance video sequence to
test our method. The image size is 352 × 288 and the frame
rate is 15 frames/second. Fig. 4(a) shows a noisy frame
(the 7th frame) of the sequence, and Fig. 4(b) is the frame
following (a) after denoising. Object detection has not been
applied to either (a) or (b). We can see that most of the
noise is eliminated while object edges are not blurred.
Fig. 4(c) is the motion vector image computed by H.264
motion estimation for part of Fig. 4(b), while Fig. 4(d)
shows the image after motion vector compensation. Comparing
the magnitudes, almost all the values in Fig. 4(c) are the
same, while in Fig. 4(d) the object motion vectors are
larger than the background motion vectors after the
compensation.
Fig. 4(e) and Fig. 4(f) are the 36th and 50th frames in the
sequence; the object is outlined without any false
detection.
(a) (b) (c) (d)