Aerial Video Image Object Detection and Tracing Based on Motion Vector Compensation and Statistic Analysis

Jiang Zhe
School of Electronics and Information Engineering, Beijing University of Aeronautics and Astronautics, Beijing, China. ginzhe@163.com

Ding Wenrui
Research Institute of Unmanned Aerial Vehicle, Beijing University of Aeronautics and Astronautics, Beijing, China. ding@buaa.edu.cn

Li Hongguang
School of Mechanical Engineering and Automation, Beijing University of Aeronautics and Astronautics, Beijing, China

Abstract: Object detection in aerial video is important in many applications. Based on a study of the characteristics of aerial video images and an analysis of motion estimation theory, an object detection technique based on motion vector compensation and statistical analysis is proposed. First, effective image preprocessing removes the noise that degrades the algorithm: a wavelet denoising method based on Bayesian estimation. Then the motion vectors produced by the block motion estimation used in common video compression algorithms are compensated with a global motion vector computed from the camera parameters; this compensation makes object motion easy to distinguish from background motion. Finally, a statistical analysis, a clustering of the compensated motion vectors by center and density, eliminates isolated motion vectors and realizes object detection. A threshold guards against errors in the global estimate derived from the parameters used for compensation, and also assists the motion vector clustering and object detection.

I. INTRODUCTION

Aerial surveillance using video cameras is now widely studied and of broad interest [1]. Object detection and tracking are important tasks in aerial surveillance [2].
A basic object detection system is composed of several parts: image capture, preprocessing, detection or classification, and post-processing. There are many ways to realize object detection, most of which use image characteristics, so the extraction of image characteristics is crucial in a detection algorithm. The most widely used characteristics are as follows [3][4]:

(a) Shape/structure characteristics. Generally, shape/structure characteristics are extracted from a binary image, either by segmenting the image to obtain the object or region of interest, or by boundary extraction to acquire the image contours.

(b) Motion characteristics. By modeling object motion we can obtain the motion characteristics of moving objects and then perform recognition or detection, although building such a model is difficult most of the time. More specifically, there are two ways to realize object detection: one is based on motion behavior, the other on motion information such as motion vectors.

(c) Gray-level distribution characteristics. Texture or other object-related information is acquired by analyzing the variation of gray values. Approaches to texture analysis include operator-based texture extraction, statistics-based texture analysis, etc.

This paper presents a moving object detection method based on motion vector compensation and analysis. We first study the characteristics of aerial images, and then propose a motion vector processing method based on camera motion estimation and compensation. Motion vector statistical analysis and clustering are investigated to realize object detection and eliminate isolated erroneous vectors. A preprocessing step based on Bayesian statistics is used for denoising.

978-1-4244-4669-8/09/$25.00 ©2009 IEEE

II. AERIAL VIDEO IMAGE CHARACTERISTICS

Aerial video images are acquired by a camera mounted within gimbals on board an aircraft.
The quality of the video image is associated with the stability of the aircraft motion and the precision of the camera parameters. The camera gimbals isolate the camera from aircraft vibration; it is therefore possible to estimate the background motion of the video image from the camera parameters and to obtain the displacement between frames exactly. This may be the most distinctive property of aerial video, since for other video sequences we never know what the next frame will be.
Aerial images also have some unique characteristics such as dithering, blur, and noise pollution, so preprocessing is necessary to attain a better detection result. When an aircraft flies at hundreds of meters or higher, the region it can surveil is large, but the camera field of view is often narrow, approximately 1° to 10° [5].

III. PREPROCESSING

Aerial video images are polluted by many kinds of noise because of atmospheric disturbance, weather conditions, illumination, and some onboard devices during shooting. The noise includes Gaussian noise, salt noise, and so on. This noise introduces errors into the subsequent motion estimation and therefore degrades the object detection algorithm [6].

Denoising is often implemented with a series of filters, but spatial-domain methods such as median filtering blur object edges while eliminating noise. Wavelet-domain denoising, on the other hand, can preserve edge information while filtering out the noise [7]. In the wavelet transform, the low-frequency subband represents the contour and smooth part of an image, while the high-frequency subbands contain image detail and noise.

For the preprocessing stage we present a wavelet denoising method based on Bayesian estimation. Exploiting the difference between noise coefficients and true image coefficients after the wavelet transform, a threshold is estimated to distinguish noise coefficients from signal coefficients; the noise coefficients are then suppressed to achieve the denoising effect.

Let g = x + e, where g is the observed image, x is the true image, and e is Gaussian white noise with zero mean and variance σ_n². The wavelet transform gives

y = w + n

where y and w are the wavelet coefficients of g and x respectively, and n ~ N(0, σ_n²). Wavelet-domain denoising is the process of obtaining an estimate ŵ(y) of the coefficient w from y. The flow of the wavelet denoising is shown in Fig. 1.

[Fig. 1: flow chart of wavelet denoising]

Assume the distribution of the coefficients w is Laplacian:

    p_w(w) = (1/(√2·σ)) · exp(−√2·|w|/σ)    (1)

The probability density function of the noise coefficients is

    p_n(n) = (1/(√(2π)·σ_n)) · exp(−n²/(2σ_n²))    (2)

MAP estimation is a common method in Bayesian estimation: it finds the w that maximizes the posterior probability density given the observation y [8]. According to Bayes' rule, the MAP estimate is

    ŵ(y) = argmax_w [ p_n(y − w) · p_w(w) ]    (3)

Applying (1) and (2) to (3), we obtain a MAP estimate in the form of a soft threshold:

    ŵ(y) = sign(y) · (|y| − √2·σ_n²/σ)₊    (4)

where, letting d = |y| − √2·σ_n²/σ, (d)₊ equals d for d > 0 and 0 otherwise. The noise standard deviation is obtained by Donoho's robust median estimator,

    σ_n = median(|y(i)|) / 0.6745,  y(i) ∈ D₁.
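As an illustration of the soft-threshold rule (4), the sketch below denoises a 1-D signal. The single-level Haar transform, the Laplacian scale estimate σ² ≈ var(y) − σ_n², and all function names are assumptions made for this example; the paper's actual transform and decomposition depth are not specified here.

```python
import numpy as np

def haar_dwt1d(x):
    """Single-level orthonormal Haar transform (x must have even length)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (high-pass)
    return a, d

def haar_idwt1d(a, d):
    """Inverse of haar_dwt1d."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def bayes_soft_denoise(g):
    """Apply the MAP soft threshold of equation (4) to the detail band."""
    a, d = haar_dwt1d(g)
    # Donoho robust median estimate of the noise level from the detail band
    sigma_n = np.median(np.abs(d)) / 0.6745
    # Assumed signal-scale estimate: var(y) = sigma^2 + sigma_n^2
    sigma = np.sqrt(max(np.var(d) - sigma_n**2, 1e-12))
    t = np.sqrt(2.0) * sigma_n**2 / sigma          # threshold of (4)
    d_hat = np.sign(d) * np.maximum(np.abs(d) - t, 0.0)  # soft shrinkage
    return haar_idwt1d(a, d_hat)
```

Because the transform is orthonormal, shrinking only the detail coefficients suppresses high-frequency noise while the smooth (contour) part of the signal passes through unchanged.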
IV. MOTION ESTIMATION

Motion estimation is an important part of object detection. Based on motion estimation theory and the characteristics of aerial video, we make use of the aircraft and camera parameters, find the relationship between them and the video image, and calculate a global motion vector with which to compensate the result of block motion estimation. Considering dithering and other disturbing factors, we set a threshold to ensure that the compensation is valuable for the subsequent object detection.

A. Global Motion Estimation

We compute a global motion vector by analyzing the aircraft motion, the camera pose, and other parameters. Let d_g = (x_g, y_g) be the global motion estimation vector. It is determined by the aircraft motion parameters (such as altitude, velocity, and course), the gimbals parameters (such as rotation), and the camera parameters (such as zooming and panning). All of these parameters can be obtained from independent onboard devices such as an altimeter, or from synthesized information such as GPS (Global Positioning System) [9].

For simple camera motion and stable aircraft motion, the displacement field with respect to a distant scene can be modeled by the quadratic function in (5):

    x_g = m1 + m3·x + m5·y + m7·x² + m8·x·y
    y_g = m2 + m4·x + m6·y + m7·x·y + m8·y²    (5)

All eight parameters are required when there is significant camera rotation; for closely related views the quadratic transformation is sufficient to approximate the global motion model, and if there is little change between frames a simpler equation may suffice to model the displacement [10].

Fig. 2 shows two consecutive frames from an aerial surveillance video, both of size 352 × 288. By counting pixels, the background motion is 8.5 pixels up and 2.2 pixels left, while the GPS information and other equipment yield a background motion of (9.0, 2.0).

[Fig. 2: an example of global motion, frames (a) and (b)]
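The quadratic model (5) can be sketched directly in code. The least-squares fitting helper is an illustrative addition (the paper obtains the model from onboard sensor parameters rather than from point correspondences), and the function names are assumptions.

```python
import numpy as np

def global_motion(m, x, y):
    """Displacement (x_g, y_g) at pixel (x, y) under the quadratic
    model of equation (5); m = (m1, ..., m8)."""
    m1, m2, m3, m4, m5, m6, m7, m8 = m
    xg = m1 + m3 * x + m5 * y + m7 * x * x + m8 * x * y
    yg = m2 + m4 * x + m6 * y + m7 * x * y + m8 * y * y
    return xg, yg

def fit_global_motion(pts, disp):
    """Least-squares fit of (m1..m8) from point correspondences.
    pts: pixel coordinates (x, y); disp: measured displacements (xg, yg)."""
    rows, rhs = [], []
    for (x, y), (xg, yg) in zip(pts, disp):
        rows.append([1, 0, x, 0, y, 0, x * x, x * y]); rhs.append(xg)
        rows.append([0, 1, 0, x, 0, y, x * y, y * y]); rhs.append(yg)
    m, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                            rcond=None)
    return m
```

A pure translation such as the GPS estimate of Fig. 2 corresponds to m = (9.0, 2.0, 0, ..., 0), for which the field is constant over the whole frame.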
B. Interframe Motion Estimation

Interframe motion estimation is widely used in video compression and coding. The basic principle is to find, for each block of the current frame, the best matching block within a certain search range in the previous (or following) frame, and to take the block displacement as the block motion vector (MV) [11]. In blocks that contain no object pixels, the motion vector represents the background motion; in aerial surveillance video this background motion is related to the motion of the aircraft and camera.

Take H.264 coding as an example: to save transmission bits, the MV is first predicted from the MVs of neighboring encoded blocks, as shown in Fig. 3. The predicted motion vector (MVp) is the median of MV1, MV2, and MV3 [12].

[Fig. 3: prediction of MV from the neighbors MV1, MV2, MV3 of the current MB]

The matching rule for block matching is also important; common rules are the mean absolute difference (MAD) and the mean squared error (MSE) [13]:

    MAD(i, j) = (1/(M·N)) · Σ_{m=1..M} Σ_{n=1..N} |f_k(m, n) − f_{k−1}(m + i, n + j)|

    MSE(i, j) = (1/(M·N)) · Σ_{m=1..M} Σ_{n=1..N} [f_k(m, n) − f_{k−1}(m + i, n + j)]²

where f_k is the current frame, f_{k−1} the reference frame, and M × N the block size. The result of interframe motion estimation is the set of block displacements, the MVs, recorded as d_MV = (x, y).
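A minimal full-search block matcher under the MAD rule might look as follows; the block size, search range, and function names are illustrative choices, not values taken from the paper.

```python
import numpy as np

def mad(block, cand):
    """Mean absolute difference between two equally sized blocks."""
    return np.mean(np.abs(block.astype(float) - cand.astype(float)))

def block_motion_vector(prev, cur, top, left, bsize=16, search=8):
    """Full-search block matching: MV (i, j) for the block of `cur`
    whose top-left corner is (top, left), searched in `prev` within
    +/- `search` pixels under the MAD rule."""
    block = cur[top:top + bsize, left:left + bsize]
    best, best_mv = np.inf, (0, 0)
    for i in range(-search, search + 1):         # vertical offset
        for j in range(-search, search + 1):     # horizontal offset
            t, l = top + i, left + j
            if (t < 0 or l < 0 or
                    t + bsize > prev.shape[0] or l + bsize > prev.shape[1]):
                continue                          # candidate out of frame
            cost = mad(block, prev[t:t + bsize, l:l + bsize])
            if cost < best:
                best, best_mv = cost, (i, j)
    return best_mv
```

In a codec the exhaustive search is usually replaced by a fast search seeded at MVp, but the exhaustive form shows the principle most plainly.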
C. Motion Vector Compensation

Motion vector compensation here is not the same as in compression algorithms. In compression, compensation is applied to blocks; in this paper we compensate the motion vectors themselves. The global motion vector is subtracted from each block motion vector computed by block matching. After compensation, the motion vectors of background blocks (blocks containing only background pixels) shrink considerably, which makes the motion vectors of object blocks stand out. However, because of device errors the global estimation sometimes fails, and the compensation may then mislead us into ignoring objects, so the compensation result must be judged.

A threshold distinguishes invalid results from effective ones. When most of the compensated MVs are larger than the threshold, the compensation is discarded; if only one component of a compensated MV exceeds the threshold, only the other component is used. Let the threshold be Th = (i0, j0) and the compensated motion vector MVc = (x̂, ŷ); the compensation is

    x̂ = x − x_g if |x − x_g| < i0, otherwise x̂ = x
    ŷ = y − y_g if |y − y_g| < j0, otherwise ŷ = y

The setting of the threshold can be varied according to practical needs.
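The component-wise compensation and the give-up judgment above can be sketched as below. The fraction used to decide that "most" compensated MVs exceed the threshold (`give_up_ratio`) is an assumed parameter, and the function names are illustrative.

```python
def compensate_mv(mv, d_g, th):
    """Subtract the global vector component-wise, but keep the raw
    component when the residual exceeds the threshold (i0, j0)."""
    (x, y), (xg, yg), (i0, j0) = mv, d_g, th
    xh = x - xg if abs(x - xg) < i0 else x
    yh = y - yg if abs(y - yg) < j0 else y
    return (xh, yh)

def compensate_field(mvs, d_g, th, give_up_ratio=0.5):
    """Compensate all block MVs; if most of the compensated MVs are
    still above the threshold, the global estimate is judged invalid
    and the compensation is given up."""
    comp = [compensate_mv(mv, d_g, th) for mv in mvs]
    big = sum(1 for (xh, yh) in comp
              if abs(xh) >= th[0] or abs(yh) >= th[1])
    return mvs if big > give_up_ratio * len(comp) else comp
```

With an accurate global estimate, background MVs collapse toward (0, 0) while object MVs keep a large residual, which is exactly what the later clustering stage exploits.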
V. MOTION VECTOR CLUSTERING AND OBJECT DETECTION

After motion vector compensation, a clustering based on centers and density is performed to realize object detection.

Definition 1. Draw a circle with center O and radius R. If any |MVc|_x in the circle satisfies ||MVc|_center − |MVc|_x| < T, then |MVc|_x and |MVc|_center are directly density reachable with respect to (R, T).

Definition 2. Let C be a set of data objects. If there is a chain p1, p2, ..., pn with pi ∈ C (1 ≤ i ≤ n) such that p_{i+1} is directly density reachable from p_i, then p_n is density reachable from p_1.

Definition 3. All density-reachable objects belonging to the same center are combined to form a layer.

Since, after compensation, |MVc|_background < |MVc|_object most of the time and background motion vectors are far more numerous, we select |MVc|_max as the first start point (center). With search range R we obtain the directly density reachable cluster S01. Using all the |MVc| in S01 as start points, the search continues and yields a cluster S02; this process repeats until nothing is density reachable within the R-neighborhood. All the |MVc| from S01 to S_MN (M, N non-negative integers) are united to form a layer L1. Then |MVc|_max is selected from the remaining |MVc| as the new start point to build L2, and so on up to Lw, until all the |MVc| have been processed.

Object detection is then realized by highlighting the contour of each layer. In practice, further processing, such as eliminating isolated layers, can improve the result. More than one object in the scene poses no problem as long as the objects' motions differ from the background.
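The layer-forming procedure can be sketched as a greedy breadth-first clustering. Representing each block by its grid position and its magnitude |MVc|, and the particular R and T values, are assumptions made for this illustration.

```python
import numpy as np
from collections import deque

def cluster_layers(positions, mags, R=1.5, T=1.0):
    """Greedy density clustering of compensated-MV magnitudes.
    positions: (N, 2) block centers; mags: (N,) values of |MVc|.
    Each layer is seeded at the largest remaining magnitude and grown
    by BFS over points within radius R whose magnitude differs from
    the current chain point's by less than T (Definitions 1 and 2)."""
    unassigned = set(range(len(mags)))
    layers = []
    while unassigned:
        seed = max(unassigned, key=lambda i: mags[i])   # |MVc|_max start
        layer, queue = {seed}, deque([seed])
        unassigned.discard(seed)
        while queue:
            c = queue.popleft()
            for i in list(unassigned):
                near = np.hypot(*(positions[i] - positions[c])) <= R
                if near and abs(mags[i] - mags[c]) < T:
                    layer.add(i); queue.append(i); unassigned.discard(i)
        layers.append(sorted(layer))                     # Definition 3
    return layers
```

On a compensated field, the first layer collects the large-magnitude object blocks, and subsequent layers sweep up the near-zero background; tiny isolated layers can then be discarded as suggested above.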
VI. EXPERIMENTS AND CONCLUSION

An aerial video image object detection algorithm is proposed in this paper. We use a road surveillance video sequence to test the method. The image size is 352 × 288 and the frame rate is 15 frames/second. Fig. 4(a) shows a frame (the 7th) of the noisy sequence; Fig. 4(b) is the next frame after denoising. Neither (a) nor (b) has been through the object detection process. Most of the noise is eliminated while object edges are not blurred. Fig. 4(c) is the motion vector field computed by H.264 motion estimation for part of Fig. 4(b), and Fig. 4(d) shows the field after motion vector compensation. Comparing vector magnitudes, the values in Fig. 4(c) are almost all the same, while in Fig. 4(d) the object motion vectors are clearly larger than the background motion vectors after compensation. Fig. 4(e) and Fig. 4(f) are the 36th and 50th frames of the sequence; the object is outlined without any false detection.

[Fig. 4: experiment and results, panels (a)-(f)]

The experiments show that object detection can be achieved by motion vector compensation and clustering, and that a denoising preprocessing step based on Bayesian estimation is necessary and effective.

REFERENCES

[1] R. Kumar, H. Sawhney, S. Samarasekera et al., "Aerial video surveillance and exploitation," Proceedings of the IEEE, vol. 89, no. 10, pp. 1518-1520, 2001.
[2] H. Tao, H. S. Sawhney, and R. Kumar, "Dynamic layer representation with applications to tracking," in Proc. IEEE Conf. Computer Vision and Pattern Recognition.
[3] D. Nair and J. K. Aggarwal, "Recognition of targets by parts in second generation forward looking infrared images," Image and Vision Computing, vol. 18, no. 11, pp. 849-864, 2000.
[4] A. G. Bors and I. Pitas, "Prediction and tracking of moving objects in image sequences," IEEE Transactions on Image Processing, vol. 9, no. 8, pp. 1441-1445, 2000.
[5] P. Robertson, "Adaptive image analysis for aerial surveillance," IEEE Intelligent Systems, vol. 14, no. 3, pp. 30-36, 1999.
[6] M. Mahmoudi and G. Sapiro, "Fast image and video denoising via nonlocal means of similar neighborhoods," IEEE Signal Processing Letters, vol. 12, no. 12, pp. 839-842, 2005.
[7] N. Lian, V. Zagorodnov, and Y. Tan, "Video denoising using vector estimation of wavelet coefficients," in Proc. IEEE Int. Symp. Circuits and Systems, pp. 2673-2676, May 2006.
[8] T. S. Jaakkola and M. I. Jordan, "Bayesian parameter estimation via variational methods," Statistics and Computing, vol. 10, pp. 25-37, 2000.
[9] L. Hong, W. C. Wang et al., "Multiplatform multi-sensor fusion with adaptive-rate data communication," IEEE Trans. on Aerospace and Electronic Systems, vol. 33, no. 1, pp. 123-126, 1997.
[10] J. R. Bergen, P. Anandan, K. Hanna, and R. Hingorani, "Hierarchical model-based motion estimation," in Proc. Eur. Conf. Computer Vision, 1992.
[11] T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC video coding standard," IEEE, 2004.