CVGIP 2010 Part 2



CVGIP 2010 The 23th IPPR Conference on Computer Vision, Graphics, and Image Processing



  1. Robust Abandoned Object Detection based on Life-cycle State Measurement

     Wei-Hsin Hsu (徐維忻), Hung-I Pai (白宏益), Shen-Zheng Wang (王舜正), San-Lung Zhao (趙善隆), Kung-Ming Lan (藍坤銘)
     Identification and Security Technology Center, Systems Development and Solutions Division, Industrial Technology Research Institute, Hsin-Chu, Taiwan

     ABSTRACT

     In public areas, objects may be abandoned through carelessness or as part of a terrorist attack. If a video surveillance system can detect abandoned objects automatically, forgotten objects can be returned to their owners and terrorist attacks can be stopped. In this paper, we propose an automatic abandoned object detection algorithm to meet this requirement. The algorithm includes a double-background framework, which generates two background models at different sampling rates. The framework rapidly extracts un-moving object candidates, and abandoned objects are then selected from those candidates according to their appearance features. Finally, this paper proposes a finite state machine, the Life-cycle State Measurement (LCSM) module, to prevent missed detections caused by occlusion or illumination changes. When an abandonment event occurs, the LCSM module can raise the alarm at the start of the event and stop it at the end. To evaluate performance, we test the algorithm on 10 videos. The experimental results show that the algorithm is feasible, since the false alarm rate and the miss rate are very low.

     Keywords: Pure background; Instant background; un-moving object aggregated map; LCSM

     1. INTRODUCTION

     Video surveillance has become an important issue. Following advances in technology, cameras have been installed in many places, including public areas, building interiors, and even public roadways. However, human operators cannot monitor every camera continuously because of limited human resources. In this study, we propose a new abandoned object detection technology that receives the video stream from a camera and then detects abandoned objects within a few minutes. The technology can be used in applications such as detection of dangerous abandoned objects and detection of luggage left behind by passengers. Moreover, the system reduces the personnel required.

     [1] provides a two-background framework (long-term and short-term). The framework uses a pair of backgrounds with different characteristics to segment the related foregrounds and extract abandoned objects. Our background-model construction follows [1] and attempts to fix its shortcomings. The advantage of the method in [1] is that it does not track every object, which saves the computational cost of tracking. However, the method is not perfect. The temporal rate of background construction is important: in [1], the long-term and short-term backgrounds are updated over a period of time, and if the long-term background is updated before the abandoned object is detected, detection fails and the result is affected. In this study, we try to find the best update timing to reduce the risk of detection failure caused by long-term background updating. The details are described in Section 2.

     [2] provides a two-level background framework that uses linear combination, which belongs to the single-background family of methods, and tracks each moving object. The framework uses optical flow to detect moving objects, because the optical flow of an object changes when the object moves. Moving and static objects are therefore easy to distinguish. Static objects and stationary humans are separated by human pattern recognition. However, the method still has limits for filtering. For example, it is hard to cover every possible shape of a still human, so enough human pattern templates are needed. As the number of templates grows, the recognition rate increases, but so does the computational cost. The approach is therefore unsuitable when performance is the priority. In Section 2, we propose a method that filters objects by object feature filtering to avoid the performance problem caused by the human pattern method.

  2. [3] provides a two-background method (current and buffered background) that tracks all objects and records each object's information to determine whether it is occluded. Its advantage is that it stays locked on the target even when the target is occluded. In Section 2, we refine the idea from [3] and present the Life-cycle State Measurement (LCSM), which makes the handling of each abandoned object more convincing across different environments and the detection result more reasonable.

     2. ABANDONED OBJECT DETECTION SYSTEM

     Abandoned object detection based on object tracking is feasible. However, tracking-based methods become inefficient when too many objects are tracked, because they track not only abandoned objects but all other objects as well. The computation is therefore heavy. To avoid this problem, we use a new method based on different sampling rates instead of tracking-based technology. In this paper, we divide the system framework into three technological processes. First, we obtain the pure background BP(t) and the instant background BI(t) at different sampling rates at time t. BP(t) is the original background without any moving objects, and BI(t) is the background obtained from the video sequence over a short period. Frame differences between the current frame BC(t) and BP(t), BI(t) give the pure foreground FP(t) and the instant foreground FI(t). Following the rules in Section 2, we extract the un-moving objects in the current frame from FP(t) and FI(t), which gives the current un-moving object map St(t). Second, the values of each pixel across successive St(t) are aggregated into the un-moving object aggregated map S. This removes objects that remain only for a short period. When some objects in S accumulate beyond the threshold, we extract them from S and save them. Those objects are then filtered by feature filtering, which includes Shape, Compactness, and Area. The final process is the most important issue in this paper: the Life-cycle State Measurement (LCSM). The concept of LCSM comes from software engineering, where it originally refers to the life cycle of software. This paper applies the idea to abandoned object detection, so that an abandoned object has a different state in each situation. In this paper, the states are the growing state, the stable state, the aging state, and the dead state. We assign the proper state to each abandoned object in situations such as occlusion or removal, which makes the processing of abandoned objects more reasonable. This issue is discussed in Section 2. Fig. 1 shows the definitions of all the symbols and the relationships among them.

     Fig. 1. Symbol definitions and flow

     Fig. 2. System Overview

     The system consists of three modules, as shown in Fig. 2. The first module, un-moving object detection, is composed of foreground detection, un-moving object decision, and real un-moving object extraction by aggregation. The second module, abandoned object decision, clusters the image pixels of un-moving objects from the un-moving object aggregated map received from module 1. Those un-moving objects are then filtered by the object feature filtering method, and the abandoned objects are decided. The final module, life cycle of abandoned object, decides how long an abandoned object event persists according to a finite state machine.

     The three main technologies corresponding to these modules are presented in the following.
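The frame-difference step mentioned above (the current frame BC(t) compared against BP(t) and BI(t)) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `frame_difference`, the threshold value, and the convention that 1 marks a pixel that differs from the background are all assumptions of this sketch, and the paper's FP(t)/FI(t) maps may use the opposite encoding.

```python
def frame_difference(frame, background, threshold):
    """Binary map: 1 where |frame - background| > threshold, else 0.

    The threshold and the 1 = "differs" polarity are assumptions of this sketch.
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


# Toy 2x3 grayscale images (values 0..255), hypothetical data.
pure_bg = [[10, 10, 10], [10, 10, 10]]       # stands in for BP(t)
instant_bg = [[10, 200, 10], [10, 10, 10]]   # stands in for BI(t); already holds a parked object
current = [[10, 200, 10], [10, 10, 90]]      # stands in for BC(t)

diff_pure = frame_difference(current, pure_bg, threshold=30)
diff_instant = frame_difference(current, instant_bg, threshold=30)
print(diff_pure)     # differs from the pure background at the parked and the moving pixel
print(diff_instant)  # differs from the instant background only at the moving pixel
```

In this toy case the parked object shows up in only one of the two difference maps, which is the property the double-background framework relies on to separate un-moving from moving objects.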
  3. 2.1 The un-moving object decision

     As discussed in the Introduction, abandoned object detection based on object tracking can cause performance problems, because most of the computation is spent tracking moving objects. In [1] and [3], double backgrounds are proposed to remove moving objects and retain un-moving objects. This performs better than tracking-based methods.

     2.1.1. Background updating

     From the video sequence, we extract frames at different intervals as the update sources of the two background models BP(t) and BI(t), as shown in Fig. 3:

     Fig. 3. Pure and Instant sampling rate illustration

     BP(t) is an image kept pure, without any moving objects, over a long period, and BI(t) is an image captured at a sampling rate with a short period. Objects are expected to appear in BI(t) when an abandoned object event happens. The current frame at time t is denoted by BC(t). After the background model BP(t) is estimated, a foreground map FP(t) that includes both moving and un-moving objects is easily obtained by computing the frame difference between BP(t) and BC(t). To extract un-moving objects from FP(t), we extract another foreground map FI(t) that includes only moving objects, as shown in Fig. 4. FI(t) is obtained by computing the frame difference between BI(t) and BC(t). If an un-moving object stays in a position, the value at the object position in FP(t) should be 0 and the value at the object position in FI(t) should be 1. This makes it easy to extract the currently un-moving objects. In our experiments, the sampling period of BP(t) is about 25 frames and that of BI(t) is about 15 frames.

     Because of illumination variations, updating BP(t) is difficult. In general, BP(t) could be updated from BC(t). However, it is hard to keep the background pure when using BC(t), because BC(t) may contain moving or un-moving objects, which are then easily updated into BP(t). To keep the background pure, we select the foreground map FP(t) as a mask and then use linear combination to combine the masked pixels of BP(t) and BC(t). This avoids updating objects into BP(t) and adapts to illumination changes. In practice, we cannot guarantee the accuracy of FP(t), so some noise is inevitably updated into BP(t). This noise grows quickly when the update frequency is raised. However, when the update frequency of BP(t) is lower, the ability to adapt to illumination changes declines. Therefore, we propose three updating rules, applied at different times, to adapt the background model BP(t). The background is not updated on every frame. The update rate of BP(t) is as defined in the previous paragraph, and the updating rules are as follows:

     a) The foreground map FP(t) is used as a mask to select the pixels that need to be updated. The selected pixels in BP(t) are linearly combined with BC(t) for updating.
     b) If the number of pixels in the un-moving object aggregated map S exceeds an assigned threshold, the noise has increased or the lighting is changing. In that case, BP(t) is replaced by BC(t).
     c) If there have been no moving objects in BC(t) for a long duration and no abandoned objects are detected, BP(t) is replaced by BC(t).

     In the second rule, the un-moving object aggregated map S is a map that accumulates the possibility of each pixel belonging to an abandoned object. The details of map S are described in the next sub-section (Sec. 2.1.2).

     2.1.2. Un-moving object decision

     Following the last section, we first compute the frame difference between BC(t) and BP(t), and between BC(t) and BI(t). The frame difference is the subtraction of one image from the other. The difference results FP(t) and FI(t) are shown in Fig. 4:

     Fig. 4. Pure and Instant foreground illustration
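The three updating rules (a) to (c) for BP(t) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the blending weight `alpha`, the order in which the rules are checked, the reading of "number of pixels in S" as the count of positive entries, and all function and parameter names (`blend_update`, `update_pure_background`, `update_mask`, `scene_idle`, `pixel_count_limit`) are hypothetical. The mask is assumed to be derived from FP(t) so that `True` marks a pixel that should be updated.

```python
def blend_update(pure_bg, current, update_mask, alpha=0.05):
    """Rule (a): linearly combine BP(t) with BC(t) at the masked pixels only."""
    return [
        [(1 - alpha) * p + alpha * c if m else p
         for p, c, m in zip(prow, crow, mrow)]
        for prow, crow, mrow in zip(pure_bg, current, update_mask)
    ]


def update_pure_background(pure_bg, current, update_mask, aggregated,
                           pixel_count_limit, scene_idle, alpha=0.05):
    """Return the next pure background after applying rules (a)-(c)."""
    # Rule (b): many marked pixels in S is read as noise or a lighting change.
    # Counting entries greater than zero is an assumption of this sketch.
    marked = sum(1 for row in aggregated for value in row if value > 0)
    if marked > pixel_count_limit:
        return [row[:] for row in current]
    # Rule (c): no moving objects for a long time and nothing abandoned.
    # The caller decides when that holds and passes scene_idle=True.
    if scene_idle:
        return [row[:] for row in current]
    # Rule (a): masked linear combination.
    return blend_update(pure_bg, current, update_mask, alpha)


# Hypothetical one-row example.
bp = [[100.0, 100.0]]
bc = [[120.0, 200.0]]
mask = [[True, False]]  # second pixel is treated as foreground and left alone

blended = update_pure_background(bp, bc, mask, aggregated=[[0, 0]],
                                 pixel_count_limit=1, scene_idle=False, alpha=0.5)
replaced = update_pure_background(bp, bc, mask, aggregated=[[5, 5]],
                                  pixel_count_limit=1, scene_idle=False)
print(blended)
print(replaced)
```

A small `alpha` keeps noise from entering BP(t) quickly but adapts slowly to lighting, which is the trade-off the section describes; rules (b) and (c) act as resets when blending alone is not enough.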
  4. If an object stays in the environment, then at the same position (x, y) the value of FP(t)(x, y) is 0 and the value of FI(t)(x, y) is 1. The algorithm is shown in Algo. 1:

       Input: FP(t), FI(t)    Output: St(t)
       For each position (x, y) inside the maps FP(t) and FI(t)
           IF (FP(t)(x, y) = 0 & FI(t)(x, y) = 1)
               Set St(t)(x, y) = 1
           Else
               Set St(t)(x, y) = 0

     Algo. 1. The algorithm for obtaining the un-moving object map

     In the algorithm, FP(t) and FI(t) are monochrome, and pixel values range from 0 to 1. When the pixel value of FP(t)(x, y) is 0 and the pixel value of FI(t)(x, y) is 1, the pixel has stopped and is staying in the current frame. We can therefore extract the un-moving object map St, shown in Fig. 5:

     Fig. 5. Un-moving object map illustration: only the un-moving object is present. The right map is St.

     2.1.3. Real un-moving object extraction

     St contains the currently un-moving objects. However, it does not indicate that they are real un-moving objects, because they could just be a person standing for a few seconds or an object put down temporarily. Therefore, St must be accumulated over successive frames to form the un-moving object aggregated map S, from which real un-moving objects are extracted. The map S is defined in Algo. 2.

       Input: St(t), S, Iv, Dv    Output: S
       For each position (x, y) inside St(t) & S
           IF (St(t)(x, y) = 0)
               S(x, y) = S(x, y) + Iv
           Else
               S(x, y) = S(x, y) - Dv

     Algo. 2. The algorithm for S

     For each pixel, if the pixel value is equal to 0, the same position (x, y) of S is increased by a constant value (Iv). Otherwise, it is decreased by a constant value (Dv). Therefore, if objects stay for a long period, the values at the object positions in map S increase continuously until enough pixel values exceed the thresholds we assign. At that time, those pixels could be parts of un-moving objects. The advantage of this method is that it prevents temporary objects, which stay only for a few seconds, from being regarded as abandoned objects. Moreover, when the number of pixels whose value is equal to 1 is too large, this may indicate too much noise in BP(t) after updating, or too large a change in lighting. This information can be fed back as an update-timing condition for BP(t).

     Fig. 6. Un-moving object aggregated map illustration

     2.2 Abandoned object decision

     The method of [2] uses human patterns to distinguish humans from abandoned objects, but human patterns alone are not enough. In normal situations, an abandoned object is usually a piece of luggage or a briefcase, and a bomb is usually packaged in a box with a regular shape. We therefore focus only on objects with a regular shape and discard the rest using object feature filtering.

     2.2.1. Abandoned object clustering

     From the un-moving object aggregated map S obtained in Section 2.1, we extract each object using a clustering method. Those un-moving objects may include noise, stationary humans, luggage, and so on. In general, we care about objects with a regular shape, such as baggage, briefcases, and boxes, so we focus on those kinds of objects based on three special features.

     2.2.2. Object feature filtering

     As required by Subsection 2.2.1, we filter each object based on object features. In this paper, we use three object features. The first object feature is Area. The goal of the Area constraint is to filter out objects that are too large or too small. The Area feature is given by (1):

     $$ \mathrm{Area} = \mathrm{size}(\mathrm{Object}) \quad (1) $$
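Algo. 1 and Algo. 2 can be sketched in a few lines. This is a hedged illustration, not the authors' code. One caveat: Algo. 1 marks un-moving pixels with St = 1, while Algo. 2 as printed increments S where St = 0. The sketch therefore exposes the triggering value as a parameter (the hypothetical name `hit_value`) instead of choosing between the two readings. The example values of Iv, Dv, and the threshold are also assumptions.

```python
def unmoving_map(fp, fi):
    """Algo. 1 as printed: St = 1 where FP = 0 and FI = 1, otherwise 0."""
    return [
        [1 if p == 0 and i == 1 else 0 for p, i in zip(prow, irow)]
        for prow, irow in zip(fp, fi)
    ]


def aggregate(s_map, st, iv, dv, hit_value):
    """Algo. 2: add Iv where St equals hit_value, otherwise subtract Dv.

    hit_value=0 reproduces the printed condition; hit_value=1 is the
    alternative reading in which un-moving pixels accumulate.
    """
    return [
        [s + iv if t == hit_value else s - dv for s, t in zip(srow, trow)]
        for srow, trow in zip(s_map, st)
    ]


# Hypothetical 2x2 binary foreground maps.
fp = [[0, 1], [1, 0]]
fi = [[1, 1], [0, 0]]
st = unmoving_map(fp, fi)

# Accumulate the same St over three frames (Iv=2, Dv=1 are example values).
s_map = [[0, 0], [0, 0]]
for _ in range(3):
    s_map = aggregate(s_map, st, iv=2, dv=1, hit_value=1)

# Pixels whose accumulated value reaches an example threshold become candidates.
candidates = [(y, x) for y, row in enumerate(s_map)
              for x, value in enumerate(row) if value >= 5]
print(st, s_map, candidates)
```

Because a pixel must keep matching over several frames before it crosses the threshold, an object that pauses briefly decays back through Dv and never becomes a candidate. The printed algorithm does not clamp S, so this sketch lets values go negative as well.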
  5. In Eq. (1), size(Object) is the sum of the pixels of the object.

     The second feature is Shape, which describes the appearance of the object. It is given by (2):

     $$ \mathrm{Shape} = \frac{\mathrm{size}(\mathrm{Object}) \times 4\pi}{(\mathrm{Perimeter})^2} \quad (2) $$

     where Perimeter is the total length of the object's edge and reflects the object's shape. When the object's shape is irregular, Shape is smaller; for a regular object, Shape is larger. Generally speaking, this feature is good for filtering out human shapes.

     The last feature is Compactness. The more dispersed the object, the smaller the value of Compactness. It is given by (3):

     $$ \mathrm{Compactness} = \frac{\sum_{i \in \mathrm{Object}} n_i}{\mathrm{size}(\mathrm{Object}) \times 8} \quad (3) $$

     where n_i is obtained by searching the eight neighboring pixels of pixel i: each neighbor that also belongs to the object adds 1 to n_i.

     Objects extracted from S must satisfy all three features before they are finally considered abandoned objects. Most objects can therefore be filtered out easily through these features.

     2.3 The Life-cycle State Measurement

     The advantage of our method is that we do not track all objects, so efficiency is better. However, the object features are easily affected by illumination changes and similar factors. Occlusion is also a serious problem: an abandoned object could be discarded because occlusion reduces its Area feature. When an object is occluded only briefly, the system should keep it rather than discard it. The set that stores abandoned objects should therefore act as a temporal register. In other words, when we obtain objects from S in each frame, those objects are compared, through the relationship algorithm, against the previously detected abandoned-object set and against discarded abandoned objects. The relationship indicates whether the current abandoned-object set (object_i(t)) connects to the previously detected object set (object_j(t-n), 1 < n < t-1). This processing ensures that each abandoned object has stayed for a period of time.

     The relationship is computed as follows. For each object_i(t), its center position is compared with the abandoned-object set we already have, Set(object_j(t-n), 1 < n < t-1). If the center position of object_i(t) lies within the bounding box of one of the abandoned objects in the Set, there is a relationship between object_j(t-n) and object_i(t), and object_j(t-n) is said to have relationship satisfaction. Otherwise, the object could be a new object, a removed object, or an occluded one, which is called relationship dissatisfaction. If it is a new object, object_i(t) is added to the Set. Through this processing, we can decide the relation and state of each object in the Set. When an object is considered abandoned, we do not raise an alarm to the user immediately; instead, the decision is based on the object's current state. Following the definition of the software life cycle from software engineering, we apply the idea to our study. The Life-cycle State Measurement (LCSM) includes four states: the growing state, the stable state, the aging state, and the dead state. Fig. 7 shows the finite state machine of LCSM:

     Fig. 7. The Finite State Machine of LCSM

     An abandoned object recorded in the Set starts in the growing state. When the growing time is finished, the state changes to the stable state or the aging state. When the aging time expires, the object may move to the dead state.

     The algorithm for each state is described below. The symbols are defined in Table 1.

     Table 1. The definitions of symbols in LCSM

       Symbol        Illustration
       ObjState      The life-cycle state of the current object
       Newobj        Boolean; indicates whether the object is a new abandoned object
       GrowingT      Boolean; indicates whether the growing time (GrowingTime) is finished. GrowingTime is the duration from the Growing State to the Stable State
       AgingT        Boolean; indicates whether the aging time (AgingTime) is finished. AgingTime is the duration from the Aging State to the Dead State
       ObjFeature    Boolean; indicates whether the user-assigned features of the abandoned object are satisfied
       ObjRelation   Boolean; indicates whether relationship satisfaction holds according to the relationship computation

  6. Growing State: When a new abandoned object is created, we give it a time buffer (GrowingTime) to grow, which avoids false alarms due to erroneous detection caused by occlusion or an incomplete area. This state is therefore used from the beginning until the GrowingTime is finished.

       ObjState = Growing State, if Newobj = True & GrowingT = False & ObjFeature = True & ObjRelation = True   (4)

     Stable State: The stable state means that the GrowingTime is finished, the object's features are satisfied, and relationship satisfaction holds. Relationship satisfaction means the object has not been removed or occluded. When an abandoned object changes to the stable state, the system raises an alarm, and the object is boxed and reported to the user.

       ObjState = Stable State, if GrowingT = True & ObjFeature = True & ObjRelation = True   (5)

     Aging State: The aging state occurs when the relationship or the feature conditions are not satisfied, which usually stands for occlusion or removal of the object. At that time, the state changes to the aging state. We also give the object a time buffer (AgingTime) to age. If the conditions of the stable state are met again within the AgingTime, the state returns to the stable state; otherwise, the state finally changes to the dead state.

       ObjState = Aging State, if GrowingT = True & AgingT = False & (ObjFeature = False | ObjRelation = False)   (6)

     Dead State: When the state changes to the dead state, the object should be ignored and deleted from the Set within a few minutes.

       ObjState = Dead State, if AgingT = True   (7)

     If the state is the Growing State but the conditions of the Growing State are not all satisfied, GrowingT is stopped and the state changes to a special situation called the unstable state. When the unstable state continues for a long period of time, the object is finally removed. Through the LCSM processing, we can therefore avoid some unstable situations and make the detection more reasonable.

     3. EXPERIMENT RESULTS

     We tested ten videos with a resolution of 352 × 288 or 320 × 240. The videos are divided into four types, each sharing the same background, so the same parameters are used within each type to demonstrate that one parameter set works for a given background. When abandoned objects are detected, the system draws a bounding box on them, even under occlusion. When objects are removed, the alarm is retained for a period of time.

     Fig. 8. Abandoned object detection results: when the abandoned object is occluded, the bounding box is still kept on it; when the object is removed, the bounding box is also kept for a duration.

     The top pictures in Fig. 8 show that the system can still select the object for a while when the object is occluded. The others show that the system can still select the object for a while after the object is picked up.

     In testing, we use Sensitivity and Specificity to verify our results. Every test video contains at least one abandoned object, and objects that stay for more than 3 seconds are counted as non-abandoned-object events. The test videos include 10 abandoned objects. Table 2 defines true positive (TP), false positive (FP), false negative (FN), and true negative (TN), and Table 3 lists the results for the 10 test videos.

  7. Table 2. Definition of TP, FP, FN, TN

             Illustration
       TP    Abandoned objects are detected correctly
       FP    Non-abandoned objects are detected as abandoned objects
       TN    Non-abandoned objects are detected as non-abandoned objects
       FN    Abandoned objects are detected as non-abandoned objects

     The 10 test videos are from popular databases. Our results show that the method is effective for abandoned object detection in two respects: high accuracy and low computing cost. The sensitivity is 90% and the specificity is 92.6%, which shows the high accuracy of our method. The average frame rate is around 30 fps, and a real-time test using an IP camera runs at about 25 fps. This shows that the computing cost is low and that the method can work in real time.

     Table 3. Result of Sensitivity and Specificity

                            Actual positive   Actual negative
       Detected positive    TP = 9            FP = 4
       Detected negative    FN = 1            TN = 50

       Sensitivity = TP / (TP + FN) = 90.0%
       Specificity = TN / (FP + TN) = 92.6%

     4. CONCLUSION

     In this paper, reasonable results are obtained by applying foreground analysis, feature filtering, and the LCSM mechanism. However, the techniques are not flawless. For example, the updated BP(t) still accumulates noise over a long period of time, even though a mechanism is proposed to replace BP(t), so missed detections of abandoned objects cannot be avoided entirely. The next problem concerns feature filtering. In normal situations, feature filtering can separate humans from objects, but it can make a false decision because of incomplete foreground detection, or because of people whose foregrounds look like rectangular, static objects. In the future, we will make this abandoned object detection technology more reliable and useful in video surveillance.

     REFERENCES

     [1] Fatih Porikli, "Detection of Temporarily Static Regions by Processing Video at Different Frame Rates", Advanced Video and Signal Based Surveillance, AVSS 2007, IEEE Conference on, 5-7 Sept.
     [2] Wu-Yu Chen, Meng-Fen Ho, Chung-Lin Huang, S. T. Lee, Clarence Cheng, "Detecting Abandoned Objects in Video-Surveillance System", The 21th IPPR Conference on Computer Vision, Graphics, and Image Processing, CVGIP 2008.
     [3] A. Singh, S. Sawan, M. Hanmandlu, V. K. Madasu, B. C. Lovell, "An abandoned object detection system based on dual background segmentation", Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 352-357.
     [4] J. Wang and W. Ooi, "Detecting static objects in busy scenes", Technical Report TR99-1730, Department of Computer Science, Cornell University, February 1999.
     [5] M. Bhargava, C-C. Chen, M. S. Ryoo, and J. K. Aggarwal, "Detection of Abandoned Objects in Crowded Environments", in Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance, 2007, pp. 271-276.
     [6] R. Mathew, Z. Yu and J. Zhang, "Detecting New Stable Objects in Surveillance Video", in Proceedings of the IEEE 7th Workshop on Multimedia Signal Processing, 2005, pp. 1-4.
     [7] F. Porikli, Y. Ivanov, and T. Haga, "Robust Abandoned Object Detection Using Dual Foregrounds", Eurasip Journal on Advances in Signal Processing, vol. 2008, 2008.
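The state conditions in Eqs. (4) to (7) of Section 2.3 of the abandoned object paper can be read as a small decision function. The sketch below is one possible reading, not the authors' implementation. It assumes the conditions are evaluated in the printed order, that Eq. (6) groups as `ObjFeature = False | ObjRelation = False`, and that every remaining combination maps to the "unstable" situation the text mentions. The names `State` and `lcsm_state` are hypothetical.

```python
from enum import Enum


class State(Enum):
    GROWING = "growing"
    STABLE = "stable"
    AGING = "aging"
    DEAD = "dead"
    UNSTABLE = "unstable"  # fallback for unmatched cases; an assumption of this sketch


def lcsm_state(new_obj, growing_t, aging_t, obj_feature, obj_relation):
    """Map the Table 1 Boolean flags to a life-cycle state."""
    satisfied = obj_feature and obj_relation
    if new_obj and not growing_t and satisfied:      # Eq. (4)
        return State.GROWING
    if growing_t and satisfied:                      # Eq. (5)
        return State.STABLE
    if growing_t and not aging_t and not satisfied:  # Eq. (6)
        return State.AGING
    if aging_t:                                      # Eq. (7)
        return State.DEAD
    return State.UNSTABLE


# A new object that is still inside its growing time.
print(lcsm_state(True, False, False, True, True))
# A grown object that becomes occluded: features or relation no longer hold.
print(lcsm_state(False, True, False, True, False))
```

Keeping the timers (GrowingTime, AgingTime) outside this function and passing in only the resulting Boolean flags mirrors Table 1, where GrowingT and AgingT are flags rather than durations. How the timers are started and reset is not covered by this sketch.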
  8. 8. HIERARCHICAL METHOD FOR FOREGROUND DETECTION USING CODEBOOK MODEL Jing-Ming Guo (郭景明), Member, IEEE and Chih-Sheng Hsu (徐誌笙) Department of Electrical Engineering National Taiwan University of Science and Technology Taipei, Taiwan E-mail:, ABSTRACT [6], the gradient information is employed to detectThis paper presents a hierarchical scheme with shadows, and which achieves good results. Yet, multipleblock-based and pixel-based codebooks for foreground steps are required for removing shadows, and thus itdetection. The codebook is mainly used to compress increases the complexity. Zhang et al. [24] proposed ratioinformation to achieve high efficient processing speed. In edge method to detect shadow, and the geometricthe block-based stage, 12 intensity values are employed to heuristics was used to improve the performance. However,represent a block. The algorithm extends the concept of the main problem of this scheme is its high complexity.the Block Truncation Coding (BTC), and thus it can Most foreground detection methods are pixel-based,further improve the processing efficiency by enjoying its and one of the popular methods is the MOG. Stauffer andlow complexity advantage. In detail, the block-based Grimson [7], [8] proposed the MOG by using multiplestage can remove most of the noises without reducing the Gaussian distributions to represent each pixel inTrue Positive (TP) rate, yet it has low precision. To background modeling. The advantage is to overcomeovercome this problem, the pixel-based stage is adopted non-stationary background which provides betterto enhance the precision, which also can reduce the False adaptation for background modeling. Yet it has somePositive (FP) rate. Moreover, the short term information is drawbacks: One of which is the standard deviation (SD);employed to improve background updating for adaptive if SD is too small, a pixel may easily be judged asenvironments. As documented in the experimental results, foreground, and vice versa. 
Another drawback is that itthe proposed algorithm can provide superior performance cannot remove shadows, since the matching criterionto that of the former related approaches. simply indicates that a pixel is classified as background when it is within 2.5 times of SD. Chen et al. [9] proposedKeywords- Background subtraction; foreground a hierarchical method with MOG, the method alsodetection; shadow detection; visual surveillance; BTC employs block and pixel-based strategy, yet shadows cannot be removed with their method. Martel-Brisson and1. INTRODUCTION Zaccarin [10] presented a novel pixel-based statisticalIn visual surveillance, background subtraction is an approach to model moving cast shadows of non-uniformimportant issue to extract foreground object for further and intensity-varying objects. This approach employsanalysis, such as human motion analysis. A challenge MOG’s learning strategy to build statistical models forproblem for background subtraction is that the describing moving cast shadows, yet this model requiresbackgrounds are usually non-stationary in practice, such more time for learning. Benedek and Sziranyi [23] chooseas waving tree, ripple water, light changing, etc. Another the CIE L*u*v space to detect foregrounds or shadows bydifficult problem is that the foreground generally suffers MOG, and the texture features are employed to enhancefrom shadow interference which leads to wrong analysis the segmentation results. The main problem of thisof foreground objects. Hence, background model is highly scheme is its low processing speed.demanded to be adaptively manipulated via background Kim et al. [11] presented a real-time algorithm formaintenance. In [1], some of the well-known issues in foreground detection which samples background pixelbackground maintenances are introduced. values and then quantizes them into codebooks. 
This To overcome shadows, some well-known methods can approach can improve the processing speed bybe adopted for use, such as RGB model, HSV model, compressing background information. Moreover, twogradient information and ratio edge. In particular, features, layered modeling/detection and adaptiveHorprasert et al. [2] proposed to employ statistical RGB codebook updating, are presented for further improvingcolor model to remove shadow. However, it suffers from the algorithm. In [12] and [13], the concept of Kohonensome drawbacks, including 1) more processing time is networks and Self-Organizing Maps (SOMs) [14] wererequired to compute thresholds, 2) non-stationary proposed to build background model. The backgroundbackground problem cannot be solved, and 3) a fixed model can automatically adapt to a self-organizingthreshold near the origin is used which offers less manner and without a prior knowledge. Patwardhan et al.flexibility. Another RGB color model proposed by [15] proposed robust foreground detection by propagatingCarmona et al. [18] can solve the third problem of [2], yet layers using the maximum-likelihood assignment, andit needs too many parameters for their color model. In [3] then clustered into “layers”, in which pixels that shareand [4], the HSV color model is employed to detect similar statistics are modeled as union of suchshadows. The shadows are defined by a diminution of the nonparametric layer-models. The pixel-layer manner forluminance and saturation values when the hue variation is foreground detection requires more time for processing, atsmaller than a predefined threshold parameter. In [5] and around 10 frames per second on a standard laptop 698
computer. In our observation, classifying each pixel to represent various types of features after the background training period is a good way to build an adaptive background model. It can also overcome the non-stationary problem in background classification.

Another class of foreground detection methods is texture-based. Heikkila and Pietikainen [16] presented an efficient texture-based method that uses adaptive local binary pattern (LBP) histograms to model the background of each pixel. The LBP method uses circular neighboring pixels to label the thresholded difference between the neighboring pixels and the center pixel. The result is treated as a binary number which can fully represent the texture of a pattern.

In this study, a hierarchical method is proposed for background subtraction that uses both block-based and pixel-based stages to model the background. The block-based strategy comes from the traditional compression scheme BTC [17], which divides an image into non-overlapped blocks and substitutes each pixel in a block by a high mean or a low mean. The BTC algorithm thus employs only two distinct intensity values to represent a block. In this paper, four intensity values are employed to represent a block, and each pixel in a block is substituted by the high-top mean, high-bottom mean, low-top mean or low-bottom mean. The block-based background modeling can efficiently detect foreground without reducing the TP rate, yet its precision is rather low. To overcome this problem, the pixel-based codebook strategy is used to compress background information, which maintains the high speed and enhances the accuracy. Moreover, a color model modified from the former approach [18] is used to distinguish shadow, highlight, background, and foreground. The modified structure reduces the number of parameters and thus improves the efficiency. As documented in the experimental results, the proposed method can effectively solve the non-stationary background problem. One specific problem for background subtraction is that a moving object becomes stationary foreground when it stands still for a while during the period of background construction. Consequently, this object should become a part of the background model. For this, short term information is employed in background model construction.

The paper is organized as follows. Section 2 presents the initial background model built in the background training period, which includes the block-based and pixel-based codebooks. Section 3 reports background subtraction by the proposed hierarchical scheme. Section 4 introduces the short term information with the background model. Section 5 documents experimental results, in terms of accuracy and efficiency, and compares with the former MOG [7], Rita's method [4], CB [11], Chen's method [9] and Chiu's method [22]. Section 6 draws conclusions.

2. INITIAL BACKGROUND MODEL
In this study, two types of codebooks are constructed for block-based and pixel-based background modeling. The proposed background modeling is similar to CB [11]. The advantage of CB is its high efficiency in background model building. In our observation, CB employs more information to build the background, whereas the proposed method adopts the concept of MOG [7] by simply using weights to classify foreground and background; it thus provides even higher efficiency, and its precision is also higher than that of CB. Another difference between the proposed method and CB is that two stages, namely block-based and pixel-based stages, are involved in background model construction, while only one stage is used in CB. In the block-based stage, multiple neighboring pixels are classified as a unit, while a pixel is the basic unit in the pixel-based stage. Figure 1 shows the structure of the background model, which is composed of block-based and pixel-based stages. The details are introduced in the following sub-sections.

[Figure: Background Model, comprising Block-based and Pixel-based stages]
Fig. 1. Structure of background construction model.

2.1 Block feature in block-based stage
The block feature used in this study is extended from the BTC algorithm, which maintains the first and the second moments of a block. Although BTC is a highly efficient coding scheme, we further reduce its complexity by modifying the corresponding high mean and low mean. Moreover, we extend the BTC algorithm by using four intensity values to represent a block, which increases the recognition confidence: each pixel in a block is substituted by the High-top (Ht), High-bottom (Hb), Low-top (Lt) or Low-bottom (Lb) mean. Suppose an image is divided into non-overlapped blocks, each of size M x N. Let x1, x2, ..., xm be the pixel values in a block, where m = M x N. The average value of a block is

$$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (1)$$

The high mean Hm and low mean Lm are defined as

$$Hm = \frac{\sum_{i=1}^{m} (x_i \mid x_i \ge \bar{x})}{q}, \qquad Lm = \frac{\sum_{i=1}^{m} (x_i \mid x_i < \bar{x})}{m - q} \qquad (2)$$

where q denotes the number of pixels equal to or greater than $\bar{x}$. Notably, if q is equal to m or 0 then all the values in a block are forced to be identical to $\bar{x}$. In this case, Ht, Hb, Lt and Lb are assigned $\bar{x}$. Otherwise, three thresholds ($\bar{x}$, Hm and Lm) are employed to distinguish the four intensity values Ht, Hb, Lt and Lb as defined below,

$$Ht = \frac{\sum_{i=1}^{m} (x_i \mid x_i \ge Hm)}{p}, \qquad Hb = \frac{\sum_{i=1}^{m} (x_i \mid \bar{x} \le x_i < Hm)}{q - p} \qquad (3)$$

$$Lt = \frac{\sum_{i=1}^{m} (x_i \mid Lm \le x_i < \bar{x})}{m - q - k}, \qquad Lb = \frac{\sum_{i=1}^{m} (x_i \mid x_i < Lm)}{k} \qquad (4)$$

where p denotes the number of the pixels equal or greater
than Hm. If p is equal to q or 0, then both Ht and Hb are assigned a value equal to Hm. The variable k denotes the number of the pixels which are smaller than Lm. If k is equal to (m-q) or 0, then both Lt and Lb are assigned a value equal to Lm. In RGB color space, the block in each color channel is transformed to yield a set of Ht, Hb, Lt, and Lb. Thus, a block is represented by Vblock=(RHt, GHt, BHt, RHb, GHb, BHb, RLt, GLt, BLt, RLb, GLb, BLb).

The proposed block feature provides better performance than the former schemes because, unlike the traditional BTC, the codeword size for a block is increased from six to twelve to better characterize the texture of the block for block-based background reconstruction. Moreover, the BTC-based strategy significantly reduces the complexity, which suits real-time applications. Compared with Chen's hierarchical method [9], in which texture information is employed to form a 48-dimension feature, the proposed method can effectively classify foreground and background using only 12 dimensions. Moreover, the processing speed is superior to that of Chen's method.

2.2 Initial background model for block-based codebook
In the block-based stage, an image is divided into non-overlapped blocks, and each block constructs its own codebook. A training sequence of length N is used to build the block-based codebook, so the codebook of each block has N block vectors for training the background model. Let X be a training sequence for a block consisting of N block vectors: X={xblock_1, xblock_2, …, xblock_N}. Let C=(c1, c2, …, cL) represent the codebook for a block consisting of L codewords. Each block has a different codebook size based on the codewords' weights. Each codeword ci, i=1, …, L, consists of a block vector vblock_i=(RHt_i, GHt_i, BHt_i, RHb_i, GHb_i, BHb_i, RLt_i, GLt_i, BLt_i, RLb_i, GLb_i, BLb_i) and a weight wi.

In the training phase, an input block vector xblock is compared with each codeword in the codebook. If no match is found or there is no codeword in the codebook, the input vector is added to the codebook as a new codeword. Otherwise, the matched codeword is updated and its weight is increased. To determine which codeword is the best matched candidate, the match function introduced in sub-section 2.4 is employed. The detailed algorithm is given below.

Algorithm for block-based codebook construction
Step 1: L ← 0, C ← ∅ (empty set)
Step 2: for t=1 to N do
  I. xblock_t=(RHt_t, GHt_t, BHt_t, RHb_t, GHb_t, BHb_t, RLt_t, GLt_t, BLt_t, RLb_t, GLb_t, BLb_t)
  II. Find the codeword cm in C={ci | 1 ≤ i ≤ L} matching xblock_t based on:
      match_function(xblock_t, vblock_m)=true
  III. If C=∅ or there is no match, then L ← L+1. Create a new codeword cL by setting:
      vblock_L ← xblock_t
      wL ← 1/N
  IV. Otherwise, update the matched codeword cm, consisting of vblock_m and wm, by setting:
      $$v_{block\_m} \leftarrow (1-\alpha)\,v_{block\_m} + \alpha\,x_{block\_t} \qquad (5)$$
      wm ← wm + 1/N
end for
Step 3: Select the background codewords in the codebook:
  I. Sort the codewords in descending order according to their weights.
  II. $$B = \arg\min_{b}\left(\sum_{k=1}^{b} w_k > T\right) \qquad (6)$$

where α denotes the learning rate, which is empirically set at 0.05 in this study. Step 3 demarcates the background in the same way as MOG [7]. A codeword with a bigger weight has a higher likelihood of being a background codeword in the background codebook. The codewords are sorted in descending order according to their weights, and the codewords that meet Eq. (6) are selected as the background codebook, where T denotes an empirical threshold with value 0.8.

2.3 Initial background model for pixel-based codebook
The algorithm for codebook construction in the pixel-based stage is similar to that of the block-based stage, with the basic unit changed from a block to a pixel. Let X be a training sequence for a pixel consisting of N RGB vectors: X=(xpixel_1, xpixel_2, …, xpixel_N). Let F=(f1, f2, …, fL) be the codebook for a pixel consisting of L codewords. Each pixel has a different codebook size based on the codewords' weights. Each codeword fi, i=1, …, L, consists of a pixel vector vpixel_i=(Ri, Gi, Bi) and a weight wi.

In Step 2(II), find the codeword fm matching xpixel_t based on match_function(xpixel_t, vpixel_m), which is introduced in Section 2.4. In Step 2(III), if F=∅ or there is no match, then create a new codeword fL by assigning xpixel_t to vpixel_L. Otherwise, update fm by assigning (1−α)vpixel_m + α xpixel_t to vpixel_m. In Step 3, the parameters α and T are identical to those of the block-based stage.

The proposed block-based and pixel-based procedures are used to establish the background model, which is similar to CB [11]. The main difference is that CB employs more information to build the background, whereas the proposed method adopts the concept of MOG [7] by simply using weights to classify foreground and background; it thus provides higher efficiency, and its precision is also higher than that of CB [11].

2.4 Match function
The match function for n dimensions employed in this study, in terms of squared distance, is given below

$$\frac{d^{T}d}{N} \le \lambda^{2} \qquad (7)$$
where $d = (\sigma I)^{-1}(x - v)$, and the empirical value of the standard deviation σ is between 2.5 and 5, with 2.5 as a tight bound and 5 as a loose bound. The identity matrix I is of size NxN, where N=12 and N=3 in the block-based and pixel-based stages, respectively. The match function can be applied to n dimensions; the proposed block vector vblock in the block-based stage has 12 dimensions, and the pixel vector vpixel has 3 dimensions. A match is found when a sample falls within λ=2.5 standard deviations of the mean of one of the codewords. The output of the match function is as below:

$$match\_function(x, v) = \begin{cases} \text{true}, & \dfrac{d^{T}d}{N} \le \lambda^{2}; \\ \text{false}, & \text{otherwise.} \end{cases} \qquad (8)$$

In the pixel-based phase, the color model is used to classify a pixel only when no match is found. This strategy can significantly improve the efficiency.

3. FOREGROUND DETECTION
The proposed foreground detection stage can also be divided into block-based and pixel-based stages. In the block-based stage, the match function introduced in Section 2.4 is employed to distinguish background from foreground. If a block is classified as background, then it is fed to pixel-based background model updating to adapt to the current environment conditions. Yet, this increases the processing time for foreground detection. For this, the threshold T_update is used to enable the updating phase, which means the updating is conducted every T_update frames. Empirically, T_update is set at 2~5 to guarantee the adaptation of the background model. Using the color model function, which is introduced in Section 3.4, together with the match function, the current frame can be classified into four states: background, foreground, highlight and shadow. Figure 2 shows the proposed foreground detection flow chart, which is detailed in the following sub-sections.

3.1 Foreground detection with block-based stage
The block-based stage is employed to separate background and foreground. Although the block-based stage has low precision, it detects foreground without reducing the TP rate when σ is set to a small value as a tight bound. However, a small σ increases the FP rate as well. Therefore, there is a trade-off in choosing the value of σ. Herein, the empirical value is set at 2.5 in this work.

Algorithm for background subtraction using block-based codebook
Step 1: xblock=(RHt, GHt, BHt, RHb, GHb, BHb, RLt, GLt, BLt, RLb, GLb, BLb)
Step 2: for all codewords in B in Eq. (6), find the codeword cm matching xblock based on:
    match_function(xblock, vblock_m)=true
    Update the matched codeword as in Eq. (5)
Step 3: BS(xblock) = Foreground if there is no match; Background otherwise.

[Figure: flow chart. Input Sequence → Foreground Detection (Block-based stage, then Pixel-based stage). Background results update the pixel-based background model; pixel-based outputs are Background, Shadows, Highlight and Foreground. Foreground results construct short term information for the block-based and pixel-based stages; when Weight > T_add, the short term information is inserted into the block-based or pixel-based background model.]
Fig. 2. Flow chart for foreground detection.

3.2 Pixel-based background updating model
To adapt to the current environment conditions, when a block is classified as background in the block-based stage, the corresponding pixel-based background model needs to be updated. Yet this increases the processing time for foreground detection. For this, the threshold T_update is used to enable the updating phase, which means the updating is conducted every T_update frames. Empirically, T_update is set at 2~5 to guarantee the adaptation of the background model. Meanwhile, the match function is used to find the matched codeword for updating. The details of the algorithm are organized as below.

Algorithm for pixel-based background model updating
Step 1: xpixel=(R,G,B)
Step 2: if the accumulated time is equal to T_update, then do
  1) for all codewords in B in Eq. (6), find the codeword fm matching xpixel based on:
      match_function(xpixel, vpixel_m)=true
      Update the matched codeword as
      vpixel_m ← (1−α)vpixel_m + α xpixel

3.3 Foreground detection with pixel-based stage
If a pixel is classified as foreground in the block-based stage, then the input pixel xpixel=(R,G,B) proceeds to the pixel-based stage to determine the state of the pixel. The algorithm for pixel-based background subtraction is similar to the block-based one. The only difference is in the match function. Herein, the color model and match function are used to determine whether a pixel vector belongs to shadow, highlight, background, or
foreground. The detailed algorithm is organized below.

Algorithm for background subtraction using pixel-based codebook
Step 1: xpixel=(R,G,B)
Step 2: for all codewords in B in Eq. (6), find the codeword fm matching xpixel based on:
    s ← color_model_function(xpixel, vpixel_m)
    If s is classified as background, then do
    vpixel_m ← (1−α)vpixel_m + α xpixel

3.4 Color model
In [18], the proposed color model can classify an RGB color pixel into shadow, highlight, background, and foreground. However, many parameters are employed in this model, which increases the computational complexity. In this work, the number of parameters is reduced to three, namely θ, β, and γ, to reduce the complexity. Figure 3 shows the modified color model.

[Figure: RGB space with axes R, G and B, showing the background vector Vi, the input pixel Xi, the angle θ, Comp_I, Proj_I, I_max and I_min]
Fig. 3. Modified color model

Given an input pixel vector x, the match function is employed to measure whether it is in the background state. If the vector x is classified as foreground by the match function, then we compare the angle via tanθ. If proj_I/comp_I is greater than tanθ, the vector x must be foreground. Otherwise, the input vector x may fall within the color model bound. Subsequently, the variables I_max and I_min are calculated; if comp_I falls between ‖v‖ and I_max, the pixel is classified as highlight; if comp_I falls between I_min and ‖v‖, the pixel is classified as shadow; if comp_I is not between I_min and I_max, the pixel is classified as foreground.

Given an input pixel vector x=(R,G,B) and background vector $v = (\bar{R}, \bar{G}, \bar{B})$,

$$\|x\| = \sqrt{R^{2}+G^{2}+B^{2}}, \qquad \|v\| = \sqrt{\bar{R}^{2}+\bar{G}^{2}+\bar{B}^{2}}$$

$$\langle x, v\rangle = R\bar{R} + G\bar{G} + B\bar{B}$$

$$comp\_I = \|x\|\cos\angle(x, v) = \frac{\langle x, v\rangle}{\|v\|} \qquad (9)$$

$$proj\_I = \sqrt{\|x\|^{2} - comp\_I^{2}} \qquad (10)$$

where comp_I is used to determine whether a pixel vector belongs to shadow or highlight, and proj_I measures the nearest distance to the background vector v. If proj_I/comp_I is greater than tanθ, then the pixel is classified as foreground. Herein, θ is empirically set between 2 and 3.5.

$$I\_max = \beta\|v\|, \qquad I\_min = \gamma\|v\| \qquad (11)$$

where β>1 and γ<1. In our experiments, β is set between 1.1 and 1.25, and γ between 0.7 and 0.85. The range [I_min, I_max] is used for measuring comp_I; if comp_I is not in this range, the pixel is classified as foreground. The overall color model is organized as below:

$$color\_model\_function(x, v) = \begin{cases} \text{Background}, & \text{if } match\_function(x, v) = \text{true}; \\ \text{Highlight}, & \text{else if } \dfrac{proj\_I}{comp\_I} < \tan\theta \text{ and } \|v\| \le comp\_I \le I\_max; \\ \text{Shadow}, & \text{else if } \dfrac{proj\_I}{comp\_I} < \tan\theta \text{ and } I\_min \le comp\_I < \|v\|; \\ \text{Foreground}, & \text{otherwise.} \end{cases} \qquad (12)$$

4. BACKGROUND MODEL UPDATING WITH SHORT TERM INFORMATION
As indicated in Fig. 2, the background model updating with the short term information is divided into two stages: first, the short term information model is constructed from the foreground region; second, if a codeword in the short term information model accumulates enough weight, this codeword is inserted into the background model for foreground detection. This strategy yields an advantage: a user can control how long a stationary foreground must last before it is inserted into the background model. However, a non-stationary foreground region will lead to too many unnecessary codewords in the short term information model. For this, time information is added to each codeword and a threshold is used to decide whether a codeword is reserved or deleted. In addition, the identical strategy is applied to the background model as well.

The procedures of the short term information construction for the block-based and pixel-based phases are identical. The main concept is to add an additional model S called the short term information model. S records foreground regions after foreground detection. The model construction is similar to that of Section 2. Yet, herein the time information (S_time) is added to each codeword. In addition, three additional thresholds, S_delete, T_add, and B_delete, are employed. S_delete is used to determine whether a codeword is reserved or deleted: if the current time minus the last time information of a codeword is larger than S_delete, then the codeword is unnecessary in the codebook, and thus it is deleted from the codebook. T_add is used to decide whether a codeword is inserted into the background model: if a codeword accumulates enough weight, then this codeword can be a part of the background model. B_delete is used to determine whether a codeword is reserved or deleted in the background model when short term information is inserted into the background model; B_delete is set equal to T_add times S_delete (which is the worst case) to ensure that the last updated time of a codeword in the background model is preserved. The overall procedure is organized in the algorithm for short term information model construction.
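Ahead of the formal listing, the role of the three thresholds can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Codeword`, `matches` and `update_short_term` are hypothetical, the default values of `s_delete` and `t_add` are arbitrary, and removing a codeword from S once it is promoted is our assumption, since the text does not say what happens to it. The values σ = 2.5, λ = 2.5 and α = 0.05 follow the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Codeword:
    v: List[float]   # codeword vector (3-D pixel or 12-D block feature)
    w: float         # accumulated weight
    last_time: int   # S_time (B_time once the codeword sits in B)


def matches(x: Sequence[float], v: Sequence[float],
            sigma: float = 2.5, lam: float = 2.5) -> bool:
    """Match test of Eqs. (7)-(8): d^T d / N <= lambda^2, d = (x - v) / sigma."""
    n = len(x)
    d2 = sum(((xi - vi) / sigma) ** 2 for xi, vi in zip(x, v))
    return d2 / n <= lam ** 2


def update_short_term(S: List[Codeword], B: List[Codeword],
                      x: Sequence[float], now: int, alpha: float = 0.05,
                      s_delete: int = 10, t_add: int = 3) -> None:
    """Process one input vector x that found no match in the background B."""
    b_delete = t_add * s_delete  # worst-case bound stated in the text

    # Create a new short term codeword, or refresh the matched one.
    hit = next((c for c in S if matches(x, c.v)), None)
    if hit is None:
        S.append(Codeword(list(x), 1.0, now))
    else:
        hit.v = [(1 - alpha) * vi + alpha * xi for vi, xi in zip(hit.v, x)]
        hit.w += 1.0
        hit.last_time = now

    # S_delete: drop short term codewords that were not seen recently.
    S[:] = [c for c in S if now - c.last_time <= s_delete]

    # T_add: promote codewords that accumulated enough weight.
    promoted = [c for c in S if c.w > t_add]
    if promoted:
        # B_delete: drop stale background codewords before inserting.
        B[:] = [c for c in B if now - c.last_time <= b_delete]
        # Assumption (not in the text): a promoted codeword leaves S.
        S[:] = [c for c in S if c.w <= t_add]
        for c in promoted:
            B.insert(0, c)  # inserted at the head of B
```

With these illustrative defaults, a vector that stays still for four consecutive calls is promoted to B, whereas a vector seen once and not again within `s_delete` time units is discarded. Raising `t_add` lengthens the time a stationary object must persist before it is absorbed into the background.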
Algorithm for short term information model construction
Step 1: Given a background model B with the initial background model, create a new model S for recording foreground regions.
Step 2: Add a time information parameter (B_time) to every codeword in B for recording the current time (C_time). S is assigned an empty set.
Step 3:
  I. Find a matching codeword in B for an input image. A "match" means that a codeword is found during the codeword update in Eq. (5), and its B_time is then equal to C_time.
  II. If no matching codeword is found in B, then search for the matching codeword in S for the foreground region, and do the following steps:
    i. Find the codeword sm in S={si | 1 ≤ i ≤ L} that matches x (the input vector) based on the match function.
    ii. If S=∅ or there is no match, then L ← L+1. Create a new codeword sL by setting:
        vL ← x
        wL ← 1
        S_timeL ← C_time
    iii. Otherwise, update the matched codeword sm, consisting of vm, wm and S_timem, by setting:
        vm ← (1−α)vm + αx
        wm ← wm + 1
        S_timem ← C_time
Step 4: S ← {sm | (C_time − S_timem) ≤ S_delete}
Step 5: Check the weight of every codeword in S. If the weight of a codeword is greater than T_add, then do the following steps:
  I. B ← {cm | (C_time − B_timem) ≤ B_delete}
  II. Add the codeword as short term information at the head of B.
Step 6: Repeat the algorithm from Step 3 to Step 5.

5. EXPERIMENTAL RESULTS
For measuring the accuracy of the results, the criteria FP rate, TP rate, Precision, and Similarity [12] are employed as defined below:

$$\text{FP rate} = \frac{fp}{fp + tn}, \qquad \text{TP rate} = \frac{tp}{tp + fn},$$

$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Similarity} = \frac{tp}{tp + fp + fn},$$

where tp, tn, fp, and fn denote the numbers of true positives, true negatives, false positives, and false negatives, respectively; (tp + fn) indicates the total number of pixels presented in the foreground, and (fp + tn) indicates the total number of pixels presented in the background. The accuracy results in our experiments are measured without any post processing or short term information.

Figure 4 shows the test sequences [19] of size 320x240 with IR (row 1), Campus (row 2) and Highway_I (row 3). To provide a better understanding of the detected results, three colors, red, green and blue, are employed to represent shadows, highlight and foreground, respectively. Figure 4(b) shows the detected results using the block-based stage with blocks of size 10x10, in which most of the noise is removed. Figure 4(c) shows the results obtained by the hierarchical block-based and pixel-based stages. Apparently, the pixel-based stage can significantly enhance the detection precision. Yet, we would like to point out a weakness of the proposed method. As can be seen in the third row of Fig. 4 (Highway_I), when the color of the shadow is dark, it will be classified as foreground. Since a lower threshold is set for the color model of the proposed method, when the value exceeds the threshold it will be classified as shadows. The problem can be eased by increasing the threshold. Yet, as can be seen in Fig. 4, some of the foreground is classified as shadow by doing this. In summary, the proposed method performs well for shadows of small intensity, yet it cannot provide perfect performance for shadows of greater intensity.

Fig. 4. Classified results of sequence [19] for IR (row 1), Campus (row 2) and Highway_I (row 3) with shadow (red), highlight (green), and foreground (blue). (a) Original image, (b) block-based stage only with block of size 10x10, and (c) proposed method.

Figure 5 shows the test sequence WT [21] of non-stationary background with a waving tree, containing 287 frames of size 160x120. Compared with the five former methods, MOG [7], Rita's method [4], CB [11], Chen's method [9] and Chiu's method [22], the proposed method provides better performance in handling non-stationary background. Moreover, the detected results with different block sizes using only the block-based codebook are shown. Apparently, most noise is removed without reducing the TP rate. Most importantly, the processing speed is highly efficient with the block-based strategy. Yet, low precision is its drawback. To overcome this problem, the pixel-based stage is used to enhance the precision, which also reduces the FP rate. The detected results using the proposed hierarchical scheme (block-based stage and pixel-based stage) with various block sizes are shown as well. Figures 6(a)-(d) show the accuracy values, FP rate, TP rate, Precision, and Similarity,