Content based indexing and retrieval from vehicle surveillance videos

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 1, January-February 2013, pp. 420-429, © IAEME, www.iaeme.com/ijcet.asp

CONTENT BASED INDEXING AND RETRIEVAL FROM VEHICLE SURVEILLANCE VIDEOS USING GAUSSIAN MIXTURE MODEL

Shruti V Kamath (1), Mayank Darbari (2), Dr. Rajashree Shettar (3)

(1), (2) Student (8th sem B.E.), Department of Computer Science and Engineering, R.V College of Engineering, R.V Vidyaniketan Post, Mysore Road, Bangalore-59, Karnataka, India
(3) Professor, Department of Computer Science and Engineering, R.V College of Engineering, R.V Vidyaniketan Post, Mysore Road, Bangalore-59, Karnataka, India

ABSTRACT

Visual vehicle surveillance is widely used and produces a huge amount of video data that is stored for immediate or future use. Finding a vehicle of interest in these videos, whether because of a car crash, an illegal U-turn, speeding, or anything else the user cares about, is a common task, and without an effective method for indexing and retrieving vehicles it becomes very labour intensive. In the proposed approach, a surveillance video from a highway is taken and shots are detected using the colour histogram method. Because there is great redundancy among the frames of the same shot, key frames are extracted to avoid it. After key frames are extracted from the input video, moving vehicles are detected and tracked using a Gaussian Mixture Model (GMM). The proposed method extracts vehicles and classifies them based on area and colour. Unsupervised clustering is performed on the extracted vehicles using the K-means and SOM (Self-Organizing Map) algorithms, with respect to size (small, medium, large) and colour (red, white, grey). Using the clustering results, a matching matrix is computed for the two clustering algorithms with respect to the object tracking method. The proposed system achieves an accuracy of up to 97% with respect to size and up to 94% with respect to colour.
Keywords: Colour Histogram, Gaussian Mixture Model, K-means, SOM, Vehicle Surveillance, Vehicle Classification

I. INTRODUCTION

Digital surveillance systems capture a lot of video data. To make the processing of this data efficient and accurate, a content-based indexing and retrieval system is used. This paper proposes such a system, which demonstrates strong ability in digital surveillance and has common applications in police departments, traffic control, intelligence bureaus and defence.

A surveillance video from a highway is taken. Shots are then detected using the colour histogram method [1][7], which computes the histogram difference between frames. A threshold value is set, and the frames where the difference exceeds the threshold are identified as shot boundaries.

To represent a shot succinctly, redundancies are removed from the video using key frame extraction [5][6]. After key frames are extracted from the input video, we detect and track moving vehicles using the Gaussian Mixture Model [8]. We assume a stationary camera fixed at an angle for the input videos. Identified vehicles are classified based on area and colour.

Unsupervised clustering is performed on the detected objects using the K-means and SOM (Self-Organizing Map) [9][10] algorithms. Vehicles are clustered into three classes with respect to area (small, medium, large) and three classes with respect to colour (dark grey, light grey, red). Given a threshold value, the extracted vehicles are clustered and the matching matrix (a table layout that allows visualization of the performance of an algorithm) is calculated from the results of K-means and SOM with respect to the GMM.

The remainder of this paper is organized as follows. Section II is a literature survey citing the existing algorithms. A brief system overview is given in Section III. Section IV, Video Processing & Object Tracking and Indexing, explains how features are extracted and indexed. Experiments and results are discussed in Section V. Finally, the conclusion and future work are presented in Section VI.

II. LITERATURE SURVEY

The research on shot boundary detection has a long history, and there exist specific surveys on video shot boundary detection. A shot is a consecutive sequence of frames captured by a camera action that takes place between start and stop operations, which mark the shot boundaries. Generally, shot boundaries are classified as cuts, in which the transition between successive shots is abrupt, and gradual transitions, which include dissolve, fade in, fade out, wipe, etc., stretching over a number of frames. Methods for shot boundary detection usually first extract visual features from each frame, then measure similarities between frames using the extracted features, and finally detect shot boundaries between frames that are dissimilar. Current similarity metrics for extracted feature vectors include the 1-norm cosine dissimilarity, the Euclidean distance, the histogram intersection, and the chi-squared similarity [1][7].
The key frames of a video reflect the characteristics of the video to some extent, so traditional image retrieval techniques can be applied to key frames to achieve video retrieval. The static key frame features useful for video indexing and retrieval are mainly classified as colour-based, texture-based, and shape-based. Colour-based features include colour histograms, colour moments, colour correlograms, mixtures of Gaussian models [8], etc. Shape-based features that describe object shapes in the image can be extracted from object contours or regions. Colour is usually represented by the colour histogram, colour correlogram, colour coherence vector or colour moments, under a certain colour space. The colour histogram feature has been used by many researchers for image retrieval; a colour histogram is a vector in which each element represents the number of pixels of an image falling into a bin [13].

The detection of moving objects in video sequences is an important and challenging task in multimedia technologies. To segment moving foreground objects from the background, a pure background image has to be estimated. This reference background image is then subtracted from each frame, and binary masks containing the moving foreground objects are obtained by thresholding the resulting difference images [1].

The Gaussian Mixture Model for object detection has different variations. In one such variation, each pixel in a camera scene is modelled by an adaptive parametric mixture model of three Gaussian distributions. A common optimization scheme used to fit a Gaussian mixture model is the Expectation Maximization (EM) algorithm, an iterative method that is guaranteed to converge to a local maximum in the search space [8].

Machine learning-based approaches use labelled samples with low-level features to train a classifier or a set of classifiers for videos. Vehicle detection and vehicle classification using a neural network (NN) can be achieved by video monitoring systems [10].

III. SYSTEM OVERVIEW

[Fig. 1: Architecture of the proposed framework - shot boundary detection, key frame extraction, vehicle tracking, vehicle extraction with feature extraction stored in a .mat file, clustering, and vehicle retrieval.]

As shown in Fig. 1, the raw surveillance video is captured and the vehicles are extracted. During object tracking, the information about the vehicles is extracted and saved in a .mat file; the colour feature is picked up from the vehicles and stored in the .mat file as well. When the user selects a colour and/or a type provided by the system, the retrieval results are returned.
IV. VIDEO PROCESSING & OBJECT TRACKING AND INDEXING

A. SHOT BOUNDARY DETECTION

A shot is a consecutive sequence of frames captured by a camera action that takes place between start and stop operations, which mark the shot boundaries. There are strong content correlations between the frames of a shot. In the surveillance videos we used, only abrupt transitions were present, where transitions between successive shots are sudden. In this paper a threshold-based approach is implemented [1].

Threshold-Based Approach: The threshold-based approach detects shot boundaries by comparing the measured pair-wise similarities between frames with a predefined threshold: when a similarity is less than the threshold, a boundary is detected. The threshold can be global, adaptive, or global and adaptive combined. Global threshold-based algorithms use the same threshold, generally set empirically, over the whole video. Their major limitation is that local content variations are not effectively incorporated into the estimation of the global threshold, which affects the boundary detection accuracy.

Abrupt Transition: One of the effective ways to detect abrupt transitions is the intensity histogram. According to the NTSC standard, the intensity of an RGB frame can be calculated as

    I = 0.299R + 0.587G + 0.114B                                  (1)

where R, G and B are the red, green and blue channels of the pixel. The intensity histogram difference [12] is expressed as

    SD_i = Σ_{j=1..G} |H_i(j) − H_{i+1}(j)|                       (2)

where H_i(j) is the histogram value of the ith frame at level j and G denotes the total number of levels in the histogram.

In a continuous video frame sequence the histogram difference is small, whereas at an abrupt transition the intensity histogram difference spikes. Even when there is notable movement or an illumination change between neighbouring frames, the intensity histogram difference is relatively small compared with the peaks caused by abrupt changes. Therefore the difference of intensity histograms, with a proper threshold, is effective in detecting abrupt transitions. The threshold used to decide whether an intensity histogram difference indicates an abrupt transition can be set to

    Tb = μ + ασ                                                   (3)

where μ and σ are the mean value and standard deviation of the intensity histogram differences. The value of α typically varies from 3 to 6.

Sometimes gradual transitions can produce spikes that are even higher than those of abrupt transitions. To differentiate abrupt from gradual transitions, the neighbouring frames of a detected spike are also tested; if there are multiple spikes nearby, the transition is more likely to be gradual, and we simply drop the detection.
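To make the procedure concrete, the following is a minimal sketch of the histogram-difference detector of Eqs. (1)-(3). It is an illustrative Python version using OpenCV and NumPy, not the MATLAB implementation used in our experiments; the bin count, the default value of α, and the spike-isolation window are assumptions.

```python
# Illustrative sketch of Section IV.A (not the authors' code): intensity-histogram
# differences with the adaptive threshold Tb = mu + alpha*sigma of Eq. (3).
import cv2
import numpy as np

def intensity_histogram(frame_bgr, bins=64):
    # Eq. (1): I = 0.299R + 0.587G + 0.114B (OpenCV loads frames as BGR)
    b, g, r = frame_bgr[..., 0], frame_bgr[..., 1], frame_bgr[..., 2]
    intensity = 0.299 * r + 0.587 * g + 0.114 * b
    hist, _ = np.histogram(intensity, bins=bins, range=(0, 255))
    return hist.astype(np.float64)

def detect_abrupt_transitions(video_path, alpha=4.0, bins=64):
    cap = cv2.VideoCapture(video_path)
    hists = []
    ok, frame = cap.read()
    while ok:
        hists.append(intensity_histogram(frame, bins))
        ok, frame = cap.read()
    cap.release()

    # Eq. (2): SD_i = sum_j |H_i(j) - H_{i+1}(j)|
    diffs = np.array([np.abs(h1 - h2).sum() for h1, h2 in zip(hists[:-1], hists[1:])])

    # Eq. (3): adaptive threshold; alpha typically lies between 3 and 6
    tb = diffs.mean() + alpha * diffs.std()
    candidates = np.where(diffs > tb)[0]

    # Keep isolated spikes only: several nearby spikes suggest a gradual transition
    boundaries = [i for i in candidates
                  if not np.any((np.abs(candidates - i) > 0) & (np.abs(candidates - i) <= 3))]
    return boundaries  # indices i where a cut occurs between frame i and frame i+1
```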
B. KEY FRAME EXTRACTION

There are great redundancies among the frames of the same shot; therefore, certain frames that best reflect the shot contents are selected as key frames to succinctly represent the shot. The extracted key frames should contain as much of the salient content of the shot as possible while avoiding as much redundancy as possible. The features used for key frame extraction include colours (particularly the colour histogram), edges, shapes, optical flow and MPEG-7 motion descriptors. We have adopted the colour feature to extract key frames.

Sequential Comparison between Frames: In this algorithm [1], frames subsequent to a previously extracted key frame are sequentially compared with that key frame until a frame that is sufficiently different from it is found; this frame is selected as the next key frame. The merits of sequential comparison-based algorithms include their simplicity, intuitiveness, low computational complexity, and adaptation of the number of key frames to the length of the shot. Their limitations include the following: the key frames represent local rather than global properties of the shot, and the irregular distribution and uncontrolled number of key frames make these algorithms unsuitable for applications that need an even distribution or a fixed number of key frames. Redundancy can also occur when content appears repeatedly in the same shot. Nevertheless, this approach proved efficient in removing redundancies from the input video.

The Algorithm:
1. Choose the first frame as the standard frame, which is compared with the following frames.
2. Take the corresponding pixel values of both frames one by one and compute their differences.
3. Add the results of step 2 together; the sum is the difference between the two frames.
4. If the sum is larger than the threshold we set, select frame (1+i) as a key frame; frame (1+i) then becomes the standard frame. Repeat steps 1 to 4 until no more frames can be captured.

C. OBJECT TRACKING USING GAUSSIAN MIXTURE MODEL

An adaptive Gaussian mixture model is employed to detect objects. This model can also lessen the effect of small repetitive motions, for example moving vegetation such as trees and bushes, as well as small camera displacements.

Background Extraction: A Gaussian Mixture Model (GMM) is used to represent the background in RGB colour space due to its effectiveness. For an observed pixel, its probability is estimated by K Gaussian distributions. To update the model, each pixel value in a new video frame is processed to determine whether it matches any of the existing Gaussian distributions at that pixel's location; if a match is confirmed for one of the distributions, a Boolean variable is set. In each frame the weighting factor of each distribution is updated, where 'a' is the learning rate that controls the speed of learning. Pixels that are dissimilar to the background Gaussian distributions are classified as belonging to the foreground. In our experiments the parameters were empirically set to K = 5, a = 0.005, and the variance when initializing a new Gaussian mode to (30/255)^2.
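The two steps above can be sketched together as follows. This is an illustrative Python/OpenCV version, not the MATLAB implementation: key frames are selected by the sequential pixel-difference comparison of Section IV.B, and moving vehicles are then separated from the background with OpenCV's MOG2 Gaussian-mixture subtractor standing in for the GMM of Section IV.C. The difference threshold, the minimum blob area and the median filter are assumptions; the GMM settings follow the values quoted above (K = 5, a = 0.005, and an initial standard deviation of 30 grey levels, i.e. (30/255)^2 on [0, 1]-scaled intensities).

```python
# Illustrative sketch of Sections IV.B-C (assumes OpenCV >= 4 and NumPy).
import cv2
import numpy as np

def extract_key_frames(frames, threshold=2_000_000):
    """Sequential comparison (Section IV.B): a frame becomes a key frame when the summed
    absolute pixel difference to the current standard frame exceeds a threshold.
    The threshold here is a placeholder and must be tuned per video."""
    key_frames = [frames[0]]
    standard = frames[0].astype(np.int32)
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.int32) - standard).sum()
        if diff > threshold:
            key_frames.append(frame)
            standard = frame.astype(np.int32)
    return key_frames

def detect_vehicles(key_frames, learning_rate=0.005):
    """GMM foreground detection (Section IV.C), approximated with MOG2: 5 Gaussians per
    pixel, learning rate a = 0.005, initial std of 30 grey levels."""
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    subtractor.setNMixtures(5)
    subtractor.setVarInit(30.0 ** 2)
    blobs_per_frame = []
    for frame in key_frames:
        mask = subtractor.apply(frame, learningRate=learning_rate)
        mask = cv2.medianBlur(mask, 5)  # lessen noise and small repetitive motions
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Keep only blobs above a minimum area (assumed value) as candidate vehicles.
        boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 200]
        blobs_per_frame.append(boxes)
    return blobs_per_frame
```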
D. FEATURE EXTRACTION

The vehicles are extracted from the video at a point where they are prominent. In the proposed method, a range is specified within which vehicles are clearly visible, and the row value of the centroid of the bounding box (i.e. the position) is compared with this range. Vehicles within the bounding box are extracted and stored in a .mat file along with the area and frame number.

1. A range is set within which the bounding boxes of the vehicles are extracted when they come into range. The range is chosen so that the view of the camera is uniform.
2. Vehicles do not move uniformly, so the same vehicle may be detected more than once.
3. An array holds the previous 8-10 vehicles extracted. We observed through experimental analysis that, for our dataset, an object may be extracted more than once within a window of at most the previous 10 vehicles.
4. A detected object is therefore compared with the previously extracted vehicles using the Euclidean distance norm over the RGB values of the extracted portion of the vehicle.
5. The area (number of pixels within the blob) and the image of the cropped portion of the vehicle are stored in a .mat file.

Here, the colours red, black and white are considered as the colour categories. To find which of these is the dominant colour, the cropped image of the vehicle is divided into 8x8 blocks and the HSL colour value of each block's centroid pixel is extracted, similar to the colour extraction in [11]. We approximate the HSL values for the colour categories: low values of luminance are closer to black, higher values are closer to white, and an approximate hue and saturation range is associated with each colour. Starting with the lightness value of the centroid pixel of each block, a histogram is computed over the approximate ranges. The values that satisfy a particular colour's luminance range are then compared with its saturation range, and those satisfying the saturation range are compared with its hue range. If the count for a colour is significant in an image, it is taken as the block's dominant colour; using the blocks' centroid colour values we thus obtain the dominant colour of the image (a sketch of this procedure is given below, after the description of the clustering methods). An index value is assigned to each extracted vehicle and stored in a file.

E. CLUSTERING

Clustering is performed on the gathered samples based on the size and colour of the vehicles. K-means and Self-Organizing Maps are the two unsupervised clustering methods used.

Self-organizing maps differ from other artificial neural networks in that they use a neighbourhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map or network. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, using the Euclidean distance formula.
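Before turning to how these clustering methods are applied, the block-wise dominant-colour step of Section IV.D can be sketched as follows. This is an illustrative Python sketch using the standard colorsys module; the hue, saturation and lightness ranges and the colour labels are stand-in values, since the paper only states that approximate ranges were associated with each colour class.

```python
# Illustrative sketch of the dominant-colour step of Section IV.D (assumed thresholds).
import colorsys
import numpy as np

def classify_pixel(r, g, b):
    # colorsys works on [0, 1] values and returns (hue, lightness, saturation)
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    if s > 0.4 and (h < 0.05 or h > 0.92):
        return "red"      # hue near 0 (red wraps around 1) with enough saturation
    if l > 0.7:
        return "white"    # high lightness is closer to white
    return "grey"         # everything else; low lightness is closer to black/dark grey

def dominant_colour(vehicle_bgr, blocks=8):
    """Split the cropped vehicle into blocks x blocks cells, classify the centroid pixel
    of each cell, and return the most frequent colour label."""
    h, w = vehicle_bgr.shape[:2]
    counts = {}
    for i in range(blocks):
        for j in range(blocks):
            cy = min(h - 1, int((i + 0.5) * h / blocks))
            cx = min(w - 1, int((j + 0.5) * w / blocks))
            b, g, r = vehicle_bgr[cy, cx]
            label = classify_pixel(int(r), int(g), int(b))
            counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)
```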
Clustering is performed on the input data set based on area, dividing it into three classes (small, medium, large). For SOM, 200 epochs are used. A threshold is set and the actual classification into small, medium and large is determined. The same approach is followed for colour as well. The matching matrix is then calculated, and the results of the clustering are stored in .mat files, which are used for retrieval.

F. VEHICLE RETRIEVAL

The user is provided with a graphical interface. The user's selection is matched with the values stored in the .mat files and the output is displayed to the user. Hence this approach is quick and simple.

V. EXPERIMENTAL ANALYSIS AND RESULTS

MATLAB is used for our implementation. The area of a vehicle is obtained by calculating the area of its blob; a fixed range is chosen within which the areas of the blobs are captured while processing the video. The Kohonen map and k-means algorithms are used to perform clustering on the input dataset, with 200 epochs.

The test dataset consists of a video with 4321 frames. After shot boundaries have been detected and key frames have been extracted, the number of frames is brought down from 4321 to 3222.

A threshold is set and the actual classification into small, medium and large is determined as follows. The median value of the area is taken as M. Small vehicles fall below 1.4*M (= a), medium vehicles lie between 1.4*M and 3.5*M (= b), and the rest are large vehicles. The ratio between the 'b' and 'a' values must be at least 2.5 for a sample size (number of vehicles) of 50 and above.

The experimental results obtained are shown in Table 1, and Table 2 shows the matching matrices for SOM and k-means. The results of the clustering are stored in .mat files, which are used for retrieval; depending on the user's selection, the matching vehicles are displayed.

PERFORMANCE MEASURES

ACCURACY

Accuracy is the overall correctness of the model and is calculated as the number of correct classifications divided by the total number of classifications,

    Accuracy = P / C                                              (4)

where P is the number of correct classifications and C is the total number of classifications.
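A minimal sketch of the size clustering and the accuracy measure is shown below, assuming Python with scikit-learn in place of the MATLAB implementation. The synthetic blob areas are invented purely to make the example runnable; the median rule described above supplies the "actual" classes against which the clustering is scored.

```python
# Illustrative sketch of Sections IV.E and V: k-means over blob areas, ground-truth
# labels from the median rule (small < 1.4*M, medium between 1.4*M and 3.5*M, else large).
import numpy as np
from sklearn.cluster import KMeans

def actual_size_classes(areas):
    m = np.median(areas)
    a, b = 1.4 * m, 3.5 * m
    return np.array([0 if x < a else (1 if x < b else 2) for x in areas])  # 0=small,1=medium,2=large

def cluster_size_classes(areas, seed=0):
    km = KMeans(n_clusters=3, n_init=10, random_state=seed)
    labels = km.fit_predict(np.asarray(areas, dtype=float).reshape(-1, 1))
    # Relabel clusters so 0/1/2 correspond to increasing mean area (small/medium/large).
    order = np.argsort(km.cluster_centers_.ravel())
    remap = {old: new for new, old in enumerate(order)}
    return np.array([remap[l] for l in labels])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic blob areas (in pixels) standing in for the extracted vehicles.
    areas = np.concatenate([rng.normal(800, 80, 60),
                            rng.normal(2000, 150, 30),
                            rng.normal(5000, 300, 10)])
    predicted = cluster_size_classes(areas)
    actual = actual_size_classes(areas)
    accuracy = (predicted == actual).mean()   # correct classifications / total classifications
    print(f"accuracy = {accuracy:.2%}")
```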
PRECISION

Precision is a measure of the accuracy provided that a specific class has been predicted. It is defined by

    Precision = tp / (tp + fp)                                    (5)

where tp and fp are the numbers of true positive and false positive predictions for the considered class. The result is always between 0 and 1.

Table 1: Experimental results (accuracy and per-class precision for SOM and k-means at different sample sizes)

    Vehicles             32                       54                       75                       100
                         SOM        K-means       SOM        K-means       SOM        K-means       SOM        K-means
    Accuracy             29/32      29/32         50/54      50/54         66/75      66/75         97/100     85/100
                         = 90.62%   = 90.62%      = 92.59%   = 92.59%      = 88%      = 88%         = 97%      = 85%
    Precision (small)    18/19      18/19         31/35      31/35         42/51      42/51         55/58      54/69
                         = 94.73%   = 94.73%      = 88.57%   = 88.57%      = 82.35%   = 82.35%      = 94.82%   = 78.26%
    Precision (medium)   5/5        5/5           12/12      12/12         17/17      17/17         35/35      24/24
                         = 100%     = 100%        = 100%     = 100%        = 100%     = 100%        = 100%     = 100%
    Precision (large)    6/8        6/8           7/7        7/7           7/7        7/7           7/7        7/7
                         = 75%      = 75%         = 100%     = 100%        = 100%     = 100%        = 100%     = 100%

Table 2: Matching matrix for SOM and k-means for a sample size of 100 (rows: actual class; columns: class after clustering)

                      SOM                           K-means
    Actual class      Small   Medium   Large        Small   Medium   Large
    Small             55      0        0            54      0        0
    Medium            3       35       0            15      24       0
    Large             0       0        7            0       0        7

Regarding the colour identification of the vehicles, for a sample of 100 vehicles:

    Accuracy        = 90/100 = 90%
    Precision red   = 6/8   = 75%
    Precision white = 34/39 = 87.17%
    Precision grey  = 50/53 = 94.33%
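As a sanity check, the accuracy and per-class precision reported in Table 1 for SOM at a sample size of 100 can be recomputed directly from the matching matrix in Table 2: precision for a class is the diagonal entry divided by the corresponding column sum, i.e. tp/(tp + fp). The short Python snippet below performs this calculation.

```python
# Worked check of Eqs. (4)-(5) on the SOM matching matrix of Table 2 (sample size 100).
import numpy as np

som = np.array([[55,  0, 0],   # actual small
                [ 3, 35, 0],   # actual medium
                [ 0,  0, 7]])  # actual large

accuracy = np.trace(som) / som.sum()        # 97/100 = 97%
precision = np.diag(som) / som.sum(axis=0)  # per predicted class: tp / (tp + fp)

print(f"accuracy         = {accuracy:.2%}")      # 97.00%
print(f"precision small  = {precision[0]:.2%}")  # 55/58 ~= 94.8%
print(f"precision medium = {precision[1]:.2%}")  # 35/35 = 100%
print(f"precision large  = {precision[2]:.2%}")  # 7/7  = 100%
```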
Fig. 2: The size of the vehicle is chosen as 'medium' and the colour as 'white', and the output is displayed.

VI. CONCLUSION

As shown in the results, the combination of steps involved is unique, and the vehicle extraction technique applied is an exclusive method implemented here. The discussed method adopts content-based video retrieval for surveillance videos, and the results show that it is an efficient, simple and quick method. In the future we aim to detect different types of objects, incorporating more features for indexing.

REFERENCES

[1] Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, Stephen Maybank, "A Survey on Visual Content-Based Video Indexing and Retrieval," IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, Vol. 41, No. 6, November 2011.
[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in IJCAI '81, pp. 674-679, 1981.
[3] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, August 1981.
[4] B. K. P. Horn and B. G. Schunck, "Determining optical flow: A retrospective," Artificial Intelligence, vol. 59, pp. 81-87, Feb. 1993.
[5] T. Can and P. Duygulu, "Searching for repeated video sequences," in Proc. ACM MIR '07, Augsburg, Germany, Sept. 28-29, 2007.
[6] X. Yang, P. Xue, and Q. Tian, "Automatically discovering unknown short video repeats," in Proc. ICASSP '07, Hawaii, USA, 2007.
[7] J. S. Boreczky and L. A. Rowe, "Comparison of video shot boundary detection techniques," in Proc. SPIE Conf. on Vision, Communication, and Image Processing.
[8] Katharina Quast, Matthias Obermann and André Kaup, "Real-Time Moving Object Detection in Video Sequences Using Gaussian Mixture Models," Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg.
[9] Chang Liu, George Chen, Yingdong Ma, Xiankai Chen, Jinwu Wu, "A System for Indexing and Retrieving Vehicle Surveillance Videos," 4th International Congress on Image and Signal Processing, 2011.
[10] P. M. Daigavane, Preeti R. Bajaj, M. B. Daigavane, "Vehicle Detection and Neural Network Application for Vehicle Classification," International Conference on Computational Intelligence and Communication Systems, 2011.
[11] M. Babu Rao et al., "Content Based Image Retrieval using dominant color, texture and shape," International Journal of Engineering Science and Technology (IJEST), 2011.
[12] Zhongwei Guo, Gang Wang, Renan Ma, Yulin Yang, "Battlefield Video Target Mining," 3rd International Congress on Image and Signal Processing, 2010.
[13] Manimala Singha, K. Hemachandran, "Content Based Image Retrieval using Color and Texture," Signal & Image Processing: An International Journal (SIPIJ), Vol. 3, No. 1, February 2012.