
2D/Multi-view Segmentation and Tracking


Published: my seminar at VISNET-II Summer School, June 15-19, 2009



  1. 2D/Multi-view Segmentation and Tracking
     Prof. Dr. Touradj Ebrahimi, Multimedia Signal Processing Group, Ecole Polytechnique Fédérale de Lausanne (EPFL)
  2. Outline
     • Introduction
     • 2D segmentation
     • 2D/3D segmentation and tracking
     • VISNET-II multiview tracking
     • Unusual events detection based on 2D segmentation and tracking
     • Final words
  3. Introduction
     • 1D segmentation:
       – Shot detection by temporal segmentation
     • 2D segmentation:
       – Spatial segmentation
     • 3D segmentation/tracking:
       – Spatio-temporal segmentation
  4. Applications
     • Object-based video coding (MPEG-4)
     • Interactive multimedia
       – Video editing
       – Hypervideo
     • Video surveillance
       – Detect people in restricted areas
       – Detect suspicious behavior
     • Video/image analysis
       – Extract information from the video
       – Autonomous driving cars
       – Medical imaging: determine the size of tumors
     • Man-machine interfaces
     • Content indexing, annotation, search and retrieval, ...
  5. Segmentation
     • Definition:
       – Image segmentation refers to the partition of an image into multiple regions according to some criterion
     • Objective:
       – What is where?
     • The segmentation problem can be very difficult and may require the use of domain knowledge
  6. Regions and objects
     • Two basic concepts:
       – Regions: homogeneous according to given criteria (color, motion, texture, ...); automatically extracted and tracked
       – Objects: semantically meaningful; their selection is application dependent
  7. Mathematical formulation
     • Segmentation subdivides an image R into N disjoint regions R_1, ..., R_N such that R = R_1 ∪ ... ∪ R_N and R_i ∩ R_j = ∅ for i ≠ j
     • To each region, we assign a label represented by a gray level or a color
  8. Segmentation techniques
     • Techniques according to the dominant features they use:
       – Global knowledge (e.g., thresholding)
       – Edge-based segmentation
       – Region-based segmentation
  9. Histogram shape analysis
     • Objects have approximately the same gray/color value, which differs from the background gray value:
       – The resulting histogram is bi-modal
       – Threshold T: the gray value with the minimum histogram count between the two maxima
     [Figure: bi-modal histogram with the observed background and object modes separated by the threshold T]
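The threshold selection above can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic bi-modal histogram; the 10-bin peak-separation guard and the Gaussian mode shapes are hypothetical choices, not part of the slides.

```python
import numpy as np

def bimodal_threshold(hist):
    """Pick the gray value with the minimum histogram count
    between the two dominant modes of a bi-modal histogram."""
    peaks = np.argsort(hist)[::-1]          # bins sorted by decreasing count
    p1 = peaks[0]
    # second mode: the highest bin sufficiently far from the first peak
    p2 = next(p for p in peaks if abs(int(p) - int(p1)) > 10)
    lo, hi = sorted((int(p1), int(p2)))
    return lo + int(np.argmin(hist[lo:hi + 1]))

# synthetic bi-modal histogram: background mode near 50, object mode near 200
bins = np.arange(256)
hist = np.exp(-((bins - 50) ** 2) / 200.0) + 0.6 * np.exp(-((bins - 200) ** 2) / 200.0)
T = bimodal_threshold(hist)                 # falls between the two modes
```

Pixels below T are then labeled background and pixels above it object (or vice versa, depending on which mode is brighter).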
  10. Edge-based segmentation techniques
     • Edge-based segmentation techniques differ in strategies to construct borders and in the amount of prior information:
       – Edge relaxation
       – Border detection as graph searching
       – Border detection as dynamic programming
       – Hough transform
       – (Geodesic) snakes
       – ...
  11. Region-based segmentation
     • Homogeneity is used as the main criterion in region-based segmentation
     • Criteria for homogeneity can be based on gray level, color, texture, shape, etc.
     • Constructed regions must further satisfy the following conditions:
       – H(R_i) = TRUE: each region should be homogeneous
       – H(R_i ∪ R_j) = FALSE for adjacent regions R_i and R_j: the homogeneity criterion should no longer hold after merging a region with any adjacent region
  12. Region-based segmentation
     • Several approaches:
       – Region merging
       – Region splitting
       – Splitting and merging
       – Region growing
       – Watershed
       – ...
  13. Splitting and merging
     • At any step, apply the following procedure:
       – Split into four disjoint quadrants any region R_i for which H(R_i) = FALSE
       – Merge any adjacent regions R_j and R_k for which P(R_j ∪ R_k) = TRUE
       – Stop when no further merging or splitting is possible
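As a rough sketch (not the slides' exact procedure), the split step is a recursive quadtree decomposition driven by an intensity-range homogeneity predicate H. The merge step is approximated here by grouping homogeneous leaves of equal mean intensity, which for this piecewise-constant test image coincides with merging adjacent regions whose union stays homogeneous.

```python
import numpy as np

def H(region, tol=10):
    # homogeneity predicate: intensity range within tolerance
    return int(region.max()) - int(region.min()) <= tol

def split(img, y=0, x=0, regions=None):
    """Recursively split any non-homogeneous region into four quadrants.
    Regions are recorded as (top, left, height, width) boxes."""
    if regions is None:
        regions = []
    h, w = img.shape
    if H(img) or h <= 1 or w <= 1:
        regions.append((y, x, h, w))
    else:
        h2, w2 = h // 2, w // 2
        split(img[:h2, :w2], y, x, regions)
        split(img[:h2, w2:], y, x + w2, regions)
        split(img[h2:, :w2], y + h2, x, regions)
        split(img[h2:, w2:], y + h2, x + w2, regions)
    return regions

# 8x8 test image: bright 4x4 square on a dark background
img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 200
leaves = split(img)

# merge step (sketch): group leaves with the same mean intensity
groups = {}
for (y, x, h, w) in leaves:
    key = int(img[y:y + h, x:x + w].mean())
    groups.setdefault(key, []).append((y, x, h, w))
```

After the merge pass, `groups` holds two regions (background and object), even though the quadtree split produced many small leaves.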
  14. The multiple feature approach
     • An example of a more advanced region-based segmentation algorithm
     • The approach takes several features into account: spatial (color, texture, position) and temporal (motion)
     • The clustering of the pixels into homogeneous regions is performed by:
       – Fuzzy C-Means
  15. The multiple feature approach
     • Use of multiple features:
       – A vector of features for each pixel ("feature vector")
       – Exploit coherence and redundancies among features at the pixel level
     [Figure: per-pixel feature vector combining position (x, y), color (Y, U, V, R, G, B, ...), texture, and motion (v_x, v_y)]
  16. The multiple feature approach
     • What do regions look like in the feature space?
     [Figure: clusters R_1, R_2, R_3 and their centroids μ_1, μ_2, μ_3 in the (motion, color, texture) feature space]
  17. Fuzzy C-Means
     • The Fuzzy C-Means algorithm minimizes the objective function
       J(U, μ) = Σ_i Σ_k (u_ik)^m d²(x_k, μ_i)
       where:
       – U: membership matrix
       – u_ik: membership of pixel x_k to cluster i
       – μ_i: centroid of cluster i
       – d(x_k, μ_i): distance between pixel x_k and centroid μ_i
       – m: fuzzy exponent
  18. Fuzzy C-Means
     • Algorithm:
       1. Initialize the membership matrix U
       2. Update the centroids: minimize the objective function J(U, μ) with U held constant
       3. Update the memberships: minimize the objective function J(U, μ) with μ held constant
       4. If the memberships have not yet stabilized, return to step 2; otherwise stop
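The alternating updates above can be sketched compactly in NumPy using the standard FCM update equations. The cluster count, iteration budget, and the synthetic two-cluster feature data are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=50, seed=0):
    """Alternate centroid and membership updates that minimize
    J(U, mu) = sum_i sum_k u_ik^m * ||x_k - mu_i||^2."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # memberships sum to 1 per sample
    for _ in range(n_iter):
        W = U ** m
        mu = (W @ X) / W.sum(axis=1, keepdims=True)           # centroid update
        d2 = ((X[None, :, :] - mu[:, None, :]) ** 2).sum(-1)  # squared distances
        d2 = np.maximum(d2, 1e-12)                            # avoid divide-by-zero
        U = d2 ** (-1.0 / (m - 1))                            # membership update
        U /= U.sum(axis=0)
    return U, mu

# two well-separated clusters of 2D feature vectors
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5.0, 0.1, (20, 2))])
U, mu = fuzzy_c_means(X)
```

Hard region labels are then obtained by assigning each pixel to the cluster with the highest membership, `U.argmax(axis=0)`.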
  19. Tracking
     • Semantic level
       – Identify objects against the background (temporal discontinuities)
       – Provide a mask defining the areas containing moving objects
       – Use domain knowledge (faces, persons, ...)
     • Region level
       – Extract spatio-temporally homogeneous regions
  20. Multilevel Region-object Tracking Procedure
     • Object partition validation
       – Initializing the tracking process
       – Decomposing each object into non-overlapping regions
     • Data association
       – Validating the tracking through region-descriptor correspondence
  21. 2D Tracking
     • Object and region extraction and tracking
  22. Example of 2D segmentation and tracking
     • Based on A. Cavallaro, O. Steiger, and T. Ebrahimi, "Tracking Video Objects in Cluttered Background", IEEE Trans. on Circuits and Systems for Video Technology, 2005
       – Foreground object extraction
       – Object partitioning
       – Extraction of region descriptors
       – Region tracking based on descriptors
       – Object tracking through top-down and bottom-up interaction between the region and object levels
  23. 2D segmentation and tracking
  24. Typical results of Multilevel Region-object tracking
  25. Typical results of Multilevel Region-object tracking
  26. Multi-view Tracking
     • Geometry-based methods
       – Object correspondence across views based on a homography transformation
     • Color-based methods
       – Object correspondence across views based on matching the color of different regions
     • Hybrid methods
       – Mix information about the geometry and the visual appearance
  27. Overview of a multi-view tracking system developed in VISNET-II
  28. Consistent Object Labeling
     • Assign the same label to objects through time and across camera views
  29. Object Consistency Verification
     • Stability of objects through time
     [Figure: object O_i and its regions R_i,j at frames n and n+1]
  30. Objects Correspondence
     • Assumptions:
       – Cameras are calibrated
       – Moving objects are constrained to move along a dominant ground plane
     • Given at least four corresponding points between two views, the homography transform can be estimated
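Estimating a homography from four or more point correspondences can be sketched with the Direct Linear Transform (DLT); this is a minimal, unnormalized version for clarity, and the ground-truth homography and the four square-corner points are illustrative stand-ins.

```python
import numpy as np

def estimate_homography(pts1, pts2):
    """DLT: estimate H such that pts2 ~ H @ pts1 (homogeneous) from
    at least 4 point correspondences, via the SVD null space."""
    A = []
    for (x, y), (u, v) in zip(pts1, pts2):
        # each correspondence contributes two linear constraints on the 9 entries of H
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)           # right singular vector of smallest singular value
    return H / H[2, 2]                 # fix the projective scale

# four ground-plane correspondences generated from a known homography
H_true = np.array([[1.5, 0.2, 4.0],
                   [0.1, 1.2, -3.0],
                   [0.0, 0.0, 1.0]])
pts1 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
pts1_h = np.hstack([pts1, np.ones((4, 1))])
proj = (H_true @ pts1_h.T).T
pts2 = proj[:, :2] / proj[:, 2:]
H_est = estimate_homography(pts1, pts2)
```

In practice one would normalize the points before the SVD and use a robust estimator (e.g. RANSAC) to reject outlier correspondences.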
  31. Objects Correspondence
     [Figure: homography transform mapping objects from View 1 to View 2]
  32. Transfer Error
     • Error between corresponding objects and their expected projection according to the homography transform, e.g., the symmetric transfer error TE(x, x′) = d(x′, Hx) + d(x, H⁻¹x′)
       – If TE < T, the pair x and x′ is considered a potential match
       – Create a list of potential matches
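The matching test can be sketched as follows, using a symmetric transfer error; the homography H, the object points, and the threshold T = 5 pixels are hypothetical stand-ins for calibrated values.

```python
import numpy as np

def project(H, x):
    """Apply homography H to a 2D point x (homogeneous normalization)."""
    p = H @ np.append(x, 1.0)
    return p[:2] / p[2]

def transfer_error(H, x, x_prime):
    """Symmetric transfer error between a point x in view 1 and
    its candidate correspondence x' in view 2."""
    fwd = np.linalg.norm(project(H, x) - x_prime)
    bwd = np.linalg.norm(project(np.linalg.inv(H), x_prime) - x)
    return fwd + bwd

# hypothetical homography from view 1 to view 2 (scale by 2 plus a shift)
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, -5.0],
              [0.0, 0.0, 1.0]])
x = np.array([3.0, 4.0])            # object ground point in view 1
x_good = project(H, x)              # exact correspondence in view 2
x_bad = x_good + np.array([30.0, 0.0])

T = 5.0                             # hypothetical match threshold in pixels
match_good = transfer_error(H, x, x_good) < T
match_bad = transfer_error(H, x, x_bad) < T
```

Pairs passing the threshold are appended to the list of potential matches, which the correspondence-verification step then disambiguates.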
  33. Correspondence Verification
     • Find object correspondences and handle splitting and occlusion across views
       – Region descriptors: center of gravity, histogram, texture
     • One-to-one correspondence (based on transfer error):
       – Each object receives the same label
     • One-to-many correspondence (based on transfer error):
       – The homography transform of each region from one view to the other is computed
       – The distance between regions of the two views is computed
       – The minimum mean square error is used to find the best match of regions between the two views
  34. Results
  35. Results
  36. Motivation for Unusual Events Detection
     • Unusual events:
       – A small group of events that deviate from normal behavior
       – Rare compared to usual events
       – Unpredictable: not considered in advance
     • Large interest in automatic, smart video-based surveillance systems that would not require human intervention
     • Here: trajectory-based events for which the velocity ratio between normal and unusual events is similar to the ratio in the training sequence
  37. Motivation for Unusual Events Detection
     • Examples:
       – A vehicle driving on the wrong side of a road
       – A person running in an area where people are expected to walk
       – Careless driving
     • Applications (video surveillance):
       – Parking lots
       – Metro or bus stations
       – Banks and airport lobbies
       – Shopping malls
  38. State-of-the-art
     • Three specific problems:
       – Modeling trajectories with reduced dimensionality: PCA, ICA, HMM, ...
       – Distance measures between trajectories: Euclidean, Hausdorff distance, Longest Common Subsequence, ...
       – Trajectory clustering: graph cuts, k-medoids, spectral clustering, mean-shift clustering, ...
     • Different features for unusual event detection:
       – Trajectory-based scene analysis (coordinates of the object)
       – Frame-based features (number of objects, size, color histograms, ...)
  39. System Overview
     • The testing phase can be performed on sequences from the same scene as the training phase, or from different scenes
  40. Technical Approach
     • Trajectory representation
     • Pre-processing techniques:
       – Hole filling: Bresenham’s line-drawing algorithm
       – Smoothing: Savitzky-Golay filter (4th-order polynomial fit, 21-point window)
       – Scaling to achieve velocity and acceleration invariance
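The smoothing step with the parameters quoted above can be sketched with SciPy's `savgol_filter`; the synthetic straight-line trajectory with added jitter is an illustrative stand-in for tracked object positions.

```python
import numpy as np
from scipy.signal import savgol_filter

# noisy 2D trajectory: straight-line motion plus jitter, one (x, y) row per frame
rng = np.random.default_rng(0)
t = np.arange(100)
traj = np.stack([t * 2.0, t * 0.5], axis=1) + rng.normal(0.0, 3.0, (100, 2))

# smoothing with the parameters quoted on the slide:
# 4th-order polynomial fit over a 21-sample window
smooth = savgol_filter(traj, window_length=21, polyorder=4, axis=0)

# the same filter can return the first derivative directly,
# giving per-frame velocity estimates for the feature-extraction stage
vel = savgol_filter(traj, window_length=21, polyorder=4, deriv=1, axis=0)
```

A hole-filling pass (e.g. Bresenham interpolation over missing frames) would run before this filter, since `savgol_filter` assumes uniformly sampled input.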
  41. Technical Approach (cont.)
     • High-level robust features:
       – Velocity
       – Acceleration
     • Differentiation performed by the Savitzky-Golay filter
     • Re-sampling of the smoothed trajectory at 128 spatially equidistant points
     • Feature vector extraction
  42. Technical Approach (cont.)
     • Training: Support Vector Machine (SVM) classifier
       – Input: feature vectors with class labels (+1 = normal event, −1 = unusual event)
       – Scaling of the data
       – Linear kernel
       – Cross-validation is used to identify good parameters of the hyperplane
       – Output: SVM model represented by its support vectors
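The training pipeline above can be sketched with scikit-learn. The two synthetic features (mean speed, mean acceleration magnitude), the cluster parameters, and the C grid are illustrative assumptions, not the project's actual feature vectors.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# hypothetical per-trajectory features: [mean speed, mean |acceleration|]
rng = np.random.default_rng(0)
normal = rng.normal([1.0, 0.1], 0.1, (40, 2))      # walking-speed trajectories
unusual = rng.normal([4.0, 0.8], 0.1, (10, 2))     # running-speed trajectories
X = np.vstack([normal, unusual])
y = np.r_[np.ones(40), -np.ones(10)]               # +1 = normal, -1 = unusual

scaler = StandardScaler().fit(X)                   # scaling step from the slide
Xs = scaler.transform(X)

# linear kernel; cross-validation picks the hyperplane's C parameter
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=5)
grid.fit(Xs, y)
model = grid.best_estimator_                       # defined by its support vectors

# classify a new, fast trajectory
pred = model.predict(scaler.transform([[4.2, 0.7]]))
```

At test time the same scaler must be applied to the new features before prediction, which is the "cross scaling" step mentioned on the next slide.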
  43. Technical Approach (cont.)
     • Testing:
       – Trajectory extraction and pre-processing
       – Feature extraction
       – Cross scaling
     • Off-line classification:
       – Support Vector Machine model
       – Probability distribution over the “unusual” and normal classes
  44. Experiments and Results
     • Standard video sequences from the PETS datasets:
       – S1: PETS2001, 26 trajectories, 768×576, 25 Hz
       – S2: PETS2001, 24 trajectories, 768×576, 25 Hz
       – S3: PETS2006, 41 trajectories, 720×576, 25 Hz
       – S4: PETS2006, 49 trajectories, 720×576, 25 Hz
     [Figures: sample frames from sequences S1–S4]
  45. Experiments and Results (cont.)
     • Main goal: show that it is possible to train the system with one or more sequences and use the resulting model for testing with other sequences (different scenes and scenarios)
  46. Experiments and Results (cont.)
     • Four test cases:

     | Testing sequence | Video duration (frames) | Unusual trajectories (total trajectories) | Avg. unusual trajectory length in frames (avg. trajectory length) | Training sequence             | Unusual events detection rate | False alarms |
     |------------------|-------------------------|-------------------------------------------|-------------------------------------------------------------------|-------------------------------|-------------------------------|--------------|
     | S1               | 4 min 25 sec (6642)     | 5 (26)                                    | 135 (477)                                                         | S1 (2-fold cross-validation)  | 2/2                           | none         |
     | S2               | 3 min 49 sec (5752)     | 2 (24)                                    | 218 (450)                                                         | S1                            | 2/2                           | none         |
     | S3               | 1 min 50 sec (2551)     | 8 (41)                                    | 112 (222)                                                         | S1                            | 8/8                           | 2            |
     | S4               | 1 min 42 sec (2556)     | 8 (49)                                    | 80 (164)                                                          | S2                            | 7/8                           | 4            |
  47. Challenges
     • When objects are far from the camera, their velocity cannot be determined accurately, since we do not use projection onto the ground plane
     • To recognize, detect, and analyze the behavior of other kinds of objects (e.g., a metro train), it is necessary to investigate the use of other features, such as:
       – size,
       – dominant color,
       – texture of the object, etc.
  48. Conclusions
     • A complete 2D and multi-view segmentation and tracking framework was developed for generic applications within the VISNET-II NoE, with very good performance
     • An unusual event detection module was added on top of it for video surveillance applications, with competitive results
  49. Thanks for your attention! Questions, discussions, ...
     Acknowledgements go to my past and present PhD students who have contributed and continue to contribute to this work: Andrea Cavallaro, Olivier Steiger, Emrullah Durucan, Yousri Abdeljaoued, Ivan Ivanov, as well as Gelareh Mohammadi (research assistant in 2008).