2009.06.09 chris poppe - public PhD defense

Chris Poppe's public PhD defense entitled: "Detection and Representation of Moving Objects for Video Surveillance", 9th of June, 2009.

Published in: Technology, Business

  • Opening sentence: thank you, Mr. Chairman, esteemed members of the examination committee, dear colleagues, friends and family. During the next 45 minutes…
  • Go into more detail on H.264/AVC
  • Numbering with multimodal
  • Finally, I would like to mention that this research has led to…

    1. Detectie en representatie van bewegende objecten voor videobewaking (Detection and Representation of Moving Objects for Video Surveillance)
        Chris Poppe
        Multimedia Lab, Department of Electronics and Information Systems, Faculty of Engineering, Ghent University
        Supervisor: prof. dr. ir. Rik Van de Walle
    2. Outline
        • Introduction: Context and Problem Description
        • Detection of Moving Objects in the Pixel Domain
        • Detection of Moving Objects in the Compressed Domain
        • Metadata: Representing Moving Objects
        • Conclusions
        Slide footer: Detection and Representation of Moving Objects for Video Surveillance, Chris Poppe, Ghent, Belgium – June 9 2009
    3. Introduction: Video Surveillance
        • “Usage of a video camera to act upon crime”
        • The number of cameras and surveillance systems has grown
            • 2004: 4 285 000 cameras in the United Kingdom
        • Operators have trouble interpreting the increasing amount of data
            • need for intelligent video surveillance systems
    4. Introduction: Intelligent Video Surveillance System
        [Figure: system blocks labeled encoding, analytics, storage and visualization; data flows labeled “video” and “video + metadata”]
    5. Introduction: Video Surveillance
        • Automated analysis of the video to make intelligent decisions
        • Analytics
            • detection of moving objects
            • tracking
            • classification
            • identification
            • interpretation
        [Figure: annotated surveillance image with the labels “person1”, “person2” and “intruder alert!!!”]
    6. Introduction: Moving Object Detection
        • Detection of moving objects is the first step in video analytics
            • needs to be fast and accurate
        • Classify each pixel in the image as foreground or background
        • Current techniques
            • good for “simple” situations
            • problems with moving trees, changing lighting conditions, environmental conditions, …
        • Goal
            • fast and robust detection of moving objects
    7. Introduction: Moving Object Representation
        • Analytics extracts information (e.g., moving objects) from video
            • represented using standardized formats (metadata standards)
        • Large video surveillance systems contain several analytics modules
            • the same information can be represented using different formats
        • To retrieve relevant information (e.g., find all moving objects), a common understanding of this information is needed
        • Goal
            • provide means to combine different metadata standards
        [Figure: analytics produces information, which is represented in a metadata standard]
    8. Outline
        • Introduction: Context and Problem Description
        • Detection of Moving Objects in the Pixel Domain
        • Detection of Moving Objects in the Compressed Domain
        • Metadata: Representing Moving Objects
        • Conclusions
    9. Moving Object Detection in the Pixel Domain
        • Background subtraction
            • create a background model for each pixel
            • compare new images with the background model
            • large differences result in foreground objects
        • Different background models have been proposed in the literature
            • previous value, average value, …
        [Figure: background model, new image and result, related by a subtraction]
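The background subtraction steps on this slide can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes grey-value images stored as nested lists, uses the "average value" model mentioned on the slide, and the names `update_average`, `subtract_background`, `alpha` and `threshold` are hypothetical.

```python
def update_average(background, frame, alpha=0.05):
    """Blend the new frame into the per-pixel average background model."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def subtract_background(background, frame, threshold=25):
    """Mark a pixel as foreground (1) when it differs strongly from the model."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


# Toy 2x3 image: two pixels deviate strongly from the background model.
background = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
frame = [[102, 99, 180], [101, 175, 100]]

mask = subtract_background(background, frame)
background = update_average(background, frame)
print(mask)
```

A single model per pixel, as used here, is exactly what the next slide identifies as insufficient for scenes with moving trees or changing lighting.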
    10. Moving Object Detection in the Pixel Domain
        • Problems with background subtraction
            • moving trees, opened or closed doors, construction works, …
                • a single static model is insufficient
            • noise, weather conditions, shadows, …
                • the model needs to accommodate such situations
            • parked car
                • need to gather information on background and foreground
        • Solution: multimodal background subtraction
            • multiple models per pixel
            • each model contains several dynamic parameters
            • a model can represent both background and foreground
        [Figure: three per-pixel models (two background models and one foreground model), each holding noise statistics, previous value and average value]
    11. Multimodal Background Subtraction
        • For each new image
            • compare the pixel value with the models
                • find a match with one of the models
            • adapt the parameters of the models
            • decision based on the matched model
        [Figure: model 1 and model 2 (background models) and model 3 (foreground model), each holding noise statistics, previous value and average value; outcome shown: “pixel is background”]
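The per-pixel loop on this slide (match, adapt, decide) might look roughly like the sketch below. The slide does not give the matching rule or the update equations, so everything concrete here is an assumption: the class names `Model` and `PixelModels`, the parameters `alpha`, `k`, `min_matches` and `initial_noise`, and the rule that a model counts as background once it has been matched often enough.

```python
from dataclasses import dataclass


@dataclass
class Model:
    average: float    # running average of the values matched to this model
    previous: float   # most recent matched value
    noise: float      # running estimate of the absolute deviation
    matches: int = 1  # number of times this model has been matched


class PixelModels:
    """Hypothetical multimodal model for a single grey-value pixel."""

    def __init__(self, max_models=3, alpha=0.1, k=2.5,
                 min_matches=3, initial_noise=5.0):
        self.models = []
        self.max_models = max_models
        self.alpha = alpha
        self.k = k
        self.min_matches = min_matches
        self.initial_noise = initial_noise

    def update(self, value):
        """Process one new pixel value; return True if it is foreground."""
        for model in self.models:
            deviation = abs(value - model.average)
            # A value matches when it lies within k noise units of the average.
            if deviation <= self.k * max(model.noise, 1.0):
                model.average += self.alpha * (value - model.average)
                model.noise += self.alpha * (deviation - model.noise)
                model.previous = value
                model.matches += 1
                # Assumed rule: rarely matched models still describe foreground.
                return model.matches < self.min_matches
        # No match: start a new model, replacing the least matched one if full.
        if len(self.models) >= self.max_models:
            self.models.remove(min(self.models, key=lambda m: m.matches))
        self.models.append(Model(float(value), float(value), self.initial_noise))
        return True


pixel = PixelModels()
results = [pixel.update(v) for v in [100, 101, 99, 100, 200, 100]]
print(results)
```

In this toy run the stable value around 100 becomes background after a few frames, the sudden value 200 is reported as foreground, and the pixel returns to background as soon as the old value reappears.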
    12. Multimodal Background Subtraction
        • Each pixel in the image has been classified as foreground or background
        • Problem of “camouflage”
            • moving objects can contain parts that resemble the environment
        • Using only temporal information is not sufficient
    13. Spatio-Temporal Multimodal Background Subtraction
        • Use spatial information to improve the temporal background subtraction
            • spatial segmentation
                • edge detection
                • fill the segments
    14. Spatio-Temporal Multimodal Background Subtraction
        • Combine spatial segmentation with temporal detection
            • segments containing many foreground pixels are entirely regarded as foreground
        [Figure: three result images labeled temporal, spatial and spatio-temporal]
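The combination rule on this slide can be illustrated with a small sketch. It assumes the spatial segmentation is available as a label map (one segment label per pixel) and the temporal detection as a binary mask; the function name `combine` and the cut-off `min_fraction` are hypothetical, since the slide only says "many foreground pixels".

```python
from collections import defaultdict


def combine(labels, temporal_mask, min_fraction=0.5):
    """Regard a whole segment as foreground when enough of its pixels are."""
    total = defaultdict(int)
    foreground = defaultdict(int)
    for label_row, mask_row in zip(labels, temporal_mask):
        for label, is_fg in zip(label_row, mask_row):
            total[label] += 1
            foreground[label] += is_fg
    fg_segments = {label for label in total
                   if foreground[label] / total[label] >= min_fraction}
    return [[1 if label in fg_segments else 0 for label in label_row]
            for label_row in labels]


labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 2, 2]]
temporal = [[0, 0, 1, 0],   # one pixel of segment 1 is missed ("camouflage")
            [0, 0, 1, 1],
            [0, 1, 0, 0]]   # one isolated false detection in segment 0

print(combine(labels, temporal))
```

In this toy input the missed pixel of segment 1 is filled in and the isolated detection in segment 0 is dropped, which is the effect the slide attributes to adding spatial information.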
    15. Evaluation: Objective Results
        • Precision: how many of the detected foreground pixels are correct?
        • Recall: how many of the real foreground pixels are detected?
        • Apply the algorithm to a video sequence and count correct and wrong detections
            • calculate the precision and recall values
        • Good systems obtain high precision and recall
        • Different parameter settings of an algorithm give different outputs
            • vary the parameters
            • calculate the precision and recall values
            • represent them on a graph
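As a worked illustration of the two measures, the sketch below counts correct and wrong detections per pixel and derives precision and recall from them. The function name `precision_recall` and the toy masks are assumptions made for this example; the definitions themselves are the standard ones (precision = TP / (TP + FP), recall = TP / (TP + FN)).

```python
def precision_recall(detected, truth):
    """Return (precision, recall) for two binary masks of equal size."""
    tp = fp = fn = 0
    for detected_row, truth_row in zip(detected, truth):
        for d, t in zip(detected_row, truth_row):
            if d and t:
                tp += 1          # correctly detected foreground pixel
            elif d and not t:
                fp += 1          # detected, but actually background
            elif t and not d:
                fn += 1          # real foreground pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


detected = [[1, 1, 0],
            [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

precision, recall = precision_recall(detected, truth)
print(precision, recall)
```

Repeating this computation for several parameter settings yields the points of the precision-recall graph described on the slide.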
    16. Evaluation: Objective Results
        • Compare the proposed algorithm with similar techniques
            • Stauffer (2001), Shan (2006)
    17. Evaluation: Subjective Results
        • Visual examples of the output of different algorithms
        [Figure columns: input image, ground truth, Stauffer ’01, Shan ’06, proposed]
    18. Evaluation: Execution Times
        • Proposed system is faster than related work
        • Spatial segmentation can occur in parallel with temporal detection
            • processing speed can be increased

        Sequence                Stauffer ’01 (fps)   proposed (fps)   temporal (fps)   spatial (fps)
        PetsD2TeC2 (384x288)    8.33                 10               29.4             18.2
        Indoor (340x240)        9.5                  15.4             45.5             30
        Ismail (320x240)        9.7                  14.9             71.4             29.4
        ThirdView (720x576)     1.1                  2.3              3.6              7.7
    19. Outline
        • Introduction: Context and Problem Description
        • Detection of Moving Objects in the Pixel Domain
        • Detection of Moving Objects in the Compressed Domain
        • Metadata: Representing Moving Objects
        • Conclusions
    20. Moving Object Detection in the Compressed Domain
        • Video is encoded to reduce network traffic and storage cost
        • Video coding exploits redundancy in video
            • neighboring pixels often have similar values
            • successive images are closely related
        • Before video analytics can be applied, a decoding step is needed
        • Alternative: apply analytics directly on the compressed bit stream
        [Figure: encoding followed by analytics]
    21. H.264/AVC
        • Block-based video coding (standardized in 2003)
            • each frame is divided into macroblocks (MBs) of 16x16 pixels
            • MBs are predicted based on previously encoded data
            • the difference between the prediction and the MB is further encoded
                • a motion vector is stored in the bit stream to point to the prediction
        • Current object detection techniques are based on motion vectors
            • motion vectors are created to compress, not to represent the real motion
            • processing/filtering is needed to deal with noisy motion vectors
        • Search for a new approach
        [Figure: motion vectors]
    22. Observations
        • The size of an MB (number of bits used within the compressed bit stream) changes over several consecutive images
            • MBs corresponding to background use few bits (frame 0 to 90)
            • when a moving object passes, the size of the MB rises (frame 90 to 120)
    23. MB-based Background Subtraction
        • Background model for each MB
            • training period
            • determine maximum size
        • Threshold T
        • Compare MB sizes with maximum + T
            • MBs with large sizes are considered foreground
        [Figure: MB size over time with the threshold T marked]
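The rule on this slide (train a per-MB maximum, then compare against maximum + T) can be sketched as below. The sketch assumes the MB sizes in bits have already been read from the bit stream, which is not shown; the function names and the sample numbers are hypothetical.

```python
def train_background(training_frames):
    """Per-MB maximum size (in bits) observed during the training period."""
    return [max(sizes) for sizes in zip(*training_frames)]


def detect_foreground(mb_sizes, max_sizes, T):
    """An MB is foreground when its size exceeds the trained maximum plus T."""
    return [size > limit + T for size, limit in zip(mb_sizes, max_sizes)]


# Sizes of four MBs over three background-only training frames.
training = [[12, 8, 10, 9],
            [14, 7, 11, 9],
            [11, 9, 10, 12]]
max_sizes = train_background(training)

# A later frame in which an object covers the second and third MB.
frame = [13, 95, 40, 12]
print(detect_foreground(frame, max_sizes, T=20))
```

Because only one comparison per MB is needed and no decoding to pixels takes place, a scheme of this kind is cheap, which is consistent with the execution speeds reported later in the deck.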
    24. (sub)MB-based Background Subtraction
        • MBs can be coarse (16x16 pixels)
        • H.264/AVC divides MBs into subMBs (4x4 pixels)
        • Refine the MB output to subMB level
            • only regard foreground MBs at the boundaries of a moving object
            • analyze the size (in bits) of the subMBs in these boundary MBs
            • small subMBs are regarded as background
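One way to read the refinement step is sketched below. The slide does not define "boundary" or "small", so both are assumptions here: a boundary MB is taken to be a foreground MB with at least one background MB among its four direct neighbours, and a subMB is small when it uses fewer than `min_bits` bits. All names are hypothetical.

```python
def boundary_mbs(fg):
    """Positions of foreground MBs that touch a background MB (4-neighbourhood)."""
    rows, cols = len(fg), len(fg[0])
    result = set()
    for r in range(rows):
        for c in range(cols):
            if not fg[r][c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and not fg[nr][nc]
                   for nr, nc in neighbours):
                result.add((r, c))
    return result


def refine(sub_sizes, min_bits):
    """Keep only the subMBs of a boundary MB that use enough bits."""
    return [size >= min_bits for size in sub_sizes]


# MB-level output: the object covers the two right-hand columns.
fg = [[False, True, True],
      [False, True, True],
      [False, True, True]]
print(sorted(boundary_mbs(fg)))

# SubMB sizes (bits) inside one boundary MB, shortened to four entries.
print(refine([0, 3, 40, 52], min_bits=8))
```

Restricting the analysis to boundary MBs keeps the extra work small, since MBs in the interior of an object are left as they are.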
    25. Evaluation: Objective Comparison
        • Precision: how many of the detected foreground pixels are correct?
        • Recall: how many of the real foreground pixels are detected?
        • Comparison with Zeng (2005) (based on motion vectors)
    26. Evaluation: Execution Times
        • Very high execution speeds
            • up to 20x faster than the related work

        Sequence                Zeng ’05 (fps)   proposed (fps)
        Etri od A (352x240)     28               662
        PetsD2TeC2 (384x288)    22               448
        Indoor (340x240)        31               751
    27. Evaluation: Subjective Results
        • Demonstration
    28. Outline
        • Introduction: Context and Problem Description
        • Detection of Moving Objects in the Pixel Domain
        • Detection of Moving Objects in the Compressed Domain
        • Metadata: Representing Moving Objects
        • Conclusions
    29. Metadata: Representing Moving Objects
        • Metadata is “data about data”
            • data about a detected object: size, color, bounding box, …
        • Metadata standard
            • common agreement on the format of the metadata
        • Several metadata standards exist for video surveillance
            • modules can use different standards
            • the same information can be represented in different formats
        [Figure: analytics1 and analytics2 each produce metadata, expressed in metadata standard A or metadata standard B]
    30. Metadata: Representing Moving Objects
        • Metadata standards
            • XML (eXtensible Markup Language)
                • describes terms and structure of metadata
            • specification
                • textual description of the semantics of the XML elements
        • Metadata example 1: CVML (Computer Vision Markup Language)
            <object id="0"> <box xc="77" yc="73" w="21" h="16"/> </object>
            Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”
        • Metadata example 2: VS7 (Video Surveillance Schema)
            <LLID="LLID1"> <Mask> <BB mp7:dim="4"> 67 65 88 91 </BB> </Mask> </LLID>
            BB: “Coordinates of a rectangular segment.”
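To make the format difference concrete, the sketch below reads the CVML fragment from the slide with Python's standard `xml.etree.ElementTree` module and converts the centre and dimensions into corner coordinates. The function name `cvml_box_to_corners` is hypothetical, and the corner convention (left, top, right, bottom) is an assumption chosen for this example.

```python
import xml.etree.ElementTree as ET

# CVML fragment as shown on the slide.
CVML = '<object id="0"><box xc="77" yc="73" w="21" h="16"/></object>'


def cvml_box_to_corners(xml_text):
    """Convert a CVML box (centre + size) to (left, top, right, bottom)."""
    box = ET.fromstring(xml_text).find("box")
    xc, yc, w, h = (float(box.get(name)) for name in ("xc", "yc", "w", "h"))
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


print(cvml_box_to_corners(CVML))
```

The VS7 fragment would need a separate reader, and the slide's textual specification ("Coordinates of a rectangular segment") does not state the order of the four values, so no conversion is attempted for it here. This format-specific code is the kind of effort the proposed approach aims to avoid.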
    31. Metadata: Representing Moving Objects
        • Proposal: use Semantic Web Technologies
            • make information on the internet accessible for machines
            • information in a domain is structured using an ontology
                • a data model that represents a set of concepts and relations amongst these concepts within a specific domain
        • OWL (Web Ontology Language)
            • W3C Recommendation (2004)
            • standardized language for the description of an ontology
                • classes, properties and relations
                • individuals or instances
            • can be queried through standardized languages
    32. Metadata: Representing Moving Objects
        • Example: ontology for the domain of science
        • OWL constructs
            • Class
            • DatatypeProperty
            • subClassOf
            • Individual
            • …
        [Figure: classes Person and Scientist linked by subClassOf; DatatypeProperty “birth date”; individual “Joseph Plateau” with birth date “14/10/1801”]
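The example ontology can be mimicked with plain Python dictionaries to show what the subClassOf construct buys. This is a toy stand-in, not OWL: it assumes the individual "Joseph Plateau" is typed as Scientist (the flattened figure suggests this but does not state it), and the helper names are hypothetical.

```python
# Toy stand-ins for the OWL constructs on the slide.
SUBCLASS_OF = {"Scientist": "Person"}                      # subClassOf
TYPES = {"Joseph Plateau": "Scientist"}                    # Individual -> Class
PROPERTIES = {("Joseph Plateau", "birth date"): "14/10/1801"}  # DatatypeProperty


def classes_of(individual):
    """All classes of an individual, following subClassOf upwards."""
    result = []
    cls = TYPES.get(individual)
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


def individuals_of(cls):
    """All individuals that belong to a class, directly or via a subclass."""
    return sorted(name for name in TYPES if cls in classes_of(name))


print(classes_of("Joseph Plateau"))
print(individuals_of("Person"))
```

A query for all persons returns the scientist as well, although that fact was never stated explicitly. The same kind of inference is what later lets a general query reach instances described in a specific metadata format.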
    33. Metadata: Representing Moving Objects
        • Create OWL ontologies for the metadata standards used in video surveillance
            • CVML, VS7, MPEG-7, …
        • Mappings link the different ontologies
            • use OWL constructs to link classes
            • denote that classes in the different ontologies can be the same
        • Information in different formats is linked
            • however, metadata can be very technical or general
        [Figure: OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG7, …]
    34. Metadata: Representing Moving Objects
        • One global ontology with general concepts for video surveillance
        • Link with metadata ontologies through mappings
        • Layered metadata model
        • Only need to know the upper ontology to retrieve information (e.g., retrieve all images with moving objects)
        [Figure: upper layer with the OWL ontology Video Surveillance; lower layer with OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG7, …]
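The layered model can be illustrated with a toy lookup in which a query is phrased only in terms of the upper ontology. Every identifier below is hypothetical (the class names `vs:MovingObject`, `cvml:object`, `vs7:LLID`, `cvml:camera`, the instance records and the image names); in the approach described on the slides the mappings would be OWL constructs and the query would use a standardized query language rather than Python.

```python
# Mappings from lower-layer (format-specific) classes to the upper ontology.
MAPPINGS = {
    "cvml:object": "vs:MovingObject",
    "vs7:LLID": "vs:MovingObject",
}

# Instances produced by two analytics modules in their own formats.
INSTANCES = [
    {"id": "cvml-0", "type": "cvml:object", "image": "frame_0012"},
    {"id": "LLID1", "type": "vs7:LLID", "image": "frame_0040"},
    {"id": "cvml-cam", "type": "cvml:camera", "image": "frame_0012"},
]


def images_with(upper_class):
    """Images containing an instance that maps to the given upper class."""
    return sorted({inst["image"] for inst in INSTANCES
                   if MAPPINGS.get(inst["type"]) == upper_class})


print(images_with("vs:MovingObject"))
```

The caller never mentions CVML or VS7, and adding another metadata format only requires another mapping entry, which mirrors the slide's point that knowing the upper ontology is enough to retrieve information.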
    35. Evaluation: Practical Use Case Scenario
        • Scenario
            • “operator wants to retrieve images that contain moving objects”
            • analytics module 1 detects objects in CVML (XML)
            • analytics module 2 detects objects in VS7 (XML)
        • Proposed
            • XML fragments are automatically converted to OWL instances
            • through the mappings these instances are linked to each other and to the Video Surveillance Ontology
            • operator can use standardized languages to query the Video Surveillance Ontology
        • Related work
            • specific software written to interpret CVML and VS7
            • specific software written to “translate” the operator’s request to the corresponding XML elements
    36. Outline
        • Introduction: Context and Problem Description
        • Detection of Moving Objects in the Pixel Domain
        • Detection of Moving Objects in the Compressed Domain
        • Metadata: Representing Moving Objects
        • Conclusions
    37. Conclusions
        • Algorithm for the detection of moving objects in the pixel domain
            • multimodal background subtraction technique
            • combines spatial and temporal information
            • evaluated by comparison with related work
                • more robust detection
                • faster execution speeds
        • Algorithm for the detection of moving objects in the compressed domain
            • novel approach that disregards motion vectors
            • macroblock-based background subtraction
            • evaluated by comparison with related work
                • better detection results (very high precision)
                • up to 20 times faster than the related work
    38. Conclusions
        • Metadata for the representation of moving objects
            • discussed problems of the usage of different XML-based metadata standards
            • introduction of Semantic Web Technologies
            • layered metadata model
                • upper Video Surveillance Ontology
                • lower layer with pool of metadata ontologies
                • links defined using mappings
            • evaluation based on practical use case scenario
    39. Publications
        • First author of 3 publications recorded in SCI (A1)
            • Robust Spatio-Temporal Multimodal Background Subtraction for Video Surveillance
                • Optical Engineering
            • Moving Object Detection in the H.264/AVC Compressed Domain for Video Surveillance Applications
                • Journal of Visual Communication & Image Representation
            • Personal Content Management System, a Semantic Approach
                • Journal of Visual Communication & Image Representation
        • Co-author of 1 publication recorded in SCI (A1)
        • 17 articles at international conferences
        • 5 standardization contributions
