Real-time Detection and Tracking of Multiple Objects with Partial Decoding in H.264/AVC Bitstream Domain. Wonsang You, University of Augsburg, Germany. Electronic Imaging, 19 January 2009.
MOTIVATION: Real-time Object Detection and Tracking in H.264/AVC Bitstream
Pixel Domain Approach. Object detection and tracking approaches fall into two categories: the pixel domain approach and the compressed domain approach. Pixel domain approach: uses raw pixel data; high accuracy; high computational complexity; requires additional computation for compressed videos. Compressed domain approach: exploits encoded information (DCT coefficients, motion vectors, etc.); poor performance; applicable only to simple scenarios; weak under occlusion.
Compressed Domain Approach. Basic idea: exploit encoded information (DCT coefficients, motion vectors, etc.). Advantages: remarkably fast processing time; well suited to compressed videos. Disadvantages: unreliability of encoded information; sparse assignment of block-based data; poor performance; applicable only to simple scenarios; weak under occlusion.
Related Works in the Compressed Domain Approach. Basic solution: use a low-resolution image built from DCT coefficients. Unfortunately, this is impossible for AVC bitstreams.
Our Solution for H.264/AVC Bitstreams. Basic idea: we use partially decoded pixel data instead of low-resolution images. Advantages: reliable performance in more natural scenes (articulated objects such as humans; objects changing in size; objects with monotonous color or a chaotic set of motion vectors); occlusion handling; detection and tracking of multiple objects against a stationary background; real-time processing. Partial decoding in I-frames: this has been considered impossible because of the spatial prediction dependency on neighboring blocks.
Overview of the Proposed Algorithm. Extraction phase (probabilistic spatiotemporal macroblock filtering): roughly extracts the block-level region of objects and constructs the approximate object trajectories in each P-frame. Refinement phase: accurately refines the object trajectories, using background subtraction and partial decoding in I-frames and motion interpolation in P-frames.
EXTRACTION PHASE
Probabilistic Spatiotemporal Macroblock Filtering. Block-based filtering of background parts among the block groups (BGs), using spatial and temporal properties of macroblocks. Rapid segmentation of object regions and tracking of each object.
Block Clustering. Skip macroblocks are removed, which eliminates probable background parts, and the remaining macroblocks (MBs) are clustered into several fragments. Block group (BG): a set of non-skip blocks.
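The clustering step above can be sketched as connected-component labeling on the macroblock grid. This is a minimal illustration, not the author's implementation: the function name `cluster_block_groups`, the boolean `skip_mask` input, and the choice of 4-connectivity are all assumptions made here.

```python
from collections import deque


def cluster_block_groups(skip_mask):
    """Group non-skip macroblocks into 4-connected block groups (BGs).

    skip_mask[r][c] is True when the macroblock at row r, column c is a
    skip macroblock. Returns a list of BGs, each a sorted list of
    (row, col) macroblock positions. 4-connectivity is an assumption.
    """
    rows, cols = len(skip_mask), len(skip_mask[0])
    seen = [[False] * cols for _ in range(rows)]
    groups = []
    for r in range(rows):
        for c in range(cols):
            if skip_mask[r][c] or seen[r][c]:
                continue  # skip MBs are dropped; visited MBs already belong to a BG
            seen[r][c] = True
            queue = deque([(r, c)])
            group = []
            while queue:  # breadth-first flood fill over non-skip neighbors
                y, x = queue.popleft()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not skip_mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            groups.append(sorted(group))
    return groups
```

Each returned list is one candidate BG; single-macroblock groups are kept at this stage and only discarded later by spatial filtering.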
Spatial Filtering. Filters out block groups that are likely to be background by removing BGs that consist of a single macroblock or whose IT coefficients are all zero. Active Block Group (ABG): a BG that remains after spatial filtering.
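The two removal rules can be sketched as a simple predicate over block groups. The names `spatial_filter` and `has_nonzero_coeff` are hypothetical, and the per-macroblock flag is assumed to have been extracted from the bitstream beforehand; the slides do not say how it is obtained.

```python
def spatial_filter(block_groups, has_nonzero_coeff):
    """Return the active block groups (ABGs) that survive spatial filtering.

    block_groups: list of BGs, each a list of (row, col) macroblock positions.
    has_nonzero_coeff[r][c]: True when the macroblock at (r, c) carries at
    least one nonzero IT coefficient (assumed to be known already).
    """
    active = []
    for bg in block_groups:
        if len(bg) <= 1:
            continue  # rule 1: drop BGs made of a single macroblock
        if not any(has_nonzero_coeff[r][c] for r, c in bg):
            continue  # rule 2: drop BGs whose IT coefficients are all zero
        active.append(bg)
    return active
```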
Temporal Filtering. Filters out ABGs that are likely to be background, based on the temporal consistency of each ABG over a given period. Fragments with a high occurrence probability are considered part of an object. Figure: remaining ABGs after temporal filtering.
Temporal Filtering. The occurrence of ABGs is observed during a finite period, and the occurrence probability is measured. ABGs with high occurrence over that period are judged to be a "Real Object".
Temporal Filtering. Criteria for survival of an ABG as an object.
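The exact survival criteria are not recoverable from the slide text, so the following is only a sketch of the general idea under stated assumptions: the occurrence probability is taken to be the fraction of frames in the observation window in which an ABG was seen, and the threshold `min_probability=0.6` is an arbitrary illustrative value. Both function names are hypothetical.

```python
def occurrence_probability(observed):
    """Fraction of frames in the window in which the ABG was observed.

    observed: list of booleans, one per frame of the observation period.
    This definition is an assumption, not the criterion from the slides.
    """
    if not observed:
        return 0.0
    return sum(1 for seen in observed if seen) / len(observed)


def temporal_filter(tracks, min_probability=0.6):
    """Keep ABG tracks whose occurrence probability reaches the threshold.

    tracks: dict mapping a track id to its per-frame observation list.
    Returns a dict mapping surviving track ids to their probabilities.
    """
    survivors = {}
    for track_id, observed in tracks.items():
        p = occurrence_probability(observed)
        if p >= min_probability:  # judged a "real object"
            survivors[track_id] = p
    return survivors
```

A track seen in 4 of 5 frames survives under this threshold, while one seen in 1 of 5 frames is treated as background.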
REFINEMENT PHASE
Background Subtraction in I-frames. Partial decoding in I-frames: the reference blocks (A-D) needed for spatial prediction are substituted with blocks from the background image. ROI refinement in I-frames.
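Once a region has been partially decoded, the background subtraction and ROI refinement can be sketched as a per-pixel threshold followed by a bounding box. This is an illustration under assumptions: the inputs are plain 2-D lists of luma values, the threshold of 25 is arbitrary, and the names `foreground_mask` and `bounding_box` are hypothetical.

```python
def foreground_mask(decoded, background, threshold=25):
    """Mark pixels whose absolute difference from the background exceeds threshold.

    decoded, background: 2-D lists of luma values with identical shape.
    """
    return [
        [abs(p - q) > threshold for p, q in zip(row_d, row_b)]
        for row_d, row_b in zip(decoded, background)
    ]


def bounding_box(mask):
    """Tightest (x, y, width, height) box around True pixels, or None if empty."""
    coords = [(y, x) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

The resulting box plays the role of the refined ROI for that I-frame.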
Motion Interpolation in P-frames. Assumption: the object moves slowly, with nearly uniform motion, within one GOP. ROI refinement in P-frames: in the ROI prediction stage, the ROI varies significantly over P-frames, so ROI refinement is needed for P-frames.
Occlusion Handling. Compares the hue color histograms of two objects.
Experimental Results (1/3). Indoor sequence: 49.5 frames/sec. Outdoor sequence: 37.12 frames/sec.
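Under the uniform-motion assumption, one plausible reading of motion interpolation is a linear blend between the ROIs refined in two consecutive I-frames. The sketch below assumes that reading; the function name `interpolate_roi`, the `(x, y, width, height)` box layout, and the `gop_length` parameter are illustrative choices rather than details from the slides.

```python
def interpolate_roi(roi_start, roi_end, gop_length):
    """Linearly interpolate ROIs for the P-frames between two I-frames.

    roi_start, roi_end: (x, y, width, height) boxes refined in the I-frames
    that open this GOP and the next one.
    gop_length: number of frames from one I-frame to the next.
    Returns one interpolated box per P-frame (gop_length - 1 boxes).
    """
    rois = []
    for k in range(1, gop_length):
        t = k / gop_length  # fraction of the GOP elapsed at P-frame k
        rois.append(tuple(a + t * (b - a) for a, b in zip(roi_start, roi_end)))
    return rois
```

For example, with a GOP of 4 frames the three P-frames receive boxes at one quarter, one half, and three quarters of the way between the two I-frame ROIs.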
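A hue histogram comparison can be sketched with the standard-library `colorsys` module. The slides do not name the similarity measure, so histogram intersection is used here purely as an assumption; the bin count of 16 and the function names are likewise illustrative.

```python
import colorsys


def hue_histogram(pixels, bins=16):
    """Normalized hue histogram of (r, g, b) pixels with channels in 0..255."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(hue * bins), bins - 1)] += 1  # clamp hue == 1.0 into last bin
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

After two objects separate from an occlusion, each one could be matched to the pre-occlusion track whose stored histogram gives the highest similarity.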
Experimental Results (2/3)
Experimental Results (3/3)
Thank You! Wonsang You [email_address] University of Augsburg Germany
