Embedded Fest 2019. Ihor Tanyenkov and Igor Uspeniev. Action Recognition from Live CCTV streams

An action recognition system for video surveillance. We describe the integration of a computer vision module, based on deep learning and analytical models, into production, along with the challenges we met and the approaches we took. We cover how we handle multiple video streams and reduce false positives, and explain how to deal with the lack of datasets for action recognition.

Published in: Education

  1. Action Recognition System. Ihor Tanyenkov, Igor Uspeniev
  2. CCTV: Detect Conflict Behaviour
     Detect:
       1. A person pushing another person. Pushing, punching and kicking are defined as a hand movement at a speed above a configurable (not fixed) threshold value that ends with a touch to another person.
       2. A person fighting another person by kicking or punching.
     Requirements:
       #1 The hitting hand, the touch and the participants are clearly visible.
       #4 The projection of the strike motion onto the camera image is distinguishable as a fast motion.
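The strike rule on this slide (a hand movement faster than a configurable threshold that ends with a touch) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the wrist track in pixel coordinates, the bounding box used as a stand-in for "touch", and the pixels-per-second unit are all assumptions.

```python
import math

def keypoint_speed(p0, p1, dt):
    """Speed in pixels per second between two (x, y) keypoint positions."""
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / dt

def is_strike(wrist_track, target_box, fps, speed_threshold):
    """Hypothetical rule: fast hand motion that ends inside another person's box.

    wrist_track     -- list of (x, y) wrist positions, one per frame (assumed input)
    target_box      -- (x_min, y_min, x_max, y_max) of the other person (assumed)
    speed_threshold -- configurable threshold in pixels per second
    """
    if len(wrist_track) < 2:
        return False
    dt = 1.0 / fps
    # Peak speed over all consecutive frame pairs of the track.
    peak = max(
        keypoint_speed(a, b, dt)
        for a, b in zip(wrist_track, wrist_track[1:])
    )
    # "Touch" is approximated by the last wrist position lying inside the box.
    x, y = wrist_track[-1]
    touches = (target_box[0] <= x <= target_box[2]
               and target_box[1] <= y <= target_box[3])
    return peak > speed_threshold and touches
```

Because the threshold is a parameter rather than a constant, it can be tuned per camera, which matters since apparent pixel speed depends on the distance to the camera and the frame rate.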
  3. CCTV: Falling Detection
     Detect:
       1. A person falling down from a punch.
       2. A person on the ground being kicked or beaten up.
       3. A person lying on the ground.
     Requirements:
       #1 Clearly visible standing person.
       #2 Clearly visible lying person.
  4. False Depth Perception
     Figure labels: the fist is in front; the head is far behind.
     People who are close to each other in angular position relative to the camera, but at different distances from it, look on the RGB image as if they are very close or even touching. In this case, if one of them is moving fast (dancing, rotating, etc.), the other is not influenced by these movements. So we should analyse the correlation between the movement intensities of people who appear close to each other on the RGB image, and filter out false positives when their movements are independent.
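The correlation check described on this slide could look roughly like the sketch below. It is an illustration under assumptions, not the authors' code: per-frame "movement intensity" is taken to be one number per person per frame, the Pearson coefficient at zero lag is used as the correlation measure, and the names and the 0.5 cut-off are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A person who does not move at all carries no evidence of interaction.
        return 0.0
    return cov / math.sqrt(vx * vy)

def movements_are_coupled(intensity_a, intensity_b, min_correlation=0.5):
    """True if the two people's motion intensities rise and fall together.

    A candidate hit between people whose movements are NOT coupled would be
    treated as a false positive caused by false depth perception.
    """
    return pearson(intensity_a, intensity_b) >= min_correlation
```

A real reaction to a hit may lag the strike by a few frames, so a production filter would probably compare the series at several time offsets rather than only at zero lag.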
  5. Occluded Participant
     Frames 1, 2: Normal behaviour with good visibility.
     Frame 3: Hit while the person is occluded.
     Frame 4: Fall while the person is occluded.
  6. Occluded Hit
     Frame 1: Normal behaviour with good visibility.
     Frames 2, 3, 4: Occluded hit.
  7. Power Standoff Without Fast Movements
     No single frame contains a strike.
     The sequence of frames contains no fast motion.
  8. False Hit
     A friendly hug or a pat on the shoulder can be fast and even strong. What distinguishes it from a power struggle is the manner of the movements: it is a combination of movements of various parts of the body.
  9. False Grassing
     Many (perhaps most) falls are caused not by blows but by ridiculous accidents.
  10. Standing Point Below or Above Ground Level
      It is impossible to detect a fall relative to ground level.
      Problems with full-body position detection.
  11. Fighting in the Crowd
      ● A large number of people in the field of view
      ● Mutual occlusion and chaotic movement
      ● Performance problems
  12. Review of Existing Solutions
      Group 1: Instant frame classification:
        ● Body position classification
        ● Motion as smoothed areas classification
        Problems: lots of false positives
      Group 2: Motion tracking in a frame sequence:
        ● Optical flow for motion estimation and classification
        Problems: frame rate dependency
      Group 3: Body matching in a frame sequence:
        ● Body parts detection and matching
        ● Motion sequence classification
  13. Used Approach Step 1: Pose Estimation and Analytical Motion
      Pose estimation: detect keypoints and connections.
      Challenges:
        ● Closely located persons with body intersections
        ● Clothing on the body
        ● Hidden/occluded body parts
        ● Crowded scenes
      Multiframe body matching and action classification
  14. Proposed Approach Step 2: Frame Sequence Classification
      [Diagram: frames → CNN feature extraction → feature vectors {1 0 1 0 ...} → deep LSTM network]
      Steps:
        ● Frame collection and preprocessing
        ● Extracting feature maps
        ● Create an embedding for each feature map
        ● Build the embedding sequence
        ● Predict the sequence with LSTM networks
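The data flow of this slide (collect frames, embed each one, build a fixed-length embedding sequence, classify the sequence) can be sketched without committing to any particular network. In the sketch below the CNN and the LSTM are replaced by caller-supplied functions; `embed`, `classify_sequence`, the class name and the window length are all hypothetical placeholders, not the authors' models.

```python
from collections import deque

class SequenceClassifier:
    """Sliding window of per-frame embeddings fed to a sequence model."""

    def __init__(self, embed, classify_sequence, window=16):
        # embed: frame -> embedding (stands in for CNN feature extraction)
        # classify_sequence: list of embeddings -> label (stands in for the LSTM)
        self.embed = embed
        self.classify_sequence = classify_sequence
        self.buffer = deque(maxlen=window)

    def push(self, frame):
        """Add one frame; return a prediction once the window is full."""
        self.buffer.append(self.embed(frame))
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough frames collected yet
        return self.classify_sequence(list(self.buffer))
```

With toy stand-ins such as `embed=lambda f: [float(f)]` and a threshold on the summed values as `classify_sequence`, the class returns `None` until the window fills and then yields one prediction per incoming frame, dropping the oldest embedding each time.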
  15. Data Representativity and Accuracy
      Dataset variability:
        ● Static features: body parts, primitives
        ● Dynamics: motion matching, speed estimation
      Dataset ground truth:
        ● Action classification
  16. Challenges
      Dataset quality on public artificial data:
        ● Slow hits
        ● Deceleration before hitting
        ● Fighting is only dynamics
        ● Poor action list scenario
        ● No ground truth
      Dataset representativity:
        ● No touches
        ● No falling
        ● Little variability:
          ○ environment
          ○ no crowd
          ○ person’s appearance
  17. Challenges and Uncertainties
      ● Smoothed motion
      ● Occluded strike
      ● Spatial orientation estimation
      ● Performance improvement: GPU parallelism, multiple models serving, intelligent preprocessing
      ● Voting system
      ● Dataset mining and labeling, request for proprietary datasets
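The slide only names a "voting system" and does not say how it works. One common way to use voting against false positives is to raise an alarm only when enough recent per-frame detections agree; the sketch below shows that idea under this assumption, with a hypothetical class name and hypothetical window and vote counts.

```python
from collections import deque

class VoteFilter:
    """Raise an alarm only if at least min_votes of the last `window` frames agree."""

    def __init__(self, window=5, min_votes=3):
        if not 1 <= min_votes <= window:
            raise ValueError("min_votes must be between 1 and window")
        self.votes = deque(maxlen=window)  # oldest vote is dropped automatically
        self.min_votes = min_votes

    def update(self, detected):
        """Record one per-frame detection and return the filtered decision."""
        self.votes.append(bool(detected))
        return sum(self.votes) >= self.min_votes
```

A single spurious detection never reaches `min_votes` on its own, so it is suppressed, at the cost of a delay of a few frames before a real event is reported.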
  18. Architecture Design
  19. Slow Motion: Filter of Third Level
  20. Speed Function Reconstruction
  21. Example of Function Differentiation
  22. Example of Function Differentiation
  23. Space of Equilibrium: Singularity Stability
  24. Space of Equilibrium: Homogeneous Deviation
