Semantic human activity detection in videos


  • BHA (basic human action): pointing, raising legs, holding a cell phone to the ear, opening a door, sitting down; HA (human activity): hurdling, smoking, delivering a speech, horse riding, shooting a gun; SHA (semantic human activity): activities that are meaningful to end users
  • Variations in recording settings: different viewpoints, moving cameras, dynamic backgrounds. Interpersonal differences: body shape variations, different clothing. Variations in action performance: no action is ever performed the same way twice, even by the same person.
  • Laptev et al.: the solution only handles well-structured action detection; Ming Yang et al.: ‘putting an object down’, ‘hands pointing’, ‘answering phone call’; Qingshan Luo et al.: ‘walking’, ‘jogging’ and ‘hand waving’; Ke et al.: ‘walking’, ‘jogging’, ‘waving hands’
  • Delivering a speech, hurdling, smoking, horse riding
  • Main frame: the main frame of the smoking activity is the pose in which the cigarette, held between the fingers, is placed in the mouth. We call it the main frame because it is the most distinctive and informative frame in a smoking action sequence.
  • Detailed description
  • Two classifiers are deployed, one for the face frontal view (FFV) and one for the face profile view (FPV); this gives wide detection coverage for faces that are turned relative to the detection plane. The OpenCV frontal face classifier is used for FFV, and a new profile view classifier was trained for FPV detection.
  • Property I: repeating motion from hand/mouth to mouth/hand
  • Property II: appearance of the main frame
  • Property III: appearance of smoke. Process: first we identify the smoke pixel color distribution from a sample set, then we analyze the motion of each pixel to verify smoke pixels.
  • We took care that the training data and the testing data do not overlap
  • Two datasets are used: Movie (images extracted from two movies) and Global (images downloaded from the WWW). Face profile view (FPV) performance is lower than face frontal view (FFV) performance because of a lack of training samples and a lack of discriminating feature points in FPV. The combined classifier shows a higher recall rate.
  • Due to the lack of training samples we could not obtain significant results, but we strongly believe that increasing the number of training samples would give higher performance. To this point we have obtained sufficient results to demonstrate our solution prototype.

    1. Semantic Human Activity Detection in Videos, by Hirantha Pradeep Weerarathna and Dr. Anuja Dharmaratne, University of Colombo School of Computing
    2. Definitions
       • Basic Human Action – one simple motion of a human body part
       • Human Activity – a combination of basic actions in sequence
       • Semantic Human Activity – a more meaningful human activity
    3. • Human action detection is a well-recognized problem in computer vision
       • It is a very hard problem due to:
         – Variations in recording settings
         – Interpersonal differences
         – Variations in how actions are performed
    4. • Many solutions have been proposed in the past for human action detection
       • A significant observation is that almost all of these solutions address basic human actions; no attention has been paid to human activities
       • Most solutions are based on action pattern analysis
    5. Previous Human Action Detection Solutions
       • Laptev et al. proposed a method for the action ‘drinking’ based on a space-time classifier and key frame priming
       • Ming Yang et al. proposed an efficient detection method based on motion history images
       • Qingshan Luo et al. proposed a novel action representation called the local motion histogram, together with a Gentle AdaBoost based feature selection technique
       • Ke et al. proposed a solution to detect smooth human actions
    6. • These solutions are based on action pattern analysis
       • They fail to detect human activities because:
         1. Some activities have no inherent pattern or structure
         2. Some activities are too complex to identify using basic action detection techniques
         3. For some activities an action template could be created, but the pattern is not preserved when the actions are actually performed
    7. Problem Statement
       • How can such semantic human activities be identified?
    8. Our Solution
       • Identify human activities based on Context Specific Information (CSI)
       • We propose a solution prototype for the activity ‘smoking’
    9. Context Specific Information
       • An information set directly associated with a particular human activity
       • The best description of the activity
       • Strong enough to discriminate the activity from thousands of similar activity classes
    10. CSI Examples
        • Fighting
          – rapid hand and leg movements
          – collision of two or more human silhouettes
        • Delivering a speech
          – changing facial expressions
          – continuous hand movements
        • Riding a bicycle
          – continuous leg movements
          – bent hands and body
          – rapid movement through space
    11. Smoking
        • A well-known human activity
        • Causes fatal diseases
        • CSI set associated with smoking:
          – Property I: repeating motion from hand/mouth to mouth/hand
          – Property II: appearance of the main frame
          – Property III: appearance of smoke
    12. Solution Architecture
        [Diagram; components: Input Video, Frame Grabber, Human Detector, Motion Analyzer, Main Frame Detector, Smoke Detector, Frame Collector, Output Video]
    13. Human Detector
        • Detects and localizes humans using a face detection technique
        • Deploys two classifiers, one for the face frontal view and one for the face profile view
        • Haar cascades are used for detection
        • If the detector finds no human, the frame cannot contain a smoking scene
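The slides do not say how the outputs of the two face classifiers are combined. The following is a minimal sketch of one possible merge, assuming each classifier returns boxes as `(x, y, w, h)` tuples; the function names and the 0.5 overlap threshold are hypothetical and not taken from the presentation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def merge_detections(frontal, profile, overlap=0.5):
    """Keep all frontal boxes, then add each profile box unless it overlaps
    an already kept box by more than `overlap` (an assumed threshold)."""
    merged = list(frontal)
    for box in profile:
        if all(iou(box, kept) <= overlap for kept in merged):
            merged.append(box)
    return merged


def frame_has_human(frontal, profile):
    """Gate for later stages: no face of either kind means no smoking scene."""
    return len(merge_detections(frontal, profile)) > 0
```

Taking the union of both detectors is one way to obtain the wider coverage described on the slide; it also passes along the false positives of both classifiers.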
    14. Motion Analyzer
        • Associated with smoking Property I
        • Creates a motion history image to accumulate motion information
        • Alerts the Main Frame Detector (MFD) when there is a motion from hand/mouth to mouth/hand
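As an illustration of how a motion history image accumulates motion, here is a minimal pure-Python sketch. It assumes grayscale frames stored as lists of rows and uses simple frame differencing to find moving pixels; the threshold and duration values are hypothetical, and the presentation does not specify these details.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary motion mask: 1 where the grayscale change exceeds `threshold`."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def update_mhi(mhi, motion_mask, duration=5):
    """Set moving pixels to `duration`; decay all other pixels by 1, down to 0."""
    return [[duration if m else max(0, h - 1) for h, m in zip(hrow, mrow)]
            for hrow, mrow in zip(mhi, motion_mask)]


def build_mhi(frames, duration=5, threshold=30):
    """Accumulate a motion history image over a sequence of frames."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        mask = frame_difference(prev, curr, threshold)
        mhi = update_mhi(mhi, mask, duration)
    return mhi
```

Recent motion ends up with the highest values and older motion with lower ones, so the gradient of the image indicates the direction of movement, for example from hand toward mouth.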
    15. Main Frame Detector
        • Associated with smoking Property II
        • Detects main frames
        • Uses object detection techniques to detect actions
        • Deploys a HOG feature based SVM for detection
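To make the HOG plus SVM idea concrete, the sketch below computes the orientation histogram of a single cell, which is the building block of a HOG descriptor, and scores a feature vector with a linear decision function. It is a simplification under stated assumptions: a full HOG descriptor also uses a grid of cells with block normalization, and the weights and bias here are hypothetical placeholders for values that would come from training.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of one grayscale cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(vector, eps=1e-6):
    """Scale the vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vector) + eps)
    return [v / norm for v in vector]


def svm_decision(features, weights, bias):
    """Linear SVM score; a positive score is read as 'main frame'."""
    return sum(f * w for f, w in zip(features, weights)) + bias
```

A vertical edge produces purely horizontal gradients, so all of its weight falls into the first bin; a horizontal edge falls into the middle bin.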
    16. Smoke Detector
        • Associated with smoking Property III
        • Detects the appearance of smoke in the video sequence
        • Uses a modified version of the work of Phillips III et al.
        • Accumulates n frames to capture smoke properties
        • Uses two properties of smoke: its special color distribution and its rapid motion
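The two smoke cues, color and motion, can be sketched per pixel as follows. This is not the method of Phillips III et al. or the authors' modification of it; it is a simplified stand-in that assumes a per-channel mean and standard deviation as the color model and brightness variation over the accumulated frames as the motion cue. All names and thresholds are hypothetical.

```python
from statistics import mean, pstdev


def fit_color_model(samples):
    """Per-channel (mean, std) of sample smoke pixels given as (r, g, b)."""
    channels = list(zip(*samples))
    return [(mean(c), pstdev(c)) for c in channels]


def matches_color(pixel, model, k=2.0):
    """True when every channel lies within k standard deviations of its mean."""
    return all(abs(v - m) <= k * s for v, (m, s) in zip(pixel, model))


def is_flickering(history, min_range=20):
    """True when brightness over the accumulated frames varies by at least min_range."""
    brightness = [sum(p) / 3 for p in history]
    return max(brightness) - min(brightness) >= min_range


def is_smoke_pixel(history, model, k=2.0, min_range=20):
    """`history` holds one pixel's (r, g, b) values over the last n frames."""
    return matches_color(history[-1], model, k) and is_flickering(history, min_range)
```

Requiring both cues means a static gray wall (right color, no motion) and a moving red object (motion, wrong color) are both rejected.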
    17. Dataset
        • No public dataset with smoking videos or main frames is available
        • We used the movies ‘Coffee and Cigarettes’ and ‘Sea and Love’
        • Additional samples were downloaded from the WWW
        • The training data does not overlap with the testing data
    18. Results Evaluation
        • Results of face frontal view detector
            Dataset   Recall   Precision
            Movie     88%      98%
            Global    78%      92%
        • Results of face profile view detector
            Dataset   Recall   Precision
            Movie     53%      88%
            Global    57%      86%
        • Results of combined face detector
            Dataset   Recall   Precision
            Movie     92%      90%
            Global    84%      88%
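For reference, recall and precision follow the standard definitions sketched below. The counts in the usage note are invented purely for illustration; the slides report only percentages, not raw counts.

```python
def precision(true_positives, false_positives):
    """Fraction of reported detections that are correct."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def recall(true_positives, false_negatives):
    """Fraction of actual faces that were detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0
```

With hypothetical counts of 45 correct detections, 5 false detections and 15 missed faces, `precision(45, 5)` gives 0.9 and `recall(45, 15)` gives 0.75. Combining two detectors tends to raise recall, because a face missed by one may be found by the other, while the false detections of both accumulate; this is consistent with the combined rows in the table.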
    19. Results Evaluation
        • Results of main frame detector
          [Figure: detection rate plotted against false positive rate, both axes from 0 to 1]
    20. Results Evaluation
        • Results of smoke detector
            Dataset     Detection Rate   FP Rate
            Colored     92%              5%
            Grayscale   60%              40%
    21. Strengths of CSI Based Solutions
        • Can be designed as an evidence-collecting approach
        • Robust to variations in how actions are performed
        • Robust to dynamic and cluttered backgrounds
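The evidence-collecting idea can be sketched as a small decision rule over the three smoking properties. The presentation does not state how the evidence is fused, so the rule below is an assumption: human detection acts as a gate, and a frame sequence is labeled as smoking when at least `min_properties` of the three properties are observed. The class, function names and default threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    human_present: bool       # output of the human detector
    hand_mouth_motion: bool   # Property I
    main_frame: bool          # Property II
    smoke: bool               # Property III


def smoking_score(evidence):
    """Number of smoking properties observed; 0 when no human is detected."""
    if not evidence.human_present:
        return 0
    return sum([evidence.hand_mouth_motion, evidence.main_frame, evidence.smoke])


def is_smoking(evidence, min_properties=2):
    """Assumed fusion rule: enough independent pieces of evidence agree."""
    return smoking_score(evidence) >= min_properties
```

Because each property is checked independently, a missed cue (for example, smoke that is invisible against a bright background) does not by itself rule out a detection.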
    22. Future Work
        • This work introduces the significance of using CSI for activity detection. We hope for open discussion and more accurate solutions based on this concept.
        • Train the classifiers using more samples
        • Analyze the importance of sound information associated with a particular activity as a source of context specific information
    23. Conclusion
        • Action pattern recognition is sufficient for identifying basic human actions
        • But it is not sufficient for detecting human activities
        • CSI can be used to detect such human activities
        • The CSI set used to detect one activity class cannot be used to detect another activity class
        • The CSI set for a particular activity should be selected carefully