Content based video summarization into object maps

Master thesis defence by Manuel Martos-Asensio

Advisors: Horst Eidenberger (Technische Universität Wien) and Xavier Giró-i-Nieto (Universitat Politècnica de Catalunya)

Published in: Technology

Content based video summarization into object maps

  1. 1. by Manuel Martos Asensio directed by Horst Eidenberger and Xavier Giro-i-Nieto
  2. 2. Introduction (I)
  3. 3. Introduction (II)
  4. 4. Introduction (III)
  5. 5. Contents  System overview  Requirements analysis  Solution approach Preparation Content selection Compositing  Conclusions Experimental results Further work
  6. 6. Contents  System overview  Requirements analysis  Solution approach Preparation Content selection Compositing  Conclusions Experimental results Further work
  7. 7. System overview
  8. 8. Contents  System overview  Requirements analysis  Solution approach Preparation Content selection Compositing  Conclusions Experimental results Further work
  9. 9. Requirements analysis  Priority requirements  P.1. People and main characters  P.2. Fast understanding  P.3. Visual variability  Uniqueness requirements  U.1. Non-repetition  U.2. Visual uniqueness  U.3. Characters uniqueness
  10. 10. Requirements analysis  Structural requirements  S.1. Main characters highlight  S.2. Style  Navigability requirements  N.1. Region boundaries  N.2. Metadata supplement
  11. 11. Requirements analysis
  12. 12. Contents  System overview  Requirements analysis  Solution approach Preparation Uniform sampling Shot boundary detection Content selection Compositing  Conclusions Experimental results Further work
  13. 13. Preparation (I)  Uniform sampling: fps_i = acquisition frame rate, N_0 = number of samples, L_i = video length (in frames)
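
The sampling formula on this slide did not survive in the transcript, so the following is only a minimal sketch of one common way to pick N_0 evenly spaced frames from a video of L_i frames. It assumes each sample is taken at the centre of its segment; the function names `uniform_sample_indices` and `frame_to_seconds` are hypothetical and not taken from the thesis code.

```python
def uniform_sample_indices(video_length, num_samples):
    """Return num_samples frame indices spread evenly over [0, video_length)."""
    if video_length <= 0 or num_samples <= 0:
        return []
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, video_length)
    step = video_length / num_samples
    # Assumption: take the frame at the centre of each equal-length segment.
    return [int(step * (k + 0.5)) for k in range(num_samples)]


def frame_to_seconds(frame_index, fps):
    """Convert a frame index to a timestamp using the acquisition frame rate."""
    return frame_index / fps
```

For example, `uniform_sample_indices(100, 4)` selects one frame from each quarter of a 100-frame clip, and `frame_to_seconds` turns those indices into timestamps for a given frame rate.
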
  14. 14. Preparation (II)  Shot boundary detection Customizable method for boundary detection Default: Cumulative Pixel-to-Pixel
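
The slide names the default detector but does not define it. As a rough illustration only, the sketch below assumes that "cumulative pixel-to-pixel" means accumulating the absolute differences of co-located pixels over the whole frame and comparing the average against a fixed threshold. The threshold value and the function names are assumptions made for this example.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute pixel-to-pixel difference between two grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)  # accumulate the difference over all pixels
            count += 1
    return total / count if count else 0.0


def detect_shot_boundaries(frames, threshold=30.0):
    """Return indices i where a new shot is assumed to start at frames[i]."""
    boundaries = []
    for i in range(1, len(frames)):
        # Assumption: a large average change between neighbours marks a cut.
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries
```

Frames are plain nested lists of grayscale values here so the example stays self-contained; a real pipeline would read them from the decoded video.
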
  15. 15. Contents  System overview  Requirements analysis  Solution approach Preparation Content selection Face detection Face clustering Object detection Compositing  Conclusions Experimental results Further work
  16. 16. Content selection (I)  Face detection Problems: Extreme size detections Overlapping detections
  17. 17. Content selection (II)  Face detection
  18. 18. Content selection (III)  Face detection Size filtering with fixed threshold
  19. 19. Content selection (IV)  Face detection Overlap filtering Frontal detections are more reliable.
  20. 20. Content selection (V)  Face detection
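
The two face-detection filters described above (a fixed size threshold and overlap removal that favours frontal detections) could be sketched as follows. The `Detection` tuple, the size limits, the overlap measure and the 0.5 overlap threshold are all assumptions chosen for illustration; the slides do not state the actual values or data structures.

```python
from collections import namedtuple

# Hypothetical box type: top-left corner, size, and whether it is a frontal face.
Detection = namedtuple("Detection", "x y w h frontal")


def filter_by_size(detections, min_side, max_side):
    """Drop boxes whose width or height falls outside [min_side, max_side]."""
    return [d for d in detections
            if min_side <= d.w <= max_side and min_side <= d.h <= max_side]


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box."""
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    smaller = min(a.w * a.h, b.w * b.h)
    return (ix * iy) / smaller if smaller else 0.0


def filter_overlaps(detections, max_overlap=0.5):
    """Keep one box per overlapping group, preferring frontal detections."""
    # Frontal boxes come first, then larger boxes, so they win conflicts.
    ranked = sorted(detections, key=lambda d: (not d.frontal, -(d.w * d.h)))
    kept = []
    for d in ranked:
        if all(overlap_ratio(d, k) <= max_overlap for k in kept):
            kept.append(d)
    return kept
```

Ranking frontal boxes first is one simple way to encode the slide's remark that frontal detections are more reliable: a profile box is kept only when it does not collide with an already accepted box.
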
  21. 21. Content selection (VI)  Face clustering Which faces belong to the same person? Which faces appear more often in the video? Unsupervised Face Clustering problem: 1. Unknown number of characters 2. Unknown ground truth Solution: Iterative cluster estimation using LBPH
  22. 22. Content selection (VII)  Face clustering Pre-processing of face detection boxes
  23. 23. Content selection (VIII)  Face clustering Iterative face labeling
  24. 24. Content selection (IX)  Face clustering
  25. 25. Content selection (X)  Face clustering
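
The face clustering slides give the idea (iterative cluster estimation using LBPH with an unknown number of characters) but not the procedure. The sketch below is one possible reading, not the thesis algorithm: it computes a single-cell LBP histogram per face (real LBPH usually concatenates histograms from a grid of cells), compares histograms with the chi-square distance, and opens a new cluster whenever no existing cluster is close enough. The threshold and all function names are assumptions.

```python
def lbp_histogram(image):
    """Normalised 256-bin histogram of 8-neighbour LBP codes (single cell)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def chi_square(h1, h2):
    """Chi-square distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def cluster_faces(histograms, threshold):
    """Label each histogram, opening a new cluster when none is close enough."""
    centroids, sizes, labels = [], [], []
    for h in histograms:
        dists = [chi_square(h, c) for c in centroids]
        best = min(range(len(dists)), key=dists.__getitem__) if dists else None
        if best is not None and dists[best] < threshold:
            n = sizes[best]
            # Update the running mean histogram of the matched cluster.
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], h)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(h))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

Counting how often each label occurs in the returned list gives a simple estimate of which characters appear most often in the video.
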
  26. 26. Content selection (XI)  Object detection Relevant content is related to source video Custom object map with: 1. Haar cascades 2. SURF descriptors matching 3. Deformable parts models
  27. 27. Content selection (XII)  Object detection Haar cascade classifiers Advantages: - Quick object detection - Training and detection stages included in OpenCV Disadvantages: - Fails at giving good results with different object views - Slow training process
  28. 28. Content selection (XIII)  Object detection SURF descriptors matching Advantages: - No additional training stage needed - Scale and rotation invariant method - Real-time object detection - Descriptors extraction and matching strategy included in OpenCV Disadvantages: - Very specific training image - Object may not be located in the image
  29. 29. Content selection (XIV)  Object detection Deformable parts models Advantages: - Multiple object views detection - Scored results Disadvantages: - Third party executable wrapped in Java - Slow object detection process
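
To illustrate the descriptor matching option listed above, here is a minimal sketch of nearest-neighbour matching with a ratio test. It does not extract SURF descriptors (the slides say the system relies on OpenCV for that); descriptors are plain lists of floats, and the ratio test, the 0.75 ratio and the minimum match count are assumptions rather than details confirmed by the slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, train, ratio=0.75):
    """Return (query_idx, train_idx) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio test needs a runner-up to compare against
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Accept only matches that are clearly better than the second best.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


def object_present(query, train, min_matches=2, ratio=0.75):
    """Declare the object found when enough descriptors match."""
    return len(match_descriptors(query, train, ratio)) >= min_matches
```

The minimum match count reflects the disadvantage noted on the slide: if the training image is too specific, few descriptors match and the object is simply not located.
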
  30. 30. Contents  System overview  Requirements analysis  Solution approach Preparation Content selection Compositing  Conclusions Experimental results Further work
  31. 31. Compositing (I)  Object segmentation
  32. 32. Compositing (II)  Tile-based map Adaptive map Navigation functionalities
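
The slides do not describe how the tile-based map is laid out or navigated. Purely as an illustration of the idea, the sketch below assumes equally sized tiles on a near-square grid and shows how a click position could be resolved to a tile through its region boundaries. The function names and the grid rule are hypothetical.

```python
import math


def tile_layout(num_tiles, map_width, map_height):
    """Return (x, y, w, h) rectangles for a near-square grid of tiles."""
    if num_tiles <= 0:
        return []
    cols = math.ceil(math.sqrt(num_tiles))
    rows = math.ceil(num_tiles / cols)
    tile_w, tile_h = map_width // cols, map_height // rows
    # Tiles are filled row by row, left to right.
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(num_tiles)]


def tile_at(tiles, px, py):
    """Return the index of the tile containing point (px, py), or None."""
    for i, (x, y, w, h) in enumerate(tiles):
        if x <= px < x + w and y <= py < y + h:
            return i
    return None
```

In a navigable map, the index returned by `tile_at` could then be used to look up the metadata of the selected object, such as the source frame it came from.
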
  33. 33. Contents  System overview  Requirements analysis  Solution approach Preparation Content selection Compositing  Conclusions Experimental results Further work
  34. 34. Conclusions (I)  Experimental results Web-based survey: 13 trailers 53 participants Control methods: Baseline: Uniform sampling Upper bound: Manual frame selection
  35. 35. Conclusions (III)  Experimental results Overall rating Recognition Rate Attractiveness and effectiveness  Scores 1 (Unacceptable), 2 (Fair), 3 (Good), 4 (Very good), 5 (Excellent)
  36. 36. Conclusions (II)  Experimental results Overall rating [Figure: MOS per video (score, 0 to 5, against trailer id, 1 to 13) and overall MOS for Uniform sampling, Object map and Manual selection]
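
The overall rating is reported as a mean opinion score (MOS) on the 1-to-5 scale listed above. A minimal sketch of that computation follows; the function names and the input layout (one list of ratings per method) are assumptions, and no survey data from the thesis is reproduced here.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average of 1-to-5 ratings; raises ValueError on empty or invalid input."""
    if not ratings:
        raise ValueError("no ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


def mos_by_method(responses):
    """responses maps a method name to its ratings; returns method -> MOS."""
    return {method: mean_opinion_score(r) for method, r in responses.items()}
```

Calling `mos_by_method` with one entry each for uniform sampling, object map and manual selection would give the three values compared in the overall rating figure.
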
  37. 37. Conclusions (III)  Experimental results Trailer 1: The Intouchables Uniform sampled Object map
  38. 38. Conclusions (III)  Experimental results Trailer 7: The Fast and the Furious Object map
  39. 39. Conclusions (IV)  Experimental results Movie recognition a) Uniform sampling b) Uniform sampling + Object map c) Uniform sampling + Object map + Manual selection [Figure: recognition rate (%, 0 to 100) per video against trailer id (1 to 13) for a, b and c, and overall recognition rate]
  40. 40. Conclusions (III)  Experimental results Trailer 4: Dark Shadows (uniform sampled)  Trailer 9: Resident Evil 5 – Retribution (uniform sampled)
  41. 41. Conclusions (V)  Experimental results Attractiveness and Effectiveness [Figure: acceptance rate per video (Attractiveness and Effectiveness scores, 0 to 5, against trailer id, 1 to 13) and average acceptance rate]
  42. 42. Conclusions (III)  Content-based video summarization application  Customizable  Allows users to rapidly grasp video content  Generates a summary description file to include related metadata  ACM 2013 Open Source Software Competition  Code publicly available at SourceForge  http://sourceforge.net/p/objectmaps
  43. 43. Conclusions (VI)  Further work  Face clustering improvement  Audio content analysis and understanding  Video sequence analysis  Content presentation analysis  Social Media