Content based video summarization into object maps

by Manuel Martos Asensio
directed by
Horst Eidenberger
and
Xavier Giro-i-Nieto

Contents
 System overview
 Requirements analysis
 Solution
   Preparation
   Content selection
   Compositing
 Conclusions
   Experimental results
   Further work

Requirements analysis
 Priority requirements
 P.1. People and main characters
 P.2. Fast understanding
 P.3. Visual variability
 Uniqueness requirements
 U.1. Non-repetition
 U.2. Visual uniqueness
 U.3. Characters uniqueness

Requirements analysis
 Structural requirements
 S.1. Main characters highlight
 S.2. Style
 Navigability requirements
 N.1. Region boundaries
 N.2. Metadata supplement

Contents
 System overview
Preparation
Uniform sampling
Shot boundary detection
Content selection
Compositing
 Conclusions
Further work

Preparation (I)
 Uniform sampling
fps_i = acquisition frame rate
N_0 = number of samples
L_i = video length (in frames)
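
The formula that relates these symbols is not reproduced in the slide text, so the following is only a minimal sketch of uniform sampling under one assumption: the N_0 samples are taken at the centre of N_0 equal-length segments of the video. The function and parameter names are hypothetical and are not taken from the project code.

```python
def uniform_sample_indices(video_length, num_samples):
    """Return frame indices spread evenly over [0, video_length).

    video_length plays the role of L_i and num_samples the role of N_0.
    Assumption: each sample is the centre frame of one equal-length segment.
    """
    if video_length <= 0 or num_samples <= 0:
        return []
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, video_length)
    step = video_length / num_samples
    return [int(step * k + step / 2) for k in range(num_samples)]


def effective_sample_rate(fps_in, video_length, num_samples):
    """Samples per second of video: N_0 divided by the duration L_i / fps_i."""
    duration_seconds = video_length / fps_in
    return num_samples / duration_seconds
```

For example, a 100-frame clip recorded at 25 fps and sampled 4 times yields one sample per second of video.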

Preparation (II)
 Shot boundary detection
Customizable method for boundary detection
Default: Cumulative Pixel-to-Pixel
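
The slides do not detail the default method. One plausible reading of a cumulative pixel-to-pixel comparison is sketched below: absolute grey-level differences between consecutive frames are accumulated over all pixels, and a boundary is declared when the mean difference exceeds a threshold. The frame representation (nested lists of grey values), the function names, and the threshold value are assumptions made for illustration.

```python
def pixel_difference(frame_a, frame_b):
    """Mean absolute grey-level difference, accumulated over every pixel."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def detect_shot_boundaries(frames, threshold=30.0):
    """Return the indices of frames that start a new shot.

    threshold is a hypothetical value; a real system would tune it.
    """
    boundaries = []
    for i in range(1, len(frames)):
        if pixel_difference(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries
```

Because the method is described as customizable, a different frame-distance function could replace `pixel_difference` without changing the thresholding loop.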

Contents
 System overview
Preparation
Content selection
Face detection
Face clustering
Object detection
Compositing
 Conclusions
Further work

Content selection (I)
 Face detection
Problems:
Extreme size detections
Overlapping detections

Content selection (II)
 Face detection

Content selection (III)
 Face detection
Size filtering with fixed threshold
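
A minimal sketch of size filtering follows. It assumes detections are `(x, y, width, height)` boxes and that "extreme" means a face height that is very small or very large relative to the frame height. The two ratio limits are hypothetical values, not the thresholds used in the project.

```python
def filter_by_size(boxes, frame_height, min_ratio=0.05, max_ratio=0.8):
    """Keep boxes whose height is a plausible fraction of the frame height.

    min_ratio and max_ratio are fixed, illustrative thresholds.
    """
    kept = []
    for (x, y, width, height) in boxes:
        ratio = height / frame_height
        if min_ratio <= ratio <= max_ratio:
            kept.append((x, y, width, height))
    return kept
```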

Content selection (IV)
 Face detection
Overlap filtering
Frontal detections are more reliable.
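
The sketch below shows one way to resolve overlapping detections while preferring frontal ones, as the slide suggests. It assumes each detection is a `(box, is_frontal)` pair with `box = (x, y, width, height)`, measures overlap with intersection over union, and uses a hypothetical overlap limit. The exact rule used in the project is not given in the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union else 0.0


def filter_overlaps(detections, max_iou=0.3):
    """Drop detections that overlap an already accepted one.

    Frontal detections are visited first, so they win any conflict with a
    profile detection; ties are broken in favour of the larger box.
    """
    ordered = sorted(
        detections,
        key=lambda det: (not det[1], -(det[0][2] * det[0][3])),
    )
    kept = []
    for box, is_frontal in ordered:
        if all(iou(box, other_box) <= max_iou for other_box, _ in kept):
            kept.append((box, is_frontal))
    return kept
```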

Content selection (V)
 Face detection

Content selection (VI)
 Face clustering
Which faces belong to the same person?
Which faces appear more often in the video?
Unsupervised Face Clustering problem:
1. Unknown number of characters
2. Unknown ground truth
Solution:
Iterative cluster estimation using LBPH
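
To make the LBPH (local binary pattern histogram) descriptor concrete, here is a simplified, dependency-free sketch. It computes the basic 8-neighbour LBP code of each interior pixel and a single normalised histogram for the whole face, plus a chi-square distance between histograms. Full LBPH implementations, such as the one in OpenCV, additionally split the face into a grid of cells and concatenate the per-cell histograms; that step is omitted here. All names are hypothetical.

```python
def lbp_histogram(image):
    """Normalised 256-bin histogram of 8-neighbour LBP codes.

    image is a grey-level face crop given as a list of equal-length rows.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    histogram = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            # Each neighbour that is at least as bright as the centre sets one bit.
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            histogram[code] += 1
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram


def chi_square(hist_a, hist_b):
    """Chi-square distance between two histograms (0 means identical)."""
    return sum((a - b) ** 2 / (a + b)
               for a, b in zip(hist_a, hist_b) if a + b > 0)
```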

Content selection (VII)
 Face clustering
Pre-processing of face detection boxes

Content selection (VIII)
 Face clustering
Iterative face labeling
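
The slides do not spell out the labeling loop. A minimal sketch of one possible scheme for an unknown number of characters is given below: each face descriptor is compared with one representative per existing cluster, joins the closest cluster if the distance is under a threshold, and otherwise starts a new cluster. Clusters are then ranked by size to find the faces that appear most often. The threshold, the choice of the first member as representative, and all names are assumptions.

```python
from collections import Counter


def label_faces(descriptors, threshold, distance):
    """Assign a cluster label to each descriptor, creating clusters on demand."""
    representatives = []  # one reference descriptor per cluster
    labels = []
    for descriptor in descriptors:
        best_label, best_distance = None, threshold
        for label, representative in enumerate(representatives):
            d = distance(descriptor, representative)
            if d < best_distance:
                best_label, best_distance = label, d
        if best_label is None:
            # No existing cluster is close enough: treat as a new character.
            representatives.append(descriptor)
            best_label = len(representatives) - 1
        labels.append(best_label)
    return labels


def rank_clusters(labels):
    """Cluster labels ordered from most to least frequent."""
    return [label for label, _ in Counter(labels).most_common()]
```

Any histogram distance can be passed as `distance`; the most frequent labels are candidates for the main characters.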

Content selection (IX)
 Face clustering

Content selection (X)
 Face clustering

Content selection (XI)
 Object detection
Which content is relevant depends on the source video
Custom object map with:
1. Haar cascades
2. SURF descriptors matching
3. Deformable parts models

Content selection (XII)
Haar cascade classifiers
Advantages:
- Quick object detection
- Training and detection stages included in OpenCV
Disadvantages:
- Gives poor results when the object appears from different viewpoints
- Slow training process

Content selection (XIII)
SURF descriptors matching
Advantages:
- No additional training stage needed
- Scale and rotation invariant method
- Real-time object detection
- Descriptors extraction and matching strategy included in OpenCV
Disadvantages:
- Very specific training image
- Object may not be located in the image
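
The matching strategy is not described in the slides. As an illustration of descriptor matching in general, the sketch below uses nearest-neighbour search with a ratio test, a common strategy for SURF-like descriptors, and declares the object present only when enough matches survive. Descriptors are plain lists of floats here; the ratio, the minimum match count, and all names are assumptions, and this is not the OpenCV API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, scene, ratio=0.75):
    """Return (query_index, scene_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        distances = sorted((euclidean(q, s), si) for si, s in enumerate(scene))
        if len(distances) < 2:
            continue  # the ratio test needs two candidates
        (best, best_index), (second, _) = distances[0], distances[1]
        # Accept only matches clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_index))
    return matches


def object_present(query, scene, min_matches=2, ratio=0.75):
    """Heuristic decision: enough unambiguous matches means the object is there."""
    return len(match_descriptors(query, scene, ratio)) >= min_matches
```

When too few matches pass the test, the sketch reports the object as absent rather than returning an unreliable location.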

Content selection (XIV)
Deformable parts models
Advantages:
- Detects objects seen from multiple viewpoints
- Scored results
Disadvantages:
- Third party executable wrapped in Java
- Slow object detection process

Compositing (I)
 Object segmentation

Compositing (II)
 Tile-based map
Adaptive map
Navigation functionalities

Conclusions (I)
 Experimental results
Web-based survey:
13 trailers
53 participants
Control methods:
Baseline: Uniform sampling
Upper bound: Manual frame selection

Conclusions (III)
Overall rating
Recognition Rate
Attractiveness and effectiveness
 Scores
1 (Unacceptable), 2 (Fair), 3 (Good), 4 (Very good), 5 (Excellent)
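
The overall rating is reported as a mean opinion score (MOS), which is the arithmetic mean of the participants' ratings on this 1 to 5 scale. A minimal sketch of that computation follows; the function name is hypothetical and no survey data is reproduced here.

```python
def mean_opinion_score(ratings):
    """Arithmetic mean of ratings on the 1 (Unacceptable) to 5 (Excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)
```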

Conclusions (II)
Overall rating
[Figure: MOS per trailer (x-axis: trailer id, 1 to 13; y-axis: score, 0 to 5) and overall MOS, comparing uniform sampling, object map, and manual selection]

Conclusions (III)
Trailer 1: The Intouchables
[Images: uniform-sampled summary and object map]

Conclusions (III)
Trailer 7: The Fast and the Furious
Object map

Conclusions (IV)
Movie recognition
a) Uniform sampling
b) Uniform sampling + Object map
c) Uniform sampling + Object map + Manual selection
[Figure: recognition rate per trailer (x-axis: trailer id, 1 to 13; y-axis: recognition rate in %, 0 to 100) and overall recognition rate, for cases a, b, and c]

Conclusions (III)
Trailer 4: Dark Shadows
Trailer 9: Resident Evil 5 – Retribution
[Images: uniform-sampled summaries of both trailers]

Conclusions (V)
Attractiveness and Effectiveness
[Figure: acceptance rate per trailer (x-axis: trailer id, 1 to 13; y-axis: score, 0 to 5) and average acceptance rate, for attractiveness and effectiveness]

Conclusions (III)
 Content-based video summarization application
 Customizable
 Lets users rapidly grasp video content
 Generates a summary description file to include related metadata
 ACM 2013 Open Source Software Competition
 Code publicly available at SourceForge
 http://sourceforge.net/p/objectmaps

Conclusions (VI)
 Further work
 Face clustering improvement
 Audio content analysis and understanding
 Video sequence analysis
 Content presentation analysis
 Social Media
