Video Retrieval of Specific Persons in Specific Locations

AUTHOR: Andrea Calafell
ADVISORS: Xavier Giró-i-Nieto, Eva Mohedano, Kevin McGuinness, Noel E. O’Connor

OUTLINE
1. Motivation
2. State of the art
3. Framework for TRECVID
4. Face detection
5. Face representation
6. Query expansion
7. Annotation Tool
8. Fusion and normalization strategies
9. Conclusions and future work

MOTIVATION
● SURVEILLANCE
● PERSONAL VIDEO ORGANIZATION

TRECVID INSTANCE SEARCH 2016
PEOPLE AND LOCATION QUERY SET
● Person visual examples
● Binary masks
● Location visual examples
TARGET DATABASE
● 1.5M keyframes
● 244 video files (300 GB)

MOTIVATION: goals
● Obtain a baseline to participate in TRECVID Instance Search 2016 (July 1).
● Improve on the results obtained in TRECVID with the baseline.

STATE OF THE ART
BASIC RETRIEVAL PIPELINE:
Image credit: Eva Mohedano, D3L6 Image Retrieval, Deep Learning for Computer Vision (UPC 2016)

STATE OF THE ART
BAG OF VISUAL WORDS:
Image credit: Eva Mohedano, D3L6 Image Retrieval, Deep Learning for Computer Vision (UPC 2016)

STATE OF THE ART
CNN REPRESENTATION:
Image: Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. 2012.
Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. A baseline for visual instance retrieval with deep convolutional networks. ICLR 2015.

STATE OF THE ART
BAG OF LOCAL CONVOLUTIONAL FEATURES:
Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marqués, Noel E. O’Connor, and Xavier Giró-i-Nieto. Bags of local convolutional features for scalable instance search. ICMR 2016.

FRAMEWORK FOR TRECVID
Mohedano et al., ICMR 2016

FACE DETECTION: ReInspect
Russell Stewart, Mykhaylo Andriluka, and Andrew Y. Ng. End-to-end people detection in crowded scenes. CVPR 2016.

QUALITATIVE RESULTS OF REINSPECT:
Changing both the input size of the network and the image size
Changing only the image size
Bad detections
False negatives

PROBLEM: Images used to train ReInspect

FACE DETECTION: Menpo
Python wrapper for face detectors [1]:
● DLIB
● OPENCV
● Pixel Intensity Comparison-based Object detection (PICO)
● FFLD [2]:
○ Based on Deformable Part Models (DPM)
○ Use LUV color space
Examples of FFLD results
[1] https://github.com/menpo/menpodetect
[2] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. ECCV, 2014.

QUALITATIVE RESULTS OF MENPO:
DLIB
OPENCV
PICO
False negatives

QUALITATIVE RESULTS OF MENPO:
FFLD
Still some false negatives
Solution: Equalize image
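
The slide does not say which equalization was applied before running FFLD. As a minimal sketch, assuming plain global histogram equalization on an 8-bit grayscale image, the idea can be illustrated as follows. The function name `equalize_histogram` and the list-of-rows image layout are hypothetical choices for this example, not taken from the slides.

```python
def equalize_histogram(gray):
    """Global histogram equalization of an 8-bit grayscale image.

    `gray` is a list of rows, each row a list of ints in [0, 255].
    This is an assumed preprocessing step, not necessarily the one
    used in the original work.
    """
    # Count how often each intensity level occurs.
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of intensity levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        # Empty or constant image: nothing to stretch.
        return [list(row) for row in gray]

    # Map each level so the occupied range spreads over [0, 255].
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[value] for value in row] for row in gray]


dark = [[100, 110], [120, 130]]
print(equalize_histogram(dark))
```

Stretching a low-contrast keyframe in this way can make faint faces easier for a detector to pick up, which is consistent with the slide's suggestion for reducing false negatives.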

FACE REPRESENTATION
DEEP FACE RECOGNITION
VGG 16-layer
Image: Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR 2015.
O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. BMVC 2015.

QUERY EXPANSION
TEMPORAL QUERY EXPANSION:
Sequence of keyframes of one shot
Mask creation pipeline (includes a dilate step)

QUERY EXPANSION
TEMPORAL QUERY EXPANSION:
Results of temporal query expansion

QUERY EXPANSION
PSEUDO-RELEVANCE FEEDBACK QUERY EXPANSION:
Top 20 retrieved keyframes
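
A minimal sketch of pseudo-relevance feedback, assuming each keyframe (or face) is represented by a descriptor vector compared with cosine similarity. The slides only state that the top 20 retrieved items are used as new queries; averaging the similarity over the original and the expanded queries is an assumption made here for illustration, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(queries, database):
    """Rank database items by their mean similarity to all queries."""
    scores = {
        key: sum(cosine(q, vec) for q in queries) / len(queries)
        for key, vec in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def pseudo_relevance_feedback(query, database, top_k=20):
    """Re-rank using the top_k results of a first pass as extra queries."""
    initial = rank([query], database)
    expanded = [query] + [database[key] for key, _ in initial[:top_k]]
    return rank(expanded, database)


# Toy descriptors; real ones would come from the face or location network.
database = {"a": [1.0, 0.1], "b": [0.9, 0.5], "c": [0.0, 1.0], "d": [0.6, 0.8]}
result = pseudo_relevance_feedback([1.0, 0.0], database, top_k=1)
print([key for key, _ in result])
```

No ground truth is used: the first-pass top results are simply assumed to be relevant, so a poor initial ranking can also pull the expanded query off target.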

ANNOTATION TOOL
● 3,991 shots for persons
● 1,528 shots for locations
● 794 shots in common

FUSION AND NORMALIZATION STRATEGIES
NORMALIZATION:
● Z-score
● Max-min
● Extreme Value Theory
FUSION:
Linear combination, maximum, minimum.
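
The formulas on this slide did not survive extraction. As a sketch under stated assumptions, the snippet below uses the standard textbook definitions of z-score and max-min normalization and the three fusion rules named above, applied to per-shot person and location scores. Extreme Value Theory normalization is not sketched. The function names and the `weight` parameter are hypothetical.

```python
from statistics import mean, pstdev


def z_score(scores):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def max_min(scores):
    """Rescale scores linearly so they span [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(person, location, mode="linear", weight=0.5):
    """Fuse two score lists, one entry per shot.

    `weight` is the weight of the person score in the linear combination;
    the location score receives 1 - weight.
    """
    if len(person) != len(location):
        raise ValueError("score lists must have the same length")
    if mode == "linear":
        return [weight * p + (1 - weight) * l for p, l in zip(person, location)]
    if mode == "max":
        return [max(p, l) for p, l in zip(person, location)]
    if mode == "min":
        return [min(p, l) for p, l in zip(person, location)]
    raise ValueError(f"unknown fusion mode: {mode}")


person = [0.2, 0.8]
location = [0.6, 0.4]
print(fuse(max_min(person), max_min(location), mode="linear", weight=0.25))
```

With `weight` below 0.5 the location ranking counts for more than the person ranking in the fused score.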

RESULTS OF APPLYING DIFFERENT NORMALIZATIONS:
BASELINE

EXAMPLE DISTRIBUTION:
Brad: Person distribution
Laundrette: Location distribution

RESULTS OF APPLYING MAXIMUM OR MINIMUM FUSION

RESULTS OF WEIGHTED LINEAR COMBINATION
HIGHER THAN THE BASELINE

CONCLUSIONS
● FFLD, a simple approach using vanilla DPM, combined with image equalization was the best of the face detectors tested on the TRECVID dataset.

CONCLUSIONS
● The proposed temporal query expansion works well, but the faces it adds are very similar to one another. Using the top 20 faces in the ranking as new queries gives more diverse faces.

CONCLUSIONS
● An annotation tool is needed to obtain quantitative results.
TOTAL OF RELEVANT ANNOTATED SHOTS
○ 3,991 shots for persons
○ 1,528 shots for locations
○ 794 shots in common
● The best configuration applies no normalization and combines the scores with a higher weight on the location ranking.

FUTURE WORK
● Analyze the location part in more depth
● Try to improve the location rankings

RESULTS OF THE PARTS SEPARATELY OVER 50 KEYFRAMES

RESULTS OF APPLYING MAXIMUM, MINIMUM AND PRODUCT FUSION
