Semantic and Diverse Summarization of Egocentric Photo Events

Aniol Lidon Baulida
Master Computer Vision (UAB, UPC, UPF, UOC)
Advisors:
Xavier Giró Nieto, Image Processing Group, Universitat Politècnica de Catalunya
Petia Radeva, Barcelona Perceptual Computing Lab, Universitat de Barcelona

Collaboration
Barcelona Perceptual Computing Laboratory:
Marc Bolaños, Petia Radeva
Image Processing Group:
Xavier Giró
Grup de Recerca Cervell, Cognició i Conducta:
Maite Garolera
Institute of Creative Media Technologies:
Matthias Zeppelzauer

Motivation
• In 2013, 44.4 million people were living with dementia worldwide.
• “Cognitive Stimulation Therapy”

Motivation
• Lifelogging with Narrative Clip.
• Up to 2,000 to 3,000 images per day!
• Summarization is needed.

Goal
Automatically summarize events.
• Sorting by priority.
• Trade-off between relevance and diversity.
• Obtaining sorted ranks.

Goal
RELEVANCE
DIVERSITY

State of the art
• This project continues the work started by Ricard Mestre.
– Event segmentation and selecting the most repetitive image from an event.
• Off-the-shelf algorithms used:
– Informativeness network: provided by Marc Bolaños (to be published).
– Blur detection: Crete et al. The blur effect: perception and estimation with a new no-reference perceptual blur metric.
– Saliency maps: provided by Kevin McGuinness (to be published).
– Face detection: Zhu et al. Face detection, pose estimation, and landmark localization in the wild.
– Object candidates: Arbelaez et al. Multiscale Combinatorial Grouping.
– Object detector: Hoffman et al. Large Scale Detection through Adaptation.
– Affective: Campos et al. Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction.

Prefiltering
Aim: Removing uninformative images.
Informativeness network, fine-tuned with human annotations
Filtering out: Discarding absolutely uninformative frames.
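
A minimal sketch of the filtering step, assuming the informativeness network returns one score in [0, 1] per frame. The function name `prefilter` and the 0.5 threshold are illustrative and do not come from the slides.

```python
def prefilter(scores, threshold=0.5):
    """Return the indices of frames whose informativeness reaches the threshold.

    `scores` is assumed to hold one informativeness value in [0, 1] per frame.
    The default threshold is a placeholder, not a value from the slides.
    """
    return [i for i, s in enumerate(scores) if s >= threshold]


frame_scores = [0.91, 0.08, 0.55, 0.02, 0.73]
kept = prefilter(frame_scores)
print(kept)  # indices of the frames that are kept
```

Only the frames that pass this step would go on to relevance ranking.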

Relevance
What is relevance?
Frame-level:
• Repeated.
• Unusual.
• WHAT? Representative of an activity.
• WHO? Social interactions.
• WHERE? Environment.
• WHEN? Time at which the event occurred.
• HOW? Way in which the activity occurred.

Relevance
What is relevance?
Frame-level:
• WHAT? Representative of an activity.
– Saliency maps
– Object detection
• WHO? Social interactions.
– Face detection
– Sentiment analysis (affectivity)

Relevance Ranking: pipeline
Prefiltering
Diversity re-ranking

Relevance ranking
Saliency maps
SalNet CNN
Aim: Determining interesting zones.
Scoring for relevance: Averaging all saliency-map values.
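
The saliency score above can be sketched as a plain average, assuming the saliency map arrives as a 2-D grid of values (here a list of rows). The name `saliency_relevance` is hypothetical.

```python
def saliency_relevance(saliency_map):
    """Average all values of a 2-D saliency map given as a list of rows."""
    total = sum(sum(row) for row in saliency_map)
    count = sum(len(row) for row in saliency_map)
    return total / count if count else 0.0


# Toy 3x3 map with one strongly salient pixel in the centre.
toy_map = [
    [0.0, 0.2, 0.0],
    [0.2, 1.0, 0.2],
    [0.0, 0.2, 0.0],
]
score = saliency_relevance(toy_map)
```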

Relevance ranking
Objects
LSDA (Large Scale Detection through Adaptation) object detector
Aim: Finding well defined objects.
Scoring for relevance: Summing the scores of all detected objects.
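
A short sketch of this score, assuming the detector output for one frame is available as a list of `(label, score)` pairs. That data layout and the name `object_relevance` are assumptions made for illustration.

```python
def object_relevance(detections):
    """Sum the scores of all objects detected in one frame.

    `detections` is assumed to be a list of (label, score) pairs.
    """
    return sum(score for _label, score in detections)


frame_detections = [("cup", 0.8), ("laptop", 0.6)]
score = object_relevance(frame_detections)
```

With this scoring, a frame with several confidently detected objects ranks above a frame with a single weak detection.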

Relevance ranking
Faces
Face detection, pose estimation, and landmark localization in the wild.
Aim: Finding well defined faces.
Scoring for relevance: Summing all face confidences exponentially.
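
The slide does not give the exact formula for the face score. The sketch below follows one possible reading, in which each face contributes the exponential of its detection confidence. Both that reading and the name `face_relevance` are assumptions.

```python
import math


def face_relevance(confidences):
    """Sum exp(c) over the face confidences c of one frame.

    This is an assumed interpretation of the exponential summation;
    the original formula may differ.
    """
    return sum(math.exp(c) for c in confidences)


no_faces = face_relevance([])           # a frame without faces scores 0
two_faces = face_relevance([1.0, 0.0])  # one confident face, one weak face
```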

Diversity re-ranking
Re-ranking by Soft Max Diversity Fusion
Color similarity
Faces similarity
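
The slides do not spell out the update rule of Soft Max Diversity Fusion. The sketch below is therefore a generic greedy relevance/diversity re-ranking, shown only to illustrate the trade-off and not as the method used in this work. It assumes similarities in [0, 1] and discounts each candidate by its highest similarity to the images already selected.

```python
def diversity_rerank(relevance, similarity):
    """Greedy re-ranking that trades relevance against redundancy.

    relevance:  one relevance score per image.
    similarity: square matrix (list of lists) with values in [0, 1].
    Returns the image indices in their new order.
    """
    remaining = set(range(len(relevance)))
    ranked = []
    while remaining:
        def fused(i):
            # Novelty is 1 minus the highest similarity to any image already picked.
            novelty = 1.0 - max((similarity[i][j] for j in ranked), default=0.0)
            return relevance[i] * novelty
        # sorted() makes ties resolve towards the lowest index.
        best = max(sorted(remaining), key=fused)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Images 0 and 1 are near-duplicates; image 2 is different but less relevant.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
order = diversity_rerank(relevance, similarity)
```

In this toy case the near-duplicate of the top image drops below the less relevant but novel one.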

Similarity measure
ImageNet
Euclidean distance between features (L2 norm).
CNN trained with ImageNet DB (1000 classes) using CaffeNet Architecture.
Fully connected layer 8 removed.
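
A minimal sketch of the distance computation, assuming each image is already described by a feature vector taken from the network once fc8 is removed. Feature extraction itself is not shown, and the short toy vectors stand in for the real CNN features.

```python
import math


def euclidean_distance(a, b):
    """L2 distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(features):
    """Pairwise L2 distances between all feature vectors."""
    return [[euclidean_distance(f, g) for g in features] for f in features]


toy_features = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
distances = distance_matrix(toy_features)
```

A small distance means two frames look alike to the network, which is the signal the diversity re-ranking needs.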

Assessment
Validation of automatic approach
INTERMEDIATE VALIDATION
Manually annotated summaries
• 7 datasets with labelled ground truth
FINAL EVALUATION
Psychologists' feedback:
• 2 online questionnaires
• Mean Opinion Score

Subjective problem
Precision
GROUND-TRUTH SELECTED

Metric
Mean Normalized Sum of Max Similarities (MNSMS)
Normalization in both axes
Y: Divide by GT samples
X: Reshape samples to N bins
[Figure, built up over several slides for n = 1, 2, 3, 4: the Ground-Truth images are compared with the first n images of SortedList(Results) and the resulting similarities are added into the Similarity Sum. Plot: MNSMS against n (%); the last slide marks the area under the curve (AUC).]
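
The metric can be sketched as follows for one event and one ground-truth summary. This is an interpretation of the slides and not the reference implementation: it assumes a similarity matrix `sim[g][r]` with values in [0, 1] between ground-truth image g and the r-th image of the sorted result list, and it leaves out any averaging across events or annotators. The names `nsms_curve` and `auc` are hypothetical.

```python
import math


def nsms_curve(sim, n_bins=10):
    """Normalized sum of max similarities for growing summary lengths.

    sim[g][r] is the similarity between ground-truth image g and the r-th
    image of the sorted result list. Returns one value per bin.
    """
    n_gt = len(sim)
    n_results = len(sim[0])
    curve = []
    for b in range(1, n_bins + 1):
        # X normalization: map the bin to a cut-off n in the result list.
        n = max(1, math.ceil(b * n_results / n_bins))
        # For each ground-truth image keep its best match among the top n.
        total = sum(max(row[:n]) for row in sim)
        # Y normalization: divide by the number of ground-truth samples.
        curve.append(total / n_gt)
    return curve


def auc(curve):
    """Area under the curve with the x axis scaled to [0, 1] (mean of the bins)."""
    return sum(curve) / len(curve)


sim = [
    [1.0, 0.2, 0.1, 0.0],  # ground-truth image 0 is matched by result 0
    [0.1, 0.3, 0.9, 0.2],  # ground-truth image 1 is matched by result 2
]
curve = nsms_curve(sim, n_bins=4)
score = auc(curve)
```

A ranking that covers the ground truth early produces a curve that rises quickly, and therefore a larger AUC.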

Assessment
Validation of automatic approach
Manually annotated summaries
• 7 datasets with labelled ground truth
• MNSMS (ImageNet) AUC
• 2 online questionnaires

Intermediate validation
Prefiltering
• Informativeness network
• Hand-crafted estimators
• No prefiltering

Saliency Relevance
• SalNet
• SalNet + Gaussian
Objects Relevance
• LSDA (object detector)
• MCG (object candidates)
[Bar charts: Saliency Relevance AUC (SalNet vs. SalNet + Gauss) and Objects Relevance AUC (LSDA vs. MCG); y-axes from 0.7 to 0.9.]

Affective Relevance
• Positive
• Negative
• Extremum
• Random
Sentiment analysis CNN
• 2 classes: positive / negative

Assessment
Validation of automatic approach
Manually annotated summaries
• 7 datasets with labelled ground truth
• MNSMS (ImageNet) AUC
• 2 rounds of online questionnaires

Final evaluation
SIMILARITY
• ImageNet CNN (fc8 removed)
• Places CNN (fc8 removed)
• LSDA (only spatial NMS)
• Fusion (ImageNet + Places + LSDA)
(Diversity re-ranking + Weight fusion in MNSMS)

Final evaluation
MEAN OPINION SCORE
• ImageNet configuration
• Uniform Sampling
• Ground-truth (previous manual annotation)

Final results
Representativity of summaries:
Preferred summary:
Mean Opinion Score (1 = worst, 5 = best)

Generalization
MediaEval Diverse Social Images task
• APPLICATION: Finding more information about a place to visit.
• GOAL: Provide a ranked list of Flickr photos for a predefined set of queries. The refined list should be both relevant to the query and diverse.
A. Lidon, M. Bolaños, M. Seidl, X. Giró-i-Nieto, P. Radeva, and M. Zeppelzauer, “UPC-UB-STP @ MediaEval 2015 Diversity Task: Iterative Reranking of Relevant Images,” in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[Bar chart: Run 1 F1@20 (Visual); y-axis from 0.4 to 0.56.]

Conclusions
• Contributions:
– Mean Normalized Sum of Max Similarities.
– New criterion for semantic diversity (based on LSDA).
– New method for diversity fusion.
– Online evaluation questionnaires.

Conclusions
• Tested in two applications:
– Memory reinforcement for mild-dementia.
– Diverse Social Images Task from the scientific MediaEval benchmark.
• Mean Opinion Score of 4.6 out of 5.
• Publications:
– Working-notes paper in MediaEval challenge.
– “Wearable and Ego-vision Systems for Augmented Experience” issue of the journal IEEE Transactions on Human-Machine Systems.
• Code available: https://imatge.upc.edu/web/resources/semantic-and-diverse-summarization-egocentric-photo-events-software

Future work
• Explore further relevance criteria.
• Higher level of semantics.
• Automatically determine the summary length.

Prefiltering
Hand-crafted estimators
• Blur: Crete et al.
• Black and burned frames: color mean
Informativeness network
• CNN trained with ImageNet + Places.
• Fine-tuned with human annotations: relevant / irrelevant
by Marc Bolaños (UB)
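
A rough sketch of a colour-mean check for black and burned (overexposed) frames, assuming these estimators threshold the mean pixel value as the slide layout suggests. The thresholds 30 and 225 and both function names are illustrative placeholders, not values from this work.

```python
def mean_intensity(pixels):
    """Mean over all channel values of an image given as rows of (R, G, B) tuples."""
    values = [c for row in pixels for px in row for c in px]
    return sum(values) / len(values)


def is_black_or_burned(pixels, low=30, high=225):
    """Flag frames whose mean colour is very dark or very bright.

    `low` and `high` are hypothetical thresholds on a 0-255 scale.
    """
    m = mean_intensity(pixels)
    return m < low or m > high


black_frame = [[(0, 0, 0), (5, 5, 5)], [(2, 2, 2), (0, 0, 0)]]
normal_frame = [[(120, 130, 110), (128, 128, 128)]]
flags = [is_black_or_burned(black_frame), is_black_or_burned(normal_frame)]
```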

Relevance ranking
Affective
• VitorNet CNN (2-class sentiment predictions)
by Victor Campos (UPC)

Relevance ranking
Late fusion
• Score normalization:
– By rank
– By score
• Aggregate scores
Fusion weights will be learned using MNSMS.
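
The two normalizations and the aggregation step can be sketched as below. The slides name the options without formulas, so the choice of min-max scaling for "by score", evenly spaced values for "by rank", and a weighted sum for aggregation are assumptions. The weights in the example are made up and have not been learned.

```python
def normalize_by_score(scores):
    """Min-max scaling of raw scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def normalize_by_rank(scores):
    """Best score maps to 1, worst to 0, evenly spaced in between."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    normalized = [0.0] * n
    for rank, i in enumerate(order):
        normalized[i] = 1.0 - rank / (n - 1)
    return normalized


def late_fusion(score_lists, weights, normalize=normalize_by_score):
    """Weighted sum of normalized scores, one fused value per image."""
    normalized = [normalize(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(w * norm[i] for w, norm in zip(weights, normalized))
            for i in range(n)]


saliency_scores = [2.0, 4.0, 6.0]
object_scores = [10.0, 30.0, 20.0]
fused = late_fusion([saliency_scores, object_scores], weights=[0.5, 0.5])
```

Rank normalization ignores how far apart the raw scores are, while score normalization keeps those gaps.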

Similarity measure
ImageNet: CNN trained with ImageNet DB (1000 classes) using CaffeNet architecture.
Places: CNN trained with Places DB (476 classes) using CaffeNet architecture.
LSDA: object detector, Large Scale Detection through Adaptation (7500 classes).
• Knowledge transfer: turns classifiers without bounding-box annotated data into detectors.
• Two post-processing steps of non-maximum suppression.
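
The slides mention the suppression steps without detail. The sketch below shows standard greedy spatial non-maximum suppression, which may correspond to only one of the two steps. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 overlap threshold is an illustrative default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop those that overlap them too much.

    Returns the indices of the kept boxes, best score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```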

Result
• Ranking for relevance: Informativeness network, Textual
• Filtering: Keep N% top results
• Distance computation: ImageNet, Places, Textual
• Diversity: Diverse top results

Result
[Chart comparing runs: Visual, Textual, Multi, Crediv., Multi.]
