https://imatge.upc.edu/web/publications/semantic-and-diverse-summarization-egocentric-photo-events
This project generates visual summaries of events depicted in egocentric photos taken with a wearable camera. These summaries are intended for mild-dementia patients, to exercise their memory on a daily basis. The main contributions are an iterative approach that guarantees the semantic diversity of the summary and a novel soft metric to assess subjective results. Medical experts validated the proposed solution with a Mean Opinion Score of 4.6 out of 5.0. The flexibility and quality of the solution were also tested in the 2015 Retrieving Diverse Social Images Task of the international scientific benchmark MediaEval.
Semantic and Diverse Summarization of Egocentric Photo Events
1. Semantic and Diverse Summarization of Egocentric Photo Events
Aniol Lidon Baulida
Master Computer Vision (UAB, UPC, UPF, UOC)
Advisors:
Xavier Giró Nieto, Image Processing Group, Universitat Politècnica de Catalunya
Petia Radeva, Barcelona Perceptual Computing Lab, Universitat de Barcelona
2. Collaboration
Barcelona Perceptual Computing Laboratory:
Marc Bolaños, Petia Radeva
Image Processing Group:
Xavier Giró
Grup de Recerca Cervell, Cognició i Conducta:
Maite Garolera
Institute of Creative Media Technologies:
Matthias Zeppelzauer
3. Motivation
• In 2013, 44.4 million people worldwide were living with dementia.
• “Cognitive Stimulation Therapy”
8. State of the art
• This project continues the work started by Ricard Mestre.
– Event segmentation and selecting the most repetitive image from an event.
• Off-the-shelf algorithms used:
– Informativeness network: provided by Marc Bolaños (to be published).
– Blur detection: Crete et al., “The blur effect: perception and estimation with a new no-reference perceptual blur metric”.
– Saliency maps: provided by Kevin McGuinness (to be published).
– Face detection: Zhu et al., “Face detection, pose estimation, and landmark localization in the wild”.
– Object candidates: Arbelaez et al., “Multiscale Combinatorial Grouping”.
– Object detector: Hoffman et al., “Large Scale Detection through Adaptation”.
– Affective: Campos et al., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”.
15. Relevance
What is relevance?
Frame-level:
• WHAT? Representative of an activity.
– Saliency maps
– Object detection
• WHO? Social interactions.
– Face detection
– Sentiment analysis (affectivity)
18. Relevance ranking
Objects
Object detector: LSDA (Large Scale Detection through Adaptation)
Aim: find well-defined objects.
Scoring for relevance: sum of the scores of all detected objects.
19. Relevance ranking
Faces
Face detector: “Face detection, pose estimation, and landmark localization in the wild”.
Aim: find well-defined faces.
Scoring for relevance: exponential sum of all face confidences.
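The two frame-level scoring rules on the relevance-ranking slides can be sketched as below. This is only an illustration under assumptions: the slides do not define the exact exponential form used for faces, so `exp(confidence)` is assumed here, and the function names and the example detector outputs are hypothetical.

```python
import math

def object_relevance(object_scores):
    """Sum of the detector scores of all objects found in one frame."""
    return sum(object_scores)

def face_relevance(face_confidences):
    """ASSUMED form: sum of exp(confidence) over all detected faces."""
    return sum(math.exp(c) for c in face_confidences)

def rank_by_relevance(frames, score_fn):
    """Return frame ids sorted from most to least relevant."""
    return sorted(frames, key=lambda fid: score_fn(frames[fid]), reverse=True)

# Hypothetical detector outputs: frame id -> list of detection scores.
object_scores = {"frame_a": [0.9, 0.4], "frame_b": [0.7], "frame_c": []}
ranking = rank_by_relevance(object_scores, object_relevance)
```

With these made-up scores the frame with two detected objects ranks first and the frame without detections ranks last.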
31. Metric
Mean Normalized Sum of Max Similarities (MNSMS)
• Normalization in both axes:
– Y: divide by GT samples.
– X: reshape samples to N bins.
[Figure: Ground-Truth images compared with SortedList(Results), shown for n = 1; the Similarity Sum is a sum of three terms. Y axis: MNSMS n (%).]
35. Metric
Mean Normalized Sum of Max Similarities (MNSMS)
[Figure: Ground-Truth images compared with SortedList(Results), shown for n = 4; Y axis: MNSMS n (%); the plot is labelled AUC (area under the curve).]
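One possible reading of the MNSMS metric, sketched under assumptions: for each cut-off n of the sorted result list, every ground-truth image contributes its maximum similarity to the top-n results; the sum is divided by the number of ground-truth samples (Y normalization) and the curve is resampled to a fixed number of bins (X normalization). The slides do not give the formula, so the function names, the linear resampling and the trapezoidal AUC are assumptions, and how the "Mean" is taken (for example over events) is left out.

```python
import numpy as np

def mnsms_curve(similarity, n_bins=10):
    """similarity[i, j]: similarity in [0, 1] between ground-truth image i
    and the j-th image of the sorted result list."""
    sim = np.asarray(similarity, dtype=float)
    n_gt, n_results = sim.shape
    # Best match of each ground-truth image within the top-n results, n = 1..n_results.
    running_max = np.maximum.accumulate(sim, axis=1)
    # Y normalization: sum of max similarities divided by the number of GT samples.
    curve = running_max.sum(axis=0) / n_gt
    # X normalization (assumed linear resampling): map the curve onto n_bins points.
    x_old = np.linspace(0.0, 1.0, n_results)
    x_new = np.linspace(0.0, 1.0, n_bins)
    return np.interp(x_new, x_old, curve)

def mnsms_auc(curve):
    """Area under the normalized curve (trapezoidal rule, x in [0, 1])."""
    curve = np.asarray(curve, dtype=float)
    x = np.linspace(0.0, 1.0, len(curve))
    return float(np.sum((curve[1:] + curve[:-1]) / 2.0 * np.diff(x)))

# Hypothetical example: 3 ground-truth images, 3 results, each GT matched once.
example = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
curve = mnsms_curve(example, n_bins=3)
auc = mnsms_auc(curve)
```

Because of the running maximum, the curve never decreases as n grows, so a summary that covers the ground truth early obtains a larger area.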
36. Assessment
Validation of automatic approach
Manually annotated summaries
• 7 datasets with labelled ground-truth
• MNSMS (ImageNet) AUC
• 2 online questionnaires
• Mean Opinion Score
Psychologists' feedback:
INTERMEDIATE VALIDATION FINAL EVALUATION
46. Generalization
MediaEval diverse task
• APPLICATION: Finding more information about a place to visit.
• GOAL: Provide a ranked list of Flickr photos for a predefined set of queries. The refined list should be both relevant to the query and diverse.
A. Lidon, M. Bolaños, M. Seidl, X. Giró-i-Nieto, P. Radeva, and M. Zeppelzauer, “UPC-UB-STP @ MediaEval 2015 Diversity Task: Iterative Reranking of Relevant Images,” in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[Figure: Run 1 F1@20 (Visual); axis ticks from 0.40 to 0.56.]
47. Conclusions
• Contributions:
– Mean Normalized Sum of Max Similarities.
– New criterion for semantic diversity (based on LSDA).
– New method for diversity fusion.
– Online evaluation questionnaires.
48. Conclusions
• Tested in two applications:
– Memory reinforcement for mild-dementia.
– Diverse Social Images Task from the scientific MediaEval benchmark.
• Mean Opinion Score of 4.6 out of 5.0.
• Publications:
– Working-notes paper in MediaEval challenge.
– “Wearable and Ego-vision Systems for Augmented Experience” of the journal IEEE Transactions on Human-Machine Systems.
• Code available: https://imatge.upc.edu/web/resources/semantic-and-diverse-summarization-egocentric-photo-events-software
49. Future work
• Explore other relevance criteria further.
• Higher level of semantics.
• Automatically determine the summary length.
54. Similarity measure
• ImageNet: CNN trained on the ImageNet database (1000 classes) using the CaffeNet architecture. Fully connected layer 8 removed.
• Places: CNN trained on the Places database (476 classes) using the CaffeNet architecture. Fully connected layer 8 removed.
• LSDA: object detector, Large Scale Detection through Adaptation (7500 classes).
– Knowledge transfer: converts classifiers without bounding-box annotated data into detectors.
– Two post-processing steps of non-maximum suppression.
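As an illustration of the non-maximum suppression mentioned for the object detector, here is a minimal sketch of the standard greedy variant. The slides do not say what the two post-processing steps are or which thresholds they use, so the function names, the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are assumptions.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union of one box against many, format (x1, y1, x2, y2).
    Boxes are assumed to have positive area."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices from highest to lowest score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Hypothetical detections: the first two boxes overlap heavily.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = non_maximum_suppression(boxes, scores)
```

In this example the second box is suppressed by the first one, while the distant third box survives.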
55. Result
MediaEval diverse task
• Ranking for relevance: informativeness network, textual.
• Filtering: keep the top N% of results.
• Distance computation: ImageNet, Places, textual.
• Diversity: diverse top results.
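The four stages above can be sketched as a small re-ranking routine. This is a sketch under stated assumptions, not the authors' implementation: the slides do not specify the distance or the selection rule, so Euclidean distance on feature vectors and a greedy farthest-first selection are assumed, and `diverse_rerank`, `keep_fraction` and `summary_size` are hypothetical names.

```python
import numpy as np

def diverse_rerank(relevance, features, keep_fraction=0.5, summary_size=3):
    relevance = np.asarray(relevance, dtype=float)
    features = np.asarray(features, dtype=float)
    # 1. Ranking for relevance: most relevant item first.
    ranked = np.argsort(-relevance, kind="stable")
    # 2. Filtering: keep the top N% of the ranked list.
    n_keep = max(1, int(np.ceil(keep_fraction * len(ranked))))
    candidates = ranked[:n_keep]
    # 3. Distance computation (ASSUMED Euclidean) between all kept items.
    feats = features[candidates]
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    # 4. Diversity (ASSUMED greedy rule): start from the most relevant item,
    #    then repeatedly add the candidate farthest from everything selected.
    selected = [0]
    while len(selected) < min(summary_size, n_keep):
        min_dist = dist[:, selected].min(axis=1)
        min_dist[selected] = -np.inf  # never pick an item twice
        selected.append(int(np.argmax(min_dist)))
    return [int(candidates[i]) for i in selected]

# Hypothetical data: items 0 and 1 are near-duplicates, item 3 has low relevance.
relevance = [0.9, 0.8, 0.7, 0.1]
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [9.0, 9.0]]
summary = diverse_rerank(relevance, features, keep_fraction=0.75, summary_size=2)
```

Here the low-relevance item is filtered out before diversity is considered, and the near-duplicate of the top item is skipped in favour of a visually different one.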
56. Result
MediaEval diverse task
[Table columns: Visual, Textual, Multi, Crediv., Multi]