Visual Summary of Egocentric Photostreams by Representative Keyframes
Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró-i-Nieto and Petia Radeva
Motivation
Lifelogging wearable cameras can produce 1,500 images per day, i.e. more than 500,000 images per year.
Automatic summarization methods could help in many applications. In particular, we are working on:
● A memory aid for patients with Mild Cognitive Impairment.
● An automatic nutrition diary.
Goal
Extract a visual summary of a whole day that captures the most representative information for describing the day.
Storytelling
● Have breakfast with the family
● Go for a walk
● Go shopping
● Take the bus
● Have a coffee with a friend
State of the Art
Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
High temporal resolution egocentric data.
1. Event segmentation.
2. Detection of salient objects and people.
3. Subset selection of video shots based on:
a. Story
b. Importance
c. Diversity
State of the Art
Doherty, Aiden R., et al. "Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs." Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval. ACM, 2008.
Low temporal resolution egocentric data.
1. Event segmentation.
2. Keyframe selection, comparing several methods:
a. The middle image of each segment.
b. The image closest to the segment's average value (centroid-like).
c. The image with the highest "quality".
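Baselines (a) and (b) are simple enough to sketch. This is a minimal illustration, not Doherty et al.'s code: it assumes a segment is a list of frame indices in temporal order, that each frame has a feature vector, and that "closest to the average" means squared Euclidean distance. Baseline (c) is omitted because the slide does not define "quality".

```python
from typing import List, Sequence


def middle_keyframe(segment: Sequence[int]) -> int:
    """Baseline (a): the frame in the temporal middle of the segment."""
    return segment[len(segment) // 2]


def centroid_keyframe(segment: Sequence[int], features: Sequence[Sequence[float]]) -> int:
    """Baseline (b): the frame closest to the segment's mean feature vector.

    Assumption: squared Euclidean distance in feature space.
    """
    dim = len(features[segment[0]])
    # Average feature vector of the frames in this segment.
    mean: List[float] = [
        sum(features[i][d] for i in segment) / len(segment) for d in range(dim)
    ]

    def sq_dist(i: int) -> float:
        return sum((features[i][d] - mean[d]) ** 2 for d in range(dim))

    return min(segment, key=sq_dist)


# Toy 1-D features: one outlier pulls the mean, so the two baselines disagree.
seg = [0, 1, 2, 3, 4]
feats = [[0.0], [0.0], [0.0], [1.0], [9.0]]
print(middle_keyframe(seg), centroid_keyframe(seg, feats))
```

The toy example shows why the choice matters: the middle frame ignores content, while the centroid-like frame is sensitive to outliers within the segment.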
Methodology ( I )
Methodology ( II )
Frame Characterization
Each frame is characterized using Convolutional Neural Networks (CNN) trained on ImageNet.
Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
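The slides do not say which CNN layer or which distance is used. The sketch below assumes each frame has already been turned into a fixed-length CNN descriptor, L2-normalizes the descriptors (our assumption), and builds the symmetric pairwise Euclidean distance matrix. This is the kind of "distances matrix" the Keyframe Selection slide refers to.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def distance_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of Euclidean distances between L2-normalized frame descriptors.

    Assumption: `features[i]` is the CNN descriptor of frame i.
    """
    normed = [l2_normalize(f) for f in features]
    n = len(normed)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(normed[i], normed[j])  # Python 3.8+
            dist[i][j] = dist[j][i] = d
    return dist


# Frames 0 and 2 point in the same direction, so their distance is 0 after normalization.
print(distance_matrix([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]))
```

Normalizing first makes the Euclidean distance a monotone function of cosine similarity. Descriptors therefore compare by direction rather than activation magnitude. This is a common choice for CNN features but not one confirmed by the slides.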
Event Segmentation ( I )
Applying agglomerative clustering and tuning its cut-off parameter, we obtain a good segmentation of all the events in the day.
[Figure: cut-off parameter]
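The slide does not give the linkage criterion. The following is a minimal sketch assuming average linkage over a precomputed frame distance matrix, such as distances between CNN descriptors. Clusters are merged closest-pair-first until the smallest inter-cluster distance exceeds the cut-off. The naive loop keeps the sketch dependency-free but is slow. For a full day of images, SciPy's `scipy.cluster.hierarchy.linkage(..., method='average')` followed by `fcluster(..., criterion='distance')` would be the practical choice.

```python
from typing import List, Optional, Sequence, Tuple


def agglomerative_labels(dist: Sequence[Sequence[float]], cutoff: float) -> List[int]:
    """Average-linkage agglomerative clustering on a precomputed distance matrix.

    Assumption: average linkage; merging stops once the closest pair of
    clusters is farther apart than `cutoff`. Returns one label per frame.
    """
    clusters: List[List[int]] = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster frame pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best: Optional[Tuple[float, int, int]] = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break  # the cut-off stops merging; remaining clusters are the events
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 1-D "features": two well-separated groups of frames.
xs = [0.0, 0.1, 0.2, 5.0, 5.1]
D = [[abs(a - b) for b in xs] for a in xs]
print(agglomerative_labels(D, cutoff=1.0))
```

Raising the cut-off yields fewer, longer events and lowering it yields more, shorter ones. This is the trade-off the slide's cut-off parameter controls.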
Event Segmentation ( II )
A Division-Fusion post-processing step makes the segmentation more robust.
[Figure: a) after agglomerative clustering; b) after Division; c) after Fusion]
Division: splits clusters that group visually similar events spaced apart in time, giving each part a different label.
Fusion: merges very short sub-events that are not considered relevant enough.
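The slide describes Division and Fusion only informally. The sketch below is one plausible reading, operating on per-frame cluster labels in temporal order. Division gives every temporally contiguous run of a cluster its own event label. Fusion absorbs events shorter than a hypothetical `min_len` threshold into the preceding event, or into the following event when the short one comes first. Both the threshold and the choice of neighbour are assumptions, not parameters given in the presentation.

```python
from typing import List, Sequence


def divide(labels: Sequence[int]) -> List[int]:
    """Division: give each temporally contiguous run of a cluster its own event label."""
    events: List[int] = []
    current = -1
    for t, lab in enumerate(labels):
        if t == 0 or lab != labels[t - 1]:
            current += 1  # a change of cluster starts a new event
        events.append(current)
    return events


def fuse(events: Sequence[int], min_len: int) -> List[int]:
    """Fusion: absorb events shorter than `min_len` frames into a neighbouring event.

    Assumption: a short event joins the preceding event (or the following one
    if it is the first); labels are renumbered consecutively afterwards.
    """
    # Lengths of the runs of equal labels, in temporal order.
    runs: List[int] = []
    for t, lab in enumerate(events):
        if t > 0 and lab == events[t - 1]:
            runs[-1] += 1
        else:
            runs.append(1)
    kept: List[int] = []  # lengths of the events that survive fusion
    pending = 0  # frames of leading short events waiting for a following event
    for length in runs:
        if length < min_len:
            if kept:
                kept[-1] += length
            else:
                pending += length
        else:
            kept.append(length + pending)
            pending = 0
    if pending:  # every event was short: keep them as a single event
        kept.append(pending)
    return [i for i, length in enumerate(kept) for _ in range(length)]


# Cluster 0 reappears later in the day; the 2-frame event is too short to keep.
clusters = [0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 2, 2]
print(fuse(divide(clusters), min_len=3))
```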
Keyframe Selection
Keyframe selection criteria based on visual similarity.
[Figure: distance matrix, similarity-based probabilities, Random Walk, Minimum Distance]
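The slide names two criteria built on the distance matrix, Minimum Distance and Random Walk over similarity-based probabilities, but not their exact formulation. The sketch below is one assumed instantiation, not necessarily the authors' method. Minimum Distance picks the event's medoid, the frame with the smallest summed distance to the other frames. Random Walk picks the frame with the highest stationary probability of a walk whose transition probabilities are row-normalized similarities. The similarity here is a hypothetical exp(-distance) kernel, and power iteration approximates the stationary distribution.

```python
import math
from typing import List, Sequence


def min_distance_keyframe(dist: Sequence[Sequence[float]], event: Sequence[int]) -> int:
    """Pick the frame with the smallest summed distance to the other frames of the event."""
    return min(event, key=lambda i: sum(dist[i][j] for j in event))


def random_walk_keyframe(dist: Sequence[Sequence[float]], event: Sequence[int],
                         iters: int = 100) -> int:
    """Pick the frame most visited by a random walk over the event's similarity graph.

    Assumptions: similarity = exp(-distance); transition probabilities are each
    row's similarities normalized to sum to 1; the stationary distribution is
    approximated by power iteration from a uniform start.
    """
    n = len(event)
    sim = [[math.exp(-dist[a][b]) for b in event] for a in event]
    trans: List[List[float]] = []
    for row in sim:
        total = sum(row)
        trans.append([s / total for s in row])  # row-stochastic transition matrix
    p = [1.0 / n] * n
    for _ in range(iters):
        # p_next[j] = sum_i p[i] * P[i][j]
        p = [sum(p[i] * trans[i][j] for i in range(n)) for j in range(n)]
    best = max(range(n), key=lambda k: p[k])
    return event[best]


# Toy 1-D features; frame 2 sits in the middle of the event.
xs = [0.0, 1.0, 1.5, 2.0, 3.0]
D = [[abs(a - b) for b in xs] for a in xs]
print(min_distance_keyframe(D, [0, 1, 2, 3, 4]), random_walk_keyframe(D, [0, 1, 2, 3, 4]))
```

With a symmetric similarity like this one, each frame's stationary probability is proportional to its total similarity to the event. The walk therefore favours frames that resemble many others. The kernel and iteration count are arbitrary choices that would need tuning on real photostreams.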
Summary Results
Evaluation ( I )
Datasets
● 5 days
● 3 users
● 4,005 images
● Segmentation ground truth
Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., and Radeva, P. "R-Clustering for egocentric video segmentation." IbPRIA 2015, Santiago de Compostela, Spain. Proceedings (Vol. 9117, p. 327). Springer, 2015.
Clustering
● Jaccard index
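The Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. The slides report it for segmentation but do not say how predicted and ground-truth events are matched. The sketch below simply assumes each event is represented as a set of frame indices and scores one predicted event against its ground-truth counterpart.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Predicted event covers frames 0-9, ground truth covers frames 2-11:
# 8 shared frames out of 12 distinct frames.
print(jaccard(set(range(10)), set(range(2, 12))))
```

A day-level score could then average this value over matched event pairs. The matching strategy is an assumption left open here.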
Evaluation ( II )
Keyframe Selection
Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
Figure: brandchannel.com
● Blind taste test with 30 users for quality evaluation.
Individual Keyframes Quality Evaluation
[Questionnaire: representative images of event #1]
● Do you think the image on the left / center / right can represent the event? (Yes / No)
● What is the most representative image of the event? (Left / Center / Right)
Evaluation ( III )
Keyframe Selection
General Summary Quality Evaluation
[Questionnaire: visual summaries of the day, Summary 1 to Summary 4]
● Do you think that this set can summarize the whole day? (Yes / No)
● Finally, which one do you think is the best visual summary of the day? (Summary 1 / Summary 2 / Summary 3 / Summary 4)
Some of the summaries you will see might be very similar (distinguishable only by a few images). In that case, you can choose any of them.
Evaluation - Individual Keyframes
What is the most representative image of the event?
Do you think that the image on the left/center/right can represent the event?
Evaluation - General Summary
Can this set of images represent the complete day?
Which summary is the best, in your opinion?
Conclusions
● New keyframe selection methodology that takes into account both visual and temporal information.
● Keyframe selection using CNN-based global information and graph analysis.
● 86-88% user acceptance of our summaries.
● 58% of users chose our summaries as the best option.
Future Work
● Use semantic information (e.g. objects, people, actions).
● Clinical application with Mild Cognitive Impairment patients.
