This talk was presented at the CCIA'2016 conference.

Can Deep Learning and Egocentric Vision for Visual Lifelogging help us eat better?
Petia Radeva
www.cvc.uab.es/~petia
Computer Vision at UB (CVUB), Universitat de Barcelona & Medical Imaging Laboratory, Computer Vision Center

Index
 Healthy habits
 Deep learning
 Automatic food analysis
 Egocentric vision
I Medical Imaging

What happens outside the body?

Rememory: Life-logging for MCI treatment
Project led by Dr. Maite Garolera of the Consorci Sanitari de Terrassa.
Goal: using episodic images to develop cognitive exercises and tools for memory reinforcement in people with MCI and Alzheimer's.
But episodic images serve for something more than reinforcing memory: they show the lifestyle of individuals!
Risk factors and chronic diseases

Chronic disease statistics

Obesity in Catalunya
51% of the Catalan population aged 18 to 74 is overweight, and 15% is obese.
62% of those without university studies vs. 36% of those with higher education.

The obesity pandemic
 Risk factors for cancers, cardiovascular and metabolic disorders, and leading causes of premature mortality worldwide.
 4.2 million people die of chronic diseases in Europe (e.g. diabetes or cancer) linked to lack of physical activity and unhealthy diet.
 Physical activity can increase lifespan by 1.5-3.7 years.
Which wearables do consumers plan to buy?
• 21M Fitbits sold in 2015!
• Expected to double by 2018, to 81.7 million users.
The Consumer Technology Association (CTA), formerly the Consumer Electronics Association (CEA), surveyed 1,001 US internet users. Source: eMarketer.

What are we missing in health applications?
 Today, automatically measuring physical activity is not a problem.
 But what about food and nutrition?
 State of the art: nutritional health apps are based on manual food diaries (Sparkpeople, LoseIt!, MyFitnessPal, Cronometer, Fatsecret).

How many food categories are there?
Today we are speaking about 200,000 basic food categories. What about automatic food recognition? Is it possible?
https://techcrunch.com/2016/09/29/lose-it-launches-snap-it-to-let-users-count-calories-in-food-photos/
Image databases evolution
[Chart: growth in images and object categories per database, from AR Database (1998) and Yale Face Database (2001) through Caltech-101/256, VOC 2005-2007, CIFAR-10/100, ImageNet (2011: 1,400,000 images, 1,000 categories), SUN DB, Places, Food101 (2014: 101,000 images, 101 categories) and Places2 (2016: 10,000,000 images).]
ImageNet & Deep learning
Imagenet

Food datasets
Food256: 25,600 images (100 images/class), 256 classes
Food101: 101,000 images (1,000 images/class), 101 classes
Food101+FoodCAT: 146,392 images (101,000 + 45,392), 231 classes
EgocentricFood: 5,038 images, 9 classes
Current Food DB: 150,000 images, 231 categories. ImageNet: 1,400,000 images, 1,000 categories. A future Food DB: ????? images, 200,000 categories.

One thing is for sure: if there is a solution, it is highly probable to need deep learning!

Index
 Healthy habits and food analysis
 Deep learning
 Automatic food analysis
 Egocentric vision

Deep learning everywhere

White House wants the nation to get ready for AI
October 2016
http://readwrite.com/2016/10/16/white-house-offers-artificial-intelligence-plan-cl1/
The learning pipeline
Input X -> Feature extraction -> Score function f(x,W) -> Predicted label y(f) -> Good enough?

The training process
Input X + Ground truth -> Feature extraction -> Score function f(x,W) -> Learn f: argmin_f Σ_i Error(y_i(f), y_i)
The learning process
argmin_f Σ_i Error(y_i(f), y_i)
The objective is an expectation over the data distribution of a measure of prediction quality (error, loss) between the prediction and the ground truth.
Training data: {(x_i, y_i), i = 1, 2, …, n}
Loss function: the negative conditional log-likelihood, with the interpretation that f_i(x) estimates P(Y=i|X):
L(f(x), y) = -log f_y(x), where f_i(x) >= 0 and Σ_i f_i(x) = 1.
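The loss on this slide can be sketched in a few lines of plain Python. This is a toy illustration, not code from the talk: softmax turns raw class scores into probabilities f_i(x), and the loss is the negative log of the probability assigned to the true class.

```python
import math

def softmax(scores):
    """Normalize raw class scores into probabilities: f_i(x) >= 0, sum = 1."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(scores, true_class):
    """Negative conditional log-likelihood: L(f(x), y) = -log f_y(x)."""
    probs = softmax(scores)
    return -math.log(probs[true_class])

# Scores for 3 classes; the true class is 0.
loss_good = nll_loss([5.0, 1.0, 0.5], 0)  # confident and correct: small loss
loss_bad = nll_loss([0.5, 1.0, 5.0], 0)   # confident and wrong: large loss
```

A confidently correct prediction is penalized far less than a confidently wrong one, which is exactly the "single number of unhappiness" asked about a few slides later.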
The problem of image classification
Each image of M rows by N columns by C channels (C = 3 for color images) can be considered as a vector/point in R^(M x N x C), and vice versa: a dual representation of images as points/vectors. A 32x32x3 image becomes a 3072-D vector.
Linear classification
Given two classes, how do we learn a hyperplane that separates them in R^(32x32x3)? To find the hyperplane that separates dogs from cats, we need to define:
• The score function
• The loss function
• The optimization process.

Linear classification
How to project data into the feature space: f(x) = W x + b
If x is a 32x32x3 image, then x is in R^3072 (3072x1). With 3 classes, the matrix W is 3x3072 and the bias vector b is 3-dimensional (3x1), so f(x) gives 3 scores (3x1).
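The score function f(x) = Wx + b can be sketched directly. The numbers below are hypothetical: a 4-pixel "flattened image" stands in for the 3072-D vector of a real 32x32x3 image, with 3 classes as on the slide.

```python
def linear_scores(x, W, b):
    """f(x) = W x + b: one score per class for a flattened image x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Toy sizes: a flattened "image" of 4 pixels, 3 classes (real case: 3072 and 3).
x = [0.5, -1.0, 2.0, 0.0]
W = [[0.1, 0.0, 0.2, 0.0],   # weights for class 0
     [0.0, 0.3, 0.0, 0.1],   # weights for class 1
     [0.2, 0.2, 0.1, 0.0]]   # weights for class 2
b = [0.0, 0.1, -0.2]

scores = linear_scores(x, W, b)  # 3 scores, one per class
```

Each row of W acts as a template that is dot-multiplied against the flattened image; the class with the highest score is the prediction.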
Image classification
Adapted from: Fei-Fei Li & Andrej Karpathy & Justin Johnson

Loss function and optimisation
 Question: if you were to assign a single number to how unhappy you are with these scores, what would you do?
 Question: given the score function f(x_i, W) and the loss function L(f(x_i), y_i), how do we find the parameters W?
How is a CNN doing deep learning?
First layer: y = Wx on the image, i.e. y_1 = Σ_i W_1i x_i, …, y_10 = Σ_i W_10i x_i.
Second layer: y = W(Wx); third layer: y = W(W(Wx)); and so on up to the output layer. These are the fully connected layers.
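The stacking y = W(W(Wx)) above has a catch worth making explicit: composing purely linear layers collapses into a single linear map, which is why the activation functions shown a couple of slides later are essential. A small sketch with hypothetical 2x2 matrices:

```python
def matvec(W, x):
    """Matrix-vector product: one linear layer y = Wx."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product: the composition of two linear layers."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[0.5, -1.0], [2.0, 0.0]]
x = [3.0, -2.0]

two_layers = matvec(W2, matvec(W1, x))  # y = W2 (W1 x)
one_layer = matvec(matmul(W2, W1), x)   # y = (W2 W1) x: the same result
```

Without a nonlinearity between layers, depth adds no expressive power; inserting an activation function between the two `matvec` calls breaks this equivalence.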
Why is a CNN a neural network?
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Modern CNNs: ~10M neurons. The human visual system: ~5B neurons.

Activation functions of NN
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson

Why is it convolutional?
Adapted from: Fei-Fei Li & Andrej Karpathy & Justin Johnson

What is new in the Convolutional Neural Network?
Convolutional and Max-pooling layer
Convolutional layer: preserves spatial info. Max-pool layer: discards spatial info.
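Max-pooling as described on this slide can be sketched in a few lines (toy values, not from the talk): each non-overlapping 2x2 window of the feature map is reduced to its strongest response, halving the spatial resolution.

```python
def maxpool2x2(fmap):
    """2x2 max-pooling with stride 2: keep the strongest response per window."""
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]

fmap = [[1, 3, 2, 0],
        [5, 2, 1, 1],
        [0, 1, 4, 2],
        [2, 2, 0, 3]]

pooled = maxpool2x2(fmap)  # a 4x4 map becomes a 2x2 map
```

This is what the slide means by "no spatial info": within each window, only the value survives, not its exact position.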
Example architecture
The trick is to train the weights such that when the network sees a picture of a truck, the last layer will say "truck".
Credit slide: Li Fei-fei

Training a CNN
The process of training a CNN consists of training all parameters: the convolutional matrices and the weights of the fully connected layers. Several millions of parameters!
1001 benefits of CNN
 Transfer learning: fine-tuning for object recognition
 Replace and retrain the classifier on top of the ConvNet
 Fine-tune the weights of the pre-trained network by continuing the backpropagation
 Feature extraction by CNN (e.g. 4096 features)
 Object detection
 Object segmentation
 Image similarity and matching by CNN
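As a sketch of the "image similarity and matching by CNN" idea above: once a pre-trained network turns each image into a feature vector (e.g. 4096-D activations), similar images can be found by comparing those vectors with cosine similarity. The feature values below are hypothetical 4-D stand-ins.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two CNN feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 4-D "features" (the real case would be 4096-D CNN activations).
feat_a = [0.9, 0.1, 0.0, 0.4]
feat_b = [0.8, 0.2, 0.1, 0.5]  # a visually similar image
feat_c = [0.0, 0.9, 0.8, 0.0]  # a dissimilar image

sim_ab = cosine_similarity(feat_a, feat_b)
sim_ac = cosine_similarity(feat_a, feat_c)
```

The same distance can rank a database of food images against a query photo, which is one way the "feature extraction by CNN" benefit is put to work.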
Index
 Healthy habits and food analysis
 Deep learning
 Automatic food analysis
 Egocentric vision

Automatic food analysis
Can we automatically recognize food?
• The goal: to detect and classify every instance of a dish in all of its variants, shapes and positions, in a large number of images. The main problems that arise are:
• Complexity and variability of the data.
• Huge amounts of data to analyse.

Automatic Food Analysis
 Food detection
 Food recognition
 Food environment recognition
 Eating pattern extraction
Food localization
[Architecture diagram: input image -> GoogleNet (up to the inception4e output) -> GAP -> Softmax food/non-food scores; a Food Activation Map (FAM) obtained by deep convolution drives bounding-box generation.]
Examples of localization and recognition on UECFood256 (top) and EgocentricFood (bottom). Ground truth is shown in green and our method in blue.
Marc Bolaños, Petia Radeva: "Simultaneous Food Localization and Recognition", ICPR'16, Cancun, Mexico. arXiv:1604.07953, 2016.
Food recognition
Pipeline: Image Input -> Foodness Map Extraction (Food Detection CNN) -> Food Recognition CNN -> Food Type Recognition (e.g. Apple, Strawberry).
Results: TOP-1 74.7%, TOP-5 91.6%. State of the art (Bossard, 2014): TOP-1 56.4%.
Demo
Herruzo, P., Bolaños, M. and Radeva, P. (2016). "Can a CNN Recognize Catalan Diet?". In Proceedings of the 8th Intl. Conf. for Promoting the Application of Mathematics in Technical and Natural Sciences (AMiTaNS).

Food environment classification
Categories: Bakery, Banquet hall, Bar, Butcher shop, Cafeteria, Candy store, Coffee shop, Dinette, Dining room, Food court, Galley, Ice cream parlor, Kitchen, Kitchenette, Market, Pantry, Picnic area, Restaurant, Restaurant kitchen, Restaurant patio, Supermarket.
Classification results:
0.92 - Food-related vs. non-food-related
0.68 - 22 classes of food-related categories

Towards automatic image description
Bolaños, M., Peris, Á., Casacuberta, F., & Radeva, P. "VIBIKNet: Visual Bidirectional Kernelized Network for the VQA Challenge", VQA Challenge, CVPR '16.
Two main questions
 What do we eat?
 Automatic food recognition vs. food diaries
 And how do we eat?
 Automatic eating pattern extraction: when, where, how, how long, with whom, in which context?

Index
 Healthy habits and food analysis
 Deep learning
 Automatic food analysis
 Egocentric vision

Wearable cameras and the life-logging trend
Shipments of wearable computing devices worldwide by category from 2013 to 2015 (in millions)
Life-logging data
 What we have:

Wealth of life-logging data (or the hell of life-logging data)
 We propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution.
 The segmentation is reached by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent sequence treatment.
A complete dataset of one day captured with SenseCam contains more than 4,100 images.
The choice of device depends on: 1) where it is worn: a camera hung from the neck is considered more unobtrusive for the user; and 2) its temporal resolution: a camera with a low fps captures less motion information, but there is less data to process. We chose SenseCam or Narrative: cameras hung on the neck or pinned to clothing that capture 2-4 fps.
Visual Life-logging data
Events to be extracted from life-logging images:
- Activities he/she has done
- Interactions he/she has participated in
- Events he/she has taken part in
- Duties he/she has performed
- Environments and places he/she has visited, etc.
Dimiccoli, M., Bolaños, M., Talavera, E., Aghaei, M., Nikolov, S., and Radeva, P. (2015). "SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation". Computer Vision and Image Understanding (CVIU) (in press). Preprint: http://arxiv.org/abs/1512.07143

Egocentric vision progress
Bolaños, M., Dimiccoli, M. & Radeva, P. (2015). "Towards Storytelling from Visual Lifelogging: An Overview". Transactions on Human-Machine Systems (THMS) (in press). Preprint: http://arxiv.org/abs/1507.06120

Towards healthy habits
Towards visualizing summarized lifestyle data to ease the management of the user's healthy habits (sedentary lifestyle, nutritional activity, etc.).
M. Aghaei, M. Dimiccoli, P. Radeva. "Extended Bag-of-Tracklets for Multi-Face Tracking in Egocentric Photo Streams". Computer Vision and Image Understanding, Volume 149, 146-156. Special Issue on Assistive Computer Vision and Robotics, Elsevier, 2016. doi: 10.1016/j.cviu.2016.02.013
Conclusions
 Healthy habits: one of the main health concerns for people, society, and governments
 Deep learning: a technology that is here to stay, and a new technological trend directly affecting our environment
 Food analysis and recognition: a new challenge with huge potential for applications; we need food databases of millions of images and thousands of categories; a wide set of problems remains: recognition, segmentation, habit characterization, image and video description, etc.
 Egocentric vision and lifelogging: a recent trend in Computer Vision and a largely unexplored technology with big potential to help people monitor and describe their behaviour and thus improve their lifestyle.

THANK YOU!
