
Deep Learning for Food Analysis


Invited talk at the conference AMiTANS'16 (Albena, Bulgaria, 26 June 2016)



  1. Deep Learning for Food Analysis. Petia Radeva, www.cvc.uab.es/~petia, Computer Vision at UB (CVUB), Universitat de Barcelona & Medical Imaging Laboratory, Computer Vision Center
  2. Index • Motivation • Learning and Deep learning • Deep learning for food analysis • Lifelogging
  3. Metabolic diseases and health • 4.2 million people die of chronic diseases in Europe (diabetes or cancer) linked to lack of physical activity and an unhealthy diet. • Physical activity can increase lifespan by 1.5-3.7 years. • Obesity is a chronic disease associated with huge economic, social and personal costs. • These are risk factors for cancers, cardiovascular and metabolic disorders, and leading causes of premature mortality worldwide.
  4. Health and medical care • Today, 88% of U.S. healthcare dollars are spent on medical care: access to physicians, hospitals, procedures, drugs, etc. • However, medical care only accounts for approximately 10% of a person's health. • Approximately half the decline in U.S. deaths from coronary heart disease from 1980 through 2000 may be attributable to reductions in major risk factors (systolic blood pressure, smoking, physical inactivity).
  5. Health and medical care • Recent data show evidence of stagnation that may be explained by the increases in obesity and diabetes prevalence. • Healthcare resources and dollars must now be dedicated to improving lifestyle and behavior.
  6. Why food analysis? • Today, measuring physical activity is not a problem. • But what about food and nutrition? • Nutritional health apps are based on food diaries.
  7. Two main questions • What do we eat? Automatic food recognition vs. food diaries • And how do we eat? Automatic eating pattern extraction: when, where, how, how long, with whom, in which context? • Lifelogging
  8. Index • Motivation • Learning and Deep learning • Deep learning for food analysis • Lifelogging
  9. Why “Learn”? • Machine learning consists of: developing models, methods and algorithms to make computers learn, i.e. take decisions; training from large amounts of example data. • Learning is used when: humans are unable to explain their expertise (speech recognition); human expertise does not exist (navigating on Mars); the solution changes over time (routing on a computer network); the solution needs to be adapted to particular cases (user biometrics); data is cheap and abundant (data warehouses, data marts) while knowledge is expensive and scarce. • Example in retail, from customer transactions to consumer behavior: people who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com). • The goal is to build a model that is a good and useful approximation to the data.
  10. Growth of machine learning • This trend is accelerating due to: big data and data science being a reality today; improved data capture, networking, faster computers; new sensors / IO devices / Internet of Things; software too complex to write by hand; demand for self-customization to the user; the difficulty of extracting knowledge from human experts (the failure of expert systems in the 1980s); improved machine learning algorithms.
  11. (Figure slide.)
  12. Deep learning everywhere
  13. Deep learning applications
  14. Formalization of learning • Consider: training examples D = {z1, z2, ..., zn}, with the zi sampled from an unknown process P(Z); a model f and a loss functional L(f, Z) that returns a real-valued scalar. The goal is to minimize the expected value of L(f, Z) under the unknown generating process P(Z). • Supervised learning: each example is an (input, target) pair, Z = (X, Y). • Classification: Y is a finite integer (e.g. a symbol) corresponding to a class index, and we often take as loss the negative conditional log-likelihood L(f, (X, Y)) = -log fY(X), with the interpretation that fi(X) estimates P(Y=i|X), where fi(X) >= 0 and Σi fi(X) = 1.
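
A minimal numpy sketch of this loss (names are illustrative): a softmax turns raw scores into the probabilities fi(X), and the loss is the negative log-probability assigned to the true class.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities f_i(X) >= 0 that sum to 1."""
    e = np.exp(scores - scores.max())   # shift for numerical stability
    return e / e.sum()

def nll_loss(scores, y):
    """Negative conditional log-likelihood: L(f, (X, Y)) = -log f_Y(X)."""
    return -np.log(softmax(scores)[y])

scores = np.array([2.0, 1.0, 0.1])      # raw model outputs for 3 classes
print(nll_loss(scores, y=0))            # small loss: true class scores highest
print(nll_loss(scores, y=2))            # large loss: true class scores lowest
```
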
  15. Classification/Recognition • Is this an urban or rural area? Input: x; output: y ∈ {-1, +1} (binary classification). • Which city is this? Output: y ∈ {1, 2, ..., C} (multi-class classification). From: M. Pawan Kumar
  16. Object detection and segmentation • Where is the object in the image? Output: y ∈ {Pixels}. • What is the semantic class of each pixel? Output: y ∈ {1, 2, ..., C}^|Pixels| (figure labels: sky, tree, car, road, grass). From: M. Pawan Kumar
  17. A Simplified View of the Pipeline • Input x → extract features Φ(x) → compute scores f(Φ(x), y) → prediction y(f) = argmax_y f(Φ(x), y); learn f. From: M. Pawan Kumar
  18. Learning Objective • Data distribution P(x,y). • Prediction: f* = argmin_f E_P(x,y)[Error(y(f), y)], where y is the ground truth and Error measures prediction quality (error, loss); the expectation is over the data distribution, which is unknown. From: M. Pawan Kumar
  19. Learning Objective • Training data {(xi, yi), i = 1, 2, ..., n}. • Prediction: f* = argmin_f E_P(x,y)[Error(y(f), y)]; the expectation is still over the unknown data distribution. From: M. Pawan Kumar
  20. Learning Objective • Training data {(xi, yi), i = 1, 2, ..., n}. • Prediction: f* = argmin_f Σi Error(yi(f), yi); with finite samples, the expectation over the data distribution is replaced by an expectation over the empirical distribution. From: M. Pawan Kumar
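
Since P(x,y) is unknown, the expectation becomes an average over the n training pairs. A toy sketch of this empirical risk, with an illustrative zero-one error:

```python
import numpy as np

def empirical_risk(f, xs, ys, error):
    """Average of Error(y_i(f), y_i) over the finite training set."""
    return np.mean([error(f(x), y) for x, y in zip(xs, ys)])

xs = np.array([-2.0, -1.0, 1.0, 2.0])       # toy 1-D inputs
ys = np.array([-1, -1, +1, +1])             # toy binary labels
zero_one = lambda pred, y: float(pred != y)

f = lambda x: +1 if x > 0 else -1           # a candidate classifier
print(empirical_risk(f, xs, ys, zero_one))  # 0.0 on this toy set
```
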
  21. The problem of image classification From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  22. Dual representation of images as points/vectors • Each image of M rows by N columns by C channels (C = 3 for color images) can be considered as a vector/point in R^(M×N×C), and vice versa; e.g. a 32x32x3 image is a 3072-D vector.
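
A minimal numpy illustration of this duality for a 32x32x3 image:

```python
import numpy as np

image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
vector = image.reshape(-1)              # flatten M x N x C -> (M*N*C,)
print(vector.shape)                     # (3072,)

restored = vector.reshape(32, 32, 3)    # the inverse mapping: vector -> image
assert np.array_equal(restored, image)
```
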
  23. Linear classifier and key classification components • Given two classes, how do we learn a hyperplane to separate them? To find the hyperplane we need to specify: • a score function • a loss function • an optimization method
  24. Interpreting a linear classifier: each 32x32x3 image is flattened to a 3072-D vector before scoring. From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  25. General learning pipeline • Training consists of constructing the prediction model f according to a training set. From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  26. The problem of image classification From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  27. Parametric approach: linear classifier • Score function: f(x; W, b) = Wx + b, mapping an image x to one score per class. From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
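
A sketch of this score function assuming CIFAR-10-like dimensions (10 classes, 3072-dimensional flattened images), which also matches the 10x3072 weight matrix on slide 31:

```python
import numpy as np

num_classes, dim = 10, 3072                    # illustrative sizes
W = np.random.randn(num_classes, dim) * 0.01   # weights: one row per class
b = np.zeros(num_classes)                      # biases: one per class

x = np.random.randn(dim)                 # one flattened image
scores = W @ x + b                       # f(x; W, b) = Wx + b
predicted_class = int(np.argmax(scores)) # highest score wins
```
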
  28. Loss function/optimization • The score function maps raw image pixels to class scores; the loss function quantifies how unhappy we are with those scores. From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  29. Image classification From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  30. Loss function and optimization • Question: if you were to assign a single number to how unhappy you are with these scores, what would you do? • Question: given the score and the loss function, how do we find the parameters W? (See the sketch below.)
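
One standard answer to the first question is the multiclass SVM (hinge) loss; the softmax/negative log-likelihood loss of slide 14 is the other common choice. A sketch with illustrative scores:

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss: penalize every class whose score
    comes within `margin` of the true class's score."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                    # the true class contributes no loss
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])    # toy scores for (cat, car, frog)
print(svm_loss(scores, y=0))           # 2.9: unhappy, 'car' outscores 'cat'
```

Given this single number, the second question is answered by optimization: start from a random W and iteratively change it to decrease the loss, as slide 44 makes concrete.
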
  31. Interpreting a linear classifier • With 10 classes and 3072-D inputs, W is a 10x3072 matrix; each row can be read as a template for one class. From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  32. Why is a CNN doing deep learning? • Each layer computes fi = Σj wij * xj: every output f1 ... fm is a weighted sum of all inputs x1 ... xn with weights wij (the figure's fully-connected diagram). From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
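
The slide's fi = Σj wij xj is exactly a matrix-vector product, which is why a fully-connected layer is one line of code; a minimal check:

```python
import numpy as np

n, m = 4, 3                             # input and output sizes
W = np.random.randn(m, n)               # weights w_ij
x = np.random.randn(n)                  # inputs x_1 ... x_n

f = W @ x                               # f_i = sum_j w_ij * x_j, all i at once
assert np.allclose(f, [sum(W[i, j] * x[j] for j in range(n)) for i in range(m)])
```
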
  33. Activation functions of NN From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  34. Setting the number of layers and their size • Neurons are arranged into fully-connected layers. • Bigger = better (but you might have to regularize more strongly). • How many parameters are there to learn? (See the sketch below.) From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
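
A quick way to answer the slide's question for fully-connected layers (the layer sizes below are illustrative):

```python
def fc_param_count(layer_sizes):
    """Parameters of a stack of fully-connected layers: weights plus biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# e.g. 3072-D input, two hidden layers of 100 units, 10 output classes:
print(fc_param_count([3072, 100, 100, 10]))   # 318410
```
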
  35. Why is a CNN a neural network? From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  36. Architecture of neural networks • Modern CNNs: ~10 million neurons. • Human visual cortex: ~5 billion neurons.
  37. Activation functions of NN From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
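
The activations usually plotted on this slide (ReLU, sigmoid, tanh) are one-liners in numpy:

```python
import numpy as np

relu    = lambda x: np.maximum(0.0, x)         # max(0, x): the modern default
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))   # squashes scores into (0, 1)
tanh    = np.tanh                              # squashes scores into (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))
```
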
  38. What is a Convolutional Neural Network?
  39. Convolutional and max-pooling layers • Figure: a convolutional layer and a max-pooling layer.
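
A minimal numpy sketch of the two operations: a single-channel "valid" convolution (really cross-correlation, as in CNNs) and non-overlapping max-pooling. The edge filter is illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel and sum elementwise products."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def maxpool2d(x, size=2):
    """Non-overlapping max-pooling: keep the largest value in each window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.random.rand(8, 8)
edge = np.array([[1.0, 0.0, -1.0]] * 3)    # a vertical-edge filter
print(maxpool2d(conv2d(img, edge)).shape)  # (3, 3)
```
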
  40. How does the CNN work?
  41. Example architecture • The trick is to train the weights such that when the network sees a picture of a truck, the last layer will say “truck”.
  42. Training a CNN • The process of training a CNN consists of learning all of its parameters: the convolutional filter matrices and the weights of the fully connected layers. • Several millions of parameters!
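
A minimal PyTorch-style training loop showing what "training all parameters" means in practice; the tiny model and random data below are placeholders, not the talk's actual setup:

```python
import torch
import torch.nn as nn

# A tiny CNN stand-in: 3x32x32 input -> 16 filters -> pool -> 10 class scores.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()   # the negative log-likelihood of slide 14

# Placeholder data: one batch of 8 random "images" with random labels.
loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]

for images, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # forward pass and loss
    loss.backward()                        # backpropagation through every layer
    optimizer.step()                       # update filters and FC weights together
```
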
  43. Learned convolutional filters
  44. Neural network training • Using the chain rule, optimize the parameters W of the neural network by gradient descent and backpropagation. • Optimization involves training several millions of parameters! From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
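
The chain rule written out by hand for a toy two-layer network: each backward line computes the gradient of the loss with respect to one stage of the forward pass (random data, illustrative sizes):

```python
import numpy as np

# Toy data: 4 examples, 3 features, scalar targets (all random).
X = np.random.randn(4, 3)
y = np.random.randn(4, 1)
W1, W2 = np.random.randn(3, 5), np.random.randn(5, 1)

for step in range(100):
    h = np.maximum(0, X @ W1)          # forward: hidden layer with ReLU
    pred = h @ W2                      # forward: output layer
    loss = ((pred - y) ** 2).sum()     # squared-error loss

    grad_pred = 2 * (pred - y)         # chain rule, one stage at a time:
    grad_W2 = h.T @ grad_pred          # dLoss/dW2
    grad_h = grad_pred @ W2.T          # backpropagate into the hidden layer
    grad_h[h <= 0] = 0                 # ... and through the ReLU
    grad_W1 = X.T @ grad_h             # dLoss/dW1

    W1 -= 1e-4 * grad_W1               # gradient descent updates
    W2 -= 1e-4 * grad_W2
```
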
  45. Monitoring loss and accuracy • Loss looks linear? Learning rate too low. • Loss decreases too slowly? Learning rate too high. • Loss looks too noisy? Increase the batch size. • Big gap between training and validation accuracy? You're overfitting: increase regularization!
  46. Transfer learning From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
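
A sketch of the two usual transfer-learning recipes in recent torchvision; resnet18 and the 101 output classes are stand-ins (101 nods to the Food101 dataset used later in the talk):

```python
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on ImageNet (slide 47).
model = models.resnet18(weights="IMAGENET1K_V1")

# Recipe 1: feature extraction -- freeze everything, swap in a new classifier.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 101)  # new head: 101 food classes

# Recipe 2: fine-tuning -- keep all weights trainable and continue
# backpropagation from the pre-trained values, usually with a small
# learning rate, instead of freezing the convolutional layers.
```
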
  47. ImageNet
  48. 1001 benefits of CNN • Transfer learning: fine-tuning for object recognition; replace and retrain the classifier on top of the ConvNet, or fine-tune the weights of the pre-trained network by continuing the backpropagation • Feature extraction by CNN (e.g. 4096 features) • Object detection • Object segmentation • Image similarity and matching by CNN (see the sketch below)
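
The 4096-dimensional features on the slide match the fc7 layer of AlexNet/VGG-style networks. A sketch of using them for image similarity, with VGG-16 as a stand-in and random tensors in place of real preprocessed images:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Keep VGG-16 up to fc7: the classifier then outputs a 4096-D descriptor.
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()
vgg.classifier = vgg.classifier[:5]

def describe(batch):
    """Map a batch of (N, 3, 224, 224) images to (N, 4096) feature vectors."""
    with torch.no_grad():
        return vgg(batch)

a = torch.randn(1, 3, 224, 224)     # stand-ins for real preprocessed images
b = torch.randn(1, 3, 224, 224)
similarity = F.cosine_similarity(describe(a), describe(b))  # matching score
```
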
  49. ConvNets are everywhere From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  50. ConvNets are everywhere From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  51. ConvNets are everywhere From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  52. ConvNets are everywhere From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  53. ConvNets are everywhere From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
  54. Index • Motivation • Learning and Deep learning • Deep learning for food analysis • Lifelogging
  55. Automatic food analysis • Can we automatically recognize food? The goal is to detect every instance of a dish in all of its variants, shapes and positions, across a large number of images. • The main problems that arise are: complexity and variability of the data; huge amounts of data to analyse.
  56. Automatic Food Analysis • Food detection • Food recognition • Food environment recognition • Eating pattern extraction
  57. Food datasets • Food256: 25,600 images (100 images/class), 256 classes. • Food101: 101,000 images (1,000 images/class), 101 classes. • Food101+FoodCAT: 146,392 images (101,000 + 45,392), 131 classes. • EgocentricFood: 5,038 images, 9 classes.
  58. Food localization and recognition • Figure: general scheme of our food localization and recognition proposal.
  59. Food localization • Pipeline (figure): GoogleNet with softmax and GAP over the inception4e output, deep convolution features, a FAM, and bounding-box generation to separate Food from Non-Food. • Examples of localization and recognition on UECFood256 (top) and EgocentricFood (bottom); ground truth is shown in green and our method in blue.
  60. Food recognition • Pipeline (figure): image input → foodness map extraction → food detection CNN → food recognition CNN → food type recognition (e.g. apple, strawberry). • Results: top-1 74.7%, top-5 91.6%; state of the art (Bossard, 2014): top-1 56.4%.
  61. Demo
  62. Food environment classification • Categories include: bakery, banquet hall, bar, butcher shop, cafeteria, candy store, coffee shop, dinette, dining room, food court, galley, ice cream parlor, kitchen, kitchenette, market, pantry, picnic area, restaurant, restaurant kitchen, restaurant patio, supermarket. • Classification results: 0.92 for food-related vs. non-food-related; 0.68 over the 22 food-related categories.
  63. Index • Motivation • Learning and Deep learning • Deep learning for food analysis • Lifelogging
  64. Wearable cameras and the life-logging trend • Figure: shipments of wearable computing devices worldwide by category from 2013 to 2015 (in millions).
  65. Life-logging data • What we have: (figure).
  66. Wealth of life-logging data (or the hell of life-logging data) • We propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution. • The segmentation is reached by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent sequence treatment. • Figure: complete dataset of a day captured with SenseCam (more than 4,100 images). • The choice of device depends on: 1) where it is worn: a camera hung on the body is considered more unobtrusive for the user; and 2) its temporal resolution: a camera with a low frame rate will capture less motion information, but we will need to process less data. • We chose SenseCam or Narrative, cameras hung on the neck or pinned to clothing that capture 2-4 frames per minute.
  67. Visual life-logging data • Events to be extracted from life-logging images. • The camera captures up to 2,000 images per day, around 100,000 images per month. • Applying computer vision algorithms, we are able to extract the diary of the person: the activities he/she has done; the interactions he/she has participated in; the events he/she has taken part in; the duties he/she has performed; the environments and places he/she visited, etc.
  68. Towards healthy habits • Towards visualizing summarized lifestyle data to ease the management of the user's healthy habits (sedentary lifestyle, nutritional activity, etc.).
  69. Conclusions • Healthy habits: one of the main health concerns for people, society, and governments. • Deep learning: a technology that “came to stay”, a new technological trend with huge power, especially useful for food recognition and analysis. • Lifelogging: an unexplored technology that hides big potential to help people monitor and describe their behaviour and thus improve their lifestyle.
  70. (Figure slide.)
  71. Deep learning applications
