Leveraging Context to
Support Automated Food
Recognition in Restaurants
Vinay Bettadapura, Edison Thomaz, Aman Parnami, Gregory D.
Abowd, Irfan Essa
Presented by: Pedro Herruzo Sanchez
Contents
● Motivation
● Contributions of the paper
● System Overview
● Features & Classifier
● Evaluation on the PFID dataset
● Evaluation for In-the-wild food images
● Recognition without Location Prior
● Discussion
Motivation
● In 1970, 25.9% of food spending was on food away from home
● By 2012, this share had risen to 43.1%
● 80% of Americans report eating fast food monthly, and 40% report eating it weekly
● Obesity, nutrition, and chronic diseases are now major health concerns
● Logging of eating habits, using diaries and smartphones, is increasingly used to prevent disease
● Manual tracking (the most common approach) is time-consuming, error-prone, and susceptible to selective under-reporting
Contributions
● They develop an automated workflow where online resources are queried with
contextual data (location) to find images and additional information about the
restaurant where the food picture was taken, with the intent to build classifiers for
food recognition.
● Classification by SMO-MKL multi-class SVM with features extracted from test
photographs
● ‘In the wild’ evaluation on food images taken in 10 restaurants across 5 different cuisines: American, Indian, Italian, Mexican and Thai.
● Comparative evaluation focused on the effect of the images' location information
System Overview
● Acquisition of food images from FoodGawker, Instagram, Pinterest and Flickr
● Determine the restaurant where the picture was taken, using latitude and longitude coordinates and APIs such as Yelp or Google Places.
● Once a particular restaurant R is determined:
○ Search for R’s menu (allmenus.com, openmenu.com).
○ For each item on R’s menu, download the top 50 images from Google Images -> weakly-labeled training data
● Test data is formed by segmented images from a given restaurant R
Figure 1. System overview: the pipeline from taking a picture to classifying it.
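The restaurant-lookup step can be illustrated with a minimal sketch. The paper relies on services such as Yelp or Google Places. The version below instead assumes a small local table of restaurants and picks the nearest one by great-circle (haversine) distance. The restaurant names, coordinates, the `max_km` threshold, and both function names are hypothetical and only serve the illustration.

```python
import math

# Hypothetical local table: (name, latitude, longitude). Coordinates are made up.
RESTAURANTS = [
    ("Thai Place", 33.7765, -84.3890),
    ("Taqueria", 33.7812, -84.3840),
    ("Trattoria", 33.7490, -84.3880),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_restaurant(lat, lon, restaurants=RESTAURANTS, max_km=0.1):
    """Return the name of the closest restaurant, or None if none is within max_km."""
    best = min(restaurants, key=lambda r: haversine_km(lat, lon, r[1], r[2]))
    distance = haversine_km(lat, lon, best[1], best[2])
    return best[0] if distance <= max_km else None

# A photo geo-tagged a few metres from the first entry.
match = nearest_restaurant(33.7765, -84.3891)
```

Once a restaurant is matched, its menu items would drive the image search that produces the weakly-labeled training set.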
Features
● The focus is on illumination changes: images taken in restaurants are typically indoors and under varying lighting conditions -> color descriptors
● Harris-Laplace point detector for interest-point detection
● The feature descriptors used are:
○ Color Moment Invariants: give a value for a region of an image
○ Hue Histogram weighted by saturation (invariant to changes and shifts in light intensity and color)
○ C-SIFT (invariant to changes in light intensity)
○ OpponentSIFT: channels in the opponent color space described using SIFT descriptors (invariant to changes and shifts in light intensity)
○ RGB-SIFT: computed for each channel independently (invariant to changes and shifts in light intensity and color)
○ SIFT (invariant to changes and shifts in light intensity)
Classification using SMO-MKL
● For a given restaurant R, 100,000 interest points are detected
● For each of the 6 descriptors:
○ Build a BoW histogram using k-means with k=1000
○ Compute Extended Gaussian kernels
● A linear combination of these 6 kernels is learned using the Sequential Minimal Optimization (SMO) algorithm with a p-norm, p > 1
● An SVM classifies the images
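The bag-of-words (BoW) step can be sketched as follows. This is a minimal illustration, assuming the codebook (the k-means centroids) has already been computed. The toy codebook has 3 words of dimension 2 instead of the 1000 words used in the slides, and the function names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[j])),
    )

def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of codeword assignments for one image."""
    hist = [0.0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy data: 3 codewords and 4 local descriptors from one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]]
hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

In the described system one such histogram would be built per descriptor type, giving 6 histograms per image.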
Diagram. Classification pipeline: 100,000 interest points × 6 descriptors -> k-means (1000 clusters) -> 6 BoW histograms -> 6 Extended Gaussian kernels (K1 ... K6).
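The kernel step can be sketched under stated assumptions. The code below takes the Extended Gaussian kernel to be exp(-D/A), with D the χ² distance between BoW histograms and A the mean pairwise distance; this definition is an assumption, as the slides do not spell it out. The combination weights are fixed by hand here, whereas SMO-MKL learns them, so this shows only the shape of the combined kernel, not the learning procedure.

```python
import math

def chi2_distance(h1, h2):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def extended_gaussian_kernel(hists):
    """Kernel matrix K[i][j] = exp(-D(i, j) / A), A = mean off-diagonal distance."""
    n = len(hists)
    dist = [[chi2_distance(hists[i], hists[j]) for j in range(n)] for i in range(n)]
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    scale = (sum(off_diag) / len(off_diag)) if off_diag else 1.0
    scale = scale or 1.0  # guard against identical histograms
    return [[math.exp(-dist[i][j] / scale) for j in range(n)] for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices (weights are fixed here, not learned)."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Toy histograms for 3 images under 2 hypothetical descriptor types.
hists_a = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
hists_b = [[0.2, 0.8, 0.0], [0.3, 0.7, 0.0], [0.1, 0.0, 0.9]]
k_a = extended_gaussian_kernel(hists_a)
k_b = extended_gaussian_kernel(hists_b)
k_combined = combine_kernels([k_a, k_b], [0.5, 0.5])
```

The combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel.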
Evaluation on the PFID dataset
● PFID dataset:
○ 61 categories of fast food images acquired under lab conditions
○ Each category contains 3 different instances of a food, with 6 images from 6 different viewpoints per instance.
● 3-fold cross-validation: 12 images for training and the remaining 6 for testing
● In the results, MKL gives the best performance and improves on the state of the art by more than 20%
● Their SIFT approach achieves 34.9% accuracy, whereas the SIFT baseline used in the PFID publication achieves 9.2%. Why?
○ The PFID baseline uses LIBSVM for classification with its default parameters.
○ The current approach uses a χ² kernel (with scaled data) and tunes the SVM parameters through a grid search over C and γ.
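The 12/6 split can be sketched for a single category. The slides only give the counts, so the code assumes each fold holds out all 6 views of one instance and trains on the other two instances. The function name and the (instance, view) indexing are hypothetical.

```python
def instance_folds(n_instances=3, views_per_instance=6):
    """Yield (train, test) splits that hold out one food instance per fold."""
    images = [
        (instance, view)
        for instance in range(n_instances)
        for view in range(views_per_instance)
    ]
    folds = []
    for held_out in range(n_instances):
        train = [img for img in images if img[0] != held_out]
        test = [img for img in images if img[0] == held_out]
        folds.append((train, test))
    return folds

folds = instance_folds()
for train, test in folds:
    print(len(train), len(test))  # 12 6 on each of the 3 folds
```

Holding out whole instances keeps views of the same physical item from appearing in both training and test sets.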
Figure 2. The first two results (green) are from the PFID publication. The next two (red) are obtained using GIR and OM. The remaining ones (blue) are obtained using the feature descriptors and MKL. (CMI: Color Moment Invariant, CS: C-SIFT, HH: Hue-Histogram, O-S: OpponentSIFT, R-S: RGB-SIFT, S: SIFT and MKL: Multiple Kernel Learning)
Evaluation for In-the-wild food images
● 10 restaurants across 5 different cuisines: American, Indian, Italian, Mexican and Thai
● 3 different individuals collected images on different days from the 10 restaurants (2 per cuisine)
● 600 images taken in 2 phases:
○ 300 with a smartphone (5 cuisines × 6 dishes/cuisine × 10 images/dish)
○ 300 using Google Glass
● In the results, accuracy is limited for the Mexican and Thai cuisines, which show a low degree of visual variability between food types within the same cuisine.
Figure 3. Classification results. The columns are: CMI: Color Moment Invariant, C-S: C-SIFT, HH:
Hue-Histogram, O-S: OpponentSIFT, R-S: RGB-SIFT, S: SIFT and MKL: Multiple Kernel Learning
Recognition without Location Prior
● Disregard the location information and train the SMO-MKL classifier on all of the training data (3,750 images).
● Accuracy across the 600 test images is 15.67%, whereas the location-specific models achieved 63.33% on average
● Thus, average accuracy increased by 47.66 percentage points when the location prior was included.
● They claim that it is better to build several smaller restaurant- or cuisine-specific classifiers than one all-category food classifier
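The reported gain is a difference in accuracy, so it is measured in percentage points. A short check of the arithmetic, using only the two accuracies quoted on this slide (the variable names are illustrative):

```python
acc_without_prior = 15.67  # % accuracy, single all-category classifier
acc_with_prior = 63.33     # % average accuracy, location-specific classifiers

gain_points = acc_with_prior - acc_without_prior
relative_factor = acc_with_prior / acc_without_prior

print(f"{gain_points:.2f} percentage points, {relative_factor:.2f}x")
# 47.66 percentage points, 4.04x
```

In relative terms, the location prior roughly quadruples accuracy on this test set.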
Discussion
● They built the training dataset from the most popular foods for a particular cuisine, matching 15 dishes from the menu of the restaurant where the test data was taken, i.e., they built the training data with prior knowledge of the test data
● In our research:
○ If we want to use this approach to build a good food classifier for the city of Barcelona, we would need to split Barcelona into small sections and learn a small classifier for each
○ When a user uses the classifier, we first locate him/her via geo-tags and then apply the corresponding classifier
● Cons:
○ Requires the user to allow GPS (will it be enough for pictures?)
○ Many classifiers; the best cluster must be found for each classifier
● Alternative: one all-categories classifier weighted by GPS data.
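The alternative mentioned on this slide, a single all-categories classifier weighted by GPS data, could look roughly like the sketch below. It is one hypothetical reading of that idea, not something the paper or the slides specify: classifier scores are multiplied by a location-dependent prior over dishes and renormalised. All names and numbers are illustrative.

```python
def reweight_by_location(scores, location_prior, floor=1e-6):
    """Multiply classifier scores by a location prior and renormalise to sum to 1.

    Dishes missing from the prior get a small `floor` weight instead of zero.
    """
    weighted = {
        dish: score * location_prior.get(dish, floor)
        for dish, score in scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return {dish: 1.0 / len(weighted) for dish in weighted}
    return {dish: w / total for dish, w in weighted.items()}

# Hypothetical scores from a global classifier and a prior for a user near a Thai restaurant.
scores = {"pad thai": 0.30, "spaghetti": 0.45, "burrito": 0.25}
prior_near_thai = {"pad thai": 0.8, "spaghetti": 0.1, "burrito": 0.1}

posterior = reweight_by_location(scores, prior_near_thai)
best = max(posterior, key=posterior.get)
print(best)  # pad thai
```

Compared with one classifier per city section, this keeps a single model and moves the location dependence into the prior.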
Suggestions?
