A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Videos and Post-Meal Plate Waste Pictures
1. HIMS 2015
Vladimir Kulyukin, Vikas Reddy Sudini
Department of Computer Science
Utah State University
Logan, UT, USA
vladimir.kulyukin@usu.edu
Heidi Wengreen, Jennifer Day
Department of Nutrition, Dietetics, & Food Sciences
Utah State University
Logan, UT, USA
heidi.wengreen@usu.edu
Abstract— Accurate caloric intake estimation is an open
research problem in health informatics, dietetics, and nutrition
management. Most research and development efforts to date
have approached this problem by designing and developing
automated vision-based methods to estimate caloric intake from
static images. One reason why completely automated solutions
underperform on some image sets is that they do not integrate
nutritionists into the caloric intake estimation process. In this
paper, a cloud-based infrastructure is presented that allows
clients to submit pre-meal videos and post-meal plate waste
pictures. Our approach recognizes the critical role of human
nutritionists and keeps them integrated in the caloric estimation
process.
Keywords—health data acquisition; caloric intake assessment;
nutrition management; food intake; digital imaging; energy intake
I. Introduction
Accurate caloric intake estimation is an open research problem
in health informatics, dietetics, and nutrition management.
Chronic diseases such as obesity, diabetes, and blockage of
coronary arteries are linked to poorly managed diets.
Consequently, there is a growing need for robust methods for
periodic caloric intake estimation. Most research and
development efforts to date have approached this problem by
designing and developing automated vision-based methods to
estimate caloric intake from static pictures taken with mobile
phones [1, 2]. Such methods attempt to automatically identify
and, when possible, quantify consumed foods and beverages.
While automated approaches achieve high accuracy on
selected food items [3], they underperform on some image sets
due to low image quality, inaccurate volume estimation, the
absence of reliable ground truth baselines, and highly variable
food textures that do not readily yield to automated
classification.
One reason, relatively unexplored in the literature, why
completely automated solutions may underperform on some
data sets is that they do not integrate nutritionists into the
caloric intake estimation process. Nutritionists are typically
asked for feedback only after the fact, when ground truth is
needed to evaluate the performance of a specific algorithm on
a given set of images. Consequently, some nutritionists may feel
disengaged because they have little or no stake in the system.
Another reason why such systems may underperform is the
limited ability of target users to engage in nutritional data
collection and analysis on a regular basis, e.g., daily, weekly,
or bi-weekly. Target users often find it difficult to integrate
nutritional data collection into their daily activities due to
lack of time, motivation, or training, which causes them to
turn off or ignore such digital stimuli as emails, phone calls,
and SMS messages.
To make nutritional data collection more manageable and
enjoyable for the users, we have been developing a Persuasive
NUTrition Management System (PNUTS) [4, 5]. PNUTS
seeks to shift current research, training, and clinical practices
in nutrition management toward persuasion, better integration
of target users and nutritionists into dietary management, and
community-oriented, context-sensitive nutrition decision
support. PNUTS is inspired by the Fogg Behavior Model
(FBM) [6], which states that motivation alone is insufficient to
stimulate target behaviors. Even motivated users must have
both the ability to execute behaviors and well-designed
triggers to engage in those behaviors at appropriate places or
times. Toward this end, in this paper, we propose a
cloud-based infrastructure that allows clients to submit pre-meal
videos and post-meal plate waste pictures. Our approach
recognizes the critical role of human nutritionists and keeps
them completely integrated in the caloric estimation process.
Automated image and video analysis approaches can be
integrated into the infrastructure as needed.
Our paper is organized as follows. In Section II, we
discuss related work. In Section III, we describe a cloud-based
infrastructure for a PNUTS module that allows clients to
submit pre-meal videos and post-meal plate waste pictures that
are analyzed by human nutritionists for caloric intake. We
describe the control flows in this module and discuss how the
three main systemic roles (the client, the nutritionist, and the
program coordinator) participate in these flows. In Section IV,
we describe how the pilot version of this module has been
deployed and integrated into NDFS 4750: Transition to
Professional Practice taught at Utah State University (USU)
in the spring 2015 semester. The class had thirty-six students,
who acted as both clients and nutritionists. Students did not
have access to their own digital images, and the other digital
images were accessed anonymously. The clients used the PNUTS
module to submit pre-meal videos and post-meal plate waste
pictures. The nutritionists reviewed these data to estimate
caloric intake. Two co-authors of this paper, Wengreen and
Day, both Registered Dietitian Nutritionists, acted as program
coordinators to train nutritionists and to resolve
inconsistencies in caloric intake estimations. In Section V, we
analyze and discuss the results of the data collection and
caloric intake estimation experiences with the system.
II. Related Work
A variety of algorithms have been developed for vision-based
caloric intake estimation. Bosch et al. [1] developed a system,
called the mobile telephone food record (mpFR), to
automatically identify and quantify consumed foods and
beverages by analyzing images taken with mobile phones. The
image analysis consists of image segmentation, food
identification, and volume estimation. Pre- and post-meal
images are used to estimate food and energy intakes. The
reported recognition and energy intake estimation accuracy
ranges from 50 to 90 percent.
Chen et al. [3] proposed a system for automated Chinese
food identification and quantity estimation. The researchers
apply sparse coding to SIFT and local binary pattern (LBP)
feature descriptors, combined with Gabor and color features, to
represent food items. A multi-label SVM classifier is trained
for each feature, and the trained classifiers are combined with
the multi-class AdaBoost algorithm [8]. The overall accuracy
reported by the researchers is 68.3 percent. The experiments
excluded transparent food ingredients such as pure water and
cooked rice.
Kitamura et al. [7] proposed FoodLog, a web-based system
that enables users to log their dietary intake by taking and
uploading pictures of consumed foods and beverages. The
system locates and analyzes the ingredients from the uploaded
pictures and calculates the dietary intake according to a USDA
food pyramid (usda.gov) that categorizes food into grains,
vegetables, meat, beans, milk, and fruit. The researchers report
that FoodLog's performance is improved by personalized
models created and updated dynamically via user feedback.
In their experiments, personalized classifiers improved the
average accuracy of food balance estimation from 37 to 42
percent.
Hoashi et al. [9] proposed an automatic food image
recognition system for 85 food categories that fuses
various kinds of image features, including bag-of-features
(BoF), color histograms, Gabor features, and gradient
histograms, with Multiple Kernel Learning (MKL). The
researchers implemented a prototype system to recognize food
images taken by mobile phone cameras. MKL enabled
the researchers to integrate various kinds of image features
such as color, texture, and BoF representations. The
researchers obtained a classification rate of 62.5 percent
for 85 food categories.
Yang et al. [10] proposed a new representation for food
items based on pairwise statistics between local features
computed over pixel-level segmentations of images into nine
ingredient types: beef, chicken, pork, bread, vegetable, tomato,
cheese, egg, and background. These statistics are collected in a
multi-dimensional histogram, which is then used as a feature
vector for a discriminative classifier. The Semantic Texton
Forest (STF) [11] was used for image categorization and
segmentation to generate soft labels for pixels based on
such local low-level characteristics as the colors of nearby
pixels. The system was evaluated on the Pittsburgh Food
Image Dataset (PFID) (http://pfid.intel-research.net/) and the
proposed algorithm was compared with the two PFID baseline
methods. The conducted experiments showed that the
proposed method was significantly more accurate than the two
baseline methods.
Martin et al. [2] proposed a method called the Remote
Food Photography Method (RFPM). The method relies on
camera-enabled cell phones with data transfer capability.
Users take and transmit photographs of food selections and
plate waste to researchers or clinicians for subsequent
analysis. The RFPM allows clients to receive fast feedback
about their energy intake from professionals at remote
locations without going to a clinic. The RFPM was tested in
controlled laboratory and free-living conditions, which
allowed the researchers to compare its accuracy directly
across the two settings. However, the variety of foods was
limited and not necessarily representative of the
participants' habitual daily intake.
Wang et al. [12] developed a new dietary instrument for
assessing an individual's food intake with a hand-held personal
digital assistant (PDA) with a camera and a mobile telephone
card. The researchers applied a cross-sectional study design in
a study of twenty-eight participants who were asked to keep
1-day weighed food records. Digital images of all recorded
foods were obtained simultaneously and sent to registered
dietitians via PDAs. The participants' opinions about the PDA
method and two other methods were determined using a
questionnaire. No significant differences were found between
the PDA method and the other two food recording methods for
most nutrients. The survey showed that 57 percent of the
participants considered the PDA method the least burdensome
and the least time-consuming of the three methods for
recording daily diet.
Figure 1. Home page of PNUTS
III. Caloric Intake Estimation Infrastructure
In this section, we describe a cloud-based infrastructure
module for PNUTS that allows clients to submit pre-meal
videos and post-meal plate waste pictures. Our approach is
similar to the approaches outlined in Martin et al. [2] and
Wang et al. [12] in that it recognizes the critical role of
human nutritionists and the limitations of completely
automated food image analysis approaches. Unlike in these
two systems, the data input in our infrastructure is not
confined to static images taken on mobile phones: the clients
can submit not just static photos but also videos. Figure 1
shows the home page of the PNUTS module described in this
paper and used in the pilot study. As can be seen from Figure
1, the PNUTS module has three main systemic roles: the
client, the nutritionist, and the program coordinator. Each role
is briefly described below.
A. Client
The client is any user of the system interested in obtaining
accurate caloric intake estimations of consumed foods and
beverages. The client uses a web-enabled device to register
with the system. The client receives text reminders before the
three major meals (breakfast, lunch, and dinner) and has the
option to sign up for additional reminders or tips. At breakfast,
lunch, and dinner, the client takes a short pre-meal video and
may optionally make a voice recording describing the foods and
beverages to be consumed.
At the end of the meal, the client takes a still picture of the
plate waste. The client is required to include a standard plastic
card in all videos and pictures to serve as a size reference for
subsequent caloric intake estimation. Figure 2 shows a snapshot
from a pre-meal video with the sensitive information removed
from the client's USU card. Figure 3 shows the corresponding
post-meal plate waste picture. The client uploads the pre-meal
video and the post-meal picture on the web page shown in
Figure 4. These data, together with the client ID and a time
stamp, constitute an energy intake estimation request (EIER).
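As a minimal sketch, an EIER can be modeled in Java, the implementation language named in the Implementation Details subsection, as an immutable value holding the four pieces of data listed above. The type name Eier, its field names, and the representation of the video and picture as storage keys are illustrative assumptions, not the actual PNUTS classes.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical model of an energy intake estimation request (EIER):
// client ID, time stamp, pre-meal video, and post-meal plate waste picture.
// Field names and the use of String storage keys are assumptions.
record Eier(String clientId, Instant timestamp,
            String preMealVideoKey, String plateWastePictureKey) {

    // Compact constructor: reject incomplete requests, since an EIER
    // requires all four pieces of data described in the text.
    Eier {
        Objects.requireNonNull(clientId, "clientId");
        Objects.requireNonNull(timestamp, "timestamp");
        Objects.requireNonNull(preMealVideoKey, "preMealVideoKey");
        Objects.requireNonNull(plateWastePictureKey, "plateWastePictureKey");
        if (clientId.isBlank()) {
            throw new IllegalArgumentException("clientId must not be blank");
        }
    }
}
```

A record keeps each request immutable once uploaded, which fits a workflow in which nutritionists only read EIERs and record their estimates separately.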
Figure 2. Snapshot from a Pre-Meal Video
Figure 3. Post-Meal Plate Waste Picture
Figure 4. Client Meal Data Upload
B. Nutritionist
The nutritionist also registers with the system to review and
evaluate EIERs. Figure 5 shows a web page where a
nutritionist selects EIERs for evaluation. The nutritionist
watches the videos and looks at the plate waste pictures. To
compute caloric intake, the nutritionist uses the USDA
National Nutrient Database (ndb.nal.usda.gov/ndb): the
nutritionist selects the appropriate foods and estimates the
amounts eaten, the NDB program computes the total kilocalories
consumed, and the nutritionist enters this total into the PNUTS
module. Figure 6 shows two caloric estimations of the
meal data shown in Figures 2 and 3.
Figure 5. Pending EIERs for Nutritionists
Figure 6. Two Conflicting Estimations
Nutritionist ID: A*****2540; Email ID: *****@gmail.com; Estimation: 54_A01772540.txt
Summary (NDB No., food, amount, kcal):
08539 Kashi Wheat Cereal, 1 cup, 336
01175 1% milk, 1 cup, 102
09433 Clementines, 2 fruits, 70
Nutritionist ID: A*****9151; Email ID: *****@gmail.com; Estimation: 54_A01049151.txt
Summary (NDB No., food, amount, kcal):
08539 Cereals ready-to-eat, Kashi Organic Promise, Cinnamon Harvest, 1 serving, 185
01082 Milk, lowfat, fluid, 1% milkfat, with added nonfat milk solids, vitamin A and D, 1 cup, 102
09433 Clementines, raw, 2 each, 70
An EIER can be in one of five states: unprocessed, pending,
processed, conflicting, and resolved. A request is unprocessed
when it has been received by the system but has not been
evaluated by any nutritionist registered with the system. A
request is pending when it has been evaluated by one
nutritionist. A processed request is a request that has been
evaluated either by two nutritionists without a conflict, i.e.,
their estimates agree with each other within ten percent of
total kcalories at the eating-occasion level, or by a program
coordinator, i.e., another nutritionist with the authority to
resolve conflicting caloric intake estimations. A conflicting
request is a request that has been evaluated by three
nutritionists whose evaluations disagree by more than ten
percent. A resolved request is a conflicting request that has
been evaluated by a program coordinator. Resolved requests
can be viewed by nutritionists but not re-evaluated.
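The five states can be summarized as a small decision function. The Java sketch below is a hypothetical illustration, not code from PNUTS: it assumes the caller already knows how many EIEs a request has, whether any two of them agree within ten percent, and whether a program coordinator has evaluated it. The text does not name the state of a request with two disagreeing EIEs; the sketch assumes it stays pending, because the matching flow in the Program Coordinator subsection removes such a request from the database of pending EIERs only after a third EIE arrives.

```java
// Hypothetical sketch of the five EIER states described in the text.
final class EierStates {

    enum State { UNPROCESSED, PENDING, PROCESSED, CONFLICTING, RESOLVED }

    private EierStates() {}

    /**
     * @param eieCount             number of nutritionist EIEs recorded for the request
     * @param anyAgreement         true if some pair of those EIEs agrees within ten percent
     * @param coordinatorEvaluated true if a program coordinator has evaluated the request
     */
    static State stateOf(int eieCount, boolean anyAgreement, boolean coordinatorEvaluated) {
        if (eieCount < 0) {
            throw new IllegalArgumentException("eieCount must be non-negative");
        }
        if (eieCount >= 2 && anyAgreement) {
            return State.PROCESSED;            // two nutritionists agree
        }
        if (eieCount >= 3) {                   // three EIEs and no agreeing pair
            return coordinatorEvaluated ? State.RESOLVED : State.CONFLICTING;
        }
        if (coordinatorEvaluated) {
            return State.PROCESSED;            // evaluated directly by a program coordinator
        }
        if (eieCount == 0) {
            return State.UNPROCESSED;
        }
        return State.PENDING;                  // one EIE, or (assumption) two that disagree
    }
}
```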
C. Program Coordinator
Figures 7 and 8 show how the energy intake estimations (EIEs)
are made by nutritionists and, if necessary, resolved by
program coordinators. When a nutritionist submits his or her
estimation, the new EIE is matched against the list of EIEs for
that request. If there are no existing EIEs for that request, the
system simply saves the new EIE in the EIE database.
Figure 7. No Existing EIEs for a given EIER
Figure 8. Matching new EIE against existing EIEs
If there is at least one existing EIE for that EIER, the
control flow continues as shown in Figure 8. If there is only
one existing EIE, the new EIE is compared with it. The
comparison between two EIEs is computed as the relative
difference between two numbers, i.e.,
$D = \frac{|x - y|}{\max\{x, y\}} \cdot 100$, where
x and y are the total caloric counts in the two EIEs being
compared. For example, consider the two EIEs given in Figure 6.
The first EIE (EIE1) is submitted by the nutritionist whose ID
is A*****2540. The second EIE (EIE2) is submitted by the
nutritionist whose ID is A*****9151. The total caloric count
of EIE1 is 336 + 102 + 70 = 508. The total caloric count of
EIE2 is 185 + 102 + 70 = 357. The relative difference is
$D = \frac{|508 - 357|}{508} \cdot 100 \approx 29.72$.
Two EIEs are considered similar if their relative
difference is at most 10 percent. Thus, the two EIEs given in
Figure 6 are considered conflicting.
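The relative-difference test and the worked example can be checked with a few lines of Java. This is a minimal sketch: the class and method names are hypothetical, and treating two zero-kcal estimates as identical is an assumption the text does not address.

```java
import java.util.Locale;

// Hypothetical sketch of the EIE agreement test: D = |x - y| / max{x, y} * 100.
final class RelativeDifference {

    static final double AGREEMENT_THRESHOLD_PERCENT = 10.0;

    private RelativeDifference() {}

    /** Sum of the per-item kcal values in one EIE (e.g., a Figure 6 summary). */
    static double total(double... itemKcal) {
        double sum = 0.0;
        for (double kcal : itemKcal) {
            sum += kcal;
        }
        return sum;
    }

    /** Relative difference D, in percent, between two non-negative caloric totals. */
    static double percent(double x, double y) {
        if (x < 0 || y < 0) {
            throw new IllegalArgumentException("caloric totals must be non-negative");
        }
        double max = Math.max(x, y);
        if (max == 0.0) {
            return 0.0; // assumption: two zero-kcal estimates agree
        }
        return Math.abs(x - y) / max * 100.0;
    }

    /** Two EIEs conflict when D exceeds ten percent. */
    static boolean conflicting(double x, double y) {
        return percent(x, y) > AGREEMENT_THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        double eie1 = total(336, 102, 70); // 508
        double eie2 = total(185, 102, 70); // 357
        System.out.printf(Locale.ROOT, "D = %.2f, conflicting = %b%n",
                percent(eie1, eie2), conflicting(eie1, eie2));
        // prints: D = 29.72, conflicting = true
    }
}
```

Dividing by the larger total keeps D symmetric in x and y, so the outcome does not depend on which nutritionist submitted first.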
If the new EIE and the only existing EIE are not
conflicting, the corresponding EIER is removed from the
database of pending EIERs and considered processed. If the
new EIE and the only existing EIE are found to be in conflict,
both are saved in the database of existing EIEs for that
EIER.
If there are two existing EIEs for a specific request, the
new EIE is matched against both to see whether it agrees with
either one. If it does, all EIEs are removed and the
corresponding EIER is considered processed. If there is no
agreement, all three EIEs are deleted from the database of
existing EIEs, the corresponding EIER is deleted from the
database of pending EIERs, and all four records (three EIEs
and one EIER) are saved in a conflict database. A program
coordinator is notified about the conflict via email.
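The matching flow of Figures 7 and 8 can be sketched as follows. This is a hypothetical, in-memory Java illustration rather than the PNUTS code: the maps stand in for the MySQL databases of existing EIEs, pending EIERs, and conflicts, the conflict map stores the EIEs keyed by request ID instead of a separate EIER record, and a callback stands in for the email sent to the program coordinator. Keeping agreed EIEs in a separate completed map is also an assumption; the text only says they are removed from the working databases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical in-memory sketch of the EIE matching flow (Figures 7 and 8).
final class EieMatcher {

    enum Outcome { SAVED, AGREED, SAVED_IN_CONFLICT, ESCALATED }

    record Eie(String requestId, String nutritionistId, double totalKcal) {}

    private final Map<String, List<Eie>> existingEies = new HashMap<>(); // EIEs awaiting a match
    private final Set<String> pendingEiers = new HashSet<>();            // pending EIERs
    private final Map<String, List<Eie>> completed = new HashMap<>();    // agreed EIEs (assumption)
    private final Map<String, List<Eie>> conflicts = new HashMap<>();    // conflict database
    private final Consumer<String> notifyCoordinator;                    // stands in for email

    EieMatcher(Consumer<String> notifyCoordinator) {
        this.notifyCoordinator = notifyCoordinator;
    }

    void addPendingEier(String requestId) { pendingEiers.add(requestId); }

    boolean isPending(String requestId) { return pendingEiers.contains(requestId); }

    boolean isInConflict(String requestId) { return conflicts.containsKey(requestId); }

    /** Two EIEs agree when their relative difference is at most ten percent. */
    static boolean agree(Eie a, Eie b) {
        double max = Math.max(a.totalKcal(), b.totalKcal());
        return max == 0.0 || Math.abs(a.totalKcal() - b.totalKcal()) / max * 100.0 <= 10.0;
    }

    Outcome submit(Eie eie) {
        String id = eie.requestId();
        List<Eie> existing = existingEies.computeIfAbsent(id, k -> new ArrayList<>());
        if (existing.isEmpty()) {                 // Figure 7: nothing to match yet
            existing.add(eie);
            return Outcome.SAVED;
        }
        List<Eie> all = new ArrayList<>(existing);
        all.add(eie);
        if (existing.stream().anyMatch(e -> agree(e, eie))) {
            existingEies.remove(id);              // agreement: request leaves the queue
            pendingEiers.remove(id);
            completed.put(id, all);
            return Outcome.AGREED;
        }
        if (existing.size() == 1) {               // one conflicting pair: keep both
            existing.add(eie);
            return Outcome.SAVED_IN_CONFLICT;
        }
        existingEies.remove(id);                  // three EIEs and no agreement
        pendingEiers.remove(id);
        conflicts.put(id, all);
        notifyCoordinator.accept(id);
        return Outcome.ESCALATED;
    }
}
```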
Figure 9. List of Conflicting EIERs
The program coordinator is also a Registered Dietitian
Nutritionist who has the authority to resolve EIER conflicts.
The system can have multiple program coordinators, each
supervising a number of nutritionists. In the pilot version of
the system described in this paper, there are two program
coordinators.
Whenever the system detects a conflicting EIER, the program
coordinator receives an email notification. After logging in,
the program coordinator sees a list of conflicting EIERs, as
shown in Figure 9. The program coordinator watches the same
video data as the nutritionists who provided the EIEs and
evaluates
the request. After the program coordinator evaluates the
conflicting EIER, the EIER is considered resolved and is
removed from the database of the conflicting requests.
D. Implementation Details
We implemented the described infrastructure in Java using
Java servlets, JavaServer Pages (JSPs), and JBoss
(http://www.jboss.org). Our current cluster has two nodes: one
master and one slave. All our databases are implemented with
MySQL (https://www.mysql.com/). The JBoss Application
Server (JBoss AS) is a free, open-source application server
that implements the Java EE specification. JBoss AS is
maintained by jboss.org, a community that provides free
support for the server. JBoss is licensed under the GNU
Lesser General Public License (LGPL).
IV. Pilot Deployment
The pilot version of the PNUTS module described in the
previous section has been deployed and integrated into NDFS
4750: Transition to Professional Practice taught at Utah State
University (USU) in the spring 2015 semester. The system
collected 407 pre-meal videos, 407 post-meal pictures, and 84
caloric intake estimations.
Figure 10. Sample Caloric Intake Estimation Form
All students enrolled in NDFS 4750 received caloric
estimation training before acting as clients and nutritionists.
Each student watched and evaluated fourteen training pairs of
pre-meal and post-meal videos and was required to complete
by hand the caloric intake estimation form shown in Figure 10.
The ground truth for each video was provided by Jennifer
Day. Each student's evaluation was compared with the ground
truth, and the students who differed from the ground truth by
more than ten percent had to undergo additional training. The
ratio of EIEs to ground truth ranged from 0.85 to 1.15
(mean = 0.98, standard deviation = 0.067). Of the thirty-six
students, thirty-two (89%) provided EIEs within 10% of the
ground truth. Four students provided EIEs that were not within
10% of the ground truth: one student's mean EIEs
underestimated energy intake, and three students
overestimated it. After additional training, all four students
provided estimates within 10% of the ground truth using
digital data from five new eating occasions.
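A minimal sketch of the ten-percent training screen follows, assuming each student is screened on the ratio of the total (equivalently, the mean) of his or her EIEs to the total ground truth over the training meals. The text does not state how per-meal results were aggregated, so this aggregation rule, the class name, and the example values in the usage are illustrative assumptions.

```java
// Hypothetical sketch of the training screen: flag students whose estimates
// differ from the ground truth by more than ten percent.
final class TrainingScreen {

    static final double TOLERANCE = 0.10;

    private TrainingScreen() {}

    /** Ratio of a student's total EIE to the total ground truth over the training meals. */
    static double ratio(double[] estimates, double[] groundTruth) {
        if (estimates.length == 0 || estimates.length != groundTruth.length) {
            throw new IllegalArgumentException("need one estimate per ground-truth meal");
        }
        double estimated = 0.0;
        double truth = 0.0;
        for (int i = 0; i < estimates.length; i++) {
            estimated += estimates[i];
            truth += groundTruth[i];
        }
        return estimated / truth;
    }

    /** A ratio more than ten percent away from 1.0 triggers additional training. */
    static boolean needsRetraining(double ratio) {
        return Math.abs(ratio - 1.0) > TOLERANCE;
    }
}
```

For example, estimates of 490 and 260 kcal against ground truths of 500 and 250 kcal give a ratio of 1.0 and pass, whereas 600 and 300 kcal give 1.2 and trigger retraining.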
The client group used the PNUTS module to submit pre-meal
videos and post-meal plate waste pictures. We have
collected sixty-one pre-meal videos, sixty-one post-meal plate
waste pictures, and one hundred twenty-two EIEs. The
nutritionists used the data to estimate the caloric intake of
each eating occasion. Jennifer Day acted as program
coordinator to resolve inconsistencies in caloric intake
estimations. In addition, each client completed a
paper-and-pencil three-day record of all foods eaten and
submitted to the PNUTS module as pre- and post-meal digital
data. Figure 11 shows part of a 3-day food record form filled
out by one of the student clients. We plan to use these forms
as the ground truth in our future experiments on automated
food item identification from client videos.
Twenty-eight students provided EIEs from the digital data
of 61 eating occasions submitted by the same set of students.
The mean EIE was 383 kcal (range: 35 to 936 kcal).
Agreement among the twenty-eight students who provided
EIEs from the digital data was examined with intra-class
correlation coefficients (ICCs). Agreement was high among
the twenty-eight student nutritionists (ICC = 0.89, p < 0.001).
Agreement was lower for digital data from eating occasions
that included a single food item or five or more food items
(ICC = 0.78 and 0.77, p = 0.023 and 0.002, respectively).
Agreement was highest for digital data from eating occasions
that included two to four food items (ICC = 0.95, p < 0.001).
These values are at least as high as the ICCs reported when
the digital photography method is used in cafeteria settings
and with free-living adults [13, 14].
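The text does not specify which form of ICC was computed. As an illustration only, the sketch below implements the one-way random-effects, single-measure ICC(1), $\mathrm{ICC}(1) = \frac{MS_B - MS_W}{MS_B + (k - 1) MS_W}$, which fits designs in which each eating occasion is rated by k nutritionists drawn from a larger pool. The counts reported above (122 EIEs for 61 eating occasions) are consistent with k = 2, but that is an inference, and the reported p-values would require a separate F test that is not shown here.

```java
// Hypothetical sketch: one-way random-effects, single-measure ICC(1).
// Rows are eating occasions; columns are the k EIEs (kcal) for that occasion.
final class Icc {

    private Icc() {}

    static double icc1(double[][] ratings) {
        int n = ratings.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two eating occasions");
        }
        int k = ratings[0].length;
        if (k < 2) {
            throw new IllegalArgumentException("need at least two EIEs per eating occasion");
        }
        double[] occasionMeans = new double[n];
        double grandSum = 0.0;
        for (int i = 0; i < n; i++) {
            if (ratings[i].length != k) {
                throw new IllegalArgumentException("every occasion needs exactly k EIEs");
            }
            double sum = 0.0;
            for (double r : ratings[i]) {
                sum += r;
            }
            occasionMeans[i] = sum / k;
            grandSum += sum;
        }
        double grandMean = grandSum / ((double) n * k);

        double ssBetween = 0.0; // variation between eating occasions
        double ssWithin = 0.0;  // disagreement among EIEs for the same occasion
        for (int i = 0; i < n; i++) {
            double d = occasionMeans[i] - grandMean;
            ssBetween += k * d * d;
            for (double r : ratings[i]) {
                double e = r - occasionMeans[i];
                ssWithin += e * e;
            }
        }
        double msBetween = ssBetween / (n - 1);
        double msWithin = ssWithin / (n * (k - 1));
        return (msBetween - msWithin) / (msBetween + (k - 1) * msWithin);
    }
}
```

Values near 1 indicate that most of the variation in EIEs comes from differences between eating occasions rather than from disagreement between nutritionists, which is how the agreement figures above should be read.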
Figure 11. Part of a 3-Day Paper Record
V. Discussion
The pilot version of the PNUTS infrastructure functioned
flawlessly and enabled both the clients and the nutritionists to
upload and analyze the meal data. There were two main
complaints by the students who used the system as clients and
nutritionists. The clients complained that the system did not
allow them to submit partial caloric estimations with an option
to complete them later. In the current implementation, the
client can submit only complete estimations of each meal. The
main complaint of the nutritionists was the need to use the
USDA National Nutrient Database at ndb.nal.usda.gov. The
students did not complain about the Nutrient Database
program itself but about the need to go to a different web site,
look up ingredients, enter the amount eaten, and then retype
all the data into another web page.
We plan to address the first complaint by allowing the
clients to submit partial caloric intake estimations. One
challenge that we foresee is what to do with partial estimations
when their requests have been completed by other
nutritionists. For example, a nutritionist who comes back to
complete an estimation may discover that two other
nutritionists have submitted complete non-conflicting
estimations of the same EIER. The second complaint will be
harder to address because we do not have our own internal
database of all possible ingredients that may be found in
submitted digital data.
In an informal qualitative survey, the students said that
the system made them feel part of a community of users and
that they would use the system as registered nutritionists
after graduation. When asked about the caloric data entry
difficulties, the students said that these difficulties were
offset by not having to meet with the clients in person. The
system has turned out to be a valuable
training tool for undergraduate nutrition students and will
likely be used in other USU NDFS classes in the fall 2015
semester.
Another direction that we plan to pursue in the future is
automated identification of food items in pre-meal videos.
Such identification will likely make data entry easier for
nutritionists in that the nutritionists will not have to enter the
names of food items manually.
Acknowledgments
We are grateful to Dr. Sheryl Aguilar for letting us pilot
the system in her NDFS 4750 class that she taught at USU in
the spring 2015 semester. We are also grateful to all students
enrolled in NDFS 4750 who tested the system as clients and
nutritionists for their effort and feedback.
References
[1] Bosch M., Zhu F., Khanna N., Boushey C.J., and Delp E.
“Combining global and local features for food
identification in dietary assessment.” In Proceedings of
the IEEE International Conference on Image Processing
(ICIP), 2011, pp. 1789-1792.
doi:10.1109/ICIP.2011.6115809.
[2] Martin, C.K., Han, H., Coulon, S.M., Allen, H.R.,
Champagne, C.M., and Anton, S.D. (2009). “A novel
method to remotely measure food intake of free-living
individuals in real time: The Remote Food Photography
Method (RFPM).” British Journal of Nutrition, 101, 446-
456. PMCID: PMC2626133.
[3] Chen, M.Y., Yang, Y., Ho, C.J., Wang, S., Liu, S.,
Chang, E., Yeh, C., & Ouhyoung, M. “Automatic
Chinese food identification and quantity estimation.” In
Proceedings of SIGGRAPH Asia Technical Briefs (SA
'12). ACM, New York, NY, USA, Article 29, 4 pages,
2012. DOI=10.1145/2407746.2407775.
[4] Kulyukin, V., Zaman, T., and Andhavarapu, S. “Effective
use of nutrition labels on smartphones.” In Proceedings of
the 15th International Conference on Internet Computing
and Big Data (ICOMP 2014), pp. 93 - 99, July 21-24,
2014, Las Vegas, NV, USA, CSREA Press, ISBN: 1-
60132-227-1.
[5] Kulyukin, V. and Zaman T. (2013). “Vision-based
localization of skewed UPC barcodes on smartphones.” In
Proceedings of the International Conference on Image
Processing, Computer Vision, & Pattern Recognition
(IPCV 2013), pp. 344-350, pp. 314-320, ISBN 1-60132-
252-6, CSREA Press, Las Vegas, NV, USA.
[6] Fogg, B.J. “A behavior model for persuasive design.” In
Proceedings of the 4th International Conference on
Persuasive Technology. Article 40. ACM, New York,
USA, 2009.
[7] Kitamura, K., Yamasaki, T., & Aizawa, K. 2009.
“FoodLog: capture, analysis and retrieval of personal food
images via web.” In Proceedings of the ACM multimedia
2009 workshop on Multimedia for cooking and eating
activities (CEA '09). ACM, New York, NY, USA, pp. 23-
30. DOI=10.1145/1630995.1631001.
[8] Freund, Y. & Schapire, R. “A decision-theoretic
generalization of on-line learning and an application to
boosting.” Journal of Computer and System
Sciences, vol. 55, issue 1, pp. 119-139, August 1997.
doi:10.1006/jcss.1997.1504.
[9] Hoashi, H., Joutou, T., and Yanai, K. 2010. “Image
recognition of 85 food categories by feature fusion.” In
Proceedings of IEEE International Symposium on
Multimedia, pp. 296-301, Taichung, Dec. 2010, IEEE.
ISBN 978-1-4244-8672-4.
[10] Yang, S., Chen, M., Pomerleau, D., and Sukthankar, R.
“Food recognition using statistics of pairwise local
features.” In Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp.
2249-2256, 13-18 June, 2010, ISSN 1063-6919.
[11] Shotton, J., Johnson, M., and Cipolla, R. “Semantic texton
forests for image categorization and segmentation.” In
Proceedings of IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 23-28 June,
2008, ISSN 1063-6919.
[12] Wang, D.H., Kogashiwa, M., and Kira, S. “Development
of a new instrument for evaluating individuals’ dietary
intakes.” Journal of the American Dietetic Association,
106(10), pp. 1588-1593, Oct. 2006.
[13] Wengreen, H.J., Madden G.J., Aguilar S.S., Smits R.R.,
Jones B.A. “Incentivizing children’s fruit and vegetable
consumption: results of a United States pilot study of the
Food Dudes program.” Journal of Nutrition Education and
Behavior. 2013;45(1):54-9.
[14] Martin C.K., Han H., Coulon S.M., Allen H.R.,
Champagne C.M., Anton S.D. “A novel method to
remotely measure food intake of free-living individuals in
real time: the remote food photography method.” British
Journal of Nutrition. 2009;101:446-456.