Kusk Object Dataset: Recording Access to Objects in Food Preparation

Atsushi Hashimoto, Masaaki Iiyama, Shinsuke Mori,
Michihiko Minoh
Kyoto University
http://kusk.mm.media.kyoto-u.ac.jp/en/

Computer Vision (CV) meets
Natural Language Processing (NLP)
• CV-NLP collaboration is an active field.
– Supported by mature machine learning technology.
– Cooking media can be a good practice field!
• Long text (Recipe) and organized activity (Cooking)
[Figure: diagram linking CV and NLP. Labels: Video observation/instruction; Machine-Readable Description (BN/DNN); Human-Friendly Description; Recog.; Text Generation; Recog./Parse; Retrieve; Vision; Language; Real World.]

Grand goal: Comparing Recipe and Human
Actions
• From the viewpoint of computer engineering…
– Recipe: a kind of script language
– Human actions: an execution of the script by a human
• Potential applications
– Automatic cooking, online recipe navigation
– Cooking records for healthcare, recipe generation

Pascal Sentence Dataset
http://vision.cs.uiuc.edu/pascal-sentences/
Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. “Collecting
Image Annotations Using Amazon's Mechanical Turk”.
In Proc. of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with
Amazon's Mechanical Turk.
• One jet lands at an airport while another
takes off next to it.
• Two airplanes parked in an airport.
• Two jets taxi past each other.
• Two parked jet airplanes facing opposite
directions.
• two passenger planes on a grassy plain
Pascal Sentence Dataset: captions and images
– Images are obtained from the Pascal dataset.
– Captions are annotated via Amazon Mechanical Turk.

CV/NLP Datasets in CEA fields
• NLP
– Cooking Ontology (CEA2014, Japanese)
– Cookpad/Rakuten Recipe (2015, Japanese)
• CV
– TUM Kitchen Data Set (2009)
– CMU Multi-Modal Activity Database (2009)
– Actions for Cooking Eggs Dataset (2012)
– MPII Cooking Activities Dataset (2012)
– 50 Salads dataset (2013)
– The Breakfast Actions Dataset (2014)
• CV x NLP
– Yummly API
– Flow Graph Corpus (2014) × KUSK Dataset (CEA2014)

KUSK Dataset x Flow Graph Corpus
KUSK Dataset (Hashimoto, CEA2014) and Flow Graph Corpus (Mori, 2014)
[Figure labels: water flow sensors, eye tracker, touch display, electric consumption sensors, load-sensing tables]
• 20 recipes, shared with the Flow Graph Corpus
• 60 observations by 33 subjects

The list of 20 recipes
CookPad ID KUSK ID Title of recipe (original titles are in Japanese)
00121196 2014RC01 Chicken and Chinese cabbage starchy soup
00180223 2014RC02 Tomato soup - Japanese style
00196551 2014RC03 Omelets
00162433 2014RC04 Mother’s chicken salad
00201826 2014RC05 Batter-less Fried croquette
00200883 2014RC06 Beef and mushrooms - Korean style
00176550 2014RC07 Saute of Shiitake and Shimeji Mushrooms
00202059 2014RC08 Potato salad with fresh potatoes
00171343 2014RC09 Celery leaves soup
00148537 2014RC10 Cooked Tomato with Chicken and Soy beans
00185809 2014RC11 Fried broccoli with chicken
00196431 2014RC12 Spicy cooked beans with chicken
00157755 2014RC13 Black sesame-crusted fried chicken
00192913 2014RC14 Zestily flavored fried eggplants
00195151 2014RC15 Meat miso wrap
00187900 2014RC16 Simmered Chinese cabbage
00155229 2014RC17 Chinese style open tofu omelet
00193642 2014RC18 Aglio e olio peperoncino
00182653 2014RC19 Radish cake
00168029 2014RC20 Noshidori
* a certain complexity
* common ingredients

KUSK Object Dataset (expansion from CEA2014)
• Provides object recognition results for the KUSK Dataset videos
– A baseline for CV research
– Real image processing results as an input for NLP
• Resources: grabbed/released objects
– Object class name, timestamp, region (rectangle)
– Informative for predicting the forthcoming cooking process (*)
• Statistics
– 4391 unique images
– 133 categories in total (each recipe has a different category set)
(*) A. Hashimoto et al., “Intention-Sensing Recipe Guidance via User Accessing to Objects,”
International Journal of Human-Computer Interaction, 2016.
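The resources listed above (object class name, timestamp, rectangular region, grabbed/released) could be held in a record like the one below. This is only a minimal sketch: the field names, the time unit, and the (x, y, width, height) rectangle layout are assumptions for illustration, not the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """One grab/release event. All field names are hypothetical."""
    object_class: str                  # object class name
    action: str                        # "grabbed" or "released"
    timestamp: float                   # seconds from video start (assumed unit)
    region: Tuple[int, int, int, int]  # rectangle as (x, y, width, height), assumed layout


def accessed_before(events: List[AccessEvent], t: float) -> List[str]:
    """Return class names of objects grabbed up to time t, in temporal order."""
    grabbed = [e for e in events if e.action == "grabbed" and e.timestamp <= t]
    return [e.object_class for e in sorted(grabbed, key=lambda e: e.timestamp)]


# Made-up events, not taken from the dataset.
events = [
    AccessEvent("knife", "grabbed", 12.5, (320, 180, 60, 40)),
    AccessEvent("cabbage", "grabbed", 10.0, (100, 220, 120, 90)),
    AccessEvent("knife", "released", 48.0, (330, 185, 60, 40)),
]
print(accessed_before(events, 30.0))
```

A time-ordered list of grabbed objects like this is the kind of access history that could feed a predictor of the forthcoming cooking process.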

Obtained images (a selection)
[Figure: sample images by category]
• Ingredients: cauliflowers, garlic, tofu, enokidake mushrooms, cabbages, pasta
• Utensils: chopsticks, bowls, colander, chopping board, knife
• Seasonings: soup stock powder, ketchup, pepper
• Backgrounds: stem of food, dish detergent, sponge, corner trash bag

Semi-automated Annotation (1/2)
• 3 manual tasks for annotation
1. Correct errors in the object regions extracted by a
method from our previous research (Hashimoto, 2012)
2. List the object names appearing in each recipe
3. Add names (from 2.) to each region (from 1.)
Treatment of orthographic variants in the 2nd task:
> Cooking Ontology (Nanba, CEA2014)
– We manually treated items that are not listed in the ontology.
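The variant handling in the 2nd task can be pictured as a lookup from a surface form to a canonical object name, with unlisted items left for manual treatment. The sketch below assumes a small hand-written table; its entries and the `normalize` helper are hypothetical and are not taken from the Cooking Ontology.

```python
from typing import Dict, Optional

# Hypothetical variant table: surface form -> canonical object name.
# A real table would be derived from the Cooking Ontology; these entries are made up.
VARIANTS: Dict[str, str] = {
    "chopsticks": "chopsticks",
    "chop sticks": "chopsticks",
    "cutting board": "chopping board",
    "chopping board": "chopping board",
}


def normalize(name: str, table: Dict[str, str] = VARIANTS) -> Optional[str]:
    """Map a surface form to its canonical name, or None if it is not listed."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return table.get(key)


print(normalize("Chop  Sticks"))  # listed variant
print(normalize("colander"))      # not listed: would need manual treatment
```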

Semi-automated Annotation (2/2)
• Workers: students not majoring in informatics
– Number of workers: more than 20 students
– Term: two months at maximum for each worker
– Selection: has cooked more than once a week over the last
half year
• Interface: GUI running on Google Chrome
– Most workers are used to operating the browser.
– Double-check (reject if two annotators answered
differently)
• A rejected annotation is meta-reviewed by another worker.
– Checked and advised by the authors if necessary.
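The double-check rule above can be sketched as follows: a label is accepted only when two annotators agree, and every other region is queued for meta-review. The region IDs and the `double_check` function are hypothetical; the actual annotation tool is not described in these slides.

```python
from typing import Dict, List, Tuple


def double_check(
    first: Dict[str, str], second: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Compare two annotators' labels per region.

    Returns (accepted labels, region IDs rejected and sent to meta-review).
    A region labeled by only one annotator is also rejected.
    """
    accepted: Dict[str, str] = {}
    rejected: List[str] = []
    for region_id in sorted(first.keys() | second.keys()):
        a, b = first.get(region_id), second.get(region_id)
        if a is not None and a == b:
            accepted[region_id] = a
        else:
            rejected.append(region_id)
    return accepted, rejected


# Made-up labels for three regions.
worker_a = {"r001": "knife", "r002": "bowl", "r003": "tofu"}
worker_b = {"r001": "knife", "r002": "colander", "r003": "tofu"}
accepted, rejected = double_check(worker_a, worker_b)
print(accepted, rejected)
```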

Object Feature and Recognition Result
• Feature: output of the last layer of ResNet (*)
• ResNet: the best CNN model in 2015 competitions
• Not fine-tuned
(ResNet training does not run in public CNN libraries)
• Classifier: linear SVM (trained for each recipe)
• Assumption: the recipe is known, and therefore so are the objects.
(*) Kaiming He et al., “Deep Residual Learning for Image Recognition,”
arXiv preprint arXiv:1512.03385, 2015.
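To illustrate the "recipe is known" assumption, the sketch below keeps one linear model per recipe and scores only that recipe's object classes. It shows prediction only: training is omitted, the weights and the 2-dimensional features are made up for readability (real ResNet features are far higher-dimensional), and the one-score-per-class layout is an assumed way to use a linear SVM for multiple classes, not a detail confirmed by the slides.

```python
from typing import Dict, List, Tuple

# One linear scorer per class: class name -> (weights, bias). Hypothetical layout.
LinearModel = Dict[str, Tuple[List[float], float]]


def predict(models: Dict[str, LinearModel], recipe_id: str, feature: List[float]) -> str:
    """Return the best-scoring class among the classes of the known recipe."""
    model = models[recipe_id]  # only this recipe's categories are candidates

    def score(item: Tuple[str, Tuple[List[float], float]]) -> float:
        weights, bias = item[1]
        return sum(w * x for w, x in zip(weights, feature)) + bias

    return max(model.items(), key=score)[0]


# Made-up weights for a single recipe with two object classes.
models = {
    "2014RC03": {
        "egg": ([1.0, 0.0], 0.0),
        "bowl": ([0.0, 1.0], -0.5),
    }
}
print(predict(models, "2014RC03", [0.2, 0.9]))
print(predict(models, "2014RC03", [0.9, 0.2]))
```

Keying the models by recipe ID keeps each classifier's label set small, which is what the per-recipe training on this slide implies.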

Object Recognition Accuracy
[Figure: object recognition accuracy (0% to 100%) for each recipe, 2014RC01 to 2014RC20, and in total.]

Evaluation by CMC curve
[Figure: CMC curves of accuracy (Acc.) against rank (1 to 14) for All Cat., Ingredients, Seasonings, and Utensils.]
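For readers unfamiliar with the cumulative match characteristic (CMC) curve used on this slide, the value at rank k is the fraction of samples whose true class appears among the k highest-scoring classes. The sketch below is a generic implementation with made-up scores; it is not the authors' evaluation code and assumes every true label is among the scored classes.

```python
from typing import Dict, List, Tuple


def cmc_curve(samples: List[Tuple[str, Dict[str, float]]], max_rank: int) -> List[float]:
    """Return [CMC(1), ..., CMC(max_rank)] for (true_label, class scores) samples."""
    hits = [0] * max_rank
    for true_label, scores in samples:
        ranking = sorted(scores, key=scores.get, reverse=True)  # best class first
        rank = ranking.index(true_label) + 1                    # 1-based rank of the truth
        for k in range(rank, max_rank + 1):                     # counts for every k >= rank
            hits[k - 1] += 1
    return [h / len(samples) for h in hits]


# Made-up scores for four samples over three classes.
samples = [
    ("egg", {"egg": 0.9, "bowl": 0.1, "knife": 0.0}),    # true class at rank 1
    ("bowl", {"egg": 0.6, "bowl": 0.3, "knife": 0.1}),   # rank 2
    ("knife", {"egg": 0.5, "bowl": 0.4, "knife": 0.1}),  # rank 3
    ("egg", {"egg": 0.2, "bowl": 0.7, "knife": 0.1}),    # rank 2
]
print(cmc_curve(samples, 3))
```

The value at rank 1 equals ordinary top-1 accuracy, so the curve extends the per-recipe accuracy figure to "correct within the top k candidates".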

Discussion
• Difficulty in food recognition
– Variations: wrapped or unwrapped, cut or uncut, and others (eggs change
appearance drastically)
• Utensils and seasonings are relatively easy to
recognize:
– Every kitchen has limited variations.
(an environment-adaptive system is promising)
• Possibility of an R-CNN approach
– To deal with failures in object region extraction.

Conclusion
• KUSK Dataset x Flow Graph Corpus
– We hope it will serve as a base dataset for CV x NLP research.
– Problem: the texts (and dishes) are Japanese.
• A dataset from Yummly is available for English speakers.
• KUSK Object Dataset ⊂ KUSK Dataset
– History of user access to objects during cooking
• Contains important information for predicting the forthcoming process.
• Organized by object name, put/taken label, timestamp, and rectangle.
• Features from ResNet and recognition results by a linear SVM

Future works
Mail: a_hasimoto@mm.media.kyoto-u.ac.jp
Twitter: @a_hasimoto or Facebook, Researchgate…
Original KUSK Dataset and an older version of the KUSK Object Dataset:
http://kusk.mm.media.kyoto-u.ac.jp/
• Collaborative research with NLP team in Kyoto Univ.
– CV2NLP: Vision-assisted NLP, Recipe Text Generation
– NLP2CV: Scenario-guided CV + PR
To obtain the KUSK Object Dataset, please do not
hesitate to contact us.
