KUSK Object Dataset: Recording Access to Objects in Food Preparation
Atsushi Hashimoto, Masaaki Iiyama, Shinsuke Mori,
Computer Vision (CV) meets
Natural Language Processing (NLP)
• CV-NLP collaboration is an active field.
– Supported by mature machine learning technology.
– Cooking media can be a good practice field!
• Long text (Recipe) and organized activity (Cooking)
[Figure: Vision, Language, Real World]
Grand goal: Comparing Recipe and Human
• From a viewpoint of computer engineering…
– Recipe: A kind of script language
– Human Actions: An execution of the script by
• Potential Applications
– Automatic Cooking, Online recipe navigation
– Cooking Record for Healthcare, Recipe Generation
Pascal Sentence Dataset
Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. “Collecting
Image Annotations Using Amazon's Mechanical Turk”.
In Proc. of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with
Amazon's Mechanical Turk.
• One jet lands at an airport while another
takes off next to it.
• Two airplanes parked in an airport.
• Two jets taxi past each other.
• Two parked jet airplanes facing opposite
• two passenger planes on a grassy plain
Pascal Sentence Dataset: captions and images
– Images are obtained from the Pascal Dataset
– Captions are annotated via Amazon Mechanical Turk
CV/NLP Datasets in CEA fields
• NLP
– Cooking Ontology (CEA2014, Japanese)
– Cookpad/Rakuten Recipe (2015, Japanese)
• CV
– TUM Kitchen Data Set (2009)
– CMU Multi-Modal Activity Database (2009)
– Actions for Cooking Eggs Dataset (2012)
– MPII Cooking Activities Dataset (2012)
– 50 Salads dataset (2013)
– The Breakfast Actions Dataset (2014)
• CV x NLP
– Yummly API
– Flow Graph Corpus (2014) × KUSK Dataset (CEA2014)
KUSK Dataset x Flow Graph Corpus
KUSK Dataset (Hashimoto, CEA2014), Flow Graph Corpus (Mori, 2014)
Water Flow Sensors
Electric Consumption Sensors
Load Sensing Tables
20 recipes, shared with the Flow Graph Corpus
60 observations by 33 subjects.
The list of 20 recipes
Cookpad ID  KUSK ID   Title of Recipe (original titles are in Japanese)
00121196    2014RC01  Chicken and Chinese cabbage starchy soup
00180223    2014RC02  Tomato soup - Japanese style
00196551    2014RC03  Omelets
00162433    2014RC04  Mother’s chicken salad
00201826    2014RC05  Batter-less Fried croquette
00200883    2014RC06  Beef and mushrooms - Korean style
00176550    2014RC07  Saute of Shiitake and Shimeji Mushrooms
00202059    2014RC08  Potato salad with fresh potatoes
00171343    2014RC09  Celery leaves soup
00148537    2014RC10  Cooked Tomato with Chicken and Soy beans
00185809    2014RC11  Fried broccoli with chicken
00196431    2014RC12  Spicy cooked beans with chicken
00157755    2014RC13  Black sesame-crusted fried chicken
00192913    2014RC14  Zestily flavored fried eggplants
00195151    2014RC15  Meat miso wrap
00187900    2014RC16  Simmered Chinese cabbage
00155229    2014RC17  Chinese style open tofu omelet
00193642    2014RC18  Aglio e olio peperoncino
00182653    2014RC19  Radish cake
00168029    2014RC20  Noshidori
* a certain complexity
* common ingredients
KUSK Object Dataset (expansion from CEA2014)
• Provides object recognition results for the KUSK Dataset
– A baseline for CV research
– Real image processing results as an input for NLP
• Resources: grabbed/released objects
– object class name, timestamp, region (rectangle)
– Informative for predicting the forthcoming cooking process (*)
– 4391 unique images
– 133 categories in total (each recipe has a different category set)
*) A. Hashimoto et al., “Intention-Sensing Recipe Guidance via User Accessing to Objects,”
International Journal of Human-Computer Interaction, 2016
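As a minimal sketch of what one record in such a resource could look like, the snippet below models a grabbed/released event with the three listed attributes. The class name, field names, time unit, and rectangle layout are assumptions made for illustration; the slides do not specify the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """One grabbed/released observation (field names are illustrative)."""
    object_class: str                  # e.g. "knife" (hypothetical category name)
    event: str                         # "grabbed" or "released"
    timestamp: float                   # seconds from the start of the observation (assumed unit)
    region: Tuple[int, int, int, int]  # rectangle as (x, y, width, height) in pixels (assumed layout)


def events_for(events: List[AccessEvent], object_class: str) -> List[AccessEvent]:
    """Return the events of one object class in chronological order."""
    return sorted(
        (e for e in events if e.object_class == object_class),
        key=lambda e: e.timestamp,
    )


# Made-up example log; not taken from the dataset.
log = [
    AccessEvent("knife", "released", 42.0, (120, 80, 60, 20)),
    AccessEvent("bowl", "grabbed", 10.5, (300, 200, 90, 90)),
    AccessEvent("knife", "grabbed", 12.0, (100, 60, 60, 20)),
]
knife_events = events_for(log, "knife")
print([(e.event, e.timestamp) for e in knife_events])
```

Keeping each event as an immutable record makes it straightforward to sort, filter, and hand the sequence to a downstream NLP component.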
Obtained images (a selection)
Cauliflowers Garlics Tofu
Enoki dake mushrooms Cabbages Pasta
Chop Sticks Bowls Colander Chop. Board Knife
soup stock powder ketchup Pepper Stem of food Dish detergent Sponge Corner Trash Bag
Semi-automated Annotation (1/2)
• 3 manual tasks for annotation
1. Correct errors in the object regions extracted by a
method from our previous research (Hashimoto, 2012)
2. List the object names appearing in each recipe
3. Assign names (from 2.) to each region (from 1.)
Orthographic variants are handled in the 2nd task.
→ Cooking Ontology (Nanba, CEA2014)
– We manually handled items that are not listed in the ontology.
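The handling of name variants in task 2 can be pictured as a dictionary lookup with a manual fallback. The sketch below is only an illustration: the `ONTOLOGY` entries are made up in English, whereas the real Cooking Ontology is in Japanese, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical excerpt: surface form -> canonical name (made-up entries).
ONTOLOGY: Dict[str, str] = {
    "chop sticks": "chopsticks",
    "chopsticks": "chopsticks",
    "chop. board": "cutting board",
    "cutting board": "cutting board",
}


def normalize(name: str, ontology: Dict[str, str]) -> Optional[str]:
    """Map a written variant to its canonical name, or None if it is unlisted."""
    key = " ".join(name.lower().split())  # fold case and collapse whitespace
    return ontology.get(key)


def normalize_all(
    names: Iterable[str], ontology: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split names into canonical names and items left for manual treatment."""
    canonical: List[str] = []
    manual: List[str] = []
    for name in names:
        hit = normalize(name, ontology)
        if hit is None:
            manual.append(name)      # not in the ontology: handle by hand
        elif hit not in canonical:
            canonical.append(hit)    # variants collapse to a single entry
    return canonical, manual


canonical, manual = normalize_all(["Chop Sticks", "chopsticks", "Colander"], ONTOLOGY)
print(canonical)  # ['chopsticks']
print(manual)     # ['Colander']
```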
Semi-automated Annotation (2/2)
• Workers: students who do not major in informatics
– # of workers: more than 20 students
– term: at most two months for each worker
– selection: cooking more than once a week in the last
• Interface: GUI working on Google Chrome
– Most workers are used to operating the browser.
– double-check (reject if two annotators answered
• A rejected annotation is meta-reviewed by another worker.
– Authors check and advise if necessary.
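The double-check and meta-review flow can be sketched as below. The rejection condition is cut off in the slide text, so this sketch assumes that an annotation is rejected when the two annotators give different labels; that assumption, the function names, and the example labels are all illustrative.

```python
from typing import Optional, Tuple


def double_check(first: str, second: str) -> Tuple[Optional[str], bool]:
    """Return (accepted_label, needs_meta_review).

    Assumption: the pair is rejected when the two annotators disagree.
    """
    if first == second:
        return first, False
    return None, True


def resolve(first: str, second: str, meta_label: Optional[str] = None) -> Optional[str]:
    """Accept agreeing labels; otherwise fall back to the meta-reviewer's label."""
    accepted, needs_review = double_check(first, second)
    if not needs_review:
        return accepted
    return meta_label  # None means the region is still waiting for a meta-review


print(resolve("cabbage", "cabbage"))                      # cabbage
print(resolve("cabbage", "lettuce"))                      # None
print(resolve("cabbage", "lettuce", meta_label="cabbage"))  # cabbage
```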
Object Feature and Recognition Result
• Feature: Output of the last layer of ResNet (*)
• ResNet: the best CNN model in 2015 competitions
• Not fine-tuned
(ResNet training does not run in public CNN libraries)
• Classifier: Linear SVM (trained for each recipe)
• Assumption: the recipe is known, and thereby its objects too.
*) Kaiming He et al., “Deep Residual Learning for Image Recognition”
arXiv preprint arXiv:1512.03385, 2015
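The "one classifier per recipe" setup can be illustrated with a small dependency-free sketch. Note the swap: to stay within the standard library, a nearest-centroid classifier stands in for the linear SVM, and toy 2-D vectors stand in for ResNet features. The category names and numbers are made up; only the per-recipe grouping reflects the slide.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Feature = Sequence[float]
Sample = Tuple[str, Feature, str]  # (recipe ID, feature vector, category label)
Models = Dict[str, Dict[str, List[float]]]


def train_per_recipe(samples: List[Sample]) -> Models:
    """Return, for each recipe, one mean feature vector (centroid) per category."""
    sums: Dict[str, Dict[str, List[float]]] = defaultdict(dict)
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for recipe_id, feat, label in samples:
        acc = sums[recipe_id].setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[recipe_id][label] += 1
    return {
        r: {lab: [v / counts[r][lab] for v in acc] for lab, acc in cats.items()}
        for r, cats in sums.items()
    }


def predict(models: Models, recipe_id: str, feat: Feature) -> str:
    """Pick the closest centroid among the categories of the known recipe only."""
    centroids = models[recipe_id]

    def sq_dist(center: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feat, center))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


samples = [
    ("2014RC01", [1.0, 0.0], "knife"),
    ("2014RC01", [0.9, 0.1], "knife"),
    ("2014RC01", [0.0, 1.0], "bowl"),
    ("2014RC03", [1.0, 0.0], "egg"),
    ("2014RC03", [0.0, 1.0], "bowl"),
]
models = train_per_recipe(samples)
print(predict(models, "2014RC01", [0.8, 0.2]))  # knife
print(predict(models, "2014RC03", [0.8, 0.2]))  # egg
```

The same feature vector receives different labels under different recipes, because each model only knows the category set of its own recipe.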
• Difficulty in food recognition
– Variations: wrapped? cut? and others (eggs change
• Relatively easy to recognize utensils and
– Every kitchen has limited variations.
(an environment-adaptive system is promising)
• Possibility of an R-CNN approach
– To deal with failures in object region extraction.
• KUSK Dataset x Flow Graph Corpus
– We hope it becomes a base dataset for CV x NLP research
– problem: the texts (and dishes) are Japanese.
• A dataset from Yummly is available for English speakers.
• KUSK Object Dataset ⊂ KUSK Dataset
– History of the user's access to objects during cooking
• Contains important information for predicting the forthcoming process.
• Organized by object name, put/taken label, timestamp, and rectangle.
• Features from ResNet and Recognition Results by Linear SVM
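One way such a put/taken history could feed a prediction step is by replaying it to find which objects the user currently holds. The sketch below assumes a simple tuple layout and omits the rectangle for brevity; the layout, the function name, and the example events are illustrative and not the dataset's actual format.

```python
from typing import Iterable, Set, Tuple

Event = Tuple[float, str, str]  # (timestamp, object name, "taken" or "put")


def objects_in_use(events: Iterable[Event], at_time: float) -> Set[str]:
    """Replay the history and return the objects taken but not yet put back."""
    in_use: Set[str] = set()
    for timestamp, name, label in sorted(events):
        if timestamp > at_time:
            break
        if label == "taken":
            in_use.add(name)
        elif label == "put":
            in_use.discard(name)
    return in_use


# Made-up history; not taken from the dataset.
history = [
    (5.0, "knife", "taken"),
    (6.5, "cabbage", "taken"),
    (20.0, "cabbage", "put"),
    (31.0, "knife", "put"),
]
print(sorted(objects_in_use(history, at_time=10.0)))  # ['cabbage', 'knife']
print(sorted(objects_in_use(history, at_time=25.0)))  # ['knife']
```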
Twitter: @a_hasimoto or Facebook, Researchgate…
Original KUSK Dataset and old version of KUSK Object Dataset.
• Collaborative research with NLP team in Kyoto Univ.
– CV2NLP: Vision-assisted NLP, Recipe Text Generation
– NLP2CV: Scenario-guided CV + PR
To obtain the KUSK Object Dataset, please do not
hesitate to contact us.