Kusk Object Dataset: Recording Access to Objects in Food Preparation


Presentation at CEA2016

  1. Kusk Object Dataset: Recording Access to Objects in Food Preparation
     Atsushi Hashimoto, Masaaki Iiyama, Shinsuke Mori, Michihiko Minoh (Kyoto University)
  2. Computer Vision (CV) meets Natural Language Processing (NLP)
     • CV-NLP collaboration is an active field.
       – Supported by mature machine learning technology.
       – Cooking media can be a good practice field!
         • Long text (recipe) and organized activity (cooking)
     [Diagram labels: Real World; Vision (CV): video observation/instruction, recognition; Machine-Readable Description (BN/DNN); Language (NLP): recognition/parsing, retrieval, text generation; Human-Friendly Description]
  3. Grand goal: Comparing Recipe and Human Actions
     • From a viewpoint of computer engineering…
       – Recipe: a kind of script language
       – Human actions: an execution of the script by a human
     • Potential applications
       – Automatic cooking, online recipe navigation
       – Cooking records for healthcare, recipe generation
  4. Pascal Sentence Dataset: captions and images
     • Images are obtained from the Pascal dataset; captions are annotated via Amazon Mechanical Turk.
     • Example captions:
       – One jet lands at an airport while another takes off next to it.
       – Two airplanes parked in an airport.
       – Two jets taxi past each other.
       – Two parked jet airplanes facing opposite directions.
       – two passenger planes on a grassy plain
     • Source: Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. “Collecting Image Annotations Using Amazon's Mechanical Turk”. In Proc. of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
  5. CV/NLP Datasets in CEA fields
     • NLP
       – Cooking Ontology (CEA2014, Japanese)
       – Cookpad/Rakuten Recipe (2015, Japanese)
     • CV
       – TUM Kitchen Data Set (2009)
       – CMU Multi-Modal Activity Database (2009)
       – Actions for Cooking Eggs Dataset (2012)
       – MPII Cooking Activities Dataset (2012)
       – 50 Salads dataset (2013)
       – The Breakfast Actions Dataset (2014)
     • CV x NLP
       – Yummly API
       – Flow Graph Corpus (2014) × KUSK Dataset (CEA2014)
  6. KUSK Dataset x Flow Graph Corpus
     • KUSK Dataset (Hashimoto, CEA2014) and Flow Graph Corpus (Mori, 2014)
     • Recording equipment (figure labels): water flow sensors, eye tracker, touch display, electric consumption sensors, load sensing tables
     • 20 recipes, which are shared with the Flow Graph Corpus
     • 60 observations by 33 subjects
  7. The list of 20 recipes
     CookPad ID  KUSK ID   Title of recipe (original titles are in Japanese)
     00121196    2014RC01  Chicken and Chinese cabbage starchy soup
     00180223    2014RC02  Tomato soup - Japanese style
     00196551    2014RC03  Omelets
     00162433    2014RC04  Mother’s chicken salad
     00201826    2014RC05  Batter-less Fried croquette
     00200883    2014RC06  Beef and mushrooms - Korean style
     00176550    2014RC07  Saute of Shiitake and Shimeji Mushrooms
     00202059    2014RC08  Potato salad with fresh potatoes
     00171343    2014RC09  Celery leaves soup
     00148537    2014RC10  Cooked Tomato with Chicken and Soy beans
     00185809    2014RC11  Fried broccoli with chicken
     00196431    2014RC12  Spicy cooked beans with chicken
     00157755    2014RC13  Black sesame-crusted fried chicken
     00192913    2014RC14  Zestily flavored fried eggplants
     00195151    2014RC15  Meat miso wrap
     00187900    2014RC16  Simmered Chinese cabbage
     00155229    2014RC17  Chinese style open tofu omelet
     00193642    2014RC18  Aglio e olio peperoncino
     00182653    2014RC19  Radish cake
     00168029    2014RC20  Noshidori
     * a certain complexity
     * common ingredients
  8. KUSK Object Dataset (expansion from CEA2014)
     • Provides object recognition results for the KUSK Dataset videos
       – A baseline for CV research
       – Real image processing results as an input for NLP
     • Resources: grabbed/released objects
       – Object class name, timestamp, region (rectangle)
       – Informative for predicting the forthcoming cooking process (*)
     • Statistics
       – 4391 unique images
       – 133 categories in total (each recipe has a different category set)
     (*) A. Hashimoto et al., “Intention-Sensing Recipe Guidance via User Accessing to Objects,” International Journal of Human-Computer Interaction, 2016.
  9. Obtained images (a selection)
     • Ingredients: cauliflowers, garlic, tofu, enokidake mushrooms, cabbages, pasta
     • Utensils: chopsticks, bowls, colander, chopping board, knife
     • Seasonings: soup stock powder, ketchup, pepper
     • Backgrounds: stem of food, dish detergent, sponge, corner trash bag
  10. Semi-automated Annotation (1/2)
      • 3 manual tasks for annotation
        1. Correcting errors in the object regions extracted by a method from our previous research (Hashimoto, 2012)
        2. Listing the object names appearing in each recipe
        3. Adding names (from 2.) to each region (from 1.)
      • Treatment of orthographic variants in the 2nd task: Cooking Ontology (Nanba, CEA2014)
        – We manually treated items that are not listed in the ontology.
  11. Semi-automated Annotation (2/2)
      • Workers: students who do not major in informatics
        – Number of workers: more than 20 students
        – Term: two months at maximum for each worker
        – Selection: cooked more than once a week in the last half year
      • Interface: GUI running on Google Chrome
        – Most workers are used to operating the browser.
        – Double-check (reject if two annotators answered differently)
          • A rejected annotation is meta-reviewed by another worker.
        – Checked and advised by the authors if necessary.
  12. Object Feature and Recognition Result
      • Feature: output of the last layer of ResNet (*)
      • ResNet: the best CNN model in 2015 competitions
      • Not fine-tuned (ResNet training does not run in public CNN libraries)
      • Classifier: linear SVM (trained for each recipe)
      • Assumption: the recipe is known, and thereby the objects too.
      (*) Kaiming He et al., “Deep Residual Learning for Image Recognition,” arXiv preprint arXiv:1512.03385, 2015.
  13. Object Recognition Accuracy
      [Bar chart: recognition accuracy (0% to 100%) for each recipe, 2014RC01 to 2014RC20, and in total]
  14. Evaluation by CMC curve
      [Chart: accuracy (0 to 1) versus rank (1 to 14), with curves for all categories, ingredients, seasonings, and utensils]
  15. Discussion
      • Difficulty in food recognition
        – Variations: wrapped? cut? and others (eggs change appearance extremely)
      • Utensils and seasonings are relatively easy to recognize:
        – Every kitchen has limited variations (an environment-adaptive system is promising).
      • Possibility of an R-CNN approach
        – To deal with failures in object region extraction.
  16. Conclusion
      • KUSK Dataset x Flow Graph Corpus
        – We hope it will become a base dataset for CV x NLP research.
        – Problem: the texts (and dishes) are Japanese.
          • A dataset from Yummly is available for English speakers.
      • KUSK Object Dataset ⊂ KUSK Dataset
        – History of user access to objects in cooking
          • Contains important information for predicting the forthcoming process.
          • Organized by object name, put/taken label, timestamp, and rectangle.
          • Features from ResNet and recognition results by linear SVM
  17. Future works
      • Collaborative research with the NLP team at Kyoto Univ.
        – CV2NLP: vision-assisted NLP, recipe text generation
        – NLP2CV: scenario-guided CV + PR
      • To get the KUSK Object Dataset, please do not hesitate to contact us.
        – Mail:
        – Twitter: @a_hasimoto, or Facebook, Researchgate…
      • Original KUSK Dataset and old version of KUSK Object Dataset.
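
Slides 8 and 16 describe each entry of the KUSK Object Dataset as an object name, a put/taken label, a timestamp, and a rectangle. The following is a minimal sketch of how such a record might be held in memory. The class name, field names, units (seconds, pixels), and the helper `objects_in_hand` are assumptions for illustration and are not the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """One grabbed/released-object event (hypothetical layout, not the dataset's format)."""
    recipe_id: str                    # KUSK recipe ID, e.g. "2014RC03"
    object_name: str                  # object class name
    action: str                       # assumed labels: "taken" or "put"
    timestamp: float                  # assumed unit: seconds from the start of the video
    rect: Tuple[int, int, int, int]   # assumed layout: (x, y, width, height) in pixels


def objects_in_hand(events: List[AccessEvent], t: float) -> List[str]:
    """Return the objects taken but not yet put back at time t, in order of taking."""
    held: List[str] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.timestamp > t:
            break
        if ev.action == "taken":
            held.append(ev.object_name)
        elif ev.action == "put" and ev.object_name in held:
            held.remove(ev.object_name)
    return held
```

A history in this form is the kind of input that could be used to predict the forthcoming cooking process, since the set of objects currently in use narrows down which recipe step is being executed.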
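
The double-check rule on slide 11 (reject when two annotators answer differently, then pass the rejected annotation to another worker) can be sketched as below. The function name is hypothetical, and letting the meta-reviewer's answer decide the final label is an assumption; the slides do not state how the meta-review is resolved.

```python
from typing import Optional


def resolve_label(first: str, second: str, meta_review: Optional[str] = None) -> Optional[str]:
    """Accept a label when two annotators agree; otherwise defer to the meta-reviewer.

    Returns None while a rejected annotation is still waiting for meta-review.
    Assumption: the meta-reviewer's answer is taken as the final label.
    """
    if first == second:
        return first
    return meta_review
```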
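
Slide 14 evaluates recognition with a CMC (cumulative match characteristic) curve: the value at rank k is the fraction of samples whose correct class is among the k highest-scoring classes. A minimal sketch of that computation follows. The function name and the input layout (one score dictionary per sample) are assumptions; this is not the authors' evaluation code.

```python
from typing import Dict, List, Sequence


def cmc_curve(scores: Sequence[Dict[str, float]],
              truths: Sequence[str],
              max_rank: int) -> List[float]:
    """Return rank-1 .. rank-max_rank accuracy.

    scores[i] maps each candidate class name to the classifier score for sample i
    (higher is better); truths[i] is the correct class name for sample i.
    """
    if not truths or len(scores) != len(truths):
        raise ValueError("scores and truths must be non-empty and of equal length")
    hits = [0] * max_rank
    for sample_scores, truth in zip(scores, truths):
        # Class names ordered from highest to lowest score.
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if truth not in ranked:
            continue  # the correct class was never a candidate: a miss at every rank
        position = ranked.index(truth)  # 0-based rank of the correct class
        for k in range(position, max_rank):
            hits[k] += 1
    return [h / len(truths) for h in hits]
```

Because the classifier is trained per recipe, the candidate classes in each score dictionary would be limited to that recipe's category set, so the curve reaches 1.0 once k equals the number of categories in the largest set.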