I teamed up with three of my classmates to build a recipe recommendation engine that takes ingredients & cuisine preferences as input & returns the recipes best suited to you. This was the final project for our Data Science in the Wild class at Cornell Tech in Spring 2020. Shoutout to my team, Infinite Players: Prashant, Saloni & Dale!
2. HOW DID WE ARRIVE HERE?
01 FIND CALLING
We foodies found that no one has collated multiple datasets & made a good recipe recommendation engine.
02 FIND DATA
We found more than 20 usable data repositories & analyzed them.
03 FIND RECIPE
After cleaning, we tried various models to get the best possible result.
04 COOK & SERVE!
We collated the best results around an intuitive workflow.
Infinite Players
3. BACKGROUND
The coronavirus pandemic has revealed an interesting fact about
young working professionals and university students: many of
them relied on take-out and dining in, and skipped cooking.
Most people do not know what to cook despite the many
options available in grocery stores in this globalized world.
Therefore, we want to answer:
What do I cook, given I have these ingredients available?
Food is the gateway to a new culture, and many cultures
can be explored through what is in your fridge. Our engine enables
this cross-cultural exchange by telling you what is possible!
4. DATA - FROM THE WILD
DATA SOURCES:
RECIPE_INGR_REVIEW (12K)
YUMMLY CLEAN (6K)
FOOD.COM DATA (231K)
EPICURIOUS (20K)
RECIPE_INGR (56K)
Dataset Name       SOURCE   FIELDS TAKEN
FOOD_COM           LINK     Ingredients, Recipe Name
EPICURIOUS         LINK     Ingredients, Recipe Name, Ratings, Description
YUMMLY CLEAN       LINK     Ingredients, Recipe Name, Cuisines
RECIPE_INGR        LINK     Ingredients, Cuisines
RECIPE_INGR_REV    LINK     Ingredients, Recipe Name, Ratings
Total: ~270k
5. DATA - FROM THE WILD
FOR INGREDIENTS
Our ingredients ranged from the ubiquitous wheat flour to the most exotic, such as saffron.
In total, we had more than 100K ingredients in our datasets.
FOR CUISINE
We started with more than 35 unique cuisines and studied the differences and commonalities among them.
Finally, we mapped them to a superset of 7.
FOR USER RATINGS
Certain datasets had user reviews for the recipes.
We utilised these reviews by defining a rating scale from 1 to 5 as the basis for our item-item collaborative filtering model.
FOR RECIPE NAMES
All datasets have recipe names except RECIPE_INGR, which has only cuisine names & ingredients.
Recipe names are our desired output.
6. DATA CLEANING - OVERVIEW
01 Basic
Common text preprocessing techniques
02 No “quantities”
“OZ”, “KG”, “POUNDS”, “TSP”, “LITTLE”, “PINCH”
03 Extract nouns
POS tagging, extract ingredients from recipe instructions
04 Remove rare words
AVG term frequency is 600, remove words occurring < 30 times
05 Iterate!
Continue cleaning as we see results
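The rare-word step can be sketched in a few lines. This is a minimal illustration, not the team's actual code: the function name `remove_rare_tokens` and the toy recipes are assumptions, and only the threshold of 30 occurrences comes from the slide.

```python
from collections import Counter


def remove_rare_tokens(recipes, min_count=30):
    """Drop tokens that occur fewer than `min_count` times across all recipes."""
    # Count every token over the whole corpus, not per recipe.
    counts = Counter(token for recipe in recipes for token in recipe)
    return [[t for t in recipe if counts[t] >= min_count] for recipe in recipes]


# Toy corpus (made up) with a lowered threshold so the effect is visible.
recipes = [
    ["egg", "milk", "flour"],
    ["egg", "milk", "saffron"],
    ["egg", "flour", "butter"],
]
filtered = remove_rare_tokens(recipes, min_count=2)
print(filtered)
```

With the slide's threshold, the call would simply be `remove_rare_tokens(recipes)`.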
7. FEATURE ENGINEERING
WHY?
● Multiple datasets - different data formats
● Cleaning to 100% is hard and doesn’t scale to new data
● Ingredient-related tokens > 2.5 million across 270K recipes
8. CUISINES - IN THE WILD
Which cuisine does the recipe belong to?
Which cuisines should we narrow it down
to?
We tried to narrow down cuisines from this ->
PROBLEM
GROCERY INSPIRATION
9. CUISINES DEMYSTIFIED
Confusion Matrix - Using Neural Network
Upon refining further, we combined many
cuisines to achieve the highest accuracy for
our cuisine classifier while keeping cuisines with
distinctive flavors and favoring those with more recipes.
Final List of Cuisines (7): American, Italian,
European, Asian, Mexican, French, Indian
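Collapsing the original labels into the final seven can be expressed as a lookup table. The sketch below is hypothetical: only the European grouping (Spanish, British, German) is stated in the slides, and every other entry in `CUISINE_MAP`, as well as the function name `to_superset`, is an assumption for illustration.

```python
FINAL_CUISINES = ["American", "Italian", "European", "Asian", "Mexican", "French", "Indian"]

# Hypothetical fine-to-coarse mapping. Only the European entries are
# taken from the slides; the remaining entries are illustrative guesses.
CUISINE_MAP = {
    "spanish": "European",
    "british": "European",
    "german": "European",
    "cajun_creole": "American",  # assumption
    "southern_us": "American",   # assumption
    "chinese": "Asian",          # assumption
    "japanese": "Asian",         # assumption
    "thai": "Asian",             # assumption
}


def to_superset(label):
    """Map a raw cuisine label to one of the 7 final cuisines, or None if unmapped."""
    key = label.strip().lower().replace(" ", "_")
    if key in CUISINE_MAP:
        return CUISINE_MAP[key]
    title = label.strip().title()
    return title if title in FINAL_CUISINES else None


print(to_superset("Spanish"), to_superset("italian"), to_superset("Moroccan"))
```

Returning `None` for unmapped labels leaves the decision of whether to drop or hand-map a cuisine to the caller.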
10. CUISINES - AMERICA!
Confusion Matrix - Using Ensembling methods
Some patterns can be clearly noticed:
● French cuisine is very similar to America’s Cajun & Creole (Louisiana).
● Mexico influenced Texan food.
● Italy has a great influence on Northeastern food, with pizza etc.
● European cuisine (Spanish, British & German) has a great influence too.
● Asian & Indian cuisines have minimal collision.
This closely mirrors the ethnic origins of immigrants in the US.
*Indian & Mexican cuisines also share a lot of flavors.
11. HOW DOES IT WORK?
INPUT: Ingredients feature vectors
Ensemble Techniques
● Logistic Regression
● K Neighbors Classifier
● Decision Tree Classifier
● Random Forest Classifier
Neural Network
● Layer 1: Linear + LeakyReLU
● Layer 2: Linear + LeakyReLU + Dropout
● Layer 3: Linear + LeakyReLU + Dropout
● Layer 4: Linear + Softmax
OUTPUT: Cuisine type for a list of ingredients
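To make the four-layer network concrete, here is a dependency-free sketch of its inference-time forward pass. The layer order follows the slide; everything else is an assumption: the hidden widths (128, 64, 32), the LeakyReLU slope of 0.01, the random untrained weights, and the use of a 200-dimensional input. Dropout only acts during training, so it is omitted here.

```python
import math
import random


def linear(x, weights, bias):
    """y = W x + b for one input vector (W is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def leaky_relu(x, slope=0.01):
    """Pass positives through, scale negatives by `slope` (assumed value)."""
    return [v if v > 0 else slope * v for v in x]


def softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases; stands in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_proba(x, layers):
    """Layers 1-3: Linear + LeakyReLU. Layer 4: Linear + Softmax."""
    *hidden, last = layers
    for weights, bias in hidden:
        x = leaky_relu(linear(x, weights, bias))
    return softmax(linear(x, *last))


rng = random.Random(0)
sizes = [200, 128, 64, 32, 7]  # 7 output cuisines; other sizes are assumptions
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = predict_proba([rng.gauss(0, 1) for _ in range(200)], layers)
print(len(probs), round(sum(probs), 6))
```

In practice a deep-learning framework would handle training and dropout; this version only shows how an ingredient feature vector becomes a probability over the seven cuisines.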
12. HOW TO GET A RECOMMENDATION
INPUT
● Ingredient(s)
● Choice of Cuisine (if any)
ALTERNATIVE INPUT
● Name of Recipe
● Ingredients taken as input from recipe
CONTENT BASED RECOMMENDATION
● Ingredient to features using word2vec model
● Cosine Similarity for calculating distance
COLLAB FILTER
● ITEM - ITEM based filter
● KNN with Means for recipe ratings
● Cosine Similarity for calculating distance
OUTPUT
ONE list of recipes according to user preferences & another list of recipes closest to the ingredients mentioned.
13. MODEL - COLLABORATIVE FILTERING
INPUT
We build a recommender system
in which the user inputs the
ingredients they have on hand.
Based on these inputs we will
generate a short list of recipes
that fit the user’s preferences.
MODEL
KNN with Means has been
chosen for the recommender,
which is a basic collaborative
filtering algorithm, taking into
account the mean ratings of
each user.
Similarity is computed with cosine similarity.
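For reference, this is a sketch of the item-item prediction rule as the Surprise library documents it for KNN with Means, assuming cosine similarity over the users who rated both recipes. The symbols are notation introduced here, not taken from the slides.

```latex
\hat{r}_{ui} = \mu_i + \frac{\sum_{j \in N_u^k(i)} \operatorname{sim}(i, j)\,(r_{uj} - \mu_j)}{\sum_{j \in N_u^k(i)} \operatorname{sim}(i, j)},
\qquad
\operatorname{sim}(i, j) = \frac{\sum_{u \in U_{ij}} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U_{ij}} r_{ui}^2}\,\sqrt{\sum_{u \in U_{ij}} r_{uj}^2}}
```

Here $\mu_i$ is the mean rating of recipe $i$, $N_u^k(i)$ is the set of at most $k$ recipes rated by user $u$ that are most similar to $i$, and $U_{ij}$ is the set of users who rated both recipes. The user-based variant swaps the roles of users and recipes, so the means become per-user means.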
DATA CLEANING
We use only one rating per user.
Further, we define a rating scale
for the recipes. It is determined
by the lowest and highest ratings
given by the users.
EVALUATE
We use the Surprise library to test our
recommender system. Using cross-validation, we
evaluate the model with metrics
such as MSE and RMSE.
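The two metrics and the cross-validation loop can be illustrated without any library. This is a sketch only: the global-mean baseline stands in for the real recommender, the fold assignment is a simple stride, and the ratings are made up.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two equal-length rating lists."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))


def k_fold_rmse(ratings, k=3):
    """Per-fold RMSE of a global-mean baseline (a stand-in predictor)."""
    scores = []
    for fold in range(k):
        test = ratings[fold::k]  # every k-th rating goes to this fold
        train = [r for i, r in enumerate(ratings) if i % k != fold]
        baseline = sum(train) / len(train)
        scores.append(rmse([baseline] * len(test), test))
    return scores


scores = k_fold_rmse([1, 2, 3, 4, 5, 4], k=3)
print([round(s, 3) for s in scores])
```

RMSE is in the same units as the ratings, which makes it easier to read against a 1-5 scale than MSE.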
OUTPUT
Finally get a
recommendation
based on an input
string of ingredients
14. COLLABORATIVE FILTERING - RESULTS
Input: User_ID, Ingredients
User_id: 2043209
Ingredients: ‘chicken,egg,milk’
RECIPE                                    INGREDIENTS                                          INSTRUCTIONS
Chicken Lasagna with White Sauce Recipe   mozzarella,mushroom,milk,spinach,egg,ricotta,n…      Preheat oven to 350 degrees F (175 degrees C)....
Swedish Meatballs                         egg,milk,ground beef,cereal,onion,chicken,mush.      Preheat oven to 350 degrees F (175 degrees C)....
Mushroom Chicken Piccata Recipe           flour,salt,paprika,egg,milk,chicken,butter,mus…      In a shallow dish or bowl, mix together flour,...
User_id: 700
Ingredients: ‘Cheese,onion’
RECIPE                                    INGREDIENTS                                          INSTRUCTIONS
Tuna Noodle Casserole II Recipe           noodle,mushroom,milk,tuna,cheese,onion,potato,...    In a large pot with boiling salted water cook ...
Hamburger Cheese Bake Recipe              pasta,ground beef,onion,tomato sauce,white sauce..   In a large pot cook with boiling salted water..
15. MODEL - CONTENT BASED RECOMMENDER SYSTEM
RAW INPUT
Ingredient list for every recipe. All ingredients are kept through the preprocessing pipeline.
MODEL: WORD2VEC
Ingredients to features. 200 dimensions (reducing them with PCA made a negligible difference in cuisine results, so PCA is unused), context window of 12 (based on experiments), downsampling threshold of 1e-3.
RECOMMENDATION
Take the input ingredients and apply w2v to them. Use cosine similarity to compare distance with recipes in the dataset.
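The recommendation step can be sketched as follows. Two things here are assumptions rather than details from the slides: a recipe (or query) is represented by the average of its ingredient vectors, and the tiny 3-dimensional `vectors` table stands in for the trained 200-dimensional word2vec embeddings. The recipe names and function names are made up.

```python
import math


def mean_vector(ingredients, vectors):
    """Average the vectors of known ingredients; None if none are known."""
    known = [vectors[i] for i in ingredients if i in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, recipes, vectors, top_n=3):
    """Rank recipes by cosine similarity to the query ingredients."""
    q = mean_vector(query, vectors)
    if q is None:
        return []
    scored = []
    for name, ingredients in recipes.items():
        r = mean_vector(ingredients, vectors)
        if r is not None:
            scored.append((cosine(q, r), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Toy embeddings (made up); real ones would come from the word2vec model.
vectors = {
    "chicken": [1.0, 0.0, 0.0], "egg": [0.8, 0.2, 0.0], "milk": [0.6, 0.4, 0.0],
    "sugar": [0.0, 1.0, 0.0], "flour": [0.1, 0.9, 0.0], "chili": [0.0, 0.0, 1.0],
}
recipes = {
    "omelette": ["egg", "milk"],
    "cake": ["sugar", "flour", "egg"],
    "chili chicken": ["chicken", "chili"],
}
top = recommend(["chicken", "egg"], recipes, vectors, top_n=2)
print(top)
```

Ingredients missing from the vocabulary are skipped rather than raising an error, which keeps the lookup tolerant of imperfect cleaning.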
EVALUATION
● Based on performance of the downstream task: cuisine classification
● Eyeballing the recipes recommended
17. POST RECOMMENDATION / FUTURE SCOPE
PERSONALIZATION
As a part of improving the recommendations, users can be prompted to rate the recipes they were recommended.
INTEGRATION INTO SMART DEVICES
The tool can be integrated into smart devices such as refrigerators.
LEARN REGULAR
Users can evaluate recommendation quality to improve the models.
23. References and Relevant Work
1. https://www.kaggle.com/c/whats-cooking/data (P)
2. https://www.kaggle.com/shuyangli94/food-com-recipes-and-user-interactions (D)
3. https://www.kaggle.com/hugodarwood/epirecipes (B)
4. https://www.kaggle.com/kaggle/recipe-ingredients-dataset (S)
5. https://www.kaggle.com/kanaryayi/recipe-ingredients-and-reviews (P)
6. https://data.world/datafiniti/food-ingredient-lists (D)
7. https://link.springer.com/article/10.1007/s10844-017-0469-0
8. http://foodb.ca/ (B)
9. https://github.com/lingcheng99/Flavor-Network (S)
10. https://www.nature.com/articles/srep00196
11. https://www.foodpairing.com/
12. https://www.wired.com/2013/11/a-new-kind-of-food-science/
13. https://www.prescouter.com/2019/05/flavor-discovery-big-data-ai/
14. https://waterfootprint.org/media/downloads/Mekonnen-Hoekstra-2011-WaterFootprintCrops.pdf
15. https://www.footprintnetwork.org/licenses/public-data-package-free/
1. A New Kind of Food Science: How IBM Is Using Big Data to Invent Creative Recipes
● The study develops an algorithm that generates a list of recipes ranked using three categories: surprise, pleasantness of odor, and flavor pairings
2. Flavor network and the principles of food pairing
● The study introduces a flavor network that captures the flavor compounds shared by culinary ingredients. Given the increasing availability of information on food preparation, their data-driven investigation also opens new avenues towards a systematic understanding of culinary practice.
3. How healthy is the meal: an analysis of recipe data
● The study looks into the interconnection between ratings, nutrients, ingredients, meals, seasons, holidays and cooking techniques.
24. CUISINE
GROCERY INSPIRATION
To these classifications.
We looked at our own grocery store experiences and saw that we could all identify items in the supermarket from these cuisines. Therefore, people could recognize most of these cuisines.
On the other hand, some cuisines, such as Jamaican & Moroccan, had very distinctive flavors and classifications. Therefore, we tried keeping a small sample & building a model around them.
Editor's Notes
We were looking for inspiration, and we foodies who are students found that there is no good recipe recommendation engine that suggests recipes using the ingredients we have on hand while also satisfying our specific tastes. Most of the existing solutions suggest very common recipes.
Data
Try best models
We collated the best results through an intuitive workflow
COVID-19 has had an adverse impact on the population’s health and finances. We want to provide people with a way to make food at home easily and quickly by recommending recipes based on their preferences and the ingredients they have on hand.
Our proposed solution is well suited to the ongoing pandemic, in which access to restaurant food is becoming increasingly difficult. Anyone interested in saving money on eating out while becoming more independent and healthier will benefit from this.
People using the service can save money while honing a skill everyone should have: the ability to cook for oneself. They will also be able to make informed choices about eating healthy food.
We identified the fields in every dataset which would provide some sort of value to our recommendation engine
Fussiness
Ratings
INGREDIENT CLEANING:
We wanted to transform all our datasets to use the “Ingredient only” format. We used the following techniques across our datasets:
● Removal of punctuation, numeric quantities and extra spaces
● Removal of quantity strings such as “ounces”, “pounds” etc. and their variations
● Splitting ingredients by “and” and “with” into individual ingredients. For example, “tomato sauce with basil and garlic” would become “tomato sauce”, “basil”, “garlic”
● Use of POS tagging to identify noun phrases. For example, “Whisk some eggs” -> “eggs”
● Removal of words ending in ‘ed’.
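Most of the cleaning steps above can be sketched with the standard library alone. This is an illustrative approximation, not the team's pipeline: the `UNITS` set is a guessed subset of the quantity strings, the function name is made up, and the POS-tagging step is left out because it needs an external tagger.

```python
import re

# Assumed subset of quantity words; the real list was likely longer.
UNITS = {"oz", "ounce", "ounces", "kg", "pound", "pounds", "lb", "lbs",
         "tsp", "tbsp", "cup", "cups", "pinch", "little"}


def clean_ingredient_line(line):
    """Turn one raw ingredient line into a list of cleaned ingredient names."""
    text = line.lower()
    # Replace punctuation and digits with spaces (extra spaces vanish on split).
    text = re.sub(r"[^a-z\s]", " ", text)
    cleaned = []
    # Split on the standalone words "and" / "with".
    for part in re.split(r"\b(?:and|with)\b", text):
        words = [w for w in part.split()
                 if w not in UNITS and not w.endswith("ed")]
        if words:
            cleaned.append(" ".join(words))
    return cleaned


print(clean_ingredient_line("tomato sauce with basil and garlic"))
print(clean_ingredient_line("2 ounces chopped onions, diced"))
```

One side effect worth knowing: the ‘ed’ rule also drops ordinary words such as “red”, so “red pepper” would become “pepper”.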
We applied the cuisine prediction to the recipe datasets (>270K recipes)
Key differences:
The NN takes recipes that the ensemble labeled “American” and redistributes them into Mexican and other cuisines.