In this content analysis of healthy dish recipes, we studied 718 recipes and provide actionable recommendations to help Allrecipes.com increase its website traffic.
2. Introduction
Background
● Allrecipes.com provides a platform and forum where anyone can search for recipes, post recipes and leave comments. The website wants to attract more visits by improving the quantity and quality of its recipes, since people are now more health-conscious and prefer a healthy, balanced diet.
Problem Statement
● The research studies 718 healthy main dish recipes and explores:
1. Which ingredients and/or cooking methods are most popular for a healthy diet?
2. Which factors contribute most to the popularity of a recipe?
3. Which recipe writing style attracts the most reviews?
4. Variable lists:
● Number of reviews*
● Video (Y/N)
● Photo (Y/N)
● Number of ingredients
● Ingredients (text)
* Number of reviews is used to measure the popularity of the recipe
Figure: Example of a recipe
5. ● Prep time
● Cook time
● Ready time*
● Directions (text)
● Length of directions
● Number of steps
● Average length of directions
● Calories
● Cholesterol
● Fiber
● Sodium
● Carbohydrates
● Fat
● Protein
* Ready time does not necessarily equal Prep time + Cook time
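The three derived direction variables (length, number of steps, average length) can be computed from the directions text. A minimal sketch, assuming length is counted in words and every non-empty line of the directions is one step; neither definition is stated on the slide, and `direction_features` is a hypothetical helper:

```python
def direction_features(directions: str) -> dict:
    """Derive direction-length variables from raw directions text.

    Assumptions (not stated in the slides): length is counted in words,
    and every non-empty line is one step.
    """
    steps = [line.strip() for line in directions.splitlines() if line.strip()]
    total_words = sum(len(step.split()) for step in steps)
    n_steps = len(steps)
    return {
        "length_of_directions": total_words,
        "number_of_steps": n_steps,
        # Average words per step; 0.0 when there are no steps.
        "average_length_of_directions": total_words / n_steps if n_steps else 0.0,
    }


# Hypothetical two-step recipe.
example = "Preheat the oven to 350 degrees.\nBake the chicken for 30 minutes."
features = direction_features(example)
```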
6. Analysis on Directions
Tool: Stanford Parser, Python
Steps:
● Keep verb tokens: VB and its inflected forms (VBD, VBG, VBN, VBP, VBZ)
● Ignore uninformative verbs, e.g. cook, use, take
● Combine similar verbs related to cooking methods
● Calculate total count of verbs
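The steps above can be sketched in Python once the directions have been POS-tagged. This is only an illustration: it takes (word, tag) pairs from any tagger as input rather than calling the Stanford Parser, and the stop list and merge table are hypothetical beyond the three example verbs named on the slide.

```python
from collections import Counter

# Penn Treebank verb tags listed on the slide.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
# The slide gives only these three examples of ignored verbs.
IGNORED_VERBS = {"cook", "use", "take"}
# Hypothetical table that merges variants of the same cooking method.
MERGE = {"baked": "bake", "baking": "bake", "boiling": "boil", "sauteed": "saute"}


def count_cooking_verbs(tagged_tokens):
    """Count cooking verbs in a sequence of (word, POS tag) pairs."""
    counts = Counter()
    for word, tag in tagged_tokens:
        if tag not in VERB_TAGS:
            continue  # keep verbs only
        verb = word.lower()
        verb = MERGE.get(verb, verb)  # combine similar verbs
        if verb in IGNORED_VERBS:
            continue  # drop uninformative verbs
        counts[verb] += 1
    return counts


# Hypothetical tagged tokens.
tagged = [("Bake", "VB"), ("the", "DT"), ("chicken", "NN"), ("baked", "VBN"),
          ("Use", "VB"), ("boiling", "VBG"), ("water", "NN")]
verb_counts = count_cooking_verbs(tagged)
```

Calling `verb_counts.most_common(20)` on the full corpus would give a ranking comparable to a top-20 verb chart.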
7. Most Popular Cooking Methods
● Top 1: “Bake”
● Healthy cooking methods:
e.g. “Simmer” (108), “Saute” (93), “Boiling” (56)
● Unhealthy cooking methods:
e.g. “Grill” (37), “Fry” (25), “Burn” (7)
Figures: Top 20 Verbs of Cooking; Top 30 Verbs
8. Analysis on Ingredients
Tool: Weka
Steps:
● Split each ingredient string into n-grams (n = 1 to 4)
● Output word count
● Delete meaningless words or phrases
● Calculate total word frequency
● Drill-down analysis
● Associations within ingredients
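The n-gram counting steps can be illustrated outside Weka with a short Python sketch. The ingredient lines and the list of meaningless tokens are hypothetical, and dropping every n-gram that contains such a token is an assumed cleaning rule, not necessarily the one used in the study.

```python
from collections import Counter


def ngrams(text, min_n=1, max_n=4):
    """Yield all word n-grams of text for n between min_n and max_n."""
    tokens = text.lower().split()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


# Hypothetical ingredient lines and stop list.
ingredient_lines = [
    "2 cloves garlic minced",
    "1 tablespoon olive oil",
    "olive oil for frying",
]
MEANINGLESS = {"1", "2", "for", "tablespoon", "cloves"}

freq = Counter()
for line in ingredient_lines:
    for gram in ngrams(line):
        # Assumed rule: skip n-grams containing any meaningless token.
        if not any(tok in MEANINGLESS for tok in gram.split()):
            freq[gram] += 1
```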
9. Word Frequency
Top seasonings:
Seasoning Freq
garlic 425
olive oil 298
black pepper 214
cheese 195
lemon 147
basil 97
ginger 92
vegetable oil 92
wine 92
lemon juice 88
parsley 88
vinegar 80
cilantro 71
cumin 68
Top main ingredients:
Ingredient Freq
onion 384
chicken 328
bell pepper 149
pasta 131
mushroom 97
tomato 95
bean 81
carrot 58
rice 58
potato 54
orange 53
flour 50
pork 49
zucchini 44
● 65% of the top main ingredients are vegetables and fruit, 21% are staple foods and 14% are meat
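The split appears to be taken by number of ingredients rather than by frequency. A quick check under an assumed grouping of the 14 top main ingredients (the category assignments below are an illustration, not given in the slides) yields shares of roughly 64%, 21% and 14%, close to the figures quoted:

```python
from collections import Counter

# Assumed grouping of the 14 top main ingredients (not from the slides).
CATEGORY = {
    "onion": "vegetable/fruit", "bell pepper": "vegetable/fruit",
    "mushroom": "vegetable/fruit", "tomato": "vegetable/fruit",
    "bean": "vegetable/fruit", "carrot": "vegetable/fruit",
    "potato": "vegetable/fruit", "orange": "vegetable/fruit",
    "zucchini": "vegetable/fruit",
    "pasta": "staple", "rice": "staple", "flour": "staple",
    "chicken": "meat", "pork": "meat",
}

# Share of ingredients (by count, not by frequency) in each category.
group_sizes = Counter(CATEGORY.values())
shares = {group: n / len(CATEGORY) for group, n in group_sizes.items()}
```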
10. Drill-Down Analysis
● Chicken is a healthier source of protein than other kinds of meat (such as beef and pork)
● Chicken breast is 3 times lower in fat and 25% lower in calories than drumsticks and wings
● Skin and internal organs are the fattiest parts, so 87% of the chicken breast entries are skinless and boneless
11. Brown sugar:
● more minerals
● richer in vitamin B
● more flavors and textures
Honey isn’t just sugar!
● energy source
● source of vitamins and minerals
● weight loss
12. Cheese
● Parmesan cheese has much more calcium than any other cheese, and it is also low in lactose
● 67% of the Cheddar cheese is reduced-fat
Oil
● Olive oil benefits the heart, skin and hair
● Canola oil, which is not healthy for the cardiovascular system, accounts for only 3%
13. Association
- The overall associations are categorized into 4 groups: chicken, garlic, onion and pasta
- Offer ideas for users to cook healthy food by providing a list of common ingredients
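One simple way to surface ingredient associations is to count how often two ingredients appear in the same recipe. The sketch below uses pair co-occurrence with a support threshold; the recipes and the 0.5 cut-off are hypothetical, and the slides do not say which association method was actually used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical recipes, each reduced to its set of ingredients.
recipes = [
    {"chicken", "garlic", "onion"},
    {"pasta", "garlic", "olive oil"},
    {"chicken", "garlic", "lemon"},
    {"onion", "pasta", "garlic"},
]

# Count every unordered ingredient pair once per recipe.
pair_counts = Counter()
for ingredients in recipes:
    for pair in combinations(sorted(ingredients), 2):
        pair_counts[pair] += 1

# Support = share of recipes that contain both ingredients.
support = {pair: n / len(recipes) for pair, n in pair_counts.items()}
MIN_SUPPORT = 0.5  # assumed cut-off
frequent_pairs = sorted(p for p, s in support.items() if s >= MIN_SUPPORT)
```

Grouping the frequent pairs by their shared ingredient gives clusters of the kind named above.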
14. Data Mining - Data Preprocessing (target variable)
● Number of reviews is a continuous variable
● The majority of review counts fall in the range [0, 600]
● Outliers exist
Solution:
Divide the variable evenly into three categories
Figure: Distribution of Number of reviews
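Dividing the target evenly can be read as equal-frequency binning, which keeps outliers from distorting the groups. A minimal sketch under that assumption (the slides do not say whether equal-frequency or equal-width bins were used; the review counts and label names are hypothetical):

```python
def tertile_labels(values):
    """Label each value low / medium / high so the three groups are equal in size.

    Values are ranked and the ranks are cut into thirds. Tied values may
    land in different groups, which a real analysis would need to handle.
    """
    names = ("low", "medium", "high")
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = names[min(rank * 3 // len(values), 2)]
    return labels


# Hypothetical review counts, including one extreme outlier.
review_counts = [12, 850, 40, 300, 5, 2400, 95, 150, 600]
labels = tertile_labels(review_counts)
```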
15. Data Mining - Data Preprocessing (correlation between numeric variables)
Variable pairs that are strongly correlated with each other:
● Number of ingredients & Length of directions
● Ready time & Cook time
● Length of directions & Average length of directions
● Calories & Fat
● Calories & Carbohydrates
● Calories & Protein
● Protein & Cholesterol
Solution:
● Remove the following input variables: cook time, length of directions, calories
Figures: Average Length vs. Ready Time (no correlation); Fat vs. Calories (strong correlation)
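Screening for strongly correlated inputs can be sketched with a Pearson correlation over every pair of numeric variables. The data values and the 0.8 threshold below are hypothetical; the slides do not state which coefficient or cut-off was used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values for four recipes.
data = {
    "calories": [200, 350, 500, 650],
    "fat": [5, 12, 20, 27],
    "ready_time": [30, 20, 60, 25],
}

THRESHOLD = 0.8  # assumed cut-off for a "strong" correlation
strong_pairs = [
    (a, b)
    for a, b in combinations(data, 2)
    if abs(pearson(data[a], data[b])) >= THRESHOLD
]
```

For each flagged pair, one of the two variables can then be dropped from the model inputs.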
16. Data Mining
Neural network (3 layers):
Decision tree (C5.0):
Top 5 important predictors:
● Recipe photo (Y/N)
● Video (Y/N)
● Sodium (mg)
● Ready time (min)
● Carbohydrates (mg)
* Results in R are very similar
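The slides do not describe how predictor importance was computed. As a rough intuition for why a yes/no predictor such as Photo can rank highly in a decision tree, the sketch below computes information gain, the entropy-based quantity on which C4.5-style split criteria are built. The data is hypothetical and this is not the C5.0 implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, target):
    """Reduction in target entropy after splitting on a categorical feature."""
    total = len(target)
    remainder = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(target) - remainder


# Hypothetical recipes: photo flag and popularity category.
has_photo = ["Y", "Y", "Y", "N", "N", "N"]
popularity = ["high", "high", "medium", "low", "low", "medium"]
gain = information_gain(has_photo, popularity)
```

A predictor whose values are unrelated to the popularity category would score a gain near zero.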
17. Recommendations
To attract more website traffic, healthy main dish recipes should have:
● a video and/or photos
● sodium: 200~800 mg
● number of ingredients: 7~12
● ready time: under 80 minutes
● average length of directions: 20~40 words
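These recommendations can be turned into a simple checklist for a new recipe. In the sketch below the field names are hypothetical, the range bounds are assumed to be inclusive, and having either a video or a photo is assumed to satisfy the first item.

```python
def meets_recommendations(recipe: dict) -> dict:
    """Check a recipe against each recommendation; returns one flag per item."""
    return {
        # Assumption: either a video or a photo is enough.
        "media": recipe["has_video"] or recipe["has_photo"],
        "sodium": 200 <= recipe["sodium_mg"] <= 800,
        "ingredients": 7 <= recipe["n_ingredients"] <= 12,
        "ready_time": recipe["ready_time_min"] < 80,
        "avg_direction_length": 20 <= recipe["avg_direction_words"] <= 40,
    }


# Hypothetical recipe that is too slow to prepare.
recipe = {
    "has_video": False,
    "has_photo": True,
    "sodium_mg": 450,
    "n_ingredients": 9,
    "ready_time_min": 95,
    "avg_direction_words": 28,
}
checks = meets_recommendations(recipe)
failed = [name for name, ok in checks.items() if not ok]
```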