Recipe2Vec:
Or how does my robot know
what recipes are related?
Meghan Heintz, Senior Data Scientist at BuzzFeed and Tasty
Tasty Ecosystem Before
Why Related Recipes?
Search behavior on the BuzzFeed Tasty vertical showed the need for them:
- Narrows a potentially exhaustive search down for the user
- Provides a channel for us to resurface older content in a coherent manner on new, highly trafficked recipes
Why do we even need to calculate related recipes?
Can’t the producers/chefs tag those for us???
Humans, while great chefs… are not great taggers.
My favorite example of
awful human tagging....
How will robots/computers understand the
content of our recipes?
Making text machine readable?
- Dummy coding, e.g. pandas.get_dummies
- Label encoding, e.g. sklearn.preprocessing.LabelEncoder
More advanced techniques…
- Polynomial
- Backward Difference
- Helmert
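A minimal sketch of the two baseline encodings named above; the ingredient column and values are made-up illustrations, not data from the talk:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy categorical column (purely illustrative).
df = pd.DataFrame({"ingredient": ["eggplant", "chicken", "eggplant", "tofu"]})

# Dummy coding: one binary indicator column per category.
dummies = pd.get_dummies(df["ingredient"], prefix="ingredient")

# Label encoding: one integer id per category.
le = LabelEncoder()
df["ingredient_id"] = le.fit_transform(df["ingredient"])

print(dummies)
print(df)
```

Neither encoding captures that "eggplant" and "zucchini" are more alike than "eggplant" and "chicken", which is where embeddings come in.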
Word Embeddings
Raw text corpus (recipes) → vector representations for the words in the corpus
E.g. eggplant = [0.1, 0.5, 0.2, 0.4, 0.9]
Plotting these vectors should show us that similar words end up spatially closer to each other than dissimilar words.
Ways to make word embeddings
● TF-IDF vectorization
● Word-word co-occurrence matrix
○ GloVe log-bilinear model with a weighted
least-squares objective
● Neural Network
○ word2vec two-layer neural network
"A word is characterized by the company it keeps" -Firth
How does this work??
Using the word2vec implementation with skip-gram…
- Take a sentence like “the quick brown fox jumped over the lazy dog”
- Decompose it into (context words, target word) pairs: ([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...
- Each word is initialized as a random vector with small values
- Predict the context words from the target word using a softmax regression classifier
- Update the word vectors by taking a small step to maximize the objective function, using stochastic gradient descent and backpropagation
- Rinse and repeat!
Image: http://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/
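To make the decomposition step concrete, here is a tiny pair-generation sketch (the window size of 1 matches the example above; the helper name is made up):

```python
# Generate (target, context) training pairs for skip-gram.
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the quick brown fox jumped over the lazy dog".split()
print(skipgram_pairs(sentence)[:6])
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ...]
```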
How is word2vec different from other NN
implementations?
Word Pairs and “Phrases”
● Common pairings get treated as “phrases” rather than single words
Subsampling
● Subsampling frequent words to decrease the number of training
examples
Negative Sampling
● Only a small percentage of the model’s weights are updated with each training sample: a few “negative” words are selected and their vectors are updated along with the “positive” word.
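All three tricks are exposed as options in the gensim implementation; a sketch, with parameter values as illustrative assumptions rather than the talk’s settings:

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases

# sentences = tokenized recipe steps, as in the earlier sketch (toy data here).
sentences = [["preheat", "the", "oven"], ["sour", "cream", "and", "onion", "dip"]]

# Word pairs / "phrases": merge frequent bigrams into single tokens, e.g. "sour_cream".
bigram = Phrases(sentences, min_count=1, threshold=1)
phrased = [bigram[s] for s in sentences]

model = Word2Vec(
    phrased,
    sg=1,
    sample=1e-3,   # subsample very frequent words to cut down training examples
    negative=5,    # negative sampling: update only a few "negative" words per example
    min_count=1,
)
```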
How do we evaluate these embeddings??
Maybe Dimensionality Reduction??
High-dimensional data is very difficult to visualize. However, there are methods to project high-dimensional data down to fewer dimensions. Principal component analysis, linear discriminant analysis, and t-distributed stochastic neighbor embedding are all dimensionality reduction methods. We can use one of them to visualize our embeddings.
Image: http://www.turingfinance.com/artificial-intelligence-and-statistics-principal-component-analysis-and-self-organizing-maps/
Why it’s great: t-SNE
● Won the Merck Visualization Challenge on Kaggle
● Better for visualization than PCA
● Solves “the crowding problem”
t-Distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding.
What to be careful about
● Those hyperparameters really matter
● Cluster sizes in a t-SNE plot mean nothing
● Distances between clusters might not mean anything
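A minimal sketch of projecting word vectors to 2-D with scikit-learn’s t-SNE, continuing from the earlier gensim sketch but assuming the model was trained on the full recipe corpus rather than the toy sentences (perplexity and the number of words plotted are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stack vectors for the most frequent words from the trained model.
words = model.wv.index_to_key[:500]
X = np.array([model.wv[w] for w in words])

# Perplexity is one of the hyperparameters that "really matter"; try several values.
coords = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)

plt.scatter(coords[:, 0], coords[:, 1], s=3)
for w, (x, y) in zip(words[:50], coords[:50]):
    plt.annotate(w, (x, y), fontsize=6)
plt.show()
```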
Find the most similar vectors by sorting on cosine similarity
Image: http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
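Cosine similarity is just the normalized dot product; a tiny self-contained sketch reusing the toy eggplant vector from earlier (the second vector is made up):

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), 0.0 = orthogonal (unrelated).
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

eggplant = np.array([0.1, 0.5, 0.2, 0.4, 0.9])
zucchini = np.array([0.2, 0.4, 0.1, 0.5, 0.8])   # made-up vector
print(cosine_similarity(eggplant, zucchini))
```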
Evaluating word embeddings using similarities for known relationships
Turns out word embeddings are sort of modular: when you add, subtract, or concatenate word embeddings, they retain their meaning.
Example:
“King” - “Man” + “Woman” = “Queen”
Image: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
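In gensim the same analogy check is a one-liner (this assumes a model trained on a large general-language corpus; on the toy recipe sentences above the result would be meaningless):

```python
# Expected to surface something like [("queen", 0.7...)] on a large corpus.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```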
recipe2vec
Sum all our word embeddings from each recipe’s preparation steps to create our recipe vector.
Evaluate recipe vectors using t-SNE
[t-SNE plot of recipe vectors, with clusters labeled Desserts, Comfort Food, Healthy-ish, and Happy Hour (Boozy)]
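A sketch of that summation step (tokenization and the helper name are assumptions; the talk only specifies summing the word vectors from each recipe’s preparation steps):

```python
import numpy as np

def recipe_vector(preparation_steps, wv):
    """Sum the word vectors of every in-vocabulary token in a recipe's preparation steps."""
    tokens = [t for step in preparation_steps for t in step.lower().split()]
    vectors = [wv[t] for t in tokens if t in wv]
    return np.sum(vectors, axis=0)

steps = ["Slice the eggplant into rounds", "Roast until tender"]
vec = recipe_vector(steps, model.wv)   # model from the earlier gensim sketch
```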
How does this stay fresh?
We publish ~15-20 new recipes a week
Recipe2Vec is applied with the existing (possibly stale) model every time a recipe is published
The model is completely retrained every 12 hours
Sample Results
Still needs some tweaks for recipes with similar preparations but different flavor profiles… e.g. smoothies vs. boozy bevvies
Other uses for recipe vectors
● Predicting the performance of new recipes based on the performance of older, similar recipes
● Creating context-aware recommendations for users by combining collaborative filtering recommendations with recipe similarity metrics
● Making recommendations to producers on the types of recipes to make, based on past performance
● Generally as useful features in machine learning applications
THE
END
Questions?
We are HIRING!!!
Contact me at:
meghan.heintz@buzzfeed.com
Or on Twitter @dot2dotseurat
Or come talk to me outside.
