Discover Your Latent Food Graph with This 1 Weird Trick -- PyData NYC 2019

At Grubhub we leverage recent advances in representation learning to gain an automated and scalable understanding of our vast restaurant and menu catalog. We use these techniques to learn a latent food knowledge graph that drives better search and personalization. In particular, we share some of our advances in applying language modeling and knowledge graphs in the e-commerce setting.

  1. Discover Your Latent Food Graph with This 1 Weird Trick (Grubhub Search Data Science)
  2. ● Restaurant Recommendations
     ● Menu/Dish Recommendations
     ● Restaurant/Dish/Cuisine Search
     ● Cuisine Recommendations
  3. E-commerce Dilemma
     ● Our catalog grows every day
     ● Data is unstructured and unbounded
     ● How can we understand it to drive search & recommendations?
  4. Use Cases
     ● Where can I get amazing Blueberry Pancakes? (semantic dish search)
     ● What are some synonyms for Pierogi? (query expansion)
     ● Show me French restaurants in Brooklyn (semantic cuisine search)
     ● What are the top-10 Asian noodle dishes near me? (semantic dish recs)
     ● Find me a new French restaurant that I'll like (personalized restaurant recs)
  5. Weird Trick: Representation Learning
     1. Query2vec: understanding users
     2. Rest2vec: understanding restaurants
     3. FastMenu: understanding menus
     Users + Restaurants + Menus = Grubhub Food Universe
  6. query2vec in the search pipeline:
     ● Query Understanding: Language Normalization, Intent Classification
     ● Query Building: Filtering, Query Expansion
     ● Candidate Selection: Phrase/Term Matching, Semantic Matching
     ● Enrichment: Pruning, Hydration, Pagination
     ● Ranking: Revenue, Relevance, Personalization
  7. Query Expansion
     Original query: Dan Dan Noodles
     Expanded query: Dan Dan Noodles, Spicy Noodles, Chinese, Japanese, Asian
     Increased recall!
  8. Classical Query Expansion
     ● Thesaurus/Synonyms: cranium, brain, noggin, thinker
     ● Knowledge Graph
     Modern Query Expansion
     ● Representation Learning
       ○ Click-pattern mining: cluster similar queries based on the restaurants they convert to
       ○ query2vec à la word2vec ("Dan Dan Noodles") -- see the lookup sketch below
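Once query embeddings are trained, expansion reduces to a nearest-neighbor lookup in the embedding space. A minimal sketch, assuming a trained {query: vector} dict named query_vecs (a hypothetical artifact of the training step on the next slide):

    import numpy as np

    def expand_query(query, query_vecs, top_k=5):
        """Return the top_k queries closest to `query` by cosine similarity."""
        q = query_vecs[query]
        q = q / np.linalg.norm(q)
        scored = [
            (other, float(np.dot(q, v / np.linalg.norm(v))))
            for other, v in query_vecs.items() if other != query
        ]
        scored.sort(key=lambda t: t[1], reverse=True)
        return [t[0] for t in scored[:top_k]]

    # e.g. expand_query("dan dan noodles", query_vecs) might return
    # something like ["spicy noodles", "chinese", "asian", ...]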
  9. query2vec model_fn (TensorFlow 1.x):

     # `x` (query ids), `mapped_labels` (restaurant ids), `mode`, `k`
     # (embedding dim), `neg` (negative samples), `learning_rate`, and the
     # id mappings all come from the surrounding model_fn / params.
     import math
     import tensorflow as tf

     # network weights
     query_embeddings = tf.Variable(
         tf.random_uniform([len(query_mapping), k], -1.0, 1.0),
         name="query_embeddings")
     softmax_weights = tf.Variable(
         tf.truncated_normal([len(item_mapping), k], stddev=1.0 / math.sqrt(k)),
         name="softmax_weights")
     softmax_bias = tf.Variable(tf.zeros([len(item_mapping)]), name="softmax_bias")

     # select input queries from the embedding table
     # (one-hot matmul is equivalent to tf.nn.embedding_lookup)
     x_one_hot = tf.one_hot(x, len(query_mapping), name="one_hot_input")
     h = tf.matmul(x_one_hot, query_embeddings, name="projection")  # [None, k]

     # labels: [batch] for the full softmax, [batch, 1] for NCE
     batched_labels = tf.reshape(mapped_labels, [-1, 1])

     # full softmax loss (exact; useful for evaluation)
     logits = tf.matmul(h, tf.transpose(softmax_weights))
     full_softmax_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
         logits=logits, labels=mapped_labels)

     # sampled (NCE) loss (approximate; used for training)
     approx_softmax_loss = tf.nn.nce_loss(
         softmax_weights, softmax_bias, batched_labels, h, neg, len(item_mapping))

     mean_loss = tf.reduce_mean(approx_softmax_loss)
     train_op = tf.train.AdamOptimizer(learning_rate).minimize(
         mean_loss, global_step=tf.train.get_global_step())
     return tf.estimator.EstimatorSpec(mode=mode, loss=mean_loss, train_op=train_op)
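For context, a minimal sketch of how a model_fn like the one above plugs into the TF 1.x Estimator API; `model_fn`, `train_input_fn`, and the model_dir path are assumptions, not part of the talk:

    import tensorflow as tf

    # model_fn and train_input_fn are assumed to be defined elsewhere
    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/query2vec")
    estimator.train(input_fn=train_input_fn, steps=100000)

    # the learned embedding table can be pulled from the checkpoint by name
    query_vecs = estimator.get_variable_value("query_embeddings")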
  10. Data
      ● Dataset: 1 year of (search query, restaurant_id) pairs
      ● Spark preprocessing: normalization on an EMR cluster (sketch below)
      ● Training: ~10 min/epoch on 1 GPU (AWS p2)
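A hypothetical sketch of that preprocessing step; the input path and column names (`raw_query`, `restaurant_id`) are illustrative assumptions, not Grubhub's actual schema:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("query2vec-prep").getOrCreate()

    pairs = (
        spark.read.parquet("s3://bucket/search_conversions/")  # placeholder path
        .withColumn("query", F.lower(F.trim(F.col("raw_query"))))
        .withColumn("query", F.regexp_replace("query", r"[^a-z0-9 ]", ""))
        .filter(F.length("query") > 0)
        .select("query", "restaurant_id")
    )
    pairs.write.mode("overwrite").parquet("s3://bucket/query2vec_pairs/")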
  11. Rest2Vec in the search pipeline:
      ● Query Understanding: Language Normalization, Intent Classification
      ● Query Building: Filtering, Query Expansion
      ● Candidate Selection: Phrase/Term Matching, Semantic Matching
      ● Enrichment: Pruning, Hydration, Pagination
      ● Ranking: Revenue, Relevance, Personalization
  12. Rest2Vec
      Creates a numerical vector representation of each restaurant from historical clickstream data (users' clicks and conversions)
      ● Helps to understand Restaurants
      ● Helps to power Discovery
      ● Helps to power Personalization
  13. From Word2Vec to Rest2Vec (Data)
      Distributional Hypothesis: "A word is characterized by the company it keeps" - Firth (1957)
  14. From Word2Vec to Rest2Vec (Algorithm)
      [side-by-side architecture diagrams: Word2Vec vs. Rest2Vec]
  15. Training Data
      ● Intentful sessions: ~60M
      ● Interactions per session: 4 to 8
      ● Restaurants: ~140K
      ● [sample session diagram]
      (a minimal training sketch follows)
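A minimal rest2vec training sketch, assuming each intentful session is an ordered list of restaurant ids and swapping in gensim's Word2Vec (the talk's own TF implementation is on slide 9); hyperparameters are illustrative:

    from gensim.models import Word2Vec

    sessions = [
        ["rest_123", "rest_456", "rest_789"],   # one intentful session
        ["rest_456", "rest_789", "rest_012"],
        # ... ~60M sessions in practice
    ]

    model = Word2Vec(
        sentences=sessions,
        vector_size=64,   # embedding dimension (assumed)
        window=5,         # context window over the session
        min_count=1,      # keep all restaurants in this toy; raise in production
        sg=1,             # skip-gram, as in word2vec
        workers=4,
    )

    # restaurants that keep the same "company" end up nearby
    print(model.wv.most_similar("rest_123", topn=10))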
  16. Tensorboard Visualization
      ● Each Market has its own Cluster
      ● Cluster size indicates how big the Market is
      (an export sketch follows)
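For reference, a sketch of exporting vectors to the TensorBoard projector with the TF 1.x-era API, which produces cluster views like the ones described above; `rest_vecs` and the metadata file are stand-in assumptions:

    import numpy as np
    import tensorflow as tf
    from tensorboard.plugins import projector

    rest_vecs = np.random.rand(1000, 64).astype("float32")  # stand-in for real vectors

    embedding_var = tf.Variable(rest_vecs, name="rest_embeddings")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver([embedding_var]).save(sess, "logs/rest2vec.ckpt")

    config = projector.ProjectorConfig()
    emb = config.embeddings.add()
    emb.tensor_name = embedding_var.name
    emb.metadata_path = "metadata.tsv"  # one row per restaurant: id, market
    projector.visualize_embeddings(tf.summary.FileWriter("logs"), config)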
  17. Integration with Service
      ● Fast KNN lookup over the learned vectors using Annoy (sketch below)
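A sketch of the Annoy index build and serve-time lookup; `rest_vecs` (a {restaurant_id: vector} dict) and the tree count are assumptions:

    from annoy import AnnoyIndex

    dim = 64  # must match the embedding dimension
    index = AnnoyIndex(dim, "angular")  # angular distance ~ cosine
    ids = []
    for i, (rest_id, vec) in enumerate(rest_vecs.items()):
        index.add_item(i, vec)
        ids.append(rest_id)
    index.build(50)  # n_trees: more trees -> better recall, bigger index
    index.save("rest2vec.ann")

    # serve time: 10 nearest restaurants to item 0
    neighbors = index.get_nns_by_item(0, 10)
    print([ids[j] for j in neighbors])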
  18. FastMenu in the search pipeline:
      ● Query Understanding: Language Normalization, Intent Classification
      ● Query Building: Filtering, Query Expansion
      ● Candidate Selection: Phrase/Term Matching, Semantic Matching
      ● Enrichment: Pruning, Hydration, Pagination
      ● Ranking: Revenue, Relevance, Personalization
  19. FastMenu
      Creates a numerical vector representation of menu items from associated textual data rather than diner behavior
      ● Helps to understand menus
      ● Helps to power semantic search
      ● Complete catalogue coverage
  20. Menu Text Matching
      Menu Item          | String Matched Menu Items              | Semantic Matched Menu Items
      mai fun            | mai fun, chow fun, shrimp mai fun      | stir fried noodles, thin rice noodles
      blueberry pancake  | blueberry smoothie, buttermilk pancake | grand slam breakfast
      ● Increased recall
  21. Static Sequence Embeddings
      ● fastText = sub-words
      ● Handles out-of-vocabulary words
      ● Example: "pizza" → <START>p, pi, iz, zz, za, a<END> (see the n-gram sketch below)
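A tiny sketch of the sub-word idea, using fastText's actual '<'/'>' boundary markers in place of the slide's <START>/<END>; real fastText hashes n-grams of length 3-6 by default, and the bigrams here just mirror the slide's example:

    def char_ngrams(word, n=2):
        """fastText-style character n-grams with boundary markers."""
        padded = "<" + word + ">"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    print(char_ngrams("pizza"))
    # ['<p', 'pi', 'iz', 'zz', 'za', 'a>']
    # An out-of-vocabulary word still gets a vector: the sum of its n-gram vectors.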
  22. Menu Item Feature
      How do you characterize a unique menu item with text?
      Text Source        | Example                                                        | Use
      Restaurant Name    | San Gennaro's                                                  | No - no semantic info
      Name               | margarita pizza                                                | Yes
      Description        | Adorned simply in the colors of the Italian flag: green from basil, white from mozzarella, red from tomato sauce. | Yes
      Menu Section       | House Favorites                                                | No - too noisy
      Restaurant Cuisine | Pizza, Subs, Italian, American, Lunch Specials                 | Yes
      Reviews            | "3/5"                                                          | No
      BUT: this content has no location awareness
  23. Tensorboard Visualization
      ● Each Market has its own Cluster
      ● Cluster size indicates how big the Market is
  24. Geohashes
      ● Cover the surface of the earth
      ● Each denotes a rectangular area
      ● Alphanumeric string, a ~32-bit lat/long specification
      ● Nested precision levels: dr contains dr5 and drh; dr5 contains dr5x and dr5z
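A sketch of encoding a point, using the pygeohash package (an assumption; any geohash library works the same way); the coordinates are illustrative Manhattan ones:

    import pygeohash as pgh

    gh = pgh.encode(40.742, -73.993, precision=5)
    print(gh)      # a Manhattan cell, e.g. 'dr5ru'
    print(gh[:4])  # truncating gives the coarser parent cell (precision 4)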
  25. Geohash Embedding
      Geohashes are built from the same representative characters as language:
      ● Location words: geohash (dr5ru)
      ● "Sentence" = geohashes < 40 mi apart: "dr725 dr72h dr72j dr5rg dr5ru dr5rv dr5re dr5rs dr5rt"
      ● Concat the geohash sentence to the menu text:
        margarita pizza adorned simply colors italian flag green from basil white from mozzarella red from tomato sauce pizza subs italian american lunch specials dr725 dr72h dr72j dr5rg dr5ru dr5rv dr5re dr5rs dr5rt
      ● Expands the "word" vocabulary, but the character set is still 26 letters and 10 digits
      ● Menu item text now knows about location (see the sketch below)
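A hypothetical sketch of building one FastMenu training line: normalized menu text concatenated with the geohash sentence of cells within ~40 mi of the restaurant. `neighbors_within` is an assumed helper (the toy lambda stands in for a real neighborhood query):

    def menu_item_line(name, description, cuisines, geohash, neighbors_within):
        """One training 'sentence': menu text + geohash sentence."""
        text_parts = [name, description, " ".join(cuisines)]
        geo_sentence = " ".join(neighbors_within(geohash, miles=40))
        return (" ".join(text_parts) + " " + geo_sentence).lower()

    line = menu_item_line(
        name="margarita pizza",
        description="green from basil, white from mozzarella, red from tomato sauce",
        cuisines=["pizza", "subs", "italian"],
        geohash="dr5ru",
        neighbors_within=lambda gh, miles: ["dr725", "dr72h", "dr5rg", "dr5ru"],
    )
    # -> "margarita pizza green from basil ... pizza subs italian dr725 dr72h dr5rg dr5ru"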
  26. Data
      ● Menu items: ~10M
      ● Geohash radius: 40 mi (geohash precision 4)
      ● Embedding dimension: 30
      ● Vocabulary: 3K words account for 97% of all words used to describe menu items
  27. Visualization: TensorBoard
      ● t-SNE: local variation (cuisines separable)
      ● PCA: global variation (geography separable)
      [plot labels: Phoenix (mexican, asian, indian), Nashville (asian), Topeka (small market)]
  28. Nearest Neighbors
  29. Now we can answer the important questions
      ● AMAZING!!! blueberry pancakes
      ● Are pierogis really empanadas?!?
      ● 10 Asian noodles near you
      ● Try this French restaurant instead
  30. Alex Egg: @eggie5
      Emily Ray: eray1@grubhub.com
      Parin Choganwala: pchoganwala@grubhub.com
      For more info check out: https://bit.ly/32fmBwJ
