Tanvi Motwani, Lead Data Scientist, Guided Search at A9.com at MLconf ATL 2016

E-commerce Query Tagging System Using Unsupervised Training Methods: Amazon is one of the world’s largest e-commerce sites, and Amazon Search powers the majority of Amazon’s sales. A key component of Amazon Search is the query understanding pipeline, which extracts the semantic information used to precisely display products for billions of queries every day. In this talk, we will go through the primary building blocks of the query understanding pipeline.
Amazon Search enables users to search against structured products, so information must be extracted from queries in a format that is consistent with the structured information about the products. Query tagging is the task of semantically annotating query terms with pre-defined labels (such as brand, product type and color). We propose a scalable system to train large-scale machine learning algorithms to solve this problem. Our system improved precision over the baseline, a dictionary-lookup-based tagger, by 10% and approximately doubled recall.
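
To make the baseline concrete, here is a minimal sketch of a dictionary-lookup brand tagger that emits B/I/O labels (B for the first token of a brand, I for the following tokens, O for everything else). The brand list, the function name `dictionary_bio_tags`, and the longest-match strategy are assumptions made for illustration; the talk does not describe how the baseline was implemented.

```python
from typing import List, Set, Tuple

# Hypothetical brand dictionary; the entries are illustrative only.
BRANDS: Set[Tuple[str, ...]] = {
    ("adidas",),
    ("jansport",),
    ("ray", "ban"),
    ("ralph", "lauren"),
    ("philosophy",),
}


def dictionary_bio_tags(
    query: str, phrases: Set[Tuple[str, ...]] = BRANDS
) -> List[Tuple[str, str]]:
    """Tag each query token as B, I, or O using greedy longest-match lookup."""
    tokens = query.lower().split()
    tags = ["O"] * len(tokens)
    max_len = max((len(p) for p in phrases), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in phrases:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    print(dictionary_bio_tags("ralph lauren men"))
    # The lookup has no notion of context, so "philosophy" is tagged
    # as a brand even in a query about books.
    print(dictionary_bio_tags("philosophy books"))
```

A lookup of this kind tags "philosophy" as a brand in every query, which is the context problem the learned tagger is meant to address.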

Published in: Technology

  1. September 23, 2016. Query Understanding in Amazon Search. Tanvi Motwani, Data Scientist, Amazon Search
  2. PRODUCT SEARCH
  3. MOTIVATION. Fashion product query: Brand, Product Type, Price under, Gender, Is Prime
  4. CHALLENGES • Power Law Distribution of Queries: large population of long-tail queries • Speedy Search Response: fast models that respond in milliseconds • Dynamic Search Trends: adaptive to new trends • Global Search Reach: deal with 10 different languages
  5. QUERY TAGGERS and QUERY CLASSIFIERS, example output: {"label": "_color", "id": C232, "name": "Black"}; {"label": "_brand", "id": B1402, "name": "The North Face"}; {"label": "_product", "id": P232, "name": "Jacket"}; {"category": "Fashion > Clothing > Jackets & Coats"}; {"class": "product_query", "score": 0.9}; {"name": "query_specificity", "score": 0.7}; {"class": "fashion", "score": 0.8}
  6. QUERY CLASSIFIERS
  7. QUERY CATEGORY CLASSIFIER: “A Multiclass Classifier which classifies input user query into Amazon Categories.”
  8. QUERY CATEGORY CLASSIFIER (Trigram Language Model) • Automatically generates a large training dataset • Frequent refresh of training data is possible • Trigram model generalizes well for tail queries • A large percentage of query searches happen within a category. Example queries: tv ipod projector speakers headphones pillow curtains pet bells mattress shower curtain suits mr robot star trek downton abbey game of thrones
  9. CUSTOMER SERVICE QUERY CLASSIFIER: “Classifies query into customer service queries versus product query.” Example queries: contact amazon; amazon phone number; how do I cancel my order?; where is my order history?; where is my order?; how can I see videos?; amazon prime video help
  10. QUERY TAGGERS
  11. QUERY TAGGING FILTERING: Brand, Product Type, Price under, Gender, Is Prime
  12. QUERY TAGGING IMPROVED UI
  13. QUERY TAGGING (BRAND). What? Tag the BRAND in queries such as: adidas shoes; jansport backpack; ray ban sunglasses; ralph lauren men. How? TRAIN a Conditional Random Field on B/I/O-labeled queries: adidas shoes (B O); jansport backpack (B O); ray ban sunglasses (B I O); ralph lauren men (B I O)
  14. CONDITIONAL RANDOM FIELD • Discriminative model: models the conditional probability P(Y|X); we do not need to model P(X) • Features: word capitalized, word in atlas or name list, previous word is “Mrs”, next word is “Times”, … • Recommended tutorial on CRFs: An Introduction to Conditional Random Fields for Relational Learning, https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
  15. QUERY LOGS / PRODUCT CATALOGUE. Queries: adidas shoes; jansport backpack; ray ban sunglasses; north face black jacket; polo ralph lauren men; white shoes ralph lauren. Actions: click, add, purchase
  16. QUERY LOGS / PRODUCT CATALOGUE: north face black jacket
  17. $\arg\max_{b} f(c_i, a_i, p_i)$ over $i \in P(b)$, where $b$ is a brand and $P(b)$ is the set of products corresponding to brand $b$. Example: north face black jacket, BRAND: 0.8, 0.2. Matching Strategies: • Attribute completely contained in query • Match after removing stop words, prepositions etc. • Partial query-attribute match
  18. QUERY TAGGING - SUMMARY. Components: Query Logs, Product Catalogue, Training Data Generator, Conditional Random Field (TRAIN), Search Engine (DEPLOY), Manual Overrides • Context aware: “Philosophy books” v/s “Philosophy face wash” • Different formulations of same entity: “Marc by Marc Jacobs” v/s “Marc Jacobs”
  19. Query Understanding Team • Palo Alto, California • Munich, Germany • Tokyo, Japan • Beijing, China. Acknowledgements: Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta, Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown
  20. ACCESSORY QUERY CLASSIFIER
  21. ACCESSORY QUERY CLASSIFIER. Base Product DB, Base Query Corpus: macbook pro; macs; apple laptop; laptop. Accessory Product DB, Accessory Query Corpus: mac ram; apple sleeve; laptop cover; apple skin. Binary Classifier (Class A, Class B), Search Engine
  22. ACCESSORY QUERY CLASSIFIER
  23. ACCESSORY QUERY CLASSIFIER. Base Product DB, Base Query Corpus: paperwhite; kindle; amazon tablet; book reader. Accessory Product DB, Accessory Query Corpus: Kindle case; Kindle cover; Amazon cover; case for kindle. Binary Classifier (Class A, Class B), Search Engine
  24. QUERY TAGGING - SUMMARY. Components: Query Logs, Product Catalogue, Training Data Generator, Conditional Random Field (TRAIN), Search Engine (DEPLOY), Manual Overrides. Validation Techniques: • Offline validation: cross validation with an 80/20 split; manual gold standard evaluation • A/B test: Control, before the model was deployed; Treatment, after the model is deployed
  25. COMPARISON TO DICTIONARY LOOKUP METHODS • Dictionary methods are not context aware. Example: for “philosophy books”, a dictionary method will tag “philosophy” as a brand. • Dictionary methods fail to detect different formulations of the same entity. Example: “mk” vs. “michael kors”. Our system improved precision over the baseline by 10% and approximately doubled the recall.
  26. GLOBAL REACH
  27. GENERATIVE v/s DISCRIMINATIVE MODELS: $P(Y, X)$ v/s $P(Y \mid X)$
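
The generative versus discriminative contrast on the last slide can be written out as follows. This is the standard textbook formulation, with the linear-chain CRF form as given in the tutorial recommended on slide 14. The symbols $\theta_k$, $f_k$ and $Z(x)$ are notation introduced here, and the slides do not say which CRF variant or feature set the deployed tagger uses.

```latex
% Generative models describe the joint distribution, which also
% requires a model of the inputs through P(X | Y):
P(Y, X) = P(Y)\, P(X \mid Y)

% Discriminative models describe only the conditional distribution
% and place no model on P(X):
P(Y \mid X)

% Linear-chain CRF over a query x = (x_1, \dots, x_T) with
% B/I/O labels y = (y_1, \dots, y_T):
P(y \mid x) = \frac{1}{Z(x)}
  \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \theta_k\, f_k(y_t, y_{t-1}, x_t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \theta_k\, f_k(y'_t, y'_{t-1}, x_t) \Big)
```

Each feature function $f_k$ corresponds to an indicator such as "word is capitalized" or "previous word is Mrs", and the weights $\theta_k$ are what training estimates.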
