Data science plays an important role across many departments at eBay, including search, recommendations, fraud detection, and more. The document discusses three case studies:
1. Query categorization uses deep learning models to predict relevant product categories for queries to improve search results.
2. Personalized query autocompletion ranks suggestions based on a user's search history and context to provide more relevant recommendations.
3. Spell correction efficiently generates and ranks candidate corrections using language models and error models to identify the most likely corrections for queries.
4. What is Data Science?
Data science is a multidisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and insights
from structured and unstructured data - Wikipedia
6. Why Data Science?
● Empowering management to make better decisions
● Directing actions based on trends, which in turn helps to define goals
● Challenging the staff to adopt best practices and focus on issues that matter
● Identifying opportunities
● Decision making with quantifiable, data-driven evidence
● Testing these decisions
● Identification and refining of target audiences
● Recruiting the right talent for the organization
Reference: https://www.simplilearn.com/why-and-how-data-science-matters-to-business-article
30. What is Query Categorization?
● Predict relevant product categories given a query
● Use high-confidence predictions to filter product listings
● Use confidence scores of the predictions to influence ranking
32. Deep Semantic Similarity Model
Huang, He, Gao, Deng, Acero, Heck, “Learning deep structured semantic models for web search using clickthrough data”, CIKM, 2013
33. eBay Query Categorization
● Based on Convolutional Latent Semantic Model (CLSM)
○ Shen, He, Gao, Deng, Mesnil, “A latent semantic model with convolutional-pooling structure for information retrieval”, CIKM 2014
● Maximize the posterior probability of a category given a query
34. Training - Data Collection
● Collect query–product data with category, clicks, and transactions
● Confident set: queries with >= 90% of products in a single category
● Ambiguous set: remaining queries, subsampled by popularity
● Split into train/validation data
● Test data: confident set from a future period
35. Query Categorization in Action
● Directly use historic data when there is a sufficient amount
● Use an experimentally determined confidence score threshold to pick top predictions
● Fall back to the parent category or the entire inventory when there are no high-confidence predictions
● Baseline = ngrams + BM25 + attribute filtration
● Absolute scale obfuscated
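The threshold-and-fallback logic above can be sketched as follows. This is a minimal illustration, not eBay's production code: the threshold value, category names, and the `parent_of` mapping are all hypothetical.

```python
# Sketch of confidence-threshold filtering with fallback.
# The threshold is hypothetical; the slides say it is chosen experimentally.
CONFIDENCE_THRESHOLD = 0.85

def pick_categories(predictions, parent_of, threshold=CONFIDENCE_THRESHOLD):
    """predictions: list of (category, confidence), sorted descending.
    Returns categories to filter listings by, falling back to the parent
    category or the entire inventory when nothing is confident."""
    confident = [cat for cat, score in predictions if score >= threshold]
    if confident:
        return confident
    # Fallback 1: parent category of the best guess, if known
    if predictions:
        best_cat, _ = predictions[0]
        parent = parent_of.get(best_cat)
        if parent is not None:
            return [parent]
    # Fallback 2: no category filter at all (entire inventory)
    return []

preds = [("Cameras > DSLR", 0.91), ("Cameras > Lenses", 0.06)]
print(pick_categories(preds, parent_of={"Cameras > DSLR": "Cameras"}))
# -> ['Cameras > DSLR']
```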
36. FastCat - Faster Training & Inference
● Based on (Joulin et al., “Bag of tricks for efficient text classification”, arXiv, 2016)
○ Shallow network but deep learning - no feature engineering
○ Bag of ngrams as input
○ Hierarchical softmax in the output layer: log₂ V outputs to evaluate
● Data collected as before
● Training time: 20X faster
● Inference time: < 1 ms
● Runs on commodity hardware
● Comparable accuracy
[Figure: query ngrams W1, W2, …, Wn-1, Wn feed into a hidden layer that outputs a category]
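As a minimal illustration of the "bag of ngrams as input" idea (not eBay's FastCat code, and with made-up bucket counts and ngram sizes), a query can be mapped to hashed character-ngram bucket IDs; in a fastText-style model each bucket would index an embedding row, and the averaged rows feed a (hierarchical) softmax:

```python
import zlib

def ngram_buckets(query, n_sizes=(2, 3), num_buckets=1_000_000):
    """Map a query to a bag of hashed character-ngram bucket IDs.
    Uses crc32 so the hashing is deterministic across runs."""
    text = f"<{query.lower()}>"  # boundary markers, as fastText uses
    buckets = []
    for n in n_sizes:
        for i in range(len(text) - n + 1):
            gram = text[i:i + n].encode("utf-8")
            buckets.append(zlib.crc32(gram) % num_buckets)
    return buckets

# "<iphone case>" has 13 characters: 12 bigrams + 11 trigrams = 23 buckets
print(len(ngram_buckets("iphone case")))
# -> 23
```

The hierarchical softmax then only needs to evaluate on the order of log₂ V of the V category outputs per query, which is where most of the inference speedup comes from.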
40. Why?
● Saves time for users
● Guides users to reach their products faster
● Avoids spelling errors
● Can help promote top products
41. Why is it Challenging & Fun?
● Millions of users
● A humongous number of queries per second
● Show relevant suggestions to users
● Detect spelling errors and provide corrected suggestions
42. Most Popular Completions - Overview
[Diagram: user prefix → Most Popular Completions (MPC) over query data → get top N queries]
43. Most Popular Completions - Naive Approach
● Show queries matching the prefix, ranked by popularity
● Popularity can be frequency or sales
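The naive approach can be sketched over a toy query log (the queries and counts below are made up; a real system aggregates billions of queries offline):

```python
from collections import Counter

# Toy query log with popularity = raw frequency.
query_counts = Counter({
    "iphone case": 900, "iphone charger": 700,
    "iphone 12": 400, "ipad case": 300,
})

def most_popular_completions(prefix, n=10):
    """Return the top-n logged queries starting with the prefix,
    ranked by popularity."""
    matches = [(q, c) for q, c in query_counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda qc: -qc[1])
    return [q for q, _ in matches[:n]]

print(most_popular_completions("iphone", n=2))
# -> ['iphone case', 'iphone charger']
```

Production systems replace the linear scan with a prefix index (trie, DAWG, etc.) so lookups stay fast at this scale.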
44. Personalized Query Autocompletion
● A user's queries in a session revolve around one or more intents
● Global query completions may be sub-optimal
Example session: Dslr camera → Canon dslr camera → Canon 5D Mark IV → Canon lenses
45. Personalized Re-Ranker Overview
[Diagram: user prefix → Most Popular Completions (MPC) over query data → top N queries → re-ranker using query features and user features]
Manojkumar Rangasamy Kannadasan, Grigor Aslanyan, “Personalized Query Auto-Completion Through a Lightweight Representation of the User Context” [Under Review]
46. Data Collection
● Billions of User Sessions
● Capture user behavioral activity
○ Prefix
○ Query Clicked from Autocomplete
○ Previous Queries issued by user
○ Queries viewed and not clicked
○ Global performance of the query
48. Understanding User Context
● Features computed based on previous queries issued by the user
○ Textual features like ngrams, # of terms, frequency, session-based features, etc.
○ Similarity features based on text
○ Similarity features based on Vector representations
● Query Vectors can be learned by
○ Supervised - query transitions, queries from product co-clicks
○ Unsupervised - Word2Vec, fastText, GloVe
49. Model Training
● Positive Samples
○ Queries clicked in Autocomplete
● Negative Samples
○ Queries viewed and not clicked in Autocomplete
● Train a Machine Learned Ranking Model
○ Ref: https://en.wikipedia.org/wiki/Learning_to_rank
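Using clicked suggestions as positives and viewed-but-skipped ones as negatives, a pairwise learning-to-rank dataset can be constructed as sketched below. This is an illustration with made-up log entries; feature extraction (the query and user features from the previous slides) is left out:

```python
def pairwise_examples(impressions):
    """Turn autocomplete impressions into pairwise LTR training data.
    impressions: list of (prefix, clicked_query, viewed_queries).
    Yields (prefix, positive, negative) triples: a ranking model is
    trained to score `positive` above `negative` for the same prefix."""
    for prefix, clicked, viewed in impressions:
        for other in viewed:
            if other != clicked:
                yield (prefix, clicked, other)

logs = [("iph", "iphone case", ["iphone case", "iphone 12", "ipad case"])]
print(list(pairwise_examples(logs)))
# -> [('iph', 'iphone case', 'iphone 12'), ('iph', 'iphone case', 'ipad case')]
```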
50. Evaluation
● MRR, Success Rate, MAP & nDCG
○ 20% - 30% lift over MPC**
○ 5% - 10% lift over non-personalized re-ranker**
** Manojkumar Rangasamy Kannadasan, Grigor Aslanyan, “Personalized Query Auto-Completion Through a Lightweight Representation of the User Context” [Under Review]
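Mean Reciprocal Rank, one of the metrics listed above, rewards placing the clicked query near the top of the suggestion list; a standard implementation (the example sessions are made up):

```python
def mean_reciprocal_rank(results):
    """results: list of (ranked_suggestions, clicked_query) pairs.
    MRR = average of 1/rank of the clicked item (0 if absent)."""
    total = 0.0
    for ranked, clicked in results:
        if clicked in ranked:
            total += 1.0 / (ranked.index(clicked) + 1)
    return total / len(results)

sessions = [
    (["iphone case", "iphone 12"], "iphone 12"),   # rank 2 -> 0.5
    (["ipad case", "ipad pro"], "ipad case"),      # rank 1 -> 1.0
]
print(mean_reciprocal_rank(sessions))
# -> 0.75
```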
55. Why is it Challenging and Fun?
● Special - Query spell correction for user-generated item information
● Big - Millions of users, billions of items
● Efficiency - Need to process a humongous number of queries per second
● Precision - Suggest the right correction for the right query
62. Efficiency
● Generate only the candidates we know?
[Figure: a trie containing “tap”, “taps”, “top”, “tops”; source: http://ajainarayanan.github.io/ctrlf/]
Ehsan Shareghi, Matthias Petri, Gholamreza Haffari, Trevor Cohn, “Compact, Efficient and Unlimited Capacity: Language Modeling with Compressed Suffix Trees”, EMNLP 2015
63. Efficiency - Which one?
● Naïve: Slow, no memory footprint, unnecessary candidates (?)
● Trie: Faster, Huge memory footprint
● DAWG: Even Faster, Not-that-huge memory footprint
● Suffix Trees (not compressed): Humongous memory footprint
● Suffix Trees (compressed): Slowest, very small memory footprint
64. Language Model
● How likely is the candidate - p(c)?
● p(c1 c2 c3 … cn)? = p(levis blue jeans 32 in)?
● Naive algorithm - look for the number of occurrences of the given query
○ What if we have never seen the query?
○ Long queries will have low counts, leading to poor probability estimates
● Markov assumption - second order
○ p(c1 c2 c3 … cn) = p(c1)p(c2|c1)p(c3|c1 c2) … p(cn|cn-2 cn-1)
65. Language Model
● p(levis blue jeans 32 in) = p(levis)p(blue|levis)p(jeans|levis blue)p(32|blue jeans)p(in|jeans 32)
● p(blue|levis) = count(levis,blue) / count(levis)
● Now we have to only deal with unigrams, bigrams and trigrams
● There are still issues
○ Words that we have never seen - we still need to assign some probability
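The second-order Markov factorization above can be computed directly from ngram counts. A toy maximum-likelihood version is sketched below; the corpus is made up, and note that unseen ngrams still get zero probability, which is exactly the smoothing issue the slide raises:

```python
from collections import Counter

# Toy corpus; a real system counts ngrams over billions of queries.
corpus = ["levis blue jeans", "levis blue jacket", "blue jeans 32 in"]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for line in corpus:
    words = line.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
    trigrams.update(zip(words, words[1:], words[2:]))

def p(query):
    """MLE probability under a second-order Markov assumption:
    p(c1..cn) = p(c1) p(c2|c1) prod_i p(ci|ci-2 ci-1)."""
    w = query.split()
    total = sum(unigrams.values())
    prob = unigrams[w[0]] / total
    if len(w) > 1:
        prob *= bigrams[(w[0], w[1])] / unigrams[w[0]] if unigrams[w[0]] else 0
    for i in range(2, len(w)):
        tri = trigrams[(w[i - 2], w[i - 1], w[i])]
        bi = bigrams[(w[i - 2], w[i - 1])]
        prob *= tri / bi if bi else 0
    return prob

# p(levis) * p(blue|levis) * p(jeans|levis blue) = 0.2 * 1.0 * 0.5
print(p("levis blue jeans"))
# -> 0.1
```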
66. Error Model
● p(query|correction)?
● How likely is it that the user intended to type the correction but typed the query?
● Multiple ways to estimate this
○ Keyboard distance
○ Phonetic distance
○ Mine your logs
67. Error Model
Industry approach
● To train an error model we need triples of (intended word, observed word, count)
● We would expect
○ p(the|the) to be very high
○ p(teh|the) to be relatively high
○ p(hippopotamus|the) to be extremely low
68. Error Model
● Get 10 million most frequent unigrams
● Get all the candidates at certain edit distance (depending on word length)
● This gives a huge list of tuples like <apple, applo>
● Assumption: the top 10 million unigrams are generally correct
● Prune this list based on frequency - e.g. apple should be at least 10x more frequent than applo
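Candidate pairs at a small edit distance can be generated Norvig-style; a sketch for distance 1 is below (the slides vary the allowed distance with word length, and the `known` set here is a tiny stand-in for the 10 million unigram list):

```python
import string

def edits1(word, alphabet=string.ascii_lowercase):
    """All strings at edit distance 1 from `word`
    (deletes, transposes, replaces, inserts)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

known = {"apple", "ample", "apply"}   # stand-in for the 10M unigram list
print(sorted(edits1("applo") & known))
# -> ['apple', 'apply']
```

Intersecting the generated candidates with the known-unigram list is what keeps the tuple list from exploding.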
71. Students & Recent Graduates
https://careers.ebayinc.com/join-our-team/students-recent-graduates/
Start your Career @ eBay
https://careers.ebayinc.com/join-our-team/start-your-search/
74. Language Model
● p(levis blue jeans 32 in) = p(levis)p(blue|levis)p(jeans|levis blue)p(32|blue jeans)p(in|jeans 32)
● p(blue|levis) = count(levis,blue) / count(levis)
● Now we have to only deal with unigrams, bigrams and trigrams
● There are still issues
○ Words that we have never seen - we still need to assign some probability
○ Adjustment of probabilities to demote high-frequency words - the, a, etc.
○ Backoff scores - KenLM (https://kheafield.com/code/kenlm/)
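KenLM implements Kneser-Ney smoothed backoff; a much simpler illustration of the backoff idea is "stupid backoff" (Brants et al.), sketched here with made-up counts — when a trigram is unseen, fall back to the bigram score scaled by a constant penalty:

```python
from collections import Counter

# Toy ngram counts; real models are estimated from massive query logs.
unigrams = Counter({"levis": 5, "blue": 8, "jeans": 6})
bigrams = Counter({("levis", "blue"): 4, ("blue", "jeans"): 5})
trigrams = Counter({("levis", "blue", "jeans"): 3})
TOTAL = sum(unigrams.values())
ALPHA = 0.4  # backoff penalty suggested in the stupid-backoff paper

def score(w, context):
    """Stupid-backoff score of word w given up to two context words.
    Not a true probability, but usable for ranking candidates."""
    if len(context) == 2 and trigrams[(*context, w)]:
        return trigrams[(*context, w)] / bigrams[context]
    if context and bigrams[(context[-1], w)]:
        return ALPHA * bigrams[(context[-1], w)] / unigrams[context[-1]]
    return ALPHA * ALPHA * unigrams[w] / TOTAL

print(score("jeans", ("levis", "blue")))   # seen trigram: 3/4 = 0.75
print(score("jeans", ("denim", "blue")))   # backs off: 0.4 * 5/8 = 0.25
```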