2. StubHub
• Mission: bring the joy of live events to fans globally
• Acquired by eBay in 2007
• World’s largest ticket marketplace
• About 1 ticket is sold on StubHub every 1.3 seconds
• Every day, StubHub sends 80,000+ fans to events
• Present in 48 countries
• 200+ partnerships worldwide
• All 30 MLB teams
• NFL, NBA, NHL, MLS, NCAA, and others
4. Example Queries
• “Giants”
• “The the”
• “The white elephants”
• “Concerts this weekend”
• “Find events in San Francisco under $50”
5. Example Queries - Challenges
• “giants”
• entity disambiguation – New York Giants vs San Francisco Giants
• “the the”
• relevancy – more on this later
• “the white elephants”
• alias detection – a nickname for the Oakland Athletics
• “concerts this weekend”
• entity detection – concerts [category] this weekend [date/time]
• “find events in san francisco under $50”
• entity detection – find events [category] in san francisco [city] under
$50 [price]
6. Bag of Words approach
Example query: “Taylor Swift concerts”
• Tokenize: “Taylor”, “Swift”, “concerts”
• Remove stop words: no stop words present, so tokens are unchanged: “Taylor”, “Swift”, “concerts”
Problems:
• “Giants game” vs “the game”
• “game” is a stop word in one case and an artist (the rapper The Game) in the other
• “the the band” (The The is a band)
• excluding “the” removes all information
• including “the” returns every result containing “the” and “band”
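A minimal sketch of the bag-of-words pipeline above, showing how stop-word removal destroys the query for the band The The (the stop-word list here is illustrative; real lists are much longer):

```python
# Illustrative stop-word list, not StubHub's actual list
STOP_WORDS = {"the", "a", "an", "of", "for", "this"}

def bag_of_words(query):
    """Tokenize on whitespace, then drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]

print(bag_of_words("Taylor Swift concerts"))  # ['taylor', 'swift', 'concerts']
print(bag_of_words("the the band"))           # ['band'] -- all signal about The The is lost
```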
7. Query Understanding & Entity Detection
Making sense of the query and sending results to Solr
e.g., “find giants tickets this weekend at at&t park”
• find giants [Performer] tickets this weekend [date] at at&t park [venue]
• “giants” -> PerformerId:197
• “this weekend” -> [2018-09-08T00:00:00.000Z TO 2018-09-10T00:00:00.000Z]
• “at&t park” -> VenueId: 82
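The resolved entities can then be assembled into a Solr filter query. A sketch under assumptions: `PerformerId` and `VenueId` are the field names shown on the slide, while `EventDate` and the input dict layout are hypothetical:

```python
def to_solr_fq(entities):
    """Build a Solr fq string from resolved entities (field names partly assumed)."""
    clauses = []
    if "performer_id" in entities:
        clauses.append("PerformerId:%d" % entities["performer_id"])
    if "date_range" in entities:
        start, end = entities["date_range"]
        # EventDate is an assumed field name, not from the slides
        clauses.append("EventDate:[%s TO %s]" % (start, end))
    if "venue_id" in entities:
        clauses.append("VenueId:%d" % entities["venue_id"])
    return " AND ".join(clauses)

fq = to_solr_fq({
    "performer_id": 197,
    "date_range": ("2018-09-08T00:00:00.000Z", "2018-09-10T00:00:00.000Z"),
    "venue_id": 82,
})
print(fq)
# PerformerId:197 AND EventDate:[2018-09-08T00:00:00.000Z TO 2018-09-10T00:00:00.000Z] AND VenueId:82
```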
8. Ambiguity
• Conflicts between entities:
• “bruno mars weekend”
• “red sky july”
• “steve march”
• “steve [performer] march [date]” or “steve march [performer]”
• Solution: more user queries
• Bootstrapping and encouraging user behavior with a conservative approach
13. Query Classification
• Differentiate between “precise” and “conversational” queries
• Precise -> “giants this weekend”
• Conversational -> “find me a giants game in new york happening this weekend”
• WEKA Naïve Bayes classifier
• Accuracy of 96% on generated queries, with spot-checking on a few randomly selected queries
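The talk used WEKA's (Java) Naïve Bayes classifier; the same idea can be sketched in a few lines of Python. The training examples and smoothing details below are illustrative, not StubHub's:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesQueryClassifier:
    """Multinomial Naive Bayes over query tokens, with Laplace smoothing."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()

    def train(self, query, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(query.lower().split())

    def predict(self, query):
        tokens = query.lower().split()
        total = sum(self.class_counts.values())
        vocab = len({w for c in self.word_counts.values() for w in c})
        best, best_score = None, float("-inf")
        for label, prior in self.class_counts.items():
            counts = self.word_counts[label]
            n = sum(counts.values())
            score = math.log(prior / total)
            for t in tokens:
                # Laplace smoothing so unseen tokens don't zero out the class
                score += math.log((counts[t] + 1) / (n + vocab))
            if score > best_score:
                best, best_score = label, score
        return best

clf = NaiveBayesQueryClassifier()
clf.train("giants this weekend", "precise")
clf.train("maroon 5 under $25", "precise")
clf.train("find me a giants game happening this weekend", "conversational")
clf.train("show me concerts in new york please", "conversational")
print(clf.predict("find me cheap tickets please"))  # conversational
```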
14. Rule-based Entity Detection
• Based on predefined rules or patterns and lookup
• Not particularly accurate (~70%)
• Conservative approach - does not return many false positives
• e.g.,
• PERFORMER, CONJUNCTION, PERFORMER, PRICE
“sf giants vs Oakland a’s under 30”
• PERFORMER, DATE, PRICE
“maroon 5 next month under $25”
• PERFORMER, PRICE
“foo fighters under $200”
• UNKNOWN, DATE, PRICE
“tickets for this weekend under $20”
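A sketch of the rule-based detector described above: replace known spans via gazette/regex lookup, then accept the query only if the resulting label sequence matches a predefined pattern. Gazettes and rules here are trimmed to the slide's examples; the real ones are far larger:

```python
import re

# Tiny illustrative gazettes; real ones come from the full catalog
PERFORMERS = {"sf giants", "oakland a's", "maroon 5", "foo fighters"}
DATES = {"next month", "this weekend"}
CONJUNCTIONS = {"vs", "and"}
RULES = {  # allowed label sequences, from the slide
    ("PERFORMER", "CONJUNCTION", "PERFORMER", "PRICE"),
    ("PERFORMER", "DATE", "PRICE"),
    ("PERFORMER", "PRICE"),
    ("UNKNOWN", "DATE", "PRICE"),
}

def label_sequence(query):
    """Replace known spans with labels, tag the rest, collapse UNKNOWN runs."""
    q = query.lower()
    q = re.sub(r"under \$?\d+", " PRICE ", q)
    for p in PERFORMERS:
        q = q.replace(p, " PERFORMER ")
    for d in DATES:
        q = q.replace(d, " DATE ")
    labels = []
    for tok in q.split():
        if tok in {"PRICE", "PERFORMER", "DATE"}:
            labels.append(tok)
        elif tok in CONJUNCTIONS:
            labels.append("CONJUNCTION")
        else:
            labels.append("UNKNOWN")
    collapsed = []
    for lab in labels:  # "tickets for" should count as one UNKNOWN
        if not (collapsed and lab == "UNKNOWN" and collapsed[-1] == "UNKNOWN"):
            collapsed.append(lab)
    return tuple(collapsed)

def matches(query):
    return label_sequence(query) in RULES

print(matches("sf giants vs oakland a's under 30"))   # True
print(matches("tickets for this weekend under $20"))  # True
```

Because any query whose label sequence is not in `RULES` is simply rejected, the approach is conservative: modest recall (~70%), but few false positives.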
15. Stanford NLP &
Conditional Random Fields (CRF)
Find me [UNKNOWN] giants [PERFORMER] at [UNKNOWN] AT&T Park [VENUE] this weekend [DATE/TIME]
16. Training
• Gazettes -> List of entities
• Features
• shape features -> n-grams
• use ordinals
• use class features
• order of CRF
• use word
• use date range
• gazette features
• 27 features in total
• 95% accuracy on generated queries*
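A sketch of what a per-token CRF feature extractor might look like. The deck does not list the 27 features, so the feature names, the tiny gazette, and the shape heuristic below are illustrative:

```python
# Illustrative single-token gazette; real gazettes are generated from the
# catalog, and multi-word entries (e.g. "at&t park") need span matching,
# which is omitted here.
GAZETTE = {"giants": "PERFORMER"}
ORDINALS = {"first", "second", "third"}

def token_features(tokens, i):
    """Feature dict for token i, computed at every position for the CRF."""
    tok = tokens[i].lower()
    return {
        "word": tok,                                   # "use word": giants is almost always a performer
        "is_ordinal": tok in ORDINALS,                 # "use ordinals"
        "gazette": GAZETTE.get(tok, "O"),              # gazette features
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",  # order-2 context
        "shape": "digits" if tok.isdigit() else "letters",       # crude shape feature
    }

print(token_features("find giants tickets".split(), 1))
```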
17. Performer Disambiguation
• giants -> San Francisco Giants, New York Giants, San Jose Giants etc.
• Disambiguate using user click-count data on suggestions, weighted by user location
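One way to sketch the click-count disambiguation; the counts and the log layout below are invented for illustration:

```python
# Hypothetical click counts on suggestions, keyed by (query, user location)
CLICKS = {
    ("giants", "san francisco"): {"San Francisco Giants": 940,
                                  "San Jose Giants": 45,
                                  "New York Giants": 15},
    ("giants", "new york"): {"New York Giants": 880,
                             "San Francisco Giants": 90},
}

def disambiguate(query, location):
    """Pick the performer that users in this location most often clicked."""
    counts = CLICKS.get((query.lower(), location.lower()))
    return max(counts, key=counts.get) if counts else None

print(disambiguate("giants", "New York"))  # New York Giants
```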
18. Alias Detection
• Alias generation on index side
• e.g., “the white elephants” for Oakland Athletics
• e.g., “the boys from the bay” for San Francisco Giants
• Conservative approach to alias generation
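On the index side, alias generation can be as simple as expanding a curated nickname table into extra searchable names for each performer. A sketch using the slide's two examples:

```python
# Curated alias table (index-side); kept conservative -- only
# well-attested nicknames are added
ALIASES = {
    "the white elephants": "Oakland Athletics",
    "the boys from the bay": "San Francisco Giants",
}

def expand_aliases(performer_names):
    """Return extra (alias -> canonical) index entries for known performers."""
    known = set(performer_names)
    return {alias: canon for alias, canon in ALIASES.items() if canon in known}

print(expand_aliases(["Oakland Athletics", "San Francisco Giants"]))
```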
Notes on the training features (slide 16):
• Shape features: bigram / trigram / etc.
• Ordinals: first, second, third
• Class features: label of the previous word (i.e., entity type)
• Order of CRF: how many words to look at (order=2 means use two words)
• Use word: e.g., “giants” is almost always the performer, so give a bias towards performer
• Use date range: e.g., this weekend, in October, etc.
• Gazette features: lists of the entities that we support