Query Understanding & Voice Search
William Yu & Charles Zhang
StubHub
• Mission: to bring the joy of live events to fans globally
• Acquired by eBay in 2007
• World’s largest ticket marketplace
• About 1 ticket is sold on StubHub every 1.3 seconds
• Every day, StubHub sends 80,000+ fans to events
• Present in 48 countries
• 200+ partnerships worldwide
• All 30 MLB teams
• NFL, NBA, NHL, MLS, NCAA, and others
Search Architecture
Example Queries
• “Giants”
• “The the”
• “The white elephants”
• “Concerts this weekend”
• “Find events in San Francisco under $50”
Example Queries - Challenges
• “giants”
• entity disambiguation – New York Giants vs San Francisco Giants
• “the the”
• relevancy – more on this later
• “the white elephants”
• alias detection – a nickname for the Oakland Athletics
• “concerts this weekend”
• entity detection – concerts [category] this weekend [date/time]
• “find events in san francisco under $50”
• entity detection – find events [category] in san francisco [city] under $50 [price]
Bag of Words approach
Example query: “Taylor Swift concerts”
• Tokenize: “Taylor”, “Swift”, “concerts”
• Remove stop words: “Taylor”, “Swift”, “concerts”
Problems:
• “Giants game” vs “the game”
• “game” acts as a stop word in the first query but is part of an artist's name (The Game) in the second
• “the the band”
• excluding “the” removes all information
• including “the” returns all results with “the” and “band”
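The bag-of-words pipeline above can be sketched in a few lines. This is a minimal illustration, assuming a naive whitespace tokenizer and a small hypothetical stop-word list (not StubHub's actual list) that treats “game” as noise, as the slide's first problem case implies.

```python
# Hypothetical stop-word list for illustration; note that it treats "game" as noise.
STOP_WORDS = {"the", "a", "an", "for", "in", "at", "game"}


def tokenize(query: str) -> list[str]:
    """Lowercase the query and split it on whitespace (a deliberately naive tokenizer)."""
    return query.lower().split()


def bag_of_words(query: str) -> list[str]:
    """Tokenize, then drop every stop word regardless of context."""
    return [token for token in tokenize(query) if token not in STOP_WORDS]


if __name__ == "__main__":
    for q in ["Taylor Swift concerts", "Giants game", "the game", "the the", "the the band"]:
        print(f"{q!r} -> {bag_of_words(q)}")
```

Under these assumptions “the game” and “the the” both reduce to an empty query, and “the the band” keeps only “band”: context-free stop-word removal cannot tell a filler word from part of a performer name.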
Query Understanding & Entity Detection
Making sense of the query and sending the extracted entities to Solr as a structured query
e.g., “find giants tickets this weekend at at&t park”
• find giants [performer] tickets this weekend [date] at at&t park [venue]
• “giants” -> PerformerId:197
• “this weekend” -> [2018-09-08T00:00:00.000Z TO 2018-09-10T00:00:00.000Z]
• “at&t park” -> VenueId:82
Ambiguity
• Conflicts between entities:
• “bruno mars weekend”
• “red sky july”
• “steve march”
• e.g., “steve march” -> “steve [performer] march [date]” or “steve march [performer]”
• Solution: gather more user query data
• Bootstrapping and encouraging user behavior with a conservative approach
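The ambiguity above comes from a query having more than one valid segmentation. The sketch below enumerates every segmentation against two small hypothetical gazetteers and, as one possible reading of the "conservative approach", commits to a parse only when exactly one exists. The lexicons and the fallback policy are assumptions for illustration.

```python
# Hypothetical gazetteers for illustration; not StubHub's catalog.
PERFORMERS = {"steve", "steve march", "red sky", "red sky july", "bruno mars"}
DATE_TERMS = {"march", "july", "weekend"}
LEXICONS = (("performer", PERFORMERS), ("date", DATE_TERMS))


def candidate_parses(tokens: list[str]) -> list[list[tuple[str, str]]]:
    """Enumerate every segmentation of `tokens` into known performer or date phrases."""
    if not tokens:
        return [[]]
    parses = []
    for end in range(1, len(tokens) + 1):
        phrase = " ".join(tokens[:end])
        for label, lexicon in LEXICONS:
            if phrase in lexicon:
                # Label this prefix, then recursively parse the remaining tokens.
                for rest in candidate_parses(tokens[end:]):
                    parses.append([(phrase, label)] + rest)
    return parses


def conservative_parse(query: str):
    """Commit to a parse only when exactly one exists; otherwise return None
    so the caller can fall back to plain keyword search."""
    parses = candidate_parses(query.lower().split())
    return parses[0] if len(parses) == 1 else None


if __name__ == "__main__":
    for q in ["steve march", "red sky july", "bruno mars"]:
        print(q, "->", candidate_parses(q.split()), "| chosen:", conservative_parse(q))
```

With these lexicons, “steve march” and “red sky july” each yield the two readings listed on the slide, so the conservative policy declines to pick one.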
Steve March
“steve march” -> “steve [performer] march [date]” or “steve march [performer]”
Red Sky July
“red sky july” -> “red sky [performer] july [date]” or “red sky july [performer]”
Skewed performer searches
[Chart: performers searched vs. search count]
Approach
• Query -> Query Classifier
• Precise query -> Rule-based NER
• Conversational query -> Machine Learning NER
Query Classification
• Differentiate between “precise” and “conversational” queries
• Precise -> “giants this weekend”
• Conversational -> “find me a giants game in new york happening this weekend”
• WEKA Naïve Bayes classifier
• 96% accuracy on generated queries, spot-checked on a few randomly selected queries
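WEKA is a Java toolkit, so rather than guess at its API, the sketch below is a from-scratch multinomial Naive Bayes in Python with add-one smoothing. It is not WEKA's implementation. The training queries are a tiny hypothetical set, and `route` is a hypothetical helper that sends precise queries to rule-based NER and conversational ones to the machine-learning NER.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, queries: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for query, label in zip(queries, labels):
            self.word_counts[label].update(query.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, query: str, c: str) -> float:
        """log P(c) + sum of log P(word | c); words never seen in training are skipped."""
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for word in query.lower().split():
            if word in self.vocab:
                score += math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, query: str) -> str:
        return max(self.classes, key=lambda c: self.log_score(query, c))


# Tiny hypothetical training set; the talk reports results on generated queries.
TRAIN = [
    ("giants this weekend", "precise"),
    ("maroon 5 next month", "precise"),
    ("foo fighters under $200", "precise"),
    ("warriors tickets", "precise"),
    ("find me a giants game in new york happening this weekend", "conversational"),
    ("i want to see taylor swift next month", "conversational"),
    ("show me concerts in san francisco under $50", "conversational"),
    ("what events are happening near me tonight", "conversational"),
]


def route(query: str, clf: NaiveBayes) -> str:
    """Hypothetical router: precise -> rule-based NER, conversational -> ML (CRF) NER."""
    return "rule_based_ner" if clf.predict(query) == "precise" else "ml_ner"


if __name__ == "__main__":
    clf = NaiveBayes().fit([q for q, _ in TRAIN], [label for _, label in TRAIN])
    for q in ["warriors tickets", "find me a ticket"]:
        print(q, "->", clf.predict(q), "->", route(q, clf))
```

Function words such as “find”, “me”, and “a” carry most of the conversational signal here, which is why a simple word-count model can separate the two classes reasonably well.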
Rule-based Entity Detection
• Based on predefined rules or patterns and lookup
• Not particularly accurate (~70%)
• Conservative approach: returns few false positives
• e.g.,
• PERFORMER, CONJUNCTION, PERFORMER, PRICE
“sf giants vs Oakland a’s under 30”
• PERFORMER, DATE, PRICE
“maroon 5 next month under $25”
• PERFORMER, PRICE
“foo fighters under $200”
• UNKNOWN, DATE, PRICE
“tickets for this weekend under $20”
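A minimal sketch of the rule-based idea above: greedy longest-match lookup against a hypothetical lexicon, a regular expression for “under <amount>” prices, and acceptance only when the resulting tag sequence matches one of the patterns listed on the slide. The lexicon, the price rule, and the merging of unknown words are assumptions for illustration.

```python
import re

# Hypothetical lookup table; a real system would load entities from the catalog.
LEXICON = {
    "sf giants": "PERFORMER",
    "oakland a's": "PERFORMER",
    "maroon 5": "PERFORMER",
    "foo fighters": "PERFORMER",
    "vs": "CONJUNCTION",
    "this weekend": "DATE",
    "next month": "DATE",
}
# Only these tag sequences are accepted (taken from the examples above).
PATTERNS = {
    ("PERFORMER", "CONJUNCTION", "PERFORMER", "PRICE"),
    ("PERFORMER", "DATE", "PRICE"),
    ("PERFORMER", "PRICE"),
    ("UNKNOWN", "DATE", "PRICE"),
}
PRICE_RE = re.compile(r"\$?\d+(\.\d{2})?")
MAX_SPAN = 3  # maximum phrase length tried, in tokens


def tag(query: str) -> list[str]:
    """Greedy longest-match lookup; runs of unmatched tokens collapse to one UNKNOWN."""
    tokens = query.lower().replace("\u2019", "'").split()
    tags: list[str] = []
    i = 0
    while i < len(tokens):
        # "under <amount>" is treated as a PRICE entity.
        if tokens[i] == "under" and i + 1 < len(tokens) and PRICE_RE.fullmatch(tokens[i + 1]):
            tags.append("PRICE")
            i += 2
            continue
        for n in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            label = LEXICON.get(" ".join(tokens[i : i + n]))
            if label:
                tags.append(label)
                i += n
                break
        else:  # no lexicon phrase starts at this token
            if not tags or tags[-1] != "UNKNOWN":
                tags.append("UNKNOWN")
            i += 1
    return tags


def detect(query: str):
    """Return the tag sequence only if it matches a known pattern (conservative)."""
    tags = tag(query)
    return tags if tuple(tags) in PATTERNS else None
```

Rejecting any sequence outside the known patterns is what keeps false positives low at the cost of coverage: a query such as “giants tickets” falls through to other handling.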
Stanford NLP & Conditional Random Fields (CRF)
• Find me -> UNKNOWN
• AT&T Park -> VENUE
• at -> UNKNOWN
• this weekend -> DATE/TIME
• giants -> PERFORMER
Training
• Gazettes -> List of entities
• Features
• shape features -> n-grams
• use ordinals
• use class features
• order of CRF
• use word
• use date range
• gazette features
• 27 features in total
• 95% accuracy on generated queries*
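To make the feature list above concrete, here is a sketch of per-token feature extraction in the spirit of a few of those features (use word, word n-grams, neighbouring words, ordinals, date words, gazette membership). It is not the Stanford NER feature set or its configuration, and the gazette and date vocabulary are hypothetical. Label-to-label dependencies (the "class features" in the speaker notes) are assumed to be handled by the CRF's own transition weights, so they are not computed here; the `window` parameter loosely stands in for "how many words to look at".

```python
import re

# Hypothetical gazette and date vocabulary, for illustration only.
GAZETTE = {"giants": "PERFORMER", "maroon 5": "PERFORMER", "at&t park": "VENUE"}
DATE_WORDS = {"today", "tonight", "tomorrow", "weekend", "month", "october"}
ORDINAL_RE = re.compile(r"(first|second|third|fourth|fifth|\d+(st|nd|rd|th))")

# Map each gazette token to the entity types it can belong to.
GAZETTE_TOKENS: dict[str, set[str]] = {}
for phrase, label in GAZETTE.items():
    for tok in phrase.split():
        GAZETTE_TOKENS.setdefault(tok, set()).add(label)


def token_features(tokens: list[str], i: int, window: int = 1) -> set[str]:
    """Binary feature strings for token i."""
    words = [t.lower() for t in tokens]
    w = words[i]
    feats = {f"word={w}"}  # "use word"
    for n in (2, 3):  # word n-grams ending at i (bigram, trigram)
        if i - n + 1 >= 0:
            feats.add(f"{n}gram=" + "_".join(words[i - n + 1 : i + 1]))
    for off in range(1, window + 1):  # neighbouring words within the window
        if i - off >= 0:
            feats.add(f"w[-{off}]={words[i - off]}")
        if i + off < len(words):
            feats.add(f"w[+{off}]={words[i + off]}")
    if ORDINAL_RE.fullmatch(w):
        feats.add("is_ordinal")
    if w in DATE_WORDS:
        feats.add("date_word")
    for label in GAZETTE_TOKENS.get(w, ()):
        feats.add(f"gazette={label}")
    return feats


def sentence_features(tokens: list[str]) -> list[set[str]]:
    """One feature set per token, ready to pair with gold labels for CRF training."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A CRF trainer would consume these per-token feature sets together with the gold entity labels for each generated query.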
Performer Disambiguation
• giants -> San Francisco Giants, New York Giants, San Jose Giants etc.
• Resolved using user click counts on search suggestions, broken down by user location
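Location-aware disambiguation from suggestion clicks might look like the sketch below: count clicks per (location, query), pick the most-clicked performer for the user's location, and fall back to counts across all locations. The click log and the fallback rule are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical suggestion-click log: (user location, query, performer clicked).
CLICK_LOG = (
    [("san francisco", "giants", "San Francisco Giants")] * 3
    + [("san francisco", "giants", "New York Giants")] * 1
    + [("new york", "giants", "New York Giants")] * 3
    + [("san jose", "giants", "San Jose Giants")] * 2
)


def build_counts(log):
    """Count clicks per (location, query) and per query across all locations."""
    local = defaultdict(Counter)
    overall = defaultdict(Counter)
    for location, query, performer in log:
        local[(location, query)][performer] += 1
        overall[query][performer] += 1
    return local, overall


def disambiguate(query, location, local, overall):
    """Prefer the most-clicked performer for this location; fall back to all locations."""
    for counts in (local.get((location, query)), overall.get(query)):
        if counts:
            return counts.most_common(1)[0][0]
    return None


if __name__ == "__main__":
    local, overall = build_counts(CLICK_LOG)
    for loc in ["san francisco", "san jose", "chicago"]:
        print(loc, "->", disambiguate("giants", loc, local, overall))
```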
Alias Detection
• Alias generation on index side
• e.g., “the white elephants” for Oakland Athletics
• e.g., “the boys from the bay” for San Francisco Giants
• Conservative approach to alias generation
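Index-side alias generation can be illustrated by attaching curated aliases to each performer document and resolving a query only on an exact, whole-query match. The sketch keeps everything in plain Python; in Solr this role is often played by an extra indexed field or index-time synonyms. The alias table and the performer IDs are hypothetical.

```python
# Hypothetical curated alias table; only hand-reviewed aliases are added (conservative).
ALIASES = {
    "Oakland Athletics": ["the white elephants"],
    "San Francisco Giants": ["the boys from the bay"],
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def index_documents(performers: dict[int, str]) -> list[dict]:
    """Index side: each performer document carries its name plus any curated aliases."""
    return [
        {"id": pid, "name": name, "aliases": [normalize(a) for a in ALIASES.get(name, [])]}
        for pid, name in performers.items()
    ]


def build_lookup(docs: list[dict]) -> dict[str, set[int]]:
    """Map every normalized name and alias to the ids of documents that carry it."""
    lookup: dict[str, set[int]] = {}
    for doc in docs:
        for phrase in [normalize(doc["name"]), *doc["aliases"]]:
            lookup.setdefault(phrase, set()).add(doc["id"])
    return lookup


def match(query: str, lookup: dict[str, set[int]]) -> set[int]:
    """Only an exact whole-query match on a name or alias resolves to a performer."""
    return lookup.get(normalize(query), set())
```

Requiring an exact match means a partial phrase such as “white elephants” does not resolve, trading recall for fewer surprising results.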
StubHub Search Team
• Engineering Manager: Charles Zhang
• Software Engineer: William Yu
• Sr. Staff Software Engineer: Rui Niu
• Software Engineer: Mrugen Deshmukh
• Software Engineer: Ankit Patil
• Software Engineer Intern: Akhilesh Devowanshi
THANK YOU!!
Ticket Search By Voice and NLP at Stubhub - Charles Zhang & William Yu, Stubhub

Editor's Notes

  1. Give more context prior to introducing these examples (e.g., StubHub has the largest catalog of performers and events, and thus faces a unique problem)
  2. Shape features: bigrams, trigrams, etc. Ordinals: first, second, third. Class features: the label of the previous word (i.e., its entity type). Order of CRF: how many words to look at (order=2 means use two words). Use word: e.g., “giants” is almost always a performer, so bias it toward PERFORMER. Use date range: e.g., “this weekend”, “in October”. Gazette features: the list of entities that we support.