Data-Driven Approach to Search Relevance
Eric Melz
Measured Search
Praveena Subrahmanyam
Ticketmaster
Los Angeles Search, Data, and Analytics Meetup
June 26, 2017
1
2
About the Speakers
Praveena Subrahmanyam
• Senior Architect and Search Lead at Ticketmaster

• ~ 2 years at Ticketmaster

• Geek, Mom, Travel enthusiast

Eric Melz
• Head of Engineering at Measured Search

• Over 20 years in tech - LinkedIn, Google, Oracle, etc.

• Used to work at Ticketmaster
3
About Ticketmaster
The World’s Leading Live Entertainment Company
• A Live Nation Company

• Founded over 40 years ago

• Selling over 400 million tickets each year

• Supporting 240K events, 200K attractions and 100K venues across 80+ countries

• Open APIs

• Follow us @ticketmaster
4
• Search is the most-used feature on the homepage

• 50-60% of sessions use search
Search at Ticketmaster
5
Challenges
• Relevancy

• Text Relevancy

• Popularity

• Geo

• Personalization

• Fix one thing, break another!

• Long tail

• Performance 

• Index

• Query

• Scale

• Documents

• QPS

• Multilingual Documents

• Storing

• Querying
6
• Exploratory

• Manual Testing

• Reports

• Feedback

• Social Media

• Internal

• Dev Jams

• Data Driven
Approaches
7
Measured Search Overview
•Intro

•A/B Testing

•A/B Testing for Search

•Model Simulation

•Ticketmaster Model Simulator
8
Measured Search

Measured Search® enables companies to elevate the experience of Search-based applications faster and with more confidence.

Accelerate your timeline
SearchStax: Open Source-based Platform-as-a-Service
Accelerate your time to market by flattening the Solr learning curve and going straight to development. Focus on your search application and save months of headaches in setup, provisioning, production readiness and administration.

Peace of Mind
Managed Services and Support
Our always-ready Solr experts are only a call or an email away – every day, all day and night, all year round. Enjoy peace of mind with fully managed Solr-as-a-Service.

On-Demand Expertise
Highly Skilled and Experienced Open Source Search Experts
Our engineers have decades of experience and have delivered numerous engagements in the field of search, analytics and machine learning. These same search experts are available on an ad hoc basis to help ensure your project's success.
9
A / B Testing
10
A / B Testing - Fundamentals
Split User population into Segments
Each Segment sees a different variant
• Control - existing version (“A”)

• Treatment - proposed version (“B”)
Variable - metric we hope improves in the treatment group
11
A / B Testing - Example
Split Users into Segments
• segmentId = userId mod 2
Each Segment sees a different variant
• Control - existing version (“A”)

• Blue Button

• Treatment - proposed version (“B”)

• Green Button
Variable - metric we hope improves in the treatment group

• Click rate
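To make the segmenting rule above concrete, here is a minimal Python sketch of deterministic variant assignment. It is an illustration only: the hashing step and the variant names are assumptions, not the production implementation.

```python
# Minimal sketch: deterministic A/B segment assignment.
# Hashing the user ID first avoids bias when IDs are assigned sequentially.
import hashlib

VARIANTS = ["control", "treatment"]  # "A" (blue button), "B" (green button)

def assign_variant(user_id: int) -> str:
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    segment_id = int(digest, 16) % len(VARIANTS)   # segmentId = hash(userId) mod 2
    return VARIANTS[segment_id]

# The same user always lands in the same segment.
assert assign_variant(42) == assign_variant(42)
```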
12
Search - Fundamentals
Example: the query "Paul M" returns 1. Paul McCartney, 2. Paul Manafort, 3. Justin Bieber

• Query (aka Search) - the text the user typed ("Paul M")

• Result Set - the ranked list of items returned

• Result Item - a single entry in the list (e.g. Paul McCartney)

• Rank (aka Position) - an item's place in the result set (e.g. rank 1 for Paul McCartney)
13
Search A / B Testing - Variants
Variant parameters: Search Index + Ranking Algorithm
• Control: Index A + Ranking A

• Treatment: Index B + Ranking B

A search (e.g. "Paul M") is served by either the Control or the Treatment variant.
14
Search A / B Testing - Variables

Click Through Rate
Paul M
1. Paul McCartney
2. Paul Manafort
3. Justin Bieber
Click!
15
Search A / B Testing - Variables

Click Through Rate (CTR)

Score = # Clicks / # Searches
Higher scores are better

Example: four searches for "Paul M" in each variant

• Control (results: 1. Paul McCartney, 2. Paul Manafort, 3. Justin Bieber): 3 of 4 searches got a click → CTR = 3/4

• Treatment (results: 1. Justin Bieber, 2. Paul Manafort, 3. Paul McCartney): 1 of 4 searches got a click → CTR = 1/4
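A minimal sketch of the CTR computation above, assuming each logged search carries a flag saying whether any result was clicked (the log record shape is hypothetical, for illustration only):

```python
# Sketch: CTR = # searches with a click / # searches. Higher is better.
def click_through_rate(searches):
    """searches: list of dicts such as {"query": "Paul M", "clicked": True}."""
    if not searches:
        return 0.0
    return sum(1 for s in searches if s["clicked"]) / len(searches)

control   = [{"clicked": c} for c in (True, True, False, True)]
treatment = [{"clicked": c} for c in (False, False, True, False)]
print(click_through_rate(control))    # 0.75 (3/4, as on the slide)
print(click_through_rate(treatment))  # 0.25 (1/4)
```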
16
Search A / B Testing - Variables

Manual (aka Human) Relevance Ranking
Foreach Query Q

• Foreach Item I

• Manually assign Relevance(Q,I)
Query Item Relevance
Paul M Justin Bieber 5
Paul M Paul Manafort 20
Paul M Paul McCartney 98
Paul Ma Justin Bieber 5
Paul Ma Paul Manafort 90
Paul Ma Paul McCartney 70
17
Search A / B Testing - Variables

Human Ranking - Example
Score = Sum(Relevance / Rank)
Higher scores are better

Control (query: Paul M)
Rank  Item            Relevance  Relevance / Rank
1     Paul McCartney  98         98 / 1
2     Paul Manafort   20         20 / 2
3     Justin Bieber   5          5 / 3
Total                            109.7

Treatment (query: Paul M)
Rank  Item            Relevance  Relevance / Rank
1     Justin Bieber   5          5 / 1
2     Paul Manafort   20         20 / 2
3     Paul McCartney  98         98 / 3
Total                            47.7
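The Sum(Relevance / Rank) score can be computed directly from the relevance judgments above; the sketch below reproduces the Control and Treatment totals shown in the tables (the judgment values are the ones from the slide, used here for illustration):

```python
# Sketch: score a ranked result list against human relevance judgments.
# Score = sum(relevance / rank); higher is better.
JUDGMENTS = {
    ("Paul M", "Paul McCartney"): 98,
    ("Paul M", "Paul Manafort"): 20,
    ("Paul M", "Justin Bieber"): 5,
}

def human_ranking_score(query, ranked_items, judgments=JUDGMENTS):
    return sum(judgments.get((query, item), 0) / rank
               for rank, item in enumerate(ranked_items, start=1))

control   = ["Paul McCartney", "Paul Manafort", "Justin Bieber"]
treatment = ["Justin Bieber", "Paul Manafort", "Paul McCartney"]
print(round(human_ranking_score("Paul M", control), 1))    # 109.7
print(round(human_ranking_score("Paul M", treatment), 1))  # 47.7
```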
18
Search A / B Testing - Variables

Human Ranking - Issue
Foreach Query Q

• Foreach Item I

• Manually assign Relevance(Q,I)
100K queries x 100K items = 10,000,000,000 ratings!
19
Search A / B Testing - Variables

Average Click Position

Score = Average(Click Pos)
Lower scores are better

Example: four searches for "Paul M" in each variant

• Control (results: 1. Paul McCartney, 2. Paul Manafort, 3. Justin Bieber): clicks at positions 1, 2, 1, 1 → Avg Click Pos = (1 + 2 + 1 + 1) / 4 = 1.25

• Treatment (results: 1. Justin Bieber, 2. Paul Manafort, 3. Paul McCartney): clicks at positions 3, 2, 3, plus one search with no click → Avg Click Pos = (3 + 2 + 3) / 3 ≈ 2.67
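A minimal sketch of the Average Click Position metric: average the (1-based) positions of the clicked results, ignoring searches that received no click.

```python
# Sketch: Average Click Position = mean of clicked positions. Lower is better.
def average_click_position(click_positions):
    """click_positions: clicked ranks (1-based); searches without clicks are omitted."""
    if not click_positions:
        return None
    return sum(click_positions) / len(click_positions)

print(average_click_position([1, 2, 1, 1]))  # 1.25 (Control)
print(average_click_position([3, 2, 3]))     # ≈ 2.67 (Treatment)
```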
20
Search A / B Testing - Variables

Mean Reciprocal Rank (MRR)

Score = Average(1 / Click Pos)
Higher scores are better (will be in the range (0, 1])

Example: the same clicks as the previous slide

• Control: clicks at positions 1, 2, 1, 1 → MRR = (1/1 + 1/2 + 1/1 + 1/1) / 4 = 0.88

• Treatment: clicks at positions 3, 2, 3 → MRR = (1/3 + 1/2 + 1/3) / 3 ≈ 0.39
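MRR is the same aggregation applied to the reciprocal of each click position; a minimal sketch:

```python
# Sketch: MRR = mean of 1 / click position. Higher is better, range (0, 1].
def mean_reciprocal_rank(click_positions):
    if not click_positions:
        return None
    return sum(1.0 / pos for pos in click_positions) / len(click_positions)

print(round(mean_reciprocal_rank([1, 2, 1, 1]), 2))  # 0.88 (Control)
print(round(mean_reciprocal_rank([3, 2, 3]), 2))     # 0.39 (Treatment)
```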
21
A / B Testing - Variables - No Results Searches
Score = # No-Result Searches / # Searches
Lower scores are better (will be in the range [0, 1])

Example: four queries ("Paul M", "Paul", "Justin Beeb", "Justin Bieb") in each variant

• Control: "Paul M" returns Paul McCartney and Paul Manafort, and both "Justin Beeb" and "Justin Bieb" return Justin Bieber; only "Paul" returns nothing → No Results = 1/4

• Treatment: "Paul M" and "Justin Bieb" still return results, but "Paul" and "Justin Beeb" return nothing → No Results = 2/4
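A minimal sketch of the no-results rate, assuming each logged search records its result count (a hypothetical log shape, not the actual schema):

```python
# Sketch: No-Results rate = # searches with zero results / # searches. Lower is better.
def no_results_rate(searches):
    """searches: list of dicts such as {"query": "Paul", "num_results": 0}."""
    if not searches:
        return 0.0
    return sum(1 for s in searches if s["num_results"] == 0) / len(searches)

control   = [{"num_results": n} for n in (2, 0, 1, 1)]  # only "Paul" returns nothing
treatment = [{"num_results": n} for n in (2, 0, 0, 1)]  # "Paul" and "Justin Beeb" return nothing
print(no_results_rate(control))    # 0.25 (1/4)
print(no_results_rate(treatment))  # 0.5  (2/4)
```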
22
A / B Testing - Issues
•Need adequate sample sizes to achieve
statistical significance

•Treatment should…

•Have negligible impact on the business

•Revenue

•Goodwill

•Be production ready

•Secure

•Performant

•Acceptable UX

•Compatible with prod tech stack

•Have org approval for prod release
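To give a feel for the sample-size point above, here is a hedged sketch of the standard two-proportion power calculation (not something from the talk); it estimates how many searches per variant are needed to detect a given CTR lift. The alpha and power values are illustrative assumptions.

```python
# Sketch: per-variant sample size for a two-proportion z-test on CTR.
# Standard approximation; alpha and power values here are illustrative, not from the talk.
import math
from statistics import NormalDist

def sample_size_per_variant(p_control, p_treatment, alpha=0.05, power=0.8):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)      # e.g. 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)               # e.g. 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)

# Detecting a CTR lift from 40% to 42% needs roughly 9,500 searches per variant.
print(sample_size_per_variant(0.40, 0.42))
```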
23
Model Simulation - Fundamentals
•Alternative to A/B testing - Simulation

•Don’t direct traffic to different variants

•Single variant - control

•Record requests to control

•Replay recorded requests against treatment (in
dev environment)

•Measure performance of treatment against
control
24
Search Model Simulation - Specifics
• Record (from control)

• Searches (queries)

• Searchclicks (queries + item + item position)

• Replay (to treatment)

• Searches - used to compute 

• % of No-Result searches

• Searchclicks - used to compute

•Average Click Position
•MRR
• Report

• Metrics

• Average Click Pos

• MRR

• % of No-Result Searches

•Items clicked on in control, but not found in treatment
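A hedged sketch of the record-and-replay computation described on this slide: recorded searchclicks from the control are replayed against a treatment index, and the report metrics are derived from where (or whether) each clicked item reappears. The `search()` callable and the record shapes are placeholders, not the actual SearchStax or Ticketmaster API; the no-results rate would be computed the same way from the recorded plain searches.

```python
# Sketch: replay recorded control traffic against a treatment index.
# `search(query)` is a placeholder for a call to the treatment index (e.g. Elasticsearch).
def replay(searchclicks, search):
    """searchclicks: records like {"query": "Paul M", "item": "Paul McCartney", "position": 1}."""
    new_positions, missing = [], []
    for rec in searchclicks:
        results = search(rec["query"])        # ranked item names from the treatment index
        if rec["item"] in results:
            new_positions.append(results.index(rec["item"]) + 1)
        else:
            missing.append(rec)               # clicked in control, but not found in treatment
    return {
        "avg_click_pos": sum(new_positions) / len(new_positions) if new_positions else None,
        "mrr": sum(1 / p for p in new_positions) / len(new_positions) if new_positions else None,
        "missing_items": missing,
    }
```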
25
Model Simulation - Flow
[Flow diagram: the Searcher runs queries against the Control Index (A) via SearchStax, which tracks the events into Event Data; the Analyst starts a simulation, and the Model Simulator fetches that data, runs the recorded queries against the Treatment Index (B), fetches the results, and uploads them for reporting.]
26
Model Simulation - Tech Stack
• Search Indexes

• Elasticsearch

• SearchStax

• Python/Django

• MongoDB

• RDS/MySQL

• Deployed in Measured Search AWS VPC

• Model Simulator

• Jenkins

• Python/Django

• SQLite

• Docker

• Deployed in Ticketmaster AWS VPC
27
Model Simulator - Jenkins
28
Model Simulator - Reporting - Top
29
Model Simulator - Reporting - Metrics
30
Model Simulator - Reporting - Missing Items
31
Relevance Refinement Process

Gather Data → Categorize → Explore → Evaluate (and repeat)
32
CATEGORIZE
Try to find patterns and categorize poorly performing
queries
33
EXPLORE
• Attack top queries
• Low hanging fruit
• Examine impact of changes
• Does it come with a cost?
34
EVALUATE
• Run the Model Simulator
• Regression Test
• Performance Test
• Did we improve?
35
What's Next?
• Anticipatory Testing
• Automated Relevance
36
Q&A
37
Contact Info
Eric Melz
@ericmelz
eric@measuredsearch.com
https://www.measuredsearch.com
Praveena Subrahmanyam
@askpraveena
praveena.subrahmanyam@ticketmaster.com
https://www.ticketmaster.com
