As the largest online marketplace for hourly jobs in the US, Snag strives to connect millions of job seekers with part-time, full-time, and on-demand hourly employment opportunities. Snag started building its learning-to-rank (LTR) search system on the Elasticsearch learning-to-rank plugin in 2017 and had switched all of its user queries to LTR by mid-2018, generating a significant lift in overall search quality. While fine-tuning and maintaining the LTR system over the past 12 months, our team has come to realize that the continued success of the LTR system requires not only a great ranking model, but also an ecosystem of intelligent metadata services and reliable data infrastructure.
This talk is a collection of examples of the growing pains and remedies involved in iterating LTR beyond v1.0 at Snag. To start, we will address a few nuances of LTR as a machine-learning problem, e.g. high sample complexity, potential biases in training data, limitations of BM25-based features, incorporation of user preferences, and evaluation metrics that please both human users and SEO bots. Then, we will present a few of our newest developments that supplement the current LTR system, including our posting deduplication service, job title normalization service, and the architectural designs of our next-generation signals platform and posting enrichment pipeline.
4.
Snag is an Hourly Job Marketplace (Fabio Rosati)
Snag (marketplace):
● Marketplace health
● Member growth
● Revenue growth
Job-seekers:
● Preference
● Qualifications
● Schedule
● Responsiveness
Employers & Hiring Agencies:
● Candidate volume
● Candidate quality
● Cost per hire/lead
90MM+ registered workers · 150K+ hires per month · 400K+ active locations
5.
Hourly Jobs are Transactional
● Fragmented: organized around “shifts”. A worker can be assigned 1 to 30+ hours per week; many hold multiple jobs
● High turnover: workers stay at each job for 6 months on average
● Lightly skilled: many hourly jobs require just a high school diploma
https://www.snag.co/employers/wp-content/uploads/2016/07/2016_SOTHW_Report-3.pdf
6.
Hourly Job Search is Open-ended
● Schedule and location matter more than the actual job duty
● Queries are not explicit (40% arrive without keywords)
9.
The Old System (Jason Kowalewsky)
● System too complex to accurately tune the boosts: relevancy whack-a-mole
● Inventory content frequently changes
● Lacks data-driven input: assumption-driven without proper statistical analysis
“If only there was a way to do this differently…”
(This slide is a shout-out to Jason Kowalewsky, who jump-started Learning to Rank at Snag. He was a terrific boss but routinely wrote sloppy slides like this.)
10.
Learning to Rank Model (Doug Turnbull)
Relevancy labels:
● Abandonment: 0
● Click: 1
● Apply intent: 2
Features:
● BM25 scores on job title, employer name, job type, ...
● BM25 scores on job description
● distance <position, seeker>
● match scores on query location (e.g. zip code, city)
● query string attributes (e.g. length, query type)
● posting attributes (e.g. position, requirements, industry, semantic representation)
● ...
Model: LambdaMART
Machine learning is everywhere
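The label scheme on this slide (abandonment 0, click 1, apply intent 2) can be sketched as a simple event-to-grade mapping; the function and event names below are illustrative, not Snag's actual schema:

```python
# Map raw user events to graded relevancy labels for LTR training.
# Grading follows the slide: abandonment -> 0, click -> 1, apply intent -> 2.
# A (query, posting) pair keeps its highest observed grade.
EVENT_GRADES = {"abandonment": 0, "click": 1, "apply_intent": 2}

def label_postings(events):
    """events: iterable of (query_id, posting_id, event_type) tuples."""
    labels = {}
    for query_id, posting_id, event_type in events:
        grade = EVENT_GRADES.get(event_type, 0)
        key = (query_id, posting_id)
        labels[key] = max(labels.get(key, 0), grade)
    return labels

labels = label_postings([
    ("q1", "p1", "click"),
    ("q1", "p1", "apply_intent"),   # apply intent outranks the earlier click
    ("q1", "p2", "abandonment"),
])
```

Taking the maximum grade per pair reflects that a later apply intent supersedes an earlier click on the same posting.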
11.
Training Pipeline (Rishi Kumar, Elizabeth Haubert, Peter Dixon-Moses, Scott Stults)
[Pipeline diagram] User events and the posting collection feed an event sampler and a posting sampler. A training-data parser, posting ingestion, and feature backfilling (against a dev search engine and training index) produce query info, features, and relevancy scores (via a relevancy label parser). The assembled training data drives a model generator (“click model” + HyperOpt), and the resulting ranking model is deployed with the posting docs to the production search engine.
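The join performed by the training-data parser and feature-backfilling steps can be sketched roughly as below; the LETOR/SVMrank-style output format and helper names are assumptions, not Snag's actual pipeline code:

```python
# Join per-(query, posting) relevancy labels with backfilled feature vectors
# into LETOR/SVMrank-style training lines: "<label> qid:<q> 1:<f1> 2:<f2> ..."
def to_training_rows(labels, features):
    """labels: {(qid, pid): grade}; features: {(qid, pid): [float, ...]}."""
    rows = []
    for (qid, pid), grade in sorted(labels.items()):
        fv = features.get((qid, pid))
        if fv is None:          # skip events we could not backfill
            continue
        feats = " ".join(f"{i}:{v}" for i, v in enumerate(fv, start=1))
        rows.append(f"{grade} qid:{qid} {feats} # {pid}")
    return rows

rows = to_training_rows(
    {("1", "p9"): 2, ("1", "p3"): 0},
    {("1", "p9"): [12.4, 0.3], ("1", "p3"): [1.1, 7.0]},
)
```

Rows in this text format can be fed directly to LambdaMART implementations such as RankLib.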
12.
Last time we checked, LTR “worked” (Aash Srikar)
...with varying degrees of success across query types
[Bar chart: old vs. new conversion by query type] Per-type lifts: 11%, 27%, 0%, -3%, 0%, 5%, 15%; share of searches per type: 24%, 13%, 16%, 30%, 13%. “Near me” queries account for 50% of native app traffic.
13.
However, with great power... (Everyone who complained)
“Why is my customer losing so many applications?”
(Because your customer has been gaming our site for years and the new system closed the loophole?)
“Why does this keyword search still perform poorly?”
(OK, this one’s on us. We actually made the conversion rate better than before, but it’s still far from satisfactory.)
“I heard Google released a job search service, why don’t we just use that? Nobody beats Google in search!”
(Somebody actually set up a meeting with Google Cloud Talent Solution while I was on vacation…)
15.
Sample Complexity (Simon Hughes)
Factorial state space, low-capacity model, biased training data
● Many LTR algorithms approximate ranking as a scoring problem due to the intractable state space (Perm(n, r)).
● An under-expressive model formulation leads to high bias and underfitting.
● Search logs typically contain bias introduced by previous ranking models.
https://en.wikipedia.org/wiki/Sample_complexity
16.
BM25 scores can make spurious LTR features
Low precision on long texts, low recall on short texts
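To make the point concrete, here is a minimal single-term Okapi BM25 scorer (standard formula, toy numbers) showing how a term buried in a long description scores far below the same term in a short title:

```python
import math

# Minimal Okapi BM25 for a single query term. Long fields (job descriptions)
# dilute the term-frequency component via length normalization, while short
# fields (titles) simply miss when the exact term is absent. Corpus stats
# here are made up for illustration.
def bm25(term_freq, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    tf = term_freq * (k1 + 1) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_len))
    return idf * tf

# "driver" appearing once in a 3-token title vs. once in a 400-token,
# keyword-heavy description:
title_score = bm25(term_freq=1, doc_len=3, avg_len=4, n_docs=1000, doc_freq=50)
desc_score = bm25(term_freq=1, doc_len=400, avg_len=120, n_docs=1000, doc_freq=50)
```

With identical IDF, the long document's length penalty roughly halves the score, so a single incidental mention in a stuffed description can still outrank nothing while telling the model little about relevancy.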
17.
Presentation Bias (Jason Kowalewsky, Stephen Ahearn)
Not all clicks are created equal
● Users’ propensity to click on a search entry can be influenced by factors besides relevancy (e.g. position, yield, UX).
● Search logs often cannot tell active skipping from passive neglect, introducing lots of false negatives; we had to throw away lots of data.
Unbiased learning to rank: https://arxiv.org/abs/1608.04468
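One common remedy from the cited unbiased-LTR literature is inverse propensity weighting; the sketch below assumes a simple 1/rank examination model rather than propensities fitted from real logs:

```python
# Sketch of inverse-propensity weighting (IPW) to counter position bias.
# The propensity curve is an assumed 1/rank examination model; real systems
# estimate it from randomized interventions or click data.
def propensity(rank, eta=1.0):
    """Assumed probability that a user examines the result at this rank."""
    return (1.0 / rank) ** eta

def ipw_weight(clicked, rank):
    """Reweight a click so clicks at low-visibility ranks count for more."""
    return (1.0 / propensity(rank)) if clicked else 0.0

# A click at rank 5 is weighted 5x a click at rank 1, compensating for the
# fact that users rarely even examine rank-5 results.
w1 = ipw_weight(True, rank=1)
w5 = ipw_weight(True, rank=5)
```

These weights then multiply each example's contribution to the ranking loss, making the training objective an unbiased estimate of the relevancy-only objective under the assumed examination model.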
18.
Search Metrics
Used in training, offline and online testing, but often don’t align with business objectives
● SERP A, grades [1, 0, 0, 0, 0, 0]: NDCG of 1 but 0 applies
● SERP B, grades [0, 2, 1, 0, 0, 0]: lower NDCG but one apply (ERR?)
● SERP C, grades [0, 0, 2, 2, 0, 0]: the lowest NDCG but the best yield (MAP?)
...until you realize KFC showed up 4 times for no good reason (KFC, KFC, Macy’s, KFC, KFC, Uber)
http://olivier.chapelle.cc/pub/err.pdf
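The NDCG ordering implied by the slide's three example SERPs can be reproduced with a standard implementation (exponential gain, log2 discount):

```python
import math

# Standard NDCG with exponential gain, reproducing the slide's point:
# a SERP can have perfect NDCG yet zero applies, while the best-yield SERP
# scores lowest of the three.
def dcg(grades):
    return sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

serp_a = [1, 0, 0, 0, 0, 0]  # one click at the top, no applies -> NDCG 1.0
serp_b = [0, 2, 1, 0, 0, 0]  # one apply -> lower NDCG (~0.66)
serp_c = [0, 0, 2, 2, 0, 0]  # two applies -> lowest NDCG (~0.57), best yield
```

Since a grade-1 click at rank 1 is already the ideal ordering of SERP A's grades, its NDCG is exactly 1 despite producing no applications.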
19.
Bot Detection (Ali Bartos, Carl Gieringer)
Garbage in, garbage out
● Bot traffic makes up > 60% of Snag’s web and mobile web traffic.
● Bots behave very differently from human users (e.g. viewing 50+ pages, clicking every posting, etc.).
● Thus, even a 5-10% false negative rate can significantly contaminate LTR training data.
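A first-pass bot filter matching the behaviors named on this slide might look like the sketch below; the thresholds and session fields are illustrative, not Snag's production rules:

```python
# Heuristic bot filter over sessionized logs, keyed to the slide's examples
# (views 50+ pages, clicks nearly every posting). A production filter would
# combine many more signals, but even crude rules cut training-data noise.
def looks_like_bot(session):
    """session: dict with 'pages_viewed', 'postings_seen', 'postings_clicked'."""
    if session["pages_viewed"] >= 50:
        return True
    seen = session["postings_seen"]
    if seen and session["postings_clicked"] / seen > 0.9:
        return True
    return False

human = {"pages_viewed": 4, "postings_seen": 40, "postings_clicked": 3}
bot = {"pages_viewed": 80, "postings_seen": 800, "postings_clicked": 795}
```

Sessions flagged this way would be dropped before event sampling, since a single prolific bot can outweigh thousands of genuine click signals.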
20.
SEO - External Query Pattern Shift
When Google doesn’t care about small businesses (not that it ever did)...
Problem:
Solution:
Outcome:
21.
Elements of the LTR Ecosystem
Work in progress and future initiatives towards LTR 2.0
22.
Search Engine needs Metadata
For both offline training and real-time querying
User/Query Metadata:
● query string (real-time): search engine plugin / external API
● user profile (near-static): external API
● search history (near real-time): streaming -> external API
Posting Metadata:
● industry, vector embeddings (static): external API -> search index (current focus)
● yield, remaining budget (near real-time): streaming -> external API -> search index
Relevancy:
● relevancy score (real-time): search engine plugin / external API (long term goal)
23.
Signals Platform (Corey Fritz)
Clean, granular data to train and serve machine learning models
Signals is a Kafka-based data streaming platform that streams and transforms real-time event data for various internal consumers.
● Kafka backend to process comprehensive real-time user behavior and product activity data
● “Hermes” REST API layer to enable signal publishing via HTTP calls
● Avro schema registry to enforce typed event definitions
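As an illustration of the kind of typed event definition an Avro schema registry enforces, here is a minimal record schema; the field names and namespace are hypothetical, not Snag's actual event contract:

```python
import json

# A minimal Avro record schema of the kind Signals' registry would enforce.
# Field names and the namespace are hypothetical examples.
SEARCH_EVENT_SCHEMA = {
    "type": "record",
    "name": "SearchEvent",
    "namespace": "com.example.signals",
    "fields": [
        {"name": "user_id", "type": "string"},
        # Nullable: 40% of queries arrive without keywords.
        {"name": "query", "type": ["null", "string"], "default": None},
        {"name": "event_type", "type": {
            "type": "enum", "name": "EventType",
            "symbols": ["IMPRESSION", "CLICK", "APPLY_INTENT"]}},
        {"name": "timestamp_ms", "type": "long"},
    ],
}

# A producer publishing through a REST layer like "Hermes" would register
# this schema as JSON, after which non-conforming events are rejected.
payload = json.dumps(SEARCH_EVENT_SCHEMA)
```

Enforcing the enum at ingestion time means the LTR label parser never has to guess what an event string means.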
24.
Position Profile via Clustering
Use a position ontology to align with query intent and boost recall
Example: a posting titled “CDL Training School! We train, We Hire, Guaranteed!” maps to the normalized position “Truck Driver”.
25.
Posting Summarization via Topic Modeling
(proof of concept) Goodbye keyword spamming
● Many postings contain ‘stuffed’ keywords to boost their own recall at the expense of others’
● Topic models “summarize” each posting by the strength of its key concepts, to both reduce spurious recall and promote relevant recall
Extracted from a real job description:
“If you are an actor, actress, admin, agency, artist, assistant, barista, bartender, broker, bus driver, cab driver, cashier, chauffeur, cleaner, college student, customer service agent, chef, contract worker, cook, courier, designer, dishwasher, dog walker, driver, entrepreneurs, fitness trainer, food prep, food services, freelancer, handyman, hostess, insurance broker, instructor, intern, janitor, maid, maintenance, messenger, manager, management, musician, maid, office assistant, office administrator, photographer, private hire, professional driver, realtor, retail associate, sales associate, sales person, security, server, students, teacher, tutor, valet, veteran, waiter, waitress who is looking for a flexible part-time, full-time or summer gig, apply to <> to supplement your income this summer!”
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
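As a much simpler stand-in for the topic-model summarization described on this slide, one can flag likely keyword stuffing by counting how many distinct known job titles a single description mentions; the title list and score are purely illustrative:

```python
# Simple keyword-stuffing heuristic: a legitimate posting mentions one or two
# job titles, while a stuffed one (like the example on this slide) packs in
# dozens. This is a stand-in, not the LDA approach Snag prototyped.
KNOWN_TITLES = {"driver", "cashier", "server", "barista", "cook", "janitor",
                "teacher", "valet", "waiter", "waitress", "chef", "intern"}

def stuffing_score(description, titles=KNOWN_TITLES):
    """Fraction of the known-title vocabulary mentioned by the posting."""
    words = set(description.lower().replace(",", " ").split())
    return len(words & titles) / len(titles)

stuffed = "driver, cashier, server, cook, chef, valet, waiter or waitress wanted"
normal = "We are hiring a part-time barista for our downtown cafe."
```

A topic model achieves the same goal more robustly: a stuffed posting spreads probability mass thinly over many topics instead of concentrating on one.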
26.
Posting Deduplication via LSH (Robert Mealey)
No, not 4 KFC jobs on the same SERP
● Large employers often supply generic job descriptions that receive similar relevancy scores for neighboring store locations, affecting result diversity
● Locality-Sensitive Hashing (LSH) is used to tag duplicates and near-duplicates so that only one of each group is shown in search results
https://en.wikipedia.org/wiki/Locality-sensitive_hashing
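A toy version of the technique might look like this minimal MinHash sketch; real LSH additionally bands the signatures for sub-linear candidate lookup, and the example postings below are made up:

```python
import hashlib

# Minimal MinHash for near-duplicate detection of job descriptions.
# Two near-identical postings share most character shingles, so their
# per-seed minimum hashes mostly agree; agreement rate estimates Jaccard
# similarity of the shingle sets.
def shingles(text, k=4):
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def minhash(text, num_hashes=64):
    shingle_set = shingles(text)
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def similarity(a, b):
    """Estimated Jaccard similarity of two postings' shingle sets."""
    sa, sb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

kfc_1 = "Team member wanted. Flexible hours, friendly crew, free meals."
kfc_2 = "Team member wanted! Flexible hours, friendly crew, free meals."
other = "CDL truck driver, regional routes, home weekends, sign-on bonus."
```

Postings whose estimated similarity exceeds a threshold would be tagged into one duplicate group, and the SERP shows a single representative.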
27.
Yield Management (Anuradha Uduwage)
(proof of concept) Make some money, change the world
● An interesting problem for the LTR framework because users behave agnostically of yield information
● Requires careful user modeling to “de-bias” the relevancy signal, and streaming infrastructure to update yield and budget information
[Quadrant chart: engagement vs. yield] Quadrant II: high engagement, high yield; Quadrant III: low engagement, low yield
28.
Additional Initiatives
● Language model for job postings
● Posting quality score
● User profile features/embeddings
● Enhanced A/B testing and metrics monitoring capabilities
● Real-time user-activity-based features and related infrastructure
● Search result diversity
● Query expansion
● Named entity detection
● Knowledge graph and graph-based search
● Vector-based relevancy
● Neural ranking models
Hopefully some of these will make their way to Haystack 2020
30.
Lessons Learned
● LTR isn’t just about the ML model or the search engine
Ranking models are only as expressive and accurate as the features and labels we feed them. Investment in data infrastructure and data assets is absolutely necessary and arguably more critical.
● Expectations from stakeholders need to be carefully managed
Workers, employers, internal teams, Google bots, etc. all have their own areas of emphasis and sometimes demand slightly different search experiences. Navigating those multi-party tradeoffs is crucial to the success of the search system.
31.
We are Hiring!
Join us and solve some interesting data engineering and search relevance engineering problems!
(Richmond, VA, too)