As the largest online marketplace for hourly jobs in the US, Snag strives to connect millions of job seekers with part-time, full-time, and on-demand hourly employment opportunities. Snag started building its learning-to-rank (LTR) search system on the Elasticsearch learning-to-rank plugin in 2017 and had switched all of its user queries to LTR by mid-2018, generating a significant lift in overall search quality. While fine-tuning and maintaining the LTR system over the past 12 months, our team has come to realize that the continued success of the LTR system requires not only a great ranking model, but also an ecosystem of intelligent metadata services and reliable data infrastructure.
This talk is a collection of examples of the growing pains and remedies involved in iterating LTR beyond v1.0 at Snag. To start, we will address a few nuances of LTR as a machine-learning problem, e.g. high sample complexity, potential biases in training data, limitations of BM25-based features, incorporation of user preferences, and evaluation metrics that please both human users and SEO bots. Then, we will present a few of our newest developments that supplement the current LTR system, including our posting deduplication service, job title normalization service, and the architectural designs of our next-generation signals platform and posting enrichment pipeline.
4.
Snag is an Hourly Job Marketplace (Fabio Rosati)
Snag (marketplace):
● Marketplace health
● Member growth
● Revenue growth
Job-seekers:
● Preference
● Qualifications
● Schedule
● Responsiveness
Employers & Hiring Agencies:
● Candidate volume
● Candidate quality
● Cost per hire/lead
90MM+ registered workers · 150K+ hires per month · 400K+ active locations
5.
Hourly Jobs are Transactional
● Fragmented: organized around “shifts”. A worker can be assigned 1 to 30+ hours per week; many hold multiple jobs
● High turnover: workers stay at each job for 6 months on average
● Lightly skilled: many hourly jobs require just a high school diploma
https://www.snag.co/employers/wp-content/uploads/2016/07/2016_SOTHW_Report-3.pdf
6.
Hourly Job Search is Open-ended
● Schedule and location matter more than the actual job duty
● Queries are not explicit (40% arrive without keywords)
9.
The Old System (Jason Kowalewsky)
● System too complex to accurately tune the boosts: relevancy whack-a-mole
● Inventory content frequently changes
● Lacks data-driven input: assumption-driven without proper statistical analysis
“If only there was a way to do this differently…”
(This slide is a shout-out to Jason Kowalewsky, who jump-started Learning to Rank at Snag. He was a terrific boss but routinely wrote sloppy slides like this.)
10.
Learning to Rank Model (Doug Turnbull)
Relevancy labels:
● Abandonment: 0
● Click: 1
● Apply intent: 2
Features:
● BM25 scores on job title, employer name, job type, ...
● BM25 scores on job description
● distance <position, seeker>
● match scores on query location (e.g. zip code, city)
● query string attributes (e.g. length, query type)
● posting attributes (e.g. position, requirements, industry, semantic representation)
● ...
Model: LambdaMART
Machine learning is everywhere
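The label scheme on this slide (abandonment 0, click 1, apply intent 2) can be sketched as a simple event-to-grade mapping; the function and event names below are illustrative, not Snag's actual schema:

```python
# Map raw user events to graded relevancy labels for LTR training.
# Grading follows the slide: abandonment -> 0, click -> 1, apply intent -> 2.
# A (query, posting) pair keeps its highest observed grade.
EVENT_GRADES = {"abandonment": 0, "click": 1, "apply_intent": 2}

def label_postings(events):
    """events: iterable of (query_id, posting_id, event_type) tuples."""
    labels = {}
    for query_id, posting_id, event_type in events:
        grade = EVENT_GRADES.get(event_type, 0)
        key = (query_id, posting_id)
        labels[key] = max(labels.get(key, 0), grade)
    return labels

labels = label_postings([
    ("q1", "p1", "click"),
    ("q1", "p1", "apply_intent"),   # apply intent outranks the earlier click
    ("q1", "p2", "abandonment"),
])
```

Taking the maximum grade per pair reflects that a later apply intent supersedes an earlier click on the same posting.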
11.
Training Pipeline (Rishi Kumar, Elizabeth Haubert, Peter Dixon-Moses, Scott Stults)
[Pipeline diagram] User events and the posting collection feed an event sampler and a posting sampler. A training-data parser, posting ingestion, and feature backfilling (against a dev search engine and training index) produce query info, features, and relevancy scores (via a relevancy label parser). The assembled training data drives a model generator (“click model” + HyperOpt), and the resulting ranking model is deployed with the posting docs to the production search engine.
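The join performed by the training-data parser and feature-backfilling steps can be sketched roughly as below; the LETOR/SVMrank-style output format and helper names are assumptions, not Snag's actual pipeline code:

```python
# Join per-(query, posting) relevancy labels with backfilled feature vectors
# into LETOR/SVMrank-style training lines: "<label> qid:<q> 1:<f1> 2:<f2> ..."
def to_training_rows(labels, features):
    """labels: {(qid, pid): grade}; features: {(qid, pid): [float, ...]}."""
    rows = []
    for (qid, pid), grade in sorted(labels.items()):
        fv = features.get((qid, pid))
        if fv is None:          # skip events we could not backfill
            continue
        feats = " ".join(f"{i}:{v}" for i, v in enumerate(fv, start=1))
        rows.append(f"{grade} qid:{qid} {feats} # {pid}")
    return rows

rows = to_training_rows(
    {("1", "p9"): 2, ("1", "p3"): 0},
    {("1", "p9"): [12.4, 0.3], ("1", "p3"): [1.1, 7.0]},
)
```

Rows in this text format can be fed directly to LambdaMART implementations such as RankLib.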
12.
Last time we checked, LTR “worked” (Aash Srikar)
...with varying degrees of success across query types
[Bar chart: old vs. new conversion by query type] Per-type lifts: 11%, 27%, 0%, -3%, 0%, 5%, 15%; share of searches per type: 24%, 13%, 16%, 30%, 13%. “Near me” queries account for 50% of native app traffic.
13.
However, with great power... (Everyone who complained)
“Why is my customer losing so many applications?”
(Because your customer has been gaming our site for years and the new system closed the loophole?)
“Why does this keyword search still perform poorly?”
(OK, this one’s on us. We actually made the conversion rate better than before, but it’s still far from satisfactory.)
“I heard Google released a job search service, why don’t we just use that? Nobody beats Google in search!”
(Somebody actually set up a meeting with Google Cloud Talent Solution while I was on vacation…)
15.
Sample Complexity (Simon Hughes)
Factorial state space, low-capacity model, biased training data
● Many LTR algorithms approximate ranking as a scoring problem due to the intractable state space (Perm(n, r)).
● An under-expressive model formulation leads to high bias and underfitting.
● Search logs typically contain bias introduced by previous ranking models.
https://en.wikipedia.org/wiki/Sample_complexity
16.
BM25 scores can make spurious LTR features
Low precision on long texts, low recall on short texts
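To make the point concrete, here is a minimal single-term Okapi BM25 scorer (standard formula, toy numbers) showing how a term buried in a long description scores far below the same term in a short title:

```python
import math

# Minimal Okapi BM25 for a single query term. Long fields (job descriptions)
# dilute the term-frequency component via length normalization, while short
# fields (titles) simply miss when the exact term is absent. Corpus stats
# here are made up for illustration.
def bm25(term_freq, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    tf = term_freq * (k1 + 1) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_len))
    return idf * tf

# "driver" appearing once in a 3-token title vs. once in a 400-token,
# keyword-heavy description:
title_score = bm25(term_freq=1, doc_len=3, avg_len=4, n_docs=1000, doc_freq=50)
desc_score = bm25(term_freq=1, doc_len=400, avg_len=120, n_docs=1000, doc_freq=50)
```

With identical IDF, the long document's length penalty roughly halves the score, so a single incidental mention in a stuffed description can still outrank nothing while telling the model little about relevancy.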
17.
Presentation Bias (Jason Kowalewsky, Stephen Ahearn)
Not all clicks are created equal
● Users’ propensity to click on a search entry can be influenced by factors besides relevancy (e.g. position, yield, UX).
● Search logs often cannot tell active skipping from passive neglect, introducing lots of false negatives; we had to throw away lots of data.
Unbiased learning to rank: https://arxiv.org/abs/1608.04468
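One common remedy from the cited unbiased-LTR literature is inverse propensity weighting; the sketch below assumes a simple 1/rank examination model rather than propensities fitted from real logs:

```python
# Sketch of inverse-propensity weighting (IPW) to counter position bias.
# The propensity curve is an assumed 1/rank examination model; real systems
# estimate it from randomized interventions or click data.
def propensity(rank, eta=1.0):
    """Assumed probability that a user examines the result at this rank."""
    return (1.0 / rank) ** eta

def ipw_weight(clicked, rank):
    """Reweight a click so clicks at low-visibility ranks count for more."""
    return (1.0 / propensity(rank)) if clicked else 0.0

# A click at rank 5 is weighted 5x a click at rank 1, compensating for the
# fact that users rarely even examine rank-5 results.
w1 = ipw_weight(True, rank=1)
w5 = ipw_weight(True, rank=5)
```

These weights then multiply each example's contribution to the ranking loss, making the training objective an unbiased estimate of the relevancy-only objective under the assumed examination model.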
18.
Search Metrics
Used in training, offline and online testing, but often don’t align with business objectives
● SERP A, grades [1, 0, 0, 0, 0, 0]: NDCG of 1 but 0 applies
● SERP B, grades [0, 2, 1, 0, 0, 0]: lower NDCG but one apply (ERR?)
● SERP C, grades [0, 0, 2, 2, 0, 0]: the lowest NDCG but the best yield (MAP?)
...until you realize KFC showed up 4 times for no good reason (KFC, KFC, Macy’s, KFC, KFC, Uber)
http://olivier.chapelle.cc/pub/err.pdf
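The NDCG ordering implied by the slide's three example SERPs can be reproduced with a standard implementation (exponential gain, log2 discount):

```python
import math

# Standard NDCG with exponential gain, reproducing the slide's point:
# a SERP can have perfect NDCG yet zero applies, while the best-yield SERP
# scores lowest of the three.
def dcg(grades):
    return sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

serp_a = [1, 0, 0, 0, 0, 0]  # one click at the top, no applies -> NDCG 1.0
serp_b = [0, 2, 1, 0, 0, 0]  # one apply -> lower NDCG (~0.66)
serp_c = [0, 0, 2, 2, 0, 0]  # two applies -> lowest NDCG (~0.57), best yield
```

Since a grade-1 click at rank 1 is already the ideal ordering of SERP A's grades, its NDCG is exactly 1 despite producing no applications.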
19.
Bot Detection (Ali Bartos, Carl Gieringer)
Garbage in, garbage out
● Bot traffic makes up > 60% of Snag’s web and mobile web traffic.
● Bots behave very differently from human users (e.g. viewing 50+ pages, clicking every posting, etc.).
● Thus, even a 5-10% false negative rate can significantly contaminate LTR training data.
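A first-pass bot filter matching the behaviors named on this slide might look like the sketch below; the thresholds and session fields are illustrative, not Snag's production rules:

```python
# Heuristic bot filter over sessionized logs, keyed to the slide's examples
# (views 50+ pages, clicks nearly every posting). A production filter would
# combine many more signals, but even crude rules cut training-data noise.
def looks_like_bot(session):
    """session: dict with 'pages_viewed', 'postings_seen', 'postings_clicked'."""
    if session["pages_viewed"] >= 50:
        return True
    seen = session["postings_seen"]
    if seen and session["postings_clicked"] / seen > 0.9:
        return True
    return False

human = {"pages_viewed": 4, "postings_seen": 40, "postings_clicked": 3}
bot = {"pages_viewed": 80, "postings_seen": 800, "postings_clicked": 795}
```

Sessions flagged this way would be dropped before event sampling, since a single prolific bot can outweigh thousands of genuine click signals.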
20.
SEO - External Query Pattern Shift
When Google doesn’t care about small businesses (not that it ever did)...
Problem:
Solution:
Outcome:
21.
Elements of the LTR Ecosystem
Work in progress and future initiatives towards LTR 2.0
22.
Search Engine needs Metadata
For both offline training and real-time querying
User/Query Metadata:
● query string (real-time): search engine plugin / external API
● user profile (near-static): external API
● search history (near real-time): streaming -> external API
Posting Metadata:
● industry, vector embeddings (static): external API -> search index (current focus)
● yield, remaining budget (near real-time): streaming -> external API -> search index
Relevancy:
● relevancy score (real-time): search engine plugin / external API (long term goal)
23.
Signals Platform (Corey Fritz)
Clean, granular data to train and serve machine learning models
Signals is a Kafka-based data streaming platform that streams and transforms real-time event data for various internal consumers.
● Kafka backend to process comprehensive real-time user behavior and product activity data
● “Hermes” REST API layer to enable signal publishing via HTTP calls
● Avro schema registry to enforce typed event definitions
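As an illustration of the kind of typed event definition an Avro schema registry enforces, here is a minimal record schema; the field names and namespace are hypothetical, not Snag's actual event contract:

```python
import json

# A minimal Avro record schema of the kind Signals' registry would enforce.
# Field names and the namespace are hypothetical examples.
SEARCH_EVENT_SCHEMA = {
    "type": "record",
    "name": "SearchEvent",
    "namespace": "com.example.signals",
    "fields": [
        {"name": "user_id", "type": "string"},
        # Nullable: 40% of queries arrive without keywords.
        {"name": "query", "type": ["null", "string"], "default": None},
        {"name": "event_type", "type": {
            "type": "enum", "name": "EventType",
            "symbols": ["IMPRESSION", "CLICK", "APPLY_INTENT"]}},
        {"name": "timestamp_ms", "type": "long"},
    ],
}

# A producer publishing through a REST layer like "Hermes" would register
# this schema as JSON, after which non-conforming events are rejected.
payload = json.dumps(SEARCH_EVENT_SCHEMA)
```

Enforcing the enum at ingestion time means the LTR label parser never has to guess what an event string means.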
24.
Position Profile via Clustering
Use a position ontology to align with query intent and boost recall
Example: a posting titled “CDL Training School! We train, We Hire, Guaranteed!” maps to the normalized position “Truck Driver”.
25.
Posting Summarization via Topic Modeling
(proof of concept) Goodbye keyword spamming
● Many postings contain ‘stuffed’ keywords to boost their own recall at the expense of others’
● Topic models “summarize” each posting by the strength of its key concepts, to both reduce spurious recall and promote relevant recall
Extracted from a real job description:
“If you are an actor, actress, admin, agency, artist, assistant, barista, bartender, broker, bus driver, cab driver, cashier, chauffeur, cleaner, college student, customer service agent, chef, contract worker, cook, courier, designer, dishwasher, dog walker, driver, entrepreneurs, fitness trainer, food prep, food services, freelancer, handyman, hostess, insurance broker, instructor, intern, janitor, maid, maintenance, messenger, manager, management, musician, maid, office assistant, office administrator, photographer, private hire, professional driver, realtor, retail associate, sales associate, sales person, security, server, students, teacher, tutor, valet, veteran, waiter, waitress who is looking for a flexible part-time, full-time or summer gig, apply to <> to supplement your income this summer!”
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
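As a much simpler stand-in for the topic-model summarization described on this slide, one can flag likely keyword stuffing by counting how many distinct known job titles a single description mentions; the title list and score are purely illustrative:

```python
# Simple keyword-stuffing heuristic: a legitimate posting mentions one or two
# job titles, while a stuffed one (like the example on this slide) packs in
# dozens. This is a stand-in, not the LDA approach Snag prototyped.
KNOWN_TITLES = {"driver", "cashier", "server", "barista", "cook", "janitor",
                "teacher", "valet", "waiter", "waitress", "chef", "intern"}

def stuffing_score(description, titles=KNOWN_TITLES):
    """Fraction of the known-title vocabulary mentioned by the posting."""
    words = set(description.lower().replace(",", " ").split())
    return len(words & titles) / len(titles)

stuffed = "driver, cashier, server, cook, chef, valet, waiter or waitress wanted"
normal = "We are hiring a part-time barista for our downtown cafe."
```

A topic model achieves the same goal more robustly: a stuffed posting spreads probability mass thinly over many topics instead of concentrating on one.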
26.
Posting Deduplication via LSH (Robert Mealey)
No, not 4 KFC jobs on the same SERP
● Large employers often supply generic job descriptions that receive similar relevancy scores for neighboring store locations, affecting result diversity
● Locality-Sensitive Hashing (LSH) is used to tag duplicates and near-duplicates so that only one of each group is shown in search results
https://en.wikipedia.org/wiki/Locality-sensitive_hashing
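A toy version of the technique might look like this minimal MinHash sketch; real LSH additionally bands the signatures for sub-linear candidate lookup, and the example postings below are made up:

```python
import hashlib

# Minimal MinHash for near-duplicate detection of job descriptions.
# Two near-identical postings share most character shingles, so their
# per-seed minimum hashes mostly agree; agreement rate estimates Jaccard
# similarity of the shingle sets.
def shingles(text, k=4):
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def minhash(text, num_hashes=64):
    shingle_set = shingles(text)
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def similarity(a, b):
    """Estimated Jaccard similarity of two postings' shingle sets."""
    sa, sb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

kfc_1 = "Team member wanted. Flexible hours, friendly crew, free meals."
kfc_2 = "Team member wanted! Flexible hours, friendly crew, free meals."
other = "CDL truck driver, regional routes, home weekends, sign-on bonus."
```

Postings whose estimated similarity exceeds a threshold would be tagged into one duplicate group, and the SERP shows a single representative.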
27.
Yield Management (Anuradha Uduwage)
(proof of concept) Make some money, change the world
● An interesting problem for the LTR framework because users behave agnostically of yield information
● Requires careful user modeling to “de-bias” the relevancy signal, and streaming infrastructure to update yield and budget information
[Quadrant chart: engagement vs. yield] Quadrant II: high engagement, high yield; Quadrant III: low engagement, low yield
28.
Additional Initiatives
● Language model for job postings
● Posting quality score
● User profile features/embeddings
● Enhanced A/B testing and metrics monitoring capabilities
● Real-time user-activity-based features and related infrastructure
● Search result diversity
● Query expansion
● Named entity detection
● Knowledge graph and graph-based search
● Vector-based relevancy
● Neural ranking models
Hopefully some of these will make their way to Haystack 2020
30.
Lessons Learned
● LTR isn’t just about the ML model or the search engine
Ranking models are only as expressive and accurate as the features and labels we feed them. Investment in data infrastructure and data assets is absolutely necessary and arguably more critical.
● Expectations from stakeholders need to be carefully managed
Workers, employers, internal teams, Google bots, etc. all have their own areas of emphasis and sometimes demand slightly different search experiences. Navigating those multi-party tradeoffs is crucial to the success of the search system.
31.
We are Hiring!
Join us and solve some interesting data engineering and search relevance engineering problems!
(Richmond, VA, too)