Haystack: Learning to Rank in an Hourly Job Market

Slides for our presentation at Haystack 2018
(https://opensourceconnections.com/events/haystack/)

  1. 1. Learning to Rank in an Hourly Job Marketplace Xun Wang (xun.wang@snag.co) Jason Kowalewski (jason.kowalewski@snag.co)
  2. 2. Snag. The marketing slide 2 ● 85 MM registered workers ● 325,000 employer locations ● 4.5 MM applications submitted monthly ● 1 MM active job postings
  3. 3. Our Matching Problem 3 Location? Pay? Industry? Part Time?
  4. 4. Our Matching Problem 4 Sandwich Artist? Host/Hostess? Barista? Cashier?
  5. 5. Our Matching Problem 5 Determine Intent / Context.
  6. 6. Legacy Search System 6
  7. 7. Legacy Search System 7
  8. 8. Query: “mcdonalds in 92801” title^n
  9. 9. Query: “mcdonalds in 90024”
  10. 10. Query: “mcdonalds in 11231” Boosts on channel
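The three queries above rely on hand-tuned field boosts (title^n) and per-channel boosts. As a rough illustration only, a boost-driven query of this kind could be expressed in Elasticsearch roughly as follows; the field names, boost values, and channel label are assumptions, not Snag's actual configuration:

    # Hypothetical legacy-style query: hand-tuned field boosts plus a channel boost.
    # All field names and weights are illustrative.
    legacy_query = {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": "mcdonalds",
                        "fields": ["title^5", "employer_name^3", "description"],
                    }
                },
                "functions": [
                    # "Boosts on channel": postings from a favored channel score higher.
                    {"filter": {"term": {"channel": "premium"}}, "weight": 2.0}
                ],
                "boost_mode": "multiply",
            }
        }
    }

Every new boost interacts with every existing one, which is the "relevancy whack-a-mole" described on the next slide.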
  11. 11. The (old) system 12 ● A system that is too complex to accurately tune the boosts: relevancy whack-a-mole ● Inventory content frequently changes ● Lacks data-driven input -- assumption driven without proper statistical analysis. "If only there was a way to do this differently…"
  12. 12. The Hourly Job Marketplace
  13. 13. Job search is both an IR and a match problem 14 ● Search / IR (e.g. YouTube): { User } { Resource }; many to many; asymmetric; unlimited supply ● Match (e.g. online chess): { Player } { Player }; one to one; symmetric; no extra supply ● Job Search: { Job Seekers } { Job Positions }; one to many; asymmetric and bi-directional; limited supply, unlimited "attempts"
  14. 14. Hourly Jobs are not ‘Sticky’ 15 ● Fragmented: organized around "shifts"; a worker can be assigned 1 to 30+ hours per week, and many hold multiple jobs ● Transactional: workers stay at each job for 6 months on average ● Lightly Skilled: many hourly jobs require just a high school diploma https://www.snag.co/employers/wp-content/uploads/2016/07/2016_SOTHW_Report-3.pdf
  15. 15. Hourly job search is often a recommendation ● Schedule and location can be more important than the actual duties of the job ● Queries are not explicit (40% don't have keywords)
  16. 16. Our relevancy signals are collected from multiple levels of interactions 17: Search (0) → Clicks (1) → Apply Intents (2) → Completed Applications (3) → Interviews & Assessments (4) → Hires (5), spanning the Jobseeker Platform and the Employer Platform
  17. 17. We balance a variety of market participants 18 ● Snag: revenue growth; user base growth; marketplace health & efficiency ● Job-seekers: job preference; job requirements; employer responsiveness ● Employers: seeker volume; conversion rate (e.g. CTR); cost per lead ● Advertisers/Partners: seeker volume & velocity; candidate quality; cost per hire
  18. 18. LTR System Design
  19. 19. Learning to Rank Model 20 Development Environment ● Relevancy Labels: Abandonment: 0; Click: 1; Apply Intent: 2 ● Features: match scores on job title, employer name, job type, ...; distance <position, seeker>; match scores on query location (e.g. zip-code, city); match scores on job description; query string attributes (e.g. length, entity type); posting attributes (e.g. position, requirements, industry, semantic representation); ... ● Model: LambdaMART. Composability!
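In a setup like this, each judged <query, posting> pair becomes one training row: a graded label plus numbered feature values. A minimal sketch of emitting rows in the RankLib/SVMrank text format (Ranklib appears in the pipeline slides), using the label scale above; the feature values, ordering, and query id here are made up:

    # Sketch: one training row per judged <query, posting> pair.
    # Label scale from the slide: abandonment=0, click=1, apply intent=2.
    LABELS = {"abandonment": 0, "click": 1, "apply_intent": 2}

    def to_ranklib_row(label_name, query_id, feature_values):
        """feature_values: ordered floats, e.g. [title_match, distance, zip_match, ...]."""
        feats = " ".join(f"{i + 1}:{v:g}" for i, v in enumerate(feature_values))
        return f"{LABELS[label_name]} qid:{query_id} {feats}"

    print(to_ranklib_row("apply_intent", 42, [8.31, 2.5, 0.0, 1.2]))
    # -> 2 qid:42 1:8.31 2:2.5 3:0 4:1.2

RankLib can then fit a LambdaMART model over such rows, e.g. java -jar RankLib.jar -train train.txt -ranker 6 -metric2t NDCG@10 (ranker 6 is LambdaMART).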
  20. 20. Training Pipeline - esltr plugin 0.x 21 Development Environment. Components: data warehouse, posting collection, event sampler, posting sampler, training data generator, posting ingestion, model generator, feature backfilling, relevancy label parser, Ranklib, training index, search engine (dev), search engine (prod). Artifacts flowing between them: posting docs, user events, query info, relevancy scores, features, training data, ranking model.
  21. 21. Training Pipeline - esltr plugin 1.0 22 Development Environment. Components: data warehouse, event sampler, training data generator, feature parser, relevancy label generator, model generator (+ HyperOpt), search engine (prod). Artifacts flowing between them: user events, live feature logs, relevancy scores, features, training data, ranking model.
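The 1.0 pipeline replaces offline feature backfilling with the live feature logging built into the Elasticsearch Learning to Rank plugin (listed in the references). A minimal sketch of that interaction, following the plugin's documented REST API; the cluster URL, index name "jobs", field "job_title", and featureset name "jobs_ltr" are assumptions:

    import requests  # assumes an Elasticsearch cluster with the LTR plugin installed

    ES = "http://localhost:9200"

    # One-time setup: initialize the feature store and upload a toy featureset.
    requests.put(f"{ES}/_ltr")
    featureset = {
        "featureset": {
            "features": [
                {
                    "name": "job_title_match",
                    "params": ["keywords"],
                    "template_language": "mustache",
                    "template": {"match": {"job_title": "{{keywords}}"}},
                }
            ]
        }
    }
    requests.post(f"{ES}/_ltr/_featureset/jobs_ltr", json=featureset)

    # Query time: run the sltr query as a filter and ask the plugin to log the
    # feature values alongside each hit (the "live feature logs" box above).
    search_body = {
        "query": {
            "bool": {
                "must": {"match": {"job_title": "barista"}},
                "filter": [{"sltr": {"_name": "logged", "featureset": "jobs_ltr",
                                     "params": {"keywords": "barista"}}}],
            }
        },
        "ext": {"ltr_log": {"log_specs": {"name": "log_entry", "named_query": "logged"}}},
    }
    response = requests.post(f"{ES}/jobs/_search", json=search_body).json()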
  22. 22. Offline Validation Pre-Deployment 23 Development Environment ● Re-ranking historical queries: gives good directional guidance, but is not very accurate in absolute numbers due to 1) inability to account for new items and 2) contamination from sponsored postings with artificially high rankings. ● Manual examination of common query patterns: great for sanity checks; reveals details beyond relevancy labels; more indicative of future performance. ● Best of both worlds? Aljadda, Khalifeh & Korayem, Mohammed & Grainger, Trey. (2018). Fully Automated QA System for Large Scale Search and Recommendation Engines Leveraging Implicit User Feedback.
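For the first bullet, re-ranked historical sessions are scored with NDCG. A small self-contained sketch of NDCG@10 using the graded labels from the iteration slides; the sample session below is invented:

    import math

    def dcg_at_k(labels, k=10):
        # labels: graded relevance of results, in the order the model ranked them
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

    def ndcg_at_k(labels, k=10):
        ideal = dcg_at_k(sorted(labels, reverse=True), k)
        return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

    # Hypothetical re-ranked session, graded as in the iterations:
    # click=1, apply intent=2, completed application=3, no interaction=0.
    print(round(ndcg_at_k([2, 0, 3, 1, 0, 0, 0, 0, 1, 0]), 3))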
  23. 23. Deployment via A-B testing 24 Production Environment Don’t modify the existing system.
  24. 24. Deployment via A-B testing 25 Production Environment a) Build a parallel system b) Iterate c) Test d) Evaluate
  25. 25. Posting Ingestion 26 Production Environment Step 1. Make it work (we are still here!). Step 2. Streaming magic (not here yet).
  26. 26. Search API 27 Production Environment
  27. 27. Tuning the LTR system
  28. 28. Iteration 1 (Q2 2017) 29 ● LTR Features 1. job_title match score 2. job_description match score 3. employer_name match score 4. city-state_match score 5. zipcode_match score 6. distance <query location, posting> ● Relevancy Labels Click : 1 Apply Intent: 2 Completed Applications: 3 ● Success Criteria - NDCG@10 ● Use Cases Site: desktop, mobile web User: registered Search Type: - zip-code location only - zip-code location + keyword
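Feature 6 is the one feature above computed outside the text-matching machinery. One plausible form is a great-circle (haversine) distance between the query location and the posting; the coordinates below are illustrative and the actual implementation may differ:

    import math

    def haversine_miles(lat1, lon1, lat2, lon2):
        # Great-circle distance between a query location and a posting location.
        r = 3958.8  # mean Earth radius in miles
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # e.g. zip 92801 (Anaheim, CA) to a posting in downtown Los Angeles
    print(round(haversine_miles(33.84, -117.95, 34.05, -118.24), 1))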
  29. 29. Relevancy Performance 30 Iteration 1 ● Pros: immediate boost of NDCG for zipcode-only searches (~5%) ● Cons: keyword and location-only searches shared the same feature space, leading to a polarized user experience ● Todo: add query-string-related attributes to the list of features ● When things don't work: Query: - keyword: Starbucks - location: Arlington, VA, 22201 Results: Rank Employer Location 1 Starbucks Arlington, VA, 22201 2 Starbucks Arlington, VA, 22203 3 WholeFoods Arlington, VA, 22201 4 Starbucks Washington, DC, 20007
  30. 30. Iteration 2 (Q3 2017) 31 ● Success Criteria - NDCG@10 - Application Rate (# of applications/ # of search sessions) ● Use Cases Site: desktop, mobile web User: registered, unregistered Search Type: - zip-code location only - zip-code location + keyword - text location only - text location + keyword ● LTR Features 1. job_title match scores 2. job_description match scores 3. employer_name match scores 4. location “match” level 5. distance <seeker, posting> 6. query location level 7. query length 8. platform (e.g. desktop, mobile) 9. job seeker registration status ● Relevancy Labels Click : 1 Apply Intent: 2 Completed Applications: 3
  31. 31. Relevancy Performance 32 Iteration 2 ● Pros: more stable performance across the board ● Cons: the geo-location resolution rate (only ~95%) hurt queries with text locations; default text analyzers supplied noisy signals to LTR ● Todo: enhance geo-coding logic; define customized analyzers (e.g. stopwords, synonym filters, keyword markers) for every field used by the ranking model, as sketched below ● When things don't work: Query: - keyword: Part time restaurant Results: Rank Title Employer 1 Part time server Chipotle 2 Full time cook KFC 3 Part time Cashier Restaurant Depot 4 Cook District Taco
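For the analyzer "Todo" above, a minimal sketch of Elasticsearch index settings combining stopwords, synonym filters, and keyword markers; every filter name, synonym, and stopword below is illustrative, not Snag's production analysis chain:

    # Passed as the settings body when creating (or re-creating) the postings index.
    job_title_analysis = {
        "settings": {
            "analysis": {
                "filter": {
                    "job_synonyms": {
                        "type": "synonym",
                        "synonyms": ["pt, part time", "server, waiter, waitress"],
                    },
                    "brand_keywords": {
                        "type": "keyword_marker",  # protect brand terms from stemming
                        "keywords": ["starbucks", "mcdonalds"],
                    },
                    "job_stop": {"type": "stop", "stopwords": ["in", "near", "jobs"]},
                },
                "analyzer": {
                    "job_title_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "job_stop", "job_synonyms",
                                   "brand_keywords", "porter_stem"],
                    }
                },
            }
        }
    }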
  32. 32. Iteration 3 (Q4 2017) 33 ● Success Criteria - NDCG@10 - Application rate (# of applications / # of search sessions) - Applicant conversion rate (# of applicants / # of users) - Applications per user (# of applications / # of users) ● Use Cases Site: desktop, mobile web User: registered, unregistered Search Type: - zip-code location only - zip-code location + keyword - text location only - text location + keyword - keyword only ● LTR Features 1. job_title match scores 2. job_description match scores 3. employer_name match scores 4. location “match” level 5. distance <seeker, postings> 6. query location level 7. query length 8. platform (e.g. desktop, mobile) 9. job seeker registration status 10. is_faceted flag ● Relevancy Labels Click : 1 Apply Intent: 2 Completed Applications: 3
  33. 33. Relevancy Performance 34 Iteration 3 ● Pros: location-only searches are 10%+ better than baseline; keyword searches broke even ● Cons: large numbers of tied LTR scores artificially limited user options via presentation bias; lack of features about job description context meant "click-bait" postings received too much exposure ● Todo: randomize the ranking of postings with tied LTR scores on a per-user/session basis (see the sketch below); add query-independent posting-level features ● When things don't work: Query: - keyword: PT (part time) - location: Arlington, VA Results: Rank Title Location 1 Part time Cashier Arlington, VA, 22201 2 Drive Uber PT! Arlington, VA, 22209 3 Drive Uber PT! Arlington, VA, 22202 4 Drive Uber PT! Arlington, VA, 22203
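A small sketch of the tie-randomization "Todo": break ties between equal LTR scores with a hash of the (session, posting) pair, so ordering varies across sessions but stays stable within one session. Function and identifier names are hypothetical:

    import hashlib

    def tie_break_key(session_id, posting_id):
        # Deterministic pseudo-random value per (session, posting) pair.
        digest = hashlib.sha1(f"{session_id}:{posting_id}".encode()).hexdigest()
        return int(digest, 16)

    def rerank_with_tie_randomization(scored_postings, session_id):
        """scored_postings: list of (posting_id, ltr_score) tuples."""
        return sorted(scored_postings,
                      key=lambda p: (-p[1], tie_break_key(session_id, p[0])))

    # Postings p2-p4 share a score: their order differs per session, not per refresh.
    print(rerank_with_tie_randomization(
        [("p1", 3.2), ("p2", 1.7), ("p3", 1.7), ("p4", 1.7)], session_id="sess-123"))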
  34. 34. Current Iteration (Q1 2018) 35 ● Success Criteria - Application rate (# of applications/ # of search sessions) - Applicant conversion rate (# of applicants/ # of users) - Applications per user (# of applications / # of users) - Application diversity (# of distinct applied postings/ # of applications) ● Use cases Site: mobile apps, desktop, mobile web User: registered, unregistered Search Type: - zip-code location only - zip-code location + keyword - text location only - text location + keyword - keyword only - user coordinates only (a.k.a Jobs near me) - user coordinates + keyword ● LTR features 1. job_title match scores 2. job_description match scores 3. employer_name match scores 4. location “match” level 5. distance <seeker, postings> 6. query location level 7. query length 8. platform (e.g. desktop, mobile) 9. job seeker registration status 10. is_faceted flag 11. Location conf level of postings (proxy for posting quality) ● Relevancy Labels Click : 1 Apply Intent: 2 Completed Applications: 3
  35. 35. Android App Live Performance (April 2018) 36 Qualitative assessments ● Signal Regularisation: no particular field has an outsized impact on relevancy anymore ● Signal Coordination: e.g. the interaction between text and location relevancy is more balanced ● Randomized ties => Better Match: randomization enables well-distributed matchings and better marketplace health, and partially corrects positional bias. Metrics:
     Metric | Control (80% of users) | Test (20% of users) | Average % Lift
     Application Rate | 0.1273 (0.0005) | 0.1409 (0.0011) | 10.72%
     Applicant Conversion Rate | 33.86% (0.20%) | 36.64% (0.43%) | 8.22%
     Apply Intent Diversity | 0.676 (0.002) | 0.759 (0.004) | 12.40%
     Click Diversity | 0.663 (0.002) | 0.807 (0.004) | 21.62%
  36. 36. Engineering Challenges 37 ● Latency ● API: window size from 3000 to 1000 to 500 ● Igniter (posting ingestion) execution time ● Signal Quality ● Randomization for result consistency
  37. 37. Lessons Learned
  38. 38. Lessons Learned 39 Model Development ● Relevancy tuning can create feedback loops, so look ahead: changes in the ranking function sometimes trigger changes in user behavior, which in turn invalidate said ranking function. Treat relevancy tuning as interactive experiments, not a curve-fitting exercise ● Apply strong model assumptions to correct deficiencies in old ranking functions: use sound behavioral hypotheses, backed by data analysis and qualitative user research, to regulate model behavior. Historical data can be noisy; let A/B tests be the final judge ● Engineer the relevancy labels as well as the features: implicit feedback is not an absolute measure of relevancy and should be modeled to account for biases and behavioral assumptions ● Ranking functions are only as expressive as the features you feed them: any relevancy insight that can't be encoded as meaningful differences in the feature space will not be reflected in the search results
  39. 39. Lessons Learned 40 Engineering & Infrastructure ● Prioritize velocity of iteration (avoid analysis paralysis) ● Worked backwards from conclusions about system latency
  40. 40. Future Work
  41. 41. Posting and Query Semantics Features 42 ● Contextual information in posting descriptions contributes many relevancy signals ● Back-testing on both manually crafted bag-of-words features and machine-learned representations (e.g. via SVD, word2vec) already showed a significant lift in reranked NDCG ● Some concerns about query-time performance and over-fitting with long NLP feature vectors ● High context: "... hiring individuals to work as part-time Package Handlers... involves continual lifting, lowering and sliding packages that typically weigh 25 - 35 lbs… typically do not work on holidays.... working approximately 17.5 - 20 hours per week… outstanding education assistance of up to $2,625 per semester..." ● Low context: "We have a part time opening for a delivery driver position. Must be authorized to work in the US"
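A rough sketch of the bag-of-words-plus-SVD flavor of these description features, using scikit-learn; the example texts, dimensionality, and preprocessing are illustrative (the slide also mentions word2vec-style representations):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    descriptions = [
        "part-time package handler, continual lifting, 17.5-20 hours per week",
        "part time opening for a delivery driver, must be authorized to work in the US",
    ]

    # Bag-of-words -> low-dimensional latent vectors (an SVD/LSA-style sketch).
    tfidf = TfidfVectorizer(stop_words="english")
    term_matrix = tfidf.fit_transform(descriptions)
    svd = TruncatedSVD(n_components=2, random_state=0)
    doc_vectors = svd.fit_transform(term_matrix)  # one dense vector per posting,
                                                  # usable as query-independent LTR features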
  42. 42. Click / Relevancy Label Modeling 43 Model Improvements ● Build multi-stage click models to account for factors that cannot be formulated as query-time LTR features (e.g. rank position, between-session correlations) ● Create a positive feedback loop that boosts potentially relevant postings with low exposure (and penalizes the reverse)
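As a flavor of what the first bullet involves, here is a deliberately crude examination-hypothesis sketch: estimate how examination drops with rank from logged impressions and reweight clicks accordingly. Real multi-stage click models (e.g. the DBN and UBM families behind the PyClick reference) are considerably more involved; everything below is illustrative:

    from collections import defaultdict

    def ctr_by_rank(impressions):
        """impressions: iterable of (rank, clicked) pairs from search logs (hypothetical)."""
        shown, clicks = defaultdict(int), defaultdict(int)
        for rank, clicked in impressions:
            shown[rank] += 1
            clicks[rank] += int(clicked)
        return {r: clicks[r] / shown[r] for r in shown}

    def debiased_click_label(clicked, rank, rank_ctr):
        # Treat the CTR drop relative to rank 1 as an examination probability and
        # inverse-weight clicks by it (a crude position-bias correction).
        examination = rank_ctr.get(rank, 0.0) / max(rank_ctr.get(1, 1e-6), 1e-6)
        return 1.0 / max(examination, 1e-3) if clicked else 0.0

    ctrs = ctr_by_rank([(1, True), (1, False), (2, True), (2, False), (3, False), (3, False)])
    print(debiased_click_label(True, 2, ctrs))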
  43. 43. Personalized Matching 44 Model Improvements ● Incorporate LTR features about matching signals between job seeker preferences/qualifications and job requirements ● (Potentially) an online learning module that dynamically adjusts the rankings shown to each user based on onsite behavior ("I want a part time job near my home! ...that pays >$15 per hour. No night shifts! ...and is in the retail industry, where I have 5 years of experience. Bonus points if it's Harris Teeter…")
  44. 44. Engineering Improvements 45 Engineering & Infrastructure ● Push-button training pipeline ● Automated push button deployment for re-indexing ● Latency and scale improvements
  45. 45. References
  46. 46. 47
     ● Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.htm
     ● ES Learning to Rank Plugin: http://elasticsearch-learning-to-rank.readthedocs.io/en/latest/
     ● Relevancy tuning: Turnbull, Doug, and John Berryman. Relevant Search with Applications for Solr and Elasticsearch. Manning, 2016.
     ● LambdaMART: Burges, C. From RankNet to LambdaRank to LambdaMART: An Overview. Technical Report MSR-TR-2010-82, Microsoft Research, 2010.
     ● RankLib: https://sourceforge.net/p/lemur/wiki/RankLib
     ● XGBoost / DART: Chen, Tianqi, and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 785-794, 2016. Rashmi, K. V., and Ran Gilad-Bachrach. DART: Dropouts Meet Multiple Additive Regression Trees. April 2015.
     ● Hyperopt: Bergstra, J., R. Bardenet, Y. Bengio, and B. Kégl. Algorithms for Hyper-Parameter Optimization. Proc. Neural Information Processing Systems 24 (NIPS 2011), 2546-2554, 2011.
     ● Interleaving: Chapelle, O., T. Joachims, F. Radlinski, and Yisong Yue. Large-Scale Validation and Analysis of Interleaved Search Evaluation. ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012. Joachims, T. Evaluating Retrieval Performance Using Clickthrough Data. Proceedings of the SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval, 2002.
     ● Document & query embeddings: Mitra, Bhaskar, and Nick Craswell. Neural Models for Information Retrieval. 2017. Zamani, Hamed, and W. Bruce Croft. Relevance-Based Word Embedding. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17). ACM, New York, NY, USA, 505-514, 2017.
     ● Click models: Chuklin, A., I. Markov, and M. de Rijke. Click Models for Web Search. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7(3), 1-115, 2015. With PyClick: https://github.com/markovi/PyClick
     ● Implicit feedback: Hu, Y., Y. Koren, and C. Volinsky. Collaborative Filtering for Implicit Feedback Datasets. 2008 Eighth IEEE International Conference on Data Mining, Pisa, 263-272, 2008.
  47. 47. Our posting index is constantly changing 48
  48. 48. Thank you. Any questions?