Yuri M. Brovman, Data Scientist, eBay

Yuri is a Member of Technical Staff / Data Scientist at eBay in New York City. He is currently focused on developing scalable machine learning algorithms to produce high-quality item recommendations. Yuri holds a Ph.D. from the Department of Applied Physics and Applied Mathematics at Columbia University and an undergraduate degree in Physics from UC Berkeley.

Abstract Summary:

Innovations in Recommender Systems for a Semi-structured Marketplace:
eBay has over 1 billion live items on the site at any given time. The lack of structured information about listings, together with variable inventory, makes traditional collaborative filtering algorithms difficult to use in eBay’s large semi-structured marketplace. We will discuss approaches to overcome these challenges using machine learning and deep learning (both text-based and image-based models). The details of the sampling strategy, feature engineering, and machine learned ranking model are all important for delivering improved operational metrics in A/B tests. We will cover both the system architecture and the data science and machine learning methods that were developed to generate high-quality recommendations.

Published in: Technology

  1. Innovations in Recommender Systems for a Semi-structured Marketplace
     March 24, 2017
     Yuri M. Brovman, Merchandising Team, eBay, New York City, USA
  2. Acknowledgements
     Merchandising Team: Natraj Srinivasan, Paul Wang, Ben Klein, Jin Chung, Ryan Snyder, Steve Neola, Daniel Galron, Michal Wieja, Mike Firer, Pavel Stepanov, Marcus Gallagher, Giri Iyengar
  3. eBay Marketplace
     Challenges:
     • 1 billion active items
     • 150 million active users
     • Limited structured data coverage
     • Volatile inventory
     • Cold start problem
  4. ML / Deep Learning @ eBay NYC
     Machine Learned Ranking
     • Similar item recommendations
     • Pointwise machine learned ranking model
     • Published in ACM RecSys 2016
     [Image: first page of the paper, transcribed below]
       Optimizing Similar Item Recommendations in a Semi-structured Marketplace to Maximize Conversion
       Yuri M. Brovman (ybrovman@ebay.com), Marie Jacob (marijacob@ebay.com), Natraj Srinivasan (nsrinivasan@ebay.com), Stephen Neola (sneola@ebay.com), Daniel Galron (dgalron@ebay.com), Ryan Snyder (rysnyder@ebay.com); eBay Inc., New York City, USA
       ABSTRACT: This paper tackles the problem of recommendations in eBay’s large semi-structured marketplace. eBay’s variable inventory and lack of structured information about listings makes traditional collaborative filtering algorithms difficult to use. We discuss how to overcome these data limitations to produce high quality recommendations in real time with a combination of a customized scalable architecture as well as a widely applicable machine learned ranking model. A pointwise ranking approach is utilized to reduce the ranking problem to a binary classification problem optimized on past user purchase behavior. We present details of a sampling strategy and feature engineering that have been critical to achieve a lift in both purchase through rate (PTR) and revenue.
       Keywords: e-commerce; recommender systems; machine learning; learning to rank
       1. INTRODUCTION: Recommender systems in e-commerce have been extensively studied over the last few decades. Recommendations drive a considerable portion of site revenue, and ensure that users stay engaged with content for as long as possible. Unlike general marketplaces such as Amazon and Walmart, which offer warehouse products in a documented catalog, the eBay marketplace offers more diverse listings ranging anywhere from a new iPhone (a specific product with structured data attributes) to offhand antique items with no [...] In this paper, we highlight several unique challenges in providing high quality item recommendations to users in real time that are specific to the scale and variety of this data. There is limited structured data coverage of items which makes it difficult to utilize specific item attributes. Additionally, many items tend to be short-lived – they surface on the site for one week and are never listed again. Traditional collaborative filtering algorithms [5, 10, 8] are not effective in this environment due to this volatility of inventory and limitation of structured data coverage.
       Figure 1: Item page showing similar item recommendations above the fold.
     Image Based Deep Learning
     • Similar item recommendations using images
     • Deep learning model based on GoogLeNet CNN using GPU
     [Image: paper excerpt with two figure captions]
       Figure 1: Items from eBay that have the words “Chinese” and “Vase” in their title. This exemplifies that textual data is not always informative enough to find stylistically similar items.
       Figure 2: Comparison of the recommendations before the proposed algorithm and after. The first column is the seed item. The second, third, and fourth columns show the previous recommendations. The fifth, sixth, and seventh columns show the new recommendations. The first three rows are books, and the last two are movies.
     Text Based Deep Learning
     • Complementary item recommendations using title and aspects text
     • Novel deep learning architecture trained on eBay data using Theano
  5. ML / Deep Learning @ eBay NYC (repeat of slide 4)
  6. Similar Algorithm (SIM) on eBay
     • Powering several prominent placements across desktop and mobile
     • Serving 1 billion impressions daily
     • Response time = 200 ms end-to-end
     [Image: Item Page]
     Goal: Find most similar items and maximize conversion
  7. Architecture
     [Diagram: Merchandising Back End fed by offline inputs labeled item titles, products, coviews, and item images (GPU)]
  8. Machine Learned Ranking in Search
     • Learning to rank for information retrieval reduces the ranking problem to a classification problem
     • Training on $x_i$ = {query, URL} pairs as input
     • Binary or multi-class relevance labels $y_i$ are collected with crowdsourcing
  9. Machine Learned Ranking
     • Pointwise ranking approach using a binary classifier to rank recommendations by the probability of purchase
     • Training on $x_i$ = {seed item, recommended item} pairs logged from implicit user data
     • Class labels are $y_i$ = {0 = non-clicked, 1 = purchased}
     • Training dataset size ≈ 350K positive training pairs
     [Figure: sample impression from item page; pair 1 = clicked, pair 2 = non-clicked, pair 3 = non-clicked, pair 4 = purchased]
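The pointwise approach on slide 9 can be sketched as follows: score each {seed item, recommended item} pair independently with a binary classifier and sort by the predicted purchase probability. This is a minimal illustration, not the production model. The feature vectors, `WEIGHTS`, and `BIAS` are hypothetical values chosen for the example; in practice they would come from a classifier trained on logged purchase data.

```python
import math

def purchase_probability(features, weights, bias):
    """Logistic model: estimated P(purchase) for one (seed, candidate) pair."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_candidates(candidates, weights, bias):
    """Pointwise ranking: score every pair on its own, then sort descending."""
    scored = [(purchase_probability(feats, weights, bias), item_id)
              for item_id, feats in candidates.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored]

# Hypothetical model parameters and per-pair features (assumptions for
# illustration only, e.g. a price score and a title-similarity score).
WEIGHTS = [2.0, 1.5]
BIAS = -2.0
candidates = {
    "rec_a": [0.9, 0.2],
    "rec_b": [0.4, 0.9],
    "rec_c": [0.1, 0.1],
}

ranking = rank_candidates(candidates, WEIGHTS, BIAS)
print(ranking)
```

Because each pair is scored independently, the same model can rank any candidate set returned for a seed item without retraining.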
  10. Feature Engineering: Price
      • How to compare seed price to recommendation price?
      • Fit the ratio of recommendation price to seed price with a Cauchy distribution
      • Feature score defined by normalized Cauchy distribution with parameters defined by past purchase events
      [Figure: distribution of the price ratio from past purchase data]
      $\mathrm{PDF}(x) = \dfrac{1}{\pi\gamma\left[1 + \left(\dfrac{x - x_0}{\gamma}\right)^{2}\right]}$, where $\gamma$ = HWHM and $x_0$ = median
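A minimal sketch of the price feature on slide 10, under stated assumptions. The slide gives $x_0$ as the median and $\gamma$ as the half width at half maximum but does not say how $\gamma$ is estimated; here it is taken as half the interquartile range, which equals $\gamma$ for an exact Cauchy distribution. "Normalized" is assumed to mean scaling the density so the peak score is 1. The sample ratios are made up for illustration.

```python
import math
import statistics

def fit_cauchy(ratios):
    """Estimate (x0, gamma) from past purchase price ratios.

    x0 is the median; gamma is half the interquartile range
    (an assumption: the quartiles of a Cauchy sit at x0 +/- gamma).
    """
    x0 = statistics.median(ratios)
    q1, _, q3 = statistics.quantiles(ratios, n=4)
    return x0, (q3 - q1) / 2.0

def cauchy_pdf(x, x0, gamma):
    """Cauchy density, as written on the slide."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))

def price_feature(seed_price, rec_price, x0, gamma):
    """Score in (0, 1]: density at the price ratio divided by the peak density."""
    ratio = rec_price / seed_price
    return cauchy_pdf(ratio, x0, gamma) / cauchy_pdf(x0, x0, gamma)

# Hypothetical ratios (recommendation price / seed price) from past purchases.
past_ratios = [0.6, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.5]
x0, gamma = fit_cauchy(past_ratios)

print(price_feature(100.0, 100.0 * x0, x0, gamma))  # ratio at the median
print(price_feature(100.0, 300.0, x0, gamma))       # much pricier candidate
```

The heavy tails of the Cauchy distribution mean that a candidate priced far from the seed still receives a small non-zero score instead of being driven to zero.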
  11. Sampling Strategy
      • Considered several class combinations for binary classifier
      • Calculated class separability of each feature using the KL divergence
      • Using class 0 [non-clicked] and class 1 [purchased] produced the highest KL divergence in the price feature

        KL Divergence
        Negative Class           Positive Class    Price Feature
        non-clicked              clicked           0.01
        non-purchased            purchased         0.13
        clicked not purchased    purchased         0.07
        non-clicked              purchased         0.15

      [Figure: two plots of price-feature class distributions, labeled KL Divergence = 0.01 and KL Divergence = 0.15]
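The class-separability check on slide 11 can be illustrated by binning a feature for each class and computing the KL divergence between the two histograms. This is a sketch under assumptions: the slide does not state the binning, the smoothing, or the direction of the divergence, so the bin count, the `eps` smoothing term, and the choice of KL(positive || negative) are all illustrative, as are the sample feature values.

```python
import math

def histogram(values, bins, lo, hi, eps=1e-6):
    """Smoothed, normalized histogram so that no bin has zero probability."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp values on or outside the edges
        counts[idx] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical price-feature scores for three groups of training pairs.
purchased = [0.8, 0.9, 0.9, 0.7, 0.8]
clicked = [0.7, 0.8, 0.9, 0.6, 0.8]
non_clicked = [0.1, 0.2, 0.1, 0.3, 0.2]

p = histogram(purchased, bins=5, lo=0.0, hi=1.0)
kl_vs_clicked = kl_divergence(p, histogram(clicked, 5, 0.0, 1.0))
kl_vs_non_clicked = kl_divergence(p, histogram(non_clicked, 5, 0.0, 1.0))

print(kl_vs_clicked, kl_vs_non_clicked)
```

In this toy data the non-clicked class is far more separable from purchased than the clicked class is, which mirrors the ordering reported in the slide's table.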
  12. Classification Results
      Metrics from validation data set:

                     LogisticRegression      GradientBoostingClassifier
                     class 0    class 1      class 0    class 1
        Accuracy          0.698                   0.734
        Precision    0.70       0.70         0.75       0.72
        Recall       0.70       0.70         0.70       0.77
        LogLoss           0.58                    0.53
        AUC               0.766                   0.81

      [Figure: AUC plot]
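For readers unfamiliar with the metrics in the table on slide 12, the sketch below computes per-class precision and recall, log loss, and AUC from scratch on a toy validation set. The labels, probabilities, and the 0.5 decision threshold are made up for illustration; the slide's numbers come from eBay's validation data and are not reproduced here.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a random positive outscores a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(y_true, y_pred, cls):
    """Precision and recall for one class label."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == cls and yh == cls)
    predicted = sum(1 for yh in y_pred if yh == cls)
    actual = sum(1 for y in y_true if y == cls)
    return tp / predicted, tp / actual

# Toy validation data (hypothetical).
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.2, 0.4, 0.6, 0.3, 0.7, 0.9]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

accuracy = sum(1 for y, yh in zip(y_true, y_pred) if y == yh) / len(y_true)
auc = roc_auc(y_true, y_prob)
print(accuracy, precision_recall(y_true, y_pred, 1), log_loss(y_true, y_prob), auc)
```

AUC and log loss are computed from the predicted probabilities, while accuracy, precision, and recall depend on the chosen threshold.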
  13. Ranking Metric NDCG@k
      • Evaluate performance with ranking metric Discounted Cumulative Gain (DCG):
        $\mathrm{DCG} = \sum_{i=1}^{n} \dfrac{2^{l_i} - 1}{\log_2(r_i) + 1}$, where $l_i$ = relevance and $r_i$ = rank
      • Normalized DCG truncated at position k (NDCG@k)
      • Relevance function defined to be {non-clicked = 0, clicked = 0, purchased = 1}
      Ranking performance improved with classifier model
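A short sketch of NDCG@k using the relevance function from slide 13 (purchased = 1, everything else = 0). The discount here is written as $\log_2(r_i) + 1$, following the formula as it appears in the slide text; many references use $\log_2(r_i + 1)$ instead, so treat the exact discount as an assumption. The example impressions are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """DCG over the top-k positions; ranks start at 1."""
    return sum((2 ** rel - 1) / (math.log2(rank) + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels in displayed order: 1 = purchased, 0 = clicked or non-clicked.
baseline_order = [0, 0, 1, 0]   # purchased item shown at rank 3
reranked_order = [1, 0, 0, 0]   # purchased item moved to rank 1

print(ndcg_at_k(baseline_order, k=4))
print(ndcg_at_k(reranked_order, k=4))
```

With a single purchased item per impression, NDCG@k reduces to the discount at the purchased item's rank, so moving that item up the list raises the score directly.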
  14. [Slide contains no extractable text]
  15. A/B Test Results

        Metric     % Lift
        CTR        +3.0
        PTR        +6.6
        Revenue    +6.0

      • Site and category segmented logistic regression implemented in production
      • A/B test showed positive results over the existing baseline ranking model in key operational metrics: click through rate (CTR), purchase through rate (PTR), and revenue
      Launched MLR model to full traffic worldwide in 2016!
  16. Conclusion
      • Presented scalable architecture with a machine learned ranking model used in the Similar items algorithm to produce high quality recommendations
      • There are many interesting engineering and machine learning / deep learning challenges that are being solved @ eBay NYC!
      Thank you! We are hiring!
