
DataEngConf: Building the Next New York Times Recommendation Engine


By Alex Spangher (Data Engineer, New York Times Digital)
Machine Learning is a discipline characterized by systematic approaches and common threads across seemingly diverse problems. In this talk I'll discuss several approaches taken during our work on the next New York Times Recommendation Engine, focusing on spatial reasoning, dimensionality reduction, and testing strategies. Topics covered include implicit regression, Bayesian modeling, and neural networks. The talk focuses on the commonalities between the different approaches.



  1. Building the Next New York Times Recommendation Engine, by Alexander Spangher
  2. Problem Statement: The New York Times publishes over 300 articles, blog posts and interactive stories a day. Corpus: the n articles that are still relevant over the past x days.
  3. For each user: a 30-day reading history (days 1, 2, 3, 4, ..., 30).
  4. Machine Learning. "All of machine learning can be broken down into regression and matrix factorization." (a drunk PhD student at a bar) 1. Regression: f(input) = output 2. Factorization: f(output) = output (Yann LeCun, 2015)
  5. Problem Statement (Refined): 1. Define the pool of articles; not all articles expire at the same rate. 2. Rank-order articles based on the user's reading history; assume that a reader's future preferences will match past preferences.
  6. Defining the Pool of Articles
  7. Defining Relevancy
  8. Exponential Distribution
  9. Evergreen Model. Features: section, desk, word count, ...; metric: clicks per day. 1. Learn the training metric. 2. Learn the relationship between features and metric. 3. Convert to an interpretable expiration date.
  10. Fit a λᵢ to each item i in the training set.
  11. Likelihood function, for the Maximum Likelihood Estimate (MLE): L(λ; t) = p(t | λ) = ∏ᵢ p(tᵢ | λ). The likelihood of the parameters given the data is the joint pdf of the data given the parameter, which factors into a product of independent pdfs.
  12. Maximum Likelihood Estimate. Given the timestamp tᵢ of every click: L(λ) = ∏ᵢ λe^(−λtᵢ).
  13. Maximum Likelihood Estimate ???
  14. Maximum Log-Likelihood Estimate: ℓ(λ) = n log λ − λ Σᵢ tᵢ.
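For the exponential model above, the log-likelihood ℓ(λ) = n log λ − λ Σᵢ tᵢ has the closed-form maximizer λ̂ = n / Σᵢ tᵢ. A minimal sketch (the click times below are made up for illustration):

```python
import math

def exponential_mle(click_times):
    """Closed-form MLE for the rate of an exponential distribution.

    Maximizes l(lam) = n*log(lam) - lam*sum(t), giving lam_hat = n / sum(t).
    """
    n = len(click_times)
    return n / sum(click_times)

# Hypothetical click times (hours since publication) for one article.
clicks = [0.5, 1.2, 2.0, 4.5, 9.1]
lam_hat = exponential_mle(clicks)
half_life = math.log(2) / lam_hat  # time for the expected click rate to halve
```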
  15. Or, use an optimization package. Python: Convex Optimization by Stephen Boyd.
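The same estimate can also be recovered numerically by minimizing the negative log-likelihood. This sketch uses `scipy.optimize.minimize_scalar` rather than whichever convex-optimization package the talk used; the data are the same hypothetical click times:

```python
import math
from scipy.optimize import minimize_scalar

clicks = [0.5, 1.2, 2.0, 4.5, 9.1]  # hypothetical click times (hours)
n, total = len(clicks), sum(clicks)

def neg_log_likelihood(lam):
    # -l(lam) = -(n*log(lam) - lam*sum(t)); convex in lam > 0
    return -(n * math.log(lam) - lam * total)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
# res.x should agree with the closed form n / sum(t)
```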
  16. Learn the relationship between article features and λ: x = [desk, word count, section, ...], y = λ. Generalized linear model:
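One way to sketch this features-to-λ step is a log-linear fit, log λ = w·x, via least squares. The feature encoding and training values below are hypothetical, not the talk's actual model:

```python
import numpy as np

# Hypothetical training data: one row of (one-hot desk, scaled word count)
# per article, and the decay rate lambda_i previously fitted to that article.
X = np.array([
    [1, 0, 0.8],   # desk A, longer article
    [1, 0, 0.3],
    [0, 1, 0.5],   # desk B
    [0, 1, 0.9],
])
lam = np.array([0.20, 0.35, 0.05, 0.04])

# Log-linear model: log(lambda) = X @ w (+ intercept), fit by least squares.
design = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(design, np.log(lam), rcond=None)

def predict_lambda(x):
    """Predicted decay rate for a new article's feature vector."""
    return float(np.exp(np.append(x, 1.0) @ w))
```

Exponentiating the linear predictor keeps the predicted rate positive, which is why a log link is the natural choice here.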
  17. Performance
  18. Building the Recommender
  19. First Iteration: Keyword-Based model (TF-IDF vector). N = number of times the word appears in the document; D = number of documents the word appears in.
  20. First Iteration: Keyword-Based model (TF-IDF vector). Example vectors over the vocabulary (fun, cat, dog, scholar, nice): [0.02, 0.5, 0, 0, ..., 0.01] and [0.9, 0.01, 0.2, ..., 0.05].
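The TF-IDF weighting on the slide (term count times log inverse document frequency) can be sketched in a few lines; the toy documents here are made up:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """TF-IDF per the slide: N = term count in the document,
    D = number of documents containing the term; weight = N * log(n_docs / D)."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

docs = [["cat", "paw", "yarn", "cat"], ["bank", "money", "cat"]]
vocab, vecs = tf_idf_vectors(docs)
```

Note that a word appearing in every document (here "cat") gets weight zero, which is exactly the behavior that makes TF-IDF down-weight uninformative terms.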
  21. Feedback: "Recommendations work for me. I have been following the Oscar Pistorius case for over a year now, and every time there has been a relevant story about the case, I have been recommended that story. Recommendations seem to be working very well for me."
  22. Feedback: "No More Brooks recommendations, please. Your constant pushing of David Brooks onto me is like an annoying grandmother who won't believe you are really allergic to peanuts even though you regularly go into anaphylactic shock at her dinner table and need to be rushed to the hospital. What can I say… you're killing me. Please stop it. ... Thanks for your attention to this matter."
  23. Feedback: "Dear NY Times, You seem to have missed the fact that, while I do read the Weddings section, I only (or almost only) read about the weddings of same sex couples. Please stop recommending heterosexual weddings articles to me!!"
  24. Second Iteration: LDA-Based model (topic vector). Example vectors over k topics (1, 2, 3, 4, ..., k): [0.02, 0.5, 0, 0, ..., 0.01] and [0.9, 0.01, 0.2, ..., 0.05].
  25. Example topic: probability weight over words (cat, yarn, tree, building, car, money, bank, paw, toy, newspaper, Spotify). [bar chart]
  26. Example topic (cont.): probability weight over the same words. [bar chart]
  27. LDA
  28. David Blei (2003)
  29. Topic Space
  30. How do we learn these parameters? LDA definition: Choose θ ~ Dirichlet(α). For each word w in the document: choose its topic z ~ Mult(θ); choose w from p(w | z, β).
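The generative process on the slide can be written directly as sampling code. The topics, vocabulary, and hyperparameters below are toy values, not the model from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: k = 2 topics over a tiny vocabulary.
vocab = ["cat", "yarn", "paw", "money", "bank", "car"]
beta = np.array([            # beta[k] = word distribution for topic k
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # a "pets" topic
    [0.0, 0.0, 0.0, 0.4, 0.4, 0.2],   # a "finance" topic
])
alpha = np.array([0.5, 0.5])  # Dirichlet prior over topic proportions

def generate_document(n_words):
    """LDA generative process: theta ~ Dirichlet(alpha); then for each
    word, z ~ Mult(theta) and w ~ Mult(beta[z])."""
    theta = rng.dirichlet(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)
        w = rng.choice(len(vocab), p=beta[z])
        words.append(vocab[w])
    return words

doc = generate_document(10)
```

Inference (the next slides) runs this story in reverse: given only documents, recover plausible θ and β.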
  31. Variational Inference. Image borrowed from David Blei (2003).
  32. Variational Inference (cont.)
  33. Variational Inference (cont.): 1. (E-Step): fit the variational parameters. 2. (M-Step): update the model parameters. Tractable!!!
  34. Collaborative Topic Modeling (CTM). Image borrowed from David Blei (2011): the graphical model for the CTM model we use.
  35. Scaling the algorithm. The training procedure is batch. Do we have time to scale to all our users, in real time???
  36. Strategy: 1. Iterate until some variables stop changing (the article-topic variables). 2. Scale out, fixing the non-changing variables; the update equation for each remaining variable becomes a closed-form equation.
  37. Algorithm: 1. Batch-train on a training set of users. 2. Fix the article-topic variables and scale out to all users.
  38. Derive scores for users.
  39. C parameter: the back-off average.
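Slides 36–39 fix the article-topic vectors and score each user in closed form, with a back-off average controlled by C. The transcript doesn't show the exact closed form, so this shrinkage-toward-the-global-mean sketch is an assumption about its shape:

```python
import numpy as np

def user_topic_vector(read_article_topics, global_mean, C):
    """Shrinkage estimate of a user's topic preferences.

    With article-topic vectors fixed, average the topics of the articles
    the user read, backed off toward the global mean; C controls how many
    reads it takes before the user's own history dominates the average.
    (Hypothetical form; the talk's closed-form update may differ.)
    """
    read = np.asarray(read_article_topics, dtype=float)
    n = len(read)
    return (read.sum(axis=0) + C * global_mean) / (n + C)

global_mean = np.array([0.5, 0.5])        # average topic mix across all users
reads = [[0.9, 0.1], [0.8, 0.2]]          # topic vectors of articles read
prefs = user_topic_vector(reads, global_mean, C=2.0)
```

A new user with no reads gets exactly the global mean; heavy readers converge to their own history, which is the usual motivation for a back-off parameter.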
  40. Any vector-based algorithm works: 1) deep network (Spotify's audio CNN), 2) shallow network (Doc2Vec), 3) topic model, 4) pLSA.
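Whichever model produces the vectors, recommendation then reduces to scoring articles against the user's vector. A cosine-similarity sketch (the ranking rule is a common choice, not necessarily the talk's):

```python
import numpy as np

def rank_articles(user_vec, article_vecs):
    """Rank article indices by cosine similarity to the user vector."""
    user = np.asarray(user_vec, dtype=float)
    arts = np.asarray(article_vecs, dtype=float)
    sims = arts @ user / (np.linalg.norm(arts, axis=1) * np.linalg.norm(user))
    return np.argsort(-sims)  # best-matching article indices first

order = rank_articles([0.9, 0.1],          # user mostly reads topic 0
                      [[0.1, 0.9],         # article 0: off-topic
                       [0.8, 0.2]])        # article 1: close match
```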
  41. In conclusion: Modeling is fun! All models are bad, but some can be useful! Improve by recognizing shortfalls. Evaluate on KPIs, on customer feedback, on design decisions.
  42. [Chart: not functional / sub-optimal / flat-lining or degrading]