
Modeling Impression Discounting in Large-scale Recommender Systems

Recommender systems have become very important for many online activities, such as watching movies, shopping for products, and connecting with friends on social networks. User behavioral analysis and modeling of user feedback (both explicit and implicit) are crucial for improving any online recommender system. Widely adopted recommender systems at LinkedIn, such as “People You May Know” and “Suggested Skills Endorsements”, are evolving by analyzing user behavior on impressed recommendation items.
In this paper, we address modeling impression discounting of recommended items, that is, how to model users’ no-action feedback on impressed recommended items. The main contributions of this paper include (1) a large-scale analysis of impression data from LinkedIn and KDD Cup; (2) a novel anti-noise regression technique and its application to learning four different impression discounting functions: linear decay, inverse decay, exponential decay, and quadratic decay; and (3) the application of these impression discounting functions to LinkedIn’s “People You May Know” and “Suggested Skills Endorsements” recommender systems.
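
As a rough illustration of the four discounting-function families named above, here is a minimal Python sketch of plausible functional forms. The parameterizations and constants below are assumptions for illustration only; in the paper the coefficients are learned from impression data.

```python
import numpy as np

# Illustrative shapes for the four discounting-function families
# (linear, inverse, exponential, quadratic decay). The coefficients
# a and b are placeholders, not the paper's fitted values.

def linear_decay(x, a=0.05, b=1.0):
    """f(x) = b - a*x, clipped so the discount never goes negative."""
    return np.maximum(b - a * x, 0.0)

def inverse_decay(x, a=1.0, b=1.0):
    """f(x) = b / (1 + a*x)."""
    return b / (1.0 + a * x)

def exponential_decay(x, a=0.3, b=1.0):
    """f(x) = b * exp(-a*x)."""
    return b * np.exp(-a * x)

def quadratic_decay(x, a=0.01, b=1.0):
    """f(x) = b - a*x^2, clipped so the discount never goes negative."""
    return np.maximum(b - a * x ** 2, 0.0)

# Example: discount factors as the impression count grows from 0 to 10.
imp_count = np.arange(11)
print(exponential_decay(imp_count))
```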

  1. Modeling Impression Discounting in Large-scale Recommender Systems • KDD 2014, presented by Mitul Tiwari • Pei Lee, Laks V.S. Lakshmanan (University of British Columbia, Vancouver, BC, Canada) • Mitul Tiwari, Sam Shah (LinkedIn, Mountain View, CA, USA)
  2. LinkedIn By The Numbers • 300M members • 2 new members/sec
  3. Large-scale Recommender Systems at LinkedIn • People You May Know • Skills Endorsements
  4. Recommender Systems at LinkedIn
  5. What is Impression Discounting? • Examples: People You May Know, Skills Endorsement • Problem Definition
  6. Example: People You May Know • 5 days ago: recommended, but not invited • 1 day ago: recommended again, but not invited • Today: if we recommend again, would you invite or not? If you are not likely to invite, we had better discount this recommendation.
  7. Example: Skills Endorsement • If you are likely to endorse, we will show a skills suggestion again; otherwise, we should leave this space for other skills suggestions.
  8. Problem Definition • Impressions: recommendations shown to the user • Conversion: positive action (invite, endorse, etc.) • In natural language: we already impressed an item several times with no conversion; how much should we discount that item? • More formally: for a user u and item i, given an impression history (T1, T2, …, Tn) between u and i, can we predict the conversion rate for u on i?
  9. Impression Discounting Framework
  10. Outline • Define an impression: user, item, conversion rate • Features: how many times did the user see this item? When did the user last see this item? … • Data analysis: the relationship between impression features and conversion • Impression discounting model fitting • Experimental evaluation
  11. Define an Impression
  12. Features on Impressed Items • ImpCount (frequency): the number of historical impressions before the current impression, associated with the same (user, item) • LastSeen (recency): the day difference between the last impression and the current impression, associated with the same (user, item) • Position: the offset of the item in the user’s recommendation list • UserFreq: the interaction frequency of the user in a recommender system (see the feature-extraction sketch after this transcript)
  13. Data Analysis: PYMK Dataset • Data: 1.09 billion impression tuples • Training dataset: 80% of data (0.55 billion unique impressions, 0.87 billion total impressions, 20 million invitations) • Testing dataset: 20% of data (0.14 billion unique impressions (3.7%), 0.22 billion total impressions (2.4%), 5.2 million invitations)
  14. Skills Endorsement Dataset • Total dataset size: 190 million impression tuples • Training dataset: 80% of data • Testing dataset: 20% of data
  15. Tencent SearchAds Dataset • Publicly available from KDD Cup 2012, by the Tencent search engine • Total size: 150 million impression sequences • CTR of the ad at the 1st, 2nd, and 3rd positions: 4.8%, 2.7%, and 1.4%
  16. Data Analysis: Conversion Rate Changes with Impression Count • If we show an item to a user repeatedly, the conversion rate decreases
  17. PYMK Invitation Rate Changes with Impression Count and Last Seen • The conversion rate decreases with both ImpCount and LastSeen
  18. Endorsement Rate Changes with Impression Count and Last Seen • Similar observation: the conversion rate also decreases with both ImpCount and LastSeen
  19. Impression Discounting Model Fitting • Workflow • Model fitting process: fundamental discounting functions; aggregation model; offline and online
  20. Conversion Rate with ImpCount
  21. Discounting Functions
  22. Conversion Rate with LastSeen
  23. Aggregation Models • Linear aggregation • Multiplicative aggregation • f(Xi) is one of the discounting functions given earlier (see the aggregation sketch after this transcript)
  24. Anti-Noise Regression Model • The sparsity of observations increases the variance in the conversion rate • This typically happens when the number of observations is relatively low
  25. Density Weighted Regression • Add a weight to each squared error • Original RMSE vs. density-weighted RMSE, where vi is assessed by the number of observations in a small neighborhood of (Xi, yi) (see the regression sketch after this transcript)
  26. Impression Discounting Framework
  27. Offline Evaluation: Precision@k • Precision at top k (see the metric sketch after this transcript) • Precision improvement for different behavior sets: more features yield higher precision
  28. Online Evaluation: Bucket Test on PYMK • Uses the behavior set (LastSeen, ImpCount) • Runs on the LinkedIn PYMK online system
  29. Conclusion • Impression discounting model • Learnt a discounting function to capture ignored impressions • Linear or multiplicative aggregation model • Anti-noise regression model • Offline and online evaluation (bucket testing)
  30. Questions? • Related work is in the paper • Want to know more? Contact: mtiwari@linkedin.com
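
The following sketches expand on a few of the slides above. First, the behavior features from the “Features on Impressed Items” slide (ImpCount, LastSeen, Position, UserFreq), derived here from a toy in-memory impression log. The `Impression` record, field names, and day-granularity timestamps are illustrative assumptions, not LinkedIn’s production pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Impression:
    user: str       # the member who saw the recommendation
    item: str       # the recommended item (e.g., another member or a skill)
    day: int        # day on which the impression was shown
    position: int   # offset of the item in the recommendation list

def extract_features(log):
    """Per-impression behavior features: ImpCount, LastSeen, Position, UserFreq."""
    log = sorted(log, key=lambda imp: imp.day)
    seen = defaultdict(list)      # (user, item) -> days of earlier impressions
    user_freq = defaultdict(int)  # user -> number of impressions seen so far
    rows = []
    for imp in log:
        history = seen[(imp.user, imp.item)]
        rows.append({
            "user": imp.user,
            "item": imp.item,
            "ImpCount": len(history),                                # earlier impressions of this (user, item)
            "LastSeen": imp.day - history[-1] if history else None,  # day difference since the last impression
            "Position": imp.position,
            "UserFreq": user_freq[imp.user],                         # how active the user has been
        })
        history.append(imp.day)
        user_freq[imp.user] += 1
    return rows

log = [Impression("alice", "bob", day=1, position=3),
       Impression("alice", "bob", day=5, position=1),
       Impression("alice", "carol", day=6, position=2)]
for row in extract_features(log):
    print(row)
```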
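
Next, the “Aggregation Models” slide: the per-feature discounting functions f(Xi) are combined either linearly or multiplicatively, and the combined discount is applied on top of the base recommendation score. A minimal sketch assuming already-fitted per-feature functions and learned weights; the lambdas, weights, and scores below are placeholders.

```python
import math

def linear_aggregation(features, funcs, weights):
    """d(x) = sum_i w_i * f_i(x_i), with weights assumed to be learned offline."""
    return sum(w * f(x) for w, f, x in zip(weights, funcs, features))

def multiplicative_aggregation(features, funcs):
    """d(x) = prod_i f_i(x_i)."""
    d = 1.0
    for f, x in zip(funcs, features):
        d *= f(x)
    return d

# Example feature vector: (ImpCount, LastSeen in days).
funcs = [lambda c: math.exp(-0.3 * c),     # discount by impression count
         lambda t: 1.0 / (1.0 + 0.1 * t)]  # discount by recency
features = (4, 7)

original_score = 0.82  # score from the base recommender (placeholder)
print("multiplicative:", original_score * multiplicative_aggregation(features, funcs))
print("linear:", original_score * linear_aggregation(features, funcs, weights=[0.6, 0.4]))
```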
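
Then the “Density Weighted Regression” slide: each squared error is weighted by vi, the number of observations in a small neighborhood of (Xi, yi), so that sparse, noisy points (e.g., items impressed 40+ times) pull the fit less. A simplified sketch with synthetic data and an assumed exponential-decay candidate function; the paper’s exact loss and fitting procedure may differ.

```python
import numpy as np

def weighted_rmse(a, x, y, v):
    """Density-weighted RMSE for an exponential-decay candidate exp(-a*x)."""
    pred = np.exp(-a * x)
    return np.sqrt(np.sum(v * (y - pred) ** 2) / np.sum(v))

# Synthetic observations: conversion rate by impression count, plus the number
# of observations v behind each point (the last two points are sparse and noisy).
x = np.arange(1, 8)
y = np.array([0.55, 0.40, 0.30, 0.22, 0.17, 0.30, 0.05])
v = np.array([9000, 7000, 5000, 3000, 1500, 40, 10])

# Coarse grid search over the decay rate; a real fit would use an optimizer.
grid = np.linspace(0.01, 2.0, 400)
best_a = min(grid, key=lambda a: weighted_rmse(a, x, y, v))
print("fitted decay rate:", round(float(best_a), 3))
```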
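
Finally, the offline metric from the “Offline Evaluation” slide: precision at top k over recommendations re-ranked by the discounted score. The variable names and data below are illustrative.

```python
def precision_at_k(scored_items, converted_items, k=10):
    """Fraction of the top-k items (by discounted score) that the user converted on."""
    ranked = sorted(scored_items, key=lambda pair: pair[1], reverse=True)
    top_k = [item for item, _ in ranked[:k]]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in converted_items) / len(top_k)

# Example with synthetic scores: the user actually invited/endorsed items "a" and "c".
scored = [("a", 0.9), ("b", 0.7), ("c", 0.6), ("d", 0.2)]
print(precision_at_k(scored, converted_items={"a", "c"}, k=3))  # 2/3
```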

Editor's Notes

  • Good afternoon! I am Mitul Tiwari. Today I am going to talk about Modeling Impression Discounting in Large-scale Recommender Systems. This is joint work with Pei Lee, Laks, and Sam Shah. Most of the work was done when Pei Lee was interning at LinkedIn last summer.
  • Let me start by giving some context. LinkedIn is the largest professional network, with more than 300M members, and it is growing very fast, with more than 2 members joining LinkedIn per second.
  • Here are a couple of examples of large-scale recommender systems at LinkedIn.
    People You May Know: recommending other people to connect with. Daily we process hundreds of terabytes of data and hundreds of billions of edges to recommend connections for each of the 300M members.
    Suggested Skills Endorsements: another example, where we recommend a member-and-skill combination for endorsement.
  • There are many such large-scale recommender systems at LinkedIn, for example jobs recommendations, news article suggestions, companies to follow, groups to join, similar profiles, related search queries, etc.
  • Let me describe what we mean by impression discounting using a couple of examples, and then formulate the problem.
  • Let me start with the case of People You May Know, or PYMK. Let’s say that 5 days ago Deepak was recommended to me to invite and connect with on LinkedIn, and I ignored that recommendation.
    And Deepak was recommended to me again in PYMK 1 day ago. Now, today, the PYMK system needs to figure out whether to recommend Deepak to me again or not.
    Basically, if I am not likely to invite Deepak to connect, then it is better to discount this recommendation from my PYMK recommendations.
    Note that this is just an example, since I am already connected to Deepak and I work very closely with him.
  • Here is another example, of skills endorsement suggestions. We have very little real estate on the website to display endorsement suggestions. If we recommended certain skills in the past for some of your connections and you did not take any action on those suggestions, and you are not likely to endorse those skills, then it’s better if we suggest some other skills to endorse.
  • Let me define certain terms.
    Impressions are recommendations that are seen by users.
    Conversion is a positive action taken on a recommendation: for example, in the case of PYMK, an invitation to connect, or in the case of Skills Endorsements, endorsing a connection for a skill.
    In the impression discounting problem, we are given a history of impressions (recommendations seen by users), and we would like to discount or remove items from the recommendation set if they are not likely to lead to a conversion.
  • Our goal is to build an impression discounting framework that can be used as a feature in the recommendation engine model or a plugin that can be applied on top of the existing recommendation engine.

    Our proposed impression discounting model only relies on the historical impression records, and can be treated as a plugin for existing recommendation engines. We use the dotted rectangle to circle the newly built recommender system with impression discounting, which produces discounted impressions with a higher overall conversion rate.
  • Here is the outline of the rest of my talk. Let me define impressions in more detail in terms of user, item, conversion, and behavior features.
    Then I will describe the exploratory data analysis work to find the correlation between impression features and the conversion rate.
    Then I will describe the impression discounting model fitting.
    And finally I will conclude with offline and online experimental evaluations.
  • An impression can be seen as a tuple with multiple fields, such as:
    (1) User; (2) Item; (3) Conversion;
    (4) Features, such as how many times this item was shown to the user, when was the last time this item was shown to the user, etc.;
    (5) Timestamp; (6) Recommendation score of the item for the user.
  • We can derive various features from impressions, such as:
    (1) Impression count: the number of times the item was seen by the user; (2) Last seen: when the item was last seen; (3) Position: at what position the item was seen; (4) User frequency: how frequently the user interacts with the recommender system.
  • Here are some data analysis results that show that the conversion rate drops as the number of times the same item is shown to the user increases.
    The decrease in conversion rate is much steeper in the case of endorsements.
    Also note that in the case of PYMK the decrease is more gradual. That means a recommendation in PYMK still has a chance of acceptance even after multiple impressions.
  • Given this data, and after doing correlation studies between various behavior features and the conversion rate,
    we would like to learn discounting functions that fit this data.
  • In the plot we show various discounting functions that we use to fit the relationship between the conversion rate and the number of times an item was shown to a user.
  • These are the various discounting functions that we use to fit the data.
  • After fitting the best discounting function for each of the features, we combine these functions using linear or multiplicative aggregation.
  • Note that in certain cases we did not have a lot of data points. For example, very few items were shown to users more than 40 times, and there is a lot of variance because of the small number of data points.
    We developed an anti-noise regression technique to give lower weight to such data points in the cost function.
  • We adopted this technique from density-based clustering to give lower weight to certain observations in the cost function.
    Throw away information that you don’t need – an analogy with denoising autoencoders in deep learning.
  • Our proposed impression discounting model only relies on the historical impression records, and can be treated as a plugin for existing recommendation engines. We use the dotted rectangle to circle the newly built recommender system with impression discounting, which produces discounted impressions with a higher overall conversion rate.
  • We talked about modeling impression discounting in large-scale recommender systems at LinkedIn, such as People You May Know and Skills Endorsement Suggestions.
