Recsys virtual-profiles

Generating Supplemental Content Information Using Virtual Profiles

Haishan Liu, Mohammad Amin, Baoshi Yan, Anmol Bhasin
LinkedIn Corporation, 2029 Stierlin Court, Mountain View, CA 94043

ABSTRACT
We describe a hybrid recommendation platform and technique at LinkedIn that seeks to optimally extract relevant information pertaining to items to be recommended. By extending the notion of an item profile, we propose the concept of a "virtual profile" that augments the content of the item with a rich set of features inherited from members who have already shown explicit interest in it. Unlike item-based collaborative filtering, we focus on discovering the characteristic descriptors that underlie the item-user association. Such information is used as supplemental features in a content-based filtering system. The main objective of virtual profiles is to provide a means to tap into rich content information from one type of entity and propagate the extracted features to other affiliated entities that may suffer from relative data scarcity. We empirically evaluate the proposed method on a real-world community recommendation problem at LinkedIn. The results show that virtual profiles outperform a collaborative filtering based approach ("users who like this also like that"). In particular, the improvement is more significant for new users with only limited connections, demonstrating the capability of the method to address the cold-start problem in pure collaborative filtering systems.

Categories and Subject Descriptors: H.2.8 [Database Management]: Data Mining
General Terms: Theory
Keywords: hybrid recommender systems, feature generation and extraction, model-based recommendation, virtual profiles
1. INTRODUCTION
Large-scale recommender systems, in the era of internet-scale data deluge, contribute significantly to mitigating the information overload problem by unveiling relevant and interesting objects to users. Rather than hoping for serendipitous encounters, recommender systems bring forth the notion of personalized information discovery by presenting to the user a smaller pool of relevant objects. Collaborative filtering, the de facto mechanism for recommendation, fails to address the "cold start" problem, which has led to the exploration of hybrid recommenders. Hybrid recommenders combine information obtained from different sources and techniques to achieve a better outcome. Typically a hybrid recommender system incorporates information from a myriad of sources, e.g., content metadata, interaction data, global popularity, social network and social interaction information, and so on. Each of these information sources offers a different level of relevance guarantee at varying computational overhead. Hence, how these information sources are computed and how they are combined plays a vital role in the final outcome.

As of today LinkedIn has more than 220 million users. As the largest and most popular professional networking site, LinkedIn presents some unique opportunities and challenges for content discovery and recommendation. It is imperative for members to be able to discover and subscribe to companies and groups (referred to as communities henceforth) that might be relevant to them in a professional context. In this paper, we describe a hybrid community recommendation platform and technique at LinkedIn that optimally combines information from multiple sources. In order to extract more relevant information pertaining to the community to be recommended, i.e.,
to further extend the notion of content metadata, we propose the concept of a "virtual profile" that augments the content metadata with a rich set of features inherited from the set of members who have already shown explicit interest in it. In general, the notion of a virtual profile answers: "What are the most dominant features pertaining to the members who have shown interest in a particular
community?" This question essentially maps an object into the same feature space as that of its subscribers. Content metadata, extended with this inferred information, provides an additional safeguard against the cold start problem. LinkedIn data presents a unique opportunity to extend the content features with extracted features, since there is no dearth of rich information about the subscribers in the data set, which renders the synergy immensely valuable.

The contributions of this paper are as follows:
1. A generic content metadata extension method, i.e., virtual profile generation.
2. A scalable and generic recommendation computation platform that powers multiple real-time recommendation products at LinkedIn.
3. Seamless integration of multiple, heterogeneous data sources to compute an optimal outcome.

2. RELATED WORK
There has been a flurry of research in the domain of recommender systems with the objective of improving personalization [1]. Most traditional recommenders are powered by collaborative filtering [9, 17], content-based predictors [8, 14], and knowledge-based filtering techniques [11]. Each individual technique has its own strengths and weaknesses; e.g., while collaborative filtering techniques suffer from data sparsity and cold start problems [15], content-based techniques are prone to skewed recommendations [14]. Hybrid recommenders combine the best of both worlds, making recommenders more robust in practice. Much work has been done to combine multiple recommenders in an effective way to outperform any single one. In [5] Burke depicts a taxonomy of recommender systems, where multiple recommenders are arranged to allow execution in a parallel or cascaded topology. A system described in [4] combines multiple collaborative filtering approaches using a linear combination of static weights learned via linear regression.
STREAM [2], which combines multi-tier predictors, uses dynamically generated metrics to learn the next level of predictors. In [12], a hybrid movie recommender system is proposed that uses content-based predictors to boost user data, which drives the ensuing collaborative filtering based recommendation. The content information is obtained from IMDB, and a Naive Bayes classifier is used for building user item profiles. Finally, user-based collaborative filtering is employed to obtain the final recommendation. However, this approach suffers from scalability issues. Pazzani [13] proposed a hybrid recommender system where content-based user profiles are used to group similar users, which is subsequently used to predict user preferences. In many of these user-item recommendation frameworks, items to be recommended can be augmented with metadata corresponding to the members who have already shown explicit interest in them. In other words, these items can be represented as objects in the same feature space as that of the users. These representations can be thought of as "virtual user profiles" or "virtual profiles". This potentially adds one more layer of information source to guide the recommendation process. In our approach, we describe a large-scale recommender system that combines data from multiple heterogeneous sources, including virtual profiles and the social network, to serve real-time traffic on a large professional social networking site.

3. METHOD
3.1 System Overview
We build our recommender system based on content filtering, since we have abundant access to rich-content entities, such as user profiles, which enables a straightforward means for feature extraction, indexing, and matching. Target entities (those the client wants recommendations of) are feature-extracted and put into a reverse index, and source entities (those the client wants recommendations for) are converted into complex queries against the index.
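The index-and-query scheme just described can be illustrated with a minimal sketch. Everything here is hypothetical — the field names, the hand-set weights, and the `build_index`/`score` helpers; in the production system the index is a full search engine and the field weights come from the offline learning-to-rank process rather than being set by hand:

```python
from collections import defaultdict

# Illustrative per-field weights; the paper learns these offline
# via learning-to-rank rather than fixing them like this.
FIELD_WEIGHTS = {"name": 2.0, "industry": 1.0, "description": 0.5}

def build_index(targets):
    """Reverse index over target entities: (field, term) -> entity ids."""
    index = defaultdict(set)
    for entity_id, profile in targets.items():
        for field, terms in profile.items():
            for term in terms:
                index[(field, term)].add(entity_id)
    return index

def score(index, source_profile):
    """Score targets by weighted field/term overlap with a source entity."""
    scores = defaultdict(float)
    for field, terms in source_profile.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in terms:
            for entity_id in index.get((field, term), ()):
                scores[entity_id] += weight
    # Highest-scoring targets first.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Under this view, a source entity is simply a bag of (field, term) query clauses, so augmenting the system with a virtual profile amounts to adding one more field to the query and the index.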
This provides a form of content-based recommendation score, where the match is determined by the degree of similarity between the source and target entity features, with different fields weighted by a set of parameters determined by an offline learning-to-rank process. Figure 1 illustrates a brief workflow of the system. It also shows how we can augment the system by including more information, such as virtual profiles, as new features in the content filtering recommendation, as detailed below.

Figure 1: A brief workflow for the recommender system with virtual profiles.

We view every entity as being characterized by two sets of content features: one extracted from explicit information associated with the entity, which we name the "primary profile", and the other inferred from the entity's behavior and association with other entities, which we name the "virtual profile". The main objective of virtual profiles is to provide a means to tap into rich content information from one type of entity and propagate the extracted features to other affiliated entities that may suffer from relative data scarcity. Essentially, a virtual profile of an entity is an aggregation of statistically relevant features from the primary profiles of affiliated entities, in which way it introduces a collaborative
filtering aspect in our content filtering system. For example, a virtual profile of a LinkedIn group constitutes distinctive features from its participants, so that the group can be most effectively distinguished from others.

To extract features from entities and generate primary profiles, we utilize a feature extractor layer, a standalone service that accumulates underlying entity database change events and identifies various fields in the document. The types of fields that can be feature-extracted include rich text fields, such as job summary, member position summary, etc., and specialized fields, such as Geo entities including region, country, city, coordinates, etc.

The presented content filtering system can be extended to consider other collaborative filtering aspects, for example, by including network proximity as a feature while computing relevance scores. We describe a browsemap-based method along this line as a comparison in Section 4. As a general platform, every application consuming recommendations from this system can easily build its own logic for reranking/reordering of results based on custom filtering criteria, or extend the concept of network proximity across entity types, e.g., recommending jobs to discussion groups.

3.2 Generating Virtual Profiles
The virtual profile generation process for an entity aims at selecting, from a total of n features of its affiliated entities, a subset of k < n features that is "maximally informative" about the entity. From a classification point of view, the entity that we generate the virtual profile for represents a target class for a set of documents (primary profiles). We need a measure to evaluate the "information content" of each individual feature with regard to the target class. We propose to use mutual information for this purpose. Mutual information measures arbitrary dependencies between random variables.
The fact that mutual information is independent of the coordinates chosen and permits a robust estimation makes it suitable for assessing the "information content" of features in complex classification tasks. In accordance with Shannon's information theory, the uncertainty of a document class C as a random variable can be measured as:

H(C) = − ∑_{c∈C} P(c) log P(c).

Given the feature vector F, the conditional entropy H(C|F) measures the remaining uncertainty about C:

H(C|F) = − ∑_{f∈F} P(f) ∑_{c∈C} P(c|f) log P(c|f).

After having observed the feature vector F, the mutual information, i.e., the amount of decreased class uncertainty, is defined as:

I(C; F) = H(C) − H(C|F) = ∑_{c,f} P(c, f) log [ P(c, f) / (P(c)P(f)) ],

where P(c, f) is the joint probability of class c and feature f. Therefore, to generate virtual profiles, the goal is to find the optimal feature subset S ⊆ F such that I(C; S) is maximized. From an information-theoretic perspective, selecting features that maximize I(C; S) translates into selecting those features that contain the maximum information about class C. However, locating the optimal subset requires an exhaustive combinatorial search over the feature space, requiring a number of runs equal to (n choose k), where n is the size of the original feature set and k is that of the desired subset. Besides, an exact solution also demands large training sample sizes to estimate the higher-order joint probability distribution in I(C; F). For example, Fraser's method [6], a computationally efficient algorithm for calculating the optimal I(C; S), requires for its convergence a number of samples "in the millions" when the number of features in the input vector is larger than 3 or 4.

Given these difficulties, most existing approaches approximate I(C; F) based on the assumption of lower-order dependencies between features.
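As a concrete illustration, the mutual information between a single binary feature and a target class can be estimated from co-occurrence counts via maximum likelihood, and a virtual profile then keeps the top-k features by this score. A minimal sketch — the count-based helpers are illustrative, not the production implementation:

```python
import math

def mutual_information(n_fc, n_f, n_c, n):
    """I(f; c) for binary feature f and class c, from counts:
    n_fc = docs affiliated with c that contain f,
    n_f  = docs containing f, n_c = docs affiliated with c,
    n    = total docs. Probabilities are maximum-likelihood estimates."""
    mi = 0.0
    for ef in (1, 0):
        for ec in (1, 0):
            # Joint count for (f = ef, c = ec) from the 2x2 table.
            if ef and ec:
                joint = n_fc
            elif ef:
                joint = n_f - n_fc
            elif ec:
                joint = n_c - n_fc
            else:
                joint = n - n_f - n_c + n_fc
            if joint == 0:
                continue  # 0 * log 0 -> 0 by convention
            p_joint = joint / n
            p_f = (n_f if ef else n - n_f) / n
            p_c = (n_c if ec else n - n_c) / n
            mi += p_joint * math.log(p_joint / (p_f * p_c))
    return mi

def virtual_profile(counts, n_c, n, k):
    """Greedy selection: top-k features by I(f; c).
    counts maps feature -> (n_fc, n_f)."""
    scored = {f: mutual_information(n_fc, n_f, n_c, n)
              for f, (n_fc, n_f) in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

With the natural log, a feature that perfectly predicts a class with P(c) = 0.5 scores ln 2 ≈ 0.693, while a feature independent of the class scores 0, so the ranking behaves as expected.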
For example, a second-order feature dependence assumption is proposed by Battiti [3] to approximate I(C; F) by a greedy incremental selection scheme with a heuristic to account for correlations between features: given a set of already selected features, the algorithm chooses the next feature as the one that maximizes the information about the class, corrected by subtracting a quantity proportional to the average mutual information with the selected features.

Unfortunately, the calculation of the pairwise feature correlation I(f, f′) is impractical in our case, because the feature dimension is extremely high given the bag-of-words extracted from textual contents. Therefore, we make a first-order class dependence assumption that each feature independently influences the class variable, which means selecting the mth feature, fm, is independent of the (m − 1) already selected features, i.e., P(fm | f1, ..., fm−1, C) = P(fm | C). This results in a straightforward greedy algorithm to generate the virtual profile for an entity c, which consists of the following steps: 1) gather features from all primary profiles of entities that have an affiliation with c, 2) calculate the mutual information, I(f; c), between each feature and c, and 3) select the top k features with the highest I(f; c) into the virtual profile. More specifically, I(f; c) can be calculated as follows:

I(f; c) = ∑_{ef∈{1,0}} ∑_{ec∈{1,0}} P(f = ef, c = ec) log [ P(f = ef, c = ec) / (P(f = ef) P(c = ec)) ],   (1)

where f is a random variable that takes values ef = 1 (the entity's primary profile contains feature f) and ef = 0 (the entity's primary profile does not contain feature f), and c is a random variable that takes values ec = 1 (the entity is affiliated with c) and ec = 0 (the entity is not affiliated with c). The probabilities in Equation 1 can be calculated using maximum likelihood estimation.

4. EXPERIMENTS
Our goal is to test whether virtual profiles are a valuable source of features to improve recommendation performance. In designing the experiments, we want to verify the heuristic assumption that virtual profiles can use features greedily selected by mutual information. We also want to compare the performance of virtual profiles with other classic collaborative filtering methods and study their tradeoffs. Furthermore, by experimenting with different parameter settings to generate virtual profiles, we want to provide general guidance on how virtual profiles can best be implemented in practice.

4.1 Methodologies
We choose a community recommendation problem at LinkedIn as the test application. Successful recommendations result in users following certain communities, while users are also presented the choice to opt out of communities at any later point. We extract three kinds of features from entities (users and communities) in this application domain, as follows:
1. Content features: features from users' and communities' textual information extracted into predefined standardized fields (e.g., name, industry, description, etc.).
2. Virtual profile: as described in Section 3, a set of features selected from a community's followers as supplements to the community's primary profile.
3. Browsemap: a collaborative feature representing the co-affiliation relationship, or "users who follow X also follow Y."

Browsemaps capture a notion of similarity between communities that is driven by users' preferences. To generate the browsemap for a community, from all other communities that it shares followers with, we choose the top 50 ranked by TF/IDF. Then, for each user, we take the closure of the communities she has already followed with respect to browsemaps, and select the top 50 weighted by their TF/IDF scores normalized over the number of communities followed.
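The first step of the browsemap construction above could look like the following sketch. The paper does not spell out its TF/IDF weighting, so this assumes the shared-follower count plays the role of term frequency and a log-inverse of the candidate community's total follower count plays the role of IDF; the helper is illustrative only:

```python
import math
from collections import defaultdict

def build_browsemaps(follows, top_n=50):
    """follows: user id -> set of community ids the user follows.
    For each community X, rank co-followed communities Y by a
    TF-IDF-style weight: shared-follower count ("tf") discounted by
    Y's overall follower count, so globally popular communities do
    not dominate every browsemap."""
    followers = defaultdict(set)  # community -> set of followers
    for user, comms in follows.items():
        for c in comms:
            followers[c].add(user)
    n_users = len(follows)
    browsemaps = {}
    # Quadratic over communities for clarity; at LinkedIn scale this
    # would be a distributed co-occurrence counting job instead.
    for x, fx in followers.items():
        weights = {}
        for y, fy in followers.items():
            if y == x:
                continue
            shared = len(fx & fy)  # "tf": followers in common
            if shared:
                idf = math.log(n_users / len(fy))
                weights[y] = shared * idf
        browsemaps[x] = sorted(weights, key=weights.get,
                               reverse=True)[:top_n]
    return browsemaps
```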
Communities selected in this way can essentially be seen as recommendations by collaborative filtering. We instead treat them as part of a standalone feature: when combined with a user's content features to generate a search query, it leads to extra field matches against communities that appear in the feature. The weight of this match, just like matches in other features, can be determined in an offline learning process.

The content features extracted for communities contain only three fields (i.e., name, description, and tags). They represent nearly the minimum amount of information required for a content filtering recommender system to function, and are therefore considered a baseline in the experiment. Browsemaps, on the other hand, are designed as an alternative to virtual profiles for comparison, given that both take into account the interaction among entities.

For model fitting, we use a training set including 3.4 million positive and 2.2 million negative examples gathered from both explicit and implicit user feedback (e.g., follow/unfollow, or lack of action on recommendations). We apply L2-regularized logistic regression with various combinations of the above-mentioned features. The best model under each configuration is selected by optimizing the area under the ROC curve (AUC-ROC). The performance of the different models is evaluated both offline and online. The results are presented in the next sections.

4.2 Results
4.2.1 Offline evaluation
We compare the AUC for models obtained by training with four different feature configurations, namely, (A) content features only, (B) content features plus virtual profiles, (C) content features plus browsemaps, and (D) content features plus both virtual profiles and browsemaps. It can be seen from Figure 2 that the ROC curve of model B completely dominates that of model A (with AUCs 0.72 vs. 0.60), and both of them dominate that of model C (AUC 0.44).
The same performance pattern is also exhibited in the precision-recall curve, as shown in Figure 3.

Figure 2: ROC curves for different models (content features only; content features + vp; content features + bm; content features + vp + bm).

Besides classification performance, another important measure that can be evaluated offline is coverage, which refers to the degree to which recommendations cover the set of available items (item space coverage) and the degree to which recommendations can be generated for all potential users (user space coverage) [7, 10]. Owing to a distributed algorithm developed at LinkedIn, we are able to calculate recommendations offline for all our 220 million users. Using each of the trained models described above, we calculate a different set of recommendations for each user, with the size of each set capped at 50. We counted the number of times
unique communities appeared in recommendations (frequencies) under different models. Figure 4 shows the frequencies, on a logarithmic scale, sorted in descending order and plotted against their ranks.

Figure 3: Precision-recall curves for different models.

It is not surprising that the baseline curve from the content-features-only model is the lowest, since the features extracted for communities in this case contain the least amount of information, and the distribution of recommendation frequency simply reflects the distribution of the amount of textual content of each community, which follows a power law. On the other hand, the curve from the model with the addition of browsemaps visibly bulges outwards from the baseline for about two thirds of the points, indicating that those points appear more frequently in recommendations, hence more coverage. Most remarkably, the model with the addition of virtual profiles significantly increased the frequencies for almost all points on the curve, except for cases where the original baseline frequencies are extremely high or low.

The reason browsemaps slightly boost the coverage for some communities is that those communities bear little content information yet already have followers. Having followers makes them eligible to be included in other communities' browsemaps, and thus leads to a higher chance of matching with users. However, for users who have not followed any communities at all, the browsemap becomes an empty feature, which is why, for about a third of communities, there is no increase in coverage from browsemaps compared with the baseline.
This phenomenon is also illustrated in Figure 5, in which the recommendation frequencies of unique communities are counted only for new users (i.e., users who have not started following communities yet). We observe that the model with browsemaps produces a curve identical to the baseline, while the model with virtual profiles exerts a consistent boost. This shows that browsemaps, as a feature with a collaborative filtering aspect, fail to address cold start, while virtual profiles provide a well-rounded improvement in terms of both coverage and predictive power.

Figure 4: Number of recommendations per unique community.

4.2.2 Online evaluation
To further evaluate the models with various feature configurations (i.e., content features with vp, content features with bm, content features with both vp and bm, and content features only), we deployed them to serve real-time online recommendation requests and compared their performance through a bucket test. We assign a unique bucket of 2.5% randomly selected users to each model. The bucket with the model based only on content features is the control, while the others are variants. The duration of the test is determined according to Wheeler [18], where a conservative estimate of the sample size needed to achieve 80% power (the probability of correctly rejecting the null hypothesis when it is indeed false) is given by Equation 2:

n = (4rσ / ∆)²,   (2)

where n is the minimum number of samples (impressions to be delivered) for each equal-sized variant, r is the number of variants, σ² is the variance of the OEC (Overall Evaluation Criterion [16], a quantitative measure of the experiment's objective), and ∆ is the sensitivity, or the desired amount of change. The OEC in this test is the click-through rate (CTR) of recommendations. Assume each click-through
event is a Bernoulli trial with probability p = ctr0 (the control CTR, estimated from historical data); then σ² = p(1 − p). Applying Equation 2 and knowing the approximate number of recommendation impressions per day, we derive the length of the test to be 7 days.

Figure 5: Number of recommendations for new users per unique community.

Figure 7 presents the results of the test by showing the percentage change in CTR of the variant models relative to the control on each individual day of the test. Overall, the model with virtual profiles outperforms the control by 91.2%. Surprisingly, however, we do not observe any improvement from the model with browsemaps. The model with both virtual profiles and browsemaps increased the CTR by 84.4%. The difference between the two best performing models is not significant (p-value 0.062), which is similar to the offline evaluation result. The reason browsemaps fail to increase the overall CTR may be attributed to the fact that only one third of the users have followed communities in this particular application, meaning the cold start effect is much more pronounced. Virtual profiles, on the other hand, are not vulnerable to this problem, since they are content-based and do not rely on pre-existing user-item affiliations, as demonstrated in this experiment.

5. CONCLUSION AND FUTURE WORK
We presented virtual profiles, a generic content metadata extension method. We also described how it is utilized in a scalable and generic content-based hybrid recommender system that powers multiple real-time recommendation products at LinkedIn. The goal of virtual profiles is to provide a means to tap into rich content information from one type of entity and propagate the extracted features to other affiliated entities that may suffer from relative data scarcity.
Figure 6: ROC curves for virtual profiles with different numbers of terms (top 50, 100, and 200).

It brings a collaborative filtering aspect, in the form of a supplement to the content features, into the recommender system, and is shown to outperform a method that directly incorporates network proximity from collaborative filtering. The experiments support that our first-order class dependence assumption and the greedy algorithm for calculating mutual information are a reasonable approximation. In future work, we will investigate scalable ways to account for dependencies among features. We also plan to explore more term weighting methods besides mutual information, including other classic information-theoretic quantities such as the Kullback-Leibler divergence, as well as TF/IDF.

6. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[2] X. Bao, L. Bergman, and R. Thompson. Stacking recommendation engines with additional meta-features. In Proceedings of the Third ACM Conference on Recommender Systems, RecSys '09, pages 109–116, 2009.
[3] R. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4):537–550, July 1994.
[4] R. M. Bell, Y. Koren, and C. Volinsky. The BellKor solution to the Netflix Prize.
[5] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, Nov. 2002.
Figure 7: Model CTRs (percentage change relative to the control on each day of the test).

[6] A. M. Fraser and H. L. Swinney. Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2):1134–1140, Feb. 1986.
[7] M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: Evaluating recommender systems by coverage and serendipity. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 257–260, New York, NY, USA, 2010. ACM.
[8] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, July 2001.
[9] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, CSCW '00, pages 241–250, New York, NY, USA, 2000. ACM.
[10] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53, Jan. 2004.
[11] P. B. Kantor. Recommender Systems Handbook. Springer, 2009.
[12] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), pages 187–192, 2002.
[13] M. J. Pazzani. A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5-6):393–408, Dec. 1999.
[14] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web, pages 325–341. Springer-Verlag, Berlin, Heidelberg, 2007.
[15] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW '94, pages 175–186, New York, NY, USA, 1994. ACM.
[16] R. K. Roy.
Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement. Wiley, 2001.
[17] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 285–295, 2001.
[18] R. E. Wheeler. Portable power. Technometrics, 16(2):177–179, 1974.