This document describes a hybrid recommendation system used at LinkedIn that generates "virtual profiles" to address cold start problems. Virtual profiles augment item profiles by inheriting rich features from members who have shown interest in that item. The system extracts features from user profiles to generate primary profiles for items, and then generates virtual profiles for items by selecting top features from affiliated user profiles using mutual information. An experiment on LinkedIn's community recommendation problem found virtual profiles outperformed collaborative filtering and improved recommendations for new users.
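As a concrete illustration of that selection step, here is a minimal sketch of ranking candidate features by mutual information between a binary member feature and a binary affiliation signal. The function names and data layout are assumptions for the example, not LinkedIn's actual code.

```python
import numpy as np

def mutual_information(feature: np.ndarray, affiliated: np.ndarray) -> float:
    """MI (in nats) between a binary member feature and item affiliation."""
    mi = 0.0
    for f in (0, 1):
        for a in (0, 1):
            p_fa = np.mean((feature == f) & (affiliated == a))
            if p_fa > 0:
                p_f = np.mean(feature == f)
                p_a = np.mean(affiliated == a)
                mi += p_fa * np.log(p_fa / (p_f * p_a))
    return mi

def top_virtual_profile_features(member_features, affiliated, k=10):
    """Keep the k features that best discriminate affiliated members.

    member_features: dict of feature name -> binary vector over members
    affiliated:      binary vector, 1 if the member showed interest in the item
    """
    scored = {name: mutual_information(vec, affiliated)
              for name, vec in member_features.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

The selected feature names would then be attached to the item as its virtual profile and consumed by the content-based ranker.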
Structural Balance Theory Based Recommendation for Social Service Portal - YogeshIJTSRD
There is an enormous amount of data in our world, so accessing the most accurate information is becoming increasingly difficult and complicated. As a result, much relevant information is missed, which leads to considerable duplication of work and effort. Faced with huge search results, users generally have difficulty identifying the relevant items. A recommendation system solves this problem: it is an information-filtering system that predicts the relevance of retrieved information against the user's needs for given criteria, and can therefore present the results that best fit those needs. Services provided through the web typically return huge numbers of records about any requested item or service, and a proper recommendation system is used to filter these results. A recommendation system can be improved further if it is supported with trust information, so that recommendations are prioritized according to their level of trust. Recommending appropriate social-service requests to the target volunteers is key to the continued success of social service. Today, many social service systems do not adopt any recommendation techniques; they merely provide advertisements or highlight requests for a small commission. G. Banupriya | M. Anand, "Structural Balance Theory-Based Recommendation for Social Service Portal," International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd41216.pdf Paper URL: https://www.ijtsrd.com/engineering/software-engineering/41216/structural-balance-theorybased-recommendation-for-social-service-portal/g-banupriya
The document proposes a system to bridge socially enhanced virtual communities for business collaborations. The system discovers relevant existing businesses and potential business alliances through a semi-automated approach using broker discovery. Brokers help connect separate groups within a professional virtual community and find new relevant groups to join. Metrics are used to support discovering and selecting brokers to improve interactions, relationships, and trust between members of the virtual communities.
A Survey on Trust Inference Network for Personalized Use from Online Data Rating - IRJET Journal
This document discusses a proposed new trust model called the "Web of Credit" (WoC) model for inferring personalized trust measures from online rating data in social networks. The WoC model constructs a trust network by tracking the flow of "credit" assigned from one user to another based on their ratings. It combines the objectiveness of reputation-based models, which use rating histories, with the individualism of "Web of Trust" models, which allow personalized trust measures. The document also presents the Core-Trust algorithm for inferring trust in this WoC-based network by considering factors like credit, risk, bias, and impedance derived from rating data. Experiments on real datasets showed the WoC model can infer trust more accurately than existing approaches.
This capstone report analyzes how user-generated metadata can enhance findability in social software applications like content tagging and recommender systems. The report examines these systems' strengths and weaknesses for information classification, retrieval, and discovery. Based on an analysis of six system-factor combinations, the report finds that content tagging systems have stronger overall findability than recommender systems, particularly for information classification. However, recommender systems exhibit strengths for information discovery. The report provides examples of content tagging and recommender systems to illustrate different design approaches.
#SPSVancouver 2016 - The importance of metadata - Vincent Biret
This document discusses the importance of metadata. It defines metadata as data about data and explains that metadata can improve navigation, findability, discoverability, and user experience. It also allows companies to build governance strategies and save money. The document then provides examples of how SharePoint and tools like Delve use metadata to enhance search and discovery of content.
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ... - IJASCSE
The information world faces many challenges nowadays, and one of them is data retrieval from multidimensional, heterogeneous data sets. Han et al. addressed this challenge with a novel feature co-selection method for web document clustering called Multitype Features Co-selection for Clustering (MFCC). MFCC uses intermediate clustering results in one type of feature space to guide feature selection in the other feature spaces. It effectively reduces the noise introduced by the "pseudoclass" and further improves clustering performance. The same efficiency can be exploited for data retrieval by embedding the MFCC algorithm in a search engine's ranking algorithm. The proposed work applies MFCC within the search engine architecture so that information is retrieved from the dataset effectively and the relevant results are shown.
Abstract: Privacy is one of the friction points that emerge when communications are mediated in Online Social Networks (OSNs). Different communities of computer science researchers have framed the "OSN privacy problem" as one of surveillance, institutional privacy, or social privacy. This article first introduces the surveillance and social privacy perspectives, emphasizing the narratives that inform them as well as their assumptions and goals. The paper mainly addresses visitor events (population) on a user's account and updates the account holder's log information. The evolutionary aspects of surveillance are thus reflected in the user's log, which motivates the use of a Genetic Algorithm; this in turn requires a bridge module mediating every interaction between the user and the social network server. The paper implements the mutation aspect of the Genetic Algorithm by differentiating users into Guests and Friends, and identifies crossover issues when a guest clicks a friend of a friend.
The EigenRumor algorithm calculates contribution scores for participants and information objects in online communities. It considers information provision and evaluation as links between participants and objects. The algorithm calculates three mutually reinforcing scores: authority score for participants' information provision ability, hub score for their evaluation ability, and reputation score for objects. The reputation score of an object is influenced by the authority score of its provider and hub scores of evaluators. In turn, authority and hub scores are influenced by the reputation scores of objects participants provide or evaluate. Calculating the scores through this mutually reinforcing process allows the algorithm to identify high contributors.
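The mutual reinforcement is reminiscent of a HITS-style power iteration. Below is a hedged toy sketch of how such scores could be computed, assuming a 0/1 provisioning matrix and a weighted evaluation matrix; the mixing weight alpha and the normalization scheme are illustrative choices, not the published EigenRumor formulas.

```python
import numpy as np

def eigenrumor_scores(P, E, alpha=0.5, iters=50):
    """Toy EigenRumor-style iteration.

    P: provisioning matrix (participants x objects), P[i, j] = 1 if i provided j
    E: evaluation matrix (participants x objects), E[i, j] = rating by i of j
    Returns (authority, hub, reputation) score vectors.
    """
    n_users, n_objs = P.shape
    a, h, r = np.ones(n_users), np.ones(n_users), np.ones(n_objs)
    for _ in range(iters):
        # an object's reputation mixes its provider's authority
        # with its evaluators' hub scores
        r = alpha * (P.T @ a) + (1 - alpha) * (E.T @ h)
        r /= np.linalg.norm(r) or 1.0
        # providing reputable objects raises authority;
        # evaluating reputable objects raises hub score
        a = P @ r
        a /= np.linalg.norm(a) or 1.0
        h = E @ r
        h /= np.linalg.norm(h) or 1.0
    return a, h, r
```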
Aiim Webinar Helen Mitchell Unified Search Final 7 21 2010 - Helen Mitchell
The document discusses search technologies and strategies for providing a unified view of private and public information across an organization. It covers definitions of key search concepts, challenges of information overload, examples of enterprise search, federated search, vertical search and summarization tools, as well as best practices and technologies to consider for unified search.
This document summarizes research on social capital among Iraqis conducted in 2005. The researchers administered a 192-item survey in Basra, Iraq and the Netherlands to measure trust and perceptions of ethnic/religious threats within Iraqi social networks. The researchers found relationships and trust were related to patterns of social capital. Perceptions of outgroups seemed related to social capital resources. The researchers shared their findings widely via email and the internet, attracting up to 350 readers/contributors daily. This rapid dissemination allowed them to quickly share timely research with stakeholders. The researchers conclude internet-facilitated communication can increase the visibility and impact of research.
The document describes the Semantic Communication Engine Innsbruck (SCEI), a software suite that supports online communication, feedback collection, and impact measurement across multiple channels. It introduces key terms and defines the problem of managing content distribution across different online channels. The proposed solution features a semantic layer that abstracts domain concepts from specific channels, and a "weaving process" that aligns content with channels. The architecture separates the software into a content management system and a distribution component called dacodi, which uses adapters to interface with individual channels in a standardized way.
Everything Self-Service: Linked Data Applications with the Information Workbench - Peter Haase
The document discusses an information workbench platform that enables self-service linked data applications. It addresses challenges in building linked data applications like data integration and quality. The platform allows for discovery and integration of internal and external data sources. It provides intelligent data access, analytics, and collaboration tools through a semantic wiki interface with customizable widgets. Example application areas discussed are knowledge management, digital libraries, and intelligent data center management.
Findability Primer by Information Architected - the IA Primer Series - Dan Keldsen
The document discusses the importance of findability in the digital age. It defines findability as "the art and science of making content findable" and distinguishes it from simple search functions. Findability utilizes various technologies and techniques to help users efficiently locate relevant information among large volumes of digital content. These include tagging, taxonomies, semantic search, and natural language processing. The document provides an overview of different findability component technologies and their applications.
This document discusses managed metadata and taxonomies in SharePoint 2010. It defines key terms like metadata, taxonomy, and folksonomy. It provides examples of how an organization's information architecture can evolve over time from simple document storage to a more complex taxonomy. Best practices are suggested for designing taxonomies, including considerations for dynamic external tags, security, open vs. closed term sets, and content type hubs. Programming the managed metadata service in SharePoint 2010 is also briefly covered.
Rep on the Roll: A peer to peer reputation system based on a rolling blockchain - Richard Dennis
This document summarizes a paper that proposes a new blockchain-based reputation system called "Rep on the Roll" to address limitations of existing systems. It discusses how current centralized reputation systems used by e-commerce sites are not suitable for decentralized peer-to-peer networks. Existing P2P reputation systems have issues like optional participation and lack of identity management. The proposed system aims to solve problems like sybil attacks and improve scalability by reducing blockchain size by 92%, addressing a key challenge for all blockchain networks.
The World Wide Web is booming and vibrant thanks to well-established standards and a widely accepted framework that guarantee interoperability at various levels of applications and of society as a whole. So far, the web has largely functioned through human intervention and manual processing, but the next-generation web, which researchers call the Semantic Web, is edging toward automatic processing and machine-level understanding. The Semantic Web will become possible only if further levels of interoperability prevail among applications and networks. To achieve this interoperability and greater functionality among applications, the W3C has already released well-defined standards such as RDF/RDF Schema and OWL. Using XML alone as a tool for semantic interoperability achieved little and failed to deliver interconnection at larger scales. This led to the inclusion of an inference layer at the top of the web architecture, and it paves the way for a common design that encodes ontology representation languages in data models such as RDF/RDFS. In this research article, we give a clear account of the roots of Semantic Web research and its ontological background, which may help to deepen the understanding of named entities on the web.
This document discusses how metadata can be used to protect and derive value from content stored in public and private clouds. It proposes the concept of "Guardian Angels" that collect individualized metadata about a user's interactions with content. An "Invisible College" would allow anonymous aggregation of metadata from Guardian Angels to determine emergent meanings while preserving privacy. Standards are needed to incorporate these concepts and allow organic growth of associative metadata to enhance cloud services and information assets.
This document discusses how taxonomies and ontologies can improve enterprise search capabilities. It provides examples from case studies of organizations in the military, retail, and financial sectors. The case studies demonstrate how developing taxonomies, ontologies, content types and metadata structures helped organizations better classify, search and retrieve unstructured content to meet business needs.
Information Architecture Primer - Integrating search, tagging, taxonomy and us... - Dan Keldsen
This document discusses the importance of taxonomy and classification within an information architecture. It defines key terms like taxonomy, thesaurus, ontology, and classification. It explains that taxonomy and classification help address the eternal problems of effectively cataloging and retrieving unstructured information. The document also discusses challenges like ambiguity, multiple meanings of words, and the importance of browsing versus searching in navigating large amounts of information.
The Web Information System of the National Institute for Astrophysics: differ... - inscit2006
Caterina Boccato and Serena Pastore
National Institute for Astrophysics, Astronomical Observatory of Padova, Vicolo Osservatorio 5, 35122, Padova, ITALY
Towards enhanced user interaction to qualify web resources for higher-layered... - Monika Steinberg
The document discusses enhancing user interaction to qualify web resources for knowledge transfer applications. It proposes a three-level model for assessing resource quality: first through metadata analysis, second through user interaction such as questionnaires, and third through intelligent analysis. It suggests that game-based interaction could motivate ongoing user participation in rating and ranking resources. An example uses a tag-based game on Flickr to increase image relevance in folksonomies.
SPSBOS -- How your metadata strategy impacts everything you do - Christian Buckley
Presentation given 4-9-2011 at SharePoint Saturday Boston on the need for sound metadata and taxonomy strategy in any SharePoint deployment (or re-architecture).
This document summarizes a research paper that proposes a new algorithm for influence maximization in social networks. The algorithm draws inspiration from previous works on community detection and a data-based credit distribution model. It first assigns credits to users based on their past actions to determine probabilistic influence between users. It then uses a community detection approach to identify groups of similar users before applying an influence maximization algorithm based on the independent cascade model. The proposed approach aims to better learn mutual influence from user data and improve time complexity by leveraging the relationship between community detection and viral marketing.
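For the final stage, a Monte Carlo estimate of expected spread under the independent cascade model might look like the sketch below; the edge activation probabilities are assumed to come from the learned credit distribution, and the graph encoding is invented for the example.

```python
import random

def independent_cascade_spread(graph, seeds, trials=1000):
    """Estimate the expected number of activated nodes.

    graph: dict node -> list of (neighbor, activation_probability) pairs,
           with probabilities derived from the credit-distribution model
    seeds: the initially activated seed set
    """
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    # each newly active node gets one activation try per edge
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials
```

A greedy seed-selection loop would then repeatedly add the node with the largest marginal gain in this estimate, restricting candidates to the detected communities.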
This document provides an overview of managed metadata and taxonomies in SharePoint 2010. It discusses metadata definitions and usage scenarios, folksonomies versus formal taxonomies, taxonomy management features, content type hubs for sharing content across sites, and configuration considerations. The presentation includes demonstrations of tagging, term sets, and content type publishing capabilities in SharePoint 2010.
The document summarizes Jennifer's experiences during a week-long outdoor trip with her grade 9 class. Some of the activities included rock climbing, caving, teaching at a local school, biking, team building exercises, visiting a family home, and camping. Each activity helped develop different social-emotional learning skills. Rock climbing required complex problem solving while caving promoted personal management and collaboration. Teaching at the school tested skills like collaboration, citizenship, and leadership. The camping trip concluded with a challenging tent building exercise that strengthened collaborative work.
Innovation is Everywhere - Lebanon innovation ecosystem - Agence Tesla
The document provides a history and overview of the innovation ecosystem in Lebanon. It discusses how French missionaries established schools in Lebanon after World War 1 which helped develop the political and business elite. More startups emerged in 2010 and 2012, and funds increased in 2013-2014 including a $400 million initiative by the Central Bank. Lebanon has a large diaspora population and faces challenges from infrastructure costs and regional instability, but benefits from an open population prone to innovation. Top connectors in the ecosystem include Arabnet founder Omar Christidis and the Beirut Digital District helps connect startups. Moving forward, Lebanon aims to attract more talent and better connect with its global diaspora.
My books- Hacking Digital Learning Strategies http://hackingdls.com & Learning to Go https://gum.co/learn2go
Resources at http://shellyterrell.com/classmanagement
The document discusses how personalization and dynamic content are becoming increasingly important on websites. It notes that 52% of marketers see content personalization as critical and 75% of consumers like it when brands personalize their content. However, personalization can create issues for search engine optimization as dynamic URLs and content are more difficult for search engines to index than static pages. The document provides tips for SEOs to help address these personalization and SEO challenges, such as using static URLs when possible and submitting accurate sitemaps.
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Alda - baux singapore
How can we take UX and Data Storytelling out of the tech context and use them to change the way government behaves?
Showcasing the truth is the highest goal of data storytelling. Because the design of a chart can affect the interpretation of data in a major way, one must wield visual tools with care and deliberation. Using quantitative facts to evoke an emotional response is best achieved with the combination of UX and data storytelling.
This document summarizes a study of CEO succession events among the largest 100 U.S. corporations between 2005-2015. The study analyzed executives who were passed over for the CEO role ("succession losers") and their subsequent careers. It found that 74% of passed over executives left their companies, with 30% eventually becoming CEOs elsewhere. However, companies led by succession losers saw average stock price declines of 13% over 3 years, compared to gains for companies whose CEO selections remained unchanged. The findings suggest that boards generally identify the most qualified CEO candidates, though differences between internal and external hires complicate comparisons.
Identical Users in Different Social Media Provides Uniform Network Structure ... - IJMTST Journal
The primary aim of this project is to secure user login and data sharing among social networks such as Gmail and Facebook, and also to identify anonymous users on these networks. If the original user is not active on the network, friends or anonymous users who know the login details could misuse their chats. In this project we aim to detect anonymous users who use the network without the original user's knowledge: unauthorized users logging in to chat, share images or videos, and so on. To that end, users first register their details together with a secured question and answer. Because an anonymous user can delete chats or data, the secured questions are used to recover the unauthorized user's chat history and sharing details along with their IP address or MAC address. The project thus provides a way to keep anonymous users from misusing the original user's login details.
Personalized E-commerce based recommendation systems using deep-learning tech... - IAESIJAI
As technology advances, personalization drifts with the explicit behavior of users on the internet. Recommendation systems use predictive mechanisms, such as predicting the rating a customer would give to a specific item, to establish a ranked list of items according to each user's preferences and thereby exhibit personalized recommendations. Existing recommendation techniques are efficient at systematically creating recommendations, but the approach encounters challenges such as accuracy, scalability, and data sparsity. Recently, deep learning has attracted significant research attention for improving feature specification and for learning to retrieve the necessary information efficiently within recommendation systems. Here, we provide a thorough review of deep-learning mechanisms focused on learning-rate-based prediction approaches, modeled to articulate a widespread summary of the state-of-the-art techniques. The novel techniques ensure the incorporation of innovative perspectives on the unique and exciting growth in this field.
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A... - IJTET Journal
This document describes a proposed algorithm for improving recommendation systems for e-services. It involves the following key steps:
1. Clustering customer transaction histories to group similar purchase patterns and derive customer-based recommendations.
2. Using incremental association rule mining on the transaction data to detect frequently purchased item sets and relationships between items.
3. Developing a fuzzy model to classify customers and provide dynamic recommendations tailored to different customer types. The recommendations will be based on matching customer preferences and purchase histories to specific product sets.
4. The algorithm clusters transactions, mines association rules incrementally as new data is added, and generates recommendations by classifying customers and matching them to relevant product clusters. This provides personalized and dynamic recommendations; the incremental mining step is sketched below.
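A minimal sketch of the incremental mining in step 2 follows: itemset counts are kept up to date as transactions arrive, so rules can be refreshed without re-scanning the full history. The class, the pair-only rule extraction, and the thresholds are assumptions for illustration, not the paper's exact algorithm.

```python
from collections import Counter
from itertools import combinations

class IncrementalItemsets:
    """Maintain itemset counts across transaction batches."""

    def __init__(self, max_size=2):
        self.counts = Counter()
        self.n = 0
        self.max_size = max_size

    def add_transaction(self, items):
        """Fold one new transaction into the running counts."""
        self.n += 1
        items = sorted(set(items))
        for size in range(1, self.max_size + 1):
            for itemset in combinations(items, size):
                self.counts[itemset] += 1

    def rules(self, min_support=0.1, min_conf=0.5):
        """Extract pairwise rules as (antecedent, consequent, confidence)."""
        if self.n == 0:
            return []
        out = []
        for itemset, cnt in self.counts.items():
            if len(itemset) != 2 or cnt / self.n < min_support:
                continue
            a, b = itemset
            for ante, cons in ((a, b), (b, a)):
                conf = cnt / self.counts[(ante,)]
                if conf >= min_conf:
                    out.append((ante, cons, conf))
        return out
```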
International Journal of Engineering Research and Development (IJERD) - IJERD Editor
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ... - Editor IJAIEM
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats are migrating from the traditional PC client to mobile devices. Whereas the traditional Windows/Intel alliance dominated the PC, the Android alliance dominates the mobile internet, and apps have replaced PC client software as the foremost target of malicious use. In this paper, to improve the security status of recent mobile apps, we propose a methodology for evaluating mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based methods, it combines dynamic and static analysis to comprehensively evaluate an Android application. The Internet of Things (IoT) denotes a worldwide network of interconnected, uniquely addressable objects communicating via standard protocols. To prepare for the forthcoming invasion of things, data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. We therefore propose an efficient multidimensional fusion algorithm for IoT data based on partitioning; attribute reduction and rule extraction methods are then used to obtain the synthesis results, and the correctness and effectiveness of the algorithm are illustrated by proving a few theorems and by simulation. Finally, this paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large but quite easy to generate and use; they can be so large that it makes sense to use them only for big data. Our experiments compare LIME classifiers with various base classifiers and standard ensemble meta-classifiers. The results demonstrate that LIME classifiers can significantly increase classification accuracy, performing better than the base classifiers and standard ensemble meta-classifiers.
Keywords: LIME classifiers, ensemble Meta classifiers, Internet of Things, Big data
This document outlines Thomas Liggett's stakeholder network assessment methods and tools. It describes how he builds baseline models for clients within 30 days and assesses key performance indicators. It also explains how he models individual stakeholders like persons and companies, analyzing how they are influenced by various interests and relationships. Through proprietary tools like the Performance Visualization System, he can map these networks and overlay models to evaluate alignment and prove the impact on performance.
The Internet brought the most innovative improvement to the information society. Web recommendation systems based on web usage mining try to mine users' behavior patterns from web access logs, and recommend pages or suggestions to the user by matching the user's browsing behavior against the mined historical behavior patterns. In this paper we propose a recommendation framework that considers the application status and the various contexts of each user. We have implemented the proposed framework and show how this system can improve the overall quality of web recommendations.
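One plausible reading of the matching step is sketched below: suffixes of the active session are matched against mined pattern prefixes, and candidate next pages are scored by pattern support, weighted toward longer matches. The data structures are invented for the illustration, not the paper's implementation.

```python
def recommend_next(session, patterns, k=3):
    """Rank candidate next pages for the active session.

    session:  list of page ids for the current user, most recent last
    patterns: dict mapping a tuple of pages (a mined pattern prefix)
              to a dict {next_page: support} from historical logs
    """
    scores = {}
    for start in range(len(session)):
        prefix = tuple(session[start:])
        for page, support in patterns.get(prefix, {}).items():
            # longer matched prefixes contribute proportionally more
            scores[page] = scores.get(page, 0) + support * len(prefix)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```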
A Community Detection and Recommendation System - IRJET Journal
This document proposes a community detection and recommendation system that uses community detection algorithms to analyze social networks and extract friendship relationships between users. The approach is developed on the MapReduce framework to improve scalability and coverage and to mitigate the cold-start issues of collaborative filtering recommendation systems. The system aims to provide more accurate recommendations by incorporating social network information and trust between users into the recommendation process.
IRJET - Analysis on Existing Methodologies of User Service Rating Prediction S... - IRJET Journal
This document summarizes and analyzes existing methodologies for user service rating prediction systems. It discusses recommendation systems including collaborative filtering, content-based filtering, and hybrid approaches. Collaborative filtering predicts user ratings based on opinions of other similar users but faces challenges of cold start, scalability, and sparsity. Content-based filtering relies on item profiles and user preferences to recommend similar items but requires detailed item information. Hybrid systems combine collaborative and content-based filtering to address their individual limitations. The document also examines social recommender systems and how they can account for relationship strength, expertise, and user similarity within social networks.
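For concreteness, a minimal user-based collaborative filtering predictor of the kind the survey describes is sketched below, using cosine similarity over a ratings matrix. The NaN fallback for the cold-start case is an illustrative choice, not a method from the surveyed papers.

```python
import numpy as np

def predict_rating(R, user, item, k=5):
    """Predict R[user, item] from the k most similar users who rated the item.

    R: ratings matrix (users x items), 0 where unrated
    """
    rated = np.where((R[:, item] > 0) & (np.arange(len(R)) != user))[0]
    if rated.size == 0:
        return np.nan  # cold start: nobody else rated this item
    sims = R[rated] @ R[user] / (
        np.linalg.norm(R[rated], axis=1) * np.linalg.norm(R[user]) + 1e-9)
    order = np.argsort(sims)[-k:]          # indices of the k nearest raters
    w = sims[order]
    return float(R[rated[order], item] @ w / (w.sum() + 1e-9))
```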
Avoiding Anonymous Users in Multiple Social Media Networks (SMN) - paperpublications3
Abstract: The main aim of this project is to secure user login and data sharing among social networks like Gmail and Facebook, and also to find anonymous users on these networks. If the original user is not active on the network, friends or anonymous users who know the login details could misuse their chats. In this project we aim to detect anonymous users who use the network without the original user's knowledge: unauthorized users logging in to chat, share images or videos, and so on. Users therefore first register their details with one secured question and answer. Because an anonymous user can delete their chat or data, the secured questions are used to recover the unauthorized user's chat history and sharing details together with their IP address or MAC address. The project thus provides a way to prevent anonymous users from misusing the original user's login details.
Contextual model of recommending resources on an academic networking portal - csandit
Artificial Intelligence techniques have been instrumental in helping users to handle the large amount of information on the Internet. The idea of recommendation systems, custom search engines, and intelligent software has been widely accepted among users who seek assistance in searching, sorting, classifying, filtering and sharing this vast quantity of information. In this paper, we present a contextual model of a recommendation engine which, keeping in mind the context and activities of a user, recommends resources in an academic networking portal. The proposed method uses the implicit method of feedback and the concepts relationship hierarchy to determine the similarity between a user and the resources in the portal. The proposed algorithm has been tested on an academic networking portal and the results are convincing.
Study of Recommendation System Used In Tourism and Travel - ijtsrd
This study covers recommendation systems and the types used in tourism and travel websites. Recommendation systems are used in websites to recommend items to a user based on his/her interests and user profile. In this paper, I design a recommender system for recommending tourist places based on content-based and collaborative filtering techniques. This method combines both behavioural and content aspects of recommendations. The research flow is as follows: first, using cosine similarity, weighted ratings and Location APIs, we build a content-based system, comparing the features of each item with the user's preferences. This is followed by collaborative filtering techniques such as correlation and K-nearest neighbours, in which items predict the interest of the user in an activity, considering the evaluation that a particular user has given to similar activities. Shikhar, "Study of Recommendation System Used In Tourism and Travel," International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6, Issue-1, December 2021. URL: https://www.ijtsrd.com/papers/ijtsrd47922.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/47922/study-of-recommendation-system-used-in-tourism-and-travel/shikhar
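A small sketch of the content-based half is shown below, blending cosine similarity against the user's preference vector with weighted ratings; the blend weight and the feature layout are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def content_scores(user_pref, place_features, ratings, w=0.7):
    """Score tourist places for one user.

    user_pref:      preference vector over features (beach, museum, ...)
    place_features: matrix (places x features)
    ratings:        mean visitor rating per place, scaled to [0, 1]
    """
    sim = place_features @ user_pref / (
        np.linalg.norm(place_features, axis=1)
        * np.linalg.norm(user_pref) + 1e-9)
    return w * sim + (1 - w) * ratings  # weighted blend of similarity and rating
```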
Cross-discipline collaboration benefits from group thinking, a consolidation of soft systems methodology and user-focused design. It all starts with design thinking, which sees clients, designers, developers and information architects working together to address user problems and needs. As with any great adventure, design thinking starts with exploration and discovery. This presentation examines the high-level tenets of systems thinking, expands the scope of user thinking to include the tools and devices that users employ, and delves into the specifics of design thinking, its methods and outcomes.
Recommender systems (RSs) provide individualized suggestions of data or products related to users' needs. Even though RSs have made substantial progress in theory and algorithm development and have achieved many business successes, how to exploit the widely accessible information in online social networks (OSNs) has been largely overlooked. Noticing this gap in existing RS research, and considering that a user's choices are greatly influenced by trustworthy friends and their opinions, this paper proposes a Fact Finder technique that improves prevailing recommendation approaches by exploring a new source of data: friends' short posts on microblogs, treated as micro-reviews. The degree of friends' sentiment, and how binding it is on a user's choice, are learned using machine learning methods including Naive Bayes, Logistic Regression and Decision Trees. To verify the proposed Fact Finder, experiments using real social data from the Twitter microblogging service are presented, and the results show the effectiveness and promise of the proposed approach.
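To illustrate the sentiment step, the snippet below trains one of the named classifiers (Naive Bayes, via scikit-learn) on a toy set of short posts; the resulting probability could serve as an endorsement weight in the recommendation score. The data and labels are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; a real system would label
# friends' micro-reviews about candidate items.
posts = ["loved this phone, battery is great",
         "terrible camera, would not recommend",
         "amazing screen and really fast",
         "broke after a week, awful"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(posts, labels)

# Probability that a friend's post endorses the item
print(clf.predict_proba(["battery life is great, loved it"])[0][1])
```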
With current projections regarding the growth of Internet sales, online retailing raises many questions about how to market on the Net. A Recommender System (RS) is a composition of software tools that provides valuable advice on items or services chosen by a user. Recommender systems are currently useful both in research and in commercial settings: they are a means of personalizing a site and a solution to the customer's information-overload problem. With the advent of the internet, these systems are achieving widespread success in e-commerce applications. This paper presents a categorical review of the field of recommender systems and describes the state-of-the-art recommendation methods, which are usually classified into four categories: content-based, collaborative, demographic and hybrid systems. To build our recommender system we will use fuzzy logic and the Markov chain algorithm.
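As a sketch of the Markov chain component, the snippet below estimates first-order transition probabilities from past sessions and suggests the most probable next item; the fuzzy-logic layer is omitted and the session data is invented.

```python
from collections import defaultdict

def build_transitions(sessions):
    """First-order Markov chain over browse/purchase sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    # normalize counts into transition probabilities
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

sessions = [["shoes", "socks", "belt"],
            ["shoes", "socks", "polish"],
            ["shirt", "belt"]]
P = build_transitions(sessions)
print(max(P["socks"], key=P["socks"].get))  # most probable item after "socks"
```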
The document proposes a multi-tier sentiment analysis system called MSABDP to analyze large-scale social media data more efficiently. MSABDP uses Hadoop for its distributed processing and storage capabilities. It collects Twitter data using Apache Flume and stores it in HDFS. It then applies a multi-tier classification approach combining lexicon-based and machine learning techniques to classify tweets into multiple sentiment classes, reducing complexity compared to single-tier architectures. Evaluation on real Twitter data showed MSABDP improved classification accuracy over single-tier approaches by 7%.
Recommendation systems, also known as recommendation engines, are a type of information system whose purpose is to suggest or recommend items or actions to users.
The recommendations may consist of:
-> retail items (movies, books, etc.) or
-> actions, such as following other users in a social network.
Recommendation engines can be thought of as an automated form of the "shop counter guy": you ask him for a product, and he shows you not only that product but also related ones you might buy. Shop counter guys are well trained in cross-selling and up-selling, and so are our recommendation engines.
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...inventionjournals
This document discusses an enhanced web usage mining system using fuzzy clustering and collaborative filtering recommendation algorithms. It aims to address challenges with existing recommender systems like producing low quality recommendations for large datasets. The system architecture uses fuzzy clustering to predict future user access based on browsing behavior. Collaborative filtering is then used to produce expected results by combining fuzzy clustering outputs with a web database. This approach aims to provide users with more relevant recommendations in a shorter time compared to other systems.
Recsys virtual-profiles
Generating Supplemental Content Information Using Virtual Profiles
Haishan Liu (haliu@linkedin.com), Mohammad Amin (mamin@linkedin.com), Baoshi Yan (byan@linkedin.com), Anmol Bhasin (abhasin@linkedin.com)
LinkedIn Corporation, 2029 Stierlin Court, Mountain View, CA 94043
ABSTRACT
We describe a hybrid recommendation platform/technique at LinkedIn that seeks to optimally extract relevant information pertaining to items to be recommended. By extending the notion of an item profile, we propose the concept of a "virtual profile" that augments the content of the item with a rich set of features inherited from members who have already shown explicit interest in it. Unlike item-based collaborative filtering, we focus on discovering the characteristic descriptors that underlie the item-user association. Such information is used as supplemental features in a content-based filtering system. The main objective of virtual profiles is to provide a means to tap into rich-content information from one type of entity and propagate the extracted features to other affiliated entities that may suffer from relative data scarcity. We empirically evaluate the proposed method on a real-world community recommendation problem at LinkedIn. The results show that virtual profiles outperform a collaborative filtering based approach ("users who like this also like that"). In particular, the improvement is more significant for new users with only limited connections, demonstrating the capability of the method to address the cold-start problem in pure collaborative filtering systems.
Categories and Subject Descriptors
H.2.8 [Database Management]: Data Mining
General Terms
Theory
Keywords
hybrid recommender systems, feature generation and extraction, model-based recommendation, virtual profiles
1. INTRODUCTION
Large scale recommender systems, in the era of Internet-scale data deluge, contribute significantly to mitigating the information overload problem by unveiling relevant and interesting objects to users. Rather than hoping for serendipitous encounters, recommender systems bring forth the notion of personalized information discovery by presenting to the user a smaller pool of relevant objects. Collaborative filtering, the de facto mechanism for recommendation, fails to address "cold start" problems, which has led to the exploration of hybrid recommenders. Hybrid recommenders combine information obtained from different sources and techniques to achieve better outcomes. Typically a hybrid recommender system incorporates information from a myriad of sources, e.g., content meta data, interaction data, global popularity, social network and social interaction information, and so on. Each of these information sources offers a different level of relevance guarantee at varying computational overhead. Hence, how these information sources are computed and how they are combined play a vital role in the final outcome.
As of today LinkedIn has more than 220 million users. As the largest and most popular professional networking site, LinkedIn presents some unique opportunities and challenges for content discovery and recommendation. It is imperative for members to be able to discover and subscribe to companies and groups (referred to as communities henceforth) that might be relevant to them in a professional context. In this paper, we describe a hybrid community recommendation platform/technique at LinkedIn that optimally combines information from multiple sources. In order to extract more relevant information pertaining to the community to be recommended, i.e., to further extend the notion of content meta data, we propose the concept of a "virtual profile" that augments the content meta data with a rich set of features inherited from the set of members who have already shown explicit interest in it. In general, the notion of a virtual profile answers: "What are the most dominant features pertaining to the members who have shown interest in a particular community?". This question essentially maps an object into the same feature space as that of its subscribers. Content meta data, extended with this inferred information, provides an additional safeguard against the cold start problem. LinkedIn data presents a unique opportunity to extend the content features with extracted features, since there is no dearth of rich information about the subscribers in the data set, which renders the synergy immensely valuable.
The contributions of this paper are as follows:
1. A generic content meta data extension method, i.e., virtual profile generation.
2. A scalable and generic recommendation computation platform that powers multiple real-time recommendation products at LinkedIn.
3. Seamless integration of multiple, heterogeneous data sources to compute an optimal outcome.
2. RELATED WORK
There has been a flurry of research in the domain of recommender systems with the objective of improving personalization [1]. Most traditional recommenders are powered by collaborative filtering [9, 17], content-based predictors [8, 14], and knowledge-based filtering techniques [11]. Each individual technique has its own strengths and weaknesses; e.g., while collaborative filtering techniques suffer from data sparsity and cold start problems [15], content-based techniques are prone to skewed recommendations [14]. Hybrid recommenders combine the best of both worlds, making recommenders more robust in practice. Much work has been done to combine multiple recommenders in an effective way so as to outperform any single one. In [5] Burke depicts a taxonomy of recommender systems where multiple recommenders are arranged to allow execution in a parallel or cascaded topology. A system described in [4] combines multiple collaborative filtering approaches using a linear combination of static weights learned via linear regression. STREAM [2], which combines multi-tier predictors, uses dynamically generated metrics to learn the next level of predictors. In [12], a hybrid movie recommender system is proposed that uses content-based predictors to boost user data, which drives the ensuing collaborative filtering based recommendation. The content information is obtained from IMDB, a Naive Bayes classifier is used for building user item profiles, and finally user-based collaborative filtering is employed to obtain the final recommendation. However, this approach suffers from scalability issues. Pazzani [13] proposed a hybrid recommender system where content-based user profiles are used to group similar users, and the grouping is subsequently used to predict user preferences. In many of these user-item recommendation frameworks, items to be recommended can be augmented with meta data corresponding to the members who have already shown explicit interest in them. In other words, these items can be represented as objects in the same feature space as that of the users. These representations can be thought of as "virtual user profiles" or "virtual profiles". This potentially adds another layer of information to guide the recommendation process. In our approach, we describe a large scale recommender system that combines data from multiple heterogeneous sources, including virtual profiles and the social network, to serve real time traffic on a large professional social networking site.
3. METHOD
3.1 System Overview
We build our recommender system based on content filtering, since we have abundant access to rich-content entities, such as user profiles, which enables a straightforward means for feature extraction, indexing, and matching. Target entities (those the client wants recommendations of) are feature-extracted and put into a reverse index, and source entities (those the client wants recommendations for) are converted into complex queries against the index. This provides a content-based recommendation score where the match is determined by the degree of similarity between the source and target entity features, with different fields weighted by a set of parameters determined by an offline learning-to-rank process. Figure 1 illustrates a brief workflow of the system. It also shows how we can augment the system by including more information, such as virtual profiles, as new features in the content filtering recommendation, as detailed below.
Figure 1: A brief workflow for the recommender system with virtual profiles.
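To make the index-and-query workflow concrete, below is a minimal sketch of the idea, not LinkedIn's production system: target entities are tokenized into an inverted index keyed by (field, token), and a source entity becomes a weighted query against it. The field names and weight values are illustrative assumptions; in the paper the field weights come from an offline learning-to-rank process.

```python
from collections import defaultdict

# Hypothetical field weights; in the paper these are learned offline
# via learning-to-rank. The numbers here are placeholders.
FIELD_WEIGHTS = {"title": 2.0, "description": 1.0}

def build_index(targets):
    """Index target entities: (field, token) -> set of entity ids."""
    index = defaultdict(set)
    for entity_id, fields in targets.items():
        for field, text in fields.items():
            for token in text.lower().split():
                index[(field, token)].add(entity_id)
    return index

def score_candidates(source_fields, index):
    """Convert a source entity into a weighted query against the index."""
    scores = defaultdict(float)
    for field, text in source_fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for token in set(text.lower().split()):
            for entity_id in index.get((field, token), ()):
                scores[entity_id] += weight  # one weighted hit per field match
    return sorted(scores.items(), key=lambda kv: -kv[1])

targets = {
    "group-1": {"title": "machine learning", "description": "deep learning jobs"},
    "group-2": {"title": "sales leadership", "description": "enterprise sales"},
}
index = build_index(targets)
print(score_candidates({"title": "machine learning engineer"}, index))
```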
We view every entity as being characterized by two sets of content features: one extracted from explicit information associated with the entity, which we name the "primary profile", and the other inferred from the entity's behavior and association with other entities, which we name the "virtual profile". The main objective of virtual profiles is to provide a means to tap into rich-content information from one type of entity and propagate the extracted features to other affiliated entities that may suffer from relative data scarcity. Essentially, a virtual profile of an entity is an aggregation of statistically relevant features from the primary profiles of affiliated entities, which introduces a collaborative filtering aspect into our content filtering system. For example, a virtual profile of a LinkedIn group consists of distinctive features from its participants, so that the group can be most effectively distinguished from others.
To first extract features from entities to generate primary profiles, we utilize a feature extractor layer, a standalone service that accumulates underlying entity database change events and identifies various fields in the document. The types of fields that can be feature-extracted include rich text fields, such as job summary and member position summary, and specialized fields, such as Geo entities including region, country, city, coordinates, etc.
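A toy illustration of what such a feature extractor might emit for a single entity document; the field names (`job_summary`, `position_summary`, `geo`) are assumptions for the sketch, not the actual schema:

```python
def extract_features(entity_doc):
    """Illustrative feature extraction: rich text fields are tokenized,
    while specialized fields (e.g. Geo) are kept as structured
    key=value features. Field names are hypothetical."""
    features = {}
    for field in ("job_summary", "position_summary"):  # rich text fields
        if field in entity_doc:
            features[field] = entity_doc[field].lower().split()
    if "geo" in entity_doc:                            # specialized field
        geo = entity_doc["geo"]
        features["geo"] = [f"{k}={v}" for k, v in geo.items()]
    return features

doc = {"position_summary": "Senior Data Scientist at Acme",
       "geo": {"country": "us", "region": "ca", "city": "mountain view"}}
print(extract_features(doc))
```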
The presented content filtering system can be extended to consider other collaborative filtering aspects, for example, by including network proximity as a feature while computing relevance scores. We describe a browsemap-based method along this line as a comparison in Section 4. As a general platform, every application consuming recommendations from this system can easily build its own logic for reranking/reordering of results based on custom filtering criteria, or extend the concept of network proximity, e.g., by recommending jobs to discussion groups.
3.2 Generating Virtual Profiles
The virtual profile generation process for an entity aims at selecting, from a total of n features of its affiliated entities, a subset of k < n features that is "maximally informative" about the entity. From a classification point of view, the entity for which we generate the virtual profile represents a target class for a set of documents (the primary profiles). We need a measure to evaluate the "information content" of each individual feature with regard to the target class, and we propose to use mutual information for this purpose. Mutual information measures arbitrary dependencies between random variables, and the fact that it is independent of the coordinates chosen and permits a robust estimation makes it suitable for assessing the "information content" of features in complex classification tasks.
In accordance with Shannon's information theory, the uncertainty of a document class C as a random variable can be measured as:
H(C) = -\sum_{c \in C} P(c) \log P(c),
Given the feature vector F, the conditional entropy H(C|F) measures the remaining uncertainty about C:

H(C|F) = -\sum_{f \in F} P(f) \sum_{c \in C} P(c|f) \log P(c|f).
The mutual information, i.e., the amount by which class uncertainty decreases after observing the feature vector F, is then defined as:

I(C; F) = H(C) - H(C|F) = \sum_{c,f} P(c, f) \log \frac{P(c, f)}{P(c)\,P(f)},

where P(c, f) is the joint probability of class c and feature f.
Therefore, to generate virtual profiles, the goal is to find the optimal feature subset S ⊆ F such that I(C; S) is maximized. From an information theoretic perspective, selecting features that maximize I(C; S) translates into selecting those features that contain the maximum information about class C. However, locating the optimal subset requires an exhaustive combinatorial search over the feature space, requiring a number of runs equal to \binom{n}{k}, where n is the size of the original feature set and k is that of the desired subset. Moreover, an exact solution demands large training sample sizes to estimate the higher order joint probability distribution in I(C; F). For example, Fraser's method [6], a computationally efficient algorithm for calculating the optimal I(C; S), requires for its convergence a number of samples "in the millions" when the number of features in the input vector is larger than 3 or 4.
Given these difficulties, most existing approaches approximate I(C; F) based on the assumption of lower-order dependencies between features. For example, a second-order feature dependence assumption is proposed by Battiti [3] to approximate I(C; F) by a greedy incremental selection scheme with a heuristic to account for correlations between features: given a set of already selected features, the algorithm chooses the next feature as the one that maximizes the information about the class, corrected by subtracting a quantity proportional to the average mutual information with the already selected features.
Unfortunately, the calculation of the pairwise feature correlation I(f, f') is impractical in our case, because the feature dimension is extremely high given the bag-of-words representation extracted from textual content. Therefore, we make a first-order class dependence assumption that each feature independently influences the class variable, which means that the selection of the m-th feature, f_m, is independent of the (m − 1) already selected features, i.e., P(f_m | f_1, ..., f_{m−1}, C) = P(f_m | C). This results in a straightforward greedy algorithm to generate the virtual profile for an entity c, which consists of the following steps: 1) gather features from all primary profiles associated with entities that have an affiliation with c, 2) calculate the mutual information, I(f; c), between each feature f and c, and 3) select the top k features with the highest I(f; c) into the virtual profile. More specifically, I(f; c) can be calculated as follows:
I(f; c) = \sum_{e_f \in \{1,0\}} \sum_{e_c \in \{1,0\}} P(f = e_f, c = e_c) \log \frac{P(f = e_f, c = e_c)}{P(f = e_f)\,P(c = e_c)}, \quad (1)
where f is a random variable that takes values e_f = 1 (the entity primary profile contains feature f) and e_f = 0 (the entity primary profile does not contain feature f), and c is a random variable that takes values e_c = 1 (the entity is affiliated with c) and e_c = 0 (the entity is not affiliated with c). The probabilities in Equation 1 can be calculated using maximum likelihood estimation.
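As a concrete reading of steps 1–3 and Equation 1, the following sketch estimates the probabilities by maximum likelihood counts over binary feature incidence and selects the top-k features. It is a toy implementation under our assumptions, not the production pipeline:

```python
import math
from collections import Counter

def mutual_information(n11, n10, n01, n00):
    """I(f; c) from a 2x2 contingency table via MLE (Equation 1).
    n11: affiliated profiles containing f, n10: affiliated without f,
    n01: non-affiliated containing f, n00: neither."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # For each cell: (cell count, marginal count of f, marginal count of c)
    for n_fc, n_f, n_c in [(n11, n11 + n01, n11 + n10),
                           (n10, n10 + n00, n11 + n10),
                           (n01, n11 + n01, n01 + n00),
                           (n00, n10 + n00, n01 + n00)]:
        if n_fc > 0:
            mi += (n_fc / n) * math.log((n_fc * n) / (n_f * n_c))
    return mi

def virtual_profile(profiles, affiliated_ids, k):
    """Select the top-k features by I(f; c) under the first-order
    class dependence assumption (each feature scored independently)."""
    affiliated = set(affiliated_ids)
    in_c = Counter()   # feature counts among affiliated primary profiles
    out_c = Counter()  # feature counts among the rest
    for pid, feats in profiles.items():
        (in_c if pid in affiliated else out_c).update(set(feats))
    n_in, n_out = len(affiliated), len(profiles) - len(affiliated)
    scored = []
    for f in set(in_c) | set(out_c):
        n11, n01 = in_c[f], out_c[f]
        scored.append((f, mutual_information(n11, n_in - n11, n01, n_out - n01)))
    return sorted(scored, key=lambda kv: -kv[1])[:k]

profiles = {"u1": ["python", "ml"], "u2": ["python", "sales"],
            "u3": ["sales", "crm"], "u4": ["ml", "python"]}
print(virtual_profile(profiles, affiliated_ids=["u1", "u4"], k=2))
```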
4. EXPERIMENTS
Our goal is to test whether virtual profiles are a valuable source of features for improving recommendation performance. In designing the experiments, we want to verify the heuristic assumption that a virtual profile can use features greedily selected by mutual information. We also want to compare the performance of virtual profiles with classic collaborative filtering methods and study their tradeoffs. Furthermore, by experimenting with different parameter settings for generating virtual profiles, we want to provide general guidance on how virtual profiles can best be implemented in practice.
4.1 Methodologies
We choose a community recommendation problem at LinkedIn as the test application. Successful recommendations result in users following certain communities, while users are also given the choice to opt out of communities at any later point.
We extract three kinds of features from entities (users and communities) in this application domain:
1. content features: features from users' and communities' textual information, extracted into predefined standardized fields (e.g., name, industry, description, etc.).
2. virtual profile: as described in Section 3, a set of features selected from a community's followers as supplements to the community's primary profile.
3. browsemap: a collaborative feature representing the co-affiliation relationship, or "users who follow X also follow Y."
Browsemaps capture a notion of similarity between communities that is driven by users' preferences. To generate a browsemap for a community, from all other communities with which it shares followers, we choose the top 50 ranked by TF/IDF. Then, for each user, we take the closure of the communities she has already followed with respect to browsemaps, and select the top 50 weighted by their TF/IDF scores normalized over the number of communities followed. Communities selected in this way can essentially be seen as recommendations by collaborative filtering. We instead treat them as part of a standalone feature; when combined with a user's content features to generate a search query, this leads to extra field matches against communities appearing in the feature. The weight of this match, just like matches in other features, can be determined in an offline learning process.
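A rough sketch of the browsemap construction as we read this paragraph. The paper only says communities are "ranked by TF/IDF"; the specific weighting below (TF = number of shared followers, IDF = log of inverse follower frequency) is our assumption:

```python
import math
from collections import Counter, defaultdict

def build_browsemaps(follows, top_n=50):
    """follows: dict mapping user id -> set of community ids.
    For each community, rank co-followed communities by a TF-IDF-style
    weight: TF = number of shared followers, IDF = log(U / followers(Y)).
    The exact weighting is an assumption, not taken from the paper."""
    followers = Counter()           # community -> follower count
    co = defaultdict(Counter)       # community -> co-followed community counts
    for user, comms in follows.items():
        for c in comms:
            followers[c] += 1
        for c in comms:
            for other in comms:
                if other != c:
                    co[c][other] += 1
    num_users = len(follows)
    browsemaps = {}
    for c, counts in co.items():
        scored = {y: tf * math.log(num_users / followers[y])
                  for y, tf in counts.items()}
        browsemaps[c] = sorted(scored, key=scored.get, reverse=True)[:top_n]
    return browsemaps

follows = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"a", "b", "c"}}
print(build_browsemaps(follows))
```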
The content features extracted for communities contain only three fields (name, description, and tags). They represent nearly the minimum amount of information required for a content filtering recommender system to function, and are therefore considered the baseline in the experiment. Browsemaps, on the other hand, are designed as an alternative to virtual profiles for comparison, given that both take into account the interaction among entities.
For model fitting, we use a training set of 3.4 million positive and 2.2 million negative examples gathered from both explicit and implicit user feedback (e.g., follow/unfollow actions or lack of action on recommendations). We apply L2-regularized logistic regression with various combinations of the above mentioned features. The best model under each configuration is selected by optimizing the area under the ROC curve (AUC-ROC). The performance of the different models is evaluated both offline and online; the results are presented in the next sections.
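A minimal sketch of this model-fitting step on synthetic stand-in data, assuming a scikit-learn-style workflow (the paper does not name a library); the feature matrix, label construction, and regularization grid are all illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the follow/unfollow training data described above;
# each row encodes one (user, community) pair, e.g. per-field match scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=10_000) > 0).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

best_auc, best_model = 0.0, None
for C in (0.01, 0.1, 1.0, 10.0):  # inverse L2 regularization strength
    model = LogisticRegression(penalty="l2", C=C, max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
    if auc > best_auc:
        best_auc, best_model = auc, model
print(f"best validation AUC-ROC: {best_auc:.3f}")
```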
4.2 Results
4.2.1 Offline evaluation
We compare the AUC for models obtained by training with four different feature configurations, namely, (A) content features only, (B) content features plus virtual profiles, (C) content features plus browsemaps, and (D) content features plus both virtual profiles and browsemaps. It can be seen from Figure 2 that the ROC curve of model B completely dominates that of model A (with AUCs 0.72 vs. 0.60), and both of them dominate that of model C (AUC 0.44). The same performance pattern is also exhibited in the precision-recall curves, as shown in Figure 3.
Figure 2: ROC curves for the different models (content features + vp; content features + bm; content features + vp + bm; content features only).
Figure 3: Precision-recall curves for the different models (same configurations as Figure 2).

Besides classification performance, another important measure that can be evaluated offline is coverage, which refers to the degree to which recommendations cover the set of available items (item space coverage) and the degree to which recommendations can be generated for all potential users (user space coverage) [7, 10]. Owing to a distributed algorithm developed at LinkedIn, we are able to calculate recommendations offline for all of our 220 million users. Using each of the trained models described above, we calculate a different set of recommendations for each user, with the size of each set capped at 50. We count the number of times unique communities appear in recommendations (frequencies) under the different models. Figure 4 plots the frequencies, sorted in descending order, against their ranks on a logarithmic scale.
It is not surprising that the baseline curve from the content-features-only model is the lowest, since the features extracted for communities in this case contain the least amount of information, and the distribution of recommendation frequency simply reflects the distribution of the amount of textual content per community, which is subject to a power law. On the other hand, the curve from the model with the addition of browsemaps visibly bulges outwards from the baseline for about two thirds of the points, indicating that those communities show up with higher frequency in recommendations, and hence greater coverage. Most remarkably, the model with the addition of virtual profiles significantly increases the frequencies for almost all points on the curve, except where the original baseline frequencies are extremely high or low.
The reason why browsemaps slightly boost the coverage of some communities is that those communities bear little content information yet already have followers. Having followers makes them eligible for inclusion in other communities' browsemaps, and thus gives them a higher chance of matching users. However, for users who have not followed any communities at all, the browsemap becomes an empty feature, which is why about a third of communities see no increase in coverage from browsemaps compared with the baseline. This phenomenon is also illustrated in Figure 5, in which the recommendation frequencies of unique communities are counted only for new users (i.e., users who have not started following communities yet). We observe that the model with browsemaps produces a curve identical to the baseline, while the model with virtual profiles exerts a consistent boost. This shows that browsemaps, as a feature with a collaborative filtering aspect, fail to address cold start, while virtual profiles provide a well-rounded improvement in terms of both coverage and predictive power.
Figure 4: Number of recommendations per unique community (content features + vp; content features + bm; content features only).
4.2.2 Online evaluation
To further evaluate the models with the various feature configurations (i.e., content features with vp, content features with bm, content features with both vp and bm, and content features only), we deployed them to serve realtime online recommendation requests and compared their performance through a bucket test. We assigned a unique bucket of 2.5% randomly selected users to each model. The bucket with the model based only on content features is the control, while the others are variants.
The duration of the test is determined according to Wheeler [18], where a conservative estimate of the sample size needed to achieve 80% power (the probability of correctly rejecting the null hypothesis when it is indeed false) is given by Equation 2:
n = \left( \frac{4 r \sigma}{\Delta} \right)^2, \quad (2)
where n is the minimum number of samples (impressions to be delivered) for each equal-sized variant, r is the number of variants, σ² is the variance of the OEC (Overall Evaluation Criterion [16], a quantitative measure of the experiment's objective), and ∆ is the sensitivity, or the desired amount of change. The OEC in this test is the click-through rate (CTR) of recommendations.

Figure 5: Number of recommendations for new users per unique community (content features + vp; content features + bm; content features only).

Assuming each click-through event is a Bernoulli trial with probability p = ctr0 (the control CTR, estimated from historical data), we have σ² = p(1 − p). Applying Equation 2 with the approximate number of recommendation impressions per day, we derive the length of the test to be 7 days.
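A small worked example of Equation 2 under the Bernoulli assumption; the variant count, control CTR, sensitivity, and daily impression volume below are placeholders, not the paper's actual numbers:

```python
import math

def test_duration_days(r, ctr0, delta, impressions_per_day):
    """Minimum per-variant sample size from Equation 2, n = (4*r*sigma/delta)^2,
    with sigma^2 = p(1 - p) for a Bernoulli click-through event,
    converted to a test duration in days. All inputs are illustrative."""
    sigma = math.sqrt(ctr0 * (1 - ctr0))
    n = (4 * r * sigma / delta) ** 2
    return math.ceil(n / impressions_per_day)

# e.g. 3 variants, a 1% control CTR, a sensitivity of 0.05% CTR,
# and a hypothetical 3M recommendation impressions per day per bucket
print(test_duration_days(r=3, ctr0=0.01, delta=0.0005,
                         impressions_per_day=3_000_000))
```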
Figure 7 presents the results of the test, showing the percentage change in CTR of the variant models relative to the control on each day of the test. Overall, the model with virtual profiles outperforms the control by 91.2%. Surprisingly, however, we do not observe any improvement from the model with browsemaps. The model with both virtual profiles and browsemaps increases CTR by 84.4%. The difference between the two best performing models is not significant (p value 0.062), which is consistent with the offline evaluation result. The failure of browsemaps to increase overall CTR may be attributed to the fact that only one third of users have followed communities in this particular application, meaning the cold start effect is much more pronounced. Virtual profiles, on the other hand, are not vulnerable to this problem, since they are content-based and do not rely on pre-existing user-item affiliations, as demonstrated in this experiment.

Figure 7: Model CTRs over the seven days of the test (content features + vp; content features + bm; content features + vp + bm).
5. CONCLUSION AND FUTURE WORK
We presented virtual profiles, a generic content meta data extension method, and introduced how it is utilized in a scalable and generic content-based hybrid recommender system that powers multiple real-time recommendation products at LinkedIn. The goal of virtual profiles is to provide a means to tap into rich-content information from one type of entity and propagate the extracted features to other affiliated entities that may suffer from relative data scarcity.
Figure 6: ROC curves for virtual profiles with different numbers of terms (vp-top50; vp-top100; vp-top200).
Virtual profiles bring a collaborative filtering aspect into the recommender system in the form of a supplement to content features, and are shown to outperform a method that directly incorporates network proximity from collaborative filtering. The experiments support that our first-order class dependence assumption and the greedy algorithm for calculating mutual information are a reasonable approximation. In future work, we will investigate scalable ways to account for dependencies among features. We also plan to explore further term weighting methods besides mutual information, including other classic information theoretic quantities such as the Kullback-Leibler divergence, or TF/IDF.
6. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[2] X. Bao, L. Bergman, and R. Thompson. Stacking recommendation engines with additional meta-features. In Proceedings of the Third ACM Conference on Recommender Systems, RecSys '09, pages 109–116, 2009.
[3] R. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4):537–550, July 1994.
[4] R. M. Bell, Y. Koren, and C. Volinsky. The BellKor solution to the Netflix Prize. Technical report, AT&T Labs Research, 2007.
[5] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, Nov. 2002.
[6] A. M. Fraser and H. L. Swinney. Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2):1134–1140, Feb. 1986.
[7] M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 257–260, New York, NY, USA, 2010. ACM.
[8] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, July 2001.
[9] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, CSCW '00, pages 241–250, New York, NY, USA, 2000. ACM.
[10] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53, Jan. 2004.
[11] P. B. Kantor. Recommender Systems Handbook. Springer, 2009.
[12] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. pages 187–192, 2002.
[13] M. J. Pazzani. A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5-6):393–408, Dec. 1999.
[14] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web, pages 325–341. Springer-Verlag, Berlin, Heidelberg, 2007.
[15] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW '94, pages 175–186, New York, NY, USA, 1994. ACM.
[16] R. K. Roy. Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement. Wiley, 2001.
[17] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 285–295, 2001.
[18] R. E. Wheeler. Portable power. Technometrics, 16(2):177–179, 1974.