Sociocast CF Benchmark

Benchmarking NODE against Collaborative Filtering
Albert Azout, Sociocast Networks LLC, New York, New York, albert.azout@sociocast.com
Giri Iyengar, PhD, Sociocast Networks LLC, New York, New York, giri.iyengar@sociocast.com

January 4, 2013

Abstract
We benchmark Sociocast’s proprietary NODE algorithm against a popular user-based collaborative filtering (CF) algorithm on the task of predicting the social bookmarking activity of internet users. Our results indicate that NODE performs roughly 4 to 10 times better than collaborative filtering in precision, recall, and F1 score. This advantage holds across all tested numbers of recommendations per user.

1 Introduction
Recommender systems have become widely used across many industries, e.g., Netflix for movies, Pandora for music, and Amazon for consumer products. This technology has also been studied in the statistics and machine learning research communities. The underlying problem is to form product recommendations based on previously recorded data [1, 11]. Better recommendations can improve customer loyalty by helping customers find products of interest they were previously unaware of [2].
Ansari, Essegaier, and Kohli (2000) categorize recommendation systems into two types: collaborative filtering and content-based approaches [3]. In this short study, we benchmark the performance of Sociocast’s NODE algorithm against collaborative filtering. NODE incorporates the time dimension into its core similarity function between users. In collaborative filtering, by contrast, temporal dynamics are usually introduced through additional parameters; the resulting very large number of parameters makes the algorithm unscalable and impractical for most use cases (cf. the Netflix Prize-winning algorithm).

2 Testing Methodology
2.1 Delicious Dataset
We use the publicly available dataset from Delicious (a Yahoo! company). This
dataset represents bookmarking activity by 210,000 users on the www.delicious.com
website over a period of 10 days (September 5 to 14, 2009). The first eight days are given to both algorithms for training, and the last two days are withheld as the ground truth for testing. We restrict the dataset to users who had at least 10 bookmarks in this period, which leaves 14,337 users with 600,752 bookmarks over the 8-day training period and another 136,164 bookmarks over the test period.
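
The preprocessing described above can be sketched as follows. This is a minimal illustration, not Sociocast’s pipeline: it assumes bookmarks arrive as hypothetical `(user, url, day)` records with `day` a `datetime.date`, and it applies the 10-bookmark filter over the full ten-day window, which is our reading of “in this period”.

```python
from collections import Counter
from datetime import date

# Window boundaries from the text; the record format is hypothetical.
START = date(2009, 9, 5)       # first training day
TRAIN_END = date(2009, 9, 12)  # eighth (last) training day
TEST_END = date(2009, 9, 14)   # last withheld test day
MIN_BOOKMARKS = 10

def split_bookmarks(records):
    """Keep active users, then split their bookmarks into train and test sets.

    records: iterable of hypothetical (user, url, day) tuples.
    Returns two lists of (user, url) pairs: (train, test).
    """
    in_window = [r for r in records if START <= r[2] <= TEST_END]
    counts = Counter(user for user, _, _ in in_window)
    active = {user for user, n in counts.items() if n >= MIN_BOOKMARKS}
    train, test = [], []
    for user, url, day in in_window:
        if user not in active:
            continue
        # Days 1-8 go to training; days 9-10 are the withheld ground truth.
        (train if day <= TRAIN_END else test).append((user, url))
    return train, test
```

Both algorithms would then see only `train`, and their recommendations would be scored against `test`.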

2.2 Bookmark Classification
Each user-provided bookmark corresponds to a live URL. We classify each URL into a space of 434 classes using a proprietary machine-learning-based classifier trained on a custom-curated corpus. These classes correspond to the second level of the IAB standard taxonomy. An example classification of a URL could be “Sports, Basketball” or “Technology Products, Laptops”. Each URL may receive up to three classifications. The prediction task can then be thought of as predicting which classes, or topics, each user will bookmark next, based on their previous bookmarking activity.
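
A minimal sketch of turning bookmarks into per-user class profiles, the representation on which the prediction task operates. The proprietary classifier is replaced here by `classify`, a hypothetical dictionary from URL to a ranked list of class labels; the function name `class_profiles` is likewise our own.

```python
from collections import defaultdict

MAX_LABELS = 3  # each URL may carry up to three classes

def class_profiles(bookmarks, classify):
    """Map (user, url) bookmarks to the set of classes each user bookmarked.

    classify: hypothetical dict from URL to a ranked list of class labels,
    standing in for the proprietary classifier. Unlabeled URLs are skipped.
    """
    profiles = defaultdict(set)
    for user, url in bookmarks:
        for label in classify.get(url, [])[:MAX_LABELS]:
            profiles[user].add(label)
    return dict(profiles)
```

Using sets means a class bookmarked several times counts once, which matches the 0-1 representation discussed in the collaborative filtering section.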

2.3 Collaborative Filtering
User-based collaborative filtering [4, 5, 6] is a memory-based algorithm that mimics
word-of-mouth behavior for rating data. The intuition is that users with similar
preferences will rate items similarly. Missing ratings for a user can be predicted by
finding a neighborhood of similar users and then aggregating the ratings of these users
to form a prediction. A neighborhood of similar users can be defined with either the
Pearson correlation coefficient or cosine similarity:

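$$\mathrm{sim}_{\mathrm{Pearson}}(x, y) = \frac{\sum_{i \in I} (x_i - \bar{x})(y_i - \bar{y})}{(|I| - 1)\,\mathrm{sd}(x)\,\mathrm{sd}(y)} \tag{1}$$

$$\mathrm{sim}_{\mathrm{cosine}}(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|} \tag{2}$$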

where $I$ is the set of items, $x$ and $y$ are the rows of the rating matrix $R$ holding the two users’ profile vectors, $\mathrm{sd}(\cdot)$ is the standard deviation, and $\|\cdot\|$ is the $l_2$ norm of a vector. Once the neighborhood $N(a) \subset U$ of an active user $u_a$ within the user set $U$ has been found, either by thresholding the similarity or by taking the $k$ nearest neighbors, the simplest way to form predicted ratings is to take the similarity-weighted average of the neighbors’ ratings:
$$\hat{r}_{aj} = \frac{1}{\sum_{i \in N(a)} s_{ai}} \sum_{i \in N(a)} s_{ai}\, r_{ij} \tag{3}$$

where $s_{ai}$ is the similarity between the active user $u_a$ and user $u_i$ in the neighborhood.
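
Equations (1) to (3) can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not Sociocast’s or any library’s implementation: `R` is a hypothetical dense list-of-lists rating matrix with no missing entries, the neighborhood is the $k$ nearest neighbors rather than a similarity threshold, and negative similarities get no special treatment. The names `pearson`, `cosine`, and `predict` are our own.

```python
import math

def pearson(x, y):
    """Eq. (1): Pearson correlation over the full item set I.

    Assumes neither vector is constant (otherwise sd = 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sdy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / ((n - 1) * sdx * sdy)

def cosine(x, y):
    """Eq. (2): dot product divided by the product of l2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def predict(R, a, j, k, sim=cosine):
    """Eq. (3): similarity-weighted average of item j over user a's k nearest neighbors."""
    others = [(sim(R[a], R[i]), i) for i in range(len(R)) if i != a]
    neighbors = sorted(others, reverse=True)[:k]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return 0.0  # no usable neighbors
    return sum(s * R[i][j] for s, i in neighbors) / total
```

Passing `sim=pearson` switches the neighborhood definition from Eq. (2) to Eq. (1).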
For data sets where numeric ratings are not appropriate or only binary data are available, versions of CF that operate on 0-1 data have been developed [7, 8]. The Delicious dataset is best represented this way, with each rating $r_{jk} \in \{0, 1\}$ defined as:

$$r_{jk} = \begin{cases} 1 & \text{if user } u_j \text{ bookmarked item } i_k, \\ 0 & \text{otherwise.} \end{cases}$$

A similarity measure that focuses only on matching ones, and so avoids the ambiguity of zeros representing either missing ratings or negative examples, is the Jaccard index:

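$$\mathrm{sim}_{\mathrm{Jaccard}}(\mathcal{X}, \mathcal{Y}) = \frac{|\mathcal{X} \cap \mathcal{Y}|}{|\mathcal{X} \cup \mathcal{Y}|} \tag{4}$$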

where $\mathcal{X}$ and $\mathcal{Y}$ are the sets of items with a 1 in the profiles of users $u_a$ and $u_b$, respectively.
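
A minimal sketch of binary user-based CF with the Jaccard index of Eq. (4). Each profile is the set of items with $r_{jk} = 1$. The text does not say how neighbors’ bookmarks become a top-N list; here, as an assumption, each candidate item is scored by the summed Jaccard similarity of the $k$ nearest neighbors who bookmarked it. The names `jaccard`, `top_n`, `k`, and `exclude_seen` are hypothetical.

```python
def jaccard(X, Y):
    """Eq. (4): |X ∩ Y| / |X ∪ Y|; taken as 0 when both sets are empty."""
    union = X | Y
    return len(X & Y) / len(union) if union else 0.0

def top_n(profiles, user, n, k, exclude_seen=True):
    """Recommend up to n items for `user` from its k most Jaccard-similar users.

    profiles: dict mapping each user to the set of items with r_jk = 1.
    Scoring rule (an assumption): an item's score is the summed similarity
    of the neighbors who bookmarked it; ties are broken by item name.
    """
    mine = profiles[user]
    neighbors = sorted(
        ((jaccard(mine, items), other)
         for other, items in profiles.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbors:
        if sim == 0.0:
            continue  # no overlap, so no evidence from this user
        for item in profiles[other]:
            if exclude_seen and item in mine:
                continue
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```

Setting `exclude_seen=False` allows classes a user has already bookmarked to be recommended again, which may suit the class-prediction task; the source does not state which choice the benchmark used.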

3 Evaluation and Results
We ask each algorithm to generate the top-N recommended items for each user (where N can vary), based on the training period. Each recommended item is then checked for whether it appears in the withheld ground-truth period. The results can be summarized with the classical binary classification confusion matrix. Precision, recall, and F1 are popular metrics used in information retrieval [9, 10]:

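$$\text{Precision} = \frac{\text{correctly recommended items}}{\text{total recommended items}}$$

$$\text{Recall} = \frac{\text{correctly recommended items}}{\text{total useful recommendations}}$$

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
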
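
These metrics can be computed from raw counts as sketched below. Here `recommended` lists are the top-N outputs and the ground truth holds the items each user bookmarked in the withheld days. How per-user results were aggregated for the tables is not stated; pooling counts over all users (micro-averaging), as the hypothetical `pooled_metrics` does, is our assumption.

```python
def precision_recall_f1(hits, n_recommended, n_relevant):
    """Precision, recall, and F1 from raw counts (0.0 where undefined)."""
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def pooled_metrics(recommendations, ground_truth):
    """Pool counts over users (micro-averaging, an assumption).

    recommendations: dict user -> list of N recommended items.
    ground_truth: dict user -> set of items observed in the withheld days.
    """
    hits = n_recommended = n_relevant = 0
    for user, items in recommendations.items():
        truth = ground_truth.get(user, set())
        hits += sum(1 for item in items if item in truth)
        n_recommended += len(items)
        n_relevant += len(truth)
    return precision_recall_f1(hits, n_recommended, n_relevant)
```
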
The tables below summarize the performance of the two algorithms for different values of N, the number of recommendations each algorithm is forced to make for each user. Each recommendation is evaluated against the ground-truth set, and the results are tallied using precision, recall, and F1.

Precision
N     NODE      CF       Factor of Improvement (NODE / CF)
1     35.31%    4.22%    8.37
2     31.01%    3.11%    9.93
5     23.50%    3.66%    6.42
10    18.19%    3.03%    6.01
15    15.05%    3.87%    3.89

Recall
N     NODE      CF       Factor of Improvement (NODE / CF)
1     5.43%     0.65%    8.37
2     9.53%     0.96%    9.93
5     18.06%    2.81%    6.42
10    27.97%    4.65%    6.01
15    34.68%    8.92%    3.89

F1 score
N     NODE      CF       Factor of Improvement (NODE / CF)
1     9.41%     1.12%    8.37
2     14.58%    1.46%    9.93
5     20.43%    3.18%    6.42
10    22.04%    3.67%    6.01
15    20.98%    5.39%    3.89

NODE consistently outperforms CF by a factor of 3.89 to 9.93 in precision, recall, and F1. Note that the factor of improvement is identical across all three metrics: both algorithms are forced to make the same number of predictions and are scored against the same ground-truth set, so each metric is proportional to the number of correct recommendations.
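
The identical factors follow directly from the definitions. With $K$ recommendations in total, $G$ ground-truth items, and $h$ hits, Precision $= h/K$ and Recall $= h/G$, so $F_1 = 2h/(K + G)$; every metric scales linearly with $h$, and the NODE/CF ratio of each is $h_{\mathrm{NODE}}/h_{\mathrm{CF}}$. The short check below illustrates this with hypothetical counts, not the study’s raw data.

```python
def metrics(hits, total_recommended, total_relevant):
    """Precision, recall, and F1 from raw counts."""
    p = hits / total_recommended
    r = hits / total_relevant
    return p, r, 2 * p * r / (p + r)

# Hypothetical counts, not the study's data: N = 5 for 1,000 users,
# and a ground-truth set of 6,500 items shared by both algorithms.
K, G = 5 * 1000, 6500
node = metrics(1175, K, G)
cf = metrics(183, K, G)

# Each ratio equals 1175 / 183, whatever K and G are.
ratios = [a / b for a, b in zip(node, cf)]
print(ratios)
```
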

References
[1] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In EC ’00: Proceedings of the 2nd ACM Conference on Electronic Commerce, pages 158–167. ACM, 2000. ISBN 1-58113-272-7.
[2] J. B. Schafer, J. A. Konstan, and J. Riedl. E-commerce recommendation applications. Data Mining and Knowledge Discovery, 5(1/2):115–153, 2001.
[3] A. Ansari, S. Essegaier, and R. Kohli. Internet recommendation systems. Journal of Marketing Research, 37:363–375, 2000.
[4] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/138859.138867.
[5] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In CSCW ’94: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pages 175–186. ACM, 1994. ISBN 0-89791-689-1. doi: http://doi.acm.org/10.1145/192844.192905.
[6] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating ‘word of mouth’. In Conference Proceedings on Human Factors in Computing Systems (CHI ’95), pages 210–217, Denver, CO, May 1995. ACM Press/Addison-Wesley Publishing Co.
[7] A. Mild and T. Reutterer. An improved collaborative filtering approach for predicting cross-category purchases based on binary market basket data. Journal of Retailing and Consumer Services, 10(3):123–133, 2003.
[8] J.-S. Lee, C.-H. Jun, J. Lee, and S. Kim. Classification-based collaborative filtering using market basket data. Expert Systems with Applications, 29(3):700–704, October 2005.
[9] G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
[10] C. van Rijsbergen. Information Retrieval. Butterworth, London, 1979.
[11] M. Hahsler. recommenderlab: A Framework for Developing and Testing Recommendation Algorithms. 2011.
