The RecSys Challenge is a traditional competition among Recommender Systems (RS) researchers. The 2014 edition focuses on predicting the amount of interaction achieved by tweets related to movies. In this paper, we present an approach to the 2014 RecSys Challenge. Our approach consists of three steps: (i) binary classification methods split the tweets into two lists, those with user engagement equal to zero and those with user engagement different from zero; (ii) each list is sorted using regression methods; and (iii) the two lists are concatenated and the tweets are sorted. To validate our approach, we tested 126 configurations and verified that the configuration using the MovieTweetings dataset, the Naïve Bayes classifier, and Linear Regression obtained the best result: nDCG@10 = 0.9037242.
A Recommender System for Predicting User Engagement in Twitter
Jonathas Magalhães2, Rubens Pessoa, Cleyton Souza, Evandro Costa, Joseana Fechine
INTRODUCTION
The 2014 RecSys Challenge [1] consists of ordering tweets shared by users on IMDb
according to the amount of interaction that they received. The interaction of a tweet is
defined as the sum of the number of retweets and favorites that it received. Our
objective is to present a contestant approach to the 2014 RecSys Challenge.
COMPOSING AND PRE-PROCESSING THE DATASET
OVERVIEW OF THE RECOMMENDER SYSTEM
CLASSIFICATION STEP
1 More information at http://www.grouptips.org.
2 Corresponding author, e-mail: jonathas@copin.ufcg.edu.br.
RECSYS CHALLENGE 2014
FEDERAL UNIVERSITY OF CAMPINA GRANDE
FEDERAL UNIVERSITY OF ALAGOAS
Intelligent, Personalized and Social Technologies Group1
REGRESSION STEP
REFERENCES
[1] A. Said, S. Dooms, B. Loni, and D. Tikk. Recommender systems challenge 2014. In Proceedings of
the eighth ACM conference on Recommender systems, RecSys ’14, New York, NY, USA, 2014. ACM.
[2] S. Dooms, T. De Pessemier, and L. Martens. MovieTweetings: a movie rating dataset collected from
Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems,
CrowdRec at RecSys 2013, 2013.
We use two datasets:
● The expanded MovieTweetings dataset [2] distributed by the organizers of the
challenge, with the following attributes: movie id, movie rating, crawled time, tweet
time, followers count, statuses count, favourites count and engagement.
● The IMDb dataset which consists of additional information about movies
referenced by tweets in order to complement the MovieTweetings dataset, with
the following attributes: IMDb rating, IMDb votes count, Movie year.
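Complementing one dataset with the other can be sketched as a lookup by movie id. This is a minimal illustration in plain Python, not the authors' code: the field names (`movie_id`, `imdb_rating`, and so on) and the choice to leave tweets unchanged when a movie has no IMDb record are assumptions.

```python
from typing import Dict, List

def enrich(tweets: List[dict], imdb_by_movie: Dict[str, dict]) -> List[dict]:
    """Attach IMDb attributes to each tweet record via its movie id."""
    enriched = []
    for tweet in tweets:
        # ASSUMPTION: tweets whose movie has no IMDb record are kept as-is.
        extra = imdb_by_movie.get(tweet["movie_id"], {})
        enriched.append({**tweet, **extra})
    return enriched

# Hypothetical records, for illustration only.
tweets = [
    {"movie_id": "m1", "movie_rating": 8, "followers_count": 120},
    {"movie_id": "m2", "movie_rating": 5, "followers_count": 40},
]
imdb_by_movie = {
    "m1": {"imdb_rating": 7.9, "imdb_votes": 1000, "movie_year": 2013},
}
enriched = enrich(tweets, imdb_by_movie)
```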
In this work we use three different regressors: Linear Regression, Pace Regression,
and the model-tree induction algorithm M5Base, which is an extension of Quinlan’s
algorithm to the regression task.
Table 2: Regression models and their parameters.
Besides the models presented in Table 2, we implemented three methods to
combine them: Average, Median and Ranking.
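A combination of several regressors' outputs might look like the sketch below. Average and Median are taken per instance across models. The poster does not define the Ranking method, so the version here (mean rank position across models) is only an assumption, and all function names are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence

def combine_average(preds: Sequence[Sequence[float]]) -> List[float]:
    """preds[m][i] is model m's prediction for instance i."""
    return [mean(col) for col in zip(*preds)]

def combine_median(preds: Sequence[Sequence[float]]) -> List[float]:
    return [median(col) for col in zip(*preds)]

def ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 = highest score; ties are broken by original position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def combine_ranking(preds: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMPTION: mean rank across models, negated so higher is better."""
    per_model = [ranks(p) for p in preds]
    return [-mean(col) for col in zip(*per_model)]

preds = [
    [3.0, 0.0, 1.0],  # model A
    [2.0, 1.0, 4.0],  # model B
    [4.0, 2.0, 0.0],  # model C
]
avg = combine_average(preds)
med = combine_median(preds)
rnk = combine_ranking(preds)
```

Each combiner returns one score per instance, so any of them can replace a single regressor when sorting.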
Our approach is divided into three steps:
● Classification;
● Regression; and
● Ordering Results.
In the classification and regression steps we use the Weka API to train the models.
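The three steps can be sketched as follows, in plain Python rather than the Weka API. `classify` and `regress` are hypothetical stand-ins for trained models, and placing the predicted-engaged list ahead of the predicted-zero list is an assumption about how the concatenation is ordered.

```python
from typing import Callable, Dict, List

Tweet = Dict[str, float]

def rank_tweets(
    tweets: List[Tweet],
    classify: Callable[[Tweet], bool],
    regress: Callable[[Tweet], float],
) -> List[Tweet]:
    """Order tweets by expected engagement using the three-step scheme."""
    # Step 1: split into predicted-engaged and predicted-zero lists.
    engaged = [t for t in tweets if classify(t)]
    zero = [t for t in tweets if not classify(t)]
    # Step 2: sort each list by regression score, highest first.
    engaged.sort(key=regress, reverse=True)
    zero.sort(key=regress, reverse=True)
    # Step 3: concatenate (ASSUMPTION: the engaged list goes first).
    return engaged + zero

# Toy stand-ins for trained models (hypothetical, for illustration only).
def toy_classify(t: Tweet) -> bool:
    return t["followers_count"] >= 100

def toy_regress(t: Tweet) -> float:
    return 0.01 * t["followers_count"] + t["movie_rating"]

tweets = [
    {"id": 1, "followers_count": 50, "movie_rating": 9},
    {"id": 2, "followers_count": 500, "movie_rating": 6},
    {"id": 3, "followers_count": 150, "movie_rating": 8},
    {"id": 4, "followers_count": 10, "movie_rating": 10},
]
ranked_ids = [t["id"] for t in rank_tweets(tweets, toy_classify, toy_regress)]
```

Because the classifier decides which list a tweet lands in, a tweet predicted as zero-engagement can never outrank one predicted as engaged, whatever its regression score.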
Figure 1: Overview of the Recommender System.
We use three classifiers: Naïve Bayes, Support Vector Machines (SVM), and the
Nearest Neighbor algorithm IBk.
Table 1: Classification models and their parameters.
We also implement a classifier that combines them using Voting. In other words, an
instance is assigned to a given class if that class obtains the required majority of
votes from the models presented.
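A minimal sketch of majority voting over the class labels predicted by several classifiers is shown below. The `required` parameter and the choice to return `None` when no label reaches the threshold are assumptions, since the text does not state what the required majority is.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(
    votes: Sequence[Hashable], required: Optional[int] = None
) -> Optional[Hashable]:
    """Return the label that reaches the required number of votes, else None."""
    counts = Counter(votes)
    label, count = counts.most_common(1)[0]
    # ASSUMPTION: by default a strict majority (more than half) is required.
    needed = required if required is not None else len(votes) // 2 + 1
    return label if count >= needed else None

# Three classifiers voting on one tweet: 1 = engagement, 0 = no engagement.
decision = majority_vote([1, 0, 1])
```

With three binary classifiers a strict majority always exists, so the `None` case only arises for an even number of voters or a stricter threshold.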
Table 3 summarizes the factors and the levels used in each one. Considering the
factors and levels used, we have an experimental design with 2 * 7 * 9 = 126
treatments without replication. We use the metric normalized Discounted Cumulative
Gain (nDCG) to compare the methods.
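For reference, nDCG@k can be computed as in the sketch below. It assumes a linear gain (the engagement value itself) with a log2 position discount, and returns 0 when the ideal ordering has no gain. The challenge's official evaluation script may differ on both points.

```python
import math
from typing import Sequence

def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    # Position i (0-based) is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # ASSUMPTION: lists with no engagement at all score 0.
    return dcg_at_k(relevances, k) / ideal

# True engagement values listed in the order the system ranked the tweets.
score = ndcg_at_k([0, 1], k=10)
```

A ranking that already places tweets in descending order of true engagement scores exactly 1.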
Table 3: Experimental factors and their levels.
METHODOLOGY
Table 4 presents the nDCG@10 results of the ten best configurations of our approach.
Table 4: The nDCG@10 of the 10 best configurations.
RESULTS