Reputation Model Based on Rating Data and Application in Recommender Systems

PhD Final Seminar
Student Name: Ahmad Abdel-Hafez
Principal Supervisor: Assoc. Prof. Yue Xu
Associate Supervisor: Assoc. Prof. Dian Tjondronegoro
Panel Members:
Assoc. Prof. Yue Xu (Chair)
Assoc. Prof. Richi Nayak (EECS)
Dr. Ernest Foo (EECS)
Assoc. Prof. Dian Tjondronegoro (IS)

Presentation Contents
• Background on reputation models and
reputation-aware recommender systems.
• Research problems and objectives.
• Contributions.
• Normal distribution-based reputation model.
• Beta distribution-based reputation model.
• Reputation-aware recommender system.
• Evaluating the proposed methods.
• Conclusions.

Background - Reputation Models
• Many websites nowadays provide rating
systems for customers in order to rate
available products.
• Reputation systems provide methods for
collecting and aggregating users’ opinions in
order to calculate products’ reputations.
• The naïve aggregation method is the
arithmetic mean of the ratings.

Reputation System Components
[Diagram: user feedback (ratings) enters a reputation system made up of three components (Feedback Collection, Reputation Engine, and Reputation Presentation), which outputs the item reputation score.]

Reputation Engine
[Diagram: the reputation engine takes user feedback (ratings) as input and outputs the item reputation score. Its stages are listed below.]
• Detecting and filtering malicious ratings.
• Rating normalisation:
– User rating tendency.
• Relevant weight calculation:
– User trustworthiness.
– User reliability.
– User expertise.
– Time decay.
• Rating aggregation.
• Reputation score discount or reward.

1. Weighted Mean:
– uses weight for ratings which represent:
• Reviewer reputation, expertise, or reliability [1,2,5,7].
• The time when the rating was given, assuming that
newer ratings should have more weight [1,3].
2. Bayesian Reputation Models:
– Jøsang and Haller [3] introduced a multinomial
Bayesian probability distribution reputation
system based on Dirichlet probability distribution.

3. Fuzzy Reputation Models:
– Bharadwaj and Al-Shamri [4] proposed a fuzzy
computational model for trust and reputation, which
uses fuzzy rules in order to aggregate the calculated
reputation values.
4. Other Reputation Models:
– Flow reputation models: Google PageRank
calculates page reputation using the number of links
to a page from other pages [5].
– Probabilistic reputation models: TRAVOS calculates
user trust based on past direct transactions between
users [6].

Background - Reputation-Aware
Recommender Systems
• Recommender systems are widely used on
websites that contain a massive number of
items, where personalisation becomes a
necessity.
• Most of the available recommender systems
focus on users’ preferences and rarely involve
products reputation in the recommendation
process.

Literature Review – Reputation-Aware
Recommender Systems
1. Ku and Tai [15] investigated the effect of
recommender systems and reputation systems on
purchase intentions regarding recommended
products.
2. Jøsang et al. [11] proposed the Cascading Minimum
Common Belief Fusion method to combine
reputation scores with recommendation scores.
3. Most recently, Wang et al. [8] proposed a trust-
based probabilistic recommendation model. The
trust used is trust in products, which is obtained
from product reputations and purchase
frequencies.

Research Problems - Problem 1:
Sparse Dataset
• A sparse dataset is a dataset in which the majority
of items have a small rating count.
• Reputation systems depend on historical
feedback to generate reputation scores for items.
Therefore, when the available feedback is sparse,
it becomes more difficult to produce accurate
reputation scores for items.
Research Question 1:
How can statistical data be used in the rating aggregation process to
enhance the accuracy of reputation scores over sparse datasets?

Item Popularity
• An item is considered popular if it has a large
ratings count in comparison to the ratings counts
of other items in the same dataset.
• A reputation score is assumed to reflect the
popularity of an item. Specifically, unpopular
items should not have high reputation scores.
How can item popularity, represented by the item's ratings count,
be reflected in the reputation calculation process?

Reputation and Recommendation
• Most Collaborative Filtering (CF) based
recommender systems do not consider item
reputation as part of the recommendation
process. This may produce a recommendation
for an item with a low reputation score, which
is not likely to be consumed by the user.
How can item reputation be incorporated into the recommendation process to
provide more accurate product recommendations for users?

Research Objectives
• Objective 1: To provide a literature review
about the reputation system, and reputation-
aware recommender systems.
• Objective 2: To propose a new reputation
model that deals with the sparsity problem.
• Objective 3: To propose a new reputation
model which considers items popularity in the
item reputation score.

Research Objectives
• Objective 4: To propose a new method for
merging items reputation with the
recommender system generated lists.
• Objective 5: To conduct experiments and
evaluate the performance of the proposed
approaches.

Contributions- In Reputation Models
• We propose two novel reputation models:
– The normal distribution-based reputation model
with uncertainty (NDRU), which employs rating
level frequencies, and uncertainty factors and
produces more accurate reputation scores over
sparse datasets.
– The beta distribution-based reputation model
(BetaDR), which uses the item relative ratings
count in its rating aggregation process and beats
the state-of-the-art methods in reputation score
accuracy on several well-known datasets.

Contributions- In Recommender
Systems
• We propose a reputation-aware top-n
recommender system:
– We adopt the concept of voting systems and propose
the weighted Borda count (WBC) method to combine
item reputations with item recommendation scores.
– We propose using a personalised reputation-generated
list of items.
– We propose a method to calculate user coherence,
and use it to determine the contribution weights of
item reputations and recommendation scores.

Definitions
• A rating level is one of the possible rating
values that a user can assign to a specific
item.
• In a five-star rating system there are 5 rating
levels. More instances of rating levels 1 and 2
than of rating levels 4 and 5 indicate that the item is
not favoured by a larger number of customers.
• Usually the middle rating levels, such as 3 on a
[1-5] rating scale, are the most frequent
rating levels (we call these “Popular
Rating Levels”), and 1 and 5 are the least frequent
levels (we call these “Rare Rating Levels”).

Motivation
• Using weighted mean reputation models, if we
don’t consider other factors such as time and
user credibility, then the weight for each rating is
$\frac{1}{n}$, where $n$ is the number of ratings of an item.
Weights for 7 ratings (each level weight is shown on the first row of its rating level)
Rating   Rating weight   Level weight
2        0.1429          0.5714
2        0.1429
2        0.1429
2        0.1429
3        0.1429          0.1429
5        0.1429          0.2857
5        0.1429
Weighted mean: 3.0

Normal Distribution-based
Reputation Model (NDR)
[Figures: "Weights for 100 ratings" and "Unified weights for 100 ratings where each score has the same frequency (= 20)"]
• Our method can be described as a weighted mean,
where the weights are generated using the normal
distribution function.

Rationale
• We propose to ‘award’ the more frequent rating
levels and the popular rating levels, and to ‘punish’
the less frequent rating levels and the rare rating levels.
Weights for 7 ratings (each level weight is shown on the first row of its rating level)
Rating   Rating weight        Level weight
         Naïve     NDR        Naïve     NDR
2        0.1429    0.0765     0.5714    0.604
2        0.1429    0.1334
2        0.1429    0.1861
2        0.1429    0.208
3        0.1429    0.1861     0.1429    0.1861
5        0.1429    0.1334     0.2857    0.2099
5        0.1429    0.0765
Reputation score: Naïve 3.0, NDR 2.8158

Weighting Based on the Normal
Distribution
• The weights of the ratings are calculated using
the normal distribution density function:
$$a_i = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$$
$$x_i = \frac{(k-1) \times i}{n-1} + 1$$
• where $a_i$ is the weight for the rating at index $i$,
$i = 0, \dots, n-1$, $\mu$ is the mean, $\sigma$ is the standard
deviation, and $k$ is the number of levels in the
rating system.

Calculating The NDR Reputation Score
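• The weights are first normalised so that they sum to 1:
$$w_i = \frac{a_i}{\sum_{j=0}^{n-1} a_j}, \qquad \sum_{i=0}^{n-1} w_i = 1$$
• The final reputation score is calculated as a
weighted mean over the rating levels, where
$LW^l$ is called the level weight and $R^l$ is the set of
ratings with level $l$:
$$NDR_p = \sum_{l=1}^{k} l \times LW^l$$
$$LW^l = \sum_{j=0}^{|R^l|-1} w_j^l$$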
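The NDR steps above can be sketched in Python as follows. This is a minimal sketch under two assumptions that the slides do not state: ratings are sorted in ascending order before they are indexed, and $\mu$ and $\sigma$ are taken as the mean and population standard deviation of the rating levels 1..k. With these choices the 7-rating example from the Rationale slide comes out at roughly 2.816; other examples in the deck may rely on different parameter values. The function names are illustrative.

```python
import math


def ndr_weights(n, k):
    """Normalised normal-density weights w_i for n sorted ratings on a k-level scale."""
    if n == 1:
        return [1.0]  # x_i is undefined for n = 1; a single rating gets all the weight
    # Assumption: mu and sigma describe the rating levels 1..k.
    mu = (k + 1) / 2
    sigma = math.sqrt((k * k - 1) / 12)  # population std of 1..k
    xs = [(k - 1) * i / (n - 1) + 1 for i in range(n)]
    a = [
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for x in xs
    ]
    total = sum(a)
    return [ai / total for ai in a]


def ndr_score(ratings, k):
    """Sum of level * level-weight, where a level weight adds up the weights of its ratings."""
    ordered = sorted(ratings)  # assumption: ascending order
    weights = ndr_weights(len(ordered), k)
    level_weight = {level: 0.0 for level in range(1, k + 1)}
    for rating, w in zip(ordered, weights):
        level_weight[rating] += w
    return sum(level * lw for level, lw in level_weight.items())


score = ndr_score([2, 2, 2, 2, 3, 5, 5], k=5)
print(round(score, 4))
```

Because the weights peak in the middle of the sorted list, the dominant level 2 gains weight relative to the naïve mean, which pulls the score below 3.0.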

Enhanced NDR model by adding
uncertainty (NDRU)
• We slightly modify the proposed NDR method
by adding an uncertainty factor, which is
important for dealing with sparse datasets.
$$NDRU_p = \sum_{l=1}^{k} l \times \frac{n \times LW^l + C \times b}{C + n}$$
• $C = 2$ is an a priori constant and $b = \frac{1}{k}$ is a base
rate for any of the $k$ rating values.
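A short Python sketch of the NDRU score, written to be self-contained. As before, it assumes ascending rating order and that $\mu$ and $\sigma$ are the mean and population standard deviation of the levels 1..k; these are illustrative choices rather than details confirmed by the slides. The constant factor of the normal density is omitted because it cancels during normalisation.

```python
import math


def ndru_score(ratings, k, C=2.0):
    """NDRU: blend each NDR level weight with the base rate b = 1/k, using prior constant C."""
    ordered = sorted(ratings)  # assumption: ascending order
    n = len(ordered)
    mu = (k + 1) / 2                      # assumption: mean of levels 1..k
    sigma = math.sqrt((k * k - 1) / 12)   # assumption: population std of levels 1..k
    if n == 1:
        weights = [1.0]
    else:
        xs = [(k - 1) * i / (n - 1) + 1 for i in range(n)]
        dens = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]
        total = sum(dens)
        weights = [d / total for d in dens]
    level_weight = {level: 0.0 for level in range(1, k + 1)}
    for rating, w in zip(ordered, weights):
        level_weight[rating] += w
    b = 1.0 / k
    return sum(
        level * (n * level_weight[level] + C * b) / (C + n)
        for level in range(1, k + 1)
    )


# A single 5-star rating is pulled towards the middle of the scale.
print(round(ndru_score([5], k=5), 4))
```

With few ratings the term $C \times b$ dominates and the score stays near the scale midpoint; as $n$ grows the score approaches the plain NDR value.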

Motivation - Item Popularity Problem
• A less popular item (item 1) is expected to
have a lower reputation score than a popular
item (item 2) with a similar rating distribution.
Rating level   Frequency (Item 1)   Frequency (Item 2)
1              1                    25
2              1                    25
3              1                    25
4              1                    25
5              5                    125
Count          9                    225
Mean           3.89                 3.89
NDR            4.089                4.052
Median         5                    5

Beta Distribution-Based
Reputation Model (BetaDR)
• A weighted mean method where the weights
will be generated using the beta distribution
probability density function.
• The beta distribution is flexible enough to produce
different distribution shapes, which reflect the
weighting tendency suitable for each item.
• The proposed model considers statistical
information of the dataset, including the
ratings count.

Weighting Using the BetaDR
• The weights of the ratings are calculated
using the beta distribution probability density
function:
$$Beta(x_i) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \, x_i^{\alpha - 1} (1 - x_i)^{\beta - 1} \qquad (1)$$
• Equation (2) is used to spread the values of $x_i$
evenly, such that $0 < x_i < 1$, where
$x_0 = 0.01$ and $x_{n-1} = 0.99$:
$$x_i = \frac{0.98 \times i}{n-1} + 0.01 \qquad (2)$$
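Equations (1) and (2) can be sketched in Python as below. The shape parameters are passed in directly, and the final normalisation of the densities to sum to 1 is an assumption made by analogy with the NDR weights; the function names are illustrative. `math.lgamma` is used so that large shape parameters do not overflow the gamma function.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta probability density, Equation (1), evaluated in log space for stability."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))


def betadr_weights(n, alpha, beta):
    """Weights for n sorted ratings, with x_i spread over [0.01, 0.99] as in Equation (2)."""
    if n == 1:
        return [1.0]  # Equation (2) is undefined for n = 1
    xs = [0.98 * i / (n - 1) + 0.01 for i in range(n)]
    dens = [beta_pdf(x, alpha, beta) for x in xs]
    total = sum(dens)
    return [d / total for d in dens]  # assumption: normalise so the weights sum to 1


for shape in (0.5, 1.0, 5.0):
    print(shape, [round(w, 3) for w in betadr_weights(7, shape, shape)])
```

Running it with equal shape parameters below, at, and above 1 shows the weights moving from emphasising the extremes, to uniform, to emphasising the middle ratings.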

Symmetric vs Asymmetric Shapes
• Symmetric shapes ensure that the reputation
model is fair and unbiased. They occur when $\alpha = \beta$.
• The chart below shows examples of beta
distributions where $\alpha$ and $\beta > 1$.
[Chart: weights (y-axis, 0 to 3) against the index of ratings (x-axis, 1 to 7) for three shapes: $\alpha < \beta$, $\alpha = \beta$, and $\alpha > \beta$.]

Symmetric Beta-Shapes
• The different symmetric shapes of the beta distribution.
• The x-axis represents rating indexes and the y-axis
represents weights.
[Chart: beta distribution probability density function (PDF) over rating indexes 1 to 19 (weights from 0 to 0.16) for three symmetric shapes: U shape ($\alpha = \beta = 0.5$), uniform distribution ($\alpha = \beta = 1$), and bell shape ($\alpha = \beta = 5$).]

Shape Parameters Equation
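$$\alpha = \beta = \begin{cases} \left(\frac{\mu}{\sigma}\right)^{2} \times IRRC, & \sigma \neq 0 \\ IRRC, & \sigma = 0 \end{cases} \qquad (3)$$
• We always use a symmetric shape for the beta
distribution ($\alpha = \beta$).
• $\alpha = \beta < 1$: the beta distribution
generates a “U” shape.
• $\alpha = \beta > 1$: the beta distribution
generates a “Bell” shape.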

IRRC
• $IRRC$ (Item Relative Rating Count) is the ratio of
the ratings count $n_i$ of item $i$ to the average
ratings count over all items in the dataset, where
$P$ is the set of items:
$$IRRC = \frac{n_i}{\bar{n}}, \qquad \bar{n} = \frac{\sum_{i \in P} n_i}{|P|}$$
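As a small illustration of the IRRC definition, the Python sketch below computes the ratio for every item from a dictionary of ratings counts. The two-item dataset is hypothetical and only reuses the counts 9 and 225 from the item popularity example; the function name is illustrative.

```python
def item_relative_rating_count(rating_counts):
    """Map each item to its ratings count divided by the average count over all items."""
    mean_count = sum(rating_counts.values()) / len(rating_counts)
    return {item: n / mean_count for item, n in rating_counts.items()}


# Hypothetical dataset that contains only the two example items.
counts = {"item1": 9, "item2": 225}
irrc = item_relative_rating_count(counts)
print({item: round(value, 3) for item, value in irrc.items()})
```

An item with an average number of ratings has an IRRC of 1, so values below 1 mark relatively unpopular items and values above 1 mark popular ones.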

Weights from Beta Distribution-based
Reputation Model (BetaDR)

Example
               Item 1                                   Item 2
Rating level   Frequency   Weights per level            Frequency   Weights per level
                           Avg     NDR     BetaDR                   Avg     NDR     BetaDR
1              1           0.111   0.060   0.396        25          0.111   0.069   0.025
2              1           0.111   0.091   0.043        25          0.111   0.096   0.083
3              1           0.111   0.123   0.027        25          0.111   0.122   0.134
4              1           0.111   0.147   0.022        25          0.111   0.140   0.168
5              5           0.556   0.578   0.511        125         0.556   0.573   0.590
Aggregate      9           3.89    4.089   3.206        225         3.89    4.052   4.215

Reputation-Aware Recommender
System
• Recommenders focus on generating personalized
results, without perceiving the global opinions of
users about the recommended items.
• The reputation of an item reflects the quality of
the item, which could affect a user's opinion.
• We propose a method to combine the
recommender and reputation systems to
enhance the accuracy of the top-n recommender
results.

A Block Diagram of the Proposed
Reputation-Aware Recommender System
[Block diagram components: user profiles (user profile ratings), users similarity, generating nearest neighbours, and the recommendation-ranked list of items; item profiles (item profile ratings), item reputations, and the reputation-ranked list of items; historical data, item clustering, and personalised items’ reputations; the output is a combined list of item recommendations.]

Personalized Item Reputation
• The top items on the reputation-based list are not
necessarily the items that a particular user likes.
• We cluster items based on user ratings or based
on item categories.
• Personalized reputation is obtained by
degrading all the items in the reputation-ranked
list that do not belong to the user's preferences:
$$PIR_{u,p} = \begin{cases} S(p), & p \in C_i,\ C_i \in F_u \\ 0, & \text{otherwise} \end{cases}$$
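The personalised reputation rule can be sketched in Python as below. The slide does not define its symbols, so this sketch assumes that $S(p)$ is the reputation score of item $p$, $C_i$ is the cluster that contains $p$, and $F_u$ is the set of clusters preferred by user $u$. All names and the sample data are hypothetical.

```python
def personalised_reputation(reputation, item_cluster, preferred_clusters):
    """Keep an item's reputation score if its cluster is preferred by the user, else 0."""
    return {
        item: (score if item_cluster.get(item) in preferred_clusters else 0.0)
        for item, score in reputation.items()
    }


# Hypothetical data: three movies, two genre clusters, a user who prefers drama.
reputation = {"m1": 4.2, "m2": 3.9, "m3": 4.6}
item_cluster = {"m1": "drama", "m2": "horror", "m3": "drama"}
pir = personalised_reputation(reputation, item_cluster, {"drama"})
print(sorted(pir, key=pir.get, reverse=True))
```

Items outside the user's preferred clusters drop to the bottom of the personalised reputation-ranked list instead of being removed from it.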

Weighted Borda-Count
• The BC method gives each item a number of
points corresponding to its rank in each list:
$$BC(p) = BC_{rec}(p) + BC_{rep}(p)$$
• We introduce a weight into the traditional BC
method to emphasize the difference between the
two lists:
$$WBC(p) = \omega \times BC_{rec}(p) + (1 - \omega) \times BC_{rep}(p)$$
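A minimal Python sketch of the weighted Borda count follows. It assumes one common Borda convention, in which the top item of a list of length m receives m points and the last item receives 1, and it gives 0 points to an item missing from a list; the slides do not specify either detail. Function names and the sample lists are illustrative.

```python
def borda_points(ranked_items):
    """Borda points per item: the top of an m-item list gets m points, the last gets 1."""
    m = len(ranked_items)
    return {item: m - position for position, item in enumerate(ranked_items)}


def wbc_scores(rec_list, rep_list, omega):
    """WBC(p) = omega * BC_rec(p) + (1 - omega) * BC_rep(p)."""
    bc_rec = borda_points(rec_list)
    bc_rep = borda_points(rep_list)
    items = set(bc_rec) | set(bc_rep)
    # Assumption: an item absent from one list contributes 0 points from that list.
    return {p: omega * bc_rec.get(p, 0) + (1 - omega) * bc_rep.get(p, 0) for p in items}


def wbc_rank(rec_list, rep_list, omega):
    """Items ordered by descending WBC score, with ties broken by item name."""
    scores = wbc_scores(rec_list, rep_list, omega)
    return sorted(scores, key=lambda p: (-scores[p], p))


rec = ["a", "b", "c"]   # recommendation-ranked list
rep = ["c", "b", "a"]   # reputation-ranked list
print(wbc_rank(rec, rep, omega=0.7))
print(wbc_rank(rec, rep, omega=0.3))
```

With $\omega = 0.5$ the ordering is the same as the traditional BC ordering, since every score is simply halved.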

WBC Example
WBC Example with 𝜔 = 0.7

User Coherence
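• To choose a value of $\omega$ that enhances the
accuracy of the recommender system, we propose
to use user coherence.
• User coherence is defined as the stability of the
rating levels a user gives to items in the same
category [14].
• A coherent user has no problem getting
accurate recommendations, unlike incoherent
users.
$$Coherence(u) = -\sum_{c \in C} \sqrt{\sum_{p \in I(u,c)} \left(r_{u,p} - \bar{r}_{u,c}\right)^2}$$
$$\omega_u = \frac{1}{|C|} \sum_{c \in C} \sqrt{\frac{\sum_{p \in I(u,c)} \left(r_{u,p} - \bar{r}_{u,c}\right)^2}{|I(u,c)|}}$$
• Here $C$ is the set of categories, $I(u,c)$ is the set of
items in category $c$ rated by user $u$, $r_{u,p}$ is the
rating of $u$ for item $p$, and $\bar{r}_{u,c}$ is the mean
rating of $u$ in category $c$.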

Evaluating the Proposed Reputation
Models
• Hypothesis 1: Embedding the item relative rating
count and the standard deviation of item
ratings in the calculation of item reputations will
produce more accurate reputation scores
on dense datasets.
• Hypothesis 2: Using an uncertainty parameter
alongside the rating level frequencies in
the calculation of item reputations will produce
more accurate reputation scores when the
dataset is sparse.

Experiment 1: Ratings prediction
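• The first experiment is to predict an item rating
using the item reputation score.
• The hypothesis is that the more accurate the
reputation model, the closer the scores it
generates are to the actual users’ ratings.
• The mean absolute error (MAE) metric is
used to measure the prediction accuracy, where
$r_p$ is the reputation score of item $p$, $R_p$ is its set
of ratings, and $m = |P|$ is the number of items:
$$MAE = \frac{1}{|P|} \sum_{p=1}^{m} \frac{\sum_{r \in R_p} |r_p - r|}{|R_p|}$$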
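The MAE metric of this experiment can be sketched in Python as below: the absolute error is averaged per item first and then across items. The dictionary layout, the function name, and the sample ratings are illustrative assumptions.

```python
def reputation_mae(ratings_by_item, reputation):
    """Average over items of the mean absolute gap between reputation score and ratings."""
    per_item_errors = []
    for item, ratings in ratings_by_item.items():
        item_error = sum(abs(reputation[item] - r) for r in ratings) / len(ratings)
        per_item_errors.append(item_error)
    return sum(per_item_errors) / len(per_item_errors)


# Hypothetical test ratings and reputation scores.
ratings_by_item = {"a": [4, 5], "b": [2]}
reputation = {"a": 4.5, "b": 3.0}
print(reputation_mae(ratings_by_item, reputation))
```

Averaging per item first means that an item with thousands of ratings does not outweigh an item with only a few.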

Experiment 2: Item Ranking List
Similarity
• We compare two lists of items ranked by
their reputation scores generated using different
methods.
• $n_d$ and $n_c$ are the numbers of discordant and
concordant pairs between the two lists,
respectively.
$$\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)}$$
$$n_d = \left|\{(i, j) \mid A(i) < A(j),\ B(i) > B(j)\}\right|$$
$$n_c = \left|\{(i, j) \mid A(i) < A(j),\ B(i) < B(j)\}\right|$$
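The Kendall similarity above can be sketched in plain Python by counting concordant and discordant pairs directly. The sketch assumes that both rankings cover the same items, that there are at least two items, and that ranks are given as item-to-position dictionaries; tied pairs count as neither concordant nor discordant. Names are illustrative.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau between two rankings given as {item: rank position} dictionaries."""
    items = list(rank_a)
    n = len(items)
    concordant = discordant = 0
    for i, j in combinations(items, 2):
        # The sign of the product tells whether both lists order the pair the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (0.5 * n * (n - 1))


a = {"x": 1, "y": 2, "z": 3}
b = {"x": 1, "y": 3, "z": 2}
print(round(kendall_tau(a, b), 4))
```

This pairwise loop is quadratic in the number of items, which is fine for illustration but slow for very long lists.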

Experiment 3: Item Ranking Accuracy
• The IMDb website provides a special
calculation for the top-250 movies of all time.
• We compare the top-250 movies generated by
each of the implemented reputation
models with the top-250 movies produced by IMDb.
$$AP = \frac{\sum_{i \in C} P@i}{|E|}$$
$$G_{p@t} = \begin{cases} 5 \times \left(1 - \frac{\left|I_{p,imdb} - I_{p,rep}\right|}{t}\right), & p \in \text{Top-}t_{imdb} \\ 0, & p \notin \text{Top-}t_{imdb} \end{cases}$$
$$DCG_{p@t} = \sum_{x=1}^{t} \frac{2^{G_{p@t}} - 1}{\log_2(x + 1)}$$
$$nDCG_{p@t} = \frac{DCG_{p@t}}{IDCG_{p@t}}$$
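The gain and nDCG formulas can be sketched in Python as below. Several details are assumptions rather than statements from the slides: $I_{p,imdb}$ and $I_{p,rep}$ are read as the 1-based positions of movie $p$ in the IMDb list and in the reputation-ranked list, the sum runs over the positions of the reputation-ranked list, and the ideal DCG is the value obtained when every position has the maximum gain of 5. Names and sample lists are illustrative.

```python
import math


def ndcg_at_t(rep_ranked, imdb_ranked, t):
    """nDCG@t of a reputation-ranked list against the IMDb top-t list."""
    imdb_position = {p: i for i, p in enumerate(imdb_ranked[:t], start=1)}
    dcg = 0.0
    for x, p in enumerate(rep_ranked[:t], start=1):
        if p in imdb_position:
            # Gain shrinks linearly with the distance between the two positions.
            gain = 5 * (1 - abs(imdb_position[p] - x) / t)
        else:
            gain = 0.0
        dcg += (2 ** gain - 1) / math.log2(x + 1)
    # Assumption: the ideal list reproduces the IMDb order, so every gain equals 5.
    idcg = sum((2 ** 5 - 1) / math.log2(x + 1) for x in range(1, t + 1))
    return dcg / idcg


imdb_top = ["m1", "m2", "m3", "m4"]
print(ndcg_at_t(["m1", "m2", "m3", "m4"], imdb_top, t=4))
print(round(ndcg_at_t(["m2", "m1", "m9", "m4"], imdb_top, t=4), 4))
```

A list that matches the IMDb order exactly scores 1, and a list that shares no movie with the IMDb top-t scores 0.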

Experiment 4: Reputation-Aware
Recommender Accuracy
• We implement the traditional
user-based CF as the top-n
recommender system.
• The method we use to
combine the reputation
models with recommender
system is the proposed
weighted Borda count method
(WBC) presented in Chapter 5.
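$$Precision = \frac{|Relevant \cap Recommended|}{|Recommended|}$$
$$Recall = \frac{|Relevant \cap Recommended|}{|Relevant|}$$
$$F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$
[Diagram: a merging method combines the reputation model with the recommender system.]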
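For reference, the three accuracy metrics used in this experiment can be computed for one user as in the Python sketch below. Returning 0 when a denominator is empty is a convention chosen for this sketch, and the sample item identifiers are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1-score of a recommended item list against the relevant items."""
    recommended_set, relevant_set = set(recommended), set(relevant)
    hits = len(recommended_set & relevant_set)
    precision = hits / len(recommended_set) if recommended_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical top-4 recommendation list and the items the user actually liked.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
print(round(p, 4), round(r, 4), round(f1, 4))
```

In a top-n experiment these per-user values are typically averaged over all test users.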

Baseline Models
• Naïve method: arithmetic mean.
• IMDb: a true Bayesian estimation.
• Dirichlet reputation model [3] (2007).
• Fuzzy reputation model [4] (2009).
• PerContRep model [7] (2014).
• Trusti Model [8] (2015).

Datasets
• Dense
Dataset   #Users   #Items   #Ratings      #Rating Levels   Average of Ratings Count Per Item (ARCPI)
ML-100K   943      1,682    100,000       5                59.453
ML-1M     6,040    3,952    1,000,209     5                253.089
ML-10M    71,567   10,681   10,000,054    10               936.246
IMDb      -        13,479   385,581,991   10               28,606.127
• Sparse
Dataset                           #Users   #Items    #Ratings   #Rating Levels   Sparsity   ARCPI
Book Crossing Dataset             17,854   113,481   277,906    10               0.99998    2.44892
Only 4 ratings per movie (4RPM)   1,361    3,706     14,261     5                0.99717    3.84808
Only 6 ratings per movie (6RPM)   1,760    3,706     21,054     5                0.99677    5.68105
Only 8 ratings per movie (8RPM)   2,098    3,706     27,723     5                0.99643    7.48057

Results of Rating Prediction Experiment
Method/Dataset ML-100K ML-1M ML-10M
Naïve 0.796652 0.763668 1.437024
IMDb 0.781558 0.748331 1.401215
Fuzzy 0.795871 0.761167 1.430607
Dirichlet 0.796720 0.764013 1.437291
PerContRep 0.786162 0.752248 1.410826
Trusti 0.783254 0.750015 1.409872
NDR 0.791346 0.756602 1.423804
NDRU 0.790925 0.756756 1.424101
BetaDR 0.770330 0.732876 1.372640
Method/Dataset 4RPM 6RPM 8RPM Book Crossing Dataset
Naïve 0.5577 0.5610 0.5720 1.6957
IMDb 0.5601 0.5618 0.5721 1.6834
Fuzzy 0.5583 0.5628 0.5736 1.6922
Dirichlet 0.5351 0.5514 0.5705 1.6159
PerContRep 0.5602 0.5621 0.5728 1.6904
Trusti 0.5604 0.5622 0.5729 1.6908
NDR 0.5575 0.5598 0.5693 1.6874
NDRU 0.5339 0.5498 0.5676 1.5924
BetaDR 0.5571 0.5592 0.5689 1.6826

Results of Item Ranking List Similarity Experiment
[Four charts: Kendall similarity (y-axis, 0 to 1) against the top X% of the ranked lists used in the similarity calculation (x-axis, 0% to 100%), with one curve each for NDR, NDRU, and BetaDR. Panel captions as extracted: "Kendall similarities with the naïve method using 100K-ML dataset" (two panels) and "Kendall similarities with the Trusti method using 8RPM dataset" (two panels).]

Method/Metric P@10 P@50 P@100 P@250 AP nDCG@10 nDCG@50 nDCG@100 nDCG@250
Naïve 0.1 0.12 0.23 0.304 0.2262 0.0009 0.0045 0.0129 0.0403
Fuzzy 0.1 0.12 0.25 0.312 0.2312 0.0009 0.0042 0.0110 0.0397
Dirichlet 0.1 0.2 0.27 0.336 0.2620 0.0009 0.0082 0.0185 0.0511
PerContRep 0.1 0.12 0.25 0.308 0.2345 0.0009 0.0045 0.0140 0.0417
Trusti 0.7 0.82 0.77 0.756 0.7671 0.5600 0.6212 0.5989 0.5822
NDR 0.1 0.1 0.19 0.28 0.1972 0.0009 0.0038 0.0100 0.0337
NDRU 0.1 0.14 0.19 0.308 0.2217 0.0009 0.0054 0.0114 0.0407
BetaDR 0.2 0.24 0.45 0.432 0.3712 0.1327 0.0277 0.0686 0.1133
Results of Item Ranking Accuracy Experiment
Method/Metric P@10 P@50 P@100 P@250 AP nDCG@10 nDCG@50 nDCG@100 nDCG@250
Naïve 0.0 0.0 0.0 0.0156 0.00288 0.0 0.0 0.0 0.00001
Fuzzy 0.0125 0.0185 0.02825 0.0758 0.02652 0.00121 0.00329 0.01642 0.09841
Dirichlet 0.365 0.575 0.5585 0.5882 0.56438 0.05065 0.16393 0.19453 0.24182
PerContRep 0.125 0.385 0.4515 0.5032 0.42906 0.00371 0.04145 0.07726 0.12991
Trusti 0.7 0.642 0.353 0.2604 0.41111 0.55510 0.50833 0.30792 0.17065
NDR 0.005 0.0055 0.00625 0.0746 0.02312 0.0 0.0 0.0 0.00088
NDRU 0.395 0.598 0.573 0.593 0.57775 0.05813 0.17051 0.19782 0.26946
BetaDR 0.095 0.02 0.0115 0.1282 0.03529 0.05877 0.01245 0.00635 0.00829

Reputation method used with CF, on ML-100K, ML-1M, and ML-10M (three columns per dataset)
Method Precision Recall F1-score Precision Recall F1-score Precision Recall F1-score
N/A 0.0236 0.0412 0.0300 0.0301 0.0327 0.0314 0.0379 0.0306 0.0339
IMDb 0.0305 0.0532 0.0388 0.0369 0.0596 0.0456 0.0401 0.0718 0.0515
Fuzzy 0.0283 0.0494 0.0359 0.0271 0.0399 0.0323 0.0311 0.0407 0.0353
Dirichlet 0.0297 0.0519 0.0377 0.0351 0.0539 0.0425 0.0382 0.0523 0.0442
PerContRep 0.0282 0.0491 0.0358 0.0264 0.0391 0.0316 0.0304 0.0389 0.0341
Trusti 0.0301 0.0519 0.0381 0.0360 0.0561 0.0439 0.0395 0.0681 0.0500
NDR 0.0286 0.0494 0.0362 0.0289 0.0439 0.0349 0.0349 0.0442 0.0390
NDRU 0.0297 0.0518 0.0377 0.0352 0.0537 0.0425 0.0385 0.0533 0.0447
BetaDR 0.0307 0.0540 0.0392 0.0380 0.0643 0.0478 0.0412 0.0791 0.0542
Reputation method used with CF, on 4RPM, 6RPM, and 8RPM (three columns per dataset)
Method Precision Recall F1-score Precision Recall F1-score Precision Recall F1-score
N/A 0.0080 0.1459 0.0152 0.0088 0.1575 0.0167 0.0096 0.1601 0.0181
IMDb 0.0067 0.0747 0.0123 0.0078 0.1027 0.0145 0.0088 0.1396 0.0166
Fuzzy 0.0071 0.1319 0.0135 0.0084 0.1445 0.0159 0.0096 0.1472 0.0180
Dirichlet 0.0082 0.1468 0.0155 0.0095 0.1593 0.0179 0.0105 0.1629 0.0197
PerContRep 0.0075 0.1362 0.0142 0.0087 0.1481 0.0164 0.0098 0.1506 0.0184
Trusti 0.0043 0.0486 0.0079 0.0064 0.0826 0.0119 0.0083 0.0967 0.0153
NDR 0.0080 0.1461 0.0152 0.0090 0.1581 0.0170 0.0102 0.1611 0.0192
NDRU 0.0083 0.1479 0.0157 0.0097 0.1601 0.0183 0.0111 0.1646 0.0208
BetaDR 0.0081 0.1461 0.0154 0.0093 0.1579 0.0176 0.0103 0.1620 0.0194
Results of Reputation-aware Recommender Accuracy Experiment

Evaluating the Reputation-Aware
Recommender System
• Hypothesis 3: The accuracy of recommender
systems can be enhanced by combining the
item reputation factor using WBC as a merging
method. The accuracy enhancement takes
place with both sparse and dense datasets.

Recommender Systems and
Reputation Models Employed
• We use the BetaDR reputation model for the
dense dataset evaluation and the NDRU
reputation model for the sparse dataset
evaluation.
• We use two recommender systems for the top-n
recommendation experiment:
– The traditional user-based CF [12].
– The reliability-aware recommender system [13].

Datasets
• For the dense datasets, we used the ML-100K
and the ML-1M datasets described before.
Statistic                                  MovieLens 5% (ML5)   MovieLens 10% (ML10)
Number of ratings                          6,515                13,077
Sparsity                                   0.99589              0.99175
Minimum number of ratings per user         5                    10
Maximum number of ratings per user         36                   73
Average number of ratings per user         6.849                13.867
Minimum number of ratings per movie        0                    0
Maximum number of ratings per movie        59                   114
Average number of ratings per movie        3.840                7.774

Baseline Methods
• Borda Count Method (BC) [9].
• Coombs Method [10]
• Baldwin method [9].
• Proportional Representation (Naïve method)
• CasMin Method [11] (2013).
• A Trust-Based Probabilistic Recommendation
Model (Trusti) [8] (2015).

Results on Dense Dataset
Method Used to Combine Reputation with TCF ML-100K ML-1M
Precision Recall F1-score Precision Recall F1-score
TCF 0.0257 0.0446 0.0326 0.0238 0.0281 0.0258
Naïve (PR) 0.0218 0.0391 0.0280 0.0206 0.0244 0.0223
BC 0.0455 0.0728 0.0560 0.0401 0.0585 0.0476
Baldwin 0.0295 0.0522 0.0377 0.0297 0.0388 0.0337
Coombs 0.0293 0.0512 0.0373 0.0286 0.0383 0.0328
CasMin 0.0078 0.0154 0.0104 0.0061 0.0141 0.0085
Trusti 0.0326 0.0543 0.0407 0.0340 0.0460 0.0391
WBC 0.0476 0.0832 0.0606 0.0412 0.0594 0.0486
WBC-P 0.0624 0.1199 0.0820 0.0447 0.0698 0.0545
Method Used to Combine Reputation with RA-CF ML-100K ML-1M
RA-CF 0.0338 0.0555 0.0420 0.0272 0.0331 0.0299
Naïve (PR) 0.0289 0.0473 0.0359 0.0248 0.0314 0.0277
BC 0.0459 0.0695 0.0553 0.0318 0.0474 0.0380
Baldwin 0.0386 0.0630 0.0479 0.0296 0.0418 0.0346
Coombs 0.0385 0.0628 0.0477 0.0266 0.0409 0.0322
CasMin 0.0092 0.0169 0.0119 0.0075 0.0183 0.0106
Trusti 0.0413 0.0655 0.0506 0.0352 0.0480 0.0406
WBC 0.0506 0.0786 0.0616 0.0397 0.0663 0.0497
WBC-P 0.0626 0.1195 0.0821 0.0423 0.0738 0.0537

Results on Sparse Dataset
Method Used to Combine Reputation with TCF ML5 ML10
Precision Recall F1-score Precision Recall F1-score
TCF 0.0023 0.0410 0.0044 0.0028 0.0265 0.0051
Naïve (PR) 0.0012 0.0187 0.0022 0.0023 0.0218 0.0041
BC 0.0056 0.0985 0.0106 0.0074 0.0832 0.0136
Baldwin 0.0030 0.0516 0.0057 0.0031 0.0308 0.0057
Coombs 0.0030 0.0515 0.0056 0.0030 0.0303 0.0055
CasMin 0.0004 0.0046 0.0007 0.0005 0.0051 0.0009
Trusti 0.0029 0.0517 0.0056 0.0029 0.0288 0.0053
WBC 0.0059 0.0999 0.0111 0.0078 0.0888 0.0143
WBC-P 0.0211 0.3837 0.0400 0.0219 0.2531 0.0403
Method Used to Combine Reputation with RA-CF ML5 ML10
RA-CF 0.0229 0.4137 0.0435 0.0250 0.3009 0.0462
Naïve (PR) 0.0223 0.4034 0.0423 0.0236 0.2850 0.0436
BC 0.0095 0.1708 0.0180 0.0132 0.1542 0.0244
Baldwin 0.0223 0.4021 0.0422 0.0247 0.2966 0.0455
Coombs 0.0222 0.4020 0.0420 0.0242 0.2956 0.0447
CasMin 0.0012 0.0187 0.0023 0.0019 0.0234 0.0035
Trusti 0.0204 0.4071 0.0389 0.0216 0.3055 0.0403
WBC 0.0230 0.4217 0.0436 0.0255 0.3019 0.0470
WBC-P 0.0245 0.4397 0.0464 0.0337 0.3975 0.0621

Conclusions
• The first reputation model we propose is the NDRU
model.
– It uses the normal distribution to generate ratings weights.
– It employs the uncertainty factor.
– It is more accurate when used with sparse datasets.
• The second reputation model we propose is the
BetaDR model.
– It uses the beta distribution to generate the weights for
the ratings.
– It uses the item relative rating count (IRRC), and the
standard deviation of item’s ratings and its mean.
– This model produced more accurate results when
used with dense datasets.

Conclusions (Cont.)
• We propose the weighted Borda count (WBC)
method to combine reputation scores with
recommender systems in order to enhance the
accuracy of the top-n recommendations.
• We propose to generate personalized reputation
scores for each user to purify reputation scores
and make them useful in recommender systems.
• We noticed that merging reputation models with
recommenders has the potential to enhance the
recommendation accuracy.

Relevant Publications
• Journal Papers
1. Abdel-Hafez, A., & Xu, Y. (2013). A survey of user modelling in social media websites. In Computer and Information
Science, 6(4), pp. 59-71.
2. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2015). A normal-distribution based rating aggregation method for generating
product reputations. In Web Intelligence. 13(1), pp. 43-51. IOS Press.
• Conference Papers
3. Abdel-Hafez, A., Xu, Y., & Tjondronegoro, D. (2012). Product reputation model: an opinion mining based approach.
Paper presented at the 1st International Workshop on Sentiment Discovery from Affective Data (SDAD’12), p16-20,
CEUR workshop.
4. Abdel-Hafez, A., & Xu, Y. (2013). Ontology-based product's reputation model. Paper presented at the 2013
IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT),
pp. 37-40. IEEE.
5. Abdel-Hafez, A., Tang, X., Tian, N., & Xu, Y. (2014). A reputation-enhanced recommender system. Paper presented
at Advanced Data Mining and Applications (ADMA’14), pp. 185-198. Springer International Publishing.
6. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2014). A normal-distribution based reputation model. Paper presented at the
Trust, Privacy, and Security in Digital Business (TrustBus’14), pp. 144-155. Springer International Publishing.
7. Abdel-Hafez, A., Phung, Q. V., & Xu, Y. (2014). Utilizing voting systems for ranking user tweets. Paper presented at
the 2014 Recommender Systems Challenge (RecSysChallenge’14), pp. 23-28. ACM.
8. Abdel-Hafez, A., Xu, Y., & Tian, N. (2014). Item reputation-aware recommender systems. Paper presented at the
16th International Conference on Information Integration and Web-based Applications & Services (iiWAS’14), pp.
79-86. ACM.
9. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2014). A rating aggregation method for generating product reputations. Paper
presented at the 25th ACM conference on Hypertext and social media (Hypertext’14), pp. 291-293.ACM.
10. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2015). An accurate rating aggregation method for generating item reputation.
Paper to be presented at the 2015 IEEE International Conference on Data Science and Advanced Analytics
(DSAA'15). IEEE. (Accepted)
11. Abdel-Hafez, A., & Xu, Y. (2015). Exploiting the beta distribution-based reputation model in recommender system.
Paper to be presented at the 28th Australasian Joint Conference on Artificial Intelligence. (Accepted)

References
[1] Ayday, E., Lee, H., & Fekri, F. (2009). An iterative algorithm for trust and reputation management. In IEEE International Symposium on
Information Theory (ISIT), 2051-2055, IEEE.
[2] Riggs, T., & Wilensky, R. (2001). An algorithm for automated rating of reviewers. Paper presented at the Proceedings of the 1st
ACM/IEEE-CS joint conference on Digital libraries.
[3] A. Jøsang and J. Haller, "Dirichlet reputation systems," in Availability, Reliability and Security, 2007. ARES 2007. The Second International
Conference on, 2007, pp. 112-119.
[4] Bharadwaj, K. K., & Al-Shamri, M. Y. H. (2009). Fuzzy computational models for trust and reputation systems. Electronic Commerce
Research and Applications, 8(1), 37-47.
[5] Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the web.
[6] Teacy, W. L., Patel, J., Jennings, N. R., & Luck, M. (2006). Travos: Trust and reputation in the context of inaccurate information sources.
Autonomous Agents and Multi-Agent Systems, 12(2), 183-198.
[7] Yan, Z., Chen, Y., & Shen, Y. (2014). PerContRep: a practical reputation system for pervasive content services. The Journal of
Supercomputing, 70(3), 1051-1074.
[8] Wang, Y., Yin, G., Cai, Z., Dong, Y., & Dong, H. (2015). A trust-based probabilistic recommendation model for social networks. Journal of
Network and Computer Applications, 55(2015), 59-67.
[9] De Grazia, A. (1953). Mathematical derivation of an election system. Isis, 44(1/2), 42-51.
[10] Coombs, C. H. (1964). A theory of data. New York: Wiley.
[11] Jøsang, A., Pini, M. S., Santini, F., & Xu, Y. (2013). Combining Recommender and Reputation Systems to Produce Better Online Advice.
Modeling Decisions for Artificial Intelligence, 8234, 126-138.
[12] Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2000). Analysis of recommendation algorithms for e-commerce. In Proceedings of the
2nd ACM Conference on Electronic Commerce, 158-167, ACM.
[13] Hernando, A., Bobadilla, J., Ortega, F., & Tejedor, J. (2013). Incorporating reliability measurements into the predictions of a
recommender system. Information Sciences, 218(2013), 1-16.
[14] Said, A., Jain, B. J., Narr, S., & Plumbaum, T. (2012). Users and noise: The magic barrier of recommender systems. In User Modeling,
Adaptation, and Personalization (pp. 237-248). Springer International Publishing.
[15] Ku, Y.-C., & Tai, Y.-M. (2013). What Happens When Recommendation System Meets Reputation System? The Impact of
Recommendation Information on Purchase Intention. In System Sciences (HICSS), 2013 46th Hawaii International Conference on,
1376-1383, IEEE.
