Exploring Author Gender in Book Rating and Recommendation
M. D. Ekstrand, M. Tian, M. R. I. Kazi, H. Mehrpouyan, and D. Kluver
https://doi.org/10.1145/3240323.3240373
RecSys ’18, October 2–7, 2018, Vancouver, BC, Canada
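\[
\begin{aligned}
y_u &\sim \mathrm{Binomial}(n_u, \theta_u) \\
n_u &\sim \mathrm{NegBinomial}(\nu, \gamma) \\
\mathrm{logit}(\theta_u) &\sim \mathrm{Normal}(\mu, \sigma)
\end{aligned}
\]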
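The distributions listed above can be illustrated by simulating user profiles from them. The following is only a sketch under stated assumptions: the parameter values are hypothetical, the names `simulate_profiles`, `nb_n`, and `nb_p` are invented for illustration, and NumPy's negative binomial parameterization (number of successes, success probability) may differ from the one used in the paper.

```python
import numpy as np


def inv_logit(x):
    """Map log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_profiles(n_users, mu, sigma, nb_n, nb_p, seed=0):
    """Draw hypothetical user profiles from the hierarchical model.

    mu, sigma  : mean and s.d. of user tendencies on the log-odds scale
    nb_n, nb_p : NumPy-style negative binomial parameters (an assumption;
                 not necessarily the paper's parameterization)
    """
    rng = np.random.default_rng(seed)
    # Profile size n_u: number of known-gender books the user rated.
    n = rng.negative_binomial(nb_n, nb_p, size=n_users)
    # User tendency theta_u: normal on the log-odds scale.
    theta = inv_logit(rng.normal(mu, sigma, size=n_users))
    # y_u: number of rated books by female authors.
    y = rng.binomial(n, theta)
    return n, theta, y


# Hypothetical parameter values, chosen only for illustration.
n, theta, y = simulate_profiles(1000, mu=-0.4, sigma=1.0, nb_n=2.0, nb_p=0.1)
```

Comparing the simulated proportions `y / n` (where `n > 0`) with `theta` shows how small profiles yield coarse, noisy proportions even when the underlying tendencies vary smoothly.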
\[
\mathrm{logit}(\bar{\theta}_{ua}) \sim \mathrm{Normal}\left(b_a + s_a \, \mathrm{logit}(\theta_u),\ \sigma_a^2\right)
\]
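The regression above is linear on the log-odds scale: the recommender's tendency for a user is centered on an intercept plus a slope times the user's own log-odds tendency. A minimal sketch of that central value follows, reading `b_a` as the intercept and `s_a` as the slope (an interpretation of the diagram labels); the function names and example values are hypothetical.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def central_rec_proportion(theta_u, b_a, s_a):
    """Center of the recommender's tendency for a user with profile
    tendency theta_u, ignoring the Normal noise term."""
    return inv_logit(b_a + s_a * logit(theta_u))


# Hypothetical values: slope 0 ignores the user entirely, while
# intercept 0 with slope 1 reproduces the user's own tendency.
ignores_user = central_rec_proportion(0.8, b_a=-0.5, s_a=0.0)
mirrors_user = central_rec_proportion(0.8, b_a=0.0, s_a=1.0)
```

A slope near zero means the algorithm's output proportion is roughly the same for every user; a slope near one means the output tracks the user's profile.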
btain author information from
(VIAF)3, a directory of author
ity records from the Library of
und the world. Author gender
s for many records.
mployed by the VIAF is flexible
ender identities, supporting an
es for the validity of an identity.
se flexibility: all its assertions
This is a significant limitation
on 5.1.
book data with rating data by
ve data linking coverage, and
works instead of individual edi-
m a bipartite graph of ISBNs and
“edition” records, and OpenLi-
e) and consider each connected
ess than 1% of ratings) this caus-
or a book; we resolve multiple
ir ratings.
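The clustering step described above (a bipartite graph of ISBNs and records, with each connected component treated as one book) can be sketched with a union-find structure. This is an illustrative sketch, not the authors' pipeline; the function name, the edge format, and the example identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_isbns(edges):
    """Group ISBNs into clusters via connected components.

    edges: iterable of (isbn, record_id) pairs, one per link between an
    ISBN and a bibliographic record (hypothetical input format).
    Returns a sorted list of sorted ISBN lists, one per component.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag nodes by kind so an ISBN and a record id can never collide.
    for isbn, rec in edges:
        union(("isbn", isbn), ("rec", rec))

    clusters = defaultdict(set)
    for node in list(parent):
        kind, value = node
        if kind == "isbn":
            clusters[find(node)].add(value)
    return sorted(sorted(c) for c in clusters.values())


example = cluster_isbns([("111", "r1"), ("222", "r1"), ("222", "r2"), ("333", "r3")])
```

Here ISBNs `111` and `222` share record `r1`, so they fall into one cluster, while `333` stays on its own.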
VIAF do not share linking iden-
hority records by author name.
ontain multiple name entries,
izations of the author’s name.
arry multiple known forms of
ng names to improve matching
ng both “Last, First” and “First
e all VIAF records containing a
d names for the first author of
n a book’s cluster. If all records
hor’s gender agree, we take that
ontradicting gender statements,
as “ambiguous”.
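The resolution rule just described (agreeing gender statements are accepted, contradicting ones are marked ambiguous) can be written as a small function. This is a sketch under assumptions: the input format and the function name are hypothetical, and returning "unknown" when no record carries a gender statement is an assumption based on the "unknown" category in Figure 1.

```python
def resolve_gender(assertions):
    """Resolve an author's gender from the statements on matched records.

    assertions: iterable of strings, one per matched authority record;
    anything other than "female" or "male" is treated as no statement.
    """
    known = {g for g in assertions if g in ("female", "male")}
    if not known:
        return "unknown"          # assumption: no statement at all
    if len(known) == 1:
        return next(iter(known))  # all statements agree
    return "ambiguous"            # contradicting statements


agree = resolve_gender(["female", "female", None])
conflict = resolve_gender(["female", "male"])
```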
ure good coverage while main-
Table 2: Summary of rating data

                      BookCrossing      Amazon
Ratings                  1,149,780  22,507,155
Users                      105,283   8,026,324
Rated ISBNs/ASINs          340,554   2,330,066
Rated ‘Books’              295,935   2,286,656
Matched Books              240,255   1,083,066
Known-Gender Books         166,928     616,317
Female-Author Books         66,524     181,850
Male-Author Books          100,404     434,467
% Female Books               39.9%       29.5%
% Female Ratings             45.3%       36.2%
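As a quick arithmetic check, the "% Female Books" row of Table 2 follows from the female-author and male-author book counts, whose sum is the known-gender count. The helper name below is hypothetical; the numbers are taken from the table.

```python
def percent_female(female_books, male_books):
    """Share of known-gender books written by women, in percent."""
    return 100.0 * female_books / (female_books + male_books)


# Counts from Table 2.
bx = percent_female(66_524, 100_404)    # BookCrossing
az = percent_female(181_850, 434_467)   # Amazon
```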
[Figure 1 plot: panels LOC, AZ, BXA, BXE; x-axis: Linking Result (female, male, ambiguous, unknown, unlinked); y-axis: Coverage Percent; Scope: Books, Ratings]
Figure 1: Results of data linking and gender resolution. LOC
is the set of books with Library of Congress records; other
panes are the results of linking rating data.
dependent
TAN 2.17.3
each per-
We report
arameters
h existing
acterizing
nalyze the
Tables 1–
sample of
nders are
in our cat-
has a more
ookCross-
wn-gender
oportions
(est. sd log odds) 1.03 1.11 1.77
Posterior Mean 0.42 0.40 0.37
Std. Dev. 0.23 0.23 0.28
[Figure 4 plot: panels AZ, BXA, BXE; x-axis: Proportion of Female Authors (0 to 1); y-axis: Density; Method: Estimated θ, Observed y/n, Predicted y/n]
Figure 4: Distribution of user author-gender tendencies. Histogram
shows observed proportions; lines show kernel densities of
estimated tendencies (θ′) along with observed and
predicted proportions.
and Figure 4 shows the distribution of observed author gender
          Users  Dist. Items  % Dist.   Users  Dist. Items  % Dist.   Users  Dist. Items  % Dist.   Users  Dist. Items  % Dist.
Profile   1,000       35,187     66.5   1,000       24,913     73.6   1,000       27,525     88.2   1,000       27,525     88.2
UserUser  1,000        6,007     12.0     988        6,235     12.7   1,000       15,343     30.7     939       25,853     55.1
ItemItem  1,000       21,282     42.6     997       10,174     20.4     999       33,363     67.7     999       22,360     45.6
MF        1,000          140      0.3   1,000          264      0.5   1,000          164      0.3   1,000          651      1.3
PF        1,000        1,506      3.0   1,000        4,105      8.2   1,000        2,746      5.4   1,000        3,538      7.0
[Figure 5 plot: columns AZ (Explicit), AZ (Implicit), BXA, BXE; rows UserUser, ItemItem, MF, PF; x-axis: Proportion of Books by Female Authors (0 to 1); y-axis: Density; Mean: Algorithm, Popular, Profile; Method: Observed, Predicted]
Figure 5: Posterior densities of recommender biases from integrated regression model.
proportions. The ripples in predicted and observed proportions are
due to the prevalence of 5-item user profiles, for which there are
only 6 possible proportions; estimated tendency (θ) smooths them
out. This smoothing, along with avoiding extreme bias estimates
based on limited data, is why we find it useful to estimate tendency
instead of directly computing statistics on observed proportions.
To support direct comparison of the densities of observations and
predictions, we resampled observed proportions with replacement
to yield 10,000 observations.
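Two points in this paragraph are easy to make concrete: a 5-item profile admits only six observed proportions (0/5 through 5/5), and resampling with replacement brings the observed proportions to a fixed count of 10,000. The sketch below uses hypothetical function names and toy data; it is not the authors' analysis code.

```python
from fractions import Fraction

import numpy as np


def possible_proportions(n_items):
    """All proportions k / n_items a profile of n_items books can show."""
    return [Fraction(k, n_items) for k in range(n_items + 1)]


def resample_with_replacement(observed, size=10_000, seed=0):
    """Bootstrap-style resample of observed proportions to a fixed size."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(observed, dtype=float), size=size, replace=True)


five_item = possible_proportions(5)  # 0, 1/5, 2/5, 3/5, 4/5, 1
# Toy observed proportions, for illustration only.
resampled = resample_with_replacement([0.0, 0.2, 0.4, 0.4, 1.0])
```

Because so many profiles share the same few discrete proportions, a density of raw proportions shows ripples at those values, which is the effect the paragraph describes.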
We observe a population tendency to rate male authors more
frequently than female authors in all data sets (μ < 0), but to rate
female authors more frequently than they would be rated were
users drawing books uniformly at random from the available set.
The average user author-gender tendency is slightly closer to an
even balance than the set of rated books. We also found a large
diversity among users in their estimated tendencies (s.d. of
Table 6: Mean / SD of rec. list female author proportions.

          BXA            BXE            AZ (Implicit)  AZ (Explicit)
Popular   0.458          0.500          0.364          0.364
Rating    -              0.383          -              0.222
UserUser  0.399 / 0.180  0.435 / 0.190  0.315 / 0.186  0.367 / 0.278
ItemItem  0.465 / 0.200  0.348 / 0.124  0.351 / 0.245  0.389 / 0.336
MF        0.134 / 0.027  0.334 / 0.039  0.468 / 0.079  0.418 / 0.124
PF        0.372 / 0.208  0.429 / 0.177  0.374 / 0.144  0.394 / 0.177
basic coverage statistics of these algorithms along with corresponding
user profile statistics. Users for which an algorithm could not
produce recommendations are rare. We also computed the extent
to which algorithms recommend different items to different users;
“% Dist.” is the percentage of all recommendations that were distinct
items. Algorithms that repeatedly recommend the same items will
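The “% Dist.” statistic defined above can be computed directly from the recommendation lists. A minimal sketch, assuming each user's recommendations are available as a list of item identifiers (the function name and input format are hypothetical):

```python
def percent_distinct(rec_lists):
    """Percentage of all recommendations that are distinct items.

    rec_lists: iterable of per-user recommendation lists (item ids).
    """
    all_recs = [item for recs in rec_lists for item in recs]
    if not all_recs:
        return 0.0
    return 100.0 * len(set(all_recs)) / len(all_recs)


# Two users, four recommendations in total, three distinct items.
example_pct = percent_distinct([["a", "b"], ["a", "c"]])
```

A value near 100 means almost every recommendation slot holds a different item; a value near 0 means the same few items are recommended to nearly everyone.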
BXE
-0.139 0.162 0.906 -0.573 0.129 0.531 -0.652 0.002 0.161 -0.166 0.298 0.772
(-0.20,-0.08) (0.10,0.22) (0.87,0.95) (-0.61,-0.54) (0.09,0.16) (0.51,0.56) (-0.66,-0.64) (-0.01,0.01) (0.15,0.17) (-0.22,-0.11) (0.25,0.35) (0.74,0.81)
AZ (Implicit)
-0.127 0.688 0.715 0.094 0.863 0.895 -0.244 0.011 0.364 -0.224 0.287 0.537
(-0.19,-0.06) (0.65,0.73) (0.68,0.76) (0.02,0.17) (0.81,0.92) (0.84,0.95) (-0.27,-0.22) (-0.00,0.02) (0.35,0.38) (-0.26,-0.18) (0.26,0.31) (0.51,0.56)
AZ (Explicit)
-0.580 0.322 0.681 -0.380 0.438 0.852 -0.117 0.006 0.273 -0.403 0.141 0.525
(-0.63,-0.53) (0.29,0.35) (0.65,0.71) (-0.44,-0.32) (0.40,0.48) (0.81,0.89) (-0.14,-0.10) (-0.00,0.02) (0.26,0.29) (-0.44,-0.37) (0.12,0.16) (0.50,0.55)
[Figure 6 plot: columns AZ (Explicit), AZ (Implicit), BXA, BXE; rows UserUser, ItemItem, MF, PF; x-axis: Profile Proportion of Female Authors (0 to 1); y-axis: Recommender Proportion of Female Authors (0 to 1)]
Figure 6: Scatter plots and regression curves for recommender response to individual users.
more concentrated. In the BookCrossing data, it tends to favor male
authors more than the underlying data would support; in implicit
feedback mode, it is highly biased towards male authors with
respect even to the baseline distributions.
4.4 From Profiles to Recommendations
Our extended Bayesian model (Section 3.4.2) allows us to address
RQ4: the extent to which our algorithms propagate individual users’
tendencies into their recommendations.
Figure 5 shows the posterior predictive and observed densities
of recommender author-gender tendencies, and Figure 6 shows
scatter plots of observed recommendation proportions against user
profile proportions with regression curves (regression lines in log-
place. Visual inspection of the scatter plot suggests that there is a
strong component with consistent tendencies, but the regression
may accurately model the remaining users. Future work will use a
model that can better account for some global consistency.
4.5 Summary
RQ1: Baseline Gender Distribution. Known books are significantly
more likely to be written by men than by women;
representation among rated books is more balanced.
RQ2: User Input Gender Distributions. Users are diffuse in
their rating tendencies, with an overall trend favoring male
authors but less strongly than the baseline distribution.
RQ3: Recommender Output Distributions. Different CF