Bias in Recommendations

@ SIKS Course "Advances in Information Retrieval"
David Graus
✉ david.graus@fdmediagroep.nl
🐦 @dvdgrs

David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019

whoami
• 🎓 Academia
• BA Media Studies @ UvA (2008)
• MSc Media Technology @ Universiteit Leiden (2012)
• PhD Information Retrieval @ UvA (2017)
• 🏢 Industry
• Editor radio/online public broadcaster NTR (between BA & MSc)
• Research Intern @ Microsoft Research, US
• Data Scientist @ Company.info (FD Mediagroep)
• Lead Data Scientist @ FD SMART Journalism / BNR SMART Radio

In what is to follow…
• An introduction to FD Mediagroep
• Personalization & RecSys at FD Mediagroep
• Two flavors of bias in RecSys
• Model/Algorithmic bias
• Perceived bias in personalization


FD Mediagroup
The leading information provider in the financial economic domain in the Netherlands

AI @ FD Mediagroup


Team
Dung Bahadir Anca Philippe
Maya David Feng Li’ao
Klaus Oberon Manon Azamat

AI @ FDMG: Academia/Industry

SMART Radio
• (Transcribe)
• Segment
• Tag
• Serve

Transcribe

Segment
• Based on metadata, text, and audio.

Tag
• Simple multilabel text classifier
• Trained on transcripts of segments + associated tags from website

Serve
• iOS/Android app

SMART Journalism
• Moonshot: personalized summarization
• How to get there:
• Content Understanding
• Content-based Recommender System; <user, article>
• Personalized snippet retrieval; <user, snippet-in-article>
• Snippet-to-summary abstractor (?)

[Diagram: RecSys matching between a user and candidate articles, producing one score per <user, article> pair: 0.352, 0.795, 0.125, 0.643]

[Diagram: the user is represented by a reader profile and each article by an article profile; RecSys matching operates on these two profiles]

Article Representation
Article: 'Meer regelgeving cryptogeld noodzakelijk' ('More cryptocurrency regulation necessary')
Article profile:
• Tags: Blockchain, Cryptocurrency, Regelgeving
• Section: Economie & Politiek
• Stylometry: CharLen=2424, WordLen=486
• Entities: -

User Profile
The user reads: 'Qualcomm krijgt bijna €1 mrd boete van Brussel' ('Qualcomm gets nearly €1bn fine from Brussels')
• Tags: Boete, Chips, EU, Mededinging
• Section: Ondernemen
• Entities: Qualcomm, Apple, NXP, Intel, Google

The user then reads: 'Topman van softwaremaker Salesforce kraakt grote techbedrijven' ('CEO of software maker Salesforce slams big tech companies')
• Tags: Big Data, Blog, Davos, Google, Technologie
• Section: Davos
• Entities: Google, Apple, Microsoft, Salesforce

Resulting user profile (aggregated over both articles):
• Tags: Boete, Chips, EU, Mededinging, Big Data, Blog, Davos, Google, Technologie
• Section: Ondernemen, Davos
• Stylometry: CharLen=3491, WordLen=635, CharLen=2856, WordLen=524
• Entities: Qualcomm, Apple (2), NXP, Intel, Google (2), Microsoft, Salesforce
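
The aggregated profile above can be read as a running count over the attributes of the articles a user has read. A minimal sketch of that idea, assuming each article profile is a dict of attribute lists; the key names (`tags`, `section`, `entities`) and the function `build_user_profile` are illustrative and not the production schema:

```python
from collections import Counter

# Two read articles, using the attribute values from the example above.
# The dict layout and key names are assumptions made for illustration.
read_articles = [
    {
        "tags": ["Boete", "Chips", "EU", "Mededinging"],
        "section": ["Ondernemen"],
        "entities": ["Qualcomm", "Apple", "NXP", "Intel", "Google"],
    },
    {
        "tags": ["Big Data", "Blog", "Davos", "Google", "Technologie"],
        "section": ["Davos"],
        "entities": ["Google", "Apple", "Microsoft", "Salesforce"],
    },
]


def build_user_profile(articles):
    """Aggregate article profiles into one Counter per attribute."""
    profile = {}
    for article in articles:
        for field, values in article.items():
            profile.setdefault(field, Counter()).update(values)
    return profile


user_profile = build_user_profile(read_articles)
# Entities seen in both articles end up with a count of 2.
print(user_profile["entities"])
```

Keeping counts rather than plain sets preserves how often an attribute recurs in the reading history, which is what the "(2)" annotations on the slide express.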

Model
• Content-based RecSys
• Ranking with point-wise learning to rank (LTR)
• Features: user, article, user-article features (~14k)
• Labels: implicit feedback
• Clicks (i.e., click = 1, non-click = 0)
• Trained nightly
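
In point-wise LTR each <user, article> pair is scored independently, so training on clicks reduces to binary classification. The sketch below illustrates this with a tiny logistic regression trained by stochastic gradient descent. The learner, the two toy features, and all numbers are assumptions for illustration; the slides do not state which model is used, and the real feature set is far larger (~14k).

```python
import math

# Toy training data: one row per <user, article> impression.
# Features (hypothetical): [tag overlap, section match]; label: click = 1, non-click = 0.
impressions = [
    ([0.8, 1.0], 1),
    ([0.6, 1.0], 1),
    ([0.7, 0.0], 1),
    ([0.1, 0.0], 0),
    ([0.2, 1.0], 0),
    ([0.0, 0.0], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Predicted click probability for one <user, article> pair."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(data, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


weights, bias = train(impressions)

# Rank candidate articles for one user by predicted click probability.
candidates = {"article_a": [0.9, 1.0], "article_b": [0.1, 0.0], "article_c": [0.5, 1.0]}
ranking = sorted(candidates, key=lambda a: predict(weights, bias, candidates[a]), reverse=True)
print(ranking)
```

Because the labels are clicks, any bias in what users click on (popularity, position) is learned by the model as well.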

Bias?
• “Disproportionate weight in favor of or against an idea or thing,
usually in a way that is closed-minded, prejudicial, or unfair.”

Bias in RecSys
“Algorithmic”
I. In Collaborative Filtering methods
II. In implicit feedback/clicks

Collaborative Filtering

Bias in CF
• It is more difficult to predict ratings of infrequently rated items in Collaborative Filtering [1]
• Bias: disproportionate weight in favor of popular items
• “It is generally not useful to recommend very popular items as they are generally already known by the user” [2]
• “A market that suffers from popularity bias will lack opportunities to discover more obscure products and will be, by definition, dominated by a few large brands […]” [3]
• Solution: cluster long-tail items
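
One simple way to make the skew toward popular items visible is to measure what share of all interactions goes to the most popular items. The sketch below is an illustration under assumed data; the interaction log, the function name `head_share`, and the 20% cut-off are hypothetical and not taken from the cited papers.

```python
from collections import Counter

# Hypothetical interaction log: one item id per rating/click event.
interactions = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 5 + ["e"] * 3 + ["f"] * 2


def head_share(events, head_fraction=0.2):
    """Share of all interactions received by the most popular `head_fraction` of items."""
    counts = Counter(events)
    ranked = [n for _, n in counts.most_common()]
    head_size = max(1, round(head_fraction * len(ranked)))
    return sum(ranked[:head_size]) / sum(ranked)


print(f"Top 20% of items receive {head_share(interactions):.0%} of interactions")
```

The remaining items form the long tail: each has few interactions, which is why their ratings are harder to predict.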

Bias in implicit feedback
• Popular items are overrepresented in implicit feedback
• Position/“trust” bias (see Joachims et al., 2005 [4])
• Eye-tracking study + comparison with explicit feedback shows:
• Clicks reflect relevance judgments
• Results ranked highly receive more clicks
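
The effect of position on clicks can be illustrated with a simple position-based click model, in which a result is clicked only if it is examined and judged relevant. This is a sketch under stated assumptions: the 1/rank examination probability and the relevance values are made up for illustration and do not come from the eye-tracking study.

```python
def examination_prob(rank):
    """Assumed probability that a user looks at the result at `rank` (1-based)."""
    return 1.0 / rank


def expected_ctr(relevance, rank):
    """Expected click-through rate: the result must be examined and judged relevant."""
    return examination_prob(rank) * relevance


# (item, true relevance, displayed rank); illustrative numbers only.
shown = [("mediocre_article", 0.4, 1), ("great_article", 0.9, 3)]
ctr = {item: expected_ctr(relevance, rank) for item, relevance, rank in shown}

for item, value in ctr.items():
    print(f"{item}: expected CTR {value:.2f}")
```

Under these assumptions the less relevant article collects more clicks purely because it is ranked first, so a model trained on raw clicks would learn the old ranking along with relevance.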

Perceived Bias from RecSys
• Filter bubble: a state of intellectual isolation that allegedly can result from personalized searches when a website algorithm selectively guesses what information a user would like to see based on information about the user.
• As a result, users become separated from information that disagrees with their viewpoints.

Measuring personalization
• On average, 11.7% of results show differences due to personalization on Google.
• Varies widely by search query and by result ranking.
• Measurable personalization was found only as a result of searching with a logged-in account and of the IP address of the searching user.

Method [Hannák et al., 2013]
1. 👤 Volunteers
• Get 200 volunteers with Google accounts
• Have them issue the same set of queries
• Compare results
2. 🤖 Bot accounts
• Construct Google bot accounts
• Vary aspects such as location, demographics, click behavior, browsing + search history, etc.
• Compare results
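
The "compare results" step needs a way to quantify how much two result lists for the same query differ. The slides do not say which measures were used; the sketch below assumes two simple stand-ins, set overlap (Jaccard index) and the fraction of ranks showing a different result. The URLs are placeholders.

```python
def jaccard(results_a, results_b):
    """Overlap between two result lists, ignoring order (1.0 = same set of URLs)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fraction_different(results_a, results_b):
    """Fraction of rank positions at which the two lists show a different result."""
    pairs = list(zip(results_a, results_b))
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


# Placeholder result lists for the same query issued from two accounts.
account_1 = ["url_1", "url_2", "url_3", "url_4"]
account_2 = ["url_1", "url_2", "url_3", "url_5"]

print(jaccard(account_1, account_2))
print(fraction_different(account_1, account_2))
```

Comparing two identical control accounts in the same way gives a noise baseline, so that only differences above that baseline are attributed to personalization.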

👤 Findings
• Top ranks tend to be less personalized than bottom ranks.
• ✅ Personalization based on location (e.g., company names)
• ❌ The least personalized results tend to be for factual and health-related queries.

🤖 Findings
✅ Logged in vs. “cleared cookies” account
✅ Geolocation
❌ Gender
❌ Age
❌ Search history
❌ Click history
❌ Browsing history

Diversity to pop the filter bubble

Method [Nguyen et al., 2014]
• Split MovieLens users into two groups:
• “Followers”: users who rated movies they were recommended
• “Ignorers”: users who rated movies they were not recommended
• Compare between groups, over time:
• Diversity of recommendations
• Ratings of movies
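
Diversity of a recommendation list is often expressed as the average pairwise distance between its items. The sketch below assumes each movie is described by a set of tags and uses Jaccard distance between tag sets; both the representation and the distance are assumptions for illustration, not necessarily what the study used.

```python
from itertools import combinations


def jaccard_distance(tags_a, tags_b):
    """1 minus the overlap of two tag sets (0.0 = identical, 1.0 = disjoint)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return 1.0 - len(tags_a & tags_b) / len(union)


def intra_list_diversity(items):
    """Average pairwise distance between the items in one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical recommendation lists, one tag set per movie.
narrow = [{"action", "sci-fi"}, {"action", "sci-fi"}, {"action", "thriller"}]
broad = [{"action", "sci-fi"}, {"romance", "comedy"}, {"documentary"}]

print(intra_list_diversity(narrow), intra_list_diversity(broad))
```

Tracking this value per user over successive time windows gives the kind of over-time comparison between followers and ignorers described above.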

Findings
1. Diversity
• In both groups, diversity decreases over time.
• The effect is lessened for users who consume recommended items (followers)
2. Ratings
• Slight decrease in average ratings for ignorers (3.74 to 3.55).
• Stable average ratings for followers (~3.68).

Diversity in RecSys 🤖 vs. humans 👤?

Method [Möller et al., 2018]
• 🤖 Generate article recommendations for news articles using different RecSys algorithms (collaborative filtering & content-based).
• 👤 Compare to hand-picked article recommendations.
• Measure & compare “diversity” of recommended articles:
• At content level
• At tag level
• At category level
• At sentiment/subjectivity level
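
For a categorical attribute such as the article category, one possible diversity measure is the Shannon entropy of the category distribution within a recommendation list. This is an assumed measure for illustration, not necessarily the one used in the study, and the category names and lists are hypothetical.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (in bits) of the category distribution of a recommendation list."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical recommendation lists, one category label per recommended article.
algorithmic = ["economy", "economy", "politics", "tech"]
editorial = ["economy", "politics", "tech", "sports"]

print(category_entropy(algorithmic), category_entropy(editorial))
```

Higher entropy means the recommendations are spread more evenly over categories; computing the same value for the full article supply gives a reference point for both the algorithmic and the hand-picked lists.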

Findings
“Conventional recommendation algorithms at least preserve the topic/sentiment diversity of the article supply.”

More diversity

Aim [Yom-Tov et al., 2014]
Increase exposure to varied political opinions with a goal of improving civil discourse

Method
• Classify searchers into political leaning (using geo data)
• Infer political leaning of news sources from user behavior.
• Identify polarized search queries (with strong political leanings, in both directions).
• Treatment group: Insert red results for blue users, and blue results for red users
• Control group: Do not adjust results

Method
1. Short term: Compare clicks/behavior between control & treatment.
2. Long term: Measure over two weeks, per user:
• Polarization: Difference of user’s leaning-score compared to average leaning across all sources.
• Engagement: Average number of queries + average read articles.
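
The polarization measure above can be sketched as follows, assuming each news source has a leaning score on a scale from -1 (blue) to +1 (red) and a user's leaning is the mean leaning of the sources they clicked. The scale, the source names, and the scores are hypothetical; only the "difference from the average leaning across all sources" comes from the slide.

```python
from statistics import mean

# Hypothetical leaning scores per news source, from -1 (blue) to +1 (red).
source_leaning = {"source_a": -0.8, "source_b": -0.2, "source_c": 0.4, "source_d": 0.6}


def user_leaning(clicked_sources):
    """Mean leaning of the sources a user clicked on."""
    return mean(source_leaning[s] for s in clicked_sources)


def polarization(clicked_sources):
    """Absolute difference between the user's leaning and the average source leaning."""
    return abs(user_leaning(clicked_sources) - mean(source_leaning.values()))


before = ["source_a", "source_a", "source_b"]  # clicks on same-side sources only
after = ["source_a", "source_b", "source_c"]   # clicks include an opposing source

print(polarization(before), polarization(after))
```

A user whose clicks start to include opposing sources moves toward the average leaning, which shows up as a lower polarization value.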

Findings I
• Fewer clicks on inserted opposing sources.
• But: “Results pages of the opposing viewpoint which had a similarity higher than the average tended to be clicked 38% more than those below the average.”

Findings II
• Polarization:
• Treatment: Average leaning ‘moves’ ~25% to centre
• Control: Negligible difference (~1%)
• Engagement:
• Treatment: Number of queries: +9% / articles read: +4%
• Control: Small reduction in both (~2.5%)

Refs
Algorithmic bias
1. Park & Tuzhilin, The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
2. Meyer, Recommender systems in industrial contexts (2012)
3. Abdollahpouri et al., The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
4. Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
Perceived bias / filter bubbles
5. Hannák et al., Measuring personalization of web search (WWW ’13)
6. Nguyen et al., Exploring the filter bubble: the effect of using recommender systems on content diversity (WWW ’14)
7. Möller et al., Do not blame it on the algorithm: An empirical assessment of multiple recommender systems and their impact on content diversity (Information Communication and Society ’18)
8. Yom-Tov et al., Promoting Civil Discourse Through Search Engine Diversity (Social Science Computer Review ’13)
