Bias in Recommendations

Slidedeck of my lecture at SIKS Course "Advances in Information Retrieval"

Read more here: https://graus.nu/blog/bias-in-recommendations-lecture-siks-course-on-advances-in-ir/

Published in: Data & Analytics

  1. 1. Bias in Recommendations @ SIKS Course "Advances in Information Retrieval" • David Graus ✉ david.graus@fdmediagroep.nl 🐦 @dvdgrs
  4. 4. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
      whoami
      • 🎓 Academia
        • BA Media Studies @ UvA (2008)
        • MSc Media Technology @ Universiteit Leiden (2012)
        • PhD Information Retrieval @ UvA (2017)
      • 🏢 Industry
        • Editor radio/online public broadcaster NTR (between BA & MSc)
        • Research Intern @ Microsoft Research, US
        • Data Scientist @ Company.info (FD Mediagroep)
        • Lead Data Scientist @ FD SMART Journalism / BNR SMART Radio
  8. 8. In what is to follow…
      • An introduction to FD Mediagroep
      • Personalization & RecSys at FD Mediagroep
      • Two flavors of bias in RecSys
        • Model/Algorithmic bias
        • Perceived bias in personalization
  9. 9. Part 1: Introduction
  15. 15. FD Mediagroup
      • The leading information provider in the financial economic domain in the Netherlands
  19. 19. AI @ FD Mediagroup
  24. 24. Team
      • Dung Bahadir Anca Philippe Maya David Feng Li’ao Klaus Oberon Manon Azamat
  28. 28. AI @ FDMG: Academia/Industry
  29. 29. SMART Radio
      • (Transcribe)
      • Segment
      • Tag
      • Serve
  30. 30. Transcribe
  32. 32. Segment
      • Based on metadata, text, and audio.
  33. 33. Tag
      • Simple multilabel text classifier
      • Trained on transcripts of segments + associated tags from website
  36. 36. Serve
      • iOS/Android app
  37. 37. Part 2: SMART Journalism
  47. 47. SMART Journalism
      • Moonshot; personalized summarization
      • How to get there:
        • Content Understanding
        • Content-based Recommender System; <user, article>
        • Personalized snippet retrieval; <user, snippet-in-article>
        • Snippet-to-summary abstractor (?)
  48. 48. Tech
  55. 55. [Diagram: User and Article feed into RecSys Matching; matching scores shown: 0.352, 0.795, 0.125, 0.643]
  59. 59. [Diagram: User with Reader Profile and Articles with Article Profile, both feeding into RecSys Matching]
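As a rough illustration of what "matching" a reader profile against article profiles can look like, here is a minimal sketch that scores articles by cosine similarity over tag counts. This is a deliberately simple stand-in and not the system described in the deck, which ranks with a learned model; the function names and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(weight * b.get(key, 0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(reader_profile, article_profiles):
    """Return (article_id, score) pairs, best match first."""
    scored = [
        (article_id, cosine(reader_profile, profile))
        for article_id, profile in article_profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: tag counts only.
reader = Counter({"EU": 2, "Chips": 1, "Google": 1})
articles = {
    "a1": Counter({"EU": 1, "Mededinging": 1}),
    "a2": Counter({"Voetbal": 1}),
    "a3": Counter({"Google": 1}),
}
ranking = rank_articles(reader, articles)
for article_id, score in ranking:
    print(article_id, round(score, 3))
```

Each candidate article gets one score per reader, and the list is sorted on that score, which mirrors the one-score-per-article picture on the slide.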
  65. 65. Article Representation
      Article: 'Meer regelgeving cryptogeld noodzakelijk' ("More regulation of cryptocurrency needed")
      Article Profile:
      • Tags: Blockchain, Cryptocurrency, Regelgeving
      • Section (Rubriek): Economie & Politiek
      • Stylometry: CharLen=2424, WordLen=486
      • Entities: -
  79. 79. User Profile
      Article read by the user: 'Qualcomm krijgt bijna €1 mrd boete van Brussel' ("Qualcomm gets nearly €1 bn fine from Brussels")
      • Tags: Boete, Chips, EU, Mededinging
      • Section (Rubriek): Ondernemen
      • Stylometry: CharLen=3491, WordLen=635
      • Entities: Qualcomm, Apple, NXP, Intel, Google
      Resulting User Profile:
      • Tags: Boete, Chips, EU, Mededinging
      • Section (Rubriek): Ondernemen
      • Stylometry: CharLen=3491, WordLen=635
      • Entities: Qualcomm, Apple, NXP, Intel, Google
  86. 86. User Profile (two articles)
      Article 1: 'Qualcomm krijgt bijna €1 mrd boete van Brussel' ("Qualcomm gets nearly €1 bn fine from Brussels")
      • Tags: Boete, Chips, EU, Mededinging
      • Section (Rubriek): Ondernemen
      • Stylometry: CharLen=3491, WordLen=635
      • Entities: Qualcomm, Apple, NXP, Intel, Google
      Article 2: 'Topman van softwaremaker Salesforce kraakt grote techbedrijven' ("CEO of software maker Salesforce slams big tech companies")
      • Tags: Big Data, Blog, Davos, Google, Technologie
      • Section (Rubriek): Davos
      • Stylometry: CharLen=2856, WordLen=524
      • Entities: Google, Apple, Microsoft, Salesforce
      Merged User Profile:
      • Tags: Boete, Chips, EU, Mededinging, Big Data, Blog, Davos, Google, Technologie
      • Section (Rubriek): Ondernemen, Davos
      • Stylometry: CharLen=3491, WordLen=635; CharLen=2856, WordLen=524
      • Entities: Qualcomm, Apple (2), NXP, Intel, Google (2), Microsoft, Salesforce
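The profile merge shown on this slide can be sketched as a simple aggregation over the profiles of the articles a user has read. The sketch below is an assumption about how such a merge might work, not the production code: `ArticleProfile`, `build_user_profile`, and the field names (`section` stands for "Rubriek") are hypothetical. Categorical fields are kept as counts, which reproduces the "Apple (2)" and "Google (2)" weights on the slide.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ArticleProfile:
    """Hypothetical container for the per-article fields shown on the slides."""
    title: str
    tags: list
    section: str  # "Rubriek" on the slides
    char_len: int
    word_len: int
    entities: list = field(default_factory=list)


def build_user_profile(read_articles):
    """Aggregate the profiles of all articles a user has read.

    Tags, sections, and entities become counts; stylometric values are
    collected as lists so they can be summarised later (mean, range, ...).
    """
    profile = {
        "tags": Counter(),
        "section": Counter(),
        "entities": Counter(),
        "char_len": [],
        "word_len": [],
    }
    for article in read_articles:
        profile["tags"].update(article.tags)
        profile["section"][article.section] += 1
        profile["entities"].update(article.entities)
        profile["char_len"].append(article.char_len)
        profile["word_len"].append(article.word_len)
    return profile


qualcomm = ArticleProfile(
    title="Qualcomm krijgt bijna €1 mrd boete van Brussel",
    tags=["Boete", "Chips", "EU", "Mededinging"],
    section="Ondernemen",
    char_len=3491,
    word_len=635,
    entities=["Qualcomm", "Apple", "NXP", "Intel", "Google"],
)
salesforce = ArticleProfile(
    title="Topman van softwaremaker Salesforce kraakt grote techbedrijven",
    tags=["Big Data", "Blog", "Davos", "Google", "Technologie"],
    section="Davos",
    char_len=2856,
    word_len=524,
    entities=["Google", "Apple", "Microsoft", "Salesforce"],
)

user_profile = build_user_profile([qualcomm, salesforce])
print(dict(user_profile["entities"]))
```

Keeping counts rather than plain sets lets frequently read entities and tags weigh more heavily in any later matching step.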
  87. 87. Model
      • Content-based RecSys
      • Ranking with pointwise learning to rank (LTR)
      • Features: user, article, user-article features (~14k)
      • Labels: implicit feedback
        • Clicks (i.e., click = 1, non-click = 0)
      • Trained nightly
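To make "pointwise LTR on click labels" concrete, here is a minimal sketch, assuming logistic regression trained with stochastic gradient descent as the pointwise model. The deck does not say which learner is used, so that choice, the three toy features, and all names below are assumptions for illustration only.

```python
import math
import random


def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_pointwise(examples, n_features, epochs=200, lr=0.5, seed=0):
    """Fit logistic regression on (feature_vector, click) pairs with SGD.

    Pointwise means every (user, article) pair is an independent example
    with label 1 for a click and 0 for a non-click.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    bias = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, clicked in data:
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = pred - clicked  # gradient of the log loss w.r.t. the score
            bias -= lr * error
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights, bias


def score(weights, bias, x):
    """Predicted click probability for one user-article feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical user-article features: [tag overlap, same section, entity overlap]
train = [
    ([0.8, 1.0, 0.6], 1),
    ([0.7, 0.0, 0.5], 1),
    ([0.1, 0.0, 0.0], 0),
    ([0.0, 1.0, 0.1], 0),
    ([0.9, 1.0, 0.9], 1),
    ([0.2, 0.0, 0.1], 0),
]
weights, bias = train_pointwise(train, n_features=3)

# Ranking: score every candidate for this user, then sort by score.
candidates = {"article_a": [0.9, 1.0, 0.7], "article_b": [0.1, 0.0, 0.1]}
ranked = sorted(candidates, key=lambda a: score(weights, bias, candidates[a]), reverse=True)
print(ranked)
```

Because the labels are raw clicks, any bias in how clicks were collected is learned by the model as well.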
  88. 88. Bias?
      • “Disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair.”
  89. 89. Bias in RecSys: “Algorithmic”
      I. In Collaborative Filtering methods
      II. In implicit feedback/clicks
  91. 91. Collaborative Filtering
  99. 99. Bias in CF
      • It is more difficult to predict ratings of infrequently rated items in Collaborative Filtering
      • Bias: disproportionate weight in favor of popular items
      • “It is generally not useful to recommend very popular items as they are generally already known by the user” [2]
      • “A market that suffers from popularity bias will lack opportunities to discover more obscure products and will be, by definition, dominated by a few large brands […]” [3]
      • Solution: cluster long-tail items
      [1] Park & Tuzhilin. The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
      [2] Meyer, F. Recommender systems in industrial contexts (2012)
      [3] Abdollahpouri et al. The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
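The idea of clustering long-tail items can be sketched as follows: items with few ratings borrow evidence from other tail items in the same cluster. This is a simplified illustration under stated assumptions, not the method of [1]; the threshold `min_ratings`, the given `cluster_of` mapping, and the use of a plain mean as the predictor are all choices made for the example.

```python
from statistics import mean


def split_head_tail(ratings, min_ratings):
    """Split item ids into head (at least min_ratings ratings) and tail."""
    head = {item for item, rs in ratings.items() if len(rs) >= min_ratings}
    tail = set(ratings) - head
    return head, tail


def predict_rating(item, ratings, cluster_of, head, tail):
    """Head items use their own mean rating; tail items use the pooled
    mean over all tail items in the same cluster."""
    if item in head:
        return mean(ratings[item])
    pooled = [
        r
        for other in tail
        if cluster_of[other] == cluster_of[item]
        for r in ratings[other]
    ]
    return mean(pooled)


# Hypothetical toy data: item -> list of observed ratings.
ratings = {
    "blockbuster": [4, 5, 4, 3, 5, 4],
    "indie_1": [5],
    "indie_2": [3, 4],
    "docu_1": [2],
}
# Hypothetical cluster assignment, e.g. the output of k-means on item features.
cluster_of = {
    "blockbuster": "action",
    "indie_1": "arthouse",
    "indie_2": "arthouse",
    "docu_1": "docu",
}

head, tail = split_head_tail(ratings, min_ratings=5)
print(sorted(tail))
print(predict_rating("indie_1", ratings, cluster_of, head, tail))
```

Here `indie_1` has a single rating of its own, so its prediction is based on the three ratings pooled across its cluster instead.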
  103. 103. Bias in implicit feedback
       • Popular items are overrepresented in implicit feedback
       • Position/“trust” bias (see Joachims et al., 2005)
         • Eye-tracking study + comparison w/ explicit feedback shows:
           • Clicks reflect relevance judgments
           • Results ranked highly receive more clicks
       Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
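Position bias can be made visible by computing click-through rate per rank, and one commonly used correction is inverse propensity weighting. The sketch below shows both. The correction is not taken from the slides, and the examination propensities are assumed to be known, which in practice they are not and would have to be estimated; the log format and all names are hypothetical.

```python
from collections import defaultdict


def ctr_by_position(log):
    """Click-through rate per rank from (item, position, clicked) impressions."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for _item, position, click in log:
        shown[position] += 1
        clicked[position] += click
    return {pos: clicked[pos] / shown[pos] for pos in shown}


def ips_click_estimates(log, propensity):
    """Inverse-propensity-weighted clicks per impression, per item.

    Each click is divided by the (assumed known) probability that a user
    examined its position, so clicks at low ranks count for more.
    """
    weighted = defaultdict(float)
    shown = defaultdict(int)
    for item, position, click in log:
        shown[item] += 1
        weighted[item] += click / propensity[position]
    return {item: weighted[item] / shown[item] for item in shown}


# Hypothetical log: item "a" is always shown at rank 1, item "b" at rank 3.
log = [
    ("a", 1, 1), ("a", 1, 1), ("a", 1, 0), ("a", 1, 1),
    ("b", 3, 1), ("b", 3, 0), ("b", 3, 0), ("b", 3, 0),
]
# Assumed examination probabilities per rank.
propensity = {1: 1.0, 3: 0.25}

raw = ctr_by_position(log)
corrected = ips_click_estimates(log, propensity)
print(raw)
print(corrected)
```

On the raw counts item "a" looks three times as attractive as "b"; after weighting by the assumed propensities, "b" comes out ahead, which shows how strongly the conclusion depends on where items were displayed.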
  105. 105. Perceived Bias from RecSys
       • A state of intellectual isolation that allegedly can result from personalized searches when a website algorithm selectively guesses what information a user would like to see based on information about the user.
       • As a result, users become separated from information that disagrees with their viewpoints.
  108. 108. Measuring personalization
       • On average, 11.7% of results show differences due to personalization on Google.
       • Varies widely by search query and by result ranking.
       • Measurable personalization was found only as a result of searching with a logged-in account and of the IP address of the searching user.
  118. 118. Method [Hannák et al., 2013]
       1. 👤
          1. Get 200 volunteers with Google accounts
          2. Have them issue the same set of queries
          3. Compare results
       2. 🤖
          1. Construct Google bot accounts
             • Vary aspects such as location, demographics, click behavior, browsing + search history, etc.
          2. Have them issue the same set of queries
          3. Compare results
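The "compare results" step needs a way to quantify how much two result lists for the same query differ. A minimal sketch, assuming set overlap (Jaccard index) and list edit distance as the comparison metrics: these are common choices for this kind of study, but treating them as the exact setup of the cited work is an assumption, and the URLs below are made up.

```python
def jaccard(results_a, results_b):
    """Overlap of two result lists, ignoring order (1.0 = same set of URLs)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(results_a, results_b):
    """Levenshtein distance between two ranked lists of URLs.

    Counts the insertions, deletions, and substitutions needed to turn one
    ranking into the other, so it is sensitive to reordering.
    """
    previous = list(range(len(results_b) + 1))
    for i, url_a in enumerate(results_a, start=1):
        current = [i]
        for j, url_b in enumerate(results_b, start=1):
            cost = 0 if url_a == url_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical result lists for the same query from two accounts.
control = ["u1", "u2", "u3", "u4"]
treatment = ["u1", "u3", "u2", "u5"]

print(jaccard(control, treatment))
print(edit_distance(control, treatment))
```

Comparing two identical control accounts in the same way gives a noise baseline; only differences beyond that baseline would be attributed to personalization.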
  121. 121. 👤 Findings [Hannák et al., 2013]
       • On average, 11.7% of results show differences due to personalization on Google.
       • Top ranks tend to be less personalized than bottom ranks.
  124. 124. 👤 Findings [Hannák et al., 2013]
       • ✅ Personalization based on location (e.g., company names)
       • ❌ The least personalized results tend to be for factual and health-related queries.
  132. 132. 🤖 Findings [Hannák et al., 2013]
       • ✅ Logged in vs. “cleared cookies” account
       • ✅ Geolocation
       • ❌ Gender
       • ❌ Age
       • ❌ Search history
       • ❌ Click history
       • ❌ Browsing history
  134. 134. Diversity to pop the filter bubble
  141. 141. Method [Nguyen et al., 2014]
       • Split MovieLens users into two groups:
         • “Followers”: users who rated movies they were recommended
         • “Ignorers”: users who rated movies they were not recommended
       • Compare between groups, over time:
         • Diversity of recommendations
         • Ratings of movies
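"Diversity of recommendations" has to be turned into a number before it can be tracked over time. One way to do that is the average pairwise distance between the items in a list. The sketch below uses Jaccard distance over tag sets as the item distance; that choice, the function names, and the toy data are assumptions for illustration and may differ from the representation and distance used in the cited study.

```python
from itertools import combinations


def jaccard_distance(tags_a, tags_b):
    """1 minus the Jaccard overlap of two tag sets (0.0 = identical tags)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return 1.0 - len(tags_a & tags_b) / len(union)


def intra_list_diversity(items, tags_of):
    """Mean pairwise distance between all items in one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(tags_of[a], tags_of[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical movies with hypothetical tags.
tags_of = {
    "m1": {"action", "scifi"},
    "m2": {"action", "scifi"},
    "m3": {"romance"},
}

early_list = ["m1", "m3"]  # recommendations at the start of the period
late_list = ["m1", "m2"]   # recommendations at the end of the period

print(intra_list_diversity(early_list, tags_of))
print(intra_list_diversity(late_list, tags_of))
```

Computing this value per user at several points in time, and averaging within the follower and ignorer groups, yields the kind of curves the comparison relies on.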
  148. Findings [Nguyen et al., 2014]
    1. Diversity
      • In both groups, diversity decreases over time.
      • The effect is lessened for users who consume recommended items (followers).
    2. Ratings
      • Slight decrease in average ratings for ignorers (3.74 to 3.55).
      • Stable average ratings for followers (~3.68).
  150. Diversity in RecSys 🤖 vs. humans 👤?
  154. Method [Möller et al., 2018]
    • 🤖 Generate article recommendations for news articles using different RecSys algorithms (collaborative filtering, CF, and content-based, CB).
    • 👤 Compare to hand-picked article recommendations.
    • Measure & compare “diversity” of recommended articles:
      • At content level
      • At tag level
      • At category level
      • At sentiment/subjectivity level
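To make the category-level comparison above concrete, here is a small sketch that scores a recommendation list by the Shannon entropy of its category labels. Entropy is an assumption picked for illustration, not necessarily the measure used by Möller et al., and the category lists are made-up toy data.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Entropy (in bits) of the label distribution; higher means more diverse."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical category labels of two recommendation lists.
algorithmic = ["politics", "politics", "sports", "economy"]
editors = ["politics", "sports", "economy", "culture"]

print(shannon_entropy(algorithmic))
print(shannon_entropy(editors))
```

The same function could be applied to tags or binned sentiment scores; content-level diversity would need a text similarity measure instead.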
  161. Findings [Möller et al., 2018]
    “Conventional recommendation algorithms at least preserve the topic/sentiment diversity of the article supply.”
  163. More diversity
  164. Aim [Yom-Tov et al., 2014]
    Increase exposure to varied political opinions, with a goal of improving civil discourse.
  165. Method [Yom-Tov et al., 2014]
    • Classify searchers into political leaning (using geo data).
  170. Method [Yom-Tov et al., 2014]
    • Infer political leaning of news sources from user behavior.
    • Identify polarized search queries (with strong political leanings, in both directions).
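One way to read "infer political leaning of news sources from user behavior" is to average the leaning of the users who click on each source. The sketch below does exactly that. The scoring rule, the +1/-1 encoding, and all names and data are assumptions for illustration, not the procedure from Yom-Tov et al.

```python
from collections import defaultdict


def source_leaning(clicks, user_leaning):
    """Mean leaning of the users who clicked each source, in [-1, 1].

    clicks: iterable of (user, source) pairs.
    user_leaning: user -> +1 ("red") or -1 ("blue"); users without a
    known leaning are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, source in clicks:
        if user not in user_leaning:
            continue
        totals[source] += user_leaning[user]
        counts[source] += 1
    return {s: totals[s] / counts[s] for s in counts}


# Toy click log (hypothetical).
user_leaning = {"a": 1, "b": 1, "c": -1}
clicks = [("a", "s1"), ("b", "s1"), ("c", "s2"), ("a", "s3"), ("c", "s3")]

leaning = source_leaning(clicks, user_leaning)
print(leaning)
```

Scores near +1 or -1 mark sources read almost exclusively by one side; the same averaging over the sources clicked per query would flag polarized queries.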
  173. Method [Yom-Tov et al., 2014]
    • Treatment group: Insert red results for blue users, and blue results for red users.
    • Control group: Do not adjust results.
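The treatment above can be sketched as a small re-ranking step. The insertion position, the choice to insert a single result, and the function and variable names are hypothetical; the slide does not specify them.

```python
def insert_opposing(results, candidates, user_side, leaning, position=1):
    """Return a copy of `results` with the first candidate from the opposite
    side inserted at `position`; unchanged if no such candidate exists.

    user_side: +1 for a "red" user, -1 for a "blue" user.
    leaning: document -> leaning score (positive = red, negative = blue).
    """
    for doc in candidates:
        if leaning.get(doc, 0) * user_side < 0 and doc not in results:
            return results[:position] + [doc] + results[position:]
    return list(results)


# Toy example (hypothetical documents and scores).
leaning = {"r1": 0.8, "r2": 0.6, "b1": -0.7}
results = ["r1", "r2"]

treatment = insert_opposing(results, ["r2", "b1"], user_side=1, leaning=leaning)
control = list(results)  # control group: results left untouched
print(treatment, control)
```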
  178. Method
    1. Short term: Compare clicks/behavior between control & treatment.
    2. Long term: Measure over two weeks, per user:
      1. Polarization: Difference of user’s leaning-score compared to average leaning across all sources.
      2. Engagement: Average number of queries + average read articles.
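The two long-term measures can be written down directly from their definitions above. In this sketch, taking the absolute value of the difference and averaging engagement per day are assumptions, as are the names and toy numbers.

```python
from statistics import mean


def polarization(read_sources, leaning):
    """Absolute gap between the mean leaning of the sources a user read
    and the mean leaning across all sources."""
    overall = mean(leaning.values())
    user_score = mean(leaning[s] for s in read_sources)
    return abs(user_score - overall)


def engagement(daily_log):
    """Average (queries, articles read) per day from (queries, articles) tuples."""
    return (mean(q for q, _ in daily_log), mean(a for _, a in daily_log))


# Toy data (hypothetical).
leaning = {"s1": 1.0, "s2": -1.0, "s3": 0.0}

before = polarization(["s1", "s1", "s3"], leaning)
after = polarization(["s1", "s3"], leaning)
print(before, after)
print(engagement([(4, 2), (6, 4)]))
```

A drop from `before` to `after` corresponds to a user's reading moving toward the centre.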
  181. Findings I [Yom-Tov et al., 2014]
    • Fewer clicks on inserted opposing sources.
    • But: “Results pages of the opposing viewpoint which had a similarity higher than the average tended to be clicked 38% more than those below the average.”
  188. Findings II [Yom-Tov et al., 2014]
    • Polarization:
      • Treatment: Average leaning ‘moves’ ~25% to centre
      • Control: Negligible difference (~1%)
    • Engagement:
      • Treatment: Number of queries: +9% / articles read: +4%
      • Control: Small reduction in both (~2.5%)
  189. Refs
    Algorithmic bias
      1. Park & Tuzhilin, The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
      2. Meyer, Recommender systems in industrial contexts (2012)
      3. Abdollahpouri et al., The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
      4. Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
    Perceived bias / filter bubbles
      5. Hannak et al., Measuring personalization of web search (WWW ’13)
      6. Nguyen et al., Exploring the filter bubble: the effect of using recommender systems on content diversity (WWW ’14)
      7. Möller et al., Do not blame it on the algorithm: An empirical assessment of multiple recommender systems and their impact on content diversity (Information, Communication & Society ’18)
      8. Yom-Tov et al., Promoting Civil Discourse Through Search Engine Diversity (Social Science Computer Review ’13)