Assessing Viewpoint Diversity in Search Results Using Ranking Fairness Metrics

  1. WIS Web Information Systems
     Assessing Viewpoint Diversity in Search Results Using Ranking Fairness Metrics
     Tim Draws (1), Nava Tintarev (1), Ujwal Gadiraju (1), Alessandro Bozzon (1), and Benjamin Timmermans (2)
     (1) TU Delft, The Netherlands; (2) IBM, The Netherlands
     t.a.draws@tudelft.nl
  2. Biases in web search
     • Position bias [2-4]
     • “Search Engine Manipulation Effect” [1, 5]
     How can we quantify (a lack of) viewpoint diversity in search results?
     [Illustration: search results labelled “Yes!” (five) and “No!” (two)]
  3. Ranking fairness metrics
     Statistical parity in fair ranking: the protected attribute should not influence the ranking [6]
     [Formula not preserved; its annotated parts: evaluate statistical parity for top i, discount by rank, normalize]
  4. Our paper
     RQ: Can ranking fairness metrics be used to assess viewpoint diversity in search results?
     Contributions:
     1. Evaluation of existing metrics
     2. Novel metric
  5. Representing viewpoints: binomial viewpoint fairness
     Should we all be vegan?
     Strongly opposing | Opposing | Somewhat opposing | Neutral | Somewhat supporting | Supporting | Strongly supporting
     [Diagram: the viewpoints on this scale are divided into a protected and a non-protected group]
  6. Representing viewpoints: multinomial viewpoint fairness
     Should we all be vegan?
     Strongly opposing | Opposing | Somewhat opposing | Neutral | Somewhat supporting | Supporting | Strongly supporting
     [Diagram: viewpoints on this scale are marked as protected]
  7. Metrics we consider
     Binomial viewpoint fairness
     – Normalized Discounted Difference (nDD) [6]
     – Normalized Discounted Ratio (nDR) [6]
     – Normalized Discounted Kullback-Leibler Divergence (nDKL) [6]
     Multinomial viewpoint fairness
     – Normalized Discounted Jensen-Shannon Divergence (nDJS)
  8. Simulation studies
     How do the metrics behave for different levels of viewpoint diversity?
     • Three synthetic data sets S1, S2, S3
     • Per set, created rankings to simulate different levels of viewpoint diversity
  9. Weighted sampling procedure
     [Diagram: items are sampled from S1 into a ranking]
     Rank  Viewpoint
     1     Strongly opposing
     2     Strongly opposing
     3     Opposing
     4     Somewhat opposing
     5     Supporting
     6     Strongly opposing
     …     …
     Per set: created rankings with different levels of ranking bias
     • Binomial viewpoint fairness: all opposing viewpoints get w1, all others w2
     • Multinomial viewpoint fairness: a random viewpoint gets w1, all others w2
  10. Results: binomial viewpoint fairness
      [Figure: three panels (nDD, nDR, nDKL); x-axis: ranking bias (−1.0 to 1.0); y-axis: mean metric value (0.0 to 1.0); one line per distribution S1, S2, S3]
      • All metrics assess binomial viewpoint fairness (as expected)
      • All metrics are asymmetric (proportion of protected items and “direction” of bias matter)
      • Which metric to use depends on strength of ranking bias
  11. Results: multinomial viewpoint fairness
      [Figure: x-axis: ranking bias (−1.0 to 1.0); y-axis: mean nDJS value (ticks at 0.0, 0.1, 0.2); one line per distribution S1, S2, S3]
      • nDJS assesses multinomial viewpoint fairness
      • nDJS is also asymmetric (proportion of protected items and “direction” of bias matter)
      • Careful interpretation: values not directly comparable to other metrics
  12. Discussion
      • Metrics work for assessing viewpoint diversity
      • Considerations:
        – What is the underlying aim?
        – How balanced is the data overall?
        – How strong is the ranking bias?
        – What is the direction of ranking bias?
  13. Take home and future work
      • Ranking fairness metrics can be used for assessing viewpoint diversity in search results
        – (when interpreted correctly)
      • Future work can use these metrics to…
        – …assess viewpoint diversity in real search results
        – …align different metric and behavioral outcomes
  14. References
      [1] R. Epstein and R. E. Robertson. The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences of the United States of America, 112(33):E4512–E4521, 2015.
      [2] A. Ghose, P. G. Ipeirotis, and B. Li. Examining the impact of ranking on consumer behavior and search engine revenue. Management Science, 60(7):1632–1654, 2014.
      [3] L. A. Granka, T. Joachims, and G. Gay. Eye-tracking analysis of user behavior in WWW search. Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 478–479, 2004.
      [4] B. Pan, H. Hembrooke, T. Joachims, L. Lorigo, G. Gay, and L. Granka. In Google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3):801–823, 2007.
      [5] F. A. Pogacar, A. Ghenai, M. D. Smucker, and C. L. Clarke. The positive and negative influence of search results on people’s decisions about the efficacy of medical treatments. ICTIR 2017 - Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval, pages 209–216, 2017.
      [6] K. Yang and J. Stoyanovich. Measuring fairness in ranked outputs. Proceedings of the 29th International Conference on Scientific and Statistical Database Management, pages 1–6, 2017.
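
The three steps named on the "Ranking fairness metrics" slide (evaluate statistical parity for the top i, discount by rank, normalize) can be illustrated with a minimal Python sketch of nDD. This is not the authors' implementation. It assumes the form usually given for the normalized discounted difference in [6]: at each cut-off i, F is the absolute difference between the protected share in the top i and the protected share overall, each term is divided by log2(i), and the sum is divided by a normalizer Z. The cut-off step of 10, all function names, and the approximation of Z by the larger of the two fully segregated orderings are assumptions made for illustration.

```python
import math
from typing import Callable, Sequence

# A ranking is a sequence of 0/1 flags: 1 = protected item, 0 = non-protected.
Ranking = Sequence[int]


def dd_term(ranking: Ranking, i: int) -> float:
    """F for nDD: |protected share in the top i - protected share overall|."""
    return abs(sum(ranking[:i]) / i - sum(ranking) / len(ranking))


def discounted_sum(ranking: Ranking,
                   f: Callable[[Ranking, int], float],
                   step: int = 10) -> float:
    """Sum F over the cut-offs i = step, 2*step, ..., discounted by log2(i)."""
    # step must be at least 2, because log2(1) = 0 would divide by zero.
    return sum(f(ranking, i) / math.log2(i)
               for i in range(step, len(ranking) + 1, step))


def ndd(ranking: Ranking, step: int = 10) -> float:
    """Normalized discounted difference: 0 = fair, 1 = maximally unfair."""
    n, n_prot = len(ranking), sum(ranking)
    # Assumption: Z is approximated by the worse of the two extreme orderings
    # (all protected items first, or all protected items last).
    protected_first = [1] * n_prot + [0] * (n - n_prot)
    protected_last = protected_first[::-1]
    z = max(discounted_sum(protected_first, dd_term, step),
            discounted_sum(protected_last, dd_term, step))
    if z == 0:
        return 0.0  # no protected or no non-protected items: nothing to compare
    return discounted_sum(ranking, dd_term, step) / z


alternating = [1, 0] * 50          # protected share is 0.5 at every cut-off
segregated = [1] * 50 + [0] * 50   # all protected items ranked first
print(ndd(alternating), ndd(segregated))
```

Swapping `dd_term` for a ratio, Kullback-Leibler, or Jensen-Shannon term would give sketches of the other metrics listed on the "Metrics we consider" slide; their normalizers would need the same care.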

Editor's Notes

  • Introduce myself
    Second year PhD
  • Search results on disputed topic: various viewpoints within topic
    diversity across ranking
    Position bias: trust and interact with higher results more
    Also voting preferences, judgment on medical treatment
    Important to maintain viewpoint diversity
    So far unclear how to assess viewpoint diversity in search results → this paper
  • Protected non-protected attribute
    Example: gender bias in job candidate list
    Mostly: statistical parity
    Explain formula: F is function to evaluate statistical parity
    Low value (0) is fair, high value (1) is unfair
    How to use this for viewpoint diversity?
  • Assess viewpoint div. using specific class of metrics: ranking fairness
    Simulation study on existing metrics
    Novel metric, also simulation study
  • assumption: 7 classes
    Also assume that ranking assessor has specific aim as to what they are concerned about
    We consider two different aims
  • Quickly repeat formula, metrics differ in F
    F evaluates statistical parity by comparing to ideal ranking
    Briefly describe each metric
    nDJS because others are not applicable to multinomial (details in paper)
    These metrics QUANTIFY (no “fairness criterion”)
  • Goal: see how metrics behave in different settings of viewpoint diversity
    Three data sets consisting of viewpoint labels
    Created rankings with different levels of ranking bias from each set
    Here: the more bias, the less viewpoint diversity
    Done by weighted sampling
  • Draw from data set without replacement
    Sampling is weighted
    Two weights (which vary) to advantage / disadvantage
    Summary: two simulation studies, each with three sets, per set 21 settings of ranking bias, 1000 times per setting
  • Explain ranking bias + mean metric outcome
    All metrics seem to work
    nDR is not normalized properly
    Whether to use nDD or nDKL depends on strength of ranking bias
    take home: use nDD / nDKL; proportion of protected + direction of bias is important to know
  • Works
    Doesn’t go to 1 (don’t compare)
    Take home: same lessons as before
  • Considerations need to be taken to decide which metric to use and how sensitive the metric is
    Considerations:
    Binomial or multinomial?
    The more balanced, the better the sensitivity
    If strong and binomial, use nDKL, otherwise nDD
    If protected group is advantaged, the same ranking bias produces a different outcome
    It would be good to have a simulator for interpreting metrics (I am working on that)
    In general, nDD, nDKL, or nDJS
  • Correct interpretation: awareness of data skew and bias direction
    Future work: assessment + align metric outcomes with SEME
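
The weighted sampling described in these notes (draw from the data set without replacement, with two weights that advantage or disadvantage items) can be sketched as follows. This is an illustrative sketch under assumptions, not the code used for the simulations: the function name `biased_ranking`, its parameters, and the example labels are hypothetical. The slides do not state how a ranking-bias setting maps to the weights w1 and w2, so the weights are passed in directly and must both be positive.

```python
import random
from typing import Collection, List, Optional, Sequence


def biased_ranking(items: Sequence[str],
                   favoured: Collection[str],
                   w1: float,
                   w2: float,
                   seed: Optional[int] = None) -> List[str]:
    """Build a ranking by weighted sampling without replacement.

    Items whose viewpoint label is in `favoured` are drawn with weight w1,
    all other items with weight w2. If w1 > w2, favoured items tend to land
    near the top of the ranking; if w1 < w2, near the bottom.
    """
    rng = random.Random(seed)
    pool = list(items)
    ranking: List[str] = []
    while pool:
        # Recompute the weights for the items still in the pool.
        weights = [w1 if label in favoured else w2 for label in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        # Removing the drawn item is what makes this "without replacement".
        ranking.append(pool.pop(idx))
    return ranking


# Hypothetical data set of viewpoint labels (not one of S1, S2, S3).
labels = (["strongly opposing"] * 10 + ["opposing"] * 10
          + ["neutral"] * 10 + ["supporting"] * 10)
opposing = {"strongly opposing", "opposing"}

ranking = biased_ranking(labels, opposing, w1=5.0, w2=1.0, seed=0)
print(ranking[:6])
```

Repeating the draw many times per weight setting and averaging a metric over the resulting rankings would mirror the design summarized above (21 settings of ranking bias, 1000 rankings per setting).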