Assessing Viewpoint Diversity in Search Results Using Ranking Fairness Metrics

  1. WIS Web Information Systems. Assessing Viewpoint Diversity in Search Results Using Ranking Fairness Metrics. Tim Draws¹, Nava Tintarev¹, Ujwal Gadiraju¹, Alessandro Bozzon¹, and Benjamin Timmermans² (¹TU Delft, The Netherlands; ²IBM, The Netherlands). t.a.draws@tudelft.nl, https://timdraws.net
  2. Biases in web search: the “Search Engine Manipulation Effect” [1, 2]. [Illustration: a result list reading “Yes!” five times, followed by “No!” twice.] How can we measure viewpoint diversity in search results?
  3. Ranking fairness metrics. Example ranking by candidate gender:
      Rank | Candidate gender
      1    | m
      2    | f
      3    | m
      4    | m
      5    | m
      …    | …
  4. Our paper
      RQ: Can ranking fairness metrics be used to assess viewpoint diversity in search results?
      What we did:
      • Defined two notions of viewpoint diversity
      • Conducted two simulation studies to (1) evaluate existing metrics and (2) evaluate a novel metric that we propose
  5. Representing viewpoints. Example topic: Should we all be vegan? Seven-point scale: -3 strongly opposing, -2 opposing, -1 somewhat opposing, 0 neutral, +1 somewhat supporting, +2 supporting, +3 strongly supporting.
  6. Representing viewpoints: binomial viewpoint fairness. Example topic: Should we all be vegan? The seven-point scale (-3 strongly opposing to +3 strongly supporting) is split into a protected and a non-protected group.
  7. Representing viewpoints: multinomial viewpoint fairness. Example topic: Should we all be vegan? The same seven-point scale (-3 strongly opposing to +3 strongly supporting); figure label: protected.
  8. Simulation studies
      • Three synthetic data sets: S1, S2, S3
      • For each set, created rankings that simulate different levels of viewpoint diversity (ranking bias)
      • Computed the metrics on each simulated ranking
  9. Results
      Considerations:
      – What is the underlying aim?
      – How balanced is the data overall?
      – How strong is the ranking bias?
      – What is the direction of the ranking bias?
      [Figure: mean nDD value (y-axis, 0.0 to 1.0) plotted against ranking bias (x-axis, −1.0 to 1.0).]
  10. Take home
      • Ranking fairness metrics can be used to assess viewpoint diversity in search results (when interpreted correctly)
      • Future work:
        – Appropriate viewpoint labels?
        – Appropriate level of viewpoint diversity?
        – Assess viewpoint diversity in real search results
        – Align metric outcomes with behavioral outcomes
  11. References
      [1] R. Epstein and R. E. Robertson. The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences of the United States of America, 112(33):E4512–E4521, 2015.
      [2] F. A. Pogacar, A. Ghenai, M. D. Smucker, and C. L. Clarke. The positive and negative influence of search results on people’s decisions about the efficacy of medical treatments. In Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2017), pages 209–216, 2017.
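
The metrics named in these slides and notes (nDD, nDKL, and the proposed nDJS) differ mainly in the function F that compares the top of a ranking with the whole ranking. The following minimal Python sketch is modelled on the discounted ranking fairness measures of Yang and Stoyanovich as we understand them: at cut-offs i = 10, 20, …, F compares the group distribution of the top i results with that of the full ranking, each term is divided by log2(i), the terms are summed, and the binomial metrics are divided by a normalizer Z so that 0 is fair and 1 is maximally unfair. The protected set {-3, -2, -1}, taking Z as the larger score of the two extreme rankings, the cut-off step, and all function names are assumptions for illustration, not the authors' implementation. nDR is left out, and the multinomial variant is returned unnormalized because its normalizer is not given here.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

Dist = list[float]

# Hypothetical binomial split: labels on the -3..+3 scale treated as "protected".
# The slides do not say which labels form the protected group.
PROTECTED = frozenset({-3, -2, -1})


def to_binomial(labels: Sequence[int], protected: frozenset = PROTECTED) -> list[bool]:
    """Map seven-point viewpoint labels to True (protected) or False (non-protected)."""
    return [label in protected for label in labels]


def distribution(items: Sequence[Hashable], categories: Sequence[Hashable]) -> Dist:
    """Share of each category among the given items, in the order of `categories`."""
    counts = Counter(items)
    return [counts[c] / len(items) for c in categories]


def kl(p: Dist, q: Dist) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_j = 0 add nothing."""
    return sum(pj * math.log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def js(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence in bits (between 0 and 1)."""
    m = [(pj + qj) / 2 for pj, qj in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def diff(p: Dist, q: Dist) -> float:
    """Absolute difference in the share of the first category (the protected group)."""
    return abs(p[0] - q[0])


def discounted_sum(ranking: Sequence[Hashable], categories: Sequence[Hashable],
                   F: Callable[[Dist, Dist], float], step: int = 10) -> float:
    """Sum of F(top-i distribution, overall distribution) / log2(i) for i = step, 2*step, ... <= N."""
    if step < 2:
        raise ValueError("step must be at least 2, since log2(1) = 0")
    overall = distribution(ranking, categories)
    return sum(F(distribution(ranking[:i], categories), overall) / math.log2(i)
               for i in range(step, len(ranking) + 1, step))


def normalized_binomial(ranking: Sequence[bool], F: Callable[[Dist, Dist], float],
                        step: int = 10) -> float:
    """Divide by Z, assumed here to be the larger score of the two extreme rankings
    (all protected items first, or all protected items last)."""
    cats = [True, False]
    protected_first = sorted(ranking, reverse=True)  # True > False, so protected items lead
    protected_last = sorted(ranking)
    z = max(discounted_sum(protected_first, cats, F, step),
            discounted_sum(protected_last, cats, F, step))
    return discounted_sum(ranking, cats, F, step) / z if z > 0 else 0.0


def ndd(ranking: Sequence[bool], step: int = 10) -> float:
    """Normalized discounted difference (binomial)."""
    return normalized_binomial(ranking, diff, step)


def ndkl(ranking: Sequence[bool], step: int = 10) -> float:
    """Normalized discounted KL divergence (binomial)."""
    return normalized_binomial(ranking, kl, step)


def djs_unnormalized(labels: Sequence[int], step: int = 10) -> float:
    """Discounted JS divergence over all seven viewpoint categories, without Z."""
    return discounted_sum(labels, list(range(-3, 4)), js, step)
```

For instance, `ndd(to_binomial(labels))` scores a list of seven-point labels given in rank order. With 6 protected items among 20 and `step=5`, putting all protected items first scores 1.0, while putting them all last scores about 0.67, which echoes the note that the direction of ranking bias and the skew of the data matter when interpreting a value.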

Editor's Notes

  • Introduce myself
    Second-year PhD student
  • Search results on a disputed topic contain various viewpoints on that topic
    Position bias: users trust and interact with higher-ranked results more
    SEME: biased rankings can shift voting preferences and judgments about medical treatments
    But what would a viewpoint-diverse ranking look like? We don’t know
    First step: measure viewpoint diversity in rankings
    Problem: there is no method for this!
  • We study whether ranking fairness metrics can perform this task
    Two notions of viewpoint diversity, reflecting what a ranking assessor might be looking for
    Conducted a simulation study for each notion and evaluated the metrics
  • Categorisation into 7 viewpoints
    Task: classify search results into this taxonomy
  • Assumption: 7 viewpoint classes
    Also assume that the ranking assessor has a specific aim regarding what they are concerned about
    We consider two different aims
  • Goal: see how the metrics behave under different levels of viewpoint diversity
    Three data sets consisting of viewpoint labels
    From each set, created rankings with different levels of ranking bias
    Here: the more ranking bias, the less viewpoint diversity
    Done by weighted sampling
  • Several considerations determine which metric to use and how sensitive it is
    Considerations:
    Binomial or multinomial?
    The more balanced the data, the more sensitive the metric
    If the ranking bias is strong and the notion is binomial, use nDKL; otherwise use nDD
    If the protected group is advantaged rather than disadvantaged, the same amount of ranking bias produces a different outcome
    It would be good to have a simulator for interpreting the metrics (I am working on that)
    In general: nDD, nDKL, or nDJS
  • Correct interpretation requires awareness of data skew and bias direction
    Future work: assess viewpoint diversity in real search results and align metric outcomes with SEME
    Also talk about our own work here
  • Protected vs. non-protected attribute
    A ranking algorithm should be agnostic to whether a subject belongs to the protected class
    Example: gender bias in a list of job candidates
    Most metrics are based on statistical parity
    Explain formula: F is a function that evaluates statistical parity
    Low value (0) is fair, high value (1) is unfair
    How can we use this for viewpoint diversity?
  • Explain ranking bias and the mean metric outcome
    All metrics seem to work
    nDR is not normalized properly
    Whether to use nDD or nDKL depends on the strength of the ranking bias
    Take home: use nDD or nDKL; it is important to know the proportion of protected items and the direction of the bias
  • Works
    Doesn’t go to 1 (don’t compare)
    Take home: same lessons as before
  • Draw from the data set without replacement
    Sampling is weighted
    Two weights (which vary) to advantage or disadvantage the protected group
    Summary: two simulation studies, each with three data sets; per set, 21 settings of ranking bias, repeated 1000 times per setting
  • Quickly repeat the formula; the metrics differ in F
    F evaluates statistical parity by comparing against an ideal ranking
    Briefly describe each metric
    nDJS is needed because the other metrics are not applicable to the multinomial notion (details in paper)
    These metrics QUANTIFY (no “fairness criterion”)
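
The notes describe how the biased rankings were generated: items are drawn from a data set without replacement by weighted sampling, with 21 ranking-bias settings and 1000 repetitions per setting. The slides do not say how a bias value becomes sampling weights, so this Python sketch assumes a hypothetical mapping in which, for a bias b between -1 and 1, protected items get weight 1 + b and non-protected items get 1 - b. The function names and the top-10 protected share used as a summary are also illustrative; the study itself computed the fairness metrics on each simulated ranking, and a metric such as nDD could be substituted for the summary.

```python
import random
from typing import Sequence


def biased_ranking(groups: Sequence[bool], bias: float, rng: random.Random) -> list[bool]:
    """Rank items by weighted sampling without replacement (hypothetical weighting).

    `groups` marks each item as protected (True) or not (False). `bias` lies in
    [-1, 1]: negative values push protected items down, positive values push them
    up, and 0 gives a uniformly random order. Assumed weights: 1 + bias for
    protected items and 1 - bias for non-protected items.
    """
    if not -1.0 <= bias <= 1.0:
        raise ValueError("bias must lie in [-1, 1]")
    pool = list(groups)
    ranking: list[bool] = []
    while pool:
        weights = [1 + bias if g else 1 - bias for g in pool]
        if sum(weights) == 0:  # only zero-weight items are left: draw uniformly
            weights = [1.0] * len(pool)
        index = rng.choices(range(len(pool)), weights=weights)[0]
        ranking.append(pool.pop(index))  # remove the drawn item: no replacement
    return ranking


def protected_share(ranking: Sequence[bool], k: int) -> float:
    """Share of protected items among the top k results."""
    return sum(ranking[:k]) / k


def simulate(groups: Sequence[bool], k: int = 10, repetitions: int = 1000,
             seed: int = 0) -> dict[float, float]:
    """Mean top-k protected share for 21 bias settings (-1.0, -0.9, ..., 1.0)."""
    rng = random.Random(seed)
    settings = [round(-1 + 0.1 * s, 1) for s in range(21)]
    return {
        b: sum(protected_share(biased_ranking(groups, b, rng), k)
               for _ in range(repetitions)) / repetitions
        for b in settings
    }
```

For example, `simulate([True] * 30 + [False] * 70, repetitions=100)` returns one mean value per bias setting for a hypothetical set with 30% protected items. At bias -1.0 the share is always 0.0 and at 1.0 always 1.0, because zero-weight items are drawn only once nothing else remains.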