Sidelines:
An Algorithm for Increasing Diversity in News
and Opinion Aggregators

Sean Munson, Daniel Zhou, Paul Resnick
S...
“front page stories from the last seven days shows that liberal
sites… have had multiple articles a day on the front page ...
today

• Diversity goals
• Sidelines algorithm, based on votes and voters
• Diversity measures, based on votes, voters, an...
diversity goals

• Make people feel represented
• Proportional representation of viewpoints
• Expose everyone to challengi...
approval voting
• Each voter can vote for
  an unlimited number of
  items, up to once each
• Select the kitems with
  the...
approval voting
• Each voter can vote for
  an unlimited number of
  items, up to once each
• Select the kitems with
  the...
approval voting                        sidelines
• Each voter can vote for              • Each voter can vote for an
  an ...
documents
      A   B    C   D      E   F   Approval
                                             Sidelines
              ...
documents
      A   B    C   D      E   F   Approval
                                             Sidelines
              ...
documents
      A   B    C   D      E   F   Approval
                                                  Sidelines
         ...
documents
      A   B    C   D      E   F   Approval
                                                  Sidelines
         ...
documents
      A   B    C   D      E   F   Approval
                                                  Sidelines
         ...
documents
      A   B    C   D      E   F   Approval
                                                  Sidelines
         ...
Measures
Inclusion /Exclusion :: Alienation :: Proportionality
inclusion / exclusion

Inclusion: portion of voters who had something
  they voted for in the result set

Exclusion: porti...
Salienation

How far down the result list to find a voted-for item.
For user u, result set K:




 so for result set K:
proportional representation
Groups G=(g1, g2, g3), and each voter has membership
in these groups




For set of users U, r...
proportional representation (continued)
Items’ representativeness defined according to voters’
affiliations:




So for se...
proportional representation (continued)

Compare vectors UG    and KG   using Kullback-
Leibler divergence:
proportional representation (continued)

Compare vectors UG    and KG   using Kullback-
Leibler divergence:
sidelines vs. approval voting (pure popularity)
Digg World and Business Category

Data from 11 October 2008 to 30 November 2008.

Daily average:
                 New stor...
Digg World and Business Category




                  Pure Popularity Sidelines      p
     Inclusion             0.651  ...
Data source: Links from 500 Political Blogs
• Links treated as votes, blogs as voters
• 24 Oct – 25 Nov
• Blogs coded as l...
Edges indicate Jaccard similarity above average.
Multidimensional scaling layout according to Jaccard similarity.
proportional representation

                                                                    Pure popularity
         ...
inclusion, alienation

High inclusion score for sidelines (0.445) than pure
popularity (0.419) (paired t-test, p<0.001).

...
noticeable differences?

Asked 40 subjects to view
12-item result sets for
sidelines or pure popularity.

(Not told there ...
noticeable differences


      Somewhat liberally-biased set of
      readers had an 89% chance of
      finding something...
mixed preferences for diversity

“I make a point of visiting websites with viewpoints
different than my own, so I would ha...
applications

• News aggregators based on user votes.
• Other voting systems where diversity matters
  (e.g. Google Modera...
applications

• News aggregators based on user votes.
• Other voting systems where diversity matters
  (e.g. Google Modera...
future work

• Enhancements to sidelines algorithm

• Alternative algorithms

• Actual preferences & behavior for challeng...
thanks!

Sean Munson samunson@umich.edu
Daniel Zhou    mrzhou@umich.edu
Paul Resnickpresnick@umich.edu
Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators
Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators
Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators
Upcoming SlideShare
Loading in …5
×

Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators

1,690 views

Published on

Aggregators rely on votes, and links to select and present subsets of the large quantity of news and opinion items generated each day. Opinion and topic diversity in the output sets can provide individual and societal benefits, but simply selecting the most popular items may not yield as much diversity as is present in the overall pool of votes and links.

In this paper, we define three diversity metrics that address different dimensions of diversity: inclusion, alienation, and proportional representation. We then present the Sidelines algorithm – which temporarily suppresses a voter’s preferences after a preferred item has been selected – as one approach to increase the diversity of result sets. In comparison to collections of the most popular items, from user votes on Digg.com and links from a panel of political blogs, the Sidelines algorithm increased inclusion while decreasing alienation. For the blog links, a set with known political preferences, we also found that Sidelines improved proportional representation. In an online experiment using blog link data as votes, readers were more likely to find something challenging to their views in the Sidelines result sets. These findings can help build news and opinion aggregators that present users with a broader range of topics and opinions.

Paper at http://www.smunson.com/portfolio/projects/aggdiversity/Sidelines-ICWSM.pdf

1 Comment
2 Likes
Statistics
Notes
No Downloads
Views
Total views
1,690
On SlideShare
0
From Embeds
0
Number of Embeds
103
Actions
Shares
0
Downloads
0
Comments
1
Likes
2
Embeds 0
No embeds

No notes for slide
  • If people do not feel represented – they feel they have not been heard, and they don’t see content that supports them -- they may exit to places where they do. This can create balkanization and polarization. Sunstein and others have warned about the problems this may cause for democracy, society.
  • Make people feel represented can encourage people to speak up (who would have otherwise remained silent to promote social harmony). People may also be more open to hearing other views after they feel they have been heard.Proportional representation of viewpoints. As Duncan mentioned yesterday, people are not very good at knowing when others agree or disagree with them. They tend to think that support for their point of view is broader than it is. Those in the minority may think they are in the majority, and when their candidate does not win or their idea is not selected, they may feel disenfranchised or concoct conspiracy theories about how the election was “rigged” or “stolen.” Proportionally representing ideas can help people realize when they are in the minority. This can increase legitimacy of public decisions. It also may encourage the majority to stop and listen to dissenting views.Finally, exposing everyone to challenging viewpoints can lead to better problem solving as more ideas and viewpoints are included in the conversation. It can also help reduce polarization.
  • Not saying that approval voting is exactly what anyone uses!
  • Exclusion is just 1-inclusion.
  • S_alienation normalized by the maximum alienation so it always falls on the range [1/(|K|+1), 1]
  • Actually do need to know something about user groups and affiliations for this metric.
  • Blogs of one bias more likely to link to items linked by blogs with the same bias (Jaccard similarity).
  • Remember that the alienation score as how far down the list people had to go on average. So, a little over half the time, they didn't get an item at all. When they did which counted as an alienation of 1. The other half the time, on average, they had to go about 30-40% of the way down the list.
  • Subjects were recruited primarily from the university of Michigan and were somewhat liberally biased.
  • Need to do more work in this area of actual user preferences and behavior with respect to diversity.
  • The Obama administration used Google Moderator on Change.gov during the transition to collect questions. In one category, most of the top questions were about the legalization of marijuana. Sidelines may have let the stop question still be on this topic while letting other questions also make it to the first few questions.
  • Complement to content analysis approaches.
  • So we have an algorithm for increasing diversity without knowledge about content or voters’ political affiliations, and some potential applications for this algorithm. We also have some metrics for measuring diversity in result sets where people have voted on the candidate items. What’s next?Enhancement:suppress votes based on users’ voting history, optimize parametersOther algorithms:clustering based on votes
  • Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators

    1. 1. Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators Sean Munson, Daniel Zhou, Paul Resnick School of Information, University of Michigan
    2. 2. “front page stories from the last seven days shows that liberal sites… have had multiple articles a day on the front page while weeks will go by without a single major conservative blog achieving popular status.” – Simon Owens, Mediashift Blog September 2008
    3. 3. today • Diversity goals • Sidelines algorithm, based on votes and voters • Diversity measures, based on votes, voters, and affiliations • Pilot test – metrics – user response • Future work
    4. 4. diversity goals • Make people feel represented • Proportional representation of viewpoints • Expose everyone to challenging viewpoints
    5. 5. approval voting • Each voter can vote for an unlimited number of items, up to once each • Select the kitems with the most votes For news aggregator, votes weighted according to age
    6. 6. approval voting • Each voter can vote for an unlimited number of items, up to once each • Select the kitems with the most votes Risk of tipping? With approval voting, a small majority may be able to claim all the top kspots. For news aggregator, votes weighted according to age
    7. 7. approval voting sidelines • Each voter can vote for • Each voter can vote for an an unlimited number of unlimited number of items, up to once each items, up to once each • Select the kitems with • Selection: repeat k times the most votes 1) Select item with the most votes 2) Voters for that item Risk of tipping? sidelined for next t turns With approval voting, a small majority may be able to claim all the top kspots. For news aggregator, votes weighted according to age
    8. 8. documents A B C D E F Approval Sidelines voting 1 ✔ ✔ ✔ ✔ 2 ✔ ✔ ✔ 3 ✔ ✔ ✔ 4 ✔ ✔ ✔ 5 ✔ ✔ 6 ✔ ✔ total 3 4 2 3 2 3
    9. 9. documents A B C D E F Approval Sidelines voting 1 ✔ ✔ ✔ ✔ B 2 ✔ ✔ ✔ A 3 ✔ ✔ ✔ D 4 ✔ ✔ ✔ F 5 ✔ ✔ 6 ✔ ✔ total 3 4 2 3 2 3
    10. 10. documents A B C D E F Approval Sidelines voting 1 ✔ ✔ ✔ ✔ B B 2 ✔ ✔ ✔ A 3 ✔ ✔ ✔ D 4 ✔ ✔ ✔ F 5 ✔ ✔ Wait of just 1 turn 6 ✔ ✔ total 3 4 2 3 2 3
    11. 11. documents A B C D E F Approval Sidelines voting 1 ✔ ✔ ✔ ✔ B B 2 ✔ ✔ ✔ A C 3 ✔ ✔ ✔ D 4 ✔ ✔ ✔ F 5 ✔ ✔ Wait of just 1 turn 6 ✔ ✔ total 0 0 2 0 2 0
    12. 12. documents A B C D E F Approval Sidelines voting 1 ✔ ✔ ✔ ✔ B B 2 ✔ ✔ ✔ A C 3 ✔ ✔ ✔ D A 4 ✔ ✔ ✔ F 5 ✔ ✔ Wait of just 1 turn 6 ✔ ✔ total 3 4 0 3 0 3
    13. 13. documents A B C D E F Approval Sidelines voting 1 ✔ ✔ ✔ ✔ B B 2 ✔ ✔ ✔ A C 3 ✔ ✔ ✔ D A 4 ✔ ✔ ✔ F E 5 ✔ ✔ Wait of just 1 turn 6 ✔ ✔ total 0 1 2 1 2 1
    14. 14. Measures Inclusion /Exclusion :: Alienation :: Proportionality
    15. 15. inclusion / exclusion Inclusion: portion of voters who had something they voted for in the result set Exclusion: portion who didn’t.
    16. 16. Salienation How far down the result list to find a voted-for item. For user u, result set K: so for result set K:
    17. 17. proportional representation Groups G=(g1, g2, g3), and each voter has membership in these groups For set of users U, representation vector: UG
    18. 18. proportional representation (continued) Items’ representativeness defined according to voters’ affiliations: So for set K:
    19. 19. proportional representation (continued) Compare vectors UG and KG using Kullback- Leibler divergence:
    20. 20. proportional representation (continued) Compare vectors UG and KG using Kullback- Leibler divergence:
    21. 21. sidelines vs. approval voting (pure popularity)
    22. 22. Digg World and Business Category Data from 11 October 2008 to 30 November 2008. Daily average: New stories 4600 Diggs (votes) 85000 Voters 24000
    23. 23. Digg World and Business Category Pure Popularity Sidelines p Inclusion 0.651 0.668 <0.001 Alienation 0.476 0.463 <0.001 No user groups, so we couldn’t calculate Proportional Representation score.
    24. 24. Data source: Links from 500 Political Blogs • Links treated as votes, blogs as voters • 24 Oct – 25 Nov • Blogs coded as liberal (52%), conservative (35%), or independent (13%)
    25. 25. Edges indicate Jaccard similarity above average. Multidimensional scaling layout according to Jaccard similarity.
    26. 26. proportional representation Pure popularity showed some evidence of tipping. 0.07 Pure Popularity Some tipping in 0.06 Sidelines Sidelines as well, but 0.05 significantly less (paired t-test, p< divKL 0.04 0.03 0.001) 0.02 0.01 0 25-Oct 30-Oct 4-Nov 9-Nov 14-Nov 19-Nov 24-Nov
    27. 27. inclusion, alienation High inclusion score for sidelines (0.445) than pure popularity (0.419) (paired t-test, p<0.001). Pure Popularity Significantly reduced Sidelines Salienation for sidelines 0.85 (paired t-test, p<0.001) Salienation 0.8 0.75 0.7 25-Oct 29-Oct 2-Nov 6-Nov 10-Nov 14-Nov 18-Nov 22-Nov
    28. 28. noticeable differences? Asked 40 subjects to view 12-item result sets for sidelines or pure popularity. (Not told there were two possibilities)
    29. 29. noticeable differences Somewhat liberally-biased set of readers had an 89% chance of finding something challenging in the sidelines result set (compared with 50% for pure popularity).
    30. 30. mixed preferences for diversity “I make a point of visiting websites with viewpoints different than my own, so I would have been happy with this.” (Sidelines) “it’s good to know diverse opinions, but, on the other hand, I can’t take too much of the opinions that disagree with mine.” (Pure Popularity) “I wouldn't use a news aggregator, but because it's liberally biased [in agreement with subject’s views], I'm ok with it.” (Pure Popularity)
    31. 31. applications • News aggregators based on user votes. • Other voting systems where diversity matters (e.g. Google Moderator) • Don’t need to know anything about content, user groups, or long-term voting behavior
    32. 32. applications • News aggregators based on user votes. • Other voting systems where diversity matters (e.g. Google Moderator) • Don’t need to know anything about content, user groups, or long-term voting behavior
    33. 33. future work • Enhancements to sidelines algorithm • Alternative algorithms • Actual preferences & behavior for challenging vs. affirming content • Presentation to make people feel represented (while still viewing on challenging items!)
    34. 34. thanks! Sean Munson samunson@umich.edu Daniel Zhou mrzhou@umich.edu Paul Resnickpresnick@umich.edu

    ×