CHI2007 talk on Conflicts in Wikipedia

Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed H. Chi.

He Says, She Says: Conflict and Coordination in Wikipedia.

In Proc. of ACM Conference on Human Factors in Computing Systems (CHI2007), pp. 453--462, April 2007. ACM Press. San Jose, CA.

http://www-users.cs.umn.edu/~echi/papers/2007-CHI/2007-Wikipedia-coordination-PARC-CHI2007.pdf

  • Thank you. Today I’m going to be talking about conflict and coordination in Wikipedia. This is joint work with... Most everyone knows that Wikipedia is an online encyclopedia that anyone can edit. But as I was putting this talk together I thought to myself “how can I describe what makes Wikipedia so special?” And luckily I found this video clip of Steve Carell from the TV show The Office describing it in a much more... interesting way than I possibly could.

    1. He Says, She Says: Conflict and Coordination in Wikipedia. Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi. UCLA; Augmented Social Cognition Group, Palo Alto Research Center.
    2. What is Wikipedia? “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.” – Steve Carell, The Office
    3. Spreading conflict
    4. Spreading conflict
    5. Spreading conflict
    6. Spreading conflict
    7. Spreading conflict
    8. Policy and procedure
        • “The degree of success that one meets in dealing with conflicts... often depends on the efficiency with which one can quote policy and precedent.” - Wikipedia admin (survey data)
    9. Collaborative work beneath the surface
        • Visitors only look at article pages
        • But much of Wikipedia consists of other pages
            • Conflict resolution, coordination, policies and procedures
    10. Characterizing coordination and conflict
    11. Characterizing coordination and conflict
    12. Exponential growth
    13. Costs of growth
        • Increase in conflict and coordination costs
            • Software development (Boehm, 1981; Brooks, 1975)
            • MUDs/MOOs (Curtis, 1992; Dibbell, 1993)
            • Mailing lists (Sproull & Kiesler, 1991)
        • How has growth affected Wikipedia?
            • Millions of new users and articles
    14. Infrastructure
        • Analyze entire history of Wikipedia
            • Every edit to every article
        • Large amount of data
            • 4+ million pages
            • 58+ million revisions
            • 800+ Gb
            • as of June 2006
        • Distributed processing
            • Hadoop distributed filesystem
            • Map/reduce to process data in parallel
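
The map/reduce step named on the infrastructure slide can be illustrated with a minimal single-machine sketch. This is not the authors' Hadoop job: the record layout `(page_type, revision_id)`, the sample data, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical revision records: (page_type, revision_id).
REVISIONS = [
    ("article", 1), ("article", 2), ("talk", 3),
    ("user_talk", 4), ("article", 5), ("procedure", 6),
]

def map_revision(record):
    """Map step: emit one (key, 1) pair per revision."""
    page_type, _revision_id = record
    yield page_type, 1

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def count_by_page_type(records):
    """Run the map step over every record, then reduce the emitted pairs."""
    mapped = chain.from_iterable(map_revision(r) for r in records)
    return reduce_counts(mapped)

counts = count_by_page_type(REVISIONS)
```

In a real cluster the map calls would run in parallel on separate chunks of the revision history and the reduce step would merge their partial counts; the sketch only shows the shape of the two functions.
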
    15. Types of work
        • Direct work: immediately consumable (article)
        • Indirect work: coordination, conflict (talk, user, procedure)
        • Maintenance work: reverts, vandalism
    16. Less direct work
        • Decrease in proportion of edits to article page
        (chart annotation: 70%)
    17. More indirect work
        • Increase in proportion of edits to user talk
        (chart annotation: 8%)
    18. More indirect work
        • Increase in proportion of edits to user talk
        • Increase in proportion of edits to procedure
        (chart annotation: 11%)
    19. More maintenance work
        • Increase in proportion of edits that are reverts
        (chart annotation: 7%)
    20. More wasted work
        • Increase in proportion of edits that are reverts
        • Increase in proportion of edits reverting vandalism
        (chart annotation: 1-2%)
    21. Global level
        • Conflict and coordination costs are growing
            • Less direct work (articles)
            • More indirect work (article talk, user, procedure)
            • More maintenance work (reverts, vandalism)
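
The slides report the proportion of edits that are reverts but do not say how a revert is detected. One common heuristic, assumed here and not confirmed by the slides, is to treat a revision as a revert when its text is identical to an earlier revision of the same page. The function name and sample history below are hypothetical.

```python
import hashlib

def find_reverts(revision_texts):
    """Return indices of revisions whose text matches an earlier revision.

    Assumption: a revert restores a previous version exactly, so its
    content hash has already been seen in the page history.
    """
    seen = set()
    reverts = []
    for index, text in enumerate(revision_texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            reverts.append(index)
        seen.add(digest)
    return reverts

# Hypothetical page history, oldest revision first.
history = ["v1", "v1 plus vandalism", "v1", "v2", "v1 plus vandalism"]
reverts = find_reverts(history)
revert_share = len(reverts) / len(history)
```

Note that this heuristic also flags a revision that restores a vandalized version, and it misses partial reverts in which only some of the earlier text is restored.
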
    22. Characterizing coordination and conflict
    23. Conflict at the article level
        • What defines conflict in articles?
        • Build a characterization model of article conflict
            • Identify page features and metrics associated with conflict
            • Automatically identify high-conflict articles
    24. Page metrics
        • Chose metrics for identifying conflict in articles
            • Easily computable, scalable

        Metric type                       Page type
        Revisions (#)                     Article, talk, article/talk
        Page length                       Article, talk, article/talk
        Unique editors                    Article, talk, article/talk
        Unique editors / revisions        Article, talk
        Links from other articles         Article, talk
        Links to other articles           Article, talk
        Anonymous edits (#, %)            Article, talk
        Administrator edits (#, %)        Article, talk
        Minor edits (#, %)                Article, talk
        Reverts (#, by unique editors)    Article
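
Several of the metrics listed above are simple counts over a page's revision history. The sketch below shows how a few of them might be computed; the `Revision` record and its fields are hypothetical and do not reflect the authors' data format.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    editor: str         # user name, or an IP address for anonymous edits
    is_anonymous: bool
    is_minor: bool

def page_metrics(revisions):
    """Compute count-based metrics for one page (article or talk)."""
    n = len(revisions)
    editors = {r.editor for r in revisions}
    anonymous = sum(r.is_anonymous for r in revisions)
    minor = sum(r.is_minor for r in revisions)
    return {
        "revisions": n,
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / n if n else 0.0,
        "anonymous_edits": anonymous,
        "anonymous_edits_pct": 100.0 * anonymous / n if n else 0.0,
        "minor_edits": minor,
        "minor_edits_pct": 100.0 * minor / n if n else 0.0,
    }

# Hypothetical talk-page history.
talk_history = [
    Revision("alice", False, False),
    Revision("10.0.0.1", True, False),
    Revision("alice", False, True),
    Revision("bob", False, False),
]
metrics = page_metrics(talk_history)
```

Each metric needs only one pass over the history, which fits the slide's requirement that the metrics be easily computable and scalable.
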
    25. Defining conflict
        • Operational definition for conflict
        • Revisions tagged controversial
        • Conflict revision count
    26. Machine learning
        • Predict conflict from page metrics
            • Training set of “controversial” pages
            • Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Schölkopf, 1998)
        • Not just conflict/no conflict, but how much conflict
    27. Performance: Cross-validation
        • 5x cross-validation, R² = 0.897
    28. Performance: Cross-validation
        • 5x cross-validation, R² = 0.897
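
To make the cross-validated R² figure concrete, here is a minimal sketch of 5-fold cross-validation. It deliberately swaps in a one-variable least-squares line for the SVM regressor used in the talk, and the data points are made up; only the evaluation procedure is being illustrated, under the assumption that R² is computed on pooled out-of-fold predictions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (a stand-in for the SVM regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cross_validated_r2(xs, ys, folds=5):
    """Pool out-of-fold predictions, then compute R^2 = 1 - SS_res / SS_tot."""
    predictions = [0.0] * len(xs)
    for fold in range(folds):
        # Every item whose index is congruent to `fold` is held out.
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        intercept, slope = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = intercept + slope * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up data: talk-page revision counts vs. controversial-revision counts.
talk_revisions = [5, 12, 20, 33, 41, 58, 64, 79, 90, 105]
conflict_counts = [1, 3, 4, 8, 9, 14, 15, 20, 22, 27]
r2 = cross_validated_r2(talk_revisions, conflict_counts)
```

Because each prediction comes from a model that never saw that page during fitting, the resulting R² measures generalization and not fit to the training data.
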
    29. Determinants of conflict
        Highly weighted metrics of conflict model:
        • —  Revisions (talk)
        • —  Minor edits (talk)
        • ˜  Unique editors (talk)
        • —  Revisions (article)
        • ˜  Unique editors (article)
        • —  Anonymous edits (talk)
        • ˜  Anonymous edits (article)
    30. Identifying untagged articles
        • Detect conflicts for unlabeled articles
            • Majority of articles have never been conflict tagged
        • Testing model generalization
            • Applied model to untagged articles
            • Sample rated by expert Wikipedians
        • Significant positive correlation with predicted scores
            • By rank correlation, p < 0.013 (Spearman’s rho)
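
Spearman's rho, the rank correlation used to compare predicted scores with expert ratings, is the Pearson correlation of the rank-transformed values. The sketch below computes the coefficient only (not the p-value); the scores and ratings are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))

predicted = [0.2, 1.5, 3.1, 4.0, 7.8]   # hypothetical model scores
expert = [1, 2, 2, 4, 5]                # hypothetical expert ratings
rho = spearman_rho(predicted, expert)
```

A rank correlation suits this comparison because expert ratings are ordinal: only the ordering of articles by conflict needs to agree, not the numeric scale.
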
    31. Characterizing coordination and conflict
    32. Conflict at the user level
        • How can we identify conflict between users?
        • Reverts as a proxy for user conflict
        • Revert patterns between users
        • Force directed layout to cluster users
            • Group similar viewpoints
            • Find conflicts between groups
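
The idea of clustering users from revert patterns can be sketched with a toy force-directed layout. The slides do not specify the force model, so the one below is an assumption: every user is pulled toward the center, and each pair of users who reverted each other is pushed apart in proportion to their revert count. User names and revert events are hypothetical.

```python
import math

def revert_counts(revert_events):
    """Count reverts between each unordered pair of users."""
    counts = {}
    for reverter, reverted in revert_events:
        pair = tuple(sorted((reverter, reverted)))
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def layout(users, counts, steps=2000, step_size=0.01):
    """Place users in 2D so that users who revert each other end up far apart."""
    # Deterministic start: users evenly spaced on the unit circle.
    pos = {u: [math.cos(2 * math.pi * i / len(users)),
               math.sin(2 * math.pi * i / len(users))]
           for i, u in enumerate(users)}
    for _ in range(steps):
        # Assumed attraction: a linear pull toward the origin.
        force = {u: [-pos[u][0], -pos[u][1]] for u in users}
        for (a, b), weight in counts.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = weight / dist ** 2   # assumed repulsion, stronger when close
            force[a][0] += push * dx / dist
            force[a][1] += push * dy / dist
            force[b][0] -= push * dx / dist
            force[b][1] -= push * dy / dist
        for u in users:
            pos[u][0] += step_size * force[u][0]
            pos[u][1] += step_size * force[u][1]
    return pos

def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# Hypothetical reverts: a1 and a2 share a viewpoint and revert b1 and b2.
events = [("a1", "b1"), ("b1", "a1"), ("a1", "b2"), ("b2", "a1"),
          ("a2", "b1"), ("b1", "a2"), ("a2", "b2"), ("b2", "a2")]
positions = layout(["a1", "a2", "b1", "b2"], revert_counts(events))
same_side = distance(positions, "a1", "a2")
opposed = distance(positions, "a1", "b1")
```

Users who never revert each other feel only the pull toward the center and the push away from their shared opponents, so they drift together; opposing groups settle on opposite sides of the layout.
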
    33. Dokdo/Takeshima opinion groups: Group A, Group B, Group C, Group D
    34. Terri Schiavo: Mediators; Sympathetic to parents; Sympathetic to husband; Anonymous (vandals/spammers)
    35. Summary: Characterizing Wikipedia
        • Coordination costs and conflict are increasing
        • Global-level: Trend identification
            • Decrease in direct article work
            • Increase in indirect coordination work
            • Increase in maintenance work
        • Article-level: Prediction using machine learning
            • Identify characteristics of article conflict
            • Detect conflict-heavy articles needing extra attention
        • User-level: User conflict visualization
            • Make sense of user conflicts and identify shared viewpoints
    36. Future Work
        • Applied to many domains
            • Corporate memory (Socialtext)
            • Intelligence gathering (Intellipedia)
            • Scholarly research (Scholarpedia)
            • Collaborative problem solving (Lostpedia)
        • Application: Social Dashboard
            • Identify high conflict articles
            • Surface editing patterns to readers
            • Route attention to articles that need it most
    37. Future work
    38. He Says, She Says: Conflict and Coordination in Wikipedia. Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi. UCLA; Augmented Social Cognition Group, Palo Alto Research Center. Thank you!