CHI2007 talk on Conflicts in Wikipedia


1 comment


  • Ed Chi, 8 months ago
    Hope you guys enjoy this. Let us know if anyone has questions. We have been doing more research on Wikipedia analysis. Check out our blog at http://asc-parc.blogspot.com

Notes on slide 1

Thank you. Today I’m going to be talking about conflict and coordination in Wikipedia. This is joint work with... Most everyone knows that Wikipedia is an online encyclopedia that anyone can edit. But as I was putting this talk together I thought to myself “how can I describe what makes Wikipedia so special?” And luckily I found this video clip of Steve Carell from the TV show The Office describing it in a much more... interesting way than I possibly could.


CHI2007 talk on Conflicts in Wikipedia - Presentation Transcript

  1. He Says, She Says: Conflict and Coordination in Wikipedia. Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi. UCLA; Augmented Social Cognition Group, Palo Alto Research Center
  2. What is Wikipedia? “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.” – Steve Carell, The Office
  3. Spreading conflict
  4. Spreading conflict
  5. Spreading conflict
  6. Spreading conflict
  7. Spreading conflict
  8. Policy and procedure
    • “The degree of success that one meets in dealing with conflicts... often depends on the efficiency with which one can quote policy and precedent.”
      – Wikipedia admin (survey data)
  9. Collaborative work beneath the surface
    • Visitors only look at article pages
    • But much of Wikipedia consists of other pages
      • Conflict resolution, coordination, policies and procedures
  10. Characterizing coordination and conflict
  11. Characterizing coordination and conflict
  12. Exponential growth
  13. Costs of growth
    • Increase in conflict and coordination costs
      • Software development (Boehm, 1981; Brooks, 1975)
      • MUDs/MOOs (Curtis, 1992; Dibbell, 1993)
      • Mailing lists (Sproull & Kiesler, 1991)
    • How has growth affected Wikipedia?
      • Millions of new users and articles
  14. Infrastructure
    • Analyze entire history of Wikipedia
      • Every edit to every article
    • Large amount of data
      • 4+ million pages
      • 58+ million revisions
      • 800+ GB
      • as of June 2006
    • Distributed processing
      • Hadoop distributed filesystem
      • Map/reduce to process data in parallel
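The map/reduce pattern named on this slide can be illustrated with a small in-memory sketch. This is not Hadoop code and not the authors' pipeline: the revision records, the `page` field, and the helper names are hypothetical, and everything runs in a single process. It only shows the shape of the computation, where a map step emits key/value pairs per revision and a reduce step combines the values for each key.

```python
from collections import defaultdict
from itertools import chain


def map_revision(revision):
    """Map step: emit one (page title, 1) pair for each revision record."""
    yield revision["page"], 1


def reduce_counts(key, values):
    """Reduce step: sum the emitted values for one key."""
    return key, sum(values)


def run_map_reduce(records, mapper, reducer):
    """Group mapper output by key, then apply the reducer to each group."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical revision records; a real dump carries many more fields.
revisions = [{"page": "Cat"}, {"page": "Dog"}, {"page": "Cat"}]
revisions_per_page = run_map_reduce(revisions, map_revision, reduce_counts)
```

In a distributed setting the grouping step is what the framework performs between the map and reduce phases; here a dictionary stands in for it.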
  15. Types of work
    • Direct work: immediately consumable (article pages)
    • Indirect work: coordination, conflict (talk, user, procedure pages)
    • Maintenance work: reverts, vandalism
  16. Less direct work
    • Decrease in proportion of edits to article page
    70%
  17. More indirect work
    • Increase in proportion of edits to user talk
    8%
  18. More indirect work
    • Increase in proportion of edits to user talk
    • Increase in proportion of edits to procedure
    11%
  19. More maintenance work
    • Increase in proportion of edits that are reverts
    7%
  20. More wasted work
    • Increase in proportion of edits that are reverts
    • Increase in proportion of edits reverting vandalism
    1-2%
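The slides do not say how reverts were identified. One plausible approach, assumed here purely for illustration, is to treat a revision as a revert when its text is identical to an earlier, non-adjacent state of the same page, which can be checked cheaply by hashing each revision. The function name and the input format (a chronological list of revision texts for one page) are hypothetical.

```python
import hashlib


def find_reverts(revision_texts):
    """Return (revert_index, restored_index) pairs for one page's history.

    Assumption: a revision counts as a revert when its text matches an
    earlier revision and differs from the revision immediately before it.
    """
    first_seen = {}          # text digest -> index of first revision with it
    reverts = []
    previous_digest = None
    for index, text in enumerate(revision_texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in first_seen and digest != previous_digest:
            reverts.append((index, first_seen[digest]))
        first_seen.setdefault(digest, index)
        previous_digest = digest
    return reverts


history = ["a", "b", "a", "c", "a"]
reverts = find_reverts(history)
revert_share = len(reverts) / len(history)
```

Dividing the number of detected reverts by the number of revisions gives the kind of proportion plotted on these slides.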
  21. Global level
    • Conflict and coordination costs are growing
      • Less direct work (articles)
      • More indirect work (article talk, user, procedure)
      • More maintenance work (reverts, vandalism)
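A minimal sketch of how the share of direct work could be tracked over time, assuming each edit is available as a hypothetical `(year, page_title)` pair and that page type can be read from the title's namespace prefix. The prefix list and function names are illustrative choices, not the authors' classification.

```python
from collections import Counter, defaultdict

# Assumed mapping: these namespace prefixes count as indirect work
# (talk, user, and procedure pages); everything else is an article.
INDIRECT_PREFIXES = ("Talk:", "User:", "User talk:", "Wikipedia:", "Wikipedia talk:")


def work_type(title):
    """Classify a page title as direct (article) or indirect work."""
    return "indirect" if title.startswith(INDIRECT_PREFIXES) else "direct"


def direct_share_by_year(edits):
    """Return {year: fraction of that year's edits made to article pages}."""
    counts = defaultdict(Counter)
    for year, title in edits:
        counts[year][work_type(title)] += 1
    return {year: c["direct"] / sum(c.values()) for year, c in sorted(counts.items())}


edits = [
    (2003, "Cat"), (2003, "Dog"),
    (2006, "Cat"), (2006, "Talk:Cat"), (2006, "User talk:X"), (2006, "Wikipedia:Policy"),
]
shares = direct_share_by_year(edits)
```

A falling value from one year to the next corresponds to the "less direct work" trend summarized above.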
  22. Characterizing coordination and conflict
  23. Conflict at the article level
    • What defines conflict in articles?
    • Build a characterization model of article conflict
      • Identify page features and metrics associated with conflict
      • Automatically identify high-conflict articles
  24. Page metrics
    • Chose metrics for identifying conflict in articles
      • Easily computable, scalable
    • Metric type: page type
      • Revisions (#): article, talk, article/talk
      • Page length: article, talk, article/talk
      • Unique editors: article, talk, article/talk
      • Unique editors / revisions: article, talk
      • Links from other articles: article, talk
      • Links to other articles: article, talk
      • Anonymous edits (#, %): article, talk
      • Administrator edits (#, %): article, talk
      • Minor edits (#, %): article, talk
      • Reverts (#, by unique editors): article
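Several of these metrics are simple counts over a page's revision list. The sketch below computes a few of them, assuming each revision is a hypothetical dict with `editor`, `minor`, and `anonymous` fields; the field and key names are invented for the example.

```python
def page_metrics(revisions):
    """Compute count-based metrics for one page from its revision records."""
    n = len(revisions)
    editors = {r["editor"] for r in revisions}
    minor = sum(1 for r in revisions if r["minor"])
    anonymous = sum(1 for r in revisions if r["anonymous"])
    return {
        "revisions": n,
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / n if n else 0.0,
        "minor_edits": minor,
        "minor_edits_pct": 100 * minor / n if n else 0.0,
        "anonymous_edits": anonymous,
        "anonymous_edits_pct": 100 * anonymous / n if n else 0.0,
    }


# Hypothetical history for one page.
history = [
    {"editor": "a", "minor": True, "anonymous": False},
    {"editor": "b", "minor": False, "anonymous": False},
    {"editor": "a", "minor": False, "anonymous": False},
    {"editor": "1.2.3.4", "minor": False, "anonymous": True},
]
metrics = page_metrics(history)
```

Running the same function over an article's history and its talk page's history yields the separate article and talk variants of each metric.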
  25. Defining conflict
    • Operational definition for conflict
    • Revisions tagged controversial
    • Conflict revision count
  26. Machine learning
    • Predict conflict from page metrics
      • Training set of “controversial” pages
      • Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Schölkopf, 1998)
    • Not just conflict/no conflict, but how much conflict
  27. Performance: Cross-validation
    • 5x cross-validation, R² = 0.897
  28. Performance: Cross-validation
    • 5x cross-validation, R² = 0.897
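To make the evaluation concrete, here is a sketch of k-fold cross-validation scored with R². It deliberately swaps in a one-feature least-squares line for the support vector regression used in the talk, so that the example stays dependency-free; the fold assignment (every k-th item) and pooling of held-out predictions are also assumptions, not the authors' procedure.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=5):
    """Predict each item from a model trained on the other folds, then score."""
    predictions = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # held-out items of this fold
            predictions[i] = slope * xs[i] + intercept
    return r_squared(ys, predictions)


# Toy data: a hypothetical page metric and a conflict count that depends on it.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
score = cross_validated_r2(xs, ys)
```

Because every prediction comes from a model that never saw that item, the score reflects generalization rather than fit to the training pages.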
  29. Determinants of conflict
    Highly weighted metrics of conflict model (↑ more conflict, ↓ less conflict):
    • ↑ Revisions (talk)
    • ↑ Minor edits (talk)
    • ↓ Unique editors (talk)
    • ↑ Revisions (article)
    • ↓ Unique editors (article)
    • ↑ Anonymous edits (talk)
    • ↓ Anonymous edits (article)
  30. Identifying untagged articles
    • Detect conflicts for unlabeled articles
      • Majority of articles have never been conflict tagged
    • Testing model generalization
      • Applied model to untagged articles
      • Sample rated by expert Wikipedians
    • Significant positive correlation with predicted scores
      • By rank correlation, p < 0.013 (Spearman’s rho)
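Spearman's rho is the Pearson correlation of the two variables' ranks, with tied values sharing their average rank. The sketch below shows that computation; the sample scores and ratings are made up and are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: model scores versus human ratings for four articles.
predicted_scores = [1, 4, 9, 100]
expert_ratings = [1, 2, 3, 4]
rho = spearman_rho(predicted_scores, expert_ratings)
```

Because only ranks matter, the measure suits this comparison: expert ratings and predicted conflict scores need not be on the same scale.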
  31. Characterizing coordination and conflict
  32. Conflict at the user level
    • How can we identify conflict between users?
    • Reverts as a proxy for user conflict
    • Revert patterns between users
    • Force-directed layout to cluster users
      • Group similar viewpoints
      • Find conflicts between groups
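The idea on this slide can be sketched as a tiny force-directed layout over revert data. The force rules here are assumptions chosen for illustration, not the authors' algorithm: users who have reverted each other repel, and every other pair is pulled together by a spring, so users who share opponents drift into the same cluster. The input format (a list of `(reverter, reverted)` pairs) and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def revert_counts(revert_events):
    """Count reverts per unordered user pair from (reverter, reverted) events."""
    counts = Counter()
    for reverter, reverted in revert_events:
        if reverter != reverted:  # ignore self-reverts
            counts[frozenset((reverter, reverted))] += 1
    return counts


def force_directed_layout(revert_events, iterations=200, step=0.05, spring=0.5):
    """Return {user: (x, y)} after a fixed number of synchronous force updates."""
    counts = revert_counts(revert_events)
    users = sorted({user for pair in counts for user in pair})
    n = len(users)
    # Deterministic start: users evenly spaced on the unit circle.
    pos = {u: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
           for i, u in enumerate(users)}
    for _ in range(iterations):
        force = {u: [0.0, 0.0] for u in users}
        for u, v in combinations(users, 2):
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            weight = counts.get(frozenset((u, v)), 0)
            if weight:
                # Assumed rule: users who revert each other repel.
                scale = weight / max(dx * dx + dy * dy, 1e-9)
            else:
                # Assumed rule: all other pairs are pulled together by a spring.
                scale = -spring
            force[u][0] += scale * dx
            force[u][1] += scale * dy
            force[v][0] -= scale * dx
            force[v][1] -= scale * dy
        pos = {u: (pos[u][0] + step * force[u][0], pos[u][1] + step * force[u][1])
               for u in users}
    return pos


# Hypothetical revert events between two opposing pairs of editors.
events = [("abe", "bea"), ("bob", "abe"), ("ann", "bea"), ("bob", "ann")]
positions = force_directed_layout(events)
```

In this toy input, `abe` and `ann` never revert each other but both clash with `bea` and `bob`, so the layout places each pair close together and the two pairs far apart.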
  33. Dokdo/Takeshima opinion groups: Group A, Group B, Group C, Group D
  34. Terri Schiavo: Mediators; Sympathetic to parents; Sympathetic to husband; Anonymous (vandals/spammers)
  35. Summary: Characterizing Wikipedia
    • Coordination costs and conflict are increasing
    • Global-level: Trend identification
      • Decrease in direct article work
      • Increase in indirect coordination work
      • Increase in maintenance work
    • Article-level: Prediction using Machine learning
      • Identify characteristics of article conflict
      • Detect conflict-heavy articles needing extra attention
    • User-level: User Conflict Visualization
      • Make sense of user conflicts and identify shared viewpoints
  36. Future Work
    • Applied to many domains
      • Corporate memory (Socialtext)
      • Intelligence gathering (Intellipedia)
      • Scholarly research (Scholarpedia)
      • Collaborative problem solving (Lostpedia)
    • Application: Social Dashboard
      • Identify high conflict articles
      • Surface editing patterns to readers
      • Route attention to articles that need it most
  37. Future work
  38. He Says, She Says: Conflict and Coordination in Wikipedia. Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi. UCLA; Augmented Social Cognition Group, Palo Alto Research Center. Thank you!


© All Rights Reserved
