2010-02-22 Wikipedia MTurk Research talk given at Taiwan's Academia Sinica
This is the talk I gave at the Academia Sinica Institute of Information Science in Taiwan. It focuses on our Wikipedia and Amazon Mechanical Turk research.


Presentation Transcript

  • Ed H. Chi, Area Manager and Principal Scientist, Augmented Social Cognition Area, Palo Alto Research Center
  • Cognition: the ability to remember, think, and reason; the faculty of knowing. Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group. (Not quite the same as the branch of psychology that studies the cognitive processes involved in social interaction, though that is included.) Augmented Social Cognition: the enhancement, supported by systems, of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group. Citation: Chi, IEEE Computer, Sept 2008
  • Characterization, Models, Evaluations, Prototypes: characterize activity on social systems with analytics; model social-interaction and community dynamics and variables; prototype tools to increase benefits or reduce costs; evaluate prototypes via Living Laboratories with real users
  • Characterization and Modeling: community analytics and Wikipedia dynamics. Prototyping: social transparency through WikiDashboard. Evaluation: evaluations using Amazon Mechanical Turk
  • Characterization, Models, Evaluations, Prototypes
  • Conflict/Coordination Effects in Wikipedia
  • Mediator Pattern: Terri Schiavo. (Network diagram: anonymous users (vandals/spammers), editors sympathetic to the husband, mediators, editors sympathetic to the parents)
  • Measure of controversy: the “Controversial” tag; use the number of revisions tagged controversial
  • Page metrics: possible metrics for identifying conflict in articles

        Metric                           Page type
        Revisions (#)                    Article, talk, article/talk
        Page length                      Article, talk, article/talk
        Unique editors                   Article, talk, article/talk
        Unique editors / revisions       Article, talk
        Links from other articles        Article, talk
        Links to other articles          Article, talk
        Anonymous edits (#, %)           Article, talk
        Administrator edits (#, %)       Article, talk
        Minor edits (#, %)               Article, talk
        Reverts (#, by unique editors)   Article
  • Performance: cross-validation. 5-fold cross-validation, R² = 0.897
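A minimal sketch of how a 5-fold cross-validation like this could be run, assuming the page metrics above have already been extracted into a feature matrix; the data, feature weights, and conflict-score target below are illustrative stand-ins, not the study's data:

```python
# Hypothetical sketch: 5-fold cross-validation of a regression model that
# predicts an article's conflict score from its page metrics (simulated data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in features: one row per article, one column per metric from the
# table above (revisions, unique editors, anonymous edits, ...).
X = rng.poisson(lam=50, size=(200, 7)).astype(float)
# Stand-in conflict score: a noisy linear combination of the metrics.
y = X @ np.array([0.4, 0.1, 0.3, 0.2, 0.05, 0.1, 0.15]) + rng.normal(0, 5, size=200)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(f"mean R^2 across the 5 folds: {scores.mean():.3f}")
```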
  • Determinants of conflict. Highly weighted features of the conflict model: revisions (talk), minor edits (talk), unique editors (talk), revisions (article), unique editors (article), anonymous edits (talk), anonymous edits (article)
  • Chart: number of articles (log scale). http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
  • Chart: monthly edits
  • Chart: monthly active editors
  • Characterization, Models, Evaluations, Prototypes
  • Edits beget edits: the more previous edits, the more new edits. The growth rate depends on the current population N. Exponential growth model: dN/dt = r·N, with solution N(t) = N₀·e^(rt), where r is the growth rate of the population
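For completeness, the closed form follows from the ODE by separation of variables (standard calculus; not shown on the slide):

```latex
\frac{dN}{dt} = rN
\;\Longrightarrow\;
\int \frac{dN}{N} = \int r\, dt
\;\Longrightarrow\;
\ln N = rt + C
\;\Longrightarrow\;
N(t) = N_0\, e^{rt}.
```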
  • Ecological population growth model: r, growth rate of the population; K, carrying capacity (due to resource limitation). Logistic model: dN/dt = r·N·(1 − N/K). (Chart: population vs. year, 2000–2010, saturating toward the carrying capacity K)
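A minimal sketch of fitting this logistic model to article counts with SciPy, using the closed-form solution of the ODE; the yearly counts below are made-up placeholders, not Wikipedia's actual figures:

```python
# Hypothetical sketch: fit the logistic growth model dN/dt = r*N*(1 - N/K)
# via its closed form N(t) = K / (1 + A*exp(-r*t)) to article counts.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, A, r):
    """Closed-form solution of the logistic ODE."""
    return K / (1.0 + A * np.exp(-r * t))

# Illustrative data: years since 2001 vs. article counts (not real figures).
t = np.arange(0, 10)
N = np.array([20e3, 100e3, 250e3, 600e3, 1.2e6, 1.8e6, 2.3e6, 2.7e6, 2.9e6, 3.0e6])

(K, A, r), _ = curve_fit(logistic, t, N, p0=[3.5e6, 100.0, 1.0])
print(f"estimated carrying capacity K = {K:.3g}, growth rate r = {r:.2f}/yr")
```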
  • Follows a logistic growth curve. (Chart: new articles) http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
  • Carrying capacity as a function of time, K(t). (Chart: population vs. year, 2000–2010)
  • Biological system: competition increases as the population hits the limits of the ecology; advantage goes to members of the population that have competitive dominance over others. Analogy: limited opportunities to make novel contributions; increased patterns of conflict and dominance
  • Highly skewed contribution pattern: the top 3% of users contribute 50%+ of edits; a lot of single-edit users. Five editor classes by monthly edit count (no bots; vandalism included in the analysis): 1000+ (editors who made more than 1000 edits in that month), 100–999, 10–99, 2–9, and 1
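A minimal sketch of the class assignment, assuming per-editor monthly edit counts are already computed (the editor names and counts are illustrative):

```python
# Hypothetical sketch: assign editors to the five classes by monthly edit count.
def editor_class(edits_this_month: int) -> str:
    if edits_this_month >= 1000:
        return "1000+"
    if edits_this_month >= 100:
        return "100-999"
    if edits_this_month >= 10:
        return "10-99"
    if edits_this_month >= 2:
        return "2-9"
    return "1"

monthly_edits = {"alice": 1342, "bob": 57, "carol": 1}  # illustrative counts
classes = {user: editor_class(n) for user, n in monthly_edits.items()}
print(classes)  # {'alice': '1000+', 'bob': '10-99', 'carol': '1'}
```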
  • Chart: monthly edits by editor class (in thousands)
  • Chart: monthly ratio of reverted edits
  • Two interpretations: overall increased resistance from the Wikipedia community to changing content, or disparity in the treatment of edits (occasional editors have been reverted at a higher rate). An example of increased patterns of conflict and dominance. Photo: http://www.flickr.com/photos/efan78/3619921561/
  • Bongwon Suh, Gregorio Convertino, Ed H. Chi, Peter Pirolli. WikiSym 2009
  • Characterization, Models, Evaluations, Prototypes
  • “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.” – Steve Carell, The Office
  • Content in Wikipedia can be added or changed by anyone. Because of this, WP has become one of the most important resources on the web: hundreds of thousands of contributors, over 2 million articles, the 5th most-used website (Alexa.com). Also because of this, it is viewed with skepticism by readers, the press, and researchers
  • Nothing
  • “Wikipedia, just by its nature, is impossible to trust completely. I don't think this can necessarily be changed.”
  • Risks of using Wikipedia: accuracy of content, motives of editors, expertise of editors, stability of the article, coverage of topics, quality of cited information. Readers have insufficient information to evaluate trustworthiness
  • Transparency of social dynamics can reduce conflict and coordination issues; attribution encourages contribution. WikiDashboard: a social dashboard for wikis (prototype system: http://wikidashboard.parc.com). A visualization for every wiki page shows the edit-history timeline and top individual editors; users can drill down into the activity history of specific editors and view edits side by side. Citation: Suh et al., CHI 2008 Proceedings
  • Characterization, Models, Evaluations, Prototypes
  • Surfacing information: numerous studies mine Wikipedia revision history to surface trust-relevant information (Adler & Alfaro, 2007; Dondio et al., 2006; Kittur et al., 2007; Viegas et al., 2004; Zeng et al., 2006; Suh, Chi, Kittur, & Pendleton, CHI 2008). But how much impact can this have on user perceptions in a system that is inherently mutable?
  • Hypotheses: (1) visualization will impact perceptions of trust; (2) compared to baseline, visualization will impact trust both positively and negatively; (3) visualization should have the most impact when there is high uncertainty about an article (low quality, high controversy)
  • Design: 3 × 2 × 2 (visualization: high-stability, low-stability, or baseline/none; article quality: high or low; controversy: controversial or uncontroversial). Articles used:

                        Controversial                   Uncontroversial
        High quality    Abortion, George Bush           Volcano, Shark
        Low quality     Pro-life feminism,              Disk defragmenter, Beeswax
                        Scientology and celebrities
  • Example: high-trust visualization (screenshot)
  • Example: low-trust visualization (screenshot)
  • Summary info: % from anonymous users; last change by anonymous or established user; stability of words
  • Graph: instability; revert activity
  • Method: users recruited via Amazon’s Mechanical Turk (253 participants, 673 ratings, 7 cents per rating; Kittur, Chi, & Suh, CHI 2008: crowdsourcing user studies). To ensure salience and valid answers, participants answered: In what time period was this article the least stable? How stable has this article been for the last month? Who was the last editor? How trustworthy do you consider the above editor?
  • Results, main effects of quality and controversy: high-quality articles > low-quality articles (F(1, 425) = 25.37, p < .001); uncontroversial articles > controversial articles (F(1, 425) = 4.69, p = .031)
  • Results, interaction effect of quality and controversy: high-quality articles were rated equally trustworthy whether controversial or not, while low-quality articles were rated lower when they were controversial than when they were uncontroversial
  • Results: (1) significant effect of visualization (high > low, p < .001); (2) the visualization has both positive and negative effects (high > baseline, p < .001; low > baseline, p < .01); (3) no interaction effect of visualization with either quality or controversy (robust across conditions)
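A minimal sketch of how F-tests like those above could be computed with statsmodels, on simulated per-rating data; the column names and effect sizes are illustrative, not the study's data:

```python
# Hypothetical sketch: two-way ANOVA of trust ratings over quality and
# controversy, in the style of the reported F-tests (data is simulated).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "quality": rng.choice(["high", "low"], size=400),
    "controversy": rng.choice(["controversial", "uncontroversial"], size=400),
})
# Simulated 7-point trust ratings with a main effect of quality.
df["trust"] = rng.normal(4, 1, 400) + (df["quality"] == "high") * 0.8

model = ols("trust ~ C(quality) * C(controversy)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p for main and interaction effects
```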
  • Characterization, Models, Evaluations, Prototypes: Methodology
  • User studies •  Getting input from users is important in HCI –  surveys –  rapid prototyping –  usability tests –  cognitive walkthroughs –  performance measures –  quantitative ratings
  • User studies •  Getting input from users is expensive –  Time costs –  Monetary costs •  Often have to trade off costs with sample size
  • Online solutions •  Online user surveys •  Remote usability testing •  Online experiments •  But still have difficulties –  Rely on practitioner for recruiting participants –  Limited pool of participants
  • Crowdsourcing •  Make tasks available for anyone online to complete •  Quickly access a large user pool, collect data, and compensate users •  Experiences at PARC: –  CSL UbiComp group –  ISL’s NLTT group
  • Crowdsourcing: make tasks available for anyone online to complete; quickly access a large user pool, collect data, and compensate users. Example: NASA Clickworkers, where 100k+ volunteers identified Mars craters from space photographs; aggregate results were “virtually indistinguishable” from those of expert geologists. (Chart: experts vs. crowds) http://clickworkers.arc.nasa.gov
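Purely as an illustration of crowd aggregation (not the actual Clickworkers pipeline), a sketch that averages clusters of nearby crater markings from many volunteers:

```python
# Hypothetical sketch: aggregate many noisy crater markings by averaging
# clusters of nearby points (illustrative stand-in for crowd aggregation).
import numpy as np

def aggregate_markings(points: np.ndarray, radius: float = 5.0) -> list:
    """Greedy clustering: group points within `radius` of a seed, return means."""
    remaining = list(map(tuple, points))
    clusters = []
    while remaining:
        seed = np.array(remaining.pop(0))
        members = [seed]
        for p in remaining[:]:
            if np.linalg.norm(np.array(p) - seed) <= radius:
                members.append(np.array(p))
                remaining.remove(p)
        clusters.append(np.mean(members, axis=0))
    return clusters

rng = np.random.default_rng(2)
true_center = np.array([100.0, 200.0])
markings = true_center + rng.normal(0, 2, size=(30, 2))  # 30 volunteers' clicks
print(aggregate_markings(markings))  # ~one cluster mean near (100, 200)
```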
  • Amazon’s Mechanical Turk: a market for “human intelligence tasks”. Typically short, objective tasks: tag an image, find a webpage, evaluate the relevance of search results. Users complete them for a few pennies each
  • Example task (screenshot)
  • Using Mechanical Turk for user studies:

                            Traditional user studies        Mechanical Turk
        Task complexity     Complex, long                   Simple, short
        Task subjectivity   Subjective, opinions            Objective, verifiable
        User information    Targeted demographics,          Unknown demographics,
                            high interactivity              limited interactivity

    Can Mechanical Turk be usefully used for user studies?
  • Task: assess the quality of Wikipedia articles. Started with ratings from expert Wikipedians (14 articles, e.g., “Germany”, “Noam Chomsky”; 7-point scale). Can we get matching ratings with Mechanical Turk?
  • Experiment 1 •  Rate articles on 7-point scales: –  Well written –  Factually accurate –  Overall quality •  Free-text input: –  What improvements does the article need? •  Paid $0.05 each
  • Experiment 1: Good news •  58 users made 210 ratings (15 per article) –  $10.50 total •  Fast results –  44% within a day, 100% within two days –  Many completed within minutes
  • Experiment 1: Bad news. Correlation between Turkers and Wikipedians was only marginally significant (r = .50, p = .07). Worse, 59% of responses were potentially invalid, and nearly 75% of these were done by only 8 users:

                            Experiment 1
        Invalid comments    49%
        <1 min responses    31%
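A minimal sketch of screening responses with rules like these; the time threshold mirrors the slide's "<1 min" category, while the field names and the comment heuristic are assumptions for illustration:

```python
# Hypothetical sketch: flag potentially invalid MTurk responses by
# empty/garbage comments and sub-minute completion times.
from dataclasses import dataclass

@dataclass
class Response:
    worker_id: str
    comment: str
    seconds_on_task: float

def is_suspect(r: Response) -> bool:
    too_fast = r.seconds_on_task < 60           # "<1 min responses"
    empty_comment = len(r.comment.split()) < 3  # crude "invalid comment" proxy
    return too_fast or empty_comment

responses = [
    Response("w1", "Needs more references in the history section.", 240),
    Response("w2", "good", 25),
]
suspects = [r.worker_id for r in responses if is_suspect(r)]
print(suspects)  # ['w2']
```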
  • Not a good start. Summary so far: only marginal correlation with experts, and heavy gaming of the system by a minority. Possible responses: make sure these gamers are not rewarded; ban them from doing your HITs in the future; create a reputation system [Dolores Labs]. Can we change how we collect user input?
  • Design changes (a rough sketch of a HIT built along these lines follows): (1) use verifiable questions to signal monitoring (“How many sections does the article have?”, “How many images does the article have?”, “How many references does the article have?”); (2) make malicious answers as high-cost as good-faith answers (“Provide 4–6 keywords that would give someone a good summary of the contents of the article”); (3) make verifiable answers useful for completing the task (tasks mirror how Wikipedians described evaluating quality: organization, presentation, references); (4) put verifiable tasks before subjective responses (first do objective tasks and summarization, only then evaluate subjective quality; ecological validity?)
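As a rough modern illustration (boto3 postdates this talk), one might post such a HIT like this; the external URL, reward, and other parameters are assumptions for the sketch, not the study's actual configuration:

```python
# Hypothetical sketch: post a rating HIT whose external form asks the
# verifiable questions first, then the subjective quality rating.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# The external form (placeholder URL) orders questions as in the talk:
# section/image/reference counts -> keyword summary -> 7-point ratings.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/rate-article?id=Germany</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Rate the quality of a Wikipedia article",
    Description="Count sections, images, and references, then rate quality.",
    Reward="0.05",                      # 5 cents, as in Experiment 1
    MaxAssignments=20,                  # ~20 ratings per article, as in Experiment 2
    AssignmentDurationInSeconds=1800,
    LifetimeInSeconds=172800,
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```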
  • Experiment 2: Results. 124 users provided 277 ratings (~20 per article); significant positive correlation with Wikipedians (r = .66, p = .01); smaller proportion of malicious responses; increased time on task:

                            Experiment 1    Experiment 2
        Invalid comments    49%             3%
        <1 min responses    31%             7%
        Median time         1:30            4:06
  • Generalizing to other user studies •  Combine objective and subjective questions –  Rapid prototyping: ask verifiable questions about content/design of prototype before subjective evaluation –  User surveys: ask common-knowledge questions before asking for opinions
  • Limitations of Mechanical Turk: no control of users’ environment (potential for different browsers, physical distractions; a general problem with online experimentation); not designed for user studies (difficult to do between-subjects designs, involves some programming); users (uncertainty about user demographics and expertise)
  • Conclusion •  Mechanical Turk offers the practitioner a way to access a large user pool and quickly collect data at low cost •  Good results require careful task design 1.  Use verifiable questions to signal monitoring 2.  Make malicious answers as high cost as good-faith answers 3.  Make verifiable answers useful for completing task 4.  Put verifiable tasks before subjective responses
  • Ed H. Chi (manager, PS), Peter Pirolli (RF), Lichan Hong, Bongwon Suh, Les Nelson, Rowan Nairn, Gregorio Convertino. Interns/Collaborators: Sanjay Kairam, Jilin Chen (UMinn), Michael Bernstein (MIT). http://asc-parc.blogspot.com
  • Logistic model: dN/dt = r·N·(1 − N/K), with r the growth rate and K the carrying capacity. When N is small, (1 − N/K) ≈ 1 and r dominates; when N approaches K, (1 − N/K) ≈ 0 and K dominates. (Chart: population vs. year, 2000–2010)
  • r-strategist: growth or exploitation; less-crowded niches / produce many offspring. K-strategist: conservation; strong competitors in crowded niches / invest more heavily in fewer offspring. Evolution cycle: resilience of an ecological system (Gunderson & Holling 2001)
  • Exponential growth model: dN/dt = r·N (growth rate depends on the current N). Ecological population growth model: dN/dt = r·N·(1 − N/K), with r the growth rate of the population and K the carrying capacity (due to resource limitation)
  • People-ware: growing resistance to changing content; coordination cost and bureaucracy. Knowledge-ware: availability of easy topics to write about. Tool-ware: quality of tools used by editors and admins. (Images: http://www.aerostich.com/, http://www.mikestreetmedia.co.uk/blog/wp-content/uploads/2009/01/knowledge.jpg, http://youropenbook.agitprop.co.uk/growing.php?p=2)