
Crowdsourcing using MTurk for HCI research

Ed Chi
Principal Scientist at Google
Mar. 27, 2012

  1. Crowdsourcing using Mechanical Turk for Human Computer Interaction Research Ed H. Chi Research Scientist Google (work done while at [Xerox] PARC) 1
  2. Historical Footnote •  De Prony, 1794, hired hairdressers (unemployed after the French Revolution; they knew only addition and subtraction) to create logarithmic and trigonometric tables. He managed the process by splitting the work into very detailed workflows. •  Humans were the "computers" used for math computation –  Grier, When Computers Were Human, 2005 •  Human computation example: Clairaut's astronomers computed the Halley Comet orbit (the three-body problem), dividing days of numeric computations across astronomers –  Grier, When Computers Were Human 2
  3. Talk in 3 Acts •  Act 1: –  How we almost failed in using MTurk?! –  [Kittur, Chi, Suh, CHI2008] •  Act II: –  Apply MTurk to visualization evaluation –  [Kittur, Suh, Chi, CSCW2008] •  Act III: –  Where are the limits? Aniket Kittur, Ed H. Chi, Bongwon Suh. Crowdsourcing User Studies With Mechanical Turk. In CHI2008. Aniket Kittur, Bongwon Suh, Ed H. Chi. Can You Ever Trust a Wiki? Impacting Perceived Trustworthiness in Wikipedia. In CSCW2008. 3
  4. Example Task from Amazon MTurk 4
  5. Using Mechanical Turk for user studies: Traditional user studies vs. Mechanical Turk. Task complexity: Complex vs. Simple; Long vs. Short. Task subjectivity: Subjective vs. Objective; Opinions vs. Verifiable. User information: Targeted demographics vs. Unknown demographics; High interactivity vs. Limited interactivity. Can Mechanical Turk be usefully used for user studies? 5
  6. Task •  Assess quality of Wikipedia articles •  Started with ratings from expert Wikipedians –  14 articles (e.g., "Germany", "Noam Chomsky") –  7-point scale •  Can we get matching ratings with Mechanical Turk? 6
  7. Experiment 1 •  Rate articles on 7-point scales: –  Well written –  Factually accurate –  Overall quality •  Free-text input: –  What improvements does the article need? •  Paid $0.05 each 7
  8. Experiment 1: Good news •  58 users made 210 ratings (15 per article) –  $10.50 total •  Fast results –  44% within a day, 100% within two days –  Many completed within minutes 8
  9. Experiment 1: Bad news •  Correlation between Turkers and Wikipedians only marginally significant (r=.50, p=.07) •  Worse, 59% potentially invalid responses (Experiment 1: invalid comments 49%; <1 min responses 31%) •  Nearly 75% of these done by only 8 users 9
  10. Not a good start •  Summary of Experiment 1: –  Only marginal correlation with experts –  Heavy gaming of the system by a minority •  Possible responses: –  Make sure these gamers are not rewarded –  Ban them from doing your HITs in the future –  Create a reputation system [Dolores Labs] •  Can we change how we collect user input? 10
  11. Design changes •  Use verifiable questions to signal monitoring –  How many sections does the article have? –  How many images does the article have? –  How many references does the article have? 11
  12. Design changes •  Use verifiable questions to signal monitoring •  Make malicious answers as high cost as good-faith answers –  Provide 4-6 keywords that would give someone a good summary of the contents of the article 12
  13. Design changes •  Use verifiable questions to signal monitoring •  Make malicious answers as high cost as good-faith answers •  Make verifiable answers useful for completing task –  Used tasks similar to how Wikipedians evaluate quality (organization, presentation, references) 13
  14. Design changes •  Use verifiable questions to signal monitoring •  Make malicious answers as high cost as good-faith answers •  Make verifiable answers useful for completing task •  Put verifiable tasks before subjective responses –  First do objective tasks and summarization –  Only then evaluate subjective quality –  Ecological validity? 14
  15. Experiment 2: Results •  124 users provided 277 ratings (~20 per article) •  Significant positive correlation with Wikipedians –  r=.66, p=.01 •  Smaller proportion of malicious responses •  Increased time on task (Experiment 1 vs. Experiment 2: invalid comments 49% vs. 3%; <1 min responses 31% vs. 7%; median time 1:30 vs. 4:06) 15
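A minimal sketch of how this kind of expert vs. Turker agreement check could be computed, assuming per-article expert ratings and raw Turker ratings have already been collected; the article names and numbers below are placeholders, not the study's data:

```python
# Sketch: correlate mean Turker ratings with expert Wikipedian ratings.
# The values below are illustrative placeholders, not the study's data.
from statistics import mean
from scipy.stats import pearsonr

expert = {"Germany": 6, "Noam Chomsky": 5, "Volcano": 4, "Beeswax": 3}  # 7-point expert ratings
turker_ratings = {                                                       # raw per-article Turker ratings
    "Germany": [6, 7, 5, 6],
    "Noam Chomsky": [5, 4, 6, 5],
    "Volcano": [4, 5, 4, 3],
    "Beeswax": [3, 2, 4, 3],
}

articles = sorted(expert)
x = [expert[a] for a in articles]
y = [mean(turker_ratings[a]) for a in articles]

r, p = pearsonr(x, y)   # the paper reports r = .66, p = .01 for Experiment 2
print(f"r = {r:.2f}, p = {p:.3f}")
```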
  16. Quick Summary of Tips •  Mechanical Turk offers the practitioner a way to access a large user pool and quickly collect data at low cost •  Good results require careful task design 1.  Use verifiable questions to signal monitoring 2.  Make malicious answers as high cost as good-faith answers 3.  Make verifiable answers useful for completing task 4.  Put verifiable tasks before subjective responses 16
  17. Generalizing to other MTurk studies •  Combine objective and subjective questions –  Rapid prototyping: ask verifiable questions about content/design of prototype before subjective evaluation –  User surveys: ask common-knowledge questions before asking for opinions •  Filtering for quality –  Put in a field for free-form responses and filter out data without answers –  Filter out results that came in too quickly –  Sort by WorkerID and look for cut-and-paste answers –  Look for outliers in the data that are suspicious 17
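A minimal sketch of the filtering heuristics above, assuming an MTurk batch results CSV with columns WorkerId, WorkTimeInSeconds, and Answer.comment; the answer column name and the 60-second threshold are assumptions to adjust for your HIT template:

```python
# Sketch: drop empty, too-fast, and cut-and-paste MTurk submissions.
import csv
from collections import defaultdict

MIN_SECONDS = 60          # flag results that came in suspiciously fast (assumed threshold)

def load_valid_rows(path):
    by_worker = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            by_worker[row["WorkerId"]].append(row)

    valid = []
    for worker, rows in by_worker.items():
        comments = [r["Answer.comment"].strip() for r in rows]
        # cut-and-paste detection: the same free-text answer repeated verbatim by one worker
        copied = len(comments) > 1 and len(set(comments)) == 1
        for r, comment in zip(rows, comments):
            too_fast = int(r["WorkTimeInSeconds"]) < MIN_SECONDS
            empty = comment == ""
            if not (copied or too_fast or empty):
                valid.append(r)
    return valid
```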
  18. Talk in 3 Acts •  Act 1: –  How we almost failed?! •  Act II: –  Applying MTurk to visualization evaluation •  Act III: –  Where are the limits? 18
  19. What would make you trust Wikipedia more? 20
  20. What is Wikipedia? "Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you're getting the best possible information." – Steve Carell, The Office 21
  21. What would make you trust Wikipedia more? Nothing 22
  22. What would make you trust Wikipedia more? Wikipedia, just by its nature, is impossible to trust completely. I don't think this can necessarily be changed. 23
  23. WikiDashboard •  Transparency of social dynamics can reduce conflict and coordination issues •  Attribution encourages contribution –  WikiDashboard: Social dashboard for wikis –  Prototype system: http://wikidashboard.parc.com •  Visualization for every wiki page showing edit history timeline and top individual editors •  Can drill down into activity history for specific editors and view edits to see changes side-by-side Citation: Suh et al. CHI 2008 Proceedings 24
  24. Hillary Clinton [WikiDashboard screenshot] 25
  25. Top Editor - Wasted Time R [WikiDashboard screenshot] 26
  26. Surfacing information •  Numerous studies mining Wikipedia revision history to surface trust-relevant information –  Adler & Alfaro, 2007; Dondio et al., 2006; Kittur et al., 2007; Viegas et al., 2004; Zeng et al., 2006 Suh, Chi, Kittur, & Pendleton, CHI2008 •  But how much impact can this have on user perceptions in a system which is inherently mutable? 27
  27. Hypotheses 1.  Visualization will impact perceptions of trust 2.  Compared to baseline, visualization will impact trust both positively and negatively 3.  Visualization should have most impact when high uncertainty about article •  Low quality •  High controversy 28
  28. Design •  3 x 2 x 2 design –  Visualization: high stability, low stability, baseline (none) –  Quality x Controversy, with example articles: high quality & controversial: Abortion, George Bush; high quality & uncontroversial: Volcano, Shark; low quality & controversial: Pro-life feminism, Scientology and celebrities; low quality & uncontroversial: Disk defragmenter, Beeswax 29
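A small sketch of how this 3 x 2 x 2 between-subjects crossing could be enumerated and incoming participants assigned to cells; the round-robin assignment and function names are illustrative, not the paper's exact procedure:

```python
# Sketch: enumerate the 3 x 2 x 2 design and hand out a condition per participant.
from itertools import product, cycle

visualizations = ["high-stability", "low-stability", "baseline"]
qualities = ["high", "low"]
controversies = ["controversial", "uncontroversial"]

articles = {  # example articles per (quality, controversy) cell, from the slide
    ("high", "controversial"): ["Abortion", "George Bush"],
    ("high", "uncontroversial"): ["Volcano", "Shark"],
    ("low", "controversial"): ["Pro-life feminism", "Scientology and celebrities"],
    ("low", "uncontroversial"): ["Disk defragmenter", "Beeswax"],
}

conditions = list(product(visualizations, qualities, controversies))   # 12 cells
assigner = cycle(conditions)                                           # round-robin assignment

def next_assignment():
    viz, qual, contro = next(assigner)
    return {"visualization": viz, "quality": qual,
            "controversy": contro, "articles": articles[(qual, contro)]}

print(next_assignment())
```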
  29. Example: High trust visualization 30
  30. Example: Low trust visualization 31
  31. Summary info •  % from anonymous users 32
  32. Summary info •  % from anonymous users •  Last change by anonymous or established user 33
  33. Summary info •  % from anonymous users •  Last change by anonymous or established user •  Stability of words 34
  34. Graph •  Instability 35
  35. Method •  Users recruited via Amazon's Mechanical Turk –  253 participants –  673 ratings –  7 cents per rating –  Kittur, Chi, & Suh, CHI 2008: Crowdsourcing user studies •  To ensure salience and valid answers, participants answered: –  In what time period was this article the least stable? –  How stable has this article been for the last month? –  Who was the last editor? –  How trustworthy do you consider the above editor? 36
  36. Results [Chart: mean trustworthiness rating (1-7) by article quality and controversy, for high-stability, baseline, and low-stability visualization conditions] Main effects of quality and controversy: •  high-quality articles > low-quality articles (F(1, 425) = 25.37, p < .001) •  uncontroversial articles > controversial articles (F(1, 425) = 4.69, p = .031) 37
  37. Results [Chart: same plot as the previous slide] Interaction effect of quality and controversy: •  high-quality articles were rated equally trustworthy whether controversial or not, while •  low-quality articles were rated lower when they were controversial than when they were uncontroversial. 38
  38. Results 1.  Significant effect of visualization: High-Stability > Low-Stability, p < .001 2.  Viz has both positive and negative effects: –  High-Stability > Baseline (p < .001) > Low-Stability, p < .01 3.  No interaction of visualization with either quality or controversy –  Robust across visualization conditions [Chart: trustworthiness ratings by quality, controversy, and visualization condition] 39
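For readers who want to run this kind of analysis themselves, a minimal sketch of a factorial ANOVA on trustworthiness ratings using statsmodels; the CSV file and column names are assumptions, not the paper's dataset:

```python
# Sketch: 3 (visualization) x 2 (quality) x 2 (controversy) ANOVA on ratings.
# Assumes a long-format CSV with one row per rating.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("ratings.csv")  # columns: rating, visualization, quality, controversy

model = ols("rating ~ C(visualization) * C(quality) * C(controversy)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)  # F and p values for main effects and interactions
```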
  41. Talk in 3 Acts •  Act 1: –  How we almost failed?! •  Act II: –  Applying MTurk to visualization evaluation •  Act III: –  Where are the limits? 42
  42. Limitations of Mechanical Turk •  No control of users' environment –  Potential for different browsers, physical distractions –  General problem with online experimentation •  Not yet designed for user studies –  Difficult to do between-subjects design –  May need some programming •  Hard to control user population –  Hard to control demographics, expertise 43
  43. Crowdsourcing for HCI Research •  Does my interface/visualization work? –  WikiDashboard: transparency vis for Wikipedia [Suh et al.] –  Replicating Perceptual Experiments [Heer et al., CHI2010] •  Coding of large amount of user data –  What is a Question in Twitter? [Sharoda Paul, Lichan Hong, Ed Chi] •  Incentive mechanisms –  Intrinsic vs. Extrinsic rewards: Games vs. Pay –  [Horton & Chilton, 2010 for Mturk] and [Ariely, 2009] in general 44
  44. Crowdsourcing for HCI Research •  Does my interface/visualization work? –  WikiDashboard: transparency vis for Wikipedia [Suh et al. VAST, Kittur et al. CSCW2008] –  Replicating Perceptual Experiments [Heer et al., CHI2010] •  Coding of large amount of user data –  What is a Question in Twitter? [S. Paul, L. Hong, E. Chi, ICWSM 2011] •  Incentive mechanisms –  Intrinsic vs. Extrinsic rewards: Games vs. Pay –  [Horton & Chilton, 2010 on MTurk] and Satisficing –  [Ariely, 2009] in general: Higher pay != Better work 45
  45. Managing Quality •  Quality through redundancy: Combining votes –  Majority vote [works best when worker quality is similar] –  Worker-quality-adjusted vote –  Managing dependencies •  Quality through gold data –  Advantageous with imbalanced datasets & bad workers •  Estimating worker quality (Redundancy + Gold) –  Calculate the confusion matrix and see if you actually get some information from the worker •  Toolkit: http://code.google.com/p/get-another-label/ Source: Ipeirotis, WWW2011 46
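A minimal sketch of the two simplest ideas on this slide, majority voting over redundant labels and estimating per-worker accuracy from gold items; this is a simplification of the confusion-matrix approach in the get-another-label toolkit, and the data structures are assumptions:

```python
# Sketch: combine redundant labels by majority vote and score workers against gold.
from collections import Counter, defaultdict

def majority_vote(labels_per_item):
    """labels_per_item: {item_id: [label, label, ...]} -> {item_id: winning label}"""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

def worker_accuracy(worker_labels, gold):
    """worker_labels: iterable of (worker_id, item_id, label); gold: {item_id: true label}"""
    hits, totals = defaultdict(int), defaultdict(int)
    for worker, item, label in worker_labels:
        if item in gold:                       # only gold items inform worker quality
            totals[worker] += 1
            hits[worker] += int(label == gold[item])
    return {w: hits[w] / totals[w] for w in totals}
```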
  46. Coding and Machine Learning •  Simple solution: –  Humans label training data –  Use training data to build model •  Integration with Machine Learning –  Build automatic classification models using crowdsourced data •  Data from existing crowdsourced answers -> automatic model (through machine learning) -> answer for each new case Source: Ipeirotis, WWW2011 47
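A minimal sketch of the "humans label training data, then build a model" loop using scikit-learn; the two toy examples and the choice of TF-IDF plus logistic regression are assumptions for illustration, not Ipeirotis' pipeline:

```python
# Sketch: train an automatic classifier on crowd-labeled examples, then use it
# to answer new cases without further crowdsourcing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# crowd_texts / crowd_labels would come from aggregated MTurk answers
crowd_texts = ["any good ipad app recommendations?", "good morning twitter!"]
crowd_labels = ["question", "not_question"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(crowd_texts, crowd_labels)

new_case = ["which team is better, raiders or steelers?"]
print(model.predict(new_case))   # automatic answer for the new case
```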
  47. Crowd Programming for Complex Tasks •  Decompose tasks into smaller tasks –  Digital Taylorism –  Frederick Winslow Taylor (1856-1915) –  1911 'Principles Of Scientific Management’ •  Crowd Programming Explorations –  MapReduce Models •  Kittur, A.; Smus, B.; and Kraut, R. CHI2011EA on CrowdForge. •  Kulkarni, Can, Hartmann, CHI2011 workshop & WIP –  Little, G.; Chilton, L.; Goldman, M.; and Miller, R. C. In KDD 2010 Workshop on Human Computation. 48
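A sketch of a CrowdForge-style partition / map / reduce flow for the article-writing example; post_hit is a hypothetical stand-in for whatever crowd platform API is used, not a real MTurk call, and it simply echoes placeholder strings so the flow can be traced end to end:

```python
# Sketch: decompose article writing into partition, map, and reduce crowd tasks.
def post_hit(instructions, n=1):
    """Hypothetical stand-in: post a HIT and return n worker answers."""
    return [f"[worker answer to: {instructions}]" for _ in range(n)]

def write_article(topic):
    # Partition: one worker proposes an outline (one section heading per line)
    outline = post_hit(f"Write an outline (section headings) for an article on '{topic}'.")[0]
    headings = [h.strip() for h in outline.splitlines() if h.strip()]

    # Map: for each heading, several workers each contribute one fact
    facts = {h: post_hit(f"Give one fact about '{h}' for an article on '{topic}'.", n=3)
             for h in headings}

    # Reduce: one worker per section turns the collected facts into a paragraph
    paragraphs = [post_hit(f"Turn these facts into a paragraph about '{h}': {facts[h]}")[0]
                  for h in headings]
    return "\n\n".join(paragraphs)

print(write_article("Crowdsourcing"))
```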
  48. Crowd Programming for Complex Tasks •  Crowd Programming Explorations –  Kittur, A.; Smus, B.; and Kraut, R. CHI2011EA on CrowdForge. –  Kulkarni, Can, Hartmann, CHI2011 workshop & WIP [Embedded page from the CHI 2011 Work-in-Progress paper on Turkomatic: partition/map/reduce decomposition of an essay-writing task, and a 16-question SAT posted as "Please solve the 16-question SAT located at http://bit.ly/SATexam"; workers were paid $0.10 to $0.40 per HIT, each subdivide or merge HIT was answered within 4 hours, the initial task was complete within 72 hours, and the decompositions produced by Turkers were overwhelmingly linear] 49
  49. Future Directions in Crowdsourcing •  Real-time Crowdsourcing –  Bigham, et al. VizWiz, UIST 2010 [Figure from VizWiz: six questions asked by participants (e.g., "What color is this pillow?", "What temperature is my oven set to?"), the photographs they took, and the answers received, with latencies in seconds] 50
  50. Future Directions in Crowdsourcing •  Real-time Crowdsourcing –  Bigham, et al. VizWiz, UIST 2010 •  Embedding of Crowdwork inside Tools –  Bernstein, et al. Soylent, UIST 2010 51
  51. Future Directions in Crowdsourcing •  Real-time Crowdsourcing –  Bigham, et al. VizWiz, UIST 2010 •  Embedding of Crowdwork inside Tools –  Bernstein, et al. Soylent, UIST 2010 •  Shepherding Crowdwork –  Dow et al. CHI2011 WIP [Embedded excerpt from Dow et al. on the design space for crowd feedback, e.g. timeliness: synchronous feedback while workers are engaged in a set of tasks vs. asynchronous feedback after workers have completed their work] 52
  52. Tutorials •  Matt Lease http://ir.ischool.utexas.edu/crowd/ •  AAAI 2011 (w HCOMP 2011): Human Computation: Core Research Questions and State of the Art (E. Law & Luis von Ahn) •  WSDM 2011: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You (Omar Alonso and Matthew Lease) –  http://ir.ischool.utexas.edu/wsdm2011_tutorial.pdf •  LREC 2010 Tutorial: Statistical Models of the Annotation Process (Bob Carpenter and Massimo Poesio) –  http://lingpipe-blog.com/2010/05/17/ •  ECIR 2010: Crowdsourcing for Relevance Evaluation. (Omar Alonso) –  http://wwwcsif.cs.ucdavis.edu/~alonsoom/crowdsourcing.html •  CVPR 2010: Mechanical Turk for Computer Vision. (Alex Sorokin and Fei‐Fei Li) –  http://sites.google.com/site/turkforvision/ •  CIKM 2008: Crowdsourcing for Relevance Evaluation (D. Rose) –  http://videolectures.net/cikm08_rose_cfre/ •  WWW2011: Managing Crowdsourced Human Computation (Panos Ipeirotis) –  http://www.slideshare.net/ipeirotis/managing-crowdsourced-human-computation 53
  53. Social Q&A on Twitter S. Paul, L. Hong, E. Chi, ICWSM 2011 54
  54. Why social Q&A? People turn to their friends on social networks because they trust their friends to provide tailored answers to subjective questions on niche topics. 55
  55. Research Questions •  What kinds of questions are Twitter users asking their friends? (types and topics of questions) •  Are users receiving responses to the questions they are asking? (number, speed, and relevancy of responses) •  How does the nature of the social network affect Q&A behavior? (size and usage of network, reciprocity of relationship) 58
  56. Identifying question tweets was challenging •  Advertisement framed as question •  Rhetorical question •  Missing context •  Used heuristics to identify candidate tweets that were possibly questions 59
  57. Classifying candidate tweets using Mechanical Turk •  Crowd-sourced question tweet identification to Amazon Mechanical Turk [example HIT with a control tweet shown] •  Each tweet classified by two Turkers •  Each Turker classified 25 tweets: 20 candidates and 5 control tweets •  Only accepted data from Turkers who classified all control tweets correctly 60
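A minimal sketch of the control-tweet filter described on this slide, assuming each Turker's answers and the control answer key are available as dictionaries; the data structures and function names are assumptions:

```python
# Sketch: keep a Turker's candidate classifications only if every control tweet
# (a tweet with a known correct label) was classified correctly.
def accept_worker(worker_answers, control_answers):
    """worker_answers, control_answers: {tweet_id: label}; controls only checked."""
    return all(worker_answers.get(t) == label for t, label in control_answers.items())

def filter_classifications(all_answers, control_answers):
    """all_answers: {worker_id: {tweet_id: label}} -> kept candidate labels per worker."""
    kept = {}
    for worker, answers in all_answers.items():
        if accept_worker(answers, control_answers):
            kept[worker] = {t: l for t, l in answers.items() if t not in control_answers}
    return kept
```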
  58. Overall method for filtering questions [Flow diagram: random sample of public tweets (1.2 million) -> applied heuristics to identify candidate tweets (12,000; 4,100 presented to Turkers) -> classified candidates using Mechanical Turk and tracked responses to each candidate tweet (counts on the diagram: 624 and 1,152)] 61
  59. Findings: Types and topics of questions •  Rhetorical (42%), factual (16%), and poll (15%) questions were common •  Significant percentage of personal & health (11%) questions [Pie charts: question types and question topics. Topic labels and shares as extracted: others 16%, uncategorized 5%, entertainment and professional (32% and 4%), restaurant/food 4%, current events 4%, greetings and technology (7% and 10%), personal & health 11%, ethics & philosophy 7%] Example questions: "How do you feel about interracial dating?" "Which team is better, raiders or steelers?" "Any good iPad app recommendations?" "In UK, when you need to see a specialist, do you need special forms or permission?" "Any idea how to lost weight fast?" 62
  60. Findings: Responses to questions •  Number of responses has a long-tail distribution [Histogram: log(number of questions) by number of answers, ranging from 0 to 147] •  Low (18.7%) response rate in general, but quick responses •  Most often reciprocity between asker and answerer was one-way (55%) •  Responses were largely (84%) relevant 63
  61. Findings: Social network characteristics •  Which characteristics of the asker predict whether she will receive a response? •  Network size and status in network are good predictors of whether the asker will receive a response •  Logistic regression modeling (structural properties): number of followers (+), number of days on Twitter (+), ratio of followers/followees (+), reciprocity rate (-); also examined: number of tweets posted, frequency of use of Twitter 64
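A minimal sketch of this kind of logistic regression using statsmodels; the CSV file and predictor column names are placeholders standing in for the paper's variables:

```python
# Sketch: logistic regression predicting whether a question tweet receives a
# response from the asker's network properties (column names are assumptions).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("askers.csv")  # got_response (0/1), followers, days_on_twitter,
                                # follower_followee_ratio, reciprocity_rate

model = smf.logit(
    "got_response ~ followers + days_on_twitter + follower_followee_ratio + reciprocity_rate",
    data=df,
).fit()
print(model.summary())  # positive coefficients correspond to the (+) predictors above
```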
  62. Thanks! •  chi@acm.org •  http://edchi.net •  @edchi •  Aniket Kittur, Ed H. Chi, Bongwon Suh. Crowdsourcing User Studies With Mechanical Turk. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2008), pp. 453-456. ACM Press, 2008. Florence, Italy. •  Aniket Kittur, Bongwon Suh, Ed H. Chi. Can You Ever Trust a Wiki? Impacting Perceived Trustworthiness in Wikipedia. In Proc. of Computer-Supported Cooperative Work (CSCW 2008), pp. 477-480. ACM Press, 2008. San Diego, CA. [Best Note Award] 66