A Research Tool for Organizations and Information Systems Scholars
Kevin Crowston
Syracuse University (crowston@syr.edu) and National Science Foundation (kcrowsto@nsf.gov)
http://crowston.syr.edu/


This research and presentation have been supported by the National Science Foundation, the research through
Grant 09–68470. Any opinions, findings, and conclusions or recommendations expressed in this material are
those of the author and do not necessarily reflect the views of the National Science Foundation.
Amazon Mechanical Turk
Potential benefits & limitations of AMT for research
Low cost to recruit subjects
Amazon handles payments, so Turkers are anonymous
Possible to recruit a diverse subject population
Can easily recruit multiple subjects at one time for collaborative experiments
Only basic features for selecting or filtering participants
No control over work setting or equipment
Hard to know how well Turker understands task
Limited opportunities for follow up
Concerns about reliability and validity of data
Positivist research concerns
[Diagram: Measurement is affected by noise (reliability) and bias (internal validity); Conclusion is affected by external validity]
Applications of AMT for research
Reliability & validity concerns
Research concern, with responses for Mode 1: Data about Turkers

Reliability (i.e., errors in responses)
  Use multiple indicators per construct
Internal validity (i.e., biased responses)
  Prevent or remove duplicate responses
  Consider effects of monetary compensation on research questions
Spam
  Examine time taken to perform task
  Examine pattern of responses
  Include check questions
External validity (i.e., generalizability)
  Not perfectly representative of Internet users, but not worse than alternatives
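The Mode 1 checks for spam and duplicate responses could be screened for in code along the following lines. This is only a minimal sketch: the `Response` record, the function names, the minimum-time threshold, and the "all ratings identical" pattern test are illustrative assumptions, not something specified in the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    """One submitted survey response (hypothetical record layout)."""
    worker_id: str        # anonymous Turker ID returned with the results
    seconds_taken: float  # time spent on the task
    check_answer: str     # answer given to the check question
    ratings: List[int]    # answers to the substantive rating items


def spam_flags(r: Response, min_seconds: float, expected_check: str) -> List[str]:
    """Return the reasons a response looks like spam (empty list if none)."""
    flags = []
    if r.seconds_taken < min_seconds:
        flags.append("too_fast")        # examine time taken to perform task
    if r.check_answer.strip().lower() != expected_check.strip().lower():
        flags.append("failed_check")    # include check questions
    if len(r.ratings) > 1 and len(set(r.ratings)) == 1:
        flags.append("straight_lined")  # examine pattern of responses
    return flags


def screen(responses: List[Response], min_seconds: float,
           expected_check: str) -> List[Response]:
    """Keep the first response per worker, then drop any flagged as spam."""
    seen = set()
    kept = []
    for r in responses:
        if r.worker_id in seen:
            continue                    # remove duplicate responses
        seen.add(r.worker_id)
        if not spam_flags(r, min_seconds, expected_check):
            kept.append(r)
    return kept
```

A sensible threshold depends on the task; piloting the task and looking at the distribution of completion times is one way to choose it.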
Applications of AMT for research
Reliability & validity concerns
Research concern, with responses for Mode 1: Data about Turkers and Mode 2: Data about research stimulus

Reliability (i.e., errors in responses)
  Mode 1:
    Use multiple indicators per construct
  Mode 2:
    Careful task design
    Prequalify Turkers
    Replicate work
    Use AMT to validate responses
Internal validity (i.e., biased responses)
  Mode 1:
    Prevent or remove duplicate responses
    Consider effects of monetary compensation on research questions
  Mode 2:
    Careful task design
Spam
  Mode 1:
    Examine time taken to perform task
    Examine pattern of responses
    Include check questions
  Mode 2:
    Same as mode 1
    Include gold standard data
    Compare responses to detect outliers
External validity (i.e., generalizability)
  Mode 1:
    Not perfectly representative of Internet users, but not worse than alternatives
  Mode 2:
    N/A
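Two of the Mode 2 responses, "Include gold standard data" and "Replicate work", lend themselves to a short illustration. The sketch below assumes each item is labelled by several Turkers, that a few items have known (gold) answers, and that replicated labels are combined by simple majority vote. The function names, the tuple layout, and the choice of majority voting are assumptions made for illustration; the slides do not prescribe a particular aggregation rule.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Label = Tuple[str, str, str]  # (worker_id, item_id, label), hypothetical layout


def gold_accuracy(labels: Iterable[Label], gold: Dict[str, str]) -> Dict[str, float]:
    """Fraction of gold-standard items each worker labelled correctly.

    Workers who saw no gold items are omitted from the result.
    """
    hits: Dict[str, int] = defaultdict(int)
    seen: Dict[str, int] = defaultdict(int)
    for worker, item, label in labels:
        if item in gold:
            seen[worker] += 1
            hits[worker] += int(label == gold[item])
    return {w: hits[w] / seen[w] for w in seen}


def majority_vote(labels: Iterable[Label], trusted: Set[str]) -> Dict[str, Optional[str]]:
    """Combine replicated labels from trusted workers; None marks a tie."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for worker, item, label in labels:
        if worker in trusted:
            votes[item][label] += 1
    result: Dict[str, Optional[str]] = {}
    for item, counts in votes.items():
        top = counts.most_common(2)
        tied = len(top) == 2 and top[0][1] == top[1][1]
        result[item] = None if tied else top[0][0]
    return result
```

Items that come back as `None` are candidates for further replication or for a second AMT task that asks other Turkers to validate the responses.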
Applications of AMT for research
Reliability & validity concerns
Research concern, with responses for Mode 1: Data about Turkers, Mode 2: Data about research stimulus, and Mode 3: Data about interaction

Reliability (i.e., errors in responses)
  Mode 1:
    Use multiple indicators per construct
  Mode 2:
    Careful task design
    Prequalify Turkers
    Replicate work
    Use AMT to validate responses
  Mode 3:
    Use multiple indicators per construct
    Prequalify Turkers
Internal validity (i.e., biased responses)
  Mode 1:
    Prevent or remove duplicate responses
    Consider effects of monetary compensation on research questions
  Mode 2:
    Careful task design
  Mode 3:
    Same as mode 1
    Design task to minimize demand
    Minimize time to reduce discussion of experiment
Spam
  Mode 1:
    Examine time taken to perform task
    Examine pattern of responses
    Include check questions
  Mode 2:
    Same as mode 1
    Include gold standard data
    Compare responses to detect outliers
  Mode 3:
    Same as mode 1
    Include objective-answer questions that demonstrate task performance
External validity (i.e., generalizability)
  Mode 1:
    Not perfectly representative of Internet users, but not worse than alternatives
  Mode 2:
    N/A
  Mode 3:
    Same as mode 1
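When a construct is measured with multiple indicators (Modes 1 and 3), one common way to check the reliability of the resulting scale is Cronbach's alpha. The slides do not name a statistic, so treat the choice of alpha, the function name, and the row-per-respondent layout below as assumptions. The sketch uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with sample variances.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach's alpha for one construct.

    rows: one list per respondent, holding that respondent's score on each
    of the k indicators (all rows must have the same length).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two indicators")
    if any(len(row) != k for row in rows):
        raise ValueError("all rows must have the same number of indicators")

    # Sample variance of each indicator across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Running it only on responses that have already passed spam screening avoids careless answers distorting the estimate.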
Applications of AMT for research
Reliability & validity concerns
Research concern, with responses for Mode 3: Data about interaction

Reliability (i.e., errors in responses)
  Use multiple indicators per construct
  Prequalify Turkers
Internal validity (i.e., biased responses)
  Prevent or remove duplicate responses
  Consider effects of monetary compensation on research questions
  Design task to minimize demand
  Minimize time to reduce discussion of experiment
Spam
  Examine time taken to perform task
  Examine pattern of responses
  Include check questions
  Include objective-answer questions that demonstrate task performance
External validity (i.e., generalizability)
  Not perfectly representative of Internet users, but not worse than alternatives
Conclusions
AMT can be a useful tool for research
  Cheap, quick access to a useful pool of subjects or assistants

But need to be conscious of the issues in use
  Issues depend on the kind of research you’re doing
  Many of the issues are similar to other kinds of research (e.g., reliability of measures)
  Internal validity: The issue unique to AMT is spammers
  External validity: Not perfectly representative, but not unrepresentative
Acknowledgements
Nathan Prestopnik and Andrea Wiggins

Developers: Gongying Pu, Shu Zhang, Trupti Rane,
Nathan Brown, Chris Duarte, Susan Furest, Yang Liu,
Nitin Mule, Sheila Sicilia, Jessica Smith, Peiyuan Sun,
Xueqing Xuan and Zhiruo Zhao

UMD: Anne Bowser, Jennifer Preece, Dana Rotman;
Smithsonian: Jennifer Hammock; Discover Life: Nancy
Lowe, John Pickering

NSF Grant 09–68470

Information Systems Scholars Research Tool AMT Benefits Limitations

  • 1. Information Systems Scholars A Research Tool for Organizations and Information Systems Scholars Kevin Crowston Syracuse University National Science Foundation crowston@syr.edu kcrowsto@nsf.gov http://crowston.syr.edu/ This research and presentation have been supported by National Science Foundation, the research through Grant 09–68470. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
  • 3.
  • 4. Potential benefits & limitations of AMT for research of AMT for research Low cost to recruit subjects Amazon handles payments, so Turkers are anonymous Possible to recruit a diverse subject population Can easily recruit multiple subjects at one time for collaborative experiments Only basic features for selecting or filtering participants No control over work setting or equipment Hard to know how well Turker understands task Limited opportunities for follow up Concerns about reliability and validity of data
  • 5. Positivist research concerns Measurement Conclusion External validity Noise Bias Reliability Internal validity
  • 6. Applications of AMT for research
  • 7. Reliability & validity concerns Research Mode 1: Data about concern Turkers Reliability (i.e., Use multiple indicators per errors in construct responses) Prevent or remove duplicate Internal validity responses (i.e., biased Consider effects of responses) monetary compensation on research questions Examine time taken to perform task Spam Examine pattern of responses Include check questions External Not perfectly representative validity (i.e., of Internet users, but not generalizability worse than alternatives )
  • 8. Applications of AMT for research
  • 9. Reliability & validity concerns Research Mode 1: Data about Mode 2: Data about concern Turkers research stimulus Careful task design Reliability (i.e., Prequalify Turkers Use multiple indicators per errors in Replicate work construct responses) Use AMT to validate responses Prevent or remove duplicate Internal validity responses (i.e., biased Consider effects of Careful task design responses) monetary compensation on research questions Examine time taken to Same as mode 1 perform task Include gold standard data Spam Examine pattern of Compare responses to detect responses outliers Include check questions External Not perfectly representative validity (i.e., of Internet users, but not N/A generalizability worse than alternatives )
  • 10. Applications of AMT for research
  • 11. Reliability & validity concerns Research Mode 1: Data about Mode 2: Data about Mode 3: Data about concern Turkers research stimulus interaction Careful task design Prequalify Reliability (i.e., Turkers Use multiple indicators per Use multiple indicators per errors in Replicate work construct construct responses) Use AMT to validate Prequalify Turkers responses Prevent or remove duplicate Same as mode 1 Internal validity responses Design task to minimize (i.e., biased Consider effects of demand responses) monetary compensation on Minimize time to reduce research questions discussion of experiment Examine time taken to Same as mode 1 Same as mode 1 perform task Include gold standard data Include objective- answer Spam Examine pattern of Compare responses to detect questions that demonstrate responses outliers task performance Include check questions External Not perfectly representative validity (i.e., of Internet users, but not N/A Same as mode 1 generalizability worse than alternatives )
  • 12.
  • 13.
  • 14.
  • 15. Applications of AMT for research
  • 16. Potential benefits & limitations of AMT for research of AMT for research Low cost to recruit subjects Amazon handles payments, so Turkers are anonymous Possible to recruit a diverse subject population Can easily recruit multiple subjects at one time for collaborative experiments Only basic features for selecting or filtering participants No control over work setting or equipment Hard to know how well Turker understands task Limited opportunities for follow up Concerns about reliability and validity of data
  • 17. Reliability & validity concerns (Mode 3: Data about interaction)
      Reliability (i.e., errors in responses): Use multiple indicators per construct; Prequalify Turkers
      Internal validity (i.e., biased responses): Prevent or remove duplicate responses; Consider effects of monetary compensation on research questions; Design task to minimize demand; Minimize time to reduce discussion of experiment
      Spam: Examine time taken to perform task; Examine pattern of responses; Include check questions; Include objective-answer questions that demonstrate task performance
      External validity (i.e., generalizability): Not perfectly representative of Internet users, but not worse than alternatives
  • 18. Conclusions
      AMT can be a useful tool for research
      Cheap, quick access to a useful pool of subjects or assistants
      But need to be conscious of the issues in use
      Issues depend on the kind of research you’re doing
      Many of the issues are similar to other kinds of research (e.g., reliability of measures)
      Internal validity: Unique issue to AMT is spammers
      External validity: Not perfectly representative but not unrepresentative
  • 19. Acknowledgements
      Nathan Prestopnik and Andrea Wiggins
      Developers: Gongying Pu, Shu Zhang, Trupti Rane, Nathan Brown, Chris Duarte, Susan Furest, Yang Liu, Nitin Mule, Sheila Sicilia, Jessica Smith, Peiyuan Sun, Xueqing Xuan and Zhiruo Zhao
      UMD: Anne Bowser, Jennifer Preece, Dana Rotman; Smithsonian: Jennifer Hammock; Discover Life: Nancy Lowe, John Pickering
      NSF Grant 09–68470
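
The mode 1 checks listed on the slides (prevent or remove duplicate responses, examine time taken, include check questions) could be combined into a simple screening pass. The following is only a minimal Python sketch: the field names `worker_id`, `seconds`, and `check_answer`, the expected answer `"never"`, and the 30-second threshold are all illustrative assumptions, not values from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Response:
    worker_id: str     # hypothetical field: ID of the Turker who submitted the HIT
    seconds: float     # hypothetical field: time taken to perform the task
    check_answer: str  # hypothetical field: answer to an embedded check question


def screen_responses(responses, expected_check="never", min_seconds=30.0):
    """Keep the first response from each worker if it passes both tests.

    Later responses from the same worker are rejected as duplicates.
    The threshold and expected answer are assumptions to tune per study.
    """
    seen = set()
    kept, rejected = [], []
    for r in responses:
        if r.worker_id in seen:
            rejected.append((r, "duplicate"))
            continue
        seen.add(r.worker_id)
        if r.seconds < min_seconds:
            rejected.append((r, "too fast"))
        elif r.check_answer.strip().lower() != expected_check:
            rejected.append((r, "failed check question"))
        else:
            kept.append(r)
    return kept, rejected
```

Returning the rejected responses with a reason, rather than dropping them silently, makes it possible to report how many responses each check removed.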

Editor's Notes

  1. AMT is a “marketplace for work that requires human intelligence”. It is an example of crowdsourcing, meaning “outsourcing a function to a large but undefined group of people via an open call”. It is the largest and best-characterized such marketplace. Tasks on AMT are typically small.
  2. Unit of work is called a HIT. Example of a page of HITs: note that there are 2760 HITs available, many with multiple instances. Most are for small amounts of money: “25 percent of the HITs created on Mechanical Turk have a price tag of just US$0.01, 70 percent have a reward of $0.05 or less, and 90 percent pay less than $0.10”. Average pay is US$4.80/hour for tasks. Set the price too low and the work is slower to finish; too high and the task seems bogus. Tasks can be done entirely on Amazon’s system or link to your own system. Results returned to the poster include the ID of the Turker and the answers to the questions.
  3. We want to distinguish different uses of AMT with their associated research concerns. The first case is to collect data about Turkers, e.g., as a proxy for Internet users in a survey or experiment. Turkers should only do the HIT once.
  4. Reliability: as in any survey. Internal validity: studies of AMT have reported that answers seem to be truthful, when given. A one-time survey doesn’t offer much opportunity for spam, but a survey can be completed quickly, especially if you don’t read the questions before answering. Check question: “when watching TV how often had you suffered a fatal heart attack?” Same failure rate as in other surveys (about 5%). External validity is a key concern. The demographics of Turkers are a bit different from those of the general population or the population of Internet users. Turkers were younger than average Internet users. Their self-reported education was higher than average, but income lower. Most were single and without children. Furthermore, there are differences within the pool of Turkers, with resulting variability in other capabilities, e.g., the level of English ability. US Turkers were about 2/3 female, while Indian Turkers were about 70% male. Still, Turkers may not be appreciably less representative of the Internet or general population than other commonly-used subject pools, such as college students or subjects recruited on the Internet. Turkers are human subjects for the research, so the rules and ethical principles that govern human subjects research apply, e.g., informed consent.
  5. A second possibility for AMT research is that the researcher is studying some collection of objects that need humans to provide data about them. A single Turker could provide data on many objects. E.g., coding data; Karin Connely used AMT to check if messages had a name or not, or if a document has an answer to a question or not.
  6. Reliability is the main issue. Tasks have to be carefully defined, since the only training is what’s on AMT. Spam can be a big problem: researchers estimated that 30% of the responses to a task they posted were provided by spammers; the spammers were a small number of the total Turkers, but posted many bogus responses. The task should be designed “such that completing it accurately and in good faith requires as much or less effort than non-obvious random or malicious completion.” Ethical concerns regarding the use of human subjects in research do not apply. Instead, the Turkers can be seen as out-sourced employees, raising a different set of concerns about the fairness of such employment.
  7. Data come from the interaction of people with a stimulus. For example, a common research use of AMT is to recruit users for tests of IT systems in order to get usage data and user feedback. Subjects’ responses to the stimulus are expected to be different, rather than simply reflecting an underlying truth inherent in the stimulus as in mode 2.
  8. Validating subjective data for reliability is inherently difficult. Some of the techniques from the other modes may carry over. As in mode 1, it may be possible to use multiple items per construct to assess reliability. As in mode 2, careful task design and prequalification of Turkers will be useful. However, since many different answers could plausibly be correct [6], it is not possible to use “gold standard” data, to spot check results or to use replication to arrive at a consensus. These limitations would seem to limit the usefulness of AMT for interpretivist research in particular. There is also a potentially higher risk of demand effects, e.g., Turkers being overly positive about a system because they think it will increase their odds of getting paid. For spam, one approach is to include a few questions that can be used to check that the work required for the task is actually being performed, even if the work itself cannot be checked.
  9. Key questions: Can novices actually classify moths? Will they do it?
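
For the mode 2 techniques mentioned in the notes above (gold standard data and replicated work), one possible way to combine them is to score each Turker on the gold standard items and then take a majority vote among the Turkers who pass. The following is only a minimal Python sketch: the `(worker_id, item_id, label)` tuple layout, the function names, and the 0.8 accuracy threshold are illustrative assumptions, not details from the presentation.

```python
from collections import Counter, defaultdict


def worker_accuracy(labels, gold):
    """Fraction of gold standard items each worker labeled correctly.

    labels: iterable of (worker_id, item_id, label) tuples (assumed layout).
    gold:   dict mapping item_id to the known correct label.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for worker, item, label in labels:
        if item in gold:
            totals[worker] += 1
            hits[worker] += int(label == gold[item])
    return {w: hits[w] / totals[w] for w in totals}


def consensus(labels, gold, min_accuracy=0.8):
    """Majority vote per item, ignoring workers below min_accuracy on gold items."""
    acc = worker_accuracy(labels, gold)
    votes = defaultdict(Counter)
    for worker, item, label in labels:
        # Assumption: a worker who saw no gold items is kept, since there is
        # no evidence against them.
        if acc.get(worker, 1.0) >= min_accuracy:
            votes[item][label] += 1
    # Ties go to the label that was counted first for that item.
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}
```

As note 8 points out, this kind of check only makes sense when items have a single correct answer, so it would not carry over to mode 3.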