- 1. Analyzing Responses to Likert Items<br />An Exploration of Data from a Credibility Study Involving WikiDashboard<br />(http://wikidashboard.parc.com)<br />by Sanjay Kairam<br />
- 2. WikiDashboard Study<br />The System<br />The Study<br />The Data<br />
- 3. WikiDashboard<br />“Social Dynamic Analysis Tool” for Wikipedia<br />Michael Scott (The Office): “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information”<br />What happens when we see who is doing the editing?<br />
- 4. WikiDashboard (Close-Up)<br />
- 5. WikiDashboard Study<br />Study conducted on Amazon Mechanical Turk<br />N = 288 subjects<br />Subjects paid $0.08 / HIT<br />“Please read and evaluate this Wikipedia Article.”<br />
- 6. Experiment Conditions<br />Participants each placed in 1 of 3 conditions (each N = 96):<br />Wiki Only (WO)<br />Wiki + History (WH)<br />WikiDashboard (WD)<br />
- 7. Articles Used<br />Each subject read 1 (of 8 possible) Wikipedia articles.<br />Article “Quality”:<br />“Low-Quality” articles were those flagged as “B-Class” or “C-Class” by the Wikipedia community.<br />“High-Quality” articles were those which had at one time been “Featured Articles”.<br />Article “Controversiality”:<br />“Controversial” articles were those on the extensive “List of Controversial Articles”.<br />
- 8. Survey<br />Self-Reported Expertise<br />“How familiar are you with the topic discussed on this Wikipedia page?”<br />Manipulation/Quality Checks<br />“In 5-20 words, please describe what this Wikipedia page is about.”<br />“Please describe one fact from the article that you found interesting.” (WO)<br />“Please name at least one user (by username or IP address) who has made multiple edits to this page.” (WH, WD)<br />
- 9. Credibility Assessment<br />Assessing agreement with these statements:<br />“I believe that the information on this page is accurate.” (Accuracy)<br />“I believe that the information on this page is objective.” (Objectivity)<br />“I believe that the information on this page is current and up-to-date.” (Currency)<br />“I believe that this page fully covers the relevant information on the topic.” (Coverage)<br />“I trust the information on this page.” (Trust)<br />
- 10. Likert Item Responses<br />Participants answered using a 5-point scale:<br />-2: “Strongly Disagree”<br />-1: “Somewhat Disagree”<br />0: “Neither Agree nor Disagree”<br />+1: “Somewhat Agree”<br />+2: “Strongly Agree”<br />Now, what do we do with this data?<br />
- 11. Analyzing Likert Item Responses<br />Very often, we see papers reporting Likert responses using means:<br />What is the average of 1 “Somewhat Agree” and 3 “Somewhat Disagree”s?<br />Hint: It’s not “Somewhat Disagree and a Half”<br />In this case, what does a “mean” mean?<br />For the same reason, a standard ANOVA, which assumes interval-scaled and roughly normally distributed data, is usually not appropriate either, though people still try!<br />
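A small illustration of the problem, using the -2 to +2 coding above and only Python's standard library. The arithmetic mean of the slide's example lands on -0.5, which is not any response category, whereas the low median and the mode always return a value a participant could actually have chosen. This is a sketch of the point being made, not the analysis used in the study.

```python
from statistics import mean, median_low, mode

# Response coding from the 5-point scale on the previous slide
LABELS = {
    -2: "Strongly Disagree",
    -1: "Somewhat Disagree",
    0: "Neither Agree nor Disagree",
    1: "Somewhat Agree",
    2: "Strongly Agree",
}

# The example from the slide: one "Somewhat Agree" and three "Somewhat Disagree"s
responses = [1, -1, -1, -1]

avg = mean(responses)        # falls between two labels; no category corresponds to it
mid = median_low(responses)  # always one of the observed codes
most = mode(responses)       # most frequent observed code

print(avg)
print(LABELS[mid])
print(LABELS[most])
```

Reporting medians (or the full distribution of responses per condition) sidesteps the question of what a fractional category means.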
- 12. Options for Analysis<br />Non-Parametric Tests for Ordinal Data<br />Conversion to an Interval Scale<br />Aggregating Items<br />
- 13. Mann-Whitney U Test<br />Also called the “Mann-Whitney-Wilcoxon”, “Wilcoxon Rank-Sum”, or “Wilcoxon-Mann-Whitney” test.<br />A non-parametric test of whether observations from one of two independent samples tend to be larger than observations from the other.<br />http://en.wikipedia.org/wiki/Mann-Whitney_U<br />
- 14. Mann-Whitney U Test<br />Assumptions:<br />All observations from both groups are independent of each other.<br />The responses are ordinal or continuous measurements.<br />Under the null hypothesis, the two populations have the same distribution, so an observation from pop. X is as likely to exceed an observation from pop. Y as the reverse.<br />Under the alternative hypothesis, the probability of an observation from pop. X exceeding an observation from pop. Y is not equal to 0.5.<br />http://en.wikipedia.org/wiki/Mann-Whitney_U<br />
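To make the hypotheses concrete, the sketch below computes U directly by counting pairs (ties count as one half) and checks it against SciPy's `scipy.stats.mannwhitneyu`, which in recent SciPy versions (1.7+) reports U for the first sample. Dividing U by n1·n2 estimates P(X > Y) + 0.5·P(X = Y), which is 0.5 under the null hypothesis. The ratings are made-up values for illustration, not data from the study.

```python
from scipy.stats import mannwhitneyu


def u_by_pair_counting(x, y):
    """U for sample x: number of pairs (xi, yj) with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical ratings (-2..+2) on one item; NOT data from the study.
high_quality = [2, 1, 1, 2, 0, 1, 2, 1]
low_quality = [0, -1, 1, 0, -2, 1, 0, -1]

u_manual = u_by_pair_counting(high_quality, low_quality)
# Estimate of P(X > Y) + 0.5 * P(X = Y); about 0.5 if the null hypothesis holds
prob_superiority = u_manual / (len(high_quality) * len(low_quality))

# Two-sided test; with tied ranks SciPy uses a tie-corrected normal approximation
result = mannwhitneyu(high_quality, low_quality, alternative="two-sided")
print(u_manual, result.statistic, round(prob_superiority, 3), result.pvalue)
```

The pair-counting version is quadratic in sample size and only meant to show what U measures; the library call uses ranks.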
- 15. Kruskal-Wallis ANOVA<br />What if we want to test more than 2 groups (as we do, given our 3 experimental conditions)?<br />Kruskal-Wallis ANOVA is an extension of Mann-Whitney U to 3 or more groups.<br />It is also non-parametric, though it does assume that all of the group distributions have a similar underlying shape.<br />http://en.wikipedia.org/wiki/Kruskal-Wallis_one-way_analysis_of_variance<br />
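A minimal sketch of a Kruskal-Wallis test across the three conditions (WO, WH, WD) using `scipy.stats.kruskal`, which takes one sample per group and applies a correction for tied ranks. The ratings below are hypothetical. A significant H only says that at least one condition differs; saying which one would need pairwise follow-ups (for example Mann-Whitney tests with a multiple-comparison correction).

```python
from scipy.stats import kruskal

# Hypothetical Trust ratings (-2..+2) per condition; NOT data from the study.
ratings = {
    "WO": [1, 1, 2, 0, 1, 2, 1, 0],
    "WH": [0, 1, -1, 1, 0, 1, 0, -1],
    "WD": [0, -1, 0, 1, -1, 0, 1, -2],
}

# One positional argument per group; the result unpacks to (statistic, pvalue)
h_stat, p_value = kruskal(*ratings.values())
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```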
- 16. Analysis Using Non-Parametric Tests<br />Do participants actually notice differences in article quality?<br />Mann-Whitney: Significant effects of article quality on ratings of Accuracy (p < 0.001), Coverage (p < 0.01), Currency (p < 0.001), and Trust (p < 0.001), with a marginally significant effect on Objectivity (p < 0.096).<br />Kruskal-Wallis: Significant effects on ratings of Accuracy (p < 0.001), Coverage (p < 0.012), Currency (p < 0.001), and Trust (p < 0.001), with no significant effect on Objectivity.<br />
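The per-item analysis above can be sketched as a loop over the five credibility items, running both tests on the high- and low-quality groups. The data here are randomly generated stand-ins (the group size of 40 is arbitrary), so the printed p-values mean nothing about WikiDashboard; the point is only the structure of the analysis.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

ITEMS = ["Accuracy", "Objectivity", "Currency", "Coverage", "Trust"]

# Synthetic stand-in data: one entry per participant, ratings coded -2..+2.
# Randomly generated for illustration, NOT the study's data.
rng = np.random.default_rng(0)
n = 40  # hypothetical participants per quality group
quality = np.array(["high"] * n + ["low"] * n)
ratings = {item: rng.integers(-2, 3, size=2 * n) for item in ITEMS}

results = {}
for item in ITEMS:
    high = ratings[item][quality == "high"]
    low = ratings[item][quality == "low"]
    p_mw = mannwhitneyu(high, low, alternative="two-sided").pvalue
    p_kw = kruskal(high, low).pvalue
    results[item] = (p_mw, p_kw)
    print(f"{item:12s} Mann-Whitney p = {p_mw:.3f}   Kruskal-Wallis p = {p_kw:.3f}")
```

With only two groups, the Kruskal-Wallis H equals the square of the Mann-Whitney normal-approximation z statistic, so in this setup the two-sided p-values coincide when SciPy's continuity correction is turned off (`use_continuity=False`); with the default correction they differ slightly.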
- 17. Sample Boxplots: Ratings by Article Quality<br />Accuracy<br />Coverage<br />
- 18. Analysis Using Non-Parametric Tests<br />Do participants notice differences in how “controversial” an article is?<br />Mann-Whitney: Significant effects on ratings of Coverage (p < 0.039), Currency (p < 0.039), Objectivity (p < 0.021), and Trust (p < 0.021), with no significant effect on ratings of Accuracy.<br />Kruskal-Wallis: Significant effect on ratings of Objectivity (p < 0.042) and marginally significant effects on Coverage (p < 0.077) and Currency (p < 0.083), but no significant effect on Accuracy or Trust.<br />
- 19. Analysis Using Non-Parametric Tests<br />What we really want to know, however, is whether using WikiDashboard or Wiki + History makes participants more sensitive to article quality or controversiality than participants using Wikipedia on its own.<br />Both tests, though, only compare groups defined by a single variable, so they cannot test these interaction effects (e.g., condition × article quality).<br />
- 20. Conversion to Interval Scale<br />If there were a way to map our Likert item responses onto an interval scale, we could use more familiar and more powerful statistical tests.<br />If the mapped data turned out to be normally distributed, for instance, we could use our usual parametric tests such as MANOVA, which would let us test for these interaction effects.<br />
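Checking whether mapped data are plausibly normal could look like the sketch below, which uses the Shapiro-Wilk test from `scipy.stats.shapiro`. Both the category-to-score mapping and the responses are hypothetical placeholders invented for this example; they are not the Snell-converted values or the study's data.

```python
import numpy as np
from scipy.stats import shapiro

# HYPOTHETICAL category-to-score mapping, made up for illustration only;
# these are NOT the Snell-converted values from the study.
hypothetical_scores = {-2: -1.2, -1: -0.3, 0: 0.5, 1: 1.4, 2: 2.5}

# Synthetic responses (-2..+2) for one item, NOT study data.
rng = np.random.default_rng(1)
responses = rng.integers(-2, 3, size=96)
mapped = np.array([hypothetical_scores[int(r)] for r in responses])

# Shapiro-Wilk: a small p-value is evidence against normality;
# a large p-value does not prove the data are normal.
stat, p_value = shapiro(mapped)
print(f"W = {stat:.3f}, p = {p_value:.4f}")
```

Because mapped Likert data still take only five distinct values, formal normality tests will often reject at moderate sample sizes, so histograms or Q-Q plots of the mapped scores are usually worth inspecting alongside the test.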
- 21. Conversion to Interval Scale<br />E.J. Snell (1964) describes a procedure for mapping ordered data, like Likert responses, to an assumed underlying continuous scale of measurement.<br />At the end, he emphasizes that “the usefulness of the proposed method depends upon the assumption that the underlying scale of measurement can be transformed to produce a normal distribution.”<br />Snell, E.J. A Scaling Procedure for Ordered Categorical Data, Biometrics 20(3), pp. 592-607 (1964).<br />http://www.jstor.org/stable/2528498<br />
- 22. Utilizing the Snell Conversion<br />We used this procedure to transform the data: it mapped each response (originally coded from -2 to +2) to a new score, with the new scores ranging from roughly -1.00 to +4.05.<br />Essentially, it looks as if only the distances between the values have changed.<br />
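Snell's procedure itself is more involved and is not reproduced here. As a rough illustration of the same idea (ordered categories mapped to points on an assumed continuous scale, with the order preserved but the spacing changed), the sketch below swaps in a simpler normal-scores scaling: each category gets the standard-normal quantile of the midpoint of its cumulative proportion. The function name and the responses are hypothetical, and this method will generally not reproduce the values Snell's method gives.

```python
from collections import Counter
from statistics import NormalDist


def normal_scores(responses, categories=(-2, -1, 0, 1, 2)):
    """Hypothetical helper: map each ordered category to the standard-normal
    quantile of the midpoint of its cumulative proportion.
    This is NOT Snell's (1964) procedure; it is a simpler stand-in."""
    counts = Counter(responses)
    n = len(responses)
    scores = {}
    cumulative = 0
    for c in categories:
        k = counts.get(c, 0)
        if k == 0:
            continue  # categories nobody chose get no score
        midpoint = (cumulative + k / 2) / n  # strictly between 0 and 1
        scores[c] = NormalDist().inv_cdf(midpoint)
        cumulative += k
    return scores


# Hypothetical responses to one item, NOT study data
responses = [2, 1, 1, 0, 1, 2, -1, 1, 0, 2, 1, -2, 1, 0, 2, 1]
scores = normal_scores(responses)
rescaled = [scores[r] for r in responses]
print({c: round(s, 2) for c, s in scores.items()})
```

The spacing between categories depends on how often each one was chosen, which is one reason rescaled values can look like the original data with only the distances changed.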
- 23. Histogram: Original Data<br />
- 24. Histogram: Snell-Converted Data<br />
- 25. Aggregating Likert Items<br />If we consider the various Likert items to be different measurements of a certain underlying trait (Credibility), then can we sum them and run parametric statistical tests?<br />Haven’t tried this yet – is this a valid approach?<br />
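One common first check before summing items into a single scale is internal consistency, often summarized with Cronbach's alpha: α = k/(k-1) · (1 - Σ item variances / variance of the summed score). The sketch below computes the summed credibility score and alpha with NumPy on a handful of hypothetical responses. A high alpha suggests the items move together; it does not by itself settle whether the summed score can be treated as interval-scaled.

```python
import numpy as np


def cronbach_alpha(items):
    """items: 2-D array, rows = participants, columns = Likert items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # sample variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Hypothetical responses (-2..+2) to the five credibility items, NOT study data.
# Columns: Accuracy, Objectivity, Currency, Coverage, Trust
responses = np.array([
    [2, 1, 2, 1, 2],
    [1, 0, 1, 1, 1],
    [-1, -1, 0, -2, -1],
    [0, 1, 0, 0, 0],
    [2, 2, 1, 1, 2],
    [-2, -1, -1, -1, -2],
])

credibility = responses.sum(axis=1)  # summed scale, possible range -10..+10
alpha = cronbach_alpha(responses)
print(credibility, round(alpha, 2))
```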
- 26. Analyzing Responses to Likert Items<br />by Sanjay Kairam<br />Email: sanjay.kairam@gmail.com<br />Twitter: @skairam<br />

