Royalkapila

ISOJ 2007

  1. What’s on Wikipedia, and What’s Not…? Completeness of Information on the Online Collaborative Encyclopedia
     Cindy Royal, Ph.D., Assistant Professor, Texas State University School of Journalism and Mass Communication
     Deepina Kapila, Graduate Student, Texas State University School of Journalism and Mass Communication
  2. Introduction - Wikipedia
     • Wikipedia (www.wikipedia.com), billed as “the free encyclopedia,” launched on the Web in 2001.
     • Since then, it has become the Web’s third most popular news and information source.
     • It uses wiki software, which allows a community of users to develop and monitor content.
     • Wikipedia operates on the assumption that the public will act as a policing force, keeping content reliable and up to date.
  3. Introduction - Research
     • Denning et al. (2005) listed the risks inherent in Wikipedia’s model: accuracy, motives, uncertain expertise, volatility, coverage, and sources.
     • Bopp and Smith (2001) state that coverage in an encyclopedia should be “even across all subjects.”
     • Shoemaker and Reese (1995) identified the individual as an influence on news content. Web users and content creators tend to be young.
     • Tankard and Royal (2005) found inherent biases in Web content, based on systematic searches.
  4. Research Questions
     This project measures the content of Wikipedia against various indexes, or standards, of completeness to identify and uncover potential inherent biases. We ask:
     1. Are there systematic gaps or biases in the overall presentation of information on Wikipedia?
     2. Is recency (or currency) a predictor of the amount of information on Wikipedia?
     3. Is importance a predictor of the amount of information on Wikipedia?
     4. Is population a predictor of the amount of information about individual countries on Wikipedia?
     5. Is economic power a predictor of the amount of information about individual corporations on Wikipedia?
  5. Method
     • Several systematic searches were conducted on Wikipedia, using recency, importance, country population, and economic power as predictors.
     • For each topic, every article was visited, the relevant content was highlighted, and the words in the selection were counted.
     • Word counts were recorded in a spreadsheet, and the items were plotted on two charts:
        • in ascending order of word count
        • in order of the predictor variable
  6. Topics Covered
     • Years (1900-2010)
     • Academy Award-Winning Films
     • Time Magazine’s Person of the Year
     • #1 Song on Billboard Top 100 (1940-2006)
     • Encyclopedia Terms
     • Countries in the United Nations
     • Fortune 1000 Companies
  7. Results - Years
     [Charts: word count per article (0 to 12,000), in ascending order and in chronological order, 1900-2010]
     - Backward L-shaped curve
     - Clear progression of article length with year, with a dramatic increase for years after 2001
     - Future years understandably had shorter word counts
     - Spearman correlation between variables: .79
  8. Results - Films
     [Charts: word count per article (0 to 9,000), in ascending order and in chronological order, 1928 onward]
     - A backward L-shaped curve is apparent.
     - With few exceptions (e.g., Gone with the Wind, 1939, and Casablanca, 1943), the results show a progression favoring more recent films. Recency is important, but certain films transcend time and are deemed important for other reasons.
     - Average word count for films since 2001 was 80% higher than for films before 2001.
     - Spearman correlation between variables: .49; this increased to .62 simply by removing 2 outliers.
  9. Results - Person of the Year
     [Charts: word count per article (0 to 18,000), in ascending order and in chronological order, 1927 onward]
     - Softer backward L-shaped curve
     - The even distribution shows that this bias is unrelated to recency; it reflects another variable, importance.
     - Spearman correlation between variables: 0; there was no relationship with time.
  10. Results - Billboard Top 100
      [Charts: word count per article (0 to 14,000), in ascending order and in chronological order, 1940-2006]
      - Backward L-shaped curve
      - Although average word count was 32% higher for artists since 1990, the distribution shows a trend similar to films, in that some artists transcend time.
      - Spearman correlation between variables: .40 (after eliminating 2 outliers)
  11. Encyclopedia Terms
      [Chart: word count per term (0 to 14,000), in ascending order]
      - Comparison between Encyclopedia Britannica and Wikipedia articles
      - Backward L-shaped distribution apparent
      - Spearman correlation comparing inches of content in Encyclopedia Britannica with word count in Wikipedia: .26
      - Of 100 terms, 14 were not represented in Wikipedia
  12. Results - UN Countries
      [Charts: word count per country (0 to 14,000), in ascending order and ordered by population]
      - Backward L-shaped curve: although the distribution is fairly even, a SHARP increase appears for the top 22 countries.
      - The gradual upward curve in the 2nd chart shows that as population increases, so does word count.
      - Average word count for the top 10% of countries was 63% higher than for the rest of the list.
      - Spearman correlation between variables: .55
  13. Results - Fortune 1000
      [Charts: word count per company (0 to 6,000), in ascending order and ordered by revenue]
      - Backward L-shaped curve
      - SHARP increase for the top 10% of companies by revenue
      - The top 10% of companies by revenue accounted for 30% of the total word count on companies
      - Spearman correlation between variables: .49
  14. Conclusion
      - Information on Wikipedia is volatile and dynamic, constantly changing over time.
      - Wikipedia’s purpose is to serve as a general reference source, but its content is weighted by its contributors’ demographics.
      - In each search, strong biases were evident and strong correlations were found:
         - Currency/Recency: the more current topics were covered the most
         - Random Selection: Encyclopedia terms showed a clear bias toward more common or popular terms
         - Relevancy: Wikipedia’s word count correlates with inches of content in a traditional encyclopedia, showing a strong agenda in each publication
         - Population: the larger the country and its population, the higher the word count
         - Revenue: the larger the revenue, the higher the word count
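
The Method slide says that each topic's word counts were captured in a spreadsheet and plotted twice. One chart uses ascending order of word count. The other uses the order of a predictor variable, such as year, population, or revenue. The Python sketch below shows those two orderings, assuming the counts are held in a dictionary keyed by item. The function names and sample values are hypothetical; they are not the authors' tooling or the study's data.

```python
def ascending_order(word_counts):
    """Word counts sorted from shortest to longest article (the "Ascending Order" charts)."""
    return sorted(word_counts.values())


def predictor_order(word_counts, predictor):
    """Word counts listed in increasing order of a predictor, such as year or population.

    `predictor` maps each item to its predictor value (hypothetical input format).
    """
    items = sorted(word_counts, key=lambda item: predictor[item])
    return [word_counts[item] for item in items]


# Hypothetical year articles and word counts (illustration only, not study data).
counts = {1950: 1100, 1990: 3200, 2001: 9800, 2010: 400}
years = {year: year for year in counts}

print(ascending_order(counts))         # [400, 1100, 3200, 9800]
print(predictor_order(counts, years))  # [1100, 3200, 9800, 400]
```

In the ascending chart, a long flat run of short articles followed by a steep rise at the right-hand end produces what the slides call a backward L-shaped curve.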
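
The results slides report Spearman rank correlations between word count and each predictor, but they do not show how these were computed. The sketch below takes one standard approach to Spearman's rho in plain Python. It ranks both variables, giving tied values the average of the ranks they span, and then takes the Pearson correlation of the ranks. The function names and sample numbers are hypothetical, and this is not necessarily the procedure the authors used.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    if spread == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / spread


# Hypothetical years and word counts (illustration only, not study data).
sample_years = [1990, 1995, 2000, 2005]
sample_words = [1200, 1500, 1400, 3000]
print(round(spearman(sample_years, sample_words), 2))  # 0.8
```

Because only ranks enter the calculation, one very long article counts no more than any other top-ranked article. This makes Spearman's rho a reasonable choice for the skewed, backward L-shaped distributions described in the results.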
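
Several slides compare average word counts between two groups:

- films since 2001 versus films before 2001
- Billboard artists since 1990 versus earlier artists
- the top 10% of UN countries versus the rest

The sketch below shows that comparison. It assumes that "X% higher" means the difference in means divided by the baseline mean, which the slides do not state. The helper names and sample counts are hypothetical.

```python
from statistics import mean


def percent_higher(group, baseline):
    """Percentage by which the mean of `group` exceeds the mean of `baseline`."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be nonzero")
    return (mean(group) - base) * 100 / base


def split_at_year(counts_by_year, cutoff):
    """Split {year: word_count} into (counts before `cutoff`, counts from `cutoff` on)."""
    before = [count for year, count in counts_by_year.items() if year < cutoff]
    since = [count for year, count in counts_by_year.items() if year >= cutoff]
    return before, since


# Hypothetical film articles (illustration only, not study data).
films = {1999: 2000, 2000: 3000, 2001: 3500, 2002: 4500}
before, since = split_at_year(films, 2001)
print(percent_higher(since, before))  # 60.0
```

For the UN comparison, the two groups would instead come from ranking countries by population and splitting off the top 10%.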
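
The Fortune 1000 slide reports that the top 10% of companies by revenue accounted for 30% of the total word count on companies. The sketch below shows that share calculation. It assumes the "top 10%" group is 10% of the list rounded up, with at least one item; the slides do not say how the group was sized. The company names, revenues, and counts are hypothetical.

```python
def top_share(word_counts, predictor, top_percent=10):
    """Fraction of the total word count held by the top `top_percent`% of items.

    Items are ranked by `predictor` (e.g. revenue), largest first. The group size
    is rounded up with integer arithmetic so float error cannot change it.
    """
    ranked = sorted(word_counts, key=lambda item: predictor[item], reverse=True)
    k = max(1, (len(ranked) * top_percent + 99) // 100)  # ceiling of n * pct / 100
    total = sum(word_counts.values())
    if total == 0:
        raise ValueError("total word count must be positive")
    return sum(word_counts[item] for item in ranked[:k]) / total


# Hypothetical companies: "Co1" has the highest revenue, "Co10" the lowest.
revenue = {f"Co{i}": 1000 - 100 * i for i in range(1, 11)}
words = dict(zip(revenue, [5000, 3000, 2500, 2000, 1800, 1700, 1500, 1000, 800, 700]))
print(f"{top_share(words, revenue):.0%}")  # 25%
```

The group size uses integer arithmetic because computing `len(ranked) * 0.1` in floating point can land just above a whole number. For example, 30 * 0.1 gives 3.0000000000000004, so a float ceiling would wrongly select 4 items instead of 3.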
