Measuring Self-Focus Bias in Community Maintained Knowledge Repositories
Brent Hecht and Darren Gergle
Northwestern University
Overview
1. Introduction
2. Study 1
3. Study 2
4. Discussion
5. Conclusion
Introduction


          Sum of World
           Knowledge
Introduction

      • Artificial Intelligence
      • Natural Language Processing
      • Human-Computer Interaction
      • CSCW
Introduction


          World knowledge
         according to whom?
Introduction

• self-focus bias
  • effect of community-held opinions and interests on the world knowledge in Wikipedia
  • if it exists, both positive and negative
Introduction
terms and concepts

[Figure: subset of the English Wikipedia Article Graph (WAG), showing The United States, Joe Biden, and Barack Obama]

• “Barack Obama” has 2 inlinks
• “Barack Obama” has an indegree of 2
• indegree → what people are writing about
• indegree → relatedness to sum of world knowledge in each Wikipedia
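The indegree idea from the terms-and-concepts slides can be sketched in a few lines of Python. This is only an illustration: the slides say that “Barack Obama” has 2 inlinks in the pictured subset, but the specific edges below and the function name `indegrees` are assumptions made for the example.

```python
from collections import Counter


def indegrees(links):
    """Count inlinks per article from (source, target) link pairs."""
    counts = Counter()
    for source, target in links:
        counts[source] += 0  # ensure articles with no inlinks still appear
        counts[target] += 1
    return dict(counts)


# Hypothetical edges for the three-article subset shown on the slide.
links = [
    ("The United States", "Barack Obama"),
    ("Joe Biden", "Barack Obama"),
    ("Barack Obama", "The United States"),
]

result = indegrees(links)
print(result["Barack Obama"])  # 2
```

Under this definition an article that links out but receives no links has an indegree of 0, which is the situation the later Poutine slide illustrates for the English Wikipedia.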
Study 1
methods
 definition of focus

• focus = indegree in Wikipedia Article Graph (WAG)
• greater indegree = greater focus
• compare across 15 Wikipedias

[Figure: English Wikipedia (Jonathan Frakes, Interstate 99, Pennsylvania, Penn State University): indegree = 3; French Wikipedia (Jonathan Frakes, Pennsylvania, Université d'État de Pennsylvanie): indegree = 1]

http://commons.wikimedia.org/wiki/File:Poutine.JPG
Poutine!

[Figure: English Wikipedia (Chez Ashton, French Fries, Cheddar Cheese, Poutine): indegree = 0; French Wikipedia (Chez Ashton, French Fries, Cheddar Cheese, Poutine): indegree = 3]
Study 1
methods
 sample and statistic

• sample = geographic articles
• statistic = spatial indegree sums

[Figure: Finland, with articles Flying Finn Airline, Rovaniemi, Helsinki, Sub-arctic Climate, Finno-Ugric Languages, Linus Torvalds]

• Finland has an indegree sum = 4
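A minimal sketch of the spatial indegree sum statistic, assuming each geographic article has already been assigned to one country and its indegree is known. The per-article numbers below are hypothetical and chosen only so that Finland's sum comes out to 4 as on the slide; `spatial_indegree_sums` is an illustrative name, not a function from the authors' software.

```python
from collections import defaultdict


def spatial_indegree_sums(article_country, article_indegree):
    """Sum the indegrees of geographic articles per country.

    article_country: maps geographic article title -> country
    article_indegree: maps article title -> indegree
    Articles with no country (non-geographic) are ignored.
    """
    sums = defaultdict(int)
    for article, country in article_country.items():
        sums[country] += article_indegree.get(article, 0)
    return dict(sums)


# Hypothetical data: three geographic articles located in Finland.
article_country = {
    "Helsinki": "Finland",
    "Rovaniemi": "Finland",
    "Flying Finn Airline": "Finland",
}
# Hypothetical indegrees; "Linus Torvalds" is not a geographic article.
article_indegree = {
    "Helsinki": 2,
    "Rovaniemi": 1,
    "Flying Finn Airline": 1,
    "Linus Torvalds": 5,
}

sums = spatial_indegree_sums(article_country, article_indegree)
print(sums)  # {'Finland': 4}
```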
Study 1
null hypothesis

H0: Indegree sums will have roughly the same distribution in every Wikipedia

All Wikipedias agree on focus distribution

Self-focus bias does not exist
Study 1
self-focus hypothesis

H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent

Each Wikipedia will demonstrate greater focus on its language’s culture hearth

Self-focus bias exists
Indegree Sums in the Russian Wikipedia
Indegree Sums in the English Wikipedia
Indegree Sums in the Polish Wikipedia
Study 1
results

German Wikipedia
  Country              Indegree Sum
  Germany                   718,668
  United States             114,720
  France                    110,554
  Switzerland               103,387
  Austria                    95,986
  Italy                      93,116

Finnish Wikipedia
  Country              Indegree Sum
  Finland                    55,331
  United States              25,664
  Germany                    11,972
  Russia                     10,076
  United Kingdom              9,402
  Italy                       7,948

Japanese Wikipedia
  Country              Indegree Sum
  Japan                     453,048
  Italy                      70,922
  United States              60,384
  China                      37,208
  Germany                    25,276
  United Kingdom             20,690

English Wikipedia
       Country              Indegree Sum
  Y    United States           1,366,261    Num
  Y    United Kingdom            439,582
  N    France                    189,698    Den
  N    Germany                   151,503
  Y    Canada                    146,191
  N    Italy                     129,133

$\mathrm{SFR}(W_{\text{English}}) = \frac{\text{USA}}{\text{France}} = \frac{1{,}366{,}261}{189{,}698} = 7.2$

Study 1
results

  Language       Self-focus Ratio
  English                     7.2
  Japanese                    6.4
  German                      6.3
  French                      4.2
  Italian                     3.6
  Catalan                     2.9
  Spanish                     2.4
  Finnish                     2.2
  Polish                      1.7
  Norwegian                   1.4
  Chinese                     1.2
  Dutch                       0.7
  Swedish                     0.6
  Portuguese                  0.3
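The self-focus ratio from the results slides can be reproduced with a short Python sketch. It assumes, based on the Num and Den markers on the English Wikipedia slide, that the numerator is the largest indegree sum among countries where the language is prominent and the denominator is the largest sum among all other countries. The function name `self_focus_ratio` and the choice of home countries are illustrative assumptions; the indegree sums are the ones shown on the slides.

```python
def self_focus_ratio(indegree_sums, home_countries):
    """Ratio of the top home-country sum to the top non-home-country sum.

    indegree_sums: maps country -> spatial indegree sum in one Wikipedia
    home_countries: countries where that Wikipedia's language is prominent
                    (an input the caller must supply)
    """
    home = [v for c, v in indegree_sums.items() if c in home_countries]
    other = [v for c, v in indegree_sums.items() if c not in home_countries]
    if not home or not other:
        raise ValueError("need at least one home and one non-home country")
    return max(home) / max(other)


# Indegree sums from the English Wikipedia slide.
english = {
    "United States": 1_366_261,
    "United Kingdom": 439_582,
    "France": 189_698,
    "Germany": 151_503,
    "Canada": 146_191,
    "Italy": 129_133,
}
# Countries marked Y on the slide.
english_home = {"United States", "United Kingdom", "Canada"}

sfr = self_focus_ratio(english, english_home)
print(round(sfr, 1))  # 7.2
```

A ratio above 1 means the top home country outranks every non-home country in that Wikipedia; a ratio below 1, as reported for Dutch, Swedish, and Portuguese, means some non-home country receives more inlinks.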
Study 2
methods
 sample and statistic

• sample = geographic articles
• statistic = spatial pagerank score sums (replacing spatial indegree sums)

Study 2
results

  Language       Self-focus Ratio
  Catalan                     2.7
  Finnish                     1.7
  Norwegian                   0.5
Discussion
hyperlingual approach

        • 15 Wikipedias (22)
        • over 8 million articles
        • over 270 million links
        • English less than 1/4 the data
        • it was “easy” with WikAPIdia software
Discussion
hyperlingual approach

• general benefits
  • similarities → more robust findings
  • differences → cultural diversity
• mine cultural diversity
  • “culturally-aware applications”
  • very rarely in literature
Discussion
   Africa
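Conclusion
Cliffs Notes

1. self-focus is a systemic bias in Wikipedia
  • people reorient world knowledge around themselves
  • many implications for technologies
Indegree Sums in the English Wikipedia
Conclusion
Cliffs Notes

1. self-focus is a systemic bias in Wikipedia
  • people reorient world knowledge around themselves
  • many implications for technologies
2. hyperlingual approach proved very useful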
Acknowledgements
Nada Petrović
Colleagues at the Collabolab
NSF #0705901
Microsoft Research

Contact Info
brent@u.northwestern.edu
www.brenthecht.com

The talk I gave at Communities and Technologies 2009 on using a hyperlingual methodology to identify cultural diversity in the knowledge representations in Wikipedia. Paper at http://www.brenthecht.com/papers/bhecht_CommAndTech2009.pdf

Measuring Self-Focus Bias in Community Maintained Knowledge Repositories

  1. 1. Measuring Self-Focus Bias in Community Maintained Knowledge Repositories Brent Hecht and Darren Gergle Northwestern University
  2. 2. Overview 1. Introduction 2. Study 1 3. Study 2 4. Discussion 5. Conclusion
  3. 3. Introduction Sum of World Knowledge
  4. 4. Introduction Sum of World Knowledge
  5. 5. Introduction
  6. 6. Introduction • Artificial Intelligence
  7. 7. Introduction • Artificial Intelligence • Natural Language Processing
  8. 8. Introduction • Artificial Intelligence • Natural Language Processing • Human-Computer Interaction
  9. 9. Introduction • Artificial Intelligence • Natural Language Processing • Human-Computer Interaction • CSCW
  10. 10. Introduction World knowledge according to whom?
  11. 11. Introduction
  12. 12. Introduction
  13. 13. Introduction
  14. 14. Introduction
  15. 15. Introduction
  16. 16. Introduction
  17. 17. Introduction
  18. 18. Introduction
  19. 19. Introduction • self-focus bias • effect of community-held opinions and interests on the world knowledge in Wikipedia • if it exists, both positive and negative
  20. 20. Introduction terms and concepts
  21. 21. Introduction terms and concepts subset of the English Wikipedia Article Graph (WAG)
  22. 22. Introduction terms and concepts subset of the English Wikipedia Article Graph (WAG)
  23. 23. Introduction terms and concepts • “Barack Obama” has 2 inlinks • “Barack Obama” has an indegree of 2 subset of the English Wikipedia Article Graph (WAG)
  24. 24. Introduction terms and concepts subset of the English Wikipedia Article Graph (WAG)
  25. 25. Introduction terms and concepts • indegree → what people are writing about • indegree → relatedness to sum of world knowledge in each Wikipedia subset of the English Wikipedia Article Graph (WAG)
  26. 26. Introduction terms and concepts The United States Joe Biden Barack Obama subset of the English Wikipedia Article Graph (WAG)
  27. 27. Introduction terms and concepts The United States Joe Biden • indegree → what people are writing about • indegree → relatedness to sum of world knowledge in each Wikipedia Barack Obama subset of the English Wikipedia Article Graph (WAG)
  28. 28. Study 1 methods definition of focus
  29. 29. Study 1 methods definition of focus • focus = indegree in Wikipedia Article Graph (WAG)
  30. 30. Study 1 methods definition of focus • focus = indegree in Wikipedia Article Graph (WAG) • greater indegree = greater focus
  31. 31. Study 1 methods definition of focus • focus = indegree in Wikipedia Article Graph (WAG) • greater indegree = greater focus • compare across 15 Wikipedias
  32. 32. Study 1 methods definition of focus [Figure: English Wikipedia (Jonathan Frakes, Interstate 99, Pennsylvania, Penn State University): indegree = 3; French Wikipedia (Jonathan Frakes, Pennsylvania, Université d'État de Pennsylvanie): indegree = 1]
  33. 33. http://commons.wikimedia.org/wiki/File:Poutine.JPG Experiment methods Poutine!
  34. 34. Study 1 methods definition of focus [Figure: English Wikipedia (Chez Ashton, French Fries, Cheddar Cheese, Poutine): indegree = 0; French Wikipedia (Chez Ashton, French Fries, Cheddar Cheese, Poutine): indegree = 3]
  35. 35. Study 1 methods definition of focus
  36. 36. Study 1 methods sample and statistic
  37. 37. Study 1 methods sample and statistic • sample = geographic articles
  38. 38. Study 1 methods sample and statistic
  39. 39. Study 1 methods sample and statistic
  40. 40. Study 1 methods sample and statistic • statistic = spatial indegree sums
  41. 41. Study 1 methods sample and statistic Flying Finn Airline Finland
  42. 42. Study 1 methods sample and statistic Flying Finn Airline Rovaniemi Finland Helsinki
  43. 43. Study 1 methods sample and statistic [Figure: Finland, with articles Flying Finn Airline, Rovaniemi, Helsinki, Sub-arctic Climate, Finno-Ugric Languages, Linus Torvalds]
  44. 44. Study 1 methods sample and statistic [Figure: Finland, with articles Flying Finn Airline, Rovaniemi, Helsinki, Sub-arctic Climate, Finno-Ugric Languages, Linus Torvalds] • Finland has an indegree sum = 4
  45. 45. Study 1 null hypothesis
  46. 46. Study 1 null hypothesis H0: Indegree sums will have roughly the same distribution in every Wikipedia
  47. 47. Study 1 null hypothesis H0: Indegree sums will have roughly the same distribution in every Wikipedia All Wikipedias agree on focus distribution
  48. 48. Study 1 null hypothesis H0: Indegree sums will have roughly the same distribution in every Wikipedia All Wikipedias agree on focus distribution Self-focus bias does not exist
  49. 49. Study 1 self-focus hypothesis
  50. 50. Study 1 self-focus hypothesis H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent
  51. 51. Study 1 self-focus hypothesis H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent Each Wikipedia will demonstrate greater focus on its language’s culture hearth
  52. 52. Study 1 self-focus hypothesis H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent Each Wikipedia will demonstrate greater focus on its language’s culture hearth Self-focus bias exists
  53. 53. Indegree Sums in the Russian Wikipedia
  54. 54. Indegree Sums in the English Wikipedia
  55. 55. Indegree Sums in the Polish Wikipedia
  56. 56. Study 1 results, German Wikipedia. Country (Indegree Sum): Germany (718,668); United States (114,720); France (110,554); Switzerland (103,387); Austria (95,986); Italy (93,116)
  57. 57. Study 1 results, Finnish Wikipedia. Country (Indegree Sum): Finland (55,331); United States (25,664); Germany (11,972); Russia (10,076); United Kingdom (9,402); Italy (7,948)
  58. 58. Study 1 results. Country (Indegree Sum): Japan (453,048); Italy (70,922); United States (60,384); China (37,208); Germany (25,276); United Kingdom (20,690)
  59. 59. Study 1 results, Japanese Wikipedia. Country (Indegree Sum): Japan (453,048); Italy (70,922); United States (60,384); China (37,208); Germany (25,276); United Kingdom (20,690)
  60. 60. Study 1 results
  61. 61. Study 1 results !
  62. 62. Study 1 results, English Wikipedia. Country (Indegree Sum): United States (1,366,261); United Kingdom (439,582); France (189,698); Germany (151,503); Canada (146,191); Italy (129,133)
  63. 63. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); United Kingdom (439,582); France (189,698); Germany (151,503); Canada (146,191); Italy (129,133)
  64. 64. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); France (189,698); Germany (151,503); Canada (146,191); Italy (129,133)
  65. 65. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); [Y] France (189,698); Germany (151,503); Canada (146,191); Italy (129,133)
  66. 66. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); Germany (151,503); Canada (146,191); Italy (129,133)
  67. 67. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); [N] Germany (151,503); Canada (146,191); Italy (129,133)
  68. 68. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); [N] Germany (151,503); [Y] Canada (146,191); Italy (129,133)
  69. 69. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); [N] Germany (151,503); [Y] Canada (146,191); [N] Italy (129,133)
  70. 70. Study 1 results, English Wikipedia. Country (Indegree Sum): [Y] United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); [N] Germany (151,503); [Y] Canada (146,191); [N] Italy (129,133)
  71. 71. Study 1 results, English Wikipedia. Country (Indegree Sum): United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); [N] Germany (151,503); [Y] Canada (146,191); [N] Italy (129,133)
  72. 72. Study 1 results, English Wikipedia. Country (Indegree Sum): [Num] United States (1,366,261); [Y] United Kingdom (439,582); [N] France (189,698); [N] Germany (151,503); [Y] Canada (146,191); [N] Italy (129,133)
  73. 73. Study 1 results, English Wikipedia. Country (Indegree Sum): [Num] United States (1,366,261); [Y] United Kingdom (439,582); France (189,698); [N] Germany (151,503); [Y] Canada (146,191); [N] Italy (129,133)
  74. 74. Study 1 results, English Wikipedia. Country (Indegree Sum): [Num] United States (1,366,261); [Y] United Kingdom (439,582); [Den] France (189,698); [N] Germany (151,503); [Y] Canada (146,191); [N] Italy (129,133)
  75. 75. Study 1 results $\mathrm{SFR}(W_{\text{English}}) = \frac{\text{USA}}{\text{France}} = \frac{1{,}366{,}261}{189{,}698} = 7.2$
  76. 76. Study 1 results. Language (Self-focus Ratio): English (7.2); Japanese (6.4); German (6.3); French (4.2); Italian (3.6); Catalan (2.9); Spanish (2.4); Finnish (2.2); Polish (1.7); Norwegian (1.4); Chinese (1.2); Dutch (0.7); Swedish (0.6); Portuguese (0.3)
  77. 77. Study 2 methods sample and statistic • sample = geographic articles • statistic = spatial indegree sums
  78. 78. Study 2 methods sample and statistic • sample = geographic articles • statistic = spatial indegree sums
  79. 79. Study 2 methods sample and statistic • sample = geographic articles • statistic = spatial pagerank score sums (replacing spatial indegree sums)
  80. 80. Study 2 results. Language (Self-focus Ratio): Catalan (2.7); Finnish (1.7); Norwegian (0.5)
  81. 81. Discussion hyperlingual approach
  82. 82. Discussion hyperlingual approach • 15 Wikipedias (22)
  83. 83. Discussion hyperlingual approach • 15 Wikipedias (22) • over 8 million articles
  84. 84. Discussion hyperlingual approach • 15 Wikipedias (22) • over 8 million articles • over 270 million links
  85. 85. Discussion hyperlingual approach • 15 Wikipedias (22) • over 8 million articles • over 270 million links • English less than 1/4 the data
  86. 86. Discussion hyperlingual approach • 15 Wikipedias (22) • over 8 million articles • over 270 million links • English less than 1/4 the data • it was “easy” with WikAPIdia software
  87. 87. Discussion hyperlingual approach
  88. 88. Discussion hyperlingual approach • general benefits
  89. 89. Discussion hyperlingual approach • general benefits • similarities → more robust findings
  90. 90. Discussion hyperlingual approach • general benefits • similarities → more robust findings • differences → cultural diversity
  91. 91. Discussion hyperlingual approach • general benefits • similarities → more robust findings • differences → cultural diversity • mine cultural diversity
  92. 92. Discussion hyperlingual approach • general benefits • similarities → more robust findings • differences → cultural diversity • mine cultural diversity • “culturally-aware applications”
  93. 93. Discussion hyperlingual approach • general benefits • similarities → more robust findings • differences → cultural diversity • mine cultural diversity • “culturally-aware applications” • very rarely in literature
  94. 94. Discussion Africa
  95. 95. Conclusion Cliffs Notes 1. self-focus is a systemic bias in Wikipedia • people reorient world knowledge around themselves • many implications for technologies
  96. 96. Indegree Sums in the English Wikipedia
  97. 97. Conclusion Cliffs Notes 1. self-focus is a systemic bias in Wikipedia • people reorient world knowledge around themselves • many implications for technologies 2. hyperlingual approach proved very useful
  98. 98. Acknowledgements Nada Petrović Colleagues at the Collabolab NSF #0705901 Microsoft Research Contact Info brent@u.northwestern.edu www.brenthecht.com
