Cultural Configuration: Autoreferentiality in Wikipedia (Academic Study presented in Wikimania, Haifa-Israel 2011)

I gave this presentation at Wikimania 2011 (held in Haifa, Israel). I present the results I obtained in my academic study on Wikipedia and autoreferentiality. This is a property of any language edition of Wikipedia which measures, in a way, how ethnocentric that edition is.

Up to 20 languages were measured. Conclusions indicated that around 25% of the content of each Wikipedia language edition is on topics related exclusively to the same language (geographical places, history, society, politics...). One third of the anonymous editors wrote on these topics.

Usage rights: CC Attribution-NonCommercial License


    Presentation Transcript

    • Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages
      Marc Miquel {marcmiquel@gmail.com}
      Wikimania 2011, Haifa
      Greetings: Look N’ Feel, User:MuRe
    • INTRODUCTION: Survey, December 2010
      Could there be a national motivation for editing?
      Motivation studies (Nov 2007): ideological, fun, altruistic, reciprocity, self-esteem, career, among others; often explained just by intrinsic and extrinsic motivation.
      Third survey of users/editors of Viquipèdia: 674 completed out of 871.
      Sociological: gender, age, area of residence, preferred language, reader/editor.
      Questions:
        • Use: reasons to use Viquipèdia, use of other WPs, frequency and topics of conflicts, topics consulted.
        • Evaluation: comparative and quality evaluation of Viquipèdia, dissemination, donations and economic value.
        • Editing: reason to start writing, preferred article topic, time spent editing, kind of edits.
      How influential can a national motivation be?
    • INTRODUCTION: Survey, December 2010
      “Content about Catalan culture and territories”
        • Read: 31.16% of users usually look up this content.
        • Write: 15% of users write on this content.
        • Conflict: 34.2% of the conflicts, according to witnesses.
        • Support: 48% to Amical Viquipèdia, 41% to the Wikimedia Foundation, 11% to other constituted chapters.
    • STATE OF ART: Do we want WP to focus on our culture?
      What’s On Wikipedia? Topical coverage (Kittur et al. 2009): Culture and arts 30%, People 15%, Geography and places 14%, Society and social sciences 12%, History and events 11%.
      Most of the content is related to the social sciences, so a national motivation can influence it in many ways.
      Each language edition of Wikipedia may have a different cultural configuration, even when it covers the same content!
      Wikipedia is quantitative, textual and relational. We will analyze it with technical tools to obtain its autoreferentiality.
      Self-focus bias on geographic articles: Hecht (2009).
    • AUTOREFERENTIALITY: A reflection of one's own culture
      “We define Self-focus bias as occurring when contributors to a knowledge repository encode information that is important and correct to them and a large proportion of contributors to the same repository, but not important and correct to contributors of similar repositories.” Hecht and Gergle (2009)
      Interest divergence; set of articles.
      Analysis dimensions:
        0. Semantic
        1. Isolation (external interest)
        2. Effort (internal interest)
        3. Prominence (relational interest)
        4. Endogamy (self-interest)
        5. Edition (activity interest)
        6. Temporal (creation interest)
      Cultural configuration = ∑ Interest Indicators
    • METHODOLOGY: How do we process it?
      Framework: wikAPIdia (Java-based, MySQL).
        1. Storing scripts
        2. Processing scripts
        3. Extracting scripts
        • Files: dozens of GB (latest articles and history).
        • Memory consumption: between 4 and 60 GB of RAM.
      Jobs are submitted remotely to a Sun Grid cluster, and the process is automated so that it can be repeated.
      Total: 20 languages = 8.8 M articles = 275 M links + 125 M edits + …
    • OBJECT SELECTION: Hyperlingua and selection of articles
      Hyperlingua approach
        • Avoid extrapolation: results are not limited to the English edition.
        • 8 languages, with 20 in development: Catalan, Czech, Danish, Italian, Dutch, Romanian, Swedish, Chinese, etc.
      Selection method through the Wikipedia structure
        • E.g. En_Keywords: english, united kingdom, england, scotland, scottish, ireland, irish, etc.
        • Level 1: «english_writers» (cat.), «english_monarchs» (art.), «english_civil_war» (art.), … saved
        • Level 2: «english_autobiographers» (cat.), «english_journalists» (cat.), «british_poets_list» (art.), … saved
        • Level 3: «john_stuart_mill» (art.), «edgar allan poe» (art.), «english_columnists» (cat.), … saved
      Set = ∑ Art(keywords) + Art(categories(keywords)) + Art(categories(categories(keywords))) …
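The level-by-level selection on the slide above can be sketched as a breadth-first walk over the category graph. This is only an illustration under stated assumptions: the in-memory dictionary `category_members`, the substring matching in `title_matches`, and the default depth of three levels are hypothetical choices, not the wikAPIdia-based pipeline used in the study.

```python
from collections import deque


def title_matches(title, keywords):
    """True if any keyword occurs in the title (hypothetical matching rule)."""
    normalized = title.lower().replace("_", " ")
    return any(keyword in normalized for keyword in keywords)


def select_set(keywords, all_articles, all_categories, category_members, max_level=3):
    """Collect articles reachable from keyword-matching titles and categories.

    category_members maps a category name to a pair
    (list of subcategories, list of articles); this layout is an assumption.
    """
    # Art(keywords): articles whose own title contains a keyword.
    selected = {a for a in all_articles if title_matches(a, keywords)}

    # Level 1: categories whose name contains a keyword.
    queue = deque((c, 1) for c in all_categories if title_matches(c, keywords))
    seen = {c for c, _ in queue}

    while queue:
        category, level = queue.popleft()
        subcategories, articles = category_members.get(category, ((), ()))
        selected.update(articles)  # articles in this category are saved
        if level < max_level:
            for sub in subcategories:
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, level + 1))
    return selected
```

Capping the depth keeps the walk from drifting into unrelated topics through very general categories, which is presumably why the slide stops after a few levels.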
    • RESULTS: Semantic
      Indicators and index:
        a) $\text{Indicator value} = \dfrac{\text{Feature}(\text{Set}) - \text{Feature}(\text{Edition})}{\text{Feature}(\text{Edition})}$
        b) An indicator is confirmed when it is positive in all languages.
      $\text{IndexValue}(l) = \sum_{n=1}^{m} \text{indvalue}(n) \cdot \text{indpond}(n) \cdot \text{setpercentage}(l)$
      $\text{indpond}(n) = \dfrac{1}{m} \sum_{l=1}^{m} \text{indvalue}(l)$
      Selection result (semantic): the selected set makes up 15 to 30% of any language edition.
      [Bar chart: percentage of articles in the selected set per language edition (ca, cs, da, it, nl, ro, sv, zh); vertical axis from 0 to 35.]
      Wikipedia is also a repository of local content.
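The indicator and index definitions on the slide above could be computed roughly as follows. This is a sketch under assumptions: it reads the indicator as a relative difference with respect to the whole edition, reads indpond(n) as the mean of indicator n across languages, and takes indvalue(n) inside the index to be the value of indicator n for language l. The function names and data layout are hypothetical.

```python
def indicator_value(feature_set, feature_edition):
    """Relative difference between the set's feature and the edition's feature."""
    return (feature_set - feature_edition) / feature_edition


def indicator_weights(values_by_language):
    """indpond(n): mean of indicator n over all languages (assumed reading).

    values_by_language maps a language code to a list of indicator values,
    with indicators in the same order for every language.
    """
    languages = list(values_by_language)
    n_indicators = len(values_by_language[languages[0]])
    return [
        sum(values_by_language[lang][n] for lang in languages) / len(languages)
        for n in range(n_indicators)
    ]


def index_value(lang, values_by_language, set_percentage):
    """IndexValue(l): weighted sum of indicators, scaled by the set's share."""
    weights = indicator_weights(values_by_language)
    weighted_sum = sum(v * w for v, w in zip(values_by_language[lang], weights))
    # setpercentage(l) does not depend on n, so it can be factored out of the sum.
    return weighted_sum * set_percentage[lang]
```

Weighting each indicator by its cross-language mean gives more influence to indicators that are consistently strong across editions; whether that matches the author's intent cannot be confirmed from the slide alone.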
    • RESULTS: External / Internal
      External interest (Isolation): interwiki links
        • Pointing mainly to the English edition of WP.
        • Then to German and French (>1M) and those in geographic proximity.

        Language   Avg. Set   Avg. Version   Val. Ind.
        ca         1.4        6.4            78.6
        cs         1.8        8.4            78.8
        da         2.5        9.0            71.8
        it         2.5        4.9            49.5
        nl         1.2        5.5            78.2
        ro         1.3        7.7            83.1
        sv         1.2        6.3            81.6
        zh         1.4        5.7            75.4

      Internal interest (Effort): bytes, outlinks
        • Varies a lot across editions.
        • The “set articles” are not always longer.
        • Those containing the keywords in the title are longer.
      Local content is not replicated in other languages.
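As a quick check on the isolation table above, the reported "Val. Ind." column appears consistent with the relative gap between the edition average and the set average, expressed as a percentage. That formula is an inference from the numbers, not something the slide states, and the recomputed values differ slightly from the reported ones, presumably because the averages shown are rounded.

```python
# (language, avg. interwiki links in the set, avg. in the whole edition, reported indicator)
ROWS = [
    ("ca", 1.4, 6.4, 78.6),
    ("cs", 1.8, 8.4, 78.8),
    ("da", 2.5, 9.0, 71.8),
    ("it", 2.5, 4.9, 49.5),
    ("nl", 1.2, 5.5, 78.2),
    ("ro", 1.3, 7.7, 83.1),
    ("sv", 1.2, 6.3, 81.6),
    ("zh", 1.4, 5.7, 75.4),
]


def isolation_indicator(avg_set, avg_edition):
    """Assumed formula: how far the set falls below the edition average, in percent."""
    return 100.0 * (avg_edition - avg_set) / avg_edition


for lang, avg_set, avg_edition, reported in ROWS:
    recomputed = isolation_indicator(avg_set, avg_edition)
    print(f"{lang}: recomputed {recomputed:.1f}, reported {reported:.1f}")
```

A higher value means set articles have far fewer interwiki links than the edition average, i.e. the local content is more isolated from other language editions.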
    • RESULTS: Prominence / Endogamy
      Relational interest (Prominence): inlinks, PR, category memberships (Catmem.)
        • Not more prominent, except for articles containing keywords in their titles.
        • Very well categorized, judging by category memberships.
      Self-interest (Endogamy): inlinks (IL) and category memberships (CM) from the set
        • High endogamy (highest 82.1% for Romanian, lowest 62.7% for Italian).
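One plausible reading of the endogamy figure on the slide above is the share of inlinks to set articles that come from other set articles. The sketch below follows that reading only; the slide does not give the exact definition, it also mentions category memberships, which are left out here, and the `inlinks` mapping is a hypothetical data layout.

```python
def endogamy_percentage(inlinks, article_set):
    """Percentage of inlinks to set articles whose source is also in the set.

    inlinks maps a target article to the list of articles linking to it
    (hypothetical layout; the study's actual storage is not described).
    """
    total = 0
    from_set = 0
    for target in article_set:
        for source in inlinks.get(target, ()):
            total += 1
            if source in article_set:
                from_set += 1
    # An empty link graph has no endogamy to measure.
    return 100.0 * from_set / total if total else 0.0
```

For example, `endogamy_percentage({"a": ["b", "x"], "b": ["a", "x"]}, {"a", "b"})` counts four inlinks, two of which originate inside the set.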
    • RESULTS: Edition
      Editors and edit count
        • Highly variable.
        • Correlated with bytes.
      Diversity coefficient
        • Detects whether an article's edits were made by a more active subgroup of its editors.
        • These articles attract more interest from a few more active editors.
      Type of editors
        • 65% of local content is made by 30% of the user community.
        • One out of every three anonymous editors writes on local content.
    • RESULTS: Temporal
      Relative growth rates
        • Comparing the rates shows that local content is not growing faster.
        • Comparing the period-to-period differences (increments) shows that local content is not leading the overall growth trend of the language edition.
      Interest stability and abruption
        • Adding up all the period differences gives its abruption.
        • The standard deviation over all periods gives its stability.
      The main local-content articles were created in 2006/07.
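The temporal measures on the slide above can be illustrated with a small sketch. Several choices here are assumptions rather than the study's definitions: growth is measured on cumulative article counts per period, the "period differences" are taken between consecutive growth rates, abruption sums their absolute values (a signed sum would reduce to last rate minus first rate), and stability is the population standard deviation of the rates.

```python
from statistics import pstdev


def growth_rates(counts):
    """Relative growth between consecutive periods of cumulative counts."""
    return [(cur - prev) / prev for prev, cur in zip(counts, counts[1:])]


def increments(rates):
    """Period-to-period differences between consecutive growth rates."""
    return [cur - prev for prev, cur in zip(rates, rates[1:])]


def abruption(rates):
    """Sum of absolute rate changes (assumed reading of 'adding up all periods')."""
    return sum(abs(step) for step in increments(rates))


def stability(rates):
    """Standard deviation of the growth rates; lower means steadier interest."""
    return pstdev(rates)
```

Computing the same three quantities for the selected set and for the whole edition would allow the comparison the slide describes, for instance checking whether the set's rates exceed the edition's in any period.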
    • INDEX: Summary table (the table itself was not captured in the transcript)
    • CONCLUSIONS: What do we know about Wikipedia?
      There is a cultural configuration in any collaborative knowledge repository. It can reflect an interest in local content.
        a) Extension: 25% of the content in twenty-five languages.
        b) Icelandic and Catalan WP are the most and least autoreferential.
        c) Main characteristics: non-replicated content, more categorized and endogamic.
          o Articles are written by a few editors with higher activity.
          o Anonymous editors choose this content to edit.
          o It will stop growing at some point.
    • Questions: Methodology, Results & Future lines
      marcmiquel.com
      marcmiquel@gmail.com / Usuari:micrib
      Thanks for your attention