Social Networks of Wikipedia

Paolo Massa
SoNet @ Bruno Kessler Foundation, Trento, Italy
http://www.gnuband.org
Contributions
Methodological paper on algorithms for extracting a network of who talks to whom on Wikipedia
+
Validation of quality by manual coding
Code is open source and reusable
=
Basic step for Social Network Analysis
Outline
●   Statistics on Wikipedia/wiki

●   Algorithms for Extracting a
    Social Network

●   Manual Validation of Algorithms
English Wikipedia
Started in 2001

3,500,000+ articles
440,000,000+ edits
14,000,000+ registered users
3,500,000+ at-least-1-edit users
Multi-lingual: 280+ Wikipedias
50,000+ wikis on Wikia.com, some with 1,000,000+ edits
Article Page
Article Page / Article Talk Page
User Page
User Page – User Talk Page (UTP)
How to extract a network of who talks to whom from User talk pages?
User talk page http://en.wikipedia.org/wiki/User_talk:Phauly
[Figure: the network is built step by step from this page: nodes Shell, Martin and Phauly, with edges labelled 1 joining Shell to Phauly and Martin to Phauly]
Broader scope
We (SoNet) work on
●   How UTPs are used (coordination)
●   Characterize users of Wikipedia (based on gender, interests, religion, ...)
●   Formation of collective memories of events in Wikipedia
●   Goal: understand/model what users do in Wikipedia → Wikisociology
We're hiring! ;)

Call for researcher at
https://risorseumane.fbk.eu/it/node/234

Info about SoNet group
at http://sonet.fbk.eu

If interested, come to talk
to me!
Other Wikipedia networks
●   Few papers on User talk pages
●   Node=User
      ●   Edge=Coediting x articles
      ●   Edge=Editing article after user A
             ●   Edge=Reverted edit of user A
      ●   Edge=Vote in elections for admins
●   Node=Page / Edge=Link
●   Node=Category / Edge=Inclusion
How to extract who talks to
           whom?
3 ways:
(1) Signatures (automated)
(2) History of edits (automated)
(3) Manual coding
Input: Wikipedia dumps
XML dump of every edit ever made to every page (10 years!)

English Wikipedia dump =
5,600 Gigabytes!

(our scripts work on every wiki: the 280+ language Wikipedias, but also the 50,000+ wikia.com wikis ...)
How to extract who talks to
           whom?
3 ways:
(1) Signatures in text (automated)
(2) History of edits (automated)
(3) Manual coding
(1) Signature algorithm

pages-meta-current XML
<page>
  <title>User talk:Phauly</title>
  <revision>
    <text xml:space="preserve">
== '''Welcome!''' ==
Hello, {{BASEPAGENAME}}, and [[Wikipedia:Welcome, newcomers|welcome]] t
your contributions. I hope you like the place and decide to stay. Here 
might find helpful:
*[[Wikipedia:Five pillars|The five pillars of Wikipedia]]
*[[Wikipedia:How to edit a page|How to edit a page]]
*[[Help:Contents|Help pages]]
*[[Wikipedia:Tutorial|Tutorial]]
*[[Wikipedia:Article development|How to write a great article]]
*[[Wikipedia:Manual of Style|Manual of Style]]
I hope you enjoy editing here and being a [[Wikipedia:Wikipedians|Wikip
[[Wikipedia:Sign your posts on talk pages|sign your name]] on talk page
(<nowiki>~~~~</nowiki>); this will automatically produce your name and 
check out [[Wikipedia:Questions]], ask me on my talk page, or place 
<code><nowiki>{{helpme}}</nowiki></code> on your talk page and someone 
answer your questions. Again, welcome!&nbsp;. [[User:Shell_Kinney|Shell
<sup>[[User_talk:Shell_Kinney|babelfish]]</sup> 15:29, 7 November 2006 
== "Wikipedia endnote assisstant" ==
Hi, sorry to take so long to reply to your message. It's convention at 
messages at the bottom of the page, and as I was moving country at the 
see your message until now! Have you tried the updated URL, 
http://toolserver.org/~verisimilus/Scholar ? Let me know if you continu
Glad you find the tool useful! Best wishes, 
[[User:Smith609|Martin]]&nbsp;'''<small>([[User:Smith609|S
[[User_talk:Smith609|Talk]])</small>''' 01:19, 7 October 2008 
== Test anonymous edit ==
Just a test done by myself on signature formatting. --[[Special:Contrib
217.77.80.29]] ([[User talk:217.77.80.29|talk]]) 12:08, 8 February 2010
    </text>
  </revision>
</page>
(1) Signature algorithm
●   Consider pages with title User talk:T (or equivalent in other languages)
●   Search for signatures of user S in text
●   Consider them as messages from S to T

Signature of XXX if [[User:XXX|
Signature of 217.77.80.29 if [[Special:Contributions/217.77.80.29|

Robust to spaces, HTML tags, unbalanced parentheses, ...
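The signature rule described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the wiki-network repository: the regular expression, the function names `signers` and `edges_from_page`, and the choice to count a user at most once per line (signatures often link to the same user twice) are all assumptions made here. The sample lines are shortened and partly invented, modelled on the page shown above.

```python
import re
from collections import Counter

# Assumed pattern: a signature links to [[User:NAME|...]] for registered
# users, or to [[Special:Contributions/IP|...]] for anonymous users.
# Links to [[User_talk:...]] are deliberately not treated as signatures.
SIGNATURE_RE = re.compile(
    r"\[\[\s*(?:User\s*:|Special:Contributions/)\s*([^|\]]+?)\s*(?:\||\]\])",
    re.IGNORECASE,
)

def signers(wikitext):
    """Yield each signing user at most once per line of wikitext."""
    for line in wikitext.splitlines():
        seen = set()
        for match in SIGNATURE_RE.finditer(line):
            name = match.group(1).replace("_", " ")
            if name not in seen:
                seen.add(name)
                yield name

def edges_from_page(title, wikitext, prefix="User talk:"):
    """Map (sender S, receiver T) to the number of signatures of S on T's page."""
    if not title.startswith(prefix):
        return Counter()
    receiver = title[len(prefix):]
    return Counter((sender, receiver) for sender in signers(wikitext))

# Simplified sample modelled on User talk:Phauly (link texts are invented).
sample = (
    "Again, welcome! [[User:Shell_Kinney|Shell]] "
    "<sup>[[User_talk:Shell_Kinney|babelfish]]</sup> 15:29, 7 November 2006\n"
    "Best wishes, [[User:Smith609|Martin]] ([[User:Smith609|S]] "
    "[[User_talk:Smith609|Talk]]) 01:19, 7 October 2008\n"
    "Just a test. [[Special:Contributions/217.77.80.29|217.77.80.29]] "
    "([[User talk:217.77.80.29|talk]]) 12:08, 8 February 2010\n"
)
edges = edges_from_page("User talk:Phauly", sample)
```

On this sample the sketch finds three senders (Shell Kinney, Smith609 and the IP address), each with one message to Phauly. Template-based or malformed signatures would be missed by a pattern this simple.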
(2) History algorithm

stub-meta-history XML
<page>
 <title>User talk:Phauly</title>
 <revision>
  <timestamp>2006-11-07T15:29:48Z</timestamp>
  <contributor>
   <username>Shell Kinney</username>
  </contributor>
 </revision>
 <revision>
  <timestamp>2008-10-07T01:19:54Z</timestamp>
  <contributor>
   <username>Smith609</username>
  </contributor>
 </revision>
 <revision>
  <timestamp>2010-02-08T12:08:19Z</timestamp>
  <contributor>
   <ip>217.77.80.29</ip>
  </contributor>
 </revision>
</page>
(2) History algorithm
●   Consider pages with title User talk:T (or equivalent in other languages)
●   Consider revision by user S as a message from S to T
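A minimal Python sketch of the history rule, using only the standard library, could look like the following. It is an illustration rather than the project's actual code: the function name `history_edges` and the `<mediawiki>` wrapper in the sample are assumptions, and real dumps declare an XML namespace on their tags, which this sketch ignores for brevity.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def history_edges(xml_file, prefix="User talk:"):
    """Count one message from S to T for every revision by S of 'User talk:T'."""
    edges = Counter()
    # iterparse streams the file, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "page":
            continue
        title = elem.findtext("title", default="")
        if title.startswith(prefix):
            receiver = title[len(prefix):]
            for contributor in elem.iter("contributor"):
                # Registered users carry <username>, anonymous users <ip>.
                sender = contributor.findtext("username") or contributor.findtext("ip")
                if sender:
                    edges[(sender, receiver)] += 1
        elem.clear()  # free the finished page
    return edges

# Sample mirroring the stub-meta-history excerpt (no namespace, by assumption).
sample = b"""<mediawiki>
 <page>
  <title>User talk:Phauly</title>
  <revision>
   <timestamp>2006-11-07T15:29:48Z</timestamp>
   <contributor><username>Shell Kinney</username></contributor>
  </revision>
  <revision>
   <timestamp>2008-10-07T01:19:54Z</timestamp>
   <contributor><username>Smith609</username></contributor>
  </revision>
  <revision>
   <timestamp>2010-02-08T12:08:19Z</timestamp>
   <contributor><ip>217.77.80.29</ip></contributor>
  </revision>
 </page>
</mediawiki>"""
edges = history_edges(io.BytesIO(sample))
```

Because the sender comes from structured fields rather than free text, no language-specific signature parsing is needed.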
They produce different
           networks
But
Which is more correct?
Which is more meaningful?

(1) Signatures in text (automated)
(2) History of edits (automated)
(3) Manual coding
Validation on Venetian Wikipedia by
manually visiting every user talk page
and manually extracting every
“message“
#users (active in writing or receiving) = 918
(out of 6255 registered users)
#messages = 1786

(paper about “content of messages“ on
UTPs: most are coordination)
Why Venetian Wikipedia?
Small, so complete manual coding is possible
[Figure: http://vec.wikipedia.org next to http://en.wikipedia.org]
Goal of Manual Coding
Manual coding = opportunity to notice patterns and regularities, as well as exceptions to them.

Goal: providing empirical evidence of the
reliability of the extraction algorithms.
Which is correct? Best?
(1) Signatures in text (automated)
(2) History of edits (automated)
(3) Manual coding

NONE is correct. Not even Manual coding.
They are different.

The most important issues, and strategies to cope with them, are in the next slides.
(comparison on data as of December 30, 2009)
(A) Number of nodes
(3) Manual coding      918
(1) Signatures         906
(2) History            981


Why? See next slides
(B) Renamed users
Small issue but relevant impact
Venetian Wikipedia = 15 renamings
English Wikipedia = 17,096 renamings
(B) Renamed users
Vec.wiki: user “Maximillion Pegasus” wrote msgs on User talk pages.
Then another person requested the username “Maximillion Pegasus” and got it.
Bureaucrats renamed the original “Maximillion Pegasus” to “Usurped12032009”.

The UTP of “Usurped12032009” contains the messages received when he was “Maximillion Pegasus”.
The new “Maximillion Pegasus” never received a msg.
Existing signatures are not affected by the rename.
So
Usurped12032009 has high indegree and 0 outdegree
“Maximillion Pegasus” has 0 indegree and high outdegree.

It took time to find this user, understand the issue, and figure out it was not a bug in our code!
The Signature algorithm gets this case wrong! Manual coding too!
History works because the XML file contains the username of the “real” user, such as Usurped12032009
(B) Renamed users
This issue is NOT marginal.
17,000+ renamings in the English
Wikipedia
and usually involving very active and
peculiar users!

This issue affects the most basic element
of social networks, number of nodes!
(C) Number of edges
#pairs of users (unweighted) between which at least 1 msg was written

(3) Manual coding      1073
(1) Signatures         1087
(2) History            1869

Why? See next slides
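The node and edge counts compared in (A) and (C) can be derived from any list of extracted messages. The sketch below assumes messages are given as (sender, receiver) tuples and that a "pair" is an ordered, directed pair; both the function name `network_size` and the directed reading are assumptions made here, since the slide does not say whether pairs are ordered.

```python
from collections import Counter

def network_size(messages):
    """Return (#nodes, #unweighted directed edges) for a list of messages."""
    weighted = Counter(messages)  # (sender, receiver) -> number of messages
    nodes = {user for pair in weighted for user in pair}
    return len(nodes), len(weighted)

# Hypothetical message list: Smith609 wrote to Phauly twice.
messages = [
    ("Shell Kinney", "Phauly"),
    ("Smith609", "Phauly"),
    ("Smith609", "Phauly"),
    ("Phauly", "Smith609"),
]
n_nodes, n_edges = network_size(messages)
```

For unordered pairs, the keys could be replaced by `frozenset((sender, receiver))`, which would merge the two directions between Smith609 and Phauly into one edge.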
(D) Information messages and
          redirects
“I don't check this vec.wiki often, please write to User:X on en.wiki [Signature of User:X]” →
user X in en.wiki might be different from user X in vec.wiki: only users in one wiki are considered
(bot) “This is a bot, please write to User:X”

Information messages       60/1786
Redirects                  27/1786

Manual coding = OK
Signature = ~KO
History = ~OK (but … A edits UTP of A...)
(E) Messages to oneself
A writes on UTP of A

56/1786 messages were self-edges

Wikipedia recommendation: A replies
to B on UTP of B
Limited evidence, but this seems to be what happens: self-edges are rare and are mainly information messages
(F) Non human users writing
          messages
Each bot has its own “logic”. One example:
Marco27Bot is a welcome bot
Many messages are templates!
Welcome templates {{benvegnu}}

Out of 1786 msgs, 774 (43.3%) are welcome templates.
In vec.wiki they are written by a bot, Marco27Bot, but signed with the usernames of volunteers
Manual coding and Signature algo: find signers (appearance)
History finds the bot (reality)
Suggestion: don't consider bots because of their automated nature
(G) Anonymous users, vandalism
     and deleted messages
Anon users (IP address) have UTPs

They received 33 messages from bots about possible vandalism
Many of their edits got deleted

Coding and Signature don't find deleted edits
History finds them

Suggestion: remove anonymous users (IP addresses don't map 1-to-1 to persons anyway)
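The two suggestions so far (leave out bots, leave out anonymous users) amount to a filter over the extracted edges. A possible sketch follows; the function names and the idea of passing an explicit set of bot names are assumptions for illustration, and anonymous users are recognised simply by checking whether the name parses as an IP address.

```python
import ipaddress

def is_anonymous(name):
    """True if the user name is an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True

def drop_bots_and_anons(edges, bots):
    """Keep only edges where neither endpoint is a known bot or an IP address."""
    return {
        (sender, receiver): weight
        for (sender, receiver), weight in edges.items()
        if sender not in bots
        and receiver not in bots
        and not is_anonymous(sender)
        and not is_anonymous(receiver)
    }

# Hypothetical edge weights; the bot list must be supplied per wiki.
edges = {
    ("Marco27Bot", "Phauly"): 1,
    ("217.77.80.29", "Phauly"): 1,
    ("Shell Kinney", "Phauly"): 1,
}
cleaned = drop_bots_and_anons(edges, bots={"Marco27Bot"})
```

The removed edges can be kept in a separate structure if bots and anonymous users are to be analysed on their own.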
(H) Many edits per message
I edit the UTP of X,
I discover a typo,
I re-edit the UTP of X

These are not 2 messages, but the history algorithm detects 2 edits.
Possible heuristic: collapse edits occurring within a short time
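One way to express the collapsing heuristic is sketched below. It assumes revisions of a single user talk page are given as (sender, timestamp) pairs in the dump's timestamp format, and it merges an edit into the previous message when the same sender made the immediately preceding edit within a time window. The 10-minute window and the function name `collapse_edits` are arbitrary assumptions, not values from the talk.

```python
from datetime import datetime, timedelta

def collapse_edits(revisions, window=timedelta(minutes=10)):
    """Merge consecutive edits by the same sender that fall within `window`."""
    messages = []
    last_sender, last_time = None, None
    # ISO-style timestamps sort chronologically as plain strings.
    for sender, stamp in sorted(revisions, key=lambda rev: rev[1]):
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        same_burst = sender == last_sender and when - last_time <= window
        if not same_burst:
            messages.append((sender, when))
        last_sender, last_time = sender, when
    return messages

# Hypothetical revisions of one UTP: A fixes a typo two minutes after posting.
revisions = [
    ("A", "2010-02-08T12:08:19Z"),
    ("A", "2010-02-08T12:10:00Z"),
    ("B", "2010-02-08T12:11:00Z"),
    ("A", "2010-02-08T15:00:00Z"),
]
messages = collapse_edits(revisions)
```

Here the four edits collapse into three messages, sent by A, B and A. A longer window merges more aggressively and risks joining genuinely separate messages.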
(I) Personalized, missing or
incorrectly formatted signatures

Large variety in personalized signatures
Hard to reliably detect all signatures, especially for very active users! And each language Wikipedia has different practices.
The most active vec.wiki user used a template as signature! {{Utente:Nick1915/firma}}

Biggest drawback of the signature algorithm
(I) Personalized, missing or
incorrectly formatted signatures
Users forget to sign (signing is not automatic).
A bot (SineBot in EnWiki and Marco27Bot in VecWiki) edits the page and adds the signature. → It seems the bot “talks” a lot.

Some users make errors in the syntax for signing
Signature = KO
History = OK (forgetting to sign is not a problem, but discard bots)
(J) Date of message
Messages are (often) dated → possible longitudinal analysis!

Signature algo = KO: must detect the syntax of the date, which differs over time (in vec.wiki) and in each language Wikipedia

History algo = OK: has the info formally coded in the XML dump
<timestamp>2006-11-07T15:29:48Z</timestamp>
(K) Archived messages
When UTPs become long, they get archived (by
a bot).
Current content is copied to a newly created
page such as User_talk:Phauly/Archive3
But NOT all subpages of UTP are archives!

Coding and Signature = KO: must decide whether to look for signatures in subpages based on heuristics on the page title (what would this be in the Chinese Wikipedia?)

History = OK: edits are done to “main“ UTP

Issue very relevant for “active“ users!
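To make the title heuristic concrete, here is a small sketch of what the signature approach would need. The pattern only recognises English-style titles such as User talk:T/Archive3, which is exactly the limitation raised above; the pattern and the function name `archive_owner` are assumptions made for illustration.

```python
import re

# Assumed English-only convention: "User talk:T/Archive", optionally numbered.
ARCHIVE_RE = re.compile(r"^User talk:([^/]+)/Archive ?\d*$")

def archive_owner(title):
    """Return T if the title looks like an archive of User talk:T, else None."""
    match = ARCHIVE_RE.match(title.replace("_", " "))
    return match.group(1) if match else None

owner = archive_owner("User_talk:Phauly/Archive3")
```

Every other language edition would need its own pattern, and subpages that are not archives must still be excluded.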
Our scripts are open source!
You can run them and extract networks (in order to analyze them). Python code at
https://github.com/phauly/wiki-network

Networks already available as extracted by 2
algorithms for German, Spanish, Italian,
Chinese and Venetian Wikipedia
http://sonetlab.fbk.eu/data/social_networks_of_wikipedia/
GraphML format: play with them using Gephi!
(http://www.gephi.org)

Social Network Analysis of who talks to whom on
Wikipedia is possible without caring about all these
details of extraction!
[Figure: network visualization. Node size = indegree (#received msgs), node color = role]
2005-2010 Cumulative
Weighted
Directed
Social network
(who talks to whom)

Nodes=Users (918)
 (out of 6255 registered users)
Edges=#Messages
Nodes=Users (918)

Most users just
received messages
(receivers, passive)

Only 196 users wrote
at least one msg!
(senders, active)
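The split between senders and pure receivers, and the indegree used for node size, follow directly from the weighted directed edges. A minimal sketch, assuming edges are stored as a mapping from (sender, receiver) to message count; the sample data and the user name "Newbie" are hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Return weighted (indegree, outdegree) Counters for a directed network."""
    indegree, outdegree = Counter(), Counter()
    for (sender, receiver), weight in edges.items():
        outdegree[sender] += weight   # messages written
        indegree[receiver] += weight  # messages received
    return indegree, outdegree

edges = {
    ("Shell Kinney", "Phauly"): 1,
    ("Smith609", "Phauly"): 2,
    ("Phauly", "Smith609"): 1,
    ("Shell Kinney", "Newbie"): 1,
}
indegree, outdegree = degrees(edges)
users = set(indegree) | set(outdegree)
senders = {user for user in users if outdegree[user] > 0}
receivers_only = users - senders
```

In this toy network Phauly has indegree 3, and Newbie is the only user who received messages without writing any.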
Discussion
No algo is “correct“, not even manual
coding!
Bots and anonymous users should be
removed and analyzed ad hoc
Interested in
  (1) the network users see (with its variability in signatures and formats)?
Signature algorithm is OK, but works only on one language Wikipedia and needs tweaking
  (2) the network of what really happened?
History algorithm is more robust, also across wikis (cross-wiki comparison) and with dates (longitudinal analysis).
Conclusions
Small change in algorithm/assumption =
big change in “what you extract“ and
hence in “what you find“!!
Proposed 2 algorithms
Empirical Validation by manual coding
1) Bots and anonymous users to be excluded and treated separately and ad hoc
2) History algorithm = more robust

Open source scripts: first step towards a sociology of wikis
Credits
I would like to thank
Davide Setti
Marco Frassoni
For writing the code and for manual
coding


Don't forget
Call for Postdoc at SoNet
https://risorseumane.fbk.eu/it/node/234
?   Thanks

Presented at ACM Hypertext 2011: 22nd ACM Conference on Hypertext and Hypermedia

Presentation collaborate with wikis
Presentation collaborate with wikisPresentation collaborate with wikis
Presentation collaborate with wikisTechSoup
 
Week 3 Wikis
Week 3 WikisWeek 3 Wikis
Week 3 WikisRachman12
 
Wikis inservice
Wikis inserviceWikis inservice
Wikis inservicedibeever
 
Wikis Instructions
Wikis InstructionsWikis Instructions
Wikis Instructionsdibeever
 
Wikis inservice
Wikis inserviceWikis inservice
Wikis inservicedibeever
 
Wikis inservice
Wikis inserviceWikis inservice
Wikis inservicedibeever
 
Wiki:Collaborative tool for building documents
Wiki:Collaborative tool for building documentsWiki:Collaborative tool for building documents
Wiki:Collaborative tool for building documentsAnjesh Tuladhar
 
Blogs and Wikis
Blogs and Wikis Blogs and Wikis
Blogs and Wikis kepitcher
 
160712 wiki lecture
160712 wiki lecture160712 wiki lecture
160712 wiki lecturein2acous
 
Open Source Software Wikipedia 2008
Open Source Software Wikipedia 2008Open Source Software Wikipedia 2008
Open Source Software Wikipedia 2008Thomas G Henry
 
Get On The Bus! Wyoming
Get On The Bus! WyomingGet On The Bus! Wyoming
Get On The Bus! WyomingKatie Lynn
 
What Does DITA Have To Do With Wiki
What Does DITA Have To Do With WikiWhat Does DITA Have To Do With Wiki
What Does DITA Have To Do With WikiAnne Gentle
 
Web 2.0 The Art of the Internet Possible
Web 2.0 The Art of the Internet PossibleWeb 2.0 The Art of the Internet Possible
Web 2.0 The Art of the Internet PossibleJimWhite
 
Art of GLAM-wiki:The Basics of Sharing Cultural Knowledge on Wikipedia
Art of GLAM-wiki:The Basics of Sharing Cultural Knowledge on WikipediaArt of GLAM-wiki:The Basics of Sharing Cultural Knowledge on Wikipedia
Art of GLAM-wiki:The Basics of Sharing Cultural Knowledge on WikipediaSara Snyder
 
Session Agenda: Open Learning Frameworks
Session Agenda: Open Learning FrameworksSession Agenda: Open Learning Frameworks
Session Agenda: Open Learning FrameworksMike Bogle
 

Similar to Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hypertext 2011: 22nd ACM Conference on Hypertext and Hypermedia (20)

What is Wiki
What is WikiWhat is Wiki
What is Wiki
 
Presentation collaborate with wikis
Presentation collaborate with wikisPresentation collaborate with wikis
Presentation collaborate with wikis
 
Wikis 101
Wikis 101Wikis 101
Wikis 101
 
Week 3 Wikis
Week 3 WikisWeek 3 Wikis
Week 3 Wikis
 
Wikis inservice
Wikis inserviceWikis inservice
Wikis inservice
 
Wikis Instructions
Wikis InstructionsWikis Instructions
Wikis Instructions
 
Wikis inservice
Wikis inserviceWikis inservice
Wikis inservice
 
Wikis inservice
Wikis inserviceWikis inservice
Wikis inservice
 
Wiki:Collaborative tool for building documents
Wiki:Collaborative tool for building documentsWiki:Collaborative tool for building documents
Wiki:Collaborative tool for building documents
 
Blogs and Wikis
Blogs and Wikis Blogs and Wikis
Blogs and Wikis
 
160712 wiki lecture
160712 wiki lecture160712 wiki lecture
160712 wiki lecture
 
Open Source Software Wikipedia 2008
Open Source Software Wikipedia 2008Open Source Software Wikipedia 2008
Open Source Software Wikipedia 2008
 
Get On The Bus! Wyoming
Get On The Bus! WyomingGet On The Bus! Wyoming
Get On The Bus! Wyoming
 
Wikis And Your Business
Wikis And Your BusinessWikis And Your Business
Wikis And Your Business
 
The Wiki Way
The Wiki WayThe Wiki Way
The Wiki Way
 
Distributed wikis
Distributed wikisDistributed wikis
Distributed wikis
 
What Does DITA Have To Do With Wiki
What Does DITA Have To Do With WikiWhat Does DITA Have To Do With Wiki
What Does DITA Have To Do With Wiki
 
Web 2.0 The Art of the Internet Possible
Web 2.0 The Art of the Internet PossibleWeb 2.0 The Art of the Internet Possible
Web 2.0 The Art of the Internet Possible
 
Art of GLAM-wiki:The Basics of Sharing Cultural Knowledge on Wikipedia
Art of GLAM-wiki:The Basics of Sharing Cultural Knowledge on WikipediaArt of GLAM-wiki:The Basics of Sharing Cultural Knowledge on Wikipedia
Art of GLAM-wiki:The Basics of Sharing Cultural Knowledge on Wikipedia
 
Session Agenda: Open Learning Frameworks
Session Agenda: Open Learning FrameworksSession Agenda: Open Learning Frameworks
Session Agenda: Open Learning Frameworks
 

More from Paolo Massa

Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)Paolo Massa
 
Gamification Features 4 Fitcity
Gamification Features 4 FitcityGamification Features 4 Fitcity
Gamification Features 4 FitcityPaolo Massa
 
Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?Paolo Massa
 
Social fitness (fitcity project)
Social fitness (fitcity project)Social fitness (fitcity project)
Social fitness (fitcity project)Paolo Massa
 
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES  DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES Paolo Massa
 
Reputation: local or global?
Reputation: local or global?Reputation: local or global?
Reputation: local or global?Paolo Massa
 
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...Paolo Massa
 
Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...Paolo Massa
 
Combining Ridesharing& Social Networks
Combining Ridesharing& Social NetworksCombining Ridesharing& Social Networks
Combining Ridesharing& Social NetworksPaolo Massa
 
The Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan WardThe Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan WardPaolo Massa
 
Invited talk at Future Networked Technologies / FIT-IT research calls opening...
Invited talk at Future Networked Technologies / FIT-IT research calls opening...Invited talk at Future Networked Technologies / FIT-IT research calls opening...
Invited talk at Future Networked Technologies / FIT-IT research calls opening...Paolo Massa
 
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...Paolo Massa
 
Feedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesFeedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesPaolo Massa
 
Bowling Alone and Trust Decline in Social Network Sites
Bowling Alone and  Trust Decline in  Social Network SitesBowling Alone and  Trust Decline in  Social Network Sites
Bowling Alone and Trust Decline in Social Network SitesPaolo Massa
 
Social Networking 4 your business
Social Networking 4 your businessSocial Networking 4 your business
Social Networking 4 your businessPaolo Massa
 
OMG Girlz Don't Exist on teh Intarweb!!!!1
OMG Girlz Don't Exist on teh Intarweb!!!!1OMG Girlz Don't Exist on teh Intarweb!!!!1
OMG Girlz Don't Exist on teh Intarweb!!!!1Paolo Massa
 
Fukuyama' trust - The role of trust and trust networks in the society
Fukuyama' trust - The role of trust and trust networks in the societyFukuyama' trust - The role of trust and trust networks in the society
Fukuyama' trust - The role of trust and trust networks in the societyPaolo Massa
 
Transcendent Interactions Collaborative Contexts and Relationship-based Compu...
Transcendent Interactions Collaborative Contexts and Relationship-based Compu...Transcendent Interactions Collaborative Contexts and Relationship-based Compu...
Transcendent Interactions Collaborative Contexts and Relationship-based Compu...Paolo Massa
 
The Power of Social Media (Ricardo Baeza-Yates)
The Power of Social Media (Ricardo Baeza-Yates)The Power of Social Media (Ricardo Baeza-Yates)
The Power of Social Media (Ricardo Baeza-Yates)Paolo Massa
 
Internet, Web and Freedom
Internet, Web and FreedomInternet, Web and Freedom
Internet, Web and FreedomPaolo Massa
 

More from Paolo Massa (20)

Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
 
Gamification Features 4 Fitcity
Gamification Features 4 FitcityGamification Features 4 Fitcity
Gamification Features 4 Fitcity
 
Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?
 
Social fitness (fitcity project)
Social fitness (fitcity project)Social fitness (fitcity project)
Social fitness (fitcity project)
 
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES  DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
 
Reputation: local or global?
Reputation: local or global?Reputation: local or global?
Reputation: local or global?
 
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
 
Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...
 
Combining Ridesharing& Social Networks
Combining Ridesharing& Social NetworksCombining Ridesharing& Social Networks
Combining Ridesharing& Social Networks
 
The Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan WardThe Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan Ward
 
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hypertext 2011: 22nd ACM Conference on Hypertext and Hypermedia

  • 1. Social Networks of Wikipedia Paolo Massa SoNet @ Bruno Kessler Foundation, Trento, Italy http://www.gnuband.org
  • 2. Contributions Methodological paper on Algorithms for extracting a network of Who talks to whom on Wikipedia + Validation of quality by manual coding Code is open source and reusable = Basic step for Social Network Analysis
  • 3. Outline ● Statistics on Wikipedia/wiki ● Algorithms for Extracting a Social Network ● Manual Validation of Algorithms
  • 4. English Wikipedia Started in 2001 3.500.000+ articles 440.000.000+ edits 14.000.000+ registered users 3.500.000+ at-least-1-edit users
  • 5. Multi-lingual: 280+ Wikipedias 50.000+ wikis on Wikia.com, some 1.000.000+ edits
  • 7. Article Page / Article Talk Page
  • 9. User Page – User Talk Page (UTP)
  • 10. How to extract a network of who talks to whom from User talk pages?
  • 11. User talk page http://en.wikipedia.org/wiki/User_talk:Phauly
  • 12. User talk page http://en.wikipedia.org/wiki/User_talk:Phauly
  • 13. User talk page http://en.wikipedia.org/wiki/User_talk:Phauly Edge: Shell → Phauly (weight 1)
  • 14. User talk page http://en.wikipedia.org/wiki/User_talk:Phauly Edge: Shell → Phauly (weight 1)
  • 15. User talk page http://en.wikipedia.org/wiki/User_talk:Phauly Edges: Shell → Phauly (weight 1), Martin → Phauly (weight 1)
  • 16.
  • 17. Broader scope We (SoNet) work on ● How UTPs are used (coordination) ● Characterize users of Wikipedia (based on gender, interests, religion, ...) ● Formation of Collective memories of events in Wikipedia ● Goal: understand/model what users do in Wikipedia → Wikisociology
  • 18. We're hiring! ;) Call for researcher at https://risorseumane.fbk.eu/it/node/234 Info about SoNet group at http://sonet.fbk.eu If interested, come to talk to me!
  • 19. Other Wikipedia networks ● Few papers on User talk pages ● Node=User ● Edge=Coediting x articles ● Edge=Editing article after user A ● Edge=Reverted edit of user A ● Edge=Vote in elections for admins ● Node=Page / Edge=Link ● Node=Category / Edge=Inclusion
  • 20. How to extract who talks to whom? 3 ways: (1) Signatures (automated) (2) History of edits (automated) (3) Manual coding
  • 21. Input: Wikipedia dumps XML dump of every edit that occurred on every page over time (10 years!) English Wikipedia dump = 5,600 Gigabytes! (our scripts work on every wiki: 280+ language Wikipedias, but also 50.000+ wikia.com wikis ...)
  • 22. How to extract who talks to whom? 3 ways: (1) Signatures in text (automated) (2) History of edits (automated) (3) Manual coding
  • 25. (1) Signature algorithm: pages-meta-current XML (lines are cut off at the right edge of the slide)
      <page>
        <title>User talk:Phauly</title>
        <revision>
          <text xml:space="preserve">
      == '''Welcome!''' ==
      Hello, {{BASEPAGENAME}}, and [[Wikipedia:Welcome, newcomers|welcome]] t
      your contributions. I hope you like the place and decide to stay. Here
      might find helpful:
      *[[Wikipedia:Five pillars|The five pillars of Wikipedia]]
      *[[Wikipedia:How to edit a page|How to edit a page]]
      *[[Help:Contents|Help pages]]
      *[[Wikipedia:Tutorial|Tutorial]]
      *[[Wikipedia:Article development|How to write a great article]]
      *[[Wikipedia:Manual of Style|Manual of Style]]
      I hope you enjoy editing here and being a [[Wikipedia:Wikipedians|Wikip
      [[Wikipedia:Sign your posts on talk pages|sign your name]] on talk page
      (<nowiki>~~~~</nowiki>); this will automatically produce your name and
      check out [[Wikipedia:Questions]], ask me on my talk page, or place
      <code><nowiki>{{helpme}}</nowiki></code> on your talk page and someone
      answer your questions. Again, welcome!&nbsp;. [[User:Shell_Kinney|Shell
      <sup>[[User_talk:Shell_Kinney|babelfish]]</sup> 15:29, 7 November 2006
      == "Wikipedia endnote assisstant" ==
      Hi, sorry to take so long to reply to your message. It's convention at
      messages at the bottom of the page, and as I was moving country at the
      see your message until now! Have you tried the updated URL,
      http://toolserver.org/~verisimilus/Scholar ? Let me know if you continu
      Glad you find the tool useful! Best wishes,
      [[User:Smith609|Martin]]&nbsp;'''<small>([[User:Smith609|S
      [[User_talk:Smith609|Talk]])</small>''' 01:19, 7 October 2008
      == Test anonymous edit ==
      Just a test done by myself on signature formatting. --[[Special:Contrib
      217.77.80.29]] ([[User talk:217.77.80.29|talk]]) 12:08, 8 February 2010
          </text>
        </revision>
      </page>
  • 26. (1) Signature algorithm (same XML as on the previous slide) ● Consider pages with title User talk:T (or equivalent in other languages) ● Search for signatures of user S in the text ● Consider them as messages from S to T. Signature of XXX if the text contains [[User:XXX| and signature of 217.77.80.29 if it contains [[Special:Contributions/217.77.80.29| Robust to spaces, HTML tags, unbalanced parentheses, ...
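The signature rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the authors' repository: the regular expression, the `normalize` helper and the "count each signer at most once per line" rule are hypothetical choices, and the sample text is a simplified, partly invented version of the page shown on the slide.

```python
import re
from collections import Counter

# Hypothetical pattern (not the authors' regex): a signature is assumed to contain
# [[User:NAME| for registered users or [[Special:Contributions/IP| for anonymous ones.
SIGNATURE_RE = re.compile(
    r"\[\[\s*(?:User\s*:|Special\s*:\s*Contributions\s*/)\s*([^|\]\[/#]+?)\s*\|",
    re.IGNORECASE,
)

def normalize(name):
    """Treat underscores as spaces and upper-case the first letter of a username."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]

def signature_edges(page_title, wikitext):
    """Count signatures of each user S found on 'User talk:T' as messages S -> T."""
    target = normalize(page_title.split(":", 1)[1])
    edges = Counter()
    for line in wikitext.splitlines():
        # A personalized signature may link to the same user several times,
        # so each signer is counted at most once per line (an assumption).
        signers = {normalize(m.group(1)) for m in SIGNATURE_RE.finditer(line)}
        for signer in signers:
            edges[(signer, target)] += 1
    return edges

# Simplified sample modeled on the slide; the link labels are partly invented.
SAMPLE_TEXT = (
    "== Welcome! ==\n"
    "Again, welcome! [[User:Shell_Kinney|Shell]] "
    "<sup>[[User_talk:Shell_Kinney|babelfish]]</sup> 15:29, 7 November 2006\n"
    "== Tool ==\n"
    "Best wishes, [[User:Smith609|Martin]] ([[User:Smith609|Smith609]] "
    "[[User_talk:Smith609|Talk]]) 01:19, 7 October 2008\n"
    "== Test anonymous edit ==\n"
    "--[[Special:Contributions/217.77.80.29|217.77.80.29]] "
    "([[User talk:217.77.80.29|talk]]) 12:08, 8 February 2010\n"
)

if __name__ == "__main__":
    for (sender, receiver), weight in signature_edges("User talk:Phauly", SAMPLE_TEXT).items():
        print(sender, "->", receiver, weight)
```

Links to `User_talk:` pages are deliberately not treated as signatures here, and a signature produced by a template (see slide 47) would be missed entirely, which is the main weakness of this approach.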
  • 29. (2) History algorithm: stub-meta-history XML
      <page>
       <title>User talk:Phauly</title>
       <revision>
        <timestamp>2006-11-07T15:29:48Z</timestamp>
        <contributor>
         <username>Shell Kinney</username>
        </contributor>
       </revision>
       <revision>
        <timestamp>2008-10-07T01:19:54Z</timestamp>
        <contributor>
         <username>Smith609</username>
        </contributor>
       </revision>
       <revision>
        <timestamp>2010-02-08T12:08:19Z</timestamp>
        <contributor>
         <ip>217.77.80.29</ip>
        </contributor>
       </revision>
      </page>
  • 30. (2) History algorithm (same XML as on the previous slide) ● Consider pages with title User talk:T (or equivalent in other languages) ● Consider revision by user S as a message from S to T
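A minimal sketch of the history rule, using only the Python standard library. It assumes the simplified, namespace-free XML shown on the slide; real dumps declare an XML namespace and are far too large for `ET.fromstring`, so a streaming parser such as `ET.iterparse` would be needed there. The function name `history_edges` and the `talk_prefix` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The <page> element from the slide, without the XML namespace of real dumps.
STUB_XML = """<page>
 <title>User talk:Phauly</title>
 <revision>
  <timestamp>2006-11-07T15:29:48Z</timestamp>
  <contributor><username>Shell Kinney</username></contributor>
 </revision>
 <revision>
  <timestamp>2008-10-07T01:19:54Z</timestamp>
  <contributor><username>Smith609</username></contributor>
 </revision>
 <revision>
  <timestamp>2010-02-08T12:08:19Z</timestamp>
  <contributor><ip>217.77.80.29</ip></contributor>
 </revision>
</page>"""

def history_edges(page_xml, talk_prefix="User talk:"):
    """Treat every revision by user S of the page 'User talk:T' as a message S -> T."""
    page = ET.fromstring(page_xml)
    title = page.findtext("title", default="")
    if not title.startswith(talk_prefix):
        return Counter()  # not a user talk page
    target = title[len(talk_prefix):]
    edges = Counter()
    for revision in page.iter("revision"):
        # Registered users carry <username>, anonymous users carry <ip>.
        sender = (revision.findtext("contributor/username")
                  or revision.findtext("contributor/ip"))
        if sender:
            edges[(sender, target)] += 1
    return edges

if __name__ == "__main__":
    print(history_edges(STUB_XML))
```

The `talk_prefix` argument stands in for "or equivalent in other languages": the namespace name would have to be supplied per wiki.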
  • 31. They produce different networks. But which is more correct? Which is more meaningful? (1) Signatures in text (automated) (2) History of edits (automated)
  • 32. (3) Manual coding Validation on Venetian Wikipedia by manually visiting every user talk page and manually extracting every “message“ #users (active in writing or receiving) = 918 (out of 6255 registered users) #messages = 1786 (paper about “content of messages“ on UTPs: most are coordination)
  • 33. Why Venetian Wikipedia? Small, so complete manual coding is possible http://en.wikipedia.org http://vec.wikipedia.org
  • 34. Goal of Manual Coding Manual coding = opportunity to notice patterns and regularities as well as exceptions to them. Goal: providing empirical evidence of the reliability of the extraction algorithms.
  • 35. Which is correct? Best? (1) Signatures in text (automated) (2) History of edits (automated) (3) Manual coding NONE is correct. Not even Manual coding. They are different. The most important issues and the strategies to cope with them are in the next slides. (comparison on data as of December 30, 2009)
  • 36. (A) Number of nodes. (3) Manual coding: 918; (1) Signatures: 906; (2) History: 981. Why? See next slides
  • 37. (B) Renamed users Small issue but relevant impact Venetian Wikipedia = 15 renamings English Wikipedia = 17,096 renamings
  • 38. (B) Renamed users Vec.wiki: user “Maximillion Pegasus” wrote messages on User talk pages. Then another person requested the username “Maximillion Pegasus” and got it: bureaucrats renamed the original “Maximillion Pegasus” to “Usurped12032009”. The UTP of “Usurped12032009” contains the messages received when he was “Maximillion Pegasus”; the new “Maximillion Pegasus” never received a message. Existing signatures are not affected by the rename. So “Usurped12032009” has high indegree and 0 outdegree, while “Maximillion Pegasus” has 0 indegree and high outdegree. It took time to find this user, understand the issue, and figure out it was not a bug in our code! The Signature algorithm makes an error in this case, and Manual coding too! History works because the XML file contains the username of the “real” user, i.e. Usurped12032009
  • 39. (B) Renamed users This issue is NOT marginal. 17,000+ renamings in the English Wikipedia and usually involving very active and peculiar users! This issue affects the most basic element of social networks, number of nodes!
  • 40. (C) Number of edges: #pairs of users (unweighted) between which at least 1 message was written. (3) Manual coding: 1073; (1) Signatures: 1087; (2) History: 1869. Why? See next slides
  • 41. (D) Information messages and redirects “I don't check this vec.wiki often, please write to User:X on en.wiki [Signature of User:X]” → user X in en.wiki might be different from user X in vec.wiki: only users in one wiki are considered. (bot) “This is a bot, please write to User:X”. Information messages: 60/1786. Redirects: 27/1786. Manual coding = OK; Signature = ~KO; History = ~OK (but … A edits UTP of A...)
  • 42. (E) Messages to oneself A writes on UTP of A 56/1786 messages were self-edges Wikipedia recommendation: A replies to B on UTP of B Small evidence but it seems to happen: self-edges are rare and mainly information messages
  • 43. (F) Non human users writing messages Each bot has its own “logic”. One example: Marco27Bot is a welcome bot
  • 44. Many messages are templates! Welcome templates {{benvegnu}}: out of 1786 msgs, 774 (43.33%) are welcome templates. In vec.wiki they are written by a bot, Marco27Bot, but signed with the usernames of volunteers. Manual coding and Signature algo find the signers (appearance); History finds the bot (reality). Suggestion: don't consider bots because of their automated nature
  • 45. (G) Anonymous users, vandalism and deleted messages Anonymous users (IP addresses) have UTPs. They received 33 messages from bots about possible vandalism. Many of their edits got deleted. Coding and Signature don't find deleted edits; History finds them. Suggestion: remove anonymous users (IP addresses don't map one-to-one to persons anyway)
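The two suggestions above (leave out bots, leave out anonymous users) amount to a filter over the extracted edges. The sketch below assumes edges are stored as a mapping from `(sender, receiver)` to a message count and that the list of bot usernames is supplied by hand; both are assumptions for illustration, and the function names are hypothetical.

```python
import ipaddress

def is_anonymous(username):
    """Anonymous contributors appear under their IP address instead of a username."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False

def drop_bots_and_anonymous(edges, bot_names):
    """Keep only edges whose sender and receiver are registered, non-bot users."""
    bots = {name.lower() for name in bot_names}

    def keep(user):
        return user.lower() not in bots and not is_anonymous(user)

    return {
        (sender, receiver): weight
        for (sender, receiver), weight in edges.items()
        if keep(sender) and keep(receiver)
    }

if __name__ == "__main__":
    # Invented weights, for illustration only.
    edges = {
        ("Shell Kinney", "Phauly"): 1,
        ("217.77.80.29", "Phauly"): 1,
        ("Marco27Bot", "Phauly"): 2,
    }
    print(drop_bots_and_anonymous(edges, ["Marco27Bot"]))
```

The removed edges need not be thrown away: they can be kept in a separate structure so that bots and anonymous users can be analyzed on their own.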
  • 46. (H) Many edits per message I edit the UTP of X, I discover a typo, I re-edit the UTP of X. These are not 2 messages, but the history algorithm detects 2 edits. Possible heuristic: collapse edits occurring within a short time span
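One possible reading of this heuristic, as a hedged sketch: edits by the same sender on the same talk page are merged into one message when each follows the previous one within a fixed window. The 10-minute window and the function name `collapse_edits` are assumptions made for illustration; the slides do not specify a threshold.

```python
from datetime import datetime, timedelta

def collapse_edits(revisions, window=timedelta(minutes=10)):
    """Collapse bursts of edits into single messages.

    revisions: iterable of (sender, target, timestamp) tuples.
    An edit is merged into the previous message when the same sender edited
    the same talk page no more than `window` earlier (hypothetical threshold).
    """
    messages = []
    last_edit = {}  # (sender, target) -> timestamp of the most recent edit
    for sender, target, timestamp in sorted(revisions, key=lambda r: r[2]):
        key = (sender, target)
        previous = last_edit.get(key)
        if previous is None or timestamp - previous > window:
            messages.append((sender, target, timestamp))
        last_edit[key] = timestamp
    return messages

if __name__ == "__main__":
    # Invented edit history, for illustration only.
    edits = [
        ("A", "X", datetime(2010, 2, 8, 12, 0)),
        ("A", "X", datetime(2010, 2, 8, 12, 3)),   # typo fix, same message
        ("B", "X", datetime(2010, 2, 8, 12, 1)),
        ("A", "X", datetime(2010, 2, 8, 12, 30)),  # a new message
    ]
    for message in collapse_edits(edits):
        print(message)
```

Because the window is measured from the most recent edit, a long chain of quick corrections still counts as one message.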
  • 47. (I) Personalized, missing or incorrectly formatted signatures Large variety in personalized signatures. Hard to reliably detect all signatures, especially for very active users! And each language Wikipedia has different practices. The most active vec.wiki user used a template for his signature! {{Utente:Nick1915/firma}} Biggest drawback of the signature algorithm
  • 48. (I) Personalized, missing or incorrectly formatted signatures Users forget to sign (signing is not automatic). A bot (Sinebot in EnWiki and Marco27Bot in VecWiki) edits the page and adds the signature. → It seems the bot “talks” a lot. Some users make errors in the syntax for signing. Signature = KO; History = OK (forgetting to sign is not a problem, but discard bots)
  • 49. (J) Date of message Messages are (often) dated → possible longitudinal analysis! Signature algo = KO: must detect the syntax of the date, which differs over time (in vec.wiki) and in each language Wikipedia. History algo = OK: has the info formally coded in the XML dump: <timestamp>2006-11-07T15:29:48Z</timestamp>
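Since the history dump stores timestamps in a fixed UTC format, a longitudinal breakdown needs no language-specific date parsing. The sketch below groups messages by year; the helper names are hypothetical and the three timestamps are the ones shown on the history slide.

```python
from collections import Counter
from datetime import datetime, timezone

def parse_dump_timestamp(text):
    """Parse a dump timestamp such as 2006-11-07T15:29:48Z (always UTC)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def messages_per_year(timestamps):
    """Count how many revisions (messages) fall in each calendar year."""
    return Counter(parse_dump_timestamp(t).year for t in timestamps)

if __name__ == "__main__":
    stamps = ["2006-11-07T15:29:48Z", "2008-10-07T01:19:54Z", "2010-02-08T12:08:19Z"]
    print(messages_per_year(stamps))
```

The same parsed timestamps could be used to cut the network into yearly snapshots instead of one cumulative graph.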
  • 50. (K) Archived messages When UTPs become long, they get archived (by a bot). The current content is copied to a newly created page such as User_talk:Phauly/Archive3. But NOT all subpages of a UTP are archives! Coding and Signature = KO: they must decide whether to look for signatures in subpages based on heuristics on the page title (what would this be in the Chinese Wikipedia?). History = OK: edits are done to the “main” UTP. Issue very relevant for “active” users!
  • 51. Our scripts are open source! You can run them and extract networks (in order to analyze them). Python code at https://github.com/phauly/wiki-network Networks already available as extracted by the 2 algorithms for German, Spanish, Italian, Chinese and Venetian Wikipedia http://sonetlab.fbk.eu/data/social_networks_of_wikipedia/ GraphML format: play with them using Gephi! (http://www.gephi.org) Social Network Analysis of who talks to whom on Wikipedia is possible without caring about all these details of extraction!
  • 52. Size=Indegree (#received msgs) Color=Role 2005-2010 Cumulative Weighted Directed Social network (who talks to whom) Nodes=Users (918) (out of 6255 registered users) Edges=#Messages
  • 53. Nodes=Users (918) Most users just received messages (receivers, passive). Only 196 users wrote at least one msg! (senders, active)
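The quantities on the last two slides (indegree as number of received messages, senders versus passive receivers) follow directly from a weighted edge list. A minimal sketch, assuming edges are stored as a mapping from `(sender, receiver)` to a message count; the function names and the example edges are invented for illustration.

```python
from collections import Counter

def degrees(edges):
    """Weighted indegree (#received msgs) and outdegree (#written msgs) per user."""
    indegree, outdegree = Counter(), Counter()
    for (sender, receiver), weight in edges.items():
        outdegree[sender] += weight
        indegree[receiver] += weight
    return indegree, outdegree

def senders_and_passive_receivers(edges):
    """Split users into those who wrote at least one message and those who only received."""
    indegree, outdegree = degrees(edges)
    senders = set(outdegree)
    passive = set(indegree) - senders
    return senders, passive

if __name__ == "__main__":
    # Invented edges, for illustration only.
    edges = {
        ("Shell Kinney", "Phauly"): 1,
        ("Smith609", "Phauly"): 1,
        ("Phauly", "Smith609"): 2,
        ("Shell Kinney", "Newbie"): 1,
    }
    indegree, outdegree = degrees(edges)
    print(indegree["Phauly"], outdegree["Phauly"])
    print(senders_and_passive_receivers(edges))
```

The same counters expose the renamed-user anomaly from slide 38: an account with high indegree and zero outdegree paired with one showing the opposite pattern.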
  • 54. Discussion No algo is “correct”, not even manual coding! Bots and anonymous users should be removed and analyzed ad hoc. Interested in (1) the network users see (with its variability in signatures and formats)? The Signature algorithm is OK but works only on one language Wikipedia and needs tweaking. Interested in (2) the network of what really happened? The History algorithm is more robust, also across wikis (cross-wiki comparison) and with dates (longitudinal analysis).
  • 55. Conclusions Small change in algorithm/assumption = big change in “what you extract” and hence in “what you find”!! Proposed 2 algorithms. Empirical validation by manual coding. 1) Bots and anonymous users to be excluded and treated separately and ad hoc 2) History algorithm = more robust. Open source scripts: first step towards a sociology of wikis
  • 56. Credits I would like to thank Davide Setti and Marco Frassoni for writing the code and for the manual coding. Don't forget: Call for Postdoc at SoNet https://risorseumane.fbk.eu/it/node/234
  • 57. ? Thanks