Dissecting Wikipedia




                                     Andrew Gray

           Wikipedian in Residence, British Library

              andrew.gray@bl.uk // @generalising
Wikipedia & Wikimedia



   Wikimedia
      Movement and charitable body
      80,000-100,000 contributors in 280 languages
        and eleven core projects
      Image repository, dictionary, news site…
      …used by almost 500,000,000 people



   Wikipedia
      25,000,000 articles, 4,000,000 in English
      representing 8,000,000-9,000,000 topics & entities
      6,500 articles and 235,000 edits per day

    (…and twelve years ago, this was all fields…)
…so what is Wikipedia?



   …an encyclopedia (more or less)

   …written neutrally

   …and verifiably

   …using previously published information

   …free to use, distribute, or reuse

   …a collaborative community

   …with no firm rules
A developing internal infrastructure



   All edits are visible through watchlists and page histories
      About 7% are vandalism or malicious; processes exist to detect these
      Median time to correction is under 2 minutes… but some stay much longer

   Individual discussion pages for all articles – “talk”

   Quality review and assessment process

   Specialised working groups and central noticeboards
      e.g. content topics; style; dispute resolution; copyright; etc.
Quality of Wikipedia as a source



   On average… it’s not bad
      In 2005, four errors per article, versus three in Britannica
      In 2011, in English, Spanish & Arabic:
            “…the Wikipedia articles in this sample scored higher overall than the
            comparison articles with respect to accuracy, references, style/
            readability and overall judgment…”

   Millions of articles – so many are, individually, problematic
      Various ways of identifying “signs” of quality
      Markers for quality are both obvious and subtle



   Very effective “springboard” tool
Moving to other content



   Other languages – not translations, and may have more content

   Mousing over footnote markers

   Within the references:
      Links through DOIs and other identifiers
      ISBNs go to a special landing page
           …and then out to libraries, booksellers, etc
      ISSNs go to WorldCat
      For articles about authors, look for authority control links
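
The identifier links described on this slide can be sketched as simple URL construction. This is only an illustration, not how Wikipedia's citation templates are implemented: it assumes the present-day `doi.org` resolver and the `Special:BookSources` landing page on English Wikipedia, and the function names, the sample DOI and the sample ISBN are hypothetical choices made for the example.

```python
from urllib.parse import quote


def doi_url(doi):
    """Build a resolver link for a DOI (assumes the doi.org resolver)."""
    # Keep "/" unescaped, since it separates the DOI prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


def book_sources_url(isbn, wiki="en.wikipedia.org"):
    """Build a link to the ISBN landing page (Special:BookSources)."""
    # Drop hyphens and spaces; keep digits and a possible ISBN-10 "X".
    cleaned = "".join(ch for ch in isbn if ch.isdigit() or ch in "xX")
    return f"https://{wiki}/wiki/Special:BookSources/{cleaned}"


# Illustrative identifiers only; they are not taken from the slides.
print(doi_url("10.1000/182"))
print(book_sources_url("978-0-306-40615-7"))
```

The landing page then offers onward links to libraries and booksellers, as the slide notes.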
Other research tools



   Some tools available – “toolserver” allows live DB queries
      Complex to use, but rewarding




   CatScan: look for intersection of categories
      “all physicists born in 1912” – 53 in English, 35 in German




   Full dumps of all data available – http://dumps.wikipedia.org/



   Reusers – Freebase, DBpedia, Wolfram Alpha
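
The category intersection that CatScan performs can be pictured as a set intersection over category membership lists. The sketch below is not CatScan's implementation; the function name and the sample data are hypothetical, and a real query would draw on the live database or a dump rather than a hand-written dictionary.

```python
def intersect_categories(members_by_category, *categories):
    """Return the sorted titles that appear in every named category."""
    sets = [set(members_by_category.get(name, ())) for name in categories]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical membership lists standing in for real category data.
sample = {
    "Physicists": ["Person A", "Person B", "Person C"],
    "1912 births": ["Person B", "Person C", "Person D"],
}

print(intersect_categories(sample, "Physicists", "1912 births"))
# prints ['Person B', 'Person C']
```

Running the same intersection against each language edition separately is what yields different counts per language, as in the English and German figures on the slide.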
Wikidata



   Wikidata: our new linked data repository
      Phase I: cross-language links
      Phase II: structured data elements
      Phase III: dynamic lists




   Very loosely defined schema

   Currently harvesting structured data from WP

   Public API, open to reusers

   CC-0 licensed data – fully open
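
As a rough sketch of what reuse through the public API might look like, the code below builds a `wbgetentities` request URL and reads the cross-language links (Phase I) out of a response. It makes no network call: `sample_response` is a hand-written stand-in that mimics the shape of a real reply, and the function names are hypothetical. Only the endpoint, the `wbgetentities` action and its `ids`, `props` and `format` parameters are taken from the MediaWiki/Wikibase API.

```python
from urllib.parse import urlencode, urlparse, parse_qs

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_id, props=("sitelinks",)):
    """Build a wbgetentities request URL for one Wikidata item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "|".join(props),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def sitelink_titles(response, item_id):
    """Map each wiki (e.g. 'enwiki') to the article title linked there."""
    sitelinks = response["entities"][item_id].get("sitelinks", {})
    return {site: link["title"] for site, link in sitelinks.items()}


# Hand-written stand-in for an API reply; not real fetched data.
sample_response = {
    "entities": {
        "Q42": {
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
                "dewiki": {"site": "dewiki", "title": "Douglas Adams"},
            }
        }
    }
}

print(build_entity_url("Q42"))
print(sitelink_titles(sample_response, "Q42"))
```

Keeping URL construction separate from response parsing means the parsing step can be checked offline before any live request is made.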
Research about Wikipedia



   Thriving research around Wikimedia communities & content
      by mid-2011, 2100 peer-reviewed articles and 38 PhD theses
      Active research committee and WMF support

   Regular community-produced monthly newsletter
      http://enwp.org/meta:Research // @wikiresearch

   Topics include:
      Community and content creation
      Reading and researching by users
      Quality of content
      Technical research
      Large-scale content examination
Research on communities



   Research on the Wikipedia communities:


        Dynamics of community conflict, discussions, collaboration,
         voting, contribution, mentoring…
        Demographics, motivation and specialisms of contributors
        Patterns of growth and content creation/deletion
        Effect of central programs on volunteer activity
        Cross-cultural interaction
Visualisation: discussion dynamics




                                     http://notabilia.net/
Editor activity and motivation




                        http://commons.wikimedia.org/wiki/File:Effect_of_barnstars_on_productivity.png
Research on users



   Research on usage of Wikipedia:


        Specific searching behaviour
        Patterns of usage (yearly, daily)
        Tracking external events through Wikipedia
        Search engine rankings
        Change in usage by students
        Effect of Wikipedia publication on wider literature
Visualising editing patterns




                       http://commons.wikimedia.org/wiki/File:WikiTrip_egyptian_revolution_screenshot.png
Research on content



   Research on the content of Wikipedia:


        Evolution of content
        Accuracy, coverage and quality
        Biases – geographic, cultural, gender
        Linguistic analysis
        Effect of external publications on Wikipedia
Quality assessment comparisons




           http://commons.wikimedia.org/wiki/File:Boxplot_of_Average_Article_Feedback_ratings_by_project_rated_quality.svg
Research on technical aspects



   Research on the technical side of Wikipedia:


      Extensive work on scaling open-content services
      Tools for detecting and handling vandalism
      Algorithmic detection and identification of bias, spam
      Practical research on uses of wikis
Research using content



   Research using content from Wikipedia

   Hard to distinguish from “conventional” research, but some
    examples:


      Geographical analysis
      Visualisations of content
      Source for extracted datasets


        ...and Wikidata still to come!
Visualising art history




                          http://commons.wikimedia.org/wiki/File:Wikiarthistory.png
Visualising place




                    https://commons.wikimedia.org/wiki/File:Imageworld-artphp3.png

 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 

Dissecting Wikipedia

  • 1. Dissecting Wikipedia
      Andrew Gray, Wikipedian in Residence, British Library
      andrew.gray@bl.uk // @generalising
  • 2. Wikipedia & Wikimedia
      - Wikimedia
          - Movement and charitable body
          - 80-100,000 contributors in 280 languages and eleven core projects
          - Image repository, dictionary, news site…
          - …used by almost 500,000,000 people
      - Wikipedia
          - 25,000,000 articles, 4,000,000 in English
          - representing 8-9,000,000 topics & entities
          - 6,500 articles and 235,000 edits per day
      (…and twelve years ago, this was all fields…)
  • 3. …so what is Wikipedia?
      - …an encyclopedia (more or less)
      - …written neutrally
      - …and verifiably
      - …using previously published information
      - …free to use, distribute, or reuse
      - …a collaborative community
      - …with no firm rules
  • 4. A developing internal infrastructure
      - All edits are visible through watchlists and page histories
          - About 7% are vandalism or malicious; processes to detect these
          - Median time to correction < 2 minutes… but some stay much longer
      - Individual discussion pages for all articles – “talk”
      - Quality review and assessment process
      - Specialised working groups and central noticeboards
          - e.g. content topics; style; dispute resolution; copyright; etc.
  • 5. Quality of Wikipedia as a source
      - On average… it’s not bad
          - In 2005 four errors per article, versus three in Britannica
          - In 2011, in English, Spanish & Arabic: “…the Wikipedia articles in this sample scored higher overall than the comparison articles with respect to accuracy, references, style/readability and overall judgment…”
      - Millions of articles – so many are, individually, problematic
          - Various ways of identifying “signs” of quality
          - Markers for quality are both obvious and subtle
      - Very effective “springboard” tool
  • 6. Moving to other content
      - Other languages – not translations, and may have more content
      - Mousing over footnote markers
      - Within the references:
          - Links through DOIs and other identifiers
          - ISBNs go to a special landing page
          - …and then out to libraries, booksellers, etc
          - ISSNs go to WorldCat
      - If an author, look for authority control links:
  • 7. Other research tools
      - Some tools available – “toolserver” allows live DB queries
          - Complex to use, but rewarding
      - CatScan: look for intersection of categories
          - “all physicists born in 1912” – 53 in English, 35 in German
      - Full dumps of all data available – http://dumps.wikipedia.org/
      - Reusers – Freebase, DBpedia, Wolfram Alpha
  • 8. Wikidata
      - Wikidata: our new linked data repository
          - Phase I: cross-language links
          - Phase II: structured data elements
          - Phase III: dynamic lists
      - Very loosely defined schema
      - Currently harvesting structured data from WP
      - Public API, open to reusers
      - CC-0 licensed data – fully open
  • 9. Research about Wikipedia
      - Thriving research around Wikimedia communities & content
          - by mid-2011, 2100 peer-reviewed articles and 38 PhD theses
      - Active research committee and WMF support
      - Regular community-produced monthly newsletter
          - http://enwp.org/meta:Research // @wikiresearch
      - Topics include:
          - Community and content creation
          - Reading and researching by users
          - Quality of content
          - Technical research
          - Large-scale content examination
  • 10. Research on communities
      - Research on the Wikipedia communities:
          - Dynamics of community conflict, discussions, collaboration, voting, contribution, mentoring…
          - Demographics, motivation and specialisms of contributors
          - Patterns of growth and content creation/deletion
          - Effect of central programs on volunteer activity
          - Cross-cultural interaction
  • 11. Visualisation: discussion dynamics http://notabilia.net/
  • 12. Editor activity and motivation http://commons.wikimedia.org/wiki/File:Effect_of_barnstars_on_productivity.png
  • 13. Research on users
      - Research on usage of Wikipedia:
          - Specific searching behaviour
          - Patterns of usage (yearly, daily)
          - Tracking external events through Wikipedia
          - Search engine rankings
          - Change in usage by students
          - Effect of Wikipedia publication on wider literature
  • 14. Visualising editing patterns http://commons.wikimedia.org/wiki/File:WikiTrip_egyptian_revolution_screenshot.png
  • 15. Research on content
      - Research on the content of Wikipedia:
          - Evolution of content
          - Accuracy, coverage and quality
          - Biases – geographic, cultural, gender
          - Linguistic analysis
          - Effect of external publications on Wikipedia
  • 16. Quality assessment comparisons http://commons.wikimedia.org/wiki/File:Boxplot_of_Average_Article_Feedback_ratings_by_project_rated_quality.svg
  • 17. Research on technical aspects
      - Research on the technical side of Wikipedia:
          - Extensive work on scaling open-content services
          - Tools for detecting and handling vandalism
          - Algorithmic detection and identification of bias, spam
          - Practical research on uses of wikis
  • 18. Research using content
      - Research using content from Wikipedia
          - Hard to distinguish from “conventional” research, but some examples:
          - Geographical analysis
          - Visualisations of content
          - Source for extracted datasets
          - ...and Wikidata still to come!
  • 19. Visualising art history http://commons.wikimedia.org/wiki/File:Wikiarthistory.png
  • 20. Visualising place https://commons.wikimedia.org/wiki/File:Imageworld-artphp3.png