LOOPS OF HUMANS AND BOTS IN WIKIDATA
Elena Simperl
University of Southampton, UK
@esimperl
OVERVIEW
Wikidata is a critical AI asset in many domains
Recent project of Wikimedia (2012), edited collaboratively
Our research assesses the quality of Wikidata and the link between community processes and quality
WHAT IS WIKIDATA
BASIC FACTS
Collaborative knowledge graph
100k registered users, 46M items
Open licence
RDF exports, connected to Linked Open Data Cloud
THE KNOWLEDGE GRAPH
STATEMENTS, ITEMS, PROPERTIES
Item identifiers start with a Q, property identifiers
start with a P
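The identifier convention can be illustrated with a few lines of Python. This is only a sketch: the function name `entity_kind` and the exact pattern are assumptions made for illustration, not part of any Wikidata tooling.

```python
import re

# "Q" followed by digits marks an item, "P" followed by digits a property.
_ID_PATTERN = re.compile(r"([QP])([1-9][0-9]*)")


def entity_kind(identifier: str) -> str:
    """Return 'item' or 'property' for a Wikidata-style identifier."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not an item or property identifier: {identifier!r}")
    return "item" if match.group(1) == "Q" else "property"


# A statement as a (subject, property, value) triple:
# London (Q84), head of government (P6), Sadiq Khan (Q334155).
statement = ("Q84", "P6", "Q334155")
kinds = [entity_kind(part) for part in statement]
```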
Example statement: London (Q84), head of government (P6), Sadiq Khan (Q334155)
THE KNOWLEDGE GRAPH
ITEMS CAN BE CLASSES, ENTITIES, VALUES
Example items: Ada Lovelace (Q7259), London (Q84), Sadiq Khan (Q334155), Amsterdam (Q727), city (Q515), male (Q6581097), Labour party (Q59360), United Kingdom (Q145)
Example property: head of government (P6)
THE KNOWLEDGE GRAPH
ADDING CONTEXT TO STATEMENTS
Statements may include context
  - Qualifiers (optional)
  - References (required)
Two types of references
  - Internal, linking to another item
  - External, linking to webpage
Example statement: London (Q84), head of government (P6), Sadiq Khan (Q334155), with 9 May 2016 and https://www.london.gov.uk/... attached as context
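A statement with its context could be modelled roughly as follows. This is a sketch, not Wikidata's actual data model: the class names, the `kind` rule (a URL means external, anything else internal) and the qualifier name "start time" for the date are all assumptions; the truncated URL is kept exactly as it appears on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    target: str  # an item identifier (internal) or a URL (external)

    @property
    def kind(self) -> str:
        # Assumed rule: anything that looks like a web address is external.
        is_url = self.target.startswith(("http://", "https://"))
        return "external" if is_url else "internal"


@dataclass
class Statement:
    subject: str
    prop: str
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[Reference] = field(default_factory=list)


# The London example; treating the date as a "start time" qualifier is an assumption.
london_mayor = Statement(
    subject="Q84",
    prop="P6",
    value="Q334155",
    qualifiers={"start time": "9 May 2016"},
    references=[Reference("https://www.london.gov.uk/...")],
)
```

Keeping references as a list allows several sources per statement, and the internal/external distinction is derived from the target rather than stored separately.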
THE KNOWLEDGE GRAPH
CO-EDITED BY BOTS AND HUMANS
Human editors can register or work anonymously
Bots created by community for routine tasks
18k active human users, 200+ bots
OUR WORK
Effects of editing behaviour and community make-up on the knowledge graph
Content quality as a function of its provenance
Tools to improve content diversity
THE RIGHT MIX OF USERS
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.
BACKGROUND
Wikidata editors have varied tenure and interests
Group composition impacts outcomes
  - Diversity can have multiple effects
  - Moderate tenure diversity increases outcome quality
  - Interest diversity leads to increased group productivity
Chen, J., Ren, Y., Riedl, J.: The effects of diversity on group productivity and member withdrawal in online volunteer groups. In: Proceedings of the 28th international conference on human factors in computing systems - CHI '10. p. 821. ACM Press, New York, USA (2010)
OUR STUDY
Analysed the edit history of items
Corpus of 5k items, whose quality has been manually assessed (5 levels)*
Edit history focused on community make-up
Community is defined as set of editors of item
Considered features from group diversity literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality
RESEARCH HYPOTHESES
      Activity                Outcome
H1    Bot edits               Item quality
H2    Bot-human interaction   Item quality
H3    Anonymous edits         Item quality
H4    Tenure diversity        Item quality
H5    Interest diversity      Item quality
DATA AND METHODS
- Ordinal regression analysis; four models were trained
- Dependent variable: quality label of the 5k labelled Wikidata items
- Independent variables
  - Proportion of bot edits
  - Bot-human edit proportion
  - Proportion of anonymous edits
  - Tenure diversity: coefficient of variation
  - Interest diversity: user editing matrix
- Control variables: group size, item age
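Some of these independent variables are simple to compute from an item's edit history. The sketch below assumes a hypothetical `Edit` record and made-up data; it takes the coefficient of variation as population standard deviation divided by mean, computed over the distinct registered editors of the item. The slides do not say which standard deviation or which editor subset the study used.

```python
from statistics import mean, pstdev
from typing import NamedTuple, Optional


class Edit(NamedTuple):
    editor: str
    editor_type: str            # "bot", "registered" or "anonymous"
    tenure_days: Optional[int]  # None when unknown (anonymous editors)


def proportion(edits, editor_type):
    """Share of edits made by editors of the given type."""
    return sum(e.editor_type == editor_type for e in edits) / len(edits)


def tenure_diversity(edits):
    """Coefficient of variation (std / mean) of tenure over distinct registered editors."""
    tenure_by_editor = {
        e.editor: e.tenure_days for e in edits if e.editor_type == "registered"
    }
    tenures = list(tenure_by_editor.values())
    return pstdev(tenures) / mean(tenures)


# Hypothetical edit history of a single item.
history = [
    Edit("BotA", "bot", None),
    Edit("alice", "registered", 1200),
    Edit("alice", "registered", 1200),
    Edit("203.0.113.7", "anonymous", None),
    Edit("bob", "registered", 300),
]

bot_share = proportion(history, "bot")
anonymous_share = proportion(history, "anonymous")
diversity = tenure_diversity(history)
```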
RESULTS
ALL HYPOTHESES SUPPORTED
[Figure: results for hypotheses H1 to H5]
LESSONS LEARNED
01 The more is not always the merrier
02 Bot edits are key for quality, but bots and humans are better
03 Diversity matters
IMPLICATIONS
01 Encourage registration
02 Identify further areas for bot editing
03 Design effective human-bot workflows
04 Suggest items to edit based on tenure and interests
LIMITATIONS AND FUTURE WORK
- Did not consider evolution of quality over time
- Sample vs Wikidata (most items C or lower)
- Other group features (e.g., coordination) not considered
- No distinction between editing activities (e.g., schema vs instances, topics etc.)
- Different metrics of interest (topics, type of activity)
THE CONTENT IS AS GOOD AS ITS REFERENCES
Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.
PROVENANCE IN WIKIDATA
Statements may include context
  - Qualifiers (optional)
  - References (required)
Two types of references
  - Internal, linking to another item
  - External, linking to webpage
Example statement: London (Q84), head of government (P6), Sadiq Khan (Q334155), with 9 May 2016 and https://www.london.gov.uk/... attached as context
THE ROLE OF PROVENANCE
Wikidata aims to become a hub of references
Provenance increases trust in Wikidata
Lack of provenance hinders content reuse
Quality of references is as yet unknown
Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.
OUR STUDY
Approach to evaluate quality of external references in Wikidata
Quality is defined by the Wikidata verifiability policy
  - Relevant: support the statement they are attached to
  - Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement
Large-scale (the whole of Wikidata)
Bot vs. human-contributed references
RESEARCH QUESTIONS
RQ1 Are Wikidata external references relevant?
RQ2 Are Wikidata external references authoritative?
I.e., do they match the author and publisher types from the Wikidata policy?
RQ3 Can we automatically detect non-relevant and non-authoritative references?
METHODS
TWO STAGE MIXED APPROACH
1. Microtask crowdsourcing (RQ1, RQ2)
   Evaluate relevance & authoritativeness of a reference sample
   Create training set for machine learning model
2. Machine learning (RQ3)
   Large-scale reference quality prediction
STAGE 1: MICROTASK CROWDSOURCING
3 tasks on Crowdflower
5 workers/task, majority voting
Test questions to select workers
Feature                   Microtask   Description
Relevance (RQ1)           T1          Does the reference support the statement?
Authoritativeness (RQ2)   T2          Choose author type from list
                          T3.A        Choose publisher type from list
                          T3.B        Verify publisher type, then choose sub-type from list
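Aggregating five worker answers by majority voting can be sketched as below. The function name and the answer labels are made up for illustration, and returning `None` when no answer has a strict majority is an assumption; the slides do not say how such cases were handled.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer chosen by more than half of the workers, else None."""
    label, count = Counter(answers).most_common(1)[0]
    return label if count > len(answers) / 2 else None


# Five hypothetical judgements for one T1 microtask.
relevance_votes = ["yes", "yes", "no", "yes", "no"]
aggregated = majority_vote(relevance_votes)

# With more than two answer options a strict majority may not exist.
author_votes = ["individual", "organisation", "collective", "individual", "organisation"]
undecided = majority_vote(author_votes)
```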
STAGE 2: MACHINE LEARNING
Compared three algorithms (RQ3)
  - Naïve Bayes, Random Forest, SVM
Features based on [Lehmann et al., 2012; Potthast et al., 2008]
Baseline: item labels matching (relevance); deprecated domains list (authoritativeness)
Features
  - URL reference uses
  - Source HTTP code
  - Statement item vector
  - Statement object vector
  - Author activity
  - Subject parent class
  - Property parent class
  - Object parent class
  - Author type
  - Author activity on references
DATA
1.6M external references (6% of total)
  - 1.4M from two sources (protein KBs)
83,215 English-language references
  - Sample of 2586 (99% confidence level, 2.5% margin of error)
  - 885 assessed automatically, e.g., links not working or csv files
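For orientation, a common way to size a sample from a finite population is Cochran's formula with a finite population correction. This is an assumption: the slides do not state which procedure produced the 2586 figure. Applied to 83,215 references at a 99% confidence level and a 2.5% margin of error, it gives roughly 2,570, close to but not exactly the reported sample size.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, p=0.5):
    """Cochran's sample size with finite population correction.

    p=0.5 is the most conservative assumption about the true proportion.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-value
    n0 = z * z * p * (1 - p) / margin ** 2               # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))   # finite population correction


n = sample_size(83_215, confidence=0.99, margin=0.025)
```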
RESULTS: CROWDSOURCING
CROWDSOURCING WORKS
Trusted workers: >80% accuracy
95% of responses from T3.A confirmed in T3.B
Task   No. of microtasks   Total workers   Trusted workers   Workers' accuracy   Fleiss' kappa
T1     1701 references     457             218               75%                 0.335
T2     1178 links          749             322               75%                 0.534
T3.A   335 web domains     322             60                66%                 0.435
T3.B   335 web domains     239             116               68%                 0.391
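The agreement statistic in the last column, Fleiss' kappa, compares observed agreement among raters with the agreement expected by chance. A minimal sketch of the standard formula follows; the input layout (one row per item, one count per category, same number of raters for every item) and the toy data are assumptions for illustration, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Agreement among the raters of each single item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_item) / n_items
    expected = sum(p * p for p in p_category)
    if expected == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (observed - expected) / (1 - expected)


# Toy data: 2 items, 3 raters, 2 categories.
kappa = fleiss_kappa([[3, 0], [1, 2]])
```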
RESULTS: CROWDSOURCING
MAJORITY OF REFERENCES ARE HIGH QUALITY
2586 references evaluated
Found 1674 valid references from 345 domains
Broken URLs deemed not relevant and not authoritative
RQ1
RQ2
RESULTS: CROWDSOURCING
HUMANS ARE BETTER AT EDITING REFERENCES
RQ1
RQ2
RESULTS: CROWDSOURCING
DATA FROM GOVERNMENT AND ACADEMIC SOURCES
Most common author type (T2)
  - Organisation (78%)
Most common publisher types (T3)
  - Governmental agencies (37%)
  - Academic organisations (24%)
RQ2
RESULTS: MACHINE LEARNING
RANDOM FORESTS PERFORM BEST
                    F1     MCC
Relevance
  Baseline          0.84   0.68
  Naïve Bayes       0.90   0.86
  Random Forest     0.92   0.89
  SVM               0.91   0.87
Authoritativeness
  Baseline          0.53   0.16
  Naïve Bayes       0.86   0.78
  Random Forest     0.89   0.83
  SVM               0.89   0.79
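Both metrics can be derived from a binary confusion matrix. The sketch below uses made-up counts, not the study's data; the function names are illustrative. Unlike F1, the Matthews correlation coefficient (MCC) also takes true negatives into account.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when a marginal total is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical confusion matrix for a reference-quality classifier.
tp, fp, fn, tn = 90, 10, 10, 90
f1 = f1_score(tp, fp, fn)
matthews = mcc(tp, fp, fn, tn)
```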
RQ3
LESSONS LEARNED
Crowdsourcing+ML works!
Many external sources are high quality
Bad references mainly non-working links, continuous control required
Lack of diversity in bot-added sources
Humans and bots are good at different things
LIMITATIONS AND FUTURE WORK
Studies with non-English sources
Did not consider internal references
Deployment in Wikidata, including changes in
editing behaviour
FROM NEURAL NETWORKS TO A MULTILINGUAL WIKIPEDIA
Kaffee, L., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J., & Simperl, E. (2018). Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. European Semantic Web Conference, to appear. Springer.
BACKGROUND
Wikipedia is available in 287 languages, but content is unevenly distributed
Wikidata is cross-lingual
ArticlePlaceholders display Wikidata triples as stubs for articles in underserved Wikipedias
Currently deployed in 11 Wikipedias
OUR STUDY
Enrich ArticlePlaceholders with textual summaries generated from Wikidata triples
Train a neural network to generate one-sentence summaries resembling the opening paragraph of a Wikipedia article
Test the approach on two languages, Esperanto and Arabic, with readers and editors of those Wikipedias
RESEARCH QUESTIONS
RQ1 Can we automatically generate summaries that match the quality and feel of Wikipedia in different languages?
RQ2 Are summaries useful for the communities editing underserved Wikipedias?
APPROACH
NEURAL NETWORK TRAINED ON WIKIDATA/WIKIPEDIA
Feed-forward architecture encodes triples from the ArticlePlaceholder into vector of fixed dimensionality
RNN-based decoder generates text summaries, one token at a time
Optimisations for different entity verbalisations, rare entities etc.
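The shape of such an encoder-decoder can be shown with a toy, untrained model in plain Python. Everything here is an assumption made for illustration: the hidden size, the tiny vocabulary, averaging the triple vectors into one fixed-size vector, greedy decoding, and the extra triple using P31 (instance of), which does not appear on the slides. With random weights the output is meaningless; only the data flow is of interest.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy model is reproducible
DIM = 4         # hypothetical hidden size
VOCAB = ["<start>", "<end>", "London", "is", "a", "city"]  # toy vocabulary


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def tanh_vec(vector):
    return [math.tanh(x) for x in vector]


# Randomly initialised (untrained) parameters.
symbol_embeddings = {}                       # one vector per Q/P identifier, filled lazily
W_enc = rand_matrix(DIM, 3 * DIM)            # feed-forward layer over one triple
W_hh = rand_matrix(DIM, DIM)                 # recurrent weights
W_xh = rand_matrix(DIM, DIM)                 # weights for the previous token
W_out = rand_matrix(len(VOCAB), DIM)         # hidden state -> vocabulary scores
token_embeddings = rand_matrix(len(VOCAB), DIM)


def embed_symbol(symbol):
    if symbol not in symbol_embeddings:
        symbol_embeddings[symbol] = [random.uniform(-1, 1) for _ in range(DIM)]
    return symbol_embeddings[symbol]


def encode(triples):
    """Feed-forward encoder: one vector per triple, averaged to a fixed size."""
    encoded = []
    for subject, prop, value in triples:
        concatenated = embed_symbol(subject) + embed_symbol(prop) + embed_symbol(value)
        encoded.append(tanh_vec(matvec(W_enc, concatenated)))
    return [sum(column) / len(encoded) for column in zip(*encoded)]


def decode(state, max_len=10):
    """RNN decoder: greedy generation, one token at a time."""
    tokens, previous = [], VOCAB.index("<start>")
    for _ in range(max_len):
        recurrent = matvec(W_hh, state)
        from_input = matvec(W_xh, token_embeddings[previous])
        state = tanh_vec([a + b for a, b in zip(recurrent, from_input)])
        scores = matvec(W_out, state)
        previous = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[previous] == "<end>":
            break
        tokens.append(VOCAB[previous])
    return tokens


context = encode([("Q84", "P31", "Q515"), ("Q84", "P6", "Q334155")])
summary = decode(context)
```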
EVALUATION
AUTOMATIC EVALUATION
Trained on corpus of Wikipedia sentences and corresponding Wikidata triples (205k Arabic; 102k Esperanto)
Tested against three baselines: machine translation (MT) and template retrieval (TR, TRext)
Using standard metrics: BLEU, METEOR, ROUGE-L
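Of these metrics, ROUGE-L is the simplest to sketch: it scores a generated sentence by the longest common subsequence (LCS) it shares with a reference. The version below tokenises on whitespace and uses a balanced F-measure (beta = 1) by default; both are simplifying assumptions, and the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-measure between a candidate and a reference sentence."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


score = rouge_l("London is a city", "London is the capital city")
```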
RQ1
EVALUATION
USER STUDIES
Two 15-day online surveys with readers and editors of the Arabic and Esperanto Wikipedias
- Readers survey (RQ1)
  - 60 articles (30 ours, 15 news items, 15 Wikipedia summaries from the training corpus)
  - Fluency: Is the text understandable and grammatically correct?
  - Appropriateness: Does the summary 'feel' like a Wikipedia article?
- Editors survey (RQ2)
  - 30 automatically generated summaries
  - Editors were asked to edit the article starting from our summary (2-3 sentences)
  - Measured the extent to which the summary was reused (Greedy String Tiling, GST, metric)
RESULTS: AUTOMATIC EVALUATION
APPROACH OUTPERFORMS BASELINES
RESULTS: USER STUDIES
SUMMARIES ARE USEFUL FOR THE COMMUNITY
Readers study
Editors study
LIMITATIONS AND FUTURE WORK
No easy way to test whether summaries would indeed lead to more participation on underserved Wikipedias
Wikidata itself needs more multilingual labels
Ongoing Wikipedia study: opportunistically ask editors of Wikipedia articles to add missing labels of relevant Wikidata items and properties
CONCLUSIONS
SUMMARY OF FINDINGS
Collaboration between humans and bots is important
Tools needed to identify tasks for bots and continuously study their effects on outcomes and community
Quality is a complex concept; we studied only a subset of aspects
References are high quality, though biases exist in terms of choice of sources
Automatically created content is useful to editors of underserved Wikipedias

More Related Content

What's hot

Pie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on TwitterPie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on Twitter
Elena Simperl
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
Amrapali Zaveri, PhD
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data Stories
Elena Simperl
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
Traian Rebedea
 
Designing a second generation of open data platforms
Designing a second generation of open data platformsDesigning a second generation of open data platforms
Designing a second generation of open data platforms
Yannis Charalabidis
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
Traian Rebedea
 
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...
Anna De Liddo
 
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA DATASCIENCE
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
Amrapali Zaveri, PhD
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
Paul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
Paul Groth
 
Noshir Contractor's view on the future of Linked Data
Noshir Contractor's view on the future of Linked DataNoshir Contractor's view on the future of Linked Data
Noshir Contractor's view on the future of Linked Data
Carlos Pedrinaci
 
Social Network Analysis with Spark
Social Network Analysis with SparkSocial Network Analysis with Spark
Social Network Analysis with Spark
Ghulam Imaduddin
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
James Hendler
 
Faceted Navigation (LACASIS Fall Workshop 2005)
Faceted Navigation (LACASIS Fall Workshop 2005)Faceted Navigation (LACASIS Fall Workshop 2005)
Faceted Navigation (LACASIS Fall Workshop 2005)
Bradley Allen
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
National Information Standards Organization (NISO)
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
National Information Standards Organization (NISO)
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Artificial Intelligence Institute at UofSC
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Tharushi Ruwandika
 
From Big Data to the Big Picture
From Big Data to the Big PictureFrom Big Data to the Big Picture
From Big Data to the Big Picture
SAGE Publishing
 

What's hot (20)

Pie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on TwitterPie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on Twitter
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data Stories
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
 
Designing a second generation of open data platforms
Designing a second generation of open data platformsDesigning a second generation of open data platforms
Designing a second generation of open data platforms
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
 
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu...
 
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
Noshir Contractor's view on the future of Linked Data
Noshir Contractor's view on the future of Linked DataNoshir Contractor's view on the future of Linked Data
Noshir Contractor's view on the future of Linked Data
 
Social Network Analysis with Spark
Social Network Analysis with SparkSocial Network Analysis with Spark
Social Network Analysis with Spark
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
 
Faceted Navigation (LACASIS Fall Workshop 2005)
Faceted Navigation (LACASIS Fall Workshop 2005)Faceted Navigation (LACASIS Fall Workshop 2005)
Faceted Navigation (LACASIS Fall Workshop 2005)
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
From Big Data to the Big Picture
From Big Data to the Big PictureFrom Big Data to the Big Picture
From Big Data to the Big Picture
 

Similar to Loops of humans and bots in Wikidata

Quality and collaboration in Wikidata
Quality and collaboration in WikidataQuality and collaboration in Wikidata
Quality and collaboration in Wikidata
Elena Simperl
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
Elena Simperl
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
Elena Simperl
 
Future of Web 2.0 & The Semantic Web
Future of Web 2.0 & The Semantic WebFuture of Web 2.0 & The Semantic Web
Future of Web 2.0 & The Semantic Web
is20090
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Dario Taraborelli
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
Carole Goble
 
Gatenby Vvbad 200909
Gatenby Vvbad 200909Gatenby Vvbad 200909
Wikidata as a hub for the linked data cloud
Wikidata as a hub for the linked data cloudWikidata as a hub for the linked data cloud
Wikidata as a hub for the linked data cloud
Joachim Neubert
 
Free For All: Getting Started in Open Source
Free For All: Getting Started in Open SourceFree For All: Getting Started in Open Source
Free For All: Getting Started in Open Source
Ali King
 
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
hypertext2007
 
PlanetData: Consuming Structured Data at Web Scale
PlanetData: Consuming Structured Data at Web ScalePlanetData: Consuming Structured Data at Web Scale
PlanetData: Consuming Structured Data at Web Scale
PlanetData Network of Excellence
 
Planetdata simpda
Planetdata simpdaPlanetdata simpda
Planetdata simpda
Elena Simperl
 
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wileyPSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
Matthew Ragucci
 
Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817
Figoblog
 
Wikis as Social Networks: Evolution and Dynamics
Wikis as Social Networks:Evolution and Dynamics Wikis as Social Networks:Evolution and Dynamics
Wikis as Social Networks: Evolution and Dynamics
Ralf Klamma
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
RDTF-Discovery
 
Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...
Toni Hermoso Pulido
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
lagoze
 
Who models the world? Collaborative ontology creation and user roles in Wikidata
Who models the world? Collaborative ontology creation and user roles in WikidataWho models the world? Collaborative ontology creation and user roles in Wikidata
Who models the world? Collaborative ontology creation and user roles in Wikidata
Alessandro Piscopo
 

Similar to Loops of humans and bots in Wikidata (20)

Quality and collaboration in Wikidata
Quality and collaboration in WikidataQuality and collaboration in Wikidata
Quality and collaboration in Wikidata
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Future of Web 2.0 & The Semantic Web
Future of Web 2.0 & The Semantic WebFuture of Web 2.0 & The Semantic Web
Future of Web 2.0 & The Semantic Web
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
Gatenby Vvbad 200909
Gatenby Vvbad 200909Gatenby Vvbad 200909
Gatenby Vvbad 200909
 
Wikidata as a hub for the linked data cloud
Wikidata as a hub for the linked data cloudWikidata as a hub for the linked data cloud
Wikidata as a hub for the linked data cloud
 
Free For All: Getting Started in Open Source
Free For All: Getting Started in Open SourceFree For All: Getting Started in Open Source
Free For All: Getting Started in Open Source
 
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
 
PlanetData: Consuming Structured Data at Web Scale
PlanetData: Consuming Structured Data at Web ScalePlanetData: Consuming Structured Data at Web Scale
PlanetData: Consuming Structured Data at Web Scale
 
Planetdata simpda
Planetdata simpdaPlanetdata simpda
Planetdata simpda
 
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wileyPSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
 
Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817
 
Wikis as Social Networks: Evolution and Dynamics
Wikis as Social Networks:Evolution and Dynamics Wikis as Social Networks:Evolution and Dynamics
Wikis as Social Networks: Evolution and Dynamics
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
 
Who models the world? Collaborative ontology creation and user roles in Wikidata
Who models the world? Collaborative ontology creation and user roles in WikidataWho models the world? Collaborative ontology creation and user roles in Wikidata
Who models the world? Collaborative ontology creation and user roles in Wikidata
 

More from Elena Simperl

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
This talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing scienceThis talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing science
Elena Simperl
 
Knowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generationKnowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generation
Elena Simperl
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
Elena Simperl
 
Open government data portals: from publishing to use and impact
Loops of humans and bots in Wikidata

  • 1. LOOPS OF HUMANS AND BOTS IN WIKIDATA Elena Simperl University of Southampton, UK @esimperl
  • 2. OVERVIEW Wikidata is a critical AI asset in many domains. It is a recent Wikimedia project (2012), edited collaboratively. Our research assesses the quality of Wikidata and the link between community processes and quality.
  • 4. BASIC FACTS Collaborative knowledge graph; 100k registered users, 46M items; open licence; RDF exports, connected to the Linked Open Data Cloud.
  • 5. THE KNOWLEDGE GRAPH: STATEMENTS, ITEMS, PROPERTIES Item identifiers start with a Q, property identifiers start with a P. Example statement: Q84 (London), P6 (head of government), Q334155 (Sadiq Khan).
  • 6. THE KNOWLEDGE GRAPH: ITEMS CAN BE CLASSES, ENTITIES, VALUES Example items: Q7259 (Ada Lovelace), Q84 (London), Q334155 (Sadiq Khan), Q727 (Amsterdam), Q515 (city), Q6581097 (male), Q59360 (Labour party), Q145 (United Kingdom). Example property: P6 (head of government).
  • 7. THE KNOWLEDGE GRAPH: ADDING CONTEXT TO STATEMENTS Statements may include context: qualifiers (optional) and references (required). Two types of references: internal, linking to another item; external, linking to a webpage. Example: Q84 (London), P6 (head of government), Q334155 (Sadiq Khan), with qualifier 9 May 2016 and reference https://www.london.gov.uk/...
  • 8. THE KNOWLEDGE GRAPH: CO-EDITED BY BOTS AND HUMANS Human editors can register or work anonymously. Bots are created by the community for routine tasks. 18k active human users, 200+ bots.
  • 9. OUR WORK Effects of editing behaviour and community make-up on the knowledge graph. Content quality as a function of its provenance. Tools to improve content diversity.
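The statement structure on this slide can be sketched as a small data model. This is only an illustration, not Wikidata's actual data model or API: the class names, field names, and the "start time" qualifier key are assumptions made for the example, and the reference URL is kept truncated exactly as it appears on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Reference:
    """A reference is either internal (another item) or external (a web page)."""
    item_id: Optional[str] = None  # internal reference, e.g. a Q identifier
    url: Optional[str] = None      # external reference, a web page address

    @property
    def is_external(self) -> bool:
        return self.url is not None


@dataclass
class Statement:
    """One subject-property-value claim plus its optional context."""
    subject: str                                               # item identifier (Q...)
    prop: str                                                  # property identifier (P...)
    value: str                                                 # item identifier or literal
    qualifiers: Dict[str, str] = field(default_factory=dict)   # optional context
    references: List[Reference] = field(default_factory=list)  # provenance


# London (Q84) has head of government (P6) Sadiq Khan (Q334155).
stmt = Statement(
    subject="Q84",
    prop="P6",
    value="Q334155",
    qualifiers={"start time": "9 May 2016"},  # qualifier key is an assumed label
    references=[Reference(url="https://www.london.gov.uk/...")],  # truncated on the slide
)
external_refs = [r for r in stmt.references if r.is_external]
```

Separating internal from external references in the model mirrors the distinction the later study relies on, since only external references are evaluated there.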
  • 10. THE RIGHT MIX OF USERS Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.
  • 11. BACKGROUND Wikidata editors have varied tenure and interests. Group composition impacts outcomes: diversity can have multiple effects; moderate tenure diversity increases outcome quality; interest diversity leads to increased group productivity. Chen, J., Ren, Y., Riedl, J.: The effects of diversity on group productivity and member withdrawal in online volunteer groups. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10. p. 821. ACM Press, New York, USA (2010)
  • 12. OUR STUDY Analysed the edit history of items. Corpus of 5k items, whose quality has been manually assessed (5 levels)*. Edit history analysis focused on community make-up. A community is defined as the set of editors of an item. Considered features from the group diversity literature and Wikidata-specific aspects. *https://www.wikidata.org/wiki/Wikidata:Item_quality
  • 13. RESEARCH HYPOTHESES
      Hypothesis | Activity              | Outcome
      H1         | Bot edits             | Item quality
      H2         | Bot-human interaction | Item quality
      H3         | Anonymous edits       | Item quality
      H4         | Tenure diversity      | Item quality
      H5         | Interest diversity    | Item quality
  • 14. DATA AND METHODS Ordinal regression analysis; four models were trained. Dependent variable: quality label of the 5k labelled Wikidata items. Independent variables: proportion of bot edits; bot-human edit proportion; proportion of anonymous edits; tenure diversity (coefficient of variation); interest diversity (user editing matrix). Control variables: group size, item age.
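Two of the independent variables above are simple enough to sketch directly: the share of edits made by bots or anonymous users, and tenure diversity as a coefficient of variation (standard deviation divided by mean). The snippet is a minimal sketch with made-up numbers; the editor categories, the tenure unit (days), and the choice of population rather than sample standard deviation are assumptions, since the slide does not specify them.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; 0.0 when the mean is zero."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


def proportion(part, total):
    """Share of `part` in `total`, guarding against an empty edit history."""
    return part / total if total else 0.0


# Hypothetical edit history of one item: who made each edit.
edits = ["bot", "human", "bot", "anonymous", "human", "bot"]
# Hypothetical tenure (in days) of each registered human editor of the item.
tenures_days = [30, 400, 1200, 90]

bot_share = proportion(edits.count("bot"), len(edits))
anonymous_share = proportion(edits.count("anonymous"), len(edits))
tenure_diversity = coefficient_of_variation(tenures_days)
```

A coefficient of variation is scale-free, which makes tenure diversity comparable across items whose editors are on average much older or much newer.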
  • 16. LESSONS LEARNED 01 The more is not always the merrier. 02 Bot edits are key for quality, but bots and humans together are better. 03 Diversity matters.
  • 17. IMPLICATIONS 01 Encourage registration. 02 Identify further areas for bot editing. 03 Design effective human-bot workflows. 04 Suggest items to edit based on tenure and interests.
  • 18. LIMITATIONS AND FUTURE WORK Did not consider evolution of quality over time. Sample vs Wikidata (most items C or lower). Other group features (e.g., coordination) not considered. No distinction between editing activities (e.g., schema vs instances, topics etc.). Different metrics of interest (topics, type of activity).
  • 19. THE CONTENT IS AS GOOD AS ITS REFERENCES Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.
  • 20. PROVENANCE IN WIKIDATA Statements may include context: qualifiers (optional) and references (required). Two types of references: internal, linking to another item; external, linking to a webpage. Example: Q84 (London), P6 (head of government), Q334155 (Sadiq Khan), with qualifier 9 May 2016 and reference https://www.london.gov.uk/...
  • 21. THE ROLE OF PROVENANCE Wikidata aims to become a hub of references. Provenance increases trust in Wikidata. Lack of provenance hinders content reuse. Quality of references is as yet unknown. Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.
  • 22. OUR STUDY Approach to evaluate the quality of external references in Wikidata. Quality is defined by the Wikidata verifiability policy. Relevant: references support the statement they are attached to. Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement. Large-scale (the whole of Wikidata). Bot vs. human-contributed references.
  • 23. RESEARCH QUESTIONS RQ1: Are Wikidata external references relevant? RQ2: Are Wikidata external references authoritative? I.e., do they match the author and publisher types from the Wikidata policy? RQ3: Can we automatically detect non-relevant and non-authoritative references?
  • 24. METHODS: TWO-STAGE MIXED APPROACH 1. Microtask crowdsourcing (RQ1, RQ2): evaluate relevance and authoritativeness of a reference sample; create a training set for the machine learning model. 2. Machine learning (RQ3): large-scale reference quality prediction.
  • 25. STAGE 1: MICROTASK CROWDSOURCING (RQ1, RQ2) 3 tasks on CrowdFlower. 5 workers/task, majority voting. Test questions to select workers.
      Feature           | Microtask | Description
      Relevance         | T1        | Does the reference support the statement?
      Authoritativeness | T2        | Choose author type from list
      Authoritativeness | T3.A      | Choose publisher type from list
      Authoritativeness | T3.B      | Verify publisher type, then choose sub-type from list
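Aggregating five worker answers per microtask by majority voting can be sketched as follows. This is an illustrative helper, not the platform's own aggregation: the function name, the answer labels, and the decision to return `None` on a tie are assumptions. For tasks with more than two answer options it picks the most frequent answer, which may fall short of a strict majority.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None if the top two are tied."""
    ranked = Counter(answers).most_common()
    if not ranked:
        return None
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # no single winner; such items would need another judgement
    return top_label


# Hypothetical answers from five workers to T1 for one reference.
t1_answers = ["supports", "supports", "does not support", "supports", "does not support"]
label = majority_vote(t1_answers)
```

With an odd number of workers and a yes/no question, a tie cannot occur, which is one practical reason for collecting five judgements per task.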
  • 26. STAGE 2: MACHINE LEARNING (RQ3) Compared three algorithms: Naïve Bayes, Random Forest, SVM. Features based on [Lehmann et al., 2012 & Potthast et al. 2008]. Baseline: item labels matching (relevance); deprecated domains list (authoritativeness). Features: URL reference uses; source HTTP code; statement item vector; statement object vector; author activity; author activity on references; subject parent class; property parent class; object parent class; author type.
  • 27. DATA 1.6M external references (6% of total), of which 1.4M come from two sources (protein KBs). 83,215 English-language references. Sample of 2586 (99% confidence, 2.5% margin of error); 885 assessed automatically, e.g., links not working or CSV files.
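The sample size on this slide can be sanity-checked with the standard formula for estimating a proportion, with a finite population correction. This is a sketch under stated assumptions: the slide does not say which formula or rounding the authors used, and the worst-case proportion p = 0.5 is assumed here. Under these assumptions the result lands in the same range as the 2586 reported, not exactly on it.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.99, margin=0.025, p=0.5):
    """Sample size for a proportion, with finite population correction."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size needed for an effectively infinite population.
    n0 = z * z * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of the given size.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


n = sample_size(83_215)  # English-language references reported on the slide
```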
  • 28. RESULTS: CROWDSOURCING WORKS Trusted workers: >80% accuracy. 95% of responses from T3.A confirmed in T3.B.
      Task | No. of microtasks | Total workers | Trusted workers | Workers' accuracy | Fleiss' κ
      T1   | 1701 references   | 457           | 218             | 75%               | 0.335
      T2   | 1178 links        | 749           | 322             | 75%               | 0.534
      T3.A | 335 web domains   | 322           | 60              | 66%               | 0.435
      T3.B | 335 web domains   | 239           | 116             | 68%               | 0.391
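The agreement figures in the last column are Fleiss' kappa values. A minimal sketch of the textbook computation is given below; it assumes every item was rated by the same number of workers and takes a matrix of per-item category counts. The toy matrices are invented for illustration and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix counts[item][category] of rating counts.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Mean observed agreement across items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected by chance from the overall category proportions.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        return 1.0  # all ratings fall in one category; agreement is trivial
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 2 items, 5 workers each, 2 answer categories.
perfect = fleiss_kappa([[5, 0], [0, 5]])
split = fleiss_kappa([[3, 2], [2, 3]])
```

Values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would produce.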
  • 29. RESULTS: CROWDSOURCING (RQ1, RQ2) MAJORITY OF REFERENCES ARE HIGH QUALITY 2586 references evaluated. Found 1674 valid references from 345 domains. Broken URLs deemed not relevant and not authoritative.
  • 30. RESULTS: CROWDSOURCING HUMANS ARE BETTER AT EDITING REFERENCES RQ1 RQ2
  • 31. RESULTS: CROWDSOURCING (RQ2) DATA FROM GOVERNMENT AND ACADEMIC SOURCES Most common author type (T2): organisation (78%). Most common publisher types (T3): governmental agencies (37%), academic organisations (24%).
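  • 32. RESULTS: MACHINE LEARNING (RQ3) RANDOM FORESTS PERFORM BEST
      Task              | Model         | F1   | MCC
      Relevance         | Baseline      | 0.84 | 0.68
      Relevance         | Naïve Bayes   | 0.90 | 0.86
      Relevance         | Random Forest | 0.92 | 0.89
      Relevance         | SVM           | 0.91 | 0.87
      Authoritativeness | Baseline      | 0.53 | 0.16
      Authoritativeness | Naïve Bayes   | 0.86 | 0.78
      Authoritativeness | Random Forest | 0.89 | 0.83
      Authoritativeness | SVM           | 0.89 | 0.79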
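Both reported metrics can be computed from a binary confusion matrix. The sketch below uses the standard definitions of F1 and the Matthews correlation coefficient (MCC); the counts are invented for illustration and do not come from the study.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion matrix for a relevance classifier.
tp, tn, fp, fn = 8, 8, 2, 2
f1 = f1_score(tp, fp, fn)
m = mcc(tp, tn, fp, fn)
```

MCC takes true negatives into account, which F1 ignores, so reporting both guards against a classifier that looks good only because one class dominates.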
  • 33. LESSONS LEARNED Crowdsourcing + ML works! Many external sources are high quality. Bad references are mainly non-working links; continuous control is required. Lack of diversity in bot-added sources. Humans and bots are good at different things.
  • 34. LIMITATIONS AND FUTURE WORK Studies with non-English sources. Did not consider internal references. Deployment in Wikidata, including changes in editing behaviour.
  • 35. FROM NEURAL NETWORKS TO A MULTILINGUAL WIKIPEDIA Kaffee, L., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J., & Simperl, E. (2018). Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. European Semantic Web Conference, to appear. Springer.
  • 36. BACKGROUND Wikipedia is available in 287 languages, but content is unevenly distributed. Wikidata is cross-lingual. ArticlePlaceholders display Wikidata triples as stubs for articles in underserved Wikipedias. Currently deployed in 11 Wikipedias.
  • 37. OUR STUDY Enrich ArticlePlaceholders with textual summaries generated from Wikidata triples. Train a neural network to generate one-sentence summaries resembling the opening paragraph of a Wikipedia article. Test the approach on two languages, Esperanto and Arabic, with readers and editors of those Wikipedias.
  • 38. RESEARCH QUESTIONS RQ1: Can we automatically generate summaries that match the quality and feel of Wikipedia in different languages? RQ2: Are summaries useful for the communities editing underserved Wikipedias?
  • 39. APPROACH: NEURAL NETWORK TRAINED ON WIKIDATA/WIKIPEDIA A feed-forward architecture encodes triples from the ArticlePlaceholder into a vector of fixed dimensionality. An RNN-based decoder generates text summaries, one token at a time. Optimisations for different entity verbalisations, rare entities etc.
  • 40. EVALUATION: AUTOMATIC EVALUATION (RQ1) Trained on a corpus of Wikipedia sentences and corresponding Wikidata triples (205k Arabic; 102k Esperanto). Tested against three baselines: machine translation (MT) and two template retrieval variants (TR, TRext). Using standard metrics: BLEU, METEOR, ROUGE-L.
  • 41. EVALUATION: USER STUDIES (RQ1, RQ2) Two 15-day online surveys with readers and editors of the Arabic and Esperanto Wikipedias.
      Readers survey: 60 articles (30 ours, 15 news items, 15 Wikipedia summaries from the training corpus). Fluency: is the text understandable and grammatically correct? Appropriateness: does the summary 'feel' like a Wikipedia article?
      Editors survey: 30 automatically generated summaries. Editors were asked to edit the article starting from our summary (2-3 sentences). Measured the extent to which the summary was reused (Greedy String Tiling, GST, metric).
  • 42. RESULTS: AUTOMATIC EVALUATION APPROACH OUTPERFORMS BASELINES
  • 43. RESULTS: USER STUDIES SUMMARIES ARE USEFUL FOR THE COMMUNITY Readers study Editors study
  • 44. LIMITATIONS AND FUTURE WORK No easy way to test whether summaries would indeed lead to more participation on underserved Wikipedias. Wikidata itself needs more multilingual labels. Ongoing Wikipedia study: opportunistically ask editors of Wikipedia articles to add missing labels of relevant Wikidata items and properties.
  • 46. SUMMARY OF FINDINGS Collaboration between humans and bots is important. Tools are needed to identify tasks for bots and to continuously study their effects on outcomes and community. Quality is a complex concept; we studied only a subset of its aspects. References are high quality, though biases exist in the choice of sources. Automatically created content is useful to editors of underserved Wikipedias.