Citations and references in DBpedia
Krzysztof Węcel, Włodzimierz Lewoniewski
13th DBpedia Community Meeting
Leipzig, 2019.05.23
Background
Among the five most important research discoveries at the Wikimedia 2018 conference:
Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold.
"Relative Quality and Popularity Evaluation of Multilingual Wikipedia". Informatics 2017, 4(4), 43.
http://dx.doi.org/10.3390/informatics4040043
Why are citations important?
• Wikipedia requires that statements are supported by references
• extraction of facts for DBpedia should be quality-driven
• references can be an important factor for quality assessment
• research question: how can we assess references?
Citation templates
• {{cite web …
• {{cite journal …
• {{cite book …
• {{cite conference …
but also
• {{Google books|ID|title|page=|keywords=|text=|plainurl=}}
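A minimal sketch of how the templates listed above could be counted in raw wikitext. The function name, the regular expression and the sample text are assumptions made for illustration, not the extraction code used by the authors. Only template names starting with "cite " and the "Google books" template are counted.

```python
import re
from collections import Counter

# Captures the template name that follows "{{", up to the first "|", "}}" or newline.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?=\||\}\}|\n)")


def count_citation_templates(wikitext):
    """Count citation-like templates in wikitext, keyed by lower-cased template name."""
    names = (match.group(1).lower() for match in TEMPLATE_NAME.finditer(wikitext))
    return Counter(
        name for name in names
        if name.startswith("cite ") or name == "google books"
    )


# Hypothetical wikitext, invented for this example.
sample = (
    "A<ref>{{cite web |url=http://example.org}}</ref> "
    "B<ref>{{Cite journal|doi=10.1/x}}</ref> "
    "{{Google books|ID|title}} {{Infobox person|name=X}} "
    "C<ref>{{cite web|url=http://example.com}}</ref>"
)
counts = count_citation_templates(sample)
for name, number in counts.most_common():
    print(name, number)
```

Names are lower-cased because the first letter of a template name is case-insensitive in MediaWiki; redirects and localized template names in other language versions would need an additional mapping that is not shown here.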
Number of templates
Web references stats
References with URLs contained in Wikipedia articles
Language            Num. of references with URL   Number of hosts
English Wikipedia   26 200 779                    1 916 682
French Wikipedia    4 275 477                     460 420
German Wikipedia    4 139 912                     608 151
Spanish Wikipedia   3 314 033                     380 697
Italian Wikipedia   2 652 314                     228 428
Polish Wikipedia    2 270 715                     222 434
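The two columns above (references with a URL, and distinct hosts) can be illustrated with a short sketch. It assumes the reference URLs have already been extracted from the articles; the function name is hypothetical, and whether the authors normalize hosts (for example by stripping "www.") is not stated, so no normalization is applied here.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_stats(urls):
    """Return (references with a usable URL, number of distinct hosts, per-host counts)."""
    hosts = Counter()
    for url in urls:
        try:
            host = urlsplit(url).hostname  # lower-cased host, or None if absent
        except ValueError:
            host = None  # malformed URL, e.g. a broken IPv6 literal
        if host:
            hosts[host] += 1
    return sum(hosts.values()), len(hosts), hosts


# URLs taken from elsewhere in these slides, plus one non-URL string.
reference_urls = [
    "http://dx.doi.org/10.3390/informatics4040043",
    "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3039505",
    "https://www.ssrn.com/abstract=3039505",
    "http://wikirank.net",
    "http://wikirank.net/",
    "not a url",
]
with_url, distinct_hosts, per_host = host_stats(reference_urls)
print(with_url, distinct_hosts)
```

In this sketch "papers.ssrn.com" and "www.ssrn.com" count as two hosts; grouping by registered domain instead would lower the host counts.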
Web references stats
Conclusion: articles in various languages usually cite universal URLs, and usually in English
Overlaps in lists of the most popular hosts (top 50 000)
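One way to compute such an overlap is to take the k most frequently cited hosts per language version and intersect the two sets. The sketch below assumes per-host reference counts are available as dictionaries; the function names and the toy counts are invented for illustration, and the slide does not say which overlap measure was actually used.

```python
def top_hosts(host_counts, k):
    """Return the k most cited hosts; ties are broken alphabetically."""
    ranked = sorted(host_counts.items(), key=lambda item: (-item[1], item[0]))
    return {host for host, _ in ranked[:k]}


def shared_top_hosts(counts_a, counts_b, k=50_000):
    """Hosts that appear in the top-k lists of both language versions."""
    return top_hosts(counts_a, k) & top_hosts(counts_b, k)


# Toy data, not real statistics.
english = {"journal.example": 9, "news.example": 7, "archive.example": 5, "blog.example": 1}
polish = {"journal.example": 4, "gazeta.example": 3, "archive.example": 2, "forum.example": 1}

shared = shared_top_hosts(english, polish, k=3)
print(sorted(shared), len(shared) / 3)
```

Dividing the size of the intersection by k gives a share between 0 and 1 that can be compared across language pairs.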
References and citation templates
<ref name="Trimble 1987">{{cite journal
|last=Trimble |first=V.
|date=1987
|title=Existence and nature of dark matter in the universe
|journal=[[Annual Review of Astronomy and Astrophysics]]
|volume=25|pages=425–472
|bibcode=1987ARA&A..25..425T
|doi=10.1146/annurev.aa.25.090187.002233
}}</ref>
Interesting identifiers:
ISBN, ISSN, DOI, PubMed, arXiv
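The reference above can be split into named parameters with a small hand-written parser, after which the identifiers are a simple dictionary lookup. This is a sketch under stated assumptions: the function names are hypothetical, positional parameters are ignored, and the parameter names isbn, issn, doi, pmid, arxiv and bibcode are the ones commonly used by English Wikipedia citation templates.

```python
import re

IDENTIFIER_PARAMS = ("isbn", "issn", "doi", "pmid", "arxiv", "bibcode")


def parse_template(template):
    """Split '{{name|key=value|...}}' into (name, {key: value})."""
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template")
    inner, parts, current, depth, i = text[2:-2], [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):      # entering a nested template or link
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):    # leaving a nested template or link
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:  # top-level parameter separator
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # named parameters only
            params[key.strip()] = " ".join(value.split())
    return parts[0].strip(), params


SAMPLE = """<ref name="Trimble 1987">{{cite journal
|last=Trimble |first=V.
|date=1987
|title=Existence and nature of dark matter in the universe
|journal=[[Annual Review of Astronomy and Astrophysics]]
|volume=25|pages=425–472
|bibcode=1987ARA&A..25..425T
|doi=10.1146/annurev.aa.25.090187.002233
}}</ref>"""

body = re.search(r"<ref[^>]*>(.*?)</ref>", SAMPLE, re.S).group(1)
name, params = parse_template(body)
identifiers = {key: params[key] for key in IDENTIFIER_PARAMS if key in params}
print(name, identifiers)
```

Tracking the nesting depth keeps a "|" inside a wiki link or nested template from being treated as a parameter separator.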
Quality of references
Aggregation by articles
Results: citation rankings
Wikirank.net
• we are developing a portal for ranking Wikipedia articles in various languages according to quality criteria
• 55 languages
• current modules:
  • WikiRank
  • Top Articles
  • Citation Index
  • Websites Rank
http://wikirank.net
Infoboxes.net
• collecting all infoboxes on one page, with references, for easy fact checking
Future work: measures from Altmetric
Source:
https://scienceopen.altmetric.com/details/26323463
Future work: measures from Open Academic Graph
• Paper Rank: 19942
• Citation count: 3
• URLs:
  • https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3039505
  • https://www.ssrn.com/abstract=3039505
• Fields of study:
  • Social science
  • Randomized controlled trial
  • Public good
  • Political science
  • Epistemology of Wikipedia
• DOI: 10.2139/ssrn.3039505
• …
• Authors:
  • Neil Thompson:
    • number of publications: …
    • affiliations: …
    • h-index:
    • …
  • Douglas Hanley: …
OAG is generated by linking two large academic graphs: Microsoft Academic Graph (MAG) and AMiner.
Summary
Main thesis: analysis of citation statistics can improve quality modelling of Wikipedia articles, which makes better fact selection for DBpedia possible
• additional effort besides collecting bibliographic data
  • categorization of citations
  • disambiguation of citations
  • linking to external databases (e.g. Altmetric, OAG)
  • assessment of citation quality
• applications
  • completion of facts about linked data entities
  • filling in missing references in Wikidata (also multilingual)
