1. Use of Wikipedia
categories on information
retrieval research:
a brief review
Jesús Tramullas
Dept. Library & Information Science, Univ. of Zaragoza
Piedad Garrido-Picazo
Dept. Computer Science & Systems Engineering, Univ. of Zaragoza
Ana I. Sánchez Casabón
Dept. Library & Information Science, Univ. of Zaragoza
2. About Wikipedia Categories
Wikipedia categories are a classification
scheme built for organizing and describing
Wikipedia articles.
Started in 2003.
System that combines a hierarchical
organization with relations among different
categories, which creates poly-hierarchies
and associations.
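The structure described above can be sketched as a small graph in which a category may have several parents (a poly-hierarchy) and may also carry non-hierarchical links to related categories. This is only an illustrative model, not Wikipedia's actual storage: the class `CategoryGraph`, its method names, and the sample category names are all assumptions made for the example.

```python
from collections import defaultdict


class CategoryGraph:
    """Toy model of a poly-hierarchical category scheme (hypothetical API)."""

    def __init__(self):
        self.parents = defaultdict(set)   # category -> set of parent categories
        self.related = defaultdict(set)   # category -> set of associated categories

    def add_subcategory(self, child, parent):
        # A child may be filed under any number of parents.
        self.parents[child].add(parent)

    def add_association(self, a, b):
        # Associations are symmetric and carry no hierarchy.
        self.related[a].add(b)
        self.related[b].add(a)

    def ancestors(self, category):
        """Return every category reachable by following parent links."""
        seen = set()
        stack = [category]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:   # also guards against cycles
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Sample category names are invented for illustration.
graph = CategoryGraph()
graph.add_subcategory("Information retrieval", "Information science")
graph.add_subcategory("Information retrieval", "Computer science")
graph.add_subcategory("Search engines", "Information retrieval")
graph.add_association("Information retrieval", "Library science")

print(sorted(graph.ancestors("Search engines")))
```

Because "Information retrieval" sits under two parents, walking upward from "Search engines" reaches both branches, which is what distinguishes a poly-hierarchy from a strict tree.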
4. Research Questions
RQ1: to identify the uses and applications that
researchers are making of the Wikipedia category
system in computer science research.
RQ2: to review how a knowledge organization
system, developed collaboratively, is being
used as a research tool in different approaches
to information processing and retrieval.
5. Research Method
Systematic literature review.
Sources: Scopus and WoS, Nov. 2017-Jan. 2018.
Boolean query: "Wikipedia" AND "categories",
in title, keyword and abstract fields, limited
to 2002-2017.
Scopus: 666; WoS: 311.
Processed datasets: from 680 to 546 papers.
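The search and clean-up steps above can be sketched roughly as follows. The slides do not say how the datasets were processed, so everything here is an assumption: the record layout (plain dicts with `title`, `abstract`, `keywords`, `year`, `doi`), the helper names, the deduplication rule (DOI first, normalized title otherwise), and the sample records, which are invented placeholders rather than real papers.

```python
import re


def matches_query(record):
    """True if both query terms occur in the title, keyword or abstract fields."""
    fields = ("title", "keywords", "abstract")
    text = " ".join(record.get(field, "") for field in fields).lower()
    return "wikipedia" in text and "categories" in text


def in_period(record, start=2002, end=2017):
    """True if the publication year falls inside the review window."""
    return start <= record["year"] <= end


def dedup_key(record):
    """Prefer the DOI; fall back to a punctuation-free, lower-cased title."""
    doi = record.get("doi", "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return "title:" + title


def merge_sources(*sources):
    """Filter every source, then keep the first record seen for each key."""
    merged = {}
    for source in sources:
        for record in source:
            if matches_query(record) and in_period(record):
                merged.setdefault(dedup_key(record), record)
    return list(merged.values())


# Invented placeholder records, standing in for Scopus and WoS exports.
scopus = [
    {"title": "Mining Wikipedia categories for entity ranking",
     "year": 2010, "doi": "10.0000/example.1"},
    {"title": "Wikipedia categories as a taxonomy", "year": 2015},
]
wos = [
    {"title": "MINING WIKIPEDIA CATEGORIES FOR ENTITY RANKING",
     "year": 2010, "doi": "10.0000/EXAMPLE.1"},
    {"title": "Wikipedia categories as a taxonomy.", "year": 2015},
    {"title": "Wikipedia categories in 2019", "year": 2019},
]

merged = merge_sources(scopus, wos)
print(len(merged), "unique records")
```

A key-based merge like this misses a duplicate when one source supplies a DOI and the other does not, so a manual check of the remaining records would still be needed.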
6. RQ1: results and discussion
Beforehand, the bibliographic data were published
openly in Zotero and Mendeley.
Answered in the affirmative: researchers apply
the Wikipedia category structure in a variety of
approaches, uses, and applications.
It is impossible to establish precise divisions
among them.
7. RQ1: two big groups
Firstly, studies that analyzed the category
system itself within the context of Wikipedia.
Secondly, papers that use Wikipedia
categories in studies on different aspects
of information processing, usually on
document corpora independent of Wikipedia.
8. RQ2: results and discussion
Information Retrieval.
Entity processing.
Indexing and classification of document
corpora.
Creating and using taxonomies.
Creating and using ontologies.
Semantic treatment.
Other uses.
9. Conclusions, 1
Wikipedia is an important field of research for
different areas of computer science, in
general, and information retrieval, in
particular.
The significant topics detected are closely
related to one another, reflecting the
classic major topics in information retrieval.
10. Conclusions, 2
Its use as a support and validation tool should
be emphasized in different approaches to the
study and analysis of document corpora,
including studies on information processing,
classification and retrieval.
It provides a broad field both for validating
classification schemas and for creating
new ones.
11. Problems
The variety of terms researchers use to
describe their work highlights an underlying
problem for systematic reviews: authors
differ widely in how they draft titles and
abstracts and select keywords.
12. Future work
First, to apply text classification techniques
to the corpus data and survey the results,
for comparison with our proposal.
Second, to complete the review with a
quantitative or bibliometric analysis.
Finally, to study the research focused on
applications of Computer Science to other
fields.
13. Questions?
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International license.