The document summarises an online experiment investigating how popularity data influence predictive relevance judgments in academic search systems. Specifically, it manipulates four independent variables (number of citations for a work, number of citations for an author, author impact level, and number of downloads) and measures their effect on relevance judgment scores. A within-subjects multifactorial design exposes participants to 81 experimental conditions. The goal is to better understand which criteria users apply when evaluating search results and how popularity metrics may affect those criteria.
Investigating the effects of popularity data on predictive relevance judgments in academic search systems
Christiane Behnert, M.A.
Research Assistant at Hamburg University of Applied Sciences, Germany
Doctoral candidate at University of Hildesheim, Germany
CHIIR '19, Doctoral Consortium, 10 March 2019, Glasgow
DEPARTMENT OF INFORMATION
Christiane Behnert
1. RESEARCH SETTING
What are the criteria by which users of academic search systems judge
a search result’s relevance?
→ Literature review of studies on relevance criteria:
(a) no standardised definitions / distinction of the concepts of clues, criteria, factors;
(b) results to be judged did not include additional data, e.g., popularity data;
(c) very few studies on relevance criteria employ experimental designs.
2. USER MODEL
A user model of predictive relevance judgments in academic search systems
Source: Based on Behnert (2019), p. 28.
3. ONLINE EXPERIMENT
3.1 Why an experimental design?
Relevance cannot be measured directly like a physical quantity because it involves a human
being's mental activity [Buckland, 2017]
→ Mental activity is strongly related to human behaviour
→ Knowledge about actual human behaviour [Kelly, 2009; Kelly & Crescenzi, 2016]
A “substitute” for measuring relevance:
→ Elements (clues) within a search result = operationalised relevance criteria
→ Manipulated elements may cause an observable effect in relevance judgments
3. ONLINE EXPERIMENT
3.2 Multifactorial within-subjects design
Dependent variable: relevance judgment score (visual analogue scale from 0 to 100)
→ 4 independent variables on 3 levels each → 3 × 3 × 3 × 3 = 81 experimental conditions
Independent variable Level A Level B Level C
No. of citations (work) Low High Not provided
No. of citations (author) Low High Not provided
Author impact Low High Not provided
No. of downloads Low High Not provided
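The full crossing of four factors with three levels each can be enumerated directly; a minimal sketch (factor and level names here are illustrative shorthand, not the study's actual codebook) that also shows how the 81 conditions could map onto the questionnaire's 9 tasks with 9 search results each:

```python
from itertools import product
import random

LEVELS = ["low", "high", "not_provided"]
FACTORS = ["citations_work", "citations_author", "author_impact", "downloads"]

# Full factorial crossing: 3 levels ^ 4 factors = 81 conditions
conditions = [dict(zip(FACTORS, combo)) for combo in product(LEVELS, repeat=4)]

# Randomise the order per participant, then lay the conditions out on the
# questionnaire grid of 9 tasks x 9 search results (one condition per result).
random.shuffle(conditions)
grid = [conditions[i * 9:(i + 1) * 9] for i in range(9)]  # grid[task][result]
```

One design consequence worth noting: 9 tasks × 9 results yields exactly 81 judgments per participant, so a within-subjects layout can show every condition exactly once.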
3. ONLINE EXPERIMENT
3.3 Data collection
Questionnaire with 9 tasks and 9 search results each
Tasks inspired by simulated work task situations [Borlund & Ingwersen, 1997]
Task topics:
1) Altmetrics
2) Peer review
3) Wikipedia
4) Data citation
5) Open access
6) Google
7) Information literacy and gaming
8) Visual information seeking
9) Scholarly communication
Scale endpoints: 0 = not at all useful, 100 = very useful
4. REFERENCES
Behnert, C. (2019). Kriterien und Einflussfaktoren bei der Relevanzbewertung von Surrogaten in
akademischen Informationssystemen [Criteria and influencing factors of predictive relevance judgements
in academic search systems; in German]. Information - Wissenschaft & Praxis, 70(1), 24–32.
https://doi.org/10.1515/iwp-2019-0002
Borlund, P., & Ingwersen, P. (1997). The development of a method for the evaluation of interactive
information retrieval systems. Journal of Documentation, 53(3), 225–250.
http://doi.org/10.1108/EUM0000000007198
Buckland, M. K. (2017). Information and society. Cambridge, MA: MIT Press.
Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with users.
Foundations and Trends® in Information Retrieval, 3(1–2).
Kelly, D., & Crescenzi, A. (2016). From design to analysis: Conducting controlled laboratory experiments
with users. In Proceedings of the 39th International ACM SIGIR Conference on Research and
Development in Information Retrieval - SIGIR '16 (pp. 1207–1210). New York, NY: ACM Press.
Christiane Behnert, M.A.
Research Assistant
Hamburg University of Applied Sciences
Department of Information
Research group Search Studies
christiane.behnert@haw-hamburg.de
http://searchstudies.org/christiane-behnert/
Influences on the relevance judgment process in academic information systems:
An experimental approach to researching relevance criteria using popularity data
as part of search results as an example
[Working title in English]
SUPPLEMENT A
Next steps
→ Conducting a pretest with a sample of the actual target group
→ Developing a framework for statistical data analysis
→ Revising questionnaire / design based on pretest results
→ Conducting the main field phase in May
SUPPLEMENT B
Research questions
RQ I How can relevance criteria be researched with an experimental study design?
RQ II What influence do popularity data have on relevance judgments of search results
in academic search systems?
RQ III Which popularity data influence the relevance judgment process in academic
search systems, and to what extent?
SUPPLEMENT C
Operationalised levels of independent variables
Independent Variable Level A (from…to) Level B (from…to) Level C
No. of citations (work) Low (2…9) High (21…30) Not provided
No. of citations (author) Low (300…550) High (2100…2450) Not provided
Author impact Low (Top 75 or 50%) High (Top 5 or 1%) Not provided
No. of downloads Low (50…80) High (250…450) Not provided
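For the numeric variables, concrete stimulus values can be drawn at random within the operationalised intervals above; a minimal sketch, with range boundaries taken from the table (the function name and dictionary layout are illustrative; author impact is expressed as a categorical label in the table, so it is excluded from the numeric draws here):

```python
import random

# Inclusive value ranges per level, taken from the operationalisation table.
# Level C ("not provided") yields no displayed value at all.
RANGES = {
    "citations_work":   {"low": (2, 9),     "high": (21, 30)},
    "citations_author": {"low": (300, 550), "high": (2100, 2450)},
    "downloads":        {"low": (50, 80),   "high": (250, 450)},
}

def draw_value(variable, level):
    """Draw a random stimulus value for one independent variable,
    or return None when the level is 'not_provided'."""
    if level == "not_provided":
        return None
    lo, hi = RANGES[variable][level]
    return random.randint(lo, hi)  # inclusive of both endpoints
```

Drawing values within a band, rather than fixing one value per level, keeps the low/high contrast intact while varying the surface numbers across the 81 search-result stimuli.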