Multi-method Evaluation in Scientific Paper Recommender Systems
Aravind Sesagiri Raamkumar
Schubert Foo
Wee Kim Wee School of Communication and Information, NTU
IUadaptME Workshop|UMAP’18
July 8th 2018
Focus Area
Scientific Paper Recommender Systems (SPRS)
Citation Recommender Systems
Literature Recommender Systems
Research Paper Recommender Systems
SPRS Studies
• Major Areas
  – Literature Review (LR) tasks
    • Task of building an initial reading list at the start of LR
    • Task of finding similar papers based on a single paper
    • Task of finding similar papers based on multiple papers
    • Task of searching papers based on input text
  – User footprint
  – Researcher’s publication history
  – Social network of authors
• Recommendations generated based on:
  – Citation network
  – Metadata fields
  – Text content from papers
  – System logs
Rec4LRW System
Rec4LRW – Recommender System for Literature Review and Writing
• Task 1 - Building an initial reading list of research papers
– Author-specified Keywords based Retrieval (AKR) Technique
• Task 2 - Finding similar papers based on a set of papers
– Integrated Discovery of Similar Papers (IDSP) Technique
• Task 3 - Shortlisting papers from reading list for inclusion in manuscript
– Citation Network based Shortlisting (CNS) Technique
Rec4LRW Task Screens
[Screenshots: Task 1 and Task 2 screens, highlighting the information cue labels and the Seed Basket (SB)]
[Screenshots: Task 2 and Task 3 screens, highlighting the shared co-relations and the Reading List (RL)]
[Screenshot: Task 3 screen, highlighting the cluster viewing option]
Rec4LRW Evaluation Strategy
• Offline Evaluation of Task 1: rank aggregation method
• User Evaluation of Three Tasks: survey-based evaluations
• User Evaluation of Overall System: survey-based evaluations
“Offline evaluations are more prevalent in
this SPRS area, accounting to about 69% of
all studies”
Offline Evaluation of Task 1
Evaluated Techniques
Label | Abbr. | Technique Description
A | AKRv1 | Basic AKR technique with weights WCC = 0.25, WRC = 0.25, WCO = 0.5
B | AKRv2 | Basic AKR technique with weights WCC = 0.1, WRC = 0.1, WCO = 0.8
C | HAKRv1 | HITS enhanced AKR technique boosted with weights WCC = 0.25, WRC = 0.25, WCO = 0.5
D | HAKRv2 | HITS enhanced AKR technique boosted with weights WCC = 0.1, WRC = 0.1, WCO = 0.8
E | CFHITS | IBCF technique boosted with HITS
F | CFPR | IBCF technique boosted with PageRank
G | PR | PageRank technique
Evaluation Approach
• The numbers of Recent (R1), Popular (R2), Survey (R3) and Diverse (R4) papers were counted for each of the 186 topics and seven techniques
• Ranks were assigned to the techniques based on the highest counts in each recommendation list
• The RankAggreg library was used to perform Rank Aggregation
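The rank aggregation step can be illustrated with a small Python sketch. This is not the RankAggreg implementation (RankAggreg is an R package, and the slides do not state which distance measure or search algorithm was configured). The sketch assumes the Spearman footrule distance and a brute-force search over all orderings, which is feasible for seven techniques. The input rankings are hypothetical.

```python
from itertools import permutations

def footrule(order_a, order_b):
    """Spearman footrule distance: sum of absolute position differences."""
    pos_b = {item: rank for rank, item in enumerate(order_b)}
    return sum(abs(rank - pos_b[item]) for rank, item in enumerate(order_a))

def aggregate(rankings):
    """Return the ordering with the smallest total footrule distance
    to all input rankings, together with that total (brute force)."""
    items = sorted(rankings[0])
    best, best_score = None, float("inf")
    for candidate in permutations(items):
        score = sum(footrule(candidate, r) for r in rankings)
        if score < best_score:
            best, best_score = candidate, score
    return list(best), best_score

# Hypothetical per-topic rankings of techniques A to G (not data from the study)
topic_rankings = [
    list("BACDEFG"),
    list("BACDEGF"),
    list("ABCDEFG"),
]
best_order, score = aggregate(topic_rankings)
print(best_order, score)
```

With 186 topics the same function applies unchanged; only the number of input lists grows, not the 7! = 5040 candidate orderings.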
Experiment Setup
• A total of 186 author-specified keywords from the ACM DL dataset were identified as the seed research topics
• The experiment was performed in three sequential steps:
1. The top 200 papers were retrieved using the BM25 similarity algorithm
2. The top 20 papers were identified using the specific ranking schemes of the seven techniques
3. The evaluation metrics were measured for the seven techniques
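The first two steps follow a retrieve-then-rerank pattern, sketched below in Python. The BM25 variant (Okapi BM25 with a non-negative IDF), the parameters `k1` and `b`, the whitespace tokenizer and the `rerank_score` callback are all assumptions made for illustration; the slides do not describe how BM25 or the seven ranking schemes were implemented.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring documents, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def two_stage(query, docs, rerank_score, n_candidates=200, n_final=20):
    """Step 1: BM25 candidate retrieval. Step 2: rerank with a technique-specific score."""
    candidates = top_k(bm25_scores(query, docs), n_candidates)
    reranked = sorted(candidates, key=lambda i: rerank_score(docs[i]), reverse=True)
    return reranked[:n_final]

# Toy corpus (made up); the study used the ACM DL dataset
docs = [
    "recommender systems for scientific papers",
    "deep learning for image recognition",
    "citation recommender systems survey",
]
scores = bm25_scores("recommender systems", docs)
print(top_k(scores, 2))
```

Each of the seven techniques would supply its own `rerank_score`; the candidate retrieval stays the same for all of them.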
Results
Paper Type (Requirement) | Optimal Aggregated Ranks (1 to 7) | Min. Obj. Function Score
Recent Papers (R1) | B A C D E F G | 10.66
Popular Papers (R2) | F E C D G A B | 11.89
Literature Survey Papers (R3) | C G D A E F B | 13.38
Diverse Papers (R4) | C D G A B F E | 12.15
• The HITS enhanced version of the AKR technique, HAKRv1 (C), was the best all-round performing technique
• The HAKRv1 technique was particularly good at retrieving literature survey papers and papers from different sub-topics, while the basic AKRv1 technique (A) was good at retrieving recent papers
Rec4LRW User Study Evaluation Goals
1. Ascertain the agreement percentages of the evaluation measures for the three tasks and the overall system, and identify whether the values are above a preset threshold of 75%
2. Test the hypothesis that students benefit more from the recommendation tasks/system than staff
3. Measure the correlation between the measures and build a regression model with ‘agreeability on a good list’ as the dependent variable
4. Track the change in user perceptions between the three tasks
5. Compare the pre-study and post-study variables to understand whether the target participants benefited from the tasks
6. Identify the most preferred and most critical aspects of the task recommendations and the system using the subjective feedback of the participants
User Study Details
• The Rec4LRW system was made available over the internet
• Participants were recruited with the intent of reaching a worldwide audience
• Only researchers with paper authoring experience were recruited, through a pre-screening survey
• 230 researchers participated in the pre-screening survey
• 149 participants were deemed eligible and invited to the study
• Participants were provided with a user guide
• Participants were required to execute all three tasks
• Evaluation questionnaires were embedded in the screen of each task of the Rec4LRW system
Task Evaluation Measures
Common Measures
• Relevance
• Usefulness
• Good_List
Tasks 1 and 2
• Good_Spread
• Diversity
• Interdisciplinarity
• Popularity
• Recency
• Good_Mix
• Familiarity
• Novelty
• Serendipity
• Expansion_Required
• User_Satisfaction
Task 2
• Seedbasket_Similarity
• Shared_Corelations
• Seedbasket_Usefulness
Task 3
• Importance
• Certainty
• Shortlisting_Feature
Qualitative Feedback
1) From the displayed information, what features did you like the most?
2) Please provide your personal feedback about the execution of this task
System Evaluation Measures
Effort to use the System (EUS)
• Convenience
• Effort_Required
• Mouse_Clicks
• Little_Time
• Much_Time
Perceived Usefulness (PU)
• Productivity_Improvability
• Enhance_Effectiveness
• Ease_Job
• Work_Usefulness
Perceived System Effectiveness (PSE)
• Recommend
• Pleasant_Experience
• Useless
• Awareness
• Better_Choice
• Findability
• Accomplish_Tasks
• Performance_Improvability
Sample Evaluation Questionnaire
Analysis Procedures
Quantitative Data
• Agreement Percentage (AP) was calculated by counting only responses of 4 (‘Agree’) and 5 (‘Strongly Agree’) on the 5-point Likert scale
• Independent samples t-test for hypothesis testing
• Spearman coefficient for correlation measurement
• Multiple linear regression (MLR) used for the predictive models
– Paired samples t-test for model validation
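Two of the quantitative procedures above can be sketched in a few lines of Python. The function names and the sample ratings are hypothetical. The 75% threshold comes from the evaluation goals; the tie handling in the Spearman coefficient (average ranks) is an assumption, since the slides do not say how ties were treated.

```python
def agreement_percentage(responses):
    """Percentage of responses that are 4 ('Agree') or 5 ('Strongly Agree')."""
    agree = sum(1 for r in responses if r >= 4)
    return 100.0 * agree / len(responses)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the tie-averaged ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up Likert ratings from eight participants (not data from the study)
relevance = [5, 4, 4, 3, 2, 5, 4, 1]
good_list = [4, 4, 5, 3, 2, 5, 3, 2]

ap = agreement_percentage(relevance)   # 5 of 8 responses are 4 or 5
meets_threshold = ap >= 75.0           # preset threshold from Goal 1
rho = spearman(relevance, good_list)
print(ap, meets_threshold, round(rho, 3))
```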
Qualitative Data
• Descriptive coding method was used to code the participant feedback
• Two coders performed the coding in a sequential manner
Task | Preferred Aspects (κ) | Critical Aspects (κ)
Task 1 | 0.918 | 0.727
Task 2 | 0.930 | 0.758
Task 3 | 0.877 | 0.902
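The κ values above report agreement between the two coders. The slides do not name the statistic; assuming it is Cohen's kappa, which is the usual choice for two coders, it could be computed as in the following sketch. The code labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement from each coder's marginal label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to six feedback comments
coder_a = ["cue", "cue", "meta", "quality", "cue", "meta"]
coder_b = ["cue", "cue", "meta", "quality", "meta", "meta"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Here the coders agree on five of six comments, but kappa is lower than 5/6 because part of that agreement is expected by chance.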
Participant Demographics
Stage | N
Task 1 | 132
Task 2 | 121
Task 3 | 119

Demographic Variable | N
Position: Student | 62 (47%)
Position: Staff | 70 (53%)
Experience Level: Beginner | 15 (11.4%)
Experience Level: Intermediate | 61 (46.2%)
Experience Level: Advanced | 34 (25.8%)
Experience Level: Expert | 22 (16.7%)

Discipline | N
Computer Science & Information Systems | 51 (38.6%)
Library and Information Studies | 30 (22.7%)
Electrical & Electronic Engineering | 30 (22.7%)
Communication & Media Studies | 8 (6.1%)
Mechanical, Aeronautical & Manufacturing Engineering | 5 (3.8%)
Biological Sciences | 2 (1.5%)
Statistics & Operational Research | 1 (0.8%)
Education | 1 (0.8%)
Politics & International Studies | 1 (0.8%)
Economics & Econometrics | 1 (0.8%)
Civil & Structural Engineering | 1 (0.8%)
Psychology | 1 (0.8%)

Country | N
Singapore | 107 (81.1%)
India | 4 (3%)
Malaysia | 3 (2.3%)
Sri Lanka | 3 (2.3%)
Pakistan | 3 (2.3%)
Indonesia | 2 (1.5%)
Germany | 2 (1.5%)
Australia | 1 (0.8%)
Iran | 1 (0.8%)
Thailand | 1 (0.8%)
China | 1 (0.8%)
USA | 1 (0.8%)
Canada | 1 (0.8%)
Sweden | 1 (0.8%)
Slovenia | 1 (0.8%)
Results for Goals 1 & 2
Results for Goals 3 and 4
Predictors for “Good_List”
Task | Independent Variables
Task 1 | Recency, Novelty, Serendipity, Usefulness, User_Satisfaction
Task 2 | Seedbasket_Similarity, Usefulness
Task 3 | Relevance, Usefulness, Certainty
[Figure: Transition of User Perception from Task 1 to 2]
Results for Goal 5
[Figure: Need_Assistance (pre study) vs. Good_List (post study), one bar chart each for Task 1, Task 2 and Task 3. X-axis: Never, Rarely, Sometimes, Often, Always. Y-axis: Count. Legend: 1 to 5.]
Results for Goal 6
Top 5 Preferred Aspects
Rank | Task 1 (N=109) | Task 2 (N=100) | Task 3 (N=91)
1 | Information Cue Labels (41%) | Shared Co-citations & Co-references (28%) | Shortlisting Feature & Recommendation Quality (24%)
2 | Rich Metadata (21%) | Recommendation Quality (27%) | Information Cue Labels (15%)
3 | Diversity of Papers (13%) | Information Cue Labels (16%) | View Papers in Clusters (11%)
4 | Recommendation Quality (9%) | Seed Basket (14%) | Rich Metadata (7%)
5 | Recency of Papers (4%) | Rich Metadata (9%) | Ranking of Papers (3%)

Top 5 Critical Aspects
Rank | Task 1 (N=109) | Task 2 (N=100) | Task 3 (N=91)
1 | Broad topics not suitable (20%) | Quality can be improved (16%) | Rote selection of papers for task execution (16%)
2 | Limited dataset (7%) | Limited dataset (12%) | Limited dataset (5%)
3 | Quality can be improved (6%) | Recommendation algorithm could include more dimensions (7%) | Algorithm can be improved (5%)
4 | Different algorithm required (5%) | Speed can be improved (7%) | Not sure of the usefulness (4%)
5 | Free-text search required (4%) | Repeated recommendations from Task 1 (3%) | UI can be improved (3%)
SPRRF: Scientific Paper Retrieval and Recommender Framework
• Seven themes identified using holistic coding method
  – Distinct User Groups
  – Usefulness of Information Cue Labels
  – Forced Serendipity vs. Natural Serendipity
  – Learning Algorithms vs. Fixed-Logic Algorithms
  – Inclusion of Control Features in UI
  – Inclusion of Bibliometric Data
  – Diversification of Corpus
• SPRRF conceptualized as a mental model based on the themes
• The framework needs to be validated
Questions for Discussion
• How dependable are the gold standard lists in SPRS evaluation, since relevance is largely dependent on user perspective?
• Should SPRS evaluations be conducted in a parallel or serial manner?
• What type of data should be collected during usability testing in SPRS evaluation?
THANK YOU
aravind002@ntu.edu.sg
26

More Related Content

What's hot

Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...Paolo Missier
 
Wsdm west wesley-smith
Wsdm west wesley-smithWsdm west wesley-smith
Wsdm west wesley-smithJevin West
 
Marshall hm poster_vra2015
Marshall hm poster_vra2015Marshall hm poster_vra2015
Marshall hm poster_vra2015Hannah Marshall
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Abdel Salam Sayyad
 
A data driven approach to measure web site navigability
A data driven approach to measure web site navigabilityA data driven approach to measure web site navigability
A data driven approach to measure web site navigabilityShu-Jeng Hsieh
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...Ali Ouni
 

What's hot (7)

Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
Invited talk @Aberdeen, '07: Modelling and computing the quality of informati...
 
Wsdm west wesley-smith
Wsdm west wesley-smithWsdm west wesley-smith
Wsdm west wesley-smith
 
Marshall hm poster_vra2015
Marshall hm poster_vra2015Marshall hm poster_vra2015
Marshall hm poster_vra2015
 
12
1212
12
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
 
A data driven approach to measure web site navigability
A data driven approach to measure web site navigabilityA data driven approach to measure web site navigability
A data driven approach to measure web site navigability
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
 

Similar to Multi-method Evaluation in Scientific Paper Recommender Systems

ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code ReviewICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code ReviewAli Ouni
 
Presentation on Software process improvement in GSD
Presentation on Software process improvement in GSDPresentation on Software process improvement in GSD
Presentation on Software process improvement in GSDRafi Ullah
 
Shyam presentation prefinal
Shyam presentation prefinalShyam presentation prefinal
Shyam presentation prefinalShyam Raj
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Editionkrisztianbalog
 
Lecture_4_Data_Gathering_and_Analysis.pdf
Lecture_4_Data_Gathering_and_Analysis.pdfLecture_4_Data_Gathering_and_Analysis.pdf
Lecture_4_Data_Gathering_and_Analysis.pdfAbdullahOmar64
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...David Zibriczky
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Dr. Cornelius Ludmann
 
Concept Location using Information Retrieval and Relevance Feedback
Concept Location using Information Retrieval and Relevance FeedbackConcept Location using Information Retrieval and Relevance Feedback
Concept Location using Information Retrieval and Relevance FeedbackSonia Haiduc
 
Irrf Presentation
Irrf PresentationIrrf Presentation
Irrf Presentationgregoryg
 
Qualitative and quantitative analysis
Qualitative and quantitative analysisQualitative and quantitative analysis
Qualitative and quantitative analysisNellie Deutsch (Ed.D)
 
How to gain a foothold in the world of classification
How to gain a foothold in the world of classificationHow to gain a foothold in the world of classification
How to gain a foothold in the world of classificationTorsten Schön
 

Similar to Multi-method Evaluation in Scientific Paper Recommender Systems (20)

ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code ReviewICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review
 
Apsec 2014 Presentation
Apsec 2014 PresentationApsec 2014 Presentation
Apsec 2014 Presentation
 
Search quality in practice
Search quality in practiceSearch quality in practice
Search quality in practice
 
Presentation on Software process improvement in GSD
Presentation on Software process improvement in GSDPresentation on Software process improvement in GSD
Presentation on Software process improvement in GSD
 
Shyam presentation prefinal
Shyam presentation prefinalShyam presentation prefinal
Shyam presentation prefinal
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Edition
 
Systematic Literature Review
Systematic Literature ReviewSystematic Literature Review
Systematic Literature Review
 
Gamifying Research Activity Support System
Gamifying Research Activity Support SystemGamifying Research Activity Support System
Gamifying Research Activity Support System
 
moraes-a2017ictir
moraes-a2017ictirmoraes-a2017ictir
moraes-a2017ictir
 
Lecture_4_Data_Gathering_and_Analysis.pdf
Lecture_4_Data_Gathering_and_Analysis.pdfLecture_4_Data_Gathering_and_Analysis.pdf
Lecture_4_Data_Gathering_and_Analysis.pdf
 
PEDSnet : 18 month summary on data integration and data quality
PEDSnet : 18 month summary on data integration and data qualityPEDSnet : 18 month summary on data integration and data quality
PEDSnet : 18 month summary on data integration and data quality
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
 
Concept Location using Information Retrieval and Relevance Feedback
Concept Location using Information Retrieval and Relevance FeedbackConcept Location using Information Retrieval and Relevance Feedback
Concept Location using Information Retrieval and Relevance Feedback
 
Irrf Presentation
Irrf PresentationIrrf Presentation
Irrf Presentation
 
Qualitative and quantitative analysis
Qualitative and quantitative analysisQualitative and quantitative analysis
Qualitative and quantitative analysis
 
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series -  Late...Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series -  Late...
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...
 
Evaluating and selecting software packages a review
Evaluating and selecting software packages a reviewEvaluating and selecting software packages a review
Evaluating and selecting software packages a review
 
How to gain a foothold in the world of classification
How to gain a foothold in the world of classificationHow to gain a foothold in the world of classification
How to gain a foothold in the world of classification
 
Validation Studies in Simulation-based Education - Deb Rooney
Validation Studies in Simulation-based Education - Deb RooneyValidation Studies in Simulation-based Education - Deb Rooney
Validation Studies in Simulation-based Education - Deb Rooney
 

More from Aravind Sesagiri Raamkumar

Approaches to combining supplementary datasets across multiple trusted resear...
Approaches to combining supplementary datasets across multiple trusted resear...Approaches to combining supplementary datasets across multiple trusted resear...
Approaches to combining supplementary datasets across multiple trusted resear...Aravind Sesagiri Raamkumar
 
Measuring the Outreach Efforts of Public Health Authorities and the Public Re...
Measuring the Outreach Efforts of Public Health Authorities and the Public Re...Measuring the Outreach Efforts of Public Health Authorities and the Public Re...
Measuring the Outreach Efforts of Public Health Authorities and the Public Re...Aravind Sesagiri Raamkumar
 
Understanding the Twitter Usage of Science Citation Index (SCI) Journals
Understanding the Twitter Usage of Science Citation Index (SCI) JournalsUnderstanding the Twitter Usage of Science Citation Index (SCI) Journals
Understanding the Twitter Usage of Science Citation Index (SCI) JournalsAravind Sesagiri Raamkumar
 
Investigating the Characteristics and Research Impact of Sentiments in Tweets...
Investigating the Characteristics and Research Impact of Sentiments in Tweets...Investigating the Characteristics and Research Impact of Sentiments in Tweets...
Investigating the Characteristics and Research Impact of Sentiments in Tweets...Aravind Sesagiri Raamkumar
 
Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...
Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...
Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...Aravind Sesagiri Raamkumar
 
Using altmetrics to support research evaluation
Using altmetrics to support research evaluationUsing altmetrics to support research evaluation
Using altmetrics to support research evaluationAravind Sesagiri Raamkumar
 
Evolution and state-of-the art of Altmetric research: Insights from network a...
Evolution and state-of-the art of Altmetric research: Insights from network a...Evolution and state-of-the art of Altmetric research: Insights from network a...
Evolution and state-of-the art of Altmetric research: Insights from network a...Aravind Sesagiri Raamkumar
 
Scientometric Analysis of Research Performance of African Countries in select...
Scientometric Analysis of Research Performance of African Countries in select...Scientometric Analysis of Research Performance of African Countries in select...
Scientometric Analysis of Research Performance of African Countries in select...Aravind Sesagiri Raamkumar
 
New Dialog, New Services with Altmetrics: Lingnan University Library Experience
New Dialog, New Services with Altmetrics: Lingnan University Library ExperienceNew Dialog, New Services with Altmetrics: Lingnan University Library Experience
New Dialog, New Services with Altmetrics: Lingnan University Library ExperienceAravind Sesagiri Raamkumar
 
Field-weighting readership: how does it compare to field-weighting citations?
Field-weighting readership: how does it compare to field-weighting citations?Field-weighting readership: how does it compare to field-weighting citations?
Field-weighting readership: how does it compare to field-weighting citations?Aravind Sesagiri Raamkumar
 
How do Scholars Evaluate and Promote Research Outputs? An NTU Case Study
How do Scholars Evaluate and Promote Research Outputs? An NTU Case StudyHow do Scholars Evaluate and Promote Research Outputs? An NTU Case Study
How do Scholars Evaluate and Promote Research Outputs? An NTU Case StudyAravind Sesagiri Raamkumar
 
Monitoring the broad impact of the journal publication output on country leve...
Monitoring the broad impact of the journal publication output on country leve...Monitoring the broad impact of the journal publication output on country leve...
Monitoring the broad impact of the journal publication output on country leve...Aravind Sesagiri Raamkumar
 
A Comparative Investigation on Citation Counts and Altmetrics between Papers ...
A Comparative Investigation on Citation Counts and Altmetrics between Papers ...A Comparative Investigation on Citation Counts and Altmetrics between Papers ...
A Comparative Investigation on Citation Counts and Altmetrics between Papers ...Aravind Sesagiri Raamkumar
 
Database-Centric Guidelines for Building a Scholarly Metrics Information Syst...
Database-Centric Guidelines for Building a Scholarly Metrics Information Syst...Database-Centric Guidelines for Building a Scholarly Metrics Information Syst...
Database-Centric Guidelines for Building a Scholarly Metrics Information Syst...Aravind Sesagiri Raamkumar
 
Altmetrics for Research Impact Actuation (ARIA)
Altmetrics for Research Impact Actuation (ARIA)Altmetrics for Research Impact Actuation (ARIA)
Altmetrics for Research Impact Actuation (ARIA)Aravind Sesagiri Raamkumar
 
What’s in a Country Name – Twitter Hashtag Analysis of #singapore
What’s in a Country Name – Twitter Hashtag Analysis of #singaporeWhat’s in a Country Name – Twitter Hashtag Analysis of #singapore
What’s in a Country Name – Twitter Hashtag Analysis of #singaporeAravind Sesagiri Raamkumar
 
More Than Just Black and White: A Case for Grey Literature References in Scie...
More Than Just Black and White: A Case for Grey Literature References in Scie...More Than Just Black and White: A Case for Grey Literature References in Scie...
More Than Just Black and White: A Case for Grey Literature References in Scie...Aravind Sesagiri Raamkumar
 
Comparison of Techniques for Measuring Research Coverage of Scientific Papers...
Comparison of Techniques for Measuring Research Coverage of Scientific Papers...Comparison of Techniques for Measuring Research Coverage of Scientific Papers...
Comparison of Techniques for Measuring Research Coverage of Scientific Papers...Aravind Sesagiri Raamkumar
 
Object Recognition-based Mnemonics Mobile App for Senior Adults Communication
Object Recognition-based Mnemonics Mobile App for Senior Adults CommunicationObject Recognition-based Mnemonics Mobile App for Senior Adults Communication
Object Recognition-based Mnemonics Mobile App for Senior Adults CommunicationAravind Sesagiri Raamkumar
 

More from Aravind Sesagiri Raamkumar (20)

Approaches to combining supplementary datasets across multiple trusted resear...
Approaches to combining supplementary datasets across multiple trusted resear...Approaches to combining supplementary datasets across multiple trusted resear...
Approaches to combining supplementary datasets across multiple trusted resear...
 
Measuring the Outreach Efforts of Public Health Authorities and the Public Re...
Measuring the Outreach Efforts of Public Health Authorities and the Public Re...Measuring the Outreach Efforts of Public Health Authorities and the Public Re...
Measuring the Outreach Efforts of Public Health Authorities and the Public Re...
 
Understanding the Twitter Usage of Science Citation Index (SCI) Journals
Understanding the Twitter Usage of Science Citation Index (SCI) JournalsUnderstanding the Twitter Usage of Science Citation Index (SCI) Journals
Understanding the Twitter Usage of Science Citation Index (SCI) Journals
 
Investigating the Characteristics and Research Impact of Sentiments in Tweets...
Investigating the Characteristics and Research Impact of Sentiments in Tweets...Investigating the Characteristics and Research Impact of Sentiments in Tweets...
Investigating the Characteristics and Research Impact of Sentiments in Tweets...
 
Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...
Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...
Understanding the Twitter Usage of Humanities and Social Sciences Academic Jo...
 
Using altmetrics to support research evaluation
Using altmetrics to support research evaluationUsing altmetrics to support research evaluation
Using altmetrics to support research evaluation
 
Evolution and state-of-the art of Altmetric research: Insights from network a...
Evolution and state-of-the art of Altmetric research: Insights from network a...Evolution and state-of-the art of Altmetric research: Insights from network a...
Evolution and state-of-the art of Altmetric research: Insights from network a...
 
Feature Analysis of Research Metrics Systems
Feature Analysis of Research Metrics SystemsFeature Analysis of Research Metrics Systems
Feature Analysis of Research Metrics Systems
 
Scientometric Analysis of Research Performance of African Countries in select...
Scientometric Analysis of Research Performance of African Countries in select...Scientometric Analysis of Research Performance of African Countries in select...
Scientometric Analysis of Research Performance of African Countries in select...
 
New Dialog, New Services with Altmetrics: Lingnan University Library Experience
New Dialog, New Services with Altmetrics: Lingnan University Library ExperienceNew Dialog, New Services with Altmetrics: Lingnan University Library Experience
New Dialog, New Services with Altmetrics: Lingnan University Library Experience
 
Field-weighting readership: how does it compare to field-weighting citations?
Field-weighting readership: how does it compare to field-weighting citations?Field-weighting readership: how does it compare to field-weighting citations?
Field-weighting readership: how does it compare to field-weighting citations?
 
How do Scholars Evaluate and Promote Research Outputs? An NTU Case Study
How do Scholars Evaluate and Promote Research Outputs? An NTU Case StudyHow do Scholars Evaluate and Promote Research Outputs? An NTU Case Study
How do Scholars Evaluate and Promote Research Outputs? An NTU Case Study
 
Monitoring the broad impact of the journal publication output on country leve...

Multi-method Evaluation in Scientific Paper Recommender Systems

  • 1. Multi-method Evaluation in Scientific Paper Recommender Systems Aravind Sesagiri Raamkumar Schubert Foo Wee Kim Wee School of Communication and Information, NTU IUadaptME Workshop|UMAP’18 July 8th 2018
  • 5. SPRS Studies
    • Major Areas
      – Literature Review (LR) tasks
        • Task of building an initial reading list at the start of LR
        • Task of finding similar papers based on a single paper
        • Task of finding similar papers based on multiple papers
        • Task of searching papers based on input text
      – User footprint
      – Researcher’s publication history
      – Social network of authors
    • Recommendations generated based on:
      – Citation network
      – Metadata fields
      – Text content from papers
      – System logs
  • 6. Rec4LRW System
    Rec4LRW – Recommender System for Literature Review and Writing
    • Task 1 - Building an initial reading list of research papers
      – Author-specified Keywords based Retrieval (AKR) Technique
    • Task 2 - Finding similar papers based on set of papers
      – Integrated Discovery of Similar Papers (IDSP) Technique
    • Task 3 - Shortlisting papers from reading list for inclusion in manuscript
      – Citation Network based Shortlisting (CNS) Technique
  • 7. Rec4LRW Task Screens: Task 1 and Task 2 [screenshots; callouts: Information cue labels, Seed Basket (SB)]
  • 8. Rec4LRW Task Screens: Task 2 and Task 3 [screenshots; callouts: Shared Co-relations, Reading List (RL)]
  • 10. Rec4LRW Evaluation Strategy
    • Offline Evaluation of Task 1: rank aggregation method
    • User Evaluation of Three Tasks: survey-based evaluations
    • User Evaluation of Overall System: survey-based evaluations
    “Offline evaluations are more prevalent in this SPRS area, accounting to about 69% of all studies”
  • 11. Offline Evaluation of Task 1
    Evaluated Techniques
    Label | Abbr. | Technique Description
    A | AKRv1 | Basic AKR technique with weights WCC = 0.25, WRC = 0.25, WCO = 0.5
    B | AKRv2 | Basic AKR technique with weights WCC = 0.1, WRC = 0.1, WCO = 0.8
    C | HAKRv1 | HITS enhanced AKR technique boosted with weights WCC = 0.25, WRC = 0.25, WCO = 0.5
    D | HAKRv2 | HITS enhanced AKR technique boosted with weights WCC = 0.1, WRC = 0.1, WCO = 0.8
    E | CFHITS | IBCF technique boosted with HITS
    F | CFPR | IBCF technique boosted with PageRank
    G | PR | PageRank technique
    Evaluation Approach
    • The number of Recent (R1), Popular (R2), Survey (R3) and Diverse (R4) papers was counted for each of the 186 topics and seven techniques
    • Ranks were assigned to the techniques based on the highest counts in each recommendation list
    • The RankAggreg library was used to perform rank aggregation
    Experiment Setup
    • A total of 186 author-specified keywords from the ACM DL dataset were identified as the seed research topics
    • The experiment was performed in three sequential steps:
      1. Top 200 papers were retrieved using the BM25 similarity algorithm
      2. Top 20 papers were identified using the specific ranking schemes of the seven techniques
      3. The evaluation metrics were measured for the seven techniques
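The retrieve-then-re-rank setup on slide 11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Rec4LRW implementation: the BM25 parameters `k1` and `b`, the function names, the toy documents and the `citations` re-ranking signal are all hypothetical, and the ranking schemes of the seven techniques are not reproduced here.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.

    Assumes docs is a non-empty list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            f = term_freq[term]
            if f == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = f + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * f * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def two_stage_top_n(query_terms, docs, rerank_key, n_candidates=200, n_final=20):
    """Step 1: keep the top BM25 candidates. Step 2: re-rank them, keep n_final."""
    scores = bm25_scores(query_terms, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = order[:n_candidates]
    return sorted(candidates, key=rerank_key, reverse=True)[:n_final]


# Toy corpus; citation counts stand in for a technique-specific ranking scheme.
docs = [
    ["citation", "network", "analysis"],
    ["deep", "learning"],
    ["citation", "recommendation", "citation"],
]
citations = [5, 90, 40]  # hypothetical re-ranking signal
scores = bm25_scores(["citation"], docs)
top = two_stage_top_n(["citation"], docs, rerank_key=lambda i: citations[i],
                      n_candidates=2, n_final=1)
```

In the toy run, the document that never mentions the query term is dropped at the BM25 stage even though it has the highest citation count, which is the point of retrieving candidates before re-ranking them.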
  • 12. Offline Evaluation of Task 1 Results
    Paper Type (Requirement) | Optimal Aggregated Ranks (1 to 7) | Min. Obj. Function Score
    Recent Papers (R1) | B A C D E F G | 10.66
    Popular Papers (R2) | F E C D G A B | 11.89
    Literature Survey Papers (R3) | C G D A E F B | 13.38
    Diverse Papers (R4) | C D G A B F E | 12.15
    • The HITS enhanced version of the AKR technique HAKRv1 (C) was the best all-round performing technique
    • The HAKRv1 technique was particularly good for retrieving literature survey papers and papers from different sub-topics while the basic AKRv1 technique (A) was good for retrieving recent papers
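One way to read the "best all-round" claim on slide 12 is to combine the four aggregated rankings into a single score per technique. The sketch below uses a plain Borda count for this, which is an assumption made for illustration; it is not the procedure performed by the RankAggreg library, and the function name is hypothetical. The rankings themselves are copied from the results table.

```python
def borda_scores(rankings):
    """Sum Borda points: first place in a list of n items earns n points, last earns 1."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (n - position)
    return scores


# Optimal aggregated ranks per requirement, as listed on slide 12.
aggregated = {
    "R1": list("BACDEFG"),  # Recent papers
    "R2": list("FECDGAB"),  # Popular papers
    "R3": list("CGDAEFB"),  # Literature survey papers
    "R4": list("CDGABFE"),  # Diverse papers
}

scores = borda_scores(aggregated.values())
best = max(scores, key=scores.get)
print(best, scores[best])
```

Under this simple scheme technique C (HAKRv1) collects the most points across the four requirements, which agrees with the slide's summary.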
  • 13. Rec4LRW User Study Evaluation Goals
    1. Ascertain the agreement percentages of the evaluation measures for the three tasks and the overall system, and identify whether the values are above a preset threshold criterion of 75%
    2. Test the hypothesis that students benefit more from the recommendation tasks/system in comparison to staff
    3. Measure the correlation between the measures and build a regression model with ‘agreeability on a good list’ as the dependent variable
    4. Track the change in user perceptions between the three tasks
    5. Compare the pre-study and post-study variables to understand whether the target participants benefit from the tasks
    6. Identify the most preferred and critical aspects of the task recommendations and the system using the subjective feedback of the participants
  • 14. User Study Details
    • Rec4LRW system was made available over the internet
    • Participants were recruited with the intent of reaching a worldwide audience
    • Only researchers with paper authoring experience were recruited through a pre-screening survey
    • 230 researchers participated in the pre-screening survey
    • 149 participants were deemed eligible and invited for the study
    • Participants were provided with a user guide
    • All three tasks were required to be executed by the participants
    • Evaluation questionnaires were embedded in the screen of each task of the Rec4LRW system
  • 15. Task Evaluation Measures
    • Common Measures: Relevance, Usefulness, Good_List
    • Tasks 1 and 2: Good_Spread, Diversity, Interdisciplinarity, Popularity, Recency, Good_Mix, Familiarity, Novelty, Serendipity, Expansion_Required, User_Satisfaction
    • Task 2: Seedbasket_Similarity, Shared_Corelations, Seedbasket_Usefulness
    • Task 3: Importance, Certainty, Shortlisting_Feature
    Qualitative Feedback
    1) From the displayed information, what features did you like the most?
    2) Please provide your personal feedback about the execution of this task
  • 16. System Evaluation Measures
    • Effort to use the System (EUS): Convenience, Effort_Required, Mouse_Clicks, Little_Time, Much_Time
    • Perceived Usefulness (PU): Productivity_Improvability, Enhance_Effectiveness, Ease_Job, Work_Usefulness
    • Perceived System Effectiveness (PSE): Recommend, Pleasant_Experience, Useless, Awareness, Better_Choice, Findability, Accomplish_Tasks, Performance_Improvability
  • 18. Analysis Procedures
    Quantitative Data
    • Agreement Percentage (AP) calculated by only considering responses of 4 (‘Agree’) and 5 (‘Strongly Agree’) in the 5-point Likert scale
    • Independent samples t-test for hypothesis testing
    • Spearman coefficient for correlation measurement
    • Multiple linear regression (MLR) used for the predictive models
      – Paired samples t-test for model validation
    Qualitative Data
    • Descriptive coding method was used to code the participant feedback
    • Two coders performed the coding in a sequential manner
    Inter-coder agreement (κ)
    Task | Preferred Aspects (κ) | Critical Aspects (κ)
    Task 1 | 0.918 | 0.727
    Task 2 | 0.930 | 0.758
    Task 3 | 0.877 | 0.902
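Two of the measures on slide 18 are simple enough to sketch directly. The block below is an illustration under stated assumptions: the function names and sample data are hypothetical, and κ is assumed to be Cohen's kappa for two coders, which the slide does not state explicitly. The Agreement Percentage follows the slide's definition, counting only Likert responses of 4 and 5.

```python
from collections import Counter


def agreement_percentage(responses):
    """Percentage of 5-point Likert responses that are 4 ('Agree') or 5 ('Strongly Agree')."""
    agree = sum(1 for r in responses if r >= 4)
    return 100.0 * agree / len(responses)


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same items.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. both coders used one and the same code for every item.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must label the same items")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement from each coder's marginal code frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: eight Likert responses and four coded feedback items.
ap = agreement_percentage([5, 4, 3, 2, 4, 4, 5, 1])
kappa = cohen_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
meets_threshold = ap >= 75.0  # 75% threshold named in the evaluation goals
```

With the sample data, five of eight responses count as agreement, so the measure falls below the 75% threshold set in the study goals.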
  • 19. Participant Demographics
    Participants per stage (N): Task 1 132; Task 2 121; Task 3 119
    Position: Student 62 (47%); Staff 70 (53%)
    Experience Level: Beginner 15 (11.4%); Intermediate 61 (46.2%); Advanced 34 (25.8%); Expert 22 (16.7%)
    Discipline: Computer Science & Information Systems 51 (38.6%); Library and Information Studies 30 (22.7%); Electrical & Electronic Engineering 30 (22.7%); Communication & Media Studies 8 (6.1%); Mechanical, Aeronautical & Manufacturing Engineering 5 (3.8%); Biological Sciences 2 (1.5%); Statistics & Operational Research 1 (0.8%); Education 1 (0.8%); Politics & International Studies 1 (0.8%); Economics & Econometrics 1 (0.8%); Civil & Structural Engineering 1 (0.8%); Psychology 1 (0.8%)
    Country: Singapore 107 (81.1%); India 4 (3%); Malaysia 3 (2.3%); Sri Lanka 3 (2.3%); Pakistan 3 (2.3%); Indonesia 2 (1.5%); Germany 2 (1.5%); Australia 1 (0.8%); Iran 1 (0.8%); Thailand 1 (0.8%); China 1 (0.8%); USA 1 (0.8%); Canada 1 (0.8%); Sweden 1 (0.8%); Slovenia 1 (0.8%)
  • 20. Results for Goals 1 & 2
  • 21. Results for Goals 3 and 4
    Predictors for “Good_List”
    Task | Independent Variables
    Task 1 | Recency, Novelty, Serendipity, Usefulness, User_Satisfaction
    Task 2 | Seedbasket_Similarity, Usefulness
    Task 3 | Relevance, Usefulness, Certainty
    Transition of User Perception from Task 1 to 2 [figure]
  • 22. Results for Goal 5: Need_Assistance (pre study) vs. Good_List (post study) [three bar charts, one each for Task 1, Task 2 and Task 3; x-axis: Never, Rarely, Sometimes, Often, Always; bars grouped by Likert ratings 1 to 5; y-axis: Count]
  • 23. Results for Goal 6
    Top 5 Preferred Aspects
    Rank | Task 1 (N=109) | Task 2 (N=100) | Task 3 (N=91)
    1 | Information Cue Labels (41%) | Shared Co-citations & Co-references (28%) | Shortlisting Feature & Recommendation Quality (24%)
    2 | Rich Metadata (21%) | Recommendation Quality (27%) | Information Cue Labels (15%)
    3 | Diversity of Papers (13%) | Information Cue Labels (16%) | View Papers in Clusters (11%)
    4 | Recommendation Quality (9%) | Seed Basket (14%) | Rich Metadata (7%)
    5 | Recency of Papers (4%) | Rich Metadata (9%) | Ranking of Papers (3%)
    Top 5 Critical Aspects
    Rank | Task 1 (N=109) | Task 2 (N=100) | Task 3 (N=91)
    1 | Broad topics not suitable (20%) | Quality can be improved (16%) | Rote selection of papers for task execution (16%)
    2 | Limited dataset (7%) | Limited dataset (12%) | Limited dataset (5%)
    3 | Quality can be improved (6%) | Recommendation algorithm could include more dimensions (7%) | Algorithm can be improved (5%)
    4 | Different algorithm required (5%) | Speed can be improved (7%) | Not sure of the usefulness (4%)
    5 | Free-text search required (4%) | Repeated recommendations from Task 1 (3%) | UI can be improved (3%)
  • 24. Scientific Paper Retrieval and Recommender Framework (SPRRF)
    Themes: Distinct User Groups; Usefulness of Information Cue Labels; Forced Serendipity vs. Natural Serendipity; Learning Algorithms vs. Fixed-Logic Algorithms; Inclusion of Control Features in UI; Inclusion of Bibliometric Data; Diversification of Corpus
    • Seven themes identified using holistic coding method
    • SPRRF conceptualized as a mental model based on the themes
    • The framework needs to be validated
  • 25. Questions for Discussion
    • How dependable are the gold standard lists in SPRS evaluation since relevance is largely dependent on user perspective?
    • Should SPRS evaluations be conducted in a parallel or serial manner?
    • What type of data should be collected during usability testing in SPRS evaluation?

Editor's Notes

  1. Can the link from RO to Study I be better established?
  2. Insert the CALLOUTS
  3. Insert the CALLOUTS
  4. Insert the CALLOUTS
  5. Need to put some finishing touches. Also discussion points
  6. Do some formatting
  7. Do some formatting
  8. Glamour
  9. Glamour work to be done
  10. Glamour
  11. INSERT LABELS
  12. Glamour