Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking

Dasha Herrmannova
KMi and Mendeley (team BletchleyPark) at WSDM Cup 2016
Drahomira Herrmannova & Petr Knoth
The Open University & Mendeley
WSDM Cup 2016, February 2016
1 / 17
Our approach
• Hypothesis
• The importance of a publication can be determined by a mixture of factors evidencing its impact and the importance of the entities that participated in its creation
2 / 17
Our approach
• Method
1. Separately score each type of entity in the graph
2. Use the separate scores to provide a publication score
3. This yields several different scores for each publication
4. The final score, which defines the publication’s rank, is calculated as a linear combination of these scores
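• Weights obtained experimentally
• The final equation:
$$\mathrm{score}(p) = 2.5 \cdot s_{\mathrm{pub}} + 0.1 \cdot s_{\mathrm{age}} + 1.0 \cdot s_{\mathrm{pr}} + 1.0 \cdot s_{\mathrm{auth}} + 0.1 \cdot s_{\mathrm{venue}} + 0.01 \cdot s_{\mathrm{inst}} \qquad (1)$$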
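The linear combination in Eq. (1) can be sketched in Python as follows. This is an illustrative sketch, not the team's code (their sources are in the repository linked at the end): the dictionary keys are hypothetical labels for the six component scores, and because the slides do not say whether the components are normalised before being combined, the weights are simply applied to whatever values are passed in.

```python
# Weights from Eq. (1). The keys are hypothetical labels for the component
# scores s_pub, s_age, s_pr, s_auth, s_venue and s_inst.
WEIGHTS = {
    "pub": 2.5,
    "age": 0.1,
    "pr": 1.0,
    "auth": 1.0,
    "venue": 0.1,
    "inst": 0.01,
}


def final_score(components: dict[str, float]) -> float:
    """Weighted sum of one publication's component scores (Eq. 1)."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


def rank(publications: dict[str, dict[str, float]]) -> list[str]:
    """Publication IDs ordered from highest to lowest final score."""
    return sorted(publications,
                  key=lambda pid: final_score(publications[pid]),
                  reverse=True)
```

For example, component scores of 2.0, 10.0, 0.5, 3.0, 4.0 and 100.0 (in the order of Eq. 1) combine to 2.5·2.0 + 0.1·10.0 + 1.0·0.5 + 1.0·3.0 + 0.1·4.0 + 0.01·100.0 = 10.9, up to floating-point rounding.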
3 / 17
Publication-based scoring functions
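$$\mathrm{score}(p) = 2.5 \cdot s_{\mathrm{pub}} + 0.1 \cdot s_{\mathrm{age}} + 1.0 \cdot s_{\mathrm{pr}} + 1.0 \cdot s_{\mathrm{auth}} + 0.1 \cdot s_{\mathrm{venue}} + 0.01 \cdot s_{\mathrm{inst}}$$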
4 / 17
Publication-based scoring functions
• Scoring publication entities directly (without considering the
importance of authors or venues)
• We experimented with several ways of normalising and weighting publication citations:
• Applying a time decay to citations
• Applying a decay function to total citation counts
• Using mean citation counts
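• Final scoring function:
$$s_{\mathrm{pub}}(p) = \begin{cases} c(p)/|A_p|, & \text{for } c(p) \le t \\ t/|A_p|, & \text{for } c(p) > t \end{cases} \qquad (2)$$
where c(p) is the number of citations of p, A_p is the set of its authors and t is the maximum citation threshold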
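Since both branches of Eq. (2) divide by |A_p|, the piecewise definition reduces to min(c(p), t)/|A_p|. A minimal Python sketch of that reading follows; the slides do not give the threshold value actually used, so T = 100 below is a placeholder.

```python
def s_pub(citations: int, n_authors: int, t: int) -> float:
    """Publication score from Eq. (2): capped citations per author.

    Both cases of Eq. (2) divide by |A_p|, so the function is simply
    min(c(p), t) / |A_p|.
    """
    if n_authors < 1:
        raise ValueError("a publication must have at least one author")
    capped = min(citations, t)  # citations above the threshold t are ignored
    return capped / n_authors


# Placeholder threshold: the value used in the challenge is not on the slides.
T = 100
print(s_pub(40, 2, T))   # 20.0
print(s_pub(500, 2, T))  # 50.0
```

The cap keeps a handful of extremely highly cited papers from dominating the ranking, which is why the threshold appears again under potential improvements.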
5 / 17
Publication-based scoring functions
• To account for publication age, we added a score based on the publication year y_p:
$$s_{\mathrm{age}}(p) = y_p \qquad (3)$$
• In the second phase of the challenge, we included PageRank as an additional feature:
$$s_{\mathrm{pr}}(p) = PR(p) \qquad (4)$$
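s_age needs no computation beyond looking up the year. For s_pr, the slides do not say which PageRank variant or parameters were used, so the following is only a sketch: plain power-iteration PageRank over the citation graph, assuming the conventional damping factor of 0.85, edges pointing from the citing to the cited publication, and uniform redistribution of the score held by publications that cite nothing.

```python
def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; `cites` maps a publication to the IDs it cites."""
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score of publications with no outgoing citations is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in cites.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share  # being cited passes score to the target
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "a" and "b" both cite "c".
scores = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(scores, key=scores.get))  # c
```

The scores always sum to 1, so s_pr lives on a very different scale from raw citation counts or years, which matters when the components are combined linearly as in Eq. (1).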
6 / 17
Author-based score
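$$\mathrm{score}(p) = 2.5 \cdot s_{\mathrm{pub}} + 0.1 \cdot s_{\mathrm{age}} + 1.0 \cdot s_{\mathrm{pr}} + 1.0 \cdot s_{\mathrm{auth}} + 0.1 \cdot s_{\mathrm{venue}} + 0.01 \cdot s_{\mathrm{inst}} \qquad (5)$$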
7 / 17
Author-based score
• We experimented with some commonly used methods for evaluating author performance (number of citations, h-index)
• We calculated the given value for each of the authors of a publication and tested scoring publications using the maximum, total and mean of these values
• The final scoring function uses the mean number of citations per publication of each author, averaged over the publication’s authors:
$$s_{\mathrm{auth}}(p) = \frac{\sum_{a \in A_p} \frac{\sum_{x \in P_a} c(x)}{|P_a|}}{|A_p|} \qquad (6)$$
where P_a is the set of publications of author a
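A minimal Python sketch of Eq. (6) follows, under the assumption that author-to-publication lists and citation counts are available as plain dictionaries (the names and toy data are hypothetical). The `aggregate` parameter reproduces the maximum/total/mean variants mentioned on the slide, with `mean` giving Eq. (6). The `h_index` helper is the standard h-index definition, included only because the slide lists it among the author metrics tried; it is not part of the final score.

```python
from statistics import mean
from typing import Callable


def author_value(author: str, papers_by_author: dict[str, list[str]],
                 citations: dict[str, int]) -> float:
    """Mean citations per publication of one author: sum_{x in P_a} c(x) / |P_a|."""
    papers = papers_by_author[author]
    return sum(citations.get(x, 0) for x in papers) / len(papers)


def s_auth(authors: list[str], papers_by_author: dict[str, list[str]],
           citations: dict[str, int],
           aggregate: Callable[[list[float]], float] = mean) -> float:
    """Eq. (6) with aggregate=mean; max or sum give the other tested variants."""
    return aggregate([author_value(a, papers_by_author, citations) for a in authors])


def h_index(citation_counts: list[int]) -> int:
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for position, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count < position:
            break
        h = position
    return h


# Hypothetical data: p1 is co-authored by alice and bob.
cits = {"p1": 10, "p2": 4, "p3": 3}
papers = {"alice": ["p1", "p2"], "bob": ["p1", "p3"]}
print(s_auth(["alice", "bob"], papers, cits))  # 6.75
```

Here alice averages 7.0 citations per paper and bob 6.5, so the mean over the two authors is 6.75; `aggregate=max` would give 7.0 and `aggregate=sum` 13.5.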
8 / 17
Venue-based score
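$$\mathrm{score}(p) = 2.5 \cdot s_{\mathrm{pub}} + 0.1 \cdot s_{\mathrm{age}} + 1.0 \cdot s_{\mathrm{pr}} + 1.0 \cdot s_{\mathrm{auth}} + 0.1 \cdot s_{\mathrm{venue}} + 0.01 \cdot s_{\mathrm{inst}} \qquad (7)$$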
9 / 17
Venue-based score
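• The standard metric in this area is the Journal Impact Factor (JIF); alternatives include the SCImago Journal Rank and the Eigenfactor
• We experimented with a few simple scoring functions (JIF, total citation counts, ...)
• Final venue-based score, the total citations of all other publications in the same venue v:
$$s_{\mathrm{venue}}(p) = \sum_{x \in P_v,\, x \neq p} c(x) \qquad (8)$$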
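Evaluating Eq. (8) by scanning the whole venue for every publication would be quadratic in venue size, which is costly at the scale of the challenge graph. Because p itself belongs to P_v, the same value can be obtained by computing each venue's total once and subtracting c(p). The sketch below takes that approach; the dictionary layout is hypothetical, and giving publications without a venue a score of 0 is an assumption the slides do not address.

```python
from collections import defaultdict


def venue_totals(venue_of: dict[str, str], citations: dict[str, int]) -> dict[str, int]:
    """Total citations of every venue, computed in a single pass."""
    totals: dict[str, int] = defaultdict(int)
    for pub, venue in venue_of.items():
        totals[venue] += citations.get(pub, 0)
    return dict(totals)


def s_venue(p: str, venue_of: dict[str, str], citations: dict[str, int],
            totals: dict[str, int]) -> int:
    """Eq. (8): citations of all other publications in p's venue."""
    venue = venue_of.get(p)
    if venue is None:
        return 0  # assumption: publications without a venue score 0
    # p is in P_v, so removing c(p) from the venue total excludes x = p.
    return totals[venue] - citations.get(p, 0)


# Hypothetical data: p1 and p2 appeared in venue J1, p3 in J2.
venue_of = {"p1": "J1", "p2": "J1", "p3": "J2"}
cits = {"p1": 5, "p2": 7, "p3": 1}
totals = venue_totals(venue_of, cits)
print(s_venue("p1", venue_of, cits, totals))  # 7
```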
10 / 17
Institution-based score
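$$\mathrm{score}(p) = 2.5 \cdot s_{\mathrm{pub}} + 0.1 \cdot s_{\mathrm{age}} + 1.0 \cdot s_{\mathrm{pr}} + 1.0 \cdot s_{\mathrm{auth}} + 0.1 \cdot s_{\mathrm{venue}} + 0.01 \cdot s_{\mathrm{inst}} \qquad (9)$$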
11 / 17
Institution-based score
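• Simple approach similar to the author- and venue-based scores:
$$s_{\mathrm{inst}}(p) = \frac{\sum_{i \in I_p} \sum_{x \in P_i,\, x \neq p} c(x)}{|I_p|} \qquad (10)$$
where I_p is the set of institutions associated with p and P_i is the set of publications of institution i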
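Eq. (10) can reuse the precomputed-totals idea from the venue score: p belongs to P_i for every i in I_p, so each inner sum equals the institution's total minus c(p), and the result is averaged over the publication's institutions. A minimal sketch, assuming a hypothetical mapping from publications to institution IDs; returning 0 for publications without an institution is an assumption not stated on the slides.

```python
from collections import defaultdict


def institution_totals(insts_of: dict[str, list[str]],
                       citations: dict[str, int]) -> dict[str, int]:
    """Total citations of every institution, computed in a single pass."""
    totals: dict[str, int] = defaultdict(int)
    for pub, insts in insts_of.items():
        for inst in set(insts):  # count each publication once per institution
            totals[inst] += citations.get(pub, 0)
    return dict(totals)


def s_inst(p: str, insts_of: dict[str, list[str]], citations: dict[str, int],
           totals: dict[str, int]) -> float:
    """Eq. (10): mean over p's institutions of their other publications' citations."""
    insts = set(insts_of.get(p, []))
    if not insts:
        return 0.0  # assumption: publications without an institution score 0
    c_p = citations.get(p, 0)
    return sum(totals[i] - c_p for i in insts) / len(insts)


# Hypothetical data: p1 is affiliated with both inst_a and inst_b.
insts_of = {"p1": ["inst_a", "inst_b"], "p2": ["inst_b"], "p3": ["inst_a"]}
cits = {"p1": 4, "p2": 6, "p3": 10}
totals = institution_totals(insts_of, cits)
print(s_inst("p1", insts_of, cits, totals))  # 8.0
```

For p1 the other publications of inst_a contribute 10 citations and those of inst_b contribute 6, giving (10 + 6) / 2 = 8.0.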
12 / 17
Potential improvements
• Better utilisation of the citation network
• Inclusion of additional data sources
• Being able to analyse the evaluation data and metric
• Revising the maximum citation threshold t used in the s_pub score
13 / 17
What have we learned?
• We found simple citation counts to perform best, but (!):
• To develop a more effective ranking method, it is crucial to understand the evaluation data and method better
• Citation counting does not account for many characteristics of
citations (differences in their meaning, popularity of certain
topics and types of research papers, ...)
14 / 17
Alternative ranking methods
• We’ve explored several external data sources
• Motivation: utilising new altmetric and webometric data sources
• The data become available earlier than citations
• They give a broader view of a publication’s impact
15 / 17
Alternative ranking methods
• Our main interest is in full-text and the set of metrics referred
to as Semantometrics
• Semantometrics build on the premise that the manuscript of the publication is needed to assess its value (in contrast to utilising external data)
• Biggest problem: obtaining the full texts, due to copyright restrictions and paywalls
• We’re experimenting with enriching the Microsoft Academic Graph (MAG) with the publication full texts
• Enriching MAG with altmetric, webometric and
semantometric data would enable developing and testing
fundamentally new metrics
16 / 17
Thank you for listening!
• Sources
• https://github.com/damirah/wsdm_cup
• Workshop on Mining Scientific Publications
• http://wosp.core.ac.uk/jcdl2016/
• Submission deadline – 17th April
17 / 17