Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking
KMi and Mendeley (team BletchleyPark) at WSDM Cup 2016
Drahomira Herrmannova & Petr Knoth
The Open University & Mendeley
WSDM Cup 2016, February 2016
2. Our approach
• Hypothesis
• the importance of a publication can be determined by a
mixture of factors evidencing its impact and the importance of
entities which participated in the publication’s creation
3. Our approach
• Method
1 separately score each type of entity in the graph
2 use these separate scores to score the publications
3 this yields several different scores for each publication
4 the final score, which defines the publication's rank, is calculated as a
linear combination of these scores
• Weights obtained experimentally
• The final equation
$$\mathrm{score}(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst} \tag{1}$$
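As a rough illustration of equation (1), the sketch below combines six precomputed component scores with the weights listed above. The function name `final_score`, the dictionary layout and the example values are hypothetical. The slides do not say whether the component scores are normalised before they are combined, so no normalisation is applied here.

```python
# Weights copied from equation (1).
WEIGHTS = {
    "pub": 2.5,
    "age": 0.1,
    "pr": 1.0,
    "auth": 1.0,
    "venue": 0.1,
    "inst": 0.01,
}


def final_score(components):
    """Linear combination of the component scores of one publication.

    `components` maps a component name ("pub", "age", ...) to its score.
    Assumption: a missing component contributes 0.
    """
    return sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)


# Hypothetical component scores for a single publication.
example = {"pub": 2.0, "age": 1.0, "pr": 0.5, "auth": 3.0, "venue": 10.0, "inst": 100.0}
print(final_score(example))
```

Publications would then be ranked by sorting them on this value in descending order.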
5. Publication-based scoring functions
• Scoring publication entities directly (without considering the
importance of authors or venues)
• We have experimented with several options of normalising and
weighting publication citations
• Applying a time decay to citations
• Applying a decay function to total citation counts
• Using mean citation counts
• Final scoring function:
$$s_{pub}(p) = \begin{cases} c(p)/|A_p|, & \text{for } c(p) \le t \\ t/|A_p|, & \text{for } c(p) > t \end{cases} \tag{2}$$
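Equation (2) can be read as a citation count capped at a threshold and divided by the number of authors. The sketch below assumes c(p) is the citation count of p, A_p is its set of authors, and t is the citation threshold. The slides do not give the value of t, so it is left as a parameter, and raising an error for a publication without authors is a choice made for this example only.

```python
def s_pub(citations, n_authors, threshold):
    """Citation count capped at `threshold`, divided by the number of authors."""
    if n_authors <= 0:
        # Assumption: the slides do not say how author-less records are handled.
        raise ValueError("a publication needs at least one author")
    # min() covers both branches of equation (2).
    return min(citations, threshold) / n_authors


print(s_pub(citations=10, n_authors=2, threshold=100))
print(s_pub(citations=500, n_authors=2, threshold=100))
```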
6. Publication-based scoring functions
• To account for publication age we added a score based on age:
$$s_{age}(p) = y_p \tag{3}$$
• In the second phase of the challenge we included
PageRank as an additional feature:
$$s_{pr}(p) = PR(p) \tag{4}$$
8. Author-based score
• We experimented with some commonly used methods for
evaluating author performance (number of citations, h-index)
• We calculated the given value for each of the authors of a
publication and tested scoring publications using the maximum,
total and mean of these values
• Final scoring function uses mean citation score per publication
and author:
$$s_{auth}(p) = \frac{\sum_{a \in A_p} \frac{\sum_{x \in P_a} c(x)}{|P_a|}}{|A_p|} \tag{6}$$
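Equation (6) averages twice: first over each author's publications, then over the authors of p. The sketch below assumes P_a is the set of publications of author a and that citation counts are available in a dictionary. All names and the toy data are hypothetical, and treating authors with no recorded publications as scoring 0 is an assumption.

```python
def s_auth(authors, papers_by_author, citations):
    """Mean over the authors of p of each author's mean citations per paper."""
    if not authors:
        return 0.0  # Assumption: no authors means no author-based score.
    per_author_means = []
    for author in authors:
        papers = papers_by_author.get(author, [])
        if papers:
            total = sum(citations.get(x, 0) for x in papers)
            per_author_means.append(total / len(papers))
        else:
            per_author_means.append(0.0)
    return sum(per_author_means) / len(authors)


# Toy data: alice wrote p1 and p2, bob wrote p1 only.
toy_papers_by_author = {"alice": ["p1", "p2"], "bob": ["p1"]}
toy_citations = {"p1": 10, "p2": 0}
print(s_auth(["alice", "bob"], toy_papers_by_author, toy_citations))
```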
10. Venue-based score
• The standard metric in this area is the Journal Impact Factor (JIF);
alternatives include SCImago Journal Rank and Eigenfactor
• We experimented with a few simple scoring functions (JIF,
total citation counts, ...)
• Final venue-based score:
$$s_{venue}(p) = \sum_{x \in P_v,\, x \neq p} c(x) \tag{8}$$
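The venue score in equation (8) is a sum of citation counts over the publications of a venue. The sketch below assumes P_v is the set of publications in the venue of p and that the condition under the sum excludes p itself. The function and variable names are hypothetical.

```python
def s_venue(paper, venue_papers, citations):
    """Total citations of all other papers published in the same venue."""
    return sum(citations.get(x, 0) for x in venue_papers if x != paper)


# Toy venue with three papers; p1 is the paper being scored.
toy_venue = ["p1", "p2", "p3"]
toy_counts = {"p1": 50, "p2": 4, "p3": 6}
print(s_venue("p1", toy_venue, toy_counts))
```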
12. Institution-based score
• Simple approach similar to author- and venue-based scores:
$$s_{inst}(p) = \frac{\sum_{i \in I_p} \sum_{x \in P_i,\, x \neq p} c(x)}{|I_p|} \tag{10}$$
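Equation (10) applies the same idea per institution and then averages over the institutions of p. The sketch below assumes I_p is the set of institutions affiliated with p and P_i is the set of publications of institution i, again excluding p itself. The names, the toy data and the score of 0 for a publication with no institutions are assumptions.

```python
def s_inst(paper, institutions, papers_by_institution, citations):
    """Mean over the institutions of p of the citations of their other papers."""
    if not institutions:
        return 0.0  # Assumption: no affiliation means no institution score.
    total = 0
    for institution in institutions:
        papers = papers_by_institution.get(institution, [])
        total += sum(citations.get(x, 0) for x in papers if x != paper)
    return total / len(institutions)


# Toy data: p1 is affiliated with two institutions.
toy_papers_by_institution = {"inst_a": ["p1", "p2"], "inst_b": ["p1", "p3", "p4"]}
toy_citation_counts = {"p1": 7, "p2": 4, "p3": 1, "p4": 5}
print(s_inst("p1", ["inst_a", "inst_b"], toy_papers_by_institution, toy_citation_counts))
```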
13. Potential improvements
• Better utilisation of the citation network
• Inclusion of additional data sources
• Possibility to analyse the evaluation data and metric
• Revise the maximum citation threshold used in the $s_{pub}$ score
14. What have we learned?
• We found simple citation counts to perform best, but (!):
• In order to develop a better ranking method, it is crucial
to better understand the evaluation data and method
• Citation counting does not account for many characteristics of
citations (differences in their meaning, popularity of certain
topics and types of research papers, ...)
15. Alternative ranking methods
• We’ve explored several external data sources
• Motivation: utilising new altmetric and webometric
data sources
• Early availability of the data compared to citations
• Broader view of a publication’s impact
16. Alternative ranking methods
• Our main interest is in full-text and the set of metrics referred
to as Semantometrics
• Semantometrics build on the premise that the manuscript of the
publication is needed to assess its value (in contrast to utilising
external data)
• Biggest problem: obtaining the full texts, due to copyright
restrictions and paywalls
• We’re experimenting with enriching the Microsoft Academic Graph (MAG)
with the publication full texts
• Enriching MAG with altmetric, webometric and
semantometric data would enable developing and testing
fundamentally new metrics
17. Thank you for listening!
• Sources
• https://github.com/damirah/wsdm_cup
• Workshop on Mining Scientific Publications
• http://wosp.core.ac.uk/jcdl2016/
• Submission deadline – 17th April