
Peer judge: Praise and Criticism Detection in F1000Research reviews


Presentation by Mike Thelwall



  1. PeerJudge: Praise and Criticism Detection in F1000Research reviews
     Mike Thelwall, University of Wolverhampton
  2. PeerJudge Overview
     • Based on a dictionary of review sentiment terms and phrases from F1000Research reviews
     • Each dictionary term or phrase has a praise or criticism score
       • Well written: +2
       • Flawed: -4
     • Reviews are given the maximum positive and negative scores of the words or phrases found in each sentence
       • -1: no criticism … -5: very strong criticism
       • 1: no praise … 5: very strong praise
     • Also 12 linguistic rules to cope with negation and booster words (very, slightly)
  3. PeerJudge Example
     • The paper is well written but the study is poorly designed.
     • Praise: 2; Criticism: -4
     • Try it online
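The scoring scheme on the overview slide can be sketched in a few lines of Python. This is a toy reconstruction, not the actual PeerJudge implementation: the miniature lexicon, the wildcard matching and the tokenization are assumptions, and the 12 negation/booster rules are omitted.

```python
import re

# Toy lexicon in the slides' format: term (trailing '*' = wildcard) -> score.
# Positive scores mark praise terms, negative scores mark criticism terms.
# Assumed entries for illustration, not the real PeerJudge dictionary.
LEXICON = {
    "well written": 2,
    "poorly designed": -4,
    "flawed": -4,
    "clear": 4,
    "careful*": 3,
    "bewilder*": -3,
}

def matches(term, text, words):
    """True if a lexicon term occurs: phrases as substrings, plain words
    exactly, and 'stem*' entries as prefixes of any word."""
    if " " in term:
        return term in text
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def score_sentence(sentence, lexicon=LEXICON):
    """Return (praise, criticism): the maximum positive and minimum negative
    term scores found in the sentence, defaulting to 1 (no praise) and
    -1 (no criticism) as on the overview slide."""
    text = sentence.lower()
    words = re.findall(r"[a-z']+", text)
    praise, criticism = 1, -1
    for term, score in lexicon.items():
        if matches(term, text, words):
            if score > 0:
                praise = max(praise, score)
            else:
                criticism = min(criticism, score)
    return praise, criticism
```

On the example sentence above this returns praise 2 (from "well written") and criticism -4 (from "poorly designed"), matching the slide.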
  4. Part of the dictionary
     acceptabl*    3
     accurate      2
     adequat*      3
     appropriate   3
     arbitrary    -2
     balanced      2
     bewilder*    -3
     but           1
     careful*      3
     clarify      -2
     clear         4
     clearer      -3
     compelling    3
  5. Technical details
     • Java jar program, so portable
     • Dictionaries are external plain text files and easily customizable
     • Fast: 14,000 reviews per second
     • Explains its judgements, so it is transparent and the owner can adjust the dictionary for recurrent problems
     • Agrees above random chance with reviewer scores
     • Because it is based on a dictionary, it does not "cheat" by identifying hot topics, fields, affiliations or jargon
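Because the dictionaries are external plain text files, customizing them only requires editing and reloading a text file. A minimal loader sketch, assuming the "term score" line format shown on the dictionary slide; the real file layout may differ.

```python
def load_lexicon(path):
    """Load a plain-text dictionary of 'term score' lines (e.g. 'acceptabl* 3').

    The format is assumed from the dictionary excerpt on the slides; phrases
    may contain spaces, so only the final field is read as the score."""
    lexicon = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.strip().rsplit(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            term, score = parts
            lexicon[term.lower()] = int(score)
    return lexicon
```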
  6. Where is the dictionary from?
     • Human evaluation of a development dataset of F1000Research reviews
     • Machine learning to suggest extra terms and different weights
  7. Limitations
     • Designed for F1000Research decisions: needs dictionary modification for good performance on other review datasets
     • F1000Research reviews are unbalanced, with few negative decisions
     • F1000Research reviews have standard concluding text that had to be removed, so referees might not state their own conclusions
     • Referees often give judgements in field-specialist language, avoiding general conclusions
     • More substantial modifications may be needed for technical domains
     • This is difficult to do in advance because very few outlets publish reviews and scores
  8. Applications
     • Warning reviewers if their judgements appear out of line with their scores
     • Warning reviewers if they have not given any praise
     • The same warnings for editors
     • On a larger scale, allowing publishers to check for anomalies in the review process, such as by identifying journals with uncritical referees (low average criticism scores)
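The first two applications amount to a consistency check between the detected sentiment and the reviewer's decision. A hypothetical sketch: the function name, thresholds and approval flag are illustrative assumptions, not part of PeerJudge.

```python
def check_review(praise, criticism, approved):
    """Warn when detected sentiment (praise 1..5, criticism -1..-5) looks
    inconsistent with the reviewer's approval decision.

    Thresholds here are illustrative, not taken from PeerJudge."""
    warnings = []
    if approved and criticism <= -4:
        warnings.append("Approved despite very strong criticism in the text.")
    if not approved and praise >= 4 and criticism == -1:
        warnings.append("Rejected despite strongly positive, uncritical text.")
    if praise == 1:
        warnings.append("Report contains no detectable praise.")
    return warnings
```

The same check could run on an editor's or publisher's side, aggregating warning rates per journal to spot uncritical refereeing.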