An Algorithm to Determine Peer-Reviewers

Marko A. Rodriguez and Johan Bollen
Digital Library Research and Prototyping Team
T-7, Center for Nonlinear Studies
Los Alamos National Laboratory

October 25, 2008

Peer-Review Problem Statement

• Editors are overwhelmed due to the number of submissions.
Provide mechanisms to decentralize the peer-review process [10].

• Editors have a difficult time locating referees who know the domain of
discourse and do not have an ethical conflict with reviewing the submission.
Automate the referee identification problem [9].

Conference on Information and Knowledge Management (CIKM) – Napa, California – October 25, 2008

Hypothesis

• It is hypothesized that the authors of the cited articles and their coauthors
are good referees.
• It is hypothesized that the authors of the article and their coauthors are
conflict of interest referees.

With respect to the article associated with this presentation:

• David Yarowsky, Radu Florian, Fabio Crestani, Tamara Sumner, etc. are
considered competent referees.
• Marko A. Rodriguez, Johan Bollen, Herbert Van de Sompel, Xiaoming
Liu, Michael Nelson, etc. are considered conflict of interest referees.


Outline

• Define the coauthorship network data structure.

• Define the particle-swarm algorithm.

• Present experimental results validating the proposed algorithm.

• Related work and conclusion.


A Scholarly Coauthorship Network

[Figure: example coauthorship network with vertices Author-A through Author-F.]

All edges have a single homogeneous meaning of “coauthor”. If Author-A
and Author-B have written an article together, then they are considered
coauthors.


Our coauthorship network is defined as

G = (V, E, ω),

where V is the set of vertices, E ⊆ (V × V), and ω : E → R+. The
function rule for ω is

\[ \omega(i, j) = \omega(j, i) = \sum_{m \in M \text{ by } i, j} \frac{1}{\alpha(m) - 1}, \]

where M is the set of all manuscripts and α : M → N+ maps each
manuscript to the total number of authors for that manuscript. Thus, the
more authors on an article, the less “coauthor weight” exists between them
with respect to that article [5, 7, 6].
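The weighting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `coauthor_weights`, the list-of-author-lists input, and the toy manuscripts are all assumptions made for the example. Single-author manuscripts are skipped because they create no coauthor edge.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(manuscripts):
    """Return symmetric edge weights keyed by (author_i, author_j).

    Each manuscript with n >= 2 distinct authors adds 1 / (n - 1)
    to the weight between every pair of its authors.
    """
    weights = defaultdict(float)
    for authors in manuscripts:
        unique = sorted(set(authors))
        n = len(unique)
        if n < 2:
            continue  # a single-author manuscript creates no edges
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1.0 / (n - 1)
            weights[(b, a)] += 1.0 / (n - 1)  # keep the weights symmetric
    return dict(weights)


# Hypothetical toy data: A and B wrote one paper alone and one with C.
manuscripts = [["A", "B"], ["A", "B", "C"]]
w = coauthor_weights(manuscripts)
```

In this toy case the A-B edge collects 1/(2 − 1) from the first manuscript and 1/(3 − 1) from the second, while the A-C edge collects only the latter.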



Finally, the weights of all edges outgoing from a vertex are normalized to
form a probability distribution over the outgoing edge set. Thus, for a
particular vertex i,

\[ \omega' : E \to [0, 1] \]

such that

\[ \omega'(i, j) = \frac{\omega(i, j)}{\sum_{k} \omega(i, k)}, \]

where ω′(i, j) need not equal ω′(j, i).
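A small Python sketch of this normalization, assuming edge weights are stored in a dictionary keyed by (source, target) pairs; the function name `normalize_out_edges` and the toy weights are illustrative assumptions. The example also shows why the normalized weight is no longer symmetric.

```python
from collections import defaultdict


def normalize_out_edges(weights):
    """Divide each edge weight by the total weight leaving its source vertex."""
    totals = defaultdict(float)
    for (i, _j), weight in weights.items():
        totals[i] += weight
    return {(i, j): weight / totals[i] for (i, j), weight in weights.items()}


# Hypothetical symmetric weights: A is linked to B and C; B and C are not linked.
weights = {
    ("A", "B"): 1.5, ("B", "A"): 1.5,
    ("A", "C"): 0.5, ("C", "A"): 0.5,
}
p = normalize_out_edges(weights)
```

Here A splits its outgoing probability between B and C, whereas B has A as its only neighbor, so the A-to-B and B-to-A probabilities differ even though the raw weights are equal.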


A Particle Swarm Algorithm

[Figure: a particle traversing the network over timesteps t=1 to t=4; edges carry numeric labels (0.25, 0.5, 0.75, 1.0).]

A particle begins its journey at a particular vertex and will take an
outgoing edge of its current vertex biased by the outgoing probability
distribution defined over the outgoing edge set. Moreover, at each discrete
timestep in N+ the particle decays in energy.


The set of all particles in the network is P, where p_i ∈ P is the ith particle.
The properties of an individual particle include:

1. c_i(t) ∈ V: the location of the particle p_i at time t.

2. ε_i(t) ∈ R: the amount of energy contained within the particle p_i at
time t.

3. δ_i ∈ [0, 1]: the decay parameter governing the loss of energy as the
particle p_i propagates through the network. This is a globally defined
parameter in our experiment, with decay set to δ_i = 0.15 for all i.

4. Particles can maintain state and have heterogeneous internal logics to
perform more complex walks.


The algorithm runs for k timesteps. At each step, the particle has its
energy decayed such that

\[ \epsilon_i(t + 1) = \begin{cases} (1 - \delta_i)\,\epsilon_i(t) & \text{if } t \le k \\ 0 & \text{otherwise} \end{cases} \]
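To make the decay rule concrete, the following sketch lists the energy a particle carries at timesteps 1 through k. The function name `energy_schedule` and the starting energy of 1.0 are assumptions for illustration; the decay value 0.15 is the one stated for the experiment.

```python
def energy_schedule(initial_energy, decay, k):
    """Return the particle energy at timesteps 1..k.

    Each step multiplies the energy by (1 - decay); after k steps the
    particle is considered spent, so no further values are produced.
    """
    energies = []
    energy = initial_energy
    for _ in range(k):
        energies.append(energy)
        energy *= 1.0 - decay
    return energies


# Hypothetical unit-energy particle with the decay used in the experiment.
schedule = energy_schedule(1.0, 0.15, 4)
```

With a decay of 0.15 the particle retains 85% of its energy per step, so its contribution shrinks geometrically with distance from its starting vertex.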

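Finally, there exists a global rank vector e ∈ R^{|V|} that records how much
energy has passed through each vertex.

\[ e_{c_i(t)}(t + 1) = e_{c_i(t)}(t) + \epsilon_i(t) \]

Thus,

\[ e_l(k) = \sum_{t=1}^{t \le k} \sum_{i=1}^{i \le |P|} \begin{cases} (1 - \delta_i)^{t-1}\,\epsilon_i(1) & \text{if } c_i(t - 1) = n_l \\ 0 & \text{otherwise.} \end{cases} \]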
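The walk and the rank-vector update can be combined into a short simulation. This is a sketch under stated assumptions, not the authors' code: `run_swarm`, the adjacency-list format, and the seeded random generator are hypothetical, and the ordering within a timestep (deposit energy at the current vertex, then move, then decay) is one reading of the update rule.

```python
import random
from collections import defaultdict


def run_swarm(transition, starts, k, decay=0.15, seed=0):
    """Accumulate particle energy per vertex over k timesteps.

    transition: {vertex: [(neighbor, probability), ...]}
    starts:     [(start_vertex, initial_energy), ...], one entry per particle
    """
    rng = random.Random(seed)
    rank = defaultdict(float)
    for location, energy in starts:
        for _ in range(k):
            rank[location] += energy  # deposit the current energy at the current vertex
            edges = transition.get(location)
            if not edges:
                break  # a vertex with no outgoing edges ends the walk
            neighbors, probabilities = zip(*edges)
            location = rng.choices(neighbors, weights=probabilities)[0]
            energy *= 1.0 - decay  # decay before the next timestep
    return dict(rank)


# Hypothetical two-vertex network in which every move is forced.
transition = {"A": [("B", 1.0)], "B": [("A", 1.0)]}
rank = run_swarm(transition, [("A", 1.0)], k=3)
```

Because the toy network is deterministic, the particle visits A, B, A and deposits 1.0, 0.85 and 0.7225 in turn. On a real coauthorship network the result depends on the random walk, so many particles per starting vertex would likely be needed for stable rankings.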


Experimental Results
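[Figure: the submission's authors (author1, author2) inject negative energy (−) and the authors of its references (reference1, reference2) inject positive energy (+) into the co-authorship network.]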

Authors of the submitted article have negative energy particles provided to
their corresponding vertex in the coauthorship network. The authors of the
referenced articles (i.e. cited authors) are provided positive energy
particles.
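The seeding step described above might look like the following sketch. The function name `seed_particles`, the unit energy magnitude, and the `particles_per_author` parameter are assumptions for illustration; the slides do not state how many particles each author receives or how authors absent from the network are handled (here they are skipped).

```python
def seed_particles(submission_authors, cited_authors, vertices,
                   particles_per_author=1, energy=1.0):
    """Build (start_vertex, initial_energy) pairs for one submission.

    Cited authors receive positive-energy particles and the submission's
    own authors receive negative-energy particles. Names that do not
    appear in the coauthorship network are skipped.
    """
    starts = []
    for author in cited_authors:
        if author in vertices:
            starts.extend([(author, +energy)] * particles_per_author)
    for author in submission_authors:
        if author in vertices:
            starts.extend([(author, -energy)] * particles_per_author)
    return starts


# Hypothetical submission: one author, three cited authors (one not in the network).
vertices = {"a1", "r1", "r2"}
starts = seed_particles(["a1"], ["r1", "r2", "unknown"], vertices)
```

After the particles have propagated, vertices near the cited authors accumulate positive energy, while vertices near the submitting authors are pushed toward zero or below.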


The DBLP provided us the data set from which to construct our
coauthorship network, which includes 284,082 authors and 2,167,018
coauthorship edges. The 2005 ACM/IEEE Joint Conference on Digital
Libraries provided us their program committee's referee bid data. That is,
for each of the 124 submitted manuscripts, each of the 77 program
committee members stated:

1. I am an expert in the domain of the submission and want to review
2. I am an expert in the domain of the submission
3. I am not an expert in the domain of the submission
4. There exists a conflict of interest

[1] ≈ [2] > [3] ≈ [4] ≈ 0.


[Figure: four histograms of frequency versus log of the energy value at k=2, one per bid category: [1] expert wanting to review, [2] expert, [3] non-expert, [4] conflict of interest.]



[Figure: average individual energy versus k-steps of negative energy (0 to 7) for (1) expert wanting to review, (2) expert, (3) non-expert, and (4) conflict of interest.]



• Other types of relationships are involved in conflict of interest situations
besides previous article collaborations (e.g. same institution, friendship,
shared committees, etc.) [2, 8].


Related Work and Conclusion

• Latent semantic indexing to match manuscript abstract to referees [3, 11].
• Expertise identification via web mining techniques [1].
• Simply asking authors and the referees to provide keyterms describing their manuscript
and area of expertise respectively [4].

• Due to the computational and human intervention costs, applications of the mentioned
referee identification algorithms have been restricted to situations in which such
information can be obtained for a pre-selected set of individuals, e.g. conferences and
workshops.
• They have consequently failed to gain acceptance in the domain of classic journal
peer-review and open commentary peer-review.


Related Work and Conclusion

• The proposed automatic referee identification algorithm requires no human intervention,
is computationally efficient, and can, to some extent, automatically identify conflict of
interest situations.
• The referee weighting aspect of the algorithm provides a strong incentive for its use
in open commentary peer-review. The level of automation provides the necessary
infrastructure to decouple the publication process from the peer-review process in the
sense that editors are no longer required to assign referees.


Acknowledgements

• This research could not have been conducted if it were not for the
support of the 2005 JCDL program chair and steering committee.

• Herbert Van de Sompel supported this research through data acquisition.

• Journal of Memetics (http://www.jom-emit.org/) for using a prototype
implementation of the algorithm in their peer-review process.

• This research was financially supported by the Los Alamos National
Laboratory.


References

[1] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical paper
recommendation: A study in combining multiple information sources.
Journal of Artificial Intelligence Research, 14:231–252, 2001.

[2] Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Luda L.
Balakireva, and Aric Hagberg. The largest scholarly semantic
network...ever. In ACM World Wide Web Conference, Banff, Canada,
2007. ACM Press.

[3] Susan T. Dumais and Jakob Nielsen. Automating the assignment of
submitted manuscripts to reviewers. In SIGIR ’92: Proceedings of
the 15th annual international ACM SIGIR conference on Research and
development in information retrieval, pages 233–244, Copenhagen,
Denmark, 1992. ACM Press.

[4] Juan J. Merelo Guervós and Pedro A. Castillo Valdivieso. Conference
paper assignment using a combined greedy/evolutionary algorithm.
In Proceedings of the International Conference on Parallel Problem
Solving from Nature, pages 602–611, Birmingham, UK, 2004.

[5] Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van
de Sompel. Co-authorship networks in the digital library research
community. Information Processing and Management, 41(6):1462–
1480, 2006.

[6] M E J Newman. Scientific collaboration networks: I. network
construction and fundamental results. Physical Review E,
64(1):016131, 2001.

[7] M E J Newman. Scientific collaboration networks: II. shortest paths,
weighted networks, and centrality. Physical Review E, 64(1):016132,
2001.

[8] Marko A. Rodriguez. Grammar-based random walkers in semantic
networks. Knowledge-Based Systems, 21(7):727–739, 2008.

[9] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel.
An algorithm to determine peer-reviewers. In Proceedings of the
Conference on Information and Knowledge Management, Napa,
California, October 2008. ACM Press.

[10] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. The
convergence of digital-libraries and the peer-review process. Journal of
Information Science, 32(2):151–161, 2006.


[11] D. Yarowsky and R. Florian. Taking the load off the conference
chairs: towards a digital paper-routing assistant. In Proceedings of
the 1999 Joint SIGDAT Conference on Empirical Methods in NLP and
Very-Large Corpora, Cambridge, MA, 1999.

