1. An Algorithm to Determine Peer-Reviewers
Marko A. Rodriguez and Johan Bollen
Digital Library Research and Prototyping Team
T-7, Center for Nonlinear Studies
Los Alamos National Laboratory
October 25, 2008
2. Peer-Review Problem Statement
• Editors are overwhelmed due to the number of submissions.
Provide mechanisms to decentralize the peer-review process [10].
• Editors have a difficult time locating referees who know the domain of
discourse and do not have an ethical conflict with reviewing the submission.
Automate the referee identification problem [9].
Conference on Information and Knowledge Management (CIKM) – Napa, California – October 25, 2008
3. Hypothesis
• It is hypothesized that the authors of the cited articles and their coauthors
are good referees.
• It is hypothesized that the authors of the article and their coauthors are
conflict-of-interest referees.
With respect to the article associated with this presentation:
• David Yarowsky, Radu Florian, Fabio Crestani, Tamara Sumner, etc. are
considered competent referees.
• Marko A. Rodriguez, Johan Bollen, Herbert Van de Sompel, Xiaoming
Liu, Michael Nelson, etc. are considered conflict of interest referees.
4. Outline
• Define the coauthorship network data structure.
• Define the particle-swarm algorithm.
• Present experimental results validating the proposed algorithm.
• Related work and conclusion.
6. A Scholarly Coauthorship Network
[Figure: an example coauthorship network with six vertices, Author-A through Author-F.]
All edges have a single, homogeneous meaning: “coauthor”. If Author-A
and Author-B have written an article together, then they are considered
coauthors.
7. A Scholarly Coauthorship Network
Our coauthorship network is defined as
$$G = (V, E, \omega),$$
where $V$ is the set of vertices, $E \subseteq (V \times V)$, and $\omega : E \to \mathbb{R}^+$. The
function rule for $\omega$ is
$$\omega(i, j) = \omega(j, i) = \sum_{m \in M_{i,j}} \frac{1}{\alpha(m) - 1},$$
where $M$ is the set of all manuscripts, $M_{i,j} \subseteq M$ is the set of manuscripts
coauthored by $i$ and $j$, and $\alpha : M \to \mathbb{N}^+$ maps each manuscript to its total
number of authors. Thus, the more authors on an article, the less “coauthor weight”
exists between any two of them with respect to that article [5, 7, 6].
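The weighting rule above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: manuscripts are given as lists of author names, weights are stored in a dictionary keyed by ordered `(i, j)` pairs, and `coauthor_weights` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(manuscripts):
    """Return symmetric coauthorship weights omega[(i, j)].

    Each manuscript m with alpha(m) distinct authors adds 1 / (alpha(m) - 1)
    to the weight of every pair of its coauthors; single-author manuscripts
    create no edges.
    """
    omega = defaultdict(float)
    for authors in manuscripts:
        unique = sorted(set(authors))      # drop repeated names, fix pair order
        alpha = len(unique)
        if alpha < 2:
            continue
        share = 1.0 / (alpha - 1)
        for i, j in combinations(unique, 2):
            omega[(i, j)] += share         # store both directions so that
            omega[(j, i)] += share         # omega(i, j) == omega(j, i)
    return dict(omega)


# Usage: A and B coauthor twice, once as a pair and once together with C.
papers = [["A", "B"], ["A", "B", "C"], ["D"]]
omega = coauthor_weights(papers)
print(omega[("A", "B")], omega[("A", "C")])  # 1.5 0.5
```

The pair A, B receives 1 from the two-author paper and 1/2 from the three-author paper, while the single-author paper by D contributes no edge.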
8. A Scholarly Coauthorship Network
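Finally, the weights of all edges outgoing from a vertex are normalized to
form a probability distribution over the outgoing edge set. Thus, for a
particular vertex $i$,
$$\hat{\omega} : E \to [0, 1]$$
such that
$$\hat{\omega}(i, j) = \frac{\omega(i, j)}{\sum_{k} \omega(i, k)},$$
where $\hat{\omega}(i, j)$ need not equal $\hat{\omega}(j, i)$, because vertices $i$ and $j$
generally have different total outgoing weight.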
9. Outline
• Define the coauthorship network data structure.
• Define the particle-swarm algorithm.
• Present experimental results validating the proposed algorithm.
• Related work and conclusion.
10. A Particle Swarm Algorithm
[Figure: a particle walking through the network at timesteps t=1 to t=4.]
A particle begins its journey at a particular vertex and repeatedly takes an
outgoing edge of its current vertex, chosen according to the probability
distribution defined over that vertex's outgoing edge set. At each discrete
timestep $t \in \mathbb{N}^+$, the particle's energy decays.
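One particle's walk can be sketched as below. The sketch assumes a hypothetical `transitions` dictionary mapping each vertex to `(neighbor, probability)` pairs built from the normalized weights, a multiplicative decay per step (the form given on the following slides), and that a particle stops if it reaches a vertex with no outgoing edges; these choices are ours, not the authors'.

```python
import random


def walk(transitions, start, energy, decay, steps, rng=random):
    """Simulate a single particle for `steps` timesteps.

    transitions maps a vertex to a list of (neighbor, probability) pairs.
    Returns the vertex occupied and the energy held at each timestep.
    """
    vertex, path, energies = start, [], []
    for _ in range(steps):
        path.append(vertex)
        energies.append(energy)
        neighbors = transitions.get(vertex)
        if not neighbors:                  # no outgoing edge: stop walking
            break
        targets, probs = zip(*neighbors)
        # Choose the next vertex with the outgoing edge probabilities as bias.
        vertex = rng.choices(targets, weights=probs, k=1)[0]
        energy *= 1.0 - decay              # energy decays every timestep
    return path, energies


transitions = {"A": [("B", 0.75), ("C", 0.25)], "B": [("A", 1.0)], "C": [("A", 1.0)]}
path, energies = walk(transitions, "A", 1.0, 0.15, 4, random.Random(0))
print(path)      # starts at 'A'; later vertices depend on the random draws
print(energies)  # approximately [1.0, 0.85, 0.7225, 0.614125]
```

Passing a seeded `random.Random` instance makes a walk reproducible; the energy sequence does not depend on the path taken.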
11. A Particle Swarm Algorithm
The set of all particles in the network is $P$, where $p_i \in P$ is the $i$th particle.
The properties of an individual particle include:
1. $c_i(t) \in V$: the location of particle $p_i$ at time $t$.
2. $\epsilon_i(t) \in \mathbb{R}$: the amount of energy contained within particle $p_i$ at
time $t$.
3. $\delta_i \in [0, 1]$: the decay parameter governing the loss of energy as
particle $p_i$ propagates through the network. In our experiments this is a
globally defined parameter, with $\delta_i = 0.15$ for all $i$.
4. Particles can maintain state and have heterogeneous internal logics to
perform more complex walks.
12. A Particle Swarm Algorithm
The algorithm runs for $k$ timesteps. At each timestep, each particle's
energy decays such that
$$\epsilon_i(t + 1) = \begin{cases} (1 - \delta_i)\,\epsilon_i(t) & \text{if } t \le k,\\ 0 & \text{otherwise.} \end{cases}$$
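Unrolling this recurrence gives the closed form used in the rank-vector sum. The second line is only a worked instance, using the experimental decay $\delta_i = 0.15$ and an assumed initial energy $\epsilon_i(1) = 1$.

```latex
\begin{aligned}
\epsilon_i(t) &= (1 - \delta_i)\,\epsilon_i(t-1) = (1 - \delta_i)^2\,\epsilon_i(t-2) = \cdots = (1 - \delta_i)^{t-1}\,\epsilon_i(1), \qquad 1 \le t \le k + 1, \\
\epsilon_i(1) = 1,\ \delta_i = 0.15 &\;\Rightarrow\; \epsilon_i(2) = 0.85, \quad \epsilon_i(3) = 0.7225, \quad \epsilon_i(4) = 0.614125.
\end{aligned}
```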
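Finally, there exists a global rank vector $e \in \mathbb{R}^{|V|}$, initialized to zero, that
records how much energy has passed through each vertex:
$$e_{c_i(t)}(t + 1) = e_{c_i(t)}(t) + \epsilon_i(t).$$
Thus, for the $l$th vertex $n_l \in V$, the energy accumulated over the $k$ timesteps is
$$e_l(k + 1) = \sum_{t=1}^{k} \sum_{i=1}^{|P|} \begin{cases} (1 - \delta_i)^{t-1}\,\epsilon_i(1) & \text{if } c_i(t) = n_l,\\ 0 & \text{otherwise.} \end{cases}$$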
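The update rules can be combined into a small Monte Carlo sketch of the swarm. It assumes the same hypothetical `transitions` dictionary of `(neighbor, probability)` pairs, processes particles one after another (they do not interact, so this matches updating them in parallel), and uses `particle_swarm` as an illustrative name; it is not the authors' code.

```python
import random


def particle_swarm(transitions, seeds, decay=0.15, k=3, rng=None):
    """Run independent decaying particles and return the rank vector e.

    seeds holds one (start_vertex, initial_energy) pair per particle.
    For t = 1..k, a particle adds its energy eps_i(t) to e at its current
    vertex c_i(t), then moves along an outgoing edge chosen with the
    normalized probabilities and keeps a fraction (1 - decay) of its energy.
    """
    rng = rng if rng is not None else random.Random()
    e = {}
    for vertex, energy in seeds:
        for _ in range(k):
            e[vertex] = e.get(vertex, 0.0) + energy   # e_{c_i(t)} += eps_i(t)
            neighbors = transitions.get(vertex)
            if not neighbors:                          # dead end: particle stops
                break
            targets, probs = zip(*neighbors)
            vertex = rng.choices(targets, weights=probs, k=1)[0]
            energy *= 1.0 - decay
    return e


transitions = {"A": [("B", 0.75), ("C", 0.25)], "B": [("A", 1.0)], "C": [("A", 1.0)]}
e = particle_swarm(transitions, [("A", 1.0), ("B", 1.0)], decay=0.15, k=3,
                   rng=random.Random(42))
print(sum(e.values()))  # approximately 5.145 = 2 * (1 + 0.85 + 0.7225)
```

Because each particle deposits $(1 - \delta_i)^{t-1}\epsilon_i(1)$ at step $t$ wherever it happens to be, the total of `e` is fixed by the seeds when no particle hits a dead end; only its distribution over vertices depends on the sampled walks.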
13. Experimental Results
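[Figure: a submission and the coauthorship network; the submission's authors (author1, author2) are marked with negative (−) particles and its references (reference1, reference2) with positive (+) particles.]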
Negative-energy particles are placed at the vertices of the submitted
article's authors in the coauthorship network. Positive-energy particles are
placed at the vertices of the authors of the referenced articles (i.e., the
cited authors).
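The signed seeding can be illustrated with the sketch below. To keep the result deterministic, it swaps sampled walks for the expected energy each vertex receives (propagating occupancy probabilities step by step), which is what the swarm's averages approach as more particles are used. This substitution, the toy graph, the `expected_energy` name, and ranking candidates by final energy are our assumptions for illustration.

```python
def expected_energy(transitions, seeds, decay=0.15, k=2):
    """Expected rank vector of a swarm of signed particles, without sampling.

    seeds holds (vertex, initial_energy) pairs: negative energy for the
    submission's authors, positive energy for the cited authors. At step t
    the energy (1 - decay)**(t - 1) * eps(1) is deposited at each vertex in
    proportion to the probability that the particle occupies it.
    """
    e = {}
    for start, energy in seeds:
        dist = {start: 1.0}                # occupancy probabilities at step t
        for t in range(1, k + 1):
            deposit = energy * (1.0 - decay) ** (t - 1)
            nxt = {}
            for v, p in dist.items():
                e[v] = e.get(v, 0.0) + p * deposit
                for u, q in transitions.get(v, []):
                    nxt[u] = nxt.get(u, 0.0) + p * q
            dist = nxt                     # mass at dead ends simply vanishes
    return e


transitions = {
    "S": [("X", 1.0)],                     # S: an author of the submission
    "X": [("S", 0.5), ("R", 0.5)],         # X: coauthor of both S and R
    "R": [("X", 0.5), ("Y", 0.5)],         # R: an author of a cited article
    "Y": [("R", 1.0)],                     # Y: coauthor of R only
}
seeds = [("S", -1.0), ("R", 1.0)]
e = expected_energy(transitions, seeds, decay=0.15, k=2)
authors = {"S"}
ranking = sorted((v for v in e if v not in authors), key=e.get, reverse=True)
print(ranking)  # ['R', 'Y', 'X']
```

Here X ends with negative energy (about −0.425) because it is one step from a submission author, so it falls to the bottom of the candidate list, while the cited author R and R's coauthor Y rank highest.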
14. Outline
• Define the coauthorship network data structure.
• Define the particle-swarm algorithm.
• Present experimental results validating the proposed algorithm.
• Related work and conclusion.
15. Experimental Results
The DBLP provided the data set from which we constructed our
coauthorship network, which includes 284,082 authors and 2,167,018
coauthorship edges. The 2005 ACM/IEEE Joint Conference on Digital
Libraries provided its program committee's referee bid data. That is,
for each of the 124 submitted manuscripts, each of the 77 program
committee members stated one of the following:
1. I am an expert in the domain of the submission and want to review
2. I am an expert in the domain of the submission
3. I am not an expert in the domain of the submission
4. There exists a conflict of interest
[1] ≈ [2] > [3] ≈ [4] ≈ 0.
16. Experimental Results
[Figure: histograms of frequency versus log of the energy value (about −20 to 0) at k=2, one panel per bid: [1] expert wanting to review, [2] expert, [3] non-expert, [4] conflict of interest.]
17. Experimental Results
[Figure: average individual energy (0.0 to 0.4) versus k-steps of negative energy (0 to 7), one curve per bid: (1) expert wanting to review, (2) expert, (3) non-expert, (4) conflict of interest.]
18. Experimental Results
• Conflict-of-interest situations also involve relationships other than
previous article collaborations (e.g., same institution, friendship,
shared committees) [2, 8].
19. Outline
• Define the coauthorship network data structure.
• Define the particle-swarm algorithm.
• Present experimental results validating the proposed algorithm.
• Related work and conclusion.
20. Related Work and Conclusion
• Latent semantic indexing to match manuscript abstracts to referees [3, 11].
• Expertise identification via web mining techniques [1].
• Asking authors and referees to provide key terms describing their manuscripts
and areas of expertise, respectively [4].
• Due to their computational and human-intervention costs, these
referee identification algorithms have been restricted to situations in which such
information can be obtained for a pre-selected set of individuals, e.g., conferences and
workshops.
• They have consequently failed to gain acceptance in the domain of classic journal
peer-review and open commentary peer-review.
21. Related Work and Conclusion
• The proposed automatic referee identification algorithm requires no human intervention,
is computationally efficient, and can, to some extent, automatically identify conflict of
interest situations.
• The referee weighting aspect of the algorithm provides a strong incentive for its use
in open commentary peer-review. The level of automation provides the necessary
infrastructure to decouple the publication process from the peer-review process in the
sense that editors are no longer required to assign referees.
22. Acknowledgements
• This research could not have been conducted if it were not for the
support of the 2005 JCDL program chair and steering committee.
• Herbert Van de Sompel supported this research through data acquisition.
• Journal of Memetics¹ for using a prototype implementation of the
algorithm in their peer-review process.
• This research was financially supported by the Los Alamos National
Laboratory.

¹ Journal of Memetics, available at: http://www.jom-emit.org/
23. References
[1] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical paper recommendation: A study in combining multiple information sources. Journal of Artificial Intelligence Research, 14:231–252, 2001.
[2] Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Luda L. Balakireva, and Aric Hagberg. The largest scholarly semantic network...ever. In ACM World Wide Web Conference, Banff, Canada, 2007. ACM Press.
[3] Susan T. Dumais and Jakob Nielsen. Automating the assignment of submitted manuscripts to reviewers. In SIGIR ’92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 233–244, Copenhagen, Denmark, 1992. ACM Press.
[4] Juan J. Merelo Guervós and Pedro A. Castillo Valdivieso. Conference paper assignment using a combined greedy/evolutionary algorithm. In Proceedings of the International Conference on Parallel Problem Solving from Nature, pages 602–611, Birmingham, UK, 2004.
[5] Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):1462–1480, 2006.
[6] M. E. J. Newman. Scientific collaboration networks: I. Network construction and fundamental results. Physical Review E, 64(1):016131, 2001.
[7] M. E. J. Newman. Scientific collaboration networks: II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.
[8] Marko A. Rodriguez. Grammar-based random walkers in semantic networks. Knowledge-Based Systems, 21(7):727–739, 2008.
[9] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. An algorithm to determine peer-reviewers. In Proceedings of the Conference on Information and Knowledge Management, Napa, California, October 2008. ACM Press.
[10] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. The convergence of digital-libraries and the peer-review process. Journal of Information Science, 32(2):151–161, 2006.
[11] D. Yarowsky and R. Florian. Taking the load off the conference chairs: towards a digital paper-routing assistant. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora, Cambridge, MA, 1999.