An Algorithm to Determine Peer-Reviewers



        Marko A. Rodriguez and Johan Bollen
    Digital Library Research and P...
Peer-Review Problem Statement

• Editors are overwhelmed due to the number of submissions.
    Provide mechanisms to decen...
Hypothesis

• It is hypothesized that the authors of the cited articles and their coauthors
  are good referees.
• It is h...
Outline

• Define the coauthorship network data structure.

• Define the particle-swarm algorithm.

• Present experimental r...
A Scholarly Coauthorship Network

                                                       Author-D
                        ...
A Scholarly Coauthorship Network
Our coauthorship network is defined as

                                      G = (V, E, ω...
A Scholarly Coauthorship Network

Finally the weight of all edges outgoing from a vertex are normalized to
form a probabil...
A Particle Swarm Algorithm


                                                 0.75


                                     ...
A Particle Swarm Algorithm
The set of all particles in the network is P where pi ∈ P is the ith particle.
The properties o...
A Particle Swarm Algorithm
The algorithm runs for k timesteps. At each step, the particle has its
energy decayed such that...
Experimental Results
                  submission                                   co-authorship
                        ...
Experimental Results
The DBLP provided us the data set from which to construct our
coauthorship network, which includes 28...
Experimental Results
                       [1] expert wanting to review (k=2)                                            ...
Experimental Results


[Figure: average individual energy (y-axis) ...]
Experimental Results

• Other types of relationships are involved in conflict of interest situations
  besides previous art...
Related Work and Conclusion

• Latent semantic indexing to match manuscript abstract to referees [3, 11].
• Expertise iden...
Related Work and Conclusion

• The proposed automatic referee identification algorithm requires no human intervention,
  is...
Acknowledgements

• This research could not have been conducted if it were not for the
  support of the 2005 JCDL program ...
References

[1] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical paper
    recommendation: A study in combini...
development in information retrieval, pages 233–244, Copenhagen,
   Denmark, 1992. ACM Press.

[4] Juan J. Merelo Guervós ...
[7] M E J Newman. Scientific collaboration networks: II. shortest paths,
     weighted networks, and centrality. Physical R...
[11] D. Yarowsky and R. Florian. Taking the load off the conference
     chairs: towards a digital paper-routing assistant....

• Editors are overwhelmed due to the number of submissions.
  ⋆ Provide mechanisms to decentralize the peer-review process.
• Editors have a difficult time locating referees who know the domain of discourse and do not have an ethical conflict with reviewing the submission.
  ⋆ Automate the referee identification problem.

An Algorithm to Determine Peer-Reviewers

  1. An Algorithm to Determine Peer-Reviewers
     Marko A. Rodriguez and Johan Bollen
     Digital Library Research and Prototyping Team, T-7, Center for Nonlinear Studies, Los Alamos National Laboratory
     October 25, 2008
  2. Peer-Review Problem Statement
     • Editors are overwhelmed due to the number of submissions. Provide mechanisms to decentralize the peer-review process [10].
     • Editors have a difficult time locating referees who know the domain of discourse and do not have an ethical conflict with reviewing the submission. Automate the referee identification problem [9].
     Conference on Information and Knowledge Management (CIKM) – Napa, California – October 25, 2008
  3. Hypothesis
     • It is hypothesized that the authors of the cited articles and their coauthors are good referees.
     • It is hypothesized that the authors of the submitted article and their coauthors are conflict-of-interest referees.
     With respect to the article associated with this presentation:
     • David Yarowsky, Radu Florian, Fabio Crestani, Tamara Sumner, etc. are considered competent referees.
     • Marko A. Rodriguez, Johan Bollen, Herbert Van de Sompel, Xiaoming Liu, Michael Nelson, etc. are considered conflict-of-interest referees.
  4. Outline
     • Define the coauthorship network data structure.
     • Define the particle-swarm algorithm.
     • Present experimental results validating the proposed algorithm.
     • Related work and conclusion.
  6. A Scholarly Coauthorship Network
     [Figure: a coauthorship network over six vertices, Author-A through Author-F.]
     All edges have a single homogeneous meaning of "coauthor". If Author-A and Author-B have written an article together, then they are considered coauthors.
  7. A Scholarly Coauthorship Network
     Our coauthorship network is defined as $G = (V, E, \omega)$, where $V$ is the set of vertices, $E \subseteq (V \times V)$, and $\omega : E \rightarrow \mathbb{R}^+$. The function rule for $\omega$ is
     $$\omega(i, j) = \omega(j, i) \rightarrow \sum_{\forall m \in M \text{ by } i, j} \frac{1}{\alpha(m) - 1},$$
     where $M$ is the set of all manuscripts and $\alpha : M \rightarrow \mathbb{N}^+$ maps each manuscript to the total number of authors for that manuscript. Thus, the more authors on an article, the less "coauthor weight" exists between them with respect to that article [5, 7, 6].
  8. A Scholarly Coauthorship Network
     Finally, the weights of all edges outgoing from a vertex are normalized to form a probability distribution over the outgoing edge set. Thus, for a particular vertex $i$, $\omega' : E \rightarrow [0, 1]$ such that
     $$\omega'(i, j) \rightarrow \frac{\omega(i, j)}{\sum_{k} \omega(i, k)},$$
     where $\omega'(i, j)$ need not equal $\omega'(j, i)$.
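The weighting and normalization described on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `coauthor_weights` and `normalize`, the author labels, and the three toy manuscripts are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(manuscripts):
    """Symmetric weights: each manuscript adds 1 / (alpha(m) - 1) to every author pair."""
    w = defaultdict(lambda: defaultdict(float))
    for authors in manuscripts:
        authors = sorted(set(authors))
        if len(authors) < 2:
            continue  # a single-author manuscript creates no coauthor edge
        share = 1.0 / (len(authors) - 1)
        for a, b in combinations(authors, 2):
            w[a][b] += share
            w[b][a] += share
    return w


def normalize(w):
    """Scale each vertex's outgoing weights so that they sum to 1."""
    return {
        i: {j: wij / sum(nbrs.values()) for j, wij in nbrs.items()}
        for i, nbrs in w.items()
    }


# Hypothetical toy data: three manuscripts over four authors.
manuscripts = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
weights = coauthor_weights(manuscripts)
probs = normalize(weights)

print(weights["A"]["B"])                 # 1.0 from the pair paper plus 0.5 from the trio
print(probs["C"]["D"], probs["D"]["C"])  # normalized weights are not symmetric
```

In this toy network C has three coauthors and D has one, so the normalized weight from C to D is smaller than the one from D to C, which is the asymmetry the slide points out.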
  10. A Particle Swarm Algorithm
      [Figure: a particle traversing the network over timesteps t=1 to t=4; edge labels 0.75, 1.0, 0.5, 0.25, 1.0, 0.5, 1.0, 1.0.]
      A particle begins its journey at a particular vertex and will take an outgoing edge of its current vertex, biased by the probability distribution defined over the outgoing edge set. Moreover, at each discrete timestep in $\mathbb{N}^+$ the particle decays in energy.
  11. A Particle Swarm Algorithm
      The set of all particles in the network is $P$, where $p_i \in P$ is the $i$th particle. The properties of an individual particle include:
      1. $c_i(t) \in V$: the location of the particle $p_i$ at time $t$.
      2. $\epsilon_i(t) \in \mathbb{R}$: the amount of energy contained within the particle $p_i$ at time $t$.
      3. $\delta_i \in [0, 1]$: the decay parameter governing the loss of energy as the particle $p_i$ propagates through the network. This is a globally defined parameter in our experiment, with decay set to $\forall i \; \delta_i = 0.15$.
      4. Particles can maintain state and have heterogeneous internal logics to perform more complex walks.
  12. A Particle Swarm Algorithm
      The algorithm runs for $k$ timesteps. At each step, the particle has its energy decayed such that
      $$\epsilon_i(t + 1) = \begin{cases} (1 - \delta_i)\,\epsilon_i(t) & \text{if } t \leq k \\ 0 & \text{otherwise.} \end{cases}$$
      Finally, there exists a global rank vector $e \in \mathbb{R}^{|V|}$ that records how much energy has passed through each vertex:
      $$e_{c_i(t)}(t + 1) = e_{c_i(t)}(t) + \epsilon_i(t).$$
      Thus,
      $$e_l(k) = \sum_{t=1}^{t \leq k} \sum_{i=1}^{i \leq |P|} \begin{cases} (1 - \delta_i)^{t-1}\,\epsilon_i(1) & \text{if } c_i(t - 1) = n_l \\ 0 & \text{otherwise.} \end{cases}$$
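One reading of the update rules above is sketched below: at each timestep every particle credits its current energy to the vertex it occupies, decays by the factor (1 − δ), and then follows an outgoing edge chosen by the normalized weights. The function name `run_swarm`, the particle representation as `(vertex, energy)` pairs, the fixed random seed, and the choice to leave a particle in place on a vertex with no outgoing edges are assumptions of this sketch rather than details given in the slides.

```python
import random
from collections import defaultdict


def run_swarm(probs, particles, k, delta=0.15, rng=None):
    """Propagate particles for k timesteps and return the global rank vector e."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    e = defaultdict(float)
    state = [[vertex, energy] for vertex, energy in particles]
    for _ in range(k):
        for p in state:
            vertex, energy = p
            e[vertex] += energy            # the occupied vertex accumulates eps_i(t)
            p[1] = (1.0 - delta) * energy  # eps_i(t + 1) = (1 - delta) * eps_i(t)
            nbrs = probs.get(vertex, {})
            if nbrs:                       # assumption: stay put if there is no outgoing edge
                p[0] = rng.choices(list(nbrs), weights=list(nbrs.values()))[0]
    return dict(e)


# Hypothetical chain in which every vertex has one outgoing edge, so the walk is deterministic.
chain = {"A": {"B": 1.0}, "B": {"C": 1.0}, "C": {"B": 1.0}}
ranks = run_swarm(chain, [("A", 1.0)], k=3)
print(ranks)  # A receives 1.0, B about 0.85, C about 0.7225
```

On a real coauthorship network the walk is random, so many particles per seed vertex would be needed before the rank vector settles.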
  13. Experimental Results
      [Figure: the submission's authors (author1, author2) inject negative (−) particles, and the authors of its references (reference1, reference2) inject positive (+) particles, into the co-authorship network.]
      Authors of the submitted article have negative energy particles provided to their corresponding vertex in the coauthorship network. The authors of the referenced articles (i.e. cited authors) are provided positive energy particles.
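The seeding scheme above can be illustrated end to end with a small sketch. To keep the output deterministic, it swaps the sampled particle swarm for its expected value: energy at a vertex is split across outgoing edges in proportion to their probabilities instead of following one sampled edge. That substitution, the names `expected_energy` and `rank_referees`, the unit seed energies of +1.0 and −1.0, the separate step counts `k_pos` and `k_neg`, and the four-vertex network are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_energy(probs, seeds, k, delta=0.15):
    """Expected energy credited to each vertex after k timesteps."""
    rank = defaultdict(float)
    frontier = dict(seeds)  # vertex -> energy currently located at that vertex
    for _ in range(k):
        nxt = defaultdict(float)
        for vertex, energy in frontier.items():
            rank[vertex] += energy
            # Energy at a vertex with no outgoing edges is dropped in this sketch.
            for nbr, p in probs.get(vertex, {}).items():
                nxt[nbr] += (1.0 - delta) * energy * p
        frontier = nxt
    return rank


def rank_referees(probs, cited_authors, submitting_authors, k_pos, k_neg):
    """Positive energy from cited authors, negative from submitting authors, sorted best first."""
    pos = expected_energy(probs, {a: 1.0 for a in cited_authors}, k_pos)
    neg = expected_energy(probs, {a: -1.0 for a in submitting_authors}, k_neg)
    total = defaultdict(float)
    for part in (pos, neg):
        for vertex, energy in part.items():
            total[vertex] += energy
    return sorted(total.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized network: author1 - colleague - ref1 - ref2.
probs = {
    "author1": {"colleague": 1.0},
    "colleague": {"author1": 0.5, "ref1": 0.5},
    "ref1": {"colleague": 0.5, "ref2": 0.5},
    "ref2": {"ref1": 1.0},
}
ranking = rank_referees(probs, ["ref1", "ref2"], ["author1"], k_pos=2, k_neg=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case the cited authors rank highest, while the submitting author and the shared colleague end up with negative totals, which is the behaviour the hypothesis calls for.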
  15. Experimental Results
      The DBLP provided us the data set from which to construct our coauthorship network, which includes 284,082 authors and 2,167,018 coauthorship edges. The 2005 ACM/IEEE Joint Conference on Digital Libraries provided us their program committee's referee bid data. That is, for each of the 124 submitted manuscripts, each of the 77 program committee members stated one of:
      1. I am an expert in the domain of the submission and want to review
      2. I am an expert in the domain of the submission
      3. I am not an expert in the domain of the submission
      4. There exists a conflict of interest
      [1] ≈ [2] > [3] ≈ [4] ≈ 0.
  16. Experimental Results
      [Figure: four histograms of frequency versus log of the energy value at k=2, one panel per bid category: [1] expert wanting to review, [2] expert, [3] non-expert, [4] conflict of interest.]
  17. Experimental Results
      [Figure: average individual energy (y-axis, 0.0 to 0.4) versus k-steps of negative energy (x-axis, 0 to 7), with one curve per bid category: (1) expert wanting to review, (2) expert, (3) non-expert, (4) conflict of interest.]
  18. Experimental Results
      • Other types of relationships are involved in conflict of interest situations besides previous article collaborations (e.g. same institution, friendship, shared committees, etc.) [2, 8].
  20. Related Work and Conclusion
      • Latent semantic indexing to match manuscript abstract to referees [3, 11].
      • Expertise identification via web mining techniques [1].
      • Simply asking authors and the referees to provide keyterms describing their manuscript and area of expertise respectively [4].
      • Due to the computational and human intervention costs, applications of the mentioned referee identification algorithms have been restricted to situations in which such information can be obtained for a pre-selected set of individuals, e.g. conferences and workshops.
      • They have consequently failed to gain acceptance in the domain of classic journal peer-review and open commentary peer-review.
  21. Related Work and Conclusion
      • The proposed automatic referee identification algorithm requires no human intervention, is computationally efficient, and can, to some extent, automatically identify conflict of interest situations.
      • The referee weighting aspect of the algorithm provides a strong incentive for its use in open commentary peer-review. The level of automation provides the necessary infrastructure to decouple the publication process from the peer-review process in the sense that editors are no longer required to assign referees.
  22. Acknowledgements
      • This research could not have been conducted were it not for the support of the 2005 JCDL program chair and steering committee.
      • Herbert Van de Sompel supported this research through data acquisition.
      • The Journal of Memetics (available at: http://www.jom-emit.org/) used a prototype implementation of the algorithm in their peer-review process.
      • This research was financially supported by the Los Alamos National Laboratory.
  23. References
      [1] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical paper recommendation: A study in combining multiple information sources. Journal of Artificial Intelligence Research, 14:231–252, 2001.
      [2] Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Luda L. Balakireva, and Aric Hagberg. The largest scholarly semantic network...ever. In ACM World Wide Web Conference, Banff, Canada, 2007. ACM Press.
      [3] Susan T. Dumais and Jakob Nielsen. Automating the assignment of submitted manuscripts to reviewers. In SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 233–244, Copenhagen, Denmark, 1992. ACM Press.
      [4] Juan J. Merelo Guervós and Pedro A. Castillo Valdivieso. Conference paper assignment using a combined greedy/evolutionary algorithm. In Proceedings of the International Conference on Parallel Problem Solving from Nature, pages 602–611, Birmingham, UK, 2004.
      [5] Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):1462–1480, 2006.
      [6] M E J Newman. Scientific collaboration networks: I. network construction and fundamental results. Physical Review E, 64(1):016131, 2001.
      [7] M E J Newman. Scientific collaboration networks: II. shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.
      [8] Marko A. Rodriguez. Grammar-based random walkers in semantic networks. Knowledge-Based Systems, 21(7):727–739, 2008.
      [9] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. An algorithm to determine peer-reviewers. In Proceedings of the Conference on Information and Knowledge Management, Napa, California, October 2008. ACM Press.
      [10] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. The convergence of digital-libraries and the peer-review process. Journal of Information Science, 32(2):151–161, 2006.
      [11] D. Yarowsky and R. Florian. Taking the load off the conference chairs: towards a digital paper-routing assistant. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in NLP and Very-Large Corpora, Cambridge, MA, 1999.