An Algorithm to Determine Peer-Reviewers

• Editors are overwhelmed by the number of submissions.
⋆ Provide mechanisms to decentralize the peer-review process.
• Editors have a difficult time locating referees who know the domain of discourse and do not have an ethical conflict with reviewing the submission.
⋆ Automate the referee identification problem.

Transcript

  • 1. An Algorithm to Determine Peer-Reviewers. Marko A. Rodriguez and Johan Bollen. Digital Library Research and Prototyping Team; T-7, Center for Nonlinear Studies; Los Alamos National Laboratory. October 25, 2008.
  • 2. Peer-Review Problem Statement • Editors are overwhelmed by the number of submissions. Provide mechanisms to decentralize the peer-review process [10]. • Editors have a difficult time locating referees who know the domain of discourse and do not have an ethical conflict with reviewing the submission. Automate the referee identification problem [9]. Conference on Information and Knowledge Management (CIKM) – Napa, California – October 25, 2008
  • 3. Hypothesis • It is hypothesized that the authors of the cited articles and their coauthors are good referees. • It is hypothesized that the authors of the article and their coauthors are conflict-of-interest referees. With respect to the article associated with this presentation: • David Yarowsky, Radu Florian, Fabio Crestani, Tamara Sumner, etc. are considered competent referees. • Marko A. Rodriguez, Johan Bollen, Herbert Van de Sompel, Xiaoming Liu, Michael Nelson, etc. are considered conflict-of-interest referees.
  • 4. Outline • Define the coauthorship network data structure. • Define the particle-swarm algorithm. • Present experimental results validating the proposed algorithm. • Related work and conclusion.
  • 6. A Scholarly Coauthorship Network [Figure: example coauthorship network with vertices Author-A through Author-F.] All edges have a single homogeneous meaning of “coauthor”. If Author-A and Author-B have written an article together, then they are considered coauthors.
  • 7. A Scholarly Coauthorship Network Our coauthorship network is defined as $G = (V, E, \omega)$, where $V$ is the set of vertices, $E \subseteq (V \times V)$, and $\omega : E \rightarrow \mathbb{R}^+$. The function rule for $\omega$ is
    $$\omega(i, j) = \omega(j, i) \rightarrow \sum_{\forall m \in M \text{ by } i, j} \frac{1}{\alpha(m) - 1},$$
    where $M$ is the set of all manuscripts and $\alpha : M \rightarrow \mathbb{N}^+$ maps each manuscript to its total number of authors. Thus, the more authors on an article, the less “coauthor weight” exists between them with respect to that article [5, 7, 6].
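The weighting rule on this slide can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `coauthor_weights`, the list-of-author-lists input format, and the toy manuscripts are hypothetical and do not come from the presentation.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(manuscripts):
    """Build symmetric coauthor weights from a list of author lists.

    Each manuscript m with alpha(m) authors adds 1 / (alpha(m) - 1)
    to the weight of every pair of its authors.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for authors in manuscripts:
        authors = sorted(set(authors))  # ignore duplicate names on one manuscript
        if len(authors) < 2:
            continue  # a single-author manuscript creates no coauthor edge
        share = 1.0 / (len(authors) - 1)
        for a, b in combinations(authors, 2):
            weights[a][b] += share
            weights[b][a] += share  # omega(i, j) = omega(j, i)
    return weights


# Hypothetical toy data: A and B share two manuscripts, one of them with C.
manuscripts = [["A", "B"], ["A", "B", "C"], ["C"]]
weights = coauthor_weights(manuscripts)
```

In this toy data, A and B accumulate 1 from the two-author manuscript and 1/2 from the three-author one, so their edge is heavier than either author's edge to C.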
  • 8. A Scholarly Coauthorship Network Finally, the weights of all edges outgoing from a vertex are normalized to form a probability distribution over the outgoing edge set. Thus, for a particular vertex $i$, $\omega' : E \rightarrow [0, 1]$ such that
    $$\omega'(i, j) \rightarrow \frac{\omega(i, j)}{\sum_{k} \omega(i, k)},$$
    where $\omega'(i, j)$ need not equal $\omega'(j, i)$.
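A minimal sketch of the normalization step, assuming the weights are held in a nested dictionary (a hypothetical representation, not one given in the slides). The toy weights are illustrative only; they show why the normalized value from A to C can differ from the value from C to A even though the raw weights are symmetric.

```python
def normalize_outgoing(weights):
    """Turn each vertex's outgoing weights into a probability distribution."""
    probs = {}
    for i, neighbors in weights.items():
        total = sum(neighbors.values())  # sum over k of omega(i, k)
        probs[i] = {j: w / total for j, w in neighbors.items()}
    return probs


# Hypothetical symmetric weights for three authors.
weights = {
    "A": {"B": 1.5, "C": 0.5},
    "B": {"A": 1.5, "C": 0.5},
    "C": {"A": 0.5, "B": 0.5},
}
probs = normalize_outgoing(weights)
```

Here the raw weight between A and C is 0.5 in both directions, but A spreads its weight over a total of 2.0 while C spreads over 1.0, so the two normalized values differ.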
  • 10. A Particle Swarm Algorithm [Figure: a particle moving through the weighted network over timesteps t = 1 to t = 4.] A particle begins its journey at a particular vertex and will take an outgoing edge of its current vertex biased by the outgoing probability distribution defined over the outgoing edge set. Moreover, at each discrete timestep in $\mathbb{N}^+$ the particle decays in energy.
  • 11. A Particle Swarm Algorithm The set of all particles in the network is $P$, where $p_i \in P$ is the $i$th particle. The properties of an individual particle include: 1. $c_i(t) \in V$: the location of particle $p_i$ at time $t$. 2. $\epsilon_i(t) \in \mathbb{R}$: the amount of energy contained within particle $p_i$ at time $t$. 3. $\delta_i \in [0, 1]$: the decay parameter governing the loss of energy as particle $p_i$ propagates through the network. This is a globally defined parameter in our experiment, with decay set to $\forall i : \delta_i = 0.15$. 4. Particles can maintain state and have heterogeneous internal logics to perform more complex walks.
  • 12. A Particle Swarm Algorithm The algorithm runs for $k$ timesteps. At each step, the particle has its energy decayed such that
    $$\epsilon_i(t + 1) = \begin{cases} (1 - \delta_i)\,\epsilon_i(t) & \text{if } t \leq k \\ 0 & \text{otherwise.} \end{cases}$$
    Finally, there exists a global rank vector $e \in \mathbb{R}^{|V|}$ that records how much energy has passed through each vertex:
    $$e_{c_i(t)}(t + 1) = e_{c_i(t)}(t) + \epsilon_i(t).$$
    Thus,
    $$e_l(k) = \sum_{t=1}^{t \leq k} \sum_{i=1}^{i \leq |P|} \begin{cases} (1 - \delta_i)^{t-1}\,\epsilon_i(1) & \text{if } c_i(t - 1) = n_l \\ 0 & \text{otherwise.} \end{cases}$$
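The walk, decay, and rank-vector update can be sketched as a small Monte Carlo simulation in Python. This is an illustration, not the authors' implementation. It assumes one indexing convention (a particle records its energy at its current vertex, then moves and decays), and the names `run_swarm`, `particles_per_seed`, and the toy transition probabilities are hypothetical.

```python
import random


def run_swarm(probs, seeds, k, delta=0.15, particles_per_seed=200, rng=None):
    """Simulate decaying particles and return the energy recorded per vertex.

    probs: vertex -> {neighbor: transition probability}
    seeds: vertex -> total initial energy released at that vertex
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    rank = {v: 0.0 for v in probs}
    for start, seed_energy in seeds.items():
        for _ in range(particles_per_seed):
            location = start
            energy = seed_energy / particles_per_seed
            for _t in range(k):
                # Record the particle's energy at its current vertex.
                rank[location] = rank.get(location, 0.0) + energy
                neighbors = probs.get(location)
                if not neighbors:
                    break  # assumption: a particle with no outgoing edge stops
                # Take an outgoing edge, biased by the edge probabilities.
                location = rng.choices(
                    list(neighbors), weights=list(neighbors.values())
                )[0]
                energy *= 1.0 - delta  # decay before the next timestep
    return rank


# Hypothetical three-author network with normalized outgoing weights.
probs = {
    "A": {"B": 0.75, "C": 0.25},
    "B": {"A": 0.75, "C": 0.25},
    "C": {"A": 0.5, "B": 0.5},
}
rank = run_swarm(probs, seeds={"A": 1.0}, k=3)
```

Whatever paths the particles take, a unit of seed energy contributes 1 + 0.85 + 0.85² to the rank vector in total over three timesteps when every vertex has an outgoing edge; only its distribution across vertices depends on the random walk.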
  • 13. Experimental Results [Figure: a submission's authors (author1, author2) receive negative (−) particles and the authors of its references (reference1, reference2) receive positive (+) particles in the coauthorship network.] Authors of the submitted article have negative energy particles provided to their corresponding vertex in the coauthorship network. The authors of the referenced articles (i.e. cited authors) are provided positive energy particles.
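One way to picture the seeding scheme is the sketch below. Instead of simulating individual particles, it propagates the expected energy deterministically along the transition probabilities, which is a swapped-in simplification rather than the stochastic walk described in the slides. The function names, the unit energy per seed author, and the separate `k_pos` and `k_neg` step counts are all assumptions made for illustration.

```python
def expected_energy(probs, seeds, k, delta=0.15):
    """Expected energy recorded at each vertex over k timesteps."""
    current = dict(seeds)  # energy located at each vertex at t = 1
    rank = {}
    for _ in range(k):
        for v, e in current.items():
            rank[v] = rank.get(v, 0.0) + e  # record energy at current vertices
        nxt = {}
        for v, e in current.items():
            for u, p in probs.get(v, {}).items():
                # Move along each edge in proportion to its probability, then decay.
                nxt[u] = nxt.get(u, 0.0) + (1.0 - delta) * e * p
        current = nxt
    return rank


def rank_referees(probs, submission_authors, cited_authors, k_pos, k_neg, delta=0.15):
    """Combine positive energy from cited authors with negative energy from authors."""
    positive = expected_energy(probs, {a: 1.0 for a in cited_authors}, k_pos, delta)
    negative = expected_energy(probs, {a: -1.0 for a in submission_authors}, k_neg, delta)
    total = {}
    for part in (positive, negative):
        for v, e in part.items():
            total[v] = total.get(v, 0.0) + e
    return sorted(total.items(), key=lambda item: item[1], reverse=True)


# Hypothetical network: "x" coauthored with a cited author, "y" with a submitter.
probs = {
    "author1": {"y": 1.0},
    "y": {"author1": 1.0},
    "reference1": {"x": 1.0},
    "x": {"reference1": 1.0},
}
ranking = rank_referees(probs, ["author1"], ["reference1"], k_pos=2, k_neg=2)
```

In this toy network the cited author and their coauthor end up with positive energy and rank first, while the submitting author and their coauthor end up with negative energy and rank last.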
  • 15. Experimental Results DBLP provided the data set from which we constructed our coauthorship network, which includes 284,082 authors and 2,167,018 coauthorship edges. The 2005 ACM/IEEE Joint Conference on Digital Libraries provided us with its program committee's referee bid data. That is, for each of the 124 submitted manuscripts, each of the 77 program committee members stated one of: 1. I am an expert in the domain of the submission and want to review. 2. I am an expert in the domain of the submission. 3. I am not an expert in the domain of the submission. 4. There exists a conflict of interest. [1] ≈ [2] > [3] ≈ [4] ≈ 0.
  • 16. Experimental Results [Figure: four histograms of frequency versus log of the energy value at k=2, one per bid category: [1] expert wanting to review, [2] expert, [3] non-expert, [4] conflict of interest.]
  • 17. Experimental Results [Figure: average individual energy versus k-steps of negative energy (k = 0 to 7) for the four bid categories: (1) expert wanting to review, (2) expert, (3) non-expert, (4) conflict of interest.]
  • 18. Experimental Results • Other types of relationships are involved in conflict of interest situations besides previous article collaborations (e.g. same institution, friendship, shared committees, etc.) [2, 8].
  • 20. Related Work and Conclusion • Latent semantic indexing to match manuscript abstracts to referees [3, 11]. • Expertise identification via web mining techniques [1]. • Simply asking the authors and the referees to provide key terms describing their manuscript and area of expertise, respectively [4]. • Due to the computational and human intervention costs, applications of the mentioned referee identification algorithms have been restricted to situations in which such information can be obtained for a pre-selected set of individuals, e.g. conferences and workshops. • They have consequently failed to gain acceptance in the domain of classic journal peer-review and open commentary peer-review.
  • 21. Related Work and Conclusion • The proposed automatic referee identification algorithm requires no human intervention, is computationally efficient, and can, to some extent, automatically identify conflict of interest situations. • The referee weighting aspect of the algorithm provides a strong incentive for its use in open commentary peer-review. The level of automation provides the necessary infrastructure to decouple the publication process from the peer-review process in the sense that editors are no longer required to assign referees.
  • 22. Acknowledgements • This research could not have been conducted if it were not for the support of the 2005 JCDL program chair and steering committee. • Herbert Van de Sompel supported this research through data acquisition. • The Journal of Memetics (available at http://www.jom-emit.org/) used a prototype implementation of the algorithm in its peer-review process. • This research was financially supported by the Los Alamos National Laboratory.
  • 23. References
    [1] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical paper recommendation: A study in combining multiple information sources. Journal of Artificial Intelligence Research, 14:231–252, 2001.
    [2] Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Luda L. Balakireva, and Aric Hagberg. The largest scholarly semantic network...ever. In ACM World Wide Web Conference, Banff, Canada, 2007. ACM Press.
    [3] Susan T. Dumais and Jakob Nielsen. Automating the assignment of submitted manuscripts to reviewers. In SIGIR ’92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 233–244, Copenhagen, Denmark, 1992. ACM Press.
    [4] Juan J. Merelo Guervós and Pedro A. Castillo Valdivieso. Conference paper assignment using a combined greedy/evolutionary algorithm. In Proceedings of the International Conference on Parallel Problem Solving from Nature, pages 602–611, Birmingham, UK, 2004.
    [5] Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):1462–1480, 2006.
    [6] M. E. J. Newman. Scientific collaboration networks: I. Network construction and fundamental results. Physical Review E, 64(1):016131, 2001.
    [7] M. E. J. Newman. Scientific collaboration networks: II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.
    [8] Marko A. Rodriguez. Grammar-based random walkers in semantic networks. Knowledge-Based Systems, 21(7):727–739, 2008.
    [9] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. An algorithm to determine peer-reviewers. In Proceedings of the Conference on Information and Knowledge Management, Napa, California, October 2008. ACM Press.
    [10] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. The convergence of digital-libraries and the peer-review process. Journal of Information Science, 32(2):151–161, 2006.
    [11] D. Yarowsky and R. Florian. Taking the load off the conference chairs: towards a digital paper-routing assistant. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in NLP and Very-Large Corpora, Cambridge, MA, 1999.