D.K. Pradhan, J. Chakraborty, P. Choudhary et al. / Journal of Informetrics 14 (2020) 101022
upon the domain in which a given journal publishes an article. Journals that publish articles belonging to the same domain and
publisher can hire reviewers accordingly. Given the additional constraints in conferences, the manual assignment of a paper to
a set of reviewers with matching expertise by the program chair is both time consuming and inefficient.
Since 1992, the research community has studied the RAP problem and modeled it considering several factors to obtain an
assignment (Dumais & Nielsen, 1992; Price & Flach, 2017; Resnick & Varian, 1997; Wei & Croft, 2006). In recent years, many
conferences commonly use Conference Management Systems (CMS) like ‘EasyChair’,1 ‘Microsoft Conference Management
Toolkit’ (Microsoft CMT)2 and ‘CyberChair’3 where authors submit their paper along with a list of keywords and co-authors.
Reviewers, as members of the Technical Program Committee (TPC), select the list of topics that match their expertise
and also declare any Conflict of Interest (CoI). Broadly, a CoI occurs if a reviewer is asked to review his/her own paper, a
co-author’s paper, or a submission made by an author from the same research group (collaborative network) or the same
affiliation. Finally, the Program Chair (PC) takes the self-declared expertise matching and CoI into consideration, either
manually assigning each paper to a set of matching reviewers or opting for an automated assignment system. This is how
the widely used CMS work. The Toronto Paper Matching System (TPMS) (Charlin & Zemel, 2013) is an automated assignment
strategy that has been integrated with Microsoft CMT since 2012. It uses the Latent Dirichlet Allocation (LDA) technique
(Blei, Ng, & Jordan, 2003) for topic modeling and a Bayesian scoring model similar to the vector space model (Charlin &
Zemel, 2013) to match the highest-expertise reviewers to papers. Other matching systems include SubSift, used in SIGKDD’09
and for the Machine Learning journal (MLj) since 2010 (Price & Flach, 2017). In addition to these existing systems, there is a
large volume of research work dealing with automatic RAP.
Early research on RAP matched reviewer expertise, extracted from reviewers’ preferences, with the topics explored in
the submitted paper (Charlin, Zemel, & Boutilier, 2011; Kou, Hou U, et al., 2015; Tang et al., 2012; Tang, Tang, & Tan, 2010).
Other important parameters include direct CoI and the reviewer’s workload. Direct CoI is derived either from reviewer self-
declaration or from the co-authorship relation; that is, a reviewer cannot review his/her own or a co-author’s paper. Recent
studies report on the issue of reviewer biases occurring due to various CoI (Okike, Hug, Kocher, & Leopold, 2016; Roberts &
Verhoef, 2016; Schulzrinne, 2009; Tomkins, Zhang, & Heavlin, 2017; Wang, Liu, Zhang, Jiang, & Sun, 2019). For example,
Tomkins et al. (2017) report that in the single-blind review process, reviewer biases exist where reviewers make use of
information about the authors and their institutions and companies before bidding on papers. After analyzing data of the
International Conference on Web Search and Data Mining (WSDM 2017), they report that the number of reviewers for whom
CoI was detected in single- and double-blind review settings was 59/121 and 47/121, respectively.
Automated assignment using Microsoft CMT derives the CoI factor from self-declared reviewer interest, whereas ‘EasyChair’
tries to derive it by matching the email domains of reviewers and authors. However, another critical factor is to assess the
social and academic relations between a reviewer candidate and the authors of the submitted paper. A recent study by Yan,
Jin, Geng, Zhao, and Huang (2017) considers all the potential CoIs and derives them from two academic-network relations:
researcher to researcher and institution to institution. The researcher-to-researcher relation considers co-author, colleague,
and advisor-advisee relationships. The institution-to-institution relation considers co-operation and co-worker relations.
Other studies have modeled factors like the authority and diversity of a group of expert reviewers. Potential CoI are extracted
from academic and social networks by Yin, Cui, Lu, and Zhao (2016), which also uses the same trivial combination of factors
as proposed in this paper. However, name ambiguity is an issue when the social relations of a reviewer are extracted. In
contrast, we find that co-authorship graphs extracted from academic databases still provide basic filtering for the author
name ambiguity problem; thus, deriving CoI from a social network is not a reliable option. Hence, either the review process
must be completely double-blind, or CoI is a critical issue that needs to be taken into account before assignment.
Several methodologies for solving RAP include the following: information mining from the web (Hettich & Pazzani, 2006;
Wei & Croft, 2006), latent semantic indexing (Dumais & Nielsen, 1992), probabilistic topic modeling (Karimzadehgan & Zhai,
2009; Mimno & McCallum, 2007), greedy solutions (Kou, Hou, Mamoulis, & Gong, 2015; Long, Wong, Peng, & Ye, 2013; Yin
et al., 2016), using fuzzy functions (Tayal, Saxena, Sharma, Khanna, & Gupta, 2014), integer linear programming (Jin, Niu, Ji, &
Geng, 2018), minimum cost flow (Tang et al., 2010), recommender systems (Conry, Koren, & Ramakrishnan, 2009; Resnick &
Varian, 1997) and a hybrid approach of domain knowledge and matching model (Sun, Ma, Fan, & Wang, 2007). However, most
of the existing methodologies mainly focus on how to calculate (or rank) the experts accurately for each query. They mostly
ignore the different constraints or how to tackle the restrictions with heuristics, which results in an approximate (or even
inaccurate) solution (Yin et al., 2016).
The primary motivation of this work is to improve upon the performance of existing assignment strategies widely used
by CMS. Existing studies mostly propose theoretical frameworks (Kolasa & Król, 2010; Kou, Hou, et al., 2015; Liu, Suel, &
Memon, 2014; Long et al., 2013; Tang et al., 2010, 2012; Tayal et al., 2014; Yin et al., 2016), and a few works have also given
robust empirical studies (Charlin & Zemel, 2013; Conry et al., 2009; Zhao et al., 2018). In this paper, firstly, our objective is
to model the three trivial factors collectively before the assignment. Independently, for each reviewer, we aim to maximize
profit by maximizing topic similarity, minimizing CoI and balancing the reviewer’s workload in a constraint-based framework.
Secondly, we see how the RAP can be mapped to an equilibrium model of a multi-job assignment problem, and what the
additional constraints and challenges in doing so are. We aim to formulate it so that it is practically implementable.
1 https://easychair.org/.
2 https://cmt3.research.microsoft.com/About.
3 http://borbala.com/cyberchair/.
Besides, no quantifier exists to evaluate the performance of assignments in terms of accuracy.
The innovation of this paper is at least three-fold. Firstly, we formalize the RAP as an equilibrium multi-job assignment
problem. This is because, concerning a single reviewer, the objective is to obtain the maximum total profit. The profit is
calculated collectively in terms of maximizing topic similarity, minimizing CoI, and balancing the reviewer’s workload. Thus,
we first individually optimize the three factors and then solve the overall maximization-type equilibrium multi-job RAP.
Independently, the factors are calculated as follows. The topic extraction and similarity scores are obtained following the
same strategy as used by TPMS (Charlin & Zemel, 2013). Next, we explore whether the CoI factor can be automatically modeled
from recent academic databases. For this, we only use the co-authorship graph, as the name ambiguity issue is resolved in it.
Finally, we propose a new quantifier for balancing the reviewer’s workload after each assignment. Secondly, we propose a
meta-heuristic, i.e., a weighted-matrix factorization based greedy algorithm, to solve it. Briefly, if m reviewers are denoted by
a set R = {R1, R2, . . ., Rm} and n papers are denoted by a set P = {P1, P2, . . ., Pn}, then the objective is to obtain a minimum edge
cover set (a subset of the edge set) in a complete bipartite graph G = (R, P). As a proof of concept, we present a comparative study
with two widely used CMS: EasyChair and Microsoft CMT. The data set contains the complete conference assignment data
of the ‘International Conference on Business and Information Management (ICBIM), 2016’, collected from ‘EasyChair’. The
EasyChair data contain the actual assignments between accepted papers and reviewers. Since ICBIM 2016 was conducted
using the ‘EasyChair’ system, we could not run the same data on another interface, such as ‘Microsoft CMT’, to obtain its
assignments for the experiment. However, we implement the automated assignment methodology used by Microsoft CMT,
that is, TPMS (Charlin & Zemel, 2013). TPMS is chosen as a baseline due to its wide usage in real conferences after its
integration with Microsoft CMT. Due to the scarcity of data sets for validating the performance of RAP assignments, as
reported by Yan et al. (2017), we only use a single data set. Lastly, we define and validate a new metric, an assignment quality
metric, to evaluate the performance of assignments in such a framework. We see a significant difference in mean assignment
quality between the three groups. Our proposed method gives assignments of superior quality: comparatively, it gives
consistent topic similarity scores and higher co-authorship distance values for almost all papers.
The organization of the paper is as follows. First, we give a brief introduction. In Section 2, we describe the current
state of the art proposed for solving RAP. In Section 3, we describe the methodology and how we extract the factors
required before running the algorithm. In Section 4, we formulate the assignment problem. In Section 5, we present a
weighted average based greedy approximation algorithm that maximizes topic similarity, minimizes CoI and balances reviewer
workload. In Section 6, we test the proposed method on a real conference data set collected from EasyChair and also
implement the automated assignment strategy used by ‘TPMS’. In Section 7, we present a comparative study of assignment
quality using the proposed method with ‘EasyChair’ and ‘TPMS’.
2. Literature survey
Since 1992, the following approaches have been proposed to solve RAP. The first approach is query-based information retrieval
methods (Dumais & Nielsen, 1992; Zablocki & Lee, 2012). A paper is used as a query, and a text document represents each
reviewer in the database. The document contains information about his/her field of expertise or publications. For a given
submitted paper, the problem is to retrieve the most relevant set of reviewers from the database. A major drawback is that
the retrieval process, which makes an assignment between a reviewer and a paper, has to be done independently for each
paper. The first automated solution to RAP was given by Dumais and Nielsen (1992) using an information retrieval based
method, Latent Semantic Indexing (LSI). Such methods differ in the technique used, such as LSI (Dumais & Nielsen, 1992;
Zablocki & Lee, 2012), topic models (Blei et al., 2003), the Vector Space Model and other models (Boyack, van Eck, Colavizza,
& Waltman, 2018; Karimzadehgan & Zhai, 2009; Silva, Amancio, Bardosova, Costa, & Oliveira, 2016). Due to common
drawbacks, such as the uneven distribution of papers among reviewers where reviewer workload is not balanced and the
dependence on the order in which papers are assigned, such methods became inefficient in solving RAP (Wei & Croft, 2006).
Also, these methods are heuristic-based with no objective for optimized assignment.
The second approach proposes RAP as a matching problem, which is then solved using optimization techniques. The first step
is to construct a weighted bipartite graph between the paper and reviewer sets, where the weight of an edge denotes the
relevance between a paper and a reviewer (Charlin et al., 2011). The second step is to derive a matching, in the form of a final
assignment, such that a given objective function is satisfied subject to several constraints, such as each paper being reviewed
by a certain number of reviewers and each reviewer’s workload being balanced. Accordingly, recent studies have taken up
matching based methods between the paper and reviewer sets (Kou, Hou, et al., 2015; Long et al., 2013; Tang et al., 2010,
2012). Tayal et al. (2014) capture the inherent imprecision of this NP-hard problem by creating a type-2 fuzzy set and
assigning relevant weights based on matching the expertise of reviewers. Several other works have proposed hybrid models
using Genetic Algorithms, Ant Colony Optimization and Tabu Search (Kolasa & Król, 2010; Kolasa & Krol, 2011; Wang et al.,
2019; Wang, Zhou, & Shi, 2013), or an optimal solution approach implementing the convex cost flow problem in RAP (Tang
et al., 2010). Review quality is optimized by modeling factors such as maximizing topic matching (Kou, Hou U, et al., 2015;
Liu et al., 2014; Li & Watanabe, 2013), minimizing potential CoI (Xu, Zhao, Shi, & Shah, 2018; Yan et al., 2017), and maximizing
diversity of opinion while considering the authority of a reviewer (Liu et al., 2014; Yin et al., 2016). Long et al. (2013)
investigate the effect of CoI in RAP after the assignment but do not consider CoI as an input factor. A recent work by Yan
et al. (2017) makes a significant contribution by identifying other potential conflicts of interest from researcher-to-researcher
(co-author, colleague, and advisor-advisee) and institution-to-institution (co-operation or collaboration between members
of two institutes, and co-worker) relations, respectively. Another work by Yin et al. (2016) calculates the CoI from co-author
and social relationships derived from academic and social networks, respectively. However, name ambiguity is an issue when
the social relations of a reviewer are extracted. In contrast, co-authorship graphs extracted from academic databases still
provide basic filtering for the author name ambiguity problem. Similarly, Tang et al. (2010) assign a CoI avoidance value of 0
in two cases: a reviewer cannot review his/her own work or his/her co-author’s paper. The last five years of co-author
relations and the current organization of the reviewer are taken into account for obtaining the CoI matrix.
The third approach is the use of feature-based machine learning models for automated paper-to-reviewer assignment (Charlin
& Zemel, 2013). In recent years, in order to improve the conventional peer-review process, several methods such as feature
weighting, selection, and construction have been used to fine-tune the score matrix formed between the paper and reviewer
sets before the final assignment. Along with this scoring matrix, probabilistic models are used for deciding on the final
assignment (Price & Flach, 2017; Zhao et al., 2018). Widely used in more than 3500 conferences after being integrated with
Microsoft CMT, the Toronto Paper Matching System (TPMS) (Charlin & Zemel, 2013) uses LDA for topic modeling, and linear
regression and collaborative filtering for paper-to-reviewer assignment.
The fourth approach is the use of recommender systems for automated paper-to-reviewer assignment (Conry et al., 2009;
Resnick & Varian, 1997). Conry et al. (2009) consider reviewer bids for their choice of paper as feedback, and a linear
programming based optimization approach is then used to solve RAP. Moreover, the CoI is obtained from the self-declared
reviewer preferences, a reviewer selecting papers belonging to his/her interest and domain of expertise. A weight of 1 is
assigned if there is no conflict of interest between a reviewer ‘i’ and paper ‘u’, and a weight of −1 otherwise. A recent work
by Zhao et al. (2018) transforms the reviewer recommendation problem into a classification problem and uses the Word
Mover’s Distance Constructive Covering Algorithm (WMD-CCA). Here, complex semantic relationships between submitted
papers and reviewers are extracted from keywords using optimized WMD. Further, CCA conducts a rigorous learning process
for accurate predictions.
Experts from different domains have shown considerable interest in solving the RAP. Many pieces of research try to model
the RAP and its solution differently; others model the input parameters differently. Most works have proposed a theoretical
framework, and a few have also given robust empirical studies. In summary, methods include query-based information
retrieval, recommender systems, machine learning models, and optimization techniques in a constraint-based environment.
Most of the existing methodologies focus on how the matching algorithms can accurately rank the experts corresponding to
a given query. They do not consider different constraints or undertake them with heuristics (Tang et al., 2010). Other
optimization techniques tackle constraints individually, i.e., they deal either with optimizing the input factors or with the
assignment strategy. Consequently, this results in a sub-optimal or only an approximate solution. Also, existing CMS rely on
self-declared reviewer preferences for finding expertise matching and CoI with submissions.4,5 However, in this paper, we
propose a completely novel framework by first independently optimizing the input factors and then solving the assignment
problem. Each input factor is optimized using the methodology best suited to it.
2.1. Trend of co-authorship distance
We crawled the lists of TPC members and accepted papers for a few conferences, namely ANTS (2008–2011), ASONAM
(2009–2011), COMSNETS (2010–2012), HIPC (1999–2011), ICDCN (2010–2012), INFOCOM (2000–2012), MOBICOM (1995–2012),
SIGCOMM (2005–2011), SIGIR (2002–2012), WALCOM (2007–2012), COMAD (2005–2009) and ICBIM (2014, 2016), to see
how co-authorship distance varies between TPC members and the authors of accepted papers. All such information was
crawled from the corresponding conference websites. However, the study is not based upon actual assignment mappings
between assigned reviewers and accepted papers. For the year 2005, the study reveals that the co-authorship distance is at
most 2 for a conference. On a time scale between 2002 and 2012, the average co-authorship distance varied between 1 and 3
in most cases. We find that a co-authorship relation exists between TPC members and authors in most cases. Consequently,
this relationship can be used to extract CoI without explicitly depending upon self-declaration by the reviewer and author.
3. Methodological overview
Review quality depends upon three indispensable factors. Assuming that the reviewer has bid for a paper and is willing
to review it, the reviewer’s expertise and knowledge must match the topics explored in the submitted paper. Next, it is crucial
to take into consideration the social and academic relations between a reviewer candidate and all the authors of the submitted
paper. For a fair and accurate review, a paper should be assigned to a reviewer with a high co-authorship distance value
and minimum CoI. Finally, the assignment cannot overload a single reviewer, as this might degrade review quality in some
cases (Fig. 2). In this section, we first elaborately describe the modeling of the individual factors (topic extraction and
similarity measure, CoI using the co-authorship distance measure, and calculation of reviewer workload) in a constraint-based
framework. Next, we describe the equilibrium multi-job assignment problem and how we map RAP to it. All the notations
used in this paper are listed in Table 1.
4 https://easychair.org/overview.
5 https://cmt3.research.microsoft.com/About.
3.1. Topic extraction and similarity measure
There are many topic modeling techniques, such as keyword matching (Chen & Xiao, 2016), Latent Dirichlet Allocation (LDA)
(Blei et al., 2003), the Author Topic model (AT) (Karimzadehgan & Zhai, 2009; Kou, Hou U, et al., 2015; Liu et al., 2014; Tang
et al., 2012), and the Author Persona Topic model (APT) (Mimno & McCallum, 2007). Mimno and McCallum (2007) evaluate
different topic models such as language models with Dirichlet smoothing, AT and APT models. They report that although
APT models outperform the others, LDA based models also perform well, especially in finding the most relevant reviewers.
In our work, we consider the widely used topic modeling technique based upon LDA. A reviewer’s expertise score is calculated
in two steps. First, an initial score is calculated by matching the reviewer’s expertise (extracted from his/her past publications)
against the topics of the submitted paper using a normalized word space model. Here, the LDA algorithm is used to derive
the topic model; however, a comparative study shows that the word space model performs better matching. Second, the
reviewer’s self-declared expertise is taken as the ground truth value and, combined with the initial score, a supervised
prediction algorithm is run to obtain the final score.
Table 1
List of notations.

Notation                                      Description
R = {ri | 1 ≤ i ≤ m; m ∈ N}                   Set of reviewers
P = {pj | 1 ≤ j ≤ n; n ∈ N}                   Set of papers
T = {tk | 1 ≤ k ≤ u; u ∈ N}                   Conference topic set
A(pj) = {ax | 1 ≤ x ≤ p; p ∈ N}               Author set of a paper pj
C(pj) = {cy | 1 ≤ y ≤ q; q ∈ N}               Co-author set of the authors A(pj) of each paper pj
C′(ri) = {c′z | 1 ≤ z ≤ r; r ∈ N}             Co-author set of a reviewer ri
W(pj) = {tv | 1 ≤ v ≤ c; c ∈ N}               Word set of each submitted paper pj
W(ri) = {tw | 1 ≤ w ≤ d; d ∈ N}               Word set representing each reviewer ri’s archive
S(ri, pj)                                     Topical similarity percentage between ri and pj
D(ri, pj)                                     Co-authorship distance between A(pj) and ri
CoI(ri, pj)                                   Conflict of Interest (CoI) between A(pj) and ri
L(ri)                                         Workload of a reviewer ri
K                                             Assignment quality metric
For this experiment, we obtain each reviewer’s past publications from the ArnetMiner data set6 and then extract a topic set
for each reviewer using the LDA algorithm. The initial scores are obtained using the word space model, by matching this
reviewer archive with the submitted paper. Next, a second round of verification is done by running a Linear Regression Shared
algorithm to predict the final score by combining the initial score and the reviewer’s self-declared expertise. In the final step,
a topic similarity matrix is formed.
The word space model can be obtained in four steps:

1. Normalize word count: The submitted paper and the archive of the reviewer’s publications are represented in the form of
vectors. Next, the frequency of each word is normalized as given in Eq. (1):

   T(ri) = Σ_{w ∈ W(ri)} log f(w_ri)   (1)

2. Smoothing: In order to deal better with rare words, we apply Dirichlet smoothing to the reviewer’s normalized word
count. Here, μ is the smoothing parameter, |W_ri| is the total number of words in the reviewer’s archive, |N_wk| is the
number of occurrences of word k in the word set of the reviewer’s archive, |W| is the total number of words in the
reviewer’s corpus, and |N| is the number of occurrences of word k in the corpus:

   f(W_ri) = (|W_ri| / (|W_ri| + μ)) × (|N_wk| / |W_ri|) + (μ / (|W_ri| + μ)) × (|N| / |W|)   (2)

3. Normalize score: The individual scores obtained for each reviewer and submission pair are further normalized by dividing
by the length of the paper (pl):

   T(ri) = f(W_ri) / pl   (3)
6 https://aminer.org/citation.
4. Initial expertise score: The initial expertise score is calculated as the dot product of the reviewer’s past publication vector
and the submitted paper vector, such that

   S(ri, pj) = f(T_ri) · f(T_pj)   (4)
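As an illustration, the four steps above can be sketched in Python. This is a simplified stand-in for the word space model, not the authors' implementation: all names (expertise_score, mu, corpus_counts) are ours, and the background corpus defaults to empty.

```python
import math
from collections import Counter

def expertise_score(reviewer_words, paper_words, mu=1000.0,
                    corpus_counts=None, corpus_size=1):
    """Sketch of Eqs. (1)-(4): normalized, Dirichlet-smoothed word
    profiles for a reviewer archive and a submitted paper, combined
    by a dot product. `mu` is the Dirichlet smoothing parameter."""
    corpus_counts = corpus_counts or Counter()

    def smoothed_profile(words):
        counts = Counter(words)
        total = len(words)
        profile = {}
        for w, c in counts.items():
            # Step 2 (Eq. (2)): mix the document frequency with the
            # background corpus frequency, weighted by mu.
            p_ml = c / total
            p_bg = corpus_counts.get(w, 0) / max(corpus_size, 1)
            f = (total / (total + mu)) * p_ml + (mu / (total + mu)) * p_bg
            # Steps 1 and 3: log-normalize, then divide by length.
            profile[w] = math.log(1.0 + f) / total
        return profile

    r = smoothed_profile(reviewer_words)
    p = smoothed_profile(paper_words)
    # Step 4 (Eq. (4)): initial expertise score as a dot product.
    return sum(r[w] * p.get(w, 0.0) for w in r)
```

With an empty background corpus, a paper sharing no vocabulary with the reviewer's archive scores exactly zero, while overlapping vocabulary yields a positive score.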
3.2. Co-authorship distance measure
The second important factor is to consider the ‘Conflict of Interest’ (CoI) between the reviewer and the authors of the
submitted paper before assignment. We extract the co-author list for each reviewer from the ArnetMiner data set. Next, for
each submitted paper, we extract the co-authors of its distinct set of authors (in case the paper is multi-authored) from the
same data set. Here, we consider the CoI factor from the co-authorship graph, as the other types of conflict of interest
mentioned in Long et al. (2013) can easily be derived from it. We cross-verify it with the co-author lists self-declared by the
reviewers and the authors of submitted papers. Next, we assign a co-authorship distance value which varies between 0 and
3. If an entity in the author set A(pj) of a submitted paper pj directly matches the reviewer ri, then the co-authorship
distance D(ri, pj) = 0; in set notation, {ri} ∩ A(pj) ≠ ∅. If an entity in A(pj) belongs to C′(ri), or conversely the reviewer ri is
present in C(pj), such that either an author of the submitted paper is a co-author of the reviewer or the reviewer is present in
the co-author set of the submitted paper, then the co-authorship distance D = 1. In set notation, if D(ri, pj) ≠ 0 and
(C′(ri) ∩ A(pj) ≠ ∅ or C(pj) ∩ {ri} ≠ ∅), then D(ri, pj) = 1. Further, if the co-author set of the paper C(pj) intersects the
co-author set of the reviewer C′(ri), then the co-authorship distance is given a value of 2; in set notation, if D(ri, pj) ≠ 1 and
C′(ri) ∩ C(pj) ≠ ∅, then D(ri, pj) = 2. For all other cases, D = 3, as no direct collaboration exists between the authors and the
reviewer (refer to Fig. 1); that is, if D(ri, pj) ≠ 2, then D(ri, pj) = 3. ‘Conflict of interest’ is inversely proportional to
co-authorship distance. We filter out groupings of the paper assignment set with the selectively available choice of reviewers
and sort them in descending order, such that papers with greater D values are mapped to reviewers at the beginning.
D(ri, pj) = 0 is not considered for assignment.
Fig. 1. Co-authorship distance measure between a reviewer (ri) and an author set A(pj), extracted using the co-authorship graph.
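The distance rules above can be summarized in a short sketch; the function and argument names are illustrative, not from the authors' implementation.

```python
def coauthorship_distance(reviewer, reviewer_coauthors, authors, author_coauthors):
    """Sketch of the D(ri, pj) rules of Section 3.2.

    reviewer:           reviewer id r_i
    reviewer_coauthors: C'(r_i), co-authors of the reviewer (a set)
    authors:            A(p_j), authors of the submitted paper (a set)
    author_coauthors:   C(p_j), co-authors of the paper's authors (a set)
    Returns D(r_i, p_j) in {0, 1, 2, 3}; D = 0 pairs are later excluded
    from assignment, and CoI is taken as inversely proportional to D.
    """
    if reviewer in authors:
        return 0    # reviewer is an author of the paper
    if reviewer_coauthors & authors or reviewer in author_coauthors:
        return 1    # direct co-author relation on either side
    if reviewer_coauthors & author_coauthors:
        return 2    # the two co-author sets intersect
    return 3        # no direct collaboration found
```

The checks are ordered so that each rule applies only when the previous distance value was ruled out, mirroring the D ≠ 0, D ≠ 1, D ≠ 2 conditions in the text.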
3.3. Calculation of workload (l)
The third criterion before the final assignment of papers to reviewers is to maintain a load counter for each entity in the
Technical Program Committee (TPC), i.e., each reviewer, such that the overall workload of the reviewers is equally balanced.
This also ensures that a single reviewer is not overloaded due to his/her field of expertise. After every assignment, the load
counter is incremented by 1 for the given reviewer, and if the load counter exceeds a certain value, such an assignment
cannot be made. The load value is calculated as given in Eq. (5), where m is the number of reviewers required per paper.
After making an assignment and incrementing the load counter, the next paper goes through similar processing until all the
submitted papers are assigned their best suitable reviewer option. If no better match is found for a paper, it is kept for
manual assignment.

   l = ⌈ |P| / |R| ⌉ × m   (5)
3.4. Equilibrium multi-job assignment problem
First, in this sub-section, we describe the concept of the multi-job assignment problem and its extension to an equilibrium
model (Liu & Gao, 2009). We are given n jobs and m workers, where m ≤ n, along with the cost of every assignment edge.
For instance, when a particular job j is allocated to worker i, the corresponding cost is cij. The multi-job assignment problem
is to find an assignment schedule where the n jobs are allocated to the m workers under the following two constraints: (1)
each worker gets at least one job; he may be assigned more than one job, and the upper bound is not fixed; (2) each job must
be assigned to one and only one worker. The objective function is defined such that the total cost over all allocations is
minimized. When the multi-job assignment problem is extended to an equilibrium model, the problem shifts to finding an
assignment schedule where, concerning a single worker, the difference between the maximum total cost and the minimum
total cost is minimum. Inherently, the equilibrium model of the multi-job assignment problem balances the load distribution
over the workers and minimizes the overall cost in steps by minimizing the cost at the level of an individual worker.
Mathematically, the m workers are represented by the set W = {1, 2, 3, . . ., m} and the n jobs by the set J = {1, 2, 3, . . ., n}.
Let the cost be represented as c(i, j), where i ∈ W and j ∈ J, and the assignment as aij. Here, aij is 1 if an assignment occurs
between i and j, and 0 otherwise. The vector x consists of all assignments corresponding to a single worker. In general, the
equilibrium multi-job assignment problem can be represented as

   min [g_max(x) − g_min(x)]

where g_max(x) and g_min(x) are given by

   g_max(x) = max { Σ_{j=1}^{n} c(i, j) × aij | i ∈ W }

   g_min(x) = min { Σ_{j=1}^{n} c(i, j) × aij | i ∈ W }
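The equilibrium objective can be illustrated with a small sketch that evaluates g_max(x) − g_min(x) for a given assignment; the dictionary-based data layout is our own assumption.

```python
def equilibrium_gap(cost, assignment, workers):
    """Gap g_max(x) - g_min(x) between the most and least loaded
    worker's total cost, for a fixed assignment.

    cost:       dict mapping (worker, job) -> c(i, j)
    assignment: set of (worker, job) pairs with a_ij = 1
    workers:    iterable of all workers in W
    """
    totals = {w: 0.0 for w in workers}
    for (i, j) in assignment:
        totals[i] += cost[(i, j)]
    # The equilibrium model seeks the schedule minimizing this gap.
    return max(totals.values()) - min(totals.values())
```

A schedule giving worker w1 jobs of cost 2 and 1, and worker w2 one job of cost 3, has per-worker totals of 3 and 3, hence a gap of 0: a perfectly balanced (equilibrium) assignment.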
The novelty of the proposed methodology lies in mapping RAP to, and re-defining it as, a maximization-type equilibrium
multi-job assignment problem. Here, a job corresponds to a paper, and a worker corresponds to a reviewer. In RAP, the
constraints are changed (see Eqs. (7) and (8)). In the above problem, apart from the two assignment constraints, there are no
specific restrictions on whether a particular type of job is more suitable for a specific worker. In contrast, in RAP, if the
expertise of a particular reviewer does not match the topics explored in a submitted paper, or if there is any CoI, then such
an assignment is not feasible. There are thus some additional constraints, or trivial factors, to be considered before
assignment. Here we define profit by collectively considering three factors: topic similarity, CoI, and reviewer workload.
Fundamentally, profit is maximized in terms of maximizing the topic similarity measure and minimizing CoI. Inherently, the
constraints of the model evenly distribute the load among all workers. The problem is to find a schedule such that the
difference between the maximum total profit and the minimum total profit with respect to a single reviewer is maximum.
Optimal assignment occurs when the objective function achieves a threshold value K, as represented in Eq. (12). Several
algorithms have been proposed to solve such extended multi-job assignment models, such as a genetic algorithm (Liu & Gao,
2009) and a hybrid genetic algorithm (Misevičius & Stanevičienė, 2018). However, we could not use them due to the
additional constraints in RAP at multiple levels. We propose a meta-heuristic greedy solution to solve it.
Algorithm 1. Weight matrix before assignment
4. Problem formulation
In this problem, all graphs, which are represented in the form of sets, are complete bipartite and directed. Here, the graph G
is given by G = (R, P); the vertex set is V = R ∪ P and the edge set is E = {(i, j) | i ∈ R, j ∈ P}. In terms of graph theory, the
problem is to extract a minimum edge cover set for this complete bipartite graph G which satisfies the maximal matching
conditions of this problem.
Given a reviewer set R = {R1, R2, R3, . . ., Rm} and a paper set P = {P1, P2, P3, . . ., Pn}, after performing maximal matching
we obtain a weight function w such that the assignment problem can be formulated as

   max Σ_{i ∈ R} Σ_{j ∈ P} w(i, j) × aij   (6)

where aij = 1 if reviewer Ri is assigned to paper Pj, and aij = 0 if reviewer Ri cannot be assigned to paper Pj. Here, the number
of reviewers is m and the number of papers is n.

Remark (Constraint 1). Each paper (j ∈ P) can be reviewed by at most m reviewers, such that

   Σ_{i ∈ R} aij ≤ m, ∀ j ∈ P   (7)

Remark (Constraint 2). Each reviewer (i ∈ R) cannot be assigned more than l papers, where l = ⌈ |P| / |R| ⌉ × m, such that

   Σ_{j ∈ P} aij ≤ l, ∀ i ∈ R   (8)
The problem is to find an assignment under which maximal matching occurs. Individually, for a single reviewer, we calculate the maximum difference between the maximum profit and the minimum profit. Here $x_i$ represents the vector of all assignments for a reviewer i.

$$\max \; f_{max}(x_i) - f_{min}(x_i) \quad (9)$$

For a single reviewer, considering all possible assignments, we calculate the maximum profit as follows:

$$f_{max}(x_i) = \max_{j \in P} w(i, j) \times a_{ij} \;\big|\; i \in R \quad (10)$$

Similarly, for a single reviewer, considering all possible assignments, we calculate the minimum profit as follows:

$$f_{min}(x_i) = \min_{j \in P} w(i, j) \times a_{ij} \;\big|\; i \in R \quad (11)$$

Overall, to achieve the optimal assignment, we check whether this difference reaches a threshold value K; only then is the assignment feasible. K is the expected equilibrium degree of the multi-job assignment and denotes the value of the objective function at $x_i$.

$$f_{max}(x_i) - f_{min}(x_i) = K \quad (12)$$
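Eqs. (10)-(12) reduce, for one reviewer, to the spread between the best and worst weights among that reviewer's actual assignments. A minimal sketch with illustrative data:

```python
def profit_spread(weights, assigned):
    """Eqs. (10)-(11) for one reviewer: max minus min profit over
    the papers actually assigned (assigned[j] == 1)."""
    profits = [w for w, a in zip(weights, assigned) if a == 1]
    return max(profits) - min(profits)

weights = [5, 1, 4, 3]       # w(i, j) for one reviewer over four papers
assigned = [1, 0, 1, 1]      # papers 0, 2, 3 are assigned
spread = profit_spread(weights, assigned)
print(spread)                # max(5, 4, 3) - min(5, 4, 3) = 2
print(spread >= 2)           # Eq. (12): check against a threshold K = 2
```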
5. Solution approach
By maximal matching, the first condition we require is that maximum topic similarity should occur between the topic set $T_{r_i}$ of reviewer R and the topic set $T_{p_j}$ of paper P. The objective function for maximum topic matching can be formulated as

$$S_{max}(r_i, p_j) = \max \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ik} \times x_{jk} \;\big|\; k \in T \quad (13)$$
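With binary topic indicator vectors, the sum of $x_{ik} \times x_{jk}$ in Eq. (13) counts topics shared by a reviewer and a paper. The percentage form used in the tables can then be obtained by normalizing that count; the normalization below (by the paper's topic count) is our illustrative assumption:

```python
def topic_similarity(reviewer_topics, paper_topics):
    """Percentage of the paper's topics also claimed by the reviewer.
    Set intersection is equivalent to summing products of binary
    indicator vectors over the shared topic set T."""
    common = reviewer_topics & paper_topics
    return 100.0 * len(common) / len(paper_topics)

r = {"data mining", "nlp", "peer review"}
p = {"nlp", "peer review", "bibliometrics", "graphs"}
print(topic_similarity(r, p))  # 2 common topics out of 4 -> 50.0
```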
The second condition that needs to be fulfilled is that a minimum CoI value should occur between the reviewer set R and the authors $A_{p_j}$ of the paper set P. The CoI value is inversely proportional to the co-authorship distance D(r_i, p_j), which is calculated as in the algorithm and Fig. 1.

$$D_{max}(r_i, p_j) = \max \sum_{i=1}^{m} \sum_{j=1}^{n} \left( x_{r_i a_{p_j}} + x_{r_i c_{p_j}} + x_{c'_{r_i} a_{p_j}} + x_{c'_{r_i} c_{p_j}} \right) \;\big|\; i \in R, j \in P \quad (14)$$

$$CoI_{min}(r_i, p_j) = \min \left( \frac{1}{D_{max}(r_i, p_j)} \right) \;\big|\; i \in R, j \in P \quad (15)$$
Next, we calculate an intermediate weight matrix by combining the maximum topic similarity value and the minimum CoI value following a greedy approach, which can be formulated as

$$w'(i, j) = S_{max}(r_i, p_j) \times CoI_{min}(r_i, p_j) \quad (16)$$
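A concrete sketch of building and normalizing the weight matrix. Note that the toy example in Appendix A multiplies topic similarity by the co-authorship distance itself (larger distance meaning smaller conflict), which preserves the ordering intended by Eqs. (15)-(16); we follow that convention here, with illustrative data. The column-wise normalization corresponds to the per-paper division of Eq. (17):

```python
def assignment_weights(S, D):
    """Intermediate weights w'(i, j) = S(i, j) * D(i, j), as in the
    Appendix's toy example, then normalized per paper (column-wise)
    so each paper's weights sum to 1."""
    m, n = len(S), len(S[0])
    w = [[S[i][j] * D[i][j] for j in range(n)] for i in range(m)]
    col_sums = [sum(w[i][j] for i in range(m)) for j in range(n)]
    return [[w[i][j] / col_sums[j] for j in range(n)] for i in range(m)]

S = [[50, 30],   # topic similarity (%): 2 reviewers x 2 papers
     [70, 40]]
D = [[3, 0],     # co-authorship distance: 3 = no conflict, 0 = direct
     [2, 3]]
W = assignment_weights(S, D)
# paper 0: raw weights 150 and 140 -> 150/290 and 140/290
print(round(W[0][0], 2))  # 0.52
```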
The edge weights of the final assignment, which produce maximum matching, are:

$$w_{max}(i, j) = \frac{w'(i, j)}{\sum_{i=1}^{m} w'(i, j)} \;\big|\; j \in P \quad (17)$$
After each assignment of a reviewer to a paper, we check the constraints so that each reviewer gets no more than l papers. As soon as a reviewer is assigned l papers, we remove that reviewer, and as soon as a paper is assigned its maximum of m reviewers, we remove that paper from further assignments. Further, if a reviewer is removed from the assignment, the complete steps beginning from calculating the topic similarity matrix are followed again with the remaining papers and reviewers. A toy example is elaborately described in Appendix A to illustrate the working of the model.
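The greedy loop just described can be condensed into a short sketch: repeatedly accept (reviewer, paper) pairs whose normalized weight exceeds a decreasing threshold, retiring reviewers who reach their load l and papers that reach m reviewers. The thresholds and data below are illustrative; the paper's Algorithm 2 is the authoritative procedure:

```python
def greedy_assign(W, l, m, thresholds):
    """W: normalized weight matrix (reviewers x papers); l: max papers
    per reviewer; m: max reviewers per paper; thresholds: decreasing
    acceptance cutoffs, as in the Appendix's iterations."""
    rows, cols = len(W), len(W[0])
    load = [0] * rows        # papers assigned to each reviewer
    covered = [0] * cols     # reviewers assigned to each paper
    assigned = set()
    for t in thresholds:
        for i in range(rows):
            for j in range(cols):
                if (i, j) in assigned or load[i] >= l or covered[j] >= m:
                    continue
                if W[i][j] > t:
                    assigned.add((i, j))
                    load[i] += 1
                    covered[j] += 1
    return assigned

W = [[0.60, 0.30, 0.55],
     [0.40, 0.70, 0.45]]
print(sorted(greedy_assign(W, l=2, m=1, thresholds=[0.5, 0.4])))
# [(0, 0), (0, 2), (1, 1)]
```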
Algorithm 2. Reviewer to paper assignment
5.1. Assignment quality score

The assignment quality score is calculated from three factors, topic similarity percentage, co-authorship distance value, and reviewer workload, to check how well these factors are satisfied for a given assignment (Eq. (18)).

$$K = \frac{w'(i, j) - (\max(l) - 1) + (\max(l) - achieved(l(r_i)))}{\max(S(r_i, p_j)) \times \max(D(r_i, p_j))} \quad (18)$$

Here, $w'(i, j)$ is as given in Eq. (16), $\max(S(r_i, p_j))$ is the maximum topic similarity score, that is, 100, and $\max(D(r_i, p_j))$ is the maximum co-authorship distance which a given assignment could obtain, that is, 3. l is the reviewer workload and $achieved(l(r_i))$ is the achieved load of a given reviewer after assignment. To check the relationship of the three factors with assignment quality, we run three different experiments in which one factor is varied over a range of values while the other two are kept constant (Fig. 2). The plots in Fig. 2 show that both topic similarity and co-authorship distance are linearly proportional to assignment quality, whereas reviewer workload is inversely proportional to it. The bar graph in Fig. 2 depicts the same results. For papers 1 and 2, co-authorship distance is varied and the assignment quality increases. For papers 3 and 4, topic similarity is varied and the assignment quality increases. For papers 5 and 6, reviewer workload is varied and minimal change is seen in assignment quality.
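Eq. (18) can be sketched directly. The constants max_sim = 100 (topic similarity is a percentage) and max_dist = 3 (the largest co-authorship distance) follow the text; the variable names and the worked numbers are ours:

```python
def quality_score(w_prime, max_load, achieved_load,
                  max_sim=100.0, max_dist=3.0):
    """Assignment quality K of Eq. (18) for one reviewer-paper pair."""
    numerator = w_prime - (max_load - 1) + (max_load - achieved_load)
    return numerator / (max_sim * max_dist)

# A pair with intermediate weight w' = 240, workload limit 3,
# and 2 papers already taken by the reviewer:
print(round(quality_score(240, 3, 2), 3))  # (240 - 2 + 1) / 300 = 0.797
```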
Fig. 2. Variation in assignment quality is plotted on the y-axis while topic similarity (in percentage), co-authorship distance value, and reviewer workload are varied along the x-axis. In the bar graph, normalized scores of each factor are plotted and the calculated assignment quality is given as a red point. In papers 1 and 2, co-authorship distance is varied while the other two factors are kept constant. Similarly, in papers 3 and 4 the topic similarity percentage is varied, and in papers 5 and 6 the reviewer workload is varied. The comparative study shows the change in assignment quality while varying each factor and keeping the other two constant. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
6. Experiment

As a proof of concept, we run the proposed algorithm on a real conference data set, International Conference on Business Information Management (ICBIM) 2016,7 collected from 'EasyChair'. For comparison, we also run the automated Toronto Paper Matching System assignment strategy on this data set. Note that the collected data has the assignment mapping between the set of accepted papers and the TPC members. The total number of accepted papers is 59 and the number of TPC members is 40. The number of unique authors of accepted papers is 111. The average number of authors per paper is 2.25 and the average number of co-authors is 1.25.
As mentioned in Section 3, for the proposed algorithm we extract the topic similarity and co-authorship distance matrices between the set of all accepted papers and TPC members for the ICBIM 2016 data set. In Fig. 3, we plot the mean and variance of topic similarity and co-authorship distance in an error bar plot for a set of 10 randomly selected papers against all 40 available TPC members before assignment. Ideally, the mean topic similarity percentage and co-authorship distance should be high and the variance low. When the mean topic similarity percentage and co-authorship distance value are small, and the variance is also comparatively small, as seen for P2 in Fig. 3, such papers should be assigned to reviewers first due to the scarce availability of expert reviewers. The assignment made by 'EasyChair' for this data set considers two constraints. First, each paper needs to be assigned to two reviewers and, second, each reviewer can be assigned at most three papers. Hence, the reviewer workload for assignment is 3. Considering similar constraints, we assign papers using the proposed algorithm and the existing benchmark RAP system 'TPMS'.
7 https://easychair.org/my/conference.cgi?welcome=1;conf=icbim2016.
Fig. 3. (a) Mean topic similarity (in percentage) and its variance, plotted in an error bar graph for 10 randomly selected papers against all 40 available reviewers in the ICBIM 2016 data set. (b) Mean co-authorship distance value and its variance, plotted in an error bar graph for the same 10 randomly selected papers against all 40 available reviewers.
6.1. Bench-marking methods

We briefly describe the existing benchmark automated RAP systems that are used in many conferences, namely EasyChair and the Microsoft Conference Management Toolkit (CMT).8 Since the ICBIM 2016 conference was conducted using 'EasyChair', the data set cannot be tested to obtain assignments using any other Conference Management System such as 'Microsoft CMT'. We therefore implement TPMS, which is the automated assignment strategy of 'Microsoft CMT'. CMT has been used in more than 3500 conferences. In EasyChair, the authors of submitted papers and the reviewers explicitly declare all required data, such as fields of expertise and conflicts of interest with co-authors, collaborators from the same research group, peers from the same affiliation, etc.

The conference program committee considers all such declarations before manually assigning a paper to a reviewer; it can also opt for an automated assignment strategy. TPMS is an automated assignment strategy that considers only two factors: topic similarity score and reviewer workload. The reviewer profile is expressed as a set of topics using a Bayesian approach and is then used to calculate the score matrix before assignment.
8 https://cmt3.research.microsoft.com/About.
Fig. 4. (a) Normalized assignment quality of the proposed method, EasyChair, and TPMS for the 59 accepted papers, plotted in a scatter plot in blue, red, and green respectively. (b) The assignment quality distribution of the three techniques over all 59 accepted papers, plotted in a box plot. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
7. Result and discussion

The algorithm has a time complexity of O(m² × n²), where m is the number of available reviewers and n is the number of submitted papers. Next, we compare the assignment performance of the algorithm with the other bench-marking methods, EasyChair and TPMS. The actual 'EasyChair' assignment can be considered a random-assignment test case, and TPMS considers only two factors, topic similarity and reviewer workload, for the assignment. We aim to observe experimentally how the inclusion of another factor, conflict of interest measured using co-authorship distance, improves the assignment quality. For each paper, we calculate the assignment quality under the proposed method, EasyChair, and TPMS.
7.1. Performance comparison

We perform a comparative study of assignment quality (as given in Eq. (18)) for three different assignment techniques: our proposed method, EasyChair, and TPMS. The scatter plot in Fig. 4(a) shows that the quality scores of the proposed method and TPMS are consistently higher than the actual EasyChair assignment quality. Moreover, the box plot in Fig. 4(b) shows a large deviation in the actual EasyChair assignment quality, which implies a significant difference from the other two groups. Its median is the lowest of the three, 0.3, which implies that 50% of its papers attain quality lower than 0.3. The box plot for the proposed method is comparatively shorter than the other two and its distribution is symmetric, which signifies that overall all papers consistently get higher scores. The quality distribution for TPMS is left skewed. To understand whether there is a statistical difference in quality between the three groups, we perform a paired sample t-test. We also plot normalized topic similarity scores and co-authorship distance values in a bar graph (Fig. 5) after assignment, for 17 papers randomly selected from the data set. Assignments made by the proposed method consistently maintain higher topic similarity scores and co-authorship distance values. In contrast, TPMS maintains higher topic similarity scores but for a few papers, such as P2 and P8, attains low co-authorship distance values. 'EasyChair', on the other hand, attains comparatively lower topic similarity scores.
7.2. Hypothesis testing and statistical validation of quality score

We perform a hypothesis test using a paired sample t-test and compare the mean assignment quality of the proposed method with that of EasyChair and TPMS. This test is done to statistically establish the importance of all three factors for attaining optimal assignment quality.
Fig. 5. (a) Normalized topic similarity score, plotted in a bar graph for 17 randomly selected papers, comparing the three assignment techniques (the proposed method, EasyChair, and TPMS) after assignment. (b) Normalized co-authorship distance value, plotted in a bar graph for the same 17 randomly selected papers, comparing the three assignment techniques after assignment.
We have assignments for 59 papers obtained using group1 (EasyChair), group2 (TPMS), and group3 (our proposed method). The same sample of 59 accepted papers is tested with the inclusion and exclusion of different factors. We consider the 'EasyChair' assignments a random assignment case in which the three factors, topic similarity, conflict of interest, and reviewer workload, are randomly taken into consideration. Next, TPMS implicitly takes into consideration only two factors: topic similarity and reviewer workload. Finally, we run our proposed method, which takes into consideration all three factors (Table 2).
7.2.1. Comparison of proposed method with EasyChair

Null hypothesis: There is no difference in mean assignment quality between the two groups, group1 and group3.

$$\mu_1 - \mu_3 = 0$$

Alternate hypothesis: There is a statistical difference in mean assignment quality between the two groups, group1 and group3, that is,

$$\mu_1 - \mu_3 \neq 0$$
Table 2
Comparative t-test of the proposed method (group 3, mean = 0.6103, variance = 0.0281) with EasyChair (group 1) and TPMS (group 2).
Bench-marking method Mean quality Variance Sample size t-stat t-critical value p-Value
EasyChair (Group 1) 0.4435 0.0651 59 3.9900 1.6715 0.0000938
TPMS (Group 2) 0.5650 0.0300 59 1.6957 1.6715 0.0476
We calculate the t-statistic value as given in Eq. (19).

$$t = \frac{(\bar{X}_1 - \bar{X}_3) - (\mu_1 - \mu_3)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_3} \right)}} \quad (19)$$

Here, $\bar{X}_1$ and $\bar{X}_3$ represent the mean assignment quality of sample group1 and group3 respectively. $\mu_1$ and $\mu_3$ represent the mean assignment quality of the population for group1 and group3 respectively, whose difference we take as 0 under the null hypothesis. $n_1$ and $n_3$ are the sample sizes, which are 59 for both groups. $S_p^2$ is the pooled variance of the two groups. The t-statistic value for the group1 comparison is 3.99, which is greater than the t-critical value 1.6715, and the p-value at the 95% confidence level is 0.0000938, which is less than 0.05. This implies that we can reject the null hypothesis: the mean assignment quality of group1 and group3 is statistically different. The assignment performance of the proposed method is of superior quality to the 'EasyChair' assignments.
7.2.2. Comparison of proposed method with TPMS

Null hypothesis: There is no difference in mean assignment quality between the two groups, group2 and group3.

$$\mu_2 - \mu_3 = 0$$

Alternate hypothesis: There is a statistical difference in mean assignment quality between the two groups, group2 and group3, that is,

$$\mu_2 - \mu_3 \neq 0$$

We calculate the t-statistic value as given in Eq. (19). Since the t-statistic value 1.6957 is greater than the t-critical value 1.6715 and the p-value at the 95% confidence level is 0.0476, which is less than 0.05, we can reject the null hypothesis. This implies that the mean assignment quality of group2 and group3 is statistically different with the inclusion of the factor 'Conflict of Interest (CoI)' measured using co-authorship distance.
7.3. Comparative study on varying reviewer workload

Varying the constraints of the proposed method, we observe changes in assignment quality for a set of 11 papers randomly selected from the data set. The constraints in the two experiments are that each reviewer can be assigned at most 3 and 5 papers respectively. As seen in Fig. 2, the assignment quality is inversely proportional to reviewer workload. The same is validated when experimentally testing the proposed algorithm on a real conference data set: as the reviewer workload increases from 3 to 5, the assignment quality decreases for a few papers such as P4, P5, and P8. Next, we plot the normalized topic similarity and co-authorship distance scores after assignment for the same set of papers, considering reviewer workloads of 3 and 5, in Fig. 6. The assignment quality mainly degrades due to the comparatively lower topic similarity scores obtained by the assignment with reviewer workload 5.
Fig. 6. Comparative study of assignment quality (line plot) when the reviewer workload constraint is 3 and 5 respectively. (a) Normalized topic similarity score and co-authorship distance, plotted in a bar graph for 11 randomly selected papers after assignment by the proposed algorithm with the constraint reviewer workload = 3. (b) The same plot, for the same 11 papers, with the constraint reviewer workload = 5.
8. Conclusion

The novelty of the paper lies in mapping and modeling the RAP as a maximization-type equilibrium multi-job assignment problem. RAP is an NP-hard problem with multiple input and assignment constraints to be taken into account, unlike the classic equilibrium model. Considering these additional constraints, we formalize the objective function of RAP such that, for a single reviewer, the difference between maximum and minimum profit (assignment quality) is maximized. Independently, for each reviewer, the profit is maximized in steps, by maximizing topic similarity and minimizing CoI. Overall, the total profit is maximized while balancing the reviewers' workload. We propose a meta-heuristic solution using a weighted matrix-factorization based greedy algorithm. The primary motivation of the work is to improve upon the performance of existing automated assignment strategies and develop an easy-to-implement solution. Instead of relying on self-declaration by reviewers and authors, the CoI issue is dealt with using a co-authorship graph. In this way, the time required to collect such preliminary information from each reviewer and author can be saved. A new metric is proposed and validated for evaluating the performance of each assignment. A comparative study of the proposed method with the widely used automated methodology TPMS and the actual EasyChair assignments shows that the mean assignment quality of the proposed method is superior. A significant difference in mean quality is seen with the inclusion of the additional factor 'CoI'. One limitation of this work is that the hypothesis has not yet been rigorously tested on several data sets. Also, there are other kinds of CoI, discussed by Yan et al. (2017), which we could not deal with in this paper. In future research, the novel formulation of the RAP problem and its easy-to-implement solution can be applied to other fields with similar constraint-based frameworks, including process scheduling in CPUs, retrieval of files from secondary storage, search engine optimization, etc.
Author contributions
Dinesh K. Pradhan: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Per-
formed the analysis; Wrote the paper.
Joyita Chakraborty: Conceived and designed the analysis; Contributed data or analysis tools; Performed the analysis;
Wrote the paper.
Prasenjit Choudhary: Conceived and designed the analysis; Collected the data.
Subrata Nandi: Conceived and designed the analysis; Contributed data or analysis tools.
Appendix A. Toy example

In this section, we demonstrate the assignment procedure followed by the proposed algorithm using dummy example data. In this example, we consider 12 papers to be assigned among 3 reviewers. The constraints for this example are: (i) each paper should be assigned to at least 2 reviewers, hence q = 2; (ii) each reviewer is assigned at most $l = \lceil (12 \times 2) / 3 \rceil = 8$ papers. Given a reviewer set R = {R1, R2, R3} and paper set P = {P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12}, the problem is to find the weight of an assignment edge w(i, j) between the R and P sets such that maximal matching occurs between them. Maximal matching occurs if the percentage of topic similarity, that is, the value of S(r_i, p_j) between the R and P sets, is maximum and the Conflict of Interest is minimum, or inversely, the maximum collaborative distance D(r_i, p_j) = 3 exists between the author set of paper p_j and reviewer r_i.
Step 1: First, we create a topic similarity matrix S(r_i, p_j) by assigning each paper-reviewer pair a topic similarity percentage based on the presence of distinct common keywords in the topic set of the paper, T_p, and the topic set of the reviewer, T_r.
Table A.1
Topic similarity S(ri, pj) matrix.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 50 30 80 60 50 65 70 50 80 40 30 80
R2 70 40 30 75 35 75 80 30 60 50 40 90
R3 90 50 20 40 80 85 85 10 70 55 45 75
Step 2: Next, we calculate the collaborative distance matrix D(r_i, p_j) from the author set A(p_j) of each submitted paper in P and the reviewer set R.
Table A.2
Conflict of Interest (CoI(ri, pj)) matrix.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 3 0 0 3 2 3 3 3 3 3 1 2
R2 2 3 3 3 3 1 3 3 3 2 3 3
R3 3 3 3 3 3 3 2 1 3 3 3 3
Step 3: To calculate the weight matrix, we take the product of the topic similarity matrix S(r_i, p_j) and the collaborative distance matrix D(r_i, p_j). For each paper, we calculate the column-wise sum of weights, since a single paper may be assigned to multiple reviewers.
Table A.3
Weight matrix (w′(i, j)).
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 150 0 0 180 100 195 210 150 240 120 30 160
R2 140 120 90 225 105 75 240 90 180 100 120 270
R3 270 150 60 120 240 255 170 10 210 165 135 225
SUM 560 270 150 525 445 525 620 250 630 385 285 655
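The Step 3 arithmetic can be checked mechanically. The snippet below recomputes the weights and column sums for the first four papers, with S and D copied from Tables A.1 and A.2; the results match the corresponding columns of Table A.3:

```python
# Topic similarity (Table A.1) and distance (Table A.2), papers P1-P4.
S = [[50, 30, 80, 60],
     [70, 40, 30, 75],
     [90, 50, 20, 40]]
D = [[3, 0, 0, 3],
     [2, 3, 3, 3],
     [3, 3, 3, 3]]

# Element-wise product gives the weight matrix w'(i, j) of Table A.3.
w = [[S[i][j] * D[i][j] for j in range(4)] for i in range(3)]
# Column-wise sums give the SUM row of Table A.3.
sums = [sum(w[i][j] for i in range(3)) for j in range(4)]

print(w[0])   # [150, 0, 0, 180]
print(sums)   # [560, 270, 150, 525]
```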
Step 4: Finally, we calculate an assignment matrix by dividing each element by the column sum calculated in the previous step. We then continue iterating, making assignments of reviewers to papers and changing the weights as described below, until a reviewer reaches the maximum load l = 8.
1. After iteration 1 of the assignment matrix, we assign reviewers to papers whose assignment weight is greater than 0.5. The assignments made in this iteration are R1 → {P8}, R2 → {P3}, R3 → {P2, P5}.
2. After iteration 2 of the assignment matrix, we assign reviewers to papers whose assignment weight is greater than 0.45. The assignments made in this iteration are R3 → {P1, P6, P11}. The cumulative set of assignments is R1 → {P8}, R2 → {P3}, R3 → {P2, P5, P1, P6, P11}.
3. After iteration 3 of the assignment matrix, we assign reviewers to papers whose assignment weight is greater than 0.40. The assignments made in this iteration are R2 → {P2, P4, P11, P12}, R3 → {P3, P10}. The cumulative set of assignments is R1 → {P8}, R2 → {P3, P2, P4, P11, P12}, R3 → {P2, P5, P1, P6, P11, P3, P10}. Note here that papers {P2, P3, P11} have been assigned their maximum of 2 reviewers, so they are marked in red.
4. After iteration 4 of the assignment matrix, we assign reviewers to papers whose assignment weight is greater than 0.35. The assignments made in this iteration are R1 → {P6, P9}, R2 → {P7, P8}. The cumulative set of assignments is R1 → {P8, P6, P9}, R2 → {P3, P2, P4, P11, P12, P7, P8}, R3 → {P2, P5, P1, P6, P11, P3, P10}. Note here that papers {P6, P8} have been assigned their maximum of 2 reviewers, so they are marked in red.
5. After iteration 5 of the assignment matrix, we assign reviewers to papers whose assignment weight is greater than 0.30. The assignments made in this iteration are R1 → {P4, P7, P10}, R3 → {P9}. The cumulative set of assignments is R1 → {P8, P6, P9, P4, P7, P10}, R2 → {P3, P2, P4, P11, P12, P7, P8}, R3 → {P2, P5, P1, P6, P11, P3, P10, P9}. Note here that papers {P4, P7, P9, P10} have been assigned their maximum of 2 reviewers, so they are marked in red. Also note that the load of reviewer R3 has reached its maximum of 8 papers. For the remaining paper set, we remove R3 and re-calculate.

Here, we have used different font codes to indicate the corresponding paper and reviewer sets assigned after each iteration. We use BOLD when a paper is assigned to a reviewer and ITALIC when a paper satisfies its minimum requirement.
Table A.4
Assignment matrix (Iteration 1).
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 0.26 NA NA 0.34 0.22 0.37 0.338 0.6 0.38 0.311 0.105 0.244
R2 0.25 0.44 0.6 0.42 0.24 0.14 0.38 0.36 0.28 0.2597 0.42 0.417
R3 0.48 0.55 0.4 0.22 0.53 0.48 0.27 0.04 0.33 0.42 0.47 0.343
Table A.5
Assignment matrix (Iteration 2).
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 0.26 NA NA 0.34 0.22 0.37 0.338 0 0.38 0.311 0.105 0.244
R2 0.25 0.44 0 0.42 0.24 0.14 0.38 0.36 0.28 0.2597 0.42 0.417
R3 0.48 0 0.4 0.22 0 0.48 0.27 0.04 0.33 0.42 0.47 0.343
Table A.6
Assignment matrix (Iteration 3).
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 0.26 NA NA 0.34 0.22 0.37 0.338 0 0.38 0.311 0.105 0.244
R2 0.25 0.44 0 0.42 0.24 0.14 0.38 0.36 0.28 0.2597 0.42 0.417
R3 0 0 0.4 0.22 0 0 0.27 0.04 0.33 0.42 0 0.343
Table A.7
Assignment matrix (Iteration 4).
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 0.26 NA NA 0.34 0.22 0.37 0.338 0 0.38 0.311 0.105 0.244
R2 0.25 0 0 0 0.24 0.14 0.38 0.36 0.28 0.2597 0 0
R3 0 0 0 0.22 0 0 0.27 0.04 0.33 0 0 0.343
Table A.8
Assignment matrix (Iteration 5).
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 0.26 NA NA 0.34 0.22 0 0.338 0 0 0.311 0.105 0.244
R2 0.25 0 0 0 0.24 0.14 0 0 0.28 0.2597 0 0
R3 0 0 0 0.22 0 0 0.27 0.04 0.33 0 0 0.343
Table A.9
Final assignment matrix for step 1.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
R1 0.26 NA NA 0 0.22 0 0 0 0 0 0.105 0.244
R2 0.25 0 0 0 0.24 0.14 0 0 0.28 0.2597 0 0
R3 0 0 0 0.22 0 0 0.27 0.04 0 0 0 0.343
Step 5: The remaining papers to be assigned are {P1, P5, P12}. We remove reviewer R3 from the calculation and, as in step 1, re-write the topic similarity matrix and collaborative distance matrix for the remaining papers. We then re-calculate the weights considering only 2 reviewers, R1 and R2.
Table A.10
S(ri, pj).
P1 P5 P12
R1 50 50 80
R2 70 35 NA
Table A.11
CoI(ri, pj).
P1 P5 P12
R1 3 2 2
R2 2 3 NA
Table A.12
Weight matrix (w′(i, j)).
P1 P5 P12
R1 150 100 160
R2 140 105 NA
SUM 290 205 160
Table A.13
Assignment (Iteration 1).
P1 P5 P12
R1 0.51 0.48 1
R2 0.48 0.51 NA
Table A.14
Final assignment (step 5).
P1 P5 P12
R1 0.51 0.48 1
R2 0.48 0.51 NA
Table A.15
Optimal assignment.
Paper Reviewer 1 Reviewer 2
P1 R1 R3
P2 R2 R3
P3 R2 R3
P4 R1 R2
P5 R2 R3
P6 R1 R3
P7 R1 R2
P8 R1 R2
P9 R1 R3
P10 R1 R3
P11 R2 R3
P12 R2 R3
1. After iteration 1 of the assignment matrix, we assign reviewers to papers whose assignment weight is greater than 0.5. The assignments made in this iteration are R1 → {P1, P12}, R2 → {P5}.
2. After the final assignment matrix of step 5, the reviewer-wise sets of assignments (refer to Table A.16) are R1 → {P8, P6, P9, P4, P7, P10, P1, P12}, R2 → {P3, P2, P4, P11, P12, P7, P8, P5}, R3 → {P2, P5, P1, P6, P11, P3, P10, P9}.
3. Equivalently, the paper-wise sets of assignments (refer to Table A.16) are P1 → {R1, R3}, P2 → {R2, R3}, P3 → {R2, R3}, P4 → {R1, R2}, P5 → {R2, R3}, P6 → {R1, R3}, P7 → {R1, R2}, P8 → {R1, R2}, P9 → {R1, R3}, P10 → {R1, R3}, P11 → {R2, R3}, P12 → {R1, R2}.
The final assignment is given in Table A.16 and illustrated in Fig. 7. The assignments are consistent in maximizing topic similarity and minimizing the CoI value while taking the reviewers' assignment load into consideration.
Table A.16
Final assignment of reviewer to paper.
R1 P8 P6 P9 P4 P7 P10 P1 P12
R2 P3 P2 P4 P11 P12 P7 P8 P5
R3 P2 P5 P1 P6 P11 P3 P10 P9
Fig. 7. Paper to reviewer final assignment.
References

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Boyack, K. W., van Eck, N. J., Colavizza, G., & Waltman, L. (2018). Characterizing in-text citations in scientific articles: A large-scale analysis. Journal of Informetrics, 12, 59–73.
Charlin, L., & Zemel, R. (2013). The Toronto paper matching system: An automated paper-reviewer assignment system.
Charlin, L., Zemel, R., & Boutilier, C. (2011). A framework for optimizing paper matching. Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence, 86–95.
Chen, G., & Xiao, L. (2016). Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods. Journal of Informetrics, 10, 212–223.
Conry, D., Koren, Y., & Ramakrishnan, N. (2009). Recommender systems for the conference paper assignment problem. Proceedings of the third ACM conference on recommender systems, 357–360.
Dumais, S. T., & Nielsen, J. (1992). Automating the assignment of submitted manuscripts to reviewers. Proceedings of the 15th annual international ACM SIGIR conference on research and development in information retrieval, 233–244.
Hettich, S., & Pazzani, M. J. (2006). Mining for proposal reviewers: Lessons learned at the National Science Foundation. Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, 862–871.
Jin, J., Niu, B., Ji, P., & Geng, Q. (2018). An integer linear programming model of reviewer assignment with research interest considerations. Annals of Operations Research, 1–25.
Kalmukov, Y., & Rachev, B. (2010). Comparative analysis of existing methods and algorithms for automatic assignment of reviewers to papers. (arXiv preprint). arXiv:1012.2019
Karimzadehgan, M., & Zhai, C. (2009). Constrained multi-aspect expertise matching for committee review assignment. Proceedings of the 18th ACM conference on information and knowledge management, 1697–1700.
Kolasa, T., & Król, D. (2010). ACO-GA approach to paper-reviewer assignment problem in CMS. KES international symposium on agent and multi-agent systems: Technologies and applications, 360–369.
Kolasa, T., & Król, D. (2011). A survey of algorithms for paper-reviewer assignment problem. IETE Technical Review, 28, 123–134.
Kou, N. M., Hou, U. L., Mamoulis, N., & Gong, Z. (2015). Weighted coverage based reviewer assignment. Proceedings of the 2015 ACM SIGMOD international conference on management of data, 2031–2046.
Kou, N. M., Hou U, L., Mamoulis, N., Li, Y., Li, Y., & Gong, Z. (2015). A topic-based reviewer assignment system. Proceedings of the VLDB Endowment, 8, 1852–1855.
Li, X., & Watanabe, T. (2013). Automatic paper-to-reviewer assignment, based on the matching degree of the reviewers. Procedia Computer Science, 22, 633–642.
Liu, L., Gao, X. (2009). Fuzzy weighted equilibrium multi-job assignment problem and genetic algorithm. Applied Mathematical Modelling, 33, 3926–3935.
Liu, X., Suel, T., Memon, N. (2014). A robust model for paper reviewer assignment. Proceedings of the 8th ACM conference on recommender systems, 25–32.
Long, C., Wong, R. C.-W., Peng, Y., Ye, L. (2013). On good and fair paper-reviewer assignment. In 2013 IEEE 13th international conference on data mining
(ICDM) (pp. 1145–1150).
Mimno, D., McCallum, A. (2007). Expertise modeling for matching papers with reviewers. Proceedings of the 13th ACM SIGKDD international conference
on knowledge discovery and data mining, 500–509.
Misevičius, A., Stanevičienė, E. (2018). A new hybrid genetic algorithm for the grey pattern quadratic assignment problem. Information Technology and
Control, 47, 503–520.
Okike, K., Hug, K. T., Kocher, M. S., Leopold, S. S. (2016). Single-blind vs double-blind peer review in the setting of author prestige. JAMA, 316, 1315–1316.
Price, S., Flach, P. A. (2017). Computational support for academic peer review: A perspective from artificial intelligence. Communications of the ACM, 60,
70–79.
Resnick, P., Varian, H. R. (1997). Recommender systems. Communications of the ACM, 40, 56–58.
Roberts, S. G., Verhoef, T. (2016). Double-blind reviewing at EvoLang 11 reveals gender bias. Journal of Language Evolution, 1, 163–167.
Schulzrinne, H. (2009). Double-blind reviewing: More placebo than miracle cure? ACM SIGCOMM Computer Communication Review, 39, 56–59.
Silva, F. N., Amancio, D. R., Bardosova, M., Costa, L. d. F., Oliveira, O. N., Jr. (2016). Using network science and text analytics to produce surveys in a
scientific topic. Journal of Informetrics, 10, 487–502.
Sun, Y.-H., Ma, J., Fan, Z.-P., Wang, J. (2007). A hybrid knowledge and model approach for reviewer assignment. In 2007 40th annual Hawaii
international conference on system sciences (HICSS 2007) (p. 47).
Tang, W., Tang, J., Lei, T., Tan, C., Gao, B., Li, T. (2012). On optimization of expertise matching with various constraints. Neurocomputing, 76, 71–83.
Tang, W., Tang, J., Tan, C. (2010). Expertise matching via constraint-based optimization. In 2010 IEEE/WIC/ACM international conference on web intelligence
and intelligent agent technology (WI-IAT), Vol. 1 (pp. 34–41).
Tayal, D. K., Saxena, P., Sharma, A., Khanna, G., Gupta, S. (2014). New method for solving reviewer assignment problem using type-2 sets and fuzzy
functions. Applied Intelligence, 40, 54–73.
Tomkins, A., Zhang, M., Heavlin, W. D. (2017). Reviewer bias in single- versus double-blind peer review. Proceedings of the National Academy of Sciences
of the United States of America, 114, 12708–12713.
Wang, F., Chen, B., Miao, Z. (2008). A survey on reviewer assignment problem. International conference on industrial, engineering and other applications of
applied intelligent systems, 718–727.
Wang, F., Zhou, S., Shi, N. (2013). Group-to-group reviewer assignment problem. Computers &amp; Operations Research, 40, 1351–1362.
Wang, Y., Liu, B., Zhang, K., Jiang, Y., Sun, F. (2019). Reviewer assignment strategy of peer assessment: Towards managing collusion in self-assignment.
2nd International conference on social science, public health and education (SSPHE 2018).
Wei, X., Croft, W. B. (2006). LDA-based document models for ad-hoc retrieval. Proceedings of the 29th annual international ACM SIGIR conference on
research and development in information retrieval, 178–185.
Xu, Y., Zhao, H., Shi, X., Shah, N. B. (2018). On strategyproof conference peer review. (arXiv preprint). arXiv:1806.06266
Yan, S., Jin, J., Geng, Q., Zhao, Y., Huang, X. (2017). Utilizing academic-network-based conflict of interests for paper reviewer assignment. International
Journal of Knowledge Engineering, 3, 65–73.
Yin, H., Cui, B., Lu, H., Zhao, L. (2016). Expert team finding for review assignment. In 2016 conference on technologies and applications of artificial
intelligence (TAAI) (pp. 1–8).
Zablocki, J., Lee, R. (2012). Auto-assign: An implementation to assign reviewers using topic comparison in START. Proceedings of the international
conference on e-learning, e-business, enterprise information systems, and e-government (EEE), 1.
Zhao, S., Zhang, D., Duan, Z., Chen, J., Zhang, Y.-P., Tang, J. (2018). A novel classification method for paper-reviewer recommendation. Scientometrics, 115,
1293–1313.