A social network-empowered research analytics framework for project selection

Thushari Silva a, Zhiling Guo a,⁎, Jian Ma a, Hongbing Jiang a,b, Huaping Chen b

a Department of Information Systems, City University of Hong Kong, Hong Kong
b School of Management, University of Science and Technology of China and USTC-CityU Joint Advanced Research Centre, Suzhou, PR China

Decision Support Systems 55 (2013) 957–968, http://dx.doi.org/10.1016/j.dss.2013.01.005
Article info: Available online 9 January 2013.
Keywords: Research project selection; Research social networks; Research analytics.

Abstract: Traditional approaches for research project selection by government funding agencies mainly focus on the matching of research relevance by keywords or disciplines. Other research-relevant information, such as the social connections (e.g., collaboration and co-authorship) and productivity (e.g., quality, quantity, and citations of published journal articles) of researchers, is largely ignored. To overcome these limitations, this paper proposes a social network-empowered research analytics framework (RAF) for research project selection. Scholarmate.com, a professional research social network with easy access to research-relevant information, serves as a platform to build researcher profiles along three dimensions, i.e., relevance, productivity and connectivity. Building upon the profiles of both proposals and researchers, we develop a unique matching algorithm to assist decision makers (e.g., panel chairs or division managers) in optimizing the assignment of reviewers to research project proposals. The proposed framework is implemented and tested by the largest government funding agency in China to aid the grant proposal evaluation process. The new system generated significant economic benefits, including substantial cost savings and quality improvement in the proposal evaluation process.

© 2013 Elsevier B.V. All rights reserved.
1. Introduction
There is a steadily growing trend for government funding agencies to support an increasing number of research proposals. For example, there were 42,225 research grant proposals submitted to the National Science Foundation (NSF) in the U.S. in 2010, and the number of submissions for 2012 was estimated to increase to 46,000. The number of proposals submitted to the National Natural Science Foundation of China (NSFC) has increased from 23,636 in 2001 to over 147,000 in 2011. The sheer volume of submissions poses a significant challenge for research project selection due to the difficulty of assigning the most suitable reviewers to the most relevant project proposals.
A research project can be characterized by a set of qualitative and quantitative, tangible and intangible attributes. Management scientists, economists, and IS practitioners have proposed various decision models, methodologies and decision support systems to assist decision-making tasks related to research project selection [13,15,34,35]. Traditional approaches based on mathematical programming and optimization are useful for handling large volumes of submissions, but are less efficient in dealing with subjective judgment and information. Machine learning techniques incorporating fuzzy logic, genetic algorithms and artificial intelligence are capable of learning complex patterns in data, but are limited in their ability to generalize from training data and optimize decisions over the entire decision space. Other traditional approaches involve manually assigning proposals to reviewers based on their claimed expertise, which is neither efficient nor practical given the increasing complexity of decision making faced by funding agencies.
Current computer-based methods mainly consider matching research relevance in terms of keywords or disciplines, while ignoring the social connections (e.g., collaboration and co-authorship) and productivity (e.g., quality, quantity, and citations of published journal articles) of researchers. It is desirable to incorporate all of these aspects into a unified evaluation framework. To achieve this goal, we propose a research analytics framework that is empowered by a research social network (www.scholarmate.com) for effective research project selection. Better identification of social connections can effectively cluster researchers based on topics of interest, methodologies, and research disciplines. Being able to identify community structure in the social network helps us understand and exploit the research network more effectively. On the one hand, such information can be used to identify the most suitable reviewers. On the other hand, it can help avoid conflicts of interest to ensure fair evaluation.
Speciļ¬cally, we propose to deļ¬ne proļ¬les of research entities (e.g.
project proposals, researchers) from three dimensions, i.e. relevance
(e.g., keywords and research disciplines), productivity (e.g., quality,
quantity, and citations of published journal articles), and connectivity
(e.g., project collaborators, co-authors and colleagues). Represented
by visual research CVs, proļ¬les of proposals and potential reviewers
are built by extracting information from multiple sources including
submitted proposals, bibliographic databases (e.g., ISI, Scopus, and
EI), and the research social network (i.e., www.scholarmate.com). By aggregating information along the three dimensions, we construct a unique matching algorithm to assist decision makers (e.g., panel chairs or division managers) in optimizing the assignment of reviewers to research project proposals.
To demonstrate the usability of the proposed framework, we implemented the system to aid China's largest government funding agency in its grant proposal evaluation. The research analytics framework builds upon scientometrics, business intelligence and social network analysis techniques. Its powerful search and data access capabilities provide timely and relevant information in visualized forms for research project evaluation. The implemented system generates significant economic benefits, including cost savings and quality improvement in the proposal evaluation process.
This paper is organized as follows. Section 2 reviews the relevant literature. Section 3 provides an overview of the research analytics framework and the Scholarmate research social network. Section 4 presents the detailed methods for profiling and the algorithms to calculate the key performance indicators. An optimization problem for reviewer assignment is proposed in Section 5. Section 6 reports the evaluation of the proposed system by China's largest government funding agency for its grant proposal evaluation. Section 7 concludes with a summary of contributions and directions for future research.
2. Literature review
The major challenge in reviewer assignment for proposal evaluation
is identifying and recommending the most suitable reviewers who have
a high level of expertise and will make valuable professional judgment
on given proposals [13,38]. In this paper, we propose a profile-based approach to assign reviewers for proposal evaluation.
Previous research has identified two approaches to scientific researcher profiling. One approach relies on subjective, self-claimed information declared by researchers themselves. The other approach is based on objective measurement obtained through automated inferences about a researcher's behavior patterns related to publications and citations derived from relevant resources [37]. The first approach uses qualitative methods (e.g., surveys, questionnaires, or interviews) and traditional information retrieval models (e.g., term-based modeling [3] and rough-set modeling [19]) to gain knowledge of a researcher's interests and build the resulting profiles. The latter approach utilizes various feature selection techniques in machine learning to learn user profiles [10].
Machine learning approaches learn a mapping between an incoming set of documents relevant to user input and real numbers that represent the strength of user preferences. The features of the documents are first extracted by widely used techniques, including information gain [8,21] and the correlation coefficient [32]. The key features are then used as attributes in the mapping functions. Some studies focus on techniques such as neural networks [22], Support Vector Machines (SVM) [11,16,29], K-Nearest Neighbors (K-NN) and logistic regression [6,40] before generating a mapping to a set of real numbers. Li et al. [20] proposed a rough threshold model (RTM) to analyze and extract keywords from scientific publications. In our approach, we augment the original rough threshold model with a phrase analysis algorithm to resolve semantic ambiguity that is not handled by the original rough threshold model for topic generation.
Collaboration networks are a popular type of social network that has been widely studied in the literature [4,5,23]. A property that many social networks have in common is clustering, or network transitivity [2,26,39]. The clustering coefficient is defined as the probability that two of one's friends are friends themselves [7,39]. It typically ranges from 0.1 to 0.5 in many real-world networks. A related concept is a community, in which connections within the same community are dense and connections outside the community are sparse. Community structure in a social network represents real social groupings by interest or background [27]. For example, communities in a citation network represent related papers on a single topic [31].
There are two broad classes of hierarchical clustering methods for detecting the community structure in a social network: agglomerative and divisive [28,30]. The agglomerative approach focuses on finding the strongly connected cores of communities by adding links [24], and the divisive approach uses information about edge betweenness to detect community boundaries by removing links [23]. For example, the Girvan–Newman algorithm [12] is one of the most widely used divisive methods and is effective at discovering the latent groups or communities that are defined by the link structure of a graph. Newman's fast algorithm [25] is an efficient reference algorithm for clustering in large networks. It falls into the general category of agglomerative hierarchical clustering methods. This method can easily be generalized to weighted networks in which each edge has a numeric value indicating link strength. It has been successfully applied to a collaboration network of more than 50,000 physicists. In this study, we adopt Newman's fast algorithm in our research social network analysis.
3. An overview of the RAF and Scholarmate
Research Analytics is the application of methods and theories in
scientometrics, business intelligence and social network analysis to
transform research related data into relevant information in research
management. In this paper we demonstrate the research analytics
framework in the context of reviewer recommendation for research
project selection.
3.1. The RAF for reviewer recommendation
This study takes a profile-based approach to reviewer recommendation. Fig. 1 illustrates the key framework.
Research Online (http://rol.scholarmate.com) is an institutional repository service provided by Scholarmate (http://www.scholarmate.com) to analyze proposals submitted through the Internet-based Science Information System (ISIS, https://isis.nsfc.gov.cn). It helps build standardized visual research CVs of researchers and identify the social groups to which they belong. These steps greatly ease the profiling of proposals and researchers. Key features and attributes, such as the discipline codes and keywords that represent proposals and researchers, are derived from the standard keyword dictionary. Phrase patterns are discovered by data mining the free-text categories of electronic documents from various databases (e.g., ISI, Scopus and EI). Based on the constructed comprehensive profiles of both the proposals and the potential reviewers, the system generates key performance indicators in three dimensions, i.e., relevance, productivity and connectivity. Finally, a matching algorithm that takes all three dimensional measures into account is proposed for reviewer recommendation.
Speciļ¬cally, relevance refers to the keywords, research discipline and
expertise area that are derived from both the researcher's scientiļ¬c pub-
lications and prior funded projects. Productivity is measured by quality,
quantity, citations, and impacts of one's research, as well as other
academic achievements. Connectivity among researchers is inferred
through collaborations, such as collaborators in projects, co-authorship
in publications, and colleagues in the same organizations. Their speciļ¬c
roles in the reviewer assignment process can be demonstrated in Fig. 2.
We will discuss each of them in detail in Section 4.
3.2. Scholarmate research social network
Scholarmate (http://www.scholarmate.com) is a professional research social network that connects people to research with the aim of "innovating smarter". It offers research social network services that help researchers find suitable funding opportunities and potential research collaborators. In addition to its important function of connecting
people with similar interests, Scholarmate has a search tool to help researchers extract their publications directly from existing bibliographic databases (e.g., ISI, Scopus), along with the citations of each paper and the impact factor of the journal. Moreover, Scholarmate provides researchers with the ability to disseminate research outcomes and information about their current interests over established social connections. On the one hand, researchers can use Scholarmate to manage their research outcomes and research in progress, including research proposal preparation. On the other hand, transparency in information sharing among scholars in Scholarmate opens an opportunity for researchers to participate in relevant scholarly activities in a timely manner, such as becoming potential reviewers. For example, a panel chair will be able to judge the recent research expertise of a researcher after analyzing that researcher's knowledge sharing activities in Scholarmate.
In Scholarmate, several types of networks can be constructed, such as citation networks, project collaboration networks and journal article co-authorship networks. An example of a collaboration network is presented in Fig. 3. The numbers beside the nodes are researcher identification numbers (RIDs). The numbers on the edges are the collaboration frequencies of two researchers. The frequency of collaboration is measured in terms of the number of co-authored publications, the number of collaborative projects and the number of co-cited papers extracted through the Scholarmate platform. Three major communities are identified and indicated by the ovals in the figure. The communities are derived according to research expertise. We are also able to identify the top researchers in the social network in terms of connectivity by degree, betweenness, and closeness, as shown in Table 1. The numbers in brackets denote the rankings under the corresponding measures. Researchers who have high ranks in the same community as a principal investigator are identified as potential reviewers, subject to the condition that there is no direct connection between the potential reviewers and the principal investigator. For example, researcher 51 is a principal investigator and researcher 55 is identified as a potential reviewer because these two researchers are in the same community but have no direct collaboration. The fact that both of them have collaborations with researchers 37 and 38 indicates a potential overlap of research interests in some common research areas.
The research social network can enhance data representation in several ways. For example, existing databases only store data about published articles. Working papers that reflect the most recent research activities cannot be obtained by a search in bibliographic databases, but may be available on the social network site. Similarly, a researcher who has secured an industry grant that is relevant to the required reviewer expertise may be suitable to serve as a potential reviewer. However, traditional methods cannot identify this researcher due to the inability to access such information. A social network facilitates real-time information sharing and is therefore effective for this type of information acquisition. Such additional information greatly enhances the completeness and timeliness of our data representation.
4. Proļ¬ling and key indices
In this section, we present a comprehensive representation of the proposal and researcher profiles built from both the available databases and the research social network, based on which three key performance indicators are derived: relevance, productivity, and connectivity. Fig. 4 shows the relationship between the three key performance indicators and their usage in reviewer recommendation.
[Fig. 1. The framework of profile-based reviewer recommendation. Proposal profiling and reviewer profiling (discipline codes, keywords and phrase patterns drawn from the keyword dictionary) are performed on the Research Online platform and the Scholarmate research social network; the research analytics layer derives the relevance, productivity and connectivity indicators that feed the matching algorithm for reviewer recommendation.]
[Fig. 2. Stage diagram for proposal-reviewer recommendation: proposal clustering, selection of eligible reviewers (relevance index), exclusion of conflict of interest (connectivity index), balancing of reviewer expertise (productivity index), and assignment of reviewers.]
Initially the system constructs profiles of proposals (indexed by i) and researchers (indexed by j), respectively. The proposal profiling and reviewer profiling are discussed in detail in Section 4.1. The three key indices are developed as follows. We first use a component-based matching algorithm to calculate the relevance index (r_ij), which denotes the degree of matching between the proposal profile and the reviewer profile. Based on the Scholarmate platform services, we construct the connectivity index (c_ij) via the collaboration network, indicating the frequency of research collaboration among reviewers, PIs and co-PIs. The generated collaboration network is analyzed by identifying communities and their features, such as structure and closeness, and those features are used in the generation of the connectivity index. The connectivity index is used to resolve conflicts of interest and to identify the most relevant reviewers. Finally, we generate the potential reviewers' productivity index (e_j), which considers the quality of publications, research impact and academic achievement. The productivity index is used to balance the expertise of the potential set of reviewers in the optimization program for reviewer recommendation.
4.1. Profiling
In general, profiling is the process of determining key attributes that can be used to characterize a given object. In our project selection context we focus on proposal profiling and researcher profiling. The objective of proposal profiling is to extract proposal-relevant features, and that of researcher profiling is to extract researcher expertise. The quality of profiling directly affects the effectiveness of research project selection. The integration of both subjective and objective information is necessary during the process of profile generation.
We ļ¬rst focus on proposal proļ¬ling. The proposal submitted
through the Internet-based Science Information System (ISIS) has
standard template to be ļ¬lled in up to two discipline codes and ļ¬ve
keywords. We express the self-claimed discipline code (Discode) and
keywords (Key) in the following sequence:
bPropNo; DisCode1; DisCode2; Key1; Key2; ā€¦; Key5 > Ć°1ƞ
where PropNo is the proposal number that uniquely identiļ¬es a proposal.
This sequence can be directly extracted from the proposal.
To verify whether the claimed information is accurate, an objective examination of the proposal title and abstract is necessary. The second type of information is obtained through data mining the title and abstract sections of the proposal. It can be expressed in the following sequence:

$\langle \text{PropNo}, \text{key}_1, \text{key}_2, \ldots, \text{key}_m \rangle \qquad (2)$

Note that we use lower-case key to represent keywords extracted from the non-standard content area (i.e., title and abstract). This set of keywords has some overlap with, but is generally larger than, the standard keyword database defined by the government funding agency. For a fair comparison of any two proposal documents, we extract m keywords from each document. The search algorithm that we discuss later determines the preferred number of keywords. Ideally we could add the whole content of the proposal to obtain the highest-ranked keywords through word frequency analysis. We found that this would increase the computational effort without adding much new insight. Mining the title and the abstract is accurate enough to classify proposals according to their keywords.
We next consider researcher profiling. The funding agency maintains an expert dictionary for the pool of potential reviewers. The expert
[Fig. 3. An example of a collaboration network.]
Table 1
Researchers' connectivity ranking.

RID   Degree   n-Degree   Betweenness   Closeness    Overall
37    11 (1)   0.1930     0.0345 (4)    0.2069 (1)   0.1685 (1)
18     9 (2)   0.1579     0.0459 (1)    0.1787 (3)   0.1459 (2)
27     6 (2)   0.1053     0.0382 (3)    0.1474 (7)   0.1129 (5)
10     6 (4)   0.1053     0.0453 (2)    0.1843 (2)   0.1328 (3)
15     5 (4)   0.0877     0.0143 (9)    0.1685 (4)   0.1134 (4)
31     5 (6)   0.0877     0.0244 (5)    –            –
19     5 (6)   0.0877     0.0169 (7)    –            –
38     5 (6)   0.0877     –             0.1345 (8)   –
52     4 (8)   0.0702     0.0122 (10)   0.1638 (5)   0.1054 (6)
43     4 (8)   0.0702     0.0163 (8)    –            –
34     –       –          0.0207 (6)    –            –
36     –       –          –             0.1340 (9)   –
44     –       –          –             0.1512 (6)   –
dictionary is standardized and the available choices are the same as those in the proposal application. Initially each potential reviewer chooses his/her own disciplines and expertise areas (expressed as keywords). The self-claimed discipline codes (DisCode) and keywords (Key) are expressed in the following sequence:

$\langle \text{ResearcherID}, \text{DisCode}_1, \text{DisCode}_2, \text{Key}_1, \text{Key}_2, \ldots, \text{Key}_5 \rangle \qquad (3)$

where ResearcherID uniquely identifies a potential reviewer.

Each potential reviewer may have successful grants from different funding agencies and may have publications, patents, or awards from various sources. We extract such objective information from several databases and list it as:

$\langle \text{ResearcherID}, \text{GrantNo}, \text{DisCode}_1, \text{DisCode}_2, \text{Key}_1, \text{Key}_2, \ldots, \text{Key}_5 \rangle \qquad (4)$
$\langle \text{ResearcherID}, \text{PubNo}, \text{key}_1, \text{key}_2, \ldots, \text{key}_m \rangle \qquad (5)$
In addition, the potential reviewers may have social tags. Social tags are labels about expertise areas that are maintained by friends or other concerned parties who may know the reviewers well in other capacities. For example, a panel chair may know the research expertise of a reviewer from his/her previous service to the funding agency. Information extracted from reviewers' social tags can be aggregated and expressed as:

$\langle \text{ResearcherID}, \text{key}_1, \text{key}_2, \ldots, \text{key}_m \rangle \qquad (6)$
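As an illustration only, the profile sequences in Eqs. (1)–(6) can be held in simple record types before matching. The Python sketch below is an assumed representation (the field names are ours, not those of the implemented system):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ProposalProfile:
    """Self-claimed and mined attributes of a proposal (Eqs. (1)-(2))."""
    prop_no: str
    discipline_codes: List[str]                 # up to two standard codes
    claimed_keywords: List[str]                 # up to five standard keywords
    mined_keywords: List[str] = field(default_factory=list)  # from title/abstract

@dataclass
class ReviewerProfile:
    """Self-claimed, grant, publication and social-tag attributes (Eqs. (3)-(6))."""
    researcher_id: str
    discipline_codes: List[str]
    claimed_keywords: List[str]
    grant_keywords: Dict[str, List[str]] = field(default_factory=dict)  # grant_no -> keywords
    pub_keywords: Dict[str, List[str]] = field(default_factory=dict)    # pub_no -> mined keywords
    social_tags: List[str] = field(default_factory=list)
```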
4.2. Extracting topic features from texts
During the process of objective information extraction, it is necessary to analyze free-text areas such as the titles and abstracts of electronic documents. The determination of a set of topic features from these text fields follows several steps, including extracting phrases, filtering out non-key phrases, resolving semantic heterogeneity and constructing a keyword dictionary. In this study we combine several techniques, including the Rough Threshold Model and Database Tomography, and develop an algorithm to calculate the document phrase weight distribution.
When extracting information from texts such as the titles and abstracts of funded projects and publications, we first need to build a standard research keyword dictionary. Phrases (combinations of multiple words) rather than single words are used to resolve semantic ambiguity, as single words are rarely sufficient to accurately distinguish standing researcher interests [32]. Generally, phrases carry more meaning than single words. We find that phrases of two to four words are strong enough to capture the meaning effectively.
The free-text category fields of scientific publications (e.g., title, abstract and keywords) are analyzed and technical phrases are extracted using the Database Tomography (DT) process [17,18]. DT is a textual database analysis system that provides algorithms for extracting multi-word phrase frequencies together with their proximities. We apply the DT algorithm to extract all adjacent double, triple and quadruple word phrases from the text (i.e., title, abstract and keywords), along with their frequencies. We discard phrases with extremely high frequencies (not useful for distinguishing documents) and those with extremely low frequencies (not useful for comparing documents). Finally, the remaining phrases are built into the keyword dictionary.
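As a rough sketch of this step (not the actual Database Tomography implementation), the following Python code extracts adjacent two- to four-word phrases from free-text fields, counts their frequencies, and discards phrases outside chosen frequency cut-offs; the tokenization rule and thresholds are illustrative assumptions:

```python
import re
from collections import Counter
from typing import Dict

def extract_phrases(text: str, min_len: int = 2, max_len: int = 4) -> Counter:
    """Count all adjacent 2- to 4-word phrases in a free-text field."""
    words = re.findall(r"[a-z0-9\-]+", text.lower())
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

def filter_phrases(counts: Counter, low: int, high: int) -> Dict[str, int]:
    """Discard phrases that are too rare or too common to discriminate documents."""
    return {p: f for p, f in counts.items() if low <= f <= high}

# Example: build phrase counts over a small corpus of titles, then filter.
corpus = ["Semi-supervised learning with support vector machines",
          "Spectral clustering and support vector machines for large datasets"]
total = Counter()
for doc in corpus:
    total += extract_phrases(doc)
dictionary_phrases = filter_phrases(total, low=2, high=50)  # e.g., keeps "support vector machines"
```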
[Fig. 4. Process model and relationship with key indices in reviewer recommendation. Proposal profiling (self-claimed disciplines and keywords from the proposal; keywords extracted from the proposal title and abstract) and researcher profiling (self-claimed disciplines and expertise from the keyword dictionary; keywords from previously funded proposal and publication titles and abstracts; PI/co-PI information from the research social network, e.g., ScholarMate; weighted publication scores from research databases such as ISI, Scopus and EI; citation scores based on SCI/SSCI searches and the H-index; academic ranking and institutional reputation) feed the three key performance indicators of the RAF: the relevance index, the connectivity index (used to remove conflicts of interest) and the productivity index (used to balance reviewer expertise).]
According to the Rough Threshold Model (RTM) [20], documents are represented in terms of a weight distribution over topic features. We use an augmented RTM topic filtering algorithm to generate topic features from the documents. Specifically, let $P = \{p_1, \ldots, p_m\}$ be the initial set of phrases extracted from all documents $D = \{d_1, d_2, \ldots, d_n\}$. Let $f_{ij}$ be the number of appearances of phrase $p_j$ in document $d_i$. A document $d_i$ can be expressed by a set of phrases with corresponding occurrence frequencies: $d_i = \{(p_1, f_{i1}), \ldots, (p_m, f_{im})\}$.

The initial phrase set of $d_i$ is $rp_i = \{p_j \mid f_{ij} > 0\}$. If two documents have the same phrase patterns, the two initial phrase patterns can be composed. For example, $\{(p_1,1),(p_2,3)\} \oplus \{(p_1,2),(p_2,2)\} = \{(p_1,3),(p_2,5)\}$, where $\oplus$ denotes the composition operation. We can group the initial phrase patterns that have the same phrase sets into clusters and use their composed phrase pattern to represent each cluster. Assume that there are $r < n$ clusters. A cluster can be represented by $crp_r = \{(p_1, cf_{r1}), (p_2, cf_{r2}), \ldots, (p_m, cf_{rm})\}$, where the cluster frequency $cf_{rk} = \sum_{i=1}^{|crp_r|} f_{ik}$, for $k = 1, 2, \ldots, m$, is the composed frequency in the cluster.
We deļ¬ne the support for phrase pattern rpi āˆˆcrpr as follows.
support crpr
Ć° ƞ Ā¼
crpr
j j
D
j j
ư7ƞ
Furthermore, āˆ‘rsupport crpr
Ć° ƞ Ā¼ 1.
The normal form of the cluster phrase pattern can be described by the following association mapping function: $\beta(crp_r) = \{(p_1, w_{r1}), (p_2, w_{r2}), \ldots, (p_m, w_{rm})\}$, where the normalized phrase frequency is defined as:

$$w_{rk} = \frac{cf_{rk}}{\sum_{i=1}^{m} cf_{ri}}, \quad k = 1, 2, \ldots, m. \qquad (8)$$
The relative importance weight of phrase $p_k$ in document $i$ over all documents can be defined as:

$$\beta_{ik} = \sum_{p_k \in rp_i \in \beta(crp_r)} \text{support}(crp_r)\, w_{rk}\, \frac{f_{ik}}{cf_{rk}}. \qquad (9)$$

Document $i$ can alternatively be represented by its phrase weight distribution $\beta_i = \{\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}\}$.
For a given document (i.e., a set of publications and projects), all initial phrase patterns are calculated with their pattern frequencies. The generated patterns are combined to construct clusters, and clusters are labeled using the phrases in the combined patterns. Each pattern frequency in the cluster is normalized and the normalized weights are calculated. Finally, a document that is uniquely represented by its initial phrase patterns can be characterized by its phrase weight distribution across all documents. The algorithm is summarized in Table 2.
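For readers who prefer code, a minimal Python sketch of the same weighting scheme, under our reading of Eqs. (7)–(9), is given below; it groups documents with identical phrase sets, computes cluster support and normalized frequencies, and spreads them back to each document:

```python
from collections import defaultdict
from typing import Dict

def phrase_weight_distribution(docs: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Compute beta_ik for every document i and phrase k, following Eqs. (7)-(9)."""
    n_docs = len(docs)

    # Group documents whose initial phrase sets rp_i are identical (one cluster per set).
    clusters: Dict[frozenset, list] = defaultdict(list)
    for doc_id, freqs in docs.items():
        clusters[frozenset(p for p, f in freqs.items() if f > 0)].append(doc_id)

    beta: Dict[str, Dict[str, float]] = {}
    for phrase_set, members in clusters.items():
        # Composed cluster frequencies cf_rk and cluster support (Eq. (7)).
        cf = {p: sum(docs[d].get(p, 0) for d in members) for p in phrase_set}
        support = len(members) / n_docs
        total_cf = sum(cf.values())
        for d in members:
            beta[d] = {}
            for p in phrase_set:
                w_rk = cf[p] / total_cf                           # Eq. (8)
                beta[d][p] = support * w_rk * docs[d][p] / cf[p]  # Eq. (9)
    return beta
```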
4.3. Relevance
The relevance index is used to determine how well reviewer expertise matches the content of the proposal. It is calculated by matching the proposal and reviewer profiles. The task of profile matching is to decide whether a sequence of key phrases that describes the proposal profile attributes matches the key phrases that represent the reviewer profile attributes. Two widely accepted approaches for calculating the similarity between terms are the Euclidean distance and the cosine similarity measure [14]. For the self-claimed information that is extracted in standard terms, we use the Jaccard similarity measure [1] to perform component-based matching over reviewer and proposal profiles. Data extracted by Eqs. (1), (3) and (4) can be matched using this method.
The Jaccard index between reviewer $i$ and proposal $j$ is expressed as:

$$J_{ij} = \frac{F\left[(\text{Key}_{i1}, \text{Key}_{i2}, \ldots, \text{Key}_{i5}) \cap (\text{Key}_{j1}, \text{Key}_{j2}, \ldots, \text{Key}_{j5})\right]}{F\left[(\text{Key}_{i1}, \text{Key}_{i2}, \ldots, \text{Key}_{i5}) \cup (\text{Key}_{j1}, \text{Key}_{j2}, \ldots, \text{Key}_{j5})\right]} \qquad (10)$$
where $\text{Key}_{ik}$ and $\text{Key}_{jk}$, $k = 1, 2, \ldots, 5$, are the five standard keywords associated with reviewer $i$ and proposal $j$. The numerator denotes the number of keywords in common, and the denominator represents the total number of unique keywords in both profiles. As shown, the Jaccard similarity is measured by the frequency of the intersection divided by the frequency of the union of the two sets of keywords [1].
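A direct Python sketch of Eq. (10) over the two five-keyword lists (a plain set computation that assumes exact string matching of the standardized terms):

```python
def jaccard_index(reviewer_keys: set, proposal_keys: set) -> float:
    """Eq. (10): shared standard keywords over the union of both keyword sets."""
    if not reviewer_keys and not proposal_keys:
        return 0.0
    return len(reviewer_keys & proposal_keys) / len(reviewer_keys | proposal_keys)

# Example: two shared keywords out of eight distinct ones -> 0.25.
print(jaccard_index(
    {"machine learning", "spectral clustering", "svm", "kernels", "semi-supervised"},
    {"machine learning", "svm", "regression", "image processing", "data reduction"}))
```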
To determine the similarity of the non-standard phrase patterns, we adopt the cosine similarity measure. For researcher profile $i$ and proposal profile $j$, the similarity can be calculated as follows [9]:
$$C_{ij} = \frac{\beta_i \cdot \beta_j}{\lVert \beta_i \rVert\, \lVert \beta_j \rVert} = \frac{\sum_{k=1}^{m} \beta_{ik}\,\beta_{jk}}{\sqrt{\sum_{k=1}^{m} \beta_{ik}^{2} \sum_{k=1}^{m} \beta_{jk}^{2}}} \qquad (11)$$
where $\beta_{ik}$ and $\beta_{jk}$ are the normalized frequencies of phrase pattern $p_k$ in the two profiles $i$ and $j$. Phrase patterns extracted by Eqs. (2), (5), and (6) are processed by the algorithm presented in Table 2. The resulting weight distribution is used to derive the similarity measure.
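For the non-standard phrase patterns, Eq. (11) is the usual cosine between two weight distributions; a small sketch over the β dictionaries produced by the previous procedure (the helper name and sparse-dictionary representation are ours):

```python
import math
from typing import Dict

def cosine_similarity(beta_i: Dict[str, float], beta_j: Dict[str, float]) -> float:
    """Eq. (11): cosine between two phrase-weight distributions (missing phrases count as 0)."""
    dot = sum(w * beta_j.get(p, 0.0) for p, w in beta_i.items())
    norm_i = math.sqrt(sum(w * w for w in beta_i.values()))
    norm_j = math.sqrt(sum(w * w for w in beta_j.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0
```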
Note that each researcher may have several grants or publications, and there are different ways to define the similarity measure within each category (grants or publications). The first possibility is to consolidate several documents in the same category into one integrated document that represents the researcher profile in that specific category. The algorithm then generates one weight distribution for the consolidated document, and within each category only one consolidated measure $C_{ij}$ is derived. Another method is to treat the documents separately. The algorithm then produces one weight distribution for each document, and pair-wise similarity measures can be calculated between each of the researcher's documents and the proposal. We then choose the maximum similarity in a category as the final measure of similarity between the proposal and the potential reviewer in that specific category.
Since multiple sources of information, both subjective and objective, need to be aggregated, an appropriate weighting strategy is needed to reflect their relative importance in the overall evaluation [9]. Denote by $r_{ij}$ the degree of matching between proposal $i$ and potential reviewer $j$. An aggregate measure in the relevance dimension can be obtained as follows:

$$r_{ij} = \alpha\, \text{Self}_{ij} + \beta\, \text{Grant}_{ij} + \gamma\, \text{Pub}_{ij} + \delta\, \text{Social}_{ij} \qquad (12)$$

where $\alpha + \beta + \gamma + \delta = 1$.
The four terms refer to self-claimed information, grants, publications, and social tags. Note that the self-claimed information from the proposal (Self) and the social tags that label the potential reviewers (Social) are related to subjective judgment, while grants and publications provide objective measures related to the match between proposals and potential
Table 2
Algorithm to calculate the document phrase weight distribution.

Input: a document set D and a phrase set P
Output: each document's phrase weight distribution β_i = (β_i1, β_i2, …, β_im)

Initialize RP = ∅
for each d_i ∈ D {
    for each p_j ∈ P:
        record d_i = {(p_1, f_i1), …, (p_m, f_im)}
    rp_i = {p_j | f_ij > 0}
    RP = RP ∪ {rp_i}
}
RP = ⊕RP  (compose patterns that share the same phrase set)
Cluster documents into crp_r based on rp_i
Calculate support(crp_r) based on Eq. (7)
crp_r = {(p_1, cf_r1), (p_2, cf_r2), …, (p_m, cf_rm)}; normalize to weights w_rk based on Eq. (8)
for each d_i ∈ D {
    for each p_j ∈ rp_i ∈ crp_r:
        calculate β_ik based on Eq. (9)
}
END
reviewers. Decision makers may assign different weights to aggregate
both subjective and objective information.
As shown in Fig. 5, the proposal-related information, including discipline codes, keywords, abstract and PI, is displayed at the top of the screen. The relevance score is calculated and displayed in the middle. Clicking on each tab shows the matches identified by the system.
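Using the component scores displayed in Fig. 5 (self-claimed 100, grants 40, publications 85, social 80) and the displayed weights (40%, 20%, 20%, 20%), Eq. (12) reproduces the overall relevance score of 81:

```python
# Weighted aggregation of Eq. (12) with the weights displayed in Fig. 5.
alpha, beta, gamma, delta = 0.4, 0.2, 0.2, 0.2      # alpha + beta + gamma + delta = 1
self_ij, grant_ij, pub_ij, social_ij = 100, 40, 85, 80
r_ij = alpha * self_ij + beta * grant_ij + gamma * pub_ij + delta * social_ij
print(r_ij)  # 81.0
```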
Efļ¬ciency of the matching algorithm can be calculated in terms of
time complexity. The algorithm requires a single traverse through all
the set of reviewer proļ¬les for each proposal. The matching between
pre-generated subjective, objective and social information patterns in
proposal and reviewer proļ¬les requires O(nāˆ—m) in its worst-case,
where n is the number of proposals and m is the number of reviewers.
The proposals are clustered according to their disciplines and a set of
proposals is matched against all the reviewers. In order to reduce the
computational complexity of the algorithm, proļ¬les of reviewers and
proposals are constructed beforehand.
4.4. Connectivity
The nature of the connections between reviewers, PIs and co-PIs is very important when assigning reviewers to evaluate proposals. Having the same expertise as the principal investigators and having no direct personal conflict with the PIs are essential constraints that should be satisfied by the reviewers. Thus, in this study we utilize social network analysis concepts, such as community structure and the closeness of individuals in the same community, to discover non-trivial relationships among researchers. After analyzing the individuals in one community, we are able to identify groups of individuals who have similar research interests, who are active in the corresponding research area, and who have close connections with PIs or co-PIs. Such information is then used to remove conflicts of interest and to aid the preferential assignment of the most relevant reviewers.
Several types of networks can be constructed using the available social network data in Scholarmate. For example, we can represent scientific papers as vertices in a graph, where vertices are connected by an edge when one paper cites the other. Alternatively, we can construct the researcher network. Each researcher is represented as a vertex in a graph. An edge is built when one researcher cites another's work (a directed citation network), or when one researcher co-authors with another researcher (an undirected collaboration network). We define the edge weight as the number of citations or collaborations between two researchers. A higher weight implies stronger connectivity between the two researchers.
We use graph clustering methods to detect communities in these graphs. Hierarchical clustering is a traditional method for detecting community structure. Here we focus on the collaboration network. We first assign a weight u_ij to each pair of vertices in the network, defined as the frequency of collaboration between the two researchers, which therefore represents how closely the researchers are connected. By analyzing the implicit community structure and estimating the strength of ties between individuals, we are able to discover nontrivial patterns of interaction in scientific collaboration networks.
Assume that there are $s$ predefined communities. Define $u_{IJ}$ as the fraction of collaboration frequency between researchers in community $I$ and those in community $J$. Denote $a_I = \sum_J u_{IJ}$, which represents the
[Fig. 5. Matching between a proposal and a reviewer to calculate the relevance score. The screen shows proposal 53361479 with discipline code F020508 (pattern recognition theory and application), keywords (machine learning, semi-supervised learning, spectral clustering, support vector machine), its abstract and principal investigator, the reviewer's matched grants (60775045, "Data Reduction Method for Machine Learning", discipline code F030504, data mining and machine learning; 61033013, "Theories and Technologies of Image Invariant Features Based on Cognitive Models", discipline codes F0205 and F020512), and the relevance scores: overall 81, self-claimed 100 (weight 40%), grants 40 (20%), publications 85 (20%), social 80 (20%).]
weighted fraction of edges that connect to vertices in community $I$ (i.e., the fraction of collaborations between researchers in community $I$ and researchers in other communities). Newman's fast algorithm is based on the idea of modularity [25]. Following this approach, we define the modularity measure for a network with $s$ communities as:

$$Q_s = \sum_{I=1}^{s} \left( u_{II} - a_I^{2} \right) \qquad (13)$$
where $u_{II}$ is the weighted fraction of edges in the network that connect vertices in the same community. A high value of $Q_s$ represents a good community division. However, optimizing $Q_s$ over all possible divisions is infeasible in practice for networks larger than about thirty vertices. Various approximation methods are available, such as simulated annealing and genetic algorithms. Here a standard "greedy" optimization algorithm is used. The algorithm that determines the optimal community structure takes the steps shown in Table 3.
The algorithm starts with n communities, where n is the total number of nodes in the collaboration network: each vertex is initially the sole member of a distinct community, and the algorithm iteratively merges pairs of communities that are connected by edges. The time taken to join any pair of communities is at most m, where m is the total number of edges in the graph. The change in $Q_s$ can be calculated in constant time in each iteration. Following a join, the elements of the matrix W are updated by adding together the rows and columns corresponding to the joined communities. Each step therefore takes worst-case time O(n+m), and at most n−1 joins are required for the algorithm to complete. The time complexity of the algorithm is thus O((n+m)n), or O(n²) for a sparse network.
Since the value of $Q_{|W|}$ is calculated in each iteration, finding the optimal community structure is straightforward. The hierarchical clustering method also enables us to define the community structure at the required level of granularity. To determine the connectivity, we extract all principal investigators and other members of proposal i. If none of them is in the same community as potential reviewer j, we deem the reviewer not to be an ideal candidate to review the proposal and label $g_{ij} \ll 1$ to indicate a mismatch. Otherwise, we label $g_{ij} = 1$, indicating a high goodness of fit.
Resolving conflicts of interest is an important step in the reviewer assignment process. For example, to ensure an objective review of the proposal, the government funding agency requires that applicants and reviewers should not have had a co-author relationship in the last five years. A conflict of interest can be immediately identified by a direct link in our collaboration network. If any of the primary members of proposal i has a conflict of interest with potential reviewer j, we label $c_{ij} = 0$, enforcing a "No" decision in the reviewer assignment.
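To make the two connectivity indicators concrete, the hedged sketch below builds a small weighted collaboration graph, detects communities with networkx's greedy modularity routine (the same CNM family as Newman's fast algorithm), and derives c_ij and g_ij for one PI-reviewer pair; the edge weights and the mismatch value for g_ij are illustrative assumptions:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Weighted collaboration network: edge weight = collaboration frequency u_ij (assumed values).
G = nx.Graph()
G.add_weighted_edges_from([(37, 51, 3), (37, 55, 2), (38, 51, 1), (38, 55, 2), (18, 10, 4)])

# Communities via greedy modularity maximization over edge weights.
communities = list(greedy_modularity_communities(G, weight="weight"))
community_of = {v: idx for idx, comm in enumerate(communities) for v in comm}

def connectivity_scores(pi: int, reviewer: int, mismatch: float = 1e-3):
    """Return (c_ij, g_ij) for one PI and one potential reviewer."""
    # Conflict of interest: a direct collaboration link disqualifies the reviewer (c_ij = 0).
    c_ij = 0 if G.has_edge(pi, reviewer) else 1
    # Preferential weight: g_ij = 1 when PI and reviewer share a community, small otherwise.
    g_ij = 1.0 if community_of.get(pi) == community_of.get(reviewer) else mismatch
    return c_ij, g_ij

print(connectivity_scores(pi=51, reviewer=55))  # (1, 1.0): same community, no direct link
```

This mirrors the example around Fig. 3, where researcher 55 is an eligible reviewer for PI 51 because both collaborate with researchers 37 and 38 but never with each other.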
4.5. Productivity
The productivity index is calculated for potential reviewers and indicates their contribution to the field. For fair and unbiased project selection, productivity needs to be balanced among the reviewers who are assigned to evaluate the same proposals. We measure the productivity of a potential reviewer in terms of the number of publications, the quality of the publications and the citation impact over the past five years. The productivity index is computed by aggregating the quality and quantity of publications.
Generally, academic journals are classified into different disciplines and are assigned a rank, such as level A, level B or level C journals. As in [33], we assume that the journal rank reflects the quality of the articles published in that journal, since journal rank is widely used in research performance measurement related to merit increases and to the allocation of research funding in university settings [36]. Following [33], we adopt a weighted scheme to generate the productivity index as a measure of a researcher's overall contribution to the field. Let $q_{ij}$ be reviewer $j$'s total number of publications in rank-$i$ journals, where $i = A, B, C$. The publication score of reviewer $j$ is expressed as:

$$G_j = w_A q_{Aj} + w_B q_{Bj} + w_C q_{Cj} \qquad (14)$$

where $w_A > w_B > w_C$, indicating the emphasis on quality work. There are different ways to define the weights. For example, the average impact factor of all the journals classified at the same level can be used to define the corresponding weight.
Professional titles (e.g., senior scholars such as Professor and Associate Professor, or junior scholars such as Assistant Professor) and the H-index can also be taken into consideration when recommending reviewers for proposals. We may assign higher rank scores to higher professional titles. Let $R_j$ and $H_j$ be potential reviewer $j$'s rank score and H-index, respectively. An integrated research productivity measure can be obtained as follows:

$$e_j = u G_j + v R_j + t H_j \qquad (15)$$

where $u + v + t = 1$.
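A toy calculation of Eqs. (14) and (15); the journal-level weights, publication counts, rank score, H-index and aggregation weights below are illustrative assumptions, not values prescribed by the funding agency:

```python
# Publication score (Eq. (14)): counts in level A/B/C journals, with w_A > w_B > w_C.
w = {"A": 5.0, "B": 3.0, "C": 1.0}      # e.g., average impact factors per journal level
q = {"A": 2, "B": 4, "C": 6}            # reviewer j's publication counts by level
G_j = sum(w[level] * q[level] for level in w)   # 5*2 + 3*4 + 1*6 = 28.0

# Integrated productivity (Eq. (15)) with u + v + t = 1.
u, v, t = 0.5, 0.2, 0.3
R_j, H_j = 3, 15                        # rank score (e.g., Professor) and H-index
e_j = u * G_j + v * R_j + t * H_j       # 14.0 + 0.6 + 4.5 = 19.1
print(G_j, e_j)
```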
5. Assigning reviewers for proposal evaluation
The reviewer assignment process deals with assigning reviewers to evaluate proposals in specific discipline areas. Current practice is the manual matching of proposals to reviewers based on their declared expertise. This is inefficient because subjective expertise judgment alone is insufficient to determine reviewer expertise, as it lacks objective evidence. We introduce the relevance measure to balance the self-claimed expertise and the expertise induced from the derived objective information. The key objective is to maximize the relevance between proposals and potential reviewers.
Because the quality of evaluation largely depends on the experience and judgment of the reviewers, there is a need to balance expertise among the reviewers who are assigned to the same proposal. For example, senior scholars tend to place more weight on the innovativeness of a proposal than their junior counterparts, while junior scholars tend to place more weight on methodological rigor than senior fellows. Let e be the desired average productivity level of the potential reviewers. This can be determined by the relevant decision makers, such as panel chairs or division managers. We want the average reviewer expertise levels to be close enough to this desired level. For example, if a potential reviewer is a junior scholar whose e_j is significantly lower than e, then the proposal would need a senior scholar whose
Table 3
Algorithm to cluster the collaboration network into communities.

Step 1. Initially there are n vertices representing researchers, and u_ij is the collaboration frequency between researchers i and j. Each vertex starts as the sole member of a distinct community. Calculate the within- and between-community collaboration fractions u_II and u_IJ, and form the matrix

$$W = \begin{pmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{nn} \end{pmatrix}.$$

Calculate $a_I$.

Step 2. Calculate $\Delta Q_{IJ} = u_{IJ} + u_{JI} - 2 a_I a_J$. Choose $(I, J) = \arg\max \Delta Q_{IJ}$ to join if $\Delta Q_{IJ} \ge 0$, or $(I, J) = \arg\min \Delta Q_{IJ}$ if $\Delta Q_{IJ} < 0$.

Step 3. Update the matrix elements $u_{IJ}$ by adding together the rows and columns corresponding to the joined communities. Update $a_I$. Calculate $Q_{|W|}$ according to Eq. (13).

Step 4. Repeat steps 2 and 3 to join communities in pairs until all vertices are joined.

Step 5. The optimal community structure is determined by $s = \arg\max Q_{|W|}$.
productivity measure is significantly higher than e to review the proposal.
First we construct a network model in which each proposal and each potential reviewer is represented as a node. The potential reviewer node is called the supply node, and the proposal node is called the demand node. Assume that there is a set I of proposals and a set J of potential reviewers. Let x_ij be the binary decision variable indicating the assignment of proposal i to potential reviewer j: x_ij = 1 implies a recommended assignment and x_ij = 0 implies that the assignment is not recommended. We maximize the relevance subject to flow constraints that reflect management's requirements for the reviewer assignment. The optimization problem can be expressed as:
expressed as:
Max āˆ‘
iāˆˆI
āˆ‘jāˆˆJcijgijrijxij
s:t: āˆ‘jāˆˆJxij ā‰„ b for iāˆˆI
āˆ‘iāˆˆIxij ā‰¤ d for jāˆˆJ
āˆ‘jāˆˆJ ejxijāˆ’e
 
ā‰¤ Īµ for iāˆˆI
xijāˆˆ 0; 1
f g for iāˆˆI; jāˆˆJ
: ư16ƞ
The coefļ¬cients in the objective function ensure that we maximize
the overall relevance measure in the reviewer and proposal pools. cij
is the indicator variable to remove conļ¬‚ict of interest, and gij is the
coefļ¬cient for preferential assignment of reviewers in the same scien-
tiļ¬c research community.
The ļ¬rst set of constraints ensures that each proposal has at least b
reviewers. The second set of constraints guarantees that each reviewer
cannot review more than d proposals. In practice, usually b=3 and d=
20. The third set of constraints is used to balance reviewer expertise.
Note that, Īµ0 is the tolerance level that can be chosen by the panel
chair or the management team.
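One way to solve the program in Eq. (16) is with an off-the-shelf MILP solver. The sketch below uses PuLP, which is our assumption (the paper does not name a solver), and linearizes the absolute-value expertise-balance constraint as two inequalities:

```python
import pulp

def assign_reviewers(score, e, e_bar, b=3, d=20, eps=5.0):
    """score[i][j] = c_ij * g_ij * r_ij; e[j] = productivity of reviewer j; eps is assumed."""
    I, J = range(len(score)), range(len(score[0]))
    prob = pulp.LpProblem("reviewer_assignment", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", (I, J), cat="Binary")

    # Objective: maximize overall conflict-free, community-weighted relevance.
    prob += pulp.lpSum(score[i][j] * x[i][j] for i in I for j in J)

    for i in I:
        prob += pulp.lpSum(x[i][j] for j in J) >= b             # at least b reviewers per proposal
        # |sum_j e_j x_ij - e_bar| <= eps, written as two linear constraints.
        prob += pulp.lpSum(e[j] * x[i][j] for j in J) <= e_bar + eps
        prob += pulp.lpSum(e[j] * x[i][j] for j in J) >= e_bar - eps
    for j in J:
        prob += pulp.lpSum(x[i][j] for i in I) <= d             # at most d proposals per reviewer

    prob.solve()
    return [[int(x[i][j].value()) for j in J] for i in I]
```

The defaults b = 3 and d = 20 follow the values quoted in the text; the tolerance eps is a placeholder for the level chosen by the panel chair.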
To implement this model, we first analyze the community structures to remove conflicts of interest and to identify potential reviewers. Next, we calculate the relevance degree between the reviewers and the proposals such that the PIs of the proposals belong to the same communities as their potential reviewers. The calculated relevance degrees are sorted, and reviewers with high relevance degrees are selected to evaluate those proposals. Finally, productivity among the reviewers assigned to one proposal is balanced and the workload is evenly distributed among reviewers.
To achieve a higher degree of computational performance, the collaboration networks for the reviewers and the PIs of the proposals under each division are constructed, and the optimal numbers of communities are derived, before the reviewer assignment process is carried out. First, the time complexity of traversing the community graph for the connectivity index calculation is O(n0 + m0), where n0 is the total number of nodes in one community and m0 is the number of connections between individuals. Second, it requires O(n0·m0) time to scan through the whole set of reviewers and proposals when generating the relevance degree matching. Third, the time complexity of sorting the end result is O(n1 log n1), where n1 is the number of matching results. Finally, the time complexity of balancing the productivity of reviewers in the same group is O(1) and is negligible. In summary, the worst-case computational complexity of the proposed technique is O(n0 + m0 + n0·m0 + n1 log n1).
6. Implementation and evaluation
The proposed research analytics framework has been implemented to aid the largest government funding agency in China in its grant proposal evaluation. The agency aims at funding scientific research projects that could make a large social impact. The organizational hierarchy of the funding agency consists of one general office, five bureaus, and eight scientific departments. These departments are responsible for funding and managing research projects. Each department is further divided into divisions that focus on more specific research areas.
There is intense competition for research project funding, with the most recent funding rate only 21% in 2011. The government funding agency received around 147,000 and 170,000 proposals in 2011 and 2012, respectively. Proposals are spread over many scientific disciplines. These conditions make it difficult for the evaluation committee to participate directly in every project evaluation. The committee groups the proposals into different areas and delegates its authority to groups of experts according to research area. Each area may consist of multiple related disciplines. For example, Business is an area that includes Management Science, Information Systems, and other business disciplines. There is a general budget to be distributed among the areas. The distribution of funds is not uniform and reflects priorities set by the evaluation committee of the funding agency. The distribution can be adjusted based on the quality and quantity of proposals submitted to each area.
Research project selection is a process that involves the multiple phases illustrated in Table A in Appendix A. To facilitate project selection, the government funding agency has established an evaluation system that includes peer review and expert panel evaluation. Division managers assign and invite external reviewers and panel experts to evaluate the proposals. The reviewers judge the quality of each project proposal based on their expertise and professional experience, and on the norms and criteria set by the funding agency. As such, reviewer assignment is the most important phase affecting the quality and efficiency of research project selection.
We provide computerized support for the second phase of research project selection. In the prototype implementation of our system, the distribution of funds is out of the scope of this study; our focus is the reviewer assignment recommendation. We have tested different subsets of proposals and reviewers. The system computes the matching score in the relevance dimension for each pair of proposal and potential reviewer. The final assignment problem can be solved in a reasonable amount of time. The solution is recommended to the review panels in their respective divisions. The review panels examine the recommendation and have the right to either accept or reject our recommended assignment. Additionally, we provide data visualization to help managers view the assignment progress. Fig. 6 shows an example of the visualization.
Overall, it takes a maximum of 6 hours to compute the matching degrees of 34,000 proposals and 30,000 reviewers, which is the largest number of proposals received by a single department of the government funding agency. Thus, if we use parallel and distributed computing for the assignment optimization in each department (there are eight departments in total), we can finish the recommendation task within 6 hours. This greatly improves work efficiency, as the manual process of assigning reviewers usually takes up to two weeks to complete.
The quality of the recommendations has been acknowledged by the review panels. The profile-based recommendation takes into consideration detailed information on relevance, productivity and connectivity. It can avoid conflicts of interest and provide decision makers with the most relevant information, which can hardly be obtained by manual processes. The largest government funding agency has agreed to adopt our recommendation system in the next round of proposal evaluation.
7. Conclusion
Building upon a research analytics framework, this study presents a new approach for research project selection in a research social network environment. We built profiles of research entities (e.g., research proposals, reviewers) from three aspects: relevance, productivity and connectivity. Information for building the profiles of research entities can be obtained from the research social network (Scholarmate). Degrees of matching based on the profiles of research
entities can be calculated by aggregating subjective, objective and social information collected from multiple sources. We implemented the system to aid the largest funding agency in China in optimizing reviewer recommendation and supporting reviewer assignment. The implementation results showed that the proposed method greatly improved work efficiency.
Our approach can easily be generalized to support different types of recommendations in a research social network environment. A direct application is journal article review. Based on the analysis of article features, our system can be used to select the initial pool of reviewers, calculate the degree of match between potential reviewers and the article, remove conflicts of interest, balance reviewer expertise and productivity, and make final reviewer assignment recommendations. The process can be automated and monitored by journal editors. In comparison with current practice, which mainly relies on editors' subjective judgment facilitated by automated search tools, our system has the ability to optimize reviewer recommendation empowered by more social functionalities. Improved accuracy and work efficiency can be expected.
Other potential applications include recommending funding opportunities, publication outlets for research articles, and potential research collaborators. For example, researchers can easily promote their recently published articles using social tools in the form of likes, tweets, shares, and more. They can even track results when their articles are cited by others. Meanwhile, the system may recommend researchers who work in the same research areas to each other, within and across different research communities. Based on a researcher's profile, the research social network may also recommend journals that have published relevant topics as potential outlets for working papers. All of these functions are very useful for promoting the timely distribution and targeted dissemination of research work.
There are a number of limitations and possible future research directions. First, a research project has various attributes that can potentially influence both the impact and the probability of success of the project. We do not model the decision makers' preferences, beliefs, priorities, or risk attitudes. Presumably, the reviewer assignment decision problem can be modeled as a multi-objective decision problem.
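One hedged way to write such a formulation, reusing the goodness of fit gij, the productivity index ej and the desired average productivity level from the notation table in Appendix A (the per-proposal reviewer count q, the workload cap b and the constraint structure are illustrative assumptions), is:

```latex
\max_{x_{ij}\in\{0,1\}} \; \sum_{i}\sum_{j} g_{ij}\,x_{ij}
\quad \text{s.t.} \quad
\sum_{j} x_{ij} = q \;\;\forall i, \qquad
\sum_{i} x_{ij} \le b \;\;\forall j, \qquad
\frac{1}{q}\sum_{j} e_{j}\,x_{ij} \ge \bar{e} \;\;\forall i,
```

where x_ij = 1 if proposal i is assigned to reviewer j and \bar{e} denotes the desired average productivity level set by panel chairs or division managers. Additional objectives reflecting decision makers' priorities or risk attitudes could be added as weighted terms to obtain a genuinely multi-objective formulation.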
Second, our proposed framework only focuses on the evaluation of individual projects without building a portfolio of the most promising projects among all submitted proposals. The portfolio of projects to be funded and the amount to be awarded to each project are beyond the scope of this research. Project evaluation, like product review, is highly subjective, and there is no feedback mechanism in the current framework to assess the quality of reviews. Historical records of funded projects, including their relevant characteristics, the evaluations given by reviewers, and the research output measured by publications, could be valuable for better evaluating new proposals and for selecting unbiased reviewers. Future extensions of the research framework may take these aspects into account.
Finally, the power of Scholarmate lies in its ability to extract and aggregate information from multiple sources. We need to continuously improve the search tool to meet the increasing search needs of users. Moreover, standardization of the keyword dictionary can greatly help phrase pattern recognition. While we keep evaluating and updating the keyword dictionary based on feedback on algorithm performance, we are aware that social voting is another efficient approach to identify relevant keywords and remove less meaningful ones. We have implemented many social tools to aid system improvement. The ultimate goal is to promote a healthy research environment for researchers to engage in innovative research production.
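As a simple illustration of the social-vote idea (the vote threshold and the data layout are assumptions for this sketch, not the platform's actual mechanism), a keyword could be kept in the dictionary only when enough researchers endorse it:

```python
from collections import Counter

def filter_keywords_by_votes(votes, min_votes=3):
    """Keep only keywords endorsed at least `min_votes` times.
    `votes` is an iterable of (researcher_id, keyword) endorsement pairs."""
    counts = Counter(keyword for _, keyword in votes)
    return {kw for kw, n in counts.items() if n >= min_votes}

# Example: a rarely endorsed (misspelled) keyword is dropped from the dictionary
votes = [(1, "social network analysis"), (2, "social network analysis"),
         (3, "social network analysis"), (4, "data minning")]
print(filter_keywords_by_votes(votes))  # {'social network analysis'}
```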
Acknowledgment
This research is partially funded by the General Research Fund of the
Hong Kong Research Grants Council (Project No: CityU 119611), the
National Natural Science Foundation of China (Project Nos: 71171172
and J1124003) and the City University of Hong Kong (Project No:
6000201).
Fig. 6. Visualization of the reviewer assignment progress. [Figure omitted: a management dashboard covering project clustering, reviewer clustering, reviewer assignment, reviewer invitation and review status, with counts of proposals and reviewers at each stage (e.g., clustered, assigned, invited, logged in, in progress, returned, unreturned, declined).]
Appendix A

Table A
Research project selection process at the government funding agency.

Phases in the R&D project selection process and their key decisions:
Call for proposal and proposal submission: 1) check the validity of the submitted proposal content; 2) verify fulfillment of application requirements by the principal investigator and by the proposal.
Identifying the most suitable external reviewers for proposal evaluation: 1) select potential reviewers based on claimed expertise; 2) assign external reviewers to validated proposals based on predefined criteria; 3) transfer proposals to the responsible divisions.
Peer review: 1) review the quality and content of proposals by external reviewers based on the provided guidelines; 2) validate the review content; 3) coordinate with external reviewers and complete the review process as scheduled.
Review results aggregation: 1) aggregate the review results, transform them into comparable measurements, and rank the proposals accordingly; 2) recommend proposals for panel evaluation.
Panel evaluation: 1) refine the suggested proposal list by making decisions on marginal proposals by a panel of experts; 2) suggest the list of projects to be funded.
Final decision making: 1) consider exceptional cases; 2) recommend the list of projects to be funded.

Table B
Table of notation.

Profiling
P = {p1, …, pm}: initial set of m phrases
D = {d1, d2, …, dn}: initial set of n documents
r: number of clusters
fik: occurrence frequency of phrase k in document di, k = 1, 2, …, m
rpi: initial phrase set of document di
crpr: cluster of phrases
support(crpr): supporting measure of cluster crpr
wrk: normalized phrase frequency for cluster r, k = 1, 2, …, m
βik: relative importance weight of phrase pk in document i, k = 1, 2, …, m
β(crpr): normal form of the cluster phrase patterns
βi = {βi1, βi2, …, βim}: phrase weighted distribution of document i

Relevance index rij
Jij: Jaccard similarity index of proposal i and reviewer j
Cij: cosine similarity index of proposal i and reviewer j

Connectivity index cij
uij: collaboration frequency between researchers i and j
uIJ: collaboration frequency among researchers in community I and those in community J
aI: weighted fraction of edges that connect to vertices in community I
Qs: modularity measure for a network with s communities
gij: goodness of fit between proposal i and reviewer j

Productivity index ej
Gj: potential reviewer j's publication score
Rj: potential reviewer j's academic rank
Hj: potential reviewer j's H-index
ej: potential reviewer j's productivity measure
e: desired average productivity level determined by panel chairs or division managers

References
[1] H. Abe, S. Tsumoto, Analysis of research keys as temporal patterns of technical term usages in bibliographical data, in: A. An, P. Lingras, S. Petty, R. Huang (Eds.), Active Media Technology, 6335, Springer, Berlin Heidelberg, 2010, pp. 150–157.
[2] E.M. Airoldi, X. Bai, K.M. Carley, Network sampling and classification: an investigation of network model representations, Decision Support Systems 51 (3) (2011) 506–518.
[3] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, second edition, Addison-Wesley, Wokingham, UK, 2011.
[4] A. Bajaj, R. Russell, AWSM: allocation of workflows utilizing social network metrics, Decision Support Systems 50 (1) (2010) 191–202.
[5] A.L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, T. Vicsek, Evolution of the social network of scientific collaborations, Physica A: Statistical Mechanics and Its Applications 311 (3) (2002) 590–614.
[6] J.P. Caulkins, W. Ding, G.T. Duncan, R. Krishnan, E. Nyberg, A method for managing access to web pages: filtering by Statistical Classification (FSC) applied to text, Decision Support Systems 42 (1) (2006) 144–161.
[7] J. Choi, S. Yi, K.C. Lee, Analysis of keyword networks in MIS research and implications for predicting knowledge evolution, Information & Management 48 (8) (2011) 371–381.
[8] Y. Dang, Y. Zhang, P.J. Hu, S.A. Brown, H. Chen, Knowledge mapping for rapidly evolving domains: a design science approach, Decision Support Systems 50 (2) (2011) 415–427.
[9] Y. Dong, Z. Sun, H. Jia, A cosine similarity-based negative selection algorithm for time series novelty detection, Mechanical Systems and Signal Processing 20 (6) (2006) 1461–1472.
[10] W. Fan, M.D. Gordon, P. Pathak, Effective profiling of consumer information retrieval needs: a unified framework and empirical comparison, Decision Support Systems 40 (2) (2005) 213–233.
[11] M.A.H. Farquad, I. Bose, Preprocessing unbalanced data using support vector machine, Decision Support Systems 53 (1) (2012) 226–233.
[12] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America 99 (12) (2002) 7821–7826.
[13] A.D. Henriksen, A.J. Traynor, A practical R&D project-selection scoring tool, IEEE Transactions on Engineering Management 46 (2) (1999) 158–170.
[14] E. Herrera-Viedma, C. Porcel, Using incomplete fuzzy linguistic preference relations to characterize user profiles in recommender systems, Ninth International Conference on Intelligent Systems Design and Applications, ISDA '09, 2009, pp. 90–95.
[15] C.C. Huang, P.Y. Chu, Y.H. Chiang, A fuzzy AHP application in government-sponsored R&D project selection, Omega 36 (6) (2008) 1038–1052.
[16] T. Joachims, A statistical learning model of text classification with support vector machines, Proceedings of ACM SIGIR'01, 2001, pp. 128–136.
[17] R.N. Kostoff, J.A. Del Rio, J.A. Humenik, E.O. Garcia, A.M. Ramirez, Citation mining: integrating text mining and bibliometrics for research user profiling, Journal of the American Society for Information Science and Technology 52 (13) (2001) 1148–1156.
[18] R.N. Kostoff, T. Braun, A. Schubert, D.R. Toothman, J.A. Humenik, Fullerene data mining using bibliometrics and database tomography, Journal of Chemical Information and Computer Sciences 40 (Jan–Feb 2000) 19–39.
[19] Y. Li, C. Zhang, J.R. Swan, An information filtering model on the web and its application in job agent, Knowledge-Based Systems 13 (5) (2000) 285–296.
[20] Y. Li, X. Zhou, P. Bruza, Y. Xu, R.Y.K. Lau, A two-stage decision model for information filtering, Decision Support Systems (2011), http://dx.doi.org/10.1016/j.dss.2011.11.005.
[21] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, NY, 1997.
[22] J. Mostafa, W. Lam, Automatic classification using supervised learning in a medical document filtering application, Information Processing and Management 36 (3) (2000) 415–444.
[23] M.E.J. Newman, The structure of scientific collaboration networks, Proceedings of the National Academy of Sciences of the United States of America 98 (2001) 404–409.
[24] M.E.J. Newman, Coauthorship networks and patterns of scientific collaboration, Proceedings of the National Academy of Sciences of the United States of America 101 (Suppl. 1) (2004) 5200–5205.
[25] M.E.J. Newman, Fast algorithm for detecting community structure in networks, Physical Review E 69 (6) (2004).
[26] G. Oestreicher-Singer, A. Sundararajan, Recommendation networks and the long tail of electronic commerce, MIS Quarterly 36 (1) (2012) 65–83.
[27] J. Qiu, Z. Lin, A framework for exploring organizational structure in dynamic social networks, Decision Support Systems 51 (4) (2011) 760–771.
[28] S. Raghuram, P. Tuertscher, R. Garud, Research note: mapping the field of virtual work: a cocitation analysis, Information Systems Research 21 (4) (2010) 983–999.
[29] S. Robertson, I. Soboroff, The TREC 2002 Filtering Track Report, TREC, 2002.
[30] J. Scott, Social Network Analysis: A Handbook, Sage Publications, London, 2000.
[31] N. Shibata, Y. Kajikawa, I. Sakata, Measuring relatedness between communities in a citation network, Journal of the American Society for Information Science and Technology 62 (7) (2011) 1360–1369.
[32] T. Strzalkowski, Robust text processing in automated information retrieval, Proceedings of the 4th Applied Natural Language Processing Conference (ANLP), 1994, pp. 168–173.
[33] Y.H. Sun, J. Ma, Z. Fan, J. Wang, A group decision support approach to evaluate experts for R&D project selection, IEEE Transactions on Engineering Management 55 (1) (2008) 158–170.
[34] Y.H. Sun, J. Ma, Z.P. Fan, J. Wang, A hybrid knowledge and model approach for reviewer assignment, Expert Systems with Applications 34 (2008) 817–824.
[35] Q. Tian, J. Ma, J. Liang, R.C.W. Kwok, O. Liu, An organizational decision support system for effective R&D project selection, Decision Support Systems 39 (2005) 403–413.
[36] E. Turban, D. Zhou, J. Ma, A group decision support approach to evaluating journals, Information & Management 42 (1) (2004) 31–44.
[37] A.S. Vivacqua, J. Oliveira, J.M. De Souza, i-ProSE: inferring user profiles in a scientific context, The Computer Journal 52 (7) (2009) 789–798.
[38] K.M. Wang, C.K. Wang, C. Hu, Analytic hierarchy process with fuzzy scoring in evaluating multidisciplinary R&D projects in China, IEEE Transactions on Engineering Management 52 (1) (2005) 119–129.
[39] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1998) 440–442.
[40] Z. Zheng, K. Chen, G. Sun, H. Zha, A regression framework for learning ranking functions using relative relevance judgments, Proceedings of ACM SIGIR'07, 2007, pp. 287–294.
Thushari Silva is currently pursuing her PhD in the Department of Information Systems at the City University of Hong Kong. She received her M.Sc. in Information and Communication Technology from the Asian Institute of Technology, Thailand, in 2010. Her research interests include research social network analysis, recommender systems, business intelligence and the semantic web.
Zhiling Guo is an Assistant Professor in Information Systems at the City University of
Hong Kong. She received her Ph.D. in Management Science and Information Systems
from The University of Texas at Austin in 2005. Dr. Guo's general research interests
include online auctions, electronic markets, cloud computing, crowdsourcing, social
networks, social media marketing, and supply chain risk management. Dr. Guo's
papers have been published in Management Science, Information Systems Research,
Journal of Management Information Systems, Decision Support Systems, among others.
Jian Ma is a Professor in the Department of Information Systems at the City University of Hong Kong. He received his Doctor of Engineering degree in Computer Science from the Asian Institute of Technology in 1991. Prof. Ma's general research interests include business intelligence, research and innovation social networks, research information systems and decision support systems. His past research has been published in IEEE Transactions on Engineering Management, IEEE Transactions on Education, IEEE Transactions on Systems, Man and Cybernetics, Decision Support Systems and European Journal of Operational Research, among others.
Hongbing Jiang is currently pursuing his PhD at the University of Science and Technology of China–City University of Hong Kong Joint Advanced Research Centre, Suzhou. His research interests include recommendation systems and social network analysis.
Huaping Chen is a Professor in the School of Management at the University of Science and Technology of China. His research interests include information strategies, business intelligence and applications. His past research has been published in Journal of Operations Management, Decision Support Systems and Computers & Operations Research, among others.
A Social Network-Empowered Research Analytics Framework For Project Selection

  • 1. A social network-empowered research analytics framework for project selection Thushari Silva a , Zhiling Guo a, āŽ, Jian Ma a , Hongbing Jiang a,b , Huaping Chen b a Department of Information Systems, City University of Hong Kong, Hong Kong b School of Management, University of Science and Technology of China and USTC-CityU Joint Advanced Research Centre, Suzhou, PR China a r t i c l e i n f o a b s t r a c t Available online 9 January 2013 Traditional approaches for research project selection by government funding agencies mainly focus on the matching of research relevance by keywords or disciplines. Other research relevant information such as social Keywords: connections (e.g., collaboration and co-authorship) and productivity (e.g., quality, quantity, and citations Research project selection of published journal articles) of researchers is largely ignored. To overcome these limitations, this paper Research social networks proposes a social network-empowered research analytics framework (RAF) for research project selections. Research analytics Scholarmate.com, a professional research social network with easy access to research relevant information, serves as a platform to build researcher proļ¬les from three dimensions, i.e., relevance, productivity and con- nectivity. Building upon proļ¬les of both proposals and researchers, we develop a unique matching algorithm to assist decision makers (e.g. panel chairs or division managers) in optimizing the assignment of reviewers to research project proposals. The proposed framework is implemented and tested by the largest government funding agency in China to aid the grant proposal evaluation process. The new system generated signiļ¬cant economic beneļ¬ts including great cost savings and quality improvement in the proposal evaluation process. Ā© 2013 Elsevier B.V. All rights reserved. 1. Introduction There is a steadily growing trend for government funding agencies to support an increasing number of research proposals. For example, there were 42,225 research grant proposals submitted to the National Science Foundation (NSF) in the U.S. in 2010. The estimated number of submission for 2012 will increase to 46,000. The number of proposals submitted to the National Natural Science Foundation of China (NSFC) has increased from 23,636 in 2001 to over 147,000 in 2011. The sheer volume of submission has posed a signiļ¬cant challenge for research project selection due to difļ¬culties of assigning the most suitable reviewers to the most relevant project proposals. A research project can be characterized by a set of qualitative and quantitative, tangible and intangible attributes. Management scientists, Economist and IS practitioners have proposed various decision models, methodologies and decision support systems to assist decision making tasks related to research project selection [13,15,34,35]. Traditional approaches based on mathematical programming and optimization are useful for handling large volume of submissions, but are less efļ¬- cient in dealing with subjective judgment and information. Machine learning techniques incorporating fuzzy logic, genetic algorithms and artiļ¬cial intelligence techniques are capable of learning complex pat- terns in data, but are limited by their ability to generalize from training data and optimize decisions over the entire decision space. 
Other tradi- tional approaches involve manually assigning proposals to reviewers based on their claimed expertise, which is neither efļ¬cient nor practical to support increasing complexity of decision making faced by funding agencies. Current computer-based methods mainly consider matching research relevance in terms of keywords or disciplines, while ignoring the social connections (e.g., collaboration and co-authorship) and productivity (e.g., quality, quantity, and citations of published journal articles) of researchers. It is desirable to incorporate all these aspects into a uniļ¬ed evaluation framework. To achieve this goal, we propose a research analytics framework that is empowered by a research social network (www.scholarmate.com) for effective research project selection. Better identiļ¬cation of social connection can effectively cluster researchers based on topics of interests, methodologies, and research disciplines. Being able to identify community structure in the social network helps us understand and exploit the research network more effectively. On the one hand, such information can be used to identify most suitable reviewers. On the other hand, it can help avoid conļ¬‚ict of interests to ensure fair evaluation. Speciļ¬cally, we propose to deļ¬ne proļ¬les of research entities (e.g. project proposals, researchers) from three dimensions, i.e. relevance (e.g., keywords and research disciplines), productivity (e.g., quality, quantity, and citations of published journal articles), and connectivity (e.g., project collaborators, co-authors and colleagues). Represented by visual research CVs, proļ¬les of proposals and potential reviewers are built by extracting information from multiple sources including submitted proposals, bibliographic databases (e.g., ISI, Scopus, and Decision Support Systems 55 (2013) 957ā€“968 āŽ Corresponding author. E-mail addresses: tpsilva2@student.cityu.edu.hk (T. Silva), zhiling.guo@cityu.edu.hk (Z. Guo), isjian@cityu.edu.hk (J. Ma), jhbymx@foxmail.com (H. Jiang), hpchen@ustc.edu.cn (H. Chen). 0167-9236/$ ā€“ see front matter Ā© 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.dss.2013.01.005 Contents lists available at ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss
  • 2. EI), and research social network (i.e. www.Scholarmate.com). By aggregating information in the three dimensions, we construct a unique matching algorithm to assist decision makers (e.g. panel chairs or divi- sion managers) in optimizing the assignment of reviewers to research project proposals. To demonstrate the usability of the proposed framework, we implemented the system to aid China's largest government funding agency in its grant proposal evaluation. The research analytics frame- work builds upon scientometrics, business intelligence and social network analysis techniques. Its powerful search and data access capa- bilities provide timely and relevant information in visualized forms for research project evaluation. The implemented system generates signif- icant economic beneļ¬ts including cost savings and quality improve- ment in the proposal evaluation process. This paper is organized as follows. Section 2 reviews the relevant literature. Section 3 provides an overview of the research analytics framework and the Scholarmate research social network. Section 4 presents the detailed methods for proļ¬ling and algorithms to calculate the key performance indicators. An optimization problem for reviewer assignment is proposed in Section 5. Section 6 reports evaluation of the proposed system by China's largest government funding agency for its grant proposal evaluation. Section 7 concludes with a summary of contribution and directions for future research. 2. Literature review The major challenge in reviewer assignment for proposal evaluation is identifying and recommending the most suitable reviewers who have a high level of expertise and will make valuable professional judgment on given proposals [13,38]. In this paper, we propose a proļ¬le-based approach to assign reviewers for proposal evaluation. Previous research has identiļ¬ed two approaches to scientiļ¬c re- searcher proļ¬ling. One approach relies on subjective self-claimed infor- mation declared by researchers themselves. The other approach is based on objective measurement obtained through automated inferences about the researcher's behavior patterns related to publications and citations derived from relevant resources [37]. The ļ¬rst approach uses qualitative methods (e.g. surveys, questionnaires, or interviews) and traditional information retrieval models (e.g., term-based modeling [3] and rough-set modeling [19]) to gain knowledge of a researcher's inter- ests and resulted proļ¬les. The latter approach utilizes various feature selection techniques in machine learning to learn user proļ¬le [10]. The machine learning approaches tend to learn the mapping between incoming set of documents relevant to user input and real numbers which represent the strength of user preferences. The features of the documents are ļ¬rst extracted by widely used techniques including information gain [8,21] and correlation coefļ¬cient [32]. Then the key fea- tures are used as attributes in the mapping functions. Some studies focus on techniques such as neural networks [22], Support Vector Machine (SVM) [11,16,29], K-Nearest Neighbors (K-NN) and logistic regres- sion [6,40] before generating a mapping with a set of real numbers. Li et al. [20] proposed a rough threshold model (RTM) to analyze and extract keywords from the scientiļ¬c publications. 
In our approach we augment the original rough threshold model with phrase analysis algo- rithm to resolve semantic ambiguity that is not handled by the original rough threshold model for topic generation. Collaboration network is one type of popular social networks that has been widely studied in the literature [4,5,23]. A property that many social networks have in common is clustering, or network tran- sitivity [2,26,39]. Clustering coefļ¬cient is deļ¬ned as the probability that two of one's friends are friends themselves [7,39]. It typically ranges from 0.1 to 0.5 in many real-world networks. A related concept is com- munity in which connection within the same community is dense and outside the community is sparse. Community structure in a social net- work represents real social groupings by interest or background [27]. For example, communities in a citation network represent related papers on a single topic [31]. There are two broad classes of hierarchical clustering methods to detect the community structure in a social network: agglomerative and divisive [28,30]. The agglomerative approach focuses on ļ¬nding the strongly connected cores of communities by adding links [24], and the divisive approach uses information about edge betweenness to detect community boundaries by removing links [23]. For example, the Girvanā€“Newman algorithm [12] is one of the most widely used divisive methods and is effective at discovering the latent groups or communities that are deļ¬ned by the link structure of a graph. Newman's fast algorithm [25] is an efļ¬cient reference algorithm for clustering in large networks. It falls in the general category of agglom- erative hierarchical clustering methods. This method can be easily generalized to weighted networks in which each edge has a numeric value indicating link strength. It has been successfully applied to a collaboration network of more than 50,000 physicists. In this study, we adopt Newman's fast algorithm in our research social network analysis. 3. An overview of the RAF and Scholarmate Research Analytics is the application of methods and theories in scientometrics, business intelligence and social network analysis to transform research related data into relevant information in research management. In this paper we demonstrate the research analytics framework in the context of reviewer recommendation for research project selection. 3.1. The RAF for reviewer recommendation This study takes a proļ¬le-based approach to reviewer recommen- dation. Fig. 1 illustrates the key framework. Research Online (http://rol.scholarmate.com) is an institutional repository service provided by Scholarmate (http://www.scholarmate. com) to analyze proposals submitted through the Internet-based Science Information System (ISIS, https://isis.nsfc.gov.cn). It helps build standardized visual research CVs of researchers and identify the social groups to which they belong. These steps greatly ease the proļ¬l- ing of proposals and researchers. Key features and attributes such as discipline codes and keywords to represent proposals and researchers are derived from the standard keyword dictionary. Phrase patterns are discovered by data mining the free text categories of the electronic documents from various databases (e.g. ISI, Scopus and EI). Based on the constructed comprehensive proļ¬les of both the proposals and potential reviewers, the system generates key performance indicators in three dimensions, i.e., relevance, productivity and connectivity. 
Finally, a matching algorithm that takes all three dimensions into account is proposed for reviewer recommendation.
Specifically, relevance refers to the keywords, research discipline and expertise area that are derived from both the researcher's scientific publications and prior funded projects. Productivity is measured by the quality, quantity, citations, and impact of one's research, as well as other academic achievements. Connectivity among researchers is inferred through collaborations, such as collaborators in projects, co-authorship in publications, and colleagues in the same organizations. Their specific roles in the reviewer assignment process are illustrated in Fig. 2. We discuss each of them in detail in Section 4.

3.2. Scholarmate research social network

Scholarmate (http://www.scholarmate.com) is a professional research social network that connects people to research with the aim of "innovating smarter". It offers research social network services that help researchers find suitable funding opportunities and potential research collaborators. In addition to its important function of connecting
people with similar interests, Scholarmate has a search tool to help researchers extract their publications directly from existing bibliographic databases (e.g., ISI, Scopus), along with the citations of each paper and the impact factor of the journal. Moreover, Scholarmate provides researchers with the ability to disseminate research outcomes and information about their current interests over established social connections. On the one hand, researchers can use Scholarmate to manage their research outcomes and research in progress, including research proposal preparation. On the other hand, transparency in information sharing among scholars in Scholarmate opens an opportunity for researchers to participate in relevant scholarly activities in a timely manner, such as becoming potential reviewers. For example, a panel chair will be able to judge the recent research expertise of a researcher after analyzing that researcher's knowledge sharing activities in Scholarmate.
In Scholarmate, several types of networks can be constructed, such as citation networks, project collaboration networks and journal article co-authorship networks. An example of the collaboration network is presented in Fig. 3. The numbers beside the nodes are researcher identification numbers (RIDs). The numbers on the edges are the collaboration frequencies of two researchers. The frequency of collaboration is measured in terms of the number of co-authored publications, the number of collaborative projects and the number of co-cited papers extracted through the Scholarmate platform. Three major communities are identified and are indicated by the ovals in the figure. The communities are derived according to research expertise. We are also able to identify top researchers in the social network in terms of connectivity by degree, betweenness, and closeness, as shown in Table 1. The numbers in brackets denote the rankings under the corresponding measures. Researchers who have high ranks in the same community as principal investigators are identified as potential reviewers, subject to the condition that there is no direct connection between the potential reviewers and the principal investigators. For example, researcher 51 is a principal investigator and researcher 55 is identified as a potential reviewer because these two researchers are in the same community but have no direct collaboration. The fact that both of them have collaborations with researchers 37 and 38 indicates a potential overlap of research interests in some common research areas.
The research social network can enhance data representation in several ways. For example, existing databases only store data about published articles. Working papers that reflect the most recent research activities cannot be obtained by a search in bibliographic databases, but may be available on the social network site. Similarly, a researcher who has secured an industry grant that is relevant to the required reviewer expertise may be suitable to serve as a potential reviewer. However, traditional methods cannot identify this researcher because they cannot access such information. The social network facilitates real-time information sharing and is therefore effective for this type of information acquisition. Such additional information greatly enhances the completeness and timeliness of our data representation.
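To make the connectivity measures above concrete, the following minimal Python sketch shows how a Table 1-style ranking could be reproduced from a weighted collaboration network using the open-source networkx library. It is an illustration rather than the system's actual implementation: the edge list is hypothetical, and the overall score is formed here as a simple average of the three normalized centralities, an aggregation the paper does not specify.

```python
import networkx as nx

# Hypothetical weighted collaboration edges: (RID, RID, collaboration frequency).
edges = [(37, 18, 3), (37, 38, 2), (18, 10, 1), (10, 27, 2), (27, 15, 1),
         (51, 37, 1), (51, 38, 1), (55, 37, 1), (55, 38, 2)]

G = nx.Graph()
G.add_weighted_edges_from(edges)

degree = dict(G.degree())                   # raw degree, as reported in Table 1
n_degree = nx.degree_centrality(G)          # degree normalized by (n - 1)
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)

# One plausible overall score (the paper does not specify the aggregation):
# the mean of the three normalized measures.
overall = {v: (n_degree[v] + betweenness[v] + closeness[v]) / 3 for v in G}

for v in sorted(G, key=overall.get, reverse=True):
    print(f"RID {v}: degree={degree[v]}, n-degree={n_degree[v]:.4f}, "
          f"betweenness={betweenness[v]:.4f}, closeness={closeness[v]:.4f}, "
          f"overall={overall[v]:.4f}")
```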
4. Profiling and key indices

In this section, we present a comprehensive representation of the proposal and researcher profiles built from both the available databases and the research social network, based on which three key performance indicators are derived: relevance, productivity, and connectivity. Fig. 4 shows the relationship between the three key performance indicators and their usage in reviewer recommendation.

Fig. 1. The framework of profile-based reviewer recommendation. (Proposal profiling and reviewer profiling draw discipline codes, keywords and phrase patterns from the keyword dictionary, the Research Online platform and the Scholarmate research social network; the resulting relevance, productivity and connectivity indicators feed the matching algorithm for reviewer recommendation.)

Fig. 2. Stage diagram for proposal-reviewer recommendation. (Proposal clustering, selection of eligible reviewers via the relevance index, exclusion of conflicts of interest via the connectivity index, balancing of reviewer expertise via the productivity index, and assignment of reviewers.)
Initially the system constructs profiles of proposals (indexed by i) and researchers (indexed by j), respectively. Proposal profiling and reviewer profiling are discussed in detail in Section 4.1. The three key indices are developed as follows. We first use a component-based matching algorithm to calculate the relevance index (r_ij), which denotes the degree of matching between the proposal profile and the reviewer profile. Based on the Scholarmate platform services, we construct the connectivity index (c_ij) via the collaboration network, which indicates the frequency of research collaboration among reviewers, PIs and co-PIs. The generated collaboration network is analyzed by identifying communities and their features, such as structure and closeness, and those features are used in the generation of the connectivity index. The connectivity index is used to resolve conflicts of interest and to identify the most relevant reviewers. Finally, we generate the potential reviewers' productivity index (e_j), which considers the quality of their publications, research impact and academic achievement. The productivity index is used to balance the expertise of the potential set of reviewers in the optimization program for reviewer recommendation.

4.1. Profiling

In general, profiling is the process of determining key attributes that can be used to characterize a given object. In our project selection context we focus on proposal profiling and researcher profiling. The objective of proposal profiling is to extract proposal relevant features, and that of researcher profiling is to extract researcher expertise. The quality of profiling directly affects the effectiveness of research project selection. The integration of both subjective and objective information is necessary during the process of profile generation.
We first focus on proposal profiling. A proposal submitted through the Internet-based Science Information System (ISIS) follows a standard template in which up to two discipline codes and five keywords are filled in. We express the self-claimed discipline codes (DisCode) and keywords (Key) in the following sequence:

$$\langle \mathrm{PropNo};\ \mathrm{DisCode}_1;\ \mathrm{DisCode}_2;\ \mathrm{Key}_1;\ \mathrm{Key}_2;\ \ldots;\ \mathrm{Key}_5 \rangle \qquad (1)$$

where PropNo is the proposal number that uniquely identifies a proposal. This sequence can be directly extracted from the proposal.
To verify whether the claimed information is accurate, an objective examination of the proposal title and abstract is necessary. The second type of information is obtained through data mining the title and abstract sections of the proposal. It can be expressed in the following sequence:

$$\langle \mathrm{PropNo};\ \mathrm{key}_1;\ \mathrm{key}_2;\ \ldots;\ \mathrm{key}_m \rangle. \qquad (2)$$

Note here that we use lower case key to represent keywords extracted from the non-standard content area (i.e., title and abstract). This set of keywords has some overlap with, but is generally larger than, the standard keyword database defined by the government funding agency. For a fair comparison of any two proposal documents, we extract m keywords from each document. The search algorithm that we discuss later determines the preferred number of keywords. Ideally we could add the whole content of the proposal to obtain the highest ranked keywords through word frequency analysis. We found that this would increase the computational effort without adding much new insight. Mining the title and the abstract is accurate enough to classify proposals according to the keywords.
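The following minimal sketch, which is our illustration rather than the production profiling code, shows how the self-claimed sequence (1) and the mined sequence (2) could be represented and populated for one proposal. The tokenizer, stop-word list and the choice of m are simplified placeholders; the proposal number and keywords are taken from the Fig. 5 example.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List

STOP_WORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "on", "is", "that", "this"}

@dataclass
class ProposalProfile:
    prop_no: str
    discipline_codes: List[str]           # up to two self-claimed codes (sequence 1)
    claimed_keywords: List[str]           # up to five self-claimed keywords (sequence 1)
    mined_keywords: List[str] = field(default_factory=list)   # sequence 2

def mine_keywords(title: str, abstract: str, m: int = 10) -> List[str]:
    """Return the m most frequent content words from the title and abstract."""
    tokens = re.findall(r"[a-z]+", (title + " " + abstract).lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(m)]

profile = ProposalProfile(
    prop_no="53361479",
    discipline_codes=["F020508"],
    claimed_keywords=["machine learning", "semi-supervised learning",
                      "spectral clustering", "support vector machine"],
)
profile.mined_keywords = mine_keywords(
    "Semi-supervised learning with support vector machines",
    "This project studies spectral clustering and support vector machines "
    "for semi-supervised learning on very large datasets.")
print(profile.mined_keywords)
```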
We next consider researcher profiling. The funding agency maintains an expert dictionary for the pool of potential reviewers.

Fig. 3. An example of the collaboration network.

Table 1. Researchers' connectivity ranking.

RID   Degree   n-Degree   Betweenness   Closeness    Overall
37    11 (1)   0.1930     0.0345 (4)    0.2069 (1)   0.1685 (1)
18     9 (2)   0.1579     0.0459 (1)    0.1787 (3)   0.1459 (2)
27     6 (2)   0.1053     0.0382 (3)    0.1474 (7)   0.1129 (5)
10     6 (4)   0.1053     0.0453 (2)    0.1843 (2)   0.1328 (3)
15     5 (4)   0.0877     0.0143 (9)    0.1685 (4)   0.1134 (4)
31     5 (6)   0.0877     0.0244 (5)    –            –
19     5 (6)   0.0877     0.0169 (7)    –            –
38     5 (6)   0.0877     –             0.1345 (8)   –
52     4 (8)   0.0702     0.0122 (10)   0.1638 (5)   0.1054 (6)
43     4 (8)   0.0702     0.0163 (8)    –            –
34     –       –          0.0207 (6)    –            –
36     –       –          –             0.1340 (9)   –
44     –       –          –             0.1512 (6)   –
The expert dictionary is standardized and the available choices are the same as those in the proposal application. Initially each potential reviewer chooses his/her own disciplines and expertise areas (expressed as keywords). The self-claimed discipline codes (DisCode) and keywords (Key) are expressed in the following sequence:

$$\langle \mathrm{ResearcherID};\ \mathrm{DisCode}_1;\ \mathrm{DisCode}_2;\ \mathrm{Key}_1;\ \mathrm{Key}_2;\ \ldots;\ \mathrm{Key}_5 \rangle \qquad (3)$$

where ResearcherID is used to uniquely identify a potential reviewer.
Each potential reviewer may have successful grants from different funding agencies and have publications, patents, or awards from various sources. We extract such objective information from several databases and list it as:

$$\langle \mathrm{ResearcherID};\ \mathrm{GrantNo};\ \mathrm{DisCode}_1;\ \mathrm{DisCode}_2;\ \mathrm{Key}_1;\ \mathrm{Key}_2;\ \ldots;\ \mathrm{Key}_5 \rangle \qquad (4)$$

$$\langle \mathrm{ResearcherID};\ \mathrm{PubNo};\ \mathrm{key}_1;\ \mathrm{key}_2;\ \ldots;\ \mathrm{key}_m \rangle. \qquad (5)$$

In addition, the potential reviewers may have social tags. Social tags are labels about expertise areas that are maintained by friends or other concerned parties who may know the reviewers well in other capacities. For example, a panel chair may know the research expertise of a reviewer from his/her previous service to the funding agency. Information extracted from reviewers' social tags can be aggregated and expressed as:

$$\langle \mathrm{ResearcherID};\ \mathrm{key}_1;\ \mathrm{key}_2;\ \ldots;\ \mathrm{key}_m \rangle. \qquad (6)$$

4.2. Extracting topic features from texts

During the process of objective information extraction, it is necessary to analyze free-text areas such as the titles and abstracts of electronic documents. The determination of a set of topic features from these fields follows several steps, including extracting phrases, filtering out non-key phrases, resolving semantic heterogeneity and constructing the keyword dictionary. In this study we combine several techniques, including the Rough Threshold Model and Database Tomography, and develop an algorithm to calculate the document phrase weight distribution.
When extracting information from texts such as the titles and abstracts of funded projects and publications, we first need to build a standard research keyword dictionary. Phrases (combinations of multiple words) rather than single words are used to resolve semantic ambiguity, as single words are rarely sufficient to accurately distinguish standing researcher interests [32]. Generally phrases carry more meaning than single words. We find that a phrase of two to four words is strong enough to capture the meaning effectively. The free-text category fields of scientific publications (e.g. title, abstract and keywords) are analyzed and technical phrases are extracted using the Database Tomography (DT) process [17,18]. DT is a textual database analysis system that provides algorithms for extracting multi-word phrase frequencies together with their proximities. We applied the DT algorithm to extract all adjacent double, triple and quadruple word phrases from the text (i.e. title, abstract and keywords) along with their frequencies. We discarded phrases with extremely high frequencies (not useful for distinguishing documents) and those with extremely low frequencies (not useful for comparing documents). Finally, the remaining phrases are built into the keyword dictionary.
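As a hedged sketch of the Database Tomography step just described, the code below extracts all adjacent two- to four-word phrases from a handful of short texts and filters out phrases whose frequencies are too high or too low to be useful. The cut-off values are illustrative assumptions; the paper does not report the thresholds used in the deployed system.

```python
import re
from collections import Counter
from typing import Dict, List

def adjacent_phrases(text: str, min_len: int = 2, max_len: int = 4) -> List[str]:
    """All adjacent 2-, 3- and 4-word phrases in the text (DT-style extraction)."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for n in range(min_len, max_len + 1):
        phrases += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return phrases

def build_phrase_dictionary(documents: List[str],
                            low_cut: int = 2,       # illustrative thresholds
                            high_cut: int = 50) -> Dict[str, int]:
    counts = Counter()
    for doc in documents:
        counts.update(adjacent_phrases(doc))
    # keep phrases frequent enough to compare documents, but not so frequent
    # that they no longer distinguish them
    return {p: c for p, c in counts.items() if low_cut <= c <= high_cut}

docs = [
    "Semi-supervised learning with support vector machines",
    "Support vector machine methods for spectral clustering",
    "Data reduction method for machine learning",
]
print(sorted(build_phrase_dictionary(docs).items()))
```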
Fig. 4. Process model and relationship with the key indices in reviewer recommendation. (Proposal profiling and researcher profiling, based on self-claimed disciplines and keywords plus keywords mined from proposal, grant and publication titles and abstracts, feed the relevance index; PI/co-PI information and the Scholarmate research social network feed the connectivity index, which is used to remove conflicts of interest; weighted publication scores from research databases (e.g., ISI, Scopus, EI), citation scores based on SCI/SSCI searches and the H-index, and academic ranking and institutional reputation feed the productivity index, which is used to balance reviewer expertise.)
According to the Rough Threshold Model (RTM) [20], documents are represented in terms of a weight distribution over topic features. We use an augmented RTM topic filtering algorithm to generate topic features from the documents.
Specifically, let P = {p_1, …, p_m} be the initial set of phrases extracted from all documents D = {d_1, d_2, …, d_n}. Let f_ij be the number of appearances of phrase j in document d_i. A document d_i can be expressed by a set of phrases with their occurrence frequencies: d_i = {(p_1, f_i1), …, (p_m, f_im)}. The initial phrase set of d_i is rp_i = {p_j | f_ij > 0}.
If two documents have the same phrase patterns, the two initial phrase patterns can be composed. For example, {(p_1, 1), (p_2, 3)} āŠ• {(p_1, 2), (p_2, 2)} = {(p_1, 3), (p_2, 5)}, where āŠ• denotes the composition operation. We can group the initial phrase patterns that have the same phrase sets into clusters and use their composed phrase pattern to represent the cluster. Assume that there are r < n clusters. A cluster can be represented by crp_r = {(p_1, cf_r1), (p_2, cf_r2), …, (p_m, cf_rm)}, where the cluster frequency

$$cf_{rk} = \sum_{i=1}^{|crp_r|} f_{ik}, \qquad k = 1, 2, \ldots, m,$$

is the composed frequency in the cluster. We define the support for a phrase pattern rp_i ∈ crp_r as follows:

$$\mathrm{support}(crp_r) = \frac{|crp_r|}{|D|} \qquad (7)$$

Furthermore, $\sum_r \mathrm{support}(crp_r) = 1$. The normal form of the cluster phrase pattern can be described by the following association mapping function: β(crp_r) = {(p_1, w_r1), (p_2, w_r2), …, (p_m, w_rm)}, where the normalized phrase frequency is defined as:

$$w_{rk} = \frac{cf_{rk}}{\sum_{i=1}^{m} cf_{ri}}, \qquad k = 1, 2, \ldots, m. \qquad (8)$$

The relative importance weight of phrase p_k in document i over all documents can be defined as:

$$\beta_{ik} = \sum_{p_k \in rp_i \in \beta(crp_r)} \mathrm{support}(crp_r)\, w_{rk}\, \frac{f_{ik}}{cf_{rk}}. \qquad (9)$$

Document i can then alternatively be represented by its phrase weight distribution β_i = {β_i1, β_i2, …, β_im}.
For a given document set (i.e. a set of publications and projects), all initial phrase patterns are calculated with their pattern frequencies. The generated patterns are combined to construct clusters, and clusters are labeled using the phrases in the combined patterns. Each pattern frequency in the cluster is normalized and the normalized weights are calculated. Finally, a document that is uniquely represented by its initial phrase patterns can be characterized by its phrase weight distribution across all documents. The algorithm is described in Table 2.

Table 2. Algorithm to calculate the document phrase weight distribution.

Input: a document set D and a phrase set P
Output: each document's phrase weight distribution β_i = (β_i1, β_i2, …, β_im)
  Initialize RP = Φ
  for each d_i ∈ D {
      for each p_j ∈ P, build d_i = {(p_1, f_i1), …, (p_m, f_im)}
      rp_i = {p_j | f_ij > 0}
      RP = RP ∪ {rp_i}
  }
  Compose patterns with identical phrase sets: RP = āŠ• RP
  Cluster documents into crp_r based on rp_i
  Calculate support(crp_r) based on Eq. (7)
  Normalize β(crp_r) = {(p_1, w_r1), (p_2, w_r2), …, (p_m, w_rm)} based on Eq. (8)
  for each d_i ∈ D {
      for each p_j ∈ rp_i ∈ crp_r, calculate β_ik based on Eq. (9)
  }
  END
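The sketch below is one possible Python reading of Eqs. (7) to (9) and the algorithm in Table 2, not the authors' code: documents with identical phrase sets are clustered, cluster support and normalized weights are computed, and each document is re-expressed as its phrase weight distribution β_i. The tiny phrase lists at the bottom are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def phrase_weight_distribution(docs: List[List[str]]) -> List[Dict[str, float]]:
    """docs[i] is the list of phrases occurring in document i (with repeats)."""
    freqs = [Counter(doc) for doc in docs]                 # f_ik
    phrase_sets = [frozenset(f) for f in freqs]            # rp_i

    # Cluster documents that share exactly the same phrase set.
    clusters = defaultdict(list)                           # phrase set -> doc indices
    for i, rp in enumerate(phrase_sets):
        clusters[rp].append(i)

    betas = [dict() for _ in docs]
    for rp, members in clusters.items():
        support = len(members) / len(docs)                 # Eq. (7)
        cf = Counter()                                     # composed cluster frequencies
        for i in members:
            cf.update(freqs[i])
        total = sum(cf.values())
        w = {p: cf[p] / total for p in cf}                 # Eq. (8)
        for i in members:                                  # Eq. (9)
            for p in phrase_sets[i]:
                betas[i][p] = support * w[p] * freqs[i][p] / cf[p]
    return betas

docs = [["support vector", "support vector", "spectral clustering"],
        ["support vector", "spectral clustering"],
        ["data reduction", "machine learning"]]
for beta in phrase_weight_distribution(docs):
    print(beta)
```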
4.3. Relevance

The relevance index is used to determine how well reviewer expertise matches the content of the proposal. It is calculated by matching the proposal and reviewer profiles. The task of profile matching is to decide whether a sequence of key phrases that describes the proposal profile attributes matches the key phrases that represent the reviewer profile attributes. Two widely accepted approaches for calculating the similarity between terms are the Euclidean distance and the cosine similarity measure [14]. For the self-claimed information that is extracted in standard terms, we use the Jaccard similarity measure [1] to perform component-based matching over reviewer and proposal profiles. Data extracted by Eqs. (1), (3) and (4) can be matched using this method. The Jaccard index between reviewer i and proposal j is expressed as:

$$J_{ij} = \frac{F\big[(\mathrm{Key}_{i1}, \mathrm{Key}_{i2}, \ldots, \mathrm{Key}_{i5}) \cap (\mathrm{Key}_{j1}, \mathrm{Key}_{j2}, \ldots, \mathrm{Key}_{j5})\big]}{F\big[(\mathrm{Key}_{i1}, \mathrm{Key}_{i2}, \ldots, \mathrm{Key}_{i5}) \cup (\mathrm{Key}_{j1}, \mathrm{Key}_{j2}, \ldots, \mathrm{Key}_{j5})\big]} \qquad (10)$$

where Key_ik and Key_jk, k = 1, 2, …, 5, are the five keywords associated with reviewer i and proposal j in standard terms. The numerator denotes the number of keywords in common, and the denominator represents the total number of unique keywords in both profiles. As shown, the Jaccard similarity is measured by the ratio of the frequency of the intersection to the frequency of the union of the two sets of keywords [1].
To determine the similarity of the non-standard phrase patterns, we adopt the cosine similarity measure. For researcher profile i and proposal profile j, the similarity can be calculated as follows [9]:

$$C_{ij} = \frac{\beta_i \cdot \beta_j}{\lVert\beta_i\rVert\,\lVert\beta_j\rVert} = \frac{\sum_{k=1}^{m} \beta_{ik}\beta_{jk}}{\sqrt{\sum_{k=1}^{m} \beta_{ik}^{2}\,\sum_{k=1}^{m} \beta_{jk}^{2}}} \qquad (11)$$

where β_ik and β_jk are the normalized frequencies of phrase pattern p_k in the two profiles i and j. Phrase patterns extracted by Eqs. (2), (5), and (6) are processed by the algorithm presented in Table 2. The resulting weight distribution is used to derive the similarity measure.
Note that each researcher may have several grants or publications. There are different ways to define the similarity measure in the respective categories (grant or publication). The first possibility is to consolidate several documents in the same category into one integrated document that represents the researcher profile in that specific category. The algorithm then generates one weight distribution for the consolidated document, and within each category only one consolidated measure C_ij is derived. Another method is to treat the documents separately. The algorithm then produces one weight distribution for each document, and a pair-wise similarity measure can be calculated between each of the researcher's documents and the proposal. We then choose the maximum similarity in a category as the final measure of similarity between the proposal and the potential reviewer in that specific category.
Since multiple sources of information, both subjective and objective, need to be aggregated, an appropriate weighting strategy is needed to reflect their relative importance in the overall evaluation [9]. Denote r_ij as the degree of matching between proposal i and the potential reviewer j. An aggregate measure in the relevance dimension can be obtained as follows:

$$r_{ij} = \alpha\,\mathrm{Self}_{ij} + \beta\,\mathrm{Grant}_{ij} + \gamma\,\mathrm{Pub}_{ij} + \delta\,\mathrm{Social}_{ij} \qquad (12)$$

where α + β + γ + Γ = 1. The four terms refer to self-claimed information, grants, publications, and social tags. Note that the self-claimed information from the proposal (Self) and the social tags that label the potential reviewers (Social) are related to subjective judgment, while grants and publications provide objective measures related to the match between proposals and potential reviewers. Decision makers may assign different weights to aggregate both subjective and objective information.
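To illustrate how the three formulas combine, the following minimal sketch computes the Jaccard similarity of Eq. (10), the cosine similarity of Eq. (11) and the aggregate relevance of Eq. (12). The 0.4/0.2/0.2/0.2 weights mirror the split shown in the Fig. 5 example; all keyword sets, phrase weights and component scores are hypothetical.

```python
import math
from typing import Dict, Set

def jaccard(keys_i: Set[str], keys_j: Set[str]) -> float:               # Eq. (10)
    if not keys_i and not keys_j:
        return 0.0
    return len(keys_i & keys_j) / len(keys_i | keys_j)

def cosine(beta_i: Dict[str, float], beta_j: Dict[str, float]) -> float:  # Eq. (11)
    dot = sum(beta_i[p] * beta_j.get(p, 0.0) for p in beta_i)
    norm = math.sqrt(sum(v * v for v in beta_i.values())) * \
           math.sqrt(sum(v * v for v in beta_j.values()))
    return dot / norm if norm else 0.0

def relevance(self_ij: float, grant_ij: float, pub_ij: float, social_ij: float,
              alpha: float = 0.4, beta: float = 0.2,
              gamma: float = 0.2, delta: float = 0.2) -> float:          # Eq. (12)
    return alpha * self_ij + beta * grant_ij + gamma * pub_ij + delta * social_ij

# Example: one proposal-reviewer pair with hypothetical profile data.
self_score = jaccard({"machine learning", "spectral clustering"},
                     {"machine learning", "data mining"})
pub_score = cosine({"support vector": 0.27, "spectral clustering": 0.13},
                   {"support vector": 0.13, "spectral clustering": 0.13})
print(relevance(self_score, grant_ij=0.4, pub_ij=pub_score, social_ij=0.8))
```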
As shown in Fig. 5, the proposal related information, including discipline codes, keywords, abstract and PI, is displayed at the top of the screen. The relevance score is calculated and displayed in the middle. Clicking on each tab shows the matches identified by the system.

Fig. 5. Matching between a proposal and a reviewer to calculate the relevance score. (In the example, proposal 53361479 on semi-supervised machine learning is matched against a reviewer's self-claimed expertise, grants, publications and social tags; the overall relevance score of 81 aggregates component scores of 100, 40, 85 and 80 with weights of 40%, 20%, 20% and 20%.)

The efficiency of the matching algorithm can be assessed in terms of time complexity. The algorithm requires a single pass through the set of reviewer profiles for each proposal. Matching the pre-generated subjective, objective and social information patterns in the proposal and reviewer profiles requires O(n·m) time in the worst case, where n is the number of proposals and m is the number of reviewers. The proposals are clustered according to their disciplines, and each set of proposals is matched against all the reviewers. To reduce the computational cost, the profiles of reviewers and proposals are constructed beforehand.

4.4. Connectivity

The nature of the connections between reviewers, PIs and co-PIs is very important when assigning reviewers to evaluate proposals. Having the same expertise as the principal investigators and having no direct personal conflict with them are essential constraints that must be satisfied by the reviewers. Thus, in this study we utilize social network analysis concepts, such as community structure and the closeness of individuals in the same community, to discover non-trivial relationships among researchers. After analyzing the individuals in one community, we are able to identify groups of individuals who have similar research interests, who are active in the corresponding research area, and who have close connections with PIs or co-PIs. Such information is then used to remove conflicts of interest and to aid preferential assignment to the most relevant reviewers.
Several types of networks can be constructed using the available social network data in Scholarmate. For example, we can represent scientific papers as vertices in a graph, with vertices connected by edges when one paper cites another. Alternatively, we can construct the researcher network. Each researcher is represented as a vertex in a graph. An edge is built when one researcher cites another's work (a directed citation network), or when one researcher co-authors with another researcher (an undirected collaboration network). We define the edge weight as the number of citations or collaborations between two researchers. A higher weight implies stronger connectivity between the two researchers.
We use graph clustering methods to cluster these graphs. Hierarchical clustering is a traditional method for detecting community structure. Here we focus on the collaboration network. We first assign a weight u_ij to each pair of vertices in the network, defined as the frequency of collaboration between the two researchers, which therefore represents how closely the researchers are connected. By analyzing the implicit community structure and estimating the strength of ties between individuals, we are able to discover nontrivial patterns of interaction in the scientific collaboration networks.
Assume that there are s predefined communities. Define u_IJ as the fraction of collaboration frequency between researchers in community I and those in community J. Denote $a_I = \sum_J u_{IJ}$, which represents the weighted fraction of edges that connect to vertices in community I (i.e., the fraction of collaborations in which researchers in community I collaborate with researchers in other communities).
Newman's fast algorithm is based on the idea of modularity [25]. Following this approach, we define the modularity measure for a network with s communities as:

$$Q_s = \sum_{I=1}^{s} \left( u_{II} - a_I^{2} \right) \qquad (13)$$

where u_II is the weighted fraction of edges in the network that connect vertices within the same community. A high value of Q_s represents a good community division. However, optimizing Q_s over all possible divisions is infeasible in practice for networks larger than thirty vertices. Various approximation methods are available, such as simulated annealing and genetic algorithms. Here a standard "greedy" optimization algorithm is used. The algorithm that determines the optimal community structure takes the steps listed in Table 3.

Table 3. Algorithm to cluster the collaboration network into communities.

Step 1. Initially there are n vertices representing researchers, and u_ij is the collaboration frequency between researchers i and j. Each vertex is the sole member of a distinct community. Calculate the within- and between-community collaboration fractions u_II and u_IJ, form the matrix W = (u_IJ) with rows (u_I1, …, u_In) for I = 1, …, n, and calculate a_I.
Step 2. Calculate Ī”Q_IJ = u_IJ + u_JI āˆ’ 2 a_I a_J. Choose (I, J) = argmax Ī”Q_IJ to join if Ī”Q_IJ ≄ 0, or (I, J) = argmin Ī”Q_IJ if Ī”Q_IJ < 0.
Step 3. Update the matrix elements u_IJ by adding together the rows and columns corresponding to the joined communities. Update a_I. Calculate Q_|W| according to Eq. (13).
Step 4. Repeat Steps 2 and 3, joining communities in pairs, until all vertices are joined.
Step 5. The optimal community structure is determined by s = argmax Q_|W|.

The algorithm starts with n communities, where n is the total number of nodes in the collaboration network. Each vertex begins as the sole member of a distinct community, and the algorithm iteratively merges pairs of communities that are connected by edges. The time taken to join any pair of communities is at most m, where m is the total number of edges in the graph. The change in Q_s can be calculated in constant time in each iteration. Following a join, some elements of the matrix W are updated by adding together the rows and columns corresponding to the joined communities. At each step the algorithm takes worst-case time O(n + m). When the algorithm runs to completion, n āˆ’ 1 joins have been performed, so the time complexity of the algorithm is O((n + m)n), or O(n²). Since the value of Q_|W| is calculated in each iteration, finding the optimal community structure is straightforward. The hierarchical clustering method also enables us to define the community structure at the required level of granularity.
To determine the connectivity, we extract all principal investigators and other members of proposal i. If none of them is in the same community as the potential reviewer j, we deem the reviewer not to be an ideal candidate to review the proposal and label g_ij ≪ 1 to indicate a mismatch. Otherwise, we label g_ij = 1, indicating a high goodness of fit.
Resolving conflicts of interest is an important step in the reviewer assignment process. For example, to ensure an objective review of the proposal, the government funding agency requires that applicants and reviewers should not have had a co-author relationship in the last five years. A conflict of interest can be immediately identified by a direct link in our collaboration network. If any of the primary members of proposal i has a conflict of interest with a potential reviewer j, we label c_ij = 0, enforcing a "No" decision in the reviewer assignment.
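The code below is a compact, from-scratch sketch of the greedy merging procedure in Table 3, written directly from the steps above rather than taken from the authors' implementation. For simplicity it always merges the pair with the largest modularity change and remembers the best partition seen, and the edge list is hypothetical; in practice a library routine such as networkx's greedy_modularity_communities would likely be preferred for large networks.

```python
from itertools import combinations
from typing import List, Tuple

def greedy_communities(edges: List[Tuple[int, int, float]]) -> List[set]:
    nodes = sorted({v for i, j, _ in edges for v in (i, j)})
    total = sum(w for _, _, w in edges)
    comm = {v: {v} for v in nodes}                 # community label -> members
    # u[I][J]: fraction of total collaboration weight between communities I and J
    u = {I: {J: 0.0 for J in nodes} for I in nodes}
    for i, j, w in edges:
        u[i][j] += w / (2 * total)
        u[j][i] += w / (2 * total)

    def a(I):                                      # a_I = sum_J u_IJ
        return sum(u[I].values())

    def modularity():                              # Eq. (13)
        return sum(u[I][I] - a(I) ** 2 for I in comm)

    best_q, best = modularity(), [set(c) for c in comm.values()]
    while len(comm) > 1:
        # Step 2 (simplified): merge the pair with the largest modularity gain.
        I, J = max(combinations(comm, 2),
                   key=lambda p: u[p[0]][p[1]] + u[p[1]][p[0]] - 2 * a(p[0]) * a(p[1]))
        # Step 3: fold community J into community I and update the matrix.
        comm[I] |= comm.pop(J)
        u[I][I] += u[J][J] + u[I][J] + u[J][I]
        for K in comm:
            if K == I:
                continue
            u[I][K] += u[J][K]
            u[K][I] += u[K][J]
            u[J][K] = u[K][J] = 0.0
        u[I][J] = u[J][I] = u[J][J] = 0.0
        q = modularity()                           # Steps 4-5: keep the best split seen
        if q > best_q:
            best_q, best = q, [set(c) for c in comm.values()]
    return best

edges = [(37, 18, 3), (18, 10, 2), (10, 27, 2), (37, 27, 1),
         (51, 38, 1), (55, 38, 2), (51, 55, 1)]
print(greedy_communities(edges))
```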
4.5. Productivity

The productivity index is calculated for potential reviewers and is used to indicate the contribution they have made to the field. For fair and unbiased project selection, productivity needs to be balanced among the reviewers who are assigned to evaluate the same proposals. We measure the productivity of a potential reviewer in terms of the number of publications, the quality of the publications and the citation impact over the past five years. A productivity index can be computed by aggregating the quality and quantity of publications.
Generally, academic journals are classified into different disciplines and assigned a rank, such as level A, level B or level C. As in [33], we assume that the journal rank reflects the quality of the articles published in that journal, as it is widely used in many research performance measurement activities related to merit increases and the allocation of research funding in university settings [36]. Following [33], we adopt a weighted scheme to generate the productivity index as a measure of a researcher's overall contribution to the field. Let q_ij be reviewer j's total number of publications in journals ranked at level i, where i = A, B, C. The publication score of reviewer j is expressed as:

$$G_j = w_A q_{Aj} + w_B q_{Bj} + w_C q_{Cj} \qquad (14)$$

where w_A > w_B > w_C, indicating the emphasis on quality work. There are different ways to define the weights. For example, the average impact factor of all the journals classified at the same level can be used to define the corresponding weight.
Professional titles (e.g. senior scholars such as Professor and Associate Professor, or junior scholars such as Assistant Professor) and the H-index can also be taken into consideration when recommending reviewers for proposals. We may assign a higher rank score to higher professional titles. Let R_j and H_j be potential reviewer j's rank score and H-index, respectively. An integrated research productivity measure can be obtained as follows:

$$e_j = u G_j + v R_j + t H_j \qquad (15)$$

where u + v + t = 1.
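A minimal sketch of Eqs. (14) and (15) follows. The journal-level weights, the rank score and the u, v, t mix are illustrative assumptions; the text only requires w_A > w_B > w_C and u + v + t = 1.

```python
def publication_score(q_a: int, q_b: int, q_c: int,
                      w_a: float = 5.0, w_b: float = 3.0, w_c: float = 1.0) -> float:
    """Eq. (14): weighted count of level A/B/C journal publications."""
    return w_a * q_a + w_b * q_b + w_c * q_c

def productivity_index(g_j: float, rank_score: float, h_index: float,
                       u: float = 0.5, v: float = 0.2, t: float = 0.3) -> float:
    """Eq. (15): aggregate the publication score, academic rank and H-index."""
    return u * g_j + v * rank_score + t * h_index

g = publication_score(q_a=4, q_b=6, q_c=10)     # a hypothetical reviewer
print(productivity_index(g, rank_score=3.0, h_index=12))
```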
5. Assigning reviewers for proposal evaluation

The reviewer assignment process deals with assigning reviewers to evaluate proposals in a specific discipline area. The current practice is manual matching of proposals to reviewers based on their declared expertise. This is inefficient, and subjective expertise judgment alone is insufficient to determine reviewer expertise because it lacks objective evidence. We introduce the relevance measure to balance the self-claimed expertise against the expertise induced from the derived objective information. The key objective is to maximize the relevance between proposals and potential reviewers.
Because the quality of evaluation largely depends on the experience and judgment of the reviewers, there is also a need to balance expertise among the reviewers who are assigned to the same proposal. For example, senior scholars tend to give higher weight to the innovativeness of a proposal than their junior counterparts, while junior scholars tend to put higher weight on methodological rigor than senior fellows. Let e be the desired average productivity level of the potential reviewers. This can be determined by the relevant decision makers, such as panel chairs or division managers. We want the average reviewer expertise levels to be close enough to this desired level. For example, if a potential reviewer is a junior scholar whose e_j is significantly lower than e, then the proposal would need a senior scholar whose productivity measure is significantly higher than e to review the proposal.
First we construct a network model in which each proposal and each potential reviewer is represented as a node. The potential reviewer node is called the supply node, and the proposal node is called the demand node. Assume that there is a set I of proposals and a set J of potential reviewers. Let x_ij be the binary decision variable indicating the assignment of proposal i to potential reviewer j: x_ij = 1 implies a recommended assignment and x_ij = 0 implies that the assignment is not recommended. We maximize the relevance subject to the flow constraints, which reflect the management's requirements for the reviewer assignment. The optimization problem can be expressed as:

$$\begin{aligned}
\max \quad & \sum_{i \in I} \sum_{j \in J} c_{ij}\, g_{ij}\, r_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j \in J} x_{ij} \ge b \quad \text{for } i \in I \\
& \sum_{i \in I} x_{ij} \le d \quad \text{for } j \in J \\
& \sum_{j \in J} e_j x_{ij} - e \le \varepsilon \quad \text{for } i \in I \\
& x_{ij} \in \{0, 1\} \quad \text{for } i \in I,\ j \in J
\end{aligned} \qquad (16)$$

The coefficients in the objective function ensure that we maximize the overall relevance measure across the reviewer and proposal pools: c_ij is the indicator variable that removes conflicts of interest, and g_ij is the coefficient for preferential assignment of reviewers in the same scientific research community. The first set of constraints ensures that each proposal has at least b reviewers. The second set of constraints guarantees that no reviewer reviews more than d proposals. In practice, usually b = 3 and d = 20. The third set of constraints is used to balance reviewer expertise. Note that ε > 0 is the tolerance level that can be chosen by the panel chair or the management team.
To implement this model, we first analyze the community structures to remove conflicts of interest and to identify potential reviewers. Next, we calculate the relevance degree between the reviewers and the proposals in such a way that the PIs of the proposals belong to the same community as their potential reviewers. The calculated relevance degrees are sorted, and reviewers with a high relevance degree are selected to evaluate those proposals. Finally, productivity among the reviewers assigned to a proposal is balanced and the workload is evenly distributed among reviewers.
To achieve a higher degree of computational performance, the collaboration networks for the reviewers and the PIs of the proposals under each division are constructed, and the optimal numbers of communities are derived, before the reviewer assignment process is carried out. First, the time complexity of traversing the community graph for the connectivity index calculation is O(n0 + m0), where n0 is the total number of nodes in one community and m0 is the number of connections between individuals. Second, spanning the whole set of reviewers and proposals to generate the relevance degree matching requires O(n0m0) time. Third, the time complexity of sorting the end result is O(n1 log n1), where n1 is the number of matching results. Finally, the time complexity of balancing the productivity of reviewers in the same group is O(1) and is negligible. In summary, the worst-case computational complexity of the proposed technique is O(n0 + m0 + n0m0 + n1 log n1).
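The sketch below expresses model (16) with the open-source PuLP modelling library and its bundled CBC solver; it is an illustration under assumed data rather than the agency's production optimizer. All proposals, reviewers and coefficient values are hypothetical, and the conflict-of-interest and community coefficients enter through the objective exactly as in Eq. (16).

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

proposals = ["P1", "P2"]
reviewers = ["R1", "R2", "R3", "R4"]
b, d, e_target, eps = 2, 2, 2.0, 1.5        # reviewers per proposal, load cap, tolerance

# c (conflict of interest), g (same-community preference), r (relevance), e (productivity)
c = {(i, j): 0 if (i, j) == ("P1", "R2") else 1 for i in proposals for j in reviewers}
g = {(i, j): 1 for i in proposals for j in reviewers}
r = {("P1", "R1"): 0.9, ("P1", "R2"): 0.8, ("P1", "R3"): 0.4, ("P1", "R4"): 0.6,
     ("P2", "R1"): 0.3, ("P2", "R2"): 0.7, ("P2", "R3"): 0.8, ("P2", "R4"): 0.5}
e = {"R1": 2.5, "R2": 1.0, "R3": 2.0, "R4": 0.8}

model = LpProblem("reviewer_assignment", LpMaximize)
x = LpVariable.dicts("x", [(i, j) for i in proposals for j in reviewers], cat=LpBinary)

# Objective: maximize overall relevance, weighted by c_ij and g_ij.
model += lpSum(c[i, j] * g[i, j] * r[i, j] * x[i, j] for i in proposals for j in reviewers)
for i in proposals:
    model += lpSum(x[i, j] for j in reviewers) >= b                       # at least b reviewers
    model += lpSum(e[j] * x[i, j] for j in reviewers) - e_target <= eps   # expertise balance
for j in reviewers:
    model += lpSum(x[i, j] for i in proposals) <= d                       # workload cap

model.solve()
assignment = [(i, j) for i in proposals for j in reviewers if x[i, j].value() == 1]
print(assignment)
```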
6. Implementation and evaluation

The proposed research analytics framework has been implemented to aid the largest government funding agency in China with its grant proposal evaluation. The agency aims at funding scientific research projects that can make a large social impact.
The organizational hierarchy of the funding agency consists of one general office, five bureaus, and eight scientific departments. These departments are responsible for funding and managing research projects. Each department is further divided into divisions that focus on more specific research areas. There is intense competition for research funding, with a funding rate of only 21% in 2011. The funding agency received around 147,000 and 170,000 proposals in 2011 and 2012, respectively. Proposals are spread over many scientific disciplines. These conditions make it difficult for the evaluation committee to participate directly in every project evaluation. The committee groups the proposals into different areas and delegates its authority to groups of experts according to research area. Each area may consist of multiple related disciplines. For example, Business is an area that includes Management Science, Information Systems, and other business disciplines. There is a general budget to be distributed among the areas. The distribution of funds is not uniform and represents priorities set by the evaluation committee of the funding agency. The distribution can be adjusted based on the quality and quantity of proposals submitted to each area.
Research project selection is a process that involves the multiple phases illustrated in Table A in Appendix A. To facilitate project selection, the government funding agency has established an evaluation system that includes peer review and expert panel evaluation. Division managers assign and invite external reviewers and panel experts to evaluate the proposals. The reviewers judge the quality of a project proposal based on their expertise and professional experience and on the norms and criteria set by the funding agency. As can be seen, reviewer assignment is the most important phase affecting the quality and efficiency of research project selection.
We provide computerized support for the second phase of research project selection. In the prototype implementation of our system, the distribution of funds is out of the scope of this study; our focus is the reviewer assignment recommendation. We have tested different subsets of proposals and reviewers. The system computes the matching score in the relevance dimension for each pair of proposal and potential reviewer. The final assignment problem can be solved in a reasonable amount of time. The solution is recommended to the review panels in their respective divisions. The review panels examine the recommendation and have the right to either accept or reject the recommended assignment. Additionally, we provide data visualization to help managers view the assignment progress. Fig. 6 shows an example of the visualization.

Fig. 6. Visualization of the reviewer assignment progress. (The dashboard tracks, for example, how many proposals have been clustered and assigned, how many reviewers have been invited or have logged in, and how many reviews are in progress, returned, unreturned or declined.)

Overall, it takes a maximum of 6 hours to compute the matching degrees between 34,000 proposals and 30,000 reviewers, which corresponds to the largest number of proposals received by a single department of the funding agency. Thus, if we use parallel and distributed computing for the assignment optimization in each department (there are 8 distinct departments in total), we can finish the recommendation task within 6 hours. This greatly improves work efficiency, as the manual process of assigning reviewers usually takes up to two weeks to complete. The quality of the recommendations is acknowledged by the review panels.
The profile-based recommendation takes into consideration detailed information on relevance, productivity and connectivity. It can avoid conflicts of interest and provide decision makers with the most relevant information, which can hardly be obtained through manual processes. The largest government funding agency has agreed to adopt our recommendation system in the next round of proposal evaluation.

7. Conclusion

Building upon a research analytics framework, this study presents a new approach to research project selection in a research social network environment. We built profiles of research entities (e.g. research proposals and reviewers) along three dimensions: relevance, productivity and connectivity. The information for building these profiles can be obtained from the research social network (Scholarmate). Degrees of matching based on the profiles of research entities can be calculated by aggregating subjective, objective and social information collected from multiple sources.
We implemented the system to aid the largest funding agency in China in optimizing reviewer recommendation and supporting reviewer assignment. The implementation results showed that the proposed method greatly improves work efficiency.
Our approach can easily be generalized to support other types of recommendation in a research social network environment. A direct application is journal article review. Based on an analysis of article features, our system can be used to select the initial pool of reviewers, calculate the degree of match between potential reviewers and the article, remove conflicts of interest, balance reviewer expertise and productivity, and make final reviewer assignment recommendations. The process can be automated and monitored by journal editors. In comparison with the current practice, which mainly relies on editors' subjective judgment facilitated by automated search tools, our system is able to optimize reviewer recommendation empowered by richer social functionalities. Improved accuracy and work efficiency can be expected.
Other potential applications include recommending funding opportunities, publication outlets for research articles, and potential research collaborators. For example, researchers can easily promote their recently published articles using social tools in the form of likes, tweets, shares, and more. They can even track when their articles are cited by others. Meanwhile, the system may recommend researchers who work in the same research areas to each other, within and across different research communities. Based on a researcher's profile, the research social network may also recommend journals that have published relevant topics as potential outlets for working papers. All these functions are very useful for promoting the timely distribution and targeted dissemination of research work.
There are a number of limitations and possible future research directions. First, a research project has various attributes that can potentially influence both the impact and the probability of success of the project. We do not model the decision makers' preferences, beliefs, priorities, or risk attitudes. Presumably, the reviewer assignment decision problem can be modeled as a multi-objective decision problem. Second, our proposed framework only focuses on the evaluation of individual projects without building a portfolio of the most promising projects among all submitted proposals. The portfolio of projects to be funded and the amount to be awarded to each project are out of the scope of this research. Third, project evaluation, like product review, is highly subjective, and there is no feedback mechanism in the current framework to assess the quality of reviews. Historical records of funded projects, including their relevant characteristics, the evaluations given by the reviewers, and the research output measured by publications, could be valuable for better evaluating new proposals and selecting unbiased reviewers. Future extensions of the research framework may take these aspects into account. Finally, the power of Scholarmate is its ability to extract and aggregate information from multiple sources. We need to continuously improve the search tool to meet the increasing search needs of users.
Moreover, standardization of the keyword dictionary can greatly help phrase pattern recognition. While we keep evaluating and updating the keyword dictionary based on feedback on algorithm performance, we are aware that social voting is another efficient approach to identify relevant keywords and remove the less meaningful ones. We have implemented many social tools to aid this system improvement. The ultimate goal is to promote a healthy research environment in which researchers can engage in innovative research production.

Acknowledgment

This research is partially funded by the General Research Fund of the Hong Kong Research Grants Council (Project No: CityU 119611), the National Natural Science Foundation of China (Project Nos: 71171172 and J1124003) and the City University of Hong Kong (Project No: 6000201).
Appendix A

Table A. Research project selection process at the government funding agency.

Phase in R&D project selection: Call for proposal and proposal submission
Key decisions: 1) Check the validity of the submitted proposal content. 2) Check fulfillment of the application requirements by the principal investigator and by the proposal.

Phase: Identifying the most suitable external reviewers for proposal evaluation
Key decisions: 1) Selection of potential reviewers based on claimed expertise. 2) Assignment of external reviewers to validated proposals based on predefined criteria. 3) Transfer of proposals to the responsible divisions.

Phase: Peer review
Key decisions: 1) Review of the quality and content of proposals by external reviewers based on the provided guidelines. 2) Validation of the review content. 3) Coordination with external reviewers and completion of the review process as scheduled.

Phase: Review results aggregation
Key decisions: 1) Aggregation of the review results, transformation of the results into comparable measurements and ranking of the proposals accordingly. 2) Recommendation of proposals for panel evaluation.

Phase: Panel evaluation
Key decisions: 1) Refinement of the suggested proposal list by a panel of experts making decisions on marginal proposals. 2) Suggestion of the funded project list.

Phase: Final decision making
Key decisions: 1) Consideration of exceptional cases. 2) Recommendation of the list of projects to be funded.

Table B. Table of notation.

Profiling
P = {p1, …, pm}: initial set of m phrases
D = {d1, d2, …, dn}: initial set of n documents
r: number of clusters
fik: occurrence frequency of phrase k in document di, k = 1, 2, …, m
rpi: initial phrase set of document di
crpr: cluster of phrase patterns
support(crpr): supporting measure of cluster crpr
wrk: normalized phrase frequency for cluster r, k = 1, 2, …, m
βik: relative importance weight of phrase pk in document i, k = 1, 2, …, m
β(crpr): normal form of the cluster phrase patterns
βi = {βi1, βi2, …, βim}: phrase weight distribution of document i

Relevance index rij
Jij: Jaccard similarity index of proposal i and reviewer j
Cij: cosine similarity index of proposal i and reviewer j

Connectivity index cij
uij: collaboration frequency between researchers i and j
uIJ: fraction of collaboration frequency between researchers in community I and those in community J
aI: weighted fraction of edges that connect to vertices in community I
Qs: modularity measure for a network with s communities
gij: goodness of fit between proposal i and reviewer j

Productivity index ej
Gj: potential reviewer j's publication score
Rj: potential reviewer j's academic rank score
Hj: potential reviewer j's H-index
ej: potential reviewer j's productivity measure
e: desired average productivity level determined by panel chairs or division managers

References

[1] H. Abe, S. Tsumoto, Analysis of research keys as temporal patterns of technical term usages in bibliographical data, in: A. An, P. Lingras, S. Petty, R. Huang (Eds.), Active Media Technology, 6335, Springer, Berlin Heidelberg, 2010, pp. 150–157.
[2] E.M. Airoldi, X. Bai, K.M. Carley, Network sampling and classification: an investigation of network model representations, Decision Support Systems 51 (3) (2011) 506–518.
[3] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, 2nd edition, Addison-Wesley, Wokingham, UK, 2011.
[4] A. Bajaj, R. Russell, AWSM: allocation of workflows utilizing social network metrics, Decision Support Systems 50 (1) (2010) 191–202.
[5] A.L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, T. Vicsek, Evolution of the social network of scientific collaborations, Physica A: Statistical Mechanics and Its Applications 311 (3) (2002) 590–614.
[6] J.P. Caulkins, W. Ding, G.T. Duncan, R. Krishnan, E. Nyberg, A method for managing access to web pages: filtering by Statistical Classification (FSC) applied to text, Decision Support Systems 42 (1) (2006) 144–161.
[7] J. Choi, S. Yi, K.C. Lee, Analysis of keyword networks in MIS research and implications for predicting knowledge evolution, Information & Management 48 (8) (2011) 371–381.
[8] Y. Dang, Y. Zhang, P.J. Hu, S.A. Brown, H. Chen, Knowledge mapping for rapidly evolving domains: a design science approach, Decision Support Systems 50 (2) (2011) 415–427.
[9] Y. Dong, Z. Sun, H. Jia, A cosine similarity-based negative selection algorithm for time series novelty detection, Mechanical Systems and Signal Processing 20 (6) (2006) 1461–1472.
[10] W. Fan, M.D. Gordon, P. Pathak, Effective profiling of consumer information retrieval needs: a unified framework and empirical comparison, Decision Support Systems 40 (2) (2005) 213–233.
[11] M.A.H. Farquad, I. Bose, Preprocessing unbalanced data using support vector machine, Decision Support Systems 53 (1) (2012) 226–233.
[12] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America 99 (12) (2002) 7821–7826.
[13] A.D. Henriksen, A.J. Traynor, A practical R&D project-selection scoring tool, IEEE Transactions on Engineering Management 46 (2) (1999) 158–170.
[14] E. Herrera-Viedma, C. Porcel, Using incomplete fuzzy linguistic preference relations to characterize user profiles in recommender systems, Ninth International Conference on Intelligent Systems Design and Applications, ISDA '09, 2009, pp. 90–95.
[15] C.C. Huang, P.Y. Chu, Y.H. Chiang, A fuzzy AHP application in government-sponsored R&D project selection, Omega 36 (6) (2008) 1038–1052.
[16] T. Joachims, A statistical learning model of text classification with support vector machines, Proceedings of ACM SIGIR'01, 2001, pp. 128–136.
[17] R.N. Kostoff, J.A. Del Rio, J.A. Humenik, E.O. Garcia, A.M. Ramirez, Citation mining: integrating text mining and bibliometrics for research user profiling, Journal of the American Society for Information Science and Technology 52 (13) (2001) 1148–1156.
[18] R.N. Kostoff, T. Braun, A. Schubert, D.R. Toothman, J.A. Humenik, Fullerene data mining using bibliometrics and database tomography, Journal of Chemical Information and Computer Sciences 40 (2000) 19–39.
[19] Y. Li, C. Zhang, J.R. Swan, An information filtering model on the web and its application in job agent, Knowledge-Based Systems 13 (5) (2000) 285–296.
[20] Y. Li, X. Zhou, P. Bruza, Y. Xu, R.Y.K. Lau, A two-stage decision model for information filtering, Decision Support Systems (2011), http://dx.doi.org/10.1016/j.dss.2011.11.005.
[21] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, NY, 1997.
[22] J. Mostafa, W. Lam, Automatic classification using supervised learning in a medical document filtering application, Information Processing and Management 36 (3) (2000) 415–444.
[23] M.E.J. Newman, The structure of scientific collaboration networks, Proceedings of the National Academy of Sciences of the United States of America 98 (2001) 404–409.
[24] M.E.J. Newman, Coauthorship networks and patterns of scientific collaboration, Proceedings of the National Academy of Sciences of the United States of America 101 (Suppl. 1) (2004) 5200–5205.
[25] M.E.J. Newman, Fast algorithm for detecting community structure in networks, Physical Review E 69 (6) (2004).
[26] G. Oestreicher-Singer, A. Sundararajan, Recommendation networks and the long tail of electronic commerce, MIS Quarterly 36 (1) (2012) 65–83.
[27] J. Qiu, Z. Lin, A framework for exploring organizational structure in dynamic social networks, Decision Support Systems 51 (4) (2011) 760–771.
[28] S. Raghuram, P. Tuertscher, R. Garud, Research note: mapping the field of virtual work: a cocitation analysis, Information Systems Research 21 (4) (2010) 983–999.
[29] S. Robertson, I. Soboroff, The TREC 2002 Filtering Track Report, TREC, 2002.
[30] J. Scott, Social Network Analysis: A Handbook, Sage Publications, London, 2000.
[31] N. Shibata, Y. Kajikawa, I. Sakata, Measuring relatedness between communities in a citation network, Journal of the American Society for Information Science and Technology 62 (7) (2011) 1360–1369.
[32] T. Strzalkowski, Robust text processing in automated information retrieval, Proceedings of the 4th Applied Natural Language Processing Conference (ANLP), 1994, pp. 168–173.
[33] Y.H. Sun, J. Ma, Z. Fan, J. Wang, A group decision support approach to evaluate experts for R&D project selection, IEEE Transactions on Engineering Management 55 (1) (2008) 158–170.
[34] Y.H. Sun, J. Ma, Z.P. Fan, J. Wang, A hybrid knowledge and model approach for reviewer assignment, Expert Systems with Applications 34 (2008) 817–824.
[35] Q. Tian, J. Ma, J. Liang, R.C.W. Kwok, O. Liu, An organizational decision support system for effective R&D project selection, Decision Support Systems 39 (2005) 403–413.
[36] E. Turban, D. Zhou, J. Ma, A group decision support approach to evaluating journals, Information & Management 42 (1) (2004) 31–44.
[37] A.S. Vivacqua, J. Oliveira, J.M. De Souza, i-ProSE: inferring user profiles in a scientific context, The Computer Journal 52 (7) (2009) 789–798.
[38] K.M. Wang, C.K. Wang, C. Hu, Analytic hierarchy process with fuzzy scoring in evaluating multidisciplinary R&D projects in China, IEEE Transactions on Engineering Management 52 (1) (2005) 119–129.
[39] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1998) 440–442.
[40] Z. Zheng, K. Chen, G. Sun, H. Zha, A regression framework for learning ranking functions using relative relevance judgments, Proceedings of SIGIR'07, 2007, pp. 287–294.

Thushari Silva is currently pursuing her PhD in the Department of Information Systems at the City University of Hong Kong. She received her MSc in Information and Communication Technology from the Asian Institute of Technology, Thailand, in 2010. Her research interests include research social network analysis, recommender systems, business intelligence and the semantic web.

Zhiling Guo is an Assistant Professor in Information Systems at the City University of Hong Kong. She received her Ph.D. in Management Science and Information Systems from The University of Texas at Austin in 2005. Dr. Guo's general research interests include online auctions, electronic markets, cloud computing, crowdsourcing, social networks, social media marketing, and supply chain risk management. Her papers have been published in Management Science, Information Systems Research, Journal of Management Information Systems, and Decision Support Systems, among others.

Jian Ma is a Professor in the Department of Information Systems at the City University of Hong Kong. He received his Doctor of Engineering degree in Computer Science from the Asian Institute of Technology in 1991. Prof. Ma's general research interests include business intelligence, research and innovation social networks, research information systems and decision support systems. His past research has been published in IEEE Transactions on Engineering Management, IEEE Transactions on Education, IEEE Transactions on Systems, Man and Cybernetics, Decision Support Systems and European Journal of Operational Research, among others.

Hongbing Jiang is currently pursuing his PhD in the University of Science and Technology of China–City University of Hong Kong Joint Advanced Research Center, Suzhou. His research interests include recommendation systems and social network analysis.

Huaping Chen is a Professor in the School of Management at the University of Science and Technology of China. His research interests include information strategies, business intelligence and its applications. His past research has been published in the Journal of Operations Management, Decision Support Systems and Computers & Operations Research, among others.