A new software tool for large-scale analysis of citation networks
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Workshop “Measuring the Diversity of Research”, Berlin
September 2, 2013
Today’s talk
• Part 1: CWTS research program on bibliometric network
analysis
– VOSviewer
– VOS mapping and clustering
– Large-scale modularity optimization
– Algorithmically constructed publication-level classification system
• Part 2: New software tool for large-scale analysis of
citation networks
VOS mapping and clustering
• Mapping and clustering are commonly used bibliometric
network analysis techniques
• Mapping:
– Assigning the nodes in a network to locations in a (usually two-dimensional) space
– VOS mapping technique has been developed specifically for mapping
bibliometric networks
• Clustering:
– Partitioning the nodes in a network into a number of groups (a.k.a.
community detection)
– VOS clustering technique has been developed to be used jointly with
the VOS mapping technique in a unified technical framework
Unified approach to mapping and
clustering
Minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2 m c_{ij}}{c_i c_j} d_{ij}^2 - \sum_{i<j} d_{ij}$$

where
$n$: number of nodes in the network
$m$: number of links in the network
$c_{ij}$: number of links between nodes $i$ and $j$
$c_i$: number of links of node $i$

Mapping
$x_i$: vector denoting the location of node $i$ in a $p$-dimensional map

$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

Clustering
$x_i$: integer denoting the cluster to which node $i$ belongs

$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$

$\gamma$: resolution parameter
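The objective can be evaluated directly for a small network. The following is a minimal sketch, not the VOSviewer implementation: it assumes an undirected network given as a dict of link counts keyed by node pairs, and the names `unified_objective`, `mapping_distance` and `clustering_distance` are illustrative. The same function serves mapping and clustering; only the distance d_ij changes.

```python
import math
from itertools import combinations


def unified_objective(links, dist):
    """Q = sum_{i<j} (2 m c_ij / (c_i c_j)) d_ij^2 - sum_{i<j} d_ij.

    links: {(i, j): c_ij} for an undirected network (assumed input format).
    dist:  function (i, j) -> d_ij.
    """
    nodes = sorted({v for pair in links for v in pair})
    m = sum(links.values())              # m: number of links in the network
    c = {v: 0 for v in nodes}            # c_i: number of links of node i
    for (i, j), c_ij in links.items():
        c[i] += c_ij
        c[j] += c_ij
    q = 0.0
    for i, j in combinations(nodes, 2):
        c_ij = links.get((i, j), 0) + links.get((j, i), 0)
        d = dist(i, j)
        q += 2 * m * c_ij / (c[i] * c[j]) * d ** 2 - d
    return q


def mapping_distance(coords):
    """Mapping: d_ij is the Euclidean distance between the locations x_i and x_j."""
    return lambda i, j: math.dist(coords[i], coords[j])


def clustering_distance(cluster, gamma=1.0):
    """Clustering: d_ij is 0 within a cluster and 1/gamma between clusters."""
    return lambda i, j: 0.0 if cluster[i] == cluster[j] else 1.0 / gamma
```

For two triangles joined by a single link, assigning each triangle to its own cluster gives a lower Q than a split that cuts through both triangles, which is the behaviour one would expect when minimizing.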
Unified approach: Mapping
• Equivalent to the VOS mapping technique
• Closely related to multidimensional scaling (Van Eck et
al., JASIST, 2010)
Unified approach: Clustering
• Equivalent to a weighted and parameterized variant of
modularity-based clustering (Waltman et al., JOI, 2010)
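Maximize

$$\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j)\, w_{ij} \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)$$

where $\delta(x_i, x_j)$ equals 1 if $x_i = x_j$ and 0 otherwise, and

$$w_{ij} = \frac{2m}{c_i c_j}$$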
• Parameter γ (the resolution parameter) makes it possible to customize the
granularity level of the clustering
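The equivalence with the unified objective can be checked in a few lines. The following is a sketch of the argument rather than a quotation from the paper. It writes γ for the resolution parameter and introduces the shorthand s_ij = 2m c_ij / (c_i c_j), which does not appear on the slides. In the clustering case d_ij is 0 within a cluster and 1/γ between clusters, so that d_ij = (1 − δ(x_i, x_j))/γ:

```latex
\begin{aligned}
Q &= \sum_{i<j} s_{ij} d_{ij}^2 - \sum_{i<j} d_{ij}
   = \sum_{i<j} \bigl(1 - \delta(x_i, x_j)\bigr)
     \left( \frac{s_{ij}}{\gamma^2} - \frac{1}{\gamma} \right) \\
  &= \text{const} - \frac{1}{\gamma^2} \sum_{i<j} \delta(x_i, x_j)\,(s_{ij} - \gamma) \\
  &= \text{const} - \frac{1}{\gamma^2} \sum_{i<j} \delta(x_i, x_j)\, w_{ij}
     \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)
   = \text{const} - \frac{2m}{\gamma^2}\, \hat{Q}
\end{aligned}
```

Here const = Σ_{i<j} (s_ij/γ² − 1/γ) does not depend on the cluster assignment, and w_ij (c_ij − γ c_i c_j / 2m) = s_ij − γ because w_ij = 2m / (c_i c_j). Since 2m/γ² is positive, minimizing Q over cluster assignments is the same as maximizing the weighted modularity function.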
Large-scale modularity optimization
• Modularity optimization is one of the most popular
approaches to clustering in networks
• Several variants of the original modularity function have
been proposed, supporting for instance weighted
networks and different resolution levels
• Optimization of modularity functions in large networks
(with millions of nodes and edges) has received only
limited attention but has important applications in
bibliometrics
New algorithm for large-scale modularity
optimization
• ‘Louvain algorithm’ (Blondel et al., 2008) is the best-known algorithm for large-scale modularity optimization
• Our proposed ‘smart local moving algorithm’ can be
seen as an enhanced version of this algorithm (Waltman
& Van Eck, 2013)
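Both algorithms build on a local moving step, in which nodes are moved one at a time to the cluster that gives the largest modularity gain. The following is a minimal sketch of that step alone, for the standard modularity function with a resolution parameter `gamma`. It is not the smart local moving algorithm itself, and it leaves out the network aggregation phase of the Louvain algorithm. The function name, the adjacency format and the fixed node order are assumptions made for illustration; implementations typically visit nodes in random order.

```python
def local_moving(adj, gamma=1.0):
    """Greedy local moving for modularity with resolution parameter gamma.

    adj: {node: {neighbor: weight}} for an undirected network without
         self-loops (assumed input format).
    Returns {node: cluster label}.
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(degree.values())          # twice the total link weight
    cluster = {v: v for v in adj}         # start with every node in its own cluster
    total = dict(degree)                  # summed degree of each cluster

    moved = True
    while moved:                          # repeat until a full pass moves no node
        moved = False
        for v in adj:
            old = cluster[v]
            total[old] -= degree[v]       # temporarily take v out of its cluster

            # Link weight from v to each cluster it is connected to.
            weight_to = {old: 0.0}
            for u, w in adj[v].items():
                weight_to[cluster[u]] = weight_to.get(cluster[u], 0.0) + w

            def gain(c):
                # Modularity gain (up to a constant factor) of placing v in c.
                return weight_to[c] - gamma * degree[v] * total[c] / two_m

            best, best_gain = old, gain(old)
            for c in weight_to:
                g = gain(c)
                if g > best_gain + 1e-12:  # move only on a strict improvement
                    best, best_gain = c, g

            cluster[v] = best
            total[best] += degree[v]
            moved = moved or best != old
    return cluster
```

On two triangles joined by a single link, this sketch ends with one cluster per triangle. Each pass costs time proportional to the number of links, which is what makes this kind of heuristic usable on networks with millions of nodes.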
Classification systems of scientific
publications
• Web of Science/Scopus journal subject categories:
– Scientific fields defined at the level of journals rather than individual
publications
– Difficulties with multidisciplinary journals
– High level of aggregation
– Sometimes outdated or inaccurate
• Disciplinary classification systems:
– E.g., CA, JEL, MeSH, PACS
– Not available for all disciplines
– Sometimes outdated or inaccurate
Algorithmic classification systems
(Waltman & Van Eck, JASIST, 2012)
• Why not algorithmically construct a classification system
of science?
• We cluster publications (not journals) into fields based
on citation relations
• Only direct citation relations are used; no co-citation or
bibliographic coupling relations
• Fields are defined at different levels of granularity and
are organized hierarchically
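The three kinds of citation relation mentioned above can all be derived from the same list of (citing, cited) pairs. The following is a minimal sketch with made-up publication identifiers; the function name `citation_relations` and the input format are assumptions for illustration. Only the `direct` set corresponds to the relations used for the classification system.

```python
from collections import defaultdict
from itertools import combinations


def citation_relations(citations):
    """Derive direct citation, bibliographic coupling and co-citation relations.

    citations: iterable of (citing, cited) publication pairs (assumed format).
    Returns (direct, coupling, cocitation), each keyed by sorted pairs.
    """
    refs = defaultdict(set)       # publication -> publications it cites
    citers = defaultdict(set)     # publication -> publications citing it
    for citing, cited in citations:
        refs[citing].add(cited)
        citers[cited].add(citing)

    # Direct citation: a cites b or b cites a, treated as an undirected link.
    direct = {tuple(sorted((a, b))) for a, cited in refs.items() for b in cited}

    # Bibliographic coupling: number of references two publications share.
    coupling = defaultdict(int)
    for sharing in citers.values():
        for pair in combinations(sorted(sharing), 2):
            coupling[pair] += 1

    # Co-citation: number of publications that cite both members of a pair.
    cocitation = defaultdict(int)
    for cited_together in refs.values():
        for pair in combinations(sorted(cited_together), 2):
            cocitation[pair] += 1

    return direct, dict(coupling), dict(cocitation)


# Made-up example: C and D both cite A and B, and B cites A.
direct, coupling, cocitation = citation_relations(
    [("C", "A"), ("C", "B"), ("D", "A"), ("D", "B"), ("B", "A")]
)
```

In this example C and D are bibliographically coupled through two shared references but have no direct citation link, so a clustering based only on direct citations would not connect them to each other directly.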
Example
• 10.2 million publications from the period 2001–2010
indexed in Web of Science
• 97.6 million direct citation relations
• Classification system of 3 hierarchical levels:
– 20 broad disciplines
– 672 fields
– 22,412 subfields
• Clustering by optimizing a variant of the standard
modularity function that accounts for differences across
fields in citation practices
Map of the 672 research areas at level 2
of the classification system
Map of the 417 publications in research
area 4.30.10
Exploring citation networks: Why?
• To support literature reviewing
• To show how the scientific literature has evolved over
time
• To delineate topics or research areas in the literature
• To identify connections between different topics in the
literature
Motivation for a new tool
• VOSviewer has proven to be a very useful tool for
visualizing science from a static point of view
• VOSviewer has not been developed for visualizing the
dynamics of science
• In fact, the availability of software tools for dynamic
visualizations is rather limited:
– CiteSpace (Chaomei Chen)
– HistCite (Eugene Garfield)
HistCite
• Timeline visualization of publications and their citation
relations, referred to as algorithmic historiography by
Garfield
Citation Network Explorer
• Somewhat similar to HistCite, but capable of dealing with
much larger citation networks
• So far, the tool has been used successfully with the
entire Web of Science citation network of the social
sciences (1980–2013; ~2M publications and ~20M
citations)
• The aim is to be able to handle the entire citation
network of all scientific disciplines (~40M publications
and ~500M citations)
Today’s demonstration (1)
• We demonstrate a prototype of the tool
• The core functionality is available, but some options
have not yet been fully implemented
• Your feedback is very much appreciated!
Today’s demonstration (2)
• Data set 1:
– Scientometrics
– 1980–2013
– ~10K publications and ~60K citations
• Data set 2:
– All social sciences except for psychology, education, and health-related
sciences
– 1980–2013
– ~1.4M publications and ~10M citations
References
Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for
bibliometric mapping. Scientometrics, 84(2), 523-538.
Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter,
7(3), 50-54.
Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for
bibliometric mapping: Multidimensional scaling and VOS. JASIST, 61(12), 2405-2416.
Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2013). Citation analysis
may severely underestimate the impact of clinical research as compared to basic research.
PLoS ONE, 8(4), e62395.
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level
classification system of science. JASIST, 63(12), 2378-2392.
Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based
community detection. arXiv:1308.6604.
Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of
bibliometric networks. Journal of Informetrics, 4(4), 629-635.