Large-scale analysis of bibliometric networks
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
International Conference on Data-driven Discovery:
When Data Science Meets Information Science
Beijing, China, June 20, 2016
Bibliographic databases: ‘Big data’
               Web of Science   Scopus
Journals       12,000           20,000
Publications   45 million       35 million
Citations      1 billion        0.9 billion
Bibliometric networks
Constructed from a bibliographic database (Web of Science, Scopus):
• Citation network of publications / authors / journals
• Co-authorship network of authors / organizations
• Co-citation network of publications / authors / journals
• Co-occurrence network of keywords / terms
• Bibliographic coupling network of publications / authors / journals
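As an illustration of how three of these network types relate to the underlying records, here is a minimal sketch. The toy records, the field names `authors` and `refs`, and the helper `pair_counts` are all hypothetical; real Web of Science or Scopus exports need parsing and name disambiguation first.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each publication lists its authors and cited references.
pubs = {
    "P1": {"authors": ["A", "B"], "refs": ["R1", "R2"]},
    "P2": {"authors": ["B", "C"], "refs": ["R2", "R3"]},
    "P3": {"authors": ["A", "B", "C"], "refs": ["R1", "R2", "R3"]},
}

def pair_counts(groups):
    """Count how often each unordered pair of items occurs in the same group."""
    counts = Counter()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return counts

# Co-authorship: two authors are linked once per joint publication.
coauthorship = pair_counts(p["authors"] for p in pubs.values())

# Co-citation: two references are linked once per publication citing both.
cocitation = pair_counts(p["refs"] for p in pubs.values())

# Bibliographic coupling: two publications are linked by the references they share.
coupling = {}
for p, q in combinations(sorted(pubs), 2):
    shared = len(set(pubs[p]["refs"]) & set(pubs[q]["refs"]))
    if shared > 0:
        coupling[(p, q)] = shared

print(coauthorship, cocitation, coupling, sep="\n")
```

Edge weights here are plain counts (full counting); fractional counting or normalization would be a separate choice.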
Software tools
• VOSviewer (www.vosviewer.com)
– Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
– Tool for visualizing and analyzing citation networks of
publications
• Both tools have been developed together with my colleague Ludo Waltman
VOSviewer
• Any type of bibliometric network
• Co-authorship, direct citations, co-citation, and bibliographic coupling
• Time dimension is ignored
• Networks of at most ~10,000 nodes are supported
CitNetExplorer
• Only citation networks of publications
• Direct citation between publications
• Time dimension is explicitly considered
• Millions of publications are supported
Network analysis techniques
Layout:
• Assigning the nodes in a network to locations in a (usually 2D) space (a.k.a. mapping)
• Visualization of similarities (VOS)
Clustering:
• Partitioning the nodes in a network into a number of groups (a.k.a. community detection)
• Weighted modularity
• Smart local moving algorithm
Unified approach to mapping and clustering
Minimize
\[ Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m}{k_i k_j} A_{ij} d_{ij}^2 - \sum_{i<j} d_{ij} \]
where
n: number of nodes in the network
m: total weight of all edges in the network
A_ij: weight of edge between nodes i and j
k_i: total weight of all edges of node i
Mapping
x_i: vector denoting the location of node i in a p-dimensional space
\[ d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} \]
Clustering
x_i: integer denoting the community to which node i belongs
γ: resolution parameter
\[ d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases} \]
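To make the unified approach concrete, the sketch below evaluates the quality function Q = Σ_{i<j} (2m / (k_i k_j)) A_ij d_ij² − Σ_{i<j} d_ij once with Euclidean distances (mapping) and once with the 0 or 1/γ distance (clustering). The function names and the toy network are hypothetical, the network is assumed to have no isolated nodes (every k_i > 0), and the code only evaluates Q; it does not perform the minimization.

```python
import math

def quality(A, d):
    """Evaluate Q for a symmetric weight matrix A and a distance function d(i, j)."""
    n = len(A)
    k = [sum(row) for row in A]   # k_i: total edge weight of node i
    m = sum(k) / 2                # m: total edge weight in the network
    q = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dij = d(i, j)
            q += 2 * m / (k[i] * k[j]) * A[i][j] * dij ** 2 - dij
    return q

def mapping_distance(x):
    """Mapping: x[i] is a coordinate tuple, d_ij is the Euclidean distance."""
    return lambda i, j: math.dist(x[i], x[j])

def clustering_distance(x, gamma=1.0):
    """Clustering: x[i] is a community label, d_ij is 0 or 1/gamma."""
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# Lower Q is better: compare three candidate clusterings.
for labels in ([0, 0, 0, 1, 1, 1], [0] * 6, list(range(6))):
    print(labels, quality(A, clustering_distance(labels)))

# The same function scores a candidate 2D layout.
layout = [(0, 0), (0, 1), (1, 0.5), (2, 0.5), (3, 0), (3, 1)]
print(quality(A, mapping_distance(layout)))
```

On this toy network the two-triangle split gives a lower Q than either a single community or six singletons, which is the behavior one would expect from a modularity-like criterion.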
Smart local moving algorithm
[Figure: steps of the smart local moving algorithm. Panel labels: original network; local moving heuristic; local moving heuristic in subnetworks; reduced network. Modularity values shown: Q = 0.3791 and Q = 0.4198.]
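The local moving heuristic named on this slide can be sketched as follows. This is only that one building block, not the full smart local moving algorithm (which, per the slide, also applies the heuristic within subnetworks and builds a reduced network). The sketch assumes the standard weighted modularity with a resolution parameter `gamma`; the function names are hypothetical, and modularity is recomputed from scratch at every step for clarity, which would be far too slow for large networks.

```python
def modularity(A, comm, gamma=1.0):
    """Weighted modularity of the partition comm for symmetric weight matrix A."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if comm[i] == comm[j]:
                total += A[i][j] - gamma * k[i] * k[j] / two_m
    return total / two_m

def local_moving(A, gamma=1.0):
    """Move single nodes to a neighbor's community while modularity increases."""
    n = len(A)
    comm = list(range(n))            # start with every node in its own community
    best = modularity(A, comm, gamma)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            original = comm[i]
            best_c = original
            neighbor_comms = {comm[j] for j in range(n) if A[i][j] > 0}
            for c in neighbor_comms:
                if c == original:
                    continue
                comm[i] = c          # tentatively move node i
                q = modularity(A, comm, gamma)
                if q > best + 1e-12:
                    best, best_c = q, c
            comm[i] = best_c         # keep the best move, or stay put
            if best_c != original:
                improved = True
    return comm, best

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

comm, q = local_moving(A)
print(comm, round(q, 4))
```

Each accepted move strictly increases modularity, so the loop terminates; the result is a local optimum that depends on the order in which nodes are visited.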
Algorithmically constructed classification system of science
• 17.8 million publications from the period 2000–2015 indexed in Web of Science
• 282.4 million citation relations
• Classification system of 3 hierarchical levels:
– 27 broad disciplines
– 817 fields
– 4,113 subfields
Breakdown of scientific literature into 817 fields
[Figure: map of the 817 fields. Area labels: social sciences and humanities; biomedical and health sciences; life and earth sciences; mathematics and computer science; physical sciences and engineering.]
What is data science?
• Empirical operationalization of data science based on publications with ‘data’ in title or abstract
Wikipedia: “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data … which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”
LCDS: “Data Science … deals with finding, analyzing and validating complex patterns in data. Data Science methods are indispensable for maintaining a competitive edge in all disciplines in science”
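The operationalization above (publications with ‘data’ in title or abstract) could be checked per publication roughly as follows. The slides do not state the exact matching rule, so whole-word, case-insensitive matching is an assumption here, and the function name `mentions_data` is hypothetical.

```python
import re

# Assumption: match "data" as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)

def mentions_data(title, abstract):
    """Return True if 'data' occurs as a word in the title or the abstract."""
    return bool(DATA_WORD.search(title) or DATA_WORD.search(abstract))

print(mentions_data("Big Data in astronomy", ""))
print(mentions_data("Database systems", "We update the rules."))
```

With this rule, "data-driven" counts as a match while "database" and "metadata" do not; a substring match would be the more inclusive alternative.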
Data-driven nature of different scientific fields
[Figure: map of scientific fields showing % of publications with ‘data’ in title or abstract. Area labels: social sciences and humanities; biomedical and health sciences; life and earth sciences; mathematics and computer science; physical sciences and engineering.]
Data-driven nature of different scientific fields
[Figure: map of scientific fields showing % of publications with ‘data’ in title or abstract. Field labels: artificial intelligence; statistics; bioinformatics; neuroimaging; pattern recognition; astronomy; earth; water; climate; remote sensing; nutrition; obesity; addiction; accident analysis.]
Data science fields (at least 25% ‘data’ publications)
[Figure: map of scientific fields highlighting the data science fields. Area labels: social sciences and humanities; biomedical and health sciences; life and earth sciences; mathematics and computer science; physical sciences and engineering.]
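The 25% criterion can be sketched as a per-field aggregation. The input format (pairs of field identifier and a flag for whether the publication mentions ‘data’), the function name, and the toy field names are hypothetical; the threshold of 0.25 comes from the slide title.

```python
from collections import defaultdict

def data_science_fields(pubs, threshold=0.25):
    """Return {field: share} for fields whose share of 'data' publications
    is at least the threshold. pubs is an iterable of (field, has_data) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for field, has_data in pubs:
        totals[field] += 1
        hits[field] += bool(has_data)
    shares = {field: hits[field] / totals[field] for field in totals}
    return {field: share for field, share in shares.items() if share >= threshold}

# Toy input: field X has 1 of 4 'data' publications, field Y has 1 of 5.
toy = [("X", True), ("X", False), ("X", False), ("X", False),
       ("Y", True), ("Y", False), ("Y", False), ("Y", False), ("Y", False)]
print(data_science_fields(toy))
```

The comparison is inclusive (`>=`), matching the wording "at least 25%".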
China’s publication output in data science fields
[Figure: map of data science fields showing China’s publication output. Area labels: social sciences and humanities; biomedical and health sciences; life and earth sciences; mathematics and computer science; physical sciences and engineering.]
China’s publication output in data science fields
[Figure: map of data science fields showing China’s publication output. Field labels: artificial intelligence; pattern recognition; high energy; earth; atmospheres; weather; remote sensing.]
Chinese institutes with most publications in data science fields (2011–2015)
• Chinese Academy of Sciences
• Peking University
• Tsinghua University
• China University of Geosciences
• Zhejiang University
• Nanjing University
• Shanghai Jiao Tong University
• University of Science and Technology of China
• Beijing Normal University
• University of Hong Kong
CAS publication output in data science fields
[Figure: map of data science fields showing CAS publication output. Field labels: earth; atmospheres; weather; remote sensing; vegetation; astronomy; high energy.]
Term map based on CAS publications in data science fields
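CAS (Beijing Branch) publication output in data science fields
[Figure: map of data science fields showing CAS (Beijing Branch) publication output. Field labels: astronomy; earth; atmospheres; weather; remote sensing; vegetation; high energy.]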
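CAS (Shanghai Branch) publication output in data science fields
[Figure: map of data science fields showing CAS (Shanghai Branch) publication output. Field labels: bioinformatics; genetics; astronomy; nuclear.]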