On cluster stability
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
15th International Conference on Scientometrics & Informetrics, Istanbul, Turkey, June 30, 2015

To ensure that publications are assigned to clusters in a meaningful way, we introduce the notion of stable clusters. Essentially, a cluster is stable if it is insensitive to small changes in the underlying data. Bootstrapping is used to make small changes in the data. It is shown that if we want to have an accurate and detailed clustering, we need to be satisfied with a clustering that doesn’t comprehensively cover all publications. Publications that do not clearly belong to one of the main topics in a field cannot be assigned to a cluster.


  1. On cluster stability
     Nees Jan van Eck
     Centre for Science and Technology Studies (CWTS), Leiden University
     15th International Conference on Scientometrics & Informetrics, Istanbul, Turkey, June 30, 2015
  2. Introduction
     • A clustering technique can be used to obtain highly detailed clustering results (i.e., a large number of clusters)
     • A clustering technique can be used to force each publication to be assigned to a cluster
     • However, in a highly detailed clustering, is the assignment of publications to clusters still meaningful?
  3. Example: Waltman and Van Eck (2012)
  4. Cluster stability
     • To ensure that publications are assigned to clusters in a meaningful way, we introduce the notion of stable clusters
     • Essentially, a cluster is stable if it is insensitive to small changes in the underlying data
     • Bootstrapping is used to make small changes in the data
  5. Identification of stable clusters: Step 1
     • Collect the citation network of publications
     • Create a large number (e.g., 100) of bootstrap citation networks:
       – A bootstrap citation network is a weighted variant of the original citation network in which each edge has an integer weight drawn from a Poisson distribution with mean 1 (cf. Rosvall & Bergstrom, 2009)
     • In each bootstrap citation network, perform clustering
     • For each pair of publications, calculate the proportion of the bootstrap clustering results in which the publications are in the same cluster
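Step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the slides: the function names, the toy network, and the seed are hypothetical, and `toy_clustering` (connected components over edges with positive weight) is only a stand-in for a real clustering technique such as the smart local moving algorithm listed in the references.

```python
import math
import random
from itertools import combinations


def poisson_mean_one(rng):
    """Draw an integer from a Poisson distribution with mean 1 (Knuth's method)."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def toy_clustering(nodes, weights):
    """Stand-in clustering (assumption): components over edges with weight > 0."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v), w in weights.items():
        if w > 0:
            parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def co_clustering_proportions(nodes, edges, n_bootstrap=100, seed=0,
                              cluster=toy_clustering):
    """Proportion of bootstrap clusterings in which each pair shares a cluster."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(nodes), 2))
    counts = dict.fromkeys(pairs, 0)
    for _ in range(n_bootstrap):
        # Bootstrap network: every original edge gets a Poisson(1) integer weight.
        weights = {edge: poisson_mean_one(rng) for edge in edges}
        labels = cluster(nodes, weights)
        for u, v in pairs:
            if labels[u] == labels[v]:
                counts[(u, v)] += 1
    return {pair: count / n_bootstrap for pair, count in counts.items()}


# Hypothetical toy citation network: a triangle, a pendant node, an isolated node.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
props = co_clustering_proportions(nodes, edges)
```

A weight of 0 effectively removes an edge from a bootstrap network, and a weight of 2 or more strengthens it, which is how the bootstrap introduces small changes in the data. Any clustering function with the same signature can be passed through the `cluster` parameter.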
  6. Figure: Original network; Bootstrap networks (integer edge weights); Clustering; Clustered bootstrap networks; Weighted network (co-clustering proportions between 0.1 and 1.0)
  7. Identification of stable clusters: Step 2
     • Create a network of publications with an edge between two publications if the publications are in the same cluster in at least a certain proportion (e.g., 0.9) of the bootstrap clustering results
     • Identify connected components in the newly created network
     • Each connected component represents a stable cluster
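Step 2 amounts to thresholding the pairwise proportions and taking connected components. The sketch below is one way to do that with a small union-find structure; the function name `stable_clusters` and the example proportions are hypothetical and are not taken from the slides.

```python
def stable_clusters(nodes, proportions, threshold=0.9):
    """Connected components of the binary network obtained by thresholding."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Binary network: keep an edge only if the pair is co-clustered often enough.
    for (u, v), p in proportions.items():
        if p >= threshold:
            parent[find(u)] = find(v)

    components = {}
    for v in nodes:
        components.setdefault(find(v), []).append(v)
    return sorted(sorted(members) for members in components.values())


# Hypothetical co-clustering proportions for six publications.
example_nodes = ["a", "b", "c", "d", "e", "f"]
example_proportions = {
    ("a", "b"): 1.0,
    ("b", "c"): 0.9,
    ("a", "c"): 0.6,
    ("c", "d"): 0.4,
    ("d", "e"): 0.95,
}
clusters = stable_clusters(example_nodes, example_proportions)
# clusters == [["a", "b", "c"], ["d", "e"], ["f"]]
```

Here "a" and "c" end up in the same stable cluster even though their own proportion is only 0.6, because both are strongly tied to "b". Publication "f" forms a component of its own; treating such singletons as publications that are not assigned to any cluster is an interpretation on our part, in line with the conclusions.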
  8. Figure: Weighted network (co-clustering proportions between 0.1 and 1.0); Binary network; Connected components; Stable clusters
  9. Data
     • Library & Information Sciences (LIS):
       – Time period: 1996-2013
       – Publications: 31,534
       – Citation links: 131,266
     • Astrophysics (Berlin dataset):
       – Time period: 2003-2010
       – Publications: 101,828
       – Citation links: 924,171
  10. Cluster stability LIS
  11. Stable clusters LIS (resolution 2)
  12. Stable clusters LIS (resolution 2)
  13. Cluster stability Berlin
  14. Cluster stability (panels: LIS, Berlin)
  15. Conclusions
     • If we want to have an accurate and detailed clustering, we need to be satisfied with a clustering that doesn’t comprehensively cover all publications
     • Publications that do not clearly belong to one of the main topics in a field cannot be assigned to a cluster
     • Cluster stability analysis can be used to distinguish between meaningful and non-meaningful assignments of publications to clusters
  16. Thank you for your attention!
  17. References
     • Rosvall, M., & Bergstrom, C.T. (2009). Mapping change in large networks. PLoS ONE, 5(1), e8694. http://dx.doi.org/10.1371/journal.pone.0008694
     • Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392. http://dx.doi.org/10.1002/asi.22748
     • Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. European Physical Journal B, 86(11), 471. http://dx.doi.org/10.1140/epjb/e2013-40829-0
