A new software tool for large-scale analysis of citation networks

  1. A new software tool for large-scale analysis of citation networks. Nees Jan van Eck, Centre for Science and Technology Studies (CWTS), Leiden University. Workshop “Measuring the Diversity of Research”, Berlin, September 2, 2013
  2. Today’s talk • Part 1: CWTS research program on bibliometric network analysis – VOSviewer – VOS mapping and clustering – Large-scale modularity optimization – Algorithmically constructed publication-level classification system • Part 2: New software tool for large-scale analysis of citation networks
  3. Part 1: CWTS research program on bibliometric network analysis
  4. VOSviewer (1) (Van Eck & Waltman, Scientometrics, 2010)
  5. VOSviewer (2) (Van Eck & Waltman, Scientometrics, 2010)
  6. Subject categories
  7. Leiden University
  8. Erasmus University Rotterdam
  9. Delft University of Technology
  10. Clinical Neurology
  11. Clinical Neurology: Citation density (Van Eck et al., PLoS ONE, 2013)
  12. Clinical Neurology: Reference density
  13. VOS mapping and clustering • Mapping and clustering are commonly used bibliometric network analysis techniques • Mapping: – Assigning the nodes in a network to locations in a (usually two-dimensional) space – VOS mapping technique has been developed specifically for mapping bibliometric networks • Clustering: – Partitioning the nodes in a network into a number of groups (a.k.a. community detection) – VOS clustering technique has been developed to be used jointly with the VOS mapping technique in a unified technical framework
  14. Unified approach: Clustering seen as mapping in a restricted space
  15. Unified approach: Clustering seen as mapping in a restricted space
  16. Unified approach to mapping and clustering
      Minimize $Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2 m c_{ij}}{c_i c_j} d_{ij}^2 - \sum_{i<j} d_{ij}$
      where n: number of nodes in the network; m: number of links in the network; $c_{ij}$: number of links between nodes i and j; $c_i$: number of links of node i
      Mapping: $x_i$ is a vector denoting the location of node i in a p-dimensional map, with $d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$
      Clustering: $x_i$ is an integer denoting the cluster to which node i belongs, with $d_{ij} = 0$ if $x_i = x_j$ and $d_{ij} = 1/\gamma$ if $x_i \neq x_j$
      $\gamma$: resolution parameter
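The objective on this slide can be evaluated directly for a small network. The following is a minimal sketch, not the authors' implementation: it assumes an undirected network in which every node has at least one link, sums over pairs i < j, and uses d_ij = 1/γ between different clusters. The function names and the six-node example network (two triangles joined by one link) are hypothetical.

```python
import math
from itertools import combinations

def unified_quality(n, edges, distance):
    """Evaluate Q = sum_{i<j} (2*m*c_ij / (c_i*c_j)) * d_ij**2 - sum_{i<j} d_ij.

    Assumes an undirected network in which every node has at least one link.
    """
    m = len(edges)
    c = {}            # c[(i, j)] with i < j: number of links between i and j
    links = [0] * n   # links[i]: number of links of node i
    for i, j in edges:
        key = (min(i, j), max(i, j))
        c[key] = c.get(key, 0) + 1
        links[i] += 1
        links[j] += 1
    q = 0.0
    for i, j in combinations(range(n), 2):  # all pairs with i < j
        d = distance(i, j)
        q += 2 * m * c.get((i, j), 0) / (links[i] * links[j]) * d * d - d
    return q

def mapping_distance(coords):
    """Mapping: x_i is a location, d_ij is the Euclidean distance."""
    return lambda i, j: math.dist(coords[i], coords[j])

def clustering_distance(labels, gamma=1.0):
    """Clustering: x_i is a cluster label, d_ij is 0 or 1/gamma."""
    return lambda i, j: 0.0 if labels[i] == labels[j] else 1.0 / gamma

# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q_two_clusters = unified_quality(6, edges, clustering_distance([0, 0, 0, 1, 1, 1]))
q_one_cluster = unified_quality(6, edges, clustering_distance([0] * 6))
q_singletons = unified_quality(6, edges, clustering_distance(list(range(6))))

# The same function scores a two-dimensional map of the network.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.5), (2.0, 0.5), (3.0, 0.0), (3.0, 1.0)]
q_map = unified_quality(6, edges, mapping_distance(coords))
```

Because Q is minimized, the two-triangle clustering scores better (lower) than either a single cluster or six singleton clusters in this example. Only the distance function differs between the mapping and the clustering case, which is the sense in which the two techniques share one framework.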
  17. Unified approach: Mapping • Equivalent to the VOS mapping technique • Closely related to multidimensional scaling (Van Eck et al., JASIST, 2010)
  18. Unified approach: Clustering • Equivalent to a weighted and parameterized variant of modularity-based clustering (Waltman et al., JOI, 2010)
      Maximize $\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j) \, w_{ij} \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)$
      where $\delta(x_i, x_j)$ equals 1 if $x_i = x_j$ and 0 otherwise, and $w_{ij} = \frac{2m}{c_i c_j}$
      • Parameter $\gamma$ makes it possible to customize the granularity level of the clustering
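The effect of the resolution parameter on granularity can be illustrated numerically. This is a sketch under stated assumptions, not the authors' code: it assumes an undirected network without isolated nodes, sums over pairs i < j, and uses the weights w_ij = 2m / (c_i c_j). The function name and the example network (two triangles joined by one link) are hypothetical.

```python
from itertools import combinations

def weighted_modularity(n, edges, labels, gamma=1.0):
    """Q_hat = (1/2m) * sum_{i<j} delta(x_i, x_j) * w_ij * (c_ij - gamma*c_i*c_j/(2m)),
    with w_ij = 2m / (c_i * c_j). Assumes every node has at least one link."""
    m = len(edges)
    c = {}            # c[(i, j)] with i < j: number of links between i and j
    links = [0] * n   # links[i]: number of links of node i
    for i, j in edges:
        key = (min(i, j), max(i, j))
        c[key] = c.get(key, 0) + 1
        links[i] += 1
        links[j] += 1
    total = 0.0
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:  # delta(x_i, x_j) = 1
            w = 2 * m / (links[i] * links[j])
            total += w * (c.get((i, j), 0) - gamma * links[i] * links[j] / (2 * m))
    return total / (2 * m)

# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster = [0] * 6
singletons = list(range(6))

for gamma in (0.1, 1.0, 5.0):
    scores = {name: weighted_modularity(6, edges, labels, gamma)
              for name, labels in [("one", one_cluster),
                                   ("two", two_clusters),
                                   ("singletons", singletons)]}
    print(gamma, max(scores, key=scores.get))
```

In this example a low resolution (γ = 0.1) favors a single cluster, γ = 1 favors the two triangles, and a high resolution (γ = 5) favors singleton clusters, so a larger γ yields a finer clustering.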
  19. Large-scale modularity optimization • Modularity optimization is one of the most popular approaches to clustering in networks • Several variants of the original modularity function have been proposed, supporting for instance weighted networks and different resolution levels • Optimization of modularity functions in large networks (with millions of nodes and edges) has received only limited attention but has important applications in bibliometrics
  20. New algorithm for large-scale modularity optimization • The ‘Louvain algorithm’ (Blondel et al., 2008) is the best-known algorithm for large-scale modularity optimization • Our proposed ‘smart local moving algorithm’ can be seen as an enhanced version of this algorithm (Waltman & Van Eck, 2013)
  21. Louvain algorithm (figure; modularity values shown: Q = 0.3791 and Q = 0.4151)
  22. Smart local moving algorithm (figure; modularity values shown: Q = 0.3791 and Q = 0.4198)
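Both algorithms build on a local moving heuristic, in which nodes are repeatedly moved to the community that gives the largest modularity gain. The sketch below covers only that heuristic for the standard modularity function on an undirected, unweighted network. It omits the network aggregation step of the Louvain algorithm and the additional steps of the smart local moving algorithm, and it is not the authors' implementation. All function names and the example network are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community, gamma=1.0):
    """Standard modularity: sum over communities of L_c/m - gamma*(D_c/(2m))**2."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: links inside community c
    degree_sum = defaultdict(int)  # D_c: total degree of community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

def local_moving(n, edges, gamma=1.0):
    """Move nodes one at a time to the best neighboring community until no move helps."""
    m = len(edges)
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = [len(neighbors[i]) for i in range(n)]
    community = list(range(n))  # start with every node in its own community
    total = degree[:]           # total degree per community
    moved = True
    while moved:
        moved = False
        for i in range(n):
            old = community[i]
            total[old] -= degree[i]  # temporarily take node i out of its community
            links = defaultdict(int)
            links[old] = 0           # staying put is always a candidate
            for j in neighbors[i]:
                links[community[j]] += 1
            # The gain of joining community c is proportional to
            # k_ic - gamma * k_i * total[c] / (2m).
            best = old
            best_score = links[old] - gamma * degree[i] * total[old] / (2 * m)
            for c, k_ic in links.items():
                score = k_ic - gamma * degree[i] * total[c] / (2 * m)
                if score > best_score + 1e-12:
                    best, best_score = c, score
            community[i] = best
            total[best] += degree[i]
            if best != old:
                moved = True
    return community

# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = local_moving(6, edges)
q_start = modularity(edges, list(range(6)))
q_final = modularity(edges, community)
```

Every accepted move strictly increases modularity, so the loop terminates. On this small network the heuristic ends with the two triangles as communities.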
  23. Comparison
      Network                        Measure   Louvain   Smart local moving
      Amazon (0.5M / 0.9M)           Qmin      0.9257    0.9335
                                     Qmax      0.9264    0.9338
                                     t         6         28
      DBLP (0.4M / 1.0M)             Qmin      0.8203    0.8357
                                     Qmax      0.8227    0.8367
                                     t         7         26
      IMDb (0.4M / 15.0M)            Qmin      0.6976    0.7050
                                     Qmax      0.7041    0.7077
                                     t         18        100
      LiveJournal (4.0M / 34.7M)     Qmin      0.7441    0.7676
                                     Qmax      0.7557    0.7720
                                     t         350       1,549
      WoS (10.6M / 104.5M)           Qmin      0.7714    0.7918
                                     Qmax      0.7786    0.7957
                                     t         6,800     19,994
      Web uk-2005 (39.5M / 783.0M)   Qmin      0.9793    0.9801
                                     Qmax      0.9795    0.9801
                                     t         11,006    17,074
  24. Classification systems of scientific publications • Web of Science/Scopus journal subject categories: – Scientific fields defined at the level of journals rather than individual publications – Difficulties with multidisciplinary journals – High level of aggregation – Sometimes outdated or inaccurate • Disciplinary classification systems: – E.g., CA, JEL, MeSH, PACS – Not available for all disciplines – Sometimes outdated or inaccurate
  25. Algorithmic classification systems (Waltman & Van Eck, JASIST, 2012) • Why not algorithmically construct a classification system of science? • We cluster publications (not journals) into fields based on citation relations • Only direct citation relations are used; no co-citation or bibliographic coupling relations • Fields are defined at different levels of granularity and are organized hierarchically
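The three kinds of citation-based relations mentioned on this slide can be distinguished with a small example. This is an illustrative sketch only: the function name and the five-citation toy data set are hypothetical, and the classification system described here uses only the direct citation relations.

```python
from collections import defaultdict
from itertools import combinations

def citation_relations(citations):
    """Derive three relation types from (citing, cited) publication pairs."""
    refs = defaultdict(set)      # publication -> publications it cites
    cited_by = defaultdict(set)  # publication -> publications that cite it
    for citing, cited in citations:
        refs[citing].add(cited)
        cited_by[cited].add(citing)

    # Direct citation: one publication cites the other.
    direct = {frozenset(pair) for pair in citations}

    # Bibliographic coupling: two publications share cited references.
    coupling = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            coupling[frozenset((a, b))] = shared

    # Co-citation: two publications are cited together by the same publication.
    cocitation = {}
    for a, b in combinations(sorted(cited_by), 2):
        shared = len(cited_by[a] & cited_by[b])
        if shared:
            cocitation[frozenset((a, b))] = shared

    return direct, coupling, cocitation

# Hypothetical toy data: A and B each cite C and D, and C cites D.
citations = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
direct, coupling, cocitation = citation_relations(citations)
```

In the toy data, A and B never cite each other, so they have no direct citation relation, yet they are bibliographically coupled through two shared references. C and D are linked both by a direct citation and by two co-citations.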
  26. Example • 10.2 million publications from the period 2001–2010 indexed in Web of Science • 97.6 million direct citation relations • Classification system of 3 hierarchical levels: – 20 broad disciplines – 672 fields – 22,412 subfields • Clustering by optimizing a variant of the standard modularity function that accounts for differences across fields in citation practices
  27. Map of the 672 research areas at level 2 of the classification system
  28. Map of the 417 publications in research area 4.30.10
  29. Part 2: New software tool for exploring large-scale citation networks
  30. Exploring citation networks: Why? • To support literature reviewing • To show how the scientific literature has evolved over time • To delineate topics or research areas in the literature • To identify connections between different topics in the literature
  31. Motivation for a new tool • VOSviewer has proven to be a very useful tool for visualizing science from a static point of view • VOSviewer has not been developed for visualizing the dynamics of science • In fact, the availability of software tools for dynamic visualizations is rather limited: – CiteSpace (Chaomei Chen) – HistCite (Eugene Garfield)
  32. HistCite • Timeline visualization of publications and their citation relations, referred to as algorithmic historiography by Garfield
  33. Citation Network Explorer • Somewhat similar to HistCite, but capable of dealing with much larger citation networks • So far, the tool has been used successfully with the entire Web of Science citation network of the social sciences (1980–2013; ~2M publications and ~20M citations) • The aim is to be able to handle the entire citation network of all scientific disciplines (~40M publications and ~500M citations)
  34. Today’s demonstration (1) • We demonstrate a prototype of the tool • The core functionality is available, but some options have not yet been fully implemented • Your feedback is very much appreciated!
  35. Today’s demonstration (2) • Data set 1: – Scientometrics – 1980–2013 – ~10K publications and ~60K citations • Data set 2: – All social sciences except for psychology, education, and health-related sciences – 1980–2013 – ~1.4M publications and ~10M citations
  36. Citation Network Explorer
  37. References
      Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.
      Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.
      Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. JASIST, 61(12), 2405-2416.
      Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLoS ONE, 8(4), e62395.
      Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392.
      Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. arXiv:1308.6604.
      Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.
