Large-scale analysis of bibliometric data sources

Presentation at the 8th LCDS Meeting on Statistics & Data Science on November 13, 2015.

Published in: Science

  1. Large-scale analysis of bibliometric data sources. Nees Jan van Eck, Centre for Science and Technology Studies (CWTS), Leiden University. 8th LCDS Meeting: Statistics & Data Science, Leiden, November 13, 2015
  2. About myself
     • Master in computer science
     • PhD thesis on bibliometric mapping of science
     • Researcher at CWTS since 2009
     • Research focus on analysis and visualization of bibliometric networks
  3. Centre for Science and Technology Studies (CWTS)
     • Research center at Leiden University focusing on science and technology studies
     • About 30 staff members
     • History of more than 25 years in bibliometric and scientometric research
     • Contract research
     • Full access to large bibliographic databases (Web of Science and Scopus)
  4. Bibliographic databases: ‘Big data’
     • Journals: 12,000 (Web of Science); 20,000 (Scopus)
     • Publications: 45 million (Web of Science); 35 million (Scopus)
     • Citations: 1 billion (Web of Science); 0.9 billion (Scopus)
  5. Bibliometric networks (diagram: a bibliographic database, Web of Science or Scopus, and the networks constructed from it)
     • Citation network of publications
     • Co-authorship network of authors / organizations
     • Co-citation network of pubs / authors / journals
     • Co-occurrence network of terms
     • Bibliographic coupling network of pubs / authors / journals
  6. Outline
     • Software tools
     • Network analysis techniques
     • Analysis of data science
  7. Software tools
  8. Software tools
     • VOSviewer (www.vosviewer.com)
       – Tool for constructing and visualizing bibliometric networks
     • CitNetExplorer (www.citnetexplorer.nl)
       – Tool for visualizing and analyzing citation networks of publications
     • Both tools have been developed together with my colleague Ludo Waltman
  9. VOSviewer
  10. Map of university co-authorship network
  11. Map of journal citation network
  12. CitNetExplorer
  13. Network analysis techniques
  14. Network analysis techniques
     Layout:
     • Visualization of similarities (VOS)
     Community detection:
     • Weighted modularity
     • Smart local moving algorithm
  15. Smart local moving algorithm (figure; panel labels: Original network; Local moving heuristic; Local moving heuristic in subnetworks; Reduced network; reported modularity values: Q = 0.4198 and Q = 0.3791)
  16. Algorithmically constructed classification system of science
     • 16.2 million publications from the period 2000–2014 indexed in Web of Science
     • 241.7 million citation relations
     • Classification system of 3 hierarchical levels:
       – 28 broad disciplines
       – 813 fields
       – 3,822 subfields
  17. Breakdown of scientific literature into 813 fields (map; legend: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering)
  18. Publications in scientometrics subfield
  19. Time-line map of highly cited scientometrics publications
  20. Analysis of data science
  21. What is data science?
     • Empirical operationalization of data science based on publications with ‘data’ in title or abstract
     • Wikipedia: “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data … which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”
     • LCDS: “Data Science … deals with finding, analyzing and validating complex patterns in data. Data Science methods are indispensable for maintaining a competitive edge in all disciplines in science”
  22. Growth of data-driven research (line chart; x-axis: year, 1990 to 2015; y-axis: percentage of publications, 0% to 20%; series: % ‘data’ publications and % ‘theory’ publications)
  23. Breakdown of scientific literature into 813 fields (map; legend: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering)
  24. Data-driven nature of different scientific fields (map showing % of publications with ‘data’ in title or abstract; legend: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering)
  25. Data-driven nature of different scientific fields (map showing % of publications with ‘data’ in title or abstract; labeled areas: artificial intelligence; statistics; bioinformatics; neuroimaging; pattern recognition; astronomy; earth; water; weather; climate; remote sensing; nutrition; obesity; addiction)
  26. Data science fields (at least 20% ‘data’ publications) (map; legend: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering)
  27. Term map of data science fields
  28. Leiden University’s publication output in data science fields (map; legend: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering)
  29. Leiden University’s institutes with most publications in data science fields
     • Leiden Observatory
     • LUMC
     • Faculty of Archaeology
     • Institute of Psychology (FSW)
     • Centre for Science and Technology Studies (FSW)
     • Mathematical Institute (Science)
     • Institute of Biology Leiden (Science)
     • Leiden Institute of Advanced Computer Science (Science)
  30. LUMC departments with most publications in data science fields
     • Medical Statistics and Bioinformatics
     • Rheumatology
     • Psychiatry
     • Radiology
     • Clinical Epidemiology
     • Human Genetics
     • Neurosurgery
     • Cardiology
     • Clinical Oncology
     • Endocrinology
  31. Term map based on Leiden University’s publications in data science fields
  32. Do it yourself!
     • www.vosviewer.com
     • www.citnetexplorer.nl
  33. Thank you for your attention!
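
Slide 5 lists co-citation and bibliographic coupling networks among the networks built from a bibliographic database. As a rough illustration of how both can be derived from a plain list of citation relations, here is a minimal sketch. The toy publication IDs, the `citations` list, and the `pair_counts` helper are hypothetical and do not come from the slides or from VOSviewer or CitNetExplorer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy citation list: (citing publication, cited publication).
citations = [
    ("P4", "P1"), ("P4", "P2"),
    ("P5", "P1"), ("P5", "P2"), ("P5", "P3"),
    ("P6", "P3"),
]


def pair_counts(groups):
    """Count how often each unordered pair of items appears in the same group."""
    counts = Counter()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts


references = {}  # citing publication -> set of publications it cites
cited_by = {}    # cited publication -> set of publications citing it
for citing, cited in citations:
    references.setdefault(citing, set()).add(cited)
    cited_by.setdefault(cited, set()).add(citing)

# Co-citation: two publications are cited together by the same publication.
co_citation = pair_counts(references)

# Bibliographic coupling: two publications cite the same publication.
bibliographic_coupling = pair_counts(cited_by)

print("co-citation:", dict(co_citation))
print("bibliographic coupling:", dict(bibliographic_coupling))
```

The pair counts serve as edge weights: P1 and P2 are co-cited twice (by P4 and P5), and P4 and P5 are coupled through two shared references (P1 and P2). At the scale quoted on slide 4, this pairwise counting would need a more careful implementation than nested Python loops.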

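Slides 21 and 26 operationalize data science as publications with ‘data’ in the title or abstract, and treat a field as a data science field when at least 20% of its publications qualify. A minimal sketch of that calculation follows. The toy records, the function name `data_share_by_field`, and the whole-word, case-insensitive matching rule are assumptions; the slides do not state how the word was matched.

```python
import re
from collections import defaultdict

# Hypothetical toy records: (field, title, abstract).
publications = [
    ("bioinformatics", "Mining gene expression data", "We cluster profiles."),
    ("bioinformatics", "A new alignment algorithm", "Benchmarked on public data sets."),
    ("number theory", "On a conjecture about primes", "We prove a bound."),
    ("number theory", "Elliptic curves revisited", "A short proof."),
    ("astronomy", "Survey Data Release", "Catalogue of sources."),
    ("astronomy", "Stellar models", "Theory of convection."),
]

# Assumption: match 'data' as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)

# Threshold taken from slide 26: at least 20% 'data' publications.
THRESHOLD = 0.20


def data_share_by_field(records):
    """Return, per field, the fraction of publications mentioning 'data'."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for field, title, abstract in records:
        totals[field] += 1
        if DATA_WORD.search(title) or DATA_WORD.search(abstract):
            hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


shares = data_share_by_field(publications)
data_science_fields = sorted(f for f, s in shares.items() if s >= THRESHOLD)

print(shares)
print(data_science_fields)
```

With these toy records, bioinformatics and astronomy pass the 20% threshold and number theory does not. A looser rule, such as also counting ‘dataset’ or ‘database’, would shift the shares, so the matching rule should be fixed before comparing fields.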