
Using full-text data to create improved term maps

Presentation at the 16th International Conference on Scientometrics & Informetrics, Wuhan, China, October 19, 2017.

A term map offers a visualization of a network of terms that co-occur in scientific publications. Term maps are usually created based on the titles and abstracts of publications. In this paper, we explore the use of full-text data for creating term maps. We create and compare a series of term maps based on the full text of publications in Journal of Informetrics. We use our results to discuss the advantages and disadvantages of different approaches for creating term maps.


  1. Using full-text data to create improved term maps
     Nees Jan van Eck¹, Ludo Waltman¹, Min Song², and Yoo Kyung Jeong²
     ¹ Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
     ² Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
     16th International Conference on Scientometrics & Informetrics, Wuhan, China, October 19, 2017
  2. Introduction
     • Traditionally, bibliometric analyses are based on the metadata of scientific publications
     • The full text of scientific publications is increasingly becoming available in structured formats
     • We study different approaches for creating VOSviewer term maps using full-text data
     • We compare these approaches with a traditional approach based on titles and abstracts
  3. VOSviewer term maps
  4. Interpretation of a term map
     • Size:
       – The larger a term, the higher the frequency of occurrence of the term
     • Distance:
       – In general, the smaller the distance between two terms, the higher the relatedness of the terms, as measured by co-occurrences
       – Horizontal and vertical axes have no special meaning
     • Colors:
       – Colors indicate clusters of closely related terms
  5. Creating a term map
     1. Input English-language text corpus
     2. Identify terms
     3. Count co-occurrences of terms
     4. Create layout and clustering
  6. Counting co-occurrences of terms
     • Full counting:
       – All occurrences of a term in a document are counted
     • Binary counting:
       – Only the presence or absence of a term matters
       – Number of occurrences of a term is not taken into account
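The difference between the two counting methods can be illustrated with a minimal sketch. It assumes that terms have already been identified and that each document is given as a plain list of terms; the function name `term_occurrences` and the toy corpus are hypothetical and are not taken from the presentation or from VOSviewer.

```python
from collections import Counter


def term_occurrences(documents, binary):
    """Count term occurrences over a corpus.

    documents: list of documents, each a list of already-identified terms
               (an assumed input format for this sketch).
    binary=False: full counting, every occurrence in a document is counted.
    binary=True:  binary counting, a term counts at most once per document.
    """
    totals = Counter()
    for terms in documents:
        if binary:
            # Only presence or absence matters, so duplicates are dropped.
            totals.update(set(terms))
        else:
            totals.update(terms)
    return totals


# Hypothetical toy corpus of two documents.
docs = [
    ["citation", "h-index", "citation", "citation"],
    ["h-index", "journal"],
]

full = term_occurrences(docs, binary=False)
binary = term_occurrences(docs, binary=True)
```

In this toy corpus, "citation" occurs three times in a single document, so the two methods disagree on it, while "h-index" occurs once in each of two documents and gets the same count either way.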
  7. Data
     • Full text of publications in Journal of Informetrics
     • 688 publications in the period 2007-2016
     • Downloaded in XML format using the Elsevier ScienceDirect Article Retrieval API

     Average per publication:
       Sections      6.0
       Paragraphs   42.1
       Sentences   191.1
  9. Term maps
  10. Titles and abstracts / binary counting
  11. Full text, publication level / full counting
  12. Full text, paragraph level / full counting
  13. Conclusions
     • Full text vs. titles and abstracts:
       – Full text yields richer maps than titles and abstracts
       – Richer maps may be useful for interactive visualization, but perhaps not for static visualization
     • Full counting vs. binary counting:
       – When using full-text data, full counting is preferable to binary counting
     • Paragraph level vs. publication level:
       – Paragraph-level maps have a more fine-grained structure than publication-level maps
       – However, areas in paragraph-level maps do not always represent topics in the literature
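One way to read the paragraph-level vs. publication-level comparison is as a choice of the unit within which two terms are considered to co-occur. The sketch below illustrates that reading under several assumptions: each publication is a list of paragraphs, each paragraph is a list of already-identified terms, and a pair is counted once per unit in which both terms appear (a simplification; the maps in the presentation use full counting). The function name `cooccurrences` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(publications, level):
    """Count term-pair co-occurrences at the given level.

    publications: list of publications; each publication is a list of
                  paragraphs; each paragraph is a list of terms
                  (an assumed input format for this sketch).
    level: "publication" or "paragraph", the unit of co-occurrence.
    """
    if level not in ("publication", "paragraph"):
        raise ValueError("level must be 'publication' or 'paragraph'")
    counts = Counter()
    for paragraphs in publications:
        if level == "publication":
            # Treat the whole publication as a single unit.
            units = [[term for par in paragraphs for term in par]]
        else:
            units = paragraphs
        for unit in units:
            # Simplifying assumption: one count per unit per pair.
            for pair in combinations(sorted(set(unit)), 2):
                counts[pair] += 1
    return counts


# Hypothetical toy data: two publications with two and one paragraphs.
pubs = [
    [["citation", "h-index"], ["journal", "impact factor"]],
    [["citation", "journal"]],
]

pub_level = cooccurrences(pubs, "publication")
par_level = cooccurrences(pubs, "paragraph")
```

At the publication level, "h-index" and "journal" are linked because they appear in the same publication; at the paragraph level they are not, since they never share a paragraph. This is consistent with paragraph-level maps showing a more fine-grained structure.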
  14. Future research
     • Use full-text data for creating other types of maps, in particular co-citation maps
  15. Thank you for your attention!
