Scientific information retrieval: Challenges and opportunities

Presentation at the 17th Dutch-Belgian Information Retrieval workshop (DIR2018). Leiden, the Netherlands, November 23, 2018.

  1. Scientific information retrieval: Challenges and opportunities. Ludo Waltman, Centre for Science and Technology Studies (CWTS), Leiden University. 17th Dutch-Belgian Information Retrieval Workshop (DIR2018), Leiden, The Netherlands, November 23, 2018
  2. Centre for Science and Technology Studies (CWTS), Leiden University
     • Quantitative science studies
     • Bibliometrics and scientometrics
     • Research management and science policy
     • Lots of commissioned research for research institutions, funders, governments, companies, etc.
  3. Information retrieval vs. scientometrics. Diagram labels: Information retrieval; Scientometrics; Scientific document retrieval; Web page retrieval; Image retrieval; Sound retrieval; ...; Scientometric analysis; Individual users; Research managers; Policy makers; Researchers
  4. Outline
     • Historical connections between information retrieval and scientometrics
     • Scientometric perspective on information retrieval
     • Scientific information retrieval
  5. Historical connections between information retrieval and scientometrics
  6. Author co-citation analysis of information science researchers (1980–1987). Source: White and McCain (1998). Map labels: Scientometrics; Information retrieval
  7. Author bibliographic coupling analysis of information science researchers (2001–2005). Source: Zhao and Strotmann (2008). Map labels: Scientometrics; Information retrieval
  8. PageRank. Sources: Brin and Page (1998); Pinski and Narin (1976)
  9. PageRank
  10. Scientometric perspective on information retrieval
  11. VOSviewer
  12. Identifying micro-level fields of science
      • Based on all articles and reviews in Web of Science between 2000 and 2017
      • 21.2 million publications
      • 374.1 million citation links
      • Clustering of publications into about 4000 micro-level fields of science using the Leiden algorithm
  13. Leiden algorithm. Source: Traag et al. (2018)
  14. Structure of science based on 4000 micro-level fields. Map labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering. Legend: size of a field is proportional to the number of publications in the field
  15. Temporal dynamics in micro-level structure of science. Map labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering. Field labels: Network science; Electric vehicles; Image processing; Multi-agent systems. Legend: average publication year of the publications in a field
  16. Position of scientometrics in micro-level structure of science. Legend: proportion of publications with ‘bibliometrics’ or ‘scientometrics’ in title or abstract. Map labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering; Scientometrics
  17. Position of information retrieval in micro-level structure of science
      • What are the main subfields of information retrieval?
      • Which broad scientific disciplines do these subfields relate to?
      • Which are the main developments in information retrieval in recent years?
  18. Position of information retrieval in micro-level structure of science. Legend: proportion of publications with ‘information retrieval’ in title or abstract. Map labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering; Scientometrics; IR subfield 1; IR subfield 2; IR subfield 3
  19. Term map of information retrieval subfield 1. Legend: average publication year of the publications in which a term occurs; size of a term is proportional to the number of publications in which the term occurs
  20. Term map of information retrieval subfield 2
  21. Term map of information retrieval subfield 3
  22. Information retrieval subfields
      • Subfield 1:
        – Computer science perspective: ‘Hard’ information retrieval
        – Strong recent emphasis on recommender systems, social media, and sentiment analysis
      • Subfield 2:
        – Information science perspective: ‘Soft’ information retrieval
        – Connections between information retrieval, library science, and information behavior and information literacy research
      • Subfield 3:
        – Bioinformatics perspective: Information retrieval in the biomedical and health science domain
      • These subfields do not exhaustively cover all information retrieval research
  23. Term map of scientometrics field
  24. Term map of scientometrics field. Legend: proportion of publications with ‘information retrieval’ in title or abstract
  25. Workshops on bibliometric-enhanced information retrieval
  26. Scientific information retrieval
  27. Tools for scientific information retrieval
  28. Google Scholar
  29. Web of Science
  30. Web of Science
  31. Approaches in scientific information retrieval
      • Semantic search
      • Similar articles
      • Advanced full-text search
      • Clustering
      • Highly influential citations
      • Citation-based expansion
  32. Semantic search (Microsoft Academic)
  33. Identifying fields of study (Microsoft Academic). Source: Shen et al. (2018)
  34. Similar articles (PubMed)
  35. Similar articles (PubMed). Source: Lin and Wilbur (2007)
  36. Advanced full-text search (Europe PMC)
  37. Clustering (Open Knowledge Maps)
  38. Clustering (Open Knowledge Maps)
  39. Clustering (ongoing work)
  40. In-text references. Source: Boyack et al. (2018)
  41. Highly influential citations (Semantic Scholar)
  42. Highly influential citations (Semantic Scholar). Source: Valenzuela et al. (2015)
  43. Citation-based expansion (CitNetExplorer). Source: Van Eck and Waltman (2014). Panel labels: Selection of publications (in blue); Expansion with all citing and cited publications; Expansion with citing and cited publications having 3 or more citation links
  44. Citation-based expansion (CitNetExplorer)
  45. Citation-based expansion (CitNetExplorer)
  46. Literature reviewing using citation-based expansion
  47. Open citations. Source: Shotton (2018)
  48. Open citations
  49. Citation-based expansion (Citation Gecko)
  50. Conclusions
      • Significant inefficiencies in current information retrieval practices of researchers
      • Lots of room for innovation
      • Take advantage of open science developments
      • Join forces between information retrieval and scientometrics
  51. Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2019
  52. Thank you for your attention!
  53. References
      Boyack, K.W., Van Eck, N.J., Colavizza, G., & Waltman, L. (2018). Characterizing in-text citations in scientific articles: A large-scale analysis. Journal of Informetrics, 12(1), 59–73.
      Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
      Lin, J., & Wilbur, W.J. (2007). PubMed related articles: A probabilistic topic-based model for content similarity. BMC Bioinformatics, 8(1), 423.
      Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing & Management, 12(5), 297–312.
      Shen, Z., Ma, H., & Wang, K. (2018). A Web-scale system for scientific knowledge exploration. arXiv:1805.12216.
      Shotton, D. (2018). Funders should mandate open citations. Nature, 553, 129.
      Traag, V.A., Waltman, L., & Van Eck, N.J. (2018). From Louvain to Leiden: Guaranteeing well-connected communities. arXiv:1810.08473.
      Valenzuela, M., Ha, V., & Etzioni, O. (2015). Identifying meaningful citations. AAAI Workshop: Scholarly Big Data.
      Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.
      Van Eck, N.J., & Waltman, L. (2014). CitNetExplorer: A new software tool for analyzing and visualizing citation networks. Journal of Informetrics, 8(4), 802–823.
      Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378–2392.
      White, H.D., & McCain, K.W. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972–1995. JASIS, 49(4), 327–355.
      Zhao, D., & Strotmann, A. (2008). Evolution of research activities and intellectual influences in information science 1996–2005: Introducing author bibliographic-coupling analysis. JASIST, 59(13), 2070–2086.
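
The citation-based expansion on slide 43 (adding publications that cite or are cited by a selection, optionally only those with 3 or more citation links to it) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not CitNetExplorer's actual implementation: the function name `expand_selection`, the `min_links` parameter, and the toy citation pairs are all hypothetical.

```python
from collections import defaultdict


def expand_selection(citations, selected, min_links=1):
    """Return publications outside `selected` that have at least
    `min_links` citation links (citing or cited) with `selected`.

    citations: iterable of (citing, cited) pairs (hypothetical input format).
    selected:  iterable of publication identifiers.
    """
    selected = set(selected)
    link_counts = defaultdict(int)
    # Deduplicate pairs so that each citation link is counted once.
    for citing, cited in set(citations):
        if citing in selected and cited not in selected:
            link_counts[cited] += 1      # publication cited by the selection
        elif cited in selected and citing not in selected:
            link_counts[citing] += 1     # publication citing the selection
    return {pub for pub, n in link_counts.items() if n >= min_links}


# Hypothetical toy citation network.
citations = [
    ("A", "X"), ("B", "X"), ("C", "X"),  # X is cited by three selected publications
    ("Y", "A"), ("Y", "B"),              # Y cites two selected publications
    ("A", "Z"),                          # Z is cited by one selected publication
    ("A", "B"),                          # link inside the selection, ignored
]
selected = {"A", "B", "C"}

all_neighbors = expand_selection(citations, selected, min_links=1)
strong_neighbors = expand_selection(citations, selected, min_links=3)
print(sorted(all_neighbors))
print(sorted(strong_neighbors))
```

The expanded selection is then the union of `selected` and the returned set. Raising `min_links` trades recall for precision: fewer publications are added, but each is more strongly connected to the starting selection.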

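Slide 8 pairs PageRank (Brin and Page, 1998) with the earlier citation influence work of Pinski and Narin (1976). As a rough illustration of the idea they share, that a publication is important when it is cited by important publications, here is a minimal power-iteration sketch of PageRank on a toy citation network. The function name, the damping factor of 0.85, the even redistribution of rank from publications without outgoing citations, and the toy data are assumptions for illustration, not details taken from the talk.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Compute PageRank scores by power iteration.

    links: list of (source, target) pairs, e.g. (citing, cited).
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    nodes = sorted({node for edge in links for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in links:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # Split this node's rank evenly over what it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a node without outgoing links spreads
                # its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy network: P1 and P2 cite P3, and P3 cites P4.
toy_links = [("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
scores = pagerank(toy_links)
for pub in sorted(scores, key=scores.get, reverse=True):
    print(pub, round(scores[pub], 3))
```

In this toy network, P3 and P4 end up with higher scores than P1 and P2, which receive no citations at all.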