Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems
Transcript

  • 1. Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems philipp.mayr@gesis.org Workshop on Scholarly Big Data: Challenges and Ideas. IEEE BigData 2013
  • 2. Intro • What are Big Scholarly Information Systems?
  • 3. Intro • What are bibliometric-enhanced IR models? – a set of methods to quantitatively analyze scientific and technological literature – e.g. citation analysis (the h-index) – CiteSeer was a pioneering bibliometric-enhanced IR system
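To make the citation-analysis example concrete, here is a minimal sketch of an h-index computation in Python; the citation counts are hypothetical and the function is an illustration, not code from the project.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have h or more citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author's papers
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```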
  • 4. Background • DFG-funded (2009-2013): Projects IRM I and IRM II – IRM = Information Retrieval Mehrwertdienste (value-added IR services) • Goal: Implementation and evaluation of value-added IR services for digital library systems • Main idea: Applying scholarly (science) models for IR – Co-occurrence analysis of controlled vocabularies (thesauri) – Bibliometric analysis of core journals (Bradford’s law) – Centrality in author networks (betweenness) • In IRM I we concentrated on the basic evaluation • In IRM II we concentrate on the implementation of reusable (web) services http://www.gesis.org/en/research/external-funding-projects/archive/irm/
  • 5. Search Term Recommender (Petras 2006) Search Term Service: recommends strongly associated terms from a controlled vocabulary
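A minimal sketch of the idea behind such a recommender, assuming toy metadata records that pair free query terms with controlled thesaurus terms; the real service (Petras 2006) learns associations statistically from large training collections, so the plain co-occurrence count below is a simplification for illustration.

```python
from collections import Counter

# Toy metadata: each document carries free query terms and controlled thesaurus terms
documents = [
    {"query_terms": {"youth"}, "thesaurus": {"adolescent", "socialization"}},
    {"query_terms": {"youth"}, "thesaurus": {"adolescent", "education"}},
    {"query_terms": {"labor"}, "thesaurus": {"labor market", "employment"}},
]

def recommend_terms(query_term, docs, k=2):
    """Rank thesaurus terms by how often they co-occur with the query term."""
    cooc = Counter()
    for doc in docs:
        if query_term in doc["query_terms"]:
            cooc.update(doc["thesaurus"])
    return [term for term, _ in cooc.most_common(k)]

print(recommend_terms("youth", documents))  # 'adolescent' comes out on top
```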
  • 6. Bradfordizing (White 1981, Mayr 2009) Bradford’s Law of Scattering (Bradford 1948), idealized example for 450 articles: Nucleus/Core: 150 papers in 3 journals; Zone 2: 150 papers in 9 journals; Zone 3: 150 papers in 27 journals. Ranking by Bradfordizing: core-journal papers / core books are sorted to the top (example: a bradfordized list of journals in informetrics). Applied to monographs, the publisher is used as the sorting criterion.
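A minimal sketch of Bradfordizing as a re-ranking step, assuming each result record carries a 'journal' field: count papers per journal in the result set, rank journals by productivity, cut them into three zones of roughly equal paper counts, and sort core-zone papers to the top. The field names and the simple zone cut are assumptions for illustration.

```python
from collections import Counter

def bradfordize(results):
    """Re-rank a result list so that papers from the most productive (core) journals come first.

    `results` is a list of dicts with a 'journal' key; records keep their
    original relative order within each zone (stable sort).
    """
    journal_counts = Counter(doc["journal"] for doc in results)
    ranked_journals = [j for j, _ in journal_counts.most_common()]

    # Cut the journal ranking into three zones holding ~equal numbers of papers
    zone_of = {}
    third, cumulated, zone = len(results) / 3, 0, 1
    for journal in ranked_journals:
        zone_of[journal] = zone
        cumulated += journal_counts[journal]
        if cumulated >= zone * third and zone < 3:
            zone += 1

    return sorted(results, key=lambda doc: zone_of[doc["journal"]])

# Hypothetical result set: journal 'J1' contributes the most papers (core)
hits = [{"id": i, "journal": j} for i, j in enumerate("J3 J1 J1 J2 J1 J2".split())]
print([doc["journal"] for doc in bradfordize(hits)])  # -> ['J1', 'J1', 'J1', 'J2', 'J2', 'J3']
```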
  • 7. Author Centrality (Mutschke 2001, 2004) Ranking by Author Centrality: papers by central authors are sorted to the top
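A minimal sketch of author-centrality ranking, assuming the third-party `networkx` library and result records with an 'authors' list: build a co-authorship graph from the result set, compute betweenness centrality, and score each paper by its most central author. Scoring by the maximum is an assumption for illustration, not the project's exact formula.

```python
from itertools import combinations
import networkx as nx

def rank_by_author_centrality(results):
    """Re-rank results so that papers by structurally central authors come first."""
    graph = nx.Graph()
    for doc in results:
        graph.add_nodes_from(doc["authors"])
        # Every pair of co-authors of a paper gets an edge
        graph.add_edges_from(combinations(doc["authors"], 2))

    centrality = nx.betweenness_centrality(graph)
    # Score a paper by its most central author
    return sorted(results,
                  key=lambda doc: max(centrality[a] for a in doc["authors"]),
                  reverse=True)

# Hypothetical result set; author 'B' bridges two co-author pairs
hits = [
    {"id": 1, "authors": ["A", "B"]},
    {"id": 2, "authors": ["B", "C"]},
    {"id": 3, "authors": ["D", "E"]},
]
print([doc["id"] for doc in rank_by_author_centrality(hits)])  # papers with 'B' rank first
```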
  • 8. Scenarios for combined ranking services – iterative use: Result Set → Core Journal Papers → Central Author Papers → Relevant Papers – simultaneous use: Result Set → Central Author Papers + Core Journal Papers
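A small sketch of the two combination scenarios, reusing the hypothetical `bradfordize` and `rank_by_author_centrality` helpers from the sketches above: in the iterative scenario one service re-ranks the output of the other, in the simultaneous scenario both services re-rank the full result set and their top segments are merged (whether to take the union or the intersection is a design choice the slide leaves open).

```python
def iterative(results, k=10):
    """Restrict to the top core-journal papers, then rank those by author centrality."""
    core_first = bradfordize(results)[:k]
    return rank_by_author_centrality(core_first)

def simultaneous(results, k=10):
    """Run both services on the full result set and merge their top segments."""
    top_core = {doc["id"] for doc in bradfordize(results)[:k]}
    top_central = {doc["id"] for doc in rank_by_author_centrality(results)[:k]}
    merged = top_core | top_central  # union here; intersection is equally plausible
    return [doc for doc in results if doc["id"] in merged]
```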
  • 9. Prototype http://multiweb.gesis.org/irsa/IRMPrototype
  • 10. Evaluation
  • 11. Main Research Issue: Contribution to retrieval quality and usability • Precision: – Do central authors (core journals) provide more relevant hits? – Do highly associated co-words have any positive effects? • Value-adding effects: – Do central authors (core journals) provide OTHER relevant hits? – Do co-word relationships provide OTHER relevant search terms? • Mashup effects: – Do combinations of the services enhance the effects?
  • 12. Evaluation Design • precision in existing evaluation data: – CLEF 2003-2007: 125 topics; 65,297 SOLIS documents – KoMoHe 2007: 39 topics; 31,155 SOLIS documents • plausibility tests: – author centrality / journal coreness ↔ precision – Bradfordizing ↔ author centrality • precision tests with users (Online-Assessment-Tool) • usability tests with users (acceptance)
  • 13. Evaluation of Bradfordizing on CLEF Data (Mayr 2013) Precision between Bradford zones (core / zone 2 / zone 3): 2003 articles 0.29 / 0.22 / 0.16; 2004 articles 0.23 / 0.18 / 0.13; 2005 articles 0.31 / 0.24 / 0.17; 2006 articles 0.29 / 0.27 / 0.24; 2007 articles 0.28 / 0.26 / 0.22; 2005 monographs 0.21 / 0.16 / 0.19; 2006 monographs 0.28 / 0.28 / 0.24; 2007 monographs 0.24 / 0.21 / 0.23. Journal articles: significant improvement of precision from zone 3 to core. Monographs: slight improvement of the precision distribution across the three zones.
  • 14. Evaluation of Author Centrality on CLEF Data • moderate positive relationship between rate of networking and precision (correlation Precision@10 ↔ giant size: 0.25) • precision of TF-IDF rankings (0.60) significantly higher than author-centrality-based rankings (0.31) – BUT: very little overlap of documents at the top of the ranking lists: 90% of relevant hits provided by author centrality did not appear on top of TF-IDF rankings → added precision of 28% • author centrality seems to favor OTHER relevant documents than traditional rankings • value-adding effect: another view of the information space • network statistics: avg. number of docs 517, avg. number of authors 664, avg. number of co-authors 302, avg. giant size 24
  • 15. Result: overlap Intersection of the suggested top n=10 documents over all topics and services (Mutschke et al. 2011): the top-10 result lists overlap only marginally!
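A minimal sketch of the overlap measurement behind this result: intersect the top n=10 document ids of two services for a topic; averaging such intersections over all topics and service pairs yields the marginal-overlap finding. The id lists below are hypothetical.

```python
def topk_overlap(ranking_a, ranking_b, k=10):
    """Number of documents shared by the top-k of two rankings."""
    return len(set(ranking_a[:k]) & set(ranking_b[:k]))

# Hypothetical top-10 id lists from two services for one topic
tfidf_top = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
centrality_top = [42, 7, 99, 13, 55, 61, 2, 88, 70, 31]
print(topk_overlap(tfidf_top, centrality_top))  # -> 2
```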
  • 16. IRSA
  • 17. IRSA: Workflow
  • 18. Analysis
  • 19. Output Returning suggestions for any query term
  • 20. Integration www.sowiport.de uses query suggestions from IRSA
  • 21. IRM & Modeling Science • measuring the contribution of bibliometric-enhanced services to retrieval quality → deeper insights into the structure & functioning of science • bibliometric-enhanced services (structural attributes of the science system) as a way towards a formal model of science
  • 22. References
    • Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89(1), 349–364. doi:10.1007/s11192-011-0430-x
    • Lüke, T., Schaer, P., & Mayr, P. (2013). A framework for specific term recommendation systems. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’13 (pp. 1093–1094). New York, NY, USA: ACM Press. doi:10.1145/2484028.2484207
    • Mayr, P. (2013). Relevance distributions across Bradford Zones: Can Bradfordizing improve search? In J. Gorraiz, E. Schiebel, C. Gumpenberger, M. Hörlesberger, & H. Moed (Eds.), 14th International Society of Scientometrics and Informetrics Conference (pp. 1493–1505). Vienna, Austria. Retrieved from http://arxiv.org/abs/1305.0357
    • Hienert, D., Schaer, P., Schaible, J., & Mayr, P. (2011). A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), International Conference on Theory and Practice of Digital Libraries (TPDL) (pp. 192–203). Berlin: Springer. doi:10.1007/978-3-642-24469-8_21
  • 23. Using IRSA
  • 24. Thank you! Dr Philipp Mayr, GESIS Leibniz Institute for the Social Sciences, Unter Sachsenhausen 6-8, 50667 Cologne, Germany, philipp.mayr@gesis.org
