Mining Research Publication Networks for Impact
PhD Topic Presentation

Drahomira Herrmannova
Knowledge Media Institute
Th...
Table of Contents
1 Research Aim

Motivation
Problem statement
2 Literature review

State of the art
Limitations
3 Researc...
The key question

“How to evaluate the quality of research publications?”

3 / 19
Who needs this anyway?
• Researchers
• How to select relevant literature for reading?
• Librarians
• How to select journal...
The growth of scholarly literature

Figure: Monthly submission rate (since 1991) for arXiv.org. Source:
http://arxiv.org/...
The growth of journal subscription costs

Figure: Expenditures in ARL libraries (1986–2009). Source: [1]

6 / 19
What’s being used

• Peer review
• Qualitative evaluation method
• Traditionally the main filter for controlling the qualit...
So, what’s the problem?

• Peer review
• Speed and cost
• Biased opinion
• Doesn’t limit the amount of published research
...
Bibliometrics today

Two changes which influenced the evolution of bibliometrics
• creation of the Web and web-related deve...
Bibliometrics today
Two ideas driving the current research
1 Development of new metrics (improvements and replacements
of ...
Limitations
• Limitations of citation-based metrics
• Citation bias
• Incomplete journal coverage
• Author variability
• F...
Research questions

Question 1: What factors influence the quality of a research
publication (with regard to the publicatio...
Selected approach

• Single number vs. collection of metrics and indicators
• Analysis of full-text
• Until quite recently...
Requirements for science evaluation methods
Source: [2]

1

Reliable and accurate, comparable or better than the peer
revi...
Tasks and plans
Data collection

Task 1: Identify information sources that may provide relevant
publication data
• Mostly ...
Tasks and plans
Data analysis

Task 3a: Study the possibilities of application of NLP for the
evaluation of research publi...
Tasks
Development of new methods

Task 4a: Analyse the possibilities of combining the studied
methods in order to design a...
Task 1
Identification of data sources

Source: CSX MAS JSTOR DBLP CORE ArXiv KDD iSearch DBLP+C ACM OCC
MD: X X X X -
API: X...
References

[1] Kyrillidou, Martha and Morris, Shaneka.
ARL Statistics 2008 - 2009.
Association of Research Libraries, Was...
How many metrics?

Scientometrics: study of science and research
Bibliometrics: study of scientific literature
Informetrics...

Mining Research Publication Networks for Impact -- KMi Internal Seminar

  1. 1. Mining Research Publication Networks for Impact PhD Topic Presentation Drahomira Herrmannova Knowledge Media Institute The Open University KMi Internal Seminar, November 2013 1 / 19
  2. 2. Table of Contents 1 Research Aim Motivation Problem statement 2 Literature review State of the art Limitations 3 Research objectives Research questions Selected approach Tasks and plans 4 Pilot study 5 References 2 / 19
  3. 3. The key question “How to evaluate the quality of research publications?” 3 / 19
  4. 4. Who needs this anyway? • Researchers • How to select relevant literature for reading? • Librarians • How to select journal subscriptions? • Universities, funding agencies and other institutions • How to aid reviewers of funding and grant proposals, hiring committees etc.? • Publishers and editors • How can publishers evaluate and promote their journals? • Society • How to evaluate the returns of research to society? 4 / 19
  5. 5. The growth of scholarly literature Figure: Monthly submission rate (since 1991) for arXiv.org. Source: http://arxiv.org/ 5 / 19
  6. 6. The growth of journal subscription costs Figure: Expenditures in ARL libraries (1986–2009). Source: [1] 6 / 19
  7. 7. What’s being used • Peer review • Qualitative evaluation method • Traditionally the main filter for controlling the quality of published research • Classical quantitative methods • Typically based on citations and/or productivity • Citation counts • JIF • h-index 7 / 19
  8. 8. So, what’s the problem? • Peer review • Speed and cost • Biased opinion • Doesn’t limit the amount of published research • Classical quantitative methods • Quality vs. impact • Reasons for citation • Citation half-life • Manipulation and gaming • Author variability • Field effects 8 / 19
  9. 9. Bibliometrics today Two changes which influenced the evolution of bibliometrics • creation of the Web and web-related developments • growth of Open Access publishing 9 / 19
  10. 10. Bibliometrics today Two ideas driving the current research 1 Development of new metrics (improvements and replacements of JIF) • h-index • Eigenfactor • SJR 2 Concerns about the validity of using citations • Methods using different data • Patent analysis • Webometrics • Altmetrics • Full-text analysis • “Fixing” citations (field normalisation of indicators) 10 / 19
  11. 11. Limitations • Limitations of citation-based metrics • Citation bias • Incomplete journal coverage • Author variability • Field effects • Uncited publications • Manipulation of metrics • Using JIF for research evaluation • Limitations of web-based metrics • Gaming web-based and social metrics • Problems of data collection • Adoption of social media by users • Accumulated advantage • Limitations of text-based metrics • Full-text not always available 11 / 19
  12. 12. Research questions Question 1: What factors influence the quality of a research publication (with regard to the publication type)? Question 2: What is the relationship (if there is any) between the impact of a publication, measured by the classical bibliometric methods, and the quality of a publication? Question 3: How can we detect the factors influencing quality in order to evaluate the quality of a research publication? Question 4: How can this evaluation be used in other disciplines? 12 / 19
  13. 13. Selected approach • Single number vs. collection of metrics and indicators • Analysis of full-text • Until quite recently not easily available • Full-text – the best indicator of publication quality • For example • Co-word analysis • Analysis of citation context • Semantic similarity of publications • Additional indicators • Famous author or collaboration with famous authors • Cites, or is cited by, work outside its research area • Paper published in a field-specific prestigious journal 13 / 19
  14. 14. Requirements for science evaluation methods Source: [2] 1 Reliable and accurate, comparable to or better than the peer review system 2 Easy to understand 3 Economical in terms of development and maintenance, time required to understand it, etc. 4 Faster than citations, at least comparable to the speed of peer review 5 Resistant to manipulation and gaming 14 / 19
  15. 15. Tasks and plans Data collection Task 1: Identify information sources that may provide relevant publication data • Mostly done Task 2a: Investigate factors that influence the quality of research publications Task 2b: Using the identified information sources, develop various relevant data structures such as: • collaboration networks • citation, co-citation and bibliographic coupling networks • clusters of semantically related publications • clusters of publications corresponding to different topics 15 / 19
  16. 16. Tasks and plans Data analysis Task 3a: Study the possibilities of application of NLP for the evaluation of research publications Task 3b: Investigate the developed data structures using graph and network theory as well as bibliometric indicators 16 / 19
  17. 17. Tasks Development of new methods Task 4a: Analyse the possibilities of combining the studied methods in order to design a set of new methods for estimating quality Task 4b: Evaluate the proposed methods against current standards Task 4c: Analyse the use of the new methods in other disciplines 17 / 19
  18. 18. Task 1 Identification of data sources
      Source: CSX MAS JSTOR DBLP CORE ArXiv KDD iSearch DBLP+C ACM OCC
      MD: X X X X -
      API: X X X X -
      OAI-PMH: X X X -
      dumps: X X X X X X X X X
      cit.: X X X X X X X X X
      FT: X * * * X X X X -
      Table: Stars (*) represent sources which don’t store full-text but provide links to the full-text where available. MD stands for multidisciplinary. 18 / 19
  19. 19. References
      [1] Kyrillidou, Martha and Morris, Shaneka. ARL Statistics 2008–2009. Association of Research Libraries, Washington, DC, 2011.
      [2] Taraborelli, Dario. Soft peer review: Social software and distributed scientific evaluation. Proceedings of the 8th International Conference on the Design of Cooperative Systems (COOP ’08), Carry-le-Rouet, France, 2008.
      19 / 19
  20. 20. How many metrics?
      Scientometrics: study of science and research
      Bibliometrics: study of scientific literature
      Informetrics: study of any type of information
      Webometrics: informetric studies of the web
      Cybermetrics: informetric studies of the whole Internet
      Altmetrics: study of science and research using data from social media
      20 / 19
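
Slide 7 lists the h-index among the classical quantitative methods without defining it. As a minimal sketch, assuming the standard definition (the largest h such that at least h of an author's papers have at least h citations each), it could be computed as follows. The function name `h_index` and the citation counts are illustrative only and do not come from the slides.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    # Made-up citation counts for five papers (assumption, not real data).
    example_citations = [10, 8, 5, 4, 3]
    print(h_index(example_citations))
```

The sketch also makes one of the limitations from slide 8 visible: the result depends only on raw citation counts, so it inherits field effects and author variability unchanged.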

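
Task 2b on slide 15 mentions co-citation and bibliographic coupling networks. The sketch below shows one possible way to derive both from raw citation data, under the usual definitions: two papers are bibliographically coupled when they cite the same work, and two papers are co-cited when a third paper cites both. The input layout (a dict mapping each paper to the set of papers it cites), the function name and the toy data are assumptions made for illustration, not the author's implementation.

```python
from collections import defaultdict
from itertools import combinations


def coupling_and_cocitation(citations):
    """Compute edge weights for both networks.

    citations: dict mapping a paper id to the set of paper ids it cites.
    Returns (coupling, cocitation), each a dict {(paper_a, paper_b): weight}
    with the pair stored in sorted order.
    """
    # Bibliographic coupling: weight = number of references two citing papers share.
    coupling = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            coupling[(a, b)] = shared

    # Co-citation: weight = number of papers whose reference lists contain both.
    cocitation = defaultdict(int)
    for refs in citations.values():
        for x, y in combinations(sorted(refs), 2):
            cocitation[(x, y)] += 1

    return coupling, dict(cocitation)


if __name__ == "__main__":
    # Toy citation data (hypothetical paper ids).
    toy = {"A": {"C", "D"}, "B": {"C", "D", "E"}, "E": {"D"}}
    coupling, cocitation = coupling_and_cocitation(toy)
    print(coupling)
    print(cocitation)
```

The pairwise loop over citing papers is quadratic in the number of papers, which is fine for a small pilot data set; a larger collection would likely need an inverted index from cited paper to citing papers instead.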