Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining

Discusses the challenges of interoperability among databases that provide access to research papers, and why that interoperability is needed for text mining.

Published in: Data & Analytics

  1. Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? Dr. Petr Knoth (@petrknoth), Knowledge Media Institute, The Open University, United Kingdom. @openminted_eu
  2. Goal: Achieve seamless, harmonised access to full texts of open access research papers originating from thousands of systems around the world, for machines to process and extract knowledge from.
  3. What are we doing
     - Aggregating full texts of open access research papers from all over the world
     - Institutional, subject-based open repositories & journals
     - Publisher systems
     - Pre-processing millions of research papers, making them ready to text-mine (API, data dumps)
     - Working with researchers around the world to extract knowledge from these data
  4. Challenges
     - Standardisation (OAI-PMH, ResourceSync, bespoke APIs, nothing, etc.)
     - Inconsistent implementation of standards (referencing of full texts from metadata, variation in fields’ semantics, OpenAIRE guidelines/RIOXX, etc.)
     - Lack of incentives to adopt standards, plus legal & ethical issues
     - Scalability problems caused by inadequate standards or bad practices (robots exclusion, etc.)
  5. Approach
     - Surveying publishers about the machine accessibility of their OA content and technically validating their answers
     - Encouraging providers to follow good practices (validation tools, advocacy)
     - Implementing connectors to publishers’ systems
     - Addressing scalability issues
     - Pragmatic approach
  6. Conclusion: Seamless access to the world’s research papers is needed to enable the creation of text-mining applications that will transform the way we do research. While we have already managed to provide this for millions of research papers, we still face a number of technical, organisational, legal and ethical challenges in making seamless machine access to the world’s research papers a reality.
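
Two of the challenges on slide 4 can be illustrated with a small sketch: finding the full text behind an OAI-PMH metadata record, and checking whether a robots exclusion file blocks a harvester from it. This is not the authors' aggregation code. The sample record, the sample robots file, the `example-harvester` user agent, the `repo.example.org` URLs, and the "ends with .pdf" heuristic are all assumptions made up for illustration. Only the OAI-PMH and Dublin Core namespaces and the Python standard library calls are real.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Dublin Core element namespace used by the oai_dc metadata format.
DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"

# Hypothetical OAI-PMH record. Real repositories differ in which field, if any,
# points at the full text, which is the inconsistency named on slide 4.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:repo.example.org:123</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example paper</dc:title>
      <dc:identifier>https://repo.example.org/record/123</dc:identifier>
      <dc:identifier>https://repo.example.org/files/123/paper.pdf</dc:identifier>
      <dc:identifier>doi:10.0000/example</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

# Hypothetical robots exclusion file that blocks the directory holding the PDFs.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /files/
"""


def candidate_fulltext_urls(record_xml):
    """Return dc:identifier values that look like direct PDF links (a heuristic)."""
    root = ET.fromstring(record_xml)
    urls = []
    for element in root.iter(DC_IDENTIFIER):
        value = (element.text or "").strip()
        lowered = value.lower()
        if lowered.startswith(("http://", "https://")) and lowered.endswith(".pdf"):
            urls.append(value)
    return urls


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in candidate_fulltext_urls(SAMPLE_RECORD):
        verdict = allowed_by_robots(SAMPLE_ROBOTS, "example-harvester", url)
        print(url, "allowed" if verdict else "blocked by robots.txt")
```

In this made-up case the metadata does reference a PDF, yet the robots file forbids fetching it, so an open access paper stays out of reach for a well-behaved harvester. A repository that exposes only the landing page in its metadata would return an empty list here, which is the other failure mode the slide describes.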
