Seamless access to the world’s open access research papers via ResourceSync
Petr Knoth
Use Case 1: ResourceSync as a seamless layer over heterogeneous APIs
Use Case 1: What is CORE?
[Diagram labels: OA Repositories; OA Journals; Mostly OAI-PMH]
CORE aggregates and provides free access to millions of research articles aggregated from thousands of OA repositories and journals.
»Enrichment and harmonisation of aggregated data
»Products/services:
›Portal
›API
›Data dumps
›Recommendation system for libraries
›Repository dashboard
›B2B and analytical services
»70 million+ metadata records
»Over 6 million full texts hosted on CORE
»~1.5 million monthly active users
»Aggregating from 2,500 repositories and 10k OA journals
Use Case 1: Key issue
Key players do not provide interoperability for machine access to metadata and content of research papers.
[Six pie charts, with their slice labels:]
›Accessing full-text by: harvesting the website; major search engines; recognised services upon approval
›Restricting access to full-text: don't restrict access in any way; specify a crawl delay; allow access to specific robots
›Reference of an article’s full-text on metadata: direct link to full-text; interface supporting full-text transfer
›Accessing content standards: OAI; own API; Z39.50
›Files format: PDF; HTML; plain text; HTML; JSON
›Automated downloads of OA full-text: website; API; FTP
Use Case 1: Approach
[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; Mostly OAI-PMH; A range of bespoke APIs + many others]
Provide seamless access over non-standardised APIs.
What protocol?
»Why not OAI-PMH?
›Slow and very inefficient for big repositories.
›Standardised for metadata transfer but not for content transfer.
›Very difficult to represent the richness of metadata from a broad range of data providers.
Use Case 1: ResourceSync as a seamless access layer
»Very scalable implementation on both the server and client side
»Interpretation of metadata happens using existing pipeline at the aggregator.
»1.5 million OA publications from Elsevier, Springer and others already exposed.
»Available at: https://publisher-connector.core.ac.uk/resourcesync
[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; Mostly OAI-PMH; A range of bespoke APIs + many others; ResourceSync]
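To make the client side of this layer concrete, here is a minimal sketch of reading a ResourceSync Resource List, which is a Sitemap XML document with extra elements in the ResourceSync namespace. The sample document, its example.org URLs and the function name parse_resource_list are assumptions made for illustration; they are not taken from the Publisher connector.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Sitemap documents and the ResourceSync extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical Resource List; URLs and values are made up for this sketch.
SAMPLE_RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-06-01T00:00:00Z"/>
  <url>
    <loc>https://example.org/papers/1.pdf</loc>
    <lastmod>2017-05-30T10:00:00Z</lastmod>
    <rs:md type="application/pdf" length="1024"/>
  </url>
  <url>
    <loc>https://example.org/papers/1.json</loc>
    <lastmod>2017-05-30T10:00:00Z</lastmod>
    <rs:md type="application/json" length="256"/>
  </url>
</urlset>"""


def parse_resource_list(xml_text):
    """Return one dict per <url> entry with its location, lastmod and media type."""
    root = ET.fromstring(xml_text)
    resources = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        resources.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "type": md.get("type") if md is not None else None,
        })
    return resources


if __name__ == "__main__":
    for resource in parse_resource_list(SAMPLE_RESOURCE_LIST):
        print(resource["loc"], resource["type"])
```

Because metadata and full text are both listed as plain web resources, a client of this kind only needs to fetch URLs; interpreting the metadata can stay in the aggregator's existing pipeline.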
Use Case 2: Exposing enriched data for Text and Data Mining (TDM) via ResourceSync
Use Case 2: Subscribing to ResourceSync
[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; Mostly OAI-PMH; A range of bespoke APIs + many others; ResourceSync]
»Other aggregators can subscribe to the Publisher connector to make use of their ingestion pipelines and enrichment technologies
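As a rough illustration of what subscribing could look like for another aggregator, the sketch below applies a ResourceSync Change List to a local mirror. The sample document, the example.org URLs, the dict-based mirror and the function name apply_change_list are all assumptions for illustration; a real subscriber would also download the changed resources and feed them into its own pipeline.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Sitemap documents and the ResourceSync extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical Change List; URLs and timestamps are made up for this sketch.
SAMPLE_CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2017-06-01T00:00:00Z"/>
  <url>
    <loc>https://example.org/papers/2.pdf</loc>
    <lastmod>2017-06-02T08:00:00Z</lastmod>
    <rs:md change="created"/>
  </url>
  <url>
    <loc>https://example.org/papers/1.pdf</loc>
    <lastmod>2017-06-02T09:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>https://example.org/papers/3.pdf</loc>
    <lastmod>2017-06-02T10:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def apply_change_list(mirror, xml_text):
    """Update a {uri: lastmod} mirror in place from a Change List and return it."""
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change == "deleted":
            mirror.pop(loc, None)      # drop resources removed at the source
        elif change in ("created", "updated"):
            mirror[loc] = lastmod      # record the new or refreshed resource
    return mirror


if __name__ == "__main__":
    local = {
        "https://example.org/papers/1.pdf": "2017-05-30T10:00:00Z",
        "https://example.org/papers/3.pdf": "2017-05-30T10:00:00Z",
    }
    print(apply_change_list(local, SAMPLE_CHANGE_LIST))
```

Only the entries listed in the Change List are touched, so a subscriber does not have to re-harvest the whole collection to stay current.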
Use Case 2: Content ingestion in OpenMinTeD
[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; OMTD-SHARE (over REST); A range of bespoke APIs + many others]
»CORE and OpenAIRE are content sources in the OpenMinTeD TDM platform (EU infrastructure project) being developed to enable the mining of scholarly literature.
Use Case 2: Exposing enriched data for TDM
[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; A range of bespoke APIs + many others; ResourceSync]
»But others want similar solutions … typically, they want to be able to sync and host the data.
Use Case 3: Make repositories and journals adopt ResourceSync
Use Case 3: Replace OAI-PMH with ResourceSync
[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; OMTD-SHARE (over REST); A range of bespoke APIs + many others; ResourceSync; ResourceSync]
»Will be a game changer …
»Advocated by COAR Next Generation Repositories WG
Key contributions and considerations
What’s new about our implementation of ResourceSync?
»Scales to many millions of resources as required by aggregators (as opposed to existing implementations for repositories, which scale to tens of thousands of resources)
»Real-time updating of ResourceLists and ChangeLists (avoiding unnecessary batch processes).
»Combination of real-time updates and scalability
Architectural choices
»Based on the principle of changes being communicated to a controller as they happen (rather than having to be detected prior to ResourceList/ChangeList updates)
»Uses Elasticsearch as a database
»Hashing mechanism to distribute size of each ResourceList link and a clever mechanism for iterative updating of ResourceLists
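The slides do not spell out the hashing or update mechanisms, so the following is only a hypothetical sketch of the general idea: resources are assigned to a fixed number of ResourceList buckets by hashing their URI, and a controller is told about each change as it happens, so it can update the affected bucket and append to a change log without re-scanning everything. The names bucket_for and Controller, the bucket count, and the in-memory dicts (standing in for Elasticsearch) are all assumptions.

```python
import hashlib
from collections import defaultdict

NUM_LISTS = 4  # assumed number of ResourceList buckets


def bucket_for(uri, num_lists=NUM_LISTS):
    """Map a URI to a stable bucket index so list sizes stay roughly even."""
    digest = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_lists


class Controller:
    """Receives change notifications and keeps lists current incrementally."""

    def __init__(self, num_lists=NUM_LISTS):
        self.num_lists = num_lists
        self.resource_lists = defaultdict(dict)  # bucket -> {uri: lastmod}
        self.change_list = []                    # (timestamp, change, uri)

    def notify(self, uri, change, timestamp):
        """Apply one change ('created', 'updated' or 'deleted') as it happens."""
        bucket = bucket_for(uri, self.num_lists)
        if change == "deleted":
            self.resource_lists[bucket].pop(uri, None)
        else:
            self.resource_lists[bucket][uri] = timestamp
        self.change_list.append((timestamp, change, uri))
        return bucket


if __name__ == "__main__":
    controller = Controller()
    controller.notify("https://example.org/papers/1.pdf", "created", "2017-06-01T00:00:00Z")
    controller.notify("https://example.org/papers/1.pdf", "deleted", "2017-06-02T00:00:00Z")
    print(len(controller.change_list), dict(controller.resource_lists))
```

Since only one bucket is touched per notification, the cost of an update does not grow with the total number of resources, which is the property the bullets above attribute to the combination of hashing and iterative updating.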
Conclusions
»ResourceSync:
›Broad range of uses in scholarly communication.
›Solves problems with aggregating content over OAI-PMH: faster and more efficient aggregation, hence fresher data in aggregators compared to OAI-PMH.
»We used ResourceSync to “liberate” over 1.5 million OA papers (and growing) from key publishers.
»CORE soon to provide access to over 8 million OA full texts via ResourceSync.
»CORE actively contributes to the adoption of ResourceSync in the repositories community (as part of OpenMinTeD and COAR NGR).

More Related Content

What's hot

Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
Globus
 
Insight_150115_Demo
Insight_150115_DemoInsight_150115_Demo
Insight_150115_Demo
Matt Rubashkin
 
FY'16 Library of Congress Storage Environment
FY'16 Library of Congress Storage Environment FY'16 Library of Congress Storage Environment
FY'16 Library of Congress Storage Environment
Carl Watts
 
Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...
Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...
Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...
QScience
 
Elk - An introduction
Elk - An introductionElk - An introduction
Elk - An introduction
Hossein Shemshadi
 
October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...
October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...
October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...
National Information Standards Organization (NISO)
 
NISO Update June 2014 SUSHI
NISO Update June 2014 SUSHI NISO Update June 2014 SUSHI
Presentacion redislabs-ihub
Presentacion redislabs-ihubPresentacion redislabs-ihub
Presentacion redislabs-ihub
ssuser9d7c90
 
Logstash, Elasticsearch and Kibana
Logstash, Elasticsearch and KibanaLogstash, Elasticsearch and Kibana
Logstash, Elasticsearch and Kibana
Saroj Panyasrivanit
 
PoolParty Search Server
PoolParty Search ServerPoolParty Search Server
PoolParty Search Server
Andreas Blumauer
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
ObjectRocket
 
Data munging and analysis
Data munging and analysisData munging and analysis
Data munging and analysis
Raminder Singh
 
Library support for the scientific publishing cycle @ Rothamsted Research
Library support for the scientific publishing cycle @ Rothamsted ResearchLibrary support for the scientific publishing cycle @ Rothamsted Research
Library support for the scientific publishing cycle @ Rothamsted Research
Tim Wales
 
Gateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to GlobusGateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to Globus
Globus
 
Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01
David Smiley
 
BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs
 
Logs, metrics and real time data analytics
Logs, metrics and real time data analyticsLogs, metrics and real time data analytics
Logs, metrics and real time data analytics
Ewere Diagboya
 
Traffic Analytics for Linked Data Publishers
Traffic Analytics for  Linked Data PublishersTraffic Analytics for  Linked Data Publishers
Traffic Analytics for Linked Data Publishers
Luca Costabello
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
ObjectRocket
 
eXtensible Catalog - afternoon session - Tilburg
eXtensible Catalog - afternoon session - TilburgeXtensible Catalog - afternoon session - Tilburg
eXtensible Catalog - afternoon session - Tilburg
University of Rochester
 

What's hot (20)

Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
 
Insight_150115_Demo
Insight_150115_DemoInsight_150115_Demo
Insight_150115_Demo
 
FY'16 Library of Congress Storage Environment
FY'16 Library of Congress Storage Environment FY'16 Library of Congress Storage Environment
FY'16 Library of Congress Storage Environment
 
Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...
Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...
Institutional respositories - Ken Scott (Georgetown University Qatar) - #OAWe...
 
Elk - An introduction
Elk - An introductionElk - An introduction
Elk - An introduction
 
October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...
October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...
October 1 NISO Training Thursday: Using Alerting Systems to Ensure OA Policy ...
 
NISO Update June 2014 SUSHI
NISO Update June 2014 SUSHI NISO Update June 2014 SUSHI
NISO Update June 2014 SUSHI
 
Presentacion redislabs-ihub
Presentacion redislabs-ihubPresentacion redislabs-ihub
Presentacion redislabs-ihub
 
Logstash, Elasticsearch and Kibana
Logstash, Elasticsearch and KibanaLogstash, Elasticsearch and Kibana
Logstash, Elasticsearch and Kibana
 
PoolParty Search Server
PoolParty Search ServerPoolParty Search Server
PoolParty Search Server
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Data munging and analysis
Data munging and analysisData munging and analysis
Data munging and analysis
 
Library support for the scientific publishing cycle @ Rothamsted Research
Library support for the scientific publishing cycle @ Rothamsted ResearchLibrary support for the scientific publishing cycle @ Rothamsted Research
Library support for the scientific publishing cycle @ Rothamsted Research
 
Gateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to GlobusGateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to Globus
 
Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01
 
BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013
 
Logs, metrics and real time data analytics
Logs, metrics and real time data analyticsLogs, metrics and real time data analytics
Logs, metrics and real time data analytics
 
Traffic Analytics for Linked Data Publishers
Traffic Analytics for  Linked Data PublishersTraffic Analytics for  Linked Data Publishers
Traffic Analytics for Linked Data Publishers
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
eXtensible Catalog - afternoon session - Tilburg
eXtensible Catalog - afternoon session - TilburgeXtensible Catalog - afternoon session - Tilburg
eXtensible Catalog - afternoon session - Tilburg
 

Similar to Seamless access to the world's open access research papers via resources sync

OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
Open Science Fair
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
petrknoth
 
CORE APIv3
CORE APIv3CORE APIv3
CORE APIv3
petrknoth
 
Digitisation and institutional repositories 3
Digitisation and institutional repositories 3Digitisation and institutional repositories 3
Digitisation and institutional repositories 3
Libsoul Technologies Pvt. Ltd.
 
Next Generation Repositories
Next Generation RepositoriesNext Generation Repositories
Next Generation Repositories
ukcorr
 
Literature Services Resource Description Framework
Literature Services Resource Description FrameworkLiterature Services Resource Description Framework
Literature Services Resource Description Framework
Jee-Hyub Kim
 
Core @ repositories fringe 2015
Core @ repositories fringe 2015Core @ repositories fringe 2015
Core @ repositories fringe 2015
Lucas anastasiou
 
OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...
OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...
OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...
OpenAIRE
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community Call
OpenAIRE
 
CrossRef Text & Data Mining - UKSG 2015
CrossRef Text & Data Mining - UKSG 2015CrossRef Text & Data Mining - UKSG 2015
CrossRef Text & Data Mining - UKSG 2015
Crossref
 
UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...
UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...
UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...
UKSG: connecting the knowledge community
 
The Other Side of the Journal ToCs Interface
The Other Side of the Journal ToCs InterfaceThe Other Side of the Journal ToCs Interface
The Other Side of the Journal ToCs Interface
Phil Barker
 
Open Archives Initiatives For Metadata Harvesting
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata Harvesting
Nikesh Narayanan
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE
 
From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...
petrknoth
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
lagoze
 
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKThe Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
Andy Powell
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
OpenAIRE
 
OAI and OAI-PMH
OAI and OAI-PMHOAI and OAI-PMH
OAI and OAI-PMH
Lena Bruncaj
 
ResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource SynchronizationResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource Synchronization
Simeon Warner
 

Similar to Seamless access to the world's open access research papers via resources sync (20)

OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
 
CORE APIv3
CORE APIv3CORE APIv3
CORE APIv3
 
Digitisation and institutional repositories 3
Digitisation and institutional repositories 3Digitisation and institutional repositories 3
Digitisation and institutional repositories 3
 
Next Generation Repositories
Next Generation RepositoriesNext Generation Repositories
Next Generation Repositories
 
Literature Services Resource Description Framework
Literature Services Resource Description FrameworkLiterature Services Resource Description Framework
Literature Services Resource Description Framework
 
Core @ repositories fringe 2015
Core @ repositories fringe 2015Core @ repositories fringe 2015
Core @ repositories fringe 2015
 
OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...
OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...
OpenAIRE Guidelines for data providers: new Metadata Application Profile for ...
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community Call
 
CrossRef Text & Data Mining - UKSG 2015
CrossRef Text & Data Mining - UKSG 2015CrossRef Text & Data Mining - UKSG 2015
CrossRef Text & Data Mining - UKSG 2015
 
UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...
UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...
UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Ra...
 
The Other Side of the Journal ToCs Interface
The Other Side of the Journal ToCs InterfaceThe Other Side of the Journal ToCs Interface
The Other Side of the Journal ToCs Interface
 
Open Archives Initiatives For Metadata Harvesting
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata Harvesting
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
 
From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...From Open Access Metadata to Open Access Content: Two Principles for Increase...
From Open Access Metadata to Open Access Content: Two Principles for Increase...
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
 
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKThe Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
OAI and OAI-PMH
OAI and OAI-PMHOAI and OAI-PMH
OAI and OAI-PMH
 
ResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource SynchronizationResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource Synchronization
 

More from openminted_eu

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
openminted_eu
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
openminted_eu
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
openminted_eu
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
openminted_eu
 
Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
openminted_eu
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
openminted_eu
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
openminted_eu
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
openminted_eu
 
OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
openminted_eu
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
openminted_eu
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
openminted_eu
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
openminted_eu
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
openminted_eu
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
openminted_eu
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
openminted_eu
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
openminted_eu
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
openminted_eu
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
openminted_eu
 

More from openminted_eu (18)

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
 
Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
 
OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
 

Seamless access to the world's open access research papers via ResourceSync

  • 1. Seamless access to the world’s open access research papers via ResourceSync Petr Knoth
  • 2. Use Case 1: ResourceSync as a seamless layer over heterogeneous APIs
  • 3. Use Case 1: What is CORE? CORE aggregates and provides free access to millions of research articles from thousands of OA repositories and journals. [Diagram: OA Repositories, OA Journals; mostly OAI-PMH.]
  • 4. Use Case 1: What is CORE? (continued) »Enrichment and harmonisation of aggregated data »Products/services: ›Portal ›API ›Data dumps ›Recommendation system for libraries ›Repository dashboard ›B2B and analytical services
  • 5. Use Case 1: What is CORE? (continued) »70 million+ metadata records »Over 6 million full texts hosted on CORE »~1.5 million monthly active users »Aggregating from 2,500 repositories and 10k OA journals
  • 6. Use Case 1: Key issue. Key players do not provide interoperability for machine access to metadata and content of research papers. [Pie charts: Accessing full-text by harvesting the website (major search engines; recognised services upon approval). Restricting access to full-text (don't restrict access in any way; specify a crawl delay; allow access to specific robots). Reference of an article's full-text on metadata (direct link to full-text; interface supporting full-text transfer). Accessing content standards (OAI; own API; Z39.50). Files format (PDF; HTML; plain text; HTML; JSON). Automated downloads of OA full-text (website; API; FTP).]
  • 7. Use Case 1: Approach. Provide seamless access over non-standardised APIs. What protocol? [Diagram: OA Repositories, OA Journals (mostly OAI-PMH); Key publishers (OA + hybrid OA), a range of bespoke APIs, + many others; Publisher connector.]
  • 8. Use Case 1: Approach (continued) »Why not OAI-PMH? ›Slow and very inefficient for big repositories. ›Standardised for metadata transfer but not for content transfer. ›Very difficult to represent the richness of metadata from a broad range of data providers.
  • 9. Use Case 1: ResourceSync as a seamless access layer »Very scalable implementation on both the server and client side »Interpretation of metadata happens using the existing pipeline at the aggregator. »1.5 million OA publications from Elsevier, Springer and others already exposed. »Available at: https://publisher-connector.core.ac.uk/resourcesync [Diagram: as on slide 7, with ResourceSync added.]
  • 10. Use Case 2: Exposing enriched data for Text and Data Mining (TDM) via ResourceSync
  • 11. Use Case 2: Subscribing to ResourceSync »Other aggregators can subscribe to the Publisher connector to make use of their own ingestion pipelines and enrichment technologies. [Diagram: as on slide 7, with ResourceSync added.]
  • 12. Use Case 2: Content ingestion in OpenMinTeD »CORE and OpenAIRE are content sources in the OpenMinTeD TDM platform, an EU infrastructure project being developed to enable the mining of scholarly literature. [Diagram: as on slide 7, with ResourceSync and OMTD-SHARE (over REST) added.]
  • 13. Use Case 2: Exposing enriched data for TDM »But others want similar solutions; typically, they want to be able to sync and host the data. [Diagram: as on slide 7, with ResourceSync added at two points.]
  • 14. Use Case 3: Make repositories and journals adopt ResourceSync
  • 15. Use Case 3: Replace OAI-PMH with ResourceSync »Will be a game changer … »Advocated by COAR Next Generation Repositories WG [Diagram: as on slide 7, with OMTD-SHARE (over REST) and ResourceSync added at three points.]
  • 16. Key contributions and considerations
  • 17. What’s new about our implementation of ResourceSync? »Scales to many millions of resources as required by aggregators (as opposed to existing implementations for repositories that are scalable for tens of thousands of resources) »Real-time updating of ResourceLists and ChangeLists (avoiding unnecessary batch processes). »Combination of real-time updates and scalability
  • 18. Architectural choices »Based on the principle of changes being communicated to a controller as they happen (rather than having to be detected prior to ResourceList/ChangeList updates) »Uses Elasticsearch as a database »Hashing mechanism to distribute size of each ResourceList link and a clever mechanism for iterative updating of ResourceLists
  • 19. Conclusions »ResourceSync: ›Broad range of uses in scholarly communication. ›Solves problems with aggregating content over OAI-PMH; faster and more efficient aggregation => fresher data in aggregators compared to OAI-PMH »We used ResourceSync to “liberate” over 1.5 million OA papers (and growing) from key publishers »CORE soon to provide access to over 8 million OA full texts via ResourceSync. »CORE actively contributes to the adoption of ResourceSync in the repositories community (as part of OpenMinTeD and COAR NGR)
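
The architectural ideas on slides 17 and 18 (changes pushed to a controller as they happen, and a hash used to spread resources across ResourceLists) can be illustrated with a small sketch. This is not CORE's implementation: the slides say Elasticsearch is used as the database, whereas here plain in-memory dictionaries stand in for it. The names `Controller`, `notify` and `shard_for`, the choice of MD5 as the hash, and the example URLs are all assumptions made for illustration. The XML uses the Sitemap and ResourceSync namespaces.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def shard_for(uri: str, num_shards: int) -> int:
    """Map a resource URI to a shard (hypothetical scheme: MD5 modulo N)."""
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


@dataclass
class Controller:
    """Receives change events as they happen; nothing is detected by batch scans."""
    num_shards: int = 4
    shards: list = field(init=False)   # one {uri: lastmod} dict per ResourceList
    changes: list = field(init=False)  # ordered (uri, change, lastmod) events

    def __post_init__(self):
        self.shards = [dict() for _ in range(self.num_shards)]
        self.changes = []

    def notify(self, uri: str, change: str, lastmod: str) -> None:
        """Apply one event ('created', 'updated' or 'deleted') to a single shard."""
        shard = self.shards[shard_for(uri, self.num_shards)]
        if change == "deleted":
            shard.pop(uri, None)
        else:
            shard[uri] = lastmod
        self.changes.append((uri, change, lastmod))

    def resource_list_xml(self, shard_id: int, at: str) -> str:
        """Serialise one shard as a ResourceSync ResourceList document."""
        ET.register_namespace("", SITEMAP_NS)
        ET.register_namespace("rs", RS_NS)
        urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
        ET.SubElement(urlset, f"{{{RS_NS}}}md",
                      {"capability": "resourcelist", "at": at})
        for uri, lastmod in sorted(self.shards[shard_id].items()):
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    controller = Controller(num_shards=4)
    controller.notify("https://example.org/paper-1.pdf", "created",
                      "2017-01-01T00:00:00Z")
    controller.notify("https://example.org/paper-1.pdf", "updated",
                      "2017-01-02T00:00:00Z")
    shard_id = shard_for("https://example.org/paper-1.pdf", 4)
    print(controller.resource_list_xml(shard_id, at="2017-01-03T00:00:00Z"))
```

Because each event touches only the shard its URI hashes to, an update rewrites one small list rather than the whole inventory, which is one plausible way to combine real-time updates with scalability. The recorded `changes` sequence is the raw material from which a ChangeList could be produced.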