SlideShare a Scribd company logo
1 of 33
Better together: building services for
public good on top of content from the
global network of open repositories
Petr Knoth
June 21, 2019 – OAI-11, Geneva
Big Scientific Data and Text Analytics Group
Knowledge Media Institute, The Open University
Papers OA sooner and sooner
1/22
https://physicstoday.scitation.org/do/10.1063/PT.6.2.20190418a/full/
JCDL 2019 Best Paper
Award
Papers OA sooner and sooner
The delay between publication and OA availability decreasing
globally
2
Herrmannova, Drahomira; Pontika, Nancy and Knoth, Petr (2019). Do Authors Deposit on Time? Tracking Open Access Policy
Compliance. In: 2019 ACM/IEEE Joint Conference on Digital Libraries, 2-6 Jun 2019, Urbana-Champaign, IL
Faster open access makes repository
infrastructure more important
3
We don’t need just open access
we need fast open access
4
This study was only possible to conduct because
of repositories and aggregators working together
5
Global network of repositories
7
“A single scientific repository is of limited value, real benefits
come from the ability to exchange data within a network …
… interoperability allows us to exploit today's computational
power so that we can aggregate, data mine, create new tools and
services, and generate new knowledge from repository content.”
– Confederation of Open Access Repositories (COAR)
OA Aggregations and BOAI 2002
“To achieve open access to scholarly journal literature, we
recommend two complementary strategies.
• Self-Archiving: First, scholars need the tools and assistance to
deposit their refereed journal articles in open electronic
archives, a practice commonly called, self-archiving. When
these archives conform to standards created by the Open
Archives Initiative, then search engines and other tools can
treat the separate archives as one. Users then need not know
which archives exist or where they are located in order to find
and make use of their contents.
• …”
Budapest Open Access Initiative, 2002
8
CORE’s mission
9
Aggregate all open access research articles worldwide …
… enrich this content and provide seamless access to it through a set
of data services …
Introducing CORE
10
Harvesting data is challenging
11
Harvesting data is challenging
14
Harvesting data is challenging
15
Need for more interoperability across systems
16
World’s largest dataset of Open Access
full texts
• 13,117,488 Hosted full texts
• 135,539,113 Metadata records
• ~78m Links to full texts
• 15TB of raw plain text
• 4,123 Data providers
17/22
CORE Processing pipeline
• Metadata download, extraction and harmonisation
• Full text download
• Text extractions, sections extraction
• Metadata validation and enrichment (DOI, ORCID, etc.)
• Thumbnails generation
• References and citation contexts extraction
• API enrichment (e.g. finding DOIs, linking to other systems)
• Document type classification
• Deduplication
• Indexing
• Exposing (data dumps, API, FastSync)
18
CORE usage
• January 2019 – CORE reached over 10M monthly active
users for the first time
• 571% increase from January 2018
19
Alexa rank
• Within top 6k
global websites:
https://www.alexa.
com/
• core.ac.uk by
usage in the top
0.0009% of
global websites
20
CORE Search
21
• Full text search for OA content
• Faceted searching
• What you find is what you get
• Real change of data providers
wanting to be included
CORE Recommender
• Recommending relevant
content to users from
across all free content
• Recommender plugin for
repositories
• https://core.ac.uk/service
s/recommender/
22
CORE Recommender
• Recommending relevant
content to users from
across all free content
• Recommender plugin for
repositories
• https://core.ac.uk/service
s/recommender/
23
Introducing CORE Discovery
• High coverage of freely
available content
• Free service for
researchers by
researchers. No
company controlling the
pipes.
• Best grip on open
repository content.
• Repository integration
• Discovering documents
without a DOI.
https://core.ac.uk/services/discovery/
CORE Discovery demonstration
CORE Discovery Repository integration
• Majority of articles in
repositories metadata only.
• CORE Discovery
repository plugin:
• turns dead ends of user
journeys into journeys
fulfilling users’ information
needs
• makes repository content
more discoverable.
CORE Search
27
• Full text search for OA content
• Faceted searching
• What you find is what you get
CORE’s raw data services
28
Raw data services – CORE API
29
• Enables the development of new
applications
• Real-time machine access to the
world's largest collection of open
access papers
• Harmonised access to data from
across the network of CORE
providers
• Direct machine access to full texts
of research papers
Raw data services – CORE Dataset
30
• Download millions of research
papers for text and data analysis
• Prototype, analyse and mine your
data in your infrastructure
Raw data services – CORE FastSync
31
• Keeps your data in sync with
research content from around the
world
• Fast and incremental updates as
soon as they become available. No
usage restrictions
• Based on ResourceSync
Use powered by CORE
32
• It is beyond human capacities to read all
scientific literature
• Example use cases in which CORE is applied:
• Improving discovery
• Plagiarism detection
• Question answering in science
• Literature based discovery
• Fact checking and detection of misinformation
• Analysing research trends
• Finding experts in a particular domain
• Research evaluation and scientometrics
• Exploratory and visual search
• …
Working with partners
33
Take home
22/22
• Data providers (repositories, preprint servers, journals, etc.)
and aggregators need to work together to allow text and data
analysis, processing and reuse of large volumes of research
papers.
• CORE provides the tools for programmatically processing
open access data fast, reliably and from across the global
network of repositories.
• If you are a repository manager or a librarian:
• CORE Discovery and Recommender
• If you are a developer or analyst:
• Build your own stuff using CORE‘s data services on top of the
global full text open access courpus
Thank you!
https://core.ac.uk

More Related Content

What's hot

DBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataChris Bizer
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichmentsemanticsconference
 
Tufts Spatial Data Rescue: Crawling at-risk Government Data
Tufts Spatial Data Rescue: Crawling at-risk Government DataTufts Spatial Data Rescue: Crawling at-risk Government Data
Tufts Spatial Data Rescue: Crawling at-risk Government DataKyle Monahan
 
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...Sandra McIntyre
 
Using Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case studyUsing Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case studyLeila Zemmouchi-Ghomari
 
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG: connecting the knowledge community
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...Jukka Huhtamäki
 
DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013Frauke Ziedorn
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE
 
Open Archives Initiatives For Metadata Harvesting
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata HarvestingNikesh Narayanan
 
British Library Linked Open Data Presentation for ALA June 2014
British Library Linked Open Data Presentation for ALA June 2014British Library Linked Open Data Presentation for ALA June 2014
British Library Linked Open Data Presentation for ALA June 2014nw13
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataEDINA, University of Edinburgh
 
TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...Peter Löwe
 

What's hot (20)

DBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of Data
 
Rusbridge Feb 8 Improving Clarity around Continuing Access
Rusbridge Feb 8 Improving Clarity around Continuing AccessRusbridge Feb 8 Improving Clarity around Continuing Access
Rusbridge Feb 8 Improving Clarity around Continuing Access
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
 
Tufts Spatial Data Rescue: Crawling at-risk Government Data
Tufts Spatial Data Rescue: Crawling at-risk Government DataTufts Spatial Data Rescue: Crawling at-risk Government Data
Tufts Spatial Data Rescue: Crawling at-risk Government Data
 
Access to Content via Link Resolvers
Access to Content via Link ResolversAccess to Content via Link Resolvers
Access to Content via Link Resolvers
 
OAI and OAI-PMH
OAI and OAI-PMHOAI and OAI-PMH
OAI and OAI-PMH
 
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream ...
 
Using Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case studyUsing Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case study
 
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]
 
Deep Impact: Metadata and SUNCAT
Deep Impact: Metadata and SUNCATDeep Impact: Metadata and SUNCAT
Deep Impact: Metadata and SUNCAT
 
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
 
DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
 
Open Archives Initiatives For Metadata Harvesting
Open Archives Initiatives For Metadata   HarvestingOpen Archives Initiatives For Metadata   Harvesting
Open Archives Initiatives For Metadata Harvesting
 
British Library Linked Open Data Presentation for ALA June 2014
British Library Linked Open Data Presentation for ALA June 2014British Library Linked Open Data Presentation for ALA June 2014
British Library Linked Open Data Presentation for ALA June 2014
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial Metadata
 
Open Access Repository Junction
Open Access Repository JunctionOpen Access Repository Junction
Open Access Repository Junction
 
TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...TIB's action for research data managament as a national library's strategy in...
TIB's action for research data managament as a national library's strategy in...
 

Similar to Better together: building services for public good on top of content from the global network of open repositories

Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries? Robin Rice
 
OpenAIRE workshop @ OR2016 - From Repositories, for repositories
OpenAIRE workshop @ OR2016 - From Repositories, for repositoriesOpenAIRE workshop @ OR2016 - From Repositories, for repositories
OpenAIRE workshop @ OR2016 - From Repositories, for repositoriesOpenAIRE
 
Closing the scientific literature access gap with CORE - how to gain free acc...
Closing the scientific literature access gap with CORE - how to gain free acc...Closing the scientific literature access gap with CORE - how to gain free acc...
Closing the scientific literature access gap with CORE - how to gain free acc...Nancy Pontika
 
Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base Leila Zemmouchi-Ghomari
 
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020OpenAIRE
 
Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...OpenAIRE
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Paolo Manghi
 
Next Generation Repositories
Next Generation RepositoriesNext Generation Repositories
Next Generation Repositoriesukcorr
 
finde datasets repository.pptx
finde datasets repository.pptxfinde datasets repository.pptx
finde datasets repository.pptxhasanrdhaiwi
 
Instutional repositories and data
Instutional repositories and dataInstutional repositories and data
Instutional repositories and dataAndrew Treloar
 
Research Data Management in GLAM: Managing Data for Cultural Heritage
Research Data Management in GLAM: Managing Data for Cultural HeritageResearch Data Management in GLAM: Managing Data for Cultural Heritage
Research Data Management in GLAM: Managing Data for Cultural HeritageSarah Anna Stewart
 
Libraries Enabling Open Science: LIBER Strategy & Advocacy
Libraries Enabling Open Science: LIBER Strategy & AdvocacyLibraries Enabling Open Science: LIBER Strategy & Advocacy
Libraries Enabling Open Science: LIBER Strategy & AdvocacyLIBER Europe
 
Institutional Repositories
Institutional RepositoriesInstitutional Repositories
Institutional RepositoriesSridhar Gutam
 
2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum Talk2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum TalkPaul Bracke
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
 
Text mining in CORE (OR2012)
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)petrknoth
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
 

Similar to Better together: building services for public good on top of content from the global network of open repositories (20)

Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?
 
OpenAIRE workshop @ OR2016 - From Repositories, for repositories
OpenAIRE workshop @ OR2016 - From Repositories, for repositoriesOpenAIRE workshop @ OR2016 - From Repositories, for repositories
OpenAIRE workshop @ OR2016 - From Repositories, for repositories
 
Closing the scientific literature access gap with CORE - how to gain free acc...
Closing the scientific literature access gap with CORE - how to gain free acc...Closing the scientific literature access gap with CORE - how to gain free acc...
Closing the scientific literature access gap with CORE - how to gain free acc...
 
Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base
 
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
 
Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...Enabling better science: Results and vision of the OpenAIRE infrastructure an...
Enabling better science: Results and vision of the OpenAIRE infrastructure an...
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
 
Next Generation Repositories
Next Generation RepositoriesNext Generation Repositories
Next Generation Repositories
 
finde datasets repository.pptx
finde datasets repository.pptxfinde datasets repository.pptx
finde datasets repository.pptx
 
Scholze liber 2015-06-25_final
Scholze liber 2015-06-25_finalScholze liber 2015-06-25_final
Scholze liber 2015-06-25_final
 
Instutional repositories and data
Instutional repositories and dataInstutional repositories and data
Instutional repositories and data
 
Research Data Management in GLAM: Managing Data for Cultural Heritage
Research Data Management in GLAM: Managing Data for Cultural HeritageResearch Data Management in GLAM: Managing Data for Cultural Heritage
Research Data Management in GLAM: Managing Data for Cultural Heritage
 
Libraries Enabling Open Science: LIBER Strategy & Advocacy
Libraries Enabling Open Science: LIBER Strategy & AdvocacyLibraries Enabling Open Science: LIBER Strategy & Advocacy
Libraries Enabling Open Science: LIBER Strategy & Advocacy
 
Institutional Repositories
Institutional RepositoriesInstitutional Repositories
Institutional Repositories
 
2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum Talk2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum Talk
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Text mining in CORE (OR2012)
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
UKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
UKSG 2018 Breakout - Setting your cites to open I4OC - MaccallumUKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
UKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
 

More from petrknoth

Qui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishingQui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishingpetrknoth
 
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in RepositoriesOAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositoriespetrknoth
 
UKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet themUKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet thempetrknoth
 
Enabling Educators to Locate High-Quality Teaching Resources
Enabling Educators to LocateHigh-Quality Teaching ResourcesEnabling Educators to LocateHigh-Quality Teaching Resources
Enabling Educators to Locate High-Quality Teaching Resourcespetrknoth
 
Tracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository DashboardTracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository Dashboardpetrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
CORE Analytics Dashboard
CORE Analytics DashboardCORE Analytics Dashboard
CORE Analytics Dashboardpetrknoth
 
Analysing the performance of open access papers discovery tools
Analysing the performance of open access papers discovery toolsAnalysing the performance of open access papers discovery tools
Analysing the performance of open access papers discovery toolspetrknoth
 
Assessing Compliance with the UK REF 2021 Open Access Policy
Assessing Compliance with the UK REF 2021 Open Access PolicyAssessing Compliance with the UK REF 2021 Open Access Policy
Assessing Compliance with the UK REF 2021 Open Access Policypetrknoth
 
Data interoperability toolkit (OpenMinTeD)
Data interoperability toolkit (OpenMinTeD)Data interoperability toolkit (OpenMinTeD)
Data interoperability toolkit (OpenMinTeD)petrknoth
 
Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure petrknoth
 
Towards effective research recommender systems for repositories
Towards effective research recommender systems for repositoriesTowards effective research recommender systems for repositories
Towards effective research recommender systems for repositoriespetrknoth
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...petrknoth
 
Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncpetrknoth
 
Semantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research EvaluationSemantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research Evaluationpetrknoth
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...petrknoth
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?petrknoth
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)petrknoth
 
Towards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific PublicationsTowards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific Publicationspetrknoth
 

More from petrknoth (20)

Qui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishingQui Bono? Cumulative advantage in open access publishing
Qui Bono? Cumulative advantage in open access publishing
 
CORE APIv3
CORE APIv3CORE APIv3
CORE APIv3
 
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in RepositoriesOAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
OAI Identifiers: Decentralised PIDs for Research Outputs in Repositories
 
UKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet themUKRI OA policy requirements for repositories and how to meet them
UKRI OA policy requirements for repositories and how to meet them
 
Enabling Educators to Locate High-Quality Teaching Resources
Enabling Educators to LocateHigh-Quality Teaching ResourcesEnabling Educators to LocateHigh-Quality Teaching Resources
Enabling Educators to Locate High-Quality Teaching Resources
 
Tracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository DashboardTracking compliance of the REF2021 policy with the CORE Repository Dashboard
Tracking compliance of the REF2021 policy with the CORE Repository Dashboard
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
CORE Analytics Dashboard
CORE Analytics DashboardCORE Analytics Dashboard
CORE Analytics Dashboard
 
Analysing the performance of open access papers discovery tools
Analysing the performance of open access papers discovery toolsAnalysing the performance of open access papers discovery tools
Analysing the performance of open access papers discovery tools
 
Assessing Compliance with the UK REF 2021 Open Access Policy
Assessing Compliance with the UK REF 2021 Open Access PolicyAssessing Compliance with the UK REF 2021 Open Access Policy
Assessing Compliance with the UK REF 2021 Open Access Policy
 
Data interoperability toolkit (OpenMinTeD)
Data interoperability toolkit (OpenMinTeD)Data interoperability toolkit (OpenMinTeD)
Data interoperability toolkit (OpenMinTeD)
 
Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure
 
Towards effective research recommender systems for repositories
Towards effective research recommender systems for repositoriesTowards effective research recommender systems for repositories
Towards effective research recommender systems for repositories
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
 
Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSync
 
Semantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research EvaluationSemantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research Evaluation
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)
 
Towards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific PublicationsTowards an Infrastructure for Mining Scientific Publications
Towards an Infrastructure for Mining Scientific Publications
 

Recently uploaded

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Better together: building services for public good on top of content from the global network of open repositories

  • 1. Better together: building services for public good on top of content from the global network of open repositories Petr Knoth June 21, 2019 – OAI-11, Geneva Big Scientific Data and Text Analytics Group Knowledge Media Institute, The Open University
  • 2. Papers OA sooner and sooner 1/22 https://physicstoday.scitation.org/do/10.1063/PT.6.2.20190418a/full/ JCDL 2019 Best Paper Award
  • 3. Papers OA sooner and sooner The delay between publication and OA availability decreasing globally 2 Herrmannova, Drahomira; Pontika, Nancy and Knoth, Petr (2019). Do Authors Deposit on Time? Tracking Open Access Policy Compliance. In: 2019 ACM/IEEE Joint Conference on Digital Libraries, 2-6 Jun 2019, Urbana-Champaign, IL
  • 4. Faster open access makes repository infrastructure more important 3
  • 5. We don’t need just open access we need fast open access 4
  • 6. This study was only possible to conduct because of repositories and aggregators working together 5
  • 7. Global network of repositories 7 “A single scientific repository is of limited value, real benefits come from the ability to exchange data within a network … … interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” – Confederation of Open Access Repositories (COAR)
  • 8. OA Aggregations and BOAI 2002 “To achieve open access to scholarly journal literature, we recommend two complementary strategies. • Self-Archiving: First, scholars need the tools and assistance to deposit their refereed journal articles in open electronic archives, a practice commonly called, self-archiving. When these archives conform to standards created by the Open Archives Initiative, then search engines and other tools can treat the separate archives as one. Users then need not know which archives exist or where they are located in order to find and make use of their contents. • …” Budapest Open Access Initiative, 2002 8
  • 9. CORE’s mission 9 Aggregate all open access research articles worldwide … … enrich this content and provide seamless access to it through a set of data services …
  • 11. Harvesting data is challenging 11
  • 12. Harvesting data is challenging 14
  • 13. Harvesting data is challenging 15
  • 14. Need for more interoperability across systems 16
  • 15. World’s largest dataset of Open Access full texts • 13,117,488 Hosted full texts • 135,539,113 Metadata records • ~78m Links to full texts • 15TB of raw plain text • 4,123 Data providers 17/22
  • 16. CORE Processing pipeline • Metadata download, extraction and harmonisation • Full text download • Text extractions, sections extraction • Metadata validation and enrichment (DOI, ORCID, etc.) • Thumbnails generation • References and citation contexts extraction • API enrichment (e.g. finding DOIs, linking to other systems) • Document type classification • Deduplication • Indexing • Exposing (data dumps, API, FastSync) 18
  • 17. CORE usage • January 2019 – CORE reached over 10M monthly active users for the first time • 571% increase from January 2018 19
  • 18. Alexa rank • Within top 6k global websites: https://www.alexa. com/ • core.ac.uk by usage in the top 0.0009% of global websites 20
  • 19. CORE Search 21 • Full text search for OA content • Faceted searching • What you find is what you get • Real change of data providers wanting to be included
  • 20. CORE Recommender • Recommending relevant content to users from across all free content • Recommender plugin for repositories • https://core.ac.uk/service s/recommender/ 22
  • 21. CORE Recommender • Recommending relevant content to users from across all free content • Recommender plugin for repositories • https://core.ac.uk/service s/recommender/ 23
  • 22. Introducing CORE Discovery • High coverage of freely available content • Free service for researchers by researchers. No company controlling the pipes. • Best grip on open repository content. • Repository integration • Discovering documents without a DOI. https://core.ac.uk/services/discovery/
  • 24. CORE Discovery Repository integration • Majority of articles in repositories metadata only. • CORE Discovery repository plugin: • turns dead ends of user journeys into journeys fulfilling users’ information needs • makes repository content more discoverable.
  • 25. CORE Search 27 • Full text search for OA content • Faceted searching • What you find is what you get
  • 26. CORE’s raw data services 28
  • 27. Raw data services – CORE API 29 • Enables the development of new applications • Real-time machine access to the world's largest collection of open access papers • Harmonised access to data from across the network of CORE providers • Direct machine access to full texts of research papers
  • 28. Raw data services – CORE Dataset 30 • Download millions of research papers for text and data analysis • Prototype, analyse and mine your data in your infrastructure
  • 29. Raw data services – CORE FastSync 31 • Keeps your data in sync with research content from around the world • Fast and incremental updates as soon as they become available. No usage restrictions • Based on ResourceSync
  • 30. Use powered by CORE 32 • It is beyond human capacities to read all scientific literature • Example use cases in which CORE is applied: • Improving discovery • Plagiarism detection • Question answering in science • Literature based discovery • Fact checking and detection of misinformation • Analysing research trends • Finding experts in a particular domain • Research evaluation and scientometrics • Exploratory and visual search • …
  • 32. Take home 22/22 • Data providers (repositories, preprint servers, journals, etc.) and aggregators need to work together to allow text and data analysis, processing and reuse of large volumes of research papers. • CORE provides the tools for programmatically processing open access data fast, reliably and from across the global network of repositories. • If you are a repository manager or a librarian: • CORE Discovery and Recommender • If you are a developer or analyst: • Build your own stuff using CORE‘s data services on top of the global full text open access courpus

Editor's Notes

  1. Paper are being made deposited into open repositories faster and faster after acceptance
  2. Paper are being made deposited into open repositories faster and faster after acceptance It is a global trend
  3. The amazing thing about this is that the sooner the content becomes available in a repository the more important the repository infrastructure becomes. The vast majority of transactions/downloads of OA literature is for recently published content. Many researchers care about getting access to content soon after it gets released If contnt becomes available in repositories before it gets published, repositories have the potential top become the primary places from which we will access research.
  4. For the OA movement to succeed soon we don’t need just OA but fast OA
  5. If we want the ability to run global analyses on open access data, if we want to be able to monitor our progress, we need repositories and services that have a global view on repositories to work together.
  6. The vision that many data providers can be treated as one
  7. CORE offers seamless, unrestricted access to millions of research papers. CORE hosts the world’s largest collection of open access full texts, which are used by researchers, libraries, software developers, funders and many more. CORE's aggregated content comes from thousands of institutional and subject repositories as well as journals and covers all research disciplines. In January 2019, CORE has hit the mark of 10 million monthly active users (10.41 million users) making core.ac.uk a top 6k website globally by usage according to the independent Alexa Rank and one of the most world's most widely used Open Access platforms. In this talk, Petr will present the CORE service, discuss current challenges in harvesting content from open repositories and share his experience of building value-added services for the society on top of open content.
  8. Mention preprints and large repositories No sufficient support for full text harvesting Previously questioned, not any more
  9. We started aggregating from repositories, we then added aggregating OA from publishers, we are now working to aggregate OA content from anywhere on the Web.
  10. We started aggregating from repositories, we then added aggregating OA from publishers, we are now working to aggregate OA content from anywhere on the Web.
  11. Hard job to do this because more interoperability needed.
  12. Non-trivial pipeline Machine learning involved The idea that one person can do this on their own doesn’t really hold Data providers and repositories must collaborate
  13. When all the processing is done. People appreciate the product.
  14. Highest coverage of freely available content. Our tests have shown CORE Discovery finding more free content than any other discovery system. Free service for researchers by researchers. CORE Discovery is the only free content discovery extension developed by researchers for researchers. There is no major publisher or enterprise controlling and profiting from your usage data. Best grip on open repository content. Due to CORE being a leader in harvesting open access literature, CORE Discovery has the best grip on open content from open repositories as opposed to other services that disproportionately focus only on content indexed in major commercial databases. Repository integration and discovering documents without a DOI. The only service offering seamless and free integration into repositories. CORE Discovery is also the only discovery system that can locate scientific content even for items with an unknown DOI or which do not have a DOI.
  15. Open access discovery tools locate freely available copies of research papers which might be behind the paywall K[.*]io
  16. Enable others to programmatically interact with CORE data. Essential for developing applications that make use of information about research papers Supporting a variety of use cases >500 registered users, including companies for CORE API Use: CORE API to interact with CORE in real time CORE Data Dumps to run batch process, develop prototype applications and do research CORE SDKs to gain easy access to CORE API from several programming languages CORE FastSync
  17. Enable others to programmatically interact with CORE data. Essential for developing applications that make use of information about research papers Supporting a variety of use cases >500 registered users, including companies for CORE API Use: CORE API to interact with CORE in real time CORE Data Dumps to run batch process, develop prototype applications and do research CORE SDKs to gain easy access to CORE API from several programming languages CORE FastSync
  18. Enable others to programmatically interact with CORE data. Essential for developing applications that make use of information about research papers Supporting a variety of use cases >500 registered users, including companies for CORE API Use: CORE API to interact with CORE in real time CORE Data Dumps to run batch process, develop prototype applications and do research CORE SDKs to gain easy access to CORE API from several programming languages CORE FastSync
  19. Enable others to programmatically interact with CORE data. Essential for developing applications that make use of information about research papers Supporting a variety of use cases >500 registered users, including companies for CORE API Use: CORE API to interact with CORE in real time CORE Data Dumps to run batch process, develop prototype applications and do research CORE SDKs to gain easy access to CORE API from several programming languages CORE FastSync
  20. Put some quotes here/logos