SlideShare a Scribd company logo
1 of 30
1/34
CORE:
Aggregating Open Access Content from Repositories
Worldwide
Petr Knoth
CORE (Connecting REpositories)
Knowledge Media institute
The Open University
@petrknoth, #diggicore
2/34
Outline
1. The need for aggregationg Open Access content
2. The CORE system
3. Collaboration with TEL
3/34
Outline
1. The need for aggregationg Open Access content
2. The CORE system
3. Collaboration with TEL
4/34
What is Open Access exactly?
By “open access” to [peer-reviewed research literature], we
mean its free availability on the public internet, permitting any
users to read, download, copy, distribute, print, search, or link to
the full texts of these articles, crawl them for indexing, pass them
as data to software, or use them for any other lawful purpose,
without financial, legal, or technical barriers other than those
inseparable from gaining access to the internet itself.
[BOAI, 2002]
5/34
Growth of items in Open Access repositories
6/34
COAR: About harvesting and aggregations …
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and services,
and generate new knowledge from repository content.’’
[COAR manifesto]
7/34
Outline
1. The need for aggregationg Open Access content
2. The CORE system
3. Collaboration with TEL
8/34
The mission of CORE
Aggregate all open access content ditsributed across
different systems worldwide, enrich this content and
provide access to it through a set of services …
9/34
The CORE aggregator
10/34
CORE statistics
• Content
• 16M+ records
• 500+ repositories
• 1M+ full-texts
• UK national aggregator, project started 2011
• Full-text aggregator (not just metadata)
• 0.5 million monthly visits, but only 150k six months ago
• Placed among Top 10 search engines for research that go beyond
Google [JISC, 2013]
• Listed among Top 100 Thesis and Dissertation Resources
• Used by many researchers and organisaitons, including the
European Library and UNESCO
11/34
CORE supports a three access levels architecture
• Raw data access.
• Transaction information access.
• Analytical information access.
12/34
CORE supports a three access levels architecture
• Raw data access. Developers, DLs, DL researchers, companies …
• Transaction information access. Researchers, students, life-long
learners …
• Analytical information access. Funders, government, bussiness
intelligence …
13/34
CORE supports a three access levels architecture
• Raw data access. Developers, DLs, DL researchers, companies …
Apps: CORE API, CORE Data Dumps
• Transaction information access. Researchers, students, life-long
learners …
Apps: CORE Portal, CORE Mobile, CORE (recommendation) Plugin
• Analytical information access. Funders, government, bussiness
intelligence …
Apps: Repository Analytics, CORE Policy Compliance Analytics
14/34
CORE API
Enables external systems to interact with OA data (JSON or XML)
•Search, download metadata and cotent
•Content recommendation
•Citation references
•Statistics
•…
Used by: Libraries, Institutional repositories, developers
15/34
Data dumps
• Cleaned and enriched with additional information
• Distributed as two large zip files: metadata + full-texts
• Created as part of the Digging into Connected Repositories
(DiggiCORE) project
16/34
Examples of usage
• Author disambiguation
• Mining URLs from papers to detect trends
• Tagging of chemical compounds for image retrieval
• Citation analysis
• Content recommendation
• Detecting collaboration patterns of scientific communities
• Monitoring of OA growth
• Any form of text or data mining …
• API useful for services and data dumps for offline experiments
17/34
Why to use it?
• It is only OA, thus you can legally mine it …
• You can redistribute it: essential for reproducible research
• Very large and growing
• Kept up-to-date
• Ability to rerun experiments with new data
18/34
Why to use it?
• Open infrastructure for open science
• Not owned or managed by a for profit company => Ability to run
your own services = new opportunities and no give away of your
research to commercial companies
19/34
CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories
20/34
CORE Applications
CORE Mobile – Allows searching and navigating scientific
publications aggregated from Open Access repositories
21/34
CORE Applications
CORE Plugin – A plugin to system that recommendations for related
items.
22/34
CORE Applications
CORE Plugin – A plugin to system that recommendations for related
items.
23/34
CORE Applications
Repository Analytics – is an analytical tool supporting providers of
open access content (in particular repository managers).
24/34
CORE Applications
Policy Compliance Analytics (under development) – Tool to support
the implementation and monitoring of the UK HEFCE OA policy.
25/34
Outline
1. The need for aggregationg Open Access content
2. The CORE system
3. Collaboration with TEL
26/34
Collaboration with TEL
Projects
•Europeana Cloud, DiggiCORE (current projects)
•DiscoveryCORE (proposal)
•Horizon 2020 …
Topics
•Large scale metadata and content aggregation
•Analysis of big document datasets based on full-text and metadata
•Discovering semantic connections
•Clustering/classification of content, subject analysis
•Network analysis (citation networks, relatedness, social)
•Services for digital libraries
27/34
Conclusions
• Open Access knowledge available online on the rise
• CORE provides a single access point to this knowledge
and enables its mining
• Opportunities for innovative applications and research
28/34
Thank you!
Open access needs freely exploitable data
29/34
References 1/2
[BOAI, 2002] Budapest Open Access Initiative. (2002)
http://www.opensocietyfoundations.org/openaccess/boai-10-recommendations
[Crow, 2002] Crow, R. (2002). The case for institutional repositories: a SPARC position
paper. ARL Bimonthly Report 223.
[Knoth & Zdrahal, 2012] Knoth, P. and Zdrahal, Z. (2012) CORE: Three Access Levels to
Underpin Open Access, D-Lib Magazine, 18, 11/12, Corporation for National Research
Initiatives, http://dx.doi.org/10.1045/november2012-knoth
[Konkiel, 2012] Konkiel, S. (2012) Are Institutional Repositories Doing Their Job?
https://blogs.libraries.iub.edu/scholcomm/2012/09/11/are-institutional-repositories-
doing-their-job/
[Laakso & Bjork, 2012] Laakso, M., & Björk, B. C. (2012). Anatomy of open access
publishing: a study of longitudinal development and internal structure. BMC Medicine,
10(1), 124.
30/34
References 2/2
[Morrison, 2012] Morrison, Louise (2012) 5 reasons why I can’t find Open Access
publications. http://mmitscotland.wordpress.com/2012/08/06/5-reasons-why-i-cant-find-
open-access-publications-2/
[OAI-PMH v2.0, 2008] The Open Archives Initiative Protocol for Metadata Harvesting
Version 2.0 (OAI-PMH), Impementation Guidelines (2008).
http://www.openarchives.org/OAI/openarchivesprotocol.html
[ResourceSync draft, 2013] ResourceSync protocol draft. 2013
http://www.niso.org/workrooms/resourcesync/
[Salo, 2008] Salo, D. (2008). Innkeeper at the roach motel. Library Trends, 57(2), 98-123.
[Van de Sompel et al, 2004] Van de Sompel, H., Nelson, M. L., Lagoze, C., & Warner, S.
(2004). Resource harvesting within the OAI-PMH framework. D-lib magazine, 10(12), 1082-
9873.

More Related Content

What's hot

Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...
Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...
Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...
The European Library
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
Open Science Fair
 

What's hot (20)

OpenAIRE webinar: Horizon 2020 Open Science Policies and beyond, with Emilie ...
OpenAIRE webinar: Horizon 2020 Open Science Policies and beyond, with Emilie ...OpenAIRE webinar: Horizon 2020 Open Science Policies and beyond, with Emilie ...
OpenAIRE webinar: Horizon 2020 Open Science Policies and beyond, with Emilie ...
 
Webinar: Data management and the Open Research Data Pilot in Horizon 2020
Webinar: Data management and the Open Research Data Pilot in Horizon 2020Webinar: Data management and the Open Research Data Pilot in Horizon 2020
Webinar: Data management and the Open Research Data Pilot in Horizon 2020
 
Connecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceConnecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open science
 
Open Spatial Data: Sources and Tools
Open Spatial Data: Sources and ToolsOpen Spatial Data: Sources and Tools
Open Spatial Data: Sources and Tools
 
MIDESS
MIDESSMIDESS
MIDESS
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch CatalogueExposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
 
Mapping the Repository Landscape
Mapping the Repository LandscapeMapping the Repository Landscape
Mapping the Repository Landscape
 
Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...
Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...
Developing linked Open Data - Nuno Freire, Senior Researcher, The European Li...
 
OpenAIRE Open Innovation call: Next Generation Repositories
OpenAIRE Open Innovation call: Next Generation RepositoriesOpenAIRE Open Innovation call: Next Generation Repositories
OpenAIRE Open Innovation call: Next Generation Repositories
 
The FP7 Post-Grant Open Access Pilot: An All-Encompassing Gold Open Access Fu...
The FP7 Post-Grant Open Access Pilot: An All-Encompassing Gold Open Access Fu...The FP7 Post-Grant Open Access Pilot: An All-Encompassing Gold Open Access Fu...
The FP7 Post-Grant Open Access Pilot: An All-Encompassing Gold Open Access Fu...
 
20190527_Helena Cousijn _ FREYA
20190527_Helena Cousijn _ FREYA20190527_Helena Cousijn _ FREYA
20190527_Helena Cousijn _ FREYA
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
 
20191119_The OpenAIRE Research Graph
20191119_The OpenAIRE Research Graph 20191119_The OpenAIRE Research Graph
20191119_The OpenAIRE Research Graph
 
OpenAIRE webinars during OA week 2017: Humanities and Open Science
OpenAIRE webinars during OA week 2017: Humanities and Open ScienceOpenAIRE webinars during OA week 2017: Humanities and Open Science
OpenAIRE webinars during OA week 2017: Humanities and Open Science
 
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
OpenAIRE webinar on Open Access in H2020 (OAW2016)
OpenAIRE webinar on Open Access in H2020 (OAW2016)OpenAIRE webinar on Open Access in H2020 (OAW2016)
OpenAIRE webinar on Open Access in H2020 (OAW2016)
 
November 10, 2015 NISO/ICSTI Joint Webinar: A Pathway from Open Access and Da...
November 10, 2015 NISO/ICSTI Joint Webinar: A Pathway from Open Access and Da...November 10, 2015 NISO/ICSTI Joint Webinar: A Pathway from Open Access and Da...
November 10, 2015 NISO/ICSTI Joint Webinar: A Pathway from Open Access and Da...
 
Open access to publications in Horizon 2020
Open access to publications in Horizon 2020Open access to publications in Horizon 2020
Open access to publications in Horizon 2020
 

Viewers also liked

Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 

Viewers also liked (16)

Iepy pydata-amsterdam-2016
Iepy pydata-amsterdam-2016Iepy pydata-amsterdam-2016
Iepy pydata-amsterdam-2016
 
Quepy
QuepyQuepy
Quepy
 
Extracting Multilingual Natural-Language Patterns for RDF Predicates
Extracting Multilingual Natural-Language Patterns for RDF PredicatesExtracting Multilingual Natural-Language Patterns for RDF Predicates
Extracting Multilingual Natural-Language Patterns for RDF Predicates
 
myHadoop 0.30
myHadoop 0.30myHadoop 0.30
myHadoop 0.30
 
Running Natural Language Queries on MongoDB
Running Natural Language Queries on MongoDBRunning Natural Language Queries on MongoDB
Running Natural Language Queries on MongoDB
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
 
A Tool For Big Data Analysis using Apache Spark
A Tool For Big Data Analysis using Apache SparkA Tool For Big Data Analysis using Apache Spark
A Tool For Big Data Analysis using Apache Spark
 
Natural language search using Neo4j
Natural language search using Neo4jNatural language search using Neo4j
Natural language search using Neo4j
 
Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014
 
Natural Language Processing with Neo4j
Natural Language Processing with Neo4jNatural Language Processing with Neo4j
Natural Language Processing with Neo4j
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Dell Storage Management
Dell Storage ManagementDell Storage Management
Dell Storage Management
 
NLIDB(Natural Language Interface to DataBases)
NLIDB(Natural Language Interface to DataBases)NLIDB(Natural Language Interface to DataBases)
NLIDB(Natural Language Interface to DataBases)
 

Similar to CORE - Petr Knoth, Research Associate

Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...
Peter Löwe
 
1818 societypresentation revised2013
1818 societypresentation revised20131818 societypresentation revised2013
1818 societypresentation revised2013
Eliza McLeod
 

Similar to CORE - Petr Knoth, Research Associate (20)

finde datasets repository.pptx
finde datasets repository.pptxfinde datasets repository.pptx
finde datasets repository.pptx
 
How practising open research can benefit you
How practising open research can benefit youHow practising open research can benefit you
How practising open research can benefit you
 
Peer Review on the Move from Closed to Open
Peer Review on the Move from Closed to OpenPeer Review on the Move from Closed to Open
Peer Review on the Move from Closed to Open
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?
 
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
 
Leadership in Open Access Arena in Turkey and Effect of OpenAIRE2020 Project
Leadership in Open Access Arena in Turkey and Effect of OpenAIRE2020 ProjectLeadership in Open Access Arena in Turkey and Effect of OpenAIRE2020 Project
Leadership in Open Access Arena in Turkey and Effect of OpenAIRE2020 Project
 
Open Access and the Evolving Scholarly Communication Environment
Open Access and the Evolving Scholarly Communication EnvironmentOpen Access and the Evolving Scholarly Communication Environment
Open Access and the Evolving Scholarly Communication Environment
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
 
Libraries Enabling Open Science: LIBER Strategy & Advocacy
Libraries Enabling Open Science: LIBER Strategy & AdvocacyLibraries Enabling Open Science: LIBER Strategy & Advocacy
Libraries Enabling Open Science: LIBER Strategy & Advocacy
 
Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open Science
 
OpenAIRE services and tools - presentation at #DI4R2016
OpenAIRE services and tools - presentation at #DI4R2016OpenAIRE services and tools - presentation at #DI4R2016
OpenAIRE services and tools - presentation at #DI4R2016
 
Moving content across the OpenAIRE infrastructure boundaries (6th RDA Plenary)
Moving content across the OpenAIRE infrastructure boundaries (6th RDA Plenary) Moving content across the OpenAIRE infrastructure boundaries (6th RDA Plenary)
Moving content across the OpenAIRE infrastructure boundaries (6th RDA Plenary)
 
Emerging Trends in Scholarly Communication and the coming Decade of Open Access
Emerging Trends in Scholarly Communication and the coming Decade of Open AccessEmerging Trends in Scholarly Communication and the coming Decade of Open Access
Emerging Trends in Scholarly Communication and the coming Decade of Open Access
 
Manage it locally to share it globally: RDM and Wikimedia Commons
Manage it locally to share it globally: RDM and Wikimedia CommonsManage it locally to share it globally: RDM and Wikimedia Commons
Manage it locally to share it globally: RDM and Wikimedia Commons
 
Open ILRI
Open ILRIOpen ILRI
Open ILRI
 
Data and Research Infrastructures and Open Science
Data and Research Infrastructures and Open ScienceData and Research Infrastructures and Open Science
Data and Research Infrastructures and Open Science
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Text mining in CORE (OR2012)
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)
 
1818 societypresentation revised2013
1818 societypresentation revised20131818 societypresentation revised2013
1818 societypresentation revised2013
 

More from The European Library

Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...
The European Library
 
Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...
Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...
Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...
The European Library
 
Europeana Newspapers (Project Details and Aggregation Workflow)
Europeana Newspapers (Project Details and Aggregation Workflow)Europeana Newspapers (Project Details and Aggregation Workflow)
Europeana Newspapers (Project Details and Aggregation Workflow)
The European Library
 
Alastair Dunning, Open data at The European library, TEL
Alastair Dunning, Open data at The European library, TELAlastair Dunning, Open data at The European library, TEL
Alastair Dunning, Open data at The European library, TEL
The European Library
 
Alastair Dunning, Europeana Newspapers, The European Library
Alastair Dunning, Europeana Newspapers, The European LibraryAlastair Dunning, Europeana Newspapers, The European Library
Alastair Dunning, Europeana Newspapers, The European Library
The European Library
 
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
The European Library
 
Alastair Dunning, Introduction to Europeana Cloud, The European Library
Alastair Dunning, Introduction to Europeana Cloud, The European LibraryAlastair Dunning, Introduction to Europeana Cloud, The European Library
Alastair Dunning, Introduction to Europeana Cloud, The European Library
The European Library
 
Dunning welsh-newspapers-130314110640-phpapp01
Dunning welsh-newspapers-130314110640-phpapp01Dunning welsh-newspapers-130314110640-phpapp01
Dunning welsh-newspapers-130314110640-phpapp01
The European Library
 
Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02
The European Library
 
Alastair Dunning, Breaking the waves, The European Library
Alastair Dunning, Breaking the waves, The European LibraryAlastair Dunning, Breaking the waves, The European Library
Alastair Dunning, Breaking the waves, The European Library
The European Library
 
Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...
Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...
Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...
The European Library
 
Alastair Dunning, Future Directions for The European Library
Alastair Dunning, Future Directions for The European Library Alastair Dunning, Future Directions for The European Library
Alastair Dunning, Future Directions for The European Library
The European Library
 
Chiara Latronico,Europeana Cloud - Ingestion Clinic, The European Library
Chiara Latronico,Europeana Cloud - Ingestion Clinic, The European LibraryChiara Latronico,Europeana Cloud - Ingestion Clinic, The European Library
Chiara Latronico,Europeana Cloud - Ingestion Clinic, The European Library
The European Library
 
Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...
Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...
Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...
The European Library
 

More from The European Library (20)

Linking Collections Through Linked Open Data
Linking Collections Through Linked Open DataLinking Collections Through Linked Open Data
Linking Collections Through Linked Open Data
 
Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...
 
Freire model api
Freire model apiFreire model api
Freire model api
 
The european library ukb nienke 13 feb 2014
The european library   ukb nienke 13 feb 2014The european library   ukb nienke 13 feb 2014
The european library ukb nienke 13 feb 2014
 
Aubéry Escande - Europeana Newspapers - A new tool for researchers
Aubéry Escande - Europeana Newspapers - A new tool for researchersAubéry Escande - Europeana Newspapers - A new tool for researchers
Aubéry Escande - Europeana Newspapers - A new tool for researchers
 
Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...
Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...
Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries,...
 
Europeana Newspapers (Project Details and Aggregation Workflow)
Europeana Newspapers (Project Details and Aggregation Workflow)Europeana Newspapers (Project Details and Aggregation Workflow)
Europeana Newspapers (Project Details and Aggregation Workflow)
 
Europeana Newspapers Aggregation and Indexing Plan
Europeana Newspapers Aggregation and Indexing PlanEuropeana Newspapers Aggregation and Indexing Plan
Europeana Newspapers Aggregation and Indexing Plan
 
Alastair Dunning, Open data at The European library, TEL
Alastair Dunning, Open data at The European library, TELAlastair Dunning, Open data at The European library, TEL
Alastair Dunning, Open data at The European library, TEL
 
Alastair Dunning, Europeana Newspapers, The European Library
Alastair Dunning, Europeana Newspapers, The European LibraryAlastair Dunning, Europeana Newspapers, The European Library
Alastair Dunning, Europeana Newspapers, The European Library
 
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
 
Alastair Dunning, Introduction to Europeana Cloud, The European Library
Alastair Dunning, Introduction to Europeana Cloud, The European LibraryAlastair Dunning, Introduction to Europeana Cloud, The European Library
Alastair Dunning, Introduction to Europeana Cloud, The European Library
 
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
 
Dunning welsh-newspapers-130314110640-phpapp01
Dunning welsh-newspapers-130314110640-phpapp01Dunning welsh-newspapers-130314110640-phpapp01
Dunning welsh-newspapers-130314110640-phpapp01
 
Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02
 
Alastair Dunning, Breaking the waves, The European Library
Alastair Dunning, Breaking the waves, The European LibraryAlastair Dunning, Breaking the waves, The European Library
Alastair Dunning, Breaking the waves, The European Library
 
Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...
Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...
Alastair Dunning, Challenges and Solutions in Creating a European Historic Ne...
 
Alastair Dunning, Future Directions for The European Library
Alastair Dunning, Future Directions for The European Library Alastair Dunning, Future Directions for The European Library
Alastair Dunning, Future Directions for The European Library
 
Chiara Latronico,Europeana Cloud - Ingestion Clinic, The European Library
Chiara Latronico,Europeana Cloud - Ingestion Clinic, The European LibraryChiara Latronico,Europeana Cloud - Ingestion Clinic, The European Library
Chiara Latronico,Europeana Cloud - Ingestion Clinic, The European Library
 
Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...
Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...
Chiara latronico, Europeana Collections 1914-1918 - Ingestion and Aggregation...
 

Recently uploaded

Recently uploaded (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

CORE - Petr Knoth, Research Associate

  • 1. 1/34 CORE: Aggregating Open Access Content from Repositories Worldwide Petr Knoth CORE (Connecting REpositories) Knowledge Media institute The Open University @petrknoth, #diggicore
  • 2. 2/34 Outline 1. The need for aggregationg Open Access content 2. The CORE system 3. Collaboration with TEL
  • 3. 3/34 Outline 1. The need for aggregationg Open Access content 2. The CORE system 3. Collaboration with TEL
  • 4. 4/34 What is Open Access exactly? By “open access” to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. [BOAI, 2002]
  • 5. 5/34 Growth of items in Open Access repositories
  • 6. 6/34 COAR: About harvesting and aggregations … “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.’’ [COAR manifesto]
  • 7. 7/34 Outline 1. The need for aggregationg Open Access content 2. The CORE system 3. Collaboration with TEL
  • 8. 8/34 The mission of CORE Aggregate all open access content ditsributed across different systems worldwide, enrich this content and provide access to it through a set of services …
  • 10. 10/34 CORE statistics • Content • 16M+ records • 500+ repositories • 1M+ full-texts • UK national aggregator, project started 2011 • Full-text aggregator (not just metadata) • 0.5 million monthly visits, but only 150k six months ago • Placed among Top 10 search engines for research that go beyond Google [JISC, 2013] • Listed among Top 100 Thesis and Dissertation Resources • Used by many researchers and organisaitons, including the European Library and UNESCO
  • 11. 11/34 CORE supports a three access levels architecture • Raw data access. • Transaction information access. • Analytical information access.
  • 12. 12/34 CORE supports a three access levels architecture • Raw data access. Developers, DLs, DL researchers, companies … • Transaction information access. Researchers, students, life-long learners … • Analytical information access. Funders, government, bussiness intelligence …
  • 13. 13/34 CORE supports a three access levels architecture • Raw data access. Developers, DLs, DL researchers, companies … Apps: CORE API, CORE Data Dumps • Transaction information access. Researchers, students, life-long learners … Apps: CORE Portal, CORE Mobile, CORE (recommendation) Plugin • Analytical information access. Funders, government, bussiness intelligence … Apps: Repository Analytics, CORE Policy Compliance Analytics
  • 14. 14/34 CORE API Enables external systems to interact with OA data (JSON or XML) •Search, download metadata and cotent •Content recommendation •Citation references •Statistics •… Used by: Libraries, Institutional repositories, developers
  • 15. 15/34 Data dumps • Cleaned and enriched with additional information • Distributed as two large zip files: metadata + full-texts • Created as part of the Digging into Connected Repositories (DiggiCORE) project
  • 16. 16/34 Examples of usage • Author disambiguation • Mining URLs from papers to detect trends • Tagging of chemical compounds for image retrieval • Citation analysis • Content recommendation • Detecting collaboration patterns of scientific communities • Monitoring of OA growth • Any form of text or data mining … • API useful for services and data dumps for offline experiments
  • 17. 17/34 Why to use it? • It is only OA, thus you can legally mine it … • You can redistribute it: essential for reproducible research • Very large and growing • Kept up-to-date • Ability to rerun experiments with new data
  • 18. 18/34 Why to use it? • Open infrastructure for open science • Not owned or managed by a for profit company => Ability to run your own services = new opportunities and no give away of your research to commercial companies
  • 19. 19/34 CORE Applications CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories
  • 20. 20/34 CORE Applications CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories
  • 21. 21/34 CORE Applications CORE Plugin – A plugin to system that recommendations for related items.
  • 22. 22/34 CORE Applications CORE Plugin – A plugin to system that recommendations for related items.
  • 23. 23/34 CORE Applications Repository Analytics – is an analytical tool supporting providers of open access content (in particular repository managers).
  • 24. 24/34 CORE Applications Policy Compliance Analytics (under development) – Tool to support the implementation and monitoring of the UK HEFCE OA policy.
  • 25. 25/34 Outline 1. The need for aggregationg Open Access content 2. The CORE system 3. Collaboration with TEL
  • 26. 26/34 Collaboration with TEL Projects •Europeana Cloud, DiggiCORE (current projects) •DiscoveryCORE (proposal) •Horizon 2020 … Topics •Large scale metadata and content aggregation •Analysis of big document datasets based on full-text and metadata •Discovering semantic connections •Clustering/classification of content, subject analysis •Network analysis (citation networks, relatedness, social) •Services for digital libraries
  • 27. 27/34 Conclusions • Open Access knowledge available online on the rise • CORE provides a single access point to this knowledge and enables its mining • Opportunities for innovative applications and research
  • 28. 28/34 Thank you! Open access needs freely exploitable data
  • 29. 29/34 References 1/2 [BOAI, 2002] Budapest Open Access Initiative. (2002) http://www.opensocietyfoundations.org/openaccess/boai-10-recommendations [Crow, 2002] Crow, R. (2002). The case for institutional repositories: a SPARC position paper. ARL Bimonthly Report 223. [Knoth & Zdrahal, 2012] Knoth, P. and Zdrahal, Z. (2012) CORE: Three Access Levels to Underpin Open Access, D-Lib Magazine, 18, 11/12, Corporation for National Research Initiatives, http://dx.doi.org/10.1045/november2012-knoth [Konkiel, 2012] Konkiel, S. (2012) Are Institutional Repositories Doing Their Job? https://blogs.libraries.iub.edu/scholcomm/2012/09/11/are-institutional-repositories- doing-their-job/ [Laakso & Bjork, 2012] Laakso, M., & Björk, B. C. (2012). Anatomy of open access publishing: a study of longitudinal development and internal structure. BMC Medicine, 10(1), 124.
  • 30. 30/34 References 2/2 [Morrison, 2012] Morrison, Louise (2012) 5 reasons why I can’t find Open Access publications. http://mmitscotland.wordpress.com/2012/08/06/5-reasons-why-i-cant-find- open-access-publications-2/ [OAI-PMH v2.0, 2008] The Open Archives Initiative Protocol for Metadata Harvesting Version 2.0 (OAI-PMH), Impementation Guidelines (2008). http://www.openarchives.org/OAI/openarchivesprotocol.html [ResourceSync draft, 2013] ResourceSync protocol draft. 2013 http://www.niso.org/workrooms/resourcesync/ [Salo, 2008] Salo, D. (2008). Innkeeper at the roach motel. Library Trends, 57(2), 98-123. [Van de Sompel et al, 2004] Van de Sompel, H., Nelson, M. L., Lagoze, C., & Warner, S. (2004). Resource harvesting within the OAI-PMH framework. D-lib magazine, 10(12), 1082- 9873.

Editor's Notes

  1. In this paper, we use the term institutional repositories (which are the main interest of the Open Repositories conference), to refer also to subject-based repositories or archives or systems for depositing research outputs used by open access publishers. As a result, the conclusions of this paper and recommendations are equally valid for both the green (self-archiving) and gold (OA publishing) routes to OA.
  2. In this paper, we use the term institutional repositories (which are the main interest of the Open Repositories conference), to refer also to subject-based repositories or archives or systems for depositing research outputs used by open access publishers. As a result, the conclusions of this paper and recommendations are equally valid for both the green (self-archiving) and gold (OA publishing) routes to OA.
  3. I can specify the differences between this and RIOXX