CORE: Aggregating and Enriching Content to Support Open Access

Petr Knoth
The Open University

1/52

Outline
1. Aggregating Open Access (OA) publications – why, how, what
for?
2. The CORE system
3. Supporting research in mining databases of scientific
publications (DiggiCORE)

2/52

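Outline
1. Aggregating Open Access (OA) publications – why, how, what
for?
2. The CORE system
3. Supporting research in mining databases of scientific
publications (DiggiCORE)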

3/52

Growth of items in Open Access repositories

4/52

Growth of Open Access repositories

5/52

Growth of articles in OA journals

6/52

Growth of OA journals

7/52

Green Open Access - statistics

8/52

Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and
services, and generate new knowledge from repository content.”
[COAR manifesto]

9/52

Access to information according to the level of abstraction

[Figure: the layers of an aggregation system (metadata transfer interoperability, metadata and content, semantic enrichment, interfaces with OLTP and OLAP) mapped to three levels of access: raw data access, transaction information access, and analytical information access.]

10/52

Who should be supported by aggregations?

The following user groups (divided according to the level of
abstraction of information they need):
• Raw data access.
• Transaction information access.
• Analytical information access.

11/52

Who should be supported by aggregations?

The following user groups (divided according to the level of
abstraction of information they need):
• Raw data access: developers, DLs, DL researchers, companies …
• Transaction information access: researchers, students, life-long learners …
• Analytical information access: funders, government, business intelligence …

12/52

Layers of an aggregation system

From top to bottom:
• Interfaces (OLTP, OLAP)
• Enrichment
• Metadata and Content
• Metadata Transfer Interoperability

13/52

Layers of an aggregation system
From top to bottom, with examples:
• Interfaces (OLTP, OLAP): APIs (REST, SOAP, XML-RPC), UIs, dashboards, statistics
• Enrichment: annotations
• Metadata and Content: catalog records; Dublin Core, XML, RDF …; PDF, Word …
• Metadata Transfer Interoperability: OAI-PMH, OAI-ORE …

14/52


Related systems

16/52

Aggregation projects – BASE

[Figure: BASE positioned on the aggregation-layer diagram (metadata transfer interoperability; metadata and content; enrichment; interfaces with OLTP and OLAP).]

17/52

Aggregation projects – OAIster/WorldCat

[Figure: OAIster/WorldCat positioned on the aggregation-layer diagram (metadata transfer interoperability; metadata and content; enrichment; interfaces with OLTP and OLAP).]

18/52

Aggregation projects – RepUK

[Figure: RepUK positioned on the aggregation-layer diagram (metadata transfer interoperability; metadata and content; enrichment; interfaces with OLTP and OLAP).]

19/52

Aggregations need access to content, not just metadata!

• Certain metadata types can be created only at the level of the
aggregation
• Certain metadata can change over time
• Ensuring content:
• accessibility
• availability
• validity
• quality
• …

20/52

Aggregation projects – CiteSeerX

[Figure: CiteSeerX positioned on the aggregation-layer diagram (metadata transfer interoperability; metadata and content; enrichment; interfaces with OLTP and OLAP).]

21/52

Should an aggregation system support all three user types?

This can be realised by more than one system, provided that the dataset is the same!

22/52

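Outline
1. Aggregating Open Access (OA) publications – why, how, what
for?
2. The CORE system
3. Supporting research in mining databases of scientific
publications (DiggiCORE)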

23/52

CORE objectives
• CORE aims to provide a comprehensive technical infrastructure
for Open Access scholarly publications that will support access
and reuse of scholarly materials at different levels of abstraction.
• A nation-wide aggregation system that will improve the discovery
of publications stored in British Open Access Repositories (OARs).

24/52

What does CORE provide at different aggregation levels?

[Figure: CORE positioned on the aggregation-layer diagram (metadata transfer interoperability; metadata and content; enrichment; interfaces with OLTP and OLAP), spanning raw data, transaction information, and analytical information access.]

25/52

CORE functionality

26/52

CORE functionality
Step 1: Metadata and full-text harvesting

Content harvesting, processing
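
A minimal sketch of the metadata half of this step, assuming repositories expose Dublin Core records over OAI-PMH (both are named on the layers slide). The sample response, the record identifier, the example.org URLs, and the `parse_records` helper are hypothetical; the slides do not show how CORE itself implements harvesting.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response, trimmed to a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>http://repository.example.org/1234/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict per record: OAI identifier, title, creators, full-text links."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # deleted records carry a header but no metadata
        identifiers = [e.text for e in dc.findall("dc:identifier", NS)]
        records.append({
            "oai_id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            # Assumption: a dc:identifier ending in .pdf points at the full text.
            "fulltext_urls": [u for u in identifiers if u and u.lower().endswith(".pdf")],
        })
    return records


if __name__ == "__main__":
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["oai_id"], rec["title"], rec["fulltext_urls"])
```

The full-text half of the step would then download each URL in `fulltext_urls`; treating a `.pdf` suffix as the full-text link is a simplification made for this sketch only.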

27/52

[Figure: aggregation-layer diagram with the enrichment layer annotated: semantic similarity, citation extraction, classification, …]

28/52

CORE functionality
Step 2: Semantic enrichment

Semantic enrichment
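
One way to picture the "semantic similarity" part of enrichment is cosine similarity over term-frequency vectors of the harvested full texts. This is a swapped-in textbook technique used for illustration; the slides do not state which similarity measure CORE uses, and the function names and sample documents below are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, docs):
    """Rank every other document by similarity to docs[query_id], best first."""
    vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
    query = vectors[query_id]
    scores = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != query_id
    ]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    # Hypothetical mini-corpus standing in for harvested full texts.
    docs = {
        "a": "open access repositories aggregate metadata",
        "b": "aggregating open access repositories",
        "c": "protein folding dynamics",
    }
    for score, doc_id in most_similar("a", docs):
        print(doc_id, round(score, 2))
```

Because the ranking needs the text of every document in the collection, it can only be computed at the level of the aggregation, which is the point made earlier about needing content and not just metadata.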

29/52



CORE functionality
Step 3: Providing a set of services on top of the aggregation

Providing services

31/52

CORE applications

• CORE Portal
• CORE Mobile
• CORE Plugin
• CORE API
• Repository Analytics

32/52



CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories

34/52

CORE Applications

CORE Mobile – Allows searching and
navigating scientific publications
aggregated from Open Access
repositories

35/52

CORE Applications
CORE Plugin – A plugin to a system that provides recommendations for related
items.

36/52



CORE Applications
CORE API – Enables external systems and services to interact with the
CORE repository.

38/52



CORE Applications
Repository Analytics – An analytical tool supporting providers of
open access content (in particular repository managers).

40/52


[Figure: CORE applications mapped to the access levels of the aggregation-layer diagram]
• Analytical information access: Repository Analytics
• Transaction information access: CORE Portal, CORE Mobile, CORE Plugin
• Raw data access: CORE API

41/52

CORE statistics
• Content
• 5.4M records
• 192 repositories
• 402k full-texts
• Started: February 2011
• Budget: £140k

42/52

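Outline
1. Aggregating Open Access (OA) publications – why, how, what
for?
2. The CORE system
3. Supporting research in mining databases of scientific
publications (DiggiCORE)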

43/52

Partners

Advisory Board

44/52

Objective

Software for exploration and analysis of very large and
fast-growing amounts of research publications stored
across Open Access Repositories (OAR).

45/52

DiggiCORE networks

Three networks: (a) semantically related papers,
(b) citation network, (c) author citation network
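
To illustrate how network (c) relates to network (b), the sketch below derives a weighted author citation network from a paper citation network. The data layout, the function name, and the sample papers are assumptions made for illustration; the slides do not describe how DiggiCORE builds its networks.

```python
from collections import Counter


def author_citation_network(paper_authors, paper_citations):
    """Derive weighted author -> author edges from paper -> paper citations.

    paper_authors maps a paper id to its list of authors.
    paper_citations maps a citing paper id to the paper ids it cites.
    Each returned edge (a, b) counts how often author a cites author b.
    """
    edges = Counter()
    for citing, cited_papers in paper_citations.items():
        for cited in cited_papers:
            for a in paper_authors.get(citing, []):
                for b in paper_authors.get(cited, []):
                    if a != b:  # skip author self-citations
                        edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Hypothetical papers and citations.
    paper_authors = {"p1": ["Ann", "Bob"], "p2": ["Cy"], "p3": ["Ann"]}
    paper_citations = {"p1": ["p2"], "p3": ["p2", "p1"]}
    for (a, b), weight in sorted(author_citation_network(paper_authors, paper_citations).items()):
        print(f"{a} -> {b}: {weight}")
```

Dropping self-citations is a design choice of this sketch; an analysis of citation behaviour might prefer to keep them as a separate edge type.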

46/52

DiggiCORE objectives

Allow researchers to use this platform to analyse
publications.
Why?
• To identify patterns in the behaviour of research
communities
• To detect trends in research disciplines
• To gain new insights into the citation behaviour of researchers
• To discover features that distinguish papers with high impact

47/52

Questions the system can help answer
• What are the attributes of high-impact publications?
• Do these attributes differ in the humanities, social sciences and
computer sciences?
• What are the features of research groups within disciplines and
how do these features relate to contributions generated by the
group?
• What are the attributes of high-impact authors and what is their
role within the group?
• What are the dynamics of successful research groups?

48/52

Questions the system can help answer
• What is the mechanism of cross-fertilisation within
disciplines, especially between the humanities and the
sciences?
• Who are the authors whose work is worth monitoring because
they contribute to the achievements of their own discipline and
also inspire other disciplines?
• How should the novice in the discipline get acquainted with key
achievements in the discipline?
• How should he/she search for the most important publications?

49/52

Summary
• The rapid growth of OA content provides both an opportunity and
a challenge.
• Aggregations should serve the needs of different user groups.
• Aggregations need to aggregate content, not just metadata.
• We can have many services that are part of the
infrastructure, but they should all work with the same data.

50/52

Thank you!

Yes we can!
51/52
