DiggiCORE: Digging into Connected Repositories

Petr Knoth
Knowledge Media Institute
The Open University

1/38

Outline
1. Connecting by aggregating Open Access (OA) publications
• Why aggregate, and who is it for?
• The added value of aggregations
2. The CORE system
3. Supporting research in mining databases of scientific
publications

2/38

Outline
1. Connecting by aggregating Open Access (OA) publications
• Why aggregate, and who is it for?
• The added value of aggregations
2. The CORE system
3. Supporting research in mining databases of scientific
publications

3/38

The rapid rise of OA articles

The graph (from Laakso and Björk's paper, BMC Medicine 2012, 10:124) shows
the numbers of papers published in three different types of online open access
journals from 2000 to 2011.

4/38

Growth of Open Access repositories

5/38

Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and services,
and generate new knowledge from repository content.”
[COAR manifesto]

6/38

Access to information according to the level of abstraction

[Figure: layered diagram with repository content, metadata transfer interoperability, aggregation, semantic enrichment and interfaces, and three levels of access: raw data access, transaction information access (OLTP) and analytical information access (OLAP).]

7/38

Who should be supported by aggregations?

• The following user groups (divided according to the level of
abstraction of information they need):
• Raw data access. Developers, digital libraries (DLs), DL researchers, companies …
• Transaction information access. Researchers, students, life-long learners …
• Analytical information access. Funders, government, business intelligence …

8/38

What is it all about?

9/38

Outline
1. Connecting by aggregating Open Access (OA) publications – why,
how, what for?
2. The CORE system
3. Supporting research in mining databases of scientific
publications

10/38

CORE objective

CORE aims to provide a technical infrastructure for Open Access
scholarly publications that will support access and reuse of scholarly
materials at different levels of abstraction.

11/38

CORE functionality

Content harvesting, processing
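As a minimal illustration of the harvesting step, the sketch below parses one metadata page returned by a repository. It assumes, although the slides do not say so, that repositories expose metadata through OAI-PMH with unqualified Dublin Core (`oai_dc`), the usual baseline for Open Access repositories. The helper `parse_list_records` and the sample response are hypothetical; fetching over HTTP, resumption tokens and full-text download are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response with one live and one deleted record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Hypothetical helper: turn an OAI-PMH ListRecords page into dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        # Records marked as deleted carry no metadata; skip them.
        if header.get("status") == "deleted":
            continue
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:
            continue  # metadata in some other format; ignored in this sketch
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "titles": [t.text for t in dc.iterfind("dc:title", NS)],
            "creators": [c.text for c in dc.iterfind("dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    for rec in parse_list_records(SAMPLE):
        print(rec)
```

A real harvester would repeat this for every page of a repository's response and then hand the records on to processing steps such as deduplication and full-text extraction.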

12/38

CORE functionality

Semantic enrichment

13/38

CORE functionality

Providing services

14/38

What does CORE provide at different access levels?

[Figure: the access-level diagram annotated with CORE services.]
• Raw data access: CORE API
• Transaction information access: CORE Portal, CORE Mobile, CORE Plugin
• Analytical information access: Repository Analytics

15/38

CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories

16/38

CORE Applications

CORE Mobile – Allows searching and navigating scientific
publications aggregated from Open Access repositories

17/38

CORE Applications
CORE Plugin – A plugin for repository systems that provides recommendations for related
items.

18/38

CORE Applications
Repository Analytics – An analytical tool supporting providers of
open access content (in particular, repository managers).
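To give a feel for the kind of figure such a tool could report to a repository manager, the sketch below counts, per repository, how many records were harvested and what share of them came with a full text. The `Record` layout and the choice of metric are assumptions made for illustration; they are not taken from Repository Analytics itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    repository: str      # hypothetical field: name of the source repository
    has_fulltext: bool   # hypothetical field: was a full text harvested?


def fulltext_coverage(records):
    """Return {repository: (record_count, share_with_fulltext)}."""
    totals = defaultdict(int)
    with_text = defaultdict(int)
    for r in records:
        totals[r.repository] += 1
        if r.has_fulltext:
            with_text[r.repository] += 1
    return {repo: (n, with_text[repo] / n) for repo, n in totals.items()}


if __name__ == "__main__":
    sample = [Record("repo-a", True), Record("repo-a", False), Record("repo-b", True)]
    for repo, (n, share) in sorted(fulltext_coverage(sample).items()):
        print(f"{repo}: {n} records, {share:.0%} with full text")
```

Because the statistics are computed from the same aggregated records that feed the other services, a manager sees the repository as the aggregator sees it.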

19/38

CORE Applications
CORE API – Enables external systems and services to interact with the
CORE repository.

• Search service
• PDF and plain-text service
• Similarity service
• Classification service
• Citation service
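The slides do not describe how the similarity service ranks related papers. One common technique is to compare TF-IDF vectors of the texts by cosine similarity; the sketch below implements that in plain Python. The tokeniser, the unsmoothed IDF weighting and the function names are all assumptions, not CORE's actual method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokeniser: lower-cased alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a sparse {term: tf * idf} vector."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)
    # idf = log(n / df): terms occurring in every document get weight 0.
    return [
        {t: tf * math.log(n / df[t]) for t, tf in counts.items()}
        for counts in tokenized
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, query_index, k=3):
    """Return up to k (index, score) pairs most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [
        (i, cosine(query, v)) for i, v in enumerate(vectors) if i != query_index
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

For example, `most_similar(["open access repository harvesting", "harvesting open access repositories at scale", "protein folding simulation"], 0, k=2)` ranks the second text first and gives the third a score of 0. At the scale of a real aggregator, an inverted index or approximate nearest-neighbour search would replace the all-pairs comparison.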

21/38

CORE Applications
CORE API registered users:
British Education Index
Cottagelabs
UKCORR
Europeana
ULCC
Library, The Open University
Los Alamos National Laboratory, USA
University of Manchester Library
Universidad de los Andes. Bogotá, Colombia
UNESCO

22/38

CORE visits (October 2012)

More than 6000 visits per day

23/38

Outline
1. Connecting by aggregating Open Access (OA) publications – why,
how, what for?
2. The CORE system
publications

24/38

Objective

Software for exploration and analysis of very large and
fast-growing amounts of research publications stored
across Open Access Repositories (OAR).

25/38

DiggiCORE networks

Three networks: (a) semantically related papers,
(b) citation network, (c) author citation network
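Networks (b) and (c) are closely linked: given who wrote each paper and which papers cite which, an author citation network can be derived by adding an edge from every author of a citing paper to every author of the cited paper. The sketch below shows one straightforward way to do this, with edge weights counting citations. The input layout and the option to drop self-citations are assumptions, since the slides do not specify how DiggiCORE builds this network.

```python
from collections import Counter


def author_citation_network(authors_of, citations, include_self=False):
    """Derive weighted author-to-author citation edges.

    authors_of: {paper_id: [author, ...]}     (hypothetical layout)
    citations:  [(citing_paper, cited_paper), ...]
    Returns a Counter mapping (citing_author, cited_author) to a count.
    """
    edges = Counter()
    for citing, cited in citations:
        for a in authors_of.get(citing, []):
            for b in authors_of.get(cited, []):
                if a == b and not include_self:
                    continue  # skip self-citations unless asked for
                edges[(a, b)] += 1
    return edges
```

With `authors_of = {"p1": ["Ann", "Bob"], "p2": ["Cid"], "p3": ["Ann"]}` and citations `p1 → p2`, `p3 → p2`, `p1 → p3`, Ann cites Cid twice, Bob cites Cid once and Bob cites Ann once; Ann's citation of her own paper is dropped by default.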

26/38

The problem of result transparency

Google Scholar

Microsoft Academic Search

27/38

DiggiCORE objectives

Allow researchers to use this platform to analyse
publications.
Why?
• To identify patterns in the behaviour of research
communities
• To detect trends in research disciplines
• To gain new insights into the citation behaviour of researchers
• To discover features that distinguish papers with high impact

28/38

Which questions can the system help answer?
• What are the attributes of high-impact publications?
• Do these attributes differ in the humanities, social sciences and
computer sciences?
• What are the features of research groups within disciplines and
how do these features relate to contributions generated by the
group?
• What are the attributes of high-impact authors and what is their
role within the group?
• What are the dynamics of successful research groups?

29/38

Which questions can the system help answer?
• What is the mechanism of cross-fertilisation within disciplines,
especially between the humanities and the sciences?
• Who are the authors whose work is worth monitoring because
they contribute to the achievements of their own discipline and
also inspire other disciplines?
• How should a novice in a discipline get acquainted with its key
achievements?
• How should he/she search for the most important publications?

30/38

Challenges
• Technical issues of quick Open Access harvesting
• Lack of understanding of Open Access licenses among publishers
and academics
• Explaining the added value of full-text vs. metadata aggregations for:
• User experience
• Text-mining

31/38

The power of full-text aggregations (WorldCat vs CORE)

32/38

Text-mining
“There are currently over 144,000 full time equivalent academic
professionals (teaching and research) working in UK higher
education. Using data from the Higher Education Statistics Agency
(HESA) for UK academic salaries, the median salary for a UK
academic falls into a band of between £42k and £55k, which
translates to between £26 and £33 per working hour. If text mining
enabled just a 2% increase in productivity – corresponding to only
45 minutes per academic per working week (and looking at CIBER’s
analysis of the impact of eJournals, this is very much an
underestimate), this would imply over 4.7 million working hours and
additional productivity worth between £123.5m and £156.8m in
working time per year.” [McDonald & Kelly, 2012] – JISC report on
text-mining
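The estimate in the quote can be reproduced with a few lines of arithmetic. The inputs are taken from the quote (144,000 academics, 45 minutes per week, £26 to £33 per hour), except for the number of working weeks per year, which the quote does not state: the value of 44 used below is an assumption, back-calculated so that the quoted totals come out.

```python
ACADEMICS = 144_000          # FTE academic staff (from the quote)
MINUTES_SAVED_PER_WEEK = 45  # 2% of a working week (from the quote)
WORKING_WEEKS = 44           # assumption: inferred, not stated in the quote
HOURLY_RATE_LOW = 26         # GBP per working hour (from the quote)
HOURLY_RATE_HIGH = 33        # GBP per working hour (from the quote)


def hours_saved_per_year():
    return ACADEMICS * MINUTES_SAVED_PER_WEEK / 60 * WORKING_WEEKS


def value_range_gbp():
    hours = hours_saved_per_year()
    return hours * HOURLY_RATE_LOW, hours * HOURLY_RATE_HIGH


if __name__ == "__main__":
    low, high = value_range_gbp()
    print(f"{hours_saved_per_year():,.0f} hours")
    print(f"£{low / 1e6:.1f}m to £{high / 1e6:.1f}m")
```

This prints 4,752,000 hours and £123.6m to £156.8m, matching the quote's "over 4.7 million working hours"; the quoted lower bound of £123.5m appears to be rounded down from £123.552m. The 45 minutes also implies a 37.5-hour week, since 2% of 37.5 hours is 0.75 hours.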
33/38

Cost of Gold OA

http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/

34/38

Summary
• Aggregations should serve the needs of different user groups.
• Transparency is crucial
• Machine access to publications provides lots of new
opportunities.
• Many services can be part of the infrastructure, but they should
all work with the same data.
• CORE aims to
• prepare the way for innovative open access services
• demonstrate the benefits of programmable access to
publications
• data mine publications for impact characteristics
35/38

Partners

Advisory Board

36/38
