DiggiCORE: Digging into Connected Repositories
1. DiggiCORE: Digging into Connected Repositories
Petr Knoth
Knowledge Media institute
The Open University
2. Outline
1. Connecting by aggregating Open Access (OA) publications
• Why aggregate, and for whom?
• The added value of aggregations
2. The CORE system
3. Supporting research in mining databases of scientific publications
4. The rapid rise of OA articles
The graph (from Laakso and Björk, BMC Medicine 2012, 10:124) shows the number of papers published in three different types of online open access journals from 2000 to 2011.
6. Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and services,
and generate new knowledge from repository content.”
[COAR manifesto]
7. Access to information according to the level of abstraction
[Figure: layered diagram of repository information access by level of abstraction: raw data access, transaction information access (OLTP) and analytical information access (OLAP). Other labels: content, metadata, metadata transfer, interoperability, aggregation, semantic enrichment, interfaces.]
8. Who should be supported by aggregations?
• The following user groups (divided according to the level of abstraction of the information they need):
• Raw data access: developers, digital libraries (DLs), DL researchers, companies …
• Transaction information access: researchers, students, life-long learners …
• Analytical information access: funders, government, business intelligence …
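The three access levels can be pictured as three views over one shared aggregated dataset. In the minimal Python sketch below, the `Record` fields, the sample records and the three function names are hypothetical, chosen only to illustrate the idea; they are not CORE's actual data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical aggregated record; the fields are illustrative only."""
    repository: str
    title: str
    year: int
    full_text: str


# One shared store that every access level reads from (hypothetical sample data).
STORE = [
    Record("Repo A", "Mining open access papers", 2011, "..."),
    Record("Repo A", "Citation analysis at scale", 2012, "..."),
    Record("Repo B", "Open access and text mining", 2012, "..."),
]


def raw_data_access(store):
    """Raw data access: hand over complete records, e.g. to developers."""
    return list(store)


def transaction_access(store, query):
    """Transaction information access: find individual papers, e.g. for researchers."""
    q = query.lower()
    return [r.title for r in store if q in r.title.lower()]


def analytical_access(store):
    """Analytical information access: aggregate statistics, e.g. for funders."""
    return Counter(r.repository for r in store)
```

For example, `transaction_access(STORE, "open access")` returns the two matching titles, while `analytical_access(STORE)` returns the number of records per repository; both are computed from the same `STORE`.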
10. Outline
1. Connecting by aggregating Open Access (OA) publications: why, how, and what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications
11. CORE objective
CORE aims to provide a technical infrastructure for Open Access
scholarly publications that will support access and reuse of scholarly
materials at different levels of abstraction.
15. What does CORE provide at different access levels?
[Figure: the access-level diagram annotated with CORE services: CORE API (raw data access); CORE Portal, CORE Mobile and CORE Plugin (transaction information access); Repository Analytics (analytical information access). Other labels: content, metadata, metadata transfer, interoperability, enrichment, aggregation, interfaces, OLTP, OLAP.]
16. CORE Applications
CORE Portal: allows users to search and navigate scientific publications aggregated from Open Access repositories.
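The slide does not say how the CORE Portal implements search. As a hedged illustration of keyword search over aggregated publications, the sketch below builds a simple inverted index (each term mapped to the set of document IDs containing it) and answers a query by intersecting those sets, so a result must contain every query term. The tokenizer, the sample documents and the function names are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample publications.
docs = {
    "p1": "Open access repositories and aggregation",
    "p2": "Text mining of scientific publications",
    "p3": "Aggregating open access publications",
}
index = build_index(docs)
```

Because the sketch does no stemming, a query for `aggregation` matches `p1` but not `p3` ("Aggregating"); a production search engine would typically add stemming and relevance ranking.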
17. CORE Applications
CORE Mobile: allows users to search and navigate scientific publications aggregated from Open Access repositories.
21. CORE Applications
CORE API: enables external systems and services to interact with the CORE repository.
• Search service
• PDF and plain-text service
• Similarity service
• Classification service
• Citation service
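The slide does not say how the similarity service compares documents. One common approach, used here purely as an assumed stand-in, is cosine similarity between TF-IDF vectors of the plain text. The pure-Python sketch below uses a hypothetical `most_similar` helper and made-up documents; it is not the CORE API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document ID."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        # Terms that occur in every document get weight log(1) = 0.
        vectors[doc_id] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, doc_id):
    """Rank the other documents by cosine similarity to doc_id (hypothetical helper)."""
    vectors = tfidf_vectors(docs)
    scores = [(other, cosine(vectors[doc_id], vec))
              for other, vec in vectors.items() if other != doc_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "a": "open access repositories harvesting metadata",
    "b": "harvesting metadata from open access repositories",
    "c": "citation networks of physics papers",
}
ranking = most_similar(docs, "a")
```

Here `b` shares most of its vocabulary with `a` and ranks first, while `c` shares no terms and scores 0.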
22. CORE Applications
CORE API registered users:
British Education Index
Cottage Labs
UKCORR
Europeana
ULCC
Library, The Open University
Los Alamos National Laboratory, USA
University of Manchester Library
Universidad de los Andes, Bogotá, Colombia
UNESCO
24. Outline
1. Connecting by aggregating Open Access (OA) publications: why, how, and what for?
2. The CORE system
3. Supporting research in mining databases of scientific publications
25. Objective
Software for exploring and analysing very large, fast-growing collections of research publications stored across Open Access Repositories (OARs).
27. The problem of result transparency
Google Scholar
Microsoft Academic Search
28. DiggiCORE objectives
Allow researchers to use this platform to analyse
publications.
Why?
• To identify patterns in the behaviour of research communities
• To detect trends in research disciplines
• To gain new insights into the citation behaviour of researchers
• To discover features that distinguish papers with high impact
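To make "detecting trends in research disciplines" concrete, here is a minimal sketch under a deliberately simple assumption: a topic's trend is the number of publications per year whose title mentions a given keyword. The `(year, title)` record format, the sample data and the function names are hypothetical; a real analysis would likely rely on richer signals such as full text or subject classification.

```python
from collections import Counter


def yearly_counts(records, keyword):
    """Count publications per year whose title contains `keyword` (case-insensitive)."""
    kw = keyword.lower()
    counts = Counter(year for year, title in records if kw in title.lower())
    return dict(sorted(counts.items()))


def growth(counts):
    """Change in count between each pair of successive years present in `counts`."""
    years = sorted(counts)
    return {y2: counts[y2] - counts[y1] for y1, y2 in zip(years, years[1:])}


# Hypothetical (year, title) records.
records = [
    (2009, "Text mining for biology"),
    (2010, "Open access and text mining"),
    (2010, "Text mining citation contexts"),
    (2011, "Scalable text mining"),
    (2011, "Text mining repositories"),
    (2011, "Text mining in the humanities"),
    (2011, "Citation analysis of physics"),
]
text_mining_trend = yearly_counts(records, "text mining")
```

A steadily positive `growth(...)` would suggest a rising topic under this keyword-based assumption.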
29. Questions the system can help answer
• What are the attributes of high-impact publications?
• Do these attributes differ between the humanities, the social sciences and computer science?
• What are the features of research groups within disciplines and
how do these features relate to contributions generated by the
group?
• What are the attributes of high-impact authors and what is their
role within the group?
• What are the dynamics of successful research groups?
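The slide leaves open how "high-impact authors" would be measured. One widely used citation-based indicator is the h-index: the largest h such that an author has h papers each cited at least h times. The sketch below derives citation counts from hypothetical `(citing, cited)` pairs and then computes each author's h-index; choosing the h-index here is our assumption, not necessarily the measure DiggiCORE uses, and the data is made up for illustration.

```python
from collections import Counter


def citation_counts(citations):
    """Count how often each paper is cited, given (citing_id, cited_id) pairs."""
    return Counter(cited for _citing, cited in citations)


def h_index(counts):
    """Largest h such that at least h of the given citation counts are >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def author_h_indices(authorship, citations):
    """authorship maps author -> list of paper IDs; returns author -> h-index."""
    counts = citation_counts(citations)  # a Counter returns 0 for uncited papers
    return {author: h_index([counts[p] for p in papers])
            for author, papers in authorship.items()}


# Hypothetical citation pairs and authorship.
citations = [("p4", "p1"), ("p5", "p1"), ("p6", "p1"),
             ("p4", "p2"), ("p5", "p2"), ("p6", "p3")]
authorship = {"alice": ["p1", "p2", "p3"], "bob": ["p3", "p4"]}
scores = author_h_indices(authorship, citations)
```

With papers cited 3, 2 and 1 times, `alice` has an h-index of 2; `bob` has one paper cited once and one uncited, giving 1.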
30. Questions the system can help answer
• What is the mechanism of cross-fertilisation across disciplines, especially between the humanities and the sciences?
• Who are the authors whose work is worth monitoring because
they contribute to the achievements of their own discipline and
also inspire other disciplines?
• How should a newcomer to the discipline become acquainted with its key achievements?
• How should they search for the most important publications?
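For the question of how a newcomer might find the most important publications, one possible approach (an assumption here, not a method stated in the talk) is to rank papers by PageRank over the citation graph, so that a citation from an influential paper counts for more than one from an obscure paper. Below is a minimal power-iteration sketch with the conventional damping factor of 0.85; the graph is hypothetical.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each paper ID to the list of paper IDs it cites.
    """
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper receives the teleport share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        # Papers that cite nothing ("dangling" nodes) spread their score evenly.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        for p in papers:
            new_rank[p] += damping * dangling / n
        # Each citing paper passes an equal share of its score to each paper it cites.
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: paper -> papers it cites.
citation_graph = {
    "survey": ["p1", "p2"],
    "p1": ["classic"],
    "p2": ["classic"],
    "p3": ["classic", "p1"],
    "classic": [],
}
ranks = pagerank(citation_graph)
reading_list = sorted(ranks, key=ranks.get, reverse=True)
```

The heavily cited `classic` comes first in `reading_list`, followed by `p1`, which is cited by two papers.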
31. Challenges
• Technical issues of harvesting Open Access content quickly
• Publishers' and academics' lack of understanding of Open Access licenses
• Explaining the added value of full-text vs. metadata aggregations for:
• User experience
• Text mining
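Repositories commonly expose their metadata through OAI-PMH, and one reason harvesting takes time is that a large `ListRecords` response is split into pages linked by a `resumptionToken`. The sketch below parses one page with the standard library and follows tokens until none remains. The `fetch` callable stands in for the HTTP request and is injected so the example runs offline; the sample pages are hypothetical, and whether CORE harvests exactly this way is an assumption.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 and Dublin Core namespaces, in ElementTree's {uri}tag form.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_page(xml_text):
    """Return ([(identifier, title), ...], resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        title = record.findtext(f".//{DC}title")
        records.append((identifier, title))
    token_el = root.find(f".//{OAI}resumptionToken")
    # A missing or empty resumptionToken element marks the final page.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None


def harvest(fetch, metadata_prefix="oai_dc"):
    """Collect records from every page, following resumption tokens.

    `fetch(params)` must return the response XML; in a live harvester it would
    send an HTTP GET with these query parameters to the repository's base URL.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    harvested = []
    while True:
        records, token = parse_page(fetch(params))
        harvested.extend(records)
        if token is None:
            return harvested
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Offline demo: two hypothetical response pages served by a fake fetch function.
PAGE = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        '<record><header><identifier>{id}</identifier></header><metadata>'
        '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>{title}</dc:title>'
        '</oai_dc:dc></metadata></record>{token}</ListRecords></OAI-PMH>')
PAGES = {
    None: PAGE.format(id="oai:repo:1", title="First paper",
                      token="<resumptionToken>page2</resumptionToken>"),
    "page2": PAGE.format(id="oai:repo:2", title="Second paper",
                         token="<resumptionToken/>"),
}
requests_made = []


def fake_fetch(params):
    requests_made.append(dict(params))
    return PAGES[params.get("resumptionToken")]


harvested = harvest(fake_fetch)
```

Against a live repository, `fetch` could be something like `lambda params: requests.get(BASE_URL, params=params).content`, where `BASE_URL` is a hypothetical placeholder for the repository's OAI-PMH endpoint; a production harvester would also need to handle OAI-PMH error responses, retries and rate limits.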
32. The power of full-text aggregations (WorldCat vs CORE)
33. Text-mining
“There are currently over 144,000 full time equivalent academic
professionals (teaching and research) working in UK higher
education. Using data from the Higher Education Statistics Agency
(HESA) for UK academic salaries, the median salary for a UK
academic falls into a band of between £42k and £55k, which
translates to between £26 and £33 per working hour. If text mining
enabled just a 2% increase in productivity – corresponding to only
45 minutes per academic per working week (and looking at CIBER’s
analysis of the impact of eJournals, this is very much an
underestimate), this would imply over 4.7 million working hours and
additional productivity worth between £123.5m and £156.8m in
working time per year.” [McDonald & Kelly, 2012], JISC report on text mining
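The arithmetic behind the quoted estimate can be reproduced as follows. Since 45 minutes is stated to be 2% of a working week, the week is 37.5 hours. The quote does not state how many working weeks per year it assumes; 44 weeks is our assumption, chosen because it reproduces the quoted totals.

```python
ACADEMICS = 144_000                        # FTE academic staff in UK HE (from the quote)
MINUTES_SAVED_PER_WEEK = 45                # per academic (from the quote)
PRODUCTIVITY_GAIN = 0.02                   # 2% (from the quote)
WORKING_WEEKS_PER_YEAR = 44                # assumption: not stated in the quote
HOURLY_RATE_LOW, HOURLY_RATE_HIGH = 26, 33  # £ per working hour (from the quote)

# 45 minutes is 2% of the week, so the week is 45 / 0.02 = 2250 minutes = 37.5 hours.
working_week_hours = MINUTES_SAVED_PER_WEEK / PRODUCTIVITY_GAIN / 60

# Hours saved across the sector per year, and their value at both ends of the pay band.
hours_saved_per_year = ACADEMICS * (MINUTES_SAVED_PER_WEEK / 60) * WORKING_WEEKS_PER_YEAR
value_low = hours_saved_per_year * HOURLY_RATE_LOW
value_high = hours_saved_per_year * HOURLY_RATE_HIGH

print(f"Working week: {working_week_hours:.1f} h")
print(f"Hours saved per year: {hours_saved_per_year:,.0f}")
print(f"Value: £{value_low / 1e6:.2f}m to £{value_high / 1e6:.2f}m")
```

Under the 44-week assumption this gives about 4.75 million hours and roughly £123.6m to £156.8m per year, consistent with the quoted figures to within rounding.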
34. Cost of Gold OA
http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
35. Summary
• Aggregations should serve the needs of different user groups.
• Transparency is crucial
• Machine access to publications opens up many new opportunities.
• Many services can be part of the infrastructure, but they should all work with the same data.
• CORE aims to
• prepare the way for innovative open access services
• demonstrate the benefits of programmable access to
publications
• data mine publications for impact characteristics