Better together: building services for public good on top of content from the global network of open repositories
1. Better together: building services for
public good on top of content from the
global network of open repositories
Petr Knoth
June 21, 2019 – OAI-11, Geneva
Big Scientific Data and Text Analytics Group
Knowledge Media Institute, The Open University
2. Papers OA sooner and sooner
https://physicstoday.scitation.org/do/10.1063/PT.6.2.20190418a/full/
JCDL 2019 Best Paper Award
3. Papers OA sooner and sooner
The delay between publication and OA availability is decreasing globally
Herrmannova, Drahomira; Pontika, Nancy and Knoth, Petr (2019). Do Authors Deposit on Time? Tracking Open Access Policy
Compliance. In: 2019 ACM/IEEE Joint Conference on Digital Libraries, 2-6 Jun 2019, Urbana-Champaign, IL
5. We don’t need just open access
we need fast open access
6. This study was only possible to conduct because
of repositories and aggregators working together
7. Global network of repositories
“A single scientific repository is of limited value, real benefits
come from the ability to exchange data within a network …
… interoperability allows us to exploit today's computational
power so that we can aggregate, data mine, create new tools and
services, and generate new knowledge from repository content.”
– Confederation of Open Access Repositories (COAR)
8. OA Aggregations and BOAI 2002
“To achieve open access to scholarly journal literature, we
recommend two complementary strategies.
• Self-Archiving: First, scholars need the tools and assistance to
deposit their refereed journal articles in open electronic
archives, a practice commonly called, self-archiving. When
these archives conform to standards created by the Open
Archives Initiative, then search engines and other tools can
treat the separate archives as one. Users then need not know
which archives exist or where they are located in order to find
and make use of their contents.
• …”
Budapest Open Access Initiative, 2002
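The idea of treating separate archives as one can be illustrated with OAI-PMH, the harvesting protocol published by the Open Archives Initiative. The sketch below is only an illustration under stated assumptions: the repository base URLs are hypothetical, the sample responses are hand-written rather than harvested, and the helper names (`list_records_url`, `parse_records`, `merge_archives`) are made up for this example. It builds a standard `ListRecords` request and merges Dublin Core titles from two archives into one list.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written template for a minimal ListRecords response (illustration only).
TEMPLATE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token must be sent without other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from one ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


def merge_archives(responses):
    """Flatten responses from several archives into one record list."""
    merged = []
    for xml_text in responses:
        merged.extend(parse_records(xml_text))
    return merged


# Two hypothetical archives, seen by the caller as a single collection.
RESPONSE_A = TEMPLATE.format(identifier="oai:repo-a.example.org:1", title="First paper")
RESPONSE_B = TEMPLATE.format(identifier="oai:repo-b.example.org:7", title="Second paper")
```

A real harvester would fetch each URL, follow resumption tokens until the list is exhausted, and handle protocol errors; none of that is shown here.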
9. CORE’s mission
Aggregate all open access research articles worldwide …
… enrich this content and provide seamless access to it through a set
of data services …
15. World’s largest dataset of Open Access full texts
• 13,117,488 Hosted full texts
• 135,539,113 Metadata records
• ~78m Links to full texts
• 15TB of raw plain text
• 4,123 Data providers
16. CORE Processing pipeline
• Metadata download, extraction and harmonisation
• Full text download
• Text extractions, sections extraction
• Metadata validation and enrichment (DOI, ORCID, etc.)
• Thumbnails generation
• References and citation contexts extraction
• API enrichment (e.g. finding DOIs, linking to other systems)
• Document type classification
• Deduplication
• Indexing
• Exposing (data dumps, API, FastSync)
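To make one of the pipeline steps concrete, here is a minimal sketch of deduplication. It is not CORE's actual method, which the slides do not describe. The record fields (`doi`, `title`) and the matching rule (exact DOI match, otherwise normalised title match) are assumptions chosen for illustration.

```python
import re
import unicodedata


def dedup_key(record):
    """Return a matching key: the DOI if present, else a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Strip accents, lowercase, and collapse punctuation and whitespace.
    title = unicodedata.normalize("NFKD", record.get("title") or "")
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    title = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return "title:" + title


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for record in records:
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())


# Hypothetical records harvested from different data providers.
RECORDS = [
    {"doi": "10.1000/ABC", "title": "Open Access"},
    {"doi": "10.1000/abc ", "title": "Open access."},
    {"doi": None, "title": "Do Authors Deposit on Time?"},
    {"title": "Do authors deposit on time"},
]
```

This simple rule will not merge a record that has a DOI with a copy that lacks one. Real pipelines usually need fuzzier matching on titles, authors and years.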
17. CORE usage
• January 2019 – CORE reached over 10M monthly active
users for the first time
• 571% increase from January 2018
18. Alexa rank
• Within the top 6k global websites: https://www.alexa.com/
• core.ac.uk is, by usage, in the top 0.0009% of global websites
19. CORE Search
• Full text search for OA content
• Faceted searching
• What you find is what you get
• Real change of data providers
wanting to be included
20. CORE Recommender
• Recommending relevant content to users from across all free content
• Recommender plugin for repositories
• https://core.ac.uk/services/recommender/
22. Introducing CORE Discovery
• High coverage of freely available content
• Free service for researchers by researchers. No company controlling the pipes.
• Best grip on open repository content.
• Repository integration
• Discovering documents without a DOI.
https://core.ac.uk/services/discovery/
24. CORE Discovery Repository integration
• The majority of articles in repositories are metadata only.
• CORE Discovery repository plugin:
• turns dead ends of user journeys into journeys fulfilling users’ information needs
• makes repository content more discoverable.
27. Raw data services – CORE API
• Enables the development of new
applications
• Real-time machine access to the
world's largest collection of open
access papers
• Harmonised access to data from
across the network of CORE
providers
• Direct machine access to full texts
of research papers
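As a rough illustration of real-time machine access, the sketch below prepares a search request and reads titles from a JSON response. The base URL, the `/search/works` path, the `q` and `limit` parameters, the bearer-token header and the `results` and `title` response fields are all assumptions for this example and should be checked against the current CORE API documentation. The API version available at the time of this talk may have differed. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL; verify against the current CORE API documentation.
API_BASE = "https://api.core.ac.uk/v3"


def build_search_request(query, api_key, limit=10):
    """Prepare (but do not send) a full text search request."""
    url = f"{API_BASE}/search/works?" + urlencode({"q": query, "limit": limit})
    # Assumed authentication scheme: API key passed as a bearer token.
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


def extract_titles(response_body):
    """Pull titles out of a JSON response with an assumed 'results' list."""
    payload = json.loads(response_body)
    return [hit.get("title") for hit in payload.get("results", [])]
```

To send the request, pass the returned object to `urllib.request.urlopen` and feed the decoded body to `extract_titles`. Keeping request construction separate from network access makes the logic easy to test offline.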
28. Raw data services – CORE Dataset
• Download millions of research
papers for text and data analysis
• Prototype, analyse and mine your
data in your infrastructure
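Batch analysis of a downloaded dataset might look like the sketch below. The file layout is an assumption made for illustration: one JSON record per line in a gzip-compressed file, with a `fullText` field. The actual CORE dataset format may differ, and the function names are hypothetical.

```python
import gzip
import json


def iter_records(path):
    """Stream records from a gzip-compressed JSON-lines file (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_with_full_text(path, field="fullText"):
    """Return (total records, records with a non-empty full text field)."""
    total = with_text = 0
    for record in iter_records(path):
        total += 1
        if record.get(field):
            with_text += 1
    return total, with_text
```

Streaming one record at a time keeps memory use flat, which matters when a dump holds millions of papers.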
29. Raw data services – CORE FastSync
• Keeps your data in sync with
research content from around the
world
• Fast and incremental updates as
soon as they become available. No
usage restrictions
• Based on ResourceSync
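Since FastSync is based on ResourceSync, incremental updates can be pictured as processing a ResourceSync change list: a Sitemap-format document in which each entry is marked as created, updated or deleted. The sketch below is not FastSync's client. The sample document, the URLs, and the `apply_changes` helper with its `fetch` callback are hypothetical; only the XML namespaces and the `change` attribute values come from the ResourceSync specification.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ResourceSync documents (Sitemap plus ResourceSync terms).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hand-written change list with hypothetical resource URLs.
SAMPLE_CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2019-06-01T00:00:00Z"/>
  <url>
    <loc>https://example.org/papers/1.json</loc>
    <lastmod>2019-06-02T10:00:00Z</lastmod>
    <rs:md change="created"/>
  </url>
  <url>
    <loc>https://example.org/papers/2.json</loc>
    <lastmod>2019-06-02T11:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>https://example.org/papers/3.json</loc>
    <lastmod>2019-06-02T12:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def apply_changes(local_store, change_list_xml, fetch):
    """Bring a local {url: content} store in line with a change list.

    `fetch` is a caller-supplied function that downloads one resource.
    """
    root = ET.fromstring(change_list_xml)
    for url in root.iterfind("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change == "deleted":
            local_store.pop(loc, None)
        elif change in ("created", "updated"):
            local_store[loc] = fetch(loc)
    return local_store
```

Only the resources named in the change list are touched, which is what makes this kind of synchronisation fast compared with re-harvesting everything.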
30. Use powered by CORE
• It is beyond human capacities to read all
scientific literature
• Example use cases in which CORE is applied:
• Improving discovery
• Plagiarism detection
• Question answering in science
• Literature based discovery
• Fact checking and detection of misinformation
• Analysing research trends
• Finding experts in a particular domain
• Research evaluation and scientometrics
• Exploratory and visual search
• …
32. Take home
• Data providers (repositories, preprint servers, journals, etc.)
and aggregators need to work together to allow text and data
analysis, processing and reuse of large volumes of research
papers.
• CORE provides the tools for programmatically processing
open access data fast, reliably and from across the global
network of repositories.
• If you are a repository manager or a librarian:
• CORE Discovery and Recommender
• If you are a developer or analyst:
• Build your own stuff using CORE’s data services on top of the
global full text open access corpus
Papers are being deposited into open repositories faster and faster after acceptance
It is a global trend
The amazing thing about this is that the sooner the content becomes available in a repository the more important the repository infrastructure becomes.
The vast majority of transactions/downloads of OA literature are for recently published content.
Many researchers care about getting access to content soon after it gets released
If content becomes available in repositories before it gets published, repositories have the potential to become the primary places from which we will access research.
For the OA movement to succeed soon we don’t need just OA but fast OA
If we want the ability to run global analyses on open access data, if we want to be able to monitor our progress, we need repositories and services that have a global view on repositories to work together.
The vision that many data providers can be treated as one
CORE offers seamless, unrestricted access to millions of research papers. CORE hosts the world’s largest collection of open access full texts, which are used by researchers, libraries, software developers, funders and many more. CORE’s aggregated content comes from thousands of institutional and subject repositories as well as journals and covers all research disciplines. In January 2019, CORE passed the mark of 10 million monthly active users (10.41 million users), making core.ac.uk a top 6k website globally by usage according to the independent Alexa Rank and one of the world’s most widely used Open Access platforms. In this talk, Petr will present the CORE service, discuss current challenges in harvesting content from open repositories and share his experience of building value-added services for society on top of open content.
Mention preprints and large repositories
Insufficient support for full text harvesting
Previously questioned, not any more
We started aggregating from repositories, we then added aggregating OA from publishers, we are now working to aggregate OA content from anywhere on the Web.
This is a hard job because more interoperability is needed.
Non-trivial pipeline
Machine learning involved
The idea that one person can do this on their own doesn’t really hold
Data providers and repositories must collaborate
When all the processing is done, people appreciate the product.
Highest coverage of freely available content. Our tests have shown CORE Discovery finding more free content than any other discovery system.
Free service for researchers by researchers. CORE Discovery is the only free content discovery extension developed by researchers for researchers. There is no major publisher or enterprise controlling and profiting from your usage data.
Best grip on open repository content. Due to CORE being a leader in harvesting open access literature, CORE Discovery has the best grip on open content from open repositories as opposed to other services that disproportionately focus only on content indexed in major commercial databases.
Repository integration and discovering documents without a DOI. The only service offering seamless and free integration into repositories. CORE Discovery is also the only discovery system that can locate scientific content even for items with an unknown DOI or which do not have a DOI.
Open access discovery tools locate freely available copies of research papers which might otherwise be behind a paywall
Enable others to programmatically interact with CORE data.
Essential for developing applications that make use of information about research papers
Supporting a variety of use cases
More than 500 registered CORE API users, including companies
Use:
CORE API to interact with CORE in real time
CORE Data Dumps to run batch processes, develop prototype applications and do research
CORE SDKs to gain easy access to CORE API from several programming languages
CORE FastSync