1/31
From Open Access Metadata to
Open Access Content
Two Principles for Increased Visibility of Open
Access Content
Petr Knoth
CORE (Connecting REpositories) project
Knowledge Media institute
The Open University
@petrknoth, #diggicore
2/31
Why from OA metadata to OA content?
• Free online availability of research outputs (not metadata of
research outputs) is the main goal of open access (OA)
• Repositories are one of the recommended ways to achieve this
(BOAI, 2002).
• Despite the large amount of OA content already available online
(Laakso & Bjork, 2012), OA content is not necessarily easily
discoverable (Morrison, 2012; Konkiel, 2012).
• Often available, but difficult to find …
• Inhibiting the OA impact – accessibility, discoverability, reuse …
• Discoverability of OA content on the Web can be dramatically
increased by adopting two simple principles!
3/31
Outline
1. Goals of repositories
2. The bleak truth about availability of OA metadata vs content
3. Content referencing practices in repositories
4. Two principles to increase visibility of OA content
4/31
Outline
1. Goals of repositories (repositories as large metadata silos)
2. The bleak truth about availability of OA metadata vs content
3. Content referencing practices in repositories
4. Two principles to increase visibility of OA content
5/31
The primary purpose of repositories
• Institutional repositories (IRs) serve a number of purposes, such
as collecting and curating digital outputs, providing statistics,
research excellence, etc.
• The primary goal of repositories is to open and disseminate
research outputs to a worldwide audience (Crow, 2002) –
SPARC’s position paper on the case for institutional repositories.
6/31
SPARC’s position paper on IRs
“For the repository to provide access to the broader research
community, users outside the university must be able to find and
retrieve information from the repository. Therefore, institutional
repository systems must be able to support interoperability in order
to provide access via multiple search engines and other discovery
tools. An institution does not necessarily need to implement
searching and indexing functionality to satisfy this demand: it could
simply maintain and expose metadata, allowing other services to
harvest and search the content. This simplicity lowers the barrier to
repository operation for many institutions, as it only requires a file
system to hold the content and the ability to create and share
metadata with external systems.”
7/31
COAR: About harvesting and aggregations …
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and services,
and generate new knowledge from repository content.”
[COAR manifesto]
8/31
We need OA to content (not just metadata)
• Repositories (even the most prominent) are often seen by
aggregation systems as large metadata silos.
• OA to metadata is not disruptive. Little difference to the
traditional publishing model.
9/31
Outline
1. Goals of repositories (repositories as large metadata silos)
2. The bleak truth about availability of OA metadata vs content
3. Content referencing practices in repositories
4. Two principles to increase visibility of OA content
10/31
Study
• 83 repositories (mainly EPrints with PDF research outputs)
• 1,461,016 metadata records
• Ratio of metadata to content
• Data acquired from CORE (Knoth & Zdrahal, 2012)
11/31
12/31
“[The institutional repository] is like a roach motel. Data goes in, but
it doesn’t come out.” (Salo, 2008)
13/31
Why is this a problem?
• Lower accessibility of papers (we have them, but cannot find
them)
• Text-mining
• Cannot monitor growth
• Losing a strong argument for the adoption of OA!
14/31
Outline
1. Goals of repositories (repositories as large metadata silos)
2. The bleak truth about availability of OA metadata vs content
3. Content referencing practices in repositories
4. Two principles to increase visibility of OA content
15/31
OAI-PMH and content referencing
• OAI-PMH supports representing metadata in multiple formats,
but at a minimum repositories must be able to return records
with metadata expressed in the Dublin Core format (OAI-PMH
v2.0, 2008)
• If repositories want to satisfy the SPARC guidelines (Crow, 2002),
they must provide a link to the content as part of the exposed
metadata.
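As a rough illustration of the two points above, the following Python sketch builds a ListRecords request for the mandatory Dublin Core format (oai_dc) and extracts the dc:identifier values from a response. The base URL, the sample record and the function names are invented for illustration; only the verb, the metadataPrefix parameter and the XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url):
    """Build a ListRecords request asking for Dublin Core (oai_dc) records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


def identifiers_by_record(response_xml):
    """Map each record's OAI identifier to its dc:identifier values, in document order."""
    root = ET.fromstring(response_xml)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        result[oai_id] = [el.text for el in record.iterfind(".//dc:identifier", NS)]
    return result


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>http://repo.example.org/42/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repo.example.org/oai"))
print(identifiers_by_record(SAMPLE))
```

A real harvester would also need to follow resumption tokens and handle protocol errors, which this sketch leaves out.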
16/31
OAI-PMH and content referencing
The Open Research Online repository (EPrints) links directly to the
resource from metadata.
Cranfield repository (DSpace) identifies the resource by providing a
link to a page from which the resource (if available) can be accessed.
17/31
OAI-PMH and content referencing
The OAI-PMH specification states on this topic that:
“The nature of a resource identifier is outside the scope of the OAI-
PMH. To facilitate access to the resource associated with harvested
metadata, repositories should use an element in metadata records
to establish a linkage between the record (and the identifier of its
item) and the identifier (URL, URN, DOI, etc.) of the associated
resource. The mandatory Dublin Core format provides the identifier
element that should be used for this purpose.”
18/31
OAI-PMH and content referencing
• What is an identifier of the associated resource?
Is a splash page an identifier? According to OAI-PMH examples it is:
<dc:identifier>http://arXiv.org/abs/cs/0112017</dc:identifier>
• The standard is pretty weak on this aspect …
19/31
Outline
1. Goals of repositories (repositories as large metadata silos)
2. The bleak truth about availability of OA metadata vs content
3. Content referencing practices in repositories
4. Two principles to increase visibility of OA content
20/31
The principles of the principles
• Pragmatic rather than exciting.
• Generating maximum benefit for a minimum investment.
• Deliberately use current standards to minimise adoption time.
• Respecting differences across systems and backwards
compatibility.
• Emphasising the need for easy-to-use compliance mechanisms to
assist repository managers in ensuring systems interoperability.
21/31
Principle 1 – Content referencing
Open repositories should always establish a link from the metadata
record to the item the metadata record describes using a
dereferencable identifier pointing to the version held in the
repository. The dereferencable identifier should be provided in the
appropriate element of the metadata format used (e.g.
dc:identifier in the case of Dublin Core).
22/31
Implications: Principle 1 – Content referencing
• Repositories can use different standards to deliver metadata over
OAI-PMH (DC, METS, MPEG-21 DIDL)
• Identifier must resolve (be actionable) to the object it identifies
• In the case of DC, if more identifiers are present, use the first
identifier as the identifier of the object
• Should resolve to the version of the object in the local repository
• Similarity with RIOXX identifier field
• The principle is easily applicable in the OA domain: each item can
be freely resolved
23/31
Open access statistics and principle 1
• Only dereferencable items are OA
• Increases stats accuracy
• Avoids anecdotal situations (e.g. 23,380 Dark Items)
24/31
Principle 2 – Content accessibility to machines
Open repositories must provide universal access to machines with
the same level of access as humans have. It is the role of open
repositories to allow machines to harvest the entire content of the
repository in a reasonable time to enable harvesting systems to
acquire and maintain up-to-date information about the repository
content.
25/31
Example from arXiv.org
• Googlebot: unrestricted
• Yahoo/MSN: can reharvest in 6 months
• Researchers: access denied
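The kind of unequal treatment listed above can be checked mechanically. The sketch below uses Python's standard urllib.robotparser to test whether two different user agents may fetch the same URL. The robots.txt text is a made-up example in the spirit of the slide, not the actual arXiv.org file, and the "ResearchHarvester" agent name and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one commercial crawler is unrestricted, everyone else is blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_harvest(robots_txt, user_agent, url):
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://repo.example.org/42/1/paper.pdf"
for agent in ("Googlebot", "ResearchHarvester"):
    print(agent, can_harvest(ROBOTS_TXT, agent, url))
```

A repository following Principle 2 would give the same answer for both agents.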
26/31
Implications: Principle 2 – Content accessibility to machines
• Accessibility of repository content by machines
• Enabling reuse through new services, such as those relying on
text-mining
• Open Repositories should not discriminate, except for abusive
behavior
• Presumption of innocence
27/31
Validation tools
• Key to adoption – Repository managers should not be left alone
• Repository Analytics
28/31
Conclusions
• Proportion of OA content that can be harvested is fairly low in
comparison to metadata
• Inhibiting the benefits of OA
• Two principles:
1) Dereferencable identifiers - Open Repositories provide open
access to content and not just metadata
2) Machine access – Open Repositories should provide free access
to content (for anybody, but mainly researchers)
• Compliance validation tools are needed
29/31
Thank you!
Open access needs open repositories
30/31
References 1/2
[BOAI, 2002] Budapest Open Access Initiative. (2002)
http://www.opensocietyfoundations.org/openaccess/boai-10-recommendations
[Crow, 2002] Crow, R. (2002). The case for institutional repositories: a SPARC position
paper. ARL Bimonthly Report 223.
[Knoth & Zdrahal, 2012] Knoth, P. and Zdrahal, Z. (2012) CORE: Three Access Levels to
Underpin Open Access, D-Lib Magazine, 18, 11/12, Corporation for National Research
Initiatives, http://dx.doi.org/10.1045/november2012-knoth
[Konkiel, 2012] Konkiel, S. (2012) Are Institutional Repositories Doing Their Job?
https://blogs.libraries.iub.edu/scholcomm/2012/09/11/are-institutional-repositories-doing-their-job/
[Laakso & Bjork, 2012] Laakso, M., & Björk, B. C. (2012). Anatomy of open access
publishing: a study of longitudinal development and internal structure. BMC Medicine,
10(1), 124.
31/31
References 2/2
[Morrison, 2012] Morrison, Louise (2012) 5 reasons why I can’t find Open Access
publications.
http://mmitscotland.wordpress.com/2012/08/06/5-reasons-why-i-cant-find-open-access-publications-2/
[OAI-PMH v2.0, 2008] The Open Archives Initiative Protocol for Metadata Harvesting
Version 2.0 (OAI-PMH), Implementation Guidelines (2008).
http://www.openarchives.org/OAI/openarchivesprotocol.html
[ResourceSync draft, 2013] ResourceSync protocol draft. 2013
http://www.niso.org/workrooms/resourcesync/
[Salo, 2008] Salo, D. (2008). Innkeeper at the roach motel. Library Trends, 57(2), 98-123.
[Van de Sompel et al, 2004] Van de Sompel, H., Nelson, M. L., Lagoze, C., & Warner, S.
(2004). Resource harvesting within the OAI-PMH framework. D-Lib Magazine, 10(12),
1082-9873.

More Related Content

What's hot

Liger cat challenge
Liger cat challengeLiger cat challenge
Liger cat challengea s
 
TRACK OER - Project proposal
TRACK OER - Project proposalTRACK OER - Project proposal
TRACK OER - Project proposal
Patrick McAndrew
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015
Comsode - FP7 project
 
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Frank Oellien
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
petrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
petrknoth
 
Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...
Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...
Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...
Maine_SharedCollections
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
inscit2006
 
Kessler "Open Discovery Initiative Update"
Kessler "Open Discovery Initiative Update"Kessler "Open Discovery Initiative Update"
Kessler "Open Discovery Initiative Update"
National Information Standards Organization (NISO)
 
Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...redsys
 
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKThe Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
Andy Powell
 
OSFair2017 Workshop | Text mining
OSFair2017 Workshop | Text miningOSFair2017 Workshop | Text mining
OSFair2017 Workshop | Text mining
Open Science Fair
 
Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSync
petrknoth
 
NISO Open Discovery Initiative, ALA Midwinter
NISO Open Discovery Initiative, ALA MidwinterNISO Open Discovery Initiative, ALA Midwinter
NISO Open Discovery Initiative, ALA Midwinter
National Information Standards Organization (NISO)
 
Access to Content via Link Resolvers
Access to Content via Link ResolversAccess to Content via Link Resolvers
Access to Content via Link Resolvers
EDINA, University of Edinburgh
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009
Ian Foster
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshots
Ian Foster
 

What's hot (18)

Liger cat challenge
Liger cat challengeLiger cat challenge
Liger cat challenge
 
TRACK OER - Project proposal
TRACK OER - Project proposalTRACK OER - Project proposal
TRACK OER - Project proposal
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015
 
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...
Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...
Managing the Collective Collection: Cooperative Infrastructure for Shared Pri...
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
 
Kessler "Open Discovery Initiative Update"
Kessler "Open Discovery Initiative Update"Kessler "Open Discovery Initiative Update"
Kessler "Open Discovery Initiative Update"
 
Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...
 
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKThe Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
 
Digitisation and institutional repositories 3
Digitisation and institutional repositories 3Digitisation and institutional repositories 3
Digitisation and institutional repositories 3
 
OSFair2017 Workshop | Text mining
OSFair2017 Workshop | Text miningOSFair2017 Workshop | Text mining
OSFair2017 Workshop | Text mining
 
Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSync
 
NISO Open Discovery Initiative, ALA Midwinter
NISO Open Discovery Initiative, ALA MidwinterNISO Open Discovery Initiative, ALA Midwinter
NISO Open Discovery Initiative, ALA Midwinter
 
Access to Content via Link Resolvers
Access to Content via Link ResolversAccess to Content via Link Resolvers
Access to Content via Link Resolvers
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshots
 

Viewers also liked

DEVCSI Core Mobile
DEVCSI Core MobileDEVCSI Core Mobile
DEVCSI Core Mobilepetrknoth
 
Snail 12345
Snail 12345Snail 12345
Snail 12345reblyn1
 
Core presentation
Core presentationCore presentation
Core presentation
petrknoth
 
DiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected RepositoriesDiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected Repositories
petrknoth
 
CORE projects family
CORE projects familyCORE projects family
CORE projects familypetrknoth
 
Amicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource companyAmicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource company
rachna1122
 
All Joke Photos
All Joke PhotosAll Joke Photos
Ali’S Careers Power Point
Ali’S Careers Power PointAli’S Careers Power Point
Ali’S Careers Power Pointguestb4db5a8
 
The murder of a student.
The murder of a student.The murder of a student.
The murder of a student.
selimkaradag
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)
petrknoth
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...
petrknoth
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?
petrknoth
 
CORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open AccessCORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open Access
petrknoth
 
Suman Pandit
Suman PanditSuman Pandit
Suman Pandit
sumanpandit
 
The Clown Doctor
The Clown DoctorThe Clown Doctor
The Clown Doctor
Grace Sevilla-Giestas
 
Semantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research EvaluationSemantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research Evaluation
petrknoth
 
93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-final93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-finalZarobiza
 

Viewers also liked (17)

DEVCSI Core Mobile
DEVCSI Core MobileDEVCSI Core Mobile
DEVCSI Core Mobile
 
Snail 12345
Snail 12345Snail 12345
Snail 12345
 
Core presentation
Core presentationCore presentation
Core presentation
 
DiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected RepositoriesDiggiCORE: Digging into Connected Repositories
DiggiCORE: Digging into Connected Repositories
 
CORE projects family
CORE projects familyCORE projects family
CORE projects family
 
Amicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource companyAmicable resources corporate presentation- Human resource company
Amicable resources corporate presentation- Human resource company
 
All Joke Photos
All Joke PhotosAll Joke Photos
All Joke Photos
 
Ali’S Careers Power Point
Ali’S Careers Power PointAli’S Careers Power Point
Ali’S Careers Power Point
 
The murder of a student.
The murder of a student.The murder of a student.
The murder of a student.
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?
 
CORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open AccessCORE: Aggregating and Enriching Content to Support Open Access
CORE: Aggregating and Enriching Content to Support Open Access
 
Suman Pandit
Suman PanditSuman Pandit
Suman Pandit
 
The Clown Doctor
The Clown DoctorThe Clown Doctor
The Clown Doctor
 
Semantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research EvaluationSemantometrics: Towards Fulltext-based Research Evaluation
Semantometrics: Towards Fulltext-based Research Evaluation
 
93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-final93136540 spider-cloud-small-cell-cluster-case-study-091911-final
93136540 spider-cloud-small-cell-cluster-case-study-091911-final
 

Similar to From Open Access Metadata to Open Access Content: Two Principles for Increased Visibility of Open Access Content

Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
floyd taag
 
Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
floyd taag
 
Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
floyd taag
 
Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
marevil awas
 
CORE - Petr Knoth, Research Associate
CORE - Petr Knoth, Research AssociateCORE - Petr Knoth, Research Associate
CORE - Petr Knoth, Research AssociateThe European Library
 
OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
Open Science Fair
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
Open Science Fair
 
OAI-PMH
OAI-PMHOAI-PMH
OAI-PMH
CSMeena1
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
lagoze
 
Core @ repositories fringe 2015
Core @ repositories fringe 2015Core @ repositories fringe 2015
Core @ repositories fringe 2015
Lucas anastasiou
 
EuroSakai CLIF project presentation
EuroSakai CLIF project presentationEuroSakai CLIF project presentation
EuroSakai CLIF project presentation
Chris Awre
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
petrknoth
 
Towards effective research recommender systems for repositories
Towards effective research recommender systems for repositoriesTowards effective research recommender systems for repositories
Towards effective research recommender systems for repositories
petrknoth
 
Carpenter "Working Together and Working More Efficiently"
Carpenter "Working Together and Working More Efficiently"Carpenter "Working Together and Working More Efficiently"
Carpenter "Working Together and Working More Efficiently"
National Information Standards Organization (NISO)
 
Cloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overviewCloud web scale discovery services landscape an overview
Cloud web scale discovery services landscape an overview
Nikesh Narayanan
 
An Architecture based on Linked Data technologies for the Integration of OER ...
An Architecture based on Linked Data technologies for the Integration of OER ...An Architecture based on Linked Data technologies for the Integration of OER ...
An Architecture based on Linked Data technologies for the Integration of OER ...
The Open Education Consortium
 
How can repositories support the text-mining of their content and why?
How can repositories support the text-mining of their content and why? How can repositories support the text-mining of their content and why?
How can repositories support the text-mining of their content and why?
Nancy Pontika
 
Capturing Conversations, Context and Curricula: The JLeRN Experiment and the ...
Capturing Conversations, Context and Curricula: The JLeRN Experiment and the ...Capturing Conversations, Context and Curricula: The JLeRN Experiment and the ...
Capturing Conversations, Context and Curricula: The JLeRN Experiment and the ...
Sarah Currier
 
Slidescambridge2012 120417062050-phpapp02
Slidescambridge2012 120417062050-phpapp02Slidescambridge2012 120417062050-phpapp02
Slidescambridge2012 120417062050-phpapp02
Mimas
 
Next generation repositories
Next generation repositoriesNext generation repositories
Next generation repositories
Paul Walk
 

Similar to From Open Access Metadata to Open Access Content: Two Principles for Increased Visibility of Open Access Content (20)

Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
 
Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
 
Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
 
Open archives initiatives(final)
 Open archives initiatives(final) Open archives initiatives(final)
Open archives initiatives(final)
 
CORE - Petr Knoth, Research Associate
CORE - Petr Knoth, Research AssociateCORE - Petr Knoth, Research Associate
CORE - Petr Knoth, Research Associate
 
OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
 
OAI-PMH
OAI-PMHOAI-PMH
OAI-PMH
 
Open Archives Initiative Object Reuse and Exchange
From Open Access Metadata to Open Access Content: Two Principles for Increased Visibility of Open Access Content

  • 1. 1/31 From Open Access Metadata to Open Access Content: Two Principles for Increased Visibility of Open Access Content Petr Knoth CORE (Connecting REpositories) project Knowledge Media Institute The Open University @petrknoth, #diggicore
  • 2. 2/31 Why from OA metadata to OA content? • Free online availability of research outputs (not metadata of research outputs) is the main goal of open access (OA) • Repositories are one of the recommended ways to achieve this (BOAI, 2002). • Despite the large amount of OA content already available online (Laakso & Björk, 2012), OA content is not necessarily easily discoverable (Morrison, 2012; Konkiel, 2012). • Often available, but difficult to find … • Inhibiting the OA impact – accessibility, discoverability, reuse … • Discoverability of OA content on the Web can be dramatically increased by adopting two simple principles!
  • 3. 3/31 Outline 1. Goals of repositories 2. The bleak truth about availability of OA metadata vs content 3. Content referencing practices in repositories 4. Two principles to increase visibility of OA content
  • 4. 4/31 Outline 1. Goals of repositories (repositories as large metadata silos) 2. The bleak truth about availability of OA metadata vs content 3. Content referencing practices in repositories 4. Two principles to increase visibility of OA content
  • 5. 5/31 The primary purpose of repositories • Institutional repositories (IRs) serve a number of purposes, such as collecting and curating digital outputs, providing statistics, research excellence, etc. • The primary goal of repositories is to open and disseminate research outputs to a worldwide audience (Crow, 2002) – SPARC’s position paper on the case for institutional repositories.
  • 6. 6/31 SPARC’s position paper on IRs “For the repository to provide access to the broader research community, users outside the university must be able to find and retrieve information from the repository. Therefore, institutional repository systems must be able to support interoperability in order to provide access via multiple search engines and other discovery tools. An institution does not necessarily need to implement searching and indexing functionality to satisfy this demand: it could simply maintain and expose metadata, allowing other services to harvest and search the content. This simplicity lowers the barrier to repository operation for many institutions, as it only requires a file system to hold the content and the ability to create and share metadata with external systems.”
  • 7. 7/31 COAR: About harvesting and aggregations … “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” [COAR manifesto]
  • 8. 8/31 We need OA to content (not just metadata) • Repositories (even the most prominent) are often seen by aggregation systems as large metadata silos. • OA to metadata is not disruptive. There is little difference from the traditional publishing model.
  • 9. 9/31 Outline 1. Goals of repositories (repositories as large metadata silos) 2. The bleak truth about availability of OA metadata vs content 3. Content referencing practices in repositories 4. Two principles to increase visibility of OA content
  • 10. 10/31 Study • 83 repositories (mainly EPrints with PDF research outputs) • 1,461,016 metadata records • Ratio of metadata to content • Data acquired from CORE (Knoth & Zdrahal, 2012)
  • 11. 11/31
  • 12. 12/31 “[The institutional repository] is like a roach motel. Data goes in, but it doesn’t come out.” (Salo, 2008)
  • 13. 13/31 Why is this a problem? • Lower accessibility of papers (we have them, but cannot find them) • Text-mining is hindered • Cannot monitor growth • Losing a strong argument for the adoption of OA!
  • 14. 14/31 Outline 1. Goals of repositories (repositories as large metadata silos) 2. The bleak truth about availability of OA metadata vs content 3. Content referencing practices in repositories 4. Two principles to increase visibility of OA content
  • 15. 15/31 OAI-PMH and content referencing • OAI-PMH supports representing metadata in multiple formats, but at a minimum repositories must be able to return records with metadata expressed in the Dublin Core format (OAI-PMH v2.0, 2008) • If repositories want to satisfy the SPARC guidelines (Crow, 2002), they must provide a link to the content as part of the exposed metadata.
  • 16. 16/31 OAI-PMH and content referencing The Open Research Online repository (EPrints) links directly to the resource from metadata. The Cranfield repository (DSpace) identifies the resource by providing a link to a page from which the resource (if available) can be accessed.
  • 17. 17/31 OAI-PMH and content referencing The OAI-PMH specification states on this topic that: “The nature of a resource identifier is outside the scope of the OAI-PMH. To facilitate access to the resource associated with harvested metadata, repositories should use an element in metadata records to establish a linkage between the record (and the identifier of its item) and the identifier (URL, URN, DOI, etc.) of the associated resource. The mandatory Dublin Core format provides the identifier element that should be used for this purpose.”
  • 18. 18/31 OAI-PMH and content referencing • What is an identifier of the associated resource? Is a splash page an identifier? According to OAI-PMH examples it is: <dc:identifier>http://arXiv.org/abs/cs/0112017</dc:identifier> • The standard is pretty weak on this aspect …
  • 19. 19/31 Outline 1. Goals of repositories (repositories as large metadata silos) 2. The bleak truth about availability of OA metadata vs content 3. Content referencing practices in repositories 4. Two principles to increase visibility of OA content
  • 20. 20/31 The principles of the principles • Pragmatic rather than exciting. • Generating maximum benefit for a minimum investment. • Deliberately using current standards to minimise adoption time. • Respecting differences across systems and backwards compatibility. • Emphasising the need for easy-to-use compliance mechanisms to assist repository managers in ensuring systems interoperability.
  • 21. 21/31 Principle 1 – Content referencing Open repositories should always establish a link from the metadata record to the item the metadata record describes using a dereferenceable identifier pointing to the version held in the repository. The dereferenceable identifier should be provided in the appropriate element of the metadata format in use (e.g. dc:identifier in the case of Dublin Core).
  • 22. 22/31 Implications: Principle 1 – Content referencing • Repositories can use different standards to deliver metadata over OAI-PMH (DC, METS, MPEG-21 DIDL) • The identifier must resolve (be actionable) to the object it identifies • In the case of DC, if several identifiers are present, use the first identifier as the identifier of the object • The identifier should resolve to the version of the object in the local repository • Similarity with the RIOXX identifier field • The principle is easily applicable in the OA domain: each item can be freely resolved
  • 23. 23/31 Open access statistics and principle 1 • Only dereferenceable items are OA • Increases the accuracy of statistics • Avoids anecdotal situations (e.g. 23,380 Dark Items)
  • 24. 24/31 Principle 2 – Content accessibility to machines Open repositories must provide universal access to machines with the same level of access as humans have. It is the role of open repositories to allow machines to harvest the entire content of the repository in a reasonable time to enable harvesting systems to acquire and maintain up-to-date information about the repository content.
  • 25. 25/31 Example from arXiv.org • Googlebot: unrestricted • Yahoo/MSN: can reharvest in 6 months • Researchers: access denied
  • 26. 26/31 Implications: Principle 2 – Content accessibility to machines • Accessibility of repository content by machines • Enabling reuse through new services, such as those relying on text-mining • Open Repositories should not discriminate, except for abusive behavior • Presumption of innocence
  • 27. 27/31 Validation tools • Key to adoption – Repository managers should not be left alone • Repository Analytics
  • 28. 28/31 Conclusions • Proportion of OA content that can be harvested is fairly low in comparison to metadata • Inhibiting the benefits of OA • Two principles: 1) Dereferenceable identifiers - Open Repositories provide open access to content and not just metadata 2) Machine access – Open Repositories should provide free access to content (for anybody, but mainly researchers) • Compliance validation tools are needed
  • 29. 29/31 Thank you! Open access needs open repositories
  • 30. 30/31 References 1/2 [BOAI, 2002] Budapest Open Access Initiative. (2002) http://www.opensocietyfoundations.org/openaccess/boai-10-recommendations [Crow, 2002] Crow, R. (2002). The case for institutional repositories: a SPARC position paper. ARL Bimonthly Report 223. [Knoth & Zdrahal, 2012] Knoth, P. and Zdrahal, Z. (2012) CORE: Three Access Levels to Underpin Open Access, D-Lib Magazine, 18, 11/12, Corporation for National Research Initiatives, http://dx.doi.org/10.1045/november2012-knoth [Konkiel, 2012] Konkiel, S. (2012) Are Institutional Repositories Doing Their Job? https://blogs.libraries.iub.edu/scholcomm/2012/09/11/are-institutional-repositories-doing-their-job/ [Laakso & Björk, 2012] Laakso, M., & Björk, B. C. (2012). Anatomy of open access publishing: a study of longitudinal development and internal structure. BMC Medicine, 10(1), 124.
  • 31. 31/31 References 2/2 [Morrison, 2012] Morrison, L. (2012) 5 reasons why I can’t find Open Access publications. http://mmitscotland.wordpress.com/2012/08/06/5-reasons-why-i-cant-find-open-access-publications-2/ [OAI-PMH v2.0, 2008] The Open Archives Initiative Protocol for Metadata Harvesting Version 2.0 (OAI-PMH), Implementation Guidelines (2008). http://www.openarchives.org/OAI/openarchivesprotocol.html [ResourceSync draft, 2013] ResourceSync protocol draft. 2013 http://www.niso.org/workrooms/resourcesync/ [Salo, 2008] Salo, D. (2008). Innkeeper at the roach motel. Library Trends, 57(2), 98-123. [Van de Sompel et al, 2004] Van de Sompel, H., Nelson, M. L., Lagoze, C., & Warner, S. (2004). Resource harvesting within the OAI-PMH framework. D-Lib Magazine, 10(12), 1082-9873.
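
Principle 1 (slides 21 and 22) can be illustrated with a short sketch. It is not part of the original slides. It assumes a record in the oai_dc format, takes the first dc:identifier as the identifier of the object (as slide 22 suggests), and applies a deliberately crude heuristic, a .pdf suffix, to guess whether that identifier points at the content itself or at a splash page. The sample record, the repository.example.org URLs and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record for illustration only; not taken from a real repository.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example paper</dc:title>
  <dc:identifier>http://repository.example.org/1234/1/paper.pdf</dc:identifier>
  <dc:identifier>http://repository.example.org/1234/</dc:identifier>
</oai_dc:dc>
"""


def first_identifier(record_xml):
    """Return the text of the first dc:identifier element, or None if absent."""
    root = ET.fromstring(record_xml.strip())
    element = root.find(f"{{{DC_NS}}}identifier")
    if element is None or not element.text:
        return None
    return element.text.strip()


def looks_like_content_link(url):
    """Crude assumption: a URL whose path ends in .pdf points at the content."""
    path = url.split("?", 1)[0].split("#", 1)[0]
    return path.lower().endswith(".pdf")


if __name__ == "__main__":
    identifier = first_identifier(SAMPLE_RECORD)
    print(identifier, looks_like_content_link(identifier))
    # The identifier from the OAI-PMH example on slide 18 is a splash page.
    print(looks_like_content_link("http://arXiv.org/abs/cs/0112017"))
```

A real compliance check would dereference the URL and inspect the response (for instance its Content-Type) rather than trust the suffix; the suffix test only keeps the example offline.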

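
Principle 2 (slides 24 to 26) asks that open repositories do not discriminate between machines. One place where such differences become visible is a site's robots.txt. The sketch below, again an illustration rather than anything from the slides, uses Python's standard urllib.robotparser module to compare what several user agents may fetch. The robots.txt content is invented for the example and is not arXiv's actual file; the agent name research-harvester and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: one named crawler is unrestricted, everyone else is blocked.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def access_by_agent(robots_txt, url, agents):
    """Map each user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def discriminates(robots_txt, url, agents):
    """True if the listed agents do not all receive the same answer."""
    answers = access_by_agent(robots_txt, url, agents)
    return len(set(answers.values())) > 1


if __name__ == "__main__":
    url = "http://repository.example.org/1234/1/paper.pdf"
    agents = ["Googlebot", "research-harvester"]
    print(access_by_agent(SAMPLE_ROBOTS_TXT, url, agents))
    print(discriminates(SAMPLE_ROBOTS_TXT, url, agents))
```

robots.txt is only one signal: rate limits and IP blocking can restrict harvesters as well, and this sketch does not detect those.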
Editor's Notes

  1. In this paper, we use the term institutional repositories (which are the main interest of the Open Repositories conference), to refer also to subject-based repositories or archives or systems for depositing research outputs used by open access publishers. As a result, the conclusions of this paper and recommendations are equally valid for both the green (self-archiving) and gold (OA publishing) routes to OA.
  2. Let's have a look at what some key players in the field think about the purpose of repositories.
  3. It will not be possible to transfer to an OA culture unless we change the environment so that there are clear benefits for researchers to participate in OA. These benefits should be technical rather than political.
  4. I can specify the differences between this and RIOXX.