Sanjeet Mann conducted a study measuring the availability of electronic resources at the University of Redlands Armacost Library. He tested 400 citations from 10 databases and found an overall availability of 62% with a 38% error rate. The types of errors were categorized, with the most common being proxy errors, source errors, and knowledge base errors. Mann discussed solutions like updating the proxy, customizing the knowledge base, and simplifying interfaces. He noted strengths in collecting both quantitative and qualitative data but weaknesses in not accounting for user issues. Mann proposed expanding the study to test availability through live student searches and evaluations.
Lei Zheng has over 15 years of experience in areas such as machine learning, data mining, and software development. He currently works as a Senior Software Engineer at Yahoo, where he develops algorithms for spam filtering and detection of abusive behavior. Previously he held research positions at the University of Pittsburgh and JustSystems Evans Research, where he implemented algorithms and systems for information retrieval, natural language processing, and data mining.
Research data management for medical data with pyradigm.
Python data structure for biomedical data to manage multiple tables linked via patient identifiers or other hashable IDs. By allowing continuous validation, this data structure would improve both ease of use and the integrity of the dataset.
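As a rough illustration of the idea (not pyradigm's actual API), a minimal Python sketch might link several tables on a shared subject ID and validate them continuously; the class, method, and field names below are invented.

```python
# Minimal sketch (not pyradigm's actual API): several tables keyed by a
# shared subject ID, validated so the dataset stays internally consistent.
class LinkedTables:
    def __init__(self):
        self.tables = {}          # table name -> {subject_id: row dict}
        self.subjects = set()     # IDs seen so far across all tables

    def add_row(self, table, subject_id, row):
        if not isinstance(row, dict):
            raise TypeError("row must be a dict of feature -> value")
        self.tables.setdefault(table, {})[subject_id] = row
        self.subjects.add(subject_id)

    def validate(self):
        """Report, per table, any subject IDs that are missing from it."""
        missing = {name: self.subjects - set(rows)
                   for name, rows in self.tables.items()}
        return {name: ids for name, ids in missing.items() if ids}

# Usage: two linked tables; validate() reports the subject missing from 'mri'.
ds = LinkedTables()
ds.add_row("clinical", "sub-01", {"age": 64, "diagnosis": "MCI"})
ds.add_row("clinical", "sub-02", {"age": 71, "diagnosis": "AD"})
ds.add_row("mri", "sub-01", {"hippocampus_vol": 3.1})
print(ds.validate())   # {'mri': {'sub-02'}}
```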
This document provides an introduction to bio-ontologies and the semantic web. It discusses what ontologies are and how they are used in the bio domain through initiatives like the OBO Foundry. It introduces key semantic web technologies like RDF, URIs, Turtle syntax, and SPARQL query language. It provides examples of ontologies like the Gene Ontology and how ontologies can be represented and queried using these semantic web standards.
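To make the RDF/SPARQL part concrete, here is a small illustrative example using the rdflib Python library; the Turtle snippet uses two real Gene Ontology identifiers but is simplified and not drawn from the document itself.

```python
# Illustrative only: a tiny Turtle snippet loaded and queried with rdflib
# (pip install rdflib). The triples are a simplified Gene Ontology fragment.
from rdflib import Graph

turtle = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix obo:  <http://purl.obolibrary.org/obo/> .

obo:GO_0006915 rdfs:label "apoptotic process" ;
               rdfs:subClassOf obo:GO_0012501 .
obo:GO_0012501 rdfs:label "programmed cell death" .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# SPARQL: find each term's label and the label of its parent class.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?childLabel ?parentLabel WHERE {
    ?child rdfs:subClassOf ?parent .
    ?child rdfs:label ?childLabel .
    ?parent rdfs:label ?parentLabel .
}
"""
for row in g.query(query):
    print(f"{row.childLabel} is a kind of {row.parentLabel}")
```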
The document discusses searching for answers to keyword queries in linked data. It presents the problem of keyword query routing, which aims to identify a valid set of data sources that can produce non-empty answers to a keyword query. It proposes using keyword-element relationship graphs at the element, schema, and data source levels to model relationships between keywords and data elements or sources. Experiments on a subset of the Billion Triple Challenge dataset indicate that considering relationships between elements within a maximum path length outperforms considering only direct relationships and identifies valid plans for multi-source queries.
1) The document discusses EBI's efforts to facilitate semantic alignment of its resources through building ontologies and annotating data with ontologies.
2) It describes EBI's work developing ontologies like the Experiment Factor Ontology and using ontologies to enhance search, data visualization, and data integration.
3) The challenges of representing EBI data in RDF are discussed, and future directions are outlined that could make RDF deployment simpler and enable more interesting queries over EBI data.
The document discusses using ontologies and Schema.org properties to connect biomedical data to ontology terms and concepts. Over 200 biomedical ontologies are in active use by life science databases at EMBL-EBI. Schema.org properties like MedicalCode and CreativeWork can be used to mark up ontology terms, data resources, and their relationships. This would allow questions about which ontologies and terms are used in specific data, and enable richer searching and discovery across data and ontologies.
The document discusses various database concepts including normalization, which is used to design optimal relation schemas by removing redundant data. It also covers transaction processing, which involves executing logical database operations as transactions to maintain data integrity. Database systems use techniques like logging and concurrency control to prevent transaction anomalies and ensure failures can be recovered from.
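A minimal sketch of the transaction idea, using Python's built-in sqlite3 module and an invented accounts table: either both updates are committed together or, on any error, both are rolled back.

```python
# Hypothetical two-step transfer: both updates succeed together or not at all.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 70 WHERE name = 'bob'")
    conn.commit()          # both changes become durable together
except sqlite3.Error:
    conn.rollback()        # on any failure, neither change is applied

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 30, 'bob': 120}
```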
Data integration is intrinsic to how modern research is undertaken in areas such as genomics, drug development and personalised medicine. To better enable this integration a large number of biomedical ontologies have been developed to provide standard semantics for describing metadata. There are now several hundred biomedical ontologies in widespread use that describe concepts such as genes, molecules, drugs and diseases. This amounts to millions of terms that are interconnected via relationships that naturally form a graph of biomedical terminology.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) integrates over 160 ontologies and provides a central point for the biomedical community to query and visualise ontologies. OLS also provides a RESTful API over the ontologies that is used in high-throughput data annotation pipelines. OLS is built on top of a Neo4j database that provides efficient indexes for extracting ontological relationships. We have developed generic tools for loading RDF/OWL ontologies into Neo4j where the indexes are optimised for serving common ontology queries. We are now moving to adopt graph databases more widely in applications relating to ontology mapping prediction and recommendation systems for data annotation.
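As a sketch of how an annotation pipeline might call the OLS REST API from Python: the endpoint path, query parameters and response fields shown are assumptions based on the description above, so check the current OLS documentation before relying on them.

```python
# Sketch of an annotation-pipeline lookup against the OLS REST API.
# Endpoint path and response structure are assumptions; check the OLS docs.
import requests

OLS = "https://www.ebi.ac.uk/ols/api"

def search_term(label, ontology="efo"):
    resp = requests.get(f"{OLS}/search",
                        params={"q": label, "ontology": ontology},
                        timeout=10)
    resp.raise_for_status()
    docs = resp.json().get("response", {}).get("docs", [])
    return [(d.get("label"), d.get("iri")) for d in docs]

for label, iri in search_term("diabetes mellitus"):
    print(label, iri)
```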
Setting Up a Qualitative or Mixed Methods Research Project in NVivo 10 to Cod... (Shalin Hai-Jew)
This document summarizes a presentation on using NVivo 10 software to code and analyze qualitative and mixed methods research data. It introduces NVivo 10 as a data management and analysis tool, demonstrates how to import and code data from various sources, and shows how to visualize and analyze coded data through matrices, models, and queries. The goals are to introduce NVivo 10's capabilities and to demonstrate the process of setting up a project for qualitative or mixed methods research.
This document provides an overview of bioinformatics and biological databases. It discusses how bioinformatics draws from fields like biology, computer science, statistics, and machine learning. Biological databases are important resources for bioinformatics that can be searched and analyzed to answer questions, find similar sequences, locate patterns, and make predictions. The document also outlines common uses of biological databases, such as annotation searches, homology searches, pattern searches, and predictive analyses.
Connecting life sciences data at the European Bioinformatics Institute (Connected Data World)
Tony Burdett's slides from his talk at Connected Data London. Tony is a Senior Software Engineer at the European Bioinformatics Institute. He presented the complexity of data at EMBL-EBI and the institute's approach to making sense of all this data.
Elsevier aims to construct knowledge graphs to help address challenges in research and medicine. Knowledge graphs link entities like people, concepts, and events to provide answers. Elsevier analyzes text and data to build knowledge graphs using techniques like information extraction, machine learning, and predictive modeling. Their knowledge graph integrates data from publications, clinical records, and other sources to power applications that help researchers, medical professionals, and patients. Knowledge graphs are a critical component for delivering value, especially as data volumes and needs accelerate.
This document discusses next generation DNA sequencing technologies. It begins by describing some of the limitations of traditional Sanger sequencing, such as read lengths of 500-1000 bases and throughput of 57,000 bases per run. It then introduces some key next generation sequencing technologies, such as 454 sequencing, which uses emulsion PCR and pyrosequencing to achieve read lengths of 20-100 bases but higher throughput of 20-100 Mb per run. Illumina/Solexa sequencing is also discussed, which uses sequencing by synthesis with reversible terminators and laser-based detection. Finally, third generation sequencing technologies are mentioned, such as Pacific Biosciences' single molecule real time sequencing and nanopore sequencing. In summary, the document provides a high-level overview of next generation sequencing technologies.
This document discusses reproducible research and provides guidance on how to conduct research in a reproducible manner. It covers:
1. The importance of reproducible research due to large datasets, computational analyses, and the potential for human error. Ensuring reproducibility requires new expertise and infrastructure.
2. Key aspects of reproducible research include data management plans, version control, use of file formats and software/tools that allow reproducibility, and publishing data and code to allow others to replicate results.
3. Reproducible research benefits the scientific community by increasing transparency and allows researchers to re-analyze their own data in the future. Journals and funders are increasingly requiring reproducibility.
NeXML is a proposed data exchange standard for phylogenetics that addresses issues with the current NEXUS format. It defines an XML schema for representing phylogenetic data like trees, networks, and character data. The schema is designed to be extensible, reuse prior standards, and take advantage of existing XML tools. An implementation includes XML parsers and writers in multiple programming languages and experiments with semantic annotation and web services.
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo... (María Poveda Villalón)
The document proposes a lightweight methodology called LOT (Linked Open Terms) for developing Linked Data ontologies and vocabularies in a reusable way. The methodology is data-driven and focuses on ontology search, selection, integration, completion and evaluation activities. It provides guidelines for reusing existing terms and linking them according to Linked Data principles while keeping the processes lightweight. The methodology is intended to help domain experts create ontologies and vocabularies for publishing data on the semantic web in an interoperable way without requiring extensive knowledge engineering expertise. Future work involves providing more detailed guidelines, examples, and connecting existing tools to support each step of the methodology.
Improving Semantic Search Using Query Log Analysis (Stuart Wrigley)
Despite the attention Semantic Search is continuously gaining, several challenges affecting tool performance and user experience remain unsolved. Among these are matching user terms with the search space, adopting view-based interfaces in the Open Web, and supporting users while building their queries. This paper proposes an approach to move a step forward towards tackling these challenges by creating models of usage of Linked Data concepts and properties, extracted from semantic query logs, as a source of collaborative knowledge. We use two sets of query logs from the USEWOD workshops to create our models and show the potential of using them in the mentioned areas.
This document discusses the Biological Databases project being conducted by a group of students. The project involves using the video game Minecraft to visualize protein structures retrieved from the Protein Data Bank (PDB). Python scripts are used to import PDB data files and place blocks in Minecraft to represent atoms, with different block colors used to distinguish atom types. SPARQL queries are also employed to search the RDF version of the PDB for protein entries. The goal is to build 3D protein models inside Minecraft for educational and visualization purposes.
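The summary does not include the students' actual scripts; the sketch below only illustrates the general approach (parse ATOM records from a PDB file, scale the coordinates, place one block per atom). It assumes the mcpi Minecraft Pi / RaspberryJuice API, and the element-to-block mapping is arbitrary.

```python
# Sketch of the core idea: read ATOM records from a PDB file, scale the
# coordinates, and place one block per atom. The mcpi calls assume the
# Minecraft Pi / RaspberryJuice API; block IDs per element are arbitrary.
from mcpi.minecraft import Minecraft

BLOCK_FOR_ELEMENT = {"C": 35, "N": 57, "O": 152, "S": 41}   # wool, diamond, ...

def atoms(pdb_path):
    with open(pdb_path) as f:
        for line in f:
            if line.startswith(("ATOM", "HETATM")):
                x, y, z = (float(line[30:38]), float(line[38:46]),
                           float(line[46:54]))               # fixed-column PDB fields
                element = line[76:78].strip() or line[12:16].strip()[0]
                yield element, x, y, z

def build(pdb_path, scale=0.5, origin=(0, 80, 0)):
    mc = Minecraft.create()                                  # connect to the game
    ox, oy, oz = origin
    for element, x, y, z in atoms(pdb_path):
        block = BLOCK_FOR_ELEMENT.get(element, 1)            # default: stone
        mc.setBlock(int(ox + x * scale), int(oy + y * scale),
                    int(oz + z * scale), block)

# build("1crn.pdb")
```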
Matrix Queries and Matrix Data Representations in NVivo 11 Plus (Shalin Hai-Jew)
This slideshow, "Matrix Queries and Matrix Data Representations in NVivo 11 Plus," covers the following points:
Matrices and their basic structures
Types of elements (variables) for matrix comparisons
Setting up matrix queries in NVivo 11
Specific matrix “use cases” in qualitative and mixed methods research
Wrap-up
This document provides a summary of 25 electronic resources for classicists in 25 minutes. It outlines in-house resources available through the Faculty of Classics website and library catalogues. It then describes bibliographical databases, full text databases, dictionaries, encyclopedias, image databases, and a referencing tool. The resources cover topics such as Greek and Latin texts, dictionaries, encyclopedias, inscriptions, artworks, and free online courses relevant to classics. The presentation aims to introduce classicists to key online resources available through the library.
This document provides an introduction to academic e-resources and how to use them. It outlines the services available at the Learning Resource Centre, including its hours and borrowing policies. It defines what an e-resource is, such as e-books and e-journals, and explains why students need to use them as they contain up-to-date peer-reviewed research. It provides steps to find e-books and e-journals through the university library website and search individual databases. It offers tips for choosing effective keywords and search techniques using Boolean operators and wildcards. Students are given tasks to practice these skills and evaluate sources. Contact information is provided for getting help from library staff.
Connect Your Resources, Save Time, Save Money: Connecting library electron... (Richard Bernier)
The document discusses how linking a library's electronic resources like databases and catalogs can reduce redundant searching and save time and money. It provides examples of databases like EBSCOhost, ProQuest, and OPAC systems that have features to dynamically link full text articles to local holdings information. Setting up these links requires coordinating with database vendors and ensuring compatible search features between systems.
E-LIS: Disciplinary Repository For Library and Information Sciences (sanat kumar behera)
E-LIS is a global digital archive for library and information science established in 2003. It aims to provide open access to documents in the field and currently contains over 12,000 papers in 37 languages. E-LIS uses the OAI-PMH protocol to allow metadata harvesting and supports depositing of various document types from researchers, librarians, and information professionals. It has an international editorial team that oversees operations and works to promote open access scholarship globally.
This document summarizes a presentation about service learning and the work of Librarians Without Borders (LWB). It introduces service learning and LWB, discussing two case studies of LWB initiatives in Costa Rica and Guatemala. In Costa Rica, LWB students helped build a school library, developing its collection and setting it up. In Guatemala, LWB has partnered with a school to implement a library through ongoing fundraising, service trips, and support. The presentation previews LWB's future plans and takes questions from the audience.
The document outlines plans for a National Digital Library in Finland to aggregate and provide access to the digital collections of libraries, archives, and museums. The goals are to [1] create a common user interface by 2011 for searching across these collections, [2] digitize important cultural heritage materials, and [3] develop long-term preservation solutions. It will work with Europeana to increase the visibility and impact of Finnish cultural collections internationally. Realizing this vision requires national coordination, common standards, and sustainable funding and resources.
Discover - e: Tips and Tricks for Connecting Users to Library-provided Electr... (St. Petersburg College)
OCLC events at ALA Annual 2009 (July 12).
A panel will share advice about helping library users connect with library-provided electronic resources and discuss current innovations in information discovery.
Access and Ownership Issues of Electronic Resources in the Library (Fe Angela Verzosa)
Presented by Fe Angela M. Verzosa at the Conference sponsored by the Central Luzon Librarians Association, held at Holy Angel University, Angeles City, Philippines on 7 December 2009
Tutorial presented at 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012), January 28-30, 2012. http://sites.google.com/site/web2011ihi/participants/tutorials
This tutorial weaves together three themes and the associated topics:
[1] The role of biomedical ontologies
[2] Key Semantic Web technologies with focus on Semantic provenance and integration
[3] In-practice tools and real world use cases built to serve the needs of sleep medicine researchers, cardiologists involved in clinical practice, and work on vaccine development for human pathogens.
The MIAPA ontology: An annotation ontology for validating minimum metadata re... (Hilmar Lapp)
This document describes the MIAPA (Minimum Information About a Phylogenetic Analysis) ontology, which was developed to standardize the annotation and reporting of metadata for phylogenetic analyses. The MIAPA ontology reuses terms from existing ontologies and is designed according to OBO Foundry best practices. It provides a standard way to annotate key information about phylogenetic tree topologies, operational taxonomic units, branch lengths, character matrices, alignment and tree inference methods. The goal is to facilitate increased access to and reuse of phylogenetic data through consistent annotation of published trees according to the MIAPA standard.
Royal society of chemistry activities to develop a data repository for chemis... (Ken Karapetyan)
The Royal Society of Chemistry publishes many thousands of articles per year, the majority of them containing rich chemistry data that, in general, is limited in its value when isolated in the HTML or PDF form of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data, especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process and as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform, including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of electronic lab notebooks (ELNs) in terms of facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data with the intention of facilitating improved discovery.
Resource Description Framework Approach to Data Publication and Federation (Pistoia Alliance)
Bob Stanley, CEO, IO Informatics, explains the utility of RDF as a standard way of defining and redefining data, which could have utility in managing life science information.
Curation-Friendly Tools for the Scientific Researcher (bwestra)
Presentation for Online Northwest Conference, in Corvallis Oregon, February 10, 2012.
Highlights electronic lab notebooks (ELN) and OMERO (Open Microscopy Environment) as two tools that enable researchers to better manage their research data.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
FIndable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
How Much do Availability Studies Increase Full Text Success? (Sanjeet Mann)
Availability Studies are a systems research technique that academic libraries can use to identify errors affecting access to electronic resources. Comparing two availability studies conducted before and after troubleshooting showed a statistically significant decrease in errors from 38% to 13%.
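The significance of such a drop can be checked with a standard two-proportion z-test. The sketch below uses the 400-citation sample from the first study and assumes, for illustration only, the same size for the follow-up study (the actual follow-up sample size is not given here).

```python
# Two-proportion z-test for the drop in error rate (38% -> 13%).
# n1 = 400 comes from the first study; n2 = 400 is an assumed follow-up size.
from math import sqrt
from statistics import NormalDist

n1, e1 = 400, 0.38
n2, e2 = 400, 0.13

pooled = (n1 * e1 + n2 * e2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (e1 - e2) / se
p_value = 2 * (1 - NormalDist().cdf(z))

print(f"z = {z:.2f}, p = {p_value:.2g}")   # a large z => the decrease is significant
```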
Preserving the Inputs and Outputs of Scholarship (tsbbbu)
Tim Babbitt discusses the changing context of research and scholarship due to digitization and the internet. The inputs and outputs of research are increasingly digital and complex, including data, code, presentations, and more. ProQuest has a history of preserving scholarship through microfilming and is exploring how to preserve the full range of digital scholarly outputs and their linkages in a sustainable way. Key questions include balancing new and old preservation methods and moving beyond preserving individual objects to also preserving networks and linkages between scholarly works.
The document provides guidelines for designing effective e-learning objects and asynchronous instruction. It discusses best practices from sources like the Association of College and Research Libraries (ACRL) and Project Information Literacy. These include establishing learning outcomes, developing content that limits cognitive load, and ensuring accessibility for all students regardless of location. The document then outlines steps for instructional design using the ADDIE model of analysis, design, development, implementation and evaluation. Examples are provided for each step, with a focus on incorporating principles of multimedia learning and usability testing.
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi... (sesrdm)
This document discusses the characteristics and challenges of managing life sciences data. It notes that biological data often lacks structure and grows rapidly, in heterogeneous formats and file sizes. Data goes through multiple analysis stages and is associated with evolving metadata standards. Ensuring data is properly stored, shared and preserved requires significant effort in describing formats, preparing submissions to various specialized public repositories, and developing data management plans. Integrating data from different sources also poses major challenges.
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z... (ICZN)
This document discusses developing a business model for ZooBank, a proposed online registry of zoological nomenclature. It outlines elements to consider for the business model, including the scientific, technical, social, and financial models. It also discusses how ZooBank could operate within the EDIT network to establish a prototype web taxonomy and help coordinate taxonomic data infrastructure. Funding opportunities that could support ZooBank are also mentioned.
Profile-based Dataset Recommendation for RDF Data Linking (Mohamed BEN ELLEFI)
This document summarizes Mohamed Ben Ellefi's PhD thesis defense on profile-based dataset recommendation for RDF data linking. The thesis proposes two approaches: a topic profile-based approach and an intensional profile-based approach. The topic profile-based approach models datasets as topics and recommends target datasets based on similarity between source and target topic profiles, achieving an average recall of 81% and reducing the search space by 86%. The approach shows better performance than baselines but needs improvement on precision.
1) The document discusses research objects (ROs) which aim to document the full scientific process in a digital environment, including workflows, data, software, and provenance.
2) ROs in the Wf4Ever project contain detailed semantic annotations and can be aggregated into templates to help complete the scientific record.
3) Incentives for using ROs include improved reproducibility, credit for researchers, and increased citations of papers that link to their underlying data and methods.
Research Objects: more than the sum of the parts (Carole Goble)
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
This session covers topics related to data archiving and sharing. This includes data formats, metadata, controlled vocabularies, preservation, archiving and repositories.
The document provides an overview of semantic technologies and discusses their increasing mainstream adoption. It notes that Microsoft purchased Powerset in 2008, Apple purchased Siri in 2010, and Google bought Metaweb and released semantic search in 2013. It discusses how semantic technologies allow for interoperability through shared representations and reasoning. Examples are given of early semantic search applications from 1999-2002 and an operational semantic electronic medical record application deployed in 2006.
5. Availability studies
Sample of items
Available? Yes/No
Error?
Order encountered
Probabilities
Prioritize fixes
6. Development of the availability technique
• Print material availability: card catalog user surveys (Reviewed in Mansbridge 1986, Nisonger 2007)
• Linear sequence (De Prospo 1973)
• Branching model (Kantor 1976)
• Applied to e-resources: 500 articles from 50 high impact journals (Nisonger 2009)
7. OpenURL performance
• OpenURL-based reasons for availability error (Wakimoto et al. 1998)
• “Digging into the Data” on link resolver failure (Trainor and Price 2010)
• NISO Initiatives: KBART, IOTA, PIE-J (Chandler et al. 2011, Glasser 2012, Kasprowski 2012)
8. Usability studies focusing on e-resources
• Database link pages (Fry 2011, Ponsford et al. 2011b)
• Resolver menus (O’Neill 2009, Imler & Eichelberger 2011, Ponsford et al. 2011a)
• Discovery services (Williams & Foster 2011, Fagan et al. 2012)
• Entire process
9. Methodology
400 citations = 4 questions × 10 databases × 10 results

Reference chat excerpt shown on the slide (identifying a student research topic):
[18:11] redlandsreference: what is your research topic?
[18:11] meeboguest59808: Oral Motor Activity
[18:11] redlandsreference: Is this for a Communicative Disorders class?

Databases searched, by discipline:
Arts & Humanities: RILM, MLA, Philosopher’s Index
Social Sciences: America: History & Life, EconLit, Sociological Index
Sciences: Biological Abstracts, ComDisDome,
15. Error details 3: Knowledge base errors
• Title not selected in knowledge base
• Title selected, but in poorly chosen collection
• Knowledge base holdings do not reflect access entitlement (embargo, back issues, etc.)
16. Error details 4: Link resolver error
• Confusion between two similar titles
• Unusual OpenURL syntax
17. Error details 5: Target errors
• Content not loaded (supplement, embargo)
• Records concatenated from full text and non-full-text databases
• Server downtime
18. Error details 6: ILLIAD errors
• Unicode metadata not displayed properly
• rft.title used for both book title and article title; affects chapters and dissertations
20. Sampling
Necessary sample size for a yes/no condition is determined by: n = p(1 - p) × (Zc / E)²
To use this, you need:
•Availability rate from a small pre-test
•Choose acceptable % confidence (95%)
•Choose acceptable margin of error (+/- 5%)
Plug values into the formula…
•p = 0.625 (250 / 400 successes)
•1-p = 0.375 (150 / 400 errors)
•C = 0.95 (95% confidence)
•Zc = 1.96 (statistical textbook or
http://www.measuringusability.com/pcalcz.php)
•E = 0.05 (5% error)
I could have just used 360 citations…
21. Confidence
Your confidence in a study of a particular sample size is given by: Zc = E × √(n / (p(1 - p)))
I could have just used 360 citations…
29. Summary
• 400 citations obtained through likely keyword searches of 10 A&I databases
• 62% availability / 38% error rate (98% confidence, +/- 5%)
• 26% downloadable full text
• Responses include fixing proxy, kb holdings, interfaces, upgrading systems
• Strengths: quant + qual data, very flexible (n=100 allows 85% confidence)
• Weaknesses: does not account for issues with interfaces, searching or evaluation faced by actual users
30. Towards availability testing with live students
• More barriers:
  o confusing interfaces
  o difficulty formulating searches and evaluating sources
  o login errors
• How to test:
  o cognitive walkthrough + recorded task protocols
  o analysis informs information literacy and interface design
• Deliverables:
  o availability %
  o branching model
  o usability report
This is the online version of my presentation given March 5, 2013 at SCELC Research Day, Loyola Marymount University.
This diagram presents an overview of Armacost Library’s e-resource discovery infrastructure. Five systems (proxy server, source database, knowledge base/link resolver, target database and ILL system) must work together using common standards for students and faculty to be able to discover full text.
Electronic resource errors cost libraries in terms of unrealized value on paid-for content that cannot be accessed, and in terms of staff time spent on troubleshooting. Unnecessary ILL requests also add staff costs and IFM/copyright charges. Errors frustrate student and faculty expectations and undermine library staff confidence in the accuracy of their own systems for day-to-day use. Scarce physical and fiscal resources are already compelling libraries to justify their relevance to their campuses; unavailable e-resources only fuel skeptics’ concerns. Errors also require instruction librarians to take precious course time away from higher-order thinking skills to explain technical workarounds and search mechanics in greater detail.
My research study asks the question: how often can Armacost Library users get to the full text of sources they find in abstracting and indexing databases? My study includes, but is not limited to, investigation of OpenURL linking. I operationalized “availability” as two separate factors: students’ ability to download the full text of a source, and the likelihood that users would receive an error as opposed to finding that a source was available in any way (via download, in the physical library, or via ILL).
Availability studies are a systems analysis research method designed to find out why libraries are unable to supply materials to readers, and to prioritize troubleshooting efforts. The method was first used in an academic library in 1934 (Gaskill). Investigators generate a sample of items and attempt to retrieve them from the stacks, or download them online. All unavailable items are classified according to the reason why they could not be obtained. Problems can be sorted in the order that a student would encounter them, and assigned probabilities of occurring based on their frequency in the sample. Ideally, librarians would then fix the most frequent problems first.
Nisonger and Mansbridge’s review articles give a succinct overview of the availability technique and findings from numerous studies. De Prospo, Kantor and Nisonger have also contributed significantly to our knowledge of this research method.
In addition to the literature on availability studies, research on OpenURL performance was also relevant to my study. These investigations focus on one source of error – the library’s knowledge base. Researchers tested samples of OpenURL links to determine proportions of available and erroneous items. Problems frequently involved the metadata “supply chain” linking publishers, database vendors and knowledge base providers. Several NISO initiatives have sought to improve the quality of e-resource metadata to reduce the frequency of metadata-related errors.
Many library website usability studies have focused on how students access electronic resources. These studies focus on interface design and vocabulary issues that affect electronic resource availability. Researchers have used a variety of usability methods, including task protocols and cognitive walkthroughs. Studies have either isolated parts of the library’s online presence or examined the entire process a student would use, as in Kress’s study of the reasons why students might place an unnecessary ILL request for an article contained in a subscribed e-journal.
I collected a sample of 400 citations by identifying 4 actual student research topics (mentioned in our reference transactions) and searching the topic keywords in each of 10 A&I databases covering a variety of subject areas. I attempted to retrieve the full text of the first 10 search results from each database (I did not modify the default sort order or page to subsequent result screens in order to more accurately simulate student research behavior)
For each of the 400 items tested, I recorded bibliographic metadata in an Excel spreadsheet (see Google spreadsheet link). I also collected “incoming” (from source A&I database to link resolver, see yellow “find full text” link in screenshot) and “outbound” (from link resolver to target full text database, see red circles in screenshot) OpenURLs for each item and pasted them into the spreadsheet. Finally, I recorded ability to download full text and availability as two separate yes/no parameters. (An item could be either available or erroneous; not all available items were available via full-text download.) After testing all items, I went back and assigned a category of error to each unavailable item.
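For readers unfamiliar with OpenURLs, the sketch below shows how the key-value metadata in links like those collected above can be pulled apart programmatically; the resolver host and citation values are invented for illustration.

```python
# Illustration of pulling citation metadata out of an OpenURL for a
# spreadsheet; the sample link and resolver host are invented.
from urllib.parse import urlsplit, parse_qs

openurl = ("http://resolver.example.edu/openurl?url_ver=Z39.88-2004"
           "&rft.genre=article&rft.jtitle=Notes&rft.atitle=Review"
           "&rft.volume=68&rft.date=2011&rft.spage=85")

fields = {k: v[0] for k, v in parse_qs(urlsplit(openurl).query).items()}
row = {key: fields.get(f"rft.{key}", "") for key in
       ("genre", "jtitle", "atitle", "volume", "date", "spage")}
print(row)
# {'genre': 'article', 'jtitle': 'Notes', 'atitle': 'Review', ...}
```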
Error categories require judgment calls on the part of the investigator. Many errors (such as incorrect publisher metadata) are not evident at their point of origin, and are only detectable through problems that occur later in the retrieval process. I developed six error categories roughly matching the five systems involved in e-resource retrieval. The next seven slides present an overview of the categories and examples of common errors of each type. The availability and openURL-testing literature contains some discussion of what constitutes an unavailable or erroneous resource. I chose to treat ILL requests as a normal part of the retrieval process, rather than as a failure of the library to obtain all items a user might need (something which is no longer possible even for libraries with the most comprehensive collection development policies). I also specified that each item must link directly to a screen providing HTML or PDF full text; screens such as the one pictured here, where the link leads to a list of items, could confuse students, so I counted them as errors.
A side-by-side comparison of causes for error in the print and online environments demonstrates the additional complexity of conducting research in an online environment.
Failure can be localized to the proxy server because the full text target database’s domain is missing from the proxy server forward table, because the proxy server SSL certificate does not contain the full text target database domain, or because the proxy server slowed the connection significantly, causing the web browser to time out. User logins are another source of error (not tested in this study)
The source A&I database can cause problems due to its interface design (not tested in this study) or because metadata are missing or erroneous. This can cause the link resolver to fail or delay the processing of ILL requests. Our ILL staff frequently need to verify requests with duplicate information in the article and journal title fields or nonsensical dates (e.g. “0001”). Libraries that have configured their ILL system to automatically send requests (e.g. ILLIAD Direct Request) will experience slower performance as erroneous requests are flagged by the system for human intervention.
Library staff are responsible for selecting titles and collections in knowledge bases that reflect their subscription entitlements. Errors occur if the entire title, or the starting and ending range of the library’s holdings of that title, are either selected when they shouldn’t be (“false positive” error) or not selected when they should be (“false negative” error). Sometimes the same title is listed in multiple collections; library staff must choose the collection with the most complete metadata or risk errors such as the missing article-level link illustrated here (the “SCELC Wiley-Blackwell Collection” lacked information necessary to achieve article-level linking throughout the collection, while a different collection with the same titles contained that information). Publishers and knowledge base vendors can also contribute to problems at this stage, when a publisher does not notify the knowledge base vendor in a timely manner of publication changes, or when a knowledge base vendor does not accurately reflect the publisher’s embargo or other information pertaining to access.
Link resolvers and target databases contributed relatively few errors in my study. Most link resolver errors involved a failure to draw a match between the requested citation and the item in the target resource. This could be due to idiosyncratic metadata or even to variation in libraries’ cataloging practices. In this example, an article from Costerus, a journal not held online, was then run as a catalog title search, which matched on an issue that had been cataloged as a serial monograph. (This problem is likely incomprehensible to our undergraduates.)
Target databases most commonly generated errors because of missing content (either because their publisher agreement forbids loading that content or because they had not notified the link resolver of an embargo). Interface issues represent another source of error not tested in my study. One provider’s tendency to concatenate records for the same item from multiple databases (one containing full text, one containing only an abstract) created problems when the full text record was consistently “hidden” in favor of the abstract-only record (which was consistently targeted by the link resolver)
Many errors manifested at the point of submitting an ILL request. Articles with foreign-language characters in the title did not display properly because the then-current version of ILLIAD did not support Unicode. (A subsequent upgrade fixed the problem). Also, when ILLIAD received an OpenURL that only used rft.title, it listed the field twice, in both journal name and article name. Our ILL staff frequently referred these issues to me because they were not sure which was the journal title (used to select the correct OCLC record to request)
These sample findings can be generalized to the entire population of all e-resources at Armacost Library with over 95% confidence and +/-5% margin of error (see next slide). Out of every 100 citations, I would expect to find: 38 errors significant enough to prevent a student from obtaining full text or successfully placing an ILL request; 34 potentially successful ILL requests; 2 items available from the physical collection; and 26 full text downloads.
Full-text availability and the presence of error are yes/no (Bernoulli or binomial) outcomes. Statistical textbooks give the formula for determining the sample size for a binomial population. You will need to conduct a pre-test first to obtain values for the proportion of successful and unsuccessful outcomes. You can choose the confidence (c) and margin of error (E) values arbitrarily. The lower the confidence and the wider the margin of error you accept, the weaker your study, but the easier it is to conduct because you can use a smaller sample. The value Zc is found in a table online or in a statistical textbook. It is related to your confidence: the higher your confidence, the greater Zc becomes.
Rewriting the equation to solve for Zc gives you this equation, which lets you state your level of confidence in a study of a particular sample size. Look up the Zc value in the table of standard normal distributions or online to determine the confidence probability. Note that small, convenient samples can still obtain a reasonably high confidence probability.
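The two formulas can be checked with a few lines of Python using the values from the slides (p = 0.625, E = 0.05, Zc = 1.96, and n = 100 for the smaller sample).

```python
# Plugging the slide values into the two formulas.
from math import sqrt

p, E = 0.625, 0.05          # pre-test availability rate, margin of error

# Sample size needed at 95% confidence (Zc = 1.96):
Zc = 1.96
n = p * (1 - p) * (Zc / E) ** 2
print(round(n))             # ~360 citations

# Zc obtainable from a smaller, convenient sample (n = 100):
n_small = 100
Zc_small = E * sqrt(n_small / (p * (1 - p)))
print(round(Zc_small, 2))   # ~1.03; look it up in a Z table for the confidence level
```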
Most of the sources I tested were articles, but dissertations and book chapters generated a disproportionately great number of errors.
Most of my errors occurred at the source database, knowledge base or ILL stages.
There was considerable variation by discipline. The music searches produced results that were rarely downloadable full-text and frequently triggered errors, while history searches frequently led me to a seamless full-text download. Several factors could be influencing these results. Different databases may have different metadata standards and publishers and are distributed by different vendors. Some databases index a lot of hard-to-obtain items like conference proceedings, while others mostly consist of journal articles. Some vendors augment their A&I search results with “linked full text” PDFs from another database on the same platform. This type of direct comparison is not useful for assigning responsibility for error, but is interesting for subject librarians who need to know the challenges students in their liaison areas may be facing.
I attacked some simpler solutions immediately after finishing my study. I added missing domains to the proxy forward table…
Upgraded ILLIAD to get Unicode support…
Corrected holdings in Serials Solutions…
… and worked with our web team to make the link resolver result screen easier to understand.
Availability studies are a flexible technique to get quantifiable information about students’ access to full text. My study was a “simulated” study because I tested the access myself, rather than using actual library patrons.
No studies have attempted an electronic resource availability study with library patrons so far. Such a study would add several potential causes for error. The study would need to incorporate usability methods, for example, conducting a cognitive walkthrough of the path Armacost Library users could follow to research a particular topic, then observing students trying to search that same topic.
Follow these links to view my literature review and dataset. Email me with questions at sanjeet_mann@redlands.edu