Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries

Jeremy Debattista, Lucy McKenna, Rob Brennan
ADAPT Centre, Trinity College Dublin, Ireland
This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204)
and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by
the European Regional Development Fund.

www.adaptcentre.ie

What is a good digital library?
• Literature: the success of a digital library (DL) depends on the quality of
its available metadata
• How do you define good-quality metadata? It is subjective;
there is no definitive answer
• Potentially an easy question to answer for a specific case, but the
answer does not generalise

Linked Data in Digital Libraries
• Data interoperability & re-usability
• Resource discoverability & visibility
• Data interlinking

So why the slow uptake?
• Linked Data is not a solution that solves all problems
• Quality issues, as noted in the literature

The Aims of this Study
• What quality measures do information professionals (IPs) consider important?
o Why? Can we identify the generic quality measures for the
task at hand?
• What quality problems do IPs face when using Linked
Data?
o Why? Focus quality assessment on Digital Library Linked
Datasets

Survey Methodology
• Online questionnaire
o Snowball Sampling (Twitter, Email, Mailing lists)
• 50 Questions
o Primarily multiple choice – able to add own observations
o Partially based on:
▪ Previous surveys and analysis of projects in the domain
▪ 2 data-quality-focused questions

Survey Methodology
• 185 participants
o Split in 2 groups:
▪ G1: Participants who have experience working with Linked Data (LD) (n=54)
▪ G2: Participants who do not have experience working with LD (n=131)
• Academic Library (56%), Research Institution (7%), Public
Library (7%), Special Library (6%), Archive (6%), National
Library (5%), Museum (4%), and Special Archive (1%)
• 20 countries
o Ireland (28%), the USA (23%) and the UK (20%)

Results and discussion of the whole survey
McKenna, L., Debruyne, C., O’Sullivan, D.: Understanding
the position of information professionals with regards to
linked data: A survey of libraries, archives and museums.
In: Proceedings of the 18th ACM/IEEE on Joint Conference
on Digital Libraries (JCDL 2018), Fort Worth, Texas, USA,
June 3rd-7th, 2018. pp. 7–16 (2018)

The Questions
Q1. When completing different metadata tasks, what
evaluation criteria do you apply when using, or
searching for, external data sources?
Q2. Can you give an example of a data quality issue or
concern you experience frequently?

Key Findings – Q1

GOAL: Understand what fitness for use means for the
survey participants in a digital library scenario.
• 11 dimensions and 2 generic options (none, other)
o Trustworthiness, Interoperability, Licensing,
Completeness, Understandability, Provenance, Timeliness,
Syntactic Validity, Availability, Conciseness, Versatility

• Statistical Testing: Do both groups consider each
measure to be of equal importance or otherwise?
• Z-score (α = 0.05)
• Reject null hypothesis for: Trustworthiness,
Interoperability and Availability.
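
The group comparison above can be sketched as follows. This is a minimal illustration that assumes the test is a pooled two-proportion z-test, which the slide does not state explicitly. The group sizes (54 and 131) come from the survey; the selection counts are hypothetical and are not survey results.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: both groups select a measure
    at the same rate, using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ALPHA = 0.05
N_G1, N_G2 = 54, 131  # group sizes reported in the survey

# HYPOTHETICAL counts of participants selecting one measure in each group
z, p = two_proportion_z_test(40, N_G1, 70, N_G2)
reject_null = p < ALPHA
```

Rejecting the null hypothesis for a measure means the two groups differ significantly in how often they selected it.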

Key Findings – Q2

GOAL: Understanding quality pitfalls in Linked Data
datasets for Digital Libraries.
• Open question:
o 92 responses => 77 quality problems
o 14 different quality measures

• Semantic Accuracy
o Incorrect DOIs
o Wrong ISBNs, URI references
• Completeness / Data Coverage
o Incomplete crowdsourcing efforts
o Important fields left incomplete (e.g. publication date)
o Use of old standards, leaving obligatory fields
incomplete
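
As an illustration of the completeness problems listed above, per-field completeness can be computed as the share of records with a non-empty value. This is only a sketch: the function name, the record layout and the sample records are hypothetical and do not come from the survey or from any specific assessment tool.

```python
def field_completeness(records, required_fields):
    """Fraction of records with a non-empty value, per required field.

    A value counts as empty if it is missing, None, or blank after
    stripping whitespace.
    """
    total = len(records)
    result = {}
    for field in required_fields:
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result


# HYPOTHETICAL metadata records
records = [
    {"title": "A", "publication_date": "2018-06-03"},
    {"title": "B", "publication_date": ""},
    {"title": "C"},
    {"title": "", "publication_date": "1999"},
]
scores = field_completeness(records, ["title", "publication_date"])
```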

• Interoperability
o Lacks structured standards
o Metadata formats changing constantly
• Data formatting
o Inconsistent formatting of dates
o Naming inconsistencies (e.g. first name, last name vs last
name, first name)
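
The two formatting problems above can be detected with simple checks. The sketch below is an assumption-laden illustration: the pattern labels, function names and sample values are hypothetical, and real datasets would need many more date patterns and name forms.

```python
import re
from collections import Counter

# A few common date layouts; an illustrative, incomplete set
DATE_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "D/M/YYYY or M/D/YYYY": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "YYYY": re.compile(r"\d{4}"),
}


def classify_date(value):
    """Return the label of the first pattern matching the whole value."""
    text = value.strip()
    for label, pattern in DATE_PATTERNS.items():
        if pattern.fullmatch(text):
            return label
    return "unrecognised"


def date_format_profile(values):
    """Count how many values fall under each date layout."""
    return Counter(classify_date(v) for v in values)


def normalise_name(name):
    """Rewrite 'Last, First' as 'First Last'; leave other forms unchanged."""
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        if first and last:
            return f"{first} {last}"
    return name.strip()


# HYPOTHETICAL sample values
profile = date_format_profile(["2018-06-03", "03/06/2018", "2018", "June 2018"])
```

A profile with more than one layout signals inconsistent date formatting in the field being checked.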

• Other problems
o Conciseness - duplication
o Language Versatility – encoding problems
o Availability – resources are not always available
o Trustworthiness – credibility of the information on the
Web
o Licensing – using open datasets freely

Next Steps
• Assess the quality of Linked Data digital libraries
o Started a monthly assessment in August 2018
o Some results can be seen at http://luzzu.adaptcentre.ie
• Identify a quality profile to generalise an answer for
“What is a good digital library?”

Conclusion
• Discussed and identified the quality measures an IP
considers when searching for external data sources
o The two groups did not agree on the importance of 3 measures
(trustworthiness, interoperability, and availability)
• Discussed quality problems as identified by the IPs in the
currently available data sources
o Mostly intrinsic in nature
jeremy.debattista@adaptcentre.ie
twitter: @jerdeb
