Understanding Information Professionals:
A Survey on the Quality of Linked Data
Sources for Digital Libraries
Jeremy Debattista, Lucy McKenna, Rob Brennan
ADAPT Centre, Trinity College Dublin, Ireland
This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204)
and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by
the European Regional Development Fund.
www.adaptcentre.ie
What is a good digital library?
• Literature: Success of DL depends on the quality of available
metadata
• How do you define good quality metadata? It’s subjective,
there is no definite answer
• Potentially, an easy question to answer, but definitely not
generic
Linked Data in Digital Libraries
• Data interoperability & re-usability
• Resource discoverability & visibility
• Data interlinking
So why the slow uptake?
• Linked Data is not a solution that solves all problems
• Quality issues, as noted in the literature
The Aims of this Study
• What quality measures do information professionals (IPs) consider important?
o Why? Can we identify the generic quality measures for the
task at hand?
• What quality problems do IPs face when using Linked
Data?
o Why? Focus quality assessment on Digital Library Linked
Datasets
Survey Methodology
• Online questionnaire
o Snowball Sampling (Twitter, Email, Mailing lists)
• 50 Questions
o Primarily multiple choice – able to add own observations
o Partially based on:
▪ Previous surveys and analysis of projects in the domain
▪ 2 data quality focused questions
Survey Methodology
• 185 participants
o Split in 2 groups:
▪ G1: Participants with experience of working with Linked Data (LD) (n=54)
▪ G2: Participants without experience of working with LD (n=131)
• Academic Library (56%), Research Institution (7%), Public
Library (7%), Special Library (6%), Archive (6%), National
Library (5%), Museum (4%), and Special Archive (1%)
• 20 countries
o Ireland (28%), the USA (23%) and the UK (20%)
Results and discussion of the whole survey
McKenna, L., Debruyne, C., O’Sullivan, D.: Understanding
the position of information professionals with regards to
linked data: A survey of libraries, archives and museums.
In: Proceedings of the 18th ACM/IEEE on Joint Conference
on Digital Libraries (JCDL 2018), Fort Worth, Texas, USA,
June 3rd-7th, 2018. pp. 7–16 (2018)
The Questions
Q1. When completing different metadata tasks, what
evaluation criteria do you apply when using, or
searching for, external data sources?
Q2. Can you give an example of a data quality issue or
concern you experience frequently?
Key Findings – Q1
GOAL: Understand what fitness for use means for the
survey participants in a digital library scenario.
• 11 dimensions and 2 generic options (none, other)
o Trustworthiness, Interoperability, Licensing,
Completeness, Understandability, Provenance, Timeliness,
Syntactic Validity, Availability, Conciseness, Versatility
Key Findings – Q1
• Statistical Testing: Do both groups consider each
measure to be of equal importance or otherwise?
• Z-score (α = 0.05)
• Reject null hypothesis for: Trustworthiness,
Interoperability and Availability.
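The group comparison above can be sketched as a two-proportion z-test. This is only an illustration, assuming the test compares the share of G1 and G2 participants who selected a given measure. The group sizes (54 and 131) and α = 0.05 come from the slides; the selection counts and all names below are hypothetical, since the slides do not report per-measure counts.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: both groups share one proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Group sizes and alpha are taken from the slides
N_G1, N_G2 = 54, 131
ALPHA = 0.05

# Hypothetical counts of participants selecting one measure (not survey data)
HYPOTHETICAL_G1_SELECTED = 40
HYPOTHETICAL_G2_SELECTED = 70

z, p = two_proportion_z_test(
    HYPOTHETICAL_G1_SELECTED, N_G1, HYPOTHETICAL_G2_SELECTED, N_G2
)
reject_null = p < ALPHA
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject_null}")
```

Rejecting the null hypothesis for a measure would mean the two groups do not rate it as equally important. The sketch does not guard against a pooled proportion of 0 or 1, where the standard error is zero.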
Key Findings – Q2
GOAL: Understanding quality pitfalls in Linked Data
datasets for Digital Libraries.
• Open question:
o 92 responses => 77 quality problems
o 14 different quality measures
Key Findings – Q2
• Semantic Accuracy
o Incorrect DOIs
o Wrong ISBNs, URI references
• Completeness / Data Coverage
o Incomplete crowdsourcing efforts
o Incomplete important fields (e.g. publication date)
o Use of old standards, leaving obligatory fields
incomplete
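Some of the identifier problems above can be caught mechanically. As a minimal sketch (the function name is hypothetical and this is not part of the survey), the standard ISBN-13 check digit rule flags mistyped ISBNs. It cannot tell whether a well-formed ISBN points to the right book.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit; hyphens and spaces are ignored."""
    digits = [c for c in isbn if c not in "- "]
    if len(digits) != 13 or any(c not in "0123456789" for c in digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... and the weighted sum must be a multiple of 10
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(digits))
    return total % 10 == 0


print(is_valid_isbn13("978-3-16-148410-0"))
print(is_valid_isbn13("978-3-16-148410-1"))
```

The first call uses a commonly cited example ISBN; the second changes only the final digit, so the checksum no longer holds.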
Key Findings – Q2
• Interoperability
o Lack of structured standards
o Metadata formats changing constantly
• Data formatting
o Inconsistent formatting of dates
o Naming inconsistencies (e.g. first name, last name vs last
name, first name)
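The formatting inconsistencies above are often handled by normalising values to one target form. The sketch below is an assumption-laden illustration: the list of accepted date formats, the ISO 8601 target, and both function names are hypothetical choices, not something reported by the survey.

```python
from datetime import datetime

# Assumed input formats, tried in order. An ambiguous value such as
# 03/04/2018 is read as day/month/year because that format is listed.
# %B matches English month names under the default locale.
ASSUMED_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y")


def normalise_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in ASSUMED_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalise_name(text):
    """Turn 'Last, First' into 'First Last'; otherwise only collapse whitespace."""
    if text.count(",") == 1:
        last, first = (part.strip() for part in text.split(","))
        return " ".join(f"{first} {last}".split())
    return " ".join(text.split())


print(normalise_date("7 June 2018"), normalise_date("June 7, 2018"))
print(normalise_name("McKenna, Lucy"), "|", normalise_name("Lucy  McKenna"))
```

Returning None for unrecognised dates keeps such values visible for manual review instead of guessing a format.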
Key Findings – Q2
• Other problems
o Conciseness - duplication
o Language Versatility – encoding problems
o Availability – resources are not always available
o Trustworthiness – credibility of the information on the
Web
o Licensing – using open datasets freely
Next Steps
• Assess the quality of LD digital libraries
o Started a monthly assessment in August 2018
o Some results can be seen at http://luzzu.adaptcentre.ie
• Identify a quality profile to generalise an answer for
“What is a good digital library?”
Conclusion
• Discussed and identified the quality measures an IP
considers for finding external sources
o The two groups did not agree on the importance of 3 measures
(trustworthiness, interoperability, and availability)
• Discussed quality problems as identified by the IPs in the
currently available data sources
o Mostly intrinsic in nature
jeremy.debattista@adaptcentre.ie
twitter: @jerdeb

みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 

Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries

  • 1. Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries Jeremy Debattista, Lucy McKenna, Rob Brennan ADAPT Centre, Trinity College Dublin, Ireland This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204) and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by the European Regional Development Fund.
  • 2. What is a good digital library? • Literature: Success of DL depends on the quality of available metadata • How do you define good quality metadata? It’s subjective, there is no definite answer • Potentially, an easy question to answer, but definitely not generic
  • 3. Linked Data in Digital Libraries • Data interoperability & re-usability • Resource discoverability & visibility • Data interlinking
  • 4. So why the slow uptake? • Linked Data is not a solution that solves all problems • Quality issues as noted by various literature
  • 5. The Aims of this Study • What quality measures do IPs consider important? o Why? Can we identify the generic quality measures for the task at hand? • What quality problems do IPs face when using Linked Data? o Why? Focus quality assessment on Digital Library Linked Datasets
  • 6. Survey Methodology • Online questionnaire o Snowball Sampling (Twitter, Email, Mailing lists) • 50 Questions o Primarily multiple choice – able to add own observations o Partially based on: ▪ Previous surveys and analysis of projects in domain ▪ 2 Data quality focused questions
  • 7. Survey Methodology • 185 participants o Split in 2 groups: ▪ G1: Participants who have experience working in LD (n=54) ▪ G2: Participants who do not have experience working in LD (n=131) • Academic Library (56%), Research Institution (7%), Public Library (7%), Special Library (6%), Archive (6%), National Library (5%), Museum (4%), and Special Archive (1%) • 20 countries o Ireland (28%), the USA (23%) and the UK (20%)
  • 8. Results and discussion of the whole survey: McKenna, L., Debruyne, C., O’Sullivan, D.: Understanding the position of information professionals with regards to linked data: A survey of libraries, archives and museums. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (JCDL 2018), Fort Worth, Texas, USA, June 3rd-7th, 2018. pp. 7–16 (2018)
  • 9. The Questions Q1. When completing different metadata tasks, what evaluation criteria do you apply when using, or searching for, external data sources? Q2. Can you give an example of a data quality issue or concern you experience frequently?
  • 10. Key Findings – Q1 Q1. When completing different metadata tasks, what evaluation criteria do you apply when using, or searching for, external data sources? Q2. Can you give an example of a data quality issue or concern you experience frequently?
  • 11. Key Findings – Q1 GOAL: Understand what fitness for use means for the survey participants in a digital library scenario. • 11 dimensions and 2 generic options (none, other) o Trustworthiness, Interoperability, Licensing, Completeness, Understandability, Provenance, Timeliness, Syntactic Validity, Availability, Conciseness, Versatility
  • 13. Key Findings – Q1 • Statistical Testing: Do both groups consider each measure to be of equal importance or otherwise? • Z-score (α = 0.05) • Reject null hypothesis for: Trustworthiness, Interoperability and Availability.
  • 14. Key Findings – Q2 Q1. When completing different metadata tasks, what evaluation criteria do you apply when using, or searching for, external data sources? Q2. Can you give an example of a data quality issue or concern you experience frequently?
  • 15. Key Findings – Q2 GOAL: Understanding quality pitfalls in Linked Data datasets for Digital Libraries. • Open question: o 92 responses => 77 quality problems o 14 different quality measures
  • 17. Key Findings – Q2 • Semantic Accuracy o Incorrect DOIs o Wrong ISBNs, URI references • Completeness / Data Coverage o Incomplete crowdsource efforts o Incomplete important fields (e.g. publication date) o Using old standards hence having incomplete obligatory fields.
  • 18. Key Findings – Q2 • Interoperability o Lacks structured standards o Metadata formats changing constantly • Data formatting o Inconsistent formatting of dates o Naming inconsistencies (e.g. first name, last name vs last name, first name)
  • 19. Key Findings – Q2 • Other problems o Conciseness - duplication o Language Versatility – encoding problems o Availability – resources are not always available o Trustworthiness – credibility of the information on the Web o Licensing – using open datasets freely
  • 20. Next Steps • Assess the quality of LD digital libraries o Started a monthly assessment in August 2018 o Some results can be seen at http://luzzu.adaptcentre.ie • Identify a quality profile to generalise an answer for “What is a good digital library?”
  • 21. Conclusion • Discussed and identified the quality measures an IP considers for finding external sources o no agreement on importance or otherwise for 3 metrics (trustworthiness, interoperability, and availability) • Discussed quality problems as identified by the IPs in the currently available data sources o Mostly intrinsic in nature jeremy.debattista@adaptcentre.ie twitter: @jerdeb
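
The slides describe the group comparison only as a z-score test at α = 0.05. The sketch below shows one common form of such a test, the pooled two-proportion z-test, on the assumption that something like it was applied per quality measure. The group sizes (54 and 131) come from the slides. The selection counts and the function name are hypothetical placeholders for illustration, not survey results.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (two-sided).

    x1, x2: number of participants in each group who selected a measure.
    n1, n2: group sizes.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis both groups share one proportion,
    # so the standard error is computed from the pooled estimate.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Group sizes from the slides: G1 (LD experience) and G2 (no LD experience).
N_G1, N_G2 = 54, 131
# HYPOTHETICAL counts of participants selecting one measure in each group.
HYPOTHETICAL_G1, HYPOTHETICAL_G2 = 40, 70
ALPHA = 0.05

z, p_value = two_proportion_z_test(HYPOTHETICAL_G1, N_G1, HYPOTHETICAL_G2, N_G2)
decision = "reject" if p_value < ALPHA else "fail to reject"
print(f"z = {z:.3f}, p = {p_value:.4f}: {decision} the null hypothesis")
```

Rejecting the null hypothesis here would mean the two groups selected the measure at different rates. As the speaker notes caution, it says nothing about which measure is most important overall.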

Editor's Notes

  1. say that we use the term IPs for people working in the DL domain
  2. ASK the room: what is a good digital library? most literature states that the success of digital libraries is mostly dependent on the quality of the available metadata. however, this is quite ambiguous, as defining quality is subjective and mostly depends on the task at hand; in this case different institutions have different needs, which are mostly coupled with information professionals' experiences and roles. an easy question to answer, however we cannot generalise which library is best for all cases and suitable for everyone
  3. IPs realised that LD offers many benefits: since we are using a standardised data model, sharing and re-use of metadata across DLs, potentially reducing record duplication; discoverability by various agents (e.g. using RDFa within HTML pages would enable search engines such as Google to retrieve your data in a meaningful manner); interlinking of related resources
  4. - challenges in LD, mostly wrt quality
  5. we did a survey to understand: (1) what kind of quality measures IPs consider important; (2) what are the problems they face when using linked data. for (1) we want to build a system that automatically suggests what quality measures one needs for a particular task at hand, which is what we are working on; for (2) we want to figure out what kinds of quality issues they encounter and try to validate these issues by actually assessing various linked data digital libraries
  6. Results of OCLC survey & Library survey; analysis of LAM LD projects & LD tooling; prior work with Digital Repository of Trinity College Dublin (McKenna et al., 2017)
  7. The 185 questionnaires that were analysed were classified into two groups: participants who have experience working with Linked Data (N = 54) (group 1), and participants who do not have experience working with Linked Data (N = 131) (group 2).
  8. our goal: understand what fitness for use means to the participants in different groups in DLs. provided a list of quality measures taken from various literature, trying to represent the important measures for both DL and LD: Trustworthiness (e.g. Can this provider be trusted that all data is correct?); Interoperability (e.g. Does the external source use well-known standard schemas to represent the data?); Licensing issues (e.g. Can I use this external source freely?); Completeness (e.g. Do all external metadata fields have values?); Understandability (e.g. Are all records in the external source labelled and ready for human consumption?); Provenance (e.g. Does the external source provide provenance/origin information on the data?); Timeliness (e.g. Are all records up to date?); Syntactic validity (e.g. Are dates in the correct format, correct spelling?); Availability of the external source (e.g. SPARQL endpoint is accessible); Conciseness (e.g. Is there any redundancy within the external source?); Versatility (e.g. Is the data available in different languages?)
  9. all participants answered this question, 16 of them being unsure or not caring about quality at all. We also had 9 participants who mentioned dimensions different from those listed, marked in brackets and in italics in the table. The aggregated results show that trustworthiness seems to be the most frequently selected criterion, at around 67%, followed by interoperability and licensing. however, we cannot statistically or scientifically state that trustworthiness is the most important criterion, as we cannot assume that participants chose on the basis of what is most important
  10. statistical test to find evidence whether the two groups consider a measure to be of equal importance or otherwise. for this we defined a null hypothesis and an alternative hypothesis and used a z-score with a significance level of 0.05 to identify whether there is strong enough evidence to reject the null hypothesis or not. the tests show that there is no supporting evidence to suggest that the measures trustworthiness, interoperability, and availability are of equal importance.
  11. the goal of this 2nd question was to understand better the quality problems IPs find in LD, which are hence contributing to the slow uptake. this question was an open question and was answered by 92 participants, of which 15 were out of scope; the rest were classified into 14 different measures, including semantic accuracy, completeness, and conciseness
  12. most problems were intrinsic in nature, meaning that problems were related to the data in the dataset itself, followed by representational problems, meaning the way data is represented for consumption
  13. not going through all problems, but will mention some interesting ones. in semantic accuracy, most participants complained about the presence of incorrect values in various fields of a catalogue resource, mostly due to misspellings. in completeness, a number of participants cast doubt on whether information in crowdsourced efforts such as Wikipedia (and hence DBpedia) is correct and complete. furthermore, participants also complained about datasets that are still using old best practices and standards and hence have incomplete fields, such as the publication date
  14. wrt interoperability, participants mostly noted that there is a lack of consensus on what standards to use, and even when there seems to be an agreement, the formats are constantly changing. on similar lines to interoperability, participants noted that there are also inconsistencies in how to represent the data within fields, such as dates and naming standards
  15. there were other problems highlighted by the participants, for example: the duplication of records, introducing redundancy and increasing errors; encoding of characters, e.g. usage of the Cyrillic alphabet in international authority data; the 24/7 availability of data and reliability of online services; a participant noted that he/she would trust datasets published by his/her own institution more than information readily available on the web; to what extent can I use a particular external dataset, if the license is not clear or readily available in the dataset?
  16. - Discussed and identified the quality measures an IP considers for finding external sources - no agreement on importance for 3 metrics (trustworthiness, interoperability, and availability) - Discussed quality problems as identified by the IPs in the currently available data sources - problems are mostly intrinsic in nature, identifying semantic accuracy and interoperability as worrying dimensions in which LD should excel - should serve as a starting point for LD publishers to update their publishing mechanisms
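
Two of the problems reported for Q2, incomplete fields and inconsistently formatted dates, lend themselves to simple mechanical checks. The sketch below is only an illustration of that idea, not the assessment method used in the study. The function names, the field names, and the choice of ISO 8601 (`YYYY-MM-DD`) as the expected date format are all assumptions.

```python
import re
from datetime import date

# Assumed target format for dates: ISO 8601 calendar dates, e.g. 2018-09-03.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def completeness(records, required_fields):
    """Fraction of required field slots that hold a non-empty value."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    filled = sum(
        1
        for record in records
        for field in required_fields
        if str(record.get(field) or "").strip()
    )
    return filled / total


def is_iso_date(value):
    """True if value is a string holding a real YYYY-MM-DD calendar date."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2018-02-30
    except ValueError:
        return False
    return True


# HYPOTHETICAL catalogue records, invented for illustration only.
records = [
    {"title": "Record A", "date": "2018-09-03"},
    {"title": "Record B", "date": "03/09/2018"},
    {"title": "Record C", "date": ""},
]

print("completeness:", completeness(records, ["title", "date"]))
print("valid dates:", [is_iso_date(r["date"]) for r in records])
```

Scores like these only flag candidate problems. Whether a flagged record matters still depends on the task at hand, which is the point the survey makes about fitness for use.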