In this presentation we discuss the results of a survey we conducted with Information Professionals regarding the quality of external Linked Data sources used in digital library metadata. It was presented at ODBASE 2018 (http://www.otmconferences.org/index.php/conferences/odbase-2018).
Article Citation:
Jeremy Debattista, Lucy McKenna, Rob Brennan: Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries. OTM Conferences (2) 2018: 537-545
Acknowledgement:
This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204) and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by the European Regional Development Fund.
#LoveIrishResearch
This document summarizes a presentation on research data metrics from the NISO Altmetrics Working Group B. It discusses various metrics for research data, including citations of datasets and metadata, full-text search of datasets, downloads, and usage statistics. It also describes projects from DataCite and the Making Data Count initiative that are working to develop standard metrics for research data and make them available via APIs. Future work discussed includes analyzing networks of linked datasets and second-order citations.
Sarah Jones: RDM from a disciplinary perspective (Jisc)
This document discusses research data management from a disciplinary perspective. It begins with an overview of case studies on disciplinary practice from various sources. It then groups disciplines into Arts & Humanities, Social Sciences, Sciences & Engineering, and Life Sciences. For each group, it discusses common practices, challenges, and examples. It also discusses a research data typology commissioned by RLUK to help librarians understand researchers' data needs and types of data across disciplines. Overall, the document provides a high-level overview of differences in research data management practices across broad disciplinary categories.
Why science needs open data – Jisc and CNI conference, 10 July 2014 (Jisc)
This document discusses the importance of open data in science. It provides 4 key reasons why open data is important:
1) It allows for identification of patterns in large datasets that could not be found otherwise.
2) It enables data modeling through iterative integration of initial models with observational data.
3) It facilitates deeper integration and analysis of diverse linked datasets.
4) It supports exploitation of networked sensor data through acquisition, integration, analysis and feedback.
However, open data needs to be "intelligently open" through being discoverable, accessible, intelligible, assessable and reusable to realize its full potential. Mandating such intelligent open data is important to drive an open data infrastructure ecology.
Rachel Bruce: UK research and data management – where are we now? (Jisc)
The document discusses the state of research data management in UK universities. It finds that while areas like data cataloguing and access/storage systems are progressing, governance of data access/reuse and digital preservation/planning are lagging. Barriers to progress include low researcher priority, funding availability, and lack of staff/infrastructure. Gaps include defining responsibilities, standards, costs, and tools. Coordination and sharing resources across institutions is needed to help universities advance research data management.
This presentation was provided by Gabriela Mejias of ORCID, during the NISO hot topic virtual conference "Open Research." The event was held on November 17, 2021.
Mike Mertens: Directions for RDM – day one summary (Jisc)
This document discusses directions for research data management in UK universities. It focuses on the business case and sustainability for implementing research data management plans and services. Key points include identifying the need, risks of not having plans, staffing and storage costs, advocacy efforts, and long-term preservation strategies. The document also discusses incentives for researchers to properly manage data, such as reward structures, compliance monitoring, opportunities for data publication and citation, integration of support systems, and aligning job descriptions with open data practices. Overall it provides guidance on justifying research data management programs through identifying institutional needs and risks, accounting for costs and scalability, and incentivizing researcher participation.
Research Data Management in Academic Libraries: Meeting the Challenge (Spencer Keralis)
TLA Program Committee sponsored Preconference talk from Texas Library Association Conference 2013.
CPE#388: SBEC 1.0; TSLAC 1.0
April 24, 2013; 4:00-4:50 pm
Managing research data is a hot topic in academic libraries. With increased government oversight of publicly funded research projects, librarians must strive to meet the demand for innovative solutions for managing research information and for training the new generation of librarians to address this issue.
This document summarizes Helen Henderson's presentation on institutional identifiers. It discusses existing standards like ONIX, COUNTER, and ISSN, as well as new standards being developed like KBART, Project TRANSFER, and CORE. It outlines several scenarios where institutional identifiers could be used, such as in the electronic resources supply chain, eLearning, research funding, and author registries. It describes the stakeholders involved in each scenario and key issues to address. Finally, it provides the timeline and work plan for the NISO working group developing a new institutional identifier standard.
Why does research data matter to libraries? (Jisc RDM)
- Research data matters to libraries because it is increasingly being produced and collected by researchers, and there are growing requirements to manage and preserve it.
- A survey found that while most researchers currently manage their own data, there is a trend toward using institutional repositories and libraries more for long-term preservation.
- Libraries are well-suited to help with research data management because of their experience organizing and describing information over long periods of time, but there are also challenges due to differences across disciplines in how data is defined and treated.
- As funders and journals require better data sharing practices, libraries have an opportunity to take a more active role in helping researchers and institutions capture, describe, and manage research data over time.
How metadata drives data sharing – UK Data Archive (Louise Corti)
The document discusses metadata and its importance for archiving survey data. It summarizes that metadata drives access to survey data through online browsing systems by providing essential documentation about the variables, questions, and structure of the surveys. It notes common issues with deposited survey metadata including a lack of consistent variable naming and incomplete documentation of changes over time. Improving metadata practices throughout the data lifecycle from production to archiving is important to support reuse of the data.
This document discusses leveraging the DDI (Data Documentation Initiative) model for linked statistical data in the social, behavioral, and economic sciences. It outlines how the DDI was developed as an ontology, including using use cases to identify important elements to model and mapping existing DDI-XML documents to DDI-RDF. A key use case is discovering microdata connected across multiple studies based on dimensions like time, country, and subject. The document walks through examples of queries this ontology would support, such as finding questions associated with a concept or the maximum value of a variable. It concludes by identifying some open issues to address in the DDI ontology.
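The two example queries mentioned above (questions associated with a concept, and the maximum value of a variable) can be sketched against a toy in-memory triple store. The vocabulary terms below (`ex:hasValue`, `ex:measuresConcept`) are illustrative placeholders, not the actual DDI-RDF (Disco) terms.

```python
# Toy triple store: a list of (subject, predicate, object) tuples.
# Term names are illustrative placeholders, not the real DDI-RDF vocabulary.
triples = [
    ("var:age", "rdf:type", "disco:Variable"),
    ("var:age", "ex:hasValue", 23),
    ("var:age", "ex:hasValue", 57),
    ("var:age", "ex:hasValue", 41),
    ("q:q1", "rdf:type", "disco:Question"),
    ("q:q1", "ex:measuresConcept", "concept:age"),
]

def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Use case 1: the maximum value of a variable.
max_value = max(objects("var:age", "ex:hasValue"))

# Use case 2: questions associated with a concept.
questions = [s for s, p, o in triples
             if p == "ex:measuresConcept" and o == "concept:age"]
```

In a real DDI-RDF deployment these would be SPARQL queries (e.g. `SELECT (MAX(?v) AS ?max)` with an aggregate) against an RDF store rather than list comprehensions.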
This presentation was provided by Glenn Hampson of Open Scholarship Initiative, during the NISO hot topic virtual conference "Open Research." The event was held on November 17, 2021.
This presentation was provided by Dr. Paul Burton of the University of Bristol during the NISO Symposium, Privacy Implications of Research Data, held on September 11, 2016, in conjunction with the International Data Week in Denver, Colorado.
The document discusses the importance of metadata interoperability and factors to consider when developing metadata implementation strategies. It covers challenges with expanding metadata types and use, the importance of standardizing metadata for exchange and reuse, and considerations for developing strategies including defining the entity being described, applicable standards, and testing implementation. The presentation provides examples of metadata integration between library systems and highlights ONIX as a potential standard for expressing license terms between organizations.
This document summarizes a workshop on authority files. It discusses how authority files can transform from library silos to a web of linked data by uniquely identifying entities like people, publications, organizations, and connecting them using identifiers. Four use cases are presented: developing a repository authority file, enhancing a journal authority file to track open access evolution, integrating existing authority files to make cultural data web compliant, and using authority files to enable new analyses and business intelligence from research information systems. The benefits of authority files for discovery, reliability, accountability, and efficiency are outlined. An example of crosswalking different authority files is also provided. The document concludes with an opinion poll on authority file topics.
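The crosswalking example above amounts to a join on a shared identifier. A minimal sketch follows; the record structure and the `viaf` key are hypothetical, chosen only to illustrate matching two authority files through a common identifier.

```python
# Two hypothetical authority files, keyed by local record IDs.
# Records are linked through a shared identifier (here called "viaf").
repo_authority = {
    "person/42": {"name": "Ada Lovelace", "viaf": "12345"},
    "person/43": {"name": "Grace Hopper", "viaf": "67890"},
}
journal_authority = {
    "auth-7": {"name": "Lovelace, Ada", "viaf": "12345"},
}

def crosswalk(left, right, key="viaf"):
    """Pair record IDs from two authority files that share an identifier."""
    index = {rec[key]: rid for rid, rec in right.items() if key in rec}
    return {rid: index[rec[key]]
            for rid, rec in left.items() if rec.get(key) in index}

links = crosswalk(repo_authority, journal_authority)
```

Even when the name strings differ ("Ada Lovelace" vs. "Lovelace, Ada"), the shared identifier lets the two records be linked without string matching, which is the core benefit the workshop attributes to authority files.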
The LACE project connects players in the fields of learning analytics and educational data mining to support the development of a European community and share best practices. The project aims to promote knowledge sharing, increase the evidence base of learning analytics, and contribute to defining its future directions. Key activities include organizing events, creating a knowledge base of learning analytics evidence, and producing reviews on latest developments in the field.
Indonesia Open Data Initiative - Kofera Technology (Bachtiar Rifai)
The Indonesian public's enthusiasm for machine learning research is on the rise. Together, Kofera and Data Science Indonesia launched the "Indonesia Open Data Initiative" to tackle barriers to entry in the machine learning research field.
PaNOSC and Research Data Management / Battery2030+ Initiative Workshop / 12 M... (PaNOSC)
On March 12th, 2021, PaNOSC coordinator Andy Götz gave an invited talk at the 2nd online workshop of the Battery2030+ Initiative, which focused on the benefits of research data management (RDM) and related guidelines, illustrated through best-practice examples including PaNOSC.
Open Data Bay Area: Interesting Problems in Academic Data (William Gunn)
This document discusses several problems in academic data including:
1. Academic data includes metadata about scholarly outputs but is not like commercial data from companies.
2. Academia has a conservative culture and incentives prioritize publishing over openly sharing data and code.
3. Issues around making data and code citable and improving reproducibility need to be addressed through initiatives like DataCite and CrossRef.
4. Problems like author disambiguation, increasing age of grant awardees, and recommender systems would benefit from better academic data.
This presentation was jointly provided by Darby Orcutt and Susan Ivey, both of North Carolina State University during the NISO Virtual Conference, That Cutting Edge: Technology's Impact on Scholarly Research Processes in the Library, held on October 24, 2018.
Stereotype and most popular recommendations in the digital library Sowiport (Joeran Beel)
Stereotype and most-popular recommendations are widely neglected in the research-paper recommender-system and digital-library community. In other domains such as movie recommendations and hotel search, however, these recommendation approaches have proven their effectiveness. We were interested to find out how stereotype and most-popular recommendations would perform in the scenario of a digital library. Therefore, we implemented the two approaches in the recommender system of GESIS’ digital library Sowiport, in cooperation with the recommendations-as-a-service provider Mr. DLib. We measured the effectiveness of most-popular and stereotype recommendations with click-through rate (CTR) based on 28 million delivered recommendations. Most-popular recommendations achieved a CTR of 0.11%, and stereotype recommendations achieved a CTR of 0.124%. Compared to a “random recommendations” baseline (CTR 0.12%), and a content-based filtering baseline (CTR 0.145%), the results are discouraging. However, for reasons explained in the paper, we concluded that more research is necessary about the effectiveness of stereotype and most-popular recommendations in digital libraries.
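For reference, the click-through rate used above is simply clicks divided by delivered recommendations, expressed as a percentage. The counts in this sketch are made up for illustration; only the percentages come from the abstract.

```python
def ctr_percent(clicks, impressions):
    """Click-through rate as a percentage of delivered recommendations."""
    return 100.0 * clicks / impressions

# Hypothetical counts chosen to reproduce the rates reported in the abstract.
stereotype = ctr_percent(124, 100_000)    # stereotype recommendations: 0.124%
most_popular = ctr_percent(110, 100_000)  # most-popular recommendations: 0.11%
```

At rates this low, small absolute differences (0.11% vs. a 0.12% random baseline) are why the authors call the results discouraging and argue for further study rather than drawing firm conclusions.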
Open science framework – Jeff Spies, Centre for Open Science
Active research from lab to publication – Simon Coles, University of Southampton
Managing active research in the university – Robin Rice, University of Edinburgh
Making research available: FAIR principles and Force 11 - David De Roure, Oxford e-Research Centre
Jisc and CNI conference, 6 July 2016
Introduction to the Research Integrity Advisor Data Management Workshop, Bris... (ARDC)
Dr Jacobs' introduction to the RIA Data Management Workshop in Brisbane on 31 March 2017. The RIA Data Management Workshop series is a joint collaboration of the Australian Research Council, the National Health and Medical Research Council, the Australasian Research Management Society and the Australian National Data Service.
Clarivate was selected as the citation provider for the 2018 Excellence in Research for Australia (ERA) evaluation. The role involved preparing, submitting, and checking citation data from Clarivate's databases to support the ERA evaluation. Clarivate mapped institutional publication records to citations in the Web of Science and provided tagging portals and APIs to help with this process. They also offered seminars and support to help universities understand the citation data and benchmarks. Moving forward, Clarivate wants to leverage feedback to better support the quality of Australian research.
Lorraine Beard: RDM at the University of Manchester (Jisc)
The University of Manchester has established a Research Data Management service and policy to support researchers in managing their research data. The RDM service was launched in 2011 and is a collaboration between the University Library and IT Services. It aims to provide guidance, tools, and infrastructure to help researchers comply with funder data sharing requirements and best practices for data management, storage, and preservation. Key challenges for the future include developing metadata standards, tools for data sharing and publishing, coordinating expertise across departments, and adapting to a changing research environment and funder landscape.
Engaging Information Professionals in the Process of Authoritative Interlinki... (Lucy McKenna)
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs.
Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user interface will be developed to guide IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on the results. We are currently developing a minimum viable product to allow further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
Opening Keynote: From where we are to where we want to be: The future of resource discovery from a UK perspective
Neil Grindley, Head of Resource Discovery, Jisc
The document summarizes a pilot project at the University of Edinburgh to support the development of a UK Research Data Discovery Service. PhD interns engaged with researchers from various schools to describe and deposit research datasets in the university's systems to be harvested by the discovery service. Observations found mixed results across schools, with humanities researchers less comfortable sharing data due to copyright and reluctance to share interpretations. Other schools had established data repositories causing less interest in the university's system. Building research data management practices will require tailored approaches and more training over time.
Collaborate, Automate, Prepare, Prioritize: Creating Metadata for Legacy Rese...Jennifer Liss
Data curation projects frequently deal with data that were not created for the purposes of long- term preservation and re-use. How can curation of such legacy data be improved by supplying necessary metadata? In this report, we address this and other questions by creating robust metadata for twenty legacy research datasets. We report on the metrics of creating domain- specific metadata and propose a four-prong framework of metadata creation for legacy research data. Our findings indicate that there is a steep learning curve in encoding metadata using the FGDC content standard for digital geospatial metadata. Our project demonstrates that when data curators are handed research data “as is,” they may be successful in incorporating such data into a data sharing environment. We found that data curators can be successful in creating descriptive metadata and enhancing discoverability via subject analysis. However, curators must be aware of the limitations in applying structural and administrative metadata for legacy data.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Jared Lyle, ICPSR
Jennifer Doty, Emory University
Joel Herndon, Duke University
Libbie Stephenson, University of California, Los Angeles
The document discusses recommendations from a workshop on peer review of research data. It focuses on three key areas:
1. Connecting data review with data management planning by requiring data sharing plans, ensuring adequate funding for data management, and refusing publication without clear data access.
2. Connecting scientific and technical review with data curation by linking articles and data with versioning, avoiding duplicate review efforts, and addressing issues found in data.
3. Connecting data review with article review by requiring methods/software information, providing review checklists, ensuring data access for reviewers, and permanent dataset identifiers from repositories.
Research Data Management in GLAM: Managing Data for Cultural HeritageSarah Anna Stewart
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Learning to Curate Research Data
Jennifer Doty, Research Data Librarian, Emory Center for Digital Scholarship, Emory University, Robert W. Woodruff Library
In this lecture we discuss data quality and data quality in Linked Data. This 50 minute lecture was given to masters student at Trinity College Dublin (Ireland), and had the following contents:
1) Defining Quality
2) Defining Data Quality - What, Why, Costs
3) Identifying problems early - using a simple semantic publishing process as an example
4) Assessing Linked (big) Data quality
5) Quality of LOD cloud datasets
References can be found at the end of the slides
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 (CC-BY-SA-40) International License.
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
The document discusses web scale discovery tools and their relationship to information literacy. It provides context on the social, economic, technological, and political factors driving adoption of these tools. It then examines perceptions of libraries and describes various commercial and open source discovery services. Desired features of discovery services and early experiences with them are outlined. However, the document notes tensions between a resource-based view of libraries versus an information literacy view. It poses four questions for debate around how well discovery tools support student development of information literacy skills and the need to augment these tools to better deliver on libraries' information literacy mission.
The document discusses web scale discovery tools and their relationship to information literacy. It provides context on the social, economic, technological, and political factors driving adoption of these tools. It then examines perceptions of libraries and describes various commercial and open source discovery services. Desired features of discovery services are outlined. Early reports suggest discovery tools have increased usage of licensed resources but students struggle to interpret results. This raises implications for information literacy support. Challenges around balancing convenience with developing research skills are debated. The document concludes by posing four questions around how discovery tools can support information literacy goals.
The document discusses the future of the Digital Curation Centre (DCC) and its role as a center of expertise in data curation and preservation. It outlines the DCC's proposed core services for the next phase, including providing reference resources, training, expertise/consultancy, community building, and tools/toolkits. It also discusses potential additional services and ensuring the DCC complements rather than conflicts with the UK Research Data Service.
Linked Data at the OU - the story so farEnrico Daga
The document discusses the Open University's use of linked open data and their data.open.ac.uk platform. It provides an overview of linked data principles and the data.open.ac.uk platform. Key services of the Open University rely on data.open.ac.uk to support users in various ways such as the student help center and OpenLearn platform. While linked data is useful for centralized data publishing, it does not replace traditional data management and requires developers to integrate it with existing workflows.
In order to be reused, research data must be discoverable.
The EPSRC Research Data Expectations* requires research organisations to maintain a data catalogue to record metadata about research data generated by EPSRC-funded research projects.
Universities are increasingly making research data assets available through repositories or other data portals.
The requirement for a UK research data discovery service has grown as universities become more involved in RDM and capacity develops.
Incentivising the uptake of reusable metadata in the survey production processLouise Corti
This document discusses incentivizing the uptake of reusable metadata in survey production. It notes that there is no universal language used to document survey questions and variables, leading to wasted resources. The Data Documentation Initiative (DDI) is proposed as a standard. Barriers to adopting metadata best practices include legacy systems, manual processes, and reluctance to change. The document outlines ideas to incentivize metadata use such as specifying documentation requirements in funding calls and improving documentation tools and workflows. Showing tangible benefits through applications like question banks and data exploration systems is also suggested.
The document discusses the role of academic libraries in research data management (RDM). It begins by describing the variety of research data types and the large scale of data being produced. It then discusses funders' mandates for good RDM practices and potential areas where libraries can contribute, such as policy development, training, and advisory services. UK libraries are currently offering some basic RDM services but see it as a high priority going forward. Challenges include the need for skills development and concerns about capacity. Librarians need support to develop confidence and competencies in operating in this complex domain.
Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries
1. Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries
Jeremy Debattista, Lucy McKenna, Rob Brennan
ADAPT Centre, Trinity College Dublin, Ireland
This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204) and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by the European Regional Development Fund.
2. What is a good digital library?
• Literature: the success of a DL depends on the quality of the available metadata
• How do you define good-quality metadata? It is subjective; there is no definitive answer
• Potentially an easy question to answer, but the answer does not generalise
3. Linked Data in Digital Libraries
• Data interoperability & re-usability
• Resource discoverability & visibility
• Data interlinking
4. So why the slow uptake?
• Linked Data is not a solution that solves all problems
• Quality issues, as noted in the literature
5. The Aims of this Study
• What quality measures do IPs consider important?
o Why? Can we identify the generic quality measures for the task at hand?
• What quality problems do IPs face when using Linked Data?
o Why? To focus quality assessment on Digital Library Linked Datasets
6. Survey Methodology
• Online questionnaire
o Snowball sampling (Twitter, email, mailing lists)
• 50 questions
o Primarily multiple choice – participants were able to add their own observations
o Partially based on previous surveys and analyses of projects in the domain
• Two data-quality-focused questions
7. Survey Methodology
• 185 participants
o Split into 2 groups:
G1: participants with experience working with LD (n=54)
G2: participants without experience working with LD (n=131)
• Academic Library (56%), Research Institution (7%), Public Library (7%), Special Library (6%), Archive (6%), National Library (5%), Museum (4%), and Special Archive (1%)
• 20 countries
o Ireland (28%), the USA (23%) and the UK (20%)
8. Results and discussion of the whole survey
McKenna, L., Debruyne, C., O'Sullivan, D.: Understanding the position of information professionals with regards to linked data: A survey of libraries, archives and museums. In: Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018), Fort Worth, Texas, USA, June 3–7, 2018, pp. 7–16 (2018)
9. The Questions
Q1. When completing different metadata tasks, what evaluation criteria do you apply when using, or searching for, external data sources?
Q2. Can you give an example of a data quality issue or concern you experience frequently?
10. Key Findings – Q1
Q1. When completing different metadata tasks, what evaluation criteria do you apply when using, or searching for, external data sources?
Q2. Can you give an example of a data quality issue or concern you experience frequently?
11. Key Findings – Q1
GOAL: Understand what fitness for use means for the survey participants in a digital library scenario.
• 11 dimensions and 2 generic options (none, other)
o Trustworthiness, Interoperability, Licensing, Completeness, Understandability, Provenance, Timeliness, Syntactic Validity, Availability, Conciseness, Versatility
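Tallying which of these dimensions participants select in a multiple-choice question can be sketched as follows. The dimension names come from the slide; the response sets are invented purely for illustration:

```python
from collections import Counter

# The 11 dimensions offered in Q1 (from the slide).
DIMENSIONS = [
    "Trustworthiness", "Interoperability", "Licensing", "Completeness",
    "Understandability", "Provenance", "Timeliness", "Syntactic Validity",
    "Availability", "Conciseness", "Versatility",
]

# Hypothetical multi-select responses from three participants.
responses = [
    {"Trustworthiness", "Interoperability"},
    {"Trustworthiness", "Licensing"},
    {"Availability"},
]

# Count how often each dimension was selected, then rank them.
counts = Counter(d for r in responses for d in r)
ranked = sorted(DIMENSIONS, key=lambda d: -counts[d])
print(ranked[0])  # → Trustworthiness
```

In the actual survey this kind of tally is what yields the aggregated selection frequencies discussed later (e.g. trustworthiness at around 67%).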
13. Key Findings – Q1
• Statistical testing: do both groups consider each measure to be of equal importance, or otherwise?
• Z-score (α = 0.05)
• Null hypothesis rejected for: Trustworthiness, Interoperability and Availability.
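A minimal sketch of the two-proportion z-test applied here. The group sizes n=54 and n=131 are from the survey; the selection counts below are hypothetical, since the slide does not give per-measure counts:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: both groups select the measure equally often."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts of participants selecting one measure in G1 and G2.
z = two_proportion_z(45, 54, 78, 131)
# Two-tailed test at alpha = 0.05: reject H0 when |z| > 1.96.
print(abs(z) > 1.96)
```

With these illustrative counts the groups differ enough to reject the null hypothesis of equal importance, mirroring the outcome reported for trustworthiness, interoperability and availability.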
14. Key Findings – Q2
Q1. When completing different metadata tasks, what evaluation criteria do you apply when using, or searching for, external data sources?
Q2. Can you give an example of a data quality issue or concern you experience frequently?
15. Key Findings – Q2
GOAL: Understand quality pitfalls in Linked Data datasets for Digital Libraries.
• Open question:
o 92 responses => 77 quality problems
o 14 different quality measures
17. Key Findings – Q2
• Semantic Accuracy
o Incorrect DOIs
o Wrong ISBNs, URI references
• Completeness / Data Coverage
o Incomplete crowdsourcing efforts
o Incomplete important fields (e.g. publication date)
o Use of old standards, leaving obligatory fields incomplete
18. Key Findings – Q2
• Interoperability
o Lack of structured standards
o Metadata formats changing constantly
• Data formatting
o Inconsistent formatting of dates
o Naming inconsistencies (e.g. "first name, last name" vs "last name, first name")
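Inconsistent date formatting is commonly handled by normalising everything to ISO 8601. A sketch, assuming a small hand-picked list of formats (a real catalogue would need many more, and ambiguous cases like US vs European day/month order need a policy decision):

```python
from datetime import datetime

# Assumed formats seen across catalogues; not exhaustive.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %B %Y"]

def normalise_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if unparseable."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(normalise_date("03/12/2018"))        # → 2018-12-03
print(normalise_date("December 3, 2018"))  # → 2018-12-03
```

Returning None rather than guessing keeps unparseable values visible for manual review instead of silently introducing new errors.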
19. Key Findings – Q2
• Other problems
o Conciseness – duplication of records
o Language versatility – character encoding problems
o Availability – resources are not always available
o Trustworthiness – credibility of information on the Web
o Licensing – uncertainty over whether datasets can be used freely
20. Next Steps
• Assess the quality of LD digital libraries
o Started a monthly assessment in August 2018
o Some results can be seen at http://luzzu.adaptcentre.ie
• Identify a quality profile to generalise an answer to "What is a good digital library?"
21. Conclusion
• Discussed and identified the quality measures an IP considers when searching for external sources
o No agreement on the importance (or otherwise) of 3 measures (trustworthiness, interoperability, and availability)
• Discussed quality problems identified by IPs in the currently available data sources
o Mostly intrinsic in nature
jeremy.debattista@adaptcentre.ie
twitter: @jerdeb
Editor's Notes
Say that we use the term IPs for people working in the DL domain.
ASK the room: what is a good digital library?
Most literature states that the success of digital libraries largely depends on the quality of the available metadata.
However, this is quite ambiguous, as defining quality is subjective and mostly depends on the task at hand.
Different institutions have different needs, which are closely coupled with information professionals' experience and roles.
An easy question to answer, but we cannot generalise which library is best for all cases and suitable for everyone.
IPs have realised that LD offers many benefits:
Since a standardised data model is used, metadata can be shared and re-used across DLs, potentially reducing record duplication.
Discoverability by various agents (e.g. using RDFa within HTML pages would enable search engines such as Google to retrieve your data in a meaningful manner).
Interlinking of related resources.
- Challenges in LD, mostly with respect to quality.
We ran a survey to understand: (1) what kinds of quality measures IPs consider important; (2) what problems they face when using Linked Data.
For (1), we want to build a system that automatically suggests which quality measures are needed for a particular task at hand – which is what we are working on; for (2), we want to identify the quality issues they encounter and try to validate them by actually assessing various Linked Data digital libraries.
Results of OCLC survey & Library survey
Analysis of LAM LD projects & LD tooling
Prior work with Digital Repository of Trinity College Dublin (McKenna et al, 2017)
The 185 questionnaires that were analysed were classified into two groups: participants who have experience working with Linked Data (n=54) (group 1), and participants who do not have experience working with Linked Data (n=131) (group 2).
Our goal: understand what fitness for use means to the participants in the different groups in a DL setting.
We provided a list of quality measures taken from the literature, trying to represent the measures important for both DL and LD:
– Trustworthiness (e.g. Can this provider be trusted that all data is correct?)
– Interoperability (e.g. Does the external source use well-known standard schemas to represent the data?)
– Licensing issues (e.g. Can I use this external source freely?)
– Completeness (e.g. Do all external metadata fields have values?)
– Understandability (e.g. Are all records in the external source labelled and ready for human consumption?)
– Provenance (e.g. Does the external source provide provenance/origin information on the data?)
– Timeliness (e.g. Are all records up to date?)
– Syntactic validity (e.g. Are dates in the correct format, correct spelling?)
– Availability of the external source (e.g. SPARQL endpoint is accessible)
– Conciseness (e.g. Is there any redundancy within the external source?)
– Versatility (e.g. Is the data available in different languages?)
All participants answered this question, 16 of them being unsure about, or not caring about, quality at all.
We also had 9 participants mention dimensions other than those listed – marked in brackets in the table and in italics.
The aggregated results show that trustworthiness was the most frequently selected criterion, at around 67%, followed by interoperability and licensing.
However, we cannot statistically state that trustworthiness is the most important criterion, as we cannot assume that participants chose on the basis of what is most important.
A statistical test to find evidence of whether the two groups consider a measure to be of equal importance or otherwise.
For this we defined a null hypothesis and an alternative hypothesis and used the z-score with a significance level of 0.05 to identify whether there is strong enough evidence to reject or accept the null hypothesis.
The tests show that there is no supporting evidence to suggest that the measures trustworthiness, interoperability, and availability are of equal importance to both groups.
The goal of this second question was to better understand the quality problems IPs find in LD, which are hence contributing to the slow uptake.
This was an open question and was answered by 92 participants, of which 15 responses were out of scope.
The rest were classified into 14 different measures, including semantic accuracy, completeness, and conciseness.
Most problems were intrinsic in nature, meaning that they related to the data in the dataset itself,
followed by representational problems, i.e. the way the data is represented for consumption.
We will not go through all the problems, but will mention some interesting ones.
On semantic accuracy, most participants complained about the presence of incorrect values in various fields of a catalogue resource, mostly due to misspellings.
On completeness, a number of participants cast doubt on whether information in crowdsourced efforts such as Wikipedia (and hence DBpedia) is correct and complete.
Furthermore, participants also complained about datasets that still follow old best practices and standards and hence have incomplete fields, such as the publication date.
With regard to interoperability, participants mostly noted that there is a lack of consensus on which standards to use, and even where there seems to be agreement, the formats are constantly changing.
Along similar lines, participants noted that there are also inconsistencies in how data is represented within fields, such as dates and naming standards.
There were other problems highlighted by the participants, for example:
The duplication of records, introducing redundancy and increasing errors.
The encoding of characters, e.g. usage of the Cyrillic alphabet in international authority data.
The 24/7 availability of data and the reliability of online services.
One participant noted that they would place more trust in datasets published by their own institution than in information readily available on the Web.
To what extent can I use a particular external dataset if the licence is not clear or readily available in the dataset?
- Discussed and identified the quality measures an IP considers when searching for external sources.
- No agreement on the importance of 3 measures (trustworthiness, interoperability, and availability).
Discussed quality problems as identified by the IPs in the currently available data sources.
Problems are mostly intrinsic in nature, identifying semantic accuracy and interoperability as worrying dimensions in which LD should excel.
This should serve as a starting point for LD publishers to update their publishing mechanisms.